The debate CNN wanted and the debate the candidates wanted

In last night’s Republican presidential debate, the CNN moderators tried very hard to pit individual candidates against each other. Pretty much every question was along the lines of: “Candidate A, what do you think about Candidate B’s attacks on you?”

While CNN wanted to generate controversy, the disputes it tried to initiate were not necessarily the disputes that the candidates strategically wanted to have. To get a better sense for this, I tallied up two things:

  1. Moderator Prompts. These were episodes where the moderators prompted one candidate to challenge another.
  2. Real Challenges. These were episodes where either (a) the candidate took the bait from a Moderator Prompt and attacked the other candidate or (b) a candidate launched an unprompted attack on another candidate. I did not include episodes where a prompted candidate acknowledged a disagreement with another candidate in a nonconfrontational way.

The results are below. In the first graph, an arrow from one candidate to another indicates that the moderator asked the first candidate to challenge the second. The next graph shows real challenges. In both graphs, the boldness of the curve indicates how many times the event occurred.
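For readers who want to build a similar tally, here is a minimal sketch of how the edge weights behind such graphs could be counted. The event lists and the candidate labels "A", "B", and "C" are hypothetical placeholders, not the tallies from this debate, and the function name `edge_weights` is my own.

```python
from collections import Counter

# Hypothetical event logs: each tuple is (challenger, target).
# "A", "B", "C" are placeholder labels, not data from the actual debate.
moderator_prompts = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
real_challenges = [("A", "B"), ("B", "A")]


def edge_weights(events):
    """Count how often each directed (challenger, target) pair occurs."""
    return Counter(events)


prompt_edges = edge_weights(moderator_prompts)
challenge_edges = edge_weights(real_challenges)

# Directed pairs that were prompted and also show up as real challenges.
taken_up = set(prompt_edges) & set(challenge_edges)

# Directed pairs that show up as real challenges without any prompt.
unprompted = set(challenge_edges) - set(prompt_edges)

print(prompt_edges)
print(challenge_edges)
print(taken_up, unprompted)
```

Each count would then map to the boldness of the corresponding curve.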

Some observations:

  • The graph of Moderator Prompts is much denser than the graph of Real Challenges. As was clear to anyone watching the debate, the moderators wanted to generate more controversy than the candidates wanted.
  • Most of the real action in the second graph was in the Trump-Bush-Paul Triangle of Controversy.
  • The moderators tried to prompt several challenges to Ben Carson but nobody took the bait. Conversely, the moderators paid relatively little attention to Rand Paul, and yet he was part of several real disputes.
  • The interests of Donald Trump were pretty well aligned with the interests of CNN. Donald Trump was prompted many times and engaged in many challenges.

It’s also interesting to look at how many times each of the candidates challenged Hillary Clinton, who was not on the stage. Jeb Bush mentioned her on five separate occasions, whereas Donald Trump never mentioned her at all. It’s just a few data points, but it’s consistent with Jeb’s larger focus on the general election.

I doubt any of this data is very predictive of election outcomes, but it definitely seems to be related to current polling trends. Either way, it’s pretty interesting to look at. I might make these plots for some of the other debates to see how these relationships change over time.

Update: Coincidentally, the New York Times just released a similar analysis with similar graphics. Also, special thanks to @trebor for helping me set up mouseover-triggered transparency.


New Blog Address

Welcome to the new location for The File Drawer! This blog is now hosted on Github Pages and powered by Jekyll. My old blog at will be shutting down soon.

I actually really liked WordPress, but I wanted to have a little bit more control over what I can put in my posts. In particular, I wanted to be able to insert my own JavaScript animations, for example in this post on the recent presidential debates.

It was pretty fun to migrate everything over from WordPress – which is not to say there weren’t some hiccups along the way – but I was able to do so by following some nice instructions in this post by Joshua Lande.

For those of you using RSS, the feed for should be searchable in your RSS readers, but please let me know if it’s not.


10 classic dialogues you can find on the internet

Some videos on the internet are so good that I’ve watched them twice. Below is a list of 10 of my favorite interviews and dialogues. Obviously this isn’t an endorsement of all the positions taken. I just think they are very well done and fun to watch. The last four are best watched on 1.4x speed.

  1. 1971 Michael Parkinson interviews Muhammad Ali.

  2. 1974 Michael Parkinson interviews Muhammad Ali again, with better sound.

  3. 1997 Steve Jobs interacting with the audience when announcing the Microsoft deal.

  4. 1997 Steve Jobs interacting with the audience at WWDC.

  5. 2006 Stephen Colbert interviews Eleanor Holmes Norton.

  6. 2009 Robert Wright and Joel Achenbach

  7. 2009 Tyler Cowen and Peter Singer

  8. 2010 Robert Wright and Mickey Kaus

  9. 2011 Robert Wright and Mickey Kaus

  10. 2012 Glenn Loury and Ann Althouse

If I had to recommend just one, it would be Cowen/Singer.


Across industries, we’re getting better at picking metrics

Everywhere you look, people are optimizing bad metrics. Sometimes people optimize metrics that aren’t in their self-interest, like when startups focus entirely on signup counts while forgetting about retention rates. In other cases, people optimize metrics that serve their immediate short-term interest but which are bad for social welfare, like when California corrections officers lobby for longer prison sentences.

The good news is that as we become a more data-driven society, there seems to be a broad trend — albeit a very slow one — towards better metrics. Take the media economy, for example. A few years ago, media companies optimized for clicks, and companies like Upworthy thrived by producing low quality content with clickbaity headlines. But now, thanks to a more sustainable business model, companies like Buzzfeed are optimizing for shares rather than clicks. It’s not perfect, but overall it’s better for consumers.

In science, researchers used to optimize for publication counts and citation counts, which biased them towards publishing surprising and interesting results that were unlikely to be true. These metrics still loom large, but increasingly scientists are beginning to optimize for other metrics like open data badges and reproducibility, although we still have a long way to go before quality metrics are effectively measured and incentivized.

In health care, hospitals used to profit by maximizing the quantity of care. Perversely, hospitals benefited whenever patients were readmitted due to infections acquired in the hospital or due to the lack of an adequate follow-up plan. Now, with ACA policies that penalize hospitals for avoidable readmissions, hospitals are taking real steps to improve follow-up care and to reduce hospital-acquired infections. While the metrics should be adjusted so that they don’t unfairly penalize low-income hospitals, the overall emphasis on quality rather than quantity is moving things in the right direction.

We are still light years from where we need to be, and bad incentives continue to plague everything from government to finance to education. But slowly, as we get better at measuring and storing data, I think we are getting better at picking the right metrics.


Independent t-tests and the 83% confidence interval: A useful trick for eyeballing your data.

Like most people who have analyzed data using frequentist statistics, I have often found myself staring at error bars and trying to guess whether my results are significant. When comparing two independent sample means, this practice is confusing and difficult. The conventions that we use for testing differences between sample means are not aligned with the conventions we use for plotting error bars. As a result, it’s fair to say that there’s a lot of confusion about this issue.

Some people believe that two independent samples have significantly different means if and only if their standard error bars (68% confidence intervals for large samples) don’t overlap. This belief is incorrect. Two samples can have nonoverlapping standard error bars and still fail to reach statistical significance at $\alpha = .05$. Other people believe that two means are significantly different if and only if their 95% confidence intervals do not overlap. This belief is also incorrect. For one-sample t-tests, it is true that significance is reached when the 95% confidence interval excludes the test parameter $\mu_0$. But for two-sample t-tests, which are more common in research, statistical significance can occur with overlapping 95% confidence intervals.
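Both failure modes are easy to check numerically. The sketch below uses made-up numbers (two samples with a standard error of 1 each) and the large-sample cutoff of 1.96; the function names are my own, chosen for illustration.

```python
import math

Z_CRIT = 1.96  # large-sample two-sided cutoff for significance at .05


def bars_overlap(mean1, mean2, se1, se2, z):
    """True if the intervals mean1 ± z*se1 and mean2 ± z*se2 overlap."""
    return abs(mean1 - mean2) < z * (se1 + se2)


def z_stat(mean1, mean2, se1, se2):
    """Large-sample test statistic for two independent means."""
    return abs(mean1 - mean2) / math.sqrt(se1**2 + se2**2)


# Case 1 (hypothetical): means 2.5 apart, SE = 1 for both samples.
# The standard error bars (z = 1) do not overlap, yet the test is not significant.
case1_overlap = bars_overlap(0.0, 2.5, 1.0, 1.0, z=1.0)
case1_significant = z_stat(0.0, 2.5, 1.0, 1.0) > Z_CRIT

# Case 2 (hypothetical): means 3.0 apart, SE = 1 for both samples.
# The 95% intervals (z = 1.96) overlap, yet the test is significant.
case2_overlap = bars_overlap(0.0, 3.0, 1.0, 1.0, z=Z_CRIT)
case2_significant = z_stat(0.0, 3.0, 1.0, 1.0) > Z_CRIT

print(case1_overlap, case1_significant)
print(case2_overlap, case2_significant)
```

With equal standard errors, any difference between 2 and about 2.77 standard errors falls into the first trap, and any difference between about 2.77 and 3.92 falls into the second.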

If neither the 68% confidence interval nor the 95% confidence interval tells us anything about statistical significance, what does? In most situations, the answer is the 83.4% confidence interval. This can be seen in the figure below, which shows two samples with a barely significant difference in means (p=.05). Only the 83.4% confidence intervals shown in the third panel are barely overlapping, reflecting the barely significant results.

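To understand why, let’s start by defining the t-statistic for two independent samples:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{SE_1^2 + SE_2^2}}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means of the two samples, and $SE_1$ and $SE_2$ are their standard errors. By rearranging, we can see that significant results will be barely obtained ($p = .05$) if the following condition holds:

$$\bar{X}_1 - \bar{X}_2 = 1.96 \times \sqrt{SE_1^2 + SE_2^2}$$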

where 1.96 is the large sample cutoff for significance. Assuming equal standard errors (more on this later), the equation simplifies to:

$$\bar{X}_1 - \bar{X}_2 = 1.96 \times \sqrt{2} \times SE$$

On a graph, the quantity $\bar{X}_1 - \bar{X}_2$ is the distance between the means. If we want our error bars to just barely touch each other, we should set the length of the half-error bar to be exactly half of this, or:

$$\frac{1.96 \times \sqrt{2}}{2} \times SE \approx 1.386 \times SE$$

This corresponds to an 83.4% confidence interval on the normal distribution. While this result assumes a large sample size, it remains quite useful for sample sizes as low as 20. The 83.4% confidence interval can also become slightly less useful when the samples have strongly different standard errors, which can stem from very unequal sample sizes or variances. If you really want a solution that generalizes to this situation, you can set your half-error bar on your first sample to:

$$\frac{1.96 \times \sqrt{SE_1^2 + SE_2^2} \times SE_1}{SE_1 + SE_2}$$

and make the appropriate substitutions to compute the half-error bar in your second sample. However, this solution has the undesirable property that the error bar for one sample depends on the standard error of the other sample. For most purposes, it’s probably better to just plot the 83% confidence interval. If you are eyeballing data for a project that requires frequentist statistics, it is arguably more useful than plotting the standard error or the 95% confidence interval.
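The numbers above can be checked with a short sketch using only the standard library. It assumes the large-sample normal approximation throughout. For unequal standard errors it assumes that the critical gap between the means is split between the two bars in proportion to each sample's standard error, which is one way to make the bars touch exactly at the significance threshold. The function names are my own.

```python
import math

Z_CRIT = 1.96  # large-sample two-sided cutoff for significance at .05


def normal_coverage(z):
    """Coverage of mean ± z*SE under a normal distribution: 2*Phi(z) - 1."""
    return math.erf(z / math.sqrt(2))


def equal_se_multiplier(z_crit=Z_CRIT):
    """Half-error-bar multiplier when both samples share the same SE."""
    return z_crit * math.sqrt(2) / 2


def half_bars(se1, se2, z_crit=Z_CRIT):
    """Half-error bars that just touch when the difference is barely significant.

    Assumption: the critical gap z_crit * sqrt(se1^2 + se2^2) is split
    between the two bars in proportion to each sample's standard error.
    """
    gap = z_crit * math.sqrt(se1**2 + se2**2)
    total = se1 + se2
    return gap * se1 / total, gap * se2 / total


multiplier = equal_se_multiplier()
print(multiplier)                   # about 1.386
print(normal_coverage(multiplier))  # about 0.834
print(half_bars(1.0, 2.0))          # bars for unequal standard errors
```

With equal standard errors, `half_bars` reduces to the 1.386 multiplier, and in every case the two half-bars add up to the critical distance between the means.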

Update: Jeff Rouder helpfully points me to Tryon and Lewis (2008), which presents an error bar that generalizes both to unequal standard errors and small samples. Like the last equation presented above, it has the undesirable property that the size of the error bar around a particular sample depends on both samples. But on the plus side, it’s guaranteed to tell you about significance.