Friday, December 21, 2012

Newtown-inspired data analysis

Let me share with you a truly horrible bit of data analysis.

Inspired by the analysis of 9/11 in Nate Silver's book, I plotted the mean number of years y between school massacres in which x people died, on a log-log scale.  I used the last 50 years of data from a table in Wikipedia (of unknown completeness).  The largest few death counts each occurred only once, so their mean time is shown as 50 years on the plot.  Surprisingly, discounting those singletons, the points fit a line pretty well (r^2 is about 0.57).
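
For anyone who wants to try the same kind of fit, here is a minimal Python sketch of the procedure described above. It makes two assumptions: that "mean years between events" means the window length divided by the number of events with that death toll, and that the line is an ordinary least-squares fit in log10 space. The function names and the death tolls in `placeholder_counts` are made up for illustration. They are not the Wikipedia data, and the numbers this prints are not the ones behind my plot.

```python
import math
from collections import Counter

def mean_years_between(death_counts, window_years):
    """Map each death toll x to window_years / (number of events with x deaths)."""
    tally = Counter(death_counts)
    return {x: window_years / n for x, n in sorted(tally.items())}

def fit_loglog(points, window_years, drop_singletons=True):
    """Least-squares fit of log10(y) = a + b*log10(x); returns (a, b, r2)."""
    pairs = [(math.log10(x), math.log10(y)) for x, y in points.items()
             # a toll seen only once has y == window_years; optionally skip it
             if not (drop_singletons and y == window_years)]
    n = len(pairs)
    mx = sum(u for u, _ in pairs) / n
    my = sum(v for _, v in pairs) / n
    sxx = sum((u - mx) ** 2 for u, _ in pairs)
    sxy = sum((u - mx) * (v - my) for u, v in pairs)
    syy = sum((v - my) ** 2 for _, v in pairs)
    b = sxy / sxx                    # slope of the line on the log-log plot
    a = my - b * mx                  # intercept
    r2 = (sxy * sxy) / (sxx * syy)   # squared correlation coefficient
    return a, b, r2

def predicted_years(x, a, b):
    """Read the fitted line back off the plot: expected years between events of size x."""
    return 10 ** (a + b * math.log10(x))

# Hypothetical placeholder tolls, NOT the real dataset.
placeholder_counts = [2] * 10 + [3] * 5 + [4] * 4 + [5] * 2 + [8] * 2 + [20]
WINDOW = 50  # years of data, as in the post

points = mean_years_between(placeholder_counts, WINDOW)
a, b, r2 = fit_loglog(points, WINDOW)
print(points)
print(f"slope={b:.2f} intercept={a:.2f} r^2={r2:.2f}")
```

With real data you would replace `placeholder_counts` with one death toll per event, then use `predicted_years` to read values off the fitted line.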

Both Newtown and VA Tech fall quite close to the line I drew below (drawn by eye, but guided by the regression coefficients).  Columbine, for example, is a unique case: there is only one event with 13 deaths in the dataset, but there are several with 12 deaths.  Extrapolating from the line, it looks like we can "look forward" to a Columbine (or worse) every 15-20 years.  We will have a Newtown or VA Tech every 40-50 years.  (So we've had more of these than expected, but not by much.)  Killings of 5-10 children can be expected to happen every 10 years or so, and probably sometime in your lifetime you will see one Breivik-scale massacre in a school, with 70-80 dead.

These events are worldwide, not US-only - but on the other hand these are only killings at schools, not all gun-related killings.  And US policy is very relevant, since many of these events (3 of the worst 5, 4 of the worst 8) happened here.

If you've been, like me, feeling a little stunned and avoiding the news lately, be warned: Newtown is horrible, but it isn't unprecedented.  Occurrences like these are not common, but they should not be unexpected. This is just the way things will be - unless we change.
