David McCoy

A member of the Gartner Blog Network

Bad Statistics – 89.76% Faulty

November 6th, 2008 · 5 Comments

Scenario: Driving home late last night.   Listening to NPR when I could find it.  Heard a commentator say something to the effect of, “No Democratic president has ever won office without carrying the state of Missouri.”  Sudden urge to rip the radio from the dash and never listen to any newscast again.

I just searched for “without winning Missouri” on Google.com.  You don’t have to be smart to search nowadays – all you have to do is enter the key snippet.  Lots of results, most saying effectively the same thing: “Wow! How did Obama win without Missouri?”  Apparently, a Democrat being elected president has always coincided with Missouri’s going Democratic (but not the other way around – Missouri has predicted a Democrat and been wrong, as several sites bemoan).  Whoopee!  A way to predict the winner!   If a Democrat won then Missouri must have gone Democrat.  But this time, it failed.

Failed indeed. This is a fine example of pure statistical hype. Good stats classes teach ways to avoid this kind of math mistake. Missouri was never a validated predictor of Democratic wins, or of so-called “American opinion.” “What? It aligns so perfectly. How can you say that?” Even if the next 10 Democrats carry Missouri and win the presidency, it is still an invalid predictor in my book. I can’t even muster a reasonable case for real covariance. This is just the same bad math that has sportscasters saying:

“Well, Tom… following a line-drive single, this batter has never hit the second pitch from a left-handed pitcher on a Tuesday night in the third inning.”

Richard Feynman would have loved this one.  It’s the same mentality that makes people see faces in cookies and pancakes.  The brain wants to spot patterns – it is desperate for pattern detection.  When you take a big ol’ bunch of data – like all the state-by-state election results since who-knows-when – and you start to look for patterns, you know what?  You are going to find a few patterns.  Some of those patterns may hide a great predictive model – this is what well-cross-validated discriminant analysis is all about.  However, just digging in the data to find a pattern – any pattern – is a fool’s errand.  You will always find some alignment that looks like it’s predictive, but in reality, it’s just the randomness of life at work.  What real predictive capability takes is a lot of work on defining the precise criterion you are measuring, a lot of construct validity checking, and a lot of work on cross-validation against the criterion, etc.  Oh, and a little solid theory on the relationship between the independent and dependent variables wouldn’t hurt either.  In other words, sometimes what you see is just a nice reflection in the mirror, but there is nothing on the other side.
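
To make the point concrete, here is a minimal sketch in Python of what digging for "any pattern" looks like. Everything in it is an assumption made up for illustration: the function name `dredge`, the 5,000 candidate "signals," and the coin-flip elections are all hypothetical, and none of it is real election data. Each candidate signal is pure noise, yet some of them will match every past outcome perfectly. The sketch then checks those "perfect" signals against elections they were not selected on, which is the bare-bones idea behind cross-validation.

```python
import random


def dredge(n_candidates=5000, n_past=10, n_future=10, seed=0):
    """Search random noise for 'perfect' predictors of random elections.

    Returns (perfect, holdout_accuracy):
      perfect          -- how many noise signals matched all past outcomes
      holdout_accuracy -- how often those same signals matched held-out
                          outcomes, or None if there is nothing to score
    """
    rng = random.Random(seed)

    # Hypothetical election outcomes (True = Democrat won): pure coin flips.
    past = [rng.random() < 0.5 for _ in range(n_past)]
    future = [rng.random() < 0.5 for _ in range(n_future)]

    perfect = 0
    holdout_hits = 0
    for _ in range(n_candidates):
        # Each candidate "signal" is also a coin flip, unrelated to outcomes.
        signal_past = [rng.random() < 0.5 for _ in range(n_past)]
        signal_future = [rng.random() < 0.5 for _ in range(n_future)]

        # Keep only the signals that look like flawless predictors so far.
        if signal_past == past:
            perfect += 1
            holdout_hits += sum(s == f for s, f in zip(signal_future, future))

    if perfect and n_future:
        holdout_accuracy = holdout_hits / (perfect * n_future)
    else:
        holdout_accuracy = None
    return perfect, holdout_accuracy


if __name__ == "__main__":
    found, accuracy = dredge()
    print("perfect in-sample 'bellwethers':", found)
    print("their accuracy on held-out elections:", accuracy)
```

With the defaults, chance alone should hand you about 5000 / 2^10, roughly five, flawless "bellwethers," and their accuracy on the held-out elections should hover around a coin flip. The pattern was always going to be there; the predictive power never was.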

Missouri is a cool place.  And Missourians may be so close to the pulse of America that their collective voice echoes the blended wishes of the other 49 states.  But I don’t see the proof.  I don’t see the validity that I would want to see, and, in my opinion, Missouri is not a valid oracle for the American pulse, even if it names the right presidential party for as long as I live.   And that pancake that you keep locked in that safe is not a spitting image of Barney Fife.  Sorry, it’s just not real.  Come back to me with a rock-solid, statistically sound case on why Missouri is “the bellwether” – so called – against the loose criterion “How America Will Decide” and we can talk.  Do the hard work, show me how you cross-validated your predictive, discriminant model, and prove your point, and I will back down.  My stats are rusty – you may be able to take me.  But, don’t come to me with some interesting presidential election patterns and say that proves Missouri is the guide to all-things-American. Until then, just enjoy the pretty pictures randomness can draw.

P.S. Don’t get me started on “historically the stock market has always outperformed other investments” statements being used to predict a rosy future either… not that you would anytime soon.

Tags: Academic Goings-On · Philosophy · Rabble-Rousing and General Hoopla

5 responses so far ↓

  • 1 Richard Veryard // Nov 7, 2008 at 7:58 am

    you said: “You don’t have to be smart to search nowadays – all you have to do is enter the key snippet.”

    Ah, but how do you find the key snippet?

    You don’t have to be smart to search here … but it helps.
    http://knowledgeanduncertainty.blogspot.com/2008/11/you-dont-have-to-be-smart-to-search.html

  • 2 Richard Veryard // Nov 7, 2008 at 8:29 am

    As for your main point, my old tutor, J.R. Lucas, used to call this the Dover fallacy.

    There was a young curate of Dover
    Who bowled twenty-five wides in an over,
    Which had never been done
    By a clergyman’s son
    On a Tuesday in August in Dover.

  • 3 David McCoy // Nov 7, 2008 at 10:45 am

    I love the Lucas rhyme. To be smart in my original sense means that (a) I know the full details of what I want to search for and (b) I know where to find it. In fact, I need to know only a very few details, and only approximately where to find it. Your blog posting seems to support this.

    My view of searching does require brain activity, but it is not the same level of “smarts” as offered by an expert in the field. My view is more that of “the detective” chasing down a few wild hunches. In the Missouri example, I only had a few snippets of text. That was all I needed. I knew nothing about the full context, but I did know enough to mine the most likely associated text in Google. Sure, that’s a level of smart searching, but it’s highly hit-and-miss, and artificial versus the domain expert’s approach. I would prefer to call that “cunning.” In my approach, I knew that the proper noun “Missouri,” and the verb “winning” and the preposition “without” would get me close enough. That same, rough template would apply in many cases.

    So, in effect, we are both right – it just comes down to the meaning of “smart.” I believe searching can be accomplished with a lower degree of domain-specific knowledge, and a reasonable degree of language parsing to find the key phrases that you need – cunning. Templates and frames can substitute for domain expertise in searches, I believe. Is that smart? Well, sure… I would never admit to being a dumb searcher, just a cunning one. It’s just a different kind of smart. Now, in reality, how many of us search without some implicit domain knowledge? Your example of adding “Lucan” to the search terms is a very good example of domain knowledge, but also of the speculative, cunning search that I discuss.

    Word meaning again…

  • 4 Richard Veryard // Nov 21, 2008 at 9:32 am

    Missouri has now finally lost its bellwether status. Why does anyone care? See the POSIWID blog.

    http://posiwid.blogspot.com/2008/11/missouri-loses-bellwether-status.html

  • 5 David McCoy // Nov 21, 2008 at 10:02 pm

    Regarding your comment directly above – Excellent post Richard!
