8. Looking Too Hard for Patterns: a post about finding spurious patterns

Today, March 14th, is Pi Day. In celebration, this post is related to the film Pi.

Check out the retro style of his computer

Pi is the first film by Darren Aronofsky, who went on to make Requiem for a Dream and Black Swan. I’ll try not to spoil too much, but the starting premise is that the main character, Max, is a mathematician/computer-scientist, who believes he can model the stock market and predict future stock behaviour, if only he finds the right model. I was recently reminded of this central quote from Pi (via Tom Crick), which can be heard in the film’s trailer:

Restate my assumptions:

  1. Mathematics is the language of nature.
  2. Everything around us can be represented and understood through numbers.
  3. If you graph these numbers, patterns emerge. Therefore: there are patterns everywhere in nature.


By stating his assumptions, Max is following the scientific process (hurrah!). This allows us to analyse his assumptions and see if he has made a mistake. Indeed — the implication of his third assumption is flawed: if you graph things, patterns do emerge — but they might well be spurious.

Google Correlate

Google have released a tool that (inadvertently?) demonstrates this wonderfully: Google Correlate. The idea is that you can enter a term and see what other search terms produce a similar trend. That sounds somewhat useful. I decided to use the term “Greenfoot”. Here’s one of the top results I got at the time (Greenfoot is blue, the matching term is red):

That’s quite a decent match, and has a correlation coefficient of 0.9477. As Max suggested, we’ve graphed the numbers, and a pattern has emerged. This red term that matches so well with Greenfoot is… “Google Images”. Not very useful, and not much of a pattern: these two terms correlate well because they originated around the same time, and have grown in search-popularity with a similar pattern ever since. But really, this seems to me to be a spurious result (technically, a “type I” error): we’ve found an effect where really there is none.
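To see how easily a shared trend produces this kind of match, here is a minimal Python sketch. It uses made-up data, not real Google Correlate numbers: the two series, their growth rates and their noise levels are all assumptions chosen for illustration. Each series is a straight-line trend plus its own independent noise, so neither has anything to do with the other beyond growing over the same period.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def trending_series(rng, weeks, growth, noise):
    """Hypothetical search-popularity series: linear growth plus independent noise."""
    return [growth * t + rng.gauss(0, noise) for t in range(weeks)]

rng = random.Random(314)  # fixed seed so the run is repeatable
series_a = trending_series(rng, weeks=200, growth=1.0, noise=10.0)
series_b = trending_series(rng, weeks=200, growth=3.0, noise=30.0)

# Correlation of the raw series, dominated by the shared upward trend.
r_raw = pearson(series_a, series_b)

# Week-to-week changes turn each linear trend into a constant,
# which the correlation ignores, leaving only the unrelated noise.
diff_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
diff_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_diff = pearson(diff_a, diff_b)

print(f"correlation of raw series:     {r_raw:.3f}")
print(f"correlation of weekly changes: {r_diff:.3f}")
```

The raw series should correlate very strongly, while the week-to-week changes should show almost no relationship. Differencing is only one simple way of taking the trend out; it is used here because it needs no extra libraries.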

This is the problem with Max’s approach. There are patterns everywhere if you look hard enough, but that doesn’t mean that they’re useful. And this is a real problem in science, especially with measurement techniques that generate a large amount of data (on which you can then perform a large variety of analyses). One example of a troublesome area is the neuroscience technique fMRI, where too many comparisons can lead to a dead fish detecting human emotions. The quality of our understanding of the human brain is dependent on statistics being applied properly… by human brains. (Recursion!)
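The too-many-comparisons problem can be sketched with a small simulation. This is an illustration under stated assumptions, not a model of any real fMRI analysis: every "measurement" below is pure random noise, and the function name, the number of tests and the sample size are all hypothetical. Each test asks whether the mean of some noise differs from zero, using a simple z-test at the usual 5% level.

```python
import math
import random

def false_positive_count(num_tests, samples_per_test, critical_z, rng):
    """Count tests on pure noise whose z-statistic exceeds the critical value."""
    count = 0
    for _ in range(num_tests):
        noise = [rng.gauss(0, 1) for _ in range(samples_per_test)]
        # z-statistic for "the mean differs from 0", with known standard deviation 1.
        z = (sum(noise) / samples_per_test) * math.sqrt(samples_per_test)
        if abs(z) > critical_z:
            count += 1
    return count

rng = random.Random(314)  # fixed seed so the run is repeatable
num_tests = 2000
# 1.96 is the two-sided critical value of the standard normal at the 5% level.
hits = false_positive_count(num_tests, samples_per_test=20, critical_z=1.96, rng=rng)

print(f"{hits} of {num_tests} noise-only tests look 'significant' at the 5% level")
```

There is no effect anywhere in this data, yet roughly one test in twenty should still come out "significant". Run enough comparisons and some of them will pass by chance alone.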

And so in Pi, Max demonstrates the dark side of science: an obsession with finding a result that drives him so hard that he loses his impartiality and risks finding phantom results. There are techniques to mitigate this problem, called alpha-level correction, and I intend to cover some statistics in future blog posts which will explain these sorts of issues.
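As a taste of what such a correction looks like, here is a minimal sketch of the Bonferroni correction, which is one common alpha-level correction (my choice of example, and the simplest one). Instead of comparing each p-value against the alpha level, you compare it against alpha divided by the number of tests. The p-values and function names below are made up for illustration.

```python
def significant(p_values, alpha):
    """Indices of the p-values at or below the given threshold."""
    return [i for i, p in enumerate(p_values) if p <= alpha]

def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni: test each p-value against alpha divided by the number of tests."""
    return significant(p_values, alpha / len(p_values))

# Hypothetical p-values from five separate comparisons (invented for this example).
p_values = [0.001, 0.012, 0.030, 0.040, 0.200]

uncorrected = significant(p_values, 0.05)
corrected = bonferroni_significant(p_values, 0.05)

print("significant without correction:", uncorrected)
print("significant with Bonferroni:   ", corrected)
```

With five tests the corrected threshold is 0.05 / 5 = 0.01, so only the first comparison survives, where four passed before. The price is that Bonferroni is conservative: it reduces false positives but makes real effects harder to detect.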
