traynor

04-29-2015, 12:27 PM

"What goes wrong most often in scientific research and data science? Statistics.

Statistical analysis is tricky to get right, even for the best and brightest. You'd be surprised how many pitfalls there are, and how many published papers succumb to them. Here's a sample:

Statistical power. Many researchers use sample sizes that are too small to detect any noteworthy effects and, failing to detect them, declare they must not exist. Even medical trials often don't have the sample size needed to detect a 50% difference in symptoms. And right turns at red lights are legal only because safety trials had inadequate sample sizes.

Truth inflation. If your sample size is too small, the only way you'll get a statistically significant result is if you get lucky and overestimate the effect you're looking for. Ever wonder why exciting new wonder drugs never work as well as first promised? Truth inflation.

The base rate fallacy. If you're screening for a rare event, there are many more opportunities for false positives than false negatives, and so most of your positive results will be false positives. That's important for cancer screening and medical tests, but it's also why surveys on the use of guns for self-defense produce exaggerated results.
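To make the base-rate arithmetic concrete, here is a minimal Python sketch using Bayes' rule. The prevalence, sensitivity, and false-positive rate are made-up numbers chosen only for illustration, not figures from the guide, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Fraction of positive results that are true positives (Bayes' rule)."""
    # Share of the whole population that has the condition and tests positive.
    true_pos = prevalence * sensitivity
    # Share of the whole population that lacks the condition but tests positive.
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Assumed, illustrative inputs: a condition affecting 1% of people,
# a test that catches 90% of real cases and wrongly flags 5% of healthy people.
ppv = positive_predictive_value(
    prevalence=0.01, sensitivity=0.90, false_positive_rate=0.05
)
print(f"Share of positive results that are real: {ppv:.1%}")
```

Under these assumed inputs, well under half of the positive results are real, even though the test itself looks accurate. The rarer the event, the more the false positives dominate.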

Stopping rules. Why not start with a smaller sample size and increase it as necessary? This is quite common but, unless you're careful, it vastly increases the chances of exaggeration and false positives. Medical trials that stop early exaggerate their results by 30% on average."
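The stopping-rules point can be checked with a quick simulation. This is a rough sketch under simplifying assumptions of my own, not the guide's method: there is no real effect, the data are standard normal with known variance, and a plain z-test at the usual 1.96 cutoff is run either once at the end or after every 10 observations. The function names and parameters are hypothetical.

```python
import math
import random


def run_trial(rng, max_n=100, look_every=10, z_crit=1.96):
    """Simulate one experiment in which the true effect is zero.

    Returns (significant_at_any_look, significant_at_final_look).
    """
    total = 0.0
    z = 0.0
    any_look = False
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # one observation from N(0, 1)
        if n % look_every == 0:
            # z-statistic for "mean == 0" with known standard deviation 1
            z = total / math.sqrt(n)
            if abs(z) > z_crit:
                any_look = True  # a peeking researcher would stop here
    # max_n is a multiple of look_every, so z now holds the final-look value
    return any_look, abs(z) > z_crit


def false_positive_rates(trials=2000, seed=0):
    """Compare false-positive rates with and without interim peeking."""
    rng = random.Random(seed)
    peeking = 0
    fixed = 0
    for _ in range(trials):
        any_look, final_look = run_trial(rng)
        peeking += any_look
        fixed += final_look
    return peeking / trials, fixed / trials


peek_rate, fixed_rate = false_positive_rates()
print(f"False positives, stop at first significant look: {peek_rate:.1%}")
print(f"False positives, single test at n=100:           {fixed_rate:.1%}")
```

Every "significant" result here is a false positive, since the simulated effect is zero. Checking repeatedly gives chance more opportunities to cross the cutoff, which is why trials that plan interim looks use stricter, pre-specified thresholds.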

Statistics Done Wrong: The Woefully Complete Guide

http://www.statisticsdonewrong.com/

Free.
