Statistical Literacy
#1 #2 #3
Descriptive Statistics: Pictures are worth a thousand numbers as well as a thousand words. |
Why a histogram is better than a mean, a median, or even a five-number summary of a set of data. | Why a scatter plot is better than an R-squared or a
regression equation in summarizing the relationship between two sets of data. |
Judgment is key to adjusting the axes of histograms and scatter plots to maximize the quality of the information they convey. |
The average American has one testicle and one ovary. | Gathered data is not always good data. | Correlation is not causation. | The most important factors may be ignored by the analyst. |
Has the data been massaged? Are the outliers there? | The most important facts may not be quantifiable. | Problem sets should be prioritized by civic or personal relevance. | Failure to do so is a recipe for amnesia, boredom, and poor performance. |
Inferential Statistics:
All about randomness, probability, and sample size
|
Randomness is key to getting a good sample. | The bigger the sample, the narrower the margin of error and the more confident you can be in generalizing. | Roughly: a random sample of 100 gives 95% confidence, plus or minus 10 percentage points.
A sample of 1,200 gives 95% confidence, plus or minus 3 percentage points. |
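The rule of thumb above can be checked with a minimal sketch. It assumes a simple random sample, a proportion near 50% (the worst case), and the usual normal approximation with z = 1.96; the function name `margin_of_error` is only illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n and the normal
    approximation; p=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1200):
    moe = margin_of_error(n) * 100
    print(f"n = {n:>5}: plus or minus {moe:.1f} percentage points")
```

Under these assumptions the margin shrinks with the square root of the sample size, so quadrupling the sample only halves the margin.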
Beware the file drawer problem!
Beware Type 1 errors (false positives) and Type 2 errors (false negatives). |
Probability is the key to statistical experiments.
Has the experiment been reproduced? How many times? |
A close analogy is the jury system. Just as the jury should presume innocence, the statistician assumes no effect
(the null hypothesis). |
The statistician then calculates the probability of getting a result at least as extreme as the actual one from chance alone. If that probability is extremely small, the null hypothesis is rejected. |
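As a hedged illustration of that logic, the sketch below tests a hypothetical coin. The null hypothesis is that the coin is fair, and the function (an illustrative name, not taken from any library) adds up the exact binomial probability of every outcome at least as far from 50/50 as the one observed.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact probability, assuming a fair coin (the null hypothesis),
    of a head count at least as far from flips/2 as the observed one."""
    observed_gap = abs(heads - flips / 2)
    favorable = 0
    for k in range(flips + 1):
        if abs(k - flips / 2) >= observed_gap:
            favorable += comb(flips, k)  # number of sequences with k heads
    return favorable / 2 ** flips

# Hypothetical experiment: 60 heads in 100 flips.
p = two_sided_p_value(60, 100)
print(f"p-value: {p:.4f}")
```

With 60 heads in 100 flips the p-value comes out at about 0.057, just above the conventional 5% cutoff, so whether the "jury" convicts depends entirely on a threshold chosen in advance.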
Data omission and factor omission are likely when an issue has a partisan dimension.
|
P-value thresholds are arbitrary. They should be stated a priori. They should be thought about. |
Chi-square calculations can be completely misleading.
|
Simpson’s Paradox is a warning to make sure all the data has been disclosed.
|
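A small sketch of Simpson's Paradox, using hypothetical counts invented purely for illustration: treatment A does better than B in both the easy and the hard cases, yet B looks better once the subgroups are pooled, because A was given mostly hard cases.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
results = {
    "A": {"easy": (18, 20), "hard": (60, 100)},
    "B": {"easy": (80, 100), "hard": (10, 20)},
}

def subgroup_rate(treatment, group):
    successes, trials = results[treatment][group]
    return successes / trials

def overall_rate(treatment):
    successes = sum(s for s, _ in results[treatment].values())
    trials = sum(t for _, t in results[treatment].values())
    return successes / trials

for name in results:
    print(
        f"{name}: easy {subgroup_rate(name, 'easy'):.0%}, "
        f"hard {subgroup_rate(name, 'hard'):.0%}, "
        f"overall {overall_rate(name):.0%}"
    )
```

A reader shown only the pooled rates would draw the opposite conclusion from a reader shown the breakdown, which is why undisclosed subgroup data matters.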
Finding the right metric is key. | Best hitter: is batting average the right number?
Is a z-score better than an absolute number? |
Finance: absolute or relative performance? Risk-adjusted or not?
If risk-adjusted, how? The Sharpe ratio? |
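One common risk adjustment is the Sharpe ratio: average return in excess of a risk-free rate, divided by the standard deviation of those excess returns. The sketch below uses two hypothetical funds and a hypothetical 2% risk-free rate, all invented for illustration.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.02):
    """Mean excess return divided by the sample standard deviation
    of the excess returns (same period for every figure)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical annual returns, invented for illustration only.
fund_x = [0.30, -0.10, 0.25, -0.05, 0.20]   # higher average, bumpy ride
fund_y = [0.09, 0.07, 0.10, 0.08, 0.06]     # lower average, steady

for name, returns in (("X", fund_x), ("Y", fund_y)):
    print(
        f"Fund {name}: mean return {statistics.mean(returns):.1%}, "
        f"Sharpe {sharpe_ratio(returns):.2f}"
    )
```

Fund X wins on absolute return and Fund Y wins on the risk-adjusted measure, so the choice of metric decides the "best" fund.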
Justice: do women make $0.77 on the dollar? What does this mean? Are you sure? |
Statistical Literacy -2
Level One |
The uncertain can often be predicted with amazing certainty. | The laws of chance often lead to extremely counter-intuitive results. | Data can be misleading, and decisions based on them wrong. |
Quantification can lead to the double illusion of importance and objectivity. | The most important factors may not be quantifiable. | Most complex problems require non-quantitative judgment. | |
Statistical wizardry is no substitute for substantive knowledge. | Experiments should be reproduced multiple times. | The bigger the sample, the lower the standard error of the estimate. | |
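Level Two | 1,111 is a good sample size (roughly plus or minus 3% at 95% confidence),
and the size needed is not a function of the size of the population: the soup-tasting analogy (one spoonful from a well-stirred pot is enough, however big the pot). |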
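The soup-tasting point can be sketched numerically. The code below assumes a simple random sample, a proportion near 50%, and the standard finite population correction; the population sizes are arbitrary examples and the function name is illustrative.

```python
import math

def margin_of_error_fpc(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion, with the
    finite population correction for sampling without replacement."""
    standard_error = math.sqrt(p * (1 - p) / n)
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction

# Arbitrary example population sizes: a town, a city, a country.
for population in (10_000, 1_000_000, 330_000_000):
    moe = margin_of_error_fpc(1111, population) * 100
    print(f"population {population:>11,}: plus or minus {moe:.2f} points")
```

Under these assumptions the margin for a sample of 1,111 stays just under 3 percentage points whether the population is ten thousand or hundreds of millions; what matters is how well the pot is stirred, that is, how random the sample is.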
P-value thresholds are arbitrary but should be decided on before experiments are conducted. | For what kinds of decisions is a p-value threshold of 5% a good decision rule? Guilt or innocence? |
The inevitability of Type 1 and Type 2 errors | Studies should be based on random samples.
|
Experiments should be double-blind and controlled. | |
Regression to the mean, the placebo effect, and the Hawthorne effect can be large. | Adjusting data is often necessary but can be extremely misleading. | CPI adjustment is critical but fails to account for quality improvement. | |
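Regression to the mean can be seen in a short simulation. This is a sketch under stated assumptions: each simulated person has a fixed "skill", every test score is skill plus independent luck, and both are drawn from standard normal distributions. All names and parameters are illustrative.

```python
import random
import statistics

def simulate_top_decile(n=10_000, seed=0):
    """Return the mean first-test and second-test scores of the
    people who scored in the top 10% on the first test."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [s + rng.gauss(0, 1) for s in skill]  # skill plus luck
    test2 = [s + rng.gauss(0, 1) for s in skill]  # same skill, new luck
    cutoff = sorted(test1)[int(0.9 * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]
    mean1 = statistics.mean(test1[i] for i in top)
    mean2 = statistics.mean(test2[i] for i in top)
    return mean1, mean2

first, second = simulate_top_decile()
print(f"top decile, first test:  {first:.2f}")
print(f"top decile, second test: {second:.2f}")
```

Nothing was done to the top scorers between tests, yet their average falls back toward the overall mean because part of their first score was luck. An uncontrolled study could easily mistake that drop (or the matching rise among the lowest scorers) for a treatment effect.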
Extrapolation is almost irresistible: budgets, stocks,
climate. |
Partisan bias can distort data collection and experimental design. | Only 40% of social science experiments are ever repeated.
Is this science? |