Statistical significance is an assessment of whether observations reflect a genuine pattern rather than mere chance. The fundamental challenge is that any partial picture of a given hypothesis, poll, or question is subject to random error. In statistical testing, a result is deemed statistically significant if it is so extreme (assuming no external variables have influenced the result) that it would be expected to arise by chance only in rare circumstances. The result then provides enough evidence to reject the hypothesis of 'no effect'.
For example, tossing 3 coins and obtaining 3 heads would not be considered an extreme result. However, tossing 10 coins and finding that all 10 land the same way up would be considered an extreme result: for fair coins the probability of having the first coin matched by all 9 others is (1/2)⁹ = 1/512, which is rare. The result may therefore be considered statistically significant evidence that the coins are not fair.
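The coin-toss arithmetic can be checked directly; a minimal sketch in Python:

```python
from fractions import Fraction

# Probability that the 9 remaining fair coins all match the first coin's
# face: each matches independently with probability 1/2.
p_match = Fraction(1, 2) ** 9

print(p_match)         # 1/512
print(float(p_match))  # 0.001953125
```

A result expected fewer than 2 times in 1,000 under the 'fair coins' hypothesis is what the text means by "extreme".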
When used in statistics, the word significant does not mean important or meaningful, as it does in everyday speech.
Researchers focusing solely on whether individual test results are significant may miss important response patterns that individually fall below the threshold set for tests of significance. Therefore, along with tests of significance, it is preferable to examine effect-size statistics, which describe how large the effect is and the uncertainty around that estimate, so that the reader can gauge the practical importance of the effect.
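The distinction between significance and effect size can be illustrated with a sketch using hypothetical summary statistics (the group sizes, means, and standard deviation below are invented for illustration): with very large samples, a tiny effect can be highly significant.

```python
import math

# Hypothetical summary statistics for two large groups (illustrative only):
n = 100_000            # observations per group
mean_a, mean_b = 10.00, 10.02
sd = 1.0               # common standard deviation

# Two-sample z statistic and two-sided p-value (normal approximation).
se = sd * math.sqrt(2 / n)
z = (mean_b - mean_a) / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

# Effect size (Cohen's d): the difference in standard-deviation units.
d = (mean_b - mean_a) / sd

print(f"z = {z:.2f}, p = {p:.1e}, d = {d:.2f}")
```

Here the p-value is far below any conventional threshold, yet the effect is only 0.02 standard deviations, which may be practically negligible. Significance alone does not convey that.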
The calculated statistical significance of a result is in principle only valid if the hypothesis was specified before any data were examined. If, instead, the hypothesis was specified after some of the data were examined, and specifically tuned to match the direction in which the early data appeared to point, the calculation would overestimate statistical significance.
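A minimal simulation sketch (illustrative parameters, hypothetical setup) shows the inflation: if the direction of a one-sided test is chosen to match whichever way the data happen to point, the nominal 5% false-positive rate roughly doubles even when the null hypothesis is true.

```python
import math
import random

random.seed(0)

def one_sided_p_after_peeking(n=50):
    # Null hypothesis is true: data are standard normal with mean 0.
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) / (1 / math.sqrt(n))  # sample z statistic
    # Direction of the test chosen AFTER seeing the data, so the
    # one-sided tail is always taken on the favourable side:
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

trials = 20_000
rejections = sum(one_sided_p_after_peeking() < 0.05 for _ in range(trials))
print(rejections / trials)  # close to 0.10, double the nominal 0.05
</```

The calculated "5% significance level" is therefore not the real false-positive rate once the hypothesis has been tuned to the data.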
An alternative (but nevertheless related) statistical hypothesis testing framework is the Neyman–Pearson frequentist school, which requires both a null and an alternative hypothesis to be defined and investigates the repeat-sampling properties of the procedure: the probability of deciding to reject the null hypothesis when it is in fact true and should not have been rejected (a "false positive", or Type I error), and the probability of deciding to accept the null hypothesis when it is in fact false (a Type II error). Fisherian p-values are philosophically different from Neyman–Pearson Type I error rates; this confusion is unfortunately propagated by many statistics textbooks.
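The two Neyman–Pearson error rates can be computed directly for a simple one-sided z test; the design values below (effect size, sample size, standard deviation) are hypothetical, chosen only to make the calculation concrete.

```python
import math

def phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * math.erfc(-x / math.sqrt(2))

# One-sided z test of H0: mu = 0 against H1: mu = mu1 > 0,
# with known sd and sample size n (illustrative design values).
alpha = 0.05
z_crit = 1.6449          # Phi(z_crit) ~ 0.95, so P(reject | H0) = alpha
mu1, sd, n = 0.3, 1.0, 50

# Type II error: probability of failing to reject when H1 is true.
shift = mu1 * math.sqrt(n) / sd
beta = phi(z_crit - shift)

print(f"Type I error = {alpha}, Type II error = {beta:.3f}, "
      f"power = {1 - beta:.3f}")
```

Both error rates are fixed properties of the decision procedure before any data are seen, which is the key philosophical difference from a Fisherian p-value computed from one observed dataset.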