The p-value is used in null hypothesis testing to quantify the statistical significance of evidence. A result is said to be statistically significant if it allows us to reject the null hypothesis. A naive definition that takes the p-value to be the probability of the observed outcome itself is inadequate, because for a continuous random variable the probability of any single exact outcome is zero; the definition must instead refer to a range of outcomes. When the distribution of outcomes under the null hypothesis is drawn as a curve, the vertical coordinate is the probability density of each outcome, computed under the null hypothesis, and the p-value is the area under the curve past the observed data point. The smaller the p-value, the higher the significance, because it tells the investigator that the hypothesis under consideration may not adequately explain the observation. Because the p-value is computed from the observed data, which vary from one sample to the next, the p-value itself is not fixed.
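
The "area under the curve past the observed data point" can be sketched numerically. The example below assumes, purely for illustration, that the test statistic follows a standard normal distribution under the null hypothesis; the function names and the observed value of 1.96 are hypothetical and do not come from the text.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Area under the standard normal density to the right of z."""
    # The complementary error function gives the tail area in closed form.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_two_sided(z: float) -> float:
    """Area in both tails at least as far from zero as |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical observed test statistic (an assumption, not from the text).
z_observed = 1.96

p_one_sided = normal_upper_tail(z_observed)
p_two_sided = normal_two_sided(z_observed)

print(f"one-sided p-value: {p_one_sided:.4f}")
print(f"two-sided p-value: {p_two_sided:.4f}")
```

A larger observed value moves the cut-off point further into the tail, so the area, and with it the p-value, shrinks.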

This implies that the p-value cannot be given a frequency-counting interpretation, since the probability has to be fixed for such an interpretation to hold. In other words, if the same test is repeated independently on the same overall null hypothesis, it will generally yield a different p-value at each repetition.

The p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. A test statistic is the output of a scalar function of all the observations. It provides a single number, such as the average or the correlation coefficient, that summarizes the data in a way relevant to a particular inquiry. For the important case in which the data are hypothesized to follow the normal distribution, different null hypothesis tests have been developed, depending on the nature of the test statistic and thus on the hypothesis underlying it.
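
A small simulation may help show both ideas: a test statistic as a scalar function of all the observations, and a p-value that changes from one repetition to the next. This is a minimal sketch under stated assumptions: the data are drawn from a standard normal distribution with known standard deviation (so the null hypothesis of zero mean is true), the sample size of 30 and the seed are arbitrary, and `z_statistic` and `two_sided_p` are illustrative names rather than anything defined in the text.

```python
import math
import random


def z_statistic(sample, mu0=0.0, sigma=1.0):
    """Scalar test statistic: the standardized sample mean (sigma assumed known)."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - mu0) / (sigma / math.sqrt(n))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


rng = random.Random(0)  # fixed seed so the run is reproducible

p_values = []
for _ in range(5):
    # Each repetition draws fresh data under the same (true) null hypothesis.
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
    p_values.append(two_sided_p(z_statistic(sample)))

for i, p in enumerate(p_values, start=1):
    print(f"repetition {i}: p = {p:.3f}")
```

Each repetition tests the same null hypothesis, yet the printed p-values differ because each is computed from a different random sample.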

A few simple examples follow, each illustrating a potential pitfall. In the first, a researcher rolls a pair of dice once under the null hypothesis that the dice are fair. The test statistic is “the sum of the rolled numbers” and the test is one-tailed. The researcher rolls the dice and observes that both dice show 6, yielding a test statistic of 12. Under the null hypothesis this outcome has probability 1/36, or about 0.028, which falls below the conventional 0.05 threshold even though it rests on a single roll. This illustrates the danger of blindly applying the p-value without considering the experiment design. In the second, suppose a researcher flips a coin five times in a row and assumes a null hypothesis that the coin is fair. The test statistic of “total number of heads” can be one-tailed or two-tailed: a one-tailed test corresponds to seeing if the coin is biased towards heads, whereas a two-tailed test corresponds to seeing if the coin is biased either way. The test statistic is the total number of heads, and the test is two-tailed.
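
The p-values in these examples can be computed exactly by enumeration. The sketch below assumes fair six-sided dice and a fair coin as the null hypotheses. The text does not state how many heads were observed in the five flips, so the value of five heads used here is an assumption chosen for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def dice_sum_p_value(observed_sum: int) -> Fraction:
    """One-tailed p-value P(sum >= observed_sum) for two fair six-sided dice."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
    hits = sum(1 for a, b in outcomes if a + b >= observed_sum)
    return Fraction(hits, len(outcomes))


def coin_p_values(n_flips: int, observed_heads: int):
    """One-tailed (towards heads) and two-tailed p-values for a fair coin."""
    total = 2 ** n_flips
    pmf = [Fraction(comb(n_flips, k), total) for k in range(n_flips + 1)]
    # One-tailed: at least as many heads as observed.
    one_tailed = sum(pmf[observed_heads:], Fraction(0))
    # Two-tailed: head counts at least as far from n/2 as the observed count.
    center = Fraction(n_flips, 2)
    distance = abs(observed_heads - center)
    two_tailed = sum(
        (pmf[k] for k in range(n_flips + 1) if abs(k - center) >= distance),
        Fraction(0),
    )
    return one_tailed, two_tailed


p_dice = dice_sum_p_value(12)                     # both dice show 6
p_one_tailed, p_two_tailed = coin_p_values(5, 5)  # assumed: five heads in five flips

print(p_dice, float(p_dice))
print(p_one_tailed, p_two_tailed)
```

With these assumed inputs, the dice example gives 1/36 (about 0.028), while five heads in five flips gives 1/32 for the one-tailed test and 1/16 for the two-tailed test. The same data can therefore fall on either side of a 0.05 threshold depending on which test was chosen.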