The Value of Posterior Probabilistic Inference in Educational Trials – An Intuitive and Useful Concept

Educational trials commonly aim, directly or indirectly, at improving the educational attainment of pupils. To this end, the p-value is widely used in frequentist inference to judge, against a significance level, whether an intervention truly works or has merely appeared to work by chance. However, the p-value has been blamed for the poor reproducibility of research findings and for the misuse of statistics. It is a typical case of Goodhart's law, which states that when a measure becomes a target, it ceases to be a good measure. To work around this, some education researchers advocate using the effect size (ES), which can be thought of as the strength of the effect of the intervention, together with its confidence interval (CI; the range of plausible values of the ES), to assess the effectiveness of interventions instead of relying on a p-value.

Although the ES and its CI are useful metrics, there is empirical evidence that teachers and other education stakeholders do not find them easy to understand. To convey the impact of an educational intervention, these metrics should not stand alone. Educational researchers need a simpler, practical measure that estimates the likely benefit of an intervention based on how effective it has been in a specific evaluation. The posterior probability is a useful metric that complements the ES by providing direct evidence of whether an intervention works for the study participants in an educational trial, as a first step before generalising the evidence to the wider population. It is also arguably a more intuitive concept than the often misunderstood p-value.

The posterior probability calculation requires the researcher to specify a minimum threshold effect size that would be considered useful or valuable to know about. The posterior probability is then the probability that the intervention's effect size is at least as large as this threshold, given the observed data. Its calculation takes a Bayesian approach, which is a method of updating beliefs in light of new evidence. At the start of the project, existing knowledge allows you to make an educated guess about the values the effect size might take; this is called the 'prior' information. The Bayesian method updates this prior belief with the new data from your intervention study and then compares the estimated effect sizes with the threshold value to produce the so-called 'posterior' probability. For a given threshold, a higher posterior probability indicates stronger evidence that the intervention is effective.
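To make this updating step concrete, here is a minimal sketch in base R (not the eefAnalytics implementation), assuming a normal prior for the ES and a normal approximation for the estimated ES; every number is hypothetical and chosen purely for illustration.

```r
# Illustrative conjugate normal-normal update for an effect size (ES).
# Assumptions (not from any real trial): prior ES ~ N(0, 0.20^2),
# estimated ES = 0.12 with standard error 0.05, threshold = 0.10.
prior_mean <- 0      # educated guess before the trial
prior_sd   <- 0.20
est_es     <- 0.12   # ES estimated from the trial data
est_se     <- 0.05
threshold  <- 0.10   # minimum ES considered educationally meaningful

# Precision-weighted (conjugate) update of the prior with the trial data
post_var  <- 1 / (1 / prior_sd^2 + 1 / est_se^2)
post_mean <- post_var * (prior_mean / prior_sd^2 + est_es / est_se^2)

# Posterior probability that the ES exceeds the threshold
post_prob <- 1 - pnorm(threshold, mean = post_mean, sd = sqrt(post_var))
round(post_prob, 2)
```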

The posterior probability is a more understandable way to inform policy-makers and education stakeholders about the effectiveness of an intervention. It also accounts for the study context, since it depends on a threshold that must be defined for that context: the type of intervention, the targeted pupils, the outcome of interest, and so on. Moreover, the posterior probability can be more informative than the p-value: an effect can be statistically significant according to the p-value yet too small to be of interest to education practitioners, as illustrated in the sketch below. The posterior probability addresses this by setting the minimum effect of interest as the threshold. To avoid data snooping, the choice of threshold value(s) should be specified at the design stage, based on evidence from the literature and expert opinion.
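The following R sketch illustrates that contrast under made-up numbers: a very large trial with a tiny true effect can give a "significant" p-value while the posterior probability of exceeding a meaningful threshold (here 0.10, using a simple normal approximation with a flat prior) is close to zero.

```r
# Illustrative contrast between a "significant" p-value and a low posterior
# probability of a meaningful effect. Sample sizes and effects are made up.
set.seed(123)
n <- 10000                                  # a very large trial (per arm)
control   <- rnorm(n, mean = 0,    sd = 1)
treatment <- rnorm(n, mean = 0.05, sd = 1)  # tiny true effect (ES = 0.05)

t.test(treatment, control)$p.value          # typically well below 0.05

# Normal approximation to the posterior of the standardised ES
# (flat prior assumed), then the probability it exceeds a 0.10 threshold
es <- (mean(treatment) - mean(control)) / sd(c(treatment, control))
se <- sqrt(2 / n)                           # approximate SE of the ES
1 - pnorm(0.10, mean = es, sd = se)         # usually close to zero
```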

Overall, the posterior probability is the probability that an ES exceeds a certain threshold, given the data. For example, in the figure below, a study has a posterior probability of 90% that its intervention improves an educational outcome (literacy, math, reading, and so on) by at least two months' progress (Hedges' g = 0.10). In general, all else being equal, the posterior probability decreases as the threshold value increases, as shown in the figure.
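This behaviour is easy to reproduce with a small R sketch. The posterior below is hypothetical (its mean and standard deviation are chosen only so that the probability at a threshold of 0.10 is roughly 0.90, mimicking the example above); the point is simply that the probability falls as the threshold rises.

```r
# Hypothetical posterior for the ES, chosen to mimic the example above
# (these numbers are illustrative, not taken from a real trial).
post_mean <- 0.165
post_sd   <- 0.05

thresholds <- c(0, 0.05, 0.10, 0.15, 0.20)
post_prob  <- 1 - pnorm(thresholds, mean = post_mean, sd = post_sd)
round(setNames(post_prob, thresholds), 2)
# About 0.90 at a threshold of 0.10, and lower at higher thresholds.
```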


For researchers interested in estimating this metric, it is implemented in the eefAnalytics package in R and Stata, introduced in more detail in Dimitrios Vallis’ blog post. For those wanting more detail on how the posterior probability is calculated: given that the ES is estimated from a statistical model appropriate to the study design, the posterior probability can be summarised mathematically as

$$\Pr(\mathrm{ES} > f \mid \text{data}) \approx \frac{1}{K}\sum_{k=1}^{K}\mathbb{1}(\mathrm{ES}_k > f),$$

where
  • f is the prespecified threshold,
  • K is the number of iterations, with the burn-in part excluded,
  • ES_k is the effect size estimated at iteration k, and
  • 𝟙(·) is an indicator function which returns one if the condition inside the brackets holds and zero otherwise.
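In practice this is just the proportion of posterior draws of the ES that exceed the threshold. The minimal R sketch below (not the eefAnalytics code itself) assumes you already have a vector of post-burn-in posterior draws of the ES; here the draws are simulated purely for illustration.

```r
# Monte Carlo estimate of the posterior probability Pr(ES > f | data).
# 'es_draws' stands in for K post-burn-in posterior draws of the ES;
# here they are simulated only to make the example self-contained.
set.seed(2024)
es_draws  <- rnorm(4000, mean = 0.12, sd = 0.06)  # hypothetical posterior draws
threshold <- 0.10                                 # prespecified threshold f

# Average of the indicator function over the K draws
posterior_prob <- mean(es_draws > threshold)
round(posterior_prob, 2)
```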

This blog is written by Germaine Uwimpuhwe, a researcher in the Department of Anthropology and a Fellow of the Durham Research Methods Centre.

This blog expresses the author’s views and interpretation of comments made during the conversation. Any errors or omissions are the author's.

 

