The Value of Posterior Probabilistic Inference in Educational Trials – An Intuitive and Useful Concept
Educational trials commonly aim, directly or indirectly, to improve the
educational attainment of pupils. To this end, the p-value is commonly used in
frequentist inference to judge, against a significance level, whether an
intervention truly works or has merely appeared to work by chance. However, the
p-value has been blamed for the lack of reproducibility of research findings and
the misuse of statistics. It’s a typical case of Goodhart’s law, which states
that when a measure becomes a target, it ceases to be a good measure. To work
around this, some education researchers advocate the use of the effect size
(ES), which can be thought of as the strength of the effect of the intervention,
and its confidence interval (CI; the range of plausible values of the ES) to
assess the effectiveness of interventions instead of relying on a p-value.
Although the ES and its CI are useful metrics,
there is empirical evidence that teachers and other education
stakeholders do not easily understand them. To convey the impact of an educational
intervention, these metrics should not stand alone. There is a need for
educational researchers to move towards a simpler, practical measure that can
estimate the likely benefit of an intervention based on how effective it has
been in a specific evaluation. The posterior probability is a useful metric to complement the ES,
providing direct evidence of whether an intervention works for the study
participants in an educational trial as the first step before generalising
evidence to the wider population. Furthermore, it’s possibly a more intuitive concept than the often misunderstood p-value.
Calculating the posterior probability requires the researcher to specify
a minimum threshold effect size of an intervention that would be considered
useful or valuable to know about. The
posterior probability is then the probability of getting at least this effect
size. Its calculation takes a Bayesian
approach, which is a method of updating beliefs in light of new evidence. At the start of the project, you have some
knowledge which allows you to make an educated guess as to the values this
probability might take, and this is called the ‘prior’ information. The Bayesian method updates this prior
belief with the new data from your intervention study, then compares the
estimated effect sizes with the threshold value to produce the so-called
‘posterior’ probability. For a specific
threshold, a higher posterior probability indicates greater effectiveness of
the intervention.
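To make the updating step concrete, here is a minimal sketch in Python. It assumes a normal prior and a normally distributed effect size estimate, which gives a closed-form update; this is a simplification chosen for illustration, not necessarily the model used in any particular trial. The function names (normal_update, prob_at_least) and all the numbers are hypothetical.

```python
import math

def normal_update(prior_mean, prior_sd, est, se):
    """Conjugate normal-normal update: combine a normal prior with a
    normally distributed estimate, weighting each by its precision."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return post_mean, math.sqrt(post_var)

def prob_at_least(threshold, mean, sd):
    """P(ES >= threshold) when ES is normal with the given mean and sd."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical numbers, chosen only for illustration.
prior_mean, prior_sd = 0.0, 0.5  # vague prior centred on "no effect"
est, se = 0.20, 0.08             # trial estimate of the ES and its standard error
threshold = 0.10                 # minimum effect size considered worthwhile

post_mean, post_sd = normal_update(prior_mean, prior_sd, est, se)
posterior_prob = prob_at_least(threshold, post_mean, post_sd)
print(f"posterior mean = {post_mean:.3f}, posterior sd = {post_sd:.3f}")
print(f"P(ES >= {threshold}) = {posterior_prob:.3f}")
```

With a vague prior like this one, the posterior sits close to the trial estimate; a more informative prior would pull it further towards the prior mean.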
The posterior probability is a more
understandable way to inform policy-makers and educational stakeholders about
the effectiveness of an intervention. It
also accounts for study type, because it depends on a threshold that must be
defined from the study context: for instance, the type of intervention, the
targeted pupils, the outcome of interest, and so on. Moreover, the posterior
probability can be more informative than the p-value: one can
find a significant effect based on the p-value even though that effect is too
small to be of interest to education practitioners. The posterior
probability accounts for this by setting the minimum effect of interest as the threshold. To avoid data snooping, the
threshold value(s) should be specified at the design stage, based on
evidence from the literature and expert opinion.
Overall, the posterior probability is the probability that the ES exceeds a certain
threshold given the data. For example, in the figure below, a certain study has
a posterior probability
of 90% that the intervention improves an educational outcome (the outcome might
be literacy, math, reading …) by at least two months’ progress (Hedges’ g =
0.10).
In general,
if everything else remains unchanged, the
posterior probability gets lower when you increase the threshold value as shown
in the figure.
For researchers who are interested in estimating this metric, it
is implemented in the eefAnalytics package in R and Stata, introduced in
more detail in Dimitrios Vallis’ blog post.
Here’s a more in-depth introduction for those
wanting more detail on how the posterior probability is calculated. Given that the ES is estimated from an appropriate
statistical model according to the study design, the posterior probability can be
mathematically summarised as:
$$ P(ES \geq f \mid \text{data}) = \frac{1}{K} \sum_{k=1}^{K} I(ES_k \geq f) $$
- where f is the prespecified threshold,
- K is the number of iterations with the burn-in part excluded, with ES_k the effect size at iteration k, and
- I(·) is an indicator function which returns one if the condition inside the brackets holds and zero otherwise.
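In other words, the posterior probability is the share of retained iterations in which the effect size is at least the threshold. The Python sketch below illustrates that counting step under stated assumptions: the draws are simulated from a normal distribution purely as a stand-in for the output of a real Bayesian model fit, and the function name posterior_probability, the seed, and every number are hypothetical. It is not the eefAnalytics implementation.

```python
import random

def posterior_probability(draws, threshold):
    """Share of retained posterior draws with an effect size >= threshold,
    i.e. the mean of the indicator function over the K retained iterations."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(1 for es in draws if es >= threshold) / len(draws)

# Stand-in for sampler output: simulated effect size draws (assumption,
# used only so the example runs without fitting a model).
rng = random.Random(2024)
all_draws = [rng.gauss(0.20, 0.08) for _ in range(6000)]

burn_in = 1000                  # discard the first iterations as burn-in
retained = all_draws[burn_in:]  # the K iterations used in the average

for f in (0.0, 0.10, 0.20, 0.30):
    p = posterior_probability(retained, f)
    print(f"threshold f = {f:.2f}: P(ES >= f | data) = {p:.3f}")
```

Evaluating the same draws at several thresholds also shows the pattern described earlier: the estimated probability cannot increase as the threshold is raised.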
This blog is written by Germaine Uwimpuhwe,
a researcher working at the Department of Anthropology and Fellow of the Durham
Research Methods Centre.
This blog expresses the author’s views and
interpretation of comments made during the conversation. Any errors or
omissions are the author's.