The Education Endowment Foundation (EEF) has paved the way for research-informed
interventions that aim to reduce educational inequality
among deprived students in schools today. To support the
EEF's efforts, we developed a statistical package that allows researchers to
analyse data from Randomised Controlled Trials (RCTs) using state-of-the-art
methods. This is the eefAnalytics package: a set of
user-friendly commands, developed for both R and Stata, that allows researchers in
education to use an optimal model to quantify causal effects in RCTs. It should
be noted that these commands can also be used in other disciplines as long as
the RCTs have continuous outcomes and no more than a two-level
structure (e.g. participants nested within schools). Moreover, advanced
knowledge of the specific calculations used by the package is not a
prerequisite; it is expected that users with an understanding of ordinary
least squares (OLS) regression and multilevel modelling will find this package
easy to use and interpret. This blog post will introduce the functionality of
the package and its contribution to the analysis of RCTs in educational
research.
Although RCTs have been used for a
long time, the appropriate methodology used in their analysis is contextually
and theoretically challenging. Educational RCTs pose various statistical
difficulties that need to be considered and controlled for. When discussing the
analytical aspect of these trials, it is important to remember that the
interventions proposed have the potential to benefit the lives of many children.
This poses a challenge for the researcher in defining an appropriate
language of interpretation regarding the effect of a particular trial. In order
to facilitate this undertaking, the eefAnalytics package provides estimates of the Hedges’ g effect size. This metric is calculated as the estimated difference in
the outcome variable between the intervention(s) and the control group,
standardised for the uncertainty in the data using conditional variance (model-based
variance after adjusting for covariates) and unconditional variance (total
variance without adjusting for covariates). As a statistical tool, the effect
size has been more widely used in recent years in conjunction with classical
p-values, allowing one to contextualise the results of an intervention in terms
of its practical, and not just statistical, importance. More importantly, one
of the main contributions of the eefAnalytics package is that it provides a
suite of functions to estimate effect size and its associated uncertainty.
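
To make the definition above concrete, here is a minimal sketch of a Hedges' g calculation for a simple two-group comparison. It is not the eefAnalytics implementation: the package estimates the difference and the variance from a fitted model, whereas this sketch assumes no covariates and uses the pooled sample variance. The function name `hedges_g` and the scores are hypothetical and purely illustrative.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Standardised mean difference with Hedges' small-sample correction.

    Illustrative only: no covariates, pooled within-group variance.
    """
    n_t, n_c = len(treatment), len(control)
    # Pooled within-group variance across the two arms
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    # Uncorrected standardised difference
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    # Approximate correction factor: 1 - 3 / (4 * df - 1), df = n_t + n_c - 2
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return correction * d


# Hypothetical post-test scores, made up for illustration
treatment_scores = [12.0, 15.0, 14.0, 16.0, 13.0]
control_scores = [11.0, 12.0, 10.0, 13.0, 12.0]
g = hedges_g(treatment_scores, control_scores)
print(f"Hedges' g: {g:.3f}")
```

The correction factor shrinks the estimate slightly, which matters most in small samples such as this one.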
As with any test statistic, the
interpretation of effect size relies on the uncertainty surrounding it. For
instance, a statistically significant effect size of a relatively large
magnitude will be of little practical use to a researcher if its confidence
interval is very large, covering values in its lower bound that are not
practically meaningful. In this case, even though the mean effect may be large and
significant, the true value could still lie in an area of the interval that is
of little or no practical importance. Therefore, quantifying the uncertainty associated
with an effect size is as important as its point estimate. This is
especially true in educational RCTs, where it is imperative to understand
whether more time and funds should be invested in expanding the implementation
of an intervention to a wider population of children and schools.
More specifically, eefAnalytics provides
analytical methods underpinned by study design as it allows the use of ordinary
least squares regression in cases of Simple Randomised Trials (SRT), and the
use of mixed models for Cluster Randomised Trials (CRT) and Multi-Site Trials (MST). For example, in a CRT design
where randomisation to intervention and control group happens between clusters
(e.g. schools), there is a need to quantify the potential similarity of
subjects within each cluster as students of a particular cluster may share
similar attributes that can affect their outcome data. On the other hand, in an
MST design, randomisation of participants to intervention and control groups occurs
within each school or cluster. This implies that there is an additional level of
variation stemming from the potential difference in the way the intervention is
administered (for instance the quality of the execution may vary between
schools). As a result, eefAnalytics functions for MSTs explicitly model
intervention-by-school interactions as random effects for educational trials or
intervention-by-site random effects for clinical trials. More detailed
information on MSTs and their effect size modelling can be found in Akansha
Singh’s blog post.
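
The similarity of pupils within a cluster, mentioned above for CRTs, is commonly summarised by the intraclass correlation coefficient (ICC). The sketch below uses the classical one-way ANOVA estimator and assumes equally sized clusters. It is a swapped-in textbook calculation for illustration, not the mixed-model estimate that eefAnalytics produces, and the function name `icc_anova` and the data are hypothetical.

```python
from statistics import mean


def icc_anova(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters)          # number of clusters (e.g. schools)
    n = len(clusters[0])       # pupils per cluster
    if any(len(c) != n for c in clusters):
        raise ValueError("this sketch assumes equally sized clusters")
    grand_mean = mean([x for c in clusters for x in c])
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square
    msb = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (k - 1)
    # Within-cluster mean square
    msw = sum(
        (x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c
    ) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


# Hypothetical outcomes for three schools with three pupils each
schools = [[10, 12, 11], [20, 22, 21], [15, 17, 16]]
icc = icc_anova(schools)
print(f"Estimated ICC: {icc:.3f}")
```

An ICC close to 1, as in this made-up example, means most of the variation lies between schools, so treating pupils as independent observations would overstate the precision of the estimated effect.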
The models above control for the
potentially confounding clustering effects within as well as between
schools, but they rest on parametric assumptions. There has been increasing interest in the
application of permutation tests and bootstrapping in randomised controlled trials
to obtain precise results when the assumptions of the parametric models are
violated. The eefAnalytics package provides permutation and bootstrapping options
to give researchers more flexibility in testing their models.
Bootstrapping is a re-sampling technique that randomly samples participants from the observed data to generate replicated samples. The random sampling is done with replacement, and each new dataset produces its own effect size. This process is repeated many times, accumulating the effect sizes to build a sampling distribution of the effect size. This distribution can be used to make direct inference on the variability of the effect size. Bootstrapping is particularly useful as it is sensitive to heterogeneity between participants and shows how the effect size varies across different snapshots of the original data. Similarly, permutation testing relies on the random shuffling of the intervention or treatment group labels under the null hypothesis of no difference between the groups. For each permuted dataset, the effect size is calculated and contributes to a null distribution. This distribution can then be used to quantify how likely it is that an effect size at least as large as the observed one would be obtained by chance under the null hypothesis. It is important to note that both of these methods consider design effects from each study in the estimation of the effect size.
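
The two re-sampling ideas described above can be sketched for the simplest case: a two-arm trial with no clustering. This is an illustration under stated assumptions and not the eefAnalytics code. It uses the raw mean difference as a stand-in for the effect size, resamples individual participants only (so it ignores the design effects the package accounts for), and every function name and data value is hypothetical.

```python
import random
from statistics import mean


def mean_difference(treatment, control):
    """Raw difference in means, used here as a stand-in effect measure."""
    return mean(treatment) - mean(control)


def bootstrap_ci(treatment, control, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval: resample each arm with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        t = rng.choices(treatment, k=len(treatment))
        c = rng.choices(control, k=len(control))
        stats.append(mean_difference(t, c))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_p_value(treatment, control, n_perm=2000, seed=1):
    """Two-sided Monte Carlo permutation test: shuffle the group labels."""
    rng = random.Random(seed)
    observed = abs(mean_difference(treatment, control))
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = abs(mean_difference(pooled[:n_t], pooled[n_t:]))
        if permuted >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero
    return (extreme + 1) / (n_perm + 1)


# Hypothetical scores, made up for illustration
treatment_scores = [12.0, 15.0, 14.0, 16.0, 13.0]
control_scores = [11.0, 12.0, 10.0, 13.0, 12.0]
ci_low, ci_high = bootstrap_ci(treatment_scores, control_scores)
p_value = permutation_p_value(treatment_scores, control_scores)
print(f"95% bootstrap interval: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Permutation p-value: {p_value:.3f}")
```

For a clustered design, the same logic would need to resample or shuffle at the level of the randomised unit (for example whole schools in a CRT) rather than individual pupils.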
Alternatively, if a researcher is
faced with heterogeneous/noisy data or has quantifiable prior knowledge that
can be integrated within the model itself, the option for Bayesian analysis and
diagnostics is also available for all three RCT designs (SRT, CRT, and MST). The
package allows for the optional provision of a posterior probability which serves as a very intuitive and
easily understood way of reporting the weight of evidence. The posterior
distribution can be used to support an intervention, expressing the probability
that an effect size is above a pre-specified educationally relevant threshold(s).
Please see Germaine Uwimpuhwe’s blog post for a more thorough introduction to statistical
evidence using posterior probability.
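
Once posterior draws of the effect size are available, the posterior probability described above is simply the share of draws that exceed the chosen threshold. The sketch below assumes the draws already exist. To keep it self-contained it simulates them from a normal distribution with made-up parameters, which stands in for the output of a real Bayesian model fit. The function name `prob_above` and the thresholds are hypothetical.

```python
import random


def prob_above(draws, threshold):
    """Share of posterior draws strictly greater than the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


# Stand-in for posterior draws of an effect size (assumed, not from a real fit)
rng = random.Random(42)
posterior_draws = [rng.gauss(0.25, 0.10) for _ in range(10_000)]

# Probability that the effect size exceeds each illustrative threshold
for threshold in (0.0, 0.1, 0.2):
    probability = prob_above(posterior_draws, threshold)
    print(f"P(effect size > {threshold}) = {probability:.3f}")
```

Reporting the probability for several thresholds at once lets readers judge the evidence against whichever level they consider educationally relevant.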
Overall, the effect size, re-sampling
techniques, and Bayesian analysis are all examples of highly comprehensive
methods. Although these methods were introduced to the scientific community a
long time ago, they are now being used more frequently, as easy access to high
computing power makes it possible to deliver more intricate and interpretable findings. Therefore,
beyond its practical use and the methodology it encompasses, this
package can also serve as a platform for the familiarisation and adoption of
these methods not just in education, but across disciplines that concern
themselves with the analysis of RCTs.
This blog is written by Dimitris Vallis, a Research
Assistant in Statistics in the Anthropology Department at Durham University.
This blog expresses the author’s views and interpretation of comments made during the conversation. Any errors or omissions are the author's.