Understanding Uncertainty in Effect Size for Multisite Educational Trials

Multisite randomised controlled trials are routinely used in health and education to evaluate the effect of an intervention on a study outcome. A multisite trial involves two or more sites that share a common intervention and data collection protocol. An important characteristic of a multisite trial is the randomisation of participants to intervention and comparison groups within sites. This approach offers several advantages over single-site trials, such as enhanced internal validity and greater statistical power when studying outcomes with large variance (e.g. academic scores). Multisite trials have a long history in health research and have gained popularity in education studies in recent years. In education, multisite trials randomise pupils (students) into intervention and comparison groups within each school. This design makes it possible to rigorously study the cross-school distribution of intervention effects, facilitating estimation of both the overall impact of an intervention and the variation in that impact across schools, two quantities with significant implications for policy, practice, and research.

The multisite trial design differs from a cluster randomised trial, in which schools, rather than pupils, are randomised to the intervention or comparison group. Despite the advantages discussed above, the analysis of multisite trials poses several challenges. Some researchers analyse multisite trial data assuming that pupils in different schools are independent, but this assumption has severe implications for the accuracy of the impact assessment of an educational intervention. It also neglects one of the main strengths of a multisite trial: providing useful information on how the effect of an intervention on a pupil can differ from school to school. This school-by-intervention interaction poses an additional challenge in analysing data from multisite trials. Ignoring both the school and the school-by-intervention sources of variation can offset the key benefit of a multisite trial over a cluster randomised trial, namely that a multisite trial enables formal testing of the generalisability of an intervention across schools.

Accounting for all sources of variation in a multisite trial yields appropriate estimates of the intervention effect and reduces the risk of false conclusions. The extra source of variability due to the school-by-intervention interaction means that both the statistical model and the calculation of the effect size (a common metric measuring the standardised mean difference in the outcome between the intervention and comparison groups) should account appropriately for this interaction.
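
In rough notation (ours, not taken verbatim from the paper), the effect size then takes the form

    ES = \frac{\hat{\beta}_{\text{intervention}}}{\sqrt{\hat{\sigma}^2_{\text{pupil}} + \hat{\sigma}^2_{\text{school}} + \hat{\sigma}^2_{\text{school} \times \text{intervention}}}}

where the numerator is the adjusted mean difference between the intervention and comparison groups and the denominator is the estimated total variance, including the school-by-intervention component that simpler analyses omit.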


Multilevel models are commonly used to analyse data from educational trials with a cluster or multisite design. For example, a two-level model that allows for the grouping of pupils' numeracy scores within schools would include residuals at both the pupil and the school level. Within this framework, it can be argued that a school-by-intervention interaction could be adjusted for by including a fixed school-by-intervention interaction term in the model. For multisite trials, however, it is advisable to capture this interaction as a random slope, i.e. to allow the intervention effect to differ from school to school. The inclusion of a random slope term is crucial for correct statistical inference about both the interaction and the main effect of the intervention. In a model with the interaction as a fixed effect, interpreting the main effect of the intervention is problematic, because the effect of the intervention combines the main and interaction effects.
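
To make the distinction concrete, here is a minimal sketch in R using the lme4 package; the data frame pupils and the variables score, intervention (a 0/1 indicator), and school are placeholders, not data from our study:

    library(lme4)

    # Random intercept only: school means differ, but a single common
    # intervention effect is assumed for every school.
    m1 <- lmer(score ~ intervention + (1 | school), data = pupils)

    # Random slope: the intervention effect itself varies across schools,
    # capturing the school-by-intervention interaction as a variance component.
    m2 <- lmer(score ~ intervention + (1 + intervention | school), data = pupils)

Comparing the two fits, for example with anova(m1, m2), gives a simple (if conservative) test of whether the intervention effect varies across schools.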

Results from our recent study on the estimation of effect size for multisite trials showed that ignoring variation in the intervention effect across schools can distort the estimated effect size and its confidence interval. The confidence interval (uncertainty) of an effect size estimate obtained from a multilevel model with both school and school-by-intervention variance terms is much wider than that from a simple regression or a multilevel model with only a school variance term, and this wider interval carries a lower risk of a false-positive conclusion. The key message from this study is that the total variance in a multisite trial must be correctly estimated to assess the intervention effect accurately.

Complementing previous work that provides effect size and confidence interval formulas for simple and cluster randomised trials, our study derives the equivalent formulas for multisite trials. The methods proposed in the study have been implemented in the open-access R package “eefAnalytics”, through the functions “mstFREQ” and “mstBayes”. A Stata module of this package is also available. The aim of the package is to support statisticians and researchers in performing sensitivity analyses for simple randomised, cluster randomised, or multisite educational trials using different analytical approaches. Dimitrios Vallis has written a separate post for this blog with more information on eefAnalytics.
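
For illustration, a call might look like the sketch below. It follows our reading of the package documentation, using the example dataset mstData that ships with the package; check ?mstFREQ for the exact argument names:

    library(eefAnalytics)
    data(mstData)

    # Multisite trial analysis: post-test adjusted for baseline attainment,
    # with "School" as the clustering variable and "Intervention" as treatment.
    out <- mstFREQ(Posttest ~ Intervention + Prettest,
                   random = "School",
                   intervention = "Intervention",
                   data = mstData)

    out$ES  # effect size with its confidence interval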

Future research on optimal statistical approaches for evaluating educational interventions will include bootstrapping, which can be used to estimate confidence intervals for the effect size without making assumptions about how the data are distributed; instead, confidence intervals are derived by resampling directly from the observed data. However, bootstrapping in multisite trials requires further research to understand the consequences of resampling at the school level versus the pupil level (see the sketch below). Accurate measurement of uncertainty in the effect size for multisite educational trials matters for programmatic decisions because it sets realistic expectations about the potential magnitude of intervention effects.
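
As a flavour of what this involves, here is a hedged sketch of a school-level case bootstrap in R, reusing the placeholder pupils data and the lme4 random-slope model from earlier; a pupil-level bootstrap would instead resample rows within schools:

    library(lme4)
    set.seed(2024)

    boot_es <- replicate(1000, {
      # Resample whole schools with replacement, relabelling IDs so that a
      # school drawn twice counts as two distinct clusters.
      ids <- sample(unique(pupils$school), replace = TRUE)
      resampled <- do.call(rbind, lapply(seq_along(ids), function(i) {
        d <- pupils[pupils$school == ids[i], ]
        d$school <- i
        d
      }))
      fit <- lmer(score ~ intervention + (1 + intervention | school),
                  data = resampled)
      # Total variance = sum of the variance components (skip covariance rows).
      vc <- as.data.frame(VarCorr(fit))
      total_var <- sum(vc$vcov[is.na(vc$var2)])
      unname(fixef(fit)["intervention"]) / sqrt(total_var)
    })

    quantile(boot_es, c(0.025, 0.975))  # percentile 95% CI for the effect size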

This blog is written by Akansha Singh, a post-doctoral researcher in the Department of Anthropology and a Fellow of the Durham Research Methods Centre and the Institute for Data Science, Durham University.


This blog expresses the author’s views and interpretation of comments made during the conversation. Any errors or omissions are the author's.
