Why are the requirements for randomised clinical trials so strict?

Although they provide strong scientific evidence for drug efficacy and safety, Randomised Controlled Trials (RCTs) have significant practical limitations, as discussed in my earlier post. The question we will answer in this post is: what statistical method is used for RCTs, and why does it impose such strict criteria on data collection?

What is the statistical approach used, and why does it require such strict data collection?

Upon completion of an RCT, the gathered data is analysed statistically, mostly by means of hypothesis testing. For each collected parameter, a research question is formulated as two mutually exclusive hypotheses: if one is true, the other must be false. These are usually called the null hypothesis H0 and the alternative hypothesis Ha. Typically, H0 states that there is no difference in a single parameter (e.g. one representing safety or efficacy) between the control and experimental groups. For example, if the null hypothesis is that there is no difference in death rate between patients who took aspirin daily and those who did not, the alternative hypothesis states that there is a difference. The test then asks whether the observed difference is statistically significant, i.e. unlikely to have arisen by chance if H0 were true.
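
As a rough illustration of the aspirin example, here is a minimal Python sketch of one common test for this kind of question, a two-sided two-proportion z-test. The function name and the patient counts are made up for illustration and do not come from any real trial; an actual RCT analysis would follow the trial's own statistical plan.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test of H0: both groups share the same event rate."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Under H0 both groups have one common rate, estimated by pooling them.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: deaths among 1000 aspirin takers vs 1000 non-takers.
z, p = two_proportion_z_test(events_a=30, n_a=1000, events_b=50, n_b=1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value means the observed difference would be unusual if H0 were true. Note that the p-value itself comes from a normal approximation, which is where the assumptions discussed next come in.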

To perform hypothesis testing, we need the probability density function (pdf) of the test statistic under H0, i.e. how the measured difference would be distributed if there were no real effect. Often this pdf is approximated by a normal distribution; after standardisation, this is the standard normal distribution with μ = 0 and σ = 1. The approximation is justified by the Central Limit Theorem (CLT), which states that the sum of many small, independent random variables is approximately distributed as a bell curve (i.e. a normal distribution).
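
To get a feel for the CLT, the following small simulation (a sketch, not a proof) sums independent draws from a strongly skewed distribution and measures how skewed the sums still are. The choice of the exponential distribution, the sample sizes and the helper names are my own assumptions for illustration.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: roughly 0 for a symmetric, bell-shaped sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def skew_of_sums(n_terms, n_samples=10_000, seed=0):
    """Skewness of sums of n_terms independent Exponential(1) draws."""
    rng = random.Random(seed)
    sums = [
        sum(rng.expovariate(1.0) for _ in range(n_terms))
        for _ in range(n_samples)
    ]
    return skewness(sums)


for n_terms in (1, 5, 50):
    print(f"terms per sum = {n_terms:2d}, skewness = {skew_of_sums(n_terms):.2f}")
```

The skewness shrinks towards zero as more independent terms are added, which is the bell curve emerging. How many terms are "enough" depends on how skewed the underlying data is.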

The CLT is an amazing theorem: it makes many practical problems in probability theory tractable. When data is normally distributed, probabilities can be estimated accurately. The problem is that, for real-world data, there is no clear answer to how large the sample must be for the CLT to apply. Also, to satisfy the CLT's assumptions, the data must be collected within the strict RCT requirements; any deviation from those data collection criteria potentially jeopardises the scientific integrity of the hypothesis test. Hence, clinical trials are designed very specifically (e.g. patient population, randomisation and sampling) so that the CLT's assumptions hold and the normal approximation can be used.

What are the limitations of using hypothesis testing when applied to health services?

Even assuming the data is collected in line with RCT requirements (which is almost impossible for health services, as discussed in my earlier post), there are additional limitations to applying RCT hypothesis testing to health services:

  • Outliers: An outlier is an observation that does not follow the pattern of the majority of the data. Fitting a normal distribution to the data emphasises the effect on average patients and understates outliers. This is of no concern if one is interested only in the ordinary behaviour of a therapy. For clinical development, however, information about outliers (high responders or non-responders) is very relevant. Outliers such as complications and adverse events directly define the negative outcomes of a treatment. They are often clinically important and may correspond to medical errors; hence they are worth flagging and analysing to produce better clinical guidelines.
  • Factor dependency: Many measurements seem to fit a normal distribution, especially when enough data is used, as is the case in the RCTs of most medical treatments. Because this assumption tends to work well most of the time, it is usually taken for granted in many domains. A major reason why the CLT fails is that the individual factors of a given study are not independent but correlated. In this case the data will not fit a normal distribution well, even for large study populations. Healthcare services are one such domain: variables are not independent, and other factors (e.g. clinician skills) can influence the outcome, which results in considerable bias when the data is evaluated in such a strict hypothesis testing framework. To avoid bias in RCTs, a very large sample size is needed, resulting in much larger-scale RCTs.
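
The factor-dependency point can be sketched with a toy simulation. It assumes one specific, hypothetical form of dependence: every patient treated by the same clinician shares that clinician's "skill" effect. All parameters (5 clinicians, 40 patients each, the spread of skill) and function names are invented for illustration. The sketch compares how much the trial average really varies with the standard error one would compute when treating all patients as independent.

```python
import math
import random
import statistics


def simulate_trial(n_clinicians, patients_per_clinician, clinician_sd, rng):
    """Return (mean outcome, naive standard error) for one simulated trial."""
    outcomes = []
    for _ in range(n_clinicians):
        # Shared effect: all patients of one clinician are shifted together.
        skill = rng.gauss(0.0, clinician_sd)
        for _ in range(patients_per_clinician):
            outcomes.append(skill + rng.gauss(0.0, 1.0))
    mean = statistics.fmean(outcomes)
    # Naive standard error, valid only if patients are independent.
    naive_se = statistics.stdev(outcomes) / math.sqrt(len(outcomes))
    return mean, naive_se


def compare_spread(clinician_sd, n_trials=2000, seed=0):
    """Actual spread of trial means vs the average naive standard error."""
    rng = random.Random(seed)
    results = [simulate_trial(5, 40, clinician_sd, rng) for _ in range(n_trials)]
    actual_sd = statistics.stdev([mean for mean, _ in results])
    mean_naive_se = statistics.fmean([se for _, se in results])
    return actual_sd, mean_naive_se


for clinician_sd in (0.0, 1.0):
    actual, naive = compare_spread(clinician_sd)
    print(f"clinician_sd = {clinician_sd}: actual = {actual:.3f}, naive = {naive:.3f}")
```

With no clinician effect the two numbers should roughly agree. With a shared clinician effect, the naive standard error understates the real variability, so a test built on it would be overconfident.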

So what can we do? Could we modify the statistical approach to ease the data collection criteria and allow evidence generation for health services? My next post will discuss a new solution for generating evidence from real-world data.