What Clinical Trials Can, and Cannot, Tell Us About Treatment

There seems to be a growing disconnect between clinical research and clinical practice. I continue to see policies being made and online battles being waged based on the results of randomized trials. 'This works, but that doesn't work', and here's the evidence, that sort of thing. Anyone who follows rehab trials in particular will recognize that it's much more common to find evidence that a particular intervention, be it manual therapy, specific exercise, acupuncture, education, or electrotherapeutic modalities, is either not effective or not very effective (small overall effect sizes). My experience from talking with clinicians is that these kinds of findings are at odds with clinical experience - many clinicians will know that if you find the right patient, a certain intervention approach can be quite effective despite the results of clinical trials stating otherwise. And this is a serious problem - in many cases clinicians feel forced to practice with strict adherence either to best evidence or to patient-centred care, but not both. This is a tension I see many struggling to resolve, so let's dig into it a little further.

Let's start with a short description of the philosophy behind the randomized controlled trial. RCTs are the favoured design for those attempting to ascertain cause-and-effect relationships, meaning the design is driven from largely positivist, or perhaps post-positivist, epistemic positions. That's a bit of a mouthful so a brief explanation seems warranted: positivism is focused on understanding knowledge and reality by looking for universal constants or laws. Traditional physics, for example, is rooted in positivistic thinking in that it assumes we can create laws that are universally applicable. Newton's laws of motion are good examples - if you remember back to your days of learning physics you'll likely remember being given a problem and having to find the single correct solution by applying formulae. In such a case there is one correct answer and an infinite number of incorrect ones - drop an apple, the apple falls, and you can calculate the precise time it will hit the ground if you can quantify all the important variables (gravity, wind speed, mass, etc.).

Post-positivism softens that position a little: we can never really prove cause-and-effect, but if we disprove enough alternative hypotheses, we get closer and closer to the right one. A classic example is testing the hypothesis that all swans are white. Post-positivists would argue that it is impossible to sample every swan in the world, so we recruit what we hope is a representative sample and determine their colour. If our sample is all white, then we reject the null hypothesis (here, that not all swans are white) and state, to within a particular level of confidence, that all swans are white - recognizing that we did not, in fact, sample all swans. This is where the concepts of probability (p) values and confidence limits come from - a p value can never be exactly zero (100% confidence), but we generally accept that 95% confidence is pretty confident and are ready to reject the null. Of course you can see the fragility of this way of knowing - all it would take is a single black swan to disprove the hypothesis. So in post-positivist thinking it is much easier to disprove a causal relationship than to prove one, yet this is where most of our quantitative research comes from. I won't go into the other end of the epistemic spectrum, where more interpretive or constructivist philosophies live, because it's more than we need right now. Suffice it to say that p values and confidence intervals are not the only means of creating and understanding knowledge, and perhaps that's our first important takeaway as far as what RCTs can and cannot tell us.
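To make the swan logic concrete, here's a minimal sketch in Python (the swan counts are invented, purely for illustration) of why a sample of all-white swans never proves the hypothesis - it only puts an upper bound on how common black swans could plausibly be:

```python
# The post-positivist point, in code: seeing only white swans never proves
# "all swans are white"; it only bounds how common black swans could be.
# All counts here are hypothetical.

def upper_bound_black_swan_rate(n_sampled: int, alpha: float = 0.05) -> float:
    """One-sided exact binomial upper bound on the true proportion of black
    swans, given that all n sampled swans were white.
    Solves (1 - p)**n = alpha for p."""
    return 1 - alpha ** (1 / n_sampled)

for n in (10, 100, 1000):
    bound = upper_bound_black_swan_rate(n)
    print(f"{n:>4} white swans, 0 black: black-swan rate could be up to {bound:.3f}")
# 10 swans   -> up to ~26% of swans could still be black
# 100 swans  -> up to ~3%
# 1000 swans -> up to ~0.3% - smaller and smaller, but never exactly zero
```

Notice the bound shrinks as the sample grows but never reaches zero, which is exactly why confidence can approach, but never equal, certainty.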

There are many useful elements of a good quality blinded (or double-, or triple-blinded) RCT. If done well, they reduce observer and subject bias, effectively eliminate several confounders, and provide support for a number of the Bradford Hill criteria for cause-and-effect, including strength of relationship, dose-response associations, and reversibility. It's the confounder piece that I want to focus on here because I think it's critical to understand. If a study is designed with a rigorous randomization protocol, then all the individual variables that make humans very messy, and that could otherwise interfere with the researchers' ability to interpret their results, should be equally distributed between the two or more arms of the study. In doing so, researchers can usually safely ignore the potential confounding effects of individual variations in people that may influence their response to the intervention under study. Most good study reports that conform to the CONSORT reporting statement will include a table of participant characteristics comparing key variables between the groups - usually age and sex, with others depending on the nature of the study. If those aren't different, the researchers are generally satisfied that their results aren't being confounded too much by other person-level variables. But of course, it would be highly unlikely that age and sex are the only variables that affect outcome. Where confounders are known and can't be adequately randomized, another strategy is simply to exclude people who have them. Pregnant women, as an example, are commonly excluded from rehab trials, as are those with neuromuscular disorders or complex comorbid conditions - unless of course pregnancy, neuromuscular, or complex conditions are the ones under study.
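To see why trialists can get away with this, here's a toy simulation in Python (all numbers hypothetical) of randomization balancing both a measured covariate and an unmeasured one between arms:

```python
# Toy demonstration of why randomization lets trialists ignore person-level
# confounders: with enough participants, random assignment balances
# covariates across arms - even ones nobody measured. Numbers are invented.
import random

random.seed(1)
N = 400  # hypothetical trial size

# Each simulated patient has one measured covariate (age) and one
# unmeasured covariate (say, fear of movement) that could affect outcome.
patients = [{"age": random.gauss(45, 12), "fear": random.gauss(0, 1)}
            for _ in range(N)]

random.shuffle(patients)                       # the randomization step
treatment, control = patients[:N // 2], patients[N // 2:]

def group_mean(group, key):
    return sum(p[key] for p in group) / len(group)

for key in ("age", "fear"):
    print(f"{key}: treatment mean {group_mean(treatment, key):.2f}, "
          f"control mean {group_mean(control, key):.2f}")
# Both covariates end up with similar means in each arm. The CONSORT
# baseline table can only verify this for the measured ones; randomization's
# promise is that the unmeasured ones balance out too - in expectation.
```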

Both randomization and exclusion should be of particular concern to clinicians. While good randomization (and keep in mind there is such a thing as bad randomization) should allow researchers to effectively ignore those bothersome individual differences that would otherwise affect response to treatment, clinicians CANNOT ignore them! The results drawn from large group means represent only the average response of the average person in each group. Who is this average patient? Are you average? Am I? What about those people who show more extreme treatment effects - those on the very low end who didn't respond at all (or maybe even got worse), or those on the very high end who showed a very strong response? It would be nice for clinicians to know more about those people, even if they represent only 10% of the sample combined, because that's 1 in 10 of the patients in front of them who may be expected to respond really well or really poorly to the treatment. Most research studies are of course not adequately powered to conduct such sub-analyses, so we never really learn about those people. Exclusion criteria should also be closely reviewed, as most researchers will do their best to recruit a sample that is as 'clean' as possible - people who are otherwise completely healthy except for the one condition under study. Again, clinicians will likely recognize that such people rarely come through their clinic door, so these criteria become critical to understanding to whom the results apply (and to whom they do not).
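Here's a quick simulation of that point, with invented effect sizes: a treatment that does nothing for 80% of patients, a great deal for 10%, and harm for the other 10% will look 'not very effective' in the group mean:

```python
# Sketch of how a small average effect can hide dramatic individual
# responses. All effect sizes are hypothetical.
import random

random.seed(7)

def simulated_effect() -> float:
    """Change in a 0-100 outcome score for one random patient."""
    r = random.random()
    if r < 0.10:
        return random.gauss(20, 3)    # strong responders (10%)
    elif r < 0.20:
        return random.gauss(-10, 3)   # patients who get worse (10%)
    return random.gauss(0, 3)         # no real response (80%)

effects = sorted(simulated_effect() for _ in range(10_000))
mean_effect = sum(effects) / len(effects)
print(f"mean effect: {mean_effect:+.1f} points")        # ~ +1: 'small effect size'
print(f"95th percentile: {effects[9500]:+.1f} points")  # ~ +20: strong responders
print(f"5th percentile:  {effects[500]:+.1f} points")   # ~ -10: those who got worse
```

The trial's headline result (roughly a 1-point average improvement) is true of the group and nearly useless for the two patients in ten at the extremes.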

There are important considerations here, as the results of RCTs are often used to shape clinical practice. The recent opioid guidelines are a good example - while we now recognize that all of us can be grouped into different classes of opioid metabolism based on variants of certain cytochrome P450 genes, most RCTs of which I'm aware do not break results down by metaboliser type. Rather, all we get are group averages, and policy makers then draw lines that should meet the 'average' need but likely don't consider those at the extremes. The same can most certainly be said for rehab research, though here we're even more in the dark about the individual differences that affect response to treatment.

So this is where the nature of true evidence-based practice becomes critical - clinicians must know the current state of the evidence, be able to recognize patterns in the patient in front of them to determine whether the results apply (based on experience or an intimate knowledge of the research), and finally determine whether the intended treatment aligns with that patient's personal values. This is a pretty big ask - it's no wonder many surveys reveal that clinicians don't often practice in strict accordance with practice guidelines.

So perhaps before you wade into your next 'this works, that doesn't work' online flame war, you'll take a moment to consider something I continue to believe: nothing works for everyone, but everything will work for someone. Heck, even a good punch in the ribs might be just the tonic a particular person needs to get on the right path, though I doubt you'll find an RCT on that any time soon (and perhaps you shouldn't). The point is that practice needs to focus on applying the right treatment to the right person at the right time - a very tall order, but one that starts with sound clinical reasoning.

For those who are interested, here's my little infographic on N-of-1 studies, which are probably due to be elevated to their proper place in the pantheon of healthcare research methodologies. In fact, I foresee a future where we move in two directions from the RCT: down towards the individual N-of-1 level, and then up, feeding those data into the big-data neural networks that can in fact consider the messiness of human nature when fed enough of it.
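For a flavour of what that individual level looks like, here's a bare-bones sketch of the N-of-1 logic in Python - one simulated patient, alternating control (A) and treatment (B) blocks, with all scores invented for illustration:

```python
# Bare-bones N-of-1 logic: one patient crosses over repeatedly between
# control (A) and treatment (B) blocks, and we compare outcomes within
# that single person. All scores are simulated.
import random

random.seed(42)
BLOCKS = ["A", "B", "A", "B", "A", "B"]  # block order would be randomized in practice
DAYS_PER_BLOCK = 7

def daily_pain(condition: str) -> float:
    """Hypothetical 0-10 pain score: this particular patient happens to
    respond to treatment B, whatever the average trial patient does."""
    effect = -2.0 if condition == "B" else 0.0
    return max(0.0, min(10.0, 6.0 + effect + random.gauss(0, 1)))

scores = {"A": [], "B": []}
for condition in BLOCKS:
    scores[condition].extend(daily_pain(condition) for _ in range(DAYS_PER_BLOCK))

for condition in ("A", "B"):
    mean = sum(scores[condition]) / len(scores[condition])
    print(f"condition {condition}: mean pain {mean:.1f}/10 "
          f"over {len(scores[condition])} days")
# If B is consistently lower than A across repeated crossovers, that's
# evidence this treatment works for THIS patient - whatever the RCTs say.
```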