Understanding Diagnostic Tests

I've been doing a bit of file cleaning lately, and have come across a series of relatively short pieces I wrote several years ago, probably for the benefit of student research groups I was working with at the time.  Judging by the references to SLAP lesion tests, they were clearly written around the time that my good friend Jackie Sadi and I published a meta-analysis on clinical tests for SLAP lesions in 2008.  Since they're not publishable and aren't going anywhere, I figured I might as well share a few of them here, in case they can be of help to anyone.  Here's the first one:

Understanding likelihood ratios


In order to understand the usefulness of likelihood ratios, one must first understand the concepts of pre-test and post-test probabilities.  Pre-test probability can be (greatly) simplified and thought of as prevalence.  For example, after taking a history from a 25 year-old male baseball player who is complaining of deep shoulder pain, a clinician might be 80% certain that this patient is presenting with a superior labrum antero-posterior (SLAP) lesion.  This level of certainty must be arrived at through a process of clinical reasoning that will include some knowledge of the scientific literature in the area and experience with this group of patients.  The clinician may know, from experience, that 80% of her young male patients who are baseball players and complain of deep shoulder pain have had SLAP lesions in the past.  In other words, the prevalence of SLAP lesions in this clinician’s practice, given these findings, is 80%.  Realize that we can peel this down to its most basic function as well.  If the overall prevalence of SLAP lesions in this clinician’s practice is 5%, then any patient has a 5% pre-test probability of presenting with a SLAP lesion before they even walk through the door, which translates to 1:19 odds (5 patients with the lesion for every 95 without).  If 10% of her male patients present with SLAP lesions, then simply by knowing that her next patient is male, the pre-test probability that he has a SLAP lesion is 10%, or 1:9 odds.  You get the idea.  Each question that is asked (area of pain or recreational activity, for example) will have its own effect on the probability that this patient is presenting with a SLAP lesion.
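To make the back-and-forth between probability and odds concrete, here is a small Python sketch.  The function names are hypothetical helpers written for this illustration, not from any library, and odds are expressed as "odds in favour", i.e. p/(1-p), so that 5% becomes roughly 0.053, which reads as 1:19.

```python
def probability_to_odds(p: float) -> float:
    """Odds in favour of the condition, p / (1 - p); valid for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in the range [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Probability from odds in favour (odds:1), odds / (odds + 1)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (odds + 1.0)


def describe_odds(odds: float) -> str:
    """Format odds as written in the text: 1:x below evens, x:1 at or above evens."""
    if odds < 1.0:
        return f"1:{1.0 / odds:.3g}"
    return f"{odds:.3g}:1"


if __name__ == "__main__":
    # Pre-test probabilities from the example: whole practice, male patients, after the history
    for p in (0.05, 0.10, 0.80):
        print(f"{p:.0%} -> {describe_odds(probability_to_odds(p))}")
```

Run as a script, this should print 5% -> 1:19, 10% -> 1:9 and 80% -> 4:1, matching the figures in the paragraph above.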


The question is just how sure a clinician needs to be in order to make a diagnosis.  It may be that a clinician is confident in making a diagnosis with an 80% chance of being correct, especially if the diagnosis a) is not life-threatening or b) does not involve treatment with significant side-effects.  However, in most situations clinicians will probably want a higher likelihood of being correct before either initiating treatment or abandoning the diagnosis.  In this case, a clinician might choose a physical test to strengthen the probability either that a condition is present or that it is absent.


The traditional statistics used to describe a test's diagnostic accuracy are sensitivity (the percentage of subjects with the condition who score positive on the test), specificity (the percentage of subjects without the condition who score negative on the test), positive predictive value (the percentage of subjects who score positive on the test who actually have the condition) and negative predictive value (the percentage of subjects who score negative on the test who actually don’t have the condition).  While sensitivity and specificity are useful measures, PPV and NPV have come under scrutiny of late because they depend on the prevalence of the condition in the sample being studied.  In many reports of diagnostic tests the PPV in particular is inflated, because the prevalence of the condition in the study sample is artificially high compared with most clinical settings.  As such, the PPV and NPV reported for many tests really only hold true for that particular sample, and when recalculated for a population with a lower prevalence, the PPV is often quite a bit lower.  For this reason, it is becoming more common to see sensitivity and specificity reported along with the likelihood ratios of a test.
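The dependence of predictive values on prevalence can be shown with Bayes' theorem.  The Python sketch below is illustrative only: the function name is hypothetical, the sensitivity and specificity (0.828 and 0.818) are the values quoted for the Resisted Supination/External Rotation test in the worked example below, and 72.5% and 15% are the study-sample and clinic prevalences discussed later in this piece.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    # Each term is a proportion of the whole population, not a raw count
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    for prevalence in (0.725, 0.15):
        ppv, npv = predictive_values(0.828, 0.818, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With the same sensitivity and specificity, the PPV falls from about 0.92 at 72.5% prevalence to about 0.45 at 15%, while the NPV rises from about 0.64 to about 0.96.  The test hasn't changed; only the population has.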


Positive (PLR) and negative (NLR) likelihood ratios have found favour amongst clinical researchers in recent years.  Simply put, the PLR and NLR are the factors by which the odds that a condition is present change given a positive or a negative finding on a test, respectively.  The positive likelihood ratio is calculated as sensitivity/(1-specificity), and the negative likelihood ratio is calculated as (1-sensitivity)/specificity.  According to Jaeschke et al. (1994), a PLR > 10 or an NLR < 0.1 generates large, often conclusive, shifts in the probability that a condition is either present or absent.  A PLR between 5 and 10 (NLR 0.1-0.2) generates moderate shifts, a PLR of 2-5 (NLR 0.2-0.5) generates small but sometimes important shifts, and a PLR of 1-2 (NLR 0.5-1) generates small and rarely important shifts in probability.
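The two formulas and the Jaeschke bands can be sketched in Python as follows.  The function names are hypothetical, and so is the handling of values that sit exactly on a cut-off (they fall into the weaker band here), since the text doesn't say how boundaries are treated.  NLRs are mapped onto the PLR scale by taking the reciprocal (0.1 becomes 10, 0.2 becomes 5, 0.5 becomes 2), so one set of cut-offs covers both.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR); assumes 0 < specificity < 1."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


def jaeschke_category(lr: float) -> str:
    """Label the size of the probability shift using the Jaeschke et al. (1994) bands."""
    if lr < 0:
        raise ValueError("likelihood ratios cannot be negative")
    # Reciprocal puts NLRs on the PLR scale; an NLR of 0 is a perfect rule-out
    strength = math.inf if lr == 0 else max(lr, 1 / lr)
    if strength > 10:
        return "large, often conclusive"
    if strength > 5:
        return "moderate"
    if strength > 2:
        return "small but sometimes important"
    return "small and rarely important"


if __name__ == "__main__":
    # Sensitivity and specificity quoted for the test in the worked example that follows
    plr, nlr = likelihood_ratios(0.828, 0.818)
    print(f"PLR {plr:.2f}: {jaeschke_category(plr)}")
    print(f"NLR {nlr:.2f}: {jaeschke_category(nlr)}")
```

Both ratios for that test (about 4.55 and 0.21) land in the "small but sometimes important" band.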


To put this all together, suppose we go back to our scenario of a 25 year-old male baseball player with a complaint of deep shoulder pain.  The clinician is 80% sure that the patient is presenting with a SLAP lesion; in other words, the pre-test probability that a SLAP lesion is the correct diagnosis is 80%.  The clinician wants to be more certain of this diagnosis, as the preferred intervention for SLAP lesions is arthroscopic surgery.  She turns to the literature and finds that the Resisted Supination/External Rotation test (Myers et al. 2005) has a reported sensitivity of 0.828 and specificity of 0.818 for diagnosing SLAP lesions.  These values equate to a PLR of [0.828/(1-0.818)] = 4.55 and an NLR of [(1-0.828)/0.818] = 0.21.  While these values are unlikely to cause a large shift in probability, the clinician feels that the scientific rigour with which this test was developed and described is adequate, and that the patient population on whom it was developed is similar to hers.  She applies the test and gets a positive result; in this case, the patient’s deep shoulder pain is reproduced by the test.  To determine the post-test probability that the diagnosis is correct, the clinician can turn to a nomogram or calculate the odds manually.  A nomogram is shown in Figure 1.  Using a straight edge, the clinician extends a straight line from her pre-test probability, in this case 80%, through the PLR, in this case 4.55, to get a post-test probability of roughly 94% (see Fig. 1).  The clinician then decides whether a 6% chance of being incorrect is acceptable, and either makes the diagnosis or chooses another test that will shift her probability in one direction or the other.

For those who prefer the actual calculation rather than the use of a nomogram, here's how you would do it in this case:

The first step is to convert the pre-test probability into odds.  If the clinician is 80% certain that the condition is present, then this equates to 4:1 odds.  I personally find it easier to convert from odds to percentage than the other way round.  Odds can be converted to a percentage by division, where the numerator is the number on the left of the colon and the denominator is the sum of the two numbers.  In this case that would be 4/(4+1), or 80%.  To get the odds from a percentage, you work this backwards: divide the probability by one minus the probability, so 0.80/(1-0.80) = 0.80/0.20 = 4, or 4:1.


Now that we have the odds, we can see how the result of the test shifts them.  In the case of a positive test with a PLR of 4.55, we multiply the 4:1 odds by 4.55 to get new odds of 18.2:1, giving us a post-test probability of 18.2/(18.2+1), or 94.8%, very close to the 94% we got from the nomogram.  On the other hand, if the test were negative, we would instead multiply our 4:1 odds by the NLR of 0.21.  This gives us post-test odds of 0.84:1, and a post-test probability of 0.84/(0.84+1), or 45.7%.
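The same manual calculation can be scripted.  This is a minimal Python sketch (the function name is an illustrative choice, not from any package) that derives the likelihood ratios from the reported sensitivity and specificity and applies them to the 80% pre-test probability through odds, exactly as in the steps above.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability by way of odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)  # 0.80 -> 4 (4:1)
    post_test_odds = pre_test_odds * likelihood_ratio                 # shift the odds
    return post_test_odds / (post_test_odds + 1)                       # odds -> probability


if __name__ == "__main__":
    sensitivity, specificity = 0.828, 0.818   # values quoted for Myers et al. (2005)
    plr = sensitivity / (1 - specificity)     # about 4.55
    nlr = (1 - sensitivity) / specificity     # about 0.21
    pre_test = 0.80                           # clinician's estimate after the history

    print(f"positive test: {post_test_probability(pre_test, plr):.1%}")
    print(f"negative test: {post_test_probability(pre_test, nlr):.1%}")
```

This should print 94.8% for a positive result and 45.7% for a negative one.  Going through odds rather than multiplying the probability directly is what keeps the result between 0 and 100%.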


To Rule In, or Rule Out – is that the question?


In the last section, we discussed the notion of likelihood ratios and their ability to effect shifts in probability given a positive or negative test result.  The use of a nomogram was described, and a clinical example was used.  However, the busy clinician may not have the luxury of resorting to a nomogram when trying to determine the presence or absence of a condition.  It may well be more efficient for a clinician to have in his or her repertoire a group of clinical tests that a) can be performed easily and b) can help to rule a condition in or out.


We can again turn to the descriptive statistics of sensitivity, specificity and positive or negative likelihood ratios to determine which tests are more effective in ruling a condition in, and which are more useful for ruling a condition out.  This process will not be as precise as using the nomogram method, but as mentioned previously, will likely be of more use to the busy clinician.


The SpPin and SnNout mnemonics can be used for this purpose:


SpPin: A Positive result on a highly Specific test will help to rule a condition IN (Specific Positive in).

SnNout:  A Negative result on a highly Sensitive test will help to rule a condition OUT (Sensitive Negative out).


This can be understood by looking at the 2x2 table that is used to calculate sensitivity and specificity (Figure 2).  If we label each of the cells, starting in the upper left-hand corner and moving left to right, as True Positive (a), False Positive (b), False Negative (c) and True Negative (d), then sensitivity is calculated as a/(a+c) and specificity is calculated as d/(b+d).  To put these values into words:


Sensitivity is the percentage of the sample with the condition who score positive on the test.


Specificity is the percentage of the sample without the condition who score negative on the test.
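The 2x2 table and its formulas translate directly into code.  In this Python sketch the class name is hypothetical, the cell letters follow the labelling described above (a = TP, b = FP, c = FN, d = TN), and the counts are made up: they were chosen only so that they reproduce the percentages quoted in this piece (72.5% prevalence, sensitivity 0.828, specificity 0.818) and are not the published table from any study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a diagnostic 2x2 table, labelled a to d left to right, top to bottom."""
    a: int  # true positives: condition present, test positive
    b: int  # false positives: condition absent, test positive
    c: int  # false negatives: condition present, test negative
    d: int  # true negatives: condition absent, test negative

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def npv(self) -> float:
        return self.d / (self.c + self.d)

    @property
    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


if __name__ == "__main__":
    table = TwoByTwo(a=24, b=2, c=5, d=9)  # hypothetical counts, 40 subjects in total
    print(f"sensitivity {table.sensitivity:.3f}, specificity {table.specificity:.3f}")
    print(f"PPV {table.ppv:.3f}, NPV {table.npv:.3f}, prevalence {table.prevalence:.1%}")
```

Note that sensitivity and specificity each use only one column of the table, whereas the predictive values mix both columns, which is why they move with prevalence.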

In the case of a highly sensitive test, a negative result is not likely to be a false negative, because we know that in highly sensitive tests, false negatives are rare.  Therefore, if a clinical test that is highly sensitive yields a negative result, it is likely to be a true negative, and this strengthens the evidence that the patient is free of the condition.


Conversely, in a highly specific test, a positive result is not likely to be a false positive.  Therefore, a positive result on a highly specific test will probably be a true positive, and strengthens the evidence that the patient has the condition.  Of course, the results of additional tests can further strengthen the evidence for or against the presence of the condition.


When using predictive values to determine the chance that a condition is present or absent, one must be cautious.  As mentioned in the previous section, predictive values are vulnerable to artificial inflation in many research samples, in which the prevalence of the condition to be diagnosed is frequently much higher than in a typical clinical setting.  Take the Resisted Supination/External Rotation test of Myers and colleagues (2005), which was developed to diagnose SLAP lesions in the athletic shoulder and was used in the example in the previous section.  In the sample on which this test was initially developed and described, the prevalence of SLAP lesions was 72.5%.  Now say, purely for argument's sake, that the test was so non-specific that every subject in the study had a positive result.  If the positive predictive value is defined as the percentage of subjects with a positive test who actually have the condition [a/(a+b)], then the PPV would be 0.725, because 72.5% of the sample would have been true positives regardless of the diagnostic accuracy of the test.  While not high, a PPV of 0.725 would be considered acceptable for a clinical test by many authors.  Of course, such a test would be useless for ruling anything out (its specificity would be zero, and with no negative results the NPV could not even be calculated), but that’s not always described as obviously as you might think.


If a clinician were to take that test and perform it on her own patient population, in whom the prevalence of SLAP lesions was only 15%, the PPV of the test in her population would drop to 0.15, which is completely unacceptable.  This is an extreme example, but it is an easy illustration of the potential pitfalls of relying on PPV and NPV values to rule conditions in or out.
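The extreme "everyone tests positive" case can be checked with a few lines of Python.  This is a hypothetical sketch: the function name and the cohort size of 200 are illustrative assumptions, and the test itself is the imaginary one from the argument above, not the real Resisted Supination/External Rotation test.

```python
from typing import Optional


def always_positive_test(n_patients: int, prevalence: float) -> tuple[float, Optional[float]]:
    """(PPV, NPV) for an imaginary test that calls every patient positive."""
    with_condition = round(n_patients * prevalence)
    without_condition = n_patients - with_condition
    a, b = with_condition, without_condition  # every patient lands in the positive row
    c, d = 0, 0                               # so there are no negative results at all
    ppv = a / (a + b)                         # collapses to the prevalence
    npv = d / (c + d) if (c + d) else None    # undefined: nobody tested negative
    return ppv, npv


if __name__ == "__main__":
    for prevalence in (0.725, 0.15):
        ppv, npv = always_positive_test(200, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.3f}, NPV {npv}")
```

Whatever the cohort size, the PPV of such a test simply equals the prevalence of the sample it's applied to: 0.725 in the study sample and 0.15 in the clinic.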


If SpPin and SnNout aren't going to stick in your head easily, then the positive and negative likelihood ratios can also be used to determine which tests are more useful for ruling a condition in or out.  Simply put, a positive result on a test with a high positive likelihood ratio will strengthen the evidence in favour of the presence of the condition, whereas a negative result on a test with a low (good) negative likelihood ratio will strengthen the evidence against the presence of the condition.
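Both rules of thumb can be seen side by side by running two tests through their likelihood ratios.  In this Python sketch the two tests, their sensitivity and specificity values, and the 50% pre-test probability are all made up for illustration; the function name is likewise hypothetical.

```python
def post_test_probabilities(sensitivity: float, specificity: float,
                            pre_test_probability: float) -> tuple[float, float]:
    """Return (probability if the test is positive, probability if it is negative)."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    pre_odds = pre_test_probability / (1 - pre_test_probability)

    def to_probability(odds: float) -> float:
        return odds / (odds + 1)

    return to_probability(pre_odds * plr), to_probability(pre_odds * nlr)


if __name__ == "__main__":
    # Two hypothetical tests with the sensitivity/specificity trade-off reversed
    hypothetical_tests = {
        "highly sensitive (Sn 0.95, Sp 0.60)": (0.95, 0.60),
        "highly specific (Sn 0.60, Sp 0.95)": (0.60, 0.95),
    }
    pre_test = 0.50  # assumed pre-test probability
    for name, (sn, sp) in hypothetical_tests.items():
        if_positive, if_negative = post_test_probabilities(sn, sp, pre_test)
        print(f"{name}: positive -> {if_positive:.0%}, negative -> {if_negative:.0%}")
```

A negative result on the highly sensitive test (NLR about 0.08) drops the probability from 50% to about 8% (SnNout), and a positive result on the highly specific test (PLR 12) raises it to about 92% (SpPin).  The opposite results on each test move the probability much less.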


As a final note for this section, it would be ideal to have a clinical test with high values for both sensitivity and specificity, denoting high overall diagnostic accuracy.  Indeed, many clinical tests were originally reported to have excellent sensitivity and specificity values.  Unfortunately, it is almost a truism that such results often don’t stand up to further scrutiny by other researchers in other populations.  Whether this is due to problems with methodologic rigour in the initial development of the test or in its independent re-evaluation is a debate that generates more heat than light.  The reader is left to decide, for their own purposes, which are the “best” tests to perform on any given population.