Hegedus EJ, Goode A, Campbell S, et al. Physical examination tests of the shoulder: A systematic review with meta-analysis of individual tests. Br J Sports Med. 2008;42:80–92.1
Clinical Question: Are orthopedic special tests for a variety of shoulder pathologies valuable to clinical practice?
Data Sources: Studies related to diagnostic accuracy were identified through generic and subject-specific search strategies. Generic strategies included searches of MEDLINE, CINAHL, and SPORTDiscus databases (1996–2006). Subject-specific strategies expanded the search by combining the generic strategy with subjects related to pathologies and physical examination of the shoulder. Personal files were hand searched, and the reference lists of review articles were also searched. Individual special test names also were searched through MEDLINE and PubMed.
Study Selection: Eligible articles were those written in English that identified surgery, magnetic resonance imaging, or injection as the criterion standard; studied at least one shoulder special test; and reported or included data to calculate the paired statistics of sensitivity and specificity for individual tests. Review articles, studies using composite special tests, and reports based on anesthetized or cadaver subjects were excluded. An unblinded, independent review was conducted by 2 reviewers for all included abstracts, and the final sample of articles was determined by consensus. Discrepancies were addressed by a third reviewer.
Quality Assessment and Data Extraction: Article quality was evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool that specifically addresses internal and external validity through the evaluation of 14 questions.2 For this review, QUADAS scores ≥10 were indicative of higher quality studies, and scores ≤9 were indicative of lower quality studies. Details regarding data extraction were not provided.
Statistical Analyses: Meta-analyses were performed on the Neer test for impingement, Hawkins-Kennedy test for impingement, and the Speed test for labral pathology, and the analyses included 4 articles for each test. The overall diagnostic power of these special tests was determined through the diagnostic odds ratio (DOR) and the area under the curve (AUC) for the summary receiver operating characteristics (ROC) curve.
Main Results: Specific search criteria identified 686, 182, and 54 abstracts from MEDLINE, CINAHL, and SPORTDiscus, respectively, and 7 articles from hand searching; of these, 45 articles met the inclusion criteria. Overall, 22 of the 45 articles were determined high quality through QUADAS rating. In total, nearly 50 orthopedic special tests for the shoulder were reviewed for diagnostic accuracy.
Impingement. The majority of studies regarding diagnostic accuracy of impingement were of low quality (4 of 6), and the authors reported, with reservation, that the supraspinatus/empty can or infraspinatus tests may be confirmatory for impingement, based on higher specificity values in a couple of studies. However, the corresponding positive likelihood ratios (+LR) were small (<5).
Reports of sensitivity and specificity were similar for the Neer and Hawkins-Kennedy tests. Pooled sensitivity and specificity were 0.79 (95% confidence interval [CI], 0.75–0.82) and 0.53 (95% CI, 0.48–0.58), respectively, for the Neer test, and 0.79 (95% CI, 0.75–0.82) and 0.59 (95% CI, 0.53–0.64), respectively, for the Hawkins-Kennedy test. There was no diagnostic utility for impingement with either the Neer or Hawkins-Kennedy tests based on a combination of the DOR (~1) and the AUC for both tests (Neer test, 0.74 [95% CI, 0.70–0.78]; Hawkins-Kennedy test, 0.76 [95% CI, 0.72–0.80]).
Rotator Cuff Integrity. Eight of the 15 articles regarding special tests for rotator cuff integrity were classified as high quality according to the QUADAS tool. Of the 10 special tests (ie, supraspinatus/Jobe empty can test, lift-off test, Speed/Gilcreest palm-up test, Neer test, Hawkins-Kennedy test, external rotation lag sign, internal rotation lag sign, rent test, Napoleon sign, and drop arm) reviewed in more than one study, none were consistently diagnostic.
Both the external rotation lag sign and the drop arm test may be valuable as specific tests for tears of rotator cuff muscles; however, the studies supporting them had lower QUADAS scores. A negative supine impingement sign may assist in ruling out a rotator cuff tear (sensitivity = 0.97; specificity = 0.09; +LR = 1.07). Tears of the subscapularis may be ruled in with a positive bear-hug (sensitivity = 0.60; specificity = 0.92; +LR = 7.5) or belly press (sensitivity = 0.40; specificity = 0.98; +LR = 20) test, as these orthopedic tests appear to be specific for this injury. The hornblower sign was determined diagnostic of severe degeneration of the teres minor muscle in one low quality study based on the reported high sensitivity (0.95) and specificity (0.92) values of the test as well as a +LR of 12.
Glenoid Labrum Pathology. The majority of articles (14 of 21) related to glenoid labrum pathology investigated superior labrum pathology, and only 12 of the 21 articles were of high quality as determined with the QUADAS tool. It appears the Kim (sensitivity = 0.80; specificity = 0.94; +LR = 13.3) and jerk (sensitivity = 0.73; specificity = 0.98; +LR = 36.5) tests for diagnosing posterior labral tears are appropriate. Additionally, the biceps load II test may be diagnostic for superior labral anterior to posterior (SLAP) lesions based on its high diagnostic values (sensitivity = 0.90; specificity = 0.97; +LR = 30). However, these special tests were evaluated in only one study, often conducted by the originator of the test. Therefore, more examinations into these special tests are warranted to better elucidate their value in diagnosing labral pathologies.
A meta-analysis of 4 articles was conducted on the Speed test for SLAP lesions, and summary sensitivity and specificity were 0.32 (95% CI, 0.24–0.42) and 0.61 (95% CI, 0.54–0.68), respectively. Both the AUC (0.54 [95% CI, 0.44–0.64]) and the DOR (<1) indicate the Speed test has no utility for diagnosing a SLAP lesion.
Instability. Three of 5 articles evaluating anterior shoulder instability were of high quality as determined with the QUADAS tool. Although no meta-analyses were performed, the review indicated anterior instability may be diagnosed with the apprehension, relocation, and anterior release tests (sensitivity range = 0.46–0.81; specificity range = 0.92–0.99) when using apprehension, not pain, as the criterion indicative of a positive test.
Acromioclavicular (AC) Joint Pathology. Two of 3 articles investigating AC joint pathology were of high quality as determined through the QUADAS tool. Acromioclavicular joint pathology may be diagnosed with a positive active compression test based on its high specificity (sensitivity = 0.93; specificity = 0.96). However, studies with higher QUADAS scores tended to report poorer statistics related to diagnostic accuracy. Pain with palpation may be useful as a screening tool (sensitivity = 0.96; specificity = 0.10) to rule out AC joint pathology; however, this recommendation is based on one small study.
Conclusions: These data suggest there are a limited number of shoulder special tests that may be diagnostically discriminatory and clinically useful for shoulder pathologies. Furthermore, there remains a lack of high quality studies with sufficient sample sizes that address the diagnostic capabilities of orthopedic special tests related to the shoulder.
Commentary: Clinical examination of patients with shoulder pain is central to the diagnostic process because physical findings often direct the course of patient care, including the need for further, more expensive diagnostic testing. The use of orthopedic special tests is emphasized in clinical examination for the purpose of constructing a differential diagnosis of pathologies, injuries, and conditions, and determining whether or not a patient has a particular injury. Numerous investigations have evaluated the diagnostic accuracy of shoulder orthopedic special tests. However, due to concerns about study quality,3 there remains uncertainty about which of the many tests provide diagnostic value to clinicians when evaluating patients with orthopedic conditions.
The systematic review by Hegedus et al1 reviewed nearly 50 orthopedic special tests of the shoulder and determined there are few tests that provide diagnostic value when evaluating the shoulder for impingement (supraspinatus/empty can or infraspinatus tests), rotator cuff integrity (supine impingement sign, external rotation lag sign, hornblower sign, bear-hug test, and belly press test), glenoid labrum pathologies (Kim, jerk, and biceps load II tests), instability (apprehension, relocation, and anterior release tests), and AC joint pathology (pain with palpation and active compression tests).
More striking, however, is the finding by Hegedus et al1 that common shoulder special tests used in athletic training, including the Neer and Hawkins-Kennedy tests for impingement and the Speed test for detection of SLAP lesions, had no diagnostic accuracy and limited utility in clinical evaluation based on poor DORs and moderate AUC values. Although the authors caution that their recommendations are a guide and not an absolute, these findings highlight the need for clinicians to be selective, by considering characteristics of diagnostic tests, when incorporating orthopedic special tests into the evaluation of shoulder pathologies.
Awareness and understanding of the limitations of diagnostic tests is important for interpreting the results of orthopedic tests and providing the knowledge to determine the extent to which a positive or negative test result supports or refutes a particular diagnosis. Characteristics of diagnostic tests include sensitivity, specificity, likelihood ratios, odds ratios, and ROC curves.
Sensitivity and specificity values provide information regarding the probability that a correct test result (positive or negative) will be obtained given that an individual has or does not have the condition. With a highly sensitive test, a negative test result will rule out the condition (SnOUT), whereas with a highly specific test, a positive test will rule in the condition (SpIN).4 In general, sensitivity and specificity values closer to 1, or 100%, have more diagnostic meaningfulness than lower values. Ideally, a test with high sensitivity and specificity is desired; however, this is infrequent.
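As a minimal sketch of how these two values come out of a 2 × 2 table of test results against the criterion standard, the Python snippet below may help. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the review.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table.

    tp: positive test, condition present   fn: negative test, condition present
    tn: negative test, condition absent    fp: positive test, condition absent
    """
    sensitivity = tp / (tp + fn)  # proportion of those with the condition who test positive
    specificity = tn / (tn + fp)  # proportion of those without the condition who test negative
    return sensitivity, specificity


# Hypothetical counts: 50 patients with the condition, 50 without.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=30, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts the test would be sensitive but not very specific, so under the SnOUT rule a negative result would carry more weight than a positive one.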
Likelihood ratios are useful because they indicate how likely a test result is in people who have the condition compared to how likely the result is in people who do not have the condition. Higher +LR values provide more certainty that a positive test is indicative of the condition, with values >10 considered large and conclusive, and values closer to 1 being considered unimportant.5
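The standard formulas are +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity. The sketch below applies them to sensitivity and specificity pairs quoted in the Main Results; the function name is hypothetical, and the −LR values it prints are derived here for illustration rather than taken from the review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) for a dichotomous test.

    Assumes 0 < specificity < 1; a specificity of exactly 1 would make
    +LR undefined (division by zero).
    """
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


# (sensitivity, specificity) pairs as quoted in the Main Results above.
reported = {
    "bear-hug": (0.60, 0.92),
    "belly press": (0.40, 0.98),
    "Kim": (0.80, 0.94),
    "jerk": (0.73, 0.98),
}

for name, (sens, spec) in reported.items():
    plr, nlr = likelihood_ratios(sens, spec)
    print(f"{name}: +LR = {plr:.1f}, -LR = {nlr:.2f}")
```

The +LR values computed this way should agree, after rounding, with those quoted in the Main Results (7.5, 20, 13.3, and 36.5).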
Diagnostic odds ratios summarize the overall discrimination of a dichotomous test, such as the orthopedic special test with either a positive or negative result, and are useful when combining studies in systematic reviews and meta-analyses. Tests with odds ratios closer to 1 provide little diagnostic ability, whereas those odds ratio values further from 1 are more informative.
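For a single 2 × 2 table, the diagnostic odds ratio can be written as the odds of a positive test in people with the condition divided by the odds of a positive test in people without it, which equals +LR divided by −LR. The sketch below uses hypothetical sensitivity and specificity values, not figures from the review, and does not reproduce the pooling method used in the meta-analysis.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Return the diagnostic odds ratio for a dichotomous test.

    Assumes both values lie strictly between 0 and 1, since a value of
    exactly 0 or 1 would lead to division by zero.
    """
    odds_positive_with_condition = sensitivity / (1 - sensitivity)
    odds_positive_without_condition = (1 - specificity) / specificity
    return odds_positive_with_condition / odds_positive_without_condition


# Hypothetical tests, for illustration only.
coin_flip = diagnostic_odds_ratio(0.50, 0.50)    # no better than chance
informative = diagnostic_odds_ratio(0.90, 0.90)  # discriminates well

print(f"chance-level test: DOR = {coin_flip:.1f}")
print(f"informative test:  DOR = {informative:.1f}")
```

A test that performs at chance level yields a DOR of 1, which is why values near 1 are read as offering little diagnostic ability.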
The ROC curve provides insight into the performance of a test across different diagnostic thresholds. A key index of the ROC curve is the AUC, with AUC values ranging from 0.5 (indicating no diagnostic accuracy beyond chance) to 1.0 (identifying perfect accuracy).6
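One way to read the AUC, following the interpretation described by Hanley and McNeil,6 is as the probability that a randomly chosen person with the condition receives a higher test score than a randomly chosen person without it. The sketch below computes that probability directly for a test with graded scores. The scores are hypothetical, and this is not the summary ROC procedure used in the review, which combines results across studies.

```python
def auc_from_scores(scores_with_condition, scores_without_condition):
    """Estimate the AUC as the fraction of (with, without) pairs in which the
    person with the condition has the higher score; ties count as half."""
    wins = 0.0
    for x in scores_with_condition:
        for y in scores_without_condition:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(scores_with_condition) * len(scores_without_condition))


# Hypothetical graded scores (e.g., a pain rating during a provocation test).
separated = auc_from_scores([3, 4, 5], [1, 2, 3])
overlapping = auc_from_scores([1, 2], [1, 2])

print(f"well-separated groups: AUC = {separated:.2f}")
print(f"identical groups:      AUC = {overlapping:.2f}")
```

When the two groups have identical score distributions, the estimate falls to 0.5, matching the "no diagnostic accuracy beyond chance" end of the range.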
Use of orthopedic special tests with limited diagnostic ability hampers effective evaluation and patient care. Hegedus et al1 provided a comprehensive review of common orthopedic special tests used in athletic training for the evaluation of shoulder pathologies, with very few providing clinical utility. Increased understanding of the characteristics of diagnostic tests will assist clinicians in identifying those orthopedic tests that have the ability to assist in diagnosing pathologies accurately.
- Hegedus EJ, Goode A, Campbell S, et al. Physical examination tests of the shoulder: A systematic review with meta-analysis of individual tests. Br J Sports Med. 2008;42:80–92. doi:10.1136/bjsm.2007.038406
- Whiting P, Rutjes AW, Dinnes J, Reitsma J, Bossuyt PM, Kleijnen J. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess. 2004;8(25):iii, 1–234.
- Rama KR, Poovali S, Apsingi S. Quality of reporting of orthopaedic diagnostic accuracy studies is suboptimal. Clin Orthop Relat Res. 2006;447:237–246. doi:10.1097/01.blo.0000205906.44103.a3
- Davidson M. The interpretation of diagnostic tests: a primer for physiotherapists. Aust J Physiother. 2002;48:227–232.
- Furukawa TA, Strauss S, Bucher HC, Guyatt G. Diagnostic tests. In: Guyatt G, Rennie D, Meade MO, Cook DJ, eds. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. New York, NY: McGraw-Hill Medical; 2008:419–438.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.