Determining the Cut Point for Identifying Developmental Language Disorder with Norm-referenced Tests
Why might a child with developmental language disorder (DLD) qualify for language services in one district but not another? It is not because the child's DLD appears and disappears. Instead, it may be because districts use different, and often arbitrary, cut points on tests to determine eligibility.
Under the provisions of the Individuals with Disabilities Education Improvement Act of 2004 (IDEA, 2004), children with DLD qualify for special education services at school if they (1) have a language impairment that (2) interferes with educational function and that (3) requires special instruction (34 CFR § 300.8). This article examines the evidence base for satisfying step 1: identifying the presence of a language impairment. Three critical pieces of information are needed. The first is the test's sensitivity, or how often the test correctly identifies those with the disorder as having the disorder. The second is specificity, or how often the test correctly identifies those without the disorder as not having it. Sensitivity and specificity data can be used to determine the accuracy of any diagnostic test (e.g., tests for reading disorders, hearing loss, or cancer). Here we refer to accuracy for the diagnosis of DLD.
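Both definitions are simple proportions computed from the four outcomes of a diagnostic accuracy study. The short Python sketch below illustrates the arithmetic; the function name and the counts in the usage lines are hypothetical and do not come from any test manual.

```python
def sensitivity_specificity(true_pos: int, false_neg: int,
                            true_neg: int, false_pos: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    true_pos:  children with DLD whom the test identifies as having DLD
    false_neg: children with DLD whom the test misses
    true_neg:  typically developing children whom the test identifies as typical
    false_pos: typically developing children whom the test flags as having DLD
    """
    sensitivity = true_pos / (true_pos + false_neg)  # share of DLD cases caught
    specificity = true_neg / (true_neg + false_pos)  # share of typical children cleared
    return sensitivity, specificity


# Hypothetical study: 50 children with DLD and 100 typically developing children.
sens, spec = sensitivity_specificity(true_pos=44, false_neg=6,
                                     true_neg=90, false_pos=10)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that sensitivity uses only the children who have DLD and specificity uses only the children who do not, which is why a manual needs to report both.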
The third piece of information is the cut score that maximizes test sensitivity and specificity. Cut scores are specific to each test. For example, sensitivity and specificity may be best at -1.62 SD for one test, -1.20 SD for another, and -0.75 SD for a third. Therefore, it is critical to know the test-specific cut score. Suppose a speech-language pathologist selects a test with good sensitivity and specificity but fails to use the correct cut score. In that case, the reported sensitivity and specificity data no longer apply, and misdiagnoses are likely to occur.
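To see how far apart these cut points are, it can help to convert them from SD units to standard scores. The sketch below assumes the common metric with a mean of 100 and an SD of 15; that scale is an assumption, so check the manual for the metric a given test actually uses.

```python
def sd_to_standard_score(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a cut point in SD units (z) to a standard score.

    Assumes a mean-100, SD-15 scale by default; pass other values if the
    test manual reports scores on a different metric.
    """
    return mean + z * sd


# The three example cut points from the text, expressed as standard scores.
for z in (-1.62, -1.20, -0.75):
    print(f"{z:+.2f} SD -> {sd_to_standard_score(z):.2f}")
```

On this assumed scale the three cut points correspond to standard scores of about 75.7, 82, and 88.75. A standard score of 80 would therefore fall below the cut point for the second test but above the cut point for the first, which is why a single cut score cannot be carried from one test to another.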
The sensitivity and specificity data translate directly into how often a clinician will reach the correct conclusion about a child's status. For example, a sensitivity of .80 means that 80% of children who actually have DLD will score below the evidence-based cut score and be correctly identified as having DLD. A sensitivity of .92 means that 92% of them will, and so on. Conversely, specificity is the percentage of typically developing children who will score above the evidence-based cut score and be correctly identified as typically developing. Used together, sensitivity and specificity information helps prevent both under- and over-diagnosis.
The probability of missing an actual case of DLD (a false negative) is 1 - sensitivity, and the probability of misdiagnosing a typically developing child as having DLD (a false positive) is 1 - specificity. The likelihood of misdiagnosis becomes smaller as sensitivity and specificity rise. Sensitivity and specificity should each be at least .80 to justify using a test clinically to diagnose a disorder (Plante & Vance, 1994).
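The complements and the .80 minimum can be applied to a group of children to estimate how many errors to expect. The Python sketch below is illustrative only: the function names and the caseload sizes are hypothetical, and the expected counts assume that the sensitivity and specificity reported in the manual hold for the children being tested.

```python
def error_rates(sensitivity: float, specificity: float) -> dict[str, float]:
    """Return the false-negative and false-positive rates implied by a test."""
    return {
        "false_negative_rate": 1 - sensitivity,  # children with DLD who are missed
        "false_positive_rate": 1 - specificity,  # typical children labeled as DLD
    }


def meets_minimum(sensitivity: float, specificity: float,
                  minimum: float = 0.80) -> bool:
    """True only if BOTH values reach the minimum (.80 per Plante & Vance, 1994)."""
    return sensitivity >= minimum and specificity >= minimum


def expected_errors(sensitivity: float, specificity: float,
                    n_with_dld: int, n_typical: int) -> tuple[float, float]:
    """Expected numbers of missed cases and false alarms in a group of children."""
    rates = error_rates(sensitivity, specificity)
    return (n_with_dld * rates["false_negative_rate"],
            n_typical * rates["false_positive_rate"])


# Hypothetical caseload: 20 children with DLD, 80 typically developing children.
missed, false_alarms = expected_errors(0.80, 0.90, n_with_dld=20, n_typical=80)
print(f"expected missed: {missed:.1f}, expected false alarms: {false_alarms:.1f}")
print(meets_minimum(0.80, 0.90), meets_minimum(0.92, 0.75))
```

In this hypothetical caseload, a test with a sensitivity of .80 and a specificity of .90 would be expected to miss about 4 of the 20 children with DLD and flag about 8 of the 80 typically developing children. The second check fails because the specificity of .75 falls below .80, even though the sensitivity of .92 is high.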
These three critical pieces of information should appear in the test manual, typically in the chapter on test validity. The data often appear in a table containing the components shown in the accompanying figure, although the layout varies by publisher. Some manuals report the sensitivity and specificity for the single cut score that maximizes diagnostic accuracy. Other manuals show the sensitivity and specificity for several cut scores. In these cases, clinicians should choose the cut score that maximizes both sensitivity and specificity to balance the risks of under- and over-identification. For some tests, the manual also reports sensitivity and specificity relative to the results of other tests (often other tests by the same publisher). However, unless the sensitivity and specificity of those comparison tests are known to be high, this information does not help determine whether the test at hand can identify DLD. If a test manual lacks information on sensitivity and specificity or the recommended cut score, there is no guarantee that diagnostic decisions based on the test results will be accurate.
Figure Legend. Hypothetical sensitivity and specificity data. Note that the best balance between sensitivity and specificity in this example comes with a cut score of 85, making it the cut score that maximizes diagnostic validity for this particular test. At this cut score, 87% of children with DLD score lower than 85 and are correctly identified as having DLD, and 84% of typically developing children score above 85 and are correctly identified as typically developing.
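When a manual lists several cut scores, "the best balance" can be made concrete with an explicit rule. The Python sketch below shows two common rules: choosing the cut score whose weaker value (sensitivity or specificity) is highest, and maximizing Youden's J (sensitivity + specificity - 1). Only the row for 85 reuses the legend's 87% and 84%; every other row is invented for illustration, and the function and table names are hypothetical.

```python
# Invented values for illustration only; apart from the 85 row, they are NOT
# the figure's data. Each entry: cut score -> (sensitivity, specificity).
HYPOTHETICAL_TABLE = {
    77: (0.70, 0.95),
    81: (0.80, 0.90),
    85: (0.87, 0.84),
    89: (0.93, 0.72),
    93: (0.97, 0.60),
}


def balanced_cut_score(table: dict[int, tuple[float, float]]) -> int:
    """Pick the cut score whose weaker value (sensitivity or specificity) is highest.

    Ties on the weaker value are broken by the larger sum of the two values.
    """
    return max(table, key=lambda cut: (min(table[cut]), sum(table[cut])))


def youden_cut_score(table: dict[int, tuple[float, float]]) -> int:
    """Alternative rule: maximize Youden's J = sensitivity + specificity - 1."""
    return max(table, key=lambda cut: table[cut][0] + table[cut][1] - 1)


print(balanced_cut_score(HYPOTHETICAL_TABLE), youden_cut_score(HYPOTHETICAL_TABLE))
```

Both rules select 85 for this invented table, but they can disagree on other data. Where a manual states which cut score it recommends, that recommendation is the one its reported sensitivity and specificity describe.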
Many clinicians were not taught this information as students, and unfortunately, non-evidence-based diagnostic methods are still prevalent in the field. Yet sensitivity and specificity data have been used to support diagnostic accuracy for about thirty years (Plante & Vance, 1994). Moreover, evidence contradicting the prevalent practice of applying the same low but arbitrary cut score to any language test has been available for almost as long (Spaulding et al., 2006). Therefore, not only does relying on sensitivity and specificity data support diagnostic validity, but the most common alternative diagnostic practice is also invalid.
What can you do if you work in a setting that requires tests lacking sensitivity and specificity data or imposes non-evidence-based cut scores? First, it is helpful to recognize that such outdated procedures were established by fellow clinicians decades ago, with decades less knowledge about assessment validity. We are free to change them as the field advances. Moreover, those in charge of policies and procedures may be unaware that the given diagnostic procedures are, in fact, obsolete. Practice managers, special education directors, and state administrators often are not speech-language pathologists. To avoid misdiagnosis and legal risk, we must tell them when procedures should be updated. For example, the law governing eligibility for services in schools in the USA (the IDEA) requires that “Assessments…are used for the purposes for which the assessments or measures are valid and reliable” (34 CFR § 300.304). Validation for the purpose of identifying a disorder requires data on diagnostic accuracy, and sensitivity and specificity data provide exactly that.
Testing is one of the most high-stakes functions we perform as speech-language pathologists. It determines who gets help and who continues to struggle on without it. For this reason, we have an ethical obligation to get it right. If someone challenged you to defend how accurate your diagnostic method is, could you do it? If you know the sensitivity and specificity data for the test and use the evidence-based cut score, you can.
References
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15-24.
Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools, 37, 61-72.