Guest Column | March 16, 2015

Why Rater Training Matters in Clinical Trials: A Science Overview


By Christine Hall and Susan M. Dallabrida PhD, PHT Corporation

Standardized technique, interpretation, and recording improve data consistency

Rater training is used in clinical trials to improve the consistency of subjective data collected from patients, caregivers/observers, and clinicians/interviewers. Consistent data are critical for analysis. To that end, standardized instruments are translated and localized to ensure that questions are psychometrically equivalent. Variability is introduced when interviewers or caregivers must interpret symptoms and behaviors and record patients' cognitive, physical, and/or mental scores on various instruments. Raters differ in bias, technique, and how they record on a scale; rater variance has been cited as a contributor to low reliability and to failed trials.
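To make "rater variance" concrete, inter-rater agreement is often summarized with an intraclass correlation coefficient (ICC). The Python sketch below is illustrative only: the scores are hypothetical and are not drawn from any study cited here. It computes a two-way random-effects, single-rater ICC (Shrout and Fleiss ICC(2,1)) and shows how rater disagreement pulls reliability down.

import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects, single-rater ICC(2,1) from an
    n_subjects x n_raters matrix of scale totals."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)  # per-subject means
    col_means = scores.mean(axis=0)  # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # subject mean square
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # rater mean square
    resid = scores - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # error mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical totals for 6 subjects, each scored by 3 raters.
calibrated = np.array([[60, 62, 61], [75, 74, 76], [50, 51, 49],
                       [88, 87, 89], [66, 65, 67], [71, 70, 72]])
drifting = np.array([[60, 70, 55], [75, 60, 80], [50, 65, 45],
                     [88, 70, 95], [66, 80, 58], [71, 55, 78]])

print(f"ICC, calibrated raters: {icc_2_1(calibrated):.2f}")  # close to 1.0
print(f"ICC, drifting raters:   {icc_2_1(drifting):.2f}")    # substantially lower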

Rater training has been shown to reduce rater errors and to standardize scale administration. Training on each instrument and its administration is recommended for clinicians, observers, and patients by global regulators and is endorsed by ISPOR. This article reviews the regulatory guidelines for rater training and the evidence of rater training's impact on clinical data integrity.

I. Regulatory Guidelines and Recommendations
Assessments that rely on repeated administrations to detect change are subject to strong learning effects and to variability arising from rater interpretation, bias, and scale administration. These variables put trial success at risk.

Recognizing these variables, global regulators endorse patient and clinician training to produce more standardized data sets. The most direct recommendations come from the FDA, the EMA, and ISPOR: the FDA PRO Guidance [2], the EMA Reflection Paper [3], and the ISPOR ClinRO Task Force report, Clinician-Reported Outcomes Good Measurement Practices [4], all describe good measurement principles and recommend rater training.

II. Scientific Literature: Effects of Poor Rater Reliability
The benefit of rater training is supported by scientific research that continues to quantify how training reduces clinician drift and improves inter-rater reliability. In one study, Khan et al. identified the sources of unreliability in a failed clinical trial by assessing scores on the Positive and Negative Syndrome Scale (PANSS) [1]. This was a sub-study of a failed Phase 2 double-blind, placebo-controlled trial in schizophrenia. Rater variability on PANSS symptom items was cited as a source of unreliability and as evidence of the relationship between low rater reliability and trial failure.
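The logic of decomposing unreliability into sources can be illustrated with a simple sketch (the variance values below are hypothetical and do not come from the Khan study): reliability is the share of total score variance that reflects true subject differences, with rater and time-point variance counting as noise.

def reliability(var_subject: float, var_rater: float,
                var_time: float, var_error: float) -> float:
    """Share of total variance attributable to true subject differences."""
    return var_subject / (var_subject + var_rater + var_time + var_error)

# Hypothetical variance components on a PANSS-like total score.
print(f"{reliability(80, 5, 5, 10):.2f}")   # well-calibrated raters -> 0.80
print(f"{reliability(80, 40, 5, 10):.2f}")  # high rater variance    -> 0.59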

III. Scientific Literature: How to Improve Clinical Trial Rater Consistencies
The scientific literature on rater training in clinical trials provides evidence for both didactic and calibrated rater training. For some assessments, rater drift has been quantified to generate recommended retraining intervals. In 2006, Targum sought to determine whether previous clinical experience or specific rater training programs improved rater competency [5]. He found that participation in repeated rater training sessions significantly improved rater competency, producing fewer scoring deviations, and concluded that while clinical experience is not a substitute for rater training, repeated training sessions improved competency regardless of prior clinical experience.
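Competency checks of the kind Targum describes often ask trainees to score a standard videotaped interview and compare each score against an expert consensus. The sketch below is a hypothetical version of such a check; the gold-standard score, tolerance, and rater scores are invented for illustration.

# Hypothetical certification check: flag raters whose score on a standard
# videotaped interview deviates too far from the expert consensus score.
GOLD_STANDARD = 72   # invented expert consensus total for the video
TOLERANCE = 5        # invented pass/fail deviation threshold

trainee_scores = {"rater_A": 74, "rater_B": 63, "rater_C": 71}

for rater, score in trainee_scores.items():
    deviation = abs(score - GOLD_STANDARD)
    status = "pass" if deviation <= TOLERANCE else "retrain"
    print(f"{rater}: score={score}, deviation={deviation} -> {status}")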

Summary
Rater training has been shown to improve rater reliability by reducing scoring deviations, drift, and bias. It is recommended by many instrument owners to ensure uniform completion and use of a measure, increasing signal and reducing noise for better statistics. Rater training has been proven to improve data accuracy through standardized technique, interpretation, and recording.
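The signal-and-noise point can be made concrete with a standard sample-size approximation for a two-arm trial (all numbers below are hypothetical): extra score variance from unreliable raters adds to the standard deviation and inflates the number of subjects needed to detect the same treatment effect.

from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(sigma: float, delta: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm for a two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

delta = 5.0            # hypothetical true treatment effect on the scale
sigma_subject = 10.0   # hypothetical between-subject SD with reliable raters
sigma_rater = 6.0      # hypothetical extra SD contributed by noisy raters

print(n_per_arm(sigma_subject, delta))                             # 63 per arm
print(n_per_arm(sqrt(sigma_subject**2 + sigma_rater**2), delta))   # 86 per arm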
 

About the Authors
Dr. Susan Dallabrida is Vice President, and Christine Hall is Director of Business Development, in the Clinical Science and Consulting department of PHT Corporation, a patient-driven eData systems innovator.


References:

[1] Khan A, Yavorsky WC, Liechti S, DiClemente G, Rothman B. Assessing the sources of unreliability (rater, subject, time-point) in a failed clinical trial using items of the Positive and Negative Syndrome Scale (PANSS). J Clin Psychopharmacol. 2013 Feb;33(1):109-17.

[2] FDA PRO Guidance: http://www.phtcorp.com/Resources/Regulatory-Documents/PDFs/FDA-Guidance-Patient-Reported-Outcome-Measures.aspx, accessed July 2014.

[3] EMA Reflection Paper: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2011/08/WC500110059.pdf, accessed October 2014.

[4] ISPOR ClinRO Task Force draft abstracts: http://www.ispor.org/taskforces/documents/ClinRO-TF-draft-abstracts.pdf, accessed July 2014.

[5] Targum SD. Evaluating rater competency for CNS clinical trials. J Clin Psychopharmacol. 2006;26:308-310.