Issue: August 2013
August 01, 2013

Scientific organizations collaborate to improve assay reliability


The diagnostic criteria for diabetes and thyroid disease, as well as for androgen, estrogen and 25-hydroxyvitamin D deficiencies, have changed over time.

For decades, clinicians have recognized the variability of the measurements used to diagnose and manage endocrine disorders and diseases, accepting that such variability is simply unavoidable. Now, however, members of the scientific community are working together to standardize and harmonize assays for improved clinical care.

“Accurate and reliable lab testing is fundamental to the correct diagnosis and management of patients with endocrine disorders,” David B. Sacks, MBChB, senior investigator and chief of clinical chemistry and deputy chief of the department of laboratory medicine at NIH and adjunct professor of medicine in the department of endocrinology and metabolism at Georgetown University, said in an interview.

Endocrine Today spoke with experts on the topic of diagnostic criteria, who said many assays lack reproducibility and several have coefficients of variation above acceptable limits. These controversies and the variability of assays have led to the formation of organizations such as the International Federation of Clinical Chemistry Working Group (IFCC-WG) on HbA1c and the National Glycohemoglobin Standardization Program (NGSP). Established in 1995 and 1996, respectively, the IFCC-WG and NGSP are the result of collaboration among clinicians, researchers and scientific organizations to standardize HbA1c measurements.
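The coefficient of variation (CV) mentioned above is the standard deviation of repeated measurements expressed as a percentage of their mean. A minimal Python sketch, assuming hypothetical replicate HbA1c results for a single control sample (the values and the function name are illustrative, not taken from any of the programs described here):

```python
import statistics


def coefficient_of_variation(replicates: list[float]) -> float:
    """Return the CV (%) of replicate results: 100 * sample SD / mean."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(replicates)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    # statistics.stdev uses the n - 1 (sample) denominator
    return 100 * statistics.stdev(replicates) / mean


# Hypothetical replicate HbA1c results (%) for one control sample
runs = [6.4, 6.6, 6.5, 6.7, 6.3]
print(f"CV = {coefficient_of_variation(runs):.1f}%")  # prints "CV = 2.4%"
```

What counts as an acceptable CV depends on the analyte and its clinical use; the experts quoted here do not give a single limit.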

David B. Sacks, MBChB, of the NIH and Georgetown University, said obtaining accurate lab results is essential so that diagnostic and treatment decisions are based on true values. Photo by Bill Branson.

HbA1c for diabetes

The IFCC and NGSP have developed accuracy-based HbA1c values that are linked to clinical outcomes and diabetes care goals and can be traced directly to results from the Diabetes Control and Complications Trial (DCCT) and the United Kingdom Prospective Diabetes Study (UKPDS), which demonstrated that HbA1c predicted the risk for complications.

“The DCCT and UKPDS showed that HbA1c predicted the risk for complications,” Sacks, who is also chair of the NGSP steering committee, told Endocrine Today. “The HbA1c assays have had clinically significant impacts on treatment decision making.”

Thus, HbA1c measurement has been a clinical tool for years, despite controversy over the test's precision owing to a lack of standardization, he said. In 2009, an international expert panel from the American Diabetes Association, European Association for the Study of Diabetes and the International Diabetes Federation endorsed the use of HbA1c for diagnosis. In the years that followed, other organizations endorsed it as well, and the test became standard practice.

In May, for the first time, the FDA allowed marketing of an HbA1c assay intended for diabetes diagnosis, the COBAS INTEGRA 800 Tina-quant HbA1cDx (Roche). According to an agency press release, investigators analyzed 141 blood samples and found that results from the test differed by less than 6% from those of a standard reference method for HbA1c analysis.

“When clinicians order HbA1c tests for diagnosis, they have no idea which method their laboratories are going to be using. Conversely, the laboratory typically doesn’t know why the clinician is ordering such a test. It doesn’t say on the request form as to whether it’s for diagnosis or monitoring because it’s been used for monitoring for over 20 years,” Sacks said.

The ADA Clinical Practice Recommendations now recommend diagnosing diabetes with an HbA1c of at least 6.5%, measured using an NGSP-certified method. However, the ADA's recommendations do not extend to point-of-care HbA1c systems.
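How analytical imprecision interacts with a fixed cut-off can be sketched in Python. This sketch assumes purely random, roughly normal error, so a result's approximate 95% interval is the result ± 1.96 SD, with SD derived from the assay's CV; bias is ignored, and the 2% CV in the example is a hypothetical value, not a figure from the ADA or NGSP.

```python
def hba1c_classification(measured: float, cv_percent: float,
                         cutoff: float = 6.5) -> str:
    """Classify an HbA1c result (%) against the 6.5% diagnostic cut-off.

    Assumes random error only: ~95% interval = measured +/- 1.96 * SD,
    where SD = measured * CV / 100. Systematic bias is not modeled.
    """
    sd = measured * cv_percent / 100
    low, high = measured - 1.96 * sd, measured + 1.96 * sd
    if low >= cutoff:
        return "at or above cut-off"
    if high < cutoff:
        return "below cut-off"
    return "indeterminate: interval spans cut-off"


# Hypothetical results from an assay with a hypothetical 2% CV
for value in (6.2, 6.6, 7.0):
    print(value, hba1c_classification(value, cv_percent=2.0))
```

With these assumptions, a single 6.6% result cannot be placed firmly on either side of the cut-off, which is one reason precision matters more when a test is used for diagnosis than for monitoring.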


Even so, some factors can still interfere with HbA1c test measurements (eg, genetic variants, elevated fetal hemoglobin and chemically modified derivatives of hemoglobin in patients with renal failure). Furthermore, any condition that shortens the lifespan of red blood cells can lead to inaccurate HbA1c test results, according to Sacks.

“It’s important to have accurate results so the treatment decisions and diagnostic decisions are based on true values,” Sacks said.

Endocrine Today Editorial Board member Zachary T. Bloomgarden, MD, clinical professor of medicine at Mount Sinai School of Medicine, said he prefers to use an oral glucose tolerance test (OGTT) to confirm abnormal HbA1c results.

“For the majority of people with diabetes, HbA1c is an extremely useful way of getting a sense of what their average blood sugar is. But for something as important as the diagnosis of diabetes, you want to make sure the blood sugar is really in the diabetic range before you tell a person they have diabetes,” he said.

Zachary T. Bloomgarden

Vitamin D deficiency

In 2010, Neil Binkley, MD, of the University of Wisconsin Osteoporosis Clinical Center & Research Program, and colleagues published an NIH-funded study on the current status of clinical 25-(OH)D measurements. Variability between laboratory assays was again an issue of concern, so Binkley and colleagues sought to compare quality assurance between immunoassays and chromatographic-based methods.

They found that modest inter-laboratory variability persists in measurements of 25-(OH)D, and they observed a slight systematic bias for some laboratories (range: positive mean bias of 4.2 ng/mL to negative mean bias of 1.4 ng/mL). According to the data, 22 of 25 results were numerically higher than those obtained by liquid chromatography with ultraviolet detection (15.7%). Based on their findings, Binkley and colleagues said laboratory calibration schemes must be validated and their traceability to a national standard documented.
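Binkley and colleagues expressed inter-laboratory differences as mean bias relative to a reference method. A generic paired-difference calculation, sketched in Python, shows what such a number represents; the sample values and the function name are hypothetical, and the study's exact statistical procedure is not described in this article.

```python
def mean_bias(test: list[float], reference: list[float]) -> tuple[float, float]:
    """Return (mean bias in ng/mL, mean percent bias) for paired results.

    Bias for each sample is the laboratory result minus the reference
    result; a positive value means the laboratory reads high.
    """
    if not test or len(test) != len(reference):
        raise ValueError("need equal-length, non-empty paired results")
    diffs = [t - r for t, r in zip(test, reference)]
    pct = [100 * d / r for d, r in zip(diffs, reference)]
    return sum(diffs) / len(diffs), sum(pct) / len(pct)


# Hypothetical 25-(OH)D results (ng/mL) on the same three samples
reference_method = [20.0, 30.0, 40.0]
lab_assay = [22.0, 33.0, 43.0]
bias, pct_bias = mean_bias(lab_assay, reference_method)
print(f"mean bias {bias:+.1f} ng/mL ({pct_bias:+.1f}%)")
```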

According to Endocrine Today Editorial Board member Robert D. Blank, MD, PhD, associate professor of medicine and chief of endocrinology at the Medical College of Wisconsin, 25-(OH)D is measured using a variety of methods, but only two are superior: mass spectrometry and high-performance liquid chromatography.

Blank said one of the most important findings from Binkley and colleagues was that, for values near the clinical decision level of 30 ng/mL, the true level may lie either above or below that threshold.

However, many 25-(OH)D measurements are made with immunoassays, which are much less reproducible than either mass spectrometry or high-performance liquid chromatography, he said.

“They are also much less capable of distinguishing some of the related compounds that are important,” Blank said. “All of the immunoassays are potentially problematic because they are not as specific as the mass spectrometry and [high-performance liquid chromatography].”
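Blank's point about results near 30 ng/mL can be made concrete by applying the extremes of the mean bias reported by Binkley and colleagues (+4.2 and -1.4 ng/mL) to a few hypothetical true values. This Python sketch assumes a constant additive bias, which is a simplification: real assay bias can vary with concentration.

```python
DECISION_LEVEL = 30.0  # ng/mL, clinical decision level cited by Blank


def classification_flips(true_value: float, bias: float,
                         level: float = DECISION_LEVEL) -> bool:
    """True if adding a constant bias moves a result across the decision level."""
    reported = true_value + bias
    return (true_value >= level) != (reported >= level)


# Bias extremes from Binkley and colleagues, applied to hypothetical true values
for true_value in (25.0, 27.0, 31.0, 35.0):
    for bias in (4.2, -1.4):
        reported = true_value + bias
        flag = "misclassified" if classification_flips(true_value, bias) else "same side"
        print(f"true {true_value:.1f} -> reported {reported:.1f} ng/mL: {flag}")
```

Under these assumptions, only values within a few ng/mL of the decision level change category, which matches the concern about results close to 30 ng/mL.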

Robert D. Blank

On July 15, the FDA gave 510(k) clearance for the FastPack Vitamin D Immunoassay (Qualigen), indicated for the rapid testing of vitamin D status. According to the press release, the assay is the only fully quantitative immunoassay system featuring a one-touch operation with results in 10 minutes or less.

Blank told Endocrine Today that the amount of variability in the controls signals an area of concern. According to control material information posted on the manufacturer’s website, two levels of control are available to allow performance monitoring within the clinical range: control 1 has an expected range of 27 ± 13 ng/mL, whereas control 2 has an expected range of 65 ± 26 ng/mL.

“We are typically interested in values between 10 and 40 ng/mL, so the values used to calibrate the Qualigen assay don’t speak to the bulk of the clinically relevant range,” Blank said.
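The width of the control ranges Blank refers to can be expressed as a half-width relative to each target. A small Python sketch using the two control specifications quoted above (27 ± 13 and 65 ± 26 ng/mL); the function name is illustrative:

```python
def control_interval(target: float, half_width: float) -> tuple[float, float, float]:
    """Return (low, high, half-width as % of target) for a target ± half-width spec."""
    low, high = target - half_width, target + half_width
    return low, high, 100 * half_width / target


controls = {"control 1": (27.0, 13.0), "control 2": (65.0, 26.0)}
for name, (target, half_width) in controls.items():
    low, high, rel = control_interval(target, half_width)
    print(f"{name}: {low:.0f}-{high:.0f} ng/mL (±{rel:.0f}% of target)")
```

By this arithmetic, a control 1 result anywhere from 14 to 40 ng/mL would be accepted.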

Blank said there is a demand for standardizing the assays. Current Endocrine Society guidelines regard 25-(OH)D levels below 30 ng/mL as inadequate, whereas the Institute of Medicine considers levels of at least 20 ng/mL sufficient. Meanwhile, many patients have low vitamin D levels. According to Blank, the controversy arises in part because the guidelines address different populations. However, even when this factor is taken into account, differences of opinion remain.


“Whenever there are disagreements among authoritative groups, it is indicative that there’s no consensus among the expert community,” he said. “The bottom line is that there is no straight answer, and doctors still have to be doctors; they have to assess the benefits and the risks in each individual patient.”
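Because the two cut-offs differ, a single 25-(OH)D result can be adequate by one guideline and not by the other. A Python sketch using the two thresholds as summarized in this article (30 ng/mL for the Endocrine Society, 20 ng/mL for the Institute of Medicine); the dictionary and function names are hypothetical, and the sketch ignores the population differences Blank mentions.

```python
# Cut-offs (ng/mL) as summarized in this article; not a complete statement of either guideline
CUTOFFS_NG_ML = {"Endocrine Society": 30.0, "Institute of Medicine": 20.0}


def adequacy_by_guideline(value_ng_ml: float) -> dict[str, bool]:
    """Map each guideline to whether the result meets its cut-off."""
    return {org: value_ng_ml >= cutoff for org, cutoff in CUTOFFS_NG_ML.items()}


def guidelines_disagree(value_ng_ml: float) -> bool:
    """True if the two guidelines classify the same result differently."""
    return len(set(adequacy_by_guideline(value_ng_ml).values())) > 1


for value in (15.0, 25.0, 35.0):
    print(value, adequacy_by_guideline(value), guidelines_disagree(value))
```

Only results between 20 and 30 ng/mL fall into the zone of disagreement, which is also where assay bias near the decision level matters most.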

Thyroid function assays

Thyroid function tests are not without problems, either. There, the major controversy has been the reliability of free thyroxine immunoassays, according to James Faix, MD, clinical professor of pathology at Stanford University School of Medicine and member of the IFCC Committee for the Standardization of Thyroid Function Tests.

“In 2010, we published a three-part series looking at commercially available immunoassays for all of the thyroid function tests. With regard to free T4, almost all of them were much lower than the reference measurement procedure,” Faix said. “There’s definitely a lack of agreement between the assays, and they all tend to be lower than the reference measurement procedure; that’s been the suspicion of most people in endocrinology.”

James Faix

According to the report in Clinical Chemistry, this negative bias can be corrected by recalibrating. But this may be problematic for those manufacturers whose assays show the most negative bias. Additionally, it may not be feasible if the FDA considers this a major change requiring resubmission of the assays for approval.

“Most clinicians use TSH [thyroid-stimulating hormone] as a single test screen for thyroid function, but when the TSH looks abnormal, verification is used with the free T4 assay,” Faix said. “There hasn’t been much concern over TSH, but we also showed there was a bit of variability between the different TSH assays, and that harmonizing TSH could also be beneficial.”

However, studies have shown that TSH can suggest thyroid disease when it is not present.

“The most common problem is that TSH can be suppressed in the absence of thyroid disease because there are a lot of things that can lower the TSH release by the pituitary gland, especially variables such as stress. Patients who are critically ill, under tremendous psychological or physical stress can have low TSH levels,” Faix said. “This can cause a misdiagnosis of hyperthyroidism.”

Carole A. Spencer

However, standardization is not the only issue when it comes to TSH, according to Carole A. Spencer, PhD, of the USC Endocrine Laboratories. Spencer said TSH is one of the better assays across most platforms, in that most assays achieve optimal functional sensitivity of 0.01 mIU/L.
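Functional sensitivity is conventionally defined as the lowest concentration an assay can measure with an interassay CV of 20% or less. A minimal Python sketch of reading it off a precision profile; the profile values are hypothetical, and requiring every higher concentration to meet the limit as well is a design choice of this sketch.

```python
from typing import Optional


def functional_sensitivity(profile: dict[float, float],
                           cv_limit: float = 20.0) -> Optional[float]:
    """Return the lowest concentration at and above which every interassay CV <= cv_limit.

    profile maps concentration (mIU/L) to interassay CV (%).
    Returns None if even the highest concentration exceeds the limit.
    """
    result = None
    # Walk down from the highest concentration; stop at the first CV above the limit
    for conc in sorted(profile, reverse=True):
        if profile[conc] > cv_limit:
            break
        result = conc
    return result


# Hypothetical TSH precision profile: concentration (mIU/L) -> interassay CV (%)
tsh_profile = {0.005: 35.0, 0.01: 19.0, 0.02: 12.0, 0.05: 8.0, 0.5: 5.0}
print(functional_sensitivity(tsh_profile))
```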

“What tends to be overlooked by most researchers and physicians regarding TSH assays is the fact that these immunoassays use monoclonal antibodies, which capture different TSH isoforms. Regardless of standardization, what we’re measuring is probably part biologically active and part not biologically active,” Spencer said.

According to Spencer, most of the assay manufacturers adopt the third US National Health and Nutrition Examination Survey reference range, but tend to only put one range on the report. “Ideally, reference ranges should be age dependent and, to some extent, ethnicity dependent,” she said.

Moreover, Spencer said TSH has a low index of individuality, meaning that the population reference range is a poor parameter for detecting thyroid dysfunction in the individual patient.

“Reference ranges are weak parameters for detecting thyroid dysfunction unless that dysfunction becomes severe,” she said.

Reproductive hormones

Testosterone ranges and testing methods have been the topic of recent presentations at scientific conferences and have led to new data regarding the methods of measurement. In 2010, the Partnership for the Accurate Testing of Hormones (PATH) was formed by William Rosner, MD, of St. Luke’s/Roosevelt Hospital Center and Columbia University College of Physicians and Surgeons, and Hubert W. Vesper, PhD, director of the clinical standardization programs, chief of protein biomarker and lipid reference laboratory at the clinical chemistry branch in the division of laboratory sciences at the CDC.


“Testosterone assays have been known to be flawed for a long time, but like many other things, not much was being done about it,” Rosner told Endocrine Today.

Rosner said laboratories can already trace their results to the standards compiled by the group, and the number doing so has grown exponentially over the last year or two.

“We’ve been enormously successful,” he said.

The major recommendations made by Rosner and colleagues include:

  1. Technical improvements for assay standardization;
  2. Education of health care providers, patients and all others concerned with testosterone testing;
  3. Plans to encourage all concerned journals, government agencies and health insurance companies to support the effort; and
  4. Encouragement of manufacturers to develop better and more cost-effective assays.

“There’s been enormous progress; there’s no question about that,” Rosner said. “The awareness of the problem is widespread, and all of the major reference laboratories have put methods on board to allow accuracy-based measurements.”

William Rosner

Vesper agreed, adding that the testosterone program that began in 2006 has resulted in a 50% reduction in variability among mass spectrometry assays based on the work they have done thus far.

According to Rosner, professional organizations have a responsibility to their membership to point out when assays are not performing, what is being done, who is doing it the correct way and what should continue to happen. He said there will soon be standard testosterone ranges similar to those for cholesterol levels.

In April, an Endocrine Society working group led by Rosner reported that better estrogen testing methods are also needed. According to their paper published in the Journal of Clinical Endocrinology and Metabolism, the measurement of estradiol in biological fluids is important in human biology from birth until death.

“[Estradiol] is now known to have important effects on skin, blood vessels, bone, muscle, coagulation, hepatic cells, adipose tissue, the kidney, the gastrointestinal tract, brain, lung and pancreas,” they wrote.

Additionally, it is likely that variations in levels below 0.2 pg/mL — the limit of sensitivity of most estradiol assays — are meaningful, they said.

Hubert W. Vesper

From a CDC perspective, Vesper spoke to Endocrine Today about the controversies or concerns with the reliability of estradiol assays. He said hormone standardization activities began with testosterone because The Endocrine Society found there was a need to improve measurements. Many other organizations also felt a need for estradiol to be included, he added.

“It’s often used with in vitro fertilization to ensure the safety of ovarian stimulation, but it is also used to assess other conditions, and research findings actually associate estradiol with bone health and breast cancer,” Vesper said.

There are many meta-analyses that show postmenopausal women with elevated levels of estradiol have an increased risk for breast cancer, he said.

“Here we have a situation where research shows there is a benefit in measuring estradiol to assess a woman’s risk for breast cancer, but we need to come up with a cut-off value to distinguish a person at high risk and a person not at high risk. We cannot come up with a cut-off value until there are comparable measurements,” Vesper said.


Moving forward

Although technological advances have been made, variations continue.

“One interesting thing is that over the past 20 years or so we saw a profound development in technology. In the early days, we used immunological methods like radioimmunoassay,” Vesper said. “Now, physicians have access to mass spectrometry-based methods so that physicians, researchers and health officials have a wide range of technologies at their fingertips, which are profound advancements. However, the measurement accuracy did not parallel this development. Therefore, we still have high variability, even though we have much more advanced technologies.”

Vesper said he and the PATH committee are now working on several fronts. First, they are providing a point of reference for other laboratories and researchers so that assay manufacturers can calibrate their tests or verify calibration.

Second, they are evaluating measurement accuracy once a laboratory has calibrated its assay.

Finally, they are assessing the accuracy at a level of patient care, public health and research by collaborating with proficiency test providers, principal investigators on studies and health care professionals dealing with large population studies. From there, the group provides quality control materials or assigns target values to the quality control materials that are then used to enhance patient care and research accuracy over time.

They also have a program supporting the standardization of vitamin D and thyroid hormone measurements.

“It’s important for practicing physicians to understand the assays they are using,” Blank said. “Because then they know exactly how much faith they should have in the results they get. They need to recognize that there’s slop in even the best system.” – by Samantha Costa

American Diabetes Association. Diabetes Care. 2010;33(Suppl 1):S11-S61.
Azziz R. J Clin Endocrinol Metab. 2005;90:4650-4658.
Binkley N. Clin Chim Acta. 2010;411:1976-1982.
Faix JD. #S16-4. Standardization and harmonization of thyroid hormone assays. Presented at: The Endocrine Society Annual Meeting and Expo; June 15-18, 2013; San Francisco.
Hoerger TJ. Prev Chronic Dis. 2011;8:A136.
Moskovic DJ. J Sex Med. 2013;10:562-569.
Rosner W. J Clin Endocrinol Metab. 2006;92:405-413.
Rosner W. J Clin Endocrinol Metab. 2010;95:4542-4548.
Rosner W. J Clin Endocrinol Metab. 2013;98:1376-1387.
Thienpont LM. Clin Chem. 2010;56:902-929.
Vesper HW. Clin Lab News. 2012;38:1-9.
Robert D. Blank, MD, PhD, can be reached at the Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226; email:
Zachary T. Bloomgarden, MD, can be reached at Mount Sinai School of Medicine, 35 E. 85th St., New York, NY 10028; email:
James D. Faix, MD, can be reached at the Stanford Clinical Laboratory at Hillview, 3375 Hillview Ave., MC 5627, Palo Alto, CA 94304-1204; email:
William H. Herman, MD, MPH, can be reached at 1000 Wall St., Room 6100/SPC 5714, Ann Arbor, MI 48105; email:
William Rosner, MD, can be reached at Department of Medicine, St. Luke’s-Roosevelt Hospital Center, 1000 10th Ave., New York, NY 10019; email:
David M. Nathan, MD, can be reached at:
David B. Sacks, MBChB, can be reached at the Department of Laboratory Medicine Clinical Center, National Institutes of Health Building 10, Room 2C-306, 10 Center Drive, Bethesda, MD 20892; email:
Carole A. Spencer, PhD, can be reached at the USC Endocrine Laboratories, 126 W. Del Mar Blvd., Pasadena, CA 91105; email:
Hubert W. Vesper, PhD, can be reached at the Clinical Chemistry Branch Division of Laboratory Sciences at the Centers for Disease Control and Prevention, 4770 Buford Highway NE MS F25, Atlanta, GA 30341; email:
Disclosure: Bloomgarden reports being a consultant/advisor for: Boehringer Ingelheim, BMS/Astra Zeneca, Dainippon Sumitomo Pharma America, Forest Laboratories, Johnson & Johnson, Medtronics, Merck, and Novartis; speaker for: Boehringer Ingelheim, Merck, NovoNordisk, and Santarus; stockholder of: Baxter International, CVS Caremark, Novartis, Roche Holdings, and St Jude Medical. Blank, Faix, Herman, Nathan, Rosner, Sacks, Spencer, and Vesper report no relevant financial disclosures.


Does the variability in the rate of glycation among individual patients undermine the value of HbA1c as a measure of chronic glycemia?



There are a number of conditions that lower or raise HbA1c relative to the true level of glycemia. These include hemolysis, recovery from acute blood loss, erythropoietin therapy, hemoglobinopathies, thalassemias, advanced liver disease, uremia and iron deficiency anemia. When clinically recognized, these conditions should prompt clinicians to employ alternative measures of glycemia such as average glucose levels or fructosamine. Unfortunately, other, less well understood factors can also produce changes in HbA1c independent of glycemia. These factors, which produce differences between a direct intracellular measure of glycemia (HbA1c) and the HbA1c predicted from extracellular measures of glycemia such as plasma glucose or fructosamine (a difference termed the glycation gap), are common, clinically important and largely unexplained. In the Estimated Average Glucose Study, a landmark study that sought to translate measured HbA1c results into estimated average glucose values, more than 500 subjects each provided approximately 2,700 glucose measurements over 3 months (Nathan DM. Diabetes Care. 2008; 31:1473-1478).
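The glycation gap described above is the difference between measured HbA1c and the HbA1c predicted from an extracellular measure such as fructosamine. The Python sketch below assumes the prediction comes from a linear regression fitted in a reference population; the slope and intercept are placeholders, not published coefficients, and the 1-percentage-point flag simply mirrors the differences of more than 1% reported by Cohen and colleagues.

```python
def predicted_hba1c(fructosamine_umol_l: float, slope: float, intercept: float) -> float:
    """HbA1c (%) predicted from fructosamine via a population regression (placeholder coefficients)."""
    return slope * fructosamine_umol_l + intercept


def glycation_gap(measured_hba1c: float, fructosamine_umol_l: float,
                  slope: float, intercept: float) -> float:
    """Measured minus predicted HbA1c, in percentage points."""
    return measured_hba1c - predicted_hba1c(fructosamine_umol_l, slope, intercept)


# Hypothetical regression coefficients and patient values, for illustration only
SLOPE, INTERCEPT = 0.017, 0.2
gap = glycation_gap(6.5, 300.0, SLOPE, INTERCEPT)
print(f"glycation gap: {gap:+.1f} points; large gap: {abs(gap) > 1.0}")
```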

William H. Herman

Despite the rigorous measurement of average glucose among individuals, the range of estimated average glucose calculated across individuals for each HbA1c level was wide. For example, the 95% CI for the estimated average glucose corresponding to an HbA1c of 7% was 123 mg/dL to 185 mg/dL, and the range for an HbA1c of 9% was 170 mg/dL to 249 mg/dL. Research by Cohen and colleagues has demonstrated that nearly one-quarter of apparently healthy individuals have measured HbA1c more than 1% higher than predicted, and nearly one in five have measured HbA1c more than 1% lower than predicted based on fructosamine levels (Cohen RM. Diabetes Care. 2003; 26:163-167). Studies have demonstrated that measured HbA1c tends to be higher than predicted in older adults compared to younger adults, in racial and ethnic minority groups compared to whites, in cigarette smokers compared to non-smokers, in individuals with higher total fat and saturated fat consumption compared to those with lower total and saturated fat consumption, and in alcohol abstainers compared to moderate drinkers. Subclinical variation in red cell turnover, differences in red cell membrane function, differences in hemoglobin oxygenation, and even genetic factors have been postulated to contribute to the glycation gap, but the mechanisms remain unknown. Nevertheless, it is clear that there is variability in the rate of hemoglobin glycation among individuals sufficient to impact the validity of HbA1c as a measure of average glycemia.
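The Estimated Average Glucose Study summarized the HbA1c-glucose relationship as a linear regression, eAG (mg/dL) = 28.7 × HbA1c (%) - 46.7 (Nathan DM. Diabetes Care. 2008). A short Python sketch computes the point estimates for the two HbA1c levels above and sets them beside the quoted 95% CIs, which show how widely individuals scatter around the regression line:

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """Estimated average glucose (mg/dL) from HbA1c (%), using the study's regression."""
    return 28.7 * hba1c_percent - 46.7


# 95% CIs (mg/dL) quoted in the text for each HbA1c level
quoted_ci = {7.0: (123, 185), 9.0: (170, 249)}
for hba1c, (low, high) in quoted_ci.items():
    eag = estimated_average_glucose(hba1c)
    print(f"HbA1c {hba1c:.0f}%: eAG {eag:.0f} mg/dL, 95% CI {low}-{high} mg/dL "
          f"(width {high - low} mg/dL)")
```

The point estimates (about 154 and 212 mg/dL) fall inside the quoted intervals, but the intervals themselves span 62 and 79 mg/dL.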

William H. Herman, MD, MPH, is professor of endocrinology and epidemiology at the University of Michigan.



The glycated hemoglobin assay (HbA1c) has been the standard measure of chronic glycemia in epidemiologic studies, clinical trials and clinical care for more than 30 years. Such a measure is needed because blood glucose levels in diabetes fluctuate constantly, so a limited number of blood glucose measurements cannot accurately capture average glucose exposure.

The original assumption that glycated hemoglobin reflected average glycemia over the prior 2 to 3 months was predicated on the following: glycation is an irreversible non-enzymatic reaction that depends on the mass action between available amino groups (N-terminal valines and lysine residues) and ambient glucose concentrations over the life-span of the erythrocyte (with a half-life of approximately 60 days). Although glycation itself did not seem to be complicated, these assumptions did not take into account the potential for inter-individual differences in glucose transport across the erythrocyte and in red cell survival/lifespan.

The empirical data supporting the relationship between HbA1c levels and mean blood glucose (MBG) concentrations over the previous 2 to 3 months have been developed over more than 3 decades. The first two studies that established the mathematical relationship between MBG and HbA1c, and gave near-identical results, included a total of only 36 subjects with type 1 diabetes and compared the arithmetic mean of four to six glucose tests per day collected over 5 to 8 weeks with the HbA1c result at the end of the period. Subsequent, larger studies have used longer sampling periods plus continuous glucose monitoring (CGM) to capture the true MBG level more completely. These more recent studies have again demonstrated a durable and powerful relationship between MBG and HbA1c. Correlation coefficients (R) of approximately 0.93 are quite strong, considering the limited accuracy of self-monitored blood glucose (SMBG) and CGM measurements.
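The correlation coefficient quoted above is Pearson's R between each subject's mean blood glucose and HbA1c. A self-contained Python sketch with hypothetical paired values (not data from any of the studies described):

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject HbA1c (%) and mean blood glucose (mg/dL)
hba1c = [6.0, 7.0, 8.0, 9.0, 10.0]
mbg = [130.0, 150.0, 185.0, 205.0, 245.0]
print(f"R = {pearson_r(hba1c, mbg):.2f}")
```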

David M. Nathan

The major question that has been raised by several studies is whether the relationship between MBG and HbA1c is different among racial groups. The studies that have suggested such differences have generally compared the relationship between a single or very limited number of blood glucose measurements and a simultaneous HbA1c measurement. These types of studies should largely be discounted owing to the potential for sampling errors and the likelihood that different groups may follow different day-to-day dietary or activity patterns that are captured by the HbA1c but not by the glucose measurements performed on the day of the visit. Until studies are performed that incorporate an adequate representation of different racial groups, capture real-life glucose levels over time and compare the derived MBG with HbA1c, we cannot conclude that glycation is different among racial groups. Recent studies have demonstrated parallel relationships of a variety of measures of chronic glycemia among racial groups and suggest that glycemia is truly different among racial groups and not a function of differences in glycation.

Any inter-individual or inter-group differences in the underlying mechanisms that may affect the formation of HbA1c may have a heuristic purpose; however, until appropriate studies as outlined above are completed, they are mechanisms in search of a phenomenon. For most persons with diabetes, HbA1c levels are a reliable measure of long-term glycemia.

David M. Nathan, MD, is director of the Clinical Research Center and of the Diabetes Center at Massachusetts General Hospital, and a professor of medicine at Harvard Medical School in Boston.