Commentary

Study used to justify international women’s track and field eligibility rule leaves questions

New track and field regulations from the International Association of Athletics Federations, or IAAF, due to take effect in November will require some female athletes with elevated testosterone levels to medically reduce their testosterone to compete in select events. The regulations specify that the threshold applies only to “relevant females” who have a disorder of sex development, as defined by the organization, testosterone levels of at least 5 nmol/L, and sufficient androgen sensitivity.

Most concerning to me, initially, was the suggestion that healthy women must alter their physiology with medications to participate in competition. That said, it is worth considering the impetus for the new regulations and carefully analyzing both the evidence on which the regulations are based and the likely effects.

History of women’s eligibility

The IAAF regulation states that women diagnosed with a disorder of sex development, defined as a variety of specific syndromes, and with endogenous testosterone levels of at least 5 nmol/L must reduce that level using medication to compete in certain track and field events. The threshold was designed to exclude women with polycystic ovary syndrome.

Tamara Wexler

This is not the first time the IAAF has established thresholds for endogenous testosterone levels in women, nor the first time it has been challenged. After the publicity surrounding the case of Caster Semenya, a winning female middle-distance runner from South Africa, the IAAF enacted regulations on May 1, 2011, requiring all female athletes with hyperandrogenism to be evaluated by the committee prior to competing in any international event. To be eligible to compete, all women were required to have endogenous testosterone levels below 10 nmol/L (“the normal male range,” according to the IAAF) or have androgen resistance. These 2011 regulations left the decision for eligibility to return to competition in the hands of an IAAF-appointed expert medical panel, and the IAAF noted that the panel might itself require proof of compliance with a prescribed medical treatment, but did not specify any particular treatment.

The 2011 regulations were challenged by competitor Dutee Chand and in 2015 were suspended by the Court of Arbitration for Sport, based largely on the court’s determination that the degree of performance advantage from endogenous testosterone level was not sufficient to justify exclusion from competition. The IAAF was allotted 2 years to provide evidence that endogenous testosterone level substantially affects competitive advantage (ie, an impact on the order of 10%, not the 1% to 3% suggested at the time) to justify exclusion from competition; if this bar was not met, the regulations would be deemed void.

In response, the IAAF pursued the requested evidence, looking to quantify the relationship between higher testosterone level and competitive advantage. Results of the study, co-written by IAAF consultant Stéphane Bermon, MD, PhD, of the Université Côte d’Azur in Nice, France, and Monaco Institute of Sports Medicine and Surgery, and Pierre-Yves Garnier, MD, director of the IAAF health and science department, and funded by the IAAF and the World Anti-Doping Agency, were published last year in the British Journal of Sports Medicine. Interpretation of the IAAF study and other new evidence is likely to play a key role in whether the new regulation stands.

Evidence base

The reduction of the regulation’s threshold testosterone level from 10 nmol/L to 5 nmol/L is based on an IAAF review of literature that concludes that testosterone levels in “normal females” and in those with PCOS do not exceed 4.8 nmol/L. Further, the IAAF holds that “(a) below 5 nmol/L, there is limited evidence of any material testosterone dose-response; but (b) an increase in circulating testosterone from normal female range up to between 5 and 10 nmol/L delivers a clear performance advantage (according to the studies, a 4.4% increase in muscle mass and a 12% to 26% increase in muscle strength, and a 7.8% increase in haemoglobin).” However, the evidence is not yet published and, thus, not available for review.

The 2017 IAAF study conducted by Bermon and Garnier is a cross-sectional examination of data from male and female athletes participating in the 2011 and 2013 IAAF World Championships in Athletics, which includes some overlap. Blood samples from these athletes were nonfasting and could be drawn at any time of day. The highest and lowest tertiles of calculated free testosterone were compared with athletic performance results in track and field events and with hemoglobin levels. The highest tertile included results from women later found to have been doping (nine of 1,332). The strongest association (4.53%) was found in the hammer throw, followed by the pole vault (2.94%); of note, neither event is included in the new regulations as a restricted event. The restricted events include the other events studied, the 400-m and 800-m races and the 400-m hurdles, as well as a number of events not included in the study. The highest calculated free testosterone tertile was associated with better performance in these events, but the differences were 2.73%, 1.78% and 2.78%, respectively — still within the 1% to 3% advantage deemed insufficient by the Court of Arbitration for Sport in its 2015 suspension ruling.
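For readers unfamiliar with tertile comparisons, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration only: the function name `tertile_percent_difference` and the six data points are hypothetical, and the commentary does not describe exactly how Bermon and Garnier computed their percentages, so this should not be read as their method.

```python
from statistics import mean


def tertile_percent_difference(hormone, times):
    """Percent by which the highest-hormone tertile is faster than the lowest.

    `hormone` and `times` are paired lists, one entry per athlete.
    Lower times are better, so a positive result favors the highest tertile.
    """
    if len(hormone) != len(times) or len(hormone) < 3:
        raise ValueError("need paired lists with at least three athletes")
    # Rank athletes by hormone level, then take the bottom and top thirds.
    ranked = sorted(zip(hormone, times))
    k = len(ranked) // 3
    lowest = [t for _, t in ranked[:k]]
    highest = [t for _, t in ranked[-k:]]
    return 100.0 * (mean(lowest) - mean(highest)) / mean(lowest)


# Hypothetical 800-m times in seconds; NOT data from the study.
free_t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
times_s = [120.0, 120.0, 119.0, 119.0, 117.0, 117.6]
print(tertile_percent_difference(free_t, times_s))
```

Note what the sketch leaves out: the middle tertile is discarded entirely, and nothing adjusts for training, age or any other variable. A difference between the two outer groups says nothing about whether performance changes steadily with hormone level.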

Statistical significance is not the same as clinical significance and, regardless, does not speak to causality. There was no attempt to account for other contributing variables. Associations were not found in the male athletes based on highest free testosterone tertile. Although higher free testosterone levels tracked with higher hemoglobin concentrations, no association was found between hemoglobin concentration and performance. This further underscores the limitations of a cross-sectional study using tertiles.

Although Bermon and Garnier interpret their results as demonstrating that high free testosterone levels confer a 1.8% to 2.8% competitive advantage in select racing events, the methodology does not support such a strong statement, and the significance of a less than 3% difference was already disputed by the Court of Arbitration for Sport. In addition, the use of calculated free testosterone levels is open to debate, particularly given that the IAAF regulation refers only to total testosterone levels. Moreover, testosterone measurements are less reliable for determining relative levels of active testosterone in these low ranges (lower than those in a normal male range).

The journal issue containing Bermon and Garnier’s study also includes another cross-sectional analysis, by Eklund and colleagues, that found an association between physical performance measures and androgen precursors and metabolites but, notably, no association between performance and testosterone levels. Eklund and colleagues compared 106 Swedish female Olympic athletes with age- and BMI-matched sedentary controls. Morning fasting laboratory evaluation included androgen precursors, such as dehydroepiandrosterone (DHEA), and androgen metabolites as well as testosterone. DHEA levels were significantly higher and estrone levels significantly lower in athletes vs. sedentary controls; athletes also had more frequent menstrual dysfunction. Athletes had higher lean muscle mass and bone density and lower percent body fat compared with controls. Based on multiple regression analysis, DHEA most strongly predicted squat jump performance results, and lean mass most strongly predicted countermovement jump performance results. Body composition and explosive physical performance measures (the squat jump and countermovement jump) were collected in only a subset of the athletes, limiting the study. There was no difference in testosterone levels.

Published criticism

It is important to highlight the methodology in these studies, as cross-sectional studies can suggest correlations but cannot demonstrate causality. Published criticism by colleagues has focused on the data set and the methodology used by Bermon and Garnier. Points of contention include the use of tertiles and calculated free testosterone. Menier focuses on the absence of a clear objective threshold for high testosterone levels. Several have noted the distinction between free testosterone used in the Bermon and Garnier paper and the total testosterone in the IAAF regulations. The correlation cited between free testosterone tertiles and performance in five of 21 events does not hold true for all five events when using total endogenous testosterone. Furthermore, Sonksen and colleagues note a nonsignificant trend toward better performance in the lowest free testosterone tertile in nine of the 21 events.

Sonksen and colleagues point to the Eklund study as employing the preferable methodology of looking for direct correlation between performance and endogenous testosterone levels, rather than tertiles, and note that none was found in the Eklund study. Interestingly, in a recent response to criticism of the data set and methodology upon which the regulation was based, Bermon, Eklund and others aver that the two studies both show that androgens within the normal range are associated with athletic performance, although the results with testosterone itself vary. In their response to criticisms, Bermon and colleagues share results of a separate statistical analysis using total testosterone values and excluding second appearances from individual athletes. Treating aggregated results from six events as one group, they found a correlation between performance and total testosterone levels; while they state that comparison between performance in select events and total testosterone tertiles confirms a correlation, the performance advantage was not greater than 3%. (Note: free testosterone is a better metric when measured properly; it is the IAAF regulation’s use of total testosterone that seems more curious.)

Franklin and colleagues argue that the associations found by Bermon and Garnier could have appeared by chance; Bermon and colleagues responded that chance is not a likely explanation. The raw data themselves are not publicly available, so Franklin and colleagues based their concern on an analysis of the P values, while calling for the raw data to be made public to allow independent rigorous analysis.
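The general worry about chance findings when many events are tested can be illustrated with a short binomial calculation. This is a sketch under stated assumptions, not a reproduction of Franklin and colleagues’ analysis of the reported P values: it assumes the 21 event-level tests are independent and that each uses a nominal 0.05 threshold, neither of which is stated in this commentary. The helper `prob_at_least` is a hypothetical name.

```python
from math import comb


def prob_at_least(k, n_tests, alpha=0.05):
    """P(at least k of n_tests fall below alpha) when no real effect exists.

    Assumes the tests are independent, which may not hold for athletes
    who compete in more than one event.
    """
    below_k = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below_k


N_EVENTS = 21  # number of events, per the commentary
ALPHA = 0.05   # assumed nominal significance threshold

expected_false_positives = N_EVENTS * ALPHA
p_any = prob_at_least(1, N_EVENTS, ALPHA)
p_five_or_more = prob_at_least(5, N_EVENTS, ALPHA)

print(f"expected chance findings: {expected_false_positives:.2f}")
print(f"P(at least one): {p_any:.3f}")
print(f"P(five or more): {p_five_or_more:.4f}")
```

The result is sensitive to how many comparisons were actually run and to any dependence between them, which is one reason access to the raw data matters for settling the question.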

The authors’ responses do not fully address issues in the methodology. Even if the data are not made public, an independent analysis might alleviate some criticism. However, it is not only the data and interpretations used that are engendering criticism, but the fact that these correlations are being used to enact regulations — regulations of the physiology of healthy women. In question are both the results — their source, methodology and strength — and the use of those results to generate regulations.

Looking ahead

Semenya and Athletics South Africa have challenged the new IAAF regulations, and the Court of Arbitration for Sport has opened an arbitration procedure. While one can understand the IAAF’s stated intent, it also is easy to understand challenges to the regulation. The requirement to lower endogenous hormone levels with medication in healthy female athletes may sit poorly, resonating with tones of required “normalization” of individuals. In addition, there are issues with both the methodology of the IAAF study and with its application of study findings (with a < 3% association) to a far-reaching policy. The IAAF states that the regulation is intended to pursue fair competition in sport, but it focuses on one measurement, total testosterone level, among the many genetic and environmental factors that determine athleticism and competitive advantage. Statistical significance is not equivalent to clinical significance, and neither is it clear that any competitive advantage is due to higher testosterone levels (particularly given the discordance between the two recent studies), nor that any difference in performance is due to any androgen parameter.

Whether the IAAF regulation is suspended or deemed void is likely to depend in part on the interpretation of new evidence produced by the IAAF and whether the court is convinced that the chosen measure and threshold confer a meaningful competitive advantage.

Disclosure: Wexler reports no relevant financial disclosures.
