Testosterone Testing...testing (screening without regard to clinical manifestations of androgen...

Testosterone Testing

Draft Report: Public Comment & Response

February 6, 2015

20, 2012

Health Technology Assessment Program (HTA)

Washington State Health Care Authority

PO Box 42712

Olympia, WA 98504-2712

(360) 725-5126

hca.wa.gov/hta

[email protected]

Health Technology Assessment

http://www.hta.hca.wa.gov/

Testosterone Testing

Response to Public Comments on Draft Report

February 6, 2015

Prepared by: Hayes, Inc. 157 S. Broad Street Suite 200 Lansdale, PA 19446 P: 215.855.0615 F: 215.855.5218

WA – Health Technology Assessment February 6, 2015

Testosterone Testing Draft Report: Response to Public Comments of 6

Response to Public Comments, Draft Report

Testosterone Testing Hayes, Inc. is an independent vendor contracted to produce evidence assessment reports for the WA HTA program. For transparency, all comments received during the comments process are included in this response document. Comments related to program decisions, processes, or other matters not pertaining to the evidence report are acknowledged through inclusion only. When comments cite evidence, the information is forwarded to the vendor for consideration in the evidence report.

This document responds to comments from the following parties:

G. Steven Hammond, PhD, MD; Chief Medical Officer, Washington State Department of Corrections; comments presented on behalf of the Washington Agency Medical Directors

Alvin M. Matsumoto, MD; Acting Head, Division of Gerontology and Geriatric Medicine, and Professor, Department of Medicine, University of Washington School of Medicine; Associate Director, Geriatric Research, Education and Clinical Center, and Director, Clinical Research Unit, VA Puget Sound Health Care System

Table 1 provides a summary of the comments with corresponding responses.


Testosterone Testing Draft Report: Response to Public Comments Page 4 of 6

Table 1. Public Comments on Draft Report, Testosterone Testing

Comment and Source Response

January 16, 2015 Letter – Dr. Hammond, WA Agency Medical Directors

The commenter highlights some of the issues surrounding testosterone testing and testosterone supplementation:

“whether serum testosterone levels below the laboratory ‘reference range’ in adult men indicate a clinicopathological state that requires treatment.”

“whether an adult man with non-specific symptoms that could be associated with hypogonadism, who also has a serum testosterone level measured below a laboratory reference range, but without an otherwise well-defined hypogonadal condition, is likely to have improved health outcomes as a result of testosterone testing and consequent testosterone supplementation.”

Application of reference ranges derived from populations of young men to middle-aged and older men, who are “known to have continual declines in testosterone levels with increasing age.”

Mass media publicizing of the “low T” phenomenon.

Thank you for your comments. No changes needed in the report.

The commenter expressed concern about equating the term androgen deficiency, which implies a pathological state, with low serum testosterone and suggested that this statement be included in the report:

Low serum testosterone may indicate androgen deficiency. While low serum testosterone may suggest putative androgen deficiency, it must be correlated with additional clinicopathological signs to be diagnostic of hypogonadism.

Thank you for this comment. Any text suggesting that low testosterone level is equivalent to androgen deficiency as a pathological condition has been revised.

The commenter further advised that the report not equate low serum testosterone with hypogonadism in the absence of “clear signs of primary or secondary hypogonadism of known etiology.”




“The Hayes, Inc. technology assessment on testosterone testing includes a valuable review of the literature on current conceptions and medical practice around testosterone testing and testosterone supplementation in the setting of ’low’ serum testosterone levels, but there is insufficient evidence to equate ‘low serum testosterone,’ even in the presence of non-specific symptoms characteristic of advancing age, with a clinicopathological condition of ‘androgen deficiency.’”

Thank you for your comment. No changes needed in the report.

January 19, 2015 Email - Dr. Alvin Matsumoto, University of Washington Medical School and VA Puget Sound Health Care System

“In the “Testosterone Testing – Draft Evidence Report,” I am most concerned about the conclusion that testosterone therapy may be useful in improving blood sugar control (glucose and hemoglobin A1c) in men with type 2 diabetes mellitus and hypogonadism. The main evidence cited for this is the meta-analysis by Cai, et al. (2014). A more recent meta-analysis did not find improvement in glycemic control with testosterone treatment (Grossmann M, et al., Clin Endocrinol 2014 ePub, attached). The difficulty with interpreting studies that seek to determine the effect of testosterone therapy on glycemic control is the lack [of] control of changes in diabetes therapy independent of testosterone (oral hypoglycemic agents, insulin and insulin analogs, diet, exercise, weight loss). I am afraid that a conclusion that testosterone therapy improves glycemic control will lead to over-testing (screening without regard to clinical manifestations of androgen deficiency), misinterpretation of testing and over-diagnosis of hypogonadism (obese diabetics often have low total testosterone but normal free testosterone), and over-treatment with testosterone. In the absence of better data, I would eliminate this conclusion or at the very least temper it.”

Thank you for calling the new systematic review by Grossman and colleagues to our attention. This review was published after the draft report was released. The findings of the meta-analysis by Grossman et al. (2014) have been added to the report and conclusions have been modified accordingly. The issue of bias due to changes in antidiabetic medications has also been addressed.

“I think that the variability reported testosterone measurements in various testosterone assays needs to be mentioned. As shown in Table 1 of the report by Wang C, et al., JCEM 89:534-543, 2004 (attached), the same quality control sample measured in different assays gave

Thank you for these comments and for the reference. The results from the study by Wang et al. have been added to the section on Analytic Validity under CLINICAL BACKGROUND in the TECHNICAL REPORT and corresponding edits have been made in the




median testosterone levels that ranged from 215 ng/dL to 348 ng/dL. This situation has occurred because the emphasis in quality control programs has been on reproducibility within the same assay rather than accuracy of the measurement. The CDC program was initiated to provide an accuracy-based quality control for harmonization of assays (i.e., so that assay readings are more comparable); a similar program was needed to deal with the initially marked variability in cholesterol and hemoglobin A1c measurements.”

Analytic Validity section of the EVIDENCE SUMMARY. Further clarification of the quality control programs offered by the CDC and the Clinical Association of Pathologists (CAP), including their voluntary nature, has been added. Additionally, a paragraph on threats to analytic validity has been added to the OVERALL SUMMARY AND DISCUSSION.

“I also think that more emphasis is needed regarding substantial variability in testosterone levels within an individual from day-to-day, up to 35%. Within individual variability was found in a study by Swerdloff RS, et al., JCEM 85:4500-4510, 2000 (page 4509, paragraph 4, attached); 30-35% of men who were found on screening to have a low testosterone < 300 ng/dL had normal average testosterone levels over a 24-hr pharmacokinetic blood sampling. Subsequently, Brambilla DJ, et al. (Clin Endocrinol 67:853-862, 2007, attached) quantified intra-individual variation in testosterone levels more formally. The bottom line is that one sample is not sufficient to assess testosterone status.”

Thank you for these comments and for these references. Statistics regarding the intraindividual variability of test results within the day and between days were included in the Analytic Validity section of the TECHNICAL REPORT. These data have been added to the corresponding section in the EVIDENCE REVIEW. The references cited in the comments will be brought to the Health Technology Clinical Committee (HTCC) meeting.

of 12

January 16, 2015

Comments of the WA Agency Medical Directors

Testosterone Testing Draft Technology Assessment Report

Presented by G. Steven Hammond, PhD, MD, Washington State Department of Corrections Chief Medical Officer

The Hayes, Inc. technology assessment highlights some of the quandaries surrounding appropriate

clinical use of serum testosterone testing in adult men, as well as questions about the risks and benefits

of treating men having “low” serum testosterone levels with testosterone supplementation.

The primary question raised is whether serum testosterone levels below the laboratory “reference

range” in adult men indicate a clinicopathological state that requires treatment. As is noted in the

report, there are a number of well-defined clinical conditions that cause hypogonadism, either primary

or secondary, around which there is little controversy concerning testosterone testing or treatment with

testosterone supplementation. The major question is whether an adult man with non-specific symptoms

that could be associated with hypogonadism, who also has a serum testosterone level measured below

a laboratory reference range, but without an otherwise well-defined hypogonadal condition, is likely to

have improved health outcomes as a result of testosterone testing and consequent testosterone

supplementation.

Defining a clinical condition principally on the basis of a laboratory test result, with no further etiologic

diagnosis, is of questionable validity. Such practice is even more dubious when reference ranges for the

lab test are statistically defined (mean + two standard deviations) in a population of young men, and are

applied to middle-aged and older populations, who are known to have continual declines in

testosterone levels with increasing age.

As noted in the report, there has been much publicizing of so-called “low T” in mass media, with

suggestions for men to consult with their physicians about this. Such “public health” messaging often

suggests, more or less overtly, that expected changes related to aging, such as decreased vigor and

virility, may be related to a medical condition, i.e., “low T”, with the implication that medical treatment

(with testosterone) may be appropriate or even necessary.

As the report indicates, the health benefits, and safety and more so the necessity, of treating “low T”

remain very much in doubt. “Low T” is not an accepted clinical diagnosis. There is not a clear case

definition of “hypogonadism” associated with “below normal” serum testosterone levels and some array

of symptomatology, in the absence of other findings supporting an etiologic diagnosis.

It is not warranted to equate a “low serum testosterone” with “androgen deficiency”, as is done in the

opening sentences of the technology assessment:

“Low serum testosterone is a form of androgen deficiency. In the present report, the term

androgen deficiency can be interpreted to be equivalent to low serum testosterone.”

Comments of the WA Agency Medical Directors January 16, 2015

Page 2 of 12

As noted subsequently in the report, a definition of the term “low serum testosterone” is problematic,

given age-related declines seen in male populations, and the many factors that affect serum

testosterone and sex hormone binging globulin levels. The term androgen deficiency connotes a

pathological state, whereas there is no clearly defined pathological state associated with “low

testosterone” levels in aging men. It is mistaken and potentially misleading to state “the term androgen

deficiency can be interpreted to be equivalent to low serum testosterone.”

Under the circumstances it would be more accurate to say:

“Low serum testosterone may indicate androgen deficiency. While low serum testosterone may

suggest putative androgen deficiency, it must be correlated with additional clinicopathological

signs to be diagnostic of hypogonadism.”

The report should eschew any equation of “low serum testosterone” with “androgen deficiency” or

“hypogonadism” in clinical settings which do not include clear signs of primary or secondary

hypogonadism of known etiology. Without such signs, any putative clinicopathological hypogonadal

condition is hypothetical.

The Hayes, Inc. technology assessment on testosterone testing includes a valuable review of the

literature on current conceptions and medical practice around testosterone testing and testosterone

supplementation in the setting of “low” serum testosterone levels, but there is insufficient evidence to

equate “low serum testosterone”, even in the presence of non-specific symptoms characteristic of

advancing age, with a clinicopathological condition of “androgen deficiency.”

From: Matsumoto, Alvin M [mailto:[email protected]] Sent: Monday, January 19, 2015 6:44 PM To: Teresa Rogstad Cc: [email protected]; Karen Crotty Subject: RE: [EXTERNAL] Draft Report, WA HTA Program

Teresa, et al:

My specific comments:

1. I had few minor wording changes in the “Final Key Questions and Background – Testosterone Testing” sheet (attached). I think that it is important to emphasize that most men male factor infertility have normal serum testosterone levels.

2. In the “Testosterone Testing – Draft Evidence Report”, I am most concerned about the conclusion that testosterone therapy may be useful in improving blood sugar control (glucose and hemoglobin A1c) in men with type 2 diabetes mellitus and hypogonadism. The main evidence cited for this is the meta-analysis by Cai, et al (2014). A more recent meta-analysis did not find improvement in glycemic control with testosterone treatment (Grossmann M, et al, Clin Endocrinol 2014 ePub, attached). The difficulty with interpreting studies that seek to determine the effect of testosterone therapy on glycemic control is the lack control of changes in diabetes therapy independent of testosterone (oral hypoglycemic agents, insulin and insulin analogs, diet, exercise, weight loss). I am afraid that a conclusion that testosterone therapy improves glycemic control will lead to over-testing (screening without regard to clinical manifestations of androgen deficiency), misinterpretation of testing and over-diagnosis of hypogonadism (obese diabetics often have low total testosterone but normal free testosterone), and over-treatment with testosterone. In the absence of better data, I would eliminate this conclusion or at the very least temper it.

3. I think that the variability reported testosterone measurements in various testosterone assays needs to be mentioned. As shown in Table 1 of the report by Wang C et al, JCEM 89:534-543, 2004 (attached), the same quality control sample measured in different assays gave median testosterone levels that ranged from 215 ng/dL to 348 ng/dL. This situation has occurred because the emphasis in quality control programs has been on reproducibility within the same assay rather than accuracy of the measurement. The CDC program was initiated to provide an accuracy-based quality control for harmonization of assays (i.e. so that assay readings are more comparable); a similar program was needed to deal with the initially marked variability in cholesterol and hemoglobin A1c measurements.

4. I also think that more emphasis is needed regarding substantial variability in testosterone levels within an individual from day-to-day, up to 35%. Within individual variability was found in a study by Swerdloff RS, et al, JCEM 85:4500-4510, 2000 (page 4509, paragraph 4, attached); 30-35% of men who were found on screening to have a low testosterone < 300 ng/dL had normal average testosterone levels over a 24-hr pharmacokinetic blood sampling. Subsequently, Brambilla DJ, et al (Clin Endocrinol 67:853-862, 2007, attached) quantified intra-individual variation in testosterone levels more formally. The bottom line is that one sample is not sufficient to assess testosterone status.

mailto:[email protected]

mailto:[email protected]

My only general comment is that the “Testosterone Testing – Draft Evidence Report” was somewhat redundant and long, but was quite detailed and provided a good summary of most of the evidence-base on clinical testosterone testing.

I found the Washington State Agency Utilization and Costs interesting and found myself asking whether guidelines (such as Endocrine Society guidelines) for testosterone testing and treatment were followed in Washington, i.e. the appropriateness of utilization and costs.

I do not have the confirmed date, location and time of the public meeting in March (initially said to be on March 20th at the SeaTac Conference Center, unclear what time) or information regarding what is expected of me at the meeting. My calendar is pretty full already in March and is dynamically changing with time. So, I would appreciate more specific details as soon as possible.

Thanks,

Al

Alvin M. Matsumoto, M.D. Acting Head, Division of Gerontology and Geriatric Medicine Professor, Department of Medicine University of Washington School of Medicine Associate Director, Geriatric Research, Education and Clinical Center Director, Clinical Research Unit V.A. Puget Sound Health Care System 1660 S. Columbian Way (S-182-GRECC) Seattle, WA 98108-1597 Phone: 206-764-2308 FAX: 206-764-2569

Clinical Endocrinology (2007)

67

, 853–862 doi: 10.1111/j.1365-2265.2007.02976.x

© 2007 The AuthorsJournal compilation © 2007 Blackwell Publishing Ltd

853

O R I G I N A L A R T I C L E

Blackwell Publishing Ltd

Intraindividual variation in levels of serum testosterone and other reproductive and adrenal hormones in men

Donald J. Brambilla*, Amy B. O’Donnell*, Alvin M. Matsumoto† and John B. McKinlay*

*

New England Research Institutes, Watertown, Massachusetts,

†

Department of Medicine, University of Washington School of Medicine, and Geriatric Research, Education and Clinical Center, VA Puget Sound Health Care System, Seattle, USA

Summary

Background

Estimates of intraindividual variation in hormone

levels provide the basis for interpreting hormone measurements

clinically and for developing eligibility criteria for trials of hormone

replacement therapy. However, reliable systematic estimates of such

variation are lacking.

Objective

To estimate intraindividual variation of serum total,

free and bioavailable testosterone (T), dihydrotestosterone (DHT),

SHBG, LH, dehydroepiandrosterone (DHEA), dehydroepiandrosterone

sulphate (DHEAS), oestrone, oestradiol and cortisol, and the

contributions of biological and assay variation to the total.

Design

Paired blood samples were obtained 1–3 days apart at entry

and again 3 months and 6 months later (maximum six samples per

subject). Each sample consisted of a pool of equal aliquots of two

blood draws 20 min apart.

Study participants

Men aged 30–79 years were randomly selected

from the respondents to the Boston Area Community Health Survey,

a study of the health of the general population of Boston, MA, USA.

Analysis was based on 132 men, including 121 who completed all

six visits, 8 who completed the first two visits and 3 who completed

the first four visits.

Measurements

Day-to-day and 3-month (long-term) intra-

individual standard deviations, after transforming measurements

to logarithms to eliminate the contribution of hormone level to

intraindividual variation.

Results

Biological variation generally accounted for more of

total intraindividual variation than did assay variation. Day-to-day

biological variation accounted for more of the total than did long-term

biological variation. Short-term variability was greater in hormones

with pulsatile secretion (e.g. LH) than those that exhibit less

ultradian variation. Depending on the hormone, the intraindividual

standard deviations imply that a clinician can expect to see a difference

exceeding 18–28% about half the time when two measurements are

made on a subject. The difference will exceed 27–54% about a

quarter of the time.

Conclusions

Given the level of intraindividual variability in

hormone levels found in this study, one sample is generally not

sufficient to characterize an individual’s hormone levels but collecting

more than three is probably not warranted. This is true for clinical

measurements and for hormone measurements used to determine

eligibility for a clinical trial of hormone replacement therapy.

(Received 22 December 2006; returned for revision 11 February

2007; finally revised 7 June 2007; accepted 7 June 2007)

Introduction

Estimates of intraindividual variation in hormone levels provide

the foundation for interpreting hormone measurements, such as

the reliability of one or two values as estimates of an individual’s

average hormone concentration, for both the clinician and the

researcher. For present purposes, intraindividual variation is defined

as variation around an individual’s steady-state mean hormone level

rather than changes in the mean itself. The steady-state mean is that

individual’s current average state. Systematic variation, such as the

well-known changes in testosterone and other hormones with age

or the relatively rapid changes that are associated with onset of certain

diseases or initiation of certain medications, constitutes changes in

the steady state mean.

The number of blood samples required to adequately characterize

an individual’s steady state hormone level increases as intraindividual

variation increases. In the absence of information on this variation,

it is also difficult to determine whether a difference between hormone

levels on two occasions constitutes simply a fluctuation around the

steady state mean or a change in the mean. The researcher has

difficulty performing sample size calculations for trials in which

change in hormone level is the outcome because intraindividual

variation is usually the denominator of the test statistic used to

compare average changes in hormone levels between treatment

groups. Moreover, the number of samples required to determine

eligibility, when eligibility depends on hormone level, is unknown.

Interindividual variation in the levels of testosterone and other

hormones in men has received considerable attention.

1–8

Intra-

individual variation has also been examined,

9–17

but sample sizes

in previous studies were generally small and none of the studies

provided estimates of intraindividual variation that would form the

Correspondence: Donald J. Brambilla, New England Research Institutes, 9 Galen Street, Watertown, Massachusetts, MA 02472, USA. Tel: +1 617-923-7747; Fax: +1 617-926-8246; E-mail: [email protected]

854

D. J. Brambilla

et al.

© 2007 The AuthorsJournal compilation © 2007 Blackwell Publishing Ltd,

Clinical Endocrinology

,

67

, 853–862

quantitative basis for interpreting hormone measurements. There-

fore, a prospective study of variation in the levels of total, free and

bioavailable testosterone (T), dihydrotestosterone (DHT), SHBG, LH,

dehydroepiandrosterone (DHEA), dehydroepiandrosterone sulphate

(DHEAS), oestrone, oestradiol and cortisol in men was initiated

in Boston, Massachusetts, USA in May 2004. In this paper, we present

estimates of day-to-day and 3-month variation in these hormones.

Methods

Subjects

Subjects for the study were selected from among the 2301 male

respondents to the Boston Area Community Health (BACH)

Survey.

18

Subjects for the BACH survey were randomly selected from

among the residents of Boston, MA, USA who were aged 30–79 years,

using a weighted sampling scheme to recruit approximately equal

numbers of Hispanic Americans, non-Hispanic African Americans

and non-Hispanic Caucasians, and approximately equal numbers by

decade of age.

For the present study, BACH Survey respondents were stratified

into three categories of race/ethnicity and decade of age (30–39,

40–49, 50–59, 60–69, 70+ years) to produce 15 strata. Subjects were

randomly selected from the male respondents in each stratum with the

goal of obtaining approximately the same number of subjects in every

stratum. A potential subject who refused or was found to be ineligible

was replaced with another randomly selected from the same stratum.

Men were excluded if they (1) had a history of hypogonadism of

known cause, such as treatment for prostate cancer, Klinefelter

syndrome, Kallmann syndrome and orchidectomy; (2) were using

any medications that alter hormone levels, either as the intended

effect or as a side effect; (3) had cirrhosis, liver cancer, other severe

liver disease, or kidney disease requiring dialysis; or (4) had a problem

with blood draws, such as haemophilia, or a compromised immune

system caused by HIV/AIDS, chemotherapy, radiation or other

conditions. Excluded medications included anabolic steroids,

androstenedione, bicalutamide (Casodex®, AstraZeneca, Wilmington,

DE), cimetidine, DHEA, diethylstilbestrol, other oestrogens, dutas-

teride (Avodart®, GlaxoSmithKline, Research Triangle Park, NC),

finasteride (Proscar®, Merck, Whitehouse Station, NJ), glucocorticoids

(prednisone, cortisone, hydrocortisone and decadron), ketocona-

zole, megestrol acetate (Megace®, Bristol-Myers Squibb, Princeton,

NJ), opiates (morphine, codeine, oxycodone, hydrocodone, etc.),

spironolactone, testosterone or any androgen, flutamide (Eulexin®,

Schering-Plough, Kenilworth, NJ) and other medications for pros-

tate cancer. Eligibility was determined from responses to the BACH

survey and responses to questions at screening for this study.

Subjects were enrolled after written informed consent was

obtained. The consent form, protocol, telephone scripts and contact

documents were approved by the Institutional Review Board of the

New England Research Institutes.

Sampling methods

Paired blood samples were obtained 1–3 days apart (median 2 days)

at study entry and again 3 months and 6 months later, producing a

maximum of six blood samples per subject. With this nested design,

the study provided estimates of both day-to-day and 3-month

variation in hormone levels. Most study visits took place in the

subject’s home but, if the subject so requested, they took place

elsewhere, usually at his place of employment or at study headquarters

at the New England Research Institutes.

At each visit, two blood samples were drawn 20 min apart to

reduce the effects of pulsatile secretion on hormone levels. Blood was

drawn within 4 h of the subject’s awakening to control for diurnal

variation.

19–21

Sampling was postponed to another day if the time

of awakening on a given day departed substantially from a subject’s

normal pattern. The two samples were placed in an ice-filled cooler

for transport back to study headquarters, where they were centrifuged,

equal aliquots were pooled using a calibrated automatic pipette, and

the pooled samples were stored in scintillation vials at –70

°

C.

Samples were transferred to the endocrine laboratory of the Depart-

ment of Physiology at the University of Massachusetts (UMASS)

Medical School, Worcester, MA, where the assays were performed.

At the first, third and fifth study visits, a brief questionnaire was

administered to identify changes in health or behaviour that might

affect variation in hormone levels. Subjects who had started taking

medications or had developed conditions that would have made

them ineligible at the start were excluded from further participation.

Hormone assays

The methods that were used for the various assays are listed in

Table 1. Free T and bioavailable T were calculated from the Sodergard

equation using both a constant albumin concentration of 4·3 g/dl

and measured albumin concentration.

22,23

All samples obtained from

a subject were assayed in the same run for each hormone. Differences

among measurements within subjects are thus free of interassay

variation. Interassay variation was estimated as described below.

Statistical methods

Descriptive statistics for interindividual differences in hormone

levels were based on hormone levels at the first study visit.

Intraindividual variation was characterized using intraindividual

standard deviations. Prior to analysis, hormone measurements were

transformed to base 10 logarithms to eliminate a positive correlation

between the standard deviation and mean hormone level that

rendered application of estimates of intraindividual variation to

clinical data extremely difficult. The standard deviations of the

transformed data are applicable to a broad range of hormone levels.

Interpretation of the estimates of intraindividual variation for

untransformed hormone measurements is provided.

Intrasubject standard deviations were calculated under four

sampling schemes:

1

Samples collected 1–3 days apart and assayed in the same run:

2

Samples collected 1–3 days apart, with each one assayed in a

separate run:

3

Samples collected 3 months apart and assayed in the same run:

σ σ σ1

2 2 = +D A

σ σ σ σ22 2 2 = + +D E A

σ σ σ σ3

2 2 2 = + +D L A

Intraindividual variation of hormones in men

855



,

67

, 853–862

4

Samples collected 3 months apart, with each one assayed in a

separate run:

In these equations, is the variance component attributable to

day-to-day variation, is the component attributable to 3-month

variation, is the intra-assay, or within-run, component of

variance and is the interassay, or between-run, component of

variance. The so-called batch testing in Schemes 1 and 3 is often used

in clinical trials because interassay variation is excluded, which

improves statistical power to detect changes in hormone levels.

Schemes 2 and 4 are more relevant to clinical testing.

The day-to-day and 3-month intraindividual components of

variance were estimated using a nested linear model for each

hormone with subject and month within subject (baseline,

3 months, 6 months) as the predictors.

24

Month within subject was

treated as a random effect. The error mean square provided an

estimate of the sum of the day-to-day and intra-assay com-

ponents, . The 3-month component, , was estimated using

the equation for the expected mean square for month within subject.

Intra- and interassay variation, and , were estimated from

results from the assay controls in the nine runs per hormone in which

samples were assayed in this study, using a linear model with run

treated as a random effect and control (e.g. high, medium and low)

treated as a fixed effect.

25

Each assay of cortisol, DHEAS, DHT, free T,

LH and T included three controls (high, medium, low) in duplicate,

while each assay of DHEA, oestrone, oestradiol and SHBG included

two controls (high and low) in triplicate. The intra-assay component

was estimated from the error mean square, while the interassay

component was estimated from the expected mean square for the

predictor run. Components of assay variation for calculated free T and

calculated bioavailable T, using an albumin concentration of

4·3 g/dl, were estimated by Monte Carlo simulation using the

components of assay variation of T and SHBG. Similar calculations

using measured albumin concentration were not performed

because data from the albumin controls were not obtained.

Estimates of the intraindividual standard deviations of calculated

free and bioavailable T under Schemes 2 and 4, with albumin fixed

at 4·3 g/dl, were obtained by Monte Carlo simulation. Random

variates, sampled from two normal distributions, with means of zero

and variances equal to the interassay components of variance for T

and SHBG, respectively, were added to the observed T and SHBG

values. Free T and bioavailable T were then calculated from the

Sodergard

22,23

equations as usual and the variances of log-transformed

values were obtained from the nested linear model described above.

The intraindividual standard deviations were estimated from the

medians of the frequency distributions of the variances under

Schemes 2 and 4. Estimates of the intraindividual standard deviations

σ σ σ σ σ4

2 2 2 2 = + + +D L E A

σD

2

σL

2

Table 1. Methods used to measure hormone levels

Hormone Method Kit or reference Intra-assay CV Interassay CV

T RIA Coat-A-Count,

Diagnostic Product

Corporation,

Los Angeles, CA

4·3 9·8

Free T Calculated 22,23

Bioavailable T Calculated 17,18

DHT RIA In-house assay 2·1 3·1

SHBG CLIA Immulite, Diagnostic

Product Corporation,

Los Angeles, CA

3·1 4·1

LH CLIA Immulite, Diagnostic


Los Angeles, CA

4·2 5·5

DHEA RIA Diagnostic Systems

Laboratories, Webster, TX

2·6 5·6

DHEAS RIA Coat-A-Count, Diagnostic


Los Angeles, CA

2·5 5·2

Oestrone (E1) RIA Diagnostic Systems


1·0 4·2

Oestradiol (E2) RIA Diagnostic Systems


2·3 7·4

Cortisol RIA Coat-A-Count, Diagnostic


Los Angeles, CA

3·4 6·4

Albumin Colourimetric determination

with Bromcresol Purple

Beckman Coulter, Brea, CA N/A

T, testosterone; DHT, dihydrotestosterone; DHEA, dehydroepiandrosterone; DHEAS, dehydroepiandrosterone sulphate; RIA, radioimmunoassay; CLIA, chemiluminescent immunoassay; N/A, not available.

σA

2

σE

2

σ σD A

2 2 + σL

2

σA

2 σE

2

856

D. J. Brambilla

et al.



,

67

, 853–862

of free T and bioavailable T under Schemes 2 and 4, using measured

albumin concentration, were not obtained in the absence of data

from the albumin controls.

Approximate 95% confidence limits for the standard deviation of

each hormone under each sampling scheme were derived from the

frequency distribution of 10 000 bootstrap

26

estimates of the standard

deviation. For Schemes 2 and 4, which included interassay variation,

a separate random variate, sampled from a normal distribution with

mean zero and variance equal to the estimated interassay component

of variance for that hormone, was added to each observed hormone

measurement before bootstrapping took place.

To provide an interpretation of the standard deviations with some

clinical relevance, the percentage difference between hormone levels

in two samples from the same subject that would be exceeded 50%

of the time was calculated under sampling Schemes 2 and 4. If hor-

mone levels are normally distributed around an individual’s steady

state mean, then the percentage difference between two hormone

measurements from a subject, calculated as 100

×

(

x

2

−

x

1

)/

x

1

, where

x

2

is the larger and

x

1

the smaller measurement, should be

50% of the time, where

s

is the intraindividual

standard deviation and

Z

0·75

is the standard normal deviate for a

cumulative probability of 0·75. The difference that would be

exceeded 25% of the time was calculated similarly using

Z

0·875

.

The extent to which repeated sampling improves the precision of

estimated mean hormone level for an individual is another important

aspect of intraindividual variation. Therefore, 95% confidence limits

for the steady state mean were calculated for one measurement, the

average of two and the average of three measurements. The standard

deviations under sampling Scheme 4 were used for the calculations.

If log

10

-transformed measurements are normally distributed, then

the 95% confidence limits around the mean, M, of

n

measurements,

after transforming the confidence limits back to the raw scale, are

and . Each confidence limit in the raw scale

can thus be expressed as a percentage or proportion of the average

measurement. Barring extreme fluctuations in hormone level, the

bias induced by averaging

n

measurements, rather than averaging the

log-transformed values and transforming the mean of the logs back

to the raw scale, is small enough to be ignored for present purposes.

Systematic differences in the intraindividual standard deviation

among categories of age or race/ethnicity would compromise the

generalizability of the results of this study. Statistical tests for such

differences were performed by calculating intraindividual standard

deviations under sampling Schemes 1 and 3 for each hormone in

each subject who completed all six visits and treating the standard

deviations as outcome variables in an analysis of variance with age

category and race/ethnicity as predictors.

In order to determine if intraindividual variation measured over

6 months might be contaminated with systematic changes, such as

those associated with ageing, log-transformed hormone level was

modelled as a linear function of time on study using a mixed linear

model with subject treated as a random effect.

25

Results

Letters inviting participation were sent to 230 men: 22 refused to

participate; 43 were ineligible or too ill for screening; 31 could not

be contacted; and 134 agreed to participate. Eight men completed

only the first two visits and three men completed only the first four

visits. Of the 11 men with incomplete data, 6 men were dropped

from the study after they began taking medications or developed

medical conditions that altered hormone levels, 4 men withdrew

voluntarily or were lost to follow-up and 1 man died. Two subjects,

who were determined to be ineligible for the study after completing

all six visits, were excluded from the analysis. All available hormone

measurements from the remaining 132 subjects were used to obtain

the standard deviations reported here.

The cohort included 43 Caucasians, 46 African Americans and 45

Hispanic Americans. There were 20 respondents in the first age

category (30–39 years) and 28 or 29 respondents in each of the other

four age categories. Other baseline characteristics of the participants

are provided in Table 2. The hormone levels in the participants in

the study span broad ranges of values (Table 3). Some of the extreme

values in the table, such as the minimum for total T and the maxima

for LH and SHBG, may indicate various disease states. As health

status and medication use were obtained by self-report, the possibility

that some subjects were taking unreported medications or had

unreported conditions affecting hormone levels cannot be ruled out.

However, the majority of values are within normal ranges, so under-

reporting of diseases or medications was uncommon.

Median time from awakening to first blood draw was 1·67 h (5th

and 95th percentiles: 0·42 h and 3·12 h). Most (92·5%) of the 760

blood draws took place between 06·00 h and noon but 37 samples

were drawn between 16·30 h and 18·00 h, and the remaining 20

samples were drawn between 12·00 h and 16·25 h. Draw times were

fairly tightly clustered within subjects, although they varied considerably

among subjects because of work schedules, among other factors.

Tests for systematic changes in hormone levels during the study

indicated that mean 6-month change was < 4% for all hormones and

< 2% for half of the hormones. Total T had the smallest mean change

at 0·8%. All of these changes were quite small compared to the

intraindividual standard deviations described below. For more than

half of the hormones, the direction of the change indicated by the

> × −⋅ ( )100 10 10 75 2sz

M /10196⋅ s n/ M × ⋅10196s n/

Table 2. Baseline characteristics of the entire cohort

Mean ± SD

Height 174·7 ± 7·4 cm

Weight 85·5 ± 17·8 kg

BMI 28·0 ± 5·4 mg/m2

Waist to hip ratio 0·94 ± 0·07

Smokers 30·0%Recreational drug use 6·0%

Depressive symptoms 11·9%

Alcohol use (average drinks per day):

None 36·4%

> 0, < 1 35·3%

≥ 1, < 3 21·6%

≥ 3 6·7%

BMI, body mass index.

Intraindividual variation of hormones in men

857



,

67

, 853–862

fitted parameter disagreed with the age-related changes reported

previously.

2,6–8

Thus, the differences over time were more likely noise

than systematic variation.

Using a critical

P

-value of 0·05, only 7 of 56 tests for variation of

the short-term and long-term intraindividual standard deviations

with age category and race/ethnicity produced statistically significant

results. All but two of the other 49 tests had

P

> 0·20. Linear models

that included both predictors accounted for no more than 10% of

the variation (

r

2

≤

0·10) in the standard deviation for all hormones

considered. The short-term and long-term standard deviations for

oestradiol varied with race/ethnicity (short-term:

P

= 0·0178; long-

term:

P

= 0·0139). Non-Hispanic African Americans had the highest

average standard deviations, while Hispanic Americans had the

lowest. However, the differences between mean standard deviations

were small enough that a single estimate was reasonably representative

of all three groups. The short-term intraindividual standard deviations

of cortisol (

P =

0·0366), DHEA (

P =

0·0380) and LH (

P =

0·0097)

varied with age, as did the long-term standard deviations of DHT

(

P =

0·0381) and LH (

P =

0·0320). For cortisol, DHEA and DHT,

plots of intraindividual standard deviations against age failed to

demonstrate any coherent pattern that could be interpreted as

systematic changes in the level of variation with age. In all three cases,

the highest mean standard deviation was found in the middle age

category (50–59 years) and in each case, the elevated mean was

attributable to a small number of subjects with elevated variation,

rather than an overall difference between age groups. A single estimate

of the short-term and long-term standard deviation for each hormone

was deemed reasonably representative of the group as a whole.

On the other hand, the mean short-term and long-term

standard deviations of LH were higher in the first three age

groups (30–59 years) than in the last two age groups (60–79 years).

There was no clear evidence of an age-related trend within either the

first three or the last two age groups. Therefore, standard deviations

for LH were calculated separately for men

≤

59 years and those

≥

60 years.

Estimates of , , and are provided in Table 4

and intraindividual standard deviations based on these estimates are

provided in Table 5 with bootstrap 95% confidence limits. Units are

not specified in these tables because the variance and standard

deviation of log-transformed values do not depend on the units in

which a concentration is expressed. The 3-month standard deviations

were only 12–60% (median 26%) larger than the day-to-day standard

deviations, indicating that intraindividual variation measured over

a relatively short interval of, for instance, a few days, would capture

more than half the total variation that would be seen over a few

months. This follows from the relatively small size of the 3-month

component, , in Table 4.

Setting aside albumin and SHBG, cortisol, DHEA and LH had the

largest intraindividual standard deviations in the study, whereas

oestradiol, DHT, total T and oestrone had the smallest. The

differences in intraindividual variation between these two groups

of hormones were caused mainly by differences in biological variation,

not assay variation, as is clear from the components of variance in

Table 4. The intraindividual standard deviations for free T and

bioavailable T by the Sodergard equation were expected to be larger

than the standard deviations for total T because the variation for the

two fractions includes variation attributable to both total T and

SHBG. The differences, however, are rather small, reflecting the small

standard deviations for SHBG.

The percentage difference between two hormone measurements

on the same subject that would be exceeded 25% or 50% of the time,

when the measurements were made a few days or a few months apart,

Table 3. Descriptive statistics for hormone levels at the first study visit

Minutes

Percentile

Hormone 5th 25th 50th 75th 95th Maximum

Total T (nmol/l) 3·5 6·7 11·4 15·2 18·9 26·9 38·8

Free T (pmol/l)* 48·5 152·6 228·8 305·1 377·9 523·5 755·8

Free T (pmol/l)† 48·5 159·5 228·8 301·6 391·8 530·5 828·6

Bioavailable T (nmol/l)* 1·1 3·6 5·4 7·1 8·8 12·2 17·8

Bioavailable T (nmol/l)† 1·1 3·7 5·4 7 9·2 12·4 19·4

Albumin (g/dl) 3 3·4 3·9 4·1 4·4 4·8 5·2

Cortisol (nmol/l) 165·5 202 292·4 358·7 458 579·4 847

DHEA (nmol/l) 4·3 7 11·6 16·7 23·2 44 64·5

DHEAS (umol/l) 0·35 0·89 1·76 3·08 5·16 11·23 15·61

DHT (nmol/l) 0·34 0·55 0·76 0·96 1·24 1·65 2·03

Oestrone (pmol/l) 37 77·7 103·6 129·5 155·3 210·8 310·7

Oestradiol (pmol/l) 37 62·9 96·2 122 151·6 196 273·7

LH (mIU/ml) (76 subjects, 30–59 years) 1·26 2·52 3·77 4·96 6·53 11·1 17·0

LH (mIU/ml) (56 subjects, 60–79 years) 1·04 1·81 4·26 6·13 8·23 11·8 24·9

SHBG (nmol/l) 7·59 16·8 25·8 36·9 51 79·6 118

*From the Sodergard equations assuming albumin at 4·3 g/dl.†From the Sodergard equations using measured albumin.T, testosterone; DHT, dihydrotestosterone; DHEA, dehydroepiandrosterone; DHEAS, dehydroepiandrosterone sulphate.

σA

2 σE

2 σ σD A

2 2 + σL

2

σL

2

858

D. J. Brambilla

et al.



,

67

, 853–862

Table 5.

Estimates of the interindividual standard deviaton at Visit 1 and the intraindividual standard deviation of log

10

-transformed hormone concentration for four study designs, using data from all visits, with 95% bootstrap confidence limits in parentheses

Hormone

Total T 1·767 × 10–3 2·507 × 10–3 4·313 × 10–3 1·976 × 10–3

Free T* 3·730 × 10–3 2·630 × 10–3 5·433 × 10–3 1·774 × 10–3

Free T† – – 5·600 × 10–3 1·894 × 10–3

Bioavailable T* 3·730 × 10–3 2·630 × 10–3 5·433 × 10–3 1·774 × 10–3

Bioavailable T† – – 5·600 × 10–3 1·894 × 10–3

Albumin – – 5·777 × 10–4 1·361 × 10–4

Cortisol 1·122 × 10–3 1·069 × 10–3 8·578 × 10–3 3·451 × 10–3

DHEA 3·438 × 10–4 3·249 × 10–4 8·766 × 10–3 3·568 × 10–3

DHEAS 7·226 × 10–4 2·996 × 10–4 6·253 × 10–3 1·564 × 10–3

DHT 1·238 × 10–3 2·842 × 10–4 3·711 × 10–3 1·917 × 10–3

Estrone 3·723 × 10–3 8·911 × 10–4 3·605 × 10–3 1·543 × 10–3

Estradiol 1·431 × 10–3 3·342 × 10–4 4·715 × 10–3 1·588 × 10–3

LH (76 subjects, 30–59 years) 8·833 × 10–4 7·488 × 10–5 1·212 × 10–2 5·447 × 10–4

LH (56 subjects, 60–79 years) 8·833 × 10–4 7·488 × 10–5 6·105 × 10–3 2·095 × 10–3

SHBG 5·117 × 10–4 3·101 × 10–4 1·306 × 10–3 2·049 × 10–3

*From the Sodergard equations assuming albumin at 4·3 g/dl.

†From the Sodergard equations using measured albumin.

: Intra-assay variance

: Interassay variance

: Day-to-day variance + intra-assay variance

: Long-term or 3-month variance

T, testosterone; DHT, dihydrotestosterone; DHEA, dehydroepiandrosterone; DHEAS, dehydroepiandrosterone sulphate.

σA2 σE

2 σ σD A2 2 + σL

2

σA2

σE2

σ σD A2 2 +

σL2

Table 4. Estimates of the variance components for log-transformed hormone measurements

Interindividual

standard deviation

Sampling design

1 2 3 4

Total T 0·182 0·066 (0·057, 0·076) 0·083 (0·076, 0·093) 0·079 (0·070, 0·090) 0·094 (0·086, 0·104)

Free T* 0·170 0·074 (0·065, 0·084) 0·094 (0·084, 0·101) 0·085 (0·076, 0·096) 0·104 (0·093, 0·111)

Free T† 0·171 0·075(0·066, 0·085) – 0·087 (0·078, 0·098) –

Bioavailable T* 0·170 0·074 (0·065, 0·084) 0·094 (0·079, 0·097) 0·085 (0·076, 0·096) 0·104 (0·092, 0·111)

Bioavailable T† 0·171 0·075 (0·066, 0·085) – 0·087 (0·078, 0·098) –

Albumin 0·048 0·024 (0·021, 0·027) – 0·027 (0·024, 0·030) –

Cortisol 0·141 0·093 (0·082, 0·103) 0·098 (0·089, 0·110) 0·110 (0·098, 0·124) 0·114 (0·101, 0·127)

DHEA 0·236 0·094 (0·083, 0·101) 0·095 (0·084, 0·101) 0·111 (0·101, 0·120) 0·113 (0·102, 0·121)

DHEAS 0·335 0·079 (0·064, 0·095) 0·081 (0·066, 0·096) 0·088 (0·077, 0·101) 0·090 (0·079, 0·102)

DHT 0·145 0·061 (0·054, 0·069) 0·063 (0·056, 0·071) 0·075 (0·065, 0·086) 0·077 (0·067, 0·090)

Oestrone 0·157 0·060 (0·055, 0·066) 0·067 (0·066, 0·096) 0·072 (0·065, 0·080) 0·078 (0·079, 0·102)

Oestradiol 0·138 0·069 (0·063, 0·075) 0·071 (0·066, 0·078) 0·079 (0·072, 0·088) 0·082 (0·074, 0·090)

LH (subjects 30–59 years) 0·200 0·110 (0·095, 0·125) 0·111 (0·096, 0·126) 0·112 (0·100, 0·126) 0·113 (0·100, 0·126)

LH (subjects 60–79 years) 0·236 0·078 (0·068, 0·089) 0·079 (0·069, 0·090) 0·090 (0·080, 0·101) 0·091 (0·081, 0·102)

SHBG 0·211 0·036 (0·028, 0·045) 0·040 (0·034, 0·047) 0·058 (0·051, 0·065) 0·061 (0·054, 0·067)

*From the Sodergard equations assuming albumin at 4·3 g/dl.†From the Sodergard equations using measured albumin.Design 1: a short-term study with batch testing.Design 2: a short-term study with samples tested when collected.Design 3: a six-month study with batch testing.Design 4: a six-month study with samples tested when collected.

Intraindividual variation of hormones in men 859

© 2007 The AuthorsJournal compilation © 2007 Blackwell Publishing Ltd, Clinical Endocrinology, 67, 853–862

were calculated to aid in interpreting the standard deviations. For

DHT, oestrone and oestradiol, the difference between two hormone

measurements should exceed 15–17% or 18–20% half the time,

when the measurements are made a few days or a few months apart,

respectively. At the larger standard deviations that characterize free

T, bioavailable T, cortisol and DHEA, differences should exceed

22–24% or 25–28% half the time, for measurements made a few

days or a few months apart, respectively. Percentage differences that

would be exceeded 25% of the time ranged from 27% to 31%, for

DHT, oestrone and oestradiol measured a few days apart, to 46–54%

for free T, bioavailable T, cortisol and DHEA measured a few

months apart.

To clarify these calculations, consider two measurements of free T

made a few months apart and suppose that one of the two values was

175 pmol/l. The probability that the other value was < 140 pmol/l

or > 220 pmol/l is 0·50 and the probability that it was < 120 pmol/l

or > 238 pmol/l is 0·25. If one of the two values was 300 pmol/l, then

the probability that the other measurement was < 240 pmol/l or

> 380 pmol/l is 0·50 and the probability that it was < 205 pmol/l or

> 440 pmol/l is 0·25.

Another way to assess the results is to consider the impact of

intraindividual variation on a diagnosis of abnormally high or low

hormone levels. Consider, for example, total T and suppose that

values < 8·67 nmol/l (250 ng/dl) are considered possibly indicative

of hypogonadism. Of 121 subjects who completed all six visits, 15

had total T < 8·67 nmol/l at the first visit but only 6 of these 15 had

average values < 8·67 nmol/l over all six visits. This outcome probably

reflects the regression to the mean that can occur when subjects are

selected on the basis of values that are on one side of a specified

threshold. Of the 15 subjects, 3 had average values > 10·40 nmol/l

(300 ng/dl) which many clinicians would consider to be within the

normal range for young men. Reducing the threshold to 6·93 nmol/l

(200 ng/dl) does not eliminate the problem. Of 7 men with total

T < 6·93 nmol/l at Visit 1, 3 had average values over six visits that

were > 6·93 nmol/l. One average was between 6·93 nmol/l and

8·67 nmol/l, one was between 8·67 nmol/l and 10·40 nmol/l and the

third was > 10·40 nmol/l. These counts do not include the two men

who were excluded after it was determined that they were not eligible

for the study. In both cases, T on Visit 1and average T were

< 6·93 nmol/l. On the other hand, 5 of 10 subjects with average values

< 8·67 nmol/l over the first two visits had average values > 8·67 nmol/l

over all six visits but none had average values > 10·40 nmol/l.

Thus, some improvement in diagnostic accuracy can be obtained by

averaging values from two blood samples.

The 95% confidence limits for the steady state mean, based on one

measurement, the average of two and the average of three, are

provided in Table 6. The values in the table are the multipliers,

, that were defined earlier for the measured value or average

of two or three measured values. For example, if total T is measured

once, then the confidence limits are 65% and 153% of the measured

value. The values in the table demonstrate the gain in precision that

results from collecting more than one sample from a subject. The

confidence interval based on the average of two measurements of

total T is approximately 30% narrower than the width of the interval

around a single measurement, while the interval around the average

of three measurements is 43% narrower than the width around

a single measurement.

As an example of the gain in precision with repeated testing,

suppose that total T from a subject, based on one measurement or

the average of two or three measurements, is at the 5th percentile in

Table 3 (6·7 nmol/l). Using the multipliers in Table 6, the 95%

confidence interval for the steady state mean is 4·39–10·23 nmol/l,

based on one measurement, 4·97–9·04 nmol/l, based on the average

of two measurements, and 5·25–8·55 nmol/l, based on the average

of three measurements.

The extent to which batch testing reduces intraindividual variation

by eliminating interassay variation can be determined by comparing

the standard deviations for Schemes 1 and 2 or those for Schemes 3

and 4 in Table 5. For LH, DHEA, DHEAS, cortisol and DHT, batch

testing reduced the intraindividual standard deviation by < 5%,

indicating that interassay variation makes only a small contribution

to total variation when samples from a subject are assayed separately.

For total T and the fractions, batch testing produced reductions of

16–21% in the intraindividual standard deviations. Thus, batch

testing would lead to a fairly substantial increase in statistical power

or reduction in sample size in studies in which the end-point is

change in total T or a T fraction over time.

Discussion

This study provided estimates of day-to-day and 3-month intra-

individual variation in total T, free T and bioavailable T, seven other

adrenal and reproductive hormones and SHBG in a large cohort of

generally healthy, community-dwelling, middle-aged to older men

of diverse ethnicity. We expect the results to be broadly generalizable

because the subjects were randomly sampled from the community.

The relatively narrow bootstrap confidence limits in Table 5

indicate that the differences between the standard deviations for LH,

DHEA and cortisol on the one hand and total T, DHT, oestrone and

oestradiol on the other are not the result of chance but reflect real

Table 6. 95% confidence limits for steady state mean hormone level, expressed as multipliers of the result of 1 measurement or the average of 2 or 3 measurements

Hormone

1

measurement

Mean of 2

measurements

Mean of 3

measurements

Total T 0·65, 1·53 0·74, 1·35 0·78, 1·28

Free T* 0·63, 1·60 0·72, 1·39 0·76, 1·31

Bioavailable T* 0·63, 1·60 0·72, 1·39 0·76, 1·31

Cortisol 0·60, 1·68 0·69, 1·44 0·74, 1·35

DHEA 0·60, 1·66 0·70, 1·43 0·75, 1·34

DHEAS 0·67, 1·50 0·75, 1·33 0·79, 1·26

DHT 0·71, 1·41 0·78, 1·28 0·82, 1·22

Oestrone 0·70, 1·42 0·78, 1·28 0·82, 1·22

Oestradiol 0·69, 1·44 0·77, 1·30 0·81, 1·24

LH 0·63, 1·60 0·72, 1·39 0·76, 1·31

SHBG 0·76, 1·31 0·82, 1·21 0·85, 1·17

*From the Sodergard equations assuming albumin at 4·3 g/dl.T, testosterone; DHT, dihydrotestosterone; DHEA, dehydroepiandrosterone; DHEAS, dehydroepiandrosterone sulphate.

10 196± ⋅ s n/

860 D. J. Brambilla et al.


differences in intraindividual variation. As noted earlier, differences

in assay variation do not explain the differences in total variation

between these two sets of hormones. Differences in ultradian variation

brought on by pulsatile release could account for some of the

differences in total variation. Pooling equal aliquots of two samples,

as was done in this study, reduces but does not eliminate the effect

of pulsatile release. Thus, all other sources of variation being equal,

the hormone with a greater degree of ultradian variation will display

greater intraindividual variation even with the sampling strategy

used in this study. For example, LH displays greater ultradian

variation and had larger intraindividual standard deviations than

total T.27,28

The standard deviations presented here are based on a specific

protocol for blood draws and a specific set of assays. It is important

to consider the effects of departures from the procedures used in this

study on estimates of intraindividual variation in hormone levels.

Sampling times were fairly tightly clustered within men to reduce

the effects of diurnal variation on intraindividual variation. The

amplitude of diurnal variation for total T, free T and bioavailable T,

SHBG, LH, FSH and DHEAS may actually exceed the intraindividual

standard deviations obtained in this study.19,20,29–36 Relaxing the

restrictions on intrasubject variation of time of collection would

likely increase the intraindividual standard deviations considerably,

whereas further restricting the time of collection would probably

reduce the standard deviations. The amplitude of diurnal fluctuation,

at least for total T, may be reduced in elderly men19,30 although

possibly not in middle-aged men.36 The extent to which time of day

must be controlled in sample collection may therefore depend on a

subject’s age.

Measurements of hormone levels are often based on a single blood

draw, rather than on a pool of equal aliquots from two draws.

Hormone levels based on a single draw will be more variable than

levels based on a pool of equal aliquots of two draws but the difference

in variation is likely to be small. Pooling equal aliquots of two

samples drawn 20 min apart will reduce the day-to-day component

of variation, , but it will not alter the long-term component, ,

or the assay components, or . The extent to which total

intraindividual variation, , is reduced will depend on the relative

contribution of the day-to-day component to the total. It will also

depend on the relative contribution of variation over the short interval

between blood draws to the day-to-day component because

variation over the interval between draws is the only part of day-to-day

variation that will be affected by the pooling. Given the other sources

of variation that contribute to the total, the effect of pooling on the

intraindividual standard deviation is likely very small, especially if

one focuses on intraindividual variation over 6 months without

batch testing (Scheme 4). It is difficult to be more specific than this

without information on the effect of pooling two aliquots on intra-

individual variation. This issue is currently under investigation.

Although differences in assay variation among methods will affect

intraindividual variation, the effect will be diluted when assay variation

is combined with biological variation ( or in the models

described earlier). Consider, for example, a study of total T in which

an assay is used for which both the intra- and interassay component

of variance is two-thirds the component in the assay used in this

study. Assuming no change in the biological components, reductions

of only 11% and 8·5%, respectively, for the day-to-day and 3-month

intraindividual standard deviations, can be calculated from the

variance components in Table 4.

New assays for steroid hormones are being used by commercial

laboratories and developed by investigators, such as approaches

based on liquid chromatography tandem mass spectrometry

(LC-MS/SM) and gas chromatography mass spectrometry (GC-MS).

The new and existing methods may not measure exactly the same

molecular species or isoforms. For example, the antibodies in a

method based on radioimmunoassay (RIA) may measure immuno-

reactive molecules that are not detected by LC-MS/MS or GC-MS.

Furthermore, protein hormones, such as LH, FSH and SHBG

circulate in various glycosylated isoforms that are differentially

recognized by the antibodies used in different RIAs. If the various

species or isoforms of a given hormone exhibit different levels

of biological variation, then switching to a different method may

alter both the assay and biological components of intraindividual

variation. Therefore, it may be necessary to repeat the measurements

made here using the newer assays in order to determine if the estimates

of biological variation are affected by the assay used.

With these caveats in mind, the standard deviations presented here

have a number of uses in clinical monitoring of hormone levels and

in trials of hormone replacement therapy and in other investigations

in which hormones are of interest. As shown in Table 6, confidence

limits for steady state mean hormone level for an individual can be

calculated from the results of one, two or several measurements. One

important application of the confidence limits arises when a threshold

of concern has been defined, such as a threshold for total T level

below which hypogonadism might be suspected or diagnosed. The

standard deviations provide a basis for determining how far a measured

T level or the average of a series of measurements of T must be from

the threshold to conclude that an individual’s steady state mean is

reliably above or below the threshold. They also provide a basis for

deciding whether the difference between two measurements is

large enough to represent a change in hormone level rather than a

fluctuation around the steady state mean, for developing eligibility

criteria for a clinical trial of hormone replacement therapy in hypo-

gonadal men and for doing sample size calculations for a trial in

which change in hormone level is the outcome of interest. The

calculations for these uses of the intraindividual standard deviations

are straightforward for a statistician; for the sake of brevity, we forego

detailed exposition of the methods.

The calculations summarized in Table 6 show that averaging the

results from repeated hormone measurements on the same individual

reduces the width of the confidence interval for the steady state

mean. However, the increment by which the confidence interval is

narrowed and precision is improved declines with each additional

sample. Reasonably large gains can be made by averaging the results

of two or three measurements but, in many cases, the precision

gained with further measurements is likely to be too small to be

meaningful.

While repeated sampling may improve the precision with which

mean hormone level is estimated, there are some circumstances

under which such repetition may not be necessary. For example,

repeated sampling may not be required if there is a consistent pattern

to the results, such as low total T coupled with abnormally high LH

σD

2 σL

2

σA

2 σE

2

σT2

σD

2σ σD L

2 2 +

Intraindividual variation of hormones in men 861


and perhaps physical symptoms of hypogonadism. The clinician

may find such a cluster of signs and symptoms to be sufficient for a

diagnosis without obtaining follow-up blood samples.

In addition to averaging the results of two or more samples from

the same subject, intraindividual variation can also be reduced by

averaging the results of repeated assays of the same sample. While

repeated assays of a sample will reduce assay variation, however,

they will not reduce biological variation. Averaging the results

from repeated samples reduces both assay and biological variation.

Therefore, repeated assays are less effective than repeated samples at

reducing total variation. Moreover, the assay components of variance

in Table 4 are generally smaller than the biological components,

further limiting the gain from repeated assays.

Many clinicians are aware of the problems created by intra-

individual variation in hormone levels when interpreting clinical

measurements of hormone levels. Researchers are all too aware of

the difficulties encountered in designing studies involving hormone

levels as eligibility criteria or end-points when information on

intraindividual variation is not available. The measurements of

intraindividual variation provided here should have broad application

clinically and in research in endocrinology.

Acknowledgements

This study was supported by grant number AG23027 from the

National Institute on Ageing of the National Institutes of Health, USA.

References

1 Vermeulen, A. & Deslypere, J.P. (1985) Testicular endocrine functionin the ageing male. Maturitas, 7, 273–279.

2 Gray, A., Feldman, H.A., McKinlay, J.B. & Longcope, C. (1991) Age,disease, and changing sex hormone levels in middle-aged men:results of the Massachusetts Male Aging Study. Journal of ClinicalEndocrinology and Metabolism, 73, 1016–1025.

3 Wu, A.H., Whittemore, A.S., Kolonel, L.N., John, E.M., Gallagher,R.P., West, D.W., Hankin, J., Teh, C.Z., Dreon, D.M. & Paffenbarger,R.S. (1995) Serum androgens and sex hormone-binding globulinsin relation to lifestyle factors in older African-American, white, andAsian men in the United States and Canada. Cancer EpidemiologyBiomarkers and Prevention, 4, 735–741.

4 Hsieh, C.C., Signorello, L.B., Lipworth, L., Lagiou, P., Mantzoros,C.S. & Trichopoulos, D. (1998) Predictors of sex hormone levelsamong the elderly: a study in Greece. Journal of Clinical Epidemiol-ogy, 51, 837–841.

5 Denti, L., Pasolini, G., Snfelici, L., Benedetti, R., Cecchetti, A., Ceda,G.P., Ablondi, F. & Valenti, G. (2000) Aging-related decline ofgonadal function in healthy men: correlation with body compositionand lipoproteins. Journal of the American Geriatrics Society, 48, 51–58.

6 Feldman, H.A., Longcope, C., Derby, C.A., Johannes, C.B., Araujo,A.B., Coviello, A.D., Bremner, W.J. & McKinlay, J.B. (2002) Agetrends in the level of serum testosterone and other hormones inmiddle-aged men: longitudinal results from the Massachusetts maleaging study. Journal of Clinical Endocrinology and Metabolism, 87,589–598.

7 Gapstur, S.M., Gann, P.H., Kopp, P., Colangelo, L., Longcope, C. &Liu, K. (2002) Serum androgen concentrations in young men: a

longitudinal analysis of associations with age, obesity, and race. TheCARDIA male hormone study. Cancer Epidemiology Biomarkers andPrevention, 11, 1041–1047.

8 Kaufman, J.M. & Vermeulen, A. (2005) The decline of androgenlevels in elderly men and its clinical and therapeutic implications.Endocrine Reviews, 26, 833–876.

9 Diver, M.J. (2006) Analytical and physiological factors affecting theinterpretation of serum testosterone concentration in men. Annalsof Clinical Biochemistry, 43, 3–12.

10 Fox, C.A., Ismail, A.A., Love, D.N., Kirkham, K.E. & Loraine, J.A.(1972) Studies on the relationship between plasma testosterone levelsand human sexual activity. Journal of Endocrinology, 52, 51–58.

11 Morley, J.E., Patrick, P. & Perry, H.M. 3rd. (2002) Evaluation ofassays available to measure free testosterone. Metabolism, 51, 554–559.

12 Nieschlag, E. & Ismail, A.A. (1970) Diurnal variations of plasmatestosterone in normal and pathological conditions as measured bythe technique of competitive protein binding. Journal of Endocrinology,46, 3–4.

13 Vermeulen, A. & Verdonck, G. (1992) Representativeness of a singlepoint plasma testosterone level for the long term hormonalmilieu in men. Journal of Clinical Endocrinology and Metabolism, 74,939–942.

14 Couwenbergs, C., Knussmann, R. & Christiansen, K. (1986) Com-parisons of the intra-and inter-individual variability in sex hormonelevels of men. Annals of Human Biology, 13, 63–72.

15 Ricos, C. & Arbos, M.A. (1990) Quality goals for hormone testing.Annals of Clinical Biochemistry, 27, 353–358.

16 Valero-Politi, J. & Fuentes-Arderiu, X. (1993) Within- and between-subject biological variations of follitropin, lutropin, testosterone, andsex-hormone-binding globulin in men. Clinical Chemistry, 39,1723–1725.

17 Ahokoski, O., Virtanen, A., Huupponen, R., Scheinin, H., Salminen,E., Kairisto, V. & Irjala, K. (1998) Biological day-to-day variation anddaytime changes of testosterone, follitropin, lutropin and oestradiol-17beta in healthy men. Clinical Chemistry and Laboratory Medicine,36, 485–491.

18 McKinlay, J.B. & Link, C.L. (2007) Measuring the urologic iceberg:design and implementation of the Boston Area Community Health(BACH) Survey. European Urology, 52, 389–396.

19 Bremner, W.J., Vitiello, M.V. & Prinz, R.N. (1983) Loss of circadianrhythm in blood testosterone levels with aging in normal men.Journal of Clinical Endocrinology and Metabolism, 56, 1278–1281.

20 Gray, A., Berlin, J.A., McKinlay, J.B. & Longcope, C. (1991) An exam-ination of research design effects on the association of testosteroneand male aging: results of a meta-analysis. Journal of Clinical Epi-demiology, 44, 671–684.

21 Brambilla, D.J., McKinlay, S.M., McKinlay, J.B., Weiss, S.R.,Johannes, C.B., Crawford, S.L. & Longcope, C. (1996) Does collect-ing repeated blood samples from each subject improve the precisionof estimated steroid hormone levels? Journal of Clinical Epidemiology,49, 345–350.

22 Sodergard, R., Backstrom, T., Shanbhag, V. & Carstensen, H. (1982)Calculation of free and bound fractions of testosterone and estradiol-17 beta to human plasma proteins at body temperature. Journal ofSteroid Biochemistry, 16, 801–810.

23 Vermeulen, A., Verdonck, L. & Kaufman, J. (1999) A critical evaluationof simple methods for the estimation of free testosterone in serum.Journal of Clinical Endocrinology and Metabolism, 84, 3666–3672.

24 Searle, S.R., Casella, G. & McCulloch, C.E. (1992) Variance Compo-nents. John Wiley & Sons, New York.

862 D. J. Brambilla et al.


25 Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004) Applied Longi-tudinal Analysis. John Wiley & Sons, New York.

26 Davison, A.C. & Hinkley, D.V. (1997) Bootstrap Methods and TheirApplication. Cambridge University Press, Cambridge, UK.

27 Pincus, S.M., Mulligan, T., Iranmanesh, A., Gheorghiu, S., Godschalk,M. & Veldhuis, J.D. (1996) Older males secrete luteinizing hormoneand testosterone more irregularly, and jointly more asynchronously,than younger males. Proceedings of the National Academy of Sciencesof the USA, 93, 14100–14105.

28 Mulligan, T., Iranmanesh, A., Gheorghiu, S., Godschalk, M. &Veldhuis, J.D. (1995) Amplified nocturnal luteinizing hormone(LH) secretory burst frequency with selective attenuation of pulsatile(but not basal) testosterone secretion in healthy aged men: possibleLeydig cell desensitization to endogenous LH signaling – a clinicalresearch center study. Journal of Clinical Endocrinology andMetabolism, 80, 3025–3031.

29 Nicolau, G.Y., Haus, E., Lakatua, D.J., Bogdan, C., Sackett-Lundeen,L., Popescu, M., Berg, H., Petrescu, E. & Robu, E. (1985) Circadianand circannual variations of FSH, LH, testosterone, dehydroepi-androsterone-sulfate (DHEA-S) and 17-hydroxy progesterone (17OH-Prog) in elderly men and women. Endocrinologie, 23, 223–246.

30 Tenover, J.S., Matsumoto, A.M., Clifton, D.K. & Bremner, W.J.(1988) Age-related alterations in the circadian rhythms of pulsatile

luteinizing hormone and testosterone secretion in healthy men.Journal of Gerontology, 43, M163–M169.

31 Plymate, S.R., Tenover, J.S. & Bremner, W.J. (1989) Circadian variationin testosterone, sex hormone-binding globulin, and calculated non-sex hormone-binding globulin bound testosterone in healthy youngand elderly men. Journal of Andrology, 10, 366–371.

32 Winters, S.J. (1991) Diurnal rhythm of testosterone and luteinizinghormone in hypogonadal men. Journal of Andrology, 12, 185–190.

33 Cooke, R.R., McIntosh, J.E. & McIntosh, R.P. (1993) Circadianvariation in serum free and non-SHBG-bound testosterone in normalmen: measurements, and simulation using a mass action model.Clinical Endocrinology, 39, 163–171.

34 Mermall, H., Sothern, R.B., Kanabrocki, E.L., Quadri, S.F., Bremner,F.W., Nemchausky, B.A. & Scheving, L.E. (1995) Temporal (cir-cadian) and functional relationship between prostate-specific antigenand testosterone in healthy men. Urology, 46, 45–53.

35 Gupta, S.K., Lindemulder, E.A. & Sathyan, G. (2000) Modeling ofcircadian testosterone in healthy men and hypogonadal men. Journalof Clinical Pharmacology, 40, 731–738.

36 Diver, M.J., Imtiaz, K.E., Ahmad, A.M., Vora, J.P. & Fraser, W.D.(2003) Diurnal rhythms of serum total, free and bioavailable testo-sterone and of SHBG in middle-aged men compared with those inyoung men. Clinical Endocrinology, 58, 710–717.

O R I G I N A L A R T I C L E

Effects of testosterone treatment on glucose metabolism andsymptoms in men with type 2 diabetes and the metabolicsyndrome: a systematic review and meta-analysis of randomizedcontrolled clinical trials

Mathis Grossmann*,†, Rudolf Hoermann*, Gary Wittert‡ and Bu B. Yeap§,¶

*Department of Medicine Austin Health, University of Melbourne, †Endocrine Unit, Austin Health, Heidelberg, Vic., ‡Discipline ofMedicine, Royal Adelaide Hospital, University of Adelaide, Adelaide, SA, §School of Medicine and Pharmacology, University of

Western Australia and ¶Department of Endocrinology and Diabetes, Fremantle and Fiona Stanley Hospitals, Perth, WA, Australia

Summary

Context The effects of testosterone treatment on glucose

metabolism and other outcomes in men with type 2 diabetes

(T2D) and/or the metabolic syndrome are controversial.

Objective To perform a systematic review and meta-analysis of

placebo-controlled double-blind randomized controlled clinical

trials (RCT) of testosterone treatment in men with T2D and/or

the metabolic syndrome.

Data sources A systematic search of RCTs was conducted

using Medline, Embase and the Cochrane Register of controlled

trials from inception to July 2014 followed by a manual review

of the literature.

Study selection Eligible studies were published placebo-con-

trolled double-blind RCTs published in English.

Data extraction Two reviewers independently selected studies,

determined study quality and extracted outcome and descriptive

data.

Data synthesis Of the 112 identified studies, seven RCTs

including 833 men were eligible for the meta-analysis. In studies

using a simple linear equation to calculate the homeostatic

model assessment of insulin resistance (HOMA1), testosterone

treatment modestly improved insulin resistance, compared to

placebo, pooled mean difference (MD) �1�58 [�2�25, �0�91],P < 0�001. The treatment effect was nonsignificant for RCTs

using a more stringent computer-based equation (HOMA2),

MD �0�19 [�0�86, 0�49], P = 0�58). Testosterone treatment did

not improve glycaemic (HbA1c) control, MD �0�15 [�0�39,0�10], P = 0�25, or constitutional symptoms, Aging Male

Symptom score, MD �2�49 [�5�81, 0�83], P = 0�14).

Conclusions This meta-analysis does not support the routine

use of testosterone treatment in men with T2D and/or the meta-

bolic syndrome without classical hypogonadism. Additional

studies are needed to determine whether hormonal interventions

are warranted in selected men with T2D and/or the metabolic

syndrome.

(Received 11 August 2014; returned for revision 17 September

2014; finally revised 9 October 2014; accepted 4 November 2014)

Introduction

A large body of epidemiological evidence shows that, in men,

low serum testosterone is associated with insulin resistance and

the associated conditions metabolic syndrome and type 2 diabe-

tes (T2D). In meta-analyses of case-control studies, total testos-

terone levels are consistently lower in men with T2D or with the

metabolic syndrome compared to controls, although the degree

of the reduction is modest, ranging from �1�61 nM (95%CI

�2�56 to �0�65) to �2�99 nM (95%CI �3�59 to �2�40).1–3 In

addition, low testosterone levels predict incident T2D or the

metabolic syndrome in some4 but not all5 longitudinal studies.

From a mechanistic perspective, this relationship between low

testosterone and dysglycaemia is complex, and at least in part,

mediated by changes in body composition, in particular visceral

fat mass.6 It is also bidirectional: On the one hand, androgen

deprivation increases fat mass and insulin resistance,7 and, on

the other hand, weight loss increases both insulin sensitivity and

testosterone levels8 Preclinical evidence reviewed elsewhere6,9

suggests that testosterone regulates stem cells and differentiated

adipocytes and myocytes to promote metabolically favourable

changes in body composition and glucose metabolism.

The hypothesis that testosterone treatment improves measures

of glucose metabolism has been tested in a number of interven-

tional studies, which collectively have yielded inconclusive

results. In this study, therefore we sought to conduct a systematic

review and meta-analysis of the effects of testosterone therapy

Correspondence: Mathis Grossmann, Department of Medicine AustinHealth, The University of Melbourne, 145 Studley Road, Heidelberg,Vic., 3084, Australia. Tel.: +613 9496 5000; Fax: +613 9496 3365;E-mail: [email protected]

© 2014 John Wiley & Sons Ltd 1

Clinical Endocrinology (2014) doi: 10.1111/cen.12664

on glucose metabolism in men with T2D or the metabolic syn-

drome. In contrast to previous meta-analyses in this area,2,3,10

we limited our analysis to placebo-controlled double-blind ran-

domized controlled clinical trials (RCT) and include two recent

such studies11,12 that have not been considered previously.

Materials and methods

In this study, we followed the reporting recommendations made

in the PRISMA (Preferred reporting items for systematic reviews

and meta-analyses) statement.13 The PRISMA statement was

designed to improve the quality of systematic reviews or meta-

analyses. The statement lists 27 items to include when conduct-

ing and reporting such a study. In this study, all 27 items were

included.

Eligibility criteria

Eligible studies were defined in a protocol as fully published

(English language) double-blind randomized controlled clinical

trials that assessed the effects of testosterone therapy in men

with diagnosed metabolic syndrome and/or T2D on measures of

glucose metabolism.

Search strategy

We conducted a comprehensive search of the literature using

the electronic databases Medline, Embase and the Cochrane reg-

ister of controlled trials from inception to July 2014. The search

strategy was developed in consultation with an experienced med-

ical research librarian using a broad range of relevant search

terms (full research strategies are available in the supplementary

information). In addition, reference lists of potentially eligible

articles and relevant reviews were searched by hand. Study selec-

tion was conducted by two independent reviewers (M.G. and

B.B.Y). Studies included by both reviewers were compared and

disagreement resolved by consensus and third party adjudica-

tion. Only placebo-controlled double-blind RCTs were eligible.

Data extraction

Two investigators (M.G and B.B.Y) independently extracted the

relevant data using a standardized form. Data extracted from

each eligible RCT included demographic information, diagnosis

of T2D and/or the metabolic syndrome, number of participants,

baseline and on treatment testosterone levels, type and duration

of treatment. Disagreements were resolved by consensus and

third party adjudication.

Quality assessment

Two reviewers (M.G and B.B.Y), working in duplicate, assessed

the methodological quality of each eligible RCT using the full

25-item CONSORT checklist of information to be included

when reporting a RCT.14 The CONSORT checklist was designed

to improve the quality of RCT reporting. Therefore, the quality

of a published RCT can be assessed by quantifying how many of

the 25 recommended criteria are reported. A high quality RCT

will report all 25 items, and the quality of a RCT correlates

inversely with the number of reported items.

Primary outcomes

The primary outcomes of interest were the mean differences

(MD) in insulin resistance (assessed by homeostatic model

assessment of insulin resistance, HOMA-IR) and in glycaemic

control (assessed by HbA1c) between treatment and control

groups. In addition, standardized mean differences (SMD) were

derived. For each eligible RCT, MD � Standard Deviation (SD)

from baseline to end of trial in each group, treated and controls

were retrieved for HOMA-IR and HbA1c. Where SD was not

given, it was estimated from SEM or from the 95% confidence

interval of the MD. For obtaining SMD, mean differences of

individual trials were standardized prior to meta-analysing. In

RCTs reporting an open-label extension phase, only the initial

double-blind placebo-controlled phase was considered.

Secondary outcomes

The secondary outcomes were effects of testosterone on symp-

toms, cardiovascular risk markers and adverse effects. We meta-

analysed constitutional symptoms reported by Aging Male Symp-

tom score. Due to between-trial heterogeneity and inconsistent

reporting, effects of testosterone on sexual symptoms, cardiovas-

cular risk markers and adverse effects could not be meta-analy-

sed, but were instead reported in a descriptive fashion.

Data synthesis and statistical analysis

Three investigators (M.G., B.B.Y and G.W.) independently veri-

fied and collated the extracted data to provide a descriptive syn-

thesis of key characteristics and a quantitative synthesis of effect

size estimates for each RCT.

The consistency or heterogeneity of the results among various

trials in a meta-analysis is an important statistical measure.

Hence, the variation in effect beyond chance was tested by using

both the Cochran’s Q chi-squared test and the I2 test, which

describes the percentage of the variability in effect estimates that

is due to heterogeneity rather than sampling error, therefore

providing a quantitative measure of nonrandom differences

observed across studies. An I2 > 30% indicates a moderate inter-

trial heterogeneity. Given the fact that considerable inconsistency

existed among the trials, we used a random effects model to

estimate the average true difference (MD) in HOMA-IR or

HBA1c in the treatment group, compared to placebo. The

choice of differing HOMA-IR modelling employed in the trials,

HOMA1 or HOMA2, was assessed in the meta-analysis by add-

ing this information as a moderator variable. SMD are based on

Hedges’ g with appropriate correction for a negative bias. The

Hedges g’ with appropriate correction for negative bias was used

to calculate the SMD, as recommended by the Cochrane Library,

which in contrast to the MD relates the size of the intervention

© 2014 John Wiley & Sons Ltd

Clinical Endocrinology (2014), 0, 1–8

2 M. Grossmann et al.

effect to the variability in each study. In a moderator analysis,

we examined the impact of a covariate (moderator variable) on

the effect. In addition, we used a weighted restricted maximum-

likelihood estimator for fitting the models, and the model was

implemented by the metafor package (version 1.9.4) in the R

statistical program (R for Mac version 3.1.1).15,16

Graphical data presentation included Forest plots and Funnel

plots. A Funnel plot is a scatterplot of treatment effect against a

measure of study size-related imprecision (such as the standard

error used here) on the vertical axis and serves as a visual aid

for detecting bias. In addition to this graphical method, a regres-

sion test (regtest) was employed to formally test for the presence

of asymmetry in the Funnel plot.

The study is an analysis of published data that does not

require specific approval by an ethics committee.

Results

Study identification and descriptive data synthesis

Figure 1 shows the flow chart summarizing the identification of

eligible RCTs. Seven RCTs were eligible for analysis, randomiz-

ing between 22 and 220 men for a total of 833 men (449 treated

with testosterone and 384 receiving placebo) (Table 1).11,12,17–21

Major inclusion criteria in all studies were T2D and/or the

metabolic syndrome, and low or low-normal serum testosterone

levels. Presence of symptoms suggestive of androgen deficiency

was an inclusion criterion in five studies.12,17–20 Major exclusion

criteria were contraindications to testosterone treatment (e.g.

prostate disease, uncontrolled sleep apnoea, increased haemato-

crit), previous testosterone or anabolic treatment within the

last 6–12 months, and uncontrolled T2D (variously defined

HbA1c > 9–10%). 531 men had T2D either with or without the

metabolic syndrome, and 302 men had the metabolic syndrome,

but no diagnosis of T2D. Mean age of participants ranged from

44 to 64 years, BMI from 24–35 kg/m2 and baseline total

testosterone levels from 6�7 to 10�1 nM. Durations of the dou-

ble-blind treatment phase ranged from 3 to 12 months, and six

RCTs used intramuscular and one RCT transdermal testosterone

(Table 1). The studies were conducted in Italy,17 India,19 United

Kingdom,12,18 Russia21 and one was a multinational RCT con-

ducted in eight European Countries.20 Median CONSORT

scores ranged from 16 to 24, where 0 denotes the lowest and 25

the highest quality.

Effects of testosterone therapy on insulin resistance

(HOMA-IR)

HOMA-IR estimates were derived by different methods in the tri-

als. While five studies17–21 were using the original HOMA1 equa-

tion (fasting insulin in mUl/L x fasting glucose in mM)/22�5), twotrials11,12 employed the computer-based HOMA2 model.22 Com-

bining all seven trials resulted in a large intertrial heterogeneity

(Cochrane Q-test <0�0001, I2 = 76%). The heterogeneity was

markedly reduced when accounting for the choice of the HOMA-

IR model in the analysis (QE test P = 0�21, I2 = 36�4%).

Figure 2a shows the MDs in HOMA-IR after testosterone

therapy between treatment and control groups. While testoster-

one treatment significantly improved insulin resistance, com-

pared to placebo, in the HOMA1-based studies (pooled MD

�1�58 [�2�25, �0�91], P < 0�001), the treatment effect was non-

significant for the HOMA2 trials (�0�19, [�0�86, 0�49],P = 0�58). In the moderator analysis, the HOMA-IR-model-

related shift was 1�39, [0�44, 2�34], P = 0�004. When excluding

the study ranking lowest in the quality score,17 the treatment

effect for HOMA1 models decreased slightly to �1�22, [�1�92,�0�53], P = 0�0006. The SMD (Hedges’ g) for HOMA-IR across

all trials was �0�34 [�0�51, �0�16], P < 0�001, suggesting a

small to moderate treatment effect (Fig. 2b). A Funnel plot

including a regtest for funnel plot asymmetry (z = 0�20,P = 0�84) revealed no significant publication bias (Figure S1).

Effects of testosterone therapy on glycaemic control

(HbA1c)

The intertrial heterogeneity for the testosterone effect on HbA1c

was significant among the trials (Cochrane Q-test P < 0�001,I2 = 77�3%). The pooled difference was minor and did not reach

statistical significance (�0�15, [�0�39, 0�10], P = 0�25), as shownin Fig. 3. The funnel plot showed no indication for a publica-

tion bias or significant asymmetry (Figure S2, regtest z = 0�82,P = 0�41). Excluding the qualitatively weakest trial17 proved

again statistically inconsequential (�0�10, [�0�34, 0�42],P = 0�42). Omitting individual trials in leave-one-out scenarios

proved statistically inconsequential for all individual RCTs with

the exception of the trial by Gianatti et al.11 When omitting this

trial,11 the reduction in HbA1c was significant (�0�24 [�0�43,�0�06], P = 0�02), but failed to retain significance when applying

a more robust Knapp Hartung test (P = 0�16). The pooled dif-

ference expressed in terms of SMD was nonsignificant, �0�50[�1�37, 0�36], P = 0�25.

PubMed search

N = 112

PubMed (unindexed)

N = 28Embase search

N = 83

Cochrane register

N = 64

First cutN = 53

Studies identifiedN = 16

RCTs eligibleN = 7

Fig. 1 Flow chart summarizing the identification of studies included for

the meta-analysis.



Testosterone and glucose metabolism 3

Table

1.Characteristicsoftherandomized

controlled

trialsincluded

Reference

Major

inclusion

criteria

N (treated)

N (placebo)

Mean

age

(years)

MeanBMI

(m/kg2)

HbA1c

(%)

TT

Baseline

(nM)

Treatment

Duration

(RCTphase)

TT

Achieved

(nM)

Aversaet

al.17

MetST

T≤11

nM

orFT≤250pM9

2

4010

5731

5�7–6�6

8�7TU

im1g

every12

weeks

12months

14�2

Gopal

etal.19

T2D

FT<225pM

22(crossover,

randomized

blinded

placebo-controlled)

4424

7�010�1

Tim

every

15days9

6

3months

N.R.

Jones

etal.20

T2D

orMetST

T<11

nMorFT<255pM9

2

108

112

6032

7�2%

(T2D

)

9�4Tgel60

mg/day

6months

19�5

Kalinchenko

etal.21

MetST

T<12

nM

orFT<225pM

113

7152

35N.R.

6�7TU

im1gat

0,

6and18

weeks

30weeks

13�1

Kapooret

al.18

T2D

TT<12

nM9

224

(crossover,

randomized

double-

blindplacebo-

controlled)

6433

7�38�6

Tim

200mg

every2weeks

3months

(1months

washout)

12�8

Hackettet

al.12,24

T2D

TT8�1

–12nM

or≤8

nM(FT0�1

8–0�2

5

or≤0

�18nM)average/2

97102

6233

7�69�1

TU

im1gat

0,

6and18

weeks

30weeks

(RCTphase)

11�2

Gianattiet

al.11,25

T2D

TT≤12

nMaverage/2

4543

6231

7�18�6

TU

im1gat

0,

6,18

and30

weeks

40weeks

13�6

Reference

HOMA-IR

HbA1c

(%)

Studydesign

Qualityscore

(outof25)

Aversaet

al.17

T�2

�10�

3�5PBO

0�5�

0�3P<0�0

01

T�0

�2�

0�4PBO

0�9�

1�6P<0�0

1

Mixed

Model

14

Gopal

etal.19

T0�3

8�

2�61P

BO

1�72�

3�49M

D�1

�67�

4�29(CI�6

�91,10�25

)

T�0

�06�

1�87P

BO

–2�04

�4�5

5MD

1�75�

5�35(CI�1

2�46,

8�95)

ITT

16

Jones

etal.20

T�0

�68�

2�81P

BO

�0�07

�2�5

9

T�0

�06�

1�87P

BO

–2�04

�4�5

5ITT

21

Kalinchenko

etal.21

T�1

�49�

2�56(SEM)PBO

0�20�

0�84(SEM)P=0�0

4

N/A

Mixed

Model

19

Kapooret

al.18

MD

T�1

�73�

0�67(SEM)P=0�0

2MD

�0�37

�0�1

7(SEM)P=0�0

3Per

Protocol

17

Hackettet

al.12,24

T4�0

6�

2�02to

4�16�

2�58P

BO

3�67�

2�64to

3�94�

2�22M

AD

0�15�

2�19(CI�0

�92,1�2

2)

T7�7

4�

1�31to

7�68�

1�26P

BO

7�47�

1�24to

7�54�

1�24M

AD

�0�11

�0�8

4(CI�0

�34,0�1

3)

Mixed

Model

21

Gianattiet

al.11,25

MAD

�0�08

(CI�0

�31,0�4

7),P=0�2

3MAD

0�36(CI0�0

,0�7

),P=0�0

5Mixed

Model

24

TT,totaltestosterone;FT,free

testosterone,MetS,

metabolicsyndrome,TU,testosteroneundecanoate;im

,intram

uscular;T,testosteroneeffect,PBO,placeboeffect;MD,meandifference,compared

with

placebo;MAD,meanadjusted

difference;SE

M,standarderrorofthemean;CI,95%

confidence

interval;N.R.,notreported;ITT,intentionto

treat.Datashownin

thetable

arereported

measures.For

themeta-analysis,SD

-converted

(whererequired)standardized

effect

sizeswereused.




Effects of testosterone treatment on other

cardiovascular risk markers

Three11,12,18 of the seven studies reported modest (�0�21 to

0�40 mmol) reductions in total cholesterol with testosterone

treatment relative to placebo, with the largest decrease in the

smallest study.18 LDL cholesterol was reduced (by - 0�26 mM) in

one study,11 and modest decreases in HDL cholesterol were

found in two studies.11,20 None of the studies showed effects on

triglyceride or blood pressure levels. One study reported a signif-

icant decrease in Lipoprotein a.20 The three RCTs11,17,21 that

assessed C-reactive protein (CRP) levels all showed that testos-

terone treatment reduced CRP levels, although this was just

short of statistical significance (P = 0�05) in one study.11

Effects of testosterone treatment on symptoms

Effects of testosterone treatment on symptoms consistent with

androgen deficiency were reported as secondary outcome mea-

sures either within the same publication18–20 or in a separate

manuscript.23–25 Constitutional symptoms using the Aging Male

Symptom score were reported by 4 RCTs,20,23–25 including a

total of 691 men.

The trials showed considerable heterogeneity with respect to

their mean differences in Aging Male Symptom scores between

the testosterone and control groups (Q-test P < 0�001,I2 = 92%). Overall, the pooled treatment effect of testosterone

administration on the symptom score was nonsignificant, MD

�2�49 [�5�81, 0�83], P = 0�14) (Fig. 4). The funnel plot (Figure

S3) and regtest (z = �0�23, P = 0�82) were sufficiently balanced

with a wide variation. The SMD missed significance, �0�35[�0�76, 0�06], P = 0�09.Effects on sexual function could not be meta-analysed due to

the use of different instruments to assess sexual function across

RCTs. Of the four trials that reported effects on sexual function,

one study reported a modest improvement erectile function24

whereas three studies did not find significant effects. Two studies

reported an increase in sexual desire,20,24 whereas overall sexual

function improved in only one of the four studies.23

Adverse events

Adverse event reporting was inconsistent among trials. The

reported incidences of serious adverse events were none (in

three studies)18,19,21 2�0%,17 2�0%12 7�7%20 and 9�1%11 with no

differences between testosterone and placebo groups. A total of

24 cardiovascular related events were reported in three studies

(testosterone group vs placebo group: 0 vs 1,17 3 vs 311 and 5 vs

1220), whereas the other four studies did not report any cardio-

vascular related events. The overall incidence was 2�9%, with

–4·00 –2·00 0·00Mean difference

Hackett et al. 2013 Gianatti et al. 2014Kalinchenko et al. 2011Jones et al. 2011Gopal et al. 2010Kapoor et al. 2006Aversa et al. 2010

–0·26 [ –1·09 , 0·57 ]–0·15 [ –0·50 , 0·20 ]

–1·69 [ –3·34 , –0·04 ]–0·61 [ –1·61 , 0·39 ]–1·34 [ –3·16 , 0·48 ]

–1·73 [ –2·91 , –0·55 ]–2·60 [ –3·70 , –1·50 ]

RE Model

–2·00 –1·00 0·00Standardized mean difference

Hackett et al. 2013 Gianatti et al. 2014Kalinchenko et al. 2011Jones et al. 2011Gopal et al. 2010Kapoor et al. 2006Aversa et al. 2010

–0·15 [ –0·63 , 0·33 ]–0·21 [ –0·69 , 0·27 ]–0·32 [ –0·64 , 0·00 ]–0·22 [ –0·59 , 0·15 ]–0·43 [ –1·02 , 0·17 ]

–1·05 [ –1·84 , –0·26 ]–0·81 [ –1·52 , –0·10 ]

–0·34 [ –0·51 , –0·16 ]

(a)

(b)

Fig. 2 Mean difference (MD) (with 95%CI) in HOMA-IR across

randomized controlled trials between testosterone- and placebo-treated

groups. (a) MDs were stratified in a moderator analysis by the HOMA-

IR model used, as indicated by the grey diamonds. (b) Random effects

model of the standardized mean difference (SMD) in HOMA-IR across

trials with different HOMA models. SMD is based on Hedges’ g (see

Materials and methods). The studies by Gianatti11 and Hackett12 used

HOMA2, all others HOMA1 to estimate insulin resistance.

RE Model

–4·00 0·00 2·00 4·00 6·00Mean difference

Hackett et al. 2013 Gianatti et al. 2014Jones et al. 2011Gopal et al. 2010Kapoor et al. 2006 Aversa et al. 2010

–0·13 [ –0·41 , 0·15 ] 0·20 [ –0·09 , 0·49 ]

–0·12 [ –0·33 , 0·09 ] 1·98 [ –0·08 , 4·04 ]

–0·37 [ –0·46 , –0·28 ]–1·10 [ –2·10 , –0·10 ]

–0·15 [ –0·39 , 0·10 ]

Fig. 3 Mean difference (with 95%CI) in HbA1c across randomized

controlled trials between testosterone- and placebo-treated groups.

RE Model

–10·00 –5·00 0·00 5·00Mean difference

Hackett et al. 2014 Gianatti et al. 2014Giltay et al. 2010 Jones et al. 2011

–2·00 [ –3·65 , –0·35 ]–0·90 [ –3·06 , 1·26 ]

–7·40 [ –9·37 , –5·43 ] 0·30 [ –1·50 , 2·10 ]

–2·49 [ –5·81 , 0·83 ]

Fig. 4 Mean difference (with 95%CI) in Aging Male symptom score

across randomized controlled trials between testosterone- and placebo-

treated groups.




eight events occurring in testosterone treated, and 16 in pla-

cebo-treated patients, including one fatal cardiovascular event

occurring in a man receiving placebo. Significant increases in

haematocrit were reported in four,11,17,20,21 and significant

increases in PSA in one11 of the seven RCTs.

Discussion

In this systematic review and meta-analysis, using rigorous

inclusion criteria, we have identified seven double-blind pla-

cebo-controlled RCTs that assess the effects of testosterone treat-

ment on glucose metabolism in men with T2D and/or the

metabolic syndrome and low to low-normal testosterone levels.

Although the RCTs were of moderate to high quality, there was

a large between trial heterogeneity. Differences in the choice of

HOMA-IR-based methodology to estimate insulin resistance was

identified as a major contributor as accounting for HOMA-IR

methodology eliminated significant between trial heterogeneity.

Collectively, the results indicate that testosterone treatment

modestly improved insulin resistance in RCTs that estimated

insulin resistance by using the HOMA1 equation. In contrast, in

RCTs that used a computer-based model to determine insulin

resistance (HOMA2),22 insulin resistance was not improved.

While HOMA1 gives a good approximation of beta cell function

relative to gold standard insulin clamp studies, the HOMA2

model attempts to account for variations in hepatic and periph-

eral glucose resistance, for increases in the insulin secretion

curve for plasma glucose concentrations above 10 mM, and for

the contribution of circulating proinsulin [https://www.dtu.ox.

ac.uk/ homacalculator/]. HOMA2 may thus provide a more pre-

cise physiological basis for estimation of insulin resistance.22

Therefore, testosterone therapy results in improved glucose-insu-

lin profiles as assessed in the fasting state using HOMA1, which

would be consistent with a beneficial effect to ameliorate insulin

resistance. However, this may be modulated indirectly via factors

which are captured by HOMA2, possibly explaining the disparity

in trial results when the newer model is used.

Testosterone treatment had no significant effect on glycaemic

control, assessed by HbA1c. Although we did not identify evi-

dence of publication bias, we are aware of one unpublished RCT

in 180 men with T2D with baseline HbA1c of 7�0–9�5% and

total testosterone < 10�4 nM.26 In this RCT, testosterone treat-

ment had no significant effect on HbA1c or on HOMA-IR, com-

pared to placebo. Insufficient information precluded inclusion of

this RCT into the meta-analysis, although it is expected that

inclusion of this relatively large negative study would have fur-

ther reduced the difference in glycaemic outcomes between tes-

tosterone and placebo groups. Similarly to the trial by

Gianatti,11 this unpublished RCT recruited exclusively patients

with established T2D. Reductions in insulin resistance with tes-

tosterone treatment were predominantly reported in RCTs that

included men with the metabolic syndrome but without estab-

lished T2D.20,21 This raises the possibility that testosterone

treatment may be more effective in improving glycaemic out-

comes in men with the metabolic syndrome compared to men

with established T2D.

Our results differ from previous meta-analyses in this area,

which have generally shown more favourable effects of testoster-

one treatment on glucose metabolism.2,3,10 These previous meta-

analyses were smaller, including between 228 and 483 partici-

pants, showed larger between trial heterogeneity (I2 38–82%),

and were not restricted to placebo-controlled double-blind

RCTs, but instead included nonblinded, open-label trials. In

addition, the two more recent studies using more stringent

HOMA211,12 modelling were not included.

Given that men in all the studies included had relatively well

controlled T2D at baseline, the effects of testosterone treatment in

men with poorly controlled T2D are unknown. Of note, RCTs in

such populations would be more difficult to conduct, given the

efficacy of standard antidiabetic medications. In addition, baseline

testosterone levels in RCT participants were only modestly

reduced, and whether testosterone treatment improves glucose

metabolism in men with more marked reductions in testosterone

levels remains unknown. Experimental studies of induced hypog-

onadism have not identified a serum testosterone threshold below

which insulin resistance increases.27 Consistent with this, the

inverse relationship of insulin resistance with testosterone levels in

men with T2D does not have a clear breakpoint and remains pres-

ent even in men with testosterone levels extending into the normal

range.28 Marked reductions in testosterone are relatively uncom-

mon in men with T2D and require careful assessment for underly-

ing classical hypogonadism.8 Such men may well require

testosterone treatment irrespective of glycaemic considerations.

While there is evidence that testosterone treatment can improve

glucose metabolism in preclinical studies by a variety of cellular

mechanisms,8,9 the degree of which they are operative in men is

unknown. Testosterone treatment modestly increases muscle mass

and decreases fat mass,29 changes expected to be metabolically

favourable. Of the meta-analysed RCTs, only one study11 reported

changes in body composition assessed by rigorous methodology.

Despite the expected decrease in fat mass and increase in muscle

mass, in that study insulin resistance was not improved. Similarly,

in the unpublished RCT, insulin resistance was not improved

despite a significant increase in lean body mass.26 In a recent study

of 57 obese men, testosterone treatment did not improve insulin

sensitivity assessed by euglycaemic clamps, despite significant

decreases in fat mass and increases in total fat mass,30 and in a

chemical castration study of healthy young men, no changes in

insulin resistance were observed despite significant increases in fat

mass.31 One explanation for this apparent paradox is the observa-

tion that in men with T2D testosterone treatment preferentially

reduced the amount of subcutaneous, but not of the metabolically

more active visceral fat.11 Similarly, testosterone treatment had no

significant effect on visceral adipose tissue in most but not all

RCTs conducted in overweight or obese men not specifically

selected for the presence of T2D and/or the metabolic syndrome.8

Additional studies are needed to clarify whether testosterone acts

selectively on subcutaneous compared to visceral fat, or whether

specific circumstances exist which lead to one or other reservoir

being affected.

Given that the metabolic syndrome and T2D are slowly pro-

gressive conditions, it remains possible that longer duration of




testosterone treatment, beyond the maximum duration of

12 months in current RCTs (Table 1) may have more marked

effect on glucose metabolism. While uncontrolled registry stud-

ies have demonstrated progressive improvements in glucose

metabolism in men treated with testosterone up to 6 years,32

these observations have yet to be confirmed in controlled trials.

With respect to the effects of testosterone treatment on other

cardiovascular risk factors, effects in the different RCTs were rel-

atively modest and consistent with findings from RCTs in of tes-

tosterone therapy in men from the general population.33 Effects

on lipids were modest and may be neutral from a cardiovascular

perspective as both decreases in pro-atherogenic lipid fractions

(total cholesterol, LDL cholesterol, lipoprotein a) and in the the-

oretically cardioprotective HDL cholesterol were reported. Tri-

glyceride levels did not change, consistent with the absence of

documented changes in visceral fat. While none of the RCTs

found an effect on blood pressure, modest decreases in CRP

were seen, although whether this is independent of changes in

body composition is not known.

Testosterone treatment did not have a significant effect on

constitutional symptoms suggestive of androgen deficiency, as

assessed by the relatively nonspecific Aging Male Symptoms

score. Heterogeneity in the instruments used to assess sexual

function precluded a meta-analysis of sexual function.

Some20,23,24 but not all25 RCTs reported improvements in sexual

parameters although these were variable between the studies,

including erectile function,24 sexual desire20,23,24 and overall sex-

ual function. In one RCT,25 sexual and constitutional symptoms

were worse in men with depression and microvascular complica-

tions, but did not correlate with testosterone levels, suggesting

that these nonspecific symptoms are confounded by comorbidi-

ties. This provides a potential explanation for the relatively mod-

est and inconsistent symptomatic benefit of testosterone

treatment in this population, which is distinct from the marked

symptomatic benefits in men with classical, pathologically based

hypogonadism who have unequivocally low testosterone levels

and objective evidence of androgen deficiency.34

In these relatively short term RCTs as expected, serious

adverse events were few. Consistent with men in the general

population,33 an increase in haematocrit was the most com-

monly reported adverse event, occurring in four of the seven

RCTs. Cardiovascular events were numerically lower in men

receiving testosterone a difference largely driven by the TIMES2

study.20 Although the number of events was very low, this may

provide some assurance given the uncertainties surrounding the

cardiovascular safety of testosterone treatment.35

Strengths of the present meta-analysis include, in contrast to

previous meta-analyses,2,3,10 the strict selection of double-blind

placebo-controlled RCTs including two recent RCTs11,12 using

more rigorous HOMA2 methodology to estimate insulin resis-

tance, which allowed identification of sources of heterogeneity

and the quantification of summary estimates across all published

RCTs. We used the PRISMA statement to conduct our

research,13 and carefully evaluated study quality using the full

25-item CONSORT checklist.14 Limitations include the limited

number and relative modest size of included studies, lack of

access to individual patient level data and the fact that included

studies not always fully complied with CONSORT reporting cri-

teria, which may weaken the precision of some of our estimates.

Moreover, while none of the RCTs used ‘gold standard’ mea-

surements of insulin resistance, HOMA-IR measurements have

shown to correlate well with clamp-based technologies.22

In conclusion, the results of this meta-analysis do not support

the routine use of testosterone treatment to improve glucose

metabolism or constitutional symptoms in men with relatively

well controlled T2D and/or the metabolic syndrome and modest

reductions in testosterone levels. This reinforces recommenda-

tions that, at the current state of evidence, lifestyle measures and

use of standard therapy to optimize glycaemic control and com-

orbidities should remain the first line approach.8

Acknowledgements

The authors thank Cheryl Hamill and the staff of the Medical

Library, Fremantle Hospital, for assistance with the literature

search.

M Grossmann was supported by a National Health and Medi-

cal Research Council of Australia Career Development Fellow-

ship (# 1024139).

Conflict of interest

Mathis Grossmann received research funding from Bayer

Schering and speaker’s honoraria from Besins Healthcare. Rudolf

Hoermann has nothing to disclose. Gary Wittert received

research funding from Bayer Schering, Eli Lilly and Lawley

Pharmaceuticals, speaker’s honoraria from Bayer Schering and is

a member of an Eli Lilly advisory board. Bu B Yeap received

speaker’s honoraria and conference support from Bayer Schering

and Eli Lilly and is a member of an Eli Lilly Advisory Board.

References

1 Ding, E.L., Song, Y., Malik, V.S. et al. (2006) Sex differences of

endogenous sex hormones and risk of type 2 diabetes: a system-

atic review and meta-analysis. JAMA, 295, 1288–1299.2 Corona, G., Monami, M., Rastrelli, G. et al. (2010) Type 2

diabetes mellitus and testosterone: a meta-analysis study. Interna-

tional Journal of Andrology, 34, 528–540.3 Corona, G., Monami, M., Rastrelli, G. et al. (2011) Testosterone

and metabolic syndrome: a meta-analysis study. The Journal of

Sexual Medicine, 8, 272–283.4 Laaksonen, D.E., Niskanen, L., Punnonen, K. et al. (2004)

Testosterone and sex hormone-binding globulin predict the met-

abolic syndrome and diabetes in middle-aged men. Diabetes

Care, 27, 1036–1041.5 Bhasin, S., Jasjua, G.K., Pencina, M. et al. (2011) Sex hormone-

binding globulin, but not testosterone, is associated prospectively

and independently with incident metabolic syndrome in men:

the framingham heart study. Diabetes Care, 34, 2464–2470.6 Grossmann, M. (2014) Testosterone and glucose metabolism in

men: current concepts and controversies. Journal of

Endocrinology, 220, R37–R55.




7 Hamilton, E.J., Gianatti, E., Strauss, B.J. et al. (2011) Increase in

visceral and subcutaneous abdominal fat in men with prostate

cancer treated with androgen deprivation therapy. Clinical Endo-

crinology (Oxford), 74, 377–383.8 Grossmann, M. (2011) Low testosterone in men with type 2

diabetes: significance and treatment. Journal of Clinical Endocri-

nology and Metabolism, 96, 2341–2353.9 Rao, P.M., Kelly, D.M. & Jones, T.H. (2013) Testosterone and

insulin resistance in the metabolic syndrome and T2DM in men.

Nature Reviews. Endocrinology, 9, 479–493.10 Cai, X., Tian, Y., Wu, T. et al. (2014) Metabolic effects of testos-

terone replacement therapy on hypogonadal men with type 2

diabetes mellitus: a systematic review and meta-analysis of ran-

domized controlled trials. Asian Journal of Andrology, 16,

146–152.11 Gianatti, E.J., Dupuis, P., Hoermann, R. et al. (2014) Effect of

testosterone treatment on glucose metabolism in men with type

2 diabetes: a randomized controlled trial. Diabetes Care, 37,

2098–2107.12 Hackett, G., Cole, N., Bhartia, M. et al. (2013) Testosterone

Replacement therapy improves metabolic parameters in hypogo-

nadal men with type 2 diabetes but not in men with coexisting

depression: the BLAST Study. The Journal of Sexual Medicine,

11, 840–856.13 Moher, D., Liberati, A., Tetzlaff, J. et al. (2009) Preferred report-

ing items for systematic reviews and meta-analyses: the PRISMA

statement. Annals of Internal Medicine, 151, 264–269, W264.

14 Moher, D., Hopewell, S., Schulz, K.F. et al. (2010) CONSORT

2010 explanation and elaboration: updated guidelines for report-

ing parallel group randomised trials. BMJ, 340, c869.

15 R Core Team (2014) R: A Language and Environment for Statis-

tical Computing. R Foundation for Statistical Computing,

Vienna, Austria. Available from: http://wwwR-projectorg/

(accessed 7 June 2014).

16 Viechtbauer, W. (2010) Conducting meta-analyses in R with the

metafor package. Journal of Statistical Software, 36, 1–48.17 Aversa, A., Bruzziches, R., Francomano, D. et al. (2010) Effects

of testosterone undecanoate on cardiovascular risk factors and

atherosclerosis in middle-aged men with late-onset hypogona-

dism and metabolic syndrome: results from a 24-month, ran-

domized, double-blind, placebo-controlled study. The Journal of

Sexual Medicine, 7, 3495–3503.18 Kapoor, D., Goodwin, E., Channer, K.S. et al. (2006) Testoster-

one replacement therapy improves insulin resistance, glycaemic

control, visceral adiposity and hypercholesterolaemia in hypogo-

nadal men with type 2 diabetes. European Journal of Endocrinol-

ogy, 154, 899–906.19 Gopal, R.A., Bothra, N., Acharya, S.V. et al. (2010) Treatment of

hypogonadism with testosterone in patients with type 2 diabetes

mellitus. Endocrine Practice, 16, 570–576.20 Jones, T.H., Arver, S., Behre, H.M. et al. (2011) Testosterone

replacement in hypogonadal men with type 2 diabetes and/or

metabolic syndrome (the TIMES2 Study). Diabetes Care, 34,

828–837.21 Kalinchenko, S.Y., Tishova, Y.A., Mskhalaya, G.J. et al. (2010)

Effects of testosterone supplementation on markers of the

metabolic syndrome and inflammation in hypogonadal men with

the metabolic syndrome: the double-blinded placebo-controlled

Moscow study. Clinical Endocrinology (Oxford), 73, 602–612.22 Wallace, T.M., Levy, J.C. & Matthews, D.R. (2004) Use and

abuse of HOMA modeling. Diabetes Care, 27, 1487–1495.

23 Giltay, E.J., Tishova, Y.A., Mskhalaya, G.J. et al. (2010) Effects of

testosterone supplementation on depressive symptoms and

sexual dysfunction in hypogonadal men with the metabolic

syndrome. The Journal of Sexual Medicine, 7, 2572–2582.24 Hackett, G., Cole, N., Bhartia, M. et al. (2013) Testosterone

replacement therapy with long-acting testosterone undecanoate

improves sexual function and quality-of-life parameters vs. pla-

cebo in a population of men with type 2 diabetes. The Journal of

Sexual Medicine, 10, 1612–1627.25 Gianatti, E.J., Dupuis, P., Hoermann, R. et al. (2014) Effect of

testosterone treatment on constitutional and sexual symptoms in

men with type 2 diabetes in a randomized, placebo-controlled

clinical trial. Journal of Clinical Endocrinology and Metabolism 99,

3821–3828.26 Unpublished accessed at Available from: http://www.solvay-

pharmaceuticals.com/static/wma/pdf/1/3/4/4/2/S176.2.101.pdf

(accessed 28 May 2014).

27 Singh, A.B., Hsia, S., Alaupovic, P. et al. (2002) The effects of

varying doses of T on insulin sensitivity, plasma lipids, apolipo-

proteins, and C-reactive protein in healthy young men. Journal

of Clinical Endocrinology and Metabolism, 87, 136–143.28 Grossmann, M., Thomas, M.C., Panagiotopoulos, S. et al. (2008)

Low testosterone levels are common and associated with insulin

resistance in men with diabetes. Journal of Clinical Endocrinology

and Metabolism, 93, 1834–1840.29 Isidori, A.M., Giannetta, E., Greco, E.A. et al. (2005) Effects of

testosterone on body composition, bone metabolism and serum

lipid profile in middle-aged men: a meta-analysis. Clinical Endo-

crinology (Oxford), 63, 280–293.30 Juang, P.S., Peng, S., Allehmazedeh, K. et al. (2013) Testosterone

with dutasteride, but not anastrazole, improves insulin sensitivity

in young obese men: a randomized controlled trial. The Journal

of Sexual Medicine, 11, 563–573.31 Woodhouse, L.J., Gupta, N., Bhasin, M. et al. (2004) Dose-

dependent effects of testosterone on regional adipose tissue dis-

tribution in healthy young men. Journal of Clinical Endocrinology

and Metabolism, 89, 718–726.32 Haider, A., Yassin, A., Doros, G. et al. (2014) Effects of

long-term testosterone therapy on patients with “diabesity”:

results of observational studies of pooled analyses in obese hy-

pogonadal men with type 2 diabetes. International Journal of

Endocrinology, 2014, 683515.

33 Fernandez-Balsells, M.M., Murad, M.H., Lane, M. et al. (2010)

Clinical review 1: adverse effects of testosterone therapy in adult

men: a systematic review and meta-analysis. Journal of Clinical

Endocrinology and Metabolism, 95, 2560–2575.34 Snyder, P.J., Peachey, H., Berlin, J.A. et al. (2000) Effects of

testosterone replacement in hypogonadal men. Journal of Clinical

Endocrinology and Metabolism 85, 2670–2677.35 Endocrine Society Statement (2014) The Risk of Cardiovascular

Events in Men Receiving Testosterone Therapy. http://www.

endocrine.org/~/media/endosociety/Files/ Advocacy and

Outreach/Position Statements/Other Statements/The Risk of Car-

diovascular Events in Men Receiving Testosterone Therapy.pdf

(accessed 28 May 2014)

Supporting Information

Additional supporting information may be found in the online

version of this article at the publisher’s web site.




Measurement of Total Serum Testosterone in Adult Men:Comparison of Current Laboratory Methods VersusLiquid Chromatography-Tandem Mass Spectrometry

CHRISTINA WANG, DON H. CATLIN, LAURENCE M. DEMERS, BORISLAV STARCEVIC, AND

RONALD S. SWERDLOFF

Division of Endocrinology (C.W., R.S.S.), Department of Medicine, Harbor-UCLA Medical Center and Research andEducation Institute, Torrance, California 90502; UCLA-Olympic Analytical Laboratory (D.H.C., B.S.), Los Angeles,California 90025; and Department of Pathology and Medicine (L.M.D.), Pennsylvania State University College of Medicine,H. S. Hershey Medical Center, Hershey, Pennsylvania 17033

The diagnosis of male hypogonadism requires the demonstra-tion of a low serum testosterone (T) level. We examined serumT levels in pedigreed samples taken from 62 eugonadal and 60hypogonadal males by four commonly used automated immu-noassay instruments (Roche Elecsys, Bayer Centaur, OrthoVitros ECi and DPC Immulite 2000) and two manual immu-noassay methods (DPC-RIA, a coated tube commercial kit, andHUMC-RIA, a research laboratory assay) and compared re-sults with measurements performed by liquid chromatogra-phy-tandem mass spectrometry (LC-MSMS). Deming’s regres-sion analyses comparing each of the test results with LC-MSMS showed slopes that were between 0.881 and 1.217. Theinterclass correlation coefficients were between 0.92 and 0.97for all methods. Compared with the serum T concentrationsmeasured by LC-MSMS, the DPC Immulite results were biasedtoward lower values (mean difference, �90 � 9 ng/dl) whereasthe Bayer Centaur data were biased toward higher values

(mean difference, �99 � 11 ng/dl) over a wide range of serumT levels. At low serum T concentrations (<100 ng/dl or 3.47nmol/liter), HUMC-RIA overestimated serum T, Ortho VitrosECi underestimated the serum T concentration, whereas theother two methods (DPC-RIA and Roche Elecsys) showed dif-ferences in both directions compared with LC-MSMS. Over60% of the samples (with T levels within the adult male range)measured by most automated and manual methods werewithin � 20% of those reported by LC-MSMS. These immuno-assays are capable of distinguishing eugonadal from hypogo-nadal males if adult male reference ranges have been estab-lished in each individual laboratory. The lack of precision andaccuracy, together with bias of the immunoassay methods atlow serum T concentrations, suggests that the current meth-ods cannot be used to accurately measure T in females orserum from prepubertal subjects. (J Clin Endocrinol Metab 89:534–543, 2004)

THE DIAGNOSIS OF androgen deficiency in men is usu-ally based on clinical features of hypogonadism and

the demonstration of a morning serum total testosterone (T)level below the reference range for young male adults. In thepast 30 yr, serum T levels have been measured in both re-search and clinical laboratories using established RIAs thatinitially employed an extraction and column chromatogra-phy purification step before performing the RIA (1–4). Sub-sequently with the availability of more specific antibodies,the chromatography step and then the extraction step wereeliminated in most laboratories. Ready-made commercialkits for RIAs were then introduced and routinely used inmost clinical and research laboratories.

More recently, assays for serum T in male and femaleserum have been performed in many hospital and referencelaboratories using rapid automated immunoassay instru-ments that employ chemiluminescence detection. These as-says are performed with proprietary reagents that include

analogs of T as standards and reference ranges provided bythe instrument manufacturer. While economical and rapid,many of these assays have had limited published validationdata, raising questions about the accuracy and/or specificityof these automated immunoassay methods. Furthermore, theapproval of these methods by regulatory agencies for clinicaluse is primarily based on noninferiority comparison againstpreviously approved assays frequently using pooled sam-ples and mostly not from T-free serum spiked with gravi-metrically determined standards of authentic T or from in-dividual serum samples independently assayed by othermethods such as mass spectrometry methods. A major prob-lem exists when the standard reference texts for physicians(5) describe an adult male reference range that does notcorrespond to values quoted by many clinical laboratories.Clinicians are being presented with normal male referenceranges for serum T from these automated platforms that havelow end clinical limits down to 170–200 ng/dl (5.9–6.9nmol/liter) and upper range limits of 700–800 ng/dl (24.3–27.7 nmol/liter). These stated reference ranges provided bythe manufacturer are significantly lower than the 300-1000ng/dl (10.4–34.7 nmol/liter) reference range referred to innumerous publications over the past 30 yr based on tradi-tional RIA methods with or without the chromatographystep as well as some research techniques employed by in-ternal recovery standards to correct for procedural losses (5).

Abbreviations: CV, Coefficient of variation; GC, gas chromatograph;HRP, horseradish peroxidase; HUMC, Harbor-UCLA Research and Ed-ucation Institute Endocrine Research Laboratory; LC-MSMS, liquidchromatography-tandem mass spectrometry; LOQ, limit of quantifica-tion; MS, mass spectrometry; T, testosterone.JCEM is published monthly by The Endocrine Society (http://www.endo-society.org), the foremost professional society serving the en-docrine community.

0021-972X/04/$15.00/0 The Journal of Clinical Endocrinology & Metabolism 89(2):534–543Printed in U.S.A. Copyright © 2004 by The Endocrine Society

doi: 10.1210/jc.2003-031287

534

The Endocrine Society. Downloaded from press.endocrine.org by [${individualUser.displayName}] on 19 January 2015. at 14:39 For personal use only. No other uses without permission. . All rights reserved.

External quality control programs such as that providedby the College of American Pathologists allow laboratories tocompare results with other laboratories using the samemethod or kit reagents. As shown in Table 1, the medianvalue of a quality control sample (Y-04, 2002) varied between215 and 348 ng/dl (7.5 and 12.0 nmol/liter) among methodswith coefficients of variation among laboratories using thesame method or instrument ranging between 5.1% and22.7%. The median average for this sample from all methodswas 297 ng/dl (10.3 nmol/liter) and results were as low as160 or as high as 508 ng/dl (5.5 to 17.6 nmol/liter). Theseresults span the hypogonadal to eugonadal range.

A previous study evaluated and compared steroid mea-surements by RIA and gas chromatography-mass spectrom-etry using pooled female and male serum samples. Theyused linear regression analysis and demonstrated that sim-ilar results could be obtained for most steroids in serumeither by RIA or mass spectrometry (6). This report, however,only tested pooled samples that covered the high, medium,and low range of each steroid standard curve and not ped-igreed samples from normal subjects and patients. Moreoverthe use of least-squares linear regression analysis is not anoptimal measure because it does not take into considerationthe fact that both the reference and the test methods containerror. In this study, we compared serum T measurementsfrom eugonadal and hypogonadal adult men using liquidchromatography-tandem mass spectrometry (LC-MSMS)(UCLA Olympic Analytical Laboratory) vs. two RIAs run ina research laboratory (Harbor-UCLA Research and Educa-tion Institute Endocrine Research Laboratory, HUMC-RIA)and a hospital based reference laboratory using a commer-cially available RIA kit (DPC-RIA, Core Endocrine Labora-tory, Penn State University-Hershey Medical Center, Her-shey, PA), and compared results with the same specimensrun on the most common automated immunoassay instru-ments used in hospital based laboratories (Penn State Uni-versity-Hershey Medical Center; University of Pennsylvania,Philadelphia, PA; Mercy Health Laboratories, Philadelphia,PA; and Henry Ford Hospital, Detroit, MI).

Subjects and MethodsSubjects

Serum samples were collected from normal (n � 62) and hypogonadalmen (n � 60) from June 1995 to September 1999. The 62 normal healthy

volunteers were 18–60 yr of age. Serum was collected between 0800 and1000 h from healthy volunteers in the basal state without any researchprotocol interventions. These subjects were recruited at Harbor-UCLACenter of Men’s Health for other research studies on androgen metab-olism. They had no significant medical history and were not takingmedications. They had a normal physical examination, normal clinicalchemistry values, normal semen analyses, and normal serum gonado-tropin levels. Sera were also obtained from 25 hypogonadal men (agerange from 19–68 yr) who had serum T levels less than 300 ng/dl (10.4nmol/liter, as previously determined by RIA at HUMC) before T ther-apy. In addition, sera were collected from 35 hypogonadal men aftertransdermal T replacement therapy. Of the samples from T-replacedhypogonadal men, 20 were within the normal range and 15 were abovethe normal range as previously determined by an RIA at HUMC.

Samples

The serum was stored at �20 C at HUMC. Since their original col-lection and aliquoting, the samples were thawed only once before thecurrent study. Aliquots from each serum sample were pooled and mixedthoroughly by the laboratory supervisor before being aliquoted intoportions for each of the laboratories participating in the study. Sampleswere bar-coded at HUMC and sent to the UCLA Olympic AnalyticalLaboratory for LC-MSMS assay and to the Penn State-Hershey MedicalCenter Core Endocrine Laboratory for RIA and for assay on four dif-ferent automated instruments. The bar codes were linked to a databasethat contained demographics including the origin of the sample, the dateof the sample collection, and the original T concentration assayed at theHUMC. This database was maintained by the laboratory supervisor atHUMC and was not made available to the investigators or the differenttechnicians performing the assays. To maintain blinding of the samplesat the HUMC, an aliquot of each sample was sent to the Penn State-Hershey Medical Center Core Endocrine Laboratory where each samplewas recoded and sent back to the HUMC for assay. The listing of therecoded samples were not made available to the HUMC until all T assayswere performed and entered into a database by an independent datamanager. Thus, all samples were assayed in the different laboratorieswithout prior knowledge of the serum T concentrations of the samples.

Methods

All assays used appropriate quality control material and standardseither as steroid-free serum samples spiked with T or samples provided,by the manufacturer as defined by the standard operating proceduresestablished and validated in each laboratory. Steroid-free sera werecharcoal stripped sera prepared in the laboratory, newborn bovine se-rum, or steroid free sera obtained commercially. These steroid-free serawere tested in each individual laboratory to ensure that they did notshow any T at the limit of detection of the assay used in each laboratory.All samples were measured similarly to other test samples run in eachlaboratory. For LC-MSMS, each sample was extracted and injected intothe LC-MSMS once because of inadequate serum volume for replicatesfor most test samples. As routinely done at the laboratories performingthe RIAs, the serum T result for each sample was determined from the

TABLE 1. Examples of serum total testosterone (ng/dl) external quality control program (College of American Pathologists, sample Y-04)

Instrument/assay No. oflabs

Mean(ng/dl) SD CV Median

Range

Low High

Abbott Architect 11 243.5 13.8 5.7 243 219 262Bayer ACS:180 83 317.6 39.0 12.3 314 227 410Bayer Centaur 231 324.0 41.5 12.8 319 234 454Bayer Immuno-1 43 300.6 16.7 5.6 300 254 335Beckman Access/2 98 297.8 15.3 5.1 298 239 330Diagnostic Systems solid 10 352.7 80.1 22.7 375 177 440DPC Coat-a-Count 76 277.8 34.2 12.3 281 196 363DPC Immulite 86 232.0 32.9 14.2 228 160 330DPC Immulite 2000 83 210.8 33.5 15.9 215 130 299Roche Elecsys/E170 87 349.9 23.0 6.6 348 299 408Ortho Vitros ECi 54 282.3 15.8 5.6 280 254 324All instruments 891 293.6 56.2 19.1 297 160 508

Wang et al. • Serum Total T J Clin Endocrinol Metab, February 2004, 89(2):534–543 535


average of two duplicates. Samples were run in singlicate on all fourautomated immunoassay instruments as specified by the proceduremanuals of each laboratory. Data from all laboratories were sent to theHUMC and data entry validated before statistical analyses. The char-acteristics of the various methods are listed in Table 2 and detailedbelow.

LC-MSMS

The UCLA Olympic Analytical Laboratory used LC-MSMS to quan-titate serum T levels. Advantages of the LC-MSMS method include easyand simple sample preparation (nonderivatized steroids can be ana-lyzed directly), high recovery with improved signal to noise ratio, en-hanced specificity, and low interference due to MSMS technology (7–9).A 2.0-ml sample was used for analyses and trideuterated T was used asthe internal standard to monitor recovery. A LC-10A Shimadzu binarypump LC equipped with a PE-Applied Biosystem (Foster City, CA) PESeries 200 autosampler was used for LC and an Applied Biosystem-SciexAPI-300 triple quadruple mass spectrometer equipped with an APIinterface was used to perform the T analysis.

The LC-MSMS method was validated using protocols specified by theFederal Drug Administration. This included determining the limit ofdetection (10), the limit of quantitation (LOQ), the characteristics of thecalibration curve, and the within- and between-day reproducibility atthree different concentrations of serum T. The standard curve for T waslinear between 0 and 2000 ng/dl (0–69 nmol/liter) and the calibrationplots over four days showed a slope 0.752–0.787, intercept 0.068–0.139,regression coefficient 0.997 to 0.999. The LOQ was 20 ng/dl (0.69 nmol/liter) and the accuracy for that level was 84.6% of the nominal value with%CV (coefficient of variation) of 9.4%. The between-day %CV was 7.4,6.1, and 6.5 at 50, 750, and 1500 ng/dl, respectively. The dynamic rangeof the assay is 20 to 2000 ng/dl or 0.7–69.4 nmol/liter. Bovine newbornserum (determined by LC-MSMS to contain less than 20 ng/dl of T, LOQof assay) was spiked with T (Sigma, St. Louis, MO) determined to be99.8% pure by LC-MSMS and gas chromatograph (GC)-MS. The accu-racy was 100.7, 93.6, 100.4, 100.3,103.5, and 97.8 for samples known tocontain 20, 50, 250, 100, 500, 1000, and 2000 ng/dl, respectively. Thecorresponding precision values were: 10.5, 10.4, 7.2, 4.8, 1.7, and 5.9%.Recovery (% recovery of the analyte during analysis) was 77.0% at 50ng/dl, 76.9% at 750 ng/dl, and 71.4% at 1500 ng/dl. Only a singleextraction and injection were performed for each sample due to inad-equate serum volume for replicate assays for most samples.

During the study, the standard curve was linear between 0 and 2000ng/dl (0–69 nmol/liter) of T concentrations and the calibration lines for4 d showed a slope 0.789–0.833, intercept 0.072–0.301, regression co-efficient 0.997–0.999. The LOQ was 20 ng/dl (0.69 nmol/liter) and theaccuracy for that level was 85.2% of the nominal value with %CV of17.9%. The interday %CV was 10.5, 8.6, and 8.4 at 50, 750, and 1500 ng/dl.The accuracy was 110.4, 98.1, 98.5, 98.3, 96.6, and 102.4% for samplesknown to contain 20, 50, 250, 100, 500, 1000, and 2000 ng/dl, respectively.The corresponding values for precision were: 10.4, 8.3, 5.7, 9.5, 6.5, and3.2%.

RIAs

RIA at HUMC. Serum T was measured by a T RIA using reagentsincluding the iodinated tracer obtained from ICN (Costa Mesa, CA). Thecross reactivity of the ICN antibody used in the T RIA were 2.0% for5�-dihydrotestosterone, 2.3% for androstenedione, 0.8% for 3�-andro-stanediol, 0.6% for etiocholanolone, and less than 0.01% for all othersteroids tested (from 0.1–1000 ng/ml, up to 200-fold of the highest Tstandard). Before analysis, the samples (0.1 ml) were extracted with 2.0ml of ethyl acetate:hexane, 3:2 (vol:vol). Initially tritiated T was used asan internal standard for each sample. The average recovery of the in-ternal standard was 102 � 1% (range 99.6–105.1%). Because of theproven minimal procedural loss, subsequently no internal standard wasused to correct for the extraction. The extract was then dissolved in theassay buffer and two aliquots were assayed in sequence in the RIA. Theaverage of the T levels in each of the two aliquots were reported. ThisRIA was validated using the guidelines published by Shah et al. (11). Thefollowing were data from the validation studies. The lower limit ofquantitation of serum T measured by this assay was 0.87 nmol/liter (25ng/dl). This was the lowest concentration of T measured in serum thatcan be accurately distinguished from steroid-free serum with a 12% CV.The accuracy of the T assay, determined by spiking steroid-free serum(ICN) with 25, 50, 100, 500, 1000, and 1500 ng/dl of T was 114, 118, 109,94, 92, and 92%, respectively (mean 104%). The T was obtained fromSigma and was 99.8% as determined by celite column chromatography.The within-run precision (CV) at a serum T concentration of 646 ng/ml(22.4 nmol/liter) was 5.9%. The between-run precision (CV) for low,medium, and high serum T concentrations of 136, 531, and 1477 ng/dl(4.7, 18.4, and 51.2 nmol/liter) was 12.4, 9.3, and 12.5%, respectively. Theadult male reference range in this laboratory was 298-1043 ng/dl (10.33to 36.17 nmol/liter) determined from samples in young men (18–50 yr)with normal physical examination, serum gonadotropin and semenanalyses (12, 13). This RIA was developed and validated primarily forresearch studies in men. Although not used in this study, a separateprotocol was available using more serum for extraction of samplessuspected of containing very low T levels such as that seen in womenand children. All the samples for this study were done in three assayson three different days where two sets of quality control samples wererun with each assay. The interassay CV for serum T levels of 101, 518,and 1201 ng/dl were 15.4, 14.0, and 9.1%, respectively. The HUMC-RIAprotocol required repeating the analyses if the CV for the duplicatecounts exceeds 10%; however, in this study all CV were less than 10%.

RIA at Penn State-Hershey Medical Center. Serum T was measured usingthe DPC coat-a-tube RIA method (Diagnostic Products Corp., Los An-geles, CA). This method used an iodinated tracer and a T-specific an-tibody immobilized to the wall of a polypropylene tube. Duplicatessamples were run in sequence in the assay and the average serum Tlevels were reported. Antibody cross-reactivity against androstenedi-one, 3�-androstanediol, dehydroepiandrosterone, and other possibleinterfering steroids was less than 1%. Cross-reactivity with 5�-dihy-drotestosterone was 2.8%. Accuracy studies averaged 101% with steroid-stripped serum samples spiked with T (purity ascertained by celite

TABLE 2. Characteristics of the methods

Assay LLOQ(ng/dl) Accuracy (%) Interassay Precision

(CV%)

Referencea rangefor adult men

(ng/dl)

LC-MSMS 20 84.6–110.4 8.0 at 750 ng/dlHUMC-RIA 25 92–118% 9.3 at 530 ng/dl 298–1043DPC-RIA 14 101% 5.3 at 602 ng/dl 250–900Roche Elecsys 11.5 NA 4.3 at 271 ng/dl 210–810Bayer Centaur 34.6 NA 7.3 at 671 ng/dl 241–827Ortho Vitros ECi 14 NA 2.8 at 271 ng/dl 132–813DPC Immulite 2000 49 NA 13.7 at 427 ng/dl 286–1510

LLOQ, Lower limit of quantitation.a Reference ranges for HUMC-RIA and DPC-RIA were determined from serum obtained in healthy men between the ages of 18 and 50 yr

with normal physical examination, serum gonadotropins, and normal gonadal semen analyses. The ranges for automatic immunoassays werebased on reference ranges quoted by manufacturer. Each individual laboratory then verified the reference range with samples from normal menwith normal gonadotropin levels and normal physical examination.

536 J Clin Endocrinol Metab, February 2004, 89(2):534–543 Wang et al. • Serum Total T


column chromatography) at a concentration of 250 ng/dl (8.7 nmol/liter). The within-run precision (CV) at a serum T concentration of 545ng/dl (18.9 nmol/liter) was 3.9%. The between-run precision (CV) forsamples with low, medium and high serum T concentrations of 83.6, 602,and 1229 ng/dl (2.9, 20.9, and 42.6 nmol/liter) was 11.4, 5.3, and 4.5%,respectively. The assay reportable range extends from 14–1600 ng/dl(0.5–55.5 nmol/liter). The adult male reference range for this assay was250–900 ng/dl (8.7–31.2 nmol/liter). During the study the between runCV averaged 4.8%.

Automated platform assays

The measurement of T on the different automated immunoassaysystems was carried out at four institutions including The Penn State-Hershey Medical Center, Hershey, PA; The University of Pennsylvania;Mercy Health Laboratories; and Henry Ford Hospital. The automatedsystems included the Roche Elecsys, the Bayer Centaur, the Ortho VitrosECi, and the DPC-Immulite 2000. The references range quoted in Table2 are based on those provided by the manufacturer. These referenceranges were verified by the individual laboratories using serum samplesobtained from men with normal physical examination and normalgonadotropins.

Roche Elecsys. The Elecsys 2010 automated analyzer (Roche DiagnosticsGmbH, Mannheim, Germany) measures T in serum using electrochemi-luminescence. This assay uses a highly specific antibody to measure T.Briefly, 50 �l of serum and a biotinylated antibody against T are incu-bated together. A second antibody labeled with a ruthenium complex isthen added together with streptavidin-coated microparticles. A sand-wich complex is formed that is bound to the solid phase (the micro-particles) via biotin-streptavidin interaction. The microparticles are thenmagnetically captured onto the surface of an electrode. Application ofvoltage on this electrode induces a chemiluminescence emission, whichis detected by a photomultiplier and the signal compared with a Tcalibration curve, which is instrument-specific. This instrument uses atwo-point calibration curve for day-to-day analysis, and a master curveprovided by the manufacturer for each lot of reagents. A three-levelassay control provided by the manufacturer was used with each assayrun. The LOQ of the Elecsys T assay is 11.5 ng/dl (0.4 nmol/liter) andbetween-run precision averaged 4.3% at a concentration of 271 ng/dl(9.4 nmol/liter). The reference range for adult males for this method was210–810 ng/dl (7.3–28.1 nmol/liter). During the study the between runCV averaged 4.6%.

Bayer (Centaur). The Bayer ACS Centaur (Bayer Diagnostics, Tarrytown,NY) is a fully automated random access immunoassay analyzer thatused paramagnetic solid-phase particles and an acridinium ester-baseddirect chemiluminescence tracer that is coupled to T antibodies in asecond reagent. After magnetic separation and washing of the particles,luminescence is initiated by the addition of an acid and base reagent.Individual assays are calibrated using a two-point calibration curve anda three level assay control is used with each run. A master curve isprovided for each lot of reagents. The functional sensitivity of the Cen-taur T assay was 34.6 ng/dl (1.2 nmol/liter) and between run precisionat a concentration of 671 ng/dl (23.3 nmol/liter) averaged 7.3%. Thereference range for adult males was 241–827 ng/dl (8.36–28.7 nmol/liter). During the study, the between run CV averaged 6.8%.

Ortho Vitros Eci. The Vitros T assay is performed using the Vitros TReagent Pack and Vitros Immunodiagnostic Product T calibrators on afully automated random access immunoassay system that used en-hanced chemiluminescence technology with horseradish peroxidase(HRP) as a label and a luminol substrate for signal detection (OrthoClinical Diagnostics, Rochester, NY). The assay depends on competitionbetween T present in a serum sample with an HRP-labeled T conjugatefor binding sites on a biotinylated mouse anti-T antibody. The antigen-antibody complex is then captured by streptavidin in the incubationwells. Following a wash step, the bound HRP conjugate is determinedby a luminescence reaction with a luminol derivative and a peracid salt.The HRP in the bound conjugate catalyzes the oxidation of the luminalderivative, producing a flash of light. An electron transfer reagent ispresent to enhance the level of light produced prolonging its emissionspectra. The amount of HRP conjugate bound is in direct proportion tothe concentration of T present in the sample. Calibration is lot specific,

and the T calibrators are supplied by the manufacturer ready for use. Onboard calibration stability is 28 d. A three-level control was run with eachassay run. The calibration range of the Vitros T assay is 0–2163 ng/dl(0–75 nmol/liter) (calibrated against samples measured by isotope di-lution-gas chromatography/mass spectrometry, ID-GC/MS). The func-tional sensitivity of the Vitros T assay was 14 ng/dl (0.5 nmol/liter) witha between run precision of 2.8% at a concentration of 271 ng/dl (9.4nmol/liter). The reported adult male range was 132–813 ng/dl (4.6–28.2nmol/liter). During the study, the between run CV averaged 3.6%.

DPC Immulite 2000. The Immulite 2000 is an automated, random-accessimmunoassay analyzer with a solid-phase washing process and a chemi-luminescence detection system. The solid phase is made up of a poly-styrene bead enclosed within the Immulite test unit that is coated witha polyclonal rabbit antibody specific for T. The patient’s serum sampleand an alkaline phosphatase-conjugated T reagent are simultaneouslyintroduced into the test unit. During a 60-min incubation period at 37C with intermittent shaking, the T in the serum sample competes withthe enzyme-labeled T for a limited number of antibody binding sites onthe bead. Unbound enzyme conjugate is then removed by a patentedfive-spin-wash technique. The chemiluminescence substrate, a phos-phate ester of adamantyl dioxetane, is added and the test unit incubatedfor 10 min. The substrate is hydrolyzed by the alkaline phosphatase toan unstable anion. The decomposition of the anion yields a sustainedemission of light. The bound complex, corresponding to the photonoutput, is inversely proportional to the concentration of T in the sample.A single determination uses 25 �l of serum, and the dynamic range ofthe Immulite T assay is 14 to 1586 ng/dl (0.5–55 nmol/liter). The func-tional sensitivity for the T assay on this system is 49 ng/dl (1.7 nmol/liter) and the average between run imprecision was 13.7% at a concen-tration of 427 ng/dl (14.8 nmol/liter). The normal range for adult malebetween 20 and 49 yr is reported to be 286-1510 ng/dl (9.9–52.4 nmol/liter). During the study, the between run CV averaged 11.5%.

Data analyses

Because serum T concentrations were not normally distributed, weestimated the median and the 10th, 25th, 75th, and 90th percentiles ofthe values obtained from the different methods. The serum T resultsobtained from the four automated immunoassay systems and the twoRIAs (test methods) were compared with values obtained with theLC-MSMS method to determine the extent of agreement among methods(14). Deming regression was used to estimate the slope and intercept(15). We computed the interclass correlation coefficient (16). Plots of thepercent differences of the values between two methods (test vs. LC-MSMS) vs. the mean of the values generated by the two methods asinitially described by Bland and Altman were used (17–20) to identifyother types of systematic bias.

Of the 122 samples that were distributed, seven were below the LOQin one or more assays, 13 were not analyzed in all assays (inadequatevolume of serum) and one sample was excluded from the analysisbecause the result from one method were one third that of the others(outlier). The data analyses were based on 101 samples. Because theserum T values spanned a large range (�50–1500 ng/dl), our samplesize of 101 samples should provide stable estimates for the measures ofagreement, should not be influenced by individual variables, and shouldbe reproducible in other studies (21). The use of samples from hypogo-nadal men as well as normal men assured that our results would coverthe widest range of possible T values seen in clinical practice in ado-lescent and adult men.

ResultsComparison of median and range

Figure 1 shows the median and the 10th, 25th, 75th, and90th percentiles of the serum T levels measured by the sevendifferent methods. Compared with the median serum Tvalue obtained by LC-MSMS (462 ng/dl), the median valuedetermined by the DPC Immulite was lower (318 ng/dl),whereas the median T result obtained from the Bayer Cen-taur was higher (514 ng/ml). The median serum T levels



determined by DPC-RIA, HUMC-RIA, Roche Elecsys, andOrthoVitros ECi were similar to LC-MSMS at 490, 473, 431,and 431 ng/dl, respectively.

Comparison using regression analyses andcorrelation coefficient

Figure 2 shows the Deming regression analyses for theRIAs and platform analog assays vs. LC-MSMS. Table 3 givesthe slope and intercept of the Deming regression the inter-class correlation coefficient and the 95% confidence intervalfor all parameters. The slope was closest to one between theDPC-RIA and LC-MSMS (1.098), whereas the other assaysranged from 0.881 (DPC Immulite) to 1.217 (Ortho VitrosECi). The intercepts for DPC-RIA and Beyer Centaur are notsignificantly different from zero. The Vitros ECi interceptwas the largest. The interclass correlation coefficient for allmethods was between 0.92 and 0.97. The 95% confidenceintervals for this correlation were 0.63–0.97 and 0.71–0.96 forDPC Immulite and Bayer Centaur, respectively, and ex-ceeded 0.92 for the other four assays.

Assessment of agreement and bias between methods

Figure 3 shows the plots of the percent difference betweeneach method and LC-MSMS against the means of serum Tconcentrations obtained by LC-MSMS and the values ob-tained by each immunoassay. The plots also showed percentdifference � 2 sd (95% limits of agreement). In the quotedadult male range (between 300-1000 ng/dl or 10.4–34.7nmol/liter), agreement of serum T concentrations among thetwo RIAs, Roche Elecsys, Ortho Vitros ECi were within �20% in over 60% of the samples of that measured by LC-MSMS (Fig. 3, A–D, and Table 4). As shown in Fig. 3, theaverage percent difference in serum T levels between DPC-RIA, HUMC-RIA, Roche Elecsys, Ortho Vitros ECi, DPCImmulite and Bayer Centaur and LC-MSMS were �9.7, �9.7,�3.4, �11.2, �18.7, and �15.9%, respectively. The meandifferences in measured serum T levels between DPC-RIA,HUMC-RIA, Roche Elecsys, Ortho Vitros ECi and LC-MSMSwere �48.1 � 7.5, �33.8 � 11.1, 10.8 � 9.6, and �3.5 � 11.2ng/dl, respectively. At serum T levels above the adult ref-erence range, the values obtained by LC-MSMS were lowerthan all the other methods except the results obtained with

the DPC Immulite. It is evident from Fig. 3 that comparedwith LC-MSMS in the adult male reference range, the DPCImmulite assay generally underestimates the serum T values(mean difference �90 � 8.7 ng/dl; Fig. 3E). In contrast, theBayer Centaur overestimates serum T levels (mean differ-ence �99 � 11 ng/dl; Fig. 3F).

The left side of each graph shows more clearly the differ-ences between the methods when serum T levels were con-siderably below the adult male reference range. At valuesless than 100 ng/dl (3.47 nmol/liter), the percent differencebetween DPC-RIA and LC-MSMS varied between �40% and�40% (Fig. 3A). Similarly, the percent difference between Tvalues estimated by Roche Elecsys and LC-MSMS rangedfrom �80 to �40% (Fig. 3C). At low serum T concentrations(�100 ng/dl), the HUMC-RIA was biased in the high direc-tion (�20 to 80%; Fig. 3B) and the Ortho Vitros ECi in the lowdirection (0 to �100%; Fig. 3D). Figure 3E shows that theserum T values at low serum T levels obtained by the DPCImmulite is again systematically biased in the low directionfor serum T values and those measured by the Bayer Centauris systematically biased in the high direction for samples atall T concentrations (Fig. 3F).

For the 102 samples analyzed by all seven methods, Table4 shows the percent of the T values obtained by the varioustest methods that fell outside � 20% of the LC-MSMS values.It can be seen from Table 4 that 19.8, 25.7, 39.6, 39.6, 48.5, and50.4% of the samples fell outside the � 20% range of theLC-MSMS generated serum T value by DPC-RIA, Roche-Elecsys, Ortho Vitros-Eci, HUMC-RIA, Immulite and Bayer,respectively. This difference was especially noted in the sam-ples with T values less than 100 ng/dl (3.47 nmol/liter)obtained by the six different immunoassays, the majority(55.5–90.0% of the samples) fell outside the � 20% range ofthose obtained by LC-MSMS.

Lower limit of quantitation

The LOQ of each assay is listed in Table 2. Seven sampleswere excluded because the serum T values measured by oneor more of the assays were below the LOQ. One sample wasbelow the LOQ of LC-MSMS, HUMC-RIA, Ortho Vitros ECi,and Immulite. Another sample was below the LOQ of all theplatform methods. All seven samples were below the LOQof DPC Immulite, whereas none were below the LOQ byDPC-RIA.

Discussion

In this study, we have compared serum total T levels usingtwo RIAs and four automated analog platform assays againstLC-MSMS as the reference method using the standard op-erating procedures for measuring clinical samples particularto each laboratory. The results indicate that despite an ap-parent good correlation as evidenced by the slope (between0.88 and 1.23) and the interclass correlation coefficients (0.92–0.97) between the immunoassays and LC-MSMS method,there were systematic biases detected in some of the meth-ods. Using Deming’s regression, the DPC-RIA has a slopethat was closest to one as well as a small intercept that wasnot significantly different from zero when compared withLC-MSMS. Others like the DPC-Immulite and the Bayer Cen-

FIG. 1. Median levels of serum T measured by the seven differentmethods. Line within the box represents the median, lower boundaryof box indicates the 25th percentile, and the upper boundary of boxindicates the 75th percentile. Whiskers above and below indicate the90th and 10th percentiles. x, Outlying points.



taur methods showed lower agreement with LC-MSMS witha lower 95% confidence interval of the correlation coefficientof 0.63 and 0.71, respectively. Our results corroborate thoserecently reported by Taieb et al. (22) who demonstrated thatthe serum T measured by GC-MS and 10 immunoassaysshowed correlation coefficients between 0.92 and 0.97 in malesera. They also indicated that only DPC-RIA and three otherplatform immunoassays not examined in our present studygave serum T levels that were not significantly different fromGS-MS. It should be noted that the GC-MS method reportedrequired extraction purification by ethylene-glycol impreg-nated celite chromatography and derivatization of the ste-

roid before quantitation of T from the sample, which is moretime consuming and complicated than our LC-MSMS assay.

Using the method described by Bland and Altman (17–20),which shows the relationship between the mean of LC-MSMS and various values of serum T on the x-axis and thepercent difference the various assays from LC-MSMS valueon the y-axis, the DPC-RIA, HUMC-RIA, Roche Elecsys andBayer Centaur showed that all these methods gave T valueshigher than LC-MSMS, whereas the DPC Immulite and Or-tho Vitros ECi gave lower values. When the individualgraphs were examined, it was shown that values obtained bythe Bayer Centaur showed a bias in the high direction. In

FIG. 2. Deming regression plots of serum T concentrations measured by the six different immunoassays (y-axis) against LC-MSMS (x-axis).



contrast, serum T values obtained by the DPC-Immulite werebiased in the low direction. For both the DPC-RIA andHUMC-RIA the mean serum T was higher by 48 and 34

ng/dl, respectively, when compared with LC-MSMS. Thecomparison of mean serum T results obtained by Roche-Elecsys (�10.8 ng/dl) and Ortho Vitros ECi (�3.5 ng/dl)

FIG. 3. Plots of percentage differences in serum T levels (test minus LC-MSMS) against the average of the two methods. The bold solid linerepresents 0%, the light solid line the mean percentage difference between the methods, and the dashed lines 2 SD of the mean percentage difference.

TABLE 3. The slope and intercept of Deming regression and interclass correlation coefficient for LC-MSMS vs. immunoassays

Slope Intercept Interclass correlationcoefficient

DPC-RIA 1.098 (1.032–1.165) �2.9 (�30.9 to 25.2) 0.968 (0.918–0.984)HUMC-RIA 1.141 (1.076–1.206) �39.2 (�73.7 to �4.2a) 0.948 (0.910–0.967)Roche Elecsys 1.167 (1.112–1.222) �75.5 (�102 to �49.1a) 0.965 (0.939–0.978)Vitros ECi 1.233 (1.136–1.330) �118.4 (�160.5 to �76.4a) 0.954 (0.921–0.971)DCP Immulite 0.881 (0.838–0.924) �28.6 (�49.8 to �7.4a) 0.925 (0.628–0.969b)Bayer Centaur 1.195 (1.112–1.277) �1.4 (�36.8 to 33.9) 0.919 (0.711–0.963b)

Numbers in parentheses are 95% confidence intervals.a Significantly different from zero.b Data not exchangeable with LC-MSMS (see Ref. 16).



were less different from those obtained by LC-MSMS. Thesedifferences in serum T levels are not clinically relevant in theadult male reference range. Using GC-MS as the standardmethod and the Bland-Altman analyses, Taieb et al. (22) alsoreported that Roche Elecsys underestimated serum T levelsthat was not demonstrated in our study, whereas their resultsdemonstrating that Bayer Centaur displayed a positive andDPC-Immulite a negative bias for male sera concurred withour data. They also reported the DPC-RIA displayed no biasin male range but overestimated serum T in the female rangewhich was quite similar with our findings. When the percentdifferences were plotted against the means, using LC-MSMSas the reference method, the largest difference was observedin the serum T concentrations less than 100 ng/dl (3.47 nmol/liter). Again, the values of serum T obtained by DPC Im-mulite were systematically lower and those by the BayerCentaur higher than LC-MSMS. At very low serum T valuescompared with the LC-MSMS method, the HUMC-RIA wasbiased toward the high direction, whereas the Ortho VitrosECi was biased in the low direction. The DPC-RIA and RocheElecsys showed large percent difference both in the high andlow directions. The results indicate that none of the assays asperformed are of sufficient accuracy at low serum T levelsusing LC-MSMS as the gold standard. Our data are similarto the previous findings comparing immunoassays withGC-MS demonstrating that none of the immunoassays testedwas sufficiently reliable for investigation from children andwomen (22). However, from a clinical use perspective, theRIA and some automated methods would be acceptable foruse in adult males even at the very low range (�100 ng/dl,3.47 nmol/liter) as these males would be diagnosed to behypogonadal who would be investigated and treated with T.The RIAs and some of the automated methods may also beacceptable for discerning abnormal elevations in T (above100 ng/ml, 3.47 nmol/liter) in females and prepubertal chil-dren. The dose-response curve of RIAs, immunoradiometricassays, and enzyme-linked immunosorbent assay are non-linear and various curve-fitting methods have been used. Themost common data reduction method in use is the four-parameter logistics model (23–25). Despite use of thesecurve-fitting techniques, only a segment of the standardcurve is linear with relatively low variance. For many im-munoassays, low concentrations of the hormone are mea-sured at a portion of the calibration (standard) curve wherethe variance is larger than that at the more linear portion ofthe calibration (standard) curve. This is not the case forLC-MSMS where the calibration curve is linear. The RIAsdesigned for serum T assays are standardized for use in maleserum and optimized for lower variance in the adult male

range (e.g. HUMC-RIA and DPC-RIA). Because of the highvariance of the immunoassays at low concentrations as il-lustrated by the data from this study, a high proportion ofsamples with serum T values less than 100 ng/dl whenmeasured by various immunoassays were outside of � 20%range of the LC-MSMS values (55.5% for Roche Elecsys andBayer Centaur, 63.6% for DPC-RIA and DPC Immulite, and90.9% for HUMC-RIA and Ortho Vitros ECi). Based on thesedata, we conclude that these assays should be modified toincrease their sensitivity and accuracy at low serum T levelsless than 100 ng/dl (3.47 nmol/liter) to improve their ap-plicability to serum T measurements in prepubertal childrenand female serum. For the RIAs, increased sensitivity can beachieved by adjusting the antibody titer, selecting more spe-cific antibodies, preincubation of the antibodies with the testserum (nonequilibrium), and changing methods for the sep-aration of bound from free hormone. For the automatedplatform assays, the reagents, the time of reaction, and thecapture antibody may be adjusted by the manufacturer toproduce more accurate and precise results in ranges capableof measuring low serum T levels expected for normal womenand children.

From our results, all assays without a relatively large sys-tematic bias for the adult male range (i.e. DPC-RIA, HUMC-RIA, Roche Elecsys and Ortho Vitros ECi) would be accept-able assays for measuring adult male sera. These assayscould also be used for the diagnosis for male hypogonadismusually defined as serum T values less than 300 ng/dl (10.4nmol/liter). For a serum sample in a male with a T concen-tration at or less than 200 ng/dl (6.9 nmol/liter), a methodthat measures serum T above �40% of LC-MSMS values,would give a T value of 280 ng/dl (9.7 nmol/liter) that wouldbe below the normal adult male range of 300 ng/dl. It ishowever essential that each laboratory using their ownmethod establish a reference range specific for subjects ofinterest, for example young adult males, women, prepuber-tal children.

The lower LOQ was 0.69 nmol/liter (20 ng/dl) for theLC-MSMS method when 2 ml of sera was used. This LOQwas similar to a prior report using LC-MSMS in bovine sera(26) and could be lowered by using more sera and revali-dated for female samples. For the DPC Immulite, seven of 122samples were below the LOQ. DPC-RIA gave readings abovethe LOQ for all these seven samples and LC-MSMS andHUMC-RIA each reported one sample below the LOQ. Itshould be noted that in this comparison study a standardvolume of serum was used as routinely performed for eachassay. In laboratory practice, more serum could be used insome of these assays to bring the LOQ to a lower threshold.

TABLE 4. Samples with serum T values determined by the six assays outside of �20% range of LC-MSMS values

DPC-RIA HUMC-RIA Roche Elecsys Ortho-Vitros ECi DPC-Immulite Bayer Centaur

Number of samples� �20% of LC-MSMS 3 12 19 25 45 5� �20% of LC-MSMS 17 28 7 6 4 46

Samples outside � 20% ofLC MSMS values (%)

All T values 20/101 (19.8%) 40/101 (39.6%) 26/101 (25.7%) 40/101 (39.7%) 49/101 (48.5%) 51/101 (50.4%)T value �100 ng/dl 7/11 (63.6) 10/11 (90.9%) 6/11 (55.5%) 10/11 (90.9%) 7/11 (63.6%) 6/11 (55.5%)



If more serum were used in the assays, validation studieswould need to be done to ensure that increasing the amountof serum would not affect the characteristics of the assay.

Because of the limitation of the volume of serum availablefor this study, the values obtained by LC-MSMS were basedon a single sample that was taken through extraction, LCfollowed by mass spectroscopy. Despite this limitation, theLC-MSMS assay underwent vigorous validation with a lin-ear calibration curve spanning 20–2000 ng/dl, accuracy be-tween 96.6 and 110.4% and precision of less than 10% at allpoints except for the LOQ results (8). The range of serum Tvalues obtained in 17 normal men ages 18–50 yr in this studywas 302–905 ng/dl by the LC-MSMS T method.

As shown in the College of American Pathologists qualitycontrol program, the four instrument-based assays we eval-uated were some of the commonest used by laboratoriesparticipating in this program. The DPC-RIA (DPC-Coat-a-Count) is the most common RIA used in hospital or referencelaboratories and appears to show the best agreement withserum T values measured in male serum by LC-MSMS. TheRIAs used by the Penn State-Hershey Medical Center (DPC-RIA) and the HUMC-RIA were both fully validated accord-ing to standard procedures recommended (11). The HUMC-RIA uses an extraction step. An internal standard was notused to monitor procedural losses because during initialvalidation this was found not to improve assay performance.Possibly because of this reason, the HUMC-RIA had a higherLOQ and higher interassay and intraassay variability thanthe DPC-RIA. The medians for all the evaluable serum Tvalues were 490 and 473 ng/dl for DPC-RIA and HUMC-RIA, respectively. The correlation coefficient between thetwo RIAs was 0.964 and Deming’s regression with T valuesmeasured by HUMC-RIA on the vertical axis showed a slopeof 1.05 and an intercept of �85.6 ng/ml (data not shown).There was no systematic bias between the two RIAs, andthese two assays also gave similar adult male range.

The automated assay instruments are widely used in clin-ical and reference laboratories. Our comparison results in-dicate that the DPC Immulite gives T values that are biasedin the low direction. This assay also had a high LOQ (49ng/dl). The normal range given by the manufacturer (286–1510 ng/dl) had a similar low male reference range as othermethods but with an extremely high upper limit. This sug-gests that the adult male range might not have been gener-ated by each laboratory and both the lower and the upperlimit of the reference range might have to be adjusted. TheBayer Centaur assay on the other hand showed a systematicbias toward higher serum T levels when compared withLC-MSMS. Despite this bias toward higher values, the ref-erence range for adult men with this instrument is reportedas 241–827 ng/dl. This range obtained from the manufac-turer should be validated in each laboratory that uses thisinstrument with an adequate number of adult healthy malesamples as suggested by Shah et al. (11). Our study suggeststhat the reference range quoted by the manufacturer may beinappropriate for individual laboratories and the determi-nation of reference ranges for male, female, and children’sserum should be determined by each laboratory using thismethod.

We conclude that using LC-MSMS as our gold standard for

estimating serum T levels in male serum, the DPC-RIA, theRoche Elecsys, the Ortho Vitros ECi, and HUMC-RIA gaveresults that are within the clinically acceptable limits of �20% of the reference method in over 60% of the samples. Atlow T concentrations (�100 ng/dl), HUMC-RIA is biasedtoward higher values, whereas the Ortho Vitros ECi resultsare biased toward lower values. The DPC Immulite methodshowed a systematic bias in the low direction, whereas theBayer Centaur was biased in the high direction for serum Tlevels at all concentrations. In this study, the DPC-RIA andRoche Elecsys methods for determining serum T levels showthe closest correlation with values determined by LC-MSMS.Without modification, none of the automated methods arecurrently acceptable for the measurement of T in the serumof normal females or children. These methods lack adequateprecision, accuracy, and have a sufficiently low limit of quan-titation to preclude their use in these populations. Becausefree T measurements either directly by equilibrium dialysis,from bioavailable T calculations or from a total T to sexhormone binding globulin ratio are dependent on an accu-rate T measurement, the results of this study has significantimplications on free T determinations as well (27).

Acknowledgments

The authors thank Nancy Berman, Ph.D., for her advice with thestatistical analyses. This study would not have been possible without theeffort of Andrew Leung, HTC, who coordinated all the samples and theassays at the HUMC. We thank Alfred De Leon from the UCLA OlympicAnalytical Laboratory for the excellence in analytical chemistry. ChrisHamilton from the Penn State Core Endocrine Laboratory at Hersheykindly blinded all of the samples for the different assay methods andshipped samples to the individual clinical laboratories for analysis. Wealso thank those laboratories who performed T analysis on the auto-mated platforms: Drs. Peter Wilding and Marilyn Senior at the Univer-sity of Pennsylvania, William Pepper Laboratories, Dr. Carolyn Feld-kamp at Henry Ford Hospital, and Dr. Bette Seamonds at Mercy HealthSystems. We thank Laura Hull, B.A., who managed the database andwas responsible for the graphic presentations, and Sally Avancena,M.A., for preparing the manuscript.

Received July 24, 2003. Accepted September 29, 2003.Address all correspondence and requests for reprints to: Christina

Wang, M.D., UCLA School of Medicine, General Clinical Research Cen-ter, Box 16, 1000 West Carson Street, Torrance, California 90502.

This work was supported by the Core Endocrine Laboratory at PennState-Hershey Medical Center; National Institutes of Health (NIH) GrantMO1 RR00543 to the GCRC at Harbor-UCLA Medical Center; NIHGrants RO1 CA 71053 and RO1 DK 61006 (to C.W., D.H.C., and R.S.S.);and United States Anti-Doping Agency (to D.H.C.). The samples werecollected by the nurses of the Harbor-UCLA GCRC, supported by NIHGrant MO1 RR00425.

References

1. Furuyama S, Mayes D, Nugent C 1970 A radioimmunoassay for plasmatestosterone. Steroids 16:415–428

2. Chen J, Zoru E, Hallberg M, Wieland R 1971 Antibodies to testosterone-3-bovine serum albumin applied to assay of serum 17-�-ol androgens. Clin Chem17:581–584

3. Dufan M, Catt K, Tsuruhara T, Ryan D 1972 Radioimmunoassay of plasmatestosterone. Clin Chem Acta 37:109–116

4. Wang C, Youatt G, O’Connor S, Dulmanis A, Hudson B 1974 A simpleradioimmunoassay for plasma testosterone plus 5 � dihydrotestosterone. JSteroid Biochem 5:551–555

5. Larsen PR, Kronenberg HM, Melmed S, Polonsky K 2002 William’s textbookof endocrinology, reference values. 10th ed. Philadelphia: Saunders

6. Dorgan JF, Fears TR, McMahon RP, Friedman LA, Patterson BH, GreenhutSF 2002 Measurement of steroid sex hormones in serum: a comparison ofradioimmunoassay and mass spectrometry. Steroids 67:151–158



7. Gelpi E 1995 Biochemical and biochemical applications of liquid chromatog-raphy-mass spectrometry. J Chromatogr A 703:59–80

8. Sheffield-Moore M, Urban RJ, Wolf SE, Jiang J, Catlin DH, Herndon DN,Wolfe RR, Ferrando AA 1999 Short-term oxandrolone administration stim-ulates net muscle protein synthesis in young men. J Clin Endocrinol Metab84:2705–2711

9. Starcevic B, DiStefano E, Wang C, Catlin DH 2003 An LC-MS-MS assay forhuman serum testosterone and deuterated testosterone. J Chromatogr B Ana-lyt Technol Biomed Life Sci 792:197–204

10. Lang JR, Bolton S 1991 A comprehensive method of validation strategy forbioanalytical applications in the pharmaceutical industry-2. Statistical analy-ses. J Pharm Biomed Anal 9:435–442

11. Shah VP, Midha KK, Findlay JWA, Hill HM, Hulse JD, McGilveray IJ,McKay G, Miller JJ, Patnaik RN, Powell ML, Tonelli A, Viswanathan CT,Yacobi A 2000 Bioanalytic method validation—a revisit with a decade ofprogress. Pharm Res 17:1551–1557

12. Wang C, Berman N, Longstreth JA, Chuapoco B, Hull L, Steiner S, FaulknerS, Dudley RE, Swerdloff RS 2000 Pharmacokinetics of transdermal testos-terone gel in hypogonadal men: application of gel at one site versus four sites:a general clinical research center study. J Clin Endocrinol Metab 85:964–969

13. Swerdloff RS, Wang C, Cunningham G, Dobs A, Iranmanesh A, MatsumotoA, Snyder P, Weber T, Berman N, and T gel Study Group 2000 Comparativepharmacokinetics of two doses of transdermal testosterone gel versus testos-terone patch after daily application for 180 days in hypogonadal men. J ClinEndocrinol Metab 85:4500–4510

14. Magari RT 2000 Evaluating agreement between two analytical methods inclinical chemistry. Clin Chem Lab Med 38:1021–1025

15. Linnet K 1998 Performance of Deming regression analysis in case of mis-specified analytic error ratio in method comparison studies. Clin Chem 44:1024–1031

16. Perisic I, Rosner B 1999 Comparison of measures of interclass correlation: thegeneral case of unequal group size. Stat Med 18:1451–1466

17. Bland JM, Altman DG 1986 Statistical method for assessment of agreementbetween two methods of clinical measurement. Lancet i:307–310

18. Bland JM, Altman DG 1999 Measuring agreement in method comparisonstudies. Stat Methods Med Res 8:135–160

19. Pollock MA, Jefferson SG, Kane JW, Lomax K, Mackinnon G, Winnard CB1992 Method comparison—a different approach. Ann Clin Biochem 29:556–560

20. Dewitte K, Fierens C, Stockl D, Thienport LM 2002 Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical inves-tigation of its practice. Clin Chem 48:799–801

21. Linnet K 1999 Necessary sample size for method comparison studies based onregression analysis. Clin Chem 45:882–894

22. Taieb J, Mathian B, Millot F, Patricot M-C, Mathieu E, Queyrel N, LacroixI, Somma-Delpero C, Boudou P 2003 Testosterone measured by 10 immu-noassays and by isotope-dilution gas chromatography-mass spectrometry insera from 116 men, women, and children. Clin Chem 49:1381–1395

23. Rodbard D, Lenox RH, Wray RL, Ramseth D 1976 Statistical characterizationof random errors in the radioimmunoassay dose-response variable. Clin Chem22:350–358

24. Dudley RA, Edwards P, Ekins RP, Finney DJ, McKenzie IG, Raab GM,Rodbard D, Rodgers RP 1985 Guidelines for immunoassay processing. ClinChem 31:1264–1271

25. Guardabasso V, Rodbard D, Munson PJ 1987 A dose-response analysis. Am JPhysiol 252:E357–E364

26. Draisci R, Palleschi L, Ferretti E, Lucentini L, Cammarata P 2000 Quantitationof anabolic hormones and their metabolites in bovine serum and urine byliquid chromatography-tandem mass spectrometry. J Chromatogr A 870:511–522

27. Vermeulen A, Verdouck L, Kaufman JM 1999 A critical evaluation of simplemethods for the estimation of free testosterone in serum. J Clin EndocrinolMetab 84:3666–3672

JCEM is published monthly by The Endocrine Society (http://www.endo-society.org), the foremost professional society serving theendocrine community.



Long-Term Pharmacokinetics of TransdermalTestosterone Gel in Hypogonadal Men*

RONALD S. SWERDLOFF, CHRISTINA WANG, GLENN CUNNINGHAM,ADRIAN DOBS, ALI IRANMANESH, ALVIN M. MATSUMOTO, PETER J. SNYDER,THOMAS WEBER, JAMES LONGSTRETH, NANCY BERMAN, AND THE

TESTOSTERONE GEL STUDY GROUP†

Divisions of Endocrinology, Departments of Medicine/Pediatrics, Harbor-University of California-LosAngeles Medical Center and Research and Education Institute (R.S.S., C.W., N.B.), Torrance,California 90509; Veterans Affairs Medical Center, Baylor College of Medicine (G.C.), Houston, Texas77030; The Johns Hopkins University (A.D.), Baltimore, Maryland 21287; Veterans Affairs MedicalCenter (A.I.), Salem, Virginia 24153; Veterans Affairs Puget Sound Health Care System, University ofWashington (A.M.M.), Seattle, Washington 98108; University of Pennsylvania Medical Center (P.J.S.),Philadelphia, Pennsylvania 19104; Duke University Medical Center (T.W.), Durham, North Carolina27705; Unimed Pharmaceuticals, Inc. (J.L.), Deerfield, Illinois 60015

ABSTRACTTransdermal delivery of testosterone (T) represents an effective

alternative to injectable androgens. Transdermal T patches normal-ize serum T levels and reverse the symptoms of androgen deficiencyin hypogonadal men. However, the acceptance of the closed system Tpatches has been limited by skin irritation and/or lack of adherence.T gels have been proposed as delivery modes that minimize theseproblems. In this study we examined the pharmacokinetic profilesafter 1, 30, 90, and 180 days of daily application of 2 doses of T gel (50and 100 mg T in 5 and 10 g gel, delivering 5 and 10 mg T/day,respectively) and a permeation-enhanced T patch (2 patches deliv-ering 5 mg T/day) in 227 hypogonadal men. This new 1% hydroalco-holic T gel formulation when applied to the upper arms, shoulders,and abdomen dried within a few minutes, and about 9–14% of the Tapplied was bioavailable. After 90 days of T gel treatment, the dosewas titrated up (50 mg to 75 mg) or down (100 mg to 75 mg) if thepreapplication serum T levels were outside the normal adult malerange. Serum T rose rapidly into the normal adult male range on day1 with the first T gel or patch application. Our previous study showedthat steady state T levels were achieved 48–72 h after first applicationof the gel. The pharmacokinetic parameters for serum total and freeT were very similar on days 30, 90, and 180 in all treatment groups.After repeated daily application of the T formulations for 180 days, the

average serum T level over the 24-h sampling period (Cavg) was high-est in the 100 mg T gel group (1.4- and 1.9-fold higher than the Cavgin the 50 mg T gel and T patch groups, respectively). Mean serumsteady state T levels remained stable over the 180 days of T gelapplication. Upward dose adjustment from T gel 50 to 75 mg/day didnot significantly increase the Cavg, whereas downward dose adjust-ment from 100 to 75 mg/day reduced serum T levels to the normalrange for most patients. Serum free T levels paralleled those of serumtotal T, and the percent free T was not changed with transdermal Tpreparations. The serum dihydrotestosterone Cavg rose 1.3-fold abovebaseline after T patch application, but was more significantly in-creased by 3.6- and 4.6-fold with T gel 50 and 100 mg/day, respec-tively, resulting in a small, but significant, increase in the serumdihydrotestosterone/T ratios in the two T gel groups. Serum estradiolrose, and serum LH and FSH levels were suppressed proportionatelywith serum T in all study groups; serum sex hormone-binding globulinshowed small decreases that were significant only in the 100 mg T gelgroup. We conclude that transdermal T gel application can efficientlyand rapidly increase serum T and free T levels in hypogonadal mento within the normal range. Transdermal T gel provided flexibility indosing with little skin irritation and a low discontinuation rate.(J Clin Endocrinol Metab 85: 4500–4510, 2000)

THE SKIN IS an attractive route for systemic delivery ofsteroids. Transdermal preparations of testosterone (T)

provide a useful delivery system for normalizing serum Tlevels in hypogonadal men and preventing the clinical symp-toms and long-term effects of androgen deficiency (1–5).Currently available transdermal patches are applied to the

scrotal skin (Testosderm) or to other parts of the body (An-droderm and Testoderm TTS). The former requires prepa-ration of the scrotal skin with hair clipping or shaving tooptimize adherence of the patches. The permeation-enhanced T patch (Androderm) is associated with skin irri-tation in about a third of the patients, and 10–15% of subjectshave been reported to discontinue the treatment because of

Received December 28, 1999. Revision received March 25, 2000. Re-revision received June 30, 2000. Accepted August 30, 2000.

Address all correspondence and requests for reprints to: ChristinaWang, M.D., General Clinical Research Center, Harbor-University ofCalifornia-Los Angeles Medical Center, 1000 West Carson Street, Tor-rance, California 90509-2910. E-mail: [email protected].

* This work was supported by grants from Unimed Pharmaceuticals,Inc. The work at Harbor-University of California-Los Angeles MedicalCenter was supported by NIH Grant M01-RR-00425 to the GeneralClinical Research Center. The work at Duke University Medical Centerwas performed at the General Clinical Research Center supported byNIH Grant M01-RR-0030.

† The Testosterone Gel Study Group includes: S. Berger, The ChicagoCenter for Clinical Research (Chicago, IL); E. Dula, West Coast ClinicalResearch (Van Nuys, CA); J. Kaufman, Urology Research Options (Au-rora, CO); G. P. Redmond, Center for Health Studies (Cleveland, OH);S. Scheinman and H. W. Hutman, South Florida Bioavailability Clinic(Miami, FL); S. L. Schwartz, Diabetes and Glandular Disease Clinic, P.A.(San Antonio, TX); C. Steidle, Northeast Indiana Research (Fort Wayne,IN); J. Susset, MultiMed Research (Providence, RI); G. Wells, AlabamaResearch Center, L.L.C. (Birmingham, AL); and R. E. Dudley, S.Faulkner, N. Rehousky, G. Ringham, W. Singleton, and K. Zunich,Unimed Pharmaceuticals, Inc. (Deerfield, IL).

0021-972X/00/$03.00/0 Vol. 85, No. 12The Journal of Clinical Endocrinology & Metabolism Printed in U.S.A.Copyright © 2000 by The Endocrine Society

4500


chronic skin irritation (6, 7). Preapplication of corticosteroidcream at the site of application of the Androderm patch hasbeen reported to decrease the incidence and severity of theskin irritation (8). The most recently approved nonscrotal Tpatch (Testoderm TTS) causes less skin irritation (itching inabout 12% and erythema in 3% of the subjects), but adherenceof the patch to the skin poses a problem in some subjects (9,10). Despite these limitations of local irritation and adherenceto skin, the various T patches provide a steady state deliveryof T to the circulation that mimics the normal diurnal rhythmof serum T at the low to mid normal adult male range (11–17).The long-term use of these transdermal androgen deliverypatches has been shown to be efficacious in maintainingsexual function, secondary sexual characteristics, and boneand muscle mass in hypogonadal young and elderly men (5,18–21).

T and other steroids can also be applied to the skin in opensystems. When T is applied to the skin surface as a hydroal-coholic gel, the gel dries rapidly, and the steroid is absorbedinto the stratum corneum, which serves as a reservoir. Thereservoir in the skin releases T into the circulation slowlyover several hours, resulting in steady state serum levels ofthe hormones (22). Our previous short-term (7–14 days)pharmacokinetic studies of both T and 5a-dihydrotestoster-one (DHT) transdermal hydroalcoholic gels showed that theandrogens were absorbed, and peak levels of the appliedandrogens occurred 18–24 h after initial application. Withcontinued application of the gel for 7–14 days, steady serumlevels of androgens were maintained (23, 24). About 9–14%of the T in the gel applied to the skin is bioavailable (24). Wealso demonstrated that application of the T gel (100 mg/day)at a single site or four separate sites resulted in serum T levelsat the upper limit of the normal range, with about 23% higherserum levels when the gel was applied at four sites. In the 7-to 14-day studies, neither T nor DHT gel produced skinirritation in the small number of subjects studied (23, 24). Inthe present study we investigated the detailed pharmacoki-netics and tolerability of T gel (AndroGel) at two dosages (50and 100 mg/day) and T patch after repeated daily dosing for180 days in a large number of hypogonadal men (n 5 227)recruited from 16 centers across the United States.

Subjects and MethodsSubjects

Two hundred and twenty-seven hypogonadal men were recruited,randomized, and studied in 16 centers in the United States. About onethird of the subjects were randomized into each treatment group (Table1). The patients were between 19–68 yr of age and had single morningserum T levels at screening of 10.4 nmol/L (300 ng/dL) or less. Thescreening serum T concentrations were measured at each center’s clin-ical laboratory. Previously treated hypogonadal men were withdrawnfrom T ester injection for at least 6 weeks and from oral or transdermalandrogens for 4 weeks before the screening visit. Aside from the hy-pogonadism, the subjects were in good health, as evidenced by medicalhistory, physical examination, complete blood count, urinalysis, andserum biochemistry. If the subjects were taking lipid- lowering agentsor tranquilizers, the doses were stabilized for at least 3 months beforeenrollment. The subjects had no history of chronic medical illness oralcohol or drug abuse. The subjects had a normal rectal examination, aprostate-specific antigen level of less than 4 ng/mL, and a urine flow rateof more than 12 mL/s before enrollment to the study. They were ex-cluded if they had a generalized skin disease that might affect T ab-sorption or a prior history of skin irritability with the nonscrotal T patch(Androderm). Subjects with body weight of less than 80 or more than140% of ideal body weight and subjects taking medications known toalter the cytochrome P450 enzyme systems were also excluded from thisstudy.

T gel and patch

T gel (AndroGel) was manufactured by Besins Iscovesco (Paris,France) and supplied by Unimed Pharmaceuticals, Inc. (Deerfield, IL).The formulation is a hydroalcoholic gel containing 1% T (10 mg/g). Wehave previously shown that about 9–14% of the steroid in the gel appliedis available to the body. Thus, 10 g gel applied to the skin contain 100mg T and delivers approximately 10 mg T to the body (23, 24). Ap-proximately 250 g gel were packaged in multidose glass bottles thatdelivered 2.27 g gel for each actuation of the pump. Patients assigned tothe 50 mg T in 5 g gel group were given one bottle of T gel and one bottleof placebo gel (vehicle only); those assigned to the 100 mg T in 10 g gelwere dispensed two bottles of the active T gel. All patients applied T gelor placebo gel at four separate sites each day (right and left upperarms/shoulders and right and left abdomen). On day 1 of the study, thepatients were instructed to depress the pump of one of the bottles once,and the gel was applied to the right upper arm/shoulder. Then, usingthe same bottle, a second dose of gel was delivered and applied to theleft upper arm/shoulder. The second bottle was then used with theactuation of the pump for gel to be applied to the right abdomen andthe second actuation to the left abdomen. On the following day, theapplication sites were reversed. Alternate application sites continuedthroughout the study. After application of the gel to the skin, the gel

TABLE 1. Baseline characteristics of the hypogonadal men

Treatment group T patch(5 mg/day)

T gel(50 mg/day)

T gel(100 mg/day)

No. of subjects enrolled 76 73 78Age (yr) 51.1 51.3 51.0Range (yr) 28–67 23–67 19–68Ht (cm) 179.3 6 0.9 175.8 6 0.8 178.6 6 0.8Wt (kg) 92.7 6 1.6 90.5 6 1.8 91.6 6 1.5Serum T (nmol/L) at screeninga 6.40 6 0.41 6.44 6 0.39 6.49 6 0.37Causes of hypogonadism

Primary hypogonadism 34 26 34Secondary hypogonadism 15 17 12Aging 6 13 6Normogonadotropic hypogonadism 21 17 26

Yr diagnosed 5.8 6 1.1 4.4 6 0.9 5.7 6 1.24No. previously treated with T (%) 50 (65.8) 38 (52.1) 46 (59.0)Duration of treatment (yr) 5.8 6 1.0 5.4 6 0.8 4.6 6 0.7

a Screening serum T concentrations were measured before enrollment in each study center’s clinical laboratory and not at the centrallaboratory.

PHARMACOKINETICS OF T GEL 4501


dried within a few minutes. The patients washed their hands with soapand water thoroughly after gel application. After 90 days the subjectstitrated to the 75 mg/day T gel dose were supplied with three bottles,one containing placebo and two containing T gel. The subjects wereinstructed to apply one actuation from the placebo bottle and threeactuations from the T gel bottle to four different sites of the body asdescribed above.

T patches (Androderm) were provided, each delivering 2.5 mg/dayT, which is the recommended replacement dose for androgen replace-ment therapy. The patients were instructed to apply two T patches to aclean dry area of skin on the back, abdomen, upper arms, or thighs onceper day. Application sites were rotated, with an approximately 7-dayinterval between applications to the same site. T gel or patches wereapplied at approximately 0800 h each morning for 180 days.

In the T gel group, treatment compliance was estimated as the per-centage of T gel actually used compared with the theoretical amount ofT gel that could have been used. The actual amount of T gel used wasmeasured as the difference in weight of the dispensed and returned Tgel bottles. The theoretical weight of T gel that could have been used wascalculated as 2.27 g/actuation 3 days in study 3 2, 3, or 4 actuationsdepending on whether the dose of T gel was 50, 75, or 100 mg, respec-tively. In the T patch group, the actual number of patches used wascompared with the theoretical number that could have been used cal-culated as days in study 3 2 patches/day.

Study design

The study is a randomized, multicenter (16 centers), parallel studyincluding 2 doses of T gel and a single dose of T patches. A placebo groupwas not included because 6-month placebo treatment of hypogonadalmen was not believed to be justifiable, as untreated hypogonadism willresult in impaired libido, decreased strength, bone mineral loss, andother clinical defects. The study was double blinded until day 90 withrespect to the T gel groups and open label for the T patch group. For thefirst 3 months of the study (days 1–90), the subjects were randomizedto receive 50 mg/day T gel (in 5 g gel delivering about 5 mg T/day), 100mg/day T gel (in 10 g gel delivering about 10 mg T/day), or 2 patchesdelivering 5 mg T/day (T patch). In the following 3 months (days91–180), the subjects were administered 1 of the following treatments:50 mg/day T gel, 100 mg/day T gel, 75 mg/day T gel, or 5.0 mg/dayT patch. Patients who were applying T gel had a single, preapplicationserum T measurement made on day 60; if the levels were within thenormal range (10.4–34.7 nmol/L; 300-1000 ng/dL), they remained ontheir original dose. Men with T levels at 60 days of treatment less than10.4 nmol/L and who were applying 50 mg T gel and those with T levelsmore than 34.7 nmol/L who had received 100 mg T gel were thenassigned to the 75 mg/day T gel group for days 91–180. No changes indose were made to subjects randomized to T patch.

On days 0, 1, 30, 90, and 180 subjects had multiple blood samples forT and free T measurements at 30, 15, and 0 min before and 2, 4, 8, 12,16, and 24 h after T gel or patch application. Brief history and physicalexaminations were performed, and any complaints or adverse eventswere documented in the subject’s records. In addition, subjects returnedto each study center on days 60, 120, and 150 for a single blood samplingbefore application of the gel or patch. Serum DHT, estradiol (E2), FSH,LH, and sex hormone-binding globulin (SHBG) were measured in sam-ples collected before gel or patch application on days 0, 30, 60, 90, 120,150, and 180. Sera for hormones were stored frozen at 220 C until assay.All samples for a patient for each hormone were measured in the sameassay whenever possible. In addition, the subjects were examined forany adverse effects and skin irritation.

Hormone assays

Except for the screening serum T concentration, which was measuredat each center’s clinical laboratory, all hormone assays were performedat the Endocrine Research Laboratory of the Harbor-University of Cal-ifornia-Los Angeles Medical Center. Serum T levels were measured afterextraction with ethyl acetate and hexane by a specific RIA using reagentsfrom ICN Biomedicals, Inc. (Costa Mesa, CA). The cross-reactivities ofthe antiserum used in the T RIA were 2.0% for DHT, 2.3% for andro-stenedione, 0.8% for 3b-androstanediol, 0.6% for etiocholanolone, andless than 0.01% for all other steroids tested. The lower limit of quanti-

tation of serum T measured by this assay was 0.87 nmol/L (25 ng/dL).The mean accuracy (recovery) of the T assay, determined by spikingsteroid free serum with varying amounts of T (0.9–52 nmol/L), was104% (range, 92–117%). The intra- and interassay coefficients of the Tassay were 7.3% and 11.1% at the normal adult male range, which in ourlaboratory was 10.33–36.17 nmol/L (298–1043 ng/dL). Serum free T wasmeasured by RIA of the dialysate after an overnight equilibrium dialysis,using the same RIA reagents as in the T assay. The lower limit ofquantitation of serum free T using this equilibrium dialysis method wasestimated to be 22 pmol/L. When steroid-free serum was spiked withincreasing doses of T in the adult male range, increasing amounts of freeT were recovered, with a coefficient of variation that ranged from 11–18.5%. The intra- and interassay precisions of free T were 15% and 16.8%,respectively, for adult normal male values (121–620 pmol/L, 3.48–17.9ng/dL).

Serum DHT was measured by RIA after potassium permanganatetreatment of the sample followed by extraction. The methods and re-agents of the DHT assay were provided by Diagnostic Systems Labo-ratories, Inc. (Webster, TX). The cross-reactivities of the antiserum usedin the RIA for DHT were 6.5% for 3b-androstanediol, 1.2% for 3a-androstanediol, 0.4% for 3a-androstanediol glucuronide, 0.4% for T(after potassium permanganate treatment and extraction), and less than0.01 for other steroids tested. This low cross-reactivity against T wasfurther confirmed by spiking steroid free serum with T (35 nmol/L, 1000ng/dL) and taking the samples through the DHT assay. The results evenon spiking with over 35 nmol/L T were less than 0.1 nmol/L DHT. Thelower limit of quantitation of serum DHT in this assay was 0.43 nmol/L.All values below this value were reported as less than 0.43 nmol/L. Themean accuracy (recovery) of the DHT assay, determined by spikingsteroid free serum with varying amounts of DHT from 0.43–9 nmol/L,was 101% (range, 83–114%). The intra- and interassay coefficients ofvariation for the DHT assay were 7.8% and 16.6%, respectively, for theadult male range, which in our laboratory was 1.06–6.66 nmol/L (30.7–193.2 ng/dL).

Serum E2 levels were measured by a direct assay without extractionwith reagents from ICN Biomedicals, Inc. The intra- and interassaycoefficients of variation of E2 were 6.5% and 7.1%, respectively, fornormal adult male range (E2, 63–169 pmol/L, 17.1–46.1 pg/mL). Thelower limit of quantitation of the E2 was 18 pmol/L. All values belowthis value were reported as 18 pmol/L. The cross-reactivities of the E2antibody were 6.9% for estrone, 0.4% for equilenin, and less than 0.01%for all other steroids tested. The accuracy of the E2 assay was assessedby spiking steroid free serum with an increasing amount of E2 (18–275pmol/L). The mean recovery of E2 compared with the amount addedwas 99.1% (range, 95–101%).

Serum SHBG levels were measured by assay kits obtained from Delfia(Wallac, Inc., Gaithersburg, MD). The intra- and interassay precisionswere 5% and 12%, respectively, for the adult normal male range (10.8–46.6 nmol/L). Serum FSH and LH were measured by highly sensitiveand specific fluoroimmunometric assays with reagents provided byDelfia (Wallac, Inc., Gaithersburg, MD). The intraassay coefficient ofvariations for LH and FSH fluoroimmunometric assays were 4.3% and5.2%, respectively, and the interassay variations for LH and FSH were11.0% and 12.0%, respectively (adult normal male range: LH, 1.0–8.1U/L; FSH, 1.0–6.9 U/L). For both LH and FSH assays, the lower limitof quantitation was 0.2 IU/L. All samples obtained from the same subjectwere measured in the same assay.

Statistical analyses

Descriptive statistics for each of the hormone levels were calculated.Before analysis, each variable was examined for its distributional char-acteristics and, if necessary, transformed to meet the requirements of anormal distribution. There were no significant differences between thestudy sites on any of the parameters; therefore, the data presented werepooled for all of the centers. The pharmacokinetic parameters for eachfull sampling day were determined by noncompartmental methods. Thepharmacokinetics of T gel were assessed using the area under the curvefrom 0–24 h (AUC0–24) generated by the 24 h of multiple blood samplingfor T on days 1, 30, 90, and 180. The AUC was computed using the lineartrapezoid method. The average T concentration over the 24 h after gelapplication (Cavg) was calculated as the AUC0–24 divided by 24 h.

All data in the figures and tables show the treatment mean (6sem)

4502 SWERDLOFF ET AL. JCE & M • 2000Vol. 85 • No. 12


by time and/or day for each of the three groups of subjects based on thetreatment from days 0–90 and for each of the five groups from days91–180. However, because the final treatment groups (five groups) forthe subjects receiving T gel were no longer randomized, statistical com-parisons between groups were only performed until day 90 using theoriginal treatment assignments (50 or 100 mg T gel or patch) as theindependent groups. Comparisons between groups were performedusing one-way ANOVA or the Kruskal-Wallace test (accumulation ratio,fluctuation index) followed by posttest contrasts. Analysis of the effectswas performed using repeated measures ANOVA. The x2 test was usedto compare rates. Analyses of change from day 0 to day 180 withintreatment groups were performed within each of the five groups basedon pattern using paired t tests. Comparisons resulting in P # 0.05 wereconsidered statistically significant. SAS version 6.12 was used for allanalyses (SAS Institute, Inc., Chicago, IL).

ResultsSubjects

A total of 227 patients were enrolled: 73, 78, and 76 wererandomized to the 50 mg/day T gel (T gel 50), 100 mg/dayT gel (T gel 100), and T patch groups, respectively (Table 1).There were no significant differences in the patients’ char-acteristics at baseline (height, weight, and previous T treat-ment). Thirty-five to 45% of the patients in each treatmentgroup had primary hypogonadism (Klinefelter’s syndrome,anorchia, testicular failure); 15–25% had well defined sec-ondary hypogonadism (Kallman’s syndrome, hypothalamicpituitary disease, pituitary tumor). The other patients hadlow serum T and normal or low normal LH levels. Thesewere ascribed to aging (based on age .60 yr), or normogo-nadotropic hypogonadism. These patients did not have brainimaging to exclude hypothalamic-pituitary disease. Theirprimary physician did not deem that brain scans were in-dicated. After completion of day 90, 55 of the subjects in theT patch, 67 in the T gel 50, and 73 in the T gel 100 groupsagreed to continue for another 3 months (days 91–180). Thediscontinuation rate (21 of 76, 27.6%) in the T patch groupwas higher (P 5 0.0002) than those in the T gel groups (50 mg:6 of 73, 8.2%; 100 mg: 5 of 78, 6.4%). Most of the discontin-uation in the T patch group was due to adverse skin reactionbased on the subjects’ complaints and records. After 90 daysof treatment, patients randomized initially to the T gelgroups had dose adjustment if their preapplication serum Tlevel was below 10.4 or above 34.7 nmol/L on day 60. Twentysubjects who had received 50 mg/day T gel had their doseincreased to 75 mg/day; 20 who had received 100 mg/dayT gel decreased their dose to 75 mg/day. The exceptionswere 1 100 mg T gel patient who was adjusted to 50 mg/dayand 1 50 mg T gel patient who decreased the dose to 25mg/day. Before approval of the long-term follow-up study,3 patients who were receiving T patch until day 90 wereswitched to T gel 50 from days 91–180 because of skin irri-tation from the patches. The data for these 3 patients as wellas for the single subject who was changed from 100 to 50mg/day were analyzed as the T gel 50 group from days91–180. The number of subjects enrolled in the study fromdays 91–180 was 195, with 51 receiving T gel 50, 40 receivingT gel 75, 52 receiving T gel 100, and 52 continuing on thepatch.

Treatment compliance

From days 1–90, the mean treatment compliance rateswere 89.8%, 93.1%, and 96.0% for the T patch, T gel 50, andT gel 100 groups, respectively. During days 1–180 (the6-month study period), the mean compliance rate was 86.3%for the T patch and 93.3%, 111.4%, and 96.5% for the 50, 75,and 100 mg/day T gel groups, respectively.

Pharmacokinetics of serum T concentrations (Table 2 andFig. 1)

At baseline (day 0) average serum T concentrations over24 h (Cavg) were similar in the three groups and were belowthe normal adult range (Fig. 1). In all three groups, during the24-h baseline period the mean maximum T levels (Cmax)occurred between 0800–1000 h (0–2 h in Fig. 1), and theminimum (Cmin) T levels occurred 8–12 h later, demonstrat-ing the expected diurnal variation of serum T.

About 35% of the patients in each group (24 of 73 subjectsfor the T gel 50, 26 of 78 subjects for the T gel 100, and 25 of76 subjects for T patch) had Cavg within the lower normaladult male range on day 0. (The Cavg of serum T levels atbaseline in the subjects with normal serum T on day 0 were13.3 6 0.4, 13.3 6 0.5, and 13.0 6 0.5 nmol/L in the T patch,T gel 50, and T gel 100 groups, respectively.) However, over55% of these subjects had one or more serum T measure-ments below 10.4 nmol/L during the course of day 0. Allexcept three of the subjects met the enrollment criterion ofserum T less than 10.4 nmol/L at screening (measured ateach center’s laboratory). These three subjects were enrolledduring a brief period when the admission serum T level wasraised to 12.1 nmol/L (350 ng/dL) or less by the sponsor. TheCavg of serum T in the three treatment groups on day 90 aftertransdermal T application was different between those withlow (T patch, 11.8 6 0.8; T gel 50, 17.2 6 1.2; T gel 100, 25.9 61.4 nmol/L) or normal (T patch, 14.5 6 0.7; T gel 50, 25.1 62.4; T gel 100, 29.5 6 1.9 nmol/L) baseline serum T levels.This was anticipated; however, statistical analyses with two-way ANOVA showed that the status (Cavg) of serum T atbaseline of more than or less than 10.4 nmol/L had no sig-nificant interaction with treatment. Thus, the differential re-sponse to transdermal T treatment was not confounded bythe pretreatment serum T concentrations. Inclusion of thesesubjects did not influence the pharmacokinetic results of thetreatment groups. Thus, in all subsequent pharmacokineticanalyses, all subjects in a treatment group were analyzedtogether regardless of whether their Cavg of serum T on day0 was more than or less than 10.4 nmol/L.

On day 1 after the first application of transdermal T, serumT rose most rapidly in the T patch group, reaching a Cmaxbetween 8–12 h (Tmax), plateaued for another 8 h, then de-clined to the baseline. Serum T rose steadily to the normalrange after T gel application, with Cmax achieved by 22 and16 h in the T gel 50 and T gel 100 groups, respectively.

On days 30 and 90, serum T followed a similar pattern ason day 1 in the T patch group. In the T gel groups, serum Tlevels were at steady state, showing small and variable in-creases after treatment. After gel application on both days 30and 90, the Cavg in the T gel 100 group was 1.4-fold higherthan that in the T gel 50 group and was 1.9-fold higher than



that in the T patch group (P 5 0.0001). The variation in serumconcentration over the day [fluctuation index 5 (Cmax 2Cmin)/Cavg] was similar in the three groups. On days 30 and90, the accumulation ratio, which is defined as the increasein daily exposure to T with continued transdermal applica-tion (calculated as AUCday 30 or 90/AUCday 1) was 0.94 6 0.04for the T patch group showing no accumulation, whereas theaccumulation ratios at 1.53 6 0.09 and 1.9 6 0.18 were sig-nificantly higher (P 5 0.0001) in the T gel 50 and 100 groups,respectively. This indicates that the T gel preparations had alonger effective half-life than the T patch (Table 2 and Fig. 2).

On day 180, the serum T concentrations achieved and thepharmacokinetic parameters were similar to those on days 30and 90 in those patients who continued in their initial ran-domized treatment groups (Fig. 1 and Table 2). For the pa-tients who switched from T gel 50 or 100 to T gel 75, their Cavgon day 180 was 20.84 6 1.76 nmol/L, midway between theCavg in the T gel 50 (19.24 6 1.18 nmol/L) and T gel 100(24.72 6 6.08 nmol/L) groups. Examination of Table 2 andFig. 1 shows that the patients titrated to this T gel 75 groupwere not homogeneous. On day 180, the Cavg in the patientsin the T gel 100 group who converted to 75 mg/day on day90 was 1.7-fold higher than the Cavg in the patients titratedto T gel 75 from 50 mg/day. Despite adjusting the dose upby 25 mg/day in the T gel 50 to 75 group, the Cavg remainedlower than for those remaining in the 50 mg group. In the Tgel 100 to 75 group, the Cavg became similar to those achievedby patients remaining in the T gel 100 group without dosetitration.

The increase in AUC0–24 h on days 30, 90, and 180 from thepretreatment baseline (net AUC0–24 h) showed dose propor-tionality. The mean for the net AUC0–24 h from day 0 to day

30 or 90 was about 1.7-fold higher for T gel 100 than for T gel50 patients (T gel 50: day 30, 268 6 28; day 90, 263 6 29nmol/Lzh; T gel 100: day 30, 446 6 30; day 90, 461 6 27nmol/Lzh). A 4.3 nmol/L (125 ng/dL) mean increase in theserum T Cavg level was produced by each 25 mg/day of T gel.The increases in AUC0–24 h from the pretreatment baselineachieved by the T gel 100 and T gel 50 groups were approx-imately 2.9- and 1.7-fold higher than those resulting fromapplication of the T patch (day 30, 154 6 18; day 90, 157 620 nmol/Lzh).

The preapplication serum T levels in the T patch groupremained at the lower limit of the normal range throughoutthe entire treatment period. Serum T levels after T gel ap-plication reached steady state at about 1–2 days after theinitial application (24). Thereafter, the mean serum T levelsremained at about 17–20 nmol/L in the T gel 50 group andabout 22–30 nmol/L in the T gel 100 group (Fig. 2, upperpanel).

Pharmacokinetics of serum free T concentration

At baseline (day 0), serum free T Cavg was similar in allthree groups (T patch, 167 6 14; T gel 50, 154 6 14; T gel 100,150 6 13 pmol/L) and was at the lower limit of the adult malerange (121–620 pmol/L). The detailed pharmacokinetic pa-rameters of serum free T on days 1, 30, 90, and 180 mirroredthose of serum total T as described above (data not shown).Similar to the total T results, the free T Cavg achieved by theT gel 100 group was 1.4- and 1.7-fold higher than those in theT gel 50 and T patch groups, respectively (P 5 0.001).

The preapplication mean free T levels throughout thetreatment period in all three groups were within the normal

TABLE 2. Serum T pharmacokinetic parameters after transdermal application of T gel or patch

Parameters T patch T gel(50 mg/day)

T gel(100 mg/day)

Day 0Cavg (nmol/L) 8.22 6 0.55 8.22 6 0.53 8.60 6 0.55Cmax (nmol/L) 10.89 6 0.71 11.37 6 0.72 11.55 6 0.76Cmin (nmol/L) 6.07 6 0.42 6.07 6 0.42 6.52 6 0.44

Day 1Cavg (nmol/L) 16.71 6 0.82 13.80 6 0.63 17.82 6 0.90Cmax (nmol/L) 22.36 6 1.13 19.42 6 1.09 25.86 6 1.39Cmin (nmol/L) 8.04 6 0.53 7.90 6 0.50 8.67 6 0.57Tmax (h) 11.8 22.1 16.0



Day 180 T patch T gel(50 mg/day)

T gel(50 to 75 mg/day)

T gel(100 to 75 mg/day)

T gel(100 mg/day)

Cavg (nmol/L) 14.14 6 0.88 19.24 6 1.18 15.60 6 3.68 25.79 6 2.55 24.72 6 1.05Cmax (nmol/L) 20.04 6 1.31 28.78 6 1.81 23.58 6 3.72 38.48 6 3.72 37.55 6 2.17Cmin (nmol/L) 7.69 6 0.62 12.86 6 0.86 10.47 6 2.47 17.51 6 1.85 16.82 6 0.78Tmax (h) 10.6 5.8 2.0 7.8 7.7

Cavg (nmol/L), Time-averaged concentration over 24-h dosing interval determined by AUC0–24/24; Cmax (nmol/L), maximum concentrationduring 24-h dosing interval; Cmin (nmol/L), minimum concentration during 24-h dosing interval; Tmax, time at which Cmax occurred.



range, with the T gel 100 group maintaining higher free Tlevels than both the T gel 50 and T patch groups (Fig. 2, middlepanel). The calculated percent free T (free T/T 3 100) re-mained between 1.6–2.2% before and throughout the trans-dermal T treatment period. Exogenous T replacement did notsignificantly alter the percent free T in any of the treatmentgroups (Fig. 2, lower panel).

Serum DHT concentrations

The pretreatment mean serum DHT concentrations werebetween 1.24–1.45 nmol/L, which were near the lower limitof the normal range (1.06–6.66 nmol/L) and were not dif-ferent among the three groups (Fig. 3, upper panel). After Tpatch application mean serum DHT levels rose to about1.3-fold above the baseline, whereas serum DHT increased to3.6-fold (within the normal range) and 4.8-fold (at the upperlimit of the normal range) above the baseline after applicationof T gel 50 and 100 (P 5 0.0001), respectively, throughout the180 days. Examination of the DHT to T ratio (Fig. 3, middlepanel) showed that this ratio was not significantly altered inthe T patch group (P 5 0.078), whereas in the T gel 50 and100 groups, the DHT to T ratio increased significantly froma baseline of 0.2 to between 0.23–0.29 and 0.29–0.33, respec-tively, during the treatment period (P 5 0.0001 for bothgroups). The mean serum total androgen levels (calculated asthe sum of serum T 1 DHT levels for each time point)achieved by T gel 100 throughout the treatment period were

1.4- and 2.5-fold higher than those in the T gel 50 (;20nmol/L) and T patch (;10 nmol/L) groups, respectively(P 5 0.0001; Fig. 3, lower panel). Adjustment of the T gel doseon day 90 did not significantly affect the serum DHT levels,DHT/T ratios, or total androgen levels.

Serum E2 concentrations

The baseline mean serum E2 levels were at the lower nor-mal range and were not different in the three treatmentgroups. After transdermal T application, mean serum estra-diol increased to stable levels by an average of 9.2% in the Tpatch during the treatment period, 30.9% in the T gel 50group, and 45.5% in the T gel 100 group (P 5 0.001; Fig. 4).

Serum SHBG concentrations

The serum SHBG levels were similar and within the adultmale range in the three treatment groups at baseline. AfterT replacement, serum SHBG levels showed a small decreasein all three groups (P 5 0.0046; data not shown), which wasmost marked in the T gel 100 group (baseline, 26.6 6 2.0; day90, 23.6 6 2.7; day 180, 24.0 6 1.7 nmol/L; P 5 0.0095).

Suppression of serum gonadotropin levels

Because of the wide variability in the baseline serum LHand FSH levels, these were expressed as the percent changefrom baseline in response to T replacement (Fig. 5). The mean

FIG. 1. Serum T concentrations (mean 6 SE) before (day 0) and after transdermal T applications on days 1, 30, 90, and 180. Time 0 h was 0800 h,when blood sampling usually began. On day 90, the dose in the subjects applying T gel 50 or 100 was up- or down-titrated if their preapplicationserum T levels were below or above the normal adult male range, respectively. In this and subsequent figures the dotted lines denote the adultmale normal range, and the dashed lines and open symbols represent subjects whose T gel dose were adjusted.



percent suppression of serum LH levels was least in the Tpatch group (between ;30–40%), intermediate in the T gel50 group (between ;55–60%), and most marked in the T gel100 group (between ;80–85%; P , 0.01). The suppression ofserum FSH paralleled that of serum LH levels. In the subjectswith primary hypogonadism, mean serum LH and FSH lev-els were suppressed to within the normal range after bothdoses of T gel administration, but remained above the normalrange after T patch application. The suppression of serumgonadotropins occurred in all hypogonadal subjects regard-less of the classification of hypogonadism.

Discussion

We have shown in this study that transdermal applicationof this new hydroalcoholic T gel formulation (AndroGel) toa large area of skin (arms, shoulders, and abdomen) at 50 and100 mg/day (in 5 and 10 g gel, delivering approximately 5and 10 mg T/day, respectively) resulted in dose proportional

increases in serum T in a large number of hypogonadal men.After the first application of T gel, serum T levels graduallyclimbed to reach a maximum level after 48–72 h, as shownin our previous report (24). On repeated application, as il-lustrated by the pharmacokinetics, parameters on days 30,90, and 180 remained remarkably similar and steady serumT levels were maintained, with small and variable peaks ofserum T after each application. The T levels achieved with theT patch showed little evidence of accumulation (accumula-tion ratio, ;1) with repeated application. The accumulationratios were higher in both T gel groups (1.5–1.9) on day 30,consistent with the longer lasting elevations of serum T. Withcontinued application of T gel, the accumulation ratesshowed no further increases, suggesting no further accumu-lation on days 90 and 180.

Dose titration of T gel to 75 mg was initiated after day 90in the hypogonadal men who had serum T levels above orbelow the normal range. Because of study design there was

FIG. 2. Preapplication serum T (upperpanel), free T (middle panel), and per-cent free T (lower panel) concentrationsduring daily treatment with T gel orpatch from days 1–90 (left panel) anddays 90–180 (right panel). On day 90,the dose in the T gel groups waschanged in some subjects, as describedin Fig. 1. Œ, T patch, f, T gel 50; F, T gel100; M, T gel 50 to 75; E, T gel 100 to 75.



no dose adjustment within the T patch group. Increasing thenumber of T patches to three or four a day could haveresulted in increases in the mean serum T concentrations (16),but might have led to an even higher dropout rate becauseof skin irritation in some subjects. The patients who wereconverted from the T gel 50 to 75 mg/day, despite increasingthe dose by 50%, had average serum T levels lower than thoseremaining in the T gel 50 group. It is uncertain whether theselower responders to T gel might be less compliant or arebiologically different. The former may be possible in someindividuals, as about one third of the subjects had a lowermean compliance rate of 80%, and the average serum T levels

attained were related to the mean compliance rate. Alterna-tively, some patients might have low absorption and highclearance of T either in the basal state or after induction byexogenous T. Downward titration of the T gel dose from 100to 75 mg/day was effective in decreasing the mean serum Tlevel in the group by 15% and lowering the serum T con-centration to the normal range in 16 of 19 of these hypogo-nadal men.

The present study examined a new transdermal open sys-tem, T gel, together with the available standard closed Tpatch system. A placebo group was not included because ofethical problems associated with withdrawing or delaying T

FIG. 3. Preapplication serum DHT con-centration (upper panel), DHT/T ratio(middle panel), and DHT and T concen-trations (lower panel) during dailytransdermal T treatment from days1–90 (left panel) and days 90–180 (rightpanel). Œ, T patch; f, T gel 50; F, T gel100; M, T gel 50 to 75; E, T gel 100 to 75.



replacement in hypogonadal men for a prolonged 6-monthstudy period. Despite a relatively higher dropout rate, phar-macokinetic data obtained from this large group of hypogo-

nadal men treated with this T patch were similar to thosepreviously reported (14, 15).

Serum free T levels rose after transdermal T gel or T patch

FIG. 4. Serum E2 levels during trans-dermal T treatment from days 1–90 (leftpanel) and days 90–180 (right panel).Œ,T patch; f, T gel 50; F, T gel 100; M, Tgel 50 to 75; E, T gel 100 to 75.

FIG. 5. Percent change in serum LH(upper panel) and FSH (lower panel)from baseline values after transdermalT replacement therapy from days 1–90(left panel) and days 90–180 (right pan-el). Œ, T patch; f, T gel 50; F, T gel 100;M, T gel 50 to 75; E, T gel 100 to 75.



application, paralleling those of serum T. The percent free Tdid not change significantly with T treatment. The resultswere corroborated by the small decreases, probably not clin-ically significant, in serum SHBG observed after transdermalT replacement in all three groups. The results indicated thatwhen T is administered by the transdermal route, the lack ofthe first pass effect of the liver resulted in minor, if any,decreases in SHBG.

T gel application resulted in mean serum DHT that tripledafter application of 50 mg T gel and rose nearly 5-fold with100 mg T gel treatment. As 5a-reductase is present in non-genital skin (25), the increase in DHT/T ratios in the 100 and50 mg gel groups could be explained by the higher conver-sion in the skin of T to DHT as a result of the large area ofskin surface exposed to T in the gel groups compared withthe very small area of skin exposed to the T patch. IncreasedDHT/T ratios have been observed with the transdermal scro-tal patch, where even greater DHT/T ratios were noted (11–13). DHT is a potent androgen that is not back-convertible toT or aromatizable to E2. Serum levels of T and DHT are notequivalent in all aspects of biological action, but certainlyboth have major actions on multiple androgen-dependenttarget organs. The biological impact of the moderatelygreater increase in DHT after T gel application is unclearother than its additive effect on total androgen action. SerumE2 levels showed small and proportionate increases aftertransdermal T application that may be important for theknown beneficial effects of estrogens on serum lipid levels,vascular endothelium reactivity, and bone resorption.

The biological activity of the T replacement in the hy-pogonadal men was evidenced by the consistent suppressionof serum gonadotropin levels in the patients after transder-mal T applications. The suppression of gonadotropins wasproportional to the serum T levels achieved by the T patchor T gel. The marked and consistent suppression of gonad-otropins observed after T gel 100 treatment suggested thatsuch a modality of T delivery could be used in a male con-traceptive regimen.

All patients were diagnosed to have male hypogonadismby their primary physician. In each of the three treatmentgroups, the same proportion (;30–35%) of subjects had sub-normal serum T levels at screening (assayed at each center’sclinical laboratory), but their average serum T levels over 24 hwere within the normal range when studied at baseline (onanother day and assayed at the central laboratory). Serum Tin a population of men is to a great extent a continuum. Theselection of men that had serum T levels below 10.4 nmol/Lat screening would inevitably allow some subjects to haveserum T above this arbitrarily defined threshold (approxi-mately ,2 SD below the mean for young adult men) onsubsequent measurements. The admission criterion requir-ing a serum T concentration of 10.4 nmol/L or less is arbitraryand necessary for the design of a clinical study; however,there is no definite evidence that there is a threshold level ofT at which biological response changes. The well knownintrasubject variability from day to day and the differencesbetween T assays using different reagents and methodsmight account for this discrepancy between screening andbaseline levels. It is also not uncommon in clinical practicethat on repeat serum T measurements, some hypogonadal

patients would have serum T levels that fluctuate in and outof the statistical normal range. In practice, if symptomatic,many if not most of these men received androgen replace-ment therapy. The situation for assessment of pharmacoki-netic parameters after administration of naturally occurringsubstances (e.g. T) poses different problems from those afteradministration of non-naturally occurring substances in thebody. Ultimate serum levels attained in dynamic closed loopendocrine systems are complex and include integration of Tlevels (with endogenous serum T decreasing while serum Trises from exogenous administration), the characteristics ofthe formulation, the generic and individualized metabolicfactors, and the duration of treatment. Although serum Tlevels attained in the groups with low or normal baselinelevels were different, statistical analyses showed that therelative response to T transdermal treatment was not affectedby the initial value. Thus, inclusion of these subjects did notinfluence the treatment comparison.

We conclude that transdermal T gel application can effi-ciently elevate serum T and free T levels in hypogonadal meninto the mid to upper normal range within the first day ofapplication, achieve steady state within a few days, andmaintain serum T levels with once daily repeated applica-tions. Although serum DHT/T ratios were raised after T gelapplications, these ratios remained within the normal range.Serum E2 levels were increased, and gonadotropin levelswere suppressed in proportion to serum T levels. The phar-macokinetic profile and the dose proportionality observedafter T gel application indicate that this transdermal deliverysystem may provide dose flexibility and serum T levels fromthe low to the high normal adult male range.

Acknowledgments

We thank Barbara Steiner, R.N., B.S.N.; Carmelita Silvino, R.N.; thenurses at the General Clinical Research Center (Harbor-University ofCalifornia-Los Angeles Medical Center, Torrance, CA); Emilia Cordero,R.N. (V.A. Medical Center, Houston, TX); Tam Nguyen (The JohnsHopkins University, Baltimore, MD); Nancy Valler (V.A. Medical Cen-ter, Salem, VA); Janet Gilchriest (V.A. Puget Sound Health Care System,Seattle, WA); Helen Peachey, R.N., M.S.S. (University of PennsylvaniaMedical Center, Philadelphia, PA); Mike Shin and Cheryl Franklin-Cook(Duke University Medical Center, Durham, NC); K. Todd Keylock (TheChicago Center for Clinical Research, Chicago, IL); Brenda Fulham(West Coast Clinical Research, Van Nuys, CA); Shari L. DeGrofft (Urol-ogy Research Options, Aurora, CO); Mary Dettmer (Center for HealthStudies, Cleveland, OH); Jessica Bean and Maria Rodriguez (South Flor-ida Bioavailability Clinic, Miami, FL); George Gwaltney, R.N. (Diabetesand Glandular Disease Clinic, P.A., San Antonio, TX); Peggy Tinkey(Northeast Indiana Research, Fort Wayne IN); Bill Webb (MultiMedResearch, Providence, RI); and Linda Mott (Alabama Research Center,L.L.C., Birmingham, AL) for study coordination, and other support staffof each study center for their dedicated effort in conducting these stud-ies. F. Ziel, M. D. (Kaiser Permanente Southern California) referred manypatients to Harbor-University of California-Los Angeles Medical Centerfor this study. We thank A. Leung, H.T.C.; S. Baravarian, Ph.D.; VinceAtienza, B.Sc.; Magdalene Que, B.Sc.; Joy Whetstone, B.Sc.; StephanieGriffiths, M.Sc.; Maria La Joie, B.Sc.; and Ellen Aquino, B.Sc., for theirskillful technical assistance with many hormonal assays; Laura Hull,B.A., for data management and graphical presentations; and Sally Avan-cena, M.A., for preparation of the manuscript.

References

1. Bhasin S, Gabelnick HL, Spieler JM, Swerdloff RS, Wang C. 1996 Pharma-cology, biology and clinical applications of androgen. New York: Wiley Liss.



2. Wang C, Swerdloff RS. 1997 Androgen replacement therapy. Ann Med29:365–70.

3. Nieschlag E, Behre HM. 1998 Testosterone: action, deficiency, substitution,2nd Ed. Berlin: Springer.

4. Wang C, Swerdloff RS. 1999 Androgen replacement therapy, risks and ben-efits. In: Wang C, ed. Male reproductive function. Boston: Kluwer; 157–172.

5. McClellan KJ, Goa KL. 1998 Transdermal testosterone. Drugs. 55:253–258.6. Jordan WP. 1997 Allergy and topical irritation associated with transdermal

testosterone administration: a comparison of scrotal and non-scrotal trans-dermal systems. Am J Contact Dermat. 8:108–113.

7. Jordan WP Jr, Atkinson LE, Lai C. 1998 Comparison of the skin irritationpotential of two testosterone transdermal systems: an investigational systemand a marketed product. Clin Ther. 20:80–87.

8. Wilson DE, Kaidbey K, Boike SC, Jorkasky DK. 1998 Use of topical corti-costerol pretreatment to reduce the incidence and severity of skin reactionsassociated with testosterone transdermal delivery. Clin Ther. 20:229–306.

9. Yu Z, Gupta SK, Hwang SS, Kipnes MS, Mooradian AD, Snyder PJ, At-kinson LE. 1997b Testosterone pharmacokinetics after application of an in-vestigational transdermal system in hypogonadal men. J Clin Pharmacol.37:1139–1145.

10. Yu Z, Gupta SK, Hwang SS, Cook DM, Duckett MJ, Atkinson LE. 1997aTransdermal testosterone administration in hypogonadal men: comparison ofpharmacokinetics at different sites of application and at the first and fifth daysof application. J Clin Pharmacol. 37:1129–3118.

11. Findlay JC, Place V, Snyder PJ. 1989 Treatment of primary hypogonadism inmen by the transdermal administration of testosterone. J Clin EndocrinolMetab. 68:369–373.

12. Cunningham GR, Cordero E, Thornby JI. 1989 Testosterone replacement withtransdermal therapeutic systems. Physiological serum testosterone and ele-vated dihydrotestosterone levels. JAMA. 261:2525–2530.

13. Nieschlag E, Bals-Pratsch M. 1989 Transdermal testosterone. Lancet.1:1146–1147.

14. Meikle AW, Mazer NA, Moellmer JF, Stringham JD, Tolman KG, SandersSW, Odell WD. 1992 Enhanced transdermal delivery of testosterone acrossnonscrotal skin produces physiological concentrations of testosterone and itsmetabolites in hypogonadal men. J Clin Endocrinol Metab. 74:623–628.

15. Meikle AW, Arver S, Dobs AS, Sanders SW, Rajaram L, Mazer NA. 1996Pharmacokinetics and metabolism of a permeation-enhanced testosteronetransdermal system in hypogonadal men: influence of application site–a clin-ical research center study. J Clin Endocrinol Metab. 81:1832–1840.

16. Brocks DR, Meikle AW, Boike SC, Mazer NA, Zariffa N, Audet PR, JorkaskyDK. 1996 Pharmacokinetics of testosterone in hypogonadal men after trans-dermal delivery: influence of dose. J Clin Pharmacol. 36:732–739.

17. Wilson DE, Meikle AW, Boike SC, Failes AJ, Etheredge RC, Jorkasky DK.1998 Bioequivalence assessment of a single 5 mg/day testosterone transdermalsystem versus two 2.5 mg/day systems in hypogonadal men. J Clin Pharmacol.38:54–59.

18. Arver S, Dobs AS, Meikle AW, Caramelli KE, Rajaram L, Sanders SW, MazuNA. 1997 Long-term efficacy and safety of a permeation-enhanced testosteronetransdermal system in hypogonadal men. Clin Endocrinol (Oxf). 47:727–737.

19. Behre HM, von Eckardstein S, Kliesch S, Nieschlag E. 1999 Long termsubstitution of hypogonadal men with transcrotal testosterone over 7–10 years.Clin Endocrinol (Oxf). 50:629–635.

20. Snyder PJ, Peachey H, Hannoush P, et al. 1999 Effect of testosterone treatmenton bone mineral density in men over 65 years of age. J Clin Endocrinol Metab.84:1966–1972.

21. Snyder PJ, Peachey H, Hannoush P, et al. 1999 Effect of testosterone treatmenton body composition and muscle strength in men over 65 years of age. J ClinEndocrinol Metab. 84:2647–2653.

22. Sitruk-Ware R. 1989 Transdermal delivery of steroids. Contraception. 39:1–20.23. Wang C, Iranmanesh A, Berman N, et al. 1998 Comparative pharmacokinetics

of three doses of percutaneous dihydrotestosterone gel in healthy elderlymen–a clinical research center study. J Clin Endocrinol Metab. 83:2749–2757.

24. Wang C, Berman N, Longstreth JA, et al. 2000 Pharmacokinetics of trans-dermal testosterone gel in hypogonadal men: application of gel at one siteversus four sites. J Clin Endocrinol Metab. 85:964–969.

25. Russell DW. 1993 Tissue distribution and ontogeny of steroid 5a reductaseisozyme expression. J Clin Invest. 92:903–910.

26. Wang C, Swerdloff RS. 1999 Male contraception in the 21st century. In: WangC, ed. Male reproductive function. Norwell: Kluwer; 303–319.



Testosterone Testing...testing (screening without regard to clinical manifestations of androgen...

Documents

Transcript of Testosterone Testing...testing (screening without regard to clinical manifestations of androgen...