Analysis of Complex Genetic Traits in Population - DiVA Portal

ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2007 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 287 Analysis of Complex Genetic Traits in Population Cohorts using High-throughput Genotyping Technology ANDREAS DAHLGREN ISSN 1651-6206 ISBN 978-91-554-7007-4 urn:nbn:se:uu:diva-8291



To my good friends

List of publications

This thesis is based on the following publications, which will be referred to in the text by their Roman numerals:

I Dahlgren A, Zethelius B, Jensevik K, Syvänen A-C, Berne C. Variants of the TCF7L2 gene are associated with beta cell dysfunction and confer an increased risk of type 2 diabetes mellitus in the ULSAM cohort of Swedish elderly men. Diabetologia 50:1852-1857 (2007)

II Dahlgren A, Zethelius B, Eriksson N, Lundmark P, Axelsson T, Syvänen A-C, Berne C. Variants in the HHEX gene are associated with biochemical markers for beta-cell function in the ULSAM cohort. Submitted manuscript

III Dahlgren A, Lundmark P, Axelsson T, Lind L, Syvänen A-C. Association of the estrogen receptor 1 (ESR1) gene with body height in adult males from two Swedish population cohorts. Submitted manuscript

IV Dahlgren A, Perola M, Liljedahl U, Kaprio J, Spector T, Martin N, Peltonen L, Syvänen A-C. Finemapping of a QTL for body height on the human X chromosome in a Finnish twin cohort. Manuscript

V Kettunen J, Sammalisto S, Costiander E, Gudbjartsson D, Dahlgren A, Heikkalinna T, Kaprio J, Heliövaara M, Peltonen L, Perola M. The COL11A1 gene is associated with human stature in two population cohorts. Manuscript

Published material was reprinted with permission from Springer Science and Business Media.

Supervisor:
Ann-Christine Syvänen, Professor, Molecular Medicine, Department of Medical Sciences, Uppsala University, Sweden

Co-supervisors:
Håkan Melhus, Professor, Clinical Pharmacology, Department of Medical Sciences, Uppsala University, Sweden
Markus Perola, Ph.D., M.D., Molecular Medicine, National Public Health Institute, Helsinki, Finland

Faculty opponent:
Doctor Struan Grant, Center for Applied Genomics, The Children’s Hospital of Philadelphia, USA

Review board:
Professor Anders Karlsson, Endocrinology, Diabetes and Metabolism, Department of Medical Sciences, Uppsala University, Sweden
Docent Fredrik Nyström, Department of Endocrinology and Metabolism, Faculty of Health Science, Linköping University Hospital, Sweden
Docent Ingrid Dahlman, Endocrinology unit, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
Docent Marju Orho-Melander, Diabetes and Endocrinology Research unit, Department of Clinical Sciences, Malmö University Hospital, Lund University, Sweden
Professor Åke Sjöholm, Experimental Endocrinology, Department of Clinical Research and Education, Karolinska Institutet, Stockholm, Sweden

Contents

Introduction
  The human genome
  Our genes
  Sequence variations
    Single nucleotide polymorphisms
    Copy-number variants

Technology
  Polymerase chain reaction
  DNA Sequencing
  SNP genotyping
    Hybridization-based techniques
    Enzyme-assisted techniques

Genetics
  Genetic complexity
    Monogenic
    Polygenic
  Genetic analysis
    Linkage
    Association

Present study
  Overall aim
  Specific aims
  Trait and disease
    Type 2 diabetes mellitus
    Human body height
  Genetics
    Type 2 diabetes mellitus
    Human body height
  Material and methods
    Study I-II
    Study III-V
  Results and discussion
  Concluding remarks

Final thoughts

Acknowledgements

References

Abbreviations

ASO      Allele specific oligonucleotide
CNV      Copy-number variant
ddNTP    Dideoxynucleotide triphosphate
DNA      Deoxyribonucleic acid
GWA      Genome-wide association
HapMap   International Haplotype Mapping project
HMGA2    High mobility group A2 protein
Indel    Insertion/Deletion polymorphism
IRI      Immuno reactive insulin
kb       kilo base pairs
LD       Linkage disequilibrium
LSO      Locus specific oligonucleotide
MAF      Minor allele frequency
OGTT     Oral glucose tolerance test
OMIM     Online Mendelian inheritance in man
PCR      Polymerase chain reaction
PIVUS    Prospective Investigation of the Vasculature in Uppsala Seniors
RR       Relative risk
SNP      Single nucleotide polymorphism
STR      Short tandem repeat
T2DM     Type 2 diabetes mellitus
TCF7L2   Transcription factor 7-like 2
ULSAM    Uppsala Longitudinal Study of Adult Men


Introduction

“Equipped with his five senses, man explores the universe around him and calls the adventure Science” (Edwin Powell Hubble, The Nature of Science, 1954).

The science of genetics can be described as the study of inherited variation in living organisms. That physical traits can be passed on from generation to generation has been known and utilized ever since humans began growing crops and domesticating animals to improve agricultural production and the breeding of livestock. The foundation of the modern scientific field of genetics has been attributed to the work of Gregor Mendel. In the mid 19th century he presented and published a study of variation in plants using hybrids of pea plants [1], introducing the concept of dominant and recessive properties of heritable traits. The significance of his findings was not realized by the scientific community until the early 20th century, but today “Mendel’s laws” of inheritance are taught as the first introduction to genetics in schools across the world. Since Gregor Mendel, the science of genetics has continued to develop, and many important strides forward have been taken.

The discovery of DNA as the carrier of genetic information [2] and the subsequent characterization of its now famous double helix structure revealed how this molecule both transmits the genetic information during cell division and serves as a blueprint for all the molecules needed to create and sustain life [3, 4]. Key technical developments such as the polymerase chain reaction (PCR) and Sanger sequencing have allowed us to take genetics into the molecular era, in which the first draft sequence of the entire human genome was published in 2001.

With an ever increasing amount of data on the makeup of our genome and its variations, the science of genetics is attempting to decipher this information in order to understand how complex patterns of genetic variation affect biological functions that, combined with environmental factors, determine human traits and influence common diseases.

The work presented in this thesis touches on both these areas. It contains studies aimed at identifying genes underlying one of the most basic biological traits, namely human body height, as well as investigations of how genes influence type 2 diabetes mellitus, which is one of the most rapidly growing common diseases in developed countries today.


The human genome

The genetic information in each living human cell is coded by approximately 3.1 billion paired nucleotides (adenine to thymine and cytosine to guanine), which form DNA molecules that in humans are organized into 22 paired autosomal chromosomes and two sex-specific chromosomes. The first draft sequence of the whole human genome was published in 2001 in parallel by the Human Genome Project (HGP, International Human Genome Sequencing Consortium) [5] and the company Celera Genomics [6]. The HGP declared the sequence completed in 2004 [7], when over 99% of the genome sequence had been successfully elucidated. The remaining gaps will most likely be filled in as new sequencing techniques are designed and used.

Our genes

Before the first draft sequence was published, there was a lot of speculation about how many protein-coding genes the human genome could contain, and early guesses ranged all the way from 35,000 to over 100,000 genes [8]. With the genome sequence complete, one of the most current estimates suggests that the number will end up in the range of 20,000-25,000 protein-coding genes [7]. Compared with the ~20,000 genes found in the genome of the famous model organism Caenorhabditis elegans (C. elegans) [9], the number of genes alone cannot explain the obvious difference between this nematode (roundworm) with a 1 mm body length and a fully grown Homo sapiens. Two mechanisms have been suggested to partially explain how the difference in complexity can be generated with such a similar number of genes. The first is alternative RNA splicing, where the RNA is modified in different ways after being transcribed using the genomic DNA sequence as template [10]. The different RNA splice variants can then be translated into multiple proteins with different functions. The second mechanism that can increase the diversity of available genes is regulation of gene expression, which creates the unique expression patterns required in different cell types to drive the development of tissues and organs at specific phases of an organism’s development [11].


Sequence variations

Single nucleotide polymorphisms

Among the known types of sequence variation found in the human genome, the single nucleotide polymorphism (SNP) is the most frequent. A SNP is most commonly defined as a position in the genome where a single nucleotide has been substituted for another, and where this change is seen in at least 1% of a chosen population. The two different nucleotides at the SNP position are referred to as its two alleles. SNPs can be found throughout the entire genome, and to date 11.8 million SNPs have been registered in the dbSNP database (http://ncbi.nih.gov/SNP/, Build 127, September 18, 2007), which is the largest public database for SNPs. Almost half of the SNPs in dbSNP have been validated thanks to the efforts of researchers around the world and projects like the International Haplotype Mapping project (www.hapmap.org). Because of their abundance and presence throughout the genome, SNPs are well suited for use as biallelic markers in genetic studies. SNPs have been applied to genetic studies ranging in scope from analyzing variants of a single gene to performing genome-wide analyses to investigate the genetics of complex traits and diseases.
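The 1% criterion in the SNP definition above can be illustrated with a short calculation of the minor allele frequency (MAF). The following is a minimal sketch only: the function names, the genotype encoding as two-letter strings and the example cohort are assumptions made for illustration and are not taken from the studies in this thesis.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic SNP.

    genotypes: list of two-letter strings such as "AG", one per individual
    (hypothetical encoding chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("more than two alleles observed; not biallelic")
    if len(counts) < 2:
        # Monomorphic in this sample: no minor allele was observed.
        return None, 0.0
    total = sum(counts.values())
    minor, n_minor = min(counts.items(), key=lambda item: item[1])
    return minor, n_minor / total

def is_common_snp(genotypes, threshold=0.01):
    """Apply the 'at least 1% of the population' criterion to the MAF."""
    _, maf = minor_allele_frequency(genotypes)
    return maf >= threshold

# Invented example cohort: 90 AA, 9 AG and 1 GG individuals.
cohort = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
print(minor_allele_frequency(cohort))
print(is_common_snp(cohort))
```

In this invented cohort the G allele is carried on 11 of 200 chromosomes, so the variant would pass the 1% threshold.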

Copy-number variants

Another common but less well known type of variation in the human genome is made up of copy-number variants (CNVs), which can be subdivided into several groups based on their size and the number of different alleles they present. Information about the distribution and frequency of CNVs in the human genome is steadily increasing as new large-scale sequence data, which can be compared against the human reference sequence, become available through new re-sequencing technologies [12]. Current research comparing the human and chimpanzee genomes suggests that segmental duplications could have greater effects on genomic change than SNPs, making CNVs important to study from an evolutionary perspective [13].

Short tandem repeats

Short tandem repeats (STRs), often referred to as “microsatellites”, are short sequences of 2-4 nucleotides that are repeated continuously for different lengths, ranging from fewer than ten repeats to over a hundred. These markers are highly polymorphic and are found in all populations, making them good markers for extracting an ample amount of information using a relatively low number of markers [14]. These properties have resulted in STRs being extensively used as markers for large genetic studies, such as whole genome linkage scans, and STRs have also become the most common type of genetic marker used in the field of forensic genetics. One example is the Combined DNA Index System (CODIS, www.fbi.gov/hq/lab/codis), developed by the Federal Bureau of Investigation in the US, which uses 13 STR markers to create forensic DNA profiles for identification of individuals.
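Because the alleles of an STR differ in the number of repeat units, an STR genotype can be thought of as a pair of repeat counts. The sketch below illustrates this idea under simplifying assumptions: the motif, the flanking sequences and the function names are invented for the example, and real STR typing is based on fragment length rather than on direct inspection of the sequence.

```python
import re

def longest_tandem_repeat(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    pattern = re.compile("(?:%s)+" % re.escape(motif))
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)

def str_genotype(copy_1, copy_2, motif):
    """Describe an STR genotype as the sorted repeat counts of the two chromosome copies."""
    counts = (longest_tandem_repeat(copy, motif) for copy in (copy_1, copy_2))
    return tuple(sorted(counts))

# Invented example: a GATA repeat with 7 copies on one chromosome and 11 on the other.
copy_a = "TTC" + "GATA" * 7 + "CCA"
copy_b = "TTC" + "GATA" * 11 + "CCA"
print(str_genotype(copy_a, copy_b, "GATA"))
```

With many possible repeat counts per marker, a single STR can distinguish far more genotypes than a biallelic SNP, which is why a small panel of STRs is informative.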

Insertions and deletions

The term insertion/deletion (indel) is most often used to describe a copy-number change smaller than 1 kb. Indels are often made up of repetitive elements like the well known STRs described previously. Indels can have multiple alleles when in the form of STRs, but can also, like SNPs, be biallelic. Indels have been estimated to make up about 20% of all human DNA polymorphisms [15]. The interest in and possibility of studying indels are growing as more human sequences become available for comparison with the finished human reference sequence. In the recently published genome sequence of Craig Venter (founder of Celera Genomics), over 700,000 indels were identified when comparing the sequences of both copies of his chromosomes to the human reference assembly [16]. Indels have been shown to have allele frequency distributions similar to those of SNPs in population samples and could thus be used as markers for association studies [17].

Large scale copy-number variants

Above the size of indels (>1 kb) there exist larger variations involving segmental duplications. One recent review article estimated, based on the current literature, that around 100 CNVs with a size above 50 kb, along with a substantial number of smaller CNVs like those described above, can be expected in any individual compared to the human reference sequence. This suggests that the 99.9% sequence homology proposed between individuals might be an overestimate [12]. CNVs are important to consider when performing SNP genotyping, because if a presumed base substitution is located in a segmental duplication, it can skew the distribution of the alleles or even give rise to “false” SNPs, where a base varies only between the duplicated segments and not at one unique position [18].


Technology

Polymerase chain reaction

To give an overview of the different techniques used to investigate the DNA molecules in our genome, one must start with the polymerase chain reaction (PCR), which has become one of the most significant technological developments for genetic research in our time since it was first introduced in the mid-1980s [19, 20]. The principle of PCR is beautiful in its simplicity and allows exponential amplification of a selected DNA sequence to create millions of DNA copies for further analysis. To generate a PCR-amplified fragment, two specific primers are constructed to be complementary to the ends of the DNA fragment of interest, located so that the 3’ ends of the primers face each other. The DNA is denatured at high temperature, which breaks the hydrogen bonds holding the two strands of the DNA molecule together and renders it single stranded. Lowering the temperature allows the PCR primers to hybridize to their complementary sequences, and in the presence of a DNA polymerase and deoxynucleotides the annealed primers are extended starting at their 3’ ends. When enough time has passed for the polymerase to extend the selected DNA sequence, the temperature is raised again to denature all DNA to single-stranded form. This process is then repeated for several cycles, generating an exponential amplification for as long as the polymerase is viable and deoxynucleotides are available in the reaction mixture. To be able to cycle the reaction without adding new enzyme for each cycle, a heat-stable DNA polymerase originally isolated from the thermophilic bacterium Thermus aquaticus [21] is used. This feature has made PCR a cornerstone technology for molecular genetics during the last 20 years. Today PCR is still widely used, but for the highly multiplexed analysis required for studies on a genome-wide scale it has become a limiting factor, due to the problems that originate from primer-primer interactions when many primer pairs are used in one reaction [22]. Modifications of the traditional PCR design and new PCR-free technologies have been developed to accommodate whole genome analysis [23].
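The exponential nature of the amplification, and the reason why multiplexing becomes difficult, can both be made concrete with a small calculation. The sketch below assumes the idealized model N = N0 (1 + E)^n, where E is a per-cycle efficiency between 0 and 1; this model, the function names and the way primer combinations are counted are illustrative assumptions, not a description of any specific protocol used in this work.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

def primer_combinations(n_primer_pairs):
    """Number of distinct primer-primer combinations, self-pairs included,
    among the 2n primers of an n-plex reaction."""
    n_primers = 2 * n_primer_pairs
    return n_primers * (n_primers + 1) // 2

# A single template molecule after 30 ideal cycles, and with 90% efficiency.
print(pcr_copies(1, 30))
print(pcr_copies(1, 30, efficiency=0.9))

# Possible primer-primer combinations grow quadratically with the multiplexing level.
for plex in (1, 10, 100):
    print(plex, primer_combinations(plex))
```

Under these assumptions the number of potential unwanted primer interactions grows roughly with the square of the number of primer pairs, which is one way to see why conventional PCR scales poorly to genome-wide multiplexing.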


DNA Sequencing

The premier technique for investigating the genome is DNA sequencing, where the complete sequence information is determined for the area of interest. The gold standard method for sequencing is Sanger’s dideoxy sequencing method [24]. Like PCR, this method utilizes a DNA polymerase to extend a primer annealed to a DNA template. Dideoxynucleotides (ddNTPs) are included in the reaction mixture and, when incorporated, terminate the extension process to yield fragments of different lengths. Together these fragments represent the whole length of the targeted sequence. By separating the fragments according to size and labeling each of the four ddNTP types with a different fluorophore, the sequence can be deduced and analyzed. Sanger sequencing has been automated [25] and was used by the HGP to determine the human genome sequence. With the human reference sequence available, the focus has now shifted to re-sequencing, in order to examine the complete sequence of a selected area or of the whole genome in multiple human samples in future studies. Very recently the first diploid genome sequence from an individual was published. The genome, which was sequenced using Sanger sequencing, belonged to the former president and founder of the sequencing company Celera Genomics [16].
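The logic of reading a sequence from chain-terminated fragments can be sketched as a small simulation. This is a toy model under stated assumptions: the template is written in the order in which the polymerase reads it, every position has the same hypothetical termination probability, and each fragment is reduced to its length and the label of its terminating ddNTP. None of the names or parameter values come from a real sequencing pipeline.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sanger_fragments(template, n_molecules=2000, p_terminate=0.2, seed=0):
    """Simulate extension products as a set of (length, terminal ddNTP label).

    template: bases in the order the polymerase reads them (assumption of this sketch).
    p_terminate: hypothetical chance that a ddNTP is incorporated at any position.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for position, base in enumerate(template, start=1):
            if rng.random() < p_terminate:
                # A labeled ddNTP was incorporated: extension stops here.
                fragments.add((position, COMPLEMENT[base]))
                break
    return fragments

def read_sequence(fragments):
    """Sort fragments by size (as in electrophoresis) and read the labels in order."""
    return "".join(label for _, label in sorted(fragments))

template = "TACGGATCCATG"
print(read_sequence(sanger_fragments(template)))
```

The read corresponds to the newly synthesized strand, that is, the base-by-base complement of the template as written here. With enough molecules every fragment length is represented, which is what allows the complete sequence to be read from the size-sorted labels.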

The need to lower the cost of sequencing has driven the development of the next generation of sequencing technology. Sequencing a whole genome at a cost of 1000 US dollars per individual is now one of the goals for the development of new sequencing technologies [26]. Today there are a few commercially available systems that could possibly approach this kind of performance in the foreseeable future.

One such system has been developed by 454 Life Sciences™ [27] using the Pyrosequencing technique of sequencing-by-synthesis. In this system the four nucleotides are added one at a time to the reaction mixture. When a nucleotide is incorporated by the polymerase, an enzymatic cascade uses the released pyrophosphate to generate luminescence, and the type of nucleotide is recorded in the sequence read [28]. This technology was used this year (2007) to re-sequence the genome of James Watson, who is credited as co-discoverer of the structure of DNA. The cost of this effort was said to be two million US dollars, and it took two months to complete. For comparison, the sequencing cost for the human genome reference sequence has been estimated at 3 billion US dollars. Executives at 454 Life Sciences™ have said that they hope to bring the cost of re-sequencing one genome down to 100,000 US dollars during 2008 [29].
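The one-nucleotide-at-a-time principle can be sketched as a sequence of "flows", each producing a light signal when the added nucleotide is incorporated. The sketch below assumes an idealized signal that equals the number of identical bases incorporated in a row, a fixed flow order and error-free chemistry; the function names and the flow order are illustrative assumptions and do not describe the 454 instrument software.

```python
def flowgram(read_strand, flow_order="TACG"):
    """Return the list of (nucleotide, signal) flows needed to synthesize read_strand.

    read_strand: the strand being synthesized, 5'->3'.
    The signal is idealized as the number of bases incorporated in that flow.
    """
    unknown = set(read_strand) - set(flow_order)
    if unknown:
        raise ValueError("bases not covered by the flow order: %s" % sorted(unknown))
    flows = []
    position = 0
    while position < len(read_strand):
        for nucleotide in flow_order:
            run = 0
            # Every consecutive matching base is incorporated in the same flow.
            while position < len(read_strand) and read_strand[position] == nucleotide:
                run += 1
                position += 1
            flows.append((nucleotide, run))
            if position == len(read_strand):
                break
    return flows

def decode(flows):
    """Reconstruct the read from the recorded flows."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)

flows = flowgram("TTAGC")
print(flows)
print(decode(flows))
```

A flow with signal zero simply means that the added nucleotide did not match the next template position, so nothing was incorporated and no light was produced.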

Another commercial system that is working towards the goal of a 1000 dollar genome is the Solexa sequencing system (Illumina Inc.). This system uses a slightly different application of the principle of sequencing-by-synthesis than the 454 system. By using fluorescently labeled nucleotides with reversible termination and labeling moieties, each cycle of the reaction can use all four types of nucleotides. Only one base is incorporated in each cycle due to the termination properties. The unincorporated nucleotides are washed away, and the fluorescence is then detected and identifies one position in the sequence template. The terminating group and the fluorophore can both be removed by enzymatic cleavage, and another cycle can start to determine the next position in the sequence template [30]. Illumina Inc. is reported to be planning to use this system to sequence the entire genome of one of the individuals used in the HapMap project [29].

SNP genotyping

As described previously, SNPs are the most abundant type of variant in the genome. As a biallelic, genetically stable variation, a SNP is less informative than an STR, which is the second most common type of genetic marker used. However, the abundance of SNPs allows better coverage of the genome. This is an advantage for whole genome association studies, which are now practically and economically possible using SNPs and new genotyping technologies in combination with the increased number of validated SNPs available for assay design.

The methods used for genotyping SNPs can be divided into two main groups based on the two main principles used to discriminate the alleles of each SNP.

Hybridization-based techniques

Discrimination of SNP alleles based on hybridization is the first of the two main groups. It utilizes the fact that the strength of the binding between two short complementary strands of DNA is changed by a single mismatched nucleotide. This change in the stability of the DNA duplex allows separation under different denaturing conditions, for example high temperature or low salt concentration in a washing solution.

The first method using the hybridization principle for genotyping SNPs was published in 1979 and used allele specific oligonucleotides (ASOs) for genotyping of DNA from a bacteriophage [31]. This study showed that a single nucleotide mismatch changed the denaturing temperature of the hybridized strands by 10 degrees. Using this principle, several types of genotyping techniques have been developed, such as real-time PCR using TaqMan probes [32] or Molecular Beacons [33]. Because the hybridization and denaturing conditions are sequence dependent, it has proven difficult to achieve high multiplexing levels using hybridization for discriminating SNP alleles [34]. Using specially selected SNPs and high-density oligonucleotide arrays, it is however possible today to determine the alleles of over 900,000 SNPs in one experiment using the Genome-Wide Human SNP Array 6.0 from Affymetrix (www.affymetrix.com). Affymetrix uses photolithographic synthesis of ASO probes directly onto an array surface, which allows for the construction of very high-density arrays [35]. This type of array can have up to 40 different ASO probes with slightly different sequences for each SNP to be genotyped. The SNP site is amplified using PCR, and the PCR products are labeled with biotin, which can be detected using fluorescently labeled streptavidin. Using the combined signal from all probes provides redundancy for the genotype calling, to compensate for the sequence-dependent issues that arise in highly multiplexed genotyping by hybridization.
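Allele discrimination by hybridization, and its dependence on the washing conditions, can be sketched with a rough melting temperature model. The sketch below uses the common rule of thumb Tm = 2(A+T) + 4(G+C) for short oligonucleotides together with the 10 degree decrease per mismatch quoted above; combining the two in this way is an assumption of this example. The probe sequences are invented, and for simplicity each probe is written as the target sequence it recognizes rather than as its complement.

```python
def rule_of_thumb_tm(probe):
    """Approximate melting temperature (degrees C) of a short, perfectly matched duplex."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc

def mismatches(probe, target):
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(p != t for p, t in zip(probe, target))

def stays_bound(probe, target, wash_temp, mismatch_penalty=10):
    """True if the duplex survives a wash at wash_temp in this simplified model."""
    effective_tm = rule_of_thumb_tm(probe) - mismatch_penalty * mismatches(probe, target)
    return wash_temp < effective_tm

def call_genotype(sample_copies, probes, wash_temp):
    """Call a genotype from which allele specific probes retain signal after washing."""
    detected = sorted(
        allele for allele, probe in probes.items()
        if any(stays_bound(probe, copy, wash_temp) for copy in sample_copies)
    )
    if not detected:
        return None  # no signal: wash too stringent or no matching target
    return detected[0] * 2 if len(detected) == 1 else "".join(detected)

# Invented 15-mer probes for an A/G SNP at the seventh position.
PROBES = {"A": "ACGGCTAACGTTGCA", "G": "ACGGCTGACGTTGCA"}
homozygote_a = [PROBES["A"], PROBES["A"]]

print(call_genotype(homozygote_a, PROBES, wash_temp=42))  # stringent wash
print(call_genotype(homozygote_a, PROBES, wash_temp=30))  # wash too mild
```

In this model the two probes have different melting temperatures, so the temperature window in which both discriminate correctly is narrow. With many probes in the same reaction such a common window may not exist, which reflects the multiplexing difficulty described in the text.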

Enzyme-assisted techniques

Utilizing enzymes to discriminate SNP alleles is the second and largest group of genotyping techniques used today. Genotyping assays using enzymes for discrimination are more specific than hybridization assays [36]. Enzymes like DNA ligases and DNA polymerases have been used extensively. In vivo these enzymes need both to be very specific and to have very low error rates to perform their natural biological functions. DNA polymerases and DNA ligases are involved in DNA replication and repair and are highly sensitive to matched and mismatched nucleotides. Utilizing the natural functions of these enzymes has resulted in several different genotyping assays for highly specific multiplexed genotyping of SNPs.

Ligation assisted assays

The function of DNA ligase is to repair breaks in DNA molecules by creating a phosphodiester bond that joins the two strand ends covalently. It will only do so if the ends of the DNA strands are aligned properly; if there is a mismatched nucleotide pair at the site, ligation will not occur. This is utilized in the Oligonucleotide Ligation Assay (OLA), which uses two probes that hybridize to the target DNA so that a free 3’ end and a free 5’ end are positioned next to each other. If either end is not correctly hybridized due to the allele of the targeted SNP, ligation will not occur. In the first publication describing the use of OLA, ligation was detected by adding a biotin to one of the probes and a radioactive label to the other. This made it possible to separate the biotinylated probe from the reaction mixture using streptavidin. If a radioactive signal could be detected, a perfect probe match was present in the DNA and ligation had occurred, joining the probes. Knowing the probe sequences, the genotype of the SNP can be determined [37]. Since its invention, OLA has been refined by the development of new types of probes and detection schemes. The use of padlock probes is one good example. A padlock probe is a single linear probe with specific recognition sequences for the selected target DNA sequence at its ends. When these ends are successfully ligated, the probe is circularized [38]. A commercialized version of padlock probes is the molecular inversion probe. In molecular inversion probes, universal primer sites are added to the probe sequence, allowing for highly multiplexed PCR amplification. A tag sequence is also added that allows the probes to be sorted by hybridization on microarrays for analysis. The circularization of a probe that finds a perfect match is exploited by adding exonuclease to the reaction after ligation, which destroys any linear probe in the reaction and leaves only the circular probes for amplification and sorting [39]. Molecular inversion probes have been used successfully to type 12,000 SNPs simultaneously in one multiplexed reaction [40], and the limit of possible multiplexing has yet to be determined.
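The decision logic of the oligonucleotide ligation assay can be sketched in a few lines. This is a deliberately simplified model: probes are written as the target sequence they recognize rather than as its complement, a probe pair is considered ligated only if both probes match the target perfectly and lie directly next to each other, and all sequences and names are invented for the example.

```python
def ligation_occurs(allele_probe, common_probe, target):
    """True if the two probes hybridize side by side with a perfect match.

    The allele specific probe ends on the SNP base; the common probe starts
    immediately after it, so the joined sequence must occur in the target.
    """
    return (allele_probe + common_probe) in target

def ola_genotype(sample_copies, allele_probes, common_probe):
    """Call a genotype from which allele specific probes were ligated to the common probe."""
    ligated = sorted(
        allele for allele, probe in allele_probes.items()
        if any(ligation_occurs(probe, common_probe, copy) for copy in sample_copies)
    )
    if not ligated:
        return None
    return ligated[0] * 2 if len(ligated) == 1 else "".join(ligated)

# Invented A/G SNP in the context TTGACCA[A/G]GTCCATG.
ALLELE_PROBES = {"A": "GACCAA", "G": "GACCAG"}
COMMON_PROBE = "GTCCAT"
copy_a = "TTGACCAAGTCCATG"
copy_g = "TTGACCAGGTCCATG"

print(ola_genotype([copy_a, copy_g], ALLELE_PROBES, COMMON_PROBE))
print(ola_genotype([copy_g, copy_g], ALLELE_PROBES, COMMON_PROBE))
```

In the padlock and molecular inversion probe formats the same test decides whether a probe is circularized, and the later exonuclease step then removes every probe for which the answer was negative.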

Polymerase assisted assays

DNA polymerase is the key enzyme for both PCR and sequencing, and its ability to assemble a double-stranded DNA molecule using a single strand as template can also be utilized for SNP genotyping.

Single nucleotide primer extension

In a single nucleotide primer extension reaction, an oligonucleotide primer is designed so that its 3’ end hybridizes to the nucleotide adjacent to the SNP site. The DNA polymerase will then extend the primer over the SNP site, enabling determination of the genotype of the sample. This method of genotyping, originally called minisequencing [41], has since the original publication become known by many different names, such as single base extension [42] and single nucleotide primer extension [43]. As with the names, there are many different single nucleotide primer extension assay formats for genotyping individual SNPs and for multiplexed genotyping, with primers in solution, immobilized or sorted on microarrays, and with detection by several labeling strategies (Table 1).

Assay format                                   Detection method

Singleplex genotyping by template directed     Fluorescence polarization [44]
incorporation in microtiter plates

Multiplexed primer extension                   MALDI-TOF detection [45]

Multiplexed genotyping by immobilized          Radioactively labeled ddNTPs [36]
primers on microarrays                         Fluorescently labeled ddNTPs [46]

Multiplexed tag-array minisequencing           Fluorescently labeled ddNTPs [47]

Table 1: Different variants of single nucleotide primer extension
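The principle of single nucleotide primer extension can be sketched in a few lines of Python. This is a toy illustration and not a description of any of the commercial systems: the sequence, the SNP position, the signal values and the 0.2/0.8 calling thresholds are all hypothetical, and the function names are chosen only for this example. The primer is taken from the strand that carries the SNP, so that its 3’ end lies on the base immediately before the SNP and the incorporated terminator reports the allele.

```python
def design_extension_primer(forward_seq, snp_index, primer_len=20):
    """Return a primer whose 3' end lies on the base just before the SNP.

    forward_seq is the strand on which the SNP alleles are named and
    snp_index is the 0-based position of the SNP on that strand.
    """
    start = snp_index - primer_len
    if start < 0:
        raise ValueError("not enough flanking sequence for this primer length")
    return forward_seq[start:snp_index]


def incorporated_ddntp(forward_allele):
    """Name the terminator added by the polymerase for a given allele.

    The primer has the forward-strand sequence and anneals to the reverse
    strand, so the incorporated base equals the forward-strand allele.
    """
    return "dd" + forward_allele.upper() + "TP"


def call_genotype(signals, alleles, low=0.2, high=0.8):
    """Call a genotype from two allele signals (hypothetical thresholds)."""
    first, second = alleles
    total = signals[first] + signals[second]
    if total <= 0:
        return None  # no usable signal, leave the genotype uncalled
    fraction = signals[first] / total
    if fraction >= high:
        return first + first
    if fraction <= low:
        return second + second
    return first + second


# Hypothetical 27-base target with a C/T SNP at index 20.
target = "ACGTACGTACGGATCCTTAA" + "C" + "TTGACC"
primer = design_extension_primer(target, snp_index=20)
print(primer, incorporated_ddntp("C"), incorporated_ddntp("T"))
print(call_genotype({"C": 500, "T": 480}, ("C", "T")))
```

In a real assay the signal would come from one of the detection methods listed in Table 1, and the calling thresholds would be set from control samples rather than fixed in advance.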

Single nucleotide primer extension is available in several commercial applications for SNP genotyping. The GenomeLab™ SNPstream® system from Beckman Coulter is designed for flexible medium to high throughput genotyping [48]. This system analyzes 12 or 48 SNPs per experiment using tag-array minisequencing with two color fluorescent detection on microarrays in a 384-well microtiter plate format. For whole genome sized SNP genotyping the Infinium II assay developed by Illumina® uses single base extension [49]. Using this assay and the HumanHap650Y genotyping BeadChip, over 650,000 SNPs can be typed in a single experiment [50]. In July 2007 the Human1M BeadChip, also using the Infinium II assay, was released as the first commercially available application for genotyping more than one million SNPs in one experiment.

Allele specific primer extension

Allele specific primer extension is another way to utilize the function of DNA polymerase for SNP genotyping. In this method allele specific oligonucleotide (ASO) primers are designed so that the nucleotide at the 3’ end will hybridize to the SNP position in the target sequence. Two ASOs are needed to determine the genotype of a SNP. If an ASO primer sequence has a complete match to the target sequence, the DNA polymerase will be able to extend the ASO primer. By detecting the extended ASO the genotype can be determined. One way to detect extension, used in early applications of ASO, was to run a PCR reaction using an ASO primer as one of the two PCR primers. If the ASO primer fully matched, the PCR reaction would amplify the target sequence and the PCR product could easily be detected using a standard agarose gel [51]. Several alternative detection methods have been used with ASO primers, and one of the latest adaptations of this reaction principle is the GoldenGate genotyping assay from Illumina® [52]. This assay makes use of both DNA polymerase and ligase to determine SNP genotypes. If the ASO primer is a match, it will be extended by the polymerase until it reaches a locus specific oligonucleotide (LSO) primer that stops the extension. In the next step the extended ASO primer is ligated to the LSO primer. Using universal PCR primer sequences contained in both the ASO and the LSO, the ligated product can be amplified and later sorted onto a bead array [53] using a tag sequence in the LSO. The GoldenGate assay can be used for flexible genotyping of 1536 SNPs per sample in one experiment.

Genetics

Genetic complexity

All human traits and diseases that have a heritable component can roughly be divided into two major groups according to the genetic complexity underlying the trait or disease in question.

Monogenic

Monogenic traits and diseases follow the Mendelian patterns of dominant or recessive inheritance. Diseases with Mendelian inheritance in humans have traditionally been identified and studied by finding families with multiple affected members. By examining how the trait or disease is passed on through the generations in families, the mode of inheritance can be determined. For a trait or disease to be defined as having Mendelian inheritance it must show either a dominant or a recessive pattern in families. Current scientific information about diseases showing Mendelian inheritance is catalogued in the “Online Mendelian Inheritance in Man” (OMIM) database (www.ncbi.nlm.nih.gov/omim/). OMIM also catalogues genes that have been indicated to affect phenotypes and diseases with Mendelian inheritance. One of the first monogenic diseases to have its causative gene identified was cystic fibrosis. The CFTR gene was identified using linkage-based analysis followed by positional cloning, and the most common mutation causing cystic fibrosis (ΔF508) was identified [54]. Today more than 1000 mutations have been described in the CFTR gene, with ΔF508 being the most common in cystic fibrosis patients. It is worth noting that even if a disease shows Mendelian inheritance and is referred to as monogenic, there will be interactions with other genes that result in differences in disease severity between patients with the same mutation in the causative gene [55]. Relatively few traits and diseases have been defined as monogenic, and consequently the vast majority of human traits and diseases with heritable components are polygenic.

Polygenic

Polygenic traits are most often referred to as complex genetic traits in current literature. The name indicates that, in contrast to monogenic traits, they are influenced by more than one gene and do not display an obvious pattern of Mendelian inheritance in families. The majority of human traits and common diseases that have a heritable component have a complex genetic makeup [56]. The most frequently used method for determining the heritable component of a trait or disease that does not show Mendelian inheritance is to study twins. Comparing the extent to which monozygotic twins share a phenotype with the extent to which dizygotic twins share it provides a good first estimate of the heritable component affecting the trait or disease being studied [57].
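One classical way to turn the twin comparison into a number is Falconer's formula, which estimates heritability as twice the difference between the phenotypic correlations of monozygotic and dizygotic twin pairs. The thesis does not name a specific estimator, so the choice of this formula is an assumption made here for illustration, and the correlations in the usage line are invented values.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate of heritability: h2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are the phenotypic correlations within monozygotic
    and dizygotic twin pairs. The result is a rough first estimate.
    """
    for name, value in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(name + " must be a correlation between -1 and 1")
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations for a quantitative trait.
h2 = falconer_heritability(r_mz=0.90, r_dz=0.50)
print(round(h2, 2))
```

The estimate relies on simplifying assumptions, for example that both kinds of twins share their environment to the same degree, which is why it is best read as a starting point.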

Genetic analysis

To find the genetic components of any trait or disease, two main methods of genetic analysis have been applied: linkage analysis and association analysis. Both utilize known genetic markers such as microsatellites or SNPs, and both aim to find one or more markers that are correlated with the genetic loci that make up the heritable component of the trait of interest.

Linkage

Recombination is the process in which DNA segments are exchanged between paired chromosomes during meiosis. The average number of recombination events during meiosis is around 38 for females and 24 for males [58]. Recombination is a key mechanism for generating genetic diversity and gives rise to genetic linkage that can be used to map loci linked to a trait or disease. The frequency of recombination between loci is related to the distance between them. Loci that are close to each other tend to be inherited together and are said to be in linkage with each other.

Linkage analysis measures how a known genetic marker is co-inherited with the trait or disease of interest in families. Traditionally microsatellites have been used as genetic markers in linkage analysis. Linkage analysis tracks recombination in family materials to locate causative loci. The analysis is therefore limited by the number of meioses available in the family material, which depends on the size of the family material and the number of generations represented. This limitation can result in poor genetic resolution, with linkage being detected between a marker and the locus of interest even if they are several megabases (Mb) apart. Genetic linkage studies in family materials have been very successful in identifying genetic loci linked to traits and diseases with Mendelian inheritance [59]. Using linkage analysis to identify genetic components of common diseases has not been nearly as successful [60].

Association

A population cohort sample can be described as a very large pedigree where the family information is unknown, but where it can be assumed that all individuals share common ancestry going back far enough in time. This means that in a pedigree of unknown structure thousands of recombination events have taken place since the beginning of the common ancestry. For this assumption to be valid it is important to ensure that the ethnicity and geographic origin of the samples are matched as far as possible. If not, the problem of population stratification arises, where population subgroups are present in the sample. This can cause differences in marker allele frequencies, which in turn can result in false positive findings of association [61]. Association analysis examines the end result of all the recombination events in the population, which provides higher genetic resolution than traditional linkage analysis, which is limited by the number of generations available in a family material with a known pedigree.

The classical setup for studying association is a case-control study using SNP markers. It compares the allele frequencies in a group of “cases” that have a disease of interest, for example type 2 diabetes as examined in this thesis, with those in a group of control samples from the same population that do not have the disease. Recent technological advancements and efforts such as the HapMap project have now made genome-wide association studies a reality [60]. This year (2007) several such studies have been published with findings of previously unknown genes for body height and type 2 diabetes (see Present study).

Linkage disequilibrium

The connection between genetic markers and genetic loci utilized in association analysis is called linkage disequilibrium (LD). LD is described using two statistical measures called D´ and r². D´ reflects the extent of historical recombination between two loci, and its value ranges from zero to one. If D´ equals one for two loci, they are considered to be in complete genetic linkage, meaning that they are inherited together and that no recombination has occurred between them in the population analyzed. However, the marker alleles can still have different frequencies in the population. For SNPs this is caused by the original mutations having occurred at different time points in the population's genetic history. The value of r² is therefore used to describe the correlation between two loci, where an r² of one means that the marker alleles also have the same frequencies. This correlation can be used to select the most informative SNPs, referred to as tagSNPs, in order to maximize cost-benefit in an association study [62].
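The difference between the two LD measures can be made concrete with a short calculation. The sketch below uses the standard textbook definitions of D, D´ and r² for two biallelic loci; the haplotype and allele frequencies in the example are invented. It shows a pair of loci with D´ equal to one, because one haplotype is never observed, and a much lower r², because the allele frequencies differ.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical loci: allele B (frequency 0.2) only ever occurs together
# with allele A (frequency 0.5), so no recombinant haplotype is seen.
d, d_prime, r_squared = ld_measures(p_ab=0.2, p_a=0.5, p_b=0.2)
print(round(d_prime, 3), round(r_squared, 3))
```

A tagSNP with low r² to an untyped variant carries little information about it even when D´ is one, which is why r² is the measure normally used for tagSNP selection.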

Haplotypes

The LD measurements can be used to construct haplotypes formed by SNPs. A haplotype is defined as a set of distinct genetic loci that are linked on the same chromosome and are inherited together. After the completion of the human reference sequence it was a logical next step to start mapping the variation in the genomic sequence and to use these data to determine the haplotype structure of the entire genome. The HapMap project was initiated to accomplish this task. The project was started in 2002 and set out to genotype more than one million SNPs in populations selected to represent all major population groupings in the world [63]. The results of this effort were published in 2005 [64], and the project continued with a second phase of genotyping that was officially completed this year (2007) [65]. The HapMap project has in total genotyped around 6.8 million SNPs, and the data are publicly available, making them an invaluable asset for all research involving SNP genotyping.

Present study

Overall aim

To analyze the complex genetic makeup of type 2 diabetes mellitus and human body height.

Specific aims

• To replicate previous findings for the association between the TCF7L2 gene and T2DM, originally identified by linkage analysis in an Icelandic population. (Study I)

• To test for association between TCF7L2 and T2DM specific quantitative biochemical markers in the ULSAM population cohort. (Study I)

• To replicate, in the Swedish population, the association with T2DM of variants in the LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes, originally identified in a genome-wide association study in patients from the French population. (Study II)

• To test for association between SNPs in the LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes and T2DM, as well as T2DM specific quantitative biochemical markers, in the ULSAM population cohort. (Study II)

• To identify candidate genes and analyze the association of SNPs in them with body height in the ULSAM cohort. (Study III)

• To fine map a region with known linkage to body height on the X chromosome using SNPs. (Study IV)

• To analyze SNPs in the four functional candidate genes COL11A1, CSF1, ALX3 and EPS8L3 for association and linkage with body height. (Study V)

Trait and disease

Type 2 diabetes mellitus

Type 2 diabetes mellitus (T2DM) is a metabolic disease characterized by insulin resistance and/or abnormal insulin secretion resulting in hyperglycemia. The diagnostic criteria according to the World Health Organization are fasting plasma glucose ≥ 7.0 mmol/l, or plasma glucose ≥ 11.1 mmol/l measured two hours after an oral glucose tolerance test (OGTT) [66]. Left untreated, T2DM will result in severe complications due to the effects of chronic hyperglycemia. The complications include an overall increased risk for cardiovascular disease, retinopathy that can lead to blindness, and nephropathy that progresses until the kidneys fail completely.

During the last century there has been a dramatic increase in the incidence of T2DM worldwide, to the point that T2DM is referred to as an epidemic [67]. It is rapidly becoming one of the most common diseases in the world. A recent projection predicts that by the year 2050 there could be as many as 48 million individuals diagnosed with diabetes in the U.S. alone [68]. The dramatic rise in T2DM is mainly attributed to changes in human behavior and lifestyle leading to increased obesity [69]. The best way to deal with this epidemic is prevention, and several studies have clearly shown that lifestyle intervention can have great success in preventing the development of T2DM in subjects with impaired glucose tolerance (IGT), which is a pre-stage to full T2DM [70, 71].

Human body height

Standing body height is one of the most basic human quantitative traits. The heritability of body height has been extensively examined, and its high heritability is well known. For adult body height the heritability estimates range from 68% to 93% [72, 73]. Besides genetic influences, height is also affected by many environmental factors, of which nutrition and health care are important. Understanding the genetic components of normal variation in body height would not only provide important insight into basic human biology, but could also serve as a model for future investigations of other complex genetic traits.

Genetics

Type 2 diabetes mellitus

T2DM is a complex genetic disease influenced by several genes, most of which are still unknown. Until last year (2006) only the PPARG [74] and KCNJ11 [75] genes had been convincingly identified as having an effect on the risk for T2DM. In January 2006 an association between a variant located in the TCF7L2 gene and increased risk (RR=1.56) for T2DM, identified in an Icelandic population, was published [76]. The TCF7L2 variant was discovered when Grant and colleagues in the Icelandic deCODE group performed genetic fine mapping of a region on chromosome 10 originally identified by linkage analysis, using an additional high-density set of microsatellite markers. This region had been reported to be linked to T2DM in Mexican Americans [77] and had shown suggestive evidence of linkage in an Icelandic population [63]. Grant and colleagues also replicated the association initially found in the Icelandic population in Danish and American populations. They also genotyped a number of SNPs in the TCF7L2 gene and found one (rs12255372) in almost complete LD (r²=0.95) with the original microsatellite marker (DG10S478). They suggested that the SNP rs12255372 and one other SNP, rs7903146, should be included in any future replication efforts by other research groups. These results prompted us to initiate study I presented in this thesis. Since the original publication, numerous population cohorts from around the world, including Europe, America, Asia and West Africa, have been analyzed to replicate the association between TCF7L2 and T2DM [78-96]. Additional support for TCF7L2 as a risk factor for T2DM has been provided by seven genome-wide association studies (GWAS) published during 2007 [91, 97-101]. The first of these studies was performed by Sladek and colleagues, and in addition to replicating the TCF7L2 association they also showed associations with T2DM for the LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes. These findings prompted us to initiate study II.

Human body height

Body height is a classic complex genetic trait. The results from a large number of genome-wide linkage studies indicate that there must be multiple genes controlling body height, each with a relatively small effect. This in part explains the absence of genes with a clear and reproducible link to body height. Loci on all chromosomes except 10, 16, 19 and the Y chromosome have been suggested to be linked to body height [102-120]. Up until this year only findings on chromosomes 3, 5, 6 and 7 had been suggested in more than one study [112]. The best candidate gene so far for body height is the HMGA2 gene, which was only recently found in a genome-wide association study. A common variant in this gene showed convincing evidence for association, and it was subsequently replicated in several population cohorts in the same study [121].

Material and methods

Study I-II

Uppsala Longitudinal Study of Adult Men (ULSAM)

The ULSAM population cohort was collected as part of an investigation on diabetes and cardiovascular disease in adult men (www.pubcare.uu.se/ULSAM). The study was initiated in 1970, when all men born between 1920 and 1924 and residing in Uppsala county in Sweden were invited to a health survey, in which 2,322 men participated [122]. The participants have subsequently been invited for follow-ups every 10 years, with the last follow-up study completed in 2005. At the follow-up study conducted when the participants had reached 70 years of age, blood samples for extraction of the DNA samples analyzed in study I and II were collected (n=1,142) [123]. The ULSAM cohort has been extensively characterized for studying T2DM using biochemical and clinical approaches. The euglycaemic–hyperinsulinaemic clamp technique [124], considered to be the gold standard for determining insulin sensitivity, was used to calculate the insulin sensitivity index (M/I). Using the M/I value to adjust for insulin sensitivity in the body allows very precise examination of the actual β-cell function. Several other key biochemical markers related to T2DM have been measured in the ULSAM cohort. They include immunoreactive insulin (IRI) during an oral glucose tolerance test (OGTT), fasting intact and 32–33 split proinsulin, and specific insulin [125].

Genotyping

In study I and II genotyping was performed using a homogeneous single base extension assay with fluorescence polarization detection using in-house reagents [44]. Fluorescence polarization was recorded in a fluorometer (Analyst AD; Molecular Devices, Sunnyvale, CA, USA). Genotyping of two SNPs in study I resulted in a sample success rate of 98% and 100% genotype reproducibility based on >100 duplicated genotypes from independent experiments. For the SNP genotyped in study II the sample success rate was 99% and genotype reproducibility 100%.

In study II ten SNPs were genotyped using the GenomeLab™ SNPstream® system [48] (Beckman Coulter, Fullerton, CA, USA) in one multiplexed reaction. The sample success rate was on average 99% and genotype reproducibility was >99%. Primer design was performed using the Autoprimer.com primer design tool (www.Autoprimer.com, Beckman Coulter) for both genotyping methods. The genotypes of all SNPs conformed to Hardy-Weinberg equilibrium according to a chi-square test (p>0.05). During assay design all SNP alleles showed correct inheritance when typed in a reference family material.
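A chi-square test for Hardy-Weinberg equilibrium of the kind mentioned above can be sketched as follows. This is a generic textbook version with one degree of freedom, not the code used in the studies, and the function name is chosen for this example. The usage line applies it to the rs7903146 genotype counts of the non-diabetic subjects reported in Table 2 (496 CC, 327 CT and 62 TT).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg.

    Returns the chi-square statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic, the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hwe_chi_square(496, 327, 62)
print(round(chi2, 2), p_value > 0.05)
```

For markers with very low minor allele counts the chi-square approximation becomes unreliable and an exact test is usually preferred.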

Statistical analysis

All statistical analyses in study I and II were performed using SAS version 9.1 (SAS Institute, Cary, NC, USA). Correction for multiple testing was performed by calculating the number of tests, taking into account the LD between the SNPs [126]. In study I this resulted in a required overall critical p-value of 0.033 for significance. In study II the overall critical p-value was 0.007.

In study I, logistic regression analysis was used to test for association between SNPs and T2DM. For the quantitative biochemical measures, linear regression analysis was performed after excluding subjects with type 2 diabetes. Age was used as a covariate in all analyses, and for the biochemical measurements insulin sensitivity and BMI were used alternately as covariates. Significant findings were further analyzed using a two-tailed t-test.

In study II power was estimated for detection of association with T2DM assuming a dominant model of inheritance. It was found to be 80% for a SNP with a minor allele frequency of 25% and an odds ratio of 1.4, using a cut-off p-value of 0.05. Association analysis for T2DM was performed using a chi-square test. Analysis of quantitative biochemical measures was performed using ANOVA, univariate linear regression and multiple linear regression after excluding subjects with type 2 diabetes mellitus, using either age and insulin sensitivity index (M/I) or age and BMI as covariates.

Study III-V

Study design

Study III used a candidate gene design to analyze body height. First, 17 genes were selected (see Table 1, Paper III), all with different connections to body height or growth according to the current literature. By genotyping a smaller number of SNPs distributed across the selected genes we screened for association with body height. When a suggestive association was found in the ESR1 gene, we genotyped the gene further to detect more strongly associated polymorphisms. We also replicated our analysis of the ESR1 gene by genotyping it in a second population cohort, where we found a significant association.

In study IV we used SNPs to perform fine mapping of a linked locus for body height on the X chromosome identified by a combined analysis of several genome-wide linkage scans [120].

Study V combines fine mapping and candidate gene design. Four functional candidate genes for body height were selected from a genomic region on chromosome 1p21. This region had been shown to be linked to body height in a previous genome-wide linkage study [112]. After initial analysis only COL11A1 showed convincing evidence of linkage and association with body height. Further genotyping and analysis were therefore only done for COL11A1, to investigate the initial findings further and to replicate them using association in Finnish and Icelandic population cohorts.

SNP selection

Study III

The principle for SNP selection was to cover the candidate genes, including exons and introns, using SNPs with minor allele frequencies > 0.05 at an even spacing of 1.5 kb to 12.5 kb, depending on the size of the genes. In addition, an Illumina design score of 0.5 was used as the lower limit for selecting a SNP for genotyping using the Illumina GoldenGate assay [52]. 174 SNPs found in the dbSNP database were selected (see Table 1, Paper IV).

An additional panel of 33 tag-SNPs was selected to investigate the genetic variation of the ESR1 gene further. Selection was done using the Haploview software [127].

Study IV

We designed a genotyping panel for fine mapping the linked region on the X chromosome using the Illumina® GoldenGate™ assay, by which 1536 SNPs can be genotyped in parallel. The genotyping panel was centered on the peak microsatellite marker (DXS1047) identified previously [120], with an average physical spacing of 5 kb between SNPs.

The SNPs were selected manually, assisted by a computer script which highlighted available SNPs that fulfilled the basic selection criteria of validation status and spacing at 5 kb distance flanking the DXS1047 marker. There are three levels of validation status given in the SNP information file created by Illumina for the user.

1. Top level validation is GoldenGate™ validation status, meaning that a SNP has been genotyped before by Illumina in-house with their GoldenGate™ assay.

2. Second level is called Two-hit validated, meaning that the SNP has been genotyped and reported by two different methods and in two populations.

3. Third level of validation, and last choice for selecting SNPs for our fine mapping panel, was Non-validated, meaning that a SNP has only been reported by one method and in one population in the databases.

When selecting between available SNPs with the same level of validation, the Illumina® SNP design score was used as the decisive criterion. The SNP design score uses a proprietary algorithm to estimate the probability of designing a GoldenGate™ assay for any SNP. A GoldenGate validated SNP has a SNP design score of 1.1. Selecting SNPs with a score above 0.6 was recommended, and such SNPs were always used if possible regardless of validation status. If validation status and SNP design score were equal, the minor allele frequency (MAF) was considered, and SNPs with a MAF>0.05 in a European population were preferred.

The original design containing 1536 manually selected SNPs had an average spacing of ~6 kb and covered approximately 9.3 Mb of the QTL region (Figure 1, Paper IV).
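The selection rules for study IV can be expressed as a sort key, shown here as a minimal sketch. It is not the script used in the study, which only highlighted candidates for manual selection. The ordering below is one reading of the text: a design score above 0.6 comes first, then validation level, then the design score itself, then a MAF above 0.05. The class, the function names, the SNP identifiers and all values are hypothetical.

```python
from dataclasses import dataclass

# Lower rank means a better validation level.
VALIDATION_RANK = {"GoldenGate": 0, "Two-hit": 1, "Non-validated": 2}


@dataclass(frozen=True)
class CandidateSnp:
    snp_id: str
    position: int        # base pair position within the region
    validation: str      # one of the keys of VALIDATION_RANK
    design_score: float  # Illumina SNP design score
    maf: float           # minor allele frequency in a European population


def priority_key(snp):
    """Smaller tuples are preferred (assumed ordering of the criteria)."""
    return (
        0 if snp.design_score > 0.6 else 1,
        VALIDATION_RANK[snp.validation],
        -snp.design_score,
        0 if snp.maf > 0.05 else 1,
    )


def pick_best(candidates):
    return min(candidates, key=priority_key)


def select_panel(candidates, region_start, spacing=5000):
    """Keep the best candidate in each window of `spacing` base pairs."""
    best = {}
    for snp in candidates:
        window = (snp.position - region_start) // spacing
        if window not in best or priority_key(snp) < priority_key(best[window]):
            best[window] = snp
    return [best[window] for window in sorted(best)]


candidates = [
    CandidateSnp("snp_a", 1200, "Two-hit", 0.90, 0.20),
    CandidateSnp("snp_b", 3100, "Non-validated", 0.95, 0.30),
    CandidateSnp("snp_c", 6000, "Two-hit", 0.50, 0.30),
    CandidateSnp("snp_d", 7000, "Non-validated", 0.80, 0.10),
]
print([snp.snp_id for snp in select_panel(candidates, region_start=0)])
```

Fixed windows are a simplification. A manual selection can also take the distance to the neighbouring chosen SNPs into account, which this sketch does not do.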

Study V

Using SNPs from the HapMap database (www.hapmap.org), tagSNPs were selected to capture the known variation in the functional candidate genes. Forty-eight SNPs were selected and genotyped in total. HapMap build #16 was used to select the 25 tag SNPs and one additional SNP that is non-synonymous in COL11A1. HapMap build #18 was used to select the 22 tag SNPs in CSF1, EPS8L3, and ALX3.

Sample cohorts

Study III

The initial genotyping of the candidate genes was performed in the ULSAM population cohort (see earlier description). To further investigate our findings in the ESR1 gene, we utilized genotype data from the Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) population cohort. The PIVUS cohort consists of 1016 participants, 507 males and 509 females of age 70, and was originally collected to study endothelial functions [128]. Both population cohorts are from the Uppsala region in Central Sweden.

Study IV

The sample cohort used for genotyping consisted of 780 Finnish twin samples from the Finnish twin cohort study [129]. This cohort was part of the samples used for the combined analysis that identified the linked region on the X chromosome to be fine mapped [120].

Study V

Two of the four original cohorts used to identify the initial linkage of stature [112] on chromosome 1p21 were used for fine mapping in this study. These cohorts were initially ascertained for familial combined hyperlipidemia (FCHL) and familial low HDL-cholesterol. The detailed characteristics and the ascertainment protocol for these 54 families are described in their respective original articles [114, 130, 131]. Some additional members of these families, recruited since the original linkage studies, were included, as well as an independent set of 38 Finnish families for replication purposes.

In addition to the family samples used for gene identification, we also verified the association in an unselected population sample drawn from the Health 2000 cohort, a representative sample of the Finnish population collected in the year 2000 to study a wide range of issues related to public health. Samples were also randomly selected from the ATBC Study, which was originally collected to investigate whether α-tocopherol and β-carotene supplements reduce the incidence of lung cancer in Finland. More detailed information about the Health 2000 and ATBC studies can be found at www.nationalbiobanks.fi. Finally, a set of Finnish dizygotic twins was used as part of creating a sample cohort representative of the Finnish population.

Genotyping

Study III

The 174 SNPs selected for the candidate genes were genotyped in the ULSAM cohort using the GoldenGate assay [52] and the Illumina BeadArray system (Illumina, San Diego, CA, USA). The assay success rate for the original assays was 79%. The failed assays comprised 25 SNPs that were not polymorphic in the ULSAM cohort, another 10 SNP assays that were excluded due to sample call rates below 90%, and 2 SNPs that were omitted because their genotype distribution deviated from Hardy-Weinberg equilibrium. The sample success rate for the SNP assays that passed quality control was 96.3% and the reproducibility of genotyping was 99.98%.

For the ESR1 gene, 25 SNPs were genotyped in the PIVUS cohort using GoldenGate™ assays with an average call rate of 99.5% and reproducibility of 99.8% based on duplication of 2% of the genotypes.

The additional panel of 33 selected tag-SNPs was genotyped in the ULSAM cohort using the SNPstream™ genotyping system (Beckman Coulter, Fullerton, CA, USA) [48]. Three SNPs from the original genotyping were also included for quality checking between the two genotyping systems.

The sample success rate for working assays was 92%. Genotype reproducibility between the Illumina and GenomeLab SNPstream systems was 99.7%.

In total 47 SNPs in the ESR1 gene were successfully genotyped in the ULSAM cohort; 25 of them were also genotyped in the PIVUS cohort.

Study IV

Genotyping was done using the GoldenGate™ assay and the Illumina BeadArray system (Illumina, San Diego, CA, USA) [52]. After testing and quality checks the final working genotyping panel consisted of 1377 SNPs, giving an assay success rate of 90%. The sample success rate was 93% and reproducibility of the genotypes was 99.9%.

This panel of working assays had an average spacing of ~7 kb, and coverage remained around 9.3 Mb.

Study V

The SNPs in COL11A1 were genotyped using the homogeneous Mass Extension reaction of the MassARRAY System (Sequenom, San Diego, California, USA). Tag SNPs in CSF1, EPS8L3, and ALX3 were genotyped using the iPLEX assay of the MassARRAY System (Sequenom, San Diego, California, USA). The sample success rate for the 24 working SNP assays in COL11A1 was between 90% and 99% in the initial genotyping. These samples were genotyped at the Finnish Genome Center.

The Finnish twin samples were genotyped in Uppsala using 9 of the 24 SNPs genotyped previously in the COL11A1 gene. Genotyping was performed using the GenomeLab™ SNPstream® system (Beckman Coulter, Fullerton, CA, USA). The sample success rate for the SNPs ranged from 97% to 99.6%. Genotype reproducibility ranged from 98.8% to 100% based on 13.45% of the samples being genotyped in duplicate.

Statistical analysis

Study III

The statistical analysis was performed using the free statistical software environment “R” [6]. Analysis of variance (ANOVA) was performed to test for association between SNPs and body height. Replicated nominally significant results from ANOVA (p<0.05) were tested using a Wilcoxon rank sum test on body height according to genotype distribution. In the PIVUS cohort, males and females were analyzed separately because of known differences in heritability of body height. The “Tagger” function of the Haploview software was used for SNP selection and to estimate the amount of SNP variation captured by the panel of SNPs.

Study IV

The Merlin statistical software package [132] was used to perform non-parametric linkage analysis for genotyped SNPs and body height. Analysis was performed using both multi-point and single-point variance component linkage analysis. Age and sex were used as covariates. Males and females were also analyzed separately using age as covariate.

We performed association analysis using QTDT [133], Mendel software suit [134] and a prototype X-chromosome module for Merlin to test all SNPs for association with body height. Analysis was performed using the same groupings and covariates as was done for the linkage analysis.

All computational work was performed using the Linux cluster located at the Genome Informatics Unit at Biomedicum in Helsinki (www.giu.fi).

Study V

Prior to genetic analyses the phenotype distributions were examined with SPSS 14.0.1 (SPSS, Chicago, IL). Individuals less than 23 years old were excluded since they may still be growing. Outliers (> 3 SD from the sex-specific mean) were also removed prior to genetic analyses because they may have an undue impact on the subsequent analyses. Variance components linkage analyses were performed using MERLIN [132] and family-based association analyses using MENDEL [135]. The significant association found for one SNP in the population sample was examined with SPSS using analysis of covariance (ANCOVA). In all population analyses the region of residence was used as a covariate due to the population stratification seen in the Finnish population sample. Correction for multiple comparisons in the family-based analyses was performed by the method proposed by Li and Ji, which takes linkage disequilibrium into account [136].
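The two exclusion rules described above can be sketched in Python as follows. This is only an illustration, not the SPSS procedure used in the study: the record layout, the function name `filter_phenotypes`, and the choice to compute the sex-specific mean and SD after the age exclusion are all assumptions.

```python
from statistics import mean, stdev

def filter_phenotypes(records, min_age=23, max_sd=3.0):
    """Apply the two exclusion steps: age first, then sex-specific height outliers."""
    adults = [r for r in records if r["age"] >= min_age]
    kept = []
    for sex in sorted({r["sex"] for r in adults}):
        group = [r for r in adults if r["sex"] == sex]
        heights = [r["height"] for r in group]
        if len(heights) < 2:
            kept.extend(group)  # SD is undefined for a single observation
            continue
        centre, spread = mean(heights), stdev(heights)
        kept.extend(
            r for r in group
            if spread == 0 or abs(r["height"] - centre) <= max_sd * spread
        )
    return kept

# Hypothetical records; not data from the thesis.
sample = (
    [{"sex": "M", "age": 30, "height": 180.0} for _ in range(19)]
    + [{"sex": "M", "age": 30, "height": 250.0}]  # extreme height value
    + [{"sex": "M", "age": 20, "height": 180.0}]  # younger than 23
)
print(len(filter_phenotypes(sample)))
```

Note that with very small groups a single extreme value cannot exceed 3 SD from the mean, so a rule of this kind is only meaningful for reasonably large samples.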


Results and discussion

Study I

We successfully replicated the association for both SNPs in the TCF7L2 gene with T2DM (Table 2).

Table 2: Association analysis of SNP rs12255372 and rs7903146 genotypes with type 2 diabetes at age 70

SNP         Genotype/allele  Non-diabetic  Diabetic    p-value  Odds ratio (95% CI)
rs7903146   CC               496 (0.56)    67 (0.40)
            CT               327 (0.37)    83 (0.49)   0.0006   CT vs CC: 1.88 (1.32–2.67)
            TT               62 (0.07)     18 (0.11)            TT vs CC: 2.15 (1.20–3.85)
            T allele         451 (0.25)    119 (0.35)  0.0002
            C allele         1,319 (0.75)  217 (0.65)
rs12255372  GG               498 (0.56)    73 (0.44)
            GT               327 (0.37)    81 (0.48)   0.011    GT vs GG: 1.69 (1.20–2.39)
            TT               63 (0.07)     14 (0.08)            TT vs GG: 1.52 (0.81–2.84)
            T allele         453 (0.26)    109 (0.32)  0.0085
            G allele         1,323 (0.74)  227 (0.68)

Logistic regression was performed to test for association with T2DM. Diabetic subjects were diagnosed with T2DM in accordance with WHO criteria [66].
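The odds ratios in Table 2 can be approximated directly from the genotype counts. The sketch below is not the logistic regression used in the study; it assumes an unadjusted odds ratio with a Woolf (log-normal) confidence interval, and the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_controls, ref_cases, ref_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-normal) confidence interval."""
    odds_ratio = (exposed_cases * ref_controls) / (exposed_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(
        1 / exposed_cases + 1 / exposed_controls + 1 / ref_cases + 1 / ref_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# rs7903146 genotype counts taken from Table 2 (diabetic = cases, non-diabetic = controls)
or_ct, lo_ct, hi_ct = odds_ratio_ci(83, 327, 67, 496)  # CT vs CC
or_tt, lo_tt, hi_tt = odds_ratio_ci(18, 62, 67, 496)   # TT vs CC
print(f"CT vs CC: OR = {or_ct:.2f} (95% CI {lo_ct:.2f}-{hi_ct:.2f})")
print(f"TT vs CC: OR = {or_tt:.2f} (95% CI {lo_tt:.2f}-{hi_tt:.2f})")
```

For rs7903146 this simple calculation gives values that agree with the tabulated odds ratios and intervals to two decimals, which suggests that the reported estimates are close to unadjusted ones.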

Analysis of the quantitative biochemical markers in ULSAM showed an association with proinsulin that was novel at the time. We found that the risk alleles of both SNPs were associated with elevated levels of proinsulin in plasma using M/I as covariate (rs7903146 p=0.005 and rs12255372 p=0.004). This finding supports the findings of Loos and colleagues, who reported elevated proinsulin levels associated with the risk allele of rs7903146 in the TCF7L2 gene [137]. It should be noted that when we performed the analysis using BMI to adjust for insulin sensitivity, as was done by Loos and colleagues, we did not see an association (rs7903146 p=0.18 and rs12255372 p=0.12).

Our findings suggest that the increased risk for T2DM conferred by the TCF7L2 variants is caused by dysfunction in the production of insulin in the β-cell. Adjusting for insulin sensitivity measured using the euglycaemic–hyperinsulinaemic clamp technique, we are able to compensate for the otherwise significant association between insulin resistance and elevated fasting plasma proinsulin. This adjustment enabled us to distinguish the elevated proinsulin levels associated with impending β-cell failure from elevated levels caused by insulin resistance in the body.


By using the longitudinal data available in the ULSAM cohort we also analyzed early insulin response in a subset of the cohort with data from an intravenous glucose tolerance test performed at the first investigation at age 50, comparing it with the OGTT performed at age 70. This analysis showed a significantly lower acute insulin response in carriers of the high-risk allele of SNP rs7903146. This finding emphasizes the association between TCF7L2 genetic variants and first-phase insulin release [138-140]. The results highlight the importance of impaired insulin secretion, which occurs decades before the clinical onset of T2DM in individuals at increased genetic risk [141].

The mechanism behind the increased risk for T2DM associated with TCF7L2 is currently being investigated by many research groups. It is known that TCF7L2 is a part of the so-called WNT pathway [142] that in turn affects the regulation of glucose homeostasis through GLP-1 [143] as well as β-cell proliferation [144]. A recent study published by Lyssenko and colleagues showed that the risk allele of TCF7L2 was associated with impaired insulin secretion and that overexpression of TCF7L2 appears to reduce glucose-stimulated insulin secretion [145]. These findings may be the first real steps towards understanding how the TCF7L2 gene causes an increased risk for T2DM.

Study II

Due to lack of statistical power to detect small genetic effects in the ULSAM cohort we were not able to replicate the association of the LOC387761 locus and the HHEX, SLC30A8 and EXT2 genes with T2DM. Although not significant, our results showed odds ratios similar to those published by Sladek and colleagues [97].

Analysis of the biochemical markers resulted in significant associations for several measures related to insulin secretion for all three SNPs genotyped in the HHEX gene (see Table 1, Paper II). Immunoreactive insulin concentrations both in the fasting state and at 30 minutes after an oral glucose load were significantly associated with the SNPs in the HHEX gene, which was also the case for the measurement of acute insulin response. Our finding of an association with impaired first-phase insulin response provides biochemical support for the association found between HHEX and T2DM by GWAS [97, 98, 101].

In relation to T2DM, the HHEX gene has been reported to play a central role in both hepatic and pancreatic differentiation in mouse models of development [146, 147]. The expression of HHEX is in part regulated by the WNT-signaling pathway [148], as was previously described also for the TCF7L2 gene (see Study I). We could not find any evidence for association between the HHEX gene and elevated proinsulin levels analogous to our findings for TCF7L2. This suggests that the TCF7L2 and HHEX genes, although both part of the WNT signaling pathway, affect the risk of T2DM via different mechanisms.

Study III

We found four SNPs in the ESR1 gene that showed nominally significant association signals (p<0.05). Based on this finding we selected the ESR1 gene for further study. We performed the same analysis for 26 SNPs in the ESR1 gene in the PIVUS cohort and found a male-specific association with body height (p=0.0056). The associated SNP rs2179922 is located in intron 4 of ESR1. The difference in height is shown in the table below.

Mean body height¹ according to rs2179922 genotype

Cohort  GG               AA+AG            p-value²
ULSAM   175.1±6.0 (843)  174.2±6.8 (210)  0.03
PIVUS   176.2±6.3 (408)  173.9±6.8 (93)   0.002

¹ Mean standing body height ± SD in cm with number of observations in parentheses.
² Wilcoxon rank sum test

Of the additional 21 tag-SNPs that were genotyped in the ULSAM cohort, three showed nominal evidence for association (p<0.05). We calculated that the 47 SNPs genotyped in the ESR1 gene captured 73% of the common SNP variation of ESR1 (MAF ≥0.05) found in the European sample from the HapMap project.
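The notion of "captured" variation can be illustrated with a small Python sketch. It assumes that a reference SNP counts as captured when it is itself genotyped or has a pairwise r² at or above a threshold with at least one genotyped SNP; the threshold of 0.8, the data layout and the function name `fraction_captured` are assumptions for illustration and are not taken from the thesis or from Haploview.

```python
def fraction_captured(r2, genotyped, threshold=0.8):
    """Fraction of reference SNPs tagged by at least one genotyped SNP.

    `r2` maps each reference SNP to a dict {other_snp: pairwise r^2}.
    A genotyped SNP always captures itself.
    """
    genotyped = set(genotyped)
    captured = 0
    for snp, partners in r2.items():
        tagged = any(partners.get(tag, 0.0) >= threshold for tag in genotyped)
        if snp in genotyped or tagged:
            captured += 1
    return captured / len(r2)

# Hypothetical pairwise r^2 values for four reference SNPs; not HapMap data.
example_r2 = {
    "snpA": {"snpB": 0.9},
    "snpB": {"snpA": 0.9},
    "snpC": {"snpA": 0.3},
    "snpD": {},
}
print(fraction_captured(example_r2, genotyped=["snpA"]))
```

In this toy example, genotyping snpA alone captures snpA and snpB but not snpC or snpD, so half of the reference SNPs are captured.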

The ESR1 gene has a functional connection to body height: it has been shown to have a direct effect on bone development and body height. In a reported case of a male patient with estrogen resistance caused by mutations in the estrogen receptor gene, the patient had incomplete epiphyseal closure and a history of continued growth into adulthood. His final body height was 204 cm [149]. Other published studies suggest that SNPs in the ESR1 gene are associated with body height in women [150, 151] and adolescent boys [152]. The effect on body height of the G-allele of the SNP rs2179922 that we observed in males from the ULSAM and PIVUS cohorts is comparable to the effect of the SNP rs1042725 in the HMGA2 gene in adult males in multiple cohorts from the UK and Sweden [121].

Our findings suggest that the ESR1 gene could be one of the genes involved in regulating normal variation of body height in males. The power of this study, however, does not allow us to exclude small effects on height by the other candidate genes analyzed.


Identifying the actual functional variants in the ESR1 gene that affect height will require re-sequencing of the gene to identify possible rare variants, functional studies at the molecular level, and very large population-based studies of the interactions between genes and with environmental factors.

Study IV

Utilizing the sibling relation of the twin pairs, we performed single-point linkage analysis. When analyzing male samples separately, we found 18 SNPs with linkage to body height, located together and defining a region covering ~65.5 kb. Analysis of the female samples separately did not result in any significant linkage. Association analysis did not produce any significant results for any grouping of the samples.

Based on the genotyping results we evaluated the importance of the validation status for the final assay success rate. We concluded that the Illumina design score was more important than the current validation status of the SNPs (see Table 1, Paper IV).

Two functionally interesting candidate genes are located in the region defined by the SNPs linked in the male samples.

The first gene is Glypican-3 (GPC3), a gene in which variations have been shown to cause Simpson-Golabi-Behmel syndrome, which among other things results in abnormal body height (gigantism) along with skeletal anomalies [153]. The GPC3 gene is also believed to play a role in the suppression and regulation of growth in mesodermal tissues and organs, and may also interact with insulin-like growth factor 2 (IGF2), which could be a further functional link to growth regulation and influence on body height [154].

The second possible candidate gene is the plant homeodomain finger gene 6 (PHF6). The PHF6 gene is associated with the Börjeson-Forssman-Lehmann syndrome, which is a form of X-linked mental retardation in which short stature is part of the clinical manifestations.

Further investigation of this region and the candidate genes in other population cohorts is needed to replicate our findings and to determine whether the suggested candidate genes influence normal variation in body height.

Study V

Linkage results from the first round of genotyping and analysis of all four candidate genes showed non-significant linkage for all but the COL11A1 gene. This prompted us to focus further effort on the COL11A1 gene alone. Extensive analysis of linkage and association using the genotyped markers in the COL11A1 gene identified significant linkage and association between a functional non-synonymous SNP and body height in males.


We could show that one allele of this SNP was associated with an increase in body height for males in the Finnish population. Homozygous carriers of the serine allele were approximately 0.9 cm taller than carriers of the other allele.

We calculated that this SNP can explain 0.1% of the variance in the male population and 0.01% in the whole Finnish population.
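How a figure of this kind can arise is sketched below in Python. The sketch assumes a simple additive model, in which the fraction of trait variance explained by a biallelic SNP is 2p(1-p)β²/σ², with allele frequency p, per-allele effect β and trait standard deviation σ. This is a standard textbook approximation and not necessarily the calculation used in the study, and all input values in the example are hypothetical.

```python
def variance_explained_additive(allele_freq, effect_per_allele, trait_sd):
    """Fraction of trait variance explained by one SNP under an additive model.

    Assumes Hardy-Weinberg proportions, so the genotype variance is 2p(1-p).
    """
    p = allele_freq
    genetic_variance = 2 * p * (1 - p) * effect_per_allele ** 2
    return genetic_variance / trait_sd ** 2

# Hypothetical inputs: allele frequency 0.3, 0.45 cm per allele, SD of 6.5 cm.
fraction = variance_explained_additive(0.3, 0.45, 6.5)
print(f"{100 * fraction:.2f}% of the variance")
```

Even an effect of several millimetres per allele explains only a fraction of a percent of the variance, which is consistent with the small proportions reported above.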

The COL11A1 gene is a highly relevant candidate gene for human body height. It is expressed in cartilage tissue, growth plate and in the nucleus pulposus of the intervertebral discs. The encoded protein is one of the three subunits that make up collagen XI which in turn is a part of collagen fibrils that are a vital part of cartilage in the human body.

Known mutations in the COL11A1 gene have been shown to cause Marshall and Stickler syndrome. Phenotypes observed for these syndromes include short stature, osteoarthritis, midfacial hypoplasia and cleft palate, all indications of skeletal defects. The involvement of COL11A1 in skeletal development and morphogenesis is strongly supported by a knock-out mouse model in which COL11A1 null mice were only half the normal length [155].

Our finding of a potentially functional variant in the COL11A1 gene, with a relevant biological function related to body height, represents one small but important step towards discovering and understanding the complex make-up of human body height.

Concluding remarks

Studies I and II both demonstrate the importance of having a broad range of well characterized phenotypes in sample cohorts used to further investigate loci identified using whole genome SNP association studies. Type 2 diabetes has all the characteristics of a typical complex genetic disease, so associated genes are likely to have a wide variety of effects on its pathology.

The three studies III-V presented here confirm that human body height is a complex genetic trait that is most likely influenced by a large number of variants throughout the genome. Two genes that are genetically and functionally linked with body height were identified: the ESR1 and COL11A1 genes. Two other potential candidate genes, GPC3 and PHF6, were found to be located in a linked region on the X-chromosome that could be investigated more closely in the future.

The future of research on human body height will most likely be to analyze the results from genome-wide association studies, which have only recently become a usable tool. Because body height is so readily available in most sample cohorts, it is a good candidate for analysis in upcoming studies regardless of their main focus.


Final thoughts

The work that began all those years ago has now come to completion in the Ph.D. thesis that you are now reading. As I look back at the years of work that have passed I am reminded of the mysterious quote

“May you live in interesting times”

This rings true in more ways than one for me. During my time as a Ph.D. student the reference sequence of the human genome was completed, alongside the genomes of many other species that we share a planet with. I was able to follow the huge undertaking of the HapMap project from start to finish. The number of human SNPs in the dbSNP database has increased approximately fourfold, from 3 million to almost 12 million, and the proportion of them that are validated as true SNPs has risen from around 10% to almost 50%.

On the technology side of science, when I started, the cutting-edge commercial genotyping system in our lab could analyze 12 SNPs per sample in one experiment. Today we have the possibility to analyze over 1 million SNPs per sample in one experiment.

If you are just beginning your Ph.D. studies as you read this, do not worry, you have many things to look forward to. Re-sequencing of genes will become an everyday operation, advancing to whole chromosomes and perhaps actually making the $1000 genome a reality. We have just started to truly explore the human genome in close detail, and its complexity is so vast that it is sometimes frightening (I know).

Interesting times it has been, and more is sure to come, but my thesis is now finished.

One road ends and another begins. Who knows what waits around the next bend, not only for me but for all of us... I look forward to finding out.


Acknowledgements

The work presented in this thesis was performed in the group of Molecular Medicine at the Department of Medical Sciences, Uppsala University.

I would like to express my gratitude to all that have contributed to my work, and helped and supported me during my time working on this thesis. To my supervisor Ann-Christine Syvänen, thank you for giving me the opportunity to perform my thesis work and be a part of your research team. To my co-supervisor Håkan Melhus, we did not get the chance to do much work together but I very much appreciated the conversations we did have. My second co-supervisor Markus Perola I would like to thank for all the help and interesting collaboration we have had in connection with the GenomEUtwin project and my work on human body height.

To all the people in Markus Perola’s research group at the National Public Health Institute in Finland. Thank you for making me feel welcome and helping me out during my visits with you working on the statistical analysis.

To the Molecular Medicine research group. I would like to express my deepest appreciation to all of you, past and present members, whom I have had the pleasure and privilege to work alongside during these years. I should really write a paragraph for each and every one of you, but I’m afraid I would leave someone out (and time is in short supply these days ;)

To all past and present Ph.D. students and project students:

I thank all of you that have helped me with my work during and in the final stages of this thesis work. You have been both good friends and colleagues to me over the years. I will not forget, and will always be grateful for, everything you did for me.


To all the people that work at the genotyping core facility within the Molecular Medicine group:

To all of you I would also like to express my appreciation and gratitude, for you have all been so nice to me and really helped me a lot. It has been an honor and a privilege to have worked alongside you all.

To my collaborators in my work with type 2 diabetes: I would especially like to thank Christian Berne and Björn Zethelius for providing me the opportunity to work with you on the two studies that became a vital part of this thesis. I also thank Karin Jensevik and Niclas Ericsson for explaining the statistics to me.

To all my friends at Uppsala Ju-Jutsuklubb, without you I could not have done this. Thank you for providing me with a constructive way to handle my frustration from time to time and for all the joy we have shared over the years.

To all my friends outside the world of science and Ju-Jutsu. You have all been a vital part of getting to this point and I’m truly privileged to call all of you my friends. Special thanks to Helena for helping me with the arrangements for the upcoming celebrations.

To Martin Hallonqvist:

Your support and faith in me have been an invaluable asset during the final stages of this thesis. I hope that I can be as good a friend to you as you are to me. I owe you one.

To those that got lost along the way, I will never forget you and I will always treasure the time we shared together.

Finally to any and all that I might have missed to mention here, I have not forgotten you and I never will.

Andreas Dahlgren

Uppsala, 2007-10-23

This work was supported by the European Commission through the GenomEUtwin project (Contract QLG2-CT-2002-01254) and by the Swedish Research Council for Science and Technology (VR-NT).


References

1. Mendel, G.J., Versuche über Pflanzen-Hybriden. Verhandlungen des Naturforschenden Vereins zu Brünn, 1866. 4: p. 3-47.

2. Avery, O.T., C.M. MacLeod, and M. McCarty, Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med., 1944. 79: p. 137-159.

3. Watson, J.D. and F.H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 1953. 171(4356): p. 737-8.

4. Watson, J.D. and F.H. Crick, A structure for deoxyribose nucleic acid. Nature, 1953. 171(4356): p. 964-7.

5. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860-921.

6. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.

7. Finishing the euchromatic sequence of the human genome. Nature, 2004. 431(7011): p. 931-45.

8. Liang, F., et al., Gene index analysis of the human genome estimates approximately 120,000 genes. Nat Genet, 2000. 25(2): p. 239-40.

9. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 1998. 282(5396): p. 2012-8.

10. Graveley, B.R., Alternative splicing: increasing diversity in the proteomic world. Trends Genet, 2001. 17(2): p. 100-7.

11. Levine, M. and R. Tjian, Transcription regulation and animal diversity. Nature, 2003. 424(6945): p. 147-51.

12. Feuk, L., A.R. Carson, and S.W. Scherer, Structural variation in the human genome. Nat Rev Genet, 2006. 7(2): p. 85-97.

13. Cheng, Z., et al., A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature, 2005. 437(7055): p. 88-93.

14. Edwards, A., et al., DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am J Hum Genet, 1991. 49(4): p. 746-56.

15. Weber, J.L., et al., Human diallelic insertion/deletion polymorphisms. Am J Hum Genet, 2002. 71(4): p. 854-62.

16. Levy, S., et al., The Diploid Genome Sequence of an Individual Human. PLoS Biol, 2007. 5(10): p. e254.


17. Bhangale, T.R., et al., Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes. Hum Mol Genet, 2005. 14(1): p. 59-69.

18. Fredman, D., et al., Complex SNP-related sequence variation in segmental genome duplications. Nat Genet, 2004. 36(8): p. 861-6.

19. Saiki, R.K., et al., Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science, 1985. 230(4732): p. 1350-4.

20. Mullis, K., et al., Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol, 1986. 51 Pt 1: p. 263-73.

21. Saiki, R.K., et al., Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 1988. 239(4839): p. 487-91.

22. Landegren, U. and M. Nilsson, Locked on target: strategies for future gene diagnostics. Ann Med, 1997. 29(6): p. 585-90.

23. Syvanen, A.C., Toward genome-wide SNP genotyping. Nat Genet, 2005. 37 Suppl: p. S5-10.

24. Sanger, F., S. Nicklen, and A.R. Coulson, DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A, 1977. 74(12): p. 5463-7.

25. Drossman, H., et al., High-speed separations of DNA sequencing reactions by capillary electrophoresis. Anal Chem, 1990. 62(9): p. 900-3.

26. Mardis, E.R., Anticipating the 1,000 dollar genome. Genome Biol, 2006. 7(7): p. 112.

27. Margulies, M., et al., Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005. 437(7057): p. 376-80.

28. Ronaghi, M., et al., Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem, 1996. 242(1): p. 84-9.

29. Singer, E., The $2 Million Genome. TechnologyReview.com, 2007.

30. Bentley, D.R., Whole-genome re-sequencing. Current Opinion in Genetics & Development, 2006. 16(6): p. 545-552.

31. Wallace, R.B., et al., Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 DNA: the effect of single base pair mismatch. Nucleic Acids Res, 1979. 6(11): p. 3543-57.

32. Livak, K.J., et al., Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl, 1995. 4(6): p. 357-62.

33. Tyagi, S. and F.R. Kramer, Molecular beacons: probes that fluoresce upon hybridization. Nat Biotechnol, 1996. 14(3): p. 303-8.

34. Southern, E., K. Mir, and M. Shchepinov, Molecular interactions on microarrays. Nat Genet, 1999. 21(1 Suppl): p. 5-9.

35. McGall, G.H. and J.A. Fidanza, Photolithographic synthesis of high-density oligonucleotide arrays. Methods Mol Biol, 2001. 170: p. 71-101.

36. Pastinen, T., et al., Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res, 1997. 7(6): p. 606-14.

37. Landegren, U., et al., A ligase-mediated gene detection technique. Science, 1988. 241(4869): p. 1077-80.


38. Nilsson, M., et al., Padlock probes: circularizing oligonucleotides for localized DNA detection. Science, 1994. 265(5181): p. 2085-8.

39. Hardenbol, P., et al., Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat Biotechnol, 2003. 21(6): p. 673-8.

40. Hardenbol, P., et al., Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Genome Res, 2005. 15(2): p. 269-75.

41. Syvanen, A.C., et al., A primer-guided nucleotide incorporation assay in the genotyping of apolipoprotein E. Genomics, 1990. 8(4): p. 684-92.

42. Fan, J.B., et al., Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res, 2000. 10(6): p. 853-60.

43. Fortina, P., et al., Simple two-color array-based approach for mutation detection. Eur J Hum Genet, 2000. 8(11): p. 884-94.

44. Chen, X., L. Levine, and P.Y. Kwok, Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res, 1999. 9(5): p. 492-8.

45. Sauer, S., et al., A novel procedure for efficient genotyping of single nucleotide polymorphisms. Nucleic Acids Res, 2000. 28(5): p. E13.

46. Lindroos, K., et al., Minisequencing on oligonucleotide microarrays: comparison of immobilisation chemistries. Nucleic Acids Res, 2001. 29(13): p. E69-9.

47. Lindroos, K., et al., Multiplex SNP genotyping in pooled DNA samples by a four-colour microarray system. Nucleic Acids Res, 2002. 30(14): p. e70.

48. Bell, P.A., et al., SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery. Biotechniques, 2002. Suppl: p. 70-2, 74, 76-7.

49. Steemers, F.J., et al., Whole-genome genotyping with the single-base extension assay. Nat Methods, 2006. 3(1): p. 31-3.

50. Steemers, F.J. and K.L. Gunderson, Whole genome genotyping technologies on the BeadArray platform. Biotechnol J, 2007. 2(1): p. 41-9.

51. Wu, D.Y., et al., Allele-specific enzymatic amplification of beta-globin genomic DNA for diagnosis of sickle cell anemia. Proc Natl Acad Sci U S A, 1989. 86(8): p. 2757-60.

52. Fan, J.B., et al., Highly parallel SNP genotyping. Cold Spring Harb Symp Quant Biol, 2003. 68: p. 69-78.

53. Oliphant, A., et al., BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques, 2002. Suppl: p. 56-8, 60-1.

54. Kerem, B., et al., Identification of the cystic fibrosis gene: genetic analysis. Science, 1989. 245(4922): p. 1073-80.

55. Scriver, C.R. and P.J. Waters, Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet, 1999. 15(7): p. 267-72.

56. Wang, W.Y., et al., Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet, 2005. 6(2): p. 109-18.

57. Boomsma, D., A. Busjahn, and L. Peltonen, Classical twin studies and beyond. Nat Rev Genet, 2002. 3(11): p. 872-82.

58. Cheung, V.G., et al., Polymorphic variation in human meiotic recombination. Am J Hum Genet, 2007. 80(3): p. 526-30.


59. Jimenez-Sanchez, G., B. Childs, and D. Valle, Human disease genes. Nature, 2001. 409(6822): p. 853-5.

60. Hirschhorn, J.N. and M.J. Daly, Genome-wide association studies for common diseases and complex traits. Nat Rev Genet, 2005. 6(2): p. 95-108.

61. Cardon, L.R. and L.J. Palmer, Population stratification and spurious allelic association. Lancet, 2003. 361(9357): p. 598-604.

62. Carlson, C.S., et al., Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet, 2004. 74(1): p. 106-20.

63. Reynisdottir, I., et al., Localization of a susceptibility gene for type 2 diabetes to chromosome 5q34-q35.2. Am J Hum Genet, 2003. 73(2): p. 323-35.

64. A haplotype map of the human genome. Nature, 2005. 437(7063): p. 1299-320.

65. Frazer, K.A., et al., A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007. 449(7164): p. 851-61.

66. Alberti, K.G. and P.Z. Zimmet, Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus provisional report of a WHO consultation. Diabet Med, 1998. 15(7): p. 539-53.

67. Zimmet, P., K.G. Alberti, and J. Shaw, Global and societal implications of the diabetes epidemic. Nature, 2001. 414(6865): p. 782-7.

68. Narayan, K.M., et al., Impact of recent increase in incidence on future diabetes burden: U.S., 2005-2050. Diabetes Care, 2006. 29(9): p. 2114-6.

69. Hossain, P., B. Kawar, and M. El Nahas, Obesity and diabetes in the developing world--a growing challenge. N Engl J Med, 2007. 356(3): p. 213-5.

70. Pan, X.R., et al., Effects of diet and exercise in preventing NIDDM in people with impaired glucose tolerance. The Da Qing IGT and Diabetes Study. Diabetes Care, 1997. 20(4): p. 537-44.

71. Tuomilehto, J., et al., Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N Engl J Med, 2001. 344(18): p. 1343-50.

72. Carmichael, C.M. and M. McGue, A cross-sectional examination of height, weight, and body mass index in adult twins. J Gerontol A Biol Sci Med Sci, 1995. 50(4): p. B237-44.

73. Silventoinen, K., et al., Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res, 2003. 6(5): p. 399-408.

74. Altshuler, D., et al., The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet, 2000. 26(1): p. 76-80.

75. Gloyn, A.L., et al., Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes, 2003. 52(2): p. 568-72.

76. Grant, S.F., et al., Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet, 2006. 38(3): p. 320-3.


77. Duggirala, R., et al., Linkage of type 2 diabetes mellitus and of age at onset to a genetic location on chromosome 10q in Mexican Americans. Am J Hum Genet, 1999. 64(4): p. 1127-40.

78. Bodhini, D., et al., The rs12255372(G/T) and rs7903146(C/T) polymorphisms of the TCF7L2 gene are associated with type 2 diabetes mellitus in Asian Indians. Metabolism, 2007. 56(9): p. 1174-8.

79. Chandak, G.R., et al., Common variants in the TCF7L2 gene are strongly associated with type 2 diabetes mellitus in the Indian population. Diabetologia, 2007. 50(1): p. 63-7.

80. Chang, Y.C., et al., Association study of the genetic polymorphisms of the transcription factor 7-like 2 (TCF7L2) gene and type 2 diabetes in the Chinese population. Diabetes, 2007. 56(10): p. 2631-7.

81. Dahlgren, A., et al., Variants of the TCF7L2 gene are associated with beta cell dysfunction and confer an increased risk of type 2 diabetes mellitus in the ULSAM cohort of Swedish elderly men. Diabetologia, 2007. 50(9): p. 1852-7.

82. Damcott, C.M., et al., Polymorphisms in the transcription factor 7-like 2 (TCF7L2) gene are associated with type 2 diabetes in the Amish: replication and evidence for a role in both insulin secretion and insulin resistance. Diabetes, 2006. 55(9): p. 2654-9.

83. De Silva, N.M., et al., The transcription factor 7-like 2 (TCF7L2) gene is associated with Type 2 diabetes in UK community-based cases, but the risk allele frequency is reduced compared with UK cases selected for genetic studies. Diabet Med, 2007. 24(10): p. 1067-72.

84. Elbein, S.C., et al., Transcription factor 7-like 2 polymorphisms and type 2 diabetes, glucose homeostasis traits and gene expression in US participants of European and African descent. Diabetologia, 2007. 50(8): p. 1621-30.

85. Groves, C.J., et al., Association analysis of 6,736 U.K. subjects provides replication and confirms TCF7L2 as a type 2 diabetes susceptibility gene with a substantial effect on individual risk. Diabetes, 2006. 55(9): p. 2640-4.

86. Hayashi, T., et al., Replication study for the association of TCF7L2 with susceptibility to type 2 diabetes in a Japanese population. Diabetologia, 2007. 50(5): p. 980-4.

87. Horikoshi, M., et al., A genetic variation of the transcription factor 7-like 2 gene is associated with risk of type 2 diabetes in the Japanese population. Diabetologia, 2007. 50(4): p. 747-51.

88. Humphries, S.E., et al., Common variants in the TCF7L2 gene and predisposition to type 2 diabetes in UK European Whites, Indian Asians and Afro-Caribbean men and women. J Mol Med, 2006. 84(12): p. 1005-14.

89. Marzi, C., et al., Variants of the transcription factor 7-like 2 gene (TCF7L2) are strongly associated with type 2 diabetes but not with the metabolic syndrome in the MONICA/KORA surveys. Horm Metab Res, 2007. 39(1): p. 46-52.

90. Mayans, S., et al., TCF7L2 polymorphisms are associated with type 2 diabetes in northern Sweden. Eur J Hum Genet, 2007. 15(3): p. 342-6.

91. Meigs, J.B., et al., Genome-wide association with diabetes-related traits in the Framingham Heart Study. BMC Med Genet, 2007. 8 Suppl 1: p. S16.

92. Ng, M.C., et al., Replication and identification of novel variants at TCF7L2 associated with type 2 diabetes in Hong Kong Chinese. J Clin Endocrinol Metab, 2007. 92(9): p. 3733-7.

93. Parra, E.J., et al., Association of TCF7L2 polymorphisms with type 2 diabetes in Mexico City. Clin Genet, 2007. 71(4): p. 359-66.

94. Sale, M.M., et al., Variants of the transcription factor 7-like 2 (TCF7L2) gene are associated with type 2 diabetes in an African-American population enriched for nephropathy. Diabetes, 2007. 56(10): p. 2638-42.

95. Scott, L.J., et al., Association of transcription factor 7-like 2 (TCF7L2) variants with type 2 diabetes in a Finnish sample. Diabetes, 2006. 55(9): p. 2649-53.

96. Zhang, C., et al., Variant of transcription factor 7-like 2 (TCF7L2) gene and the risk of type 2 diabetes in large cohorts of U.S. women and men. Diabetes, 2006. 55(9): p. 2645-8.

97. Sladek, R., et al., A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 2007. 445(7130): p. 881-5.

98. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-678.

99. Salonen, J.T., et al., Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. Am J Hum Genet, 2007. 81(2): p. 338-45.

100. Saxena, R., et al., Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science, 2007. 316(5829): p. 1331-6.

101. Scott, L.J., et al., A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 2007. 316(5829): p. 1341-5.

102. Beck, S.R., et al., Age-stratified QTL genome scan analyses for anthropometric measures. BMC Genet, 2003. 4 Suppl 1: p. S31.

103. Deng, H.W., et al., A whole-genome linkage scan suggests several genomic regions potentially containing QTLs underlying the variation of stature. Am J Med Genet, 2002. 113(1): p. 29-39.

104. Ellis, J.A., et al., Comprehensive multi-stage linkage analyses identify a locus for adult height on chromosome 3p in a healthy Caucasian population. Hum Genet, 2006.

105. Geller, F., A. Dempfle, and T. Gorg, Genome scan for body mass index and height in the Framingham Heart Study. BMC Genet, 2003. 4 Suppl 1: p. S91.

106. Hirschhorn, J.N., et al., Genomewide linkage analysis of stature in multiple populations reveals several regions with evidence of linkage to adult height. Am J Hum Genet, 2001. 69(1): p. 106-16.

107. Liu, Y.Z., et al., Genetic linkage of human height is confirmed to 9q22 and Xq24. Hum Genet, 2006. 119(3): p. 295-304.

108. Liu, Y.Z., et al., Genetic dissection of human stature in a large sample of multiplex pedigrees. Ann Hum Genet, 2004. 68(Pt 5): p. 472-88.

109. Mukhopadhyay, N., et al., A genome-wide scan for loci affecting normal adult height in the Framingham Heart Study. Hum Hered, 2003. 55(4): p. 191-201.

110. Mukhopadhyay, N. and D.E. Weeks, Linkage analysis of adult height with parent-of-origin effects in the Framingham Heart Study. BMC Genet, 2003. 4 Suppl 1: p. S76.

111. Sale, M.M., et al., Loci contributing to adult height and body mass index in African American families ascertained for type 2 diabetes. Ann Hum Genet, 2005. 69(Pt 5): p. 517-27.

112. Sammalisto, S., et al., A male-specific quantitative trait locus on 1p21 controlling human stature. J Med Genet, 2005. 42(12): p. 932-9.

113. Shmulewitz, D., et al., Linkage analysis of quantitative traits for obesity, diabetes, hypertension, and dyslipidemia on the island of Kosrae, Federated States of Micronesia. Proc Natl Acad Sci U S A, 2006. 103(10): p. 3502-9.

114. Soro, A., et al., Genome scans provide evidence for low-HDL-C loci on chromosomes 8q23, 16q24.1-24.2, and 20q13.11 in Finnish families. Am J Hum Genet, 2002. 70(5): p. 1333-40.

115. Willemsen, G., et al., QTLs for height: results of a full genome scan in Dutch sibling pairs. Eur J Hum Genet, 2004. 12(10): p. 820-8.

116. Wiltshire, S., et al., Evidence for linkage of stature to chromosome 3p26 in a large U.K. family data set ascertained for type 2 diabetes. Am J Hum Genet, 2002. 70(2): p. 543-6.

117. Wu, X., et al., Combined analysis of genomewide scans for adult height: results from the NHLBI Family Blood Pressure Program. Eur J Hum Genet, 2003. 11(3): p. 271-4.

118. Xu, J., et al., Major recessive gene(s) with considerable residual polygenic effect regulating adult height: confirmation of genomewide scan results for chromosomes 6, 9, and 12. Am J Hum Genet, 2002. 71(3): p. 646-50.

119. Weiss, L.A., et al., The sex-specific genetic architecture of quantitative traits in humans. Nat Genet, 2006. 38(2): p. 218-22.

120. Perola, M., et al., Combined Genome Scans for Body Stature in 6,602 European Twins: Evidence for Common Caucasian Loci. PLoS Genet, 2007. 3(6): p. e97.

121. Weedon, M.N., et al., A common variant of HMGA2 is associated with adult and childhood height in the general population. Nat Genet, 2007. 39(10): p. 1245-1250.

122. Hedstrand, H., A study of middle-aged men with particular reference to risk factors for cardiovascular disease. Ups J Med Sci Suppl, 1975. 19: p. 1-61.

123. Byberg, L., et al., Birth weight and the insulin resistance syndrome: association of low birth weight with truncal obesity and raised plasminogen activator inhibitor-1 but not with abdominal obesity or plasma lipid disturbances. Diabetologia, 2000. 43(1): p. 54-60.

124. DeFronzo, R.A., J.D. Tobin, and R. Andres, Glucose clamp technique: a method for quantifying insulin secretion and resistance. Am J Physiol, 1979. 237(3): p. E214-23.

125. Zethelius, B., et al., Insulin resistance, impaired early insulin response, and insulin propeptides as predictors of the development of type 2 diabetes: a population-based, 7-year follow-up study in 70-year-old men. Diabetes Care, 2004. 27(6): p. 1433-8.

126. Nyholt, D.R., A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet, 2004. 74(4): p. 765-9.

127. Barrett, J.C., et al., Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 2005. 21(2): p. 263-5.

128. Lind, L., et al., A comparison of three different methods to evaluate endothelium-dependent vasodilation in the elderly: the Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) study. Arterioscler Thromb Vasc Biol, 2005. 25(11): p. 2368-75.

129. Kaprio, J. and M. Koskenvuo, Genetic and environmental factors in complex diseases: the older Finnish Twin Cohort. Twin Res, 2002. 5(5): p. 358-65.

130. Lilja, H.E., et al., A candidate gene study in low HDL-cholesterol families provides evidence for the involvement of the APOA2 gene and the APOA1C3A4 gene cluster. Atherosclerosis, 2002. 164(1): p. 103-11.

131. Pajukanta, P., et al., Genomewide scan for familial combined hyperlipidemia genes in Finnish families, suggesting multiple susceptibility loci influencing triglyceride, cholesterol, and apolipoprotein B levels. Am J Hum Genet, 1999. 64(5): p. 1453-63.

132. Abecasis, G.R., et al., Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet, 2002. 30(1): p. 97-101.

133. Abecasis, G.R., L.R. Cardon, and W.O. Cookson, A general test of association for quantitative traits in nuclear families. Am J Hum Genet, 2000. 66(1): p. 279-92.

134. Lange, K., et al., Mendel version 4.0: A complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet, 2001. 69(Suppl): p. A1886.

135. Lange, K., J.S. Sinsheimer, and E. Sobel, Association testing with Mendel. Genet Epidemiol, 2005. 29(1): p. 36-50.

136. Li, J. and L. Ji, Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 2005. 95(3): p. 221-7.

137. Loos, R.J., et al., TCF7L2 polymorphisms modulate proinsulin levels and beta-cell function in a British Europid population. Diabetes, 2007. 56(7): p. 1943-7.

138. Saxena, R., et al., Common single nucleotide polymorphisms in TCF7L2 are reproducibly associated with type 2 diabetes and reduce the insulin response to glucose in nondiabetic individuals. Diabetes, 2006. 55(10): p. 2890-5.

139. Florez, J.C., et al., TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. N Engl J Med, 2006. 355(3): p. 241-50.

140. Munoz, J., et al., Polymorphism in the transcription factor 7-like 2 (TCF7L2) gene is associated with reduced insulin secretion in nondiabetic women. Diabetes, 2006. 55(12): p. 3630-4.

141. Zethelius, B., et al., Proinsulin and acute insulin response independently predict Type 2 diabetes mellitus in men--report from 27 years of follow-up study. Diabetologia, 2003. 46(1): p. 20-6.

142. Prunier, C., B.A. Hocevar, and P.H. Howe, Wnt signaling: physiology and pathology. Growth Factors, 2004. 22(3): p. 141-50.

143. Yi, F., P.L. Brubaker, and T. Jin, TCF-4 mediates cell type-specific regulation of proglucagon gene expression by beta-catenin and glycogen synthase kinase-3beta. J Biol Chem, 2005. 280(2): p. 1457-64.

144. Rulifson, I.C., et al., Wnt signaling regulates pancreatic beta cell proliferation. Proc Natl Acad Sci U S A, 2007. 104(15): p. 6247-52.

145. Lyssenko, V., et al., Mechanisms by which common variants in the TCF7L2 gene increase risk of type 2 diabetes. J Clin Invest, 2007. 117(8): p. 2155-63.

146. Bort, R., et al., Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Development, 2004. 131(4): p. 797-806.

147. Bort, R., et al., Hex homeobox gene controls the transition of the endoderm to a pseudostratified, cell emergent epithelium for liver bud development. Dev Biol, 2006. 290(1): p. 44-56.

148. Foley, A.C. and M. Mercola, Heart induction by Wnt antagonists depends on the homeodomain transcription factor Hex. Genes Dev, 2005. 19(3): p. 387-96.

149. Smith, E.P., et al., Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man. N Engl J Med, 1994. 331(16): p. 1056-61.

150. Lehrer, S., et al., Association of an estrogen receptor variant with increased height in women. Horm Metab Res, 1994. 26(10): p. 486-8.

151. Langdahl, B.L., et al., A TA repeat polymorphism in the estrogen receptor gene is associated with osteoporotic fractures but polymorphisms in the first exon and intron are not. J Bone Miner Res, 2000. 15(11): p. 2222-30.

152. Lorentzon, M., et al., Estrogen receptor gene polymorphism, but not estradiol levels, is related to bone density in healthy adolescent boys: a cross-sectional and longitudinal study. J Clin Endocrinol Metab, 1999. 84(12): p. 4597-601.

153. Pilia, G., et al., Mutations in GPC3, a glypican gene, cause the Simpson-Golabi-Behmel overgrowth syndrome. Nat Genet, 1996. 12(3): p. 241-7.

154. Weksberg, R. and J.A. Squire, Molecular biology of Beckwith-Wiedemann syndrome. Med Pediatr Oncol, 1996. 27(5): p. 462-9.

155. Li, Y., et al., A fibrillar collagen gene, Col11a1, is essential for skeletal morphogenesis. Cell, 1995. 80(3): p. 423-30.

Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 287

Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

Distribution: publications.uu.se
urn:nbn:se:uu:diva-8291
