American Journal of Physical Anthropology - Fare …...American Journal of Physical Anthropology...


Page 1: American Journal of Physical Anthropology - Fare …...American Journal of Physical Anthropology Copy of e-mail Notification Cadmus Communications 300 W. Chestnut St. Ephrata, PA 17522

American Journal of Physical Anthropology
Copy of e-mail Notification

Your article (2006-00153.R2) from American Journal of Physical Anthropology is available for download
=====
American Journal of Physical Anthropology
Published by John Wiley & Sons, Inc.

Dear Author,

Your article page proofs for American Journal of Physical Anthropology are ready for review. John Wiley & Sons has made this article available to you online for faster, more efficient editing. Please follow the instructions below and you will be able to access a PDF version of your article as well as relevant accompanying paperwork.

First, make sure you have a copy of Adobe Acrobat Reader software to read these files. This is free software and is available for user downloading at http://www.adobe.com/products/acrobat/readstep.html.

Open your web browser, and enter the following web address: http://kwglobal.co.in/jw/retrieval.aspx

You will be prompted to log in, and asked for a password. Your login name will be your email address, and your password will be ----

Example:

Login: your e-mail address
Password: ----

The site contains one file, containing:

- Author Instructions Checklist
- Adobe Acrobat Users - NOTES tool sheet
- Reprint Order form
- A copy of your page proofs for your article

Print out this file, and fill out the forms by hand. (If you do not wish to order reprints, please mark a "0" on the reprint order form.) Read your page proofs carefully and:

- indicate changes or corrections in the margin of the page proofs
- answer all queries (footnotes A, B, C, etc.) on the last page of the PDF proof
- proofread any tables and equations carefully
- check your figure legends for accuracy

Within 48 hours, please return via fax or express mail all materials to the address given below. This will include:

1) Page proofs with corrections
2) Reprint Order form

Return to:

Doug Frank
Production Editor


Cadmus Communications
300 W. Chestnut St.
Ephrata, PA 17522
e-mail: [email protected]

Phone: 800-238-3814 x615
Fax: 717-738-9360

Technical problems? If you experience technical problems downloading your file or any other problem with the website listed above, please contact Malathi Shekar (e-mail: [email protected]; phone: +91 (44) 42058888, x310).

Questions regarding your article? Please don’t hesitate to contact me with any questions about the article itself, or if you have trouble interpreting any of the questions listed at the end of your file. REMEMBER TO INCLUDE YOUR ARTICLE NO. ( 2006-00153.R2 ) WITH ALL CORRESPONDENCE. This will help both of us address your query most efficiently.

As this e-proofing system was designed to make the publishing process easier for everyone, we welcome any and all feedback. Thanks for participating in our e-proofing system!

This e-proof is to be used only for the purpose of returning corrections to the publisher.

Sincerely,

Doug Frank
Production Editor
Cadmus Communications
300 W. Chestnut St.
Ephrata, PA 17522
e-mail: [email protected]
Phone: 800-238-3814 x615
Fax: 717-738-9360


111 RIVER STREET, HOBOKEN, NJ 07030

***IMMEDIATE RESPONSE REQUIRED***

Your article will be published online via Wiley's EarlyView® service (www.interscience.wiley.com) shortly after receipt of corrections. EarlyView® is Wiley's online publication of individual articles in full text HTML and/or PDF format before release of the compiled print issue of the journal. Articles posted online in EarlyView® are peer-reviewed, copyedited, author corrected, and fully citable via the article DOI (for further information, visit www.doi.org). EarlyView® means you benefit from the best of two worlds: fast online availability as well as traditional, issue-based archiving.

Please follow these instructions to avoid delay of publication.

READ PROOFS CAREFULLY

• This will be your only chance to review these proofs. Please note that once your corrected article is posted online, it is considered legally published, and cannot be removed from the Web site for further corrections.

• Please note that the volume and page numbers shown on the proofs are for position only.

ANSWER ALL QUERIES ON PROOFS (Queries for you to answer are attached as the last page of your proof.)

• Mark all corrections directly on the proofs. Note that excessive author alterations may ultimately result in delay of publication and extra costs may be charged to you.

CHECK FIGURES AND TABLES CAREFULLY (Color figure proofs will be sent under separate cover.)

• Check size, numbering, and orientation of figures.

• All images in the PDF are downsampled (reduced to lower resolution and file size) to facilitate Internet delivery. These images will appear at higher resolution and sharpness in the printed article.

• Review figure legends to ensure that they are complete.

• Check all tables. Review layout, title, and footnotes.

COMPLETE REPRINT ORDER FORM

• Fill out the attached reprint order form. It is important to return the form even if you are not ordering reprints. You may, if you wish, pay for the reprints with a credit card. Reprints will be mailed only after your article appears in print. This is the most opportune time to order reprints. If you wait until after your article comes off press, the reprints will be considerably more expensive.

RETURN

• PROOFS

• REPRINT ORDER FORM

• CTA (If you have not already signed one)

RETURN IMMEDIATELY AS YOUR ARTICLE WILL BE POSTED ONLINE SHORTLY AFTER RECEIPT; FAX PROOFS TO 717-738-9360.

QUESTIONS?

Doug Frank, Production Editor
Phone: 800-238-3814 x615
E-mail: [email protected]
Refer to journal acronym and article production number (e.g., AJPA 02-3399 for American Journal of Physical Anthropology ms 02-3399).


Softproofing for advanced Adobe Acrobat Users - NOTES tool

NOTE: ACROBAT READER FROM THE INTERNET DOES NOT CONTAIN THE NOTES TOOL USED IN THIS PROCEDURE.

Acrobat annotation tools can be very useful for indicating changes to the PDF proof of your article. By using Acrobat annotation tools, a full digital pathway can be maintained for your page proofs. The NOTES annotation tool can be used with either Adobe Acrobat 6.0 or Adobe Acrobat 7.0. Other annotation tools are also available in Acrobat 6.0, but this instruction sheet will concentrate on how to use the NOTES tool. Acrobat Reader, the free Internet download software from Adobe, DOES NOT contain the NOTES tool. In order to softproof using the NOTES tool you must have the full software suite Adobe Acrobat Exchange 6.0 or Adobe Acrobat 7.0 installed on your computer.

Steps for Softproofing using Adobe Acrobat NOTES tool:

1. Open the PDF page proof of your article using either Adobe Acrobat Exchange 6.0 or Adobe Acrobat 7.0. Proof your article on-screen or print a copy for markup of changes.

2. Go to Edit/Preferences/Commenting (in Acrobat 6.0) or Edit/Preferences/Commenting (in Acrobat 7.0) and check the “Always use login name for author name” option. Also, set the font size at 9 or 10 point.

3. When you have decided on the corrections to your article, select the NOTES tool from the Acrobat toolbox (Acrobat 6.0) and click to display note text to be changed, or Comments/Add Note (in Acrobat 7.0).

4. Enter your corrections into the NOTES text box window. Be sure to clearly indicate where the correction is to be placed and what text it will affect. If necessary to avoid confusion, you can use your TEXT SELECTION tool to copy the text to be corrected and paste it into the NOTES text box window. At this point, you can type the corrections directly into the NOTES text box window. DO NOT correct the text by typing directly on the PDF page.

5. Go through your entire article using the NOTES tool as described in Step 4.

6. When you have completed the corrections to your article, go to Document/Export Comments (in Acrobat 6.0) or Comments/Export Comments (in Acrobat 7.0). Save your NOTES file to a place on your hard drive where you can easily locate it. Name your NOTES file with the article number assigned to your article in the original softproofing e-mail message.

7. When closing your article PDF, be sure NOT to save changes to the original file.

8. To make changes to a NOTES file you have exported, simply re-open the original PDF proof file, go to Document/Import Comments and import the NOTES file you saved. Make changes and re-export the NOTES file, keeping the same file name.

9. When complete, attach your NOTES file to a reply e-mail message. Be sure to include your name, the date, and the title of the journal your article will be printed in.


REPRINT BILLING DEPARTMENT • 111 RIVER STREET • HOBOKEN, NJ 07030

PHONE: (201) 748-8789; FAX: (201) 748-6326

E-MAIL: [email protected]

PREPUBLICATION REPRINT ORDER FORM

Please complete this form even if you are not ordering reprints. This form MUST be returned with your corrected proofs and original manuscript. Your reprints will be shipped approximately 4 weeks after publication. Reprints ordered after printing will be substantially more expensive.

JOURNAL American Journal of Physical Anthropology VOLUME ISSUE

TITLE OF MANUSCRIPT

MS. NO. AJPA-- NO. OF PAGES

AUTHOR(S)

No. of Pages   100 Reprints   200 Reprints   300 Reprints   400 Reprints   500 Reprints
                    $              $              $              $              $
1-4                336            501            694            890           1052
5-8                469            703            987           1251           1477
9-12               594            923           1234           1565           1850
13-16              714           1156           1527           1901           2273
17-20              794           1340           1775           2212           2648
21-24              911           1529           2031           2536           3037
25-28             1004           1707           2267           2828           3388
29-32             1108           1894           2515           3135           3755
33-36             1219           2092           2773           3456           4143
37-40             1329           2290           3033           3776           4528

**REPRINTS ARE ONLY AVAILABLE IN LOTS OF 100. IF YOU WISH TO ORDER MORE THAN 500 REPRINTS, PLEASE CONTACT OUR REPRINTS DEPARTMENT AT (201) 748-8789 FOR A PRICE QUOTE.

Please send me _____________________ reprints of the above article at $

Please add appropriate State and Local Tax for United States orders only (Tax Exempt No.____________________) $

Please add 5% Postage and Handling $

TOTAL AMOUNT OF ORDER** $

**International orders must be paid in currency and drawn on a U.S. bank

Please check one: Check enclosed Bill me Credit Card

If credit card order, charge to: American Express Visa MasterCard

Credit Card No Signature Exp. Date

BILL TO: SHIP TO: (Please, no P.O. Box numbers)

Name Name

Institution Institution

Address Address

Purchase Order No. Phone Fax

E-mail


COPYRIGHT TRANSFER AGREEMENT

Re: Manuscript entitled ________________________________________ (the “Contribution”) for publication in ________________________________________ (the “Journal”) published by Wiley Periodicals, Inc. (“Wiley”).

Dear Contributor(s):

Thank you for submitting your Contribution for publication. In order to expedite the editing and publishing process and enable Wiley to disseminate your work to the fullest extent, we need to have this Copyright Transfer Agreement signed and returned to us as soon as possible. If the Contribution is not accepted for publication this Agreement shall be null and void.

A. COPYRIGHT

1. The Contributor assigns to Wiley, during the full term of copyright and any extensions or renewals of that term, all copyright in and to the Contribution, including but not limited to the right to publish, republish, transmit, sell, distribute and otherwise use the Contribution and the material contained therein in electronic and print editions of the Journal and in derivative works throughout the world, in all languages and in all media of expression now known or later developed, and to license or permit others to do so.

2. Reproduction, posting, transmission or other distribution or use of the Contribution or any material contained therein, in any medium as permitted hereunder, requires a citation to the Journal and an appropriate credit to Wiley as Publisher, suitable in form and content as follows: (Title of Article, Author, Journal Title and Volume/Issue Copyright [year] Wiley Periodicals, Inc. or copyright owner as specified in the Journal.)

B. RETAINED RIGHTS

Notwithstanding the above, the Contributor or, if applicable, the Contributor’s Employer, retains all proprietary rights other than copyright, such as patent rights, in any process, procedure or article of manufacture described in the Contribution, and the right to make oral presentations of material from the Contribution.

C. OTHER RIGHTS OF CONTRIBUTOR

Wiley grants back to the Contributor the following:

1. The right to share with colleagues print or electronic “preprints” of the unpublished Contribution, in form and content as accepted by Wiley for publication in the Journal. Such preprints may be posted as electronic files on the Contributor’s own website for personal or professional use, or on the Contributor’s internal university or corporate networks/intranet, or secure external website at the Contributor’s institution, but not for commercial sale or for any systematic external distribution by a third party (e.g., a listserve or database connected to a public access server). Prior to publication, the Contributor must include the following notice on the preprint: “This is a preprint of an article accepted for publication in [Journal title] copyright (year) (copyright owner as specified in the Journal)”. After publication of the Contribution by Wiley, the preprint notice should be amended to read as follows: “This is a preprint of an article published in [include the complete citation information for the final version of the Contribution as published in the print edition of the Journal]”, and should provide an electronic link to the Journal’s WWW site, located at the following Wiley URL: http://www.interscience.Wiley.com/. The Contributor agrees not to update the preprint or replace it with the published version of the Contribution.

2. The right, without charge, to photocopy or to transmit online or to download, print out and distribute to a colleague a copy of the published Contribution in whole or in part, for the colleague’s personal or professional use, for the advancement of scholarly or scientific research or study, or for corporate informational purposes in accordance with Paragraph D.2 below.

Production/Contribution ID# _______________
Publisher/Editorial office use only

111 River Street
Hoboken, NJ 07030
201.748.6000
FAX 201.748.6052

Date:

To:

3. The right to republish, without charge, in print format, all or part of the material from the published Contribution in a book written or edited by the Contributor.

4. The right to use selected figures and tables, and selected text (up to 250 words, exclusive of the abstract) from the Contribution, for the Contributor’s own teaching purposes, or for incorporation within another work by the Contributor that is made part of an edited work published (in print or electronic format) by a third party, or for presentation in electronic format on an internal computer network or external website of the Contributor or the Contributor’s employer.

5. The right to include the Contribution in a compilation for classroom use (course packs) to be distributed to students at the Contributor’s institution free of charge or to be stored in electronic format in datarooms for access by students at the Contributor’s institution as part of their course work (sometimes called “electronic reserve rooms”) and for in-house training programs at the Contributor’s employer.

D. CONTRIBUTIONS OWNED BY EMPLOYER

1. If the Contribution was written by the Contributor in the course of the Contributor’s employment (as a “work-made-for-hire” in the course of employment), the Contribution is owned by the company/employer which must sign this Agreement (in addition to the Contributor’s signature), in the space provided below. In such case, the company/employer hereby assigns to Wiley, during the full term of copyright, all copyright in and to the Contribution for the full term of copyright throughout the world as specified in paragraph A above.

2. In addition to the rights specified as retained in paragraph B above and the rights granted back to the Contributor pursuant to paragraph C above, Wiley hereby grants back, without charge, to such company/employer, its subsidiaries and divisions, the right to make copies of and distribute the published Contribution internally in print format or electronically on the Company’s internal network. Upon payment of Wiley’s reprint fee, the institution may distribute (but not resell) print copies of the published Contribution externally. Although copies so made shall not be available for individual re-sale, they may be included by the company/employer as part of an information package included with software or other products offered for sale or license. Posting of the published Contribution by the institution on a public access website may only be done with Wiley’s written permission, and payment of any applicable fee(s).

E. GOVERNMENT CONTRACTS

In the case of a Contribution prepared under U.S. Government contract or grant, the U.S. Government may reproduce, without charge, all or portions of the Contribution and may authorize others to do so, for official U.S. Government purposes only, if the U.S. Government contract or grant so requires. (U.S. Government Employees: see note at end.)

F. COPYRIGHT NOTICE

The Contributor and the company/employer agree that any and all copies of the Contribution or any part thereof distributed or posted by them in print or electronic format as permitted herein will include the notice of copyright as stipulated in the Journal and a full citation to the Journal as published by Wiley.

G. CONTRIBUTOR’S REPRESENTATIONS

The Contributor represents that the Contribution is the Contributor’s original work. If the Contribution was prepared jointly, the Contributor agrees to inform the co-Contributors of the terms of this Agreement and to obtain their signature to this Agreement or their written permission to sign on their behalf. The Contribution is submitted only to this Journal and has not been published before, except for “preprints” as permitted above. (If excerpts from copyrighted works owned by third parties are included, the Contributor will obtain written permission from the copyright owners for all uses as set forth in Wiley’s permissions form or in the Journal’s Instructions for Contributors, and show credit to the sources in the Contribution.) The Contributor also warrants that the Contribution contains no libelous or unlawful statements, does not infringe upon the rights (including without limitation the copyright, patent or trademark rights) or the privacy of others, or contain material or instructions that might cause harm or injury.


CHECK ONE:

[____] Contributor-owned work

______________________________________ _________________ Contributor’s signature Date

________________________________________________________ Type or print name and title

______________________________________ _________________ Co-contributor’s signature Date

________________________________________________________ Type or print name and title

ATTACHED ADDITIONAL SIGNATURE PAGE AS NECESSARY

[____] Company/Institution-owned work (made-for-hire in the course of employment)

______________________________________ _________________ Company or Institution (Employer-for-Hire) Date

______________________________________ _________________ Authorized signature of Employer Date

[____] U.S. Government work

[____] U.K. Government work (Crown Copyright)

Note to U.S. Government Employees

A contribution prepared by a U.S. federal government employee as part of the employee’s official duties, or which is an official U.S. Government publication is called a “U.S. Government work,” and is in the public domain in the United States. In such case, the employee may cross out Paragraph A.1 but must sign and return this Agreement. If the Contribution was not prepared as part of the employee’s duties or is not an official U.S. Government publication, it is not a U.S. Government work.

Note to U.K. Government Employees

The rights in a Contribution prepared by an employee of a U.K. government department, agency or other Crown body as part of his/her official duties, or which is an official government publication, belong to the Crown. In such case, Wiley will forward the relevant form to the Employee for signature.


Worldwide Analysis of Multiple Microsatellites: Language Diversity has a Detectable Influence on DNA Diversity

Elise M.S. Belle and Guido Barbujani*

Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara, Italy

KEY WORDS microsatellite loci; population genetics; genetic diversity; linguistics

ABSTRACT Previous studies of the correlations between the languages spoken by human populations and the genes carried by the members of those populations have been limited by the small number of genetic markers available and by approximations in the treatment of linguistic data. In this study we analyzed a large collection of polymorphic microsatellite loci (377), distributed on all autosomes, and used Ruhlen’s linguistic classification to investigate the relative roles of geography and language in shaping the distribution of human DNA diversity at a worldwide scale. For this purpose, we performed three different kinds of analysis: (i) we partitioned genetic variances at three hierarchical levels of population subdivision according to language group by means of a molecular analysis of variance (AMOVA); (ii) we quantified by a series of Mantel’s tests the correlation between measures of genetic and linguistic differentiation; and (iii) we tested whether linguistic differences are increased across known zones of increased genetic change between populations. Genetic differences appear to more closely reflect geographic than linguistic differentiation. However, our analyses show that language differences also have a detectable effect on DNA diversity at the genomic level, above and beyond the effects of geographic distance. Am J Phys Anthropol 131:000–000, 2007. © 2007 Wiley-Liss, Inc.

The patterns of genetic diversity in geographic space reflect the interaction between evolutionary factors leading populations to diverge (mutation, drift, and diversifying selection) and factors causing genetic convergence (gene flow and other forms of selection). In humans, the spatial distance between potential mates is probably the most important factor preventing gene flow between populations (Relethford, 2004), but nonspatial reproductive barriers, such as cultural, religious, and language boundaries, also play an important role (Barbujani, 1991). In particular, language boundaries have been shown to represent a major obstacle to interbreeding, which accounts for the generally good correlation between allele frequencies and language groups (Sokal, 1988; Cavalli-Sforza et al., 1988; Barbujani and Sokal, 1990; Cavalli-Sforza et al., 1992).

Any obstacles to gene flow, including geographic distance, tend to decrease both genetic and linguistic similarity between populations. Languages spoken by distant or isolated groups will undergo independent changes and slowly diverge from the source language, whereas adjacent groups (or groups not separated by reproductive barriers) will have easier contacts and will tend to speak related languages (Nichols, 1997). Under the same conditions, parallel processes occur at the genomic level. However, the phenomena producing genetic and linguistic change through time are not identical, and changes might occur at different time scales. Cavalli-Sforza et al. (1988) first suggested that linguistic change might be more rapid than genetic change. Chen et al. (1995) remarked that genetic transmission is only vertical whereas languages can also spread horizontally and suggested that linguistic and genetic change are likely to proceed at different rates. In human history many examples of phenomena disrupting the parallelism between genetic and linguistic change are indeed known. These include selection affecting specific genetic loci, episodes of language replacement associated with the migration of a numerically small elite, creolization, and admixture accompanied by the establishment of a common language (Renfrew, 1989). Therefore, a relationship between language and genes is not a safe general assumption, but rather a hypothesis which needs to be tested.

Around 6,000 languages are currently spoken on earth, but the great majority are threatened with extinction. Indeed, about 97% of the world’s population speak 4% of the languages, whereas 10% of languages have fewer than 100 speakers (Wurm, 2001). About half of the languages are currently losing their speakers, which could lead to the replacement of between 50% and 90% of the minority languages before the end of the century. In this context, it is urgent to assess to what extent language differences have played a role in shaping human genetic diversity before many of the relevant data vanish.

Several studies have shown that there is a significant correlation between genetic and linguistic diversity in Europe (Sokal, 1988; Sokal et al., 1989, 1989; Harding and Sokal, 1988; Sajantila et al., 1995) and that language barriers probably play an important role in maintaining genetic differences by acting as reproductive barriers (Barbujani and Sokal, 1990). In sub-Saharan Africa, Excoffier et al. (1991) found that the language family relationships are a better predictor of the genetic structure than geographical relatedness. Based on classical markers (20 alleles in total), Cavalli-Sforza et al. (1988) first produced evidence suggesting that the main linguistic groups of the world broadly correspond to genetic clusters. Later, Chen et al. (1995) and Poloni et al. (1997) also found a significant correlation between genetic and linguistic diversity based, respectively, on 11 autosomal loci and on Y-chromosome restriction polymorphisms.

Grant sponsor: European Science Foundation (Eurocores Programme: The Origin of Man, Language, and Languages) through the Italian CNR; University of Ferrara.

*Correspondence to: Guido Barbujani, Dipartimento di Biologia ed Evoluzione, Università di Ferrara, Via Borsari, 46, 44100 Ferrara, Italy. E-mail: [email protected]

Received 27 June 2006; accepted 28 February 2007

DOI 10.1002/ajpa.20622
Published online in Wiley InterScience (www.interscience.wiley.com).

© 2007 WILEY-LISS, INC.

AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 131:000–000 (2007)

On the other hand, the results are not so clear-cut in the Americas and at the worldwide scale. Some authors have indeed failed to detect an association between genetic and linguistic diversity among native Americans and Asians (Monsalve et al., 1999) and within North America (Hunley and Long, 2005). Finally, Nettle and Harris (2003) found a significant influence of languages on X chromosome genetic differences in Europe and East and Central Asia, but not in the Near East, Southeast Asia, and West Africa, a result indicating that this correlation emerges only under specific conditions. All these studies relied on a limited number of genetic polymorphisms, which may not exhaustively reflect diversity at the genomic level.

The number of loci analyzed is not a secondary factor. Indeed, particular selection regimes, affecting specific loci, may generate genetic outliers, i.e., loci with unusual spatial patterns of diversity. As a consequence, as has been suggested for many decades now (Cavalli-Sforza, 1966), an extensive sampling of different genome regions is indispensable to minimize the effect of such outliers.

Therefore, in this study we analyzed a large sample of polymorphic microsatellite loci (377) spread across the genome, the largest genetic dataset used so far to investigate the influence of languages on the distribution of human genetic diversity. Rosenberg et al. (2002) briefly discussed the relationship between genes and languages in these data, but without a quantitative approach. Here, thanks to the large number of markers considered, we are seeking a more accurate estimate of the correlation between genetics and linguistics. These autosomal microsatellite markers have a higher rate of mutation than other markers, and could therefore parallel more closely the linguistic change (Barbujani, 1997). In addition, estimates of variation at microsatellite loci do not seem to suffer from ascertainment bias, a factor known to lead to erroneous conclusions in analyses based on single nucleotide or insertion–deletion polymorphisms (Clark et al., 2005).

We addressed three questions in particular, namely (i) Do increasing levels of linguistic differentiation lead to greater genetic distances? (ii) What is the relative weight of geography and languages in shaping human genetic diversity? (iii) Do previously identified genetic barriers across the world also tend to represent zones of increased linguistic differentiation?

MATERIALS AND METHODS

Dataset

We analyzed the published genetic dataset comprising 377 autosomal microsatellite loci distributed on all 22 autosomes in 52 world populations (Rosenberg et al., 2002). The samples were part of a study of the cell-line diversity panel of the CEPH (Centre pour l’Etude des Polymorphismes Humains at the Institut Jean Dausset in Paris) (Cann et al., 2002). In the CEPH panel, the individuals defined as “Bantus” came from several different areas; we excluded from the analysis all those who were not from Kenya because of their small sample sizes.

Languages were attributed to the samples according to two sources, namely Ruhlen’s (1991) classification and The Ethnologue website (Gordon, 2005). There is so far no universally accepted global taxonomy of languages; however, Ruhlen’s classification, albeit somewhat controversial, is widely used in population genetic studies (Poloni et al., 1997; Chen et al., 1995; Cavalli-Sforza et al., 1992, 1988; Excoffier et al., 1991) because it is probably the most comprehensive attempt to group all the world’s 6,000 languages into 17 linguistic phyla. The languages of all populations of this study were defined consistently in the two sources, with the exception of a few populations indicated in Table 1. For instance, The Ethnologue classifies the American languages into four distinct phyla, and not one as Ruhlen (1991) does.

The CEPH dataset includes two pygmy samples, Biaka and Mbuti. The pygmies’ linguistic affiliation is difficult to define, as in most cases they seem to have borrowed their neighbors’ language (Cavalli-Sforza, 1986). We attributed the Biaka to the Niger-Kordofanian phylum, which includes Bantu languages, because, besides speaking a language of this phylum, the Biaka are thought to have undergone extensive admixture with Bantu speakers (Cavalli-Sforza, 1986). By contrast, the Mbuti pygmies include speakers of both Nilosaharan and Niger-Kordofanian languages (ALFRED database: Cheung et al., 2000), whom it proved impossible for us to separate. As a consequence, the Mbuti sample was not considered in the analysis. In this way, our results refer to 50 populations: 6 from Africa, 3 from the Middle East, 26 from Asia, 2 from Oceania, 8 from Europe, and 5 from the Americas (Fig. 1), which we classified into linguistic families, branches, and phyla (Table 1).

Analysis of molecular variance

Measures of genetic variance were hierarchically partitioned by an analysis of molecular variance (AMOVA) (Excoffier et al., 1992) implemented in the Arlequin ver 2.0 software (Schneider et al., 2000). For each locus, we compared each individual genotype with (i) the genotypes of the individuals of the same linguistic family, (ii) the genotypes of the individuals from different families of the same linguistic phylum and (iii) the genotypes of individuals belonging to different linguistic phyla.

Two measures of genetic distance were estimated from the data, which can be assimilated to FST and RST values. The number of alleles differing between two haplotypes was calculated as:

$$ d_{xy} = \sum_{i=1}^{L} \delta_{xy}(i) $$

where $\delta_{xy}(i)$ is an indicator function, equal to 1 if the alleles of the ith locus differ between the two haplotypes, and equal to 0 otherwise, and summation is over all $L$ loci. This index is analogous to a weighted FST statistic over all loci (see Arlequin manual, Schneider et al., 2000).

J_ID: ZC0 Customer A_ID: 2006-00153.R2 Cadmus Art: AJPA20622 Date: 20-MARCH-07 Stage: I Page: 2

ID: vasanss Date: 20/3/07 Time: 15:43 Path: J:/Production/AJPA/Vol00000/070044/3B2/C2AJPA070044

2 BELLE AND BARBUJANI

American Journal of Physical Anthropology—DOI 10.1002/ajpa


TABLE 1. Language name, family, branch and phylum for each population considered, according to Ruhlen’s classification of languages

| Population name | Language (dLAN = 1) | Family (dLAN = 2) | Branch (dLAN = 3) | Phylum (dLAN = 4) | Ethnologue (Eth:), where different |
|---|---|---|---|---|---|
| Biaka | Yaka | Bantoid | Niger-Congo | Niger-Kordofanian | Eth: Niger-Congo |
| Mandenka | Mandinka | Mande | Niger-Congo | Niger-Kordofanian | Eth: Niger-Congo |
| Yoruba | Yoruba | Yoruba-North. Akoko | Niger-Congo | Niger-Kordofanian | Eth: Niger-Congo |
| San | San | Hai.n//um | – | Khoisan | |
| Kenya | Bantu | Bantoid | Niger-Congo | Niger-Kordofanian | Eth: Niger-Congo |
| Mozabite | Mozabite | Berber | – | Afro-Asiatic | |
| Bedouin | Arabic | Arabo-Canaanite | Semitic | Afro-Asiatic | |
| Druze | Arabic | Arabo-Canaanite | Semitic | Afro-Asiatic | |
| Palestinian | Arabic | Arabo-Canaanite | Semitic | Afro-Asiatic | |
| Brahui | Brahui | Dravidian | North West | Elamo-Dravidian | Eth: Dravidian |
| Balochi | Baluchi | Iranian | Indo-Iranian | Indo-Hittite | Eth: Indo-European |
| Hazara | Persian | Iranian | Indo-Iranian | Indo-Hittite | Eth: Indo-European |
| Makrani | Baluchi | Iranian | Indo-Iranian | Indo-Hittite | Eth: Indo-European |
| Sindhi | Sindhi | Indic | Indo-Iranian | Indo-Hittite | Eth: Indo-European |
| Pathan | Newari | Tibetic | Tibeto-Karen | Sino-Tibetan | Eth: Himalayish; Eth: Tibeto-Burman |
| Kalash | Kalasha | Indic | Indo-Iranian | Indo-Hittite | Eth: Dardic; Eth: Indo-European |
| Burusho | Burushaski | – | – | Language isolate | |
| Han | Mandarin/Cantonese | Chinese | Sinitic | Sino-Tibetan | |
| Tujia | Tujia | Tai | Austro-Tai | Austric | Eth: Tibeto-Burman; Eth: Sino-Tibetan |
| Yi | Yi | Burmic | Tibeto-Karen | Sino-Tibetan | Eth: Tibeto-Burman |
| Miao | Miao | Miao | Miao-Yao | Austric | Eth: Miao-Yao |
| Oroqen | Oroqen | Tungus | Mongolian-Tungus | Altaic | |
| Daur | Daur | Mongolian | Mongolian-Tungus | Altaic | |
| Mongolian | Mongolian | Mongolian | Mongolian-Tungus | Altaic | |
| Hezhen | Hezhen | Tungus | Mongolian-Tungus | Altaic | |
| Xibo | Xibo | Tungus | Mongolian-Tungus | Altaic | |
| Uyghur | Uyghur | Turkic | – | Altaic | |
| Dai | Dai | Austronesian | Austro-Tai | Austric | Eth: Austro-Tai; Eth: Babar; Eth: Austronesian |
| Lahu | Lahu | Burmic | Tibeto-Karen | Sino-Tibetan | Eth: Tibeto-Burman |
| She | She | Yao | Miao-Yao | Austric | Eth: Miao-Yao |
| Naxi | Naxi | Burmic | Tibeto-Karen | Sino-Tibetan | Eth: Tibeto-Burman |
| Tu | Tu | Mongolian | Mongolian-Tungus | Altaic | |
| Yakut | Yakut | Turkic | – | Altaic | |
| Japanese | Japanese | Japanese | Korean-Japanese | Altaic | |
| Cambodian | Khmer | Mon-Khmer | Austro-Asiatic | Austric | Eth: Austro-Asiatic |
| Papuan | – | Papuan | – | Indo-Pacific | Eth: Oceanic; Eth: Austronesian |
| Melanesian | – | Melanesian | – | Indo-Pacific | Eth: Oceanic; Eth: Austronesian |
| French | French | Romance continental | Italic | Indo-Hittite | |
| Basque | Basque | – | – | Language isolate | |
| Sardinian | Sardinian | – | Italic | Indo-Hittite | Eth: Indo-European |
| Bergamo | Italian | Romance continental | Italic | Indo-Hittite | Eth: Indo-European |
| Tuscan | Italian | Romance continental | Italic | Indo-Hittite | Eth: Indo-European |
| Orcadian | English | – | Germanic | Indo-Hittite | Eth: Indo-European |
| Adygei | Adyghe | – | Circassian | Northern Caucasian | |

(continued)


Slatkin’s (1995) RST is a specific distance measure for microsatellites, based on both allele frequency differences between populations and repeat number differences between alleles. This measure was calculated as the sum of the squared numbers of repeat differences between two haplotypes, that is to say as:

$$ d_{xy} = \sum_{i=1}^{L} \left( a_{xi} - a_{yi} \right)^2 $$

where $a_{xi}$ is the number of repeats at the ith locus (see Arlequin manual, Schneider et al., 2000).
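The two haplotype distances used in this section, the number of differing alleles and the sum of squared repeat differences, can be sketched in a few lines. This is only an illustration under stated assumptions: haplotypes are represented as equal-length lists of repeat counts, the function names and the example values are hypothetical, and the code does not reproduce Arlequin's implementation.

```python
def allele_mismatch_distance(hap_x, hap_y):
    """Number of loci at which the two haplotypes carry different alleles."""
    if len(hap_x) != len(hap_y):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(1 for a, b in zip(hap_x, hap_y) if a != b)


def squared_repeat_distance(hap_x, hap_y):
    """Sum over loci of the squared difference in repeat number."""
    if len(hap_x) != len(hap_y):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum((a - b) ** 2 for a, b in zip(hap_x, hap_y))


# Hypothetical repeat counts at five microsatellite loci (not real data).
x = [12, 15, 9, 20, 11]
y = [12, 14, 9, 22, 11]
print(allele_mismatch_distance(x, y))   # 2: the haplotypes differ at two loci
print(squared_repeat_distance(x, y))    # 5: (15 - 14)^2 + (20 - 22)^2
```

The first distance treats any two different alleles as equally distant, whereas the second gives more weight to alleles that differ by many repeats.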

The significance of the estimated variances was tested by means of a nonparametric permutational procedure (Excoffier et al., 1992). In three independent tests, (i) individuals were assigned to random language families of the same phylum, (ii) individuals were assigned to random language families regardless of the phylum, and (iii) language families were randomly assigned to language phyla, each time recalculating the relevant variance. Each randomization process was iterated 10,000


TABLE 1. (Continued)

| Population name | Language (dLAN = 1) | Family (dLAN = 2) | Branch (dLAN = 3) | Phylum (dLAN = 4) | Ethnologue (Eth:), where different |
|---|---|---|---|---|---|
| Russian | Russian | Slavic | Balto-Slavic | Indo-Hittite | Eth: Indo-European |
| Pima | Pima | Pimic | Uto-Aztecan | Amerind | Sonoran Eth: Aztecan |
| Maya | Maya | Mexican | Penutian | Amerind | Eth: Yucatecan; Eth: Mayan |
| Piapoco | Piapoco | Macro-Arawakan | Equatorial-Tucanoan | Amerind | Eth: Maipuran; Eth: Arawakan |
| Karitiana | Karitiana | Kariti-Tupi | Equatorial-Tucanoan | Amerind | Eth: Arikem; Eth: Tupi |
| Surui | Surui | Kariti-Tupi | Equatorial-Tucanoan | Amerind | Eth: Monde; Eth: Tupi |

When the classification of The Ethnologue differs significantly from Ruhlen’s, the different classification is indicated as ‘Eth:’ on a second line.

Fig. 1. Distribution of the sampling localities. The significant genomic boundaries found based on RST values (Barbujani and Belle, 2006) are represented by thick black lines. The Delaunay connections are represented by thin lines (solid between language phyla, dashed between language families, and dotted within families). Different colors represent samples belonging to different language phyla and different symbols (triangles, circles, squares) represent samples belonging to different language families within the same phylum.


times. The empirical distributions thus obtained were then compared with the observed variances, and an empirical level of significance was thus estimated for each variance.
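The logic of such a permutation test can be sketched as follows. The sketch is deliberately simplified: it uses a between-group sum of squares on a single numeric trait rather than the AMOVA variance components, the function names and toy data are hypothetical, and the "+1" in the P-value is one common convention assumed here, not necessarily the one applied by Arlequin.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares of a numeric trait."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(
        len(members) * (sum(members) / len(members) - grand_mean) ** 2
        for members in groups.values()
    )


def permutation_p_value(values, labels, statistic, n_perm=10_000, seed=0):
    """Share of label permutations whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of individuals to groups
        if statistic(values, shuffled) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Toy data: six individuals in two hypothetical groups.
trait = [1.0, 1.2, 0.9, 3.1, 2.9, 3.3]
group = ["A", "A", "A", "B", "B", "B"]
print(permutation_p_value(trait, group, between_group_ss, n_perm=2000))
```

With only six individuals the smallest attainable P-value is about 0.1, which illustrates why the number of distinct permutations, and not only the number of iterations, limits the resolution of the test.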

Mantel tests

A matrix of great circle geographic distances (dGEO matrix) between populations was constructed using a method first proposed by Ramachandran et al. (2005), i.e. assuming five obligatory waypoints on the world’s map so as to make distances between continents more reflective of the likely human dispersal routes from Africa. These points were Anadyr, Russia (64N, 177E); Cairo, Egypt (30N, 31E); Istanbul, Turkey (41N, 28E); Phnom Penh, Cambodia (11N, 104E); and Prince Rupert, Canada (54N, 130W).
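A waypoint-constrained distance of this kind can be sketched as a sum of great-circle legs. The sketch below uses the haversine formula and an assumed mean Earth radius of 6371 km; the function names are hypothetical, the sampling coordinates in the example are invented, and the rule deciding which waypoints apply to which pair of populations is not given in this section, so the caller supplies them explicitly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Two of the five waypoints listed in the text, as (latitude, longitude) in degrees.
CAIRO = (30.0, 31.0)
ISTANBUL = (41.0, 28.0)


def great_circle_km(p, q):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def route_km(start, end, via=()):
    """Length of the path start -> waypoints (in order) -> end."""
    points = [start, *via, end]
    return sum(great_circle_km(a, b) for a, b in zip(points, points[1:]))


# Hypothetical sampling locations in East Africa and Western Europe.
african_site = (0.0, 37.0)
european_site = (48.0, 2.0)
print(route_km(african_site, european_site))                         # direct
print(route_km(african_site, european_site, via=[CAIRO, ISTANBUL]))  # via waypoints
```

Routing through waypoints can only lengthen the path relative to the direct great-circle distance, which is the intended effect when the direct arc would cross an ocean.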

Linguistic distances (in the dLAN matrix) were estimated as simple dissimilarity indexes ranging from 0 to 4, according to the method described by Excoffier et al. (1991). In fact, two dLAN matrices were constructed, based either on Ruhlen’s (1991) language classification, or on The Ethnologue. Populations speaking languages belonging to different phyla were assigned dLAN = 4, languages of different branches dLAN = 3, languages of different families dLAN = 2, different languages dLAN = 1, and the same language dLAN = 0.
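The coding rule above amounts to finding the highest taxonomic level at which two populations differ. A minimal sketch, assuming each population is described by a (language, family, branch, phylum) tuple as in Table 1; the function name is hypothetical, and the handling of missing levels ("–") and of language isolates is not addressed here.

```python
def linguistic_distance(a, b):
    """dLAN between two (language, family, branch, phylum) tuples."""
    lang_a, family_a, branch_a, phylum_a = a
    lang_b, family_b, branch_b, phylum_b = b
    if phylum_a != phylum_b:
        return 4
    if branch_a != branch_b:
        return 3
    if family_a != family_b:
        return 2
    if lang_a != lang_b:
        return 1
    return 0


# Classifications taken from Table 1 (Ruhlen's scheme).
bedouin = ("Arabic", "Arabo-Canaanite", "Semitic", "Afro-Asiatic")
druze = ("Arabic", "Arabo-Canaanite", "Semitic", "Afro-Asiatic")
balochi = ("Baluchi", "Iranian", "Indo-Iranian", "Indo-Hittite")
hazara = ("Persian", "Iranian", "Indo-Iranian", "Indo-Hittite")
sindhi = ("Sindhi", "Indic", "Indo-Iranian", "Indo-Hittite")
russian = ("Russian", "Slavic", "Balto-Slavic", "Indo-Hittite")

print(linguistic_distance(bedouin, druze))    # 0: same language
print(linguistic_distance(balochi, hazara))   # 1: different languages, same family
print(linguistic_distance(balochi, sindhi))   # 2: different families, same branch
print(linguistic_distance(sindhi, russian))   # 3: different branches, same phylum
print(linguistic_distance(bedouin, russian))  # 4: different phyla
```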

Finally, matrices of genetic distances between populations (dGEN matrix) were computed, using either the standard FST measure or Slatkin’s RST.

Correlation between genetic boundaries and language barriers

The correlations among the three matrices were estimated by pairwise (Mantel, 1967) and multiple (Smouse et al., 1986) Mantel tests. First we calculated three pairwise correlations (between dGEO and dGEN, between dGEO and dLAN, and between dGEN and dLAN). We then separated the effects of geography and language on the genetic distances by calculating the partial correlations between dGEN and dGEO (with dLAN held constant) and between dGEN and dLAN (with dGEO held constant). The significance of the observed coefficients was calculated by randomly permuting rows and columns of one matrix while keeping the other matrix constant, thus obtaining an empirical null distribution of correlation coefficients. Doubts have been raised on the accuracy of the P-values associated with the partial Mantel tests (Raufaste and Rousset, 2001). Be that as it may, we provide those values for the sake of comparison with previous studies. All the above procedures are implemented in the Arlequin ver 2.0 software (Schneider et al., 2000).
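A pairwise Mantel test of the kind described here can be sketched in plain Python. The sketch assumes a Pearson correlation over the upper triangles of two symmetric distance matrices, a one-sided test, and a "+1" convention for the empirical P-value; these choices, the function names and the toy matrices are assumptions for illustration and are not taken from Arlequin.

```python
import math
import random


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, n_perm=9999, seed=0):
    """Mantel correlation and one-sided permutation P-value."""
    rng = random.Random(seed)
    n = len(d1)
    flat2 = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), flat2)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of d1 together; d2 is kept fixed.
        permuted = [d1[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]
        if pearson(permuted, flat2) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: five hypothetical populations placed on a line.
positions = [0, 1, 3, 7, 12]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * abs(a - b) ** 0.5 for b in positions] for a in positions]
print(mantel(geo, gen, n_perm=999))
```

Permuting rows and columns jointly, rather than shuffling individual matrix cells, preserves the dependence among distances that share a population, which is the reason an ordinary significance test of the correlation coefficient is not appropriate for distance matrices.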

Correlation between genetic barriers and linguistic groups

In a previous analysis of the same dataset, six significant genetic boundaries were identified, namely zones where the rate of genetic change is locally increased with respect to random locations (Barbujani and Belle, 2006; see also Rosenberg et al., 2005 for an alternative approach). In this study, we investigated whether at these genetic boundaries linguistic change is increased with respect to other zones on the map. Adjacent populations were connected by a Delaunay network (Manni et al., 2004) and each edge of the network was associated with a measure of linguistic differentiation (Figure 1).

For this purpose, we pooled the indexes of language dissimilarity used for the Mantel tests in two different ways: we either considered dLAN = 4 versus dLAN = 0, 1, 2, 3 or dLAN = 3, 4 versus dLAN = 0, 1, 2. We then compared the average dLAN along edges of the network that were crossed by a significant genetic boundary with the average dLAN measured along the other edges of the network, under the null hypothesis of no difference between averages.

RESULTS

AMOVA

Considering allele-frequency differences (FST), the analysis of variance reveals that only 2.9% of the genetic variation is explained by differences between linguistic phyla and 2.4% is due to differences between language families (Table 2). The average worldwide FST value estimated from this dataset is equal to 0.053.

When molecular differences between alleles are considered (RST), the share of genetic diversity increases to 6.7% of the total for differences among language phyla and to 2.9% for differences among families.

For both analyses, the diversity indexes among individuals from the same language family, individuals from different language families of the same phylum, and individuals from different language phyla are all significantly greater than 0 at P < 0.05. Considering the classification of languages of The Ethnologue instead of Ruhlen’s, we found very similar results, with the difference in the percentages <0.5% for each component of variation (data not given). In what follows, unless otherwise specified, we shall refer to dLAN matrices estimated according to Ruhlen’s classification.

We did not perform analyses at a lower hierarchical level, that is to say within language phyla or families, because of the highly variable number of populations sampled within each linguistic group.

Mantel tests

The correlation coefficients between geographic, linguistic and genetic distances, using either RST or FST distances, are given in Table 3 and Figure 2. Considering FST values, and hence not taking into account molecular differences between alleles, both variables, geography and language, appear significantly associated with genetic variation. The correlation between genetic diversity and linguistic distances was highly significant (r = 0.226, P < 0.0001), meaning that 5.1% (r2) of the genetic variation is accounted for by language distances. The proportion of genetic variation accounted for by geography was much higher (r2 = 65.3%). However, geographic distances and linguistic classification are also correlated, and predictably so,


TABLE 2. Hierarchical AMOVA analysis, showing the percentage of variation at each of three levels of population hierarchy, considering either allele frequencies only (FST) or both allele frequencies and molecular distances between alleles (RST)

| Measure of genetic distance | FST | RST |
|---|---|---|
| Among language phyla | 2.9 | 6.7 |
| Among populations between language families, within phyla | 2.4 | 2.9 |
| Within populations | 94.7 | 90.4 |

All values are significant at the P < 0.0001 level.


TABLE 3. Mantel correlation and partial correlation coefficients (r), between genetic (dGEN), geographic (dGEO) and linguistic matrices (dLAN) based on Ruhlen’s classification of languages, either using FST or RST measures of genetic diversity

| Matrices considered | FST: correlation coefficient (r) | FST: proportion of variance explained (r2) | RST: correlation coefficient (r) | RST: proportion of variance explained (r2) |
|---|---|---|---|---|
| dGEN and dGEO | 0.808*** | 0.653 | 0.746*** | 0.557 |
| dGEN and dLAN | 0.226*** | 0.051 | 0.311*** | 0.097 |
| dGEO and dLAN | 0.268*** | 0.072 | 0.269*** | 0.072 |
| dGEN, dGEO, and dLAN | | 0.653 | | 0.570 |
| dGEN and dGEO, dLAN constant^a | 0.796*** | 0.633 | 0.723*** | 0.523 |
| dGEN and dLAN, dGEO constant^a | 0.018 NS | 0.0003 | 0.172*** | 0.030 |

^a Partial correlation coefficients. *** P ≤ 0.0001; NS: not significant.

Fig. 2. (a) Correlation between genetic distances calculated as RST values, and geographic distances constrained by five obligatory waypoints (dGEO). Following Ramachandran et al. (2005), red squares represent comparisons within regions, green triangles comparisons between populations in Africa and Eurasia, and blue diamonds comparisons with America and Oceania. (b) Box plot showing the correlation between genetic distances (RST) and linguistic distances (dLAN) calculated on the basis of Ruhlen’s classification of languages. The upper and lower limits of the box represent, respectively, the 5% and 95% confidence intervals, and the long horizontal bar in the box represents the median.


because populations speaking related languages form clusters in the geographical space. As a consequence, the observed correlation between linguistic and genetic distances may conceivably reflect the fact that both are correlated with geographical distances. When geographic distances are held constant, the correlation does not remain significant when genetic distances are measured by FST.

By contrast, using RST values (Table 3), which are more specific for microsatellites, we found that 55.7% of the genetic variance is explained by geography, while only 9.7% is accounted for by linguistics (without keeping any variable constant). In this case, the correlation between genetic diversity and linguistic distances remains significant, and highly so, even at constant geographic distances (r = 0.172, P < 0.0001), and in this case linguistic distances alone account for less than 3% (r2) of genetic variation.
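As a rough check on how the partial coefficients relate to the pairwise ones, the standard first-order partial correlation formula can be applied to the pairwise values in Table 3. That Arlequin computes the partial Mantel coefficients in exactly this way is an assumption; the function name is hypothetical, and agreement is only expected to within the rounding of the published values.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Pairwise Mantel correlations for RST, as reported in Table 3.
r_gen_geo, r_gen_lan, r_geo_lan = 0.746, 0.311, 0.269

# Genes versus languages, geography held constant.
r_lan_given_geo = partial_r(r_gen_lan, r_gen_geo, r_geo_lan)
print(round(r_lan_given_geo, 3))       # 0.172
print(round(r_lan_given_geo ** 2, 3))  # share of variance, about 0.03

# Genes versus geography, languages held constant.
print(round(partial_r(r_gen_geo, r_gen_lan, r_geo_lan), 3))
```

The same formula applied to the FST values (0.808, 0.226 and 0.268) gives a genes-versus-languages partial coefficient close to zero, in line with the non-significant value in Table 3.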

Therefore, the geographic location of populations seems to be a better predictor of genetic distances than their linguistic affiliation, even though both correlations are highly significant. However, our results also suggest that between 35% (with FST values) and 43% (with RST values) of the overall genetic variance reflects the action of other factors. Once again, considering the classification of languages of The Ethnologue instead of Ruhlen’s did not substantially alter our results (data not given).

Figure 2a is a plot of the RST values versus pairwise geographic distances between populations. The correlation appears to be mostly due to comparisons of American and Oceanian populations, where both the RST values and geographic distances are greatest. However, increasing genetic distances tend to correspond to increasing language distances also when populations of the same region are compared. This is confirmed by Figure 2b, where the category corresponding to dLAN = 4 has a large confidence interval for the RST value.

Correlation between genetic barriers and linguistic groups

In our 50 population samples, 48 different languages are spoken belonging to 11 linguistic phyla and 2 language isolates, according to Ruhlen’s (1991) classification. If language differences represent an important barrier to gene flow, we expect the linguistic difference between populations on different sides of a genetic barrier to be on average greater than the one on the same side of that barrier.

In a previous analysis of this dataset, six significant genetic boundaries were identified using Slatkin’s RST measure of genetic distances (Barbujani and Belle, 2006). We found that out of the 97 Delaunay connections of the map, 23 are crossed by one of those genetic boundaries (Fig. 1 and Table 4). A fraction equal to 32.7% (18/55) of the Delaunay connections between linguistic phyla (dLAN = 4) is crossed by a significant genetic barrier, as is the case for 13.9% (6/43) of the connections within linguistic phyla (dLAN = 0, 1, 2, or 3). The difference between those proportions is statistically significant (χ2 = 4.6, P < 0.05), showing that increased genetic change tends to be observed with higher probability where linguistic change is greatest.

Among the most controversial issues in linguistics is the classification of native American languages. Although Ruhlen clusters them into a single phylum, many studies have suggested that they could belong to several distinct phyla (see for instance Sims-Williams, 1998). We reran the analysis considering the classification of The Ethnologue (Table 1). In this way, 21 out of 23 Delaunay connections showing increased genetic change occur between different language phyla, thus making the difference between linguistic categories even more significant (χ2 = 9.9, P < 0.01) than in the previous test.

To understand if the correlation is due to a specific degree of linguistic differentiation, we also investigated whether populations speaking languages belonging to different phyla or branches (dLAN = 3 or 4) occur more often on different sides of a genetic boundary than populations speaking languages belonging to the same branch (dLAN = 0, 1, or 2). The difference between those linguistic categories was not significant (χ2 = 2.13, NS).
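The two χ2 values based on Ruhlen’s classification can be approximately reproduced from the counts of Delaunay connections. The sketch below assumes a Pearson χ2 test on a 2 × 2 table without continuity correction, which is an assumption about the procedure rather than something stated in the text; the function name is hypothetical. The counts for the first test are those quoted in the text (18 of 55 and 6 of 43), and those for the second test are obtained by summing the corresponding rows of Table 4.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: linguistic category; columns: crossed / not crossed by a genetic barrier.
# dLAN = 4 (18 crossed, 37 not) versus dLAN = 0 to 3 (6 crossed, 37 not).
print(round(chi_square_2x2(18, 37, 6, 37), 1))   # 4.6

# dLAN = 3 or 4 (21 crossed, 54 not) versus dLAN = 0 to 2 (3 crossed, 20 not).
print(round(chi_square_2x2(21, 54, 3, 20), 2))   # 2.13
```

With one degree of freedom the 5% critical value is 3.84, so the first comparison is significant at P < 0.05 and the second is not.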

DISCUSSION

This study suggests that linguistic differences between populations have a small, but nonnegligible, effect on patterns of DNA variation at the world scale. The large number of markers considered, and the fact that ascertainment bias is unlikely to have affected the statistics estimated from DNA data, suggest that this conclusion is robust and can be generalized at the genome level, at least for autosomes. By contrast, uniparentally transmitted markers (mtDNA and Y chromosome) have a lower effective population size and hence are affected more strongly by genetic drift, often showing peculiar geographical patterns (see for instance Jorde et al., 2000; Wilson et al., 2001; Romualdi et al., 2002). As a consequence, for further generalizations our conclusions will have to be tested against suitable mitochondrial DNA and Y-chromosome datasets.

Here we found that 5.3% of the global variance in our autosomal dataset can be explained by differences between linguistic groups (families or phyla) using FST measures of genetic distances, and 9.6% using the RST statistic. These values are of the same order of magnitude as the proportion of genetic variation that can be attributed to differences among seven geographic regions using the same dataset, that is to say 3.6% (Rosenberg et al., 2002) and 9.2% (Excoffier and Hamilton, 2003), using respectively FST and RST measures of genetic diversity.

These low values could be interpreted as meaning that language affiliations are as informative (or uninformative) as geographic locations to predict the genetic relationships between populations. However, the Mantel tests indicate that in fact geographic distances are a good predictor of genetic distances, and that linguistic distances also affect DNA differences.

TABLE 4. Proportion of Delaunay connections in the world’s map crossed by a significant genetic barrier as defined in Barbujani and Belle (2006)

| Linguistic distance | Crossed by a barrier | Not crossed by a barrier | Total |
|---|---|---|---|
| dLAN = 0 | 0 | 5 | 5 |
| dLAN = 1 | 1 | 8 | 9 |
| dLAN = 2 | 2 | 7 | 9 |
| dLAN = 3 | 3 | 17 | 20 |
| dLAN = 4 | 18 | 37 | 55 |
| Total | 23 | 74 | 97 |

Two different ways of grouping the coding of the linguistic distances were tested: (i) at the level dLAN = 4 versus dLAN = 0, 1, 2, 3 and (ii) at the level dLAN = 3, 4 versus dLAN = 0, 1, 2 (see main text).

The levels of genetic differentiation inferred from microsatellites are lower than those from other DNA polymorphisms (Jorde et al., 2000; Romualdi et al., 2002). This is thought to be due, at least in part, to the higher microsatellite mutation rates, which tend to increase variation within groups compared to variation among groups (Jin and Chakraborty, 1995; Jorde et al., 2000).

When molecular distances between alleles were also considered (RST distances), the differences between language groups became almost twice as large (9.6%). This confirms that evolutionary processes affecting DNA sequences and allele frequencies occur at different time scales. It is known that DNA sequences evolve relatively slowly by the gradual accumulation of mutations, whereas allele frequencies change more rapidly under the action of genetic drift. Many polymorphisms may even predate the geographical differentiation of modern humans and differ between human groups only in relative frequencies (Klein et al., 1993; O’hUigin et al., 2002). Accordingly, we would expect linguistic change to be more closely associated with changes in allele frequencies, and hence with FST. However, microsatellites tend to evolve more rapidly than other polymorphisms, and hence RST measures seem to more closely parallel linguistic change than FST (for a similar result, see Excoffier and Hamilton, 2003). Globally speaking, the genetic differences between the main language phyla probably reflect relatively ancient demographic subdivision, whose effects may be easier to identify at the time scale of microsatellite evolution.

Mantel tests allowed us to quantify the relative importance of geography and language in determining the current genetic diversity. As already observed, genetic distances are more closely related with geographic than with linguistic distances. The values of the correlation coefficients between geographic and genetic distances are consistent with the ones obtained by Ramachandran et al. (2005) based on the same populations but with 406 additional loci. Our values are lower, but we also observe a stronger correlation with FST measures of genetic distances (r = 0.808) than with RST measures (r = 0.746).

In the partial correlations, using RST measures of genetic diversity, geography accounts for more than 50% of the genetic variance when languages are kept constant, whereas linguistic differences account for only 3% of the genetic variance when geography is controlled for. Nevertheless, both correlations are significant. This suggests that (i), as already known, populations that are distant in the geographical space tend to be genetically differentiated, but also that (ii) pairs of populations separated by the same geographic distance tend to be more genetically differentiated if they speak languages belonging to different phyla. This significant correlation between linguistic and genetic distances based on RST values is consistent with previous analyses of European diversity based on nuclear allele frequencies (Sokal, 1988) and on Y-chromosome polymorphisms (Poloni et al., 1997; Rosser et al., 2000), and with Chen et al.’s (1995) results at the world level, the latter study based on 11 loci only. By contrast, when considering genetic distances calculated as FST values, the partial correlation between genetics and linguistics holding geography constant is not significant. However, it is important to note that because there is presently no consensus on how to quantify linguistic relatedness, we were forced to adopt a rather arbitrary and approximate scale of values. Therefore the fact that we did find significant correlations suggests that the relationship between linguistic and genetic differentiation may in fact be stronger than estimated.

In agreement with this view is another result of this study. We examined the effect of using a dLAN = 8 (instead of 4) for languages belonging to different phyla, implicitly assuming that only relatively close linguistic relationships can be safely established, whereas the relationships between language phyla are minimal and largely undefined (see e.g. Trask, 1996). This rescaling had the effect of further increasing the partial correlation between genetics and languages (r = 0.206, P < 0.0001 with RST measures), although the association between genetics and geography remains stronger.

Geography and languages together explain between 57% (RST) and 65% (FST) of the DNA variance among the populations of this study, two values lower than those reported for the same populations using different statistics, namely 78% (Ramachandran et al., 2005). At any rate, this means that a substantial part of microsatellite diversity must be accounted for by other demographic or evolutionary factors, and presumably that forces acting locally, such as genetic drift and founder effects, have substantially contributed to shaping human DNA diversity. Some aspects of this phenomenon have already been pointed out, including the fact that the DNA similarities between the Hazara and the Uyghurs of this study would not be expected based on either their linguistic or geographic relationships (Rosenberg et al., 2002).

Finally, we investigated to what extent large linguistic differences are localized at the main genetic boundaries. Our analysis showed that the greatest proportion of genetic barriers falls between populations speaking languages of different phyla. In other words, genetic boundaries, or zones of sharp genetic change, tend to occur where well-distinct linguistic groups are in contact. We showed in a previous study (Barbujani and Belle, 2006) that genetic boundaries do not necessarily correspond to obvious physical barriers. Significant genetic boundaries have been observed, for instance, within South American populations separated by but a few kilometers. The very small size of the American hunting–gathering populations, and their documented tendency to split along family lines (Crawford, 1998) have certainly enhanced the local drift effects. Obviously, when population splits are followed by isolation, both language and genetic divergence will ensue, and language differences will reinforce evolutionary independence between populations, even in the absence of other reproductive barriers.

To understand if the significant difference found between genetic barriers separating populations speaking languages belonging to the same phylum or different phyla was due to linguistic differences at the phylum level, we repeated the same analysis using a different grouping, that is to say between branches (or phyla) versus within branches. We found that no significant difference was left, suggesting that most of the correlations previously found are due to the effects of the major linguistic boundaries, such as those separating languages of different phyla.

Our results may have been biased by a number of factors. First, it has been suggested that the uneven distribution of the populations sampled across the world could influence the inferred patterns of genetic diversity (Serre and Paabo, 2004), and hence of linguistic diversity. However, this issue is still debated, with Rosenberg et al. (2005) maintaining that this geographic representation was actually suitable for population structure studies. Second, the classification of languages at the world level is still controversial. To better assess the patterns of correlation between genetic and linguistic diversity, we need a clearer consensus on the relationships among languages.

Despite those potential drawbacks, the results of our analysis, the largest comparison so far of genetic and linguistic diversity in terms of the number of loci considered, confirm the existence, at a worldwide scale, of a small but detectable effect of linguistic differences on human DNA diversity at the genomic level.

This result is also of practical anthropological significance, because most languages spoken on earth are currently threatened with extinction. Our finding that the distribution of languages has a detectable influence on human genetic diversity suggests that it is essential to assemble a collection of additional linguistic and genetic data from the most endangered populations, before it is too late to study them.

ACKNOWLEDGMENTS

We thank Noah Rosenberg and an anonymous reviewer for insightful comments and suggestions.

LITERATURE CITED

Barbujani G. 1991. What do languages tell us about human microevolution? Trends Ecol Evol 5:151–156.

Barbujani G. 1997. DNA variation and language affinities. Am J Hum Genet 61:1011–1014.

Barbujani G, Belle E. 2006. Genomic boundaries between human populations. Hum Hered 61:15–21.

Barbujani G, Sokal RR. 1990. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc Natl Acad Sci USA 87:1816–1819.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L, Ferrara GB, Friedlaender JS, Groot H, Gurwitz D, Jenkins T, Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA, Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q, Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G, Dausset J, Cavalli-Sforza LL. 2002. A human genome diversity cell line panel. Science 296:261–262.

Cavalli-Sforza LL. 1966. Population structure and human evolution. Proc Royal Soc B 164:362–379.

Cavalli-Sforza LL. 1986. African Pygmies. Orlando, FL: Academic Press.

Cavalli-Sforza LL, Minch E, Mountain JL. 1992. Coevolution of genes and languages revisited. Proc Natl Acad Sci USA 89:5620–5624.

Cavalli-Sforza LL, Piazza A, Menozzi P, Mountain J. 1988. Reconstruction of human evolution: bringing together genetic, archaeological and linguistic data. Proc Natl Acad Sci USA 85:6002–6006.

Chen J, Sokal RR, Ruhlen M. 1995. Worldwide analysis of genetic and linguistic relationships of human populations. Hum Biol 67:595–612.

Cheung KH, Osier MV, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. 2000. ALFRED: an allele frequency database for diverse populations and DNA polymorphisms. Nucleic Acids Res 28:361–363.

Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. 2005. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502.

Crawford M. 1998. The origin of native Americans: evidence from archeological genetics. Cambridge: Cambridge University Press.

Excoffier L, Hamilton G. 2002. Comment on "Genetic structure of human populations". Science 298:2381–2385.

Excoffier L, Harding RM, Sokal RR, Pellegrini B, Sanchez-Mazas A. 1991. Spatial differentiation of RH and GM haplotype frequencies in sub-Saharan Africa and its relation to linguistic affinities. Hum Biol 63:273–307.

Excoffier L, Smouse PE, Quattro JM. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491.

Gordon RG Jr. 2005. Ethnologue: languages of the world, 15th ed. Dallas, TX: SIL International. Online version: http://www.ethnologue.com/.

Harding RM, Sokal RR. 1988. Classification of the European language families by genetic distances. Proc Natl Acad Sci USA 85:9370–9372.

Hunley K, Long JC. 2005. Gene flow across linguistic boundaries in native North American populations. Proc Natl Acad Sci USA 102:1312–1317.

Jin L, Chakraborty R. 1995. Population structure, stepwise mutations, heterozygote deficiency and their implications in DNA forensics. Heredity 74:274–285.

Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA. 2000. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal and Y-chromosome data. Am J Hum Genet 66:979–988.

Klein J, Satta Y, O’hUigin C, Takahata N. 1993. The molecular descent of the major histocompatibility complex. Annu Rev Immunol 11:269–295.

Manni F, Guerard E, Heyer E. 2004. Geographic patterns of (genetic, morphologic, linguistic) variation: how barriers can be detected by using Monmonier’s algorithm. Hum Biol 76:173–190.

Mantel NA. 1967. The detection of disease clustering and a generalized regression approach. Cancer Res 27:209–220.

Monsalve MV, Helgason A, Devine DV. 1999. Languages, geography and HLA haplotypes in native American and Asian populations. Proc Royal Soc B 266:2209–2216.

Nettle D, Harriss L. 2003. Genetic and linguistic affinities between human populations in Eurasia and West Africa. Hum Biol 75:331–344.

Nichols J. 1997. Modeling ancient population structures and movement in linguistics. Annu Rev Anthropol 26:359–384.

O’hUigin C, Satta Y, Takahata N, Klein J. 2002. Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol Biol Evol 19:1501–1513.

Poloni ES, Semino O, Passarino G, Santachiara-Benerecetti AS, Dupanloup I, Langaney A, Excoffier L. 1997. Human genetic affinities for Y-chromosome P49a,f/TaqI haplotypes show strong correspondence with linguistics. Am J Hum Genet 61:1015–1035.

Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA 102:15942–15947.

Raufaste N, Rousset F. 2001. Are partial Mantel tests adequate? Evolution 55:1703–1705.

Relethford JH. 2004. Global patterns of isolation by distance based on genetic and morphological data. Hum Biol 76:499–513.

Renfrew C. 1989. Models of change in languages and archaeology. Trans Philos Soc 87:103–155.

Romualdi C, Balding D, Nasidze IS, Risch G, Robichaux M, Sherry ST, Stoneking M, Batzer MA, Barbujani G. 2002. Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. Genome Res 12:602–612.

Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. 2002. Genetic structure of human populations. Science 298:2381–2385.

Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. 2005. Clines, clusters, and the effects of study design on the inference of human population structure. PLoS Genetics 6:e70.

Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W, Armenteros M, Arroyo E, Barbujani G, Beckman G, Beckman L, Bertranpetit J, Bosch E, Bradley DG, Brede G, Cooper G, Corte-Real HB, de Knijff P, Decorte R, Dubrova YE, Evgrafov O, Gilissen A, Glisic S, Golge M, Hill EW, Jeziorowska A, Kalaydjieva L, Kayser M, Kivisild T, Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, Livshits LA, Malaspina P, Maria S, McElreavey K, Meitinger TA, Mikelsaar AV, Mitchell RJ, Nafa K, Nicholson J, Norby S, Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B, Pielberg G, Prata MJ, Previdere C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J, Santos FR, Stefanescu G, Sykes BC, Tolun A, Villems R, Tyler-Smith C, Jobling MA. 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than language. Am J Hum Genet 67:1526–1543.

Ruhlen M. 1991. A guide to the world’s languages, Vol 1: Classification. Stanford, CA: Stanford University Press.

Sajantila A, Lahermo P, Anttinen T, Lukka M, Sistonen P, Savontaus M-L, Aula P, Beckman L, Tranebjaerg L, Gedde-Dahl T, Issel-Tarver L, DiRienzo A, Paabo S. 1995. Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res 5:42–52.

Schneider S, Roessli D, Excoffier L. 2000. Arlequin ver. 2.000: A software for population genetics data analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland.

Serre D, Paabo S. 2004. Evidence for gradients of human genetic diversity within and among continents. Genome Res 14:1679–1685.

Sims-Williams P. 1998. Genetics, linguistics, and prehistory: thinking big and thinking straight. Antiquity 72:505–527.

Slatkin M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457–462.

Smouse PE, Long JC, Sokal RR. 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35:627–632.

Sokal RR. 1988. Genetic, geographic and linguistic distances in Europe. Proc Natl Acad Sci USA 85:1722–1726.

Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Vaudor A. 1989. Genetic differences among language families in Europe. Am J Phys Anthropol 79:489–502.

Trask RL. 1996. Historical linguistics. London: Arnold.

Wilson JE, Weale ME, Smith AC, Gratrix F, Fletcher B, Thomas MG, Bradman N, Goldstein DB. 2001. Population genetic structure of variable drug response. Nat Genet 29:265–269.

Wurm SA. 2001. Atlas of the world’s languages in danger of disappearing, 2nd ed. UNESCO Publishing.

AQ1: The title has been modified slightly so as to eliminate the "the statement/sentence effect" of the original. Kindly confirm whether the changes made are appropriate.

AQ2: Kindly note that, as Rosenberg et al. (2002) appeared twice in the reference list, the second entry has been deleted.
