Research Article Mutation Detection in an Antibody ...

9
Research Article Mutation Detection in an Antibody-Producing Chinese Hamster Ovary Cell Line by Targeted RNA Sequencing Siyan Zhang, 1 Jason D. Hughes, 2 Nicholas Murgolo, 3 Diane Levitan, 3 Janice Chen, 1 Zhong Liu, 1 and Shuangping Shi 1 1 Biologics & Vaccines, Merck Research Laboratories, Kenilworth, NJ 07033, USA 2 Biology & Genetics Informatics, Merck Research Labs IT, Merck & Co, Boston, MA 02115, USA 3 Discovery Pharmacogenomics, Merck Research Laboratories, Kenilworth, NJ 07033, USA Correspondence should be addressed to Shuangping Shi; [email protected] Received 18 November 2015; Revised 4 February 2016; Accepted 21 February 2016 Academic Editor: Jorge F. B. Pereira Copyright © 2016 Siyan Zhang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Chinese hamster ovary (CHO) cells have been used widely in the pharmaceutical industry for production of biological therapeutics including monoclonal antibodies (mAb). e integrity of the gene of interest and the accuracy of the relay of genetic information impact product quality and patient safety. Here we employed next-generation sequencing, particularly RNA-seq, and developed a method to systematically analyze the mutation rate of the mRNA of CHO cell lines producing a mAb. e effect of an extended culturing period to mimic the scale of cell expansion in a manufacturing process and varying selection pressure in the cell culture were also closely examined. 1. Introduction e development of next-generation sequencing (NGS) tech- nologies has greatly improved the efficiency of sequencing and contributed to the understanding of dynamic changes in gene expression [1]. With the maturation of NGS, its applications in biomedical research and drug discovery have greatly advanced the identification of disease related mutations and the development of molecules targeting the aberrantly expressed gene products [2–6]. Massively parallel cDNA sequencing (RNA-seq) has revolutionized transcrip- tomics studies compared to microarray technologies [7]. RNA-seq allows both qualitative and quantitative analysis of the expressed gene product at messenger RNA (mRNA) level with wide dynamic ranges and superior sensitivity [8]. Mammalian cell lines such as the Chinese hamster ovary (CHO) cells have been widely used in the production of recombinant therapeutic product including monoclonal anti- bodies [9, 10]. ese cell lines are propagated extensively to reach large-scale production vessel. Production cell lines are generated by transfecting the host cells with a plasmid vector expressing the gene of interest (GOI) and a selection marker, followed by drug treatment and clone selection. During a large-scale manufacturing process, cells from a frozen bank need to be expanded multiple times to reach a final volume as large as 20,000 liters. e integrity of the GOI and the accurate flow of genetic information throughout this process are crucial to product quality. Traditionally, protein sequencing and mass spectrometry are used to characterize the final product for its consistency and homogeneity at the protein level [11]. DNA sequencing based on the Sanger or pyrosequencing method has also been used for sequence analysis of the mRNA (via cDNA) [12]. Although these mam- malian host cells have a proven track record in consistently producing high-quality products, a potential threat is posed to the quality of the final product by the drug selection process, cloning procedures, and environmental stress over extended passaging conditions [13]. Product variants includ- ing point mutations could develop during the life cycle of the production cells. However, the extent of this risk has not been fully understood due to the limitations of traditional molecular biology tools mentioned above. In this study, we explored the use of RNA-seq technology for the characterization of the mutation rate in a stably trans- fected CHO cell line expressing a recombinant monoclonal Hindawi Publishing Corporation BioMed Research International Volume 2016, Article ID 8356435, 8 pages http://dx.doi.org/10.1155/2016/8356435

Transcript of Research Article Mutation Detection in an Antibody ...

Page 1: Research Article Mutation Detection in an Antibody ...

Research ArticleMutation Detection in an Antibody-Producing ChineseHamster Ovary Cell Line by Targeted RNA Sequencing

Siyan Zhang1 Jason D Hughes2 Nicholas Murgolo3 Diane Levitan3

Janice Chen1 Zhong Liu1 and Shuangping Shi1

1Biologics amp Vaccines Merck Research Laboratories Kenilworth NJ 07033 USA2Biology amp Genetics Informatics Merck Research Labs IT Merck amp Co Boston MA 02115 USA3Discovery Pharmacogenomics Merck Research Laboratories Kenilworth NJ 07033 USA

Correspondence should be addressed to Shuangping Shi shuangpingshimerckcom

Received 18 November 2015 Revised 4 February 2016 Accepted 21 February 2016

Academic Editor Jorge F B Pereira

Copyright copy 2016 Siyan Zhang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Chinese hamster ovary (CHO) cells have been used widely in the pharmaceutical industry for production of biological therapeuticsincluding monoclonal antibodies (mAb) The integrity of the gene of interest and the accuracy of the relay of genetic informationimpact product quality and patient safety Here we employed next-generation sequencing particularly RNA-seq and developed amethod to systematically analyze the mutation rate of the mRNA of CHO cell lines producing a mAb The effect of an extendedculturing period to mimic the scale of cell expansion in a manufacturing process and varying selection pressure in the cell culturewere also closely examined

1 Introduction

Thedevelopment of next-generation sequencing (NGS) tech-nologies has greatly improved the efficiency of sequencingand contributed to the understanding of dynamic changesin gene expression [1] With the maturation of NGS itsapplications in biomedical research and drug discoveryhave greatly advanced the identification of disease relatedmutations and the development of molecules targeting theaberrantly expressed gene products [2ndash6] Massively parallelcDNA sequencing (RNA-seq) has revolutionized transcrip-tomics studies compared to microarray technologies [7]RNA-seq allows both qualitative and quantitative analysis ofthe expressed gene product at messenger RNA (mRNA) levelwith wide dynamic ranges and superior sensitivity [8]

Mammalian cell lines such as the Chinese hamster ovary(CHO) cells have been widely used in the production ofrecombinant therapeutic product includingmonoclonal anti-bodies [9 10] These cell lines are propagated extensivelyto reach large-scale production vessel Production cell linesare generated by transfecting the host cells with a plasmidvector expressing the gene of interest (GOI) and a selectionmarker followed by drug treatment and clone selection

During a large-scale manufacturing process cells from afrozen bank need to be expanded multiple times to reach afinal volume as large as 20000 litersThe integrity of the GOIand the accurate flow of genetic information throughout thisprocess are crucial to product quality Traditionally proteinsequencing and mass spectrometry are used to characterizethe final product for its consistency and homogeneity at theprotein level [11] DNA sequencing based on the Sanger orpyrosequencing method has also been used for sequenceanalysis of themRNA (via cDNA) [12] Although thesemam-malian host cells have a proven track record in consistentlyproducing high-quality products a potential threat is posedto the quality of the final product by the drug selectionprocess cloning procedures and environmental stress overextended passaging conditions [13] Product variants includ-ing point mutations could develop during the life cycle ofthe production cells However the extent of this risk has notbeen fully understood due to the limitations of traditionalmolecular biology tools mentioned above

In this study we explored the use of RNA-seq technologyfor the characterization of the mutation rate in a stably trans-fected CHO cell line expressing a recombinant monoclonal

Hindawi Publishing CorporationBioMed Research InternationalVolume 2016 Article ID 8356435 8 pageshttpdxdoiorg10115520168356435

2 BioMed Research International

antibody (mAb) under extensive in vitro passaging The goalis to identify and quantify mutations in a cell population atthe transcript level under various culture conditions We firstcarried out a feasibility study by mixing two slightly differentmAb light chain cDNAs at different ratios and subjected themixture samples to RNA-seq analysis The detection limit ofthe mutation rate was determined by the feasibility studySince mutation rate is presumably related to the length ofpassaging and the presence of potentially mitogenic selectionreagents such as methotrexate (MTX) we next culturedthe CHO cell line continuously to reach an in vitro cellage of sim150 population doubling levels (PDLs) In parallelincreasing the dose of MTX was also evaluated for its impacton mutation rate The method we developed in this studywill be instrumental in defining the cell culture parametersto ensure consistent and reliable product quality

2 Materials and Methods

21 Feasibility Study by cDNAMixing Two cell clones (A andB) expressing a human IgG with different light chain (LC)sequences were thawed from frozen banks and cultured inalpha-MEM (Gibco Cat 12561) containing 10 dialyzed fetalbovine serum (FBS SAFC Cat 12015C) and 045 glucose(Sigma Cat G8769) Cells were passaged and expanded forRNA extraction RNA extraction was performed using theRNeasy kit (Qiagen Cat 74104) andRNAwas eluted in 50 120583LRNase-free water RNA concentrationwasmeasured onNan-oDrop Spectrophotometer (ND-1000 Thermo Scientific)

RT-PCR of IgG light chains was set up with 200 ng RNAper sample using the OneStep RT-PCR kit (Qiagen Cat210212) in 50 120583L reaction volume RT-PCR was run on theApplied Biosystems 2720 Thermal Cycler with incubationperiods of 30min at 50∘C and 15min at 95∘C 30 cyclesof 30-second denaturing at 94∘C 30-second annealing at62∘C and 2min extension at 72∘C followed by final 10minincubation at 72∘C cDNA was purified using the QiaquickPCR Purification Kit (Qiagen Cat 28106) and eluted in 30 120583LEB buffer (10mM Tris-Cl pH 85) cDNA concentrationswere measured on NanoDrop The cDNA of clone B wasmixed with cDNAof clone A atmixing ratios of 5 1 0501 005 and 001 Triplicate samples of pure cDNA ofclones A and B and each mixture were submitted to BGI forRNA-seq

See Supplementary Information in Supplementary Mate-rial available online at httpdxdoiorg10115520168356435for light chain and primer sequences

22 cDNA Preparation from Cell Line under Different CultureConditions (Main Study) Clone A derived from a singlecell was thawed from a frozen bank at about 14 PDLs sinceserum-free adaptation and cultured in Ex-cell ACF CHOmedium C5467 (SAFC Cat 86016C-1000mL) with 4mM L-glutamine (Gibco Cat 25030) 1x Trace Elements A (CellgroCat 99-182-C1) and 1x Trace Elements B (Cellgro Cat 99-175-C1) Cells after thawing were termed PDL 0 and around1 million cells were pelleted and resuspended in 350 120583L RLTbuffer with 1 beta-mercaptoethanol for RNA extraction

Cells were further passaged at 05millionmL every 3-4 daysin the presence of 0 20 or 80 nMMTX (Sigma Cat 8407) at37∘C and 75 CO

2

At PDLs 0 50 100 and 150 15 million cells were pelleteddivided into 3 aliquots upon lysis (except PDL 0 samplewhich was divided into replicates at RNA level) and RNAwas extracted following Qiagen protocol (Qiagen RNeasykit Cat 74104) Reverse transcription was performed with200 ng RNA using the AccuScript High Fidelity RT-PCR kits(Agilent Cat 600180) The thermal program includes 5minincubation at 65∘C and cooling to room temperature for5min followed by addition of 1 120583L of 100mM dithiothreitol(DTT) and 1 120583L of AccuScript Reverse Transcriptase Thereaction was further incubated at 42∘C for 30min and storedat 4∘C Three separate reverse transcription reactions wereperformed for PDL 0 RNA to create replicates cDNAs ofheavy chain (HC) light chain (LC) dihydrofolate reductase(DHFR) andGAPDHwere amplified via PCRusing PfuUltraHF DNA polymerase (Agilent Cat 600380) and the follow-ing thermal cycle program 1min at 95∘C 30 cycles of 30 sec-onds at 95∘C 30 seconds at 64∘C (62∘Cannealingwas used forDHFR) and 3min at 68∘C followed by a final 10min incuba-tion at 68∘C PCRproductswere purified usingQiaquick PCRPurification Kit (Qiagen Cat 28104) For each sample equal-molar ratios of HC LC DHFR and GAPDHwere mixed to atotal cDNAmass of 25 120583g and submitted for RNA-seq at BGIThe experimental procedure is outlined in Figure 1

For the feasibility study the amplified fragment for lightchain corresponded precisely to the target sequence In themain study a slightly larger region was amplified for eachtarget to ensure that the region of interest was outside therange of the PCR primers themselvesThe references used formapping were modified accordingly

23 RNA-Seq At BGI cDNA was fragmented to an averagefragment size of 170ndash180 bp using Covaris OnThermomixerthese fragments were subjected to end-repair and the 31015840end was adenylated Adaptors were ligated to the 31015840 endsThe ligation products were purified on TAE-agarose gel andsim14 rounds of PCR amplification were performed to enrichthe purified cDNA template For quality control the librarywas validated on the Agilent Technologies 2100 Bioanalyzerand the ABI StepOnePlus Real-Time PCR System Qualifiedlibraries were sequenced on Illumina HiSeq2000 and 100Mbclean sequence data were generated for each

See Supplementary Information for details on sequencesof primers and amplified regions Analysis was performedexcluding the regions corresponding to the PCR primers

3 Results

31 Feasibility Study cDNAs from two clones expressinglight chainwith closely related but slightly differing sequenceswere mixed in different ratios to assess the ability of NGS toquantitatively detect the fraction of mutant bases in a mixedpopulationThe sequences chosen for this were each 714 baseslong and differed in 46 positions The sequence alignment isshown in Figure S1

BioMed Research International 3

Cellisolation

RNAextraction

Dataanalysis

Reversetranscriptionand PCR of

specific genes

Equal-molarmixing and

submitting forsequencing

Figure 1 Experimental outline of RNA-seq studies of production CHO cell linesThe tested CHO cell lines expressing mAb were propagatedin suspension Cell pellets were isolated and RNA samples were subsequently extracted Reverse transcription was performed on the RNAsamples and certain genes of interest were amplified from cDNAs After library preparation the product was sequenced on IlluminaHiSeq2000 Details of data analysis are described in Section 3

Detecting the fraction of sequence reads from a mixtureof these clones is fundamentally different than detectingemerging mutations in cell culture in that one would notexpect to find so many mutations emerging at once In termsof the data analysis the main impact is on the ability to mapreads For example in the sequence between positions 80 and120 there are more than a dozen sequence differences Bydefault most short-readmappers will onlymap reads reliablywhen the error rate is less than around 5 If sequencesincluding mixtures of reads from clones A and B weremapped directly to clone A reference some reads from cloneBwould notmap at all to cloneA referenceThis would not beexpected to happen in the real case of an emerging mutationat a single position To address this issue for the feasibilitystudy we map reads to a reference sequence that includesboth clone A and clone B sequences using BWA (httpsgithubcomlh3bwa version 070 Li H and Durbin R(2009) Fast and accurate short read alignment with Burrows-Wheeler transform Bioinformatics 25 1754ndash1760 [PMID19451168]) BWA will output the single best alignment foreach read in SAM format For reads from regions whereclones A and B differ the alignment will indicate that themapping was specific to reference A or B For reads fromregions where clones A and B do not differ reads will berandomly assigned to one reference or the other In orderto obtain a mapping that is consistent with what we wouldexpect to find in the real study if any one of the 46 mutationshad occurred singly we modify the mappings obtained inthis way as follows We replace all occurrences of the cloneB sequence identifier in the SAM-formatted alignment fileswith the clone A identifier and we ignore the trailing tagfields Since there are no insertion or deletion differencesbetween the two clones the SAM file obtained in this wayis perfectly consistent with what would have been obtainedif the mutations had occurred separately This procedure isequivalent to mapping reads to each of the clone sequencesseparately determining which reference was a better fit and

then translating the clone B alignments to become cloneA alignments In this case that translation step is trivialsince the two sequences differ only by substitutions The keyadvantage of this approach over any single-referencemappingapproach is that it eliminates the possibility of any edgeeffects or incorrectly induced insertions or deletions in thealignments in regions where the clones A and B sequencesare significantly different Had we used a more exhaustiveapproach such as a Smith-Waterman alignment of all reads tothe clone A sequence for example the resulting alignmentsof reads from clone B that included significantly differingsections would have had small errors in alignment that wouldhave confounded the analysis Also it is important to notethat this modified alignment procedure is only relevant forthe initial validation portion of this study

Aside from this mapping difference the analysis for thefeasibility study is performed exactly as for the main studySequence data were received from BGI in FASTQ formatAdapters were removed using SeqPrep (httpsgithubcomjstjohnSeqPrep version 04 unpublished) and aligned tothe reference sequence using BWA Coverage across the lightchain sequence for all samples is shown in Figure S2 Theoverall mapping rate across all experiments was very highgenerally around 99 and the reads aligned with a very lowmismatch rate typically around 02 mismatches per 90 bpread This indicates that we had very little contamination inthe experiment

The SAMtools program ldquompileuprdquo (httpsgithubcomsamtoolssamtools version 0119 Li Hlowast Handsaker BlowastWysoker A Fennell T Ruan J Homer N Marth G Abeca-sis G andDurbin R and 1000Genome Project Data Process-ing Subgroup (2009) The Sequence alignmentmap (SAM)format and SAMtools Bioinformatics 25 2078-9 [PMID19505943]) was used along with custom scripts to extract foreach position in the target region the counts of each base of ACG andT aswell as the numbers of insertions and deletionsInsertions were counted according to the base immediately

4 BioMed Research International

preceding the insertion regardless of what sequence wasbeing inserted Similarly deletions were attributed to the basebeing deleted regardless of how many bases were spannedby the overall deletion These counts were stratified based onwhether they were found from reads aligned in the forwardor reverse directions Bases with quality scores less than15 were ignored in this analysis This cutoff was selectedto remove a minimum amount of data (typically 2ndash5 ofbases) while eliminating the lowest quality bases which aremainly those with reported base quality of two indicatingthat the sequencer failed to call the base at the positionWithin each experiment for each position in each targetsequence a preferred orientation was determined based onwhich orientation gave rise to higher overall coverage Onlydata from reads in the preferred orientation at each positionwas used to generate final results Overall this step has theimpact of removing a small portion of very-low-quality dataat the cost of ignoring just under half of the overall sequencedata which has little impact on most positions

This decision to use only data from reads in a preferredorientation is driven by the fact that some sequence contextsare problematic for sequencing (observed in a variety oftargeted sequencing experiments unpublished results) Theproblem may arise from any step in the process fromamplification to library prep to the sequencing itselfThe issueis often found in regions that are G-rich The reads on theG-rich strand will often have errors while the reads fromthe other C-rich strand do not In those cases we find thatthe ldquobetterrdquo strand usually has higher coverage presumablybecause the sequencer was unable to generate acceptablereads from that direction andor some of the base calls hadquality scores below the threshold of 15 By applying a cutoffbased on coverage we are able to identify the ldquobetterrdquo strandwithout explicitly biasing the analysis to lower-frequencyresults For consistency the strand choice is made once foreach unit of analysis the feasibility study and the main study

Once the data have been processed to the counts of A CG and T indels and deletions for each position we can deter-mine the consensus sequence and the rate of occurrence foreach possible alternate allele at each position If we considerthe data from the unmixed sample for clone A to be our ref-erence and any alternate allele observations to be errors wefind that the error rate across all possible positions measuredas the frequency of the most common alternate allele at eachposition ranges from less than 001 to a high of 027 with99of possible alternate alleles occurring at a rate of less than02 The full distribution is shown in Figure 2

To assess the reproducibility of the data we looked at theapparent error rates for each possiblemutation using replicateexperiments Figure S3 shows plots of error versus error fortwo of the 100 clone A reference samples versus the thirdThe plot has a point for each possible base at each positionincluding the reference baseThe reference base calls all hovernear 1 when there are consensus base calls that all fit into thesame pixel on the log-log plot In this way the plot focusesattention on the erroneous base callsThe red green and bluecurves correspond to a difference in apparentmutation rate of10 1 and 01 respectively Using these plots it is possibleto quickly identify any outliers that might correspond to true

minus45 minus40 minus35 minus30 minus25

Freq

uenc

y

Distribution of error rates (feasibility study)

0

50

100

150

200

250

300

log10 (frequency of major alt allele)

Figure 2 Distribution of error rates across all positions in lightchain from the feasibility study The most frequent alternate alleleat each position is used to populate the figure

mutations and to get an estimate of the overall noise level inthe experiment

For these samples there are a few points very close tothe blue 01 line but none that actually cross it in eithercomparison By contrast when there is a true signal in thedata set data points are expected to be well outside thisregion For example if we take two of the 01 spiked controlsand two of the 05 spiked controls and compare them to the0 reference we obtain the plots in Figure S4The points cor-responding to the true spiked-in mutations are colored red

We will take the signal for each mutation in each spiked-in sample to be the difference between the average alternateallele rate observed in each of the three replicate spike-insamples and the average alternate allele rate observed for thecorresponding mutation in the replicate reference samplesFor each of these possible mutations we will use a 119905-testto assess whether the difference between the two means isstatistically significant Given the small numbers of replicatesinvolved the 119905-test results will not be used aggressively butrather as a filter to weed out spurious results (uncorrected 119875value cutoff of 01)

The main results from the samples in the feasibility studyare shown in Figure 3 We find that the estimates of mixingratio are very accurateThemedian signals at positive controlsites for the 001 005 01 05 1 and 5 spike-in experiments were 0017 0057 011 057 11and 53 respectively The range of signals was typically asmuch as plusmn2x however Certain sites have consistently loweror higher signal estimates across different spike-in levelssuggesting that the variability may be sequence-dependentand may not be corrected by additional sequencing

All 46 true-positive mutations are observed with statis-tical significance for spike-in levels of 5 1 and 05At the 01 005 and 001 spike-in levels 4546 4246and 1046 of the mutations are observed Across all controlsites (true negative) 27 false positives were observed Theobserved signal was less than 001 in most of those cases

BioMed Research International 5

Feasibility study results

Mutation rate at each position

Vary

ing

mix

ing

ratio

s

100

5

1

05

01

005

001

1e minus 011e minus 031e minus 051e minus 07

1

2

3

4

5

6

7

Figure 3 The seven horizontal bands of points correspond toexperiments with mixing ratios of 001 005 01 05 1 5and 100 There are points for each position in light chain for eachsample sequenced The 119909-axis corresponds to the apparent signalfor each spiked-in sample In order to include the negatives thatresult from this measurement on the log-scale plot they are plottedas their absolute values colored grey and offset just below theother points The points corresponding to the spiked-in mutationsare colored blue and offset just above the other points The lightblue points did not meet the threshold for statistical significanceTrue-negative mutations that did meet the criteria for statisticalsignificance are colored purple instead of black All points have hada small amount of vertical jitter addedThe jitter and offsets serve toallow visualization of the full distribution of points for the negativeand positive controls

and the highest signal observed was 003 By contrastfor the positive control sites at the 01 spike-in level thelowest observed excess signal was 00599 Based on theseobservations we set the following thresholds for mutationdetection in the main study excess mutation signal of morethan 005with a119875 value less than 01 In the feasibility studythese criteria would yield 4546 true positives at the 01spike-in level with no false positives The one false negativehad an apparent signal of 012 but just barely missed the 119875value cutoff with a value of 012 Therefore these settings aredesigned to be sufficient to detect (or rule out)mutationswitha true signal of more than 01

It is worth noting here that had we been interested onlyin mutations at higher levels the natural thresholds basedon this feasibility study would always be around one-half ofthe desired mutation detection rate That threshold wouldstill allow perfect sensitivity for all 46 tested mutations whileminimizing the false positive rate

32 Main Study We found that the error profile for the mainstudy was slightly different than that observed in the feasi-bility study Overall the error profile was better for the mainstudy with an average error rate over all possible substitutionsand indels of 011 versus 017 for the feasibility study

However while there were no mutations with a back-ground rate of more than 03 in the feasibility study therewere four such mutations in the main study including two

Error error comparison (main versus feasibility)

Error (feasibility study)

Erro

r (m

ain

study

)

1e minus 06

1e minus 04

1e minus 02

1e + 00

1e minus 061e minus 041e minus 021e + 00

Figure 4 Comparison of a baseline sample from the main studyversus a reference sample from the feasibility study showing therate of apparent error versus error for each possible alternate alleleat each position The dotted lines correspond to a mutation rate of03

PDL0

5000

MTX

PDL0

5020

MTX

PDL0

5080

MTX

PDL1

0000

MTX

PDL1

0020

MTX

PDL1

0080

MTX

PDL1

5000

MTX

PDL1

5020

MTX

PDL1

5080

MTX

0501

Distribution of significant mutations from main study

0

20

40

60

80

Figure 5 Histogram of counts of mutations meeting the thresholdfor detection of mutations at the 01 level for each experimentalcondition tested Those mutations that also met the criteria for the05 level are highlighted in light grey

above the 1 level The overall correspondence betweenthe error rates was nevertheless quite good overall See theerror error plot in Figure 4 More importantly the errorprofiles for the main study samples compared to replicateswithin that study were very consistent See the error errorplots for the reference samples in Figure S5

We proceeded with the analysis as described Across allnine samples covering no MTX 20 nM MTX and 80 nMMTX at 50 100 and 150 PDLs 245 mutations met thecriteria established in the feasibility study for the 01 levelThese were unevenly distributed across the samples biasedstrongly toward samples with larger PDLs The distributionof mutations is shown in Figure 5 Also highlighted in this

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Mutation Detection in an Antibody ...

2 BioMed Research International

antibody (mAb) under extensive in vitro passaging The goalis to identify and quantify mutations in a cell population atthe transcript level under various culture conditions We firstcarried out a feasibility study by mixing two slightly differentmAb light chain cDNAs at different ratios and subjected themixture samples to RNA-seq analysis The detection limit ofthe mutation rate was determined by the feasibility studySince mutation rate is presumably related to the length ofpassaging and the presence of potentially mitogenic selectionreagents such as methotrexate (MTX) we next culturedthe CHO cell line continuously to reach an in vitro cellage of sim150 population doubling levels (PDLs) In parallelincreasing the dose of MTX was also evaluated for its impacton mutation rate The method we developed in this studywill be instrumental in defining the cell culture parametersto ensure consistent and reliable product quality

2 Materials and Methods

21 Feasibility Study by cDNAMixing Two cell clones (A andB) expressing a human IgG with different light chain (LC)sequences were thawed from frozen banks and cultured inalpha-MEM (Gibco Cat 12561) containing 10 dialyzed fetalbovine serum (FBS SAFC Cat 12015C) and 045 glucose(Sigma Cat G8769) Cells were passaged and expanded forRNA extraction RNA extraction was performed using theRNeasy kit (Qiagen Cat 74104) andRNAwas eluted in 50 120583LRNase-free water RNA concentrationwasmeasured onNan-oDrop Spectrophotometer (ND-1000 Thermo Scientific)

RT-PCR of IgG light chains was set up with 200 ng RNAper sample using the OneStep RT-PCR kit (Qiagen Cat210212) in 50 120583L reaction volume RT-PCR was run on theApplied Biosystems 2720 Thermal Cycler with incubationperiods of 30min at 50∘C and 15min at 95∘C 30 cyclesof 30-second denaturing at 94∘C 30-second annealing at62∘C and 2min extension at 72∘C followed by final 10minincubation at 72∘C cDNA was purified using the QiaquickPCR Purification Kit (Qiagen Cat 28106) and eluted in 30 120583LEB buffer (10mM Tris-Cl pH 85) cDNA concentrationswere measured on NanoDrop The cDNA of clone B wasmixed with cDNAof clone A atmixing ratios of 5 1 0501 005 and 001 Triplicate samples of pure cDNA ofclones A and B and each mixture were submitted to BGI forRNA-seq

See Supplementary Information in Supplementary Mate-rial available online at httpdxdoiorg10115520168356435for light chain and primer sequences

22 cDNA Preparation from Cell Line under Different CultureConditions (Main Study) Clone A derived from a singlecell was thawed from a frozen bank at about 14 PDLs sinceserum-free adaptation and cultured in Ex-cell ACF CHOmedium C5467 (SAFC Cat 86016C-1000mL) with 4mM L-glutamine (Gibco Cat 25030) 1x Trace Elements A (CellgroCat 99-182-C1) and 1x Trace Elements B (Cellgro Cat 99-175-C1) Cells after thawing were termed PDL 0 and around1 million cells were pelleted and resuspended in 350 120583L RLTbuffer with 1 beta-mercaptoethanol for RNA extraction

Cells were further passaged at 05millionmL every 3-4 daysin the presence of 0 20 or 80 nMMTX (Sigma Cat 8407) at37∘C and 75 CO

2

At PDLs 0 50 100 and 150 15 million cells were pelleteddivided into 3 aliquots upon lysis (except PDL 0 samplewhich was divided into replicates at RNA level) and RNAwas extracted following Qiagen protocol (Qiagen RNeasykit Cat 74104) Reverse transcription was performed with200 ng RNA using the AccuScript High Fidelity RT-PCR kits(Agilent Cat 600180) The thermal program includes 5minincubation at 65∘C and cooling to room temperature for5min followed by addition of 1 120583L of 100mM dithiothreitol(DTT) and 1 120583L of AccuScript Reverse Transcriptase Thereaction was further incubated at 42∘C for 30min and storedat 4∘C Three separate reverse transcription reactions wereperformed for PDL 0 RNA to create replicates cDNAs ofheavy chain (HC) light chain (LC) dihydrofolate reductase(DHFR) andGAPDHwere amplified via PCRusing PfuUltraHF DNA polymerase (Agilent Cat 600380) and the follow-ing thermal cycle program 1min at 95∘C 30 cycles of 30 sec-onds at 95∘C 30 seconds at 64∘C (62∘Cannealingwas used forDHFR) and 3min at 68∘C followed by a final 10min incuba-tion at 68∘C PCRproductswere purified usingQiaquick PCRPurification Kit (Qiagen Cat 28104) For each sample equal-molar ratios of HC LC DHFR and GAPDHwere mixed to atotal cDNAmass of 25 120583g and submitted for RNA-seq at BGIThe experimental procedure is outlined in Figure 1

For the feasibility study the amplified fragment for lightchain corresponded precisely to the target sequence In themain study a slightly larger region was amplified for eachtarget to ensure that the region of interest was outside therange of the PCR primers themselvesThe references used formapping were modified accordingly

23 RNA-Seq At BGI cDNA was fragmented to an averagefragment size of 170ndash180 bp using Covaris OnThermomixerthese fragments were subjected to end-repair and the 31015840end was adenylated Adaptors were ligated to the 31015840 endsThe ligation products were purified on TAE-agarose gel andsim14 rounds of PCR amplification were performed to enrichthe purified cDNA template For quality control the librarywas validated on the Agilent Technologies 2100 Bioanalyzerand the ABI StepOnePlus Real-Time PCR System Qualifiedlibraries were sequenced on Illumina HiSeq2000 and 100Mbclean sequence data were generated for each

See Supplementary Information for details on sequencesof primers and amplified regions Analysis was performedexcluding the regions corresponding to the PCR primers

3 Results

31 Feasibility Study cDNAs from two clones expressinglight chainwith closely related but slightly differing sequenceswere mixed in different ratios to assess the ability of NGS toquantitatively detect the fraction of mutant bases in a mixedpopulationThe sequences chosen for this were each 714 baseslong and differed in 46 positions The sequence alignment isshown in Figure S1

BioMed Research International 3

Cellisolation

RNAextraction

Dataanalysis

Reversetranscriptionand PCR of

specific genes

Equal-molarmixing and

submitting forsequencing

Figure 1 Experimental outline of RNA-seq studies of production CHO cell linesThe tested CHO cell lines expressing mAb were propagatedin suspension Cell pellets were isolated and RNA samples were subsequently extracted Reverse transcription was performed on the RNAsamples and certain genes of interest were amplified from cDNAs After library preparation the product was sequenced on IlluminaHiSeq2000 Details of data analysis are described in Section 3

Detecting the fraction of sequence reads from a mixtureof these clones is fundamentally different than detectingemerging mutations in cell culture in that one would notexpect to find so many mutations emerging at once In termsof the data analysis the main impact is on the ability to mapreads For example in the sequence between positions 80 and120 there are more than a dozen sequence differences Bydefault most short-readmappers will onlymap reads reliablywhen the error rate is less than around 5 If sequencesincluding mixtures of reads from clones A and B weremapped directly to clone A reference some reads from cloneBwould notmap at all to cloneA referenceThis would not beexpected to happen in the real case of an emerging mutationat a single position To address this issue for the feasibilitystudy we map reads to a reference sequence that includesboth clone A and clone B sequences using BWA (httpsgithubcomlh3bwa version 070 Li H and Durbin R(2009) Fast and accurate short read alignment with Burrows-Wheeler transform Bioinformatics 25 1754ndash1760 [PMID19451168]) BWA will output the single best alignment foreach read in SAM format For reads from regions whereclones A and B differ the alignment will indicate that themapping was specific to reference A or B For reads fromregions where clones A and B do not differ reads will berandomly assigned to one reference or the other In orderto obtain a mapping that is consistent with what we wouldexpect to find in the real study if any one of the 46 mutationshad occurred singly we modify the mappings obtained inthis way as follows We replace all occurrences of the cloneB sequence identifier in the SAM-formatted alignment fileswith the clone A identifier and we ignore the trailing tagfields Since there are no insertion or deletion differencesbetween the two clones the SAM file obtained in this wayis perfectly consistent with what would have been obtainedif the mutations had occurred separately This procedure isequivalent to mapping reads to each of the clone sequencesseparately determining which reference was a better fit and

then translating the clone B alignments to become cloneA alignments In this case that translation step is trivialsince the two sequences differ only by substitutions The keyadvantage of this approach over any single-referencemappingapproach is that it eliminates the possibility of any edgeeffects or incorrectly induced insertions or deletions in thealignments in regions where the clones A and B sequencesare significantly different Had we used a more exhaustiveapproach such as a Smith-Waterman alignment of all reads tothe clone A sequence for example the resulting alignmentsof reads from clone B that included significantly differingsections would have had small errors in alignment that wouldhave confounded the analysis Also it is important to notethat this modified alignment procedure is only relevant forthe initial validation portion of this study

Aside from this mapping difference the analysis for thefeasibility study is performed exactly as for the main studySequence data were received from BGI in FASTQ formatAdapters were removed using SeqPrep (httpsgithubcomjstjohnSeqPrep version 04 unpublished) and aligned tothe reference sequence using BWA Coverage across the lightchain sequence for all samples is shown in Figure S2 Theoverall mapping rate across all experiments was very highgenerally around 99 and the reads aligned with a very lowmismatch rate typically around 02 mismatches per 90 bpread This indicates that we had very little contamination inthe experiment

The SAMtools program ldquompileuprdquo (httpsgithubcomsamtoolssamtools version 0119 Li Hlowast Handsaker BlowastWysoker A Fennell T Ruan J Homer N Marth G Abeca-sis G andDurbin R and 1000Genome Project Data Process-ing Subgroup (2009) The Sequence alignmentmap (SAM)format and SAMtools Bioinformatics 25 2078-9 [PMID19505943]) was used along with custom scripts to extract foreach position in the target region the counts of each base of ACG andT aswell as the numbers of insertions and deletionsInsertions were counted according to the base immediately

4 BioMed Research International

preceding the insertion regardless of what sequence wasbeing inserted Similarly deletions were attributed to the basebeing deleted regardless of how many bases were spannedby the overall deletion These counts were stratified based onwhether they were found from reads aligned in the forwardor reverse directions Bases with quality scores less than15 were ignored in this analysis This cutoff was selectedto remove a minimum amount of data (typically 2ndash5 ofbases) while eliminating the lowest quality bases which aremainly those with reported base quality of two indicatingthat the sequencer failed to call the base at the positionWithin each experiment for each position in each targetsequence a preferred orientation was determined based onwhich orientation gave rise to higher overall coverage Onlydata from reads in the preferred orientation at each positionwas used to generate final results Overall this step has theimpact of removing a small portion of very-low-quality dataat the cost of ignoring just under half of the overall sequencedata which has little impact on most positions

This decision to use only data from reads in a preferredorientation is driven by the fact that some sequence contextsare problematic for sequencing (observed in a variety oftargeted sequencing experiments unpublished results) Theproblem may arise from any step in the process fromamplification to library prep to the sequencing itselfThe issueis often found in regions that are G-rich The reads on theG-rich strand will often have errors while the reads fromthe other C-rich strand do not In those cases we find thatthe ldquobetterrdquo strand usually has higher coverage presumablybecause the sequencer was unable to generate acceptablereads from that direction andor some of the base calls hadquality scores below the threshold of 15 By applying a cutoffbased on coverage we are able to identify the ldquobetterrdquo strandwithout explicitly biasing the analysis to lower-frequencyresults For consistency the strand choice is made once foreach unit of analysis the feasibility study and the main study

Once the data have been processed to the counts of A CG and T indels and deletions for each position we can deter-mine the consensus sequence and the rate of occurrence foreach possible alternate allele at each position If we considerthe data from the unmixed sample for clone A to be our ref-erence and any alternate allele observations to be errors wefind that the error rate across all possible positions measuredas the frequency of the most common alternate allele at eachposition ranges from less than 001 to a high of 027 with99of possible alternate alleles occurring at a rate of less than02 The full distribution is shown in Figure 2

To assess the reproducibility of the data we looked at theapparent error rates for each possiblemutation using replicateexperiments Figure S3 shows plots of error versus error fortwo of the 100 clone A reference samples versus the thirdThe plot has a point for each possible base at each positionincluding the reference baseThe reference base calls all hovernear 1 when there are consensus base calls that all fit into thesame pixel on the log-log plot In this way the plot focusesattention on the erroneous base callsThe red green and bluecurves correspond to a difference in apparentmutation rate of10 1 and 01 respectively Using these plots it is possibleto quickly identify any outliers that might correspond to true

minus45 minus40 minus35 minus30 minus25

Freq

uenc

y

Distribution of error rates (feasibility study)

0

50

100

150

200

250

300

log10 (frequency of major alt allele)

Figure 2 Distribution of error rates across all positions in lightchain from the feasibility study The most frequent alternate alleleat each position is used to populate the figure

mutations and to get an estimate of the overall noise level inthe experiment

For these samples there are a few points very close tothe blue 01 line but none that actually cross it in eithercomparison By contrast when there is a true signal in thedata set data points are expected to be well outside thisregion For example if we take two of the 01 spiked controlsand two of the 05 spiked controls and compare them to the0 reference we obtain the plots in Figure S4The points cor-responding to the true spiked-in mutations are colored red

We will take the signal for each mutation in each spiked-in sample to be the difference between the average alternateallele rate observed in each of the three replicate spike-insamples and the average alternate allele rate observed for thecorresponding mutation in the replicate reference samplesFor each of these possible mutations we will use a 119905-testto assess whether the difference between the two means isstatistically significant Given the small numbers of replicatesinvolved the 119905-test results will not be used aggressively butrather as a filter to weed out spurious results (uncorrected 119875value cutoff of 01)

The main results from the samples in the feasibility studyare shown in Figure 3 We find that the estimates of mixingratio are very accurateThemedian signals at positive controlsites for the 001 005 01 05 1 and 5 spike-in experiments were 0017 0057 011 057 11and 53 respectively The range of signals was typically asmuch as plusmn2x however Certain sites have consistently loweror higher signal estimates across different spike-in levelssuggesting that the variability may be sequence-dependentand may not be corrected by additional sequencing

All 46 true-positive mutations are observed with statis-tical significance for spike-in levels of 5 1 and 05At the 01 005 and 001 spike-in levels 4546 4246and 1046 of the mutations are observed Across all controlsites (true negative) 27 false positives were observed Theobserved signal was less than 001 in most of those cases

BioMed Research International 5

Feasibility study results

Mutation rate at each position

Vary

ing

mix

ing

ratio

s

100

5

1

05

01

005

001

1e minus 011e minus 031e minus 051e minus 07

1

2

3

4

5

6

7

Figure 3 The seven horizontal bands of points correspond toexperiments with mixing ratios of 001 005 01 05 1 5and 100 There are points for each position in light chain for eachsample sequenced The 119909-axis corresponds to the apparent signalfor each spiked-in sample In order to include the negatives thatresult from this measurement on the log-scale plot they are plottedas their absolute values colored grey and offset just below theother points The points corresponding to the spiked-in mutationsare colored blue and offset just above the other points The lightblue points did not meet the threshold for statistical significanceTrue-negative mutations that did meet the criteria for statisticalsignificance are colored purple instead of black All points have hada small amount of vertical jitter addedThe jitter and offsets serve toallow visualization of the full distribution of points for the negativeand positive controls

and the highest signal observed was 003 By contrastfor the positive control sites at the 01 spike-in level thelowest observed excess signal was 00599 Based on theseobservations we set the following thresholds for mutationdetection in the main study excess mutation signal of morethan 005with a119875 value less than 01 In the feasibility studythese criteria would yield 4546 true positives at the 01spike-in level with no false positives The one false negativehad an apparent signal of 012 but just barely missed the 119875value cutoff with a value of 012 Therefore these settings aredesigned to be sufficient to detect (or rule out)mutationswitha true signal of more than 01

It is worth noting here that had we been interested onlyin mutations at higher levels the natural thresholds basedon this feasibility study would always be around one-half ofthe desired mutation detection rate That threshold wouldstill allow perfect sensitivity for all 46 tested mutations whileminimizing the false positive rate

32 Main Study We found that the error profile for the mainstudy was slightly different than that observed in the feasi-bility study Overall the error profile was better for the mainstudy with an average error rate over all possible substitutionsand indels of 011 versus 017 for the feasibility study

However while there were no mutations with a back-ground rate of more than 03 in the feasibility study therewere four such mutations in the main study including two

Error error comparison (main versus feasibility)

Error (feasibility study)

Erro

r (m

ain

study

)

1e minus 06

1e minus 04

1e minus 02

1e + 00

1e minus 061e minus 041e minus 021e + 00

Figure 4 Comparison of a baseline sample from the main studyversus a reference sample from the feasibility study showing therate of apparent error versus error for each possible alternate alleleat each position The dotted lines correspond to a mutation rate of03

PDL0

5000

MTX

PDL0

5020

MTX

PDL0

5080

MTX

PDL1

0000

MTX

PDL1

0020

MTX

PDL1

0080

MTX

PDL1

5000

MTX

PDL1

5020

MTX

PDL1

5080

MTX

0501

Distribution of significant mutations from main study

0

20

40

60

80

Figure 5 Histogram of counts of mutations meeting the thresholdfor detection of mutations at the 01 level for each experimentalcondition tested Those mutations that also met the criteria for the05 level are highlighted in light grey

above the 1 level The overall correspondence betweenthe error rates was nevertheless quite good overall See theerror error plot in Figure 4 More importantly the errorprofiles for the main study samples compared to replicateswithin that study were very consistent See the error errorplots for the reference samples in Figure S5

We proceeded with the analysis as described Across allnine samples covering no MTX 20 nM MTX and 80 nMMTX at 50 100 and 150 PDLs 245 mutations met thecriteria established in the feasibility study for the 01 levelThese were unevenly distributed across the samples biasedstrongly toward samples with larger PDLs The distributionof mutations is shown in Figure 5 Also highlighted in this

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Mutation Detection in an Antibody ...

BioMed Research International 3

Cellisolation

RNAextraction

Dataanalysis

Reversetranscriptionand PCR of

specific genes

Equal-molarmixing and

submitting forsequencing

Figure 1 Experimental outline of RNA-seq studies of production CHO cell linesThe tested CHO cell lines expressing mAb were propagatedin suspension Cell pellets were isolated and RNA samples were subsequently extracted Reverse transcription was performed on the RNAsamples and certain genes of interest were amplified from cDNAs After library preparation the product was sequenced on IlluminaHiSeq2000 Details of data analysis are described in Section 3

Detecting the fraction of sequence reads from a mixtureof these clones is fundamentally different than detectingemerging mutations in cell culture in that one would notexpect to find so many mutations emerging at once In termsof the data analysis the main impact is on the ability to mapreads For example in the sequence between positions 80 and120 there are more than a dozen sequence differences Bydefault most short-readmappers will onlymap reads reliablywhen the error rate is less than around 5 If sequencesincluding mixtures of reads from clones A and B weremapped directly to clone A reference some reads from cloneBwould notmap at all to cloneA referenceThis would not beexpected to happen in the real case of an emerging mutationat a single position To address this issue for the feasibilitystudy we map reads to a reference sequence that includesboth clone A and clone B sequences using BWA (httpsgithubcomlh3bwa version 070 Li H and Durbin R(2009) Fast and accurate short read alignment with Burrows-Wheeler transform Bioinformatics 25 1754ndash1760 [PMID19451168]) BWA will output the single best alignment foreach read in SAM format For reads from regions whereclones A and B differ the alignment will indicate that themapping was specific to reference A or B For reads fromregions where clones A and B do not differ reads will berandomly assigned to one reference or the other In orderto obtain a mapping that is consistent with what we wouldexpect to find in the real study if any one of the 46 mutationshad occurred singly we modify the mappings obtained inthis way as follows We replace all occurrences of the cloneB sequence identifier in the SAM-formatted alignment fileswith the clone A identifier and we ignore the trailing tagfields Since there are no insertion or deletion differencesbetween the two clones the SAM file obtained in this wayis perfectly consistent with what would have been obtainedif the mutations had occurred separately This procedure isequivalent to mapping reads to each of the clone sequencesseparately determining which reference was a better fit and

then translating the clone B alignments to become cloneA alignments In this case that translation step is trivialsince the two sequences differ only by substitutions The keyadvantage of this approach over any single-referencemappingapproach is that it eliminates the possibility of any edgeeffects or incorrectly induced insertions or deletions in thealignments in regions where the clones A and B sequencesare significantly different Had we used a more exhaustiveapproach such as a Smith-Waterman alignment of all reads tothe clone A sequence for example the resulting alignmentsof reads from clone B that included significantly differingsections would have had small errors in alignment that wouldhave confounded the analysis Also it is important to notethat this modified alignment procedure is only relevant forthe initial validation portion of this study

Aside from this mapping difference the analysis for thefeasibility study is performed exactly as for the main studySequence data were received from BGI in FASTQ formatAdapters were removed using SeqPrep (httpsgithubcomjstjohnSeqPrep version 04 unpublished) and aligned tothe reference sequence using BWA Coverage across the lightchain sequence for all samples is shown in Figure S2 Theoverall mapping rate across all experiments was very highgenerally around 99 and the reads aligned with a very lowmismatch rate typically around 02 mismatches per 90 bpread This indicates that we had very little contamination inthe experiment

The SAMtools program ldquompileuprdquo (httpsgithubcomsamtoolssamtools version 0119 Li Hlowast Handsaker BlowastWysoker A Fennell T Ruan J Homer N Marth G Abeca-sis G andDurbin R and 1000Genome Project Data Process-ing Subgroup (2009) The Sequence alignmentmap (SAM)format and SAMtools Bioinformatics 25 2078-9 [PMID19505943]) was used along with custom scripts to extract foreach position in the target region the counts of each base of ACG andT aswell as the numbers of insertions and deletionsInsertions were counted according to the base immediately

4 BioMed Research International

preceding the insertion regardless of what sequence wasbeing inserted Similarly deletions were attributed to the basebeing deleted regardless of how many bases were spannedby the overall deletion These counts were stratified based onwhether they were found from reads aligned in the forwardor reverse directions Bases with quality scores less than15 were ignored in this analysis This cutoff was selectedto remove a minimum amount of data (typically 2ndash5 ofbases) while eliminating the lowest quality bases which aremainly those with reported base quality of two indicatingthat the sequencer failed to call the base at the positionWithin each experiment for each position in each targetsequence a preferred orientation was determined based onwhich orientation gave rise to higher overall coverage Onlydata from reads in the preferred orientation at each positionwas used to generate final results Overall this step has theimpact of removing a small portion of very-low-quality dataat the cost of ignoring just under half of the overall sequencedata which has little impact on most positions

This decision to use only data from reads in a preferredorientation is driven by the fact that some sequence contextsare problematic for sequencing (observed in a variety oftargeted sequencing experiments unpublished results) Theproblem may arise from any step in the process fromamplification to library prep to the sequencing itselfThe issueis often found in regions that are G-rich The reads on theG-rich strand will often have errors while the reads fromthe other C-rich strand do not In those cases we find thatthe ldquobetterrdquo strand usually has higher coverage presumablybecause the sequencer was unable to generate acceptablereads from that direction andor some of the base calls hadquality scores below the threshold of 15 By applying a cutoffbased on coverage we are able to identify the ldquobetterrdquo strandwithout explicitly biasing the analysis to lower-frequencyresults For consistency the strand choice is made once foreach unit of analysis the feasibility study and the main study

Once the data have been processed to the counts of A CG and T indels and deletions for each position we can deter-mine the consensus sequence and the rate of occurrence foreach possible alternate allele at each position If we considerthe data from the unmixed sample for clone A to be our ref-erence and any alternate allele observations to be errors wefind that the error rate across all possible positions measuredas the frequency of the most common alternate allele at eachposition ranges from less than 001 to a high of 027 with99of possible alternate alleles occurring at a rate of less than02 The full distribution is shown in Figure 2

To assess the reproducibility of the data we looked at theapparent error rates for each possiblemutation using replicateexperiments Figure S3 shows plots of error versus error fortwo of the 100 clone A reference samples versus the thirdThe plot has a point for each possible base at each positionincluding the reference baseThe reference base calls all hovernear 1 when there are consensus base calls that all fit into thesame pixel on the log-log plot In this way the plot focusesattention on the erroneous base callsThe red green and bluecurves correspond to a difference in apparentmutation rate of10 1 and 01 respectively Using these plots it is possibleto quickly identify any outliers that might correspond to true

minus45 minus40 minus35 minus30 minus25

Freq

uenc

y

Distribution of error rates (feasibility study)

0

50

100

150

200

250

300

log10 (frequency of major alt allele)

Figure 2 Distribution of error rates across all positions in lightchain from the feasibility study The most frequent alternate alleleat each position is used to populate the figure

mutations and to get an estimate of the overall noise level inthe experiment

For these samples there are a few points very close tothe blue 01 line but none that actually cross it in eithercomparison By contrast when there is a true signal in thedata set data points are expected to be well outside thisregion For example if we take two of the 01 spiked controlsand two of the 05 spiked controls and compare them to the0 reference we obtain the plots in Figure S4The points cor-responding to the true spiked-in mutations are colored red

We will take the signal for each mutation in each spiked-in sample to be the difference between the average alternateallele rate observed in each of the three replicate spike-insamples and the average alternate allele rate observed for thecorresponding mutation in the replicate reference samplesFor each of these possible mutations we will use a 119905-testto assess whether the difference between the two means isstatistically significant Given the small numbers of replicatesinvolved the 119905-test results will not be used aggressively butrather as a filter to weed out spurious results (uncorrected 119875value cutoff of 01)

The main results from the samples in the feasibility studyare shown in Figure 3 We find that the estimates of mixingratio are very accurateThemedian signals at positive controlsites for the 001 005 01 05 1 and 5 spike-in experiments were 0017 0057 011 057 11and 53 respectively The range of signals was typically asmuch as plusmn2x however Certain sites have consistently loweror higher signal estimates across different spike-in levelssuggesting that the variability may be sequence-dependentand may not be corrected by additional sequencing

All 46 true-positive mutations are observed with statis-tical significance for spike-in levels of 5 1 and 05At the 01 005 and 001 spike-in levels 4546 4246and 1046 of the mutations are observed Across all controlsites (true negative) 27 false positives were observed Theobserved signal was less than 001 in most of those cases

BioMed Research International 5

Feasibility study results

Mutation rate at each position

Vary

ing

mix

ing

ratio

s

100

5

1

05

01

005

001

1e minus 011e minus 031e minus 051e minus 07

1

2

3

4

5

6

7

Figure 3 The seven horizontal bands of points correspond toexperiments with mixing ratios of 001 005 01 05 1 5and 100 There are points for each position in light chain for eachsample sequenced The 119909-axis corresponds to the apparent signalfor each spiked-in sample In order to include the negatives thatresult from this measurement on the log-scale plot they are plottedas their absolute values colored grey and offset just below theother points The points corresponding to the spiked-in mutationsare colored blue and offset just above the other points The lightblue points did not meet the threshold for statistical significanceTrue-negative mutations that did meet the criteria for statisticalsignificance are colored purple instead of black All points have hada small amount of vertical jitter addedThe jitter and offsets serve toallow visualization of the full distribution of points for the negativeand positive controls

and the highest signal observed was 003 By contrastfor the positive control sites at the 01 spike-in level thelowest observed excess signal was 00599 Based on theseobservations we set the following thresholds for mutationdetection in the main study excess mutation signal of morethan 005with a119875 value less than 01 In the feasibility studythese criteria would yield 4546 true positives at the 01spike-in level with no false positives The one false negativehad an apparent signal of 012 but just barely missed the 119875value cutoff with a value of 012 Therefore these settings aredesigned to be sufficient to detect (or rule out)mutationswitha true signal of more than 01

It is worth noting here that had we been interested onlyin mutations at higher levels the natural thresholds basedon this feasibility study would always be around one-half ofthe desired mutation detection rate That threshold wouldstill allow perfect sensitivity for all 46 tested mutations whileminimizing the false positive rate

32 Main Study We found that the error profile for the mainstudy was slightly different than that observed in the feasi-bility study Overall the error profile was better for the mainstudy with an average error rate over all possible substitutionsand indels of 011 versus 017 for the feasibility study

However while there were no mutations with a back-ground rate of more than 03 in the feasibility study therewere four such mutations in the main study including two

Error error comparison (main versus feasibility)

Error (feasibility study)

Erro

r (m

ain

study

)

1e minus 06

1e minus 04

1e minus 02

1e + 00

1e minus 061e minus 041e minus 021e + 00

Figure 4 Comparison of a baseline sample from the main studyversus a reference sample from the feasibility study showing therate of apparent error versus error for each possible alternate alleleat each position The dotted lines correspond to a mutation rate of03

PDL0

5000

MTX

PDL0

5020

MTX

PDL0

5080

MTX

PDL1

0000

MTX

PDL1

0020

MTX

PDL1

0080

MTX

PDL1

5000

MTX

PDL1

5020

MTX

PDL1

5080

MTX

0501

Distribution of significant mutations from main study

0

20

40

60

80

Figure 5 Histogram of counts of mutations meeting the thresholdfor detection of mutations at the 01 level for each experimentalcondition tested Those mutations that also met the criteria for the05 level are highlighted in light grey

above the 1 level The overall correspondence betweenthe error rates was nevertheless quite good overall See theerror error plot in Figure 4 More importantly the errorprofiles for the main study samples compared to replicateswithin that study were very consistent See the error errorplots for the reference samples in Figure S5

We proceeded with the analysis as described Across allnine samples covering no MTX 20 nM MTX and 80 nMMTX at 50 100 and 150 PDLs 245 mutations met thecriteria established in the feasibility study for the 01 levelThese were unevenly distributed across the samples biasedstrongly toward samples with larger PDLs The distributionof mutations is shown in Figure 5 Also highlighted in this

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Mutation Detection in an Antibody ...

4 BioMed Research International

preceding the insertion regardless of what sequence wasbeing inserted Similarly deletions were attributed to the basebeing deleted regardless of how many bases were spannedby the overall deletion These counts were stratified based onwhether they were found from reads aligned in the forwardor reverse directions Bases with quality scores less than15 were ignored in this analysis This cutoff was selectedto remove a minimum amount of data (typically 2ndash5 ofbases) while eliminating the lowest quality bases which aremainly those with reported base quality of two indicatingthat the sequencer failed to call the base at the positionWithin each experiment for each position in each targetsequence a preferred orientation was determined based onwhich orientation gave rise to higher overall coverage Onlydata from reads in the preferred orientation at each positionwas used to generate final results Overall this step has theimpact of removing a small portion of very-low-quality dataat the cost of ignoring just under half of the overall sequencedata which has little impact on most positions

This decision to use only data from reads in a preferredorientation is driven by the fact that some sequence contextsare problematic for sequencing (observed in a variety oftargeted sequencing experiments unpublished results) Theproblem may arise from any step in the process fromamplification to library prep to the sequencing itselfThe issueis often found in regions that are G-rich The reads on theG-rich strand will often have errors while the reads fromthe other C-rich strand do not In those cases we find thatthe ldquobetterrdquo strand usually has higher coverage presumablybecause the sequencer was unable to generate acceptablereads from that direction andor some of the base calls hadquality scores below the threshold of 15 By applying a cutoffbased on coverage we are able to identify the ldquobetterrdquo strandwithout explicitly biasing the analysis to lower-frequencyresults For consistency the strand choice is made once foreach unit of analysis the feasibility study and the main study

Once the data have been processed to the counts of A CG and T indels and deletions for each position we can deter-mine the consensus sequence and the rate of occurrence foreach possible alternate allele at each position If we considerthe data from the unmixed sample for clone A to be our ref-erence and any alternate allele observations to be errors wefind that the error rate across all possible positions measuredas the frequency of the most common alternate allele at eachposition ranges from less than 001 to a high of 027 with99of possible alternate alleles occurring at a rate of less than02 The full distribution is shown in Figure 2

To assess the reproducibility of the data we looked at theapparent error rates for each possiblemutation using replicateexperiments Figure S3 shows plots of error versus error fortwo of the 100 clone A reference samples versus the thirdThe plot has a point for each possible base at each positionincluding the reference baseThe reference base calls all hovernear 1 when there are consensus base calls that all fit into thesame pixel on the log-log plot In this way the plot focusesattention on the erroneous base callsThe red green and bluecurves correspond to a difference in apparentmutation rate of10 1 and 01 respectively Using these plots it is possibleto quickly identify any outliers that might correspond to true

minus45 minus40 minus35 minus30 minus25

Freq

uenc

y

Distribution of error rates (feasibility study)

0

50

100

150

200

250

300

log10 (frequency of major alt allele)

Figure 2 Distribution of error rates across all positions in lightchain from the feasibility study The most frequent alternate alleleat each position is used to populate the figure

mutations and to get an estimate of the overall noise level inthe experiment

For these samples there are a few points very close tothe blue 01 line but none that actually cross it in eithercomparison By contrast when there is a true signal in thedata set data points are expected to be well outside thisregion For example if we take two of the 01 spiked controlsand two of the 05 spiked controls and compare them to the0 reference we obtain the plots in Figure S4The points cor-responding to the true spiked-in mutations are colored red

We will take the signal for each mutation in each spiked-in sample to be the difference between the average alternateallele rate observed in each of the three replicate spike-insamples and the average alternate allele rate observed for thecorresponding mutation in the replicate reference samplesFor each of these possible mutations we will use a 119905-testto assess whether the difference between the two means isstatistically significant Given the small numbers of replicatesinvolved the 119905-test results will not be used aggressively butrather as a filter to weed out spurious results (uncorrected 119875value cutoff of 01)

The main results from the samples in the feasibility studyare shown in Figure 3 We find that the estimates of mixingratio are very accurateThemedian signals at positive controlsites for the 001 005 01 05 1 and 5 spike-in experiments were 0017 0057 011 057 11and 53 respectively The range of signals was typically asmuch as plusmn2x however Certain sites have consistently loweror higher signal estimates across different spike-in levelssuggesting that the variability may be sequence-dependentand may not be corrected by additional sequencing

All 46 true-positive mutations are observed with statis-tical significance for spike-in levels of 5 1 and 05At the 01 005 and 001 spike-in levels 4546 4246and 1046 of the mutations are observed Across all controlsites (true negative) 27 false positives were observed Theobserved signal was less than 001 in most of those cases

BioMed Research International 5

Feasibility study results

Mutation rate at each position

Vary

ing

mix

ing

ratio

s

100

5

1

05

01

005

001

1e minus 011e minus 031e minus 051e minus 07

1

2

3

4

5

6

7

Figure 3 The seven horizontal bands of points correspond toexperiments with mixing ratios of 001 005 01 05 1 5and 100 There are points for each position in light chain for eachsample sequenced The 119909-axis corresponds to the apparent signalfor each spiked-in sample In order to include the negatives thatresult from this measurement on the log-scale plot they are plottedas their absolute values colored grey and offset just below theother points The points corresponding to the spiked-in mutationsare colored blue and offset just above the other points The lightblue points did not meet the threshold for statistical significanceTrue-negative mutations that did meet the criteria for statisticalsignificance are colored purple instead of black All points have hada small amount of vertical jitter addedThe jitter and offsets serve toallow visualization of the full distribution of points for the negativeand positive controls

and the highest signal observed was 003 By contrastfor the positive control sites at the 01 spike-in level thelowest observed excess signal was 00599 Based on theseobservations we set the following thresholds for mutationdetection in the main study excess mutation signal of morethan 005with a119875 value less than 01 In the feasibility studythese criteria would yield 4546 true positives at the 01spike-in level with no false positives The one false negativehad an apparent signal of 012 but just barely missed the 119875value cutoff with a value of 012 Therefore these settings aredesigned to be sufficient to detect (or rule out)mutationswitha true signal of more than 01

It is worth noting here that had we been interested onlyin mutations at higher levels the natural thresholds basedon this feasibility study would always be around one-half ofthe desired mutation detection rate That threshold wouldstill allow perfect sensitivity for all 46 tested mutations whileminimizing the false positive rate

32 Main Study We found that the error profile for the mainstudy was slightly different than that observed in the feasi-bility study Overall the error profile was better for the mainstudy with an average error rate over all possible substitutionsand indels of 011 versus 017 for the feasibility study

However while there were no mutations with a back-ground rate of more than 03 in the feasibility study therewere four such mutations in the main study including two

Error error comparison (main versus feasibility)

Error (feasibility study)

Erro

r (m

ain

study

)

1e minus 06

1e minus 04

1e minus 02

1e + 00

1e minus 061e minus 041e minus 021e + 00

Figure 4 Comparison of a baseline sample from the main studyversus a reference sample from the feasibility study showing therate of apparent error versus error for each possible alternate alleleat each position The dotted lines correspond to a mutation rate of03

PDL0

5000

MTX

PDL0

5020

MTX

PDL0

5080

MTX

PDL1

0000

MTX

PDL1

0020

MTX

PDL1

0080

MTX

PDL1

5000

MTX

PDL1

5020

MTX

PDL1

5080

MTX

0501

Distribution of significant mutations from main study

0

20

40

60

80

Figure 5 Histogram of counts of mutations meeting the thresholdfor detection of mutations at the 01 level for each experimentalcondition tested Those mutations that also met the criteria for the05 level are highlighted in light grey

above the 1 level The overall correspondence betweenthe error rates was nevertheless quite good overall See theerror error plot in Figure 4 More importantly the errorprofiles for the main study samples compared to replicateswithin that study were very consistent See the error errorplots for the reference samples in Figure S5

We proceeded with the analysis as described Across allnine samples covering no MTX 20 nM MTX and 80 nMMTX at 50 100 and 150 PDLs 245 mutations met thecriteria established in the feasibility study for the 01 levelThese were unevenly distributed across the samples biasedstrongly toward samples with larger PDLs The distributionof mutations is shown in Figure 5 Also highlighted in this

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Mutation Detection in an Antibody ...

BioMed Research International 5

Feasibility study results

Mutation rate at each position

Vary

ing

mix

ing

ratio

s

100

5

1

05

01

005

001

1e minus 011e minus 031e minus 051e minus 07

1

2

3

4

5

6

7

Figure 3 The seven horizontal bands of points correspond toexperiments with mixing ratios of 001 005 01 05 1 5and 100 There are points for each position in light chain for eachsample sequenced The 119909-axis corresponds to the apparent signalfor each spiked-in sample In order to include the negatives thatresult from this measurement on the log-scale plot they are plottedas their absolute values colored grey and offset just below theother points The points corresponding to the spiked-in mutationsare colored blue and offset just above the other points The lightblue points did not meet the threshold for statistical significanceTrue-negative mutations that did meet the criteria for statisticalsignificance are colored purple instead of black All points have hada small amount of vertical jitter addedThe jitter and offsets serve toallow visualization of the full distribution of points for the negativeand positive controls

and the highest signal observed was 003 By contrastfor the positive control sites at the 01 spike-in level thelowest observed excess signal was 00599 Based on theseobservations we set the following thresholds for mutationdetection in the main study excess mutation signal of morethan 005with a119875 value less than 01 In the feasibility studythese criteria would yield 4546 true positives at the 01spike-in level with no false positives The one false negativehad an apparent signal of 012 but just barely missed the 119875value cutoff with a value of 012 Therefore these settings aredesigned to be sufficient to detect (or rule out)mutationswitha true signal of more than 01

It is worth noting here that had we been interested onlyin mutations at higher levels the natural thresholds basedon this feasibility study would always be around one-half ofthe desired mutation detection rate That threshold wouldstill allow perfect sensitivity for all 46 tested mutations whileminimizing the false positive rate

32 Main Study We found that the error profile for the mainstudy was slightly different than that observed in the feasi-bility study Overall the error profile was better for the mainstudy with an average error rate over all possible substitutionsand indels of 011 versus 017 for the feasibility study

However while there were no mutations with a back-ground rate of more than 03 in the feasibility study therewere four such mutations in the main study including two

Error error comparison (main versus feasibility)

Error (feasibility study)

Erro

r (m

ain

study

)

1e minus 06

1e minus 04

1e minus 02

1e + 00

1e minus 061e minus 041e minus 021e + 00

Figure 4 Comparison of a baseline sample from the main studyversus a reference sample from the feasibility study showing therate of apparent error versus error for each possible alternate alleleat each position The dotted lines correspond to a mutation rate of03

PDL0

5000

MTX

PDL0

5020

MTX

PDL0

5080

MTX

PDL1

0000

MTX

PDL1

0020

MTX

PDL1

0080

MTX

PDL1

5000

MTX

PDL1

5020

MTX

PDL1

5080

MTX

0501

Distribution of significant mutations from main study

0

20

40

60

80

Figure 5 Histogram of counts of mutations meeting the thresholdfor detection of mutations at the 01 level for each experimentalcondition tested Those mutations that also met the criteria for the05 level are highlighted in light grey

above the 1 level The overall correspondence betweenthe error rates was nevertheless quite good overall See theerror error plot in Figure 4 More importantly the errorprofiles for the main study samples compared to replicateswithin that study were very consistent See the error errorplots for the reference samples in Figure S5

We proceeded with the analysis as described Across allnine samples covering no MTX 20 nM MTX and 80 nMMTX at 50 100 and 150 PDLs 245 mutations met thecriteria established in the feasibility study for the 01 levelThese were unevenly distributed across the samples biasedstrongly toward samples with larger PDLs The distributionof mutations is shown in Figure 5 Also highlighted in this

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Mutation Detection in an Antibody ...

6 BioMed Research International

Main study results (LC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

Main study results (HC)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (DHFR)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Main study results (GAPDH)

Mutation rate at each position

Vary

ing

PDL

MTX

trea

tmen

t

1e minus 011e minus 031e minus 051e minus 07

2

4

6

8

PDL50MTX = 0

PDL50MTX = 20

PDL50MTX = 80

PDL100MTX = 0

PDL100MTX = 20

PDL100MTX = 80

PDL150MTX = 0

PDL150MTX = 20

PDL150MTX = 80

Figure 6 Four panels correspond to each of the four targets light chain heavy chain GAPDH and DHFR (clockwise from the top left)Each panel has points for each experimental condition stratified vertically exactly as done for the feasibility study (Figure 3) The coloringjittering and offsets for the points are also identical to Figure 3 except that there are no spike-in signals here and hence no blue pointsPositions meeting the criteria for significance (119905-test 119875 value lt01) are colored purple

figure are those mutations that would have met the criteriafor mutation detection at the 05 level In total there wereten signals detected at that level

The same analysis was performed with identical settingsfor the other three targets in the experiment The pattern ofmutations was very similar in each caseThe plots in Figure 6show the apparent rate of mutation for all possible mutationsin each of the four targets studied In this more quantitativeview it is possible to see the full distribution of error ratesacross the study While many mutations met the criteria forstatistical significance (119905-test 119875 value lt01 points coloredpurple) the vast majority of those have a very low apparentmutation rate Since we had only triplicate data it was notpossible to use a more stringent statistical cutoff However itis also possible to see some general trends in this view Acrossall four targets as the PDL increases the distribution ofapparent mutation rates shifts uniformly higher for examplePresumably this reflects small true shifts in the populationaccumulating over time though few mutations met ourcriteria for detection In terms of specific mutations meeting

the criteria established for detection at the 05 level thenumbers of signals observed in light chain heavy chainDHFR andGAPDHwere 10 17 4 and 0 respectively A tablewith all signals found across all four genes is included in theSupplementary Information

4 Discussion

Here we explored using RNA-seq technology for the detec-tion of emerging mutations in a CHO cell line producing arecombinant antibody during long-term culture In the feasi-bility study we established a high-confidence mutation leveldetection limit of 01 which is significantly more sensitivethan traditional molecular biology or protein characteriza-tion techniques The detection limit of mutation by SangerDNA sequencing is around 15ndash20 [14] When comparingthe feasibility study to the main study we noticed that thebackground error profile revealed by sequencing replicatesof the same biological sample can vary from batch to batchWithin each batch the error profile at each position (whether

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Mutation Detection in an Antibody ...

BioMed Research International 7

arising from amplification library prep or sequencing itself)was very consistent Therefore a reference run should beincluded in each sequencing batch and used to assess vari-ation within each batch By considering each position tohave an independent error profile we can implicitly accountfor a variety of error sources without knowing exactly whatcontribution each source makes

In the main study we analyzed all three exogenous genesintroduced by the expression vector which were heavy chainand light chain of the mAb and the DHFR selection markerWe also analyzed the house-keeping gene GAPDH as arepresentative host endogenous gene As the study showsthe mutation rate displayed a clear increasing trend withextended culture passaging And in most cases the mutationrate also increased in the presence of selection pressure(MTX) In the actual cell culture manufacturing processthe cell inoculum typically needs to be passaged for at least30ndash40 PDLs starting from a frozen cell bank and often in thepresence of selection pressure such asMTXOur experimentswere designed to sufficiently cover this manufacturingwindow with respect to both process conditions In Figure 6there is a noticeable jump in the numbers of significantmuta-tions (above 01) starting at 150 PDLs At the same time upto 100 PDLs only the sample treated with 80 nMMTX exhib-ited detectable mutations higher than 05 No mutationabove 05was observed in the house-keeping gene GAPDHunder any of the culture conditions This indicates thatincreasing selection pressure and extending passaging periodmainly affect the stability of the transgenes but have minimaleffect on endogenous host genes presumably due to thedeleterious effect to the host It is noteworthy that mutationrate can be described in two ways The first is the numberof mutations above the 01 detection limit across theentire gene fragment And the second is the percentage ofpopulation that carries a specific point mutation Both repre-sentations showed similar trend in our study

On the molecular level mutations identified in mRNAcan be attributed to DNA template mutations [15] transcrip-tional errors [16 17] or posttranscriptionalmodifications [8]Understanding the mechanism behind individual mutationsrequires further characterization of all these possible factorsincluding DNA sequence analysis of the expression vectorinserted into the genome In addition mutations detected byRNA-seq require confirmation by protein sequence analysisto assess their impact on product quality

NGS technologies have played increasing roles in thedevelopment of cell culture production process and facilitatedthe understanding of the production cell line There has notbeen a report on applying RNA sequencing to systematicallyanalyze mutation rate during extended passaging of produc-tion CHO cells Production cell line stability with respectto sequence integrity is crucial for the biopharmaceuticalindustry because cell lines carrying the intended transgenesequences are essential for product quality and patient safetyHere we have demonstrated that RNA-seq can help to ensurethe accurate flowof genomic information to the final productAlthough CHO cell lines developed with DHFR as theselection system are used as a model system in this studyto characterize gene stability the methods developed in this

study should also be applicable for other production host celllines and selection methodologies The information gener-ated should further stimulate investigation on the molecularmechanisms behind sequence variations in mRNA

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Siyan Zhang Jason D Hughes and Nicholas Murgolo con-tributed equally to this work

References

[1] M LMetzker ldquoSequencing technologiesmdashthe next generationrdquoNature Reviews Genetics vol 11 no 1 pp 31ndash46 2010

[2] S B Baylin and P A Jones ldquoA decade of exploring the cancerepigenomemdashbiological and translational implicationsrdquo NatureReviews Cancer vol 11 no 10 pp 726ndash734 2011

[3] E T Cirulli and D B Goldstein ldquoUncovering the roles of rarevariants in common disease through whole-genome sequenc-ingrdquo Nature Reviews Genetics vol 11 no 6 pp 415ndash425 2010

[4] Y-H Jiang R K C Yuen X Jin et al ldquoDetection of clinicallyrelevant genetic variants in autism spectrum disorder by whole-genome sequencingrdquo American Journal of Human Genetics vol93 no 2 pp 249ndash263 2013

[5] Z Kan H Zheng X Liu et al ldquoWhole-genome sequencingidentifies recurrent mutations in hepatocellular carcinomardquoGenome Research vol 23 no 9 pp 1422ndash1433 2013

[6] Y Song L Li Y Ou et al ldquoIdentification of genomic alterationsin oesophageal squamous cell cancerrdquoNature vol 508 no 7498pp 91ndash95 2014

[7] F Ozsolak and P M Milos ldquoRNA sequencing advanceschallenges and opportunitiesrdquo Nature Reviews Genetics vol 12no 2 pp 87ndash98 2011

[8] Z Peng Y Cheng B C-M Tan et al ldquoComprehensive analysisof RNA-Seq data reveals extensive RNA editing in a humantranscriptomerdquo Nature Biotechnology vol 30 no 3 pp 253ndash260 2012

[9] DMWuest SW Harcum and K H Lee ldquoGenomics inmam-malian cell culture bioprocessingrdquo Biotechnology Advances vol30 no 3 pp 629ndash638 2012

[10] X Xu H Nagarajan N E Lewis et al ldquoThe genomic sequenceof the Chinese hamster ovary (CHO)-K1 cell linerdquo NatureBiotechnology vol 29 no 8 pp 735ndash741 2011

[11] H Zhang W Cui and M L Gross ldquoMass spectrometryfor the biophysical characterization of therapeutic monoclonalantibodiesrdquo FEBS Letters vol 588 no 2 pp 308ndash317 2014

[12] F Cheung J Win J M Lang et al ldquoAnalysis of the Pythiumultimum transcriptome using Sanger and pyrosequencingapproachesrdquo BMC Genomics vol 9 pp 542ndash551 2008

[13] F M Wurm ldquoCHO quasispecies-implications for manufactur-ing processesrdquo Processes vol 1 no 3 pp 296ndash311 2013

[14] A C Tsiatis A Norris-Kirby R G Rich et al ldquoComparison ofSanger sequencing pyrosequencing andmelting curve analysisfor the detection of KRAS mutations diagnostic and clinicalimplicationsrdquo Journal ofMolecular Diagnostics vol 12 no 4 pp425ndash432 2010

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Mutation Detection in an Antibody ...

8 BioMed Research International

[15] J A Stamatoyannopoulos I Adzhubei R E Thurman G VKryukov S M Mirkin and S R Sunyaev ldquoHuman mutationrate associated with DNA replication timingrdquo Nature Geneticsvol 41 no 4 pp 393ndash395 2009

[16] P Cui F Ding Q Lin et al ldquoDistinct contributions of repli-cation and transcription to mutation rate variation of humangenomesrdquo Genomics Proteomics amp Bioinformatics vol 10 no 1pp 4ndash10 2012

[17] P Green B Ewing W Miller P J Thomas and E DGreen ldquoTranscription-associated mutational asymmetry inmammalian evolutionrdquo Nature Genetics vol 33 no 4 pp 514ndash517 2003

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Mutation Detection in an Antibody ...

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology