An unbiased genome-wide analysis of zinc-finger nuclease specificity


Nature Biotechnology, volume 29, number 9, September 2011, p. 816

1Department of Translational Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany. 2San Raffaele Telethon Institute for Gene Therapy (HSR-TIGET), Vita Salute San Raffaele University, Milan, Italy. 3Sangamo BioSciences, Inc., Richmond, California, USA. 4These authors contributed equally to this work. Correspondence should be addressed to L.N. ([email protected]) or C.v.K. ([email protected]).

Received 12 May; accepted 19 July; published online 7 August 2011; doi:10.1038/nbt.1948

Targeted genome modification is a powerful approach for investigating gene function and for correction of genetic disease. ZFN technology has enabled efficient, site-specific gene modification in live cells1–4. ZFNs can be engineered to introduce a DSB at a predetermined site in the genome by combining the nonspecific nuclease domain of the type IIS restriction enzyme FokI with an array of zinc-finger domains binding in the major groove of DNA and recognizing a desired DNA target site5. Each individual zinc-finger domain recognizes 3–4 contiguous base pairs. Typical ZFNs contain 3–6 zinc fingers recognizing 9–18 bp of DNA. Because FokI functions as a dimer, two ZFNs are designed to bind directly upstream and downstream of the intended cleavage site, providing stringent recognition of a potentially unique site in the genome6,7.

DSBs are repaired by nonhomologous end-joining (NHEJ), which often introduces inserted or deleted sequences (indels) at the sealed break, or by homology-directed repair (HDR), which faithfully restores the original sequence by copying it from the sister chromatid. Both pathways can be used for site-directed modification of the genome. Disruption of essential coding or regulatory sequences has been achieved by NHEJ8–12, whereas correction of mutations or insertion of transgenes into preselected sites can be accomplished by HDR if an exogenous DNA template is provided carrying homologous sequences flanking the ZFN target site3,4,13–19.

DSBs are rapidly resolved in live cells and, depending on the pathway, can be repaired perfectly, leaving no marks or indels to indicate the transient presence of the DSB itself. Consequently, comprehensive identification of ZFN cleavage sites in vivo has remained an open challenge. Current approaches rely upon (i) in vitro determination of the consensus DNA binding site for a given ZFN pair, (ii) bioinformatic identification of the most homologous genomic targets and (iii) experimental detection of indels introduced at these predicted off-target sites8,13. These approaches, however, provide limited insight into the in vivo selectivity of a ZFN in the context of the entire genome. Therefore, an unbiased genome-wide determination of cleavage events at on- and off-target sites remains lacking1.

Here we have addressed this challenge by using integrase-defective lentiviral vectors20,21 (IDLVs; for abbreviations of vectors used in this study see also Supplementary Fig. 1). We hypothesized that linear double-stranded IDLV genomes present in the nucleus of transduced cells22–24 could be ligated into DSBs by NHEJ, thereby stably tagging transient, otherwise undetectable DSBs (Fig. 1a). Such trapping of extrachromosomal DNA into DSBs has been observed previously25–27 by using adeno-associated viral vectors (AAV). Indeed, off-target sites of I-SceI meganuclease have been identified by an AAV rescue assay28. However, the natural instability of inverted terminal repeats and extensive concatemer formation of episomal AAV may adversely affect detection of integrated sequences.

To distinguish ZFN-mediated IDLV integrations from those found in naturally occurring DSBs, we used clustered integration site (CLIS) analysis. We defined a CLIS as two or more integration sites separated by no more than 500 bp. Such ‘small range’ clusters are considered

Richard Gabriel1,4, Angelo Lombardo2,4, Anne Arens1, Jeffrey C Miller3, Pietro Genovese2, Christine Kaeppel1, Ali Nowrouzi1, Cynthia C Bartholomae1, Jianbin Wang3, Geoffrey Friedman3, Michael C Holmes3, Philip D Gregory3, Hanno Glimm1, Manfred Schmidt1,4, Luigi Naldini2,4 & Christof von Kalle1,4

Zinc-finger nucleases (ZFNs) allow gene editing in live cells by inducing a targeted DNA double-strand break (DSB) at a specific genomic locus. However, strategies for characterizing the genome-wide specificity of ZFNs remain limited. We show that nonhomologous end-joining captures integrase-defective lentiviral vectors at DSBs, tagging these transient events. Genome-wide integration site analysis mapped the actual in vivo cleavage activity of four ZFN pairs targeting CCR5 or IL2RG. Ranking loci with repeatedly detectable nuclease activity by deep-sequencing allowed us to monitor the degree of ZFN specificity in vivo at these positions. Cleavage required binding of ZFNs in specific spatial arrangements on DNA bearing high homology to the intended target site and only tolerated mismatches at individual positions of the ZFN binding sites. Whereas the consensus binding sequence derived in vivo closely matched that obtained in biochemical experiments, the ranking of in vivo cleavage sites could not be predicted in silico. Comprehensive mapping of ZFN activity in vivo will facilitate the broad application of these reagents in translational research.

ARTICLES

© 2011 Nature America, Inc. All rights reserved.


[Figure 1, panels a–f: graphics not reproduced. (a) Schematic: ZFN + IDLV at a ZFN target locus or off-target locus; DNA DSB repaired by NHEJ. (b, c) GFP+ cells (%) plotted against day after transduction. (d) Flow cytometry plots (GFP versus SSC) for CCR5wt/IDLV, CCR5muF/IDLV and IDLV, labeled 6%, 5.7% and 3%. (e, f) Integration sites (IS) in a ±60-bp window around the ZFN target site in CCR5 exon 3 and IL2RG exon 5. The IS counts shown in these panels are listed below.]

Sample          No. IS    No. IS at target locus (±60 bp to target site)
CCR5wt/IDLV     95        29 (27)
IDLV            104       -
CCR5muF/IDLV    290       71 (52)

Sample          No. IS    No. IS at target locus (±60 bp to target site)
IL2RG1/IDLV     208       17 (15)
IDLV            104       -
IL2RG2/IDLV     248       21 (18)

Figure 1 Trapping of IDLVs into ZFN-induced DNA DSBs. (a) Upon their transient expression, ZFNs bind to the intended DNA sequence and to off-target sites and introduce a DSB. During NHEJ repair of these DSBs (which often results in insertion or deletion of sequences), a full copy of the IDLV can be introduced into the ZFN target site but also at off-target ZFN-induced DSBs, thereby stably marking the transient DSB. (b) IDLV-GFP transduced A549 cells were photon-irradiated with different doses 24 h after transduction and GFP-positive cells were measured by flow-cytometry analysis (FACSCalibur). At the last time point analyzed, the fractions of GFP-expressing cells were: 0 Gy: 4.73 ± 3.56% (dark blue line); 2 Gy: 31.80 ± 5.82% (light green line); 5 Gy: 59.40 ± 7.93% (dark green line). (c) Same as in panel b in K562 cells. 0 Gy: 2.3 ± 0.2% (dark blue line); 2 Gy: 8.5 ± 1.1% (light green line); 5 Gy: 10.8 ± 1.4% (dark green line). (d) Representative FACS analysis of IDLV-transduced K562 cells either without ZFNs or together with CCR5wt or CCR5muF ZFNs analyzed after 4 weeks. Percentages of GFP-positive cells were: CCR5wt/IDLV: 6.5 ± 0.1%; CCR5muF/IDLV: 6.06 ± 0.3%; IDLV 2.6 ± 0.15%; mean ± s.d., n = 3. (e) Insertion sites located in a ± 60-bp window surrounding the ZFN binding site in exon 3 of the CCR5 gene. Blue bars: integration site from wild-type ZFN (CCR5wt/IDLV); green bars: integration site from ZFN with obligate heterodimeric FokI domain (CCR5muF/IDLV). (f) Insertion sites (IS) located in a ± 60-bp window surrounding the ZFN binding site in exon 5 of IL2RG. Blue bars: IL2RG1/IDLV; green bars: IL2RG2/IDLV.

to be the result of ZFN activity or the presence of fragile sites in the genome. CLIS analysis has proven to be a very simple, yet powerful statistical tool to detect nonrandomness in genomic integration studies29.
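The CLIS definition given above (two or more integration sites separated by no more than 500 bp) can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: it assumes that clusters are built by chaining neighbouring sites on the same chromosome, and the function name `find_clis` and the example coordinates are hypothetical.

```python
from collections import defaultdict

def find_clis(sites, max_gap=500, min_sites=2):
    """Group integration sites into clusters (CLIS).

    sites: iterable of (chromosome, position) tuples.
    Assumption: a cluster is a chain of sites on one chromosome in which
    each site lies no more than max_gap bp from the previous one.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)          # extend the running cluster
            else:
                if len(current) >= min_sites:
                    clusters.append((chrom, current))
                current = [pos]              # start a new candidate cluster
        if len(current) >= min_sites:
            clusters.append((chrom, current))
    return clusters

# Made-up coordinates, for illustration only
example_sites = [("chr3", 1000), ("chr3", 1040), ("chr3", 1450),
                 ("chr3", 9000), ("chrX", 500)]
clis = find_clis(example_sites)
print(clis)
```

In this toy input only the three neighbouring chr3 sites form a CLIS; the isolated sites are treated as single integration events.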

To assess ZFN specificity genome-wide, we comprehensively mapped the locations of IDLV integration sites in ZFN-treated cells by linear amplification-mediated (LAM)-PCR30 and nonrestrictive (nr)LAM-PCR31,32. For all ZFN pairs tested, the largest cluster of tagged IDLV integration sites occurred at the intended genomic target site, but CLIS were also identified at other locations. Our data show that all off-target CLIS were found within 47 bp of a sequence bearing substantial homology to the intended ZFN target site. Notably, however, target homology alone was not sufficient to determine whether an in silico–predicted off-target site was actually cleaved in vivo.

RESULTS

IDLVs are captured by DNA breaks

A small fraction of IDLV-treated cells harbors integrated vector. To investigate the genomic distribution of IDLV background integration, we performed (nr)LAM-PCR31,32 analysis on DNA isolated from K562 cells 4 weeks after IDLV transduction. LAM-PCR was performed using primers hybridizing in the U5 (unique5) region of the lentiviral long terminal repeat (LTR) to amplify the unknown 3′-vector genome junction (Supplementary Fig. 2a). Unlike integrase-competent lentiviral vectors, which show preferred integration into gene coding regions33,34, no such preference was observed in our data set of 104 unique IDLV integration sites (Supplementary Table 1 and Supplementary Fig. 2b). In addition, LTR sequences joined to the genome often showed deletions atypical of viral integrase activity (Supplementary Fig. 2c,d). These data agree with results obtained from a series of >800 unique integration sites retrieved


Figure 2 Gene targeting using ZFNs and a homologous donor vector. (a) Co-delivery of a donor template DNA containing regions of homology to the target site additionally allows targeted integration of the exogenous expression cassette alone. Treatment of cells with ZFNs and the cognate homologous donor IDLV can therefore result in three different types of on-target integration: (i) integration of a single expression cassette (without residual vector sequence; left drawing) or of head-to-tail concatemers (with intervening viral sequences) by HDR at the target locus; (ii) integration by a combined mechanism of HDR and NHEJ, thereby joining vector sequence to the target locus (middle left drawing); and (iii) pure NHEJ-mediated IDLV integration (middle right drawing). Off-target integration by NHEJ can also occur when a homology-containing donor vector is used (right drawing). (b) Meta-analysis of 258 integration sites retrieved from 212 GFP-positive cell clones derived from cells treated with homology-containing donor IDLVs plus ZFNs. 89.9% of the integrations occurred by HDR at the intended ZFN site. A further 3.9% of the integrations occurred at the intended target site by a combined mechanism of HDR and NHEJ. 6.2% of the IDLV integrations were into unidentified off-target genomic locations. (c) Representative FACS analysis (n = 3) of K562 cells treated with a GFP-expressing IDLV containing homologous sequences to the CCR5 target site either alone (without ZFNs) or together with CCR5wt ZFNs or CCR5muF ZFNs. The frequency of GFP-positive cells is indicated for each sample. Note that the presence of homology dramatically increased the number of GFP+ cells.

from mouse tissues and cell lines treated with IDLVs35, showing that the majority of background IDLV integration occurs by host cell–mediated NHEJ into spontaneously occurring DSBs.

If IDLV integration into a DSB is mediated by NHEJ, induction of additional genomic DSBs might be expected to increase IDLV trapping. We introduced DSBs by irradiation of K562 and A549 cells transduced with GFP-expressing IDLVs, and observed a substantially increased frequency of stable GFP-expressing cells in both cell lines tested, demonstrating that IDLV integration was dependent on DSB generation (A549: 2 Gy, seven-fold; 5 Gy, 13-fold; K562: 2 Gy, 3.7-fold; 5 Gy, 4.8-fold; Fig. 1b,c).

Sites of IDLV integration cluster around the ZFN target site

To experimentally assess ZFN specificity, we transduced K562 cells in parallel with three different IDLVs: two of them each expressing an individual ZFN monomer and a third expressing GFP. ZFNs were designed against human CCR5 or IL2RG. We compared two methods of ZFN optimization. First, using CCR5-specific ZFNs, we evaluated the identical DNA-binding domains fused to either the wild-type FokI domain (CCR5wt) or to the obligate heterodimeric FokI mutants (CCR5muF)36. Second, we also compared two pairs of IL2RG-specific ZFNs designed to target the same IL2RG sequence but differing solely in the DNA binding domain (IL2RG1 and IL2RG2). Notably, ZFN vector doses were chosen according to our previous studies3 to enable the maximal gene-targeting efficiency without overt cellular toxicity (Supplementary Fig. 3). Infection of cells with ZFN-IDLVs together with a GFP-expressing IDLV produced up to 6% GFP-expressing cells after 4 weeks of cell culture versus 3% in the samples treated with IDLV-GFP alone (Fig. 1d and data not shown).

(nr)LAM-PCR–mediated IDLV integration site analysis of the ZFN-treated samples showed that in all four ZFN systems numerous independent IDLV integration sites clustered at the intended target locus, with 73–93% located less than 60 bp away from the ZFN-induced cut (Fig. 1e,f; note that the simple calculation of on-target integration sites versus off-target integration sites is not a valid strategy to calculate the ratio of on-target to off-target cleavages as will be discussed in detail below). In contrast, none of the integration sites in cells treated with IDLV-GFP alone were located near (that is, within 600 bp) any of the respective ZFN target sites. These data are consistent with the capture of IDLVs at ZFN-induced DSBs by NHEJ, which is known to favor the introduction of small deletions of <60 bp37.
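As a quick arithmetic check, the 73–93% range quoted above can be recomputed from the integration-site counts printed in Figure 1e,f. The sketch below assumes that each panel entry of the form "29 (27)" means 29 integration sites at the target locus, of which 27 lie within ±60 bp of the target site; the pairing of counts with sample names is likewise read from the figure layout and is an assumption.

```python
# (IS at target locus, IS within ±60 bp of the target site), read from Fig. 1e,f
counts = {
    "CCR5wt/IDLV": (29, 27),
    "CCR5muF/IDLV": (71, 52),
    "IL2RG1/IDLV": (17, 15),
    "IL2RG2/IDLV": (21, 18),
}

# Percentage of target-locus integration sites lying within the ±60-bp window
fractions = {name: 100.0 * near / total
             for name, (total, near) in counts.items()}

for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}%")
```

Under this reading the lowest and highest values round to 73% and 93%, matching the range given in the text.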

CLIS analysis identifies on- and off-target sites of ZFN action

We next asked whether this type of analysis could also be used to track ZFN action when IDLVs are used as donor templates for HDR to specifically integrate a transgene into the ZFN target site3,38. In this approach, the IDLV transgene cassette is flanked by homologous sequences to the ZFN target locus (Fig. 2a) in order to favor HDR-mediated integration at the intended target site. We previously reported the high efficiency and specificity of this strategy3, but the occurrence and location of off-target vector integrations remained unexplored. A meta-analysis performed on 212 transgene-positive, single cell–derived clones showed that 93.8% of insertions were located at the respective ZFN target sites (Fig. 2b and Supplementary Fig. 4). Only 6.2% of insertions occurred at unidentified

[Figure 2, panels a–c: graphics not reproduced. (a) Schematic of integration outcomes after a DNA DSB at the ZFN on-target site (HDR, HDR + NHEJ, NHEJ) and at ZFN off-target sites (NHEJ). (b) Chart: on-target HDR, 89.9%; on-target HDR + NHEJ, 3.9%; off-target, 6.2%. (c) Flow cytometry plots (GFP versus SSC) for CCR5wt/IDLVhc, CCR5muF/IDLVhc and IDLVhc, labeled 26%, 30% and 2%.]

Table 1 Cleavage activity of ZFN at CCR5 binding sites with various spacer sizes

Cell lines    CCR5wt (%)    CCR5muF (%)
4 bp          1.5 ± 1.4     0.0 ± 0.0
WT (5 bp)     45.1 ± 7.8    52.7 ± 2.1
5 bp          38.9 ± 2.3    48.7 ± 2.1
6 bp          10.8 ± 3.8    12.6 ± 2.8
7 bp          0.0 ± 0.0     0.0 ± 0.0
8 bp          0.0 ± 0.0     0.0 ± 0.0
WT (5 bp)     58.4 ± 2.1    59.7 ± 3.2
15 bp         11.2 ± 0.8    6.1 ± 2.5
16 bp         6.3 ± 0.5     2.4 ± 0.2


site (Supplementary Fig. 8). Thus, simple homology to the intended target site alone is not an accurate predictor of which sites are actually cleaved in vivo.
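The homology-based ranking described in this section (ZFN monomers spaced 0 to 20 nucleotides apart, heterodimers and homodimers both allowed) can be illustrated with a brute-force scan. The following is a minimal sketch under stated assumptions, not the authors' software: the half-site sequences are invented, both half-sites are assumed to have the same length, the second monomer is assumed to bind the opposite strand, and identity is scored as the fraction of matching half-site positions with the spacer ignored.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def scan(genome, left, right, min_spacer=0, max_spacer=20, min_identity=0.65):
    """Score every window as first half-site + spacer + revcomp(second half-site).

    Returns hits as (identity, pairing, spacer, start), best first.
    The R/L pairing covers heterodimer sites lying on the opposite strand.
    """
    pairings = {"L/R": (left, right), "R/L": (right, left),
                "L/L": (left, left), "R/R": (right, right)}
    n = len(left)  # assumption: both half-sites have the same length
    hits = []
    for label, (first, second) in pairings.items():
        expected = first + revcomp(second)
        for spacer in range(min_spacer, max_spacer + 1):
            width = 2 * n + spacer
            for start in range(len(genome) - width + 1):
                window = genome[start:start + width]
                observed = window[:n] + window[n + spacer:]  # spacer ignored
                score = identity(observed, expected)
                if score >= min_identity:
                    hits.append((score, label, spacer, start))
    return sorted(hits, reverse=True)

# Hypothetical 12-bp half-sites and a toy "genome" with one perfect site
left, right = "ACGTTGCAAGCT", "GGATCCTTAGCA"
genome = "TTTT" + left + "ACGTA" + revcomp(right) + "TTTT"
hits = scan(genome, left, right)
print(hits[0])
```

A list ranked this way reflects sequence homology only. As the text notes, such a ranking did not predict which sites were actually cleaved in vivo.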

Previous work has shown that ZFN activity is highly dependent on the length of the gap separating the two monomer binding sites39,40. Therefore, we screened the genomic sequences in close proximity to the CLIS loci for the occurrence of potential ZFN target sites separated by different spacer lengths (Fig. 3b,d and Supplementary Fig. 9). We also screened as controls loci that corresponded to single integration sites in the CCR5muF- and IL2RG-treated samples, and IDLV integration sites that occurred in the absence of ZFNs. Notably, we observed a clear enrichment of sites homologous to the intended ZFN target site only in the CCR5 and IL2RG CLIS but never in the controls (Fig. 3b,d).

The strongest homology at CLIS loci was found for ZFN dimers separated by 5-bp spacers. Occasionally, we observed the pairing with the highest degree of homology at putative binding sites separated by 15 bp for CCR5 CLIS and 6 bp for IL2RG CLIS (Supplementary Fig. 9). Cleavage of 15-bp-spaced binding sites in vivo was verified by engineering the endogenous CCR5 locus in human K562 cells, replacing the natural 5-bp spacer with a 15-bp gap and observing measurable CCR5 ZFN activity (Table 1 and Supplementary Fig. 10). Together, these data demonstrate that all CLIS included sites homologous to the intended ZFN target site in close vicinity to the integration site of the IDLVs, as all IDLV integrations at CLIS loci were located within 47 bp of the putative off-target binding site. Moreover, cleavage of these sites requires binding of the ZFN monomers in a specific arrangement on the DNA (Supplementary Figs. 11 and 12). CLIS demonstrate that in vivo ZFN cleavage sites are a select subset of all homology sites present in the genome.

Validation of off-target activity

To confirm that the CLIS-identified ZFN binding sites were the actual sites of cleavage, we performed two independent experiments. First, we overexpressed the CCR5 ZFNs by nucleofection of ZFN-encoding plasmid DNA into K562 cells and performed a mismatch-selective endonuclease assay that reports levels of modification by NHEJ at preselected sites of the genome41. Using this approach, ZFN activity in CCR5wt-treated samples could be confirmed at the (nr)LAM-PCR-identified off-target CLIS within or nearest to the RefSeq genes CCR5, C3orf59 (also known as MB21D2), SOS1, ABLIM2, PGC, KRR1, ZCCHC14 and FBXL11. For the CCR5muF-treated samples, IDLV trapping correctly predicted ZFN activity on heterodimeric off-target sites in the CCR5, KRR1, ZCCHC14 and FBXL11 loci (Fig. 4).

Second, we further confirmed these cleavage data by high-throughput pyrosequencing of a second set of DNA samples in Figures 1–3 to determine the frequency of NHEJ-mediated indels at the CLIS loci. As a control, two additional loci with an in silico–predicted off-target site with 91.7% and 95.8% homology to the intended ZFN target site but with no experimentally identified IDLV integrant were included in the analysis. The percentage of indels observed in the pyrosequencing data was used to rank the off-target CLIS (Table 2 and Supplementary Discussion). CLIS were found to be accurate predictors of off-target activity, as all CLIS loci with four or more integrations showed indels resulting from NHEJ repair in 1.5–7% of all obtained sequences (versus indels in 34–40% of sequences at CCR5), whereas the two control in silico–predicted sites showed ≤0.1% indels. The vast majority of mutated sequences showed deletions of <10 bp around the ZFN binding sites (Supplementary Fig. 13). Notably, CLIS containing homodimeric off-target sites showed a higher occurrence of indels only in cells treated with CCR5wt and not with the obligate heterodimeric CCR5muF ZFNs.

sites, likely representing the sum of the ZFN-induced and background integration events. Thus, we decided to exploit high-throughput CLIS analysis to chart the location of on- and off-target IDLV insertions in our model. We treated K562 cells with donor IDLVs bearing homology to CCR5 (IDLVhc) or IL2RG (IDLVhi) at each side of the GFP cassette and with or without the cognate ZFNs. As expected, the inclusion of a homologous donor vector together with ZFNs greatly increased the frequency of stable GFP-expressing cells: 13- to 15-fold in CCR5-ZFN–treated cells and up to 31-fold in IL2RG-ZFN–treated cells coinfected with a homologous IDLV donor (CCR5wt/IDLVhc: 27.1 ± 2.1%; CCR5muF/IDLVhc: 28.2 ± 1.4%; IDLVhc: 1.8 ± 0.1%; mean ± s.d. after 4 weeks of cell culturing, unpaired t-test: P < 0.005; Fig. 2c; and IL2RG1/IDLVhi: 2.7 ± 0.3%; IL2RG2/IDLVhi: 26.1 ± 3%; IDLVhi: 0.8 ± 0.1%; mean ± s.d. after 4 weeks of cell culturing, unpaired t-test, P < 0.005; Supplementary Fig. 5). We then investigated NHEJ-mediated IDLV integration by (nr)LAM-PCR. We observed the highest number of tagged IDLVs clustered around the intended target locus, despite competition for resolution of this on-target DSB by HDR and the target-specific donor DNA that would be invisible to (nr)LAM-PCR primers. At the CCR5 locus, 59 and 32 integration sites were detected in CCR5wt/IDLVhc- and CCR5muF/IDLVhc-treated cells, respectively. Similarly, 13 were detected at the IL2RG locus in IL2RG1/IDLVhi-treated cells (Supplementary Fig. 6).

To determine ZFN off-target activity, we pooled and screened our complete integration site data sets for CLIS outside of the target sequence. In the CCR5wt samples, (nr)LAM-PCR identified only 13 genomic CLIS loci harboring 75 integration sites. In samples transduced with the obligate heterodimeric FokI ZFNs, this number fell to 44 integration sites in seven chromosomal loci (Fig. 3). CLIS loci near the RefSeq genes CCR2, KRR1, FBXL11, ZCCHC14 and CD180 were found to occur in cells treated with wild-type FokI (CCR5wt) as well as in cells treated with the mutated FokI (CCR5muF). The off-target CLIS locus with the highest number of insertions (CCR2) showed 41 integration sites within a 70-bp window, indicating that these events were nonrandom. Similarly, we could identify 36 integration sites at 14 genomic loci in samples treated with IL2RG1 ZFNs and 32 integration sites at 15 genomic loci in samples treated with IL2RG2 ZFNs, respectively (Fig. 3). Notably, three CLIS loci, FAM133B, SLC31A1 and SEC16A, were common between samples treated with IL2RG1 and IL2RG2 ZFNs. Therefore, for each ZFN pair studied, a small subset of specific genomic loci showed recurring integration sites, independent of the type of IDLV used (with or without homology to the intended target site). These loci do not occur in the absence of ZFNs and can be considered to represent off-target ZFN activity (Supplementary Discussion).

In silico analysis of ZFN off-target activity

Reasoning that the most likely sites of ZFN off-target activity should be sequences with homology to the intended target site, we annotated the human genome in silico for the presence of putative off-target sites of ZFN action. We ranked sequences purely according to their degree of homology, allowing spacing of ZFN monomers between 0 and 20 nucleotides and allowing both heterodimers and homodimers of each of the two ZFNs used per experiment. Comparison of the detected IDLV integrations with the in silico list of target homologies revealed that just two of the predicted in silico sites were CLIS, one in ABLIM2 (4 integration sites) and one in CCR2 (41 integration sites). Both have previously been described as off-target sites of the CCR5-specific ZFN8. These off-target sites showed 96% and 92% sequence similarity to the CCR5 target site, respectively (Supplementary Fig. 7). Similarly, after IL2RG-specific ZFN treatment, a single site from the in silico–derived list was detected, revealing two integration sites in KIAA0528, which has 88% sequence homology to the IL2RG target


Figure 3 CLIS at genomic loci in ZFN-treated cells. (a) Genomic regions with ≥3 integration sites in CCR5-ZFN–treated samples. Blue line: identity to the ZFN target site; dark-blue bars: CCR5wt/IDLV; light-blue bars: CCR5wt/IDLVhc; dark-green bars: CCR5muF/IDLV; light-green bars: CCR5muF/IDLVhc. X axis: nearest RefSeq gene and percentage of identity to the ZFN target site. IS, insertion site. (b) Enrichment of CCR5-ZFN binding sites at CLIS loci. The curves show the average identity of the 24-bp ZFN-dimer binding site (separated by a 5-bp spacer) to the CCR5 target site in close vicinity to different IDLV integration site data sets. The x axis shows the location of the dimer binding site relative to the appropriate integration site data set. Blue: CLIS loci described in panel a; green: 188 single integration site loci from the CCR5muF/IDLV samples; black: 98 integration site loci retrieved from no ZFN samples (IDLV). (c) Genomic regions with ≥3 integration sites in IL2RG-ZFN–treated cells. Brown line: identity to the ZFN target site; brown bars: IL2RG1/IDLV; orange bars: IL2RG1/IDLVhi; red bars: IL2RG2/IDLV. (d) Enrichment of IL2RG ZFN binding sites at CLIS loci. Similar to b, but brown: IL2RG CLIS loci from c; red: 170 single integration-site loci from IL2RG1/IDLV sample; black: 98 integration-site loci retrieved from no ZFN samples (IDLV).
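For a 24-bp dimer binding site, percentage identity can only take values that are multiples of 1/24. The short worked example below tabulates identity against the number of mismatched positions. It assumes identity is scored as matching positions divided by 24, with the spacer excluded. The identity values labeled in Figure 3 (for example 95.8, 91.7, 87.5 and 66.7) are consistent with this scoring, although the legend does not state the formula.

```python
SITE_LENGTH = 24  # length of the ZFN-dimer binding site given in the legend

def percent_identity(mismatches, length=SITE_LENGTH):
    """Percentage identity for a site with the given number of mismatches."""
    return round(100.0 * (length - mismatches) / length, 1)

# Identity for 0 to 9 mismatched positions
table = {k: percent_identity(k) for k in range(10)}
for k, pct in table.items():
    print(f"{k} mismatches: {pct}%")
```

On this scoring, 95.8% corresponds to one mismatch, 91.7% to two and 66.7% to eight.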

of the actual sites of cleavage for a given ZFN pair within the context of the entire genome. We found that off-target ZFN action, as indicated by repeated IDLV integrations in CLIS, required significant homology to the intended target sequence. All the CLIS discovered in CCR5 ZFN-treated cell samples showed >66.7% sequence homology to the intended ZFN binding site, tolerated mismatches only in certain positions and required a defined spacer length between the two ZFN monomers. Aligning the

The frequency of both on-site and off-site cleavage was much lower for IL2RG ZFNs. However, we noted an increased level of on-site activity and decreased activity at off-target sites for IL2RG2 versus IL2RG1 ZFNs (Supplementary Fig. 14).

In vivo nucleotide specificity of ZFNs

Our IDLV trapping analysis showed that CLIS analysis reproducibly identified genomic sequences where ZFN off-target cleavage occurred. Thus, for each ZFN a consensus of the sequences from all CLIS should be a precise reflection of the sequence motifs permissive for ZFN activity in vivo, inside and outside of its intended target site. The in silico analysis indicated that, at CLIS, ZFNs act at dimeric target sites. The mismatch-sensitive endonuclease assay and deep sequencing confirmed that all CLIS represented real in vivo targets. Figure 5 shows a weighted alignment of all in vivo ZFN target sites including the intended CCR5 or IL2RG target sites. The sequence fidelity of the zinc-finger combinations used in these synthetic nucleases was very high. Relaxed sequence recognition was restricted to a few positions for each ZFN, and these are now easily identifiable by this approach. Notably, a retrospective analysis of our in vitro Selex assay showed that ZFNs bind to very similar motifs on synthetic DNA fragments (Fig. 5a and Supplementary Fig. 15). We conclude that this DNA target consensus specificity combined with the appropriate spacing between ZFN monomers are stringent prerequisites for in vivo ZFN activity. However, our analyses also showed that additional cellular factors influence which sites of this target homology are actually cleaved in vivo, demonstrating that ZFN off-target activity cannot be predicted by in vitro methods alone.
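A weighted alignment of cleavage sites, as mentioned above for Figure 5, can be summarized as a per-position base frequency profile. The sketch below is one plausible construction rather than the procedure used in the study: weighting each site by its number of integration sites is an assumption, and the sequences and weights are invented for illustration.

```python
from collections import Counter

def weighted_consensus(sites):
    """Build a weighted base-frequency profile from aligned sites.

    sites: list of (sequence, weight); all sequences have equal length.
    Returns (consensus string, list of per-position frequency dicts).
    """
    length = len(sites[0][0])
    total = sum(weight for _, weight in sites)
    profile = []
    for i in range(length):
        counts = Counter()
        for seq, weight in sites:
            counts[seq[i]] += weight
        profile.append({base: counts[base] / total for base in "ACGT"})
    # Most frequent base per position; ties resolve in A, C, G, T order
    consensus = "".join(max("ACGT", key=lambda b: col[b]) for col in profile)
    return consensus, profile

# Made-up aligned sites with hypothetical integration-site counts as weights
toy_sites = [("ACGT", 41), ("ACGA", 4), ("TCGT", 3)]
consensus, profile = weighted_consensus(toy_sites)
print(consensus)
print(profile[3])
```

Positions where more than one base carries appreciable weight in the profile are the candidates for relaxed recognition.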

DISCUSSION
The sequence fidelity of ZFNs is critical to their safe and successful therapeutic application. Until now, there has been no unbiased genome-wide assessment of ZFN off-target activity to our knowledge. Here we demonstrate that IDLV capture at ZFN-induced DSBs permits the identification

[Figure, panels a–d. Axes: Frequency of IS (%); Identity to ZFN target site (%); Position (bp), from -100 to 100. Loci labeled with their identity to the ZFN target site (%): ABLIM2 95.8; CCR2 91.7; PKN2 87.5; PGC 83.3; KRR1 83.3; VEZT 75.0; SOS1 75.0; FBXL11 70.8; ZCCHC14 66.7; C3orf59 66.7; KIAA0528 87.5; SLC31A1 75.0; SF3B1 70.8; STAG1 70.8; SEC16A 70.8; SCARB1 70.8; FAM133B 66.7; ADAMTS18 62.5.]

Figure 4 Validation of (nr)LAM-PCR identified off-target CLIS loci. Cel1 analysis of the CCR5, ABLIM2, PGC, KRR1, PKN2, FBXL11, ZCCHC14, C3orf59, SOS1 and ACSM5 loci in K562 cells mock transfected (Con) or transfected with plasmid DNA encoding CCR5wt or CCR5muF. Arrows denote the Cel1 cleavage bands, and the percentage of modified alleles determined by the assay is indicated below each lane.


target site consensus sequence from each CLIS location obtained by (nr)LAM-PCR analysis, we could derive an in vivo-determined ZFN DNA binding site consensus sequence. This consensus sequence reflects the binding preference for a pair of ZFNs at each nucleotide position, identifying any position permissive for binding to an unintended DNA base. The in vivo defined consensus sequence is remarkably congruent with the consensus sequence obtained from in vitro DNA binding studies (Fig. 5). Whereas all sites repeatedly cleaved in vivo possess a consensus stretch of DNA, the converse (that is, that all sites with high homology to this consensus are cleaved) is not true. Bioinformatic estimation of off-target activity based on sequence homology to the intended target site did not reliably predict a specific genomic locus as a bona fide off-target site. IDLV CLIS were found in only a subset of all in silico-predicted sites with high homology to the consensus sequence (see also predicted sites 8). This result was not due to a failure of IDLV integration site retrieval by LAM-PCR. Deep sequencing of the GALC locus, despite its high homology to the CCR5 ZFN binding sites, revealed no evidence of ZFN-induced NHEJ, consistent with the absence of IDLV insertion in this locus. The lack of cleavage at these sites may reflect differences from the consensus sequence at positions that abolish ZFN binding or constraints imposed by the nuclear environment, such as epigenetic modifications.

The strength of the IDLV capture approach is therefore the unbiased identification of the actual sites of off-target action. It is possible that not every off-target site in the human genome was identified during the course of our analysis (e.g., due to off-target sites located in highly repetitive stretches of the human genome or off-target events occurring so infrequently that they fell below the sequence coverage used in our study, thus preventing the unequivocal identification of the IDLV integration site by (nr)LAM-PCR). However, the CLIS strategy identified off-target positions that were cleaved at a 100-fold lower frequency than was the intended target site (Table 2 and Supplementary Fig. 14). The data presented in this study provide for the first time a snapshot of the relevant ZFN action in a living cell using a method that requires no a priori knowledge of ZFN specificity and is unbiased by in silico homology-based predictions.

Although the precise cellular factors that restrict ZFN action in vivo are currently unknown, our results argue that proper evaluation of the specificity of any particular ZFN pair will require use of the relevant cell type, the intended ZFN dose and the delivery method for that specific application. Our IDLV approach is rapid, robust and applicable to any cell type that can be infected with lentiviral vectors (a broad-tropism delivery vehicle) and may be further extended to other tagging vectors (e.g., AAV) or transfected linear tagging DNA.

We further determined the specificity obtained from two different approaches to ZFN optimization using CCR5 and IL2RG ZFNs, respectively. In the case of CCR5 ZFNs, we compared the identical DNA binding domains fused to either the wild-type or obligate heterodimeric FokI domains. The obligate heterodimeric CCR5-specific ZFNs retained activity only at heterodimeric off-target sites, whereas off-target activity at homodimeric binding sites was abolished. These data demonstrate the improvement in specificity achieved by using the obligate heterodimeric FokI domain. Moreover, as heterodimeric ZFNs do not cut at homodimeric off-target sites, these data provide clear evidence that cleavage of DNA requires both ZFN monomers to recognize a homologous target in the genome in the proper spatial orientation to assemble a functional ZFN. These data strongly argue against the formation of ZFN dimers in solution or at monomeric ZFN binding sites, in agreement with earlier in vitro studies 6,7,42,43. For IL2RG-specific ZFN pairs that differ only at

Figure 5 Comparative analysis of zinc-finger sequence specificity in vivo. The sequences of the most frequently hit off-target binding sites for the CCR5 ZFNs that were found by DSB trapping can be considered as consensus sequences for the true binding affinity of each zinc-finger combination. Mapping of ZFN-induced DSB for each vector precisely delineates which zinc finger in which particular nucleotide position of its target sequence contributes to off-target DNA binding. The target site sequence itself is indicated under the sequence logos. (a) The binding sites near the CLIS loci CCR5, ABLIM2, CCR2, PGC, KRR1, FBXL11, ZCCHC14, PKN2, C3orf59, PTGS2, SOS1, ACSM5 and VEZT have been aligned and weighted by the number of integration sites derived from (nr)LAM-PCR experiments to create the consensus binding sequence for CCR5wt ZFN. The SELEX-derived consensus sequence, done as described8, for the left and right CCR5 ZFNs aligned to the intended genomic target, is shown at the bottom. (b) The binding sites near the CLIS loci IL2RG, SCARB1, SLC31A1, SEC16A, STAG1, RRS1 and FAM133B have been aligned and weighted by the number of integration sites derived from (nr)LAM-PCR to create the consensus binding sequence for IL2RG1 ZFN. (c) Accordingly, the binding sites near the CLIS loci IL2RG, A2BP1, SLC31A1, SEC16A, KIAA0528, SF3B1, KCTD8, NARG1L and FAM133B have been aligned and weighted by the number of integration sites derived from (nr)LAM-PCR to create the consensus binding sequence for IL2RG2 ZFN.

[Figure 5 panels; target site sequences shown under the sequence logos]
a  CCR5 in vivo; CCR5 in vitro (SELEX)
   5’-gctg GTC ATC CTC ATC ctgat AAA GTG CAA AAG gctg-3’
b  IL2RG1 in vivo
   5’-agca CTT CCA CAG AGT gggtt AAA GCG GCT CCG aaaca-3’
c  IL2RG2 in vivo
   5’-agca CTT CCA CAG AGT gggtt AAA GCG GCT CCG aaaca-3’


Concerning applications of the ZFNs used in this study, we exploited our CLIS analysis to assess the impact of off-target ZFN activity in a previously developed model of site-specific gene transfer 3. In this model, transgene integration at the ZFN target site occurs with very high efficiency and specificity due to the synergy between induction of DSB and engagement of the HDR pathway specifically at the intended target site (>93.8% on target). Yet, a measurable rate of vector integration at unidentified sites still occurs with this approach, which is the sum of ZFN-mediated and background levels of IDLV integration 35. Our CLIS analyses now show that the ZFN contribution to this off-target integration is to direct the vector to a defined, measurable and highly limited subset of genomic sites. This finding demonstrates that the use of ZFN for targeted transgene integration largely overcomes the concerns typically associated with vector systems exploiting random integration. Taken together, our results validate the high specificity of ZFN action on the human genome and should enhance the application of synthetic nucleases for genome editing of living cells.

METHODS
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We thank U. Abel for fruitful discussions. Funding was provided by the Deutsche Forschungsgemeinschaft (SPP1230, grant of the Tumor Center Heidelberg/Mannheim), by the Bundesministerium für Bildung und Forschung (iGene), by the VIth + VIIth Framework Programs of the European Commission (EC; European Network for the Advancement of Clinical Gene Transfer and Therapy (CLINIGENE) and Persisting Transgenesis (PERSIST)) and by the Initiative and Networking Fund of the Helmholtz Association within the Helmholtz Alliance on Immunotherapy of Cancer to C.v.K. and M.S. Funding to L.N. was provided by Telethon (TIGET grant), EC (FP7-HEALTH-2009-222878, PERSIST; ERC Advanced Grant FP7, Targeting gene therapy - 249845).

the DNA binding level, improvement in the specificity of the IL2RG2 ZFN pair compared to IL2RG1 was evident in an almost doubling of the cleavage activity at the intended target site with a concurrent reduction of activity at the identified off-target loci.

Finally, we asked whether the data generated here could be used to establish an accurate on-target/off-target ratio for ZFN action in the cells tested. Such data cannot be reliably extrapolated from the number of integration sites occurring at each CLIS because NHEJ can generate the same sequence at the vector-genome junction for multiple independent IDLV integrations. In fact, it is known that NHEJ favors specific repair outcomes; for example, a duplication of the 5-bp gap between the two ZFNs targeting CCR5 represents >25% of all repair events. Therefore, the detection of a unique integration site position does not reflect how often an IDLV integration occurred at this particular location in the genome, thus preventing the simple interpretation of an on-target/off-target ratio by counting only the numbers of identified positions. Such a simple calculation would underestimate the actual ZFN activity at the most preferred site. On the other hand, we have shown that CLIS analysis can identify the actual ZFN target sites in the genome, allowing the direct assessment of NHEJ-driven indels at each site after ZFN treatment, independent of IDLV insertion. The frequency of indels within the population of raw sequences retrieved for each site thus provides a robust readout of NHEJ that can be used to rank ZFN activity at each off-target site (Table 2 and Supplementary Fig. 14). Under these experimental conditions in K562 cells (maximized for HDR activity without overt cellular toxicity) the indel frequency at the target locus is more than sevenfold above the top ranking off-target position in the ABLIM2 gene (96% sequence identity), and even more enriched compared with all other off-target positions identified. Notably, the ratio of on- versus off-target activity may still be underestimated by this approach when ZFN activity at the intended target site far exceeds the activity at all off-target sites (as was the case in our analysis), as NHEJ progressively depletes the ZFN binding sequences from the intended targeted site.

Table 2 Percent frequency of NHEJ at CCR5 target and off-target sites

                                                                        CCR5wt                   CCR5muF
                  Nearest                                       IDLV-IS      NHEJb (%)   IDLV-IS      NHEJb (%)
Chr.  Location    RefSeq gene  In/ex  Identity (%)  ZFN dimer   (LAM-PCR)a   (DS)        (LAM-PCR)a   (DS)
3     46389562    CCR5         Ex     100           L_5_R       60           34.0        76           40.0
4     8165390     ABLIM2       In     95.8          L_5_L       4            7.0         –            0.0
3     46374223    CCR2         Ex     91.7          L_5_R       17           4.1         24           5.8
12    74249731    KRR1         –      83.3          R_5_L       4            3.3         2            3.8
12    94236443    VEZT         –      75.0          R_5_R       5            2.9         –            0.1
11    66720370    FBXL11       In     70.8          R_5_L       11           2.7         7            2.4
3     194006347   C3orf59      In     66.7          R_5_R       6            1.8         –            0.1
6     41813539    PGC          Ex     83.3          R_5_R       6            1.6         –            0.0
16    86056727    ZCCHC14      In     66.7          L_5_R       8            1.5         1            0.6
1     88422184    PKN2         –      87.5          R_15_R      3            0.1         –            0.2
2     39202369    SOS1         –      75.0          R_15_R      3            n.d.        –            n.d.
16    20330793    ACSM5        In     66.7          R_5_R       2            n.d.        –            n.d.
14    64097852    C14orf50     In     91.7          R_7_L       –            0.1         –            0.0
14    87308775    GALC         –      95.8          R_6_R       –            0.1         –            0.1

Chr., chromosome; In, intron; Ex, exon; L, ‘left’ ZFN monomer; R, ‘right’ ZFN monomer; IS, integration site; DS, deep-sequencing. Number in ZFN-dimer column indicates spacer size. n.d., not determined. Entries in bold indicate CLIS loci. Entries in italics indicate partial homologous genomic sites with no IDLV integration sites detected (controls).
a The number of integration sites retrieved by LAM-PCR. b The percentage of NHEJ-mutated sequences calculated in the deep-sequencing analysis.
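
As an illustration of how the Table 2 readouts can be used, the following minimal Python sketch ranks the CLIS loci by their deep-sequencing NHEJ frequency for CCR5wt and checks which ZFN dimer configurations retained IDLV integration sites with CCR5muF. The values are transcribed from Table 2; the `Site` class, the function names and the treatment of "–" as zero integration sites are assumptions made for this example only, not part of the published analysis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    gene: str
    dimer: str                 # e.g. "L_5_R": left monomer, 5-bp spacer, right monomer
    nhej_wt: Optional[float]   # % NHEJ (deep sequencing), CCR5wt; None = n.d.
    is_muf: int                # IDLV integration sites with CCR5muF ("–" entered as 0, an assumption)


# CLIS loci transcribed from Table 2 (control loci C14orf50 and GALC omitted).
SITES = [
    Site("CCR5", "L_5_R", 34.0, 76),
    Site("ABLIM2", "L_5_L", 7.0, 0),
    Site("CCR2", "L_5_R", 4.1, 24),
    Site("KRR1", "R_5_L", 3.3, 2),
    Site("VEZT", "R_5_R", 2.9, 0),
    Site("FBXL11", "R_5_L", 2.7, 7),
    Site("C3orf59", "R_5_R", 1.8, 0),
    Site("PGC", "R_5_R", 1.6, 0),
    Site("ZCCHC14", "L_5_R", 1.5, 1),
    Site("PKN2", "R_15_R", 0.1, 0),
    Site("SOS1", "R_15_R", None, 0),
    Site("ACSM5", "R_5_R", None, 0),
]


def is_heterodimeric(dimer: str) -> bool:
    """True if the two monomers in a string such as 'L_5_R' differ."""
    left, _spacer, right = dimer.split("_")
    return left != right


def rank_by_nhej(sites: List[Site]) -> List[Site]:
    """Sort sites with a measured NHEJ frequency from highest to lowest."""
    measured = [s for s in sites if s.nhej_wt is not None]
    return sorted(measured, key=lambda s: s.nhej_wt, reverse=True)


ranked = [s.gene for s in rank_by_nhej(SITES)]
muf_hits = [s for s in SITES if s.is_muf > 0]

print("Ranking by NHEJ (CCR5wt):", ranked)
print("Loci with CCR5muF integration sites:", [(s.gene, s.dimer) for s in muf_hits])
```

In this transcription, every locus that still carries integration sites with CCR5muF has a heterodimeric configuration (L and R monomers), in line with the loss of activity at homodimeric sites described in the Discussion.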


19. Goldberg, A.D. et al. Distinct factors control histone variant H3.3 localization at specific genomic regions. Cell 140, 678–691 (2010).

20. Naldini, L. et al. In vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector. Science 272, 263–267 (1996).

21. Vargas, J. Jr., Gusella, G.L., Najfeld, V., Klotman, M.E. & Cara, A. Novel integrase-defective lentiviral episomal vectors for gene transfer. Hum. Gene Ther. 15, 361–372 (2004).

22. Li, L. et al. Role of the non-homologous DNA end joining pathway in the early steps of retroviral infection. EMBO J. 20, 3272–3281 (2001).

23. Nightingale, S.J. et al. Transient gene expression by nonintegrating lentiviral vectors. Mol. Ther. 13, 1121–1132 (2006).

24. Gaur, M. & Leavitt, A.D. Mutations in the human immunodeficiency virus type 1 integrase D,D(35)E motif do not eliminate provirus formation. J. Virol. 72, 4678–4685 (1998).

25. Miller, D.G., Petek, L.M. & Russell, D.W. Adeno-associated virus vectors integrate at chromosome breakage sites. Nat. Genet. 36, 767–773 (2004).

26. Lin, Y. & Waldman, A.S. Promiscuous patching of broken chromosomes in mammalian cells with extrachromosomal DNA. Nucleic Acids Res. 29, 3975–3981 (2001).

27. Lin, Y. & Waldman, A.S. Capture of DNA sequences at double-strand breaks in mammalian chromosomes. Genetics 158, 1665–1674 (2001).

28. Petek, L.M., Russell, D.W. & Miller, D.G. Frequent endonuclease cleavage at off-target locations in vivo. Mol. Ther. 18, 983–986 (2010).

29. Deichmann, A. et al. Vector integration is nonrandom and clustered and influences the fate of lymphopoiesis in SCID-X1 gene therapy. J. Clin. Invest. 117, 2225–2232 (2007).

30. Schmidt, M. et al. High-resolution insertion-site analysis by linear amplification-mediated PCR (LAM-PCR). Nat. Methods 4, 1051–1057 (2007).

31. Gabriel, R. et al. Comprehensive genomic access to vector integration in clinical gene therapy. Nat. Med. 15, 1431–1436 (2009).

32. Paruzynski, A. et al. Genome-wide high-throughput integrome analyses by nrLAM-PCR and next-generation sequencing. Nat. Protoc. 5, 1379–1395 (2010).

33. Schroeder, A.R. et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529 (2002).

34. Mitchell, R.S. et al. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2, E234 (2004).

35. Matrai, J. et al. Hepatocyte-targeted expression by integrase-defective lentiviral vectors induces antigen-specific tolerance in mice with low genotoxic risk. Hepatology 53, 1696–1707 (2011).

36. Miller, J.C. et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nat. Biotechnol. 25, 778–785 (2007).

37. Honma, M. et al. Non-homologous end-joining for repairing I-SceI-induced DNA double strand breaks in human cells. DNA Repair (Amst.) 6, 781–788 (2007).

38. Cornu, T.I. & Cathomen, T. Targeted genome modifications using integrase-deficient lentiviral vectors. Mol. Ther. 15, 2107–2113 (2007).

39. Bibikova, M. et al. Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol. Cell. Biol. 21, 289–297 (2001).

40. Handel, E.M., Alwin, S. & Cathomen, T. Expanding or restricting the target site repertoire of zinc-finger nucleases: the inter-domain linker as a major determinant of target site selectivity. Mol. Ther. 17, 104–111 (2009).

41. Guschin, D.Y. et al. A rapid and general assay for monitoring endogenous gene modification. Methods Mol. Biol. 649, 247–256 (2010).

42. Bitinaite, J., Wah, D.A., Aggarwal, A.K. & Schildkraut, I. FokI dimerization is required for DNA cleavage. Proc. Natl. Acad. Sci. USA 95, 10570–10575 (1998).

43. Vanamee, E.S., Santagata, S. & Aggarwal, A.K. FokI requires two specific DNA sites for cleavage. J. Mol. Biol. 309, 69–78 (2001).

44. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

45. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

AUTHOR CONTRIBUTIONS
R.G., A.L., H.G., M.S., J.C.M., P.D.G., M.C.H., L.N. and C.v.K. conceived the project, designed experiments and interpreted data. R.G., A.L., P.G., C.K., A.N., J.W., G.F. and C.C.B. performed experiments. R.G., A.A. and J.C.M. conducted bioinformatics analysis. M.C.H. and P.D.G. provided ZFN. R.G., A.L., A.A., J.C.M., M.C.H., P.D.G., M.S., L.N. and C.v.K. prepared and wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/nbt/index.html.

Published online at http://www.nature.com/nbt/index.html.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Klug, A. The discovery of zinc fingers and their applications in gene regulation and genome manipulation. Annu. Rev. Biochem. 79, 213–231 (2010).

2. Urnov, F.D., Rebar, E.J., Holmes, M.C., Zhang, H.S. & Gregory, P.D. Genome editing with engineered zinc finger nucleases. Nat. Rev. Genet. 11, 636–646 (2010).

3. Lombardo, A. et al. Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat. Biotechnol. 25, 1298–1306 (2007).

4. Urnov, F.D. et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature 435, 646–651 (2005).

5. Kim, Y.G., Cha, J. & Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. USA 93, 1156–1160 (1996).

6. Mani, M., Smith, J., Kandavelou, K., Berg, J.M. & Chandrasegaran, S. Binding of two zinc finger nuclease monomers to two specific sites is required for effective double-strand DNA cleavage. Biochem. Biophys. Res. Commun. 334, 1191–1197 (2005).

7. Smith, J. et al. Requirements for double-strand cleavage by chimeric restriction enzymes with zinc finger DNA-recognition domains. Nucleic Acids Res. 28, 3361–3369 (2000).

8. Perez, E.E. et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat. Biotechnol. 26, 808–816 (2008).

9. Liu, P.Q. et al. Generation of a triple-gene knockout mammalian cell line using engineered zinc-finger nucleases. Biotechnol. Bioeng. 106, 97–105 (2010).

10. Santiago, Y. et al. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. Proc. Natl. Acad. Sci. USA 105, 5809–5814 (2008).

11. Bibikova, M., Golic, M., Golic, K.G. & Carroll, D. Targeted chromosomal cleavage and mutagenesis in Drosophila using zinc-finger nucleases. Genetics 161, 1169–1175 (2002).

12. Geurts, A.M. et al. Knockout rats via embryo microinjection of zinc-finger nucleases. Science 325, 433 (2009).

13. Hockemeyer, D. et al. Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat. Biotechnol. 27, 851–857 (2009).

14. Moehle, E.A. et al. Targeted gene addition into a specified location in the human genome using designed zinc finger nucleases. Proc. Natl. Acad. Sci. USA 104, 3055–3060 (2007).

15. Maeder, M.L. et al. Rapid “open-source” engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol. Cell 31, 294–301 (2008).

16. Bibikova, M., Beumer, K., Trautman, J.K. & Carroll, D. Enhancing gene targeting with designed zinc finger nucleases. Science 300, 764 (2003).

17. DeKelver, R.C. et al. Functional genomics, proteomics, and regulatory DNA analysis in isogenic settings using zinc finger nuclease-driven transgenesis into a safe harbor locus in the human genome. Genome Res. 20, 1133–1142 (2010).

18. Doyon, J.B. et al. Rapid and efficient clathrin-mediated endocytosis revealed in genome-edited mammalian cells. Nat. Cell Biol. 13, 331–337 (2011).


ONLINE METHODS

Vectors and cell transduction. GFP- and ZFN-expressing constructs were generated from HIV-derived, third-generation self-inactivating transfer constructs (Supplementary Methods). IDLV stocks were prepared and titered by HIV-1 Gag p24 immunocapture assay (PerkinElmer) as described 3. For transduction, 1 × 10^6 K562 cells were incubated overnight with GFP-expressing IDLVs (0.75 μg HIV Gag p24 equivalent/ml), either alone or together with IDLVs expressing either CCR5- or IL2RG-ZFNs (1 μg p24/ml for each ZFN-expressing IDLV), and then expanded for flow cytometry analysis (FACSCalibur; Becton Dickinson Pharmingen) and genomic DNA extraction (Blood & Cell Culture DNA Midi Kit, QIAGEN). Single-cell-derived clones were obtained by limiting dilution and analyzed for integration into the ZFN target site by Southern blot, PCR and real-time PCR as previously described 3. ZFNs targeting exon 3 of the CCR5 gene and the first ZFN pair targeting exon 5 of the IL2RG gene were previously described 3. The amino acid sequences of all ZFNs used in this study are shown in Supplementary Figure 16.

Induction of DSB by irradiation of IDLV transduced K562 and A549 cells. For dose-dependent stable marking of radiation-induced DSBs by IDLV, 1 × 10^6 K562 or A549 cells were transduced with a GFP-expressing IDLV (0.6 µg HIV Gag p24 equivalent/ml). Cells were photon-irradiated with 2 or 5 Gy 24 h post-transduction. Stable IDLV trapping of DSBs was measured as the percentage of GFP-expressing cells by FACS, with subsequent subculturing of irradiated cells for 28 (K562) or 50 (A549) days post-transduction.

Insertion site analysis by LAM-PCR. To identify insertion sites of the IDLV, LAM-PCR 30 and nrLAM-PCR 31,32 were performed using the enzymes Tsp509I, MseI, HpyCH4V and MspI. The resulting amplicons were sequenced using the Roche/454 platform.

Pyrosequencing using the 454 platform (Roche). PCR amplicons were prepared as suggested by the manufacturer. An additional PCR (‘Fusionprimer-PCR’) with fusion primers containing individual barcode sequences of 6 bases was carried out. 40 ng of purified LAM-PCR products served as template for the fusion primer PCR reaction. PCR conditions: initial denaturation for 2 min at 95 °C, followed by 12 cycles of 95 °C for 45 s, 60 °C for 45 s and 72 °C for 60 s; final elongation for 5 min at 72 °C. 15 µl of the PCR products were analyzed on a 2% agarose gel. DNA concentration was measured with the ND-1000 Spectrophotometer (Thermo Scientific).
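
Because each sample carries its own 6-base barcode, pooled reads can be assigned back to their sample of origin before further analysis. The sketch below shows one simple way to do this in Python. The barcode sequences, sample names and reads are hypothetical, and the assumption that the barcode occupies the first 6 bases of each read (with exact matching only) is made for illustration; the actual read layout is not specified in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

BARCODE_LEN = 6  # barcode length stated in the text


def demultiplex(reads: Iterable[str], barcode_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each read to a sample by its leading barcode and trim the barcode.

    Assumes the barcode is the first BARCODE_LEN bases of the read and
    requires an exact match; unmatched reads go to 'unassigned'.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode = read[:BARCODE_LEN].upper()
        insert = read[BARCODE_LEN:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "CCR5wt", "TGCATG": "CCR5muF"}
reads = ["ACGTACGGTTAACC", "TGCATGCCAATTGG", "NNNNNNAAAA"]
binned = demultiplex(reads, barcodes)
print(binned)
```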

IS data analysis of IDLV transduced cells. Integration site data analysis was performed as previously described, and sequences were aligned to the human genome using BLAT 32,44 (RefSeq genes and RepeatMasker; Assembly March 2006, hg18).

In silico prediction of off-target loci. To identify sequences homologous to the ZFN motifs, the human genome was scanned for all possible 3-mers contained in the ZFN motifs. All matches were extended to full motif length depending on the location of the 3-mer within the motif. Between the two ZFN cassettes, a spacer length of 0 to 20 nucleotides was allowed.
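
The seed-and-extend scan described above can be sketched as follows. This is a simplified Python illustration, not the pipeline used in the study: it searches one strand only, considers a single left-spacer-right arrangement, uses plain string search instead of a genome index, and applies a hypothetical `min_matches` cutoff. The two half-site motifs are taken from the CCR5 target sequence shown in Figure 5; the toy genome is invented.

```python
from typing import Dict, List, Tuple


def seed_candidates(genome: str, motif: str, k: int = 3) -> Dict[int, int]:
    """Return {window start: matching bases} for windows seeded by a shared k-mer."""
    hits: Dict[int, int] = {}
    for offset in range(len(motif) - k + 1):
        seed = motif[offset:offset + k]
        pos = genome.find(seed)
        while pos != -1:
            start = pos - offset  # extend the seed to the full motif length
            if 0 <= start <= len(genome) - len(motif) and start not in hits:
                window = genome[start:start + len(motif)]
                hits[start] = sum(a == b for a, b in zip(window, motif))
            pos = genome.find(seed, pos + 1)
    return hits


def paired_sites(genome: str, left: str, right: str,
                 max_spacer: int = 20, min_matches: int = 20) -> List[Tuple[int, int, int]]:
    """Pair left and right half-site candidates separated by 0..max_spacer bases.

    Returns sorted (left start, spacer length, total matching bases) tuples.
    min_matches is a hypothetical cutoff chosen for this example.
    """
    lefts = seed_candidates(genome, left)
    rights = seed_candidates(genome, right)
    found = []
    for l_start, l_match in lefts.items():
        for r_start, r_match in rights.items():
            spacer = r_start - (l_start + len(left))
            if 0 <= spacer <= max_spacer and l_match + r_match >= min_matches:
                found.append((l_start, spacer, l_match + r_match))
    return sorted(found)


LEFT, RIGHT = "GTCATCCTCATC", "AAAGTGCAAAAG"  # half-sites as shown in Figure 5
# Toy genome: a perfect site (5-bp spacer) and a variant site with one
# mismatch per half-site and a 6-bp spacer.
genome = ("ACGTACGT" + LEFT + "CTGAT" + RIGHT + "TTGCATTGCA"
          + "GTCATCCTGATC" + "CTGATT" + "AAAGTGCTAAAG" + "ACGT")
hits = paired_sites(genome, LEFT, RIGHT)
variant_start = genome.index("GTCATCCTGATC")
print(hits)
```

With two 12-bp half-sites, 22 matching bases out of 24 corresponds to 91.7% identity, the same scale used in the identity column of Table 2.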

Analysis of NHEJ of potential off-target hotspots using the Cel1 assay. K562 cells were co-transfected with 2.5 µg of each plasmid expressing the individual CCR5-ZFNs fused to either the wild-type FokI cleavage domain (WT) or the heterodimeric FokI domains (muF) and cloned into the pVax vector (Invitrogen). Approximately 1 × 10^6 cells were nucleofected with a total of 5 µg of plasmid DNA using Cell Line Nucleofector Kit V and Program T16 according to the manufacturer’s protocol. The cells were harvested 48 h post-transfection, and the genomic DNA was isolated using the DNA MasterPure Kit (Epicentre Biotechnologies). ZFN-induced modification of the genomic loci encompassing the CLIS was analyzed by the Cel1 assay, as previously described 36, using the primers listed in Supplementary Table 2 to PCR amplify the regions of interest. PCR conditions: initial denaturation for 5 min at 94 °C, followed by 30 cycles of 94 °C for 30 s, 60 °C for 30 s and 68 °C for 30 s; final elongation for 5 min at 68 °C.

Deep sequencing of potential off-target loci. Selected genomic loci that showed partial sequence identity to the ZFN target site and/or harbored CLIS were amplified by nested PCR from 500 ng genomic DNA of ZFN-treated cells, generating amplicons of 278 to 337 bp of genomic DNA surrounding the potential ZFN binding site (Supplementary Table 2). PCR products were purified using the Qiagen PCR Purification Kit, and 454 Titanium adaptors were added by Fusionprimer-PCR. The obtained sequences were aligned to the PCR-specific target sequence via BLAT 44 to observe deletions and insertions at the cutting site. Sequences that showed an indel of >2 bp located within the spacer ± 1 bp were considered ZFN-induced genome modifications. The aligned sequences were further cross-validated by a comparison to the human genome with BLAST 45. Sequences that aligned to another genomic locus were removed from further analyses.
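
The indel criterion above (size >2 bp, located within the spacer ± 1 bp) can be expressed as a small filter. The Python sketch below is one possible reading of that rule: it assumes that an indel counts as "within" the window if its reference span overlaps the spacer extended by 1 bp on each side, which is an interpretation rather than a detail given in the text. The `Indel` class, the coordinates and the example reads are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Indel:
    ref_start: int     # 0-based reference position where the indel begins
    size: int          # number of inserted or deleted bases
    is_deletion: bool


def is_zfn_induced(indel: Indel, spacer_start: int, spacer_end: int,
                   min_size: int = 3, pad: int = 1) -> bool:
    """Apply the >2 bp and spacer +/- 1 bp criteria (spacer_end is exclusive).

    Overlap with the padded spacer window is an assumed interpretation.
    """
    if indel.size < min_size:  # ">2 bp" means at least 3 bp
        return False
    # A deletion spans reference bases; an insertion sits at a single position.
    ref_end = indel.ref_start + (indel.size if indel.is_deletion else 0)
    lo, hi = spacer_start - pad, spacer_end + pad
    return indel.ref_start <= hi and ref_end >= lo


def nhej_percent(reads: Iterable[Optional[Indel]], spacer_start: int, spacer_end: int) -> float:
    """Percentage of reads carrying a qualifying indel (None = read without indel)."""
    reads = list(reads)
    if not reads:
        return 0.0
    modified = sum(1 for r in reads
                   if r is not None and is_zfn_induced(r, spacer_start, spacer_end))
    return 100.0 * modified / len(reads)


# Hypothetical amplicon with a 5-bp spacer at reference positions 112..116.
example_reads = [
    None,                              # unmodified read
    Indel(113, 5, is_deletion=True),   # 5-bp deletion inside the spacer: counted
    Indel(114, 1, is_deletion=False),  # 1-bp insertion: too small
    Indel(40, 10, is_deletion=True),   # deletion far from the spacer: ignored
]
percent = nhej_percent(example_reads, spacer_start=112, spacer_end=117)
print(percent)
```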

DNA sequence recognition of ZFN. The putative ZFN binding sites neighboring CLIS have been aligned using the sequence logo generator WebLogo (http://weblogo.berkeley.edu/).
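
To illustrate the weighting described in the Figure 5 legend (binding sites weighted by the number of integration sites), the sketch below builds weighted per-position base counts and a simple majority consensus in Python. It does not reproduce WebLogo, which additionally computes information content per position; the sequences and weights are hypothetical and the function names are introduced only for this example.

```python
from collections import Counter
from typing import List, Tuple


def weighted_counts(sites: List[Tuple[str, int]]) -> List[Counter]:
    """Per-position base counts for aligned sequences, each weighted by its IS count."""
    length = len(sites[0][0])
    if any(len(seq) != length for seq, _ in sites):
        raise ValueError("all aligned sequences must have the same length")
    columns = [Counter() for _ in range(length)]
    for seq, weight in sites:
        for i, base in enumerate(seq.upper()):
            columns[i][base] += weight
    return columns


def consensus(columns: List[Counter]) -> str:
    """Most heavily weighted base per position (ties resolved in A, C, G, T order)."""
    return "".join(max("ACGT", key=lambda b: col[b]) for col in columns)


# Hypothetical aligned half-site fragments with hypothetical integration-site counts.
aligned = [("GTCATC", 60), ("GTCATG", 4), ("ATCATC", 17)]
columns = weighted_counts(aligned)
print(consensus(columns))
print({base: count for base, count in columns[0].items()})
```

Weighting by integration-site count means that frequently hit loci dominate the consensus, whereas rarely hit loci mainly reveal which positions tolerate a mismatch.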
