Measure twice, cut once: predicting gene editing outcomes

Date: October 21, 2019

Author: Takara Bio Blog Team

Over the past several years, the genome editing field has experienced nothing short of a revolution. Recently, this has largely been driven by the development and refinement of CRISPR/Cas9 technology. By introducing double-strand breaks (DSBs) at targeted locations in the genome and leveraging endogenous repair processes, CRISPR/Cas9 provides a simple method for engineering site-specific genomic modifications. However, despite these rapid advances, this method is still imperfect—and often unpredictable. Errors in DSB repair occasionally introduce insertions or deletions, which may (depending on the desired outcome) necessitate additional screening and a trial-and-error approach to choosing guide RNAs.

But why guess at something when you can predict it? While we've written previously about our own tips to optimize gene editing experiments, we'd like to turn the spotlight to the Shendure lab at the University of Washington for their recent article (Chen et al. 2019) that not only showed how repair errors are influenced by the surrounding DNA sequence, but also gave us a more accurate tool to predict them.

A method to the madness: understanding editing outcomes

To determine whether gene editing outcomes exhibit a predictable pattern, researchers first generated plasmids encoding single guide RNAs (sgRNAs) for 12,917 targets paired with the corresponding target sequences in cis, which were packaged into lentiviruses and used to infect Cas9-expressing HEK293T cells. Targeted genomic regions were amplified by PCR, incorporating unique molecular identifiers (UMIs) to allow the linking of a Cas9-mediated edit to the cell of origin.

Sequencing of these amplicons and subsequent data filtering resulted in 1.16 million reads that represented 6,872 unique targets, with editing outcomes divided into three categories: unedited (wild-type), deletion, or insertion. A minority of these outcomes (9.8% of UMIs) corresponded to unedited alleles, while 63.6% and 31.5% of edited alleles corresponded to deletions and insertions, respectively. Insertions were overwhelmingly comprised of 1-bp events (~75.3%), while large events (>25-bp) were much rarer (~1.5%).

Insertion events seemed to be determined by the sequence surrounding the DSB: 85% of 1-bp insertions appeared to be templated by the 17th nucleotide in the genomic sequence targeted by the guide RNA. Similarly, for 2-bp insertions, 41% appeared to be templated by the 16th and 17th nucleotides in the target sequence (4 or 3 bp upstream from the PAM, respectively). Interestingly, insertions were associated with an A or T at the 17th nucleotide, while C and G tended to be associated with deletions.

Over 75% of the observed deletion events were mediated by microhomologies (MHs) 1–10 bp in size, with 94.6% of MHs falling between 1 and 4 bp in size. To determine whether these MHs could be used to predict editing outcomes, researchers reused their earlier strategy to generate and test a library of 1,000 targets and sgRNAs that contained MHs of varying sizes (2, 4, or 6 bp). While putative MH-mediated editing outcomes were more readily predicted compared to a random sequence, there was very little change in the relative number of deletions (82% in targeted MHs vs. 76% in random sequences selected in the previous experiment).

Based on these findings, the researchers then developed a model, termed Lindel, to predict NHEJ outcomes for any genomic target sequence. After training this model on their data, they compared Lindel's performance to two other models that had been developed by other groups: ForeCasT (Allen, 2019) and inDelphi (Shen, 2018). Performance was determined based on mean squared error.

Machine learning for predicting gene editing outcomes

Perhaps unsurprisingly, Lindel performed best on the researcher's data set (followed by inDelphi and ForeCasT), while ForeCasT performed best on its own data set (followed by Lindel and inDelphi). The researchers then aggregated these two data sets and used the combined data set to re-train Lindel. Following this second training, Lindel had the best performance on both the researcher's data set and the ForeCasT data set—though the researchers noted that most of the prediction errors were primarily due to small deletions and 1-bp insertions, indicating further work will need to be done to predict these events.

Know thy editing: simplifying genome engineering

The insights gleaned by the Shendure lab will allow researchers a greater measure of predictability in genome engineering experiments. In particular, the Lindel webtool they've provided will narrow down the number of sgRNAs researchers will need to evaluate for their desired editing outcomes, enabling more streamlined and efficient genome editing workflows.

Their tool allows you to predict your experimental outcome—and our kits make those outcomes a reality. Check out the links below to learn how our best-in-class gene editing screening kits and reagents can work to make your gene editing workflows efficient, fast, and simple, from start to finish.

References

Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–72 (2019).

Chen, W. et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Res. 47, 7989–8003 (2019).

Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).

Back to Blog Front

End-to-end solutions for your gene editing needs

In vitro transcription and screening kits

Kits providing a simple method for production and evaluation of sgRNAs for Cas9-mediated gene editing.

Recombinant Cas9 protein

A recombinant Cas9 nuclease for use in CRISPR/Cas9-mediated gene editing experiments.

Mutation detection kits

A faster and simpler method than the surveyor assay for identifying mutations such as insertions or deletions in mammalian cells.

Genotype confirmation kit

Screening for monoallelic or biallelic indels after CRISPR/Cas9 gene editing.

Indel identification kit

The Guide-it Indel Identification Kit includes all of the components needed to amplify, clone, and prepare modified target sites for identification of indels by DNA sequence analysis.

Knockin screening kit

Rapid, sensitive detection of single-nucleotide substitutions or insertions for precise genome editing applications.

Takara Bio USA, Inc.
United States/Canada: +1.800.662.2566 • Asia Pacific: +1.650.919.7300 • Europe: +33.(0)1.3904.6880 • Japan: +81.(0)77.565.6999
FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES. © 2025 Takara Bio Inc. All Rights Reserved. All trademarks are the property of Takara Bio Inc. or its affiliate(s) in the U.S. and/or other countries or their respective owners. Certain trademarks may not be registered in all jurisdictions. Additional product, intellectual property, and restricted use information is available at takarabio.com.