T cells are essential parts of the adaptive immune response. The T-cell receptors (TCRs) expressed on their surface enable the recognition of unique molecular patterns of the pathogens invading the host. Understanding the expression profiles of TCRs (i.e., the diversity of receptors and clonotypes) provides insights into the adaptive immune response in healthy individuals and those with a wide range of diseases. Accurate determination of the clonotypes expressed by the immune system will aid in generating a complete picture of the T-cell repertoire and its role in human health, as well as help guide the development of immune therapy research.
Current next-generation sequencing (NGS) approaches for profiling T-cell repertoires have yielded valuable insights into the adaptive immune response and clonal selection. Two major approaches are used to profile T-cell repertoires: multiplex PCR and 5' RACE, each combined with NGS. While multiplex PCR amplifies multiple TCR genes in one reaction, it can be difficult to achieve accurate, reproducible clonotype identification because of suboptimal sensitivity and specificity, as well as amplification bias toward certain sequences. The new SMARTer Human TCR a/b Profiling Kit v2 (TCRv2 kit) leverages SMART full-length cDNA synthesis technology (Switching Mechanism at 5' end of RNA Template) and pairs NGS with a 5'-RACE approach to provide a sensitive, accurate, and optimized method for TCR profiling that captures complete V(D)J variable regions of TRA and TRB genes (Figure 1, Panel A). Unlike systems that use multiplex PCR, the 5'-RACE method does not require any prior knowledge of the sequences at the 5' end of the TCR transcripts. Additionally, the 5'-RACE method reduces variability and allows for priming from the constant region of TRA/TRB (Figure 1, Panel B). Downstream sequencing provides accurate identification of top clonotypes.
The TCRv2 kit has several key improvements over our first TCR kit (referred to as TCRv1), from optimized chemistry to the addition of both unique molecular identifiers (UMIs) and unique dual indexes (UDIs). During the template-switching step, 12 random nucleotides are incorporated into the cDNA by the TCR SMART UMI Oligo. When the data are analyzed with our Cogent NGS Immune Profiler Software, PCR duplicates and sequencing errors can be detected and removed, enabling more accurate, reliable clonotype calling and quantification. The addition of UDIs lets researchers pool multiple samples (currently up to 192 different samples) while providing greater confidence in sample integrity when sequencing on a patterned Illumina flow cell. The shorter length of the libraries makes them compatible with any Illumina sequencer, allowing researchers to save on sequencing costs and increase sample multiplexing on high-throughput sequencers like the NovaSeq™ system. Alternatively, full-length TCRa and/or TCRb transcript information can be obtained when libraries are sequenced on the MiSeq® system. The TCRv2 kit is designed to generate a consistent library yield from 10 ng–1 µg of PBMC RNA and 1 ng–100 ng of T-cell RNA, ensuring a sufficient yield for sequencing and improving ease of use.
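The idea behind UMI collapse can be illustrated with a minimal Python sketch. It is not the algorithm used by the Cogent NGS Immune Profiler Software; it assumes reads have already been annotated with a 12-nt UMI and a clonotype call, the function name `collapse_by_umi` and the CDR3 strings are hypothetical, and it does not attempt to correct errors within the UMI itself.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Count clonotypes by unique UMI instead of by read.

    `reads` is an iterable of (umi, clonotype) pairs. Reads sharing a UMI are
    treated as PCR copies of one original molecule; the clonotype seen most
    often for that UMI is kept as the consensus call.
    """
    calls_per_umi = defaultdict(Counter)
    for umi, clonotype in reads:
        calls_per_umi[umi][clonotype] += 1

    molecule_counts = Counter()
    for umi, calls in calls_per_umi.items():
        consensus, _ = calls.most_common(1)[0]
        molecule_counts[consensus] += 1
    return molecule_counts

# Invented example reads (hypothetical UMIs and CDR3 sequences).
reads = [
    ("AAAAAAAAAAAA", "CASSLGTF"),
    ("AAAAAAAAAAAA", "CASSLGTF"),
    ("AAAAAAAAAAAA", "CASSLGAF"),  # minority call under the same UMI, likely an error
    ("CCCCCCCCCCCC", "CASSLGTF"),
    ("GGGGGGGGGGGG", "CASSPDRF"),
]
read_counts = Counter(clonotype for _, clonotype in reads)
umi_counts = collapse_by_umi(reads)
print("per read:", dict(read_counts))
print("per UMI: ", dict(umi_counts))
```

Counting per read reports the erroneous variant as its own clonotype, whereas counting per UMI folds it into the consensus of the molecule it was copied from.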
Sensitive and reproducible clonotype detection from a wide range of RNA amounts
To evaluate the sensitivity of the new SMARTer Human TCR a/b Profiling Kit v2 (TCRv2), libraries were prepared from either human CD3+ T-cell total RNA (1, 10, and 100 ng) or human CD3+ T cells (1,000 and 10,000 cells). As shown in Figure 2, TRA and TRB clonotype counts from the TCRa and TCRb libraries consistently increase as the amount of RNA input increases. Similar data were obtained when using RNA extracted from PBMCs with inputs ranging from 10 ng to 1 µg (data not shown). For the whole-cell inputs, 1,000 and 10,000 T cells were resuspended in lysis buffer containing RNase Inhibitor and used directly for library preparation. A significant number of clonotypes was identified using whole cells as input. These data indicate that the kit is robust enough to accommodate samples with very high complexity and a range of input types (e.g., libraries can be generated directly from lysed T cells without the need for RNA purification).
In addition to high sensitivity and the ability to accommodate a large range of sample complexities, this protocol also shows a high level of reproducibility. Technical replicates of TCRb libraries generated with 100 ng of T-cell RNA extracted from a single donor showed excellent correlation between overlapping TRB clones, as demonstrated by a Pearson correlation (r) of 0.999 and a Spearman's correlation (ρ) of 0.97 for the top 50 ranked TCRb clonotypes (Figure 3).
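For readers who want to compute the same kind of replicate statistics on their own clonotype tables, the following Python sketch shows one way to obtain Pearson's r and Spearman's ρ for the top-ranked clonotypes shared by two replicates. The counts are invented, the function names are hypothetical, and the choice to rank by the first replicate is an assumption, not a description of how Figure 3 was produced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def replicate_correlation(rep1, rep2, top_n=50):
    """Correlate counts of the top-N clonotypes of rep1 that also occur in rep2."""
    top = sorted(rep1, key=rep1.get, reverse=True)[:top_n]
    shared = [c for c in top if c in rep2]
    x = [rep1[c] for c in shared]
    y = [rep2[c] for c in shared]
    return pearson(x, y), spearman(x, y)

# Invented counts for two technical replicates (not data from this study).
rep1 = {"clone_A": 900, "clone_B": 400, "clone_C": 150, "clone_D": 60, "clone_E": 5}
rep2 = {"clone_A": 880, "clone_B": 420, "clone_C": 140, "clone_D": 70, "clone_F": 3}
r, rho = replicate_correlation(rep1, rep2)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson's r is dominated by the most abundant clones, while Spearman's ρ depends only on rank order, which is why the two values are usually reported together.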
TCRv2 provides improved unbiased amplification of TCR transcripts
Even with an industry-leading product such as the SMARTer Human TCR a/b Profiling Kit, there is always room for improvement. To evaluate the improved performance of the TCRv2 kit, we compared libraries prepared from a single donor's PBMC RNA using the TCRv2 kit and our original TCRv1 kit. Since the TCRv1 kit does not include UMIs, an arbitrary frequency cutoff (0.0001%, 0.001%, or 0.01%) needs to be set to remove low-confidence clonotypes, such as singletons generated by sequencing or PCR errors (Figure 4, Panel A). The TCRv2 data identified clonotypes with greater confidence due to the addition of UMIs. Furthermore, the detected TRA/TRB V and J segments overlapped perfectly between the two versions of the SMARTer TCR kits (Figure 4, Panel B). Chord diagrams showed similar patterns of V-J combinations for TCRv1 and TCRv2 (Figure 4, Panel C). These data demonstrate that the TCRv2 chemistry improves on the unbiased amplification of V-J segments achieved with TCRv1, thanks to the greater confidence provided by UMI analysis.
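The frequency-cutoff filtering that non-UMI data requires can be sketched in a few lines of Python. The clonotype names and read counts are invented, and the function name `apply_frequency_cutoff` is hypothetical; the three cutoff values are the ones discussed above.

```python
def apply_frequency_cutoff(counts, cutoff_percent):
    """Keep clonotypes whose share of all reads is at least `cutoff_percent` (in %)."""
    total = sum(counts.values())
    return {c: n for c, n in counts.items() if 100.0 * n / total >= cutoff_percent}

# Invented read counts totaling 2,000,000 reads; clone_E is a singleton.
counts = {
    "clone_A": 1_200_000,
    "clone_B": 799_834,
    "clone_C": 150,
    "clone_D": 15,
    "clone_E": 1,
}
for cutoff in (0.0001, 0.001, 0.01):
    kept = apply_frequency_cutoff(counts, cutoff)
    print(f"cutoff {cutoff}%: {len(kept)} clonotypes kept")
```

The stricter the cutoff, the more genuine low-abundance clonotypes are discarded along with the errors, which is the trade-off that UMI-based counting is meant to avoid.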
Confident identification of low-abundance clonotypes
To further test the reproducibility and detection limit of the TCRv2 kit, we spiked a serial dilution of Jurkat RNA into 100 ng of PBMC RNA. As shown in Table 1, without UMI collapse we were able to accurately quantify reads of the Jurkat-specific TRBV12-3-TRBJ1-2 sequence down to a spike-in concentration of 0.01% at a depth of ~2,500,000 total TRA/TRB reads (gray background in Table 1). Importantly, at a spike-in concentration of 0.001%, amplification over multiple PCR cycles distorted the read counts for Jurkat transcripts present at very low copy numbers, and the linear ratio was not maintained. In contrast, when UMI collapse was performed, Jurkat-specific sequences were detected linearly down to 0.001%, evidence of the improved sensitivity afforded by the UMI-based analysis approach. When the percentage of Jurkat RNA spiked into the sample is compared with the percentage of detected Jurkat UMIs, the correlation is strongly linear (r > 0.99) from 10% to 0.001% (five tenfold dilution points spanning four orders of magnitude). This is seen consistently in both technical duplicates (Figure 5). This result demonstrates that differences in the relative abundance of transcripts for a particular TCR clonotype are faithfully and reproducibly represented in sequencing libraries generated using SMARTer technology and a UMI approach. Thus, the TCRv2 kit can accommodate the detection of rare TCR clones.
Table 1 column headings:
% Jurkat RNA spiked into 100 ng of PBMC RNA | Total read count (TRA/TRB)
Without UMI collapse: # of TRB raw reads | # of reads for TRBV12-3-TRBJ1-2 | Detected percentage of Jurkat reads
With UMI collapse: # of detected UMIs | # of UMIs for TRBV12-3-TRBJ1-2 | Detected percentage of Jurkat UMIs
Table 1. Assessing the sensitivity and reproducibility of the SMARTer approach. Spike-in analysis was performed in replicate on PBMC RNA samples spiked at varying concentrations (10%, 1.0%, 0.1%, 0.01%, 0.001%, and 0.0001%) with RNA obtained from a homogeneous population of leukemic Jurkat T cells (containing TRBV12-3-TRBJ1-2 clonotypes). TRB CDR3 regions were amplified from 100 ng of total RNA using the TCRv2 kit and sequenced. Reads of 2 x 150 bp were obtained on an Illumina NextSeq® system. The sequencing reads were downsampled to 2.5M reads. Read results for spike-in concentrations identified as the reliable concentration limit for each criterion (without and with UMI collapse) have data highlighted in gray. Without UMI collapse, PCR duplicates of TRBV12-3 were observed in 0.0010% of the raw reads.
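The linearity check described for the spike-in series can be reproduced on any set of counts with a short Python sketch. The UMI counts below are invented for illustration and are not the values in Table 1; the function name `loglog_fit` is hypothetical. Fitting on a log10 scale is an assumption made here so that each tenfold dilution carries equal weight.

```python
import math

def loglog_fit(expected_pct, detected_pct):
    """Least-squares slope and Pearson r of log10(detected) vs. log10(expected)."""
    x = [math.log10(v) for v in expected_pct]
    y = [math.log10(v) for v in detected_pct]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Invented UMI counts for illustration only; NOT the values in Table 1.
expected_pct = [10, 1, 0.1, 0.01, 0.001]
jurkat_umis = [9500, 1020, 98, 11, 1]
total_umis = [100_000] * 5

detected_pct = [100 * j / t for j, t in zip(jurkat_umis, total_umis)]
slope, r = loglog_fit(expected_pct, detected_pct)
print(f"log-log slope = {slope:.3f}, r = {r:.4f}")
```

A slope close to 1 together with a high r indicates that the detected percentage tracks the spiked percentage proportionally across the dilution series.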
Significant impact of biological variation on the number of clonotypes detected
The number and expression profile of T cells in peripheral blood circulation vary from person to person. We tested 10 ng of PBMC RNA from six different donors with the TCRv2 kit. A total of 12 libraries (TCRa and TCRb) from these six donors were pooled and sequenced. We found that clonotype counts differed substantially from sample to sample, as shown in Figure 6. These data also demonstrate the large range of clonotype counts that the kit can identify. The smallest clonotype counts identified in a single library were 4,200 (TRA) and 8,700 (TRB), while the largest were around 10,000 (TRA) and 17,000 (TRB). The on-target rates of these libraries ranged from 75% to 95% (data not shown).
Avoid oversequencing with UMI analysis
The incorporation of unique molecular identifiers (UMIs) is another great feature of the SMARTer Human TCR a/b Profiling Kit v2. UMIs are used to identify PCR duplicates and to remove errors introduced during PCR or sequencing. Without UMI-based correction (Figure 7, Panel A), the number of clonotypes identified keeps increasing as you sequence deeper (yellow line), and it is difficult to say whether the newly identified clonotypes are genuinely rare or whether some result from the accumulation of PCR and/or sequencing errors. When UMI-based correction is included, however, the clonotype count plateaus once a saturating sequencing depth is reached; in this dataset, the plateau occurs at 1M reads per library. To further illustrate this, when the clonotype calls at 1M (+UMI) and 5M (+UMI) reads are compared, there is at least a 90% overlap in the clonotypes identified (Figure 7, Panel B). Because fewer reads are required to identify the same number of clonotypes, users can use the freed-up sequencing capacity to pool more samples. Collectively, these results suggest that for SMARTer human TCR libraries generated from 10 ng of PBMC RNA, 1M reads per library is sufficient to capture the majority of clones.
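A depth comparison of this kind can be sketched in Python by downsampling a library to two depths and measuring how many clonotypes the two call sets share. The toy library is invented, the function names are hypothetical, and the overlap is defined here as the fraction of the shallower call set that is also found in the deeper one, which is an assumption and may differ from the definition used for Figure 7.

```python
import random

def downsample(reads, depth, seed=0):
    """Draw `depth` reads without replacement so libraries are compared at equal depth."""
    rng = random.Random(seed)
    return rng.sample(reads, depth)

def clonotype_overlap(shallow, deep):
    """Fraction of clonotypes in the shallower call set that also appear in the deeper one."""
    shallow, deep = set(shallow), set(deep)
    return len(shallow & deep) / len(shallow)

# Toy library (invented): three abundant clonotypes plus 20 rare ones, 1,000 reads in total.
reads = (
    ["big_1"] * 500
    + ["big_2"] * 300
    + ["big_3"] * 160
    + [f"rare_{i}" for i in range(20)] * 2
)
sub_small = downsample(reads, 200)
sub_large = downsample(reads, 800)
print("clonotypes at 200 reads:", len(set(sub_small)))
print("clonotypes at 800 reads:", len(set(sub_large)))
print("overlap:", round(clonotype_overlap(sub_small, sub_large), 2))
```

In a real analysis the inputs would be UMI-collapsed clonotype calls rather than raw reads, so that additional depth adds molecules rather than error-derived variants.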
Superior sensitivity and reproducibility compared to alternative profiling approaches
Applications of NGS to genomes (DNA-seq) and transcriptomes (RNA-seq) are becoming standard components of immune profiling. However, it remains unclear which methods provide the best quantitative data. We therefore conducted comparative studies against technologies from two other vendors. Company X uses a ligation-based method to add adapters after reverse transcription, with RNA as the starting input. Company Y uses gDNA and multiplex PCR to amplify the TRB gene. Takara Bio's kit (TCRv2) uses a 5'-RACE RNA approach. A total of 5M PBMCs were used for each gDNA and RNA extraction, and a substantial portion of the gDNA (1.6 µg) and total RNA (100 ng) was used for library preparation. Clonotype numbers were generated following each company's respective pipeline. (Note: Company Y does not provide TRA information, so TRA was not tested for that method.)
Downsampling allows for the fair comparison of different sequencing datasets, and we previously demonstrated that Takara Bio's TCRv2 has superior sensitivity in clonotype calling at 5M reads. Superior sensitivity was also observed for TRA and TRB clonotypes in two biologically different samples (Figure 8). The length distribution of the TRB CDR3 amino acid sequences showed similar patterns for all three technologies (Figure 9, Panel A). These data raised the question of whether the three technologies share identical clonotype listings in the top ranks. The clonotypes called in the top 10 and top 20 ranks were identified by all three technologies (Figure 9, Panel B). In contrast, the clonotypes in the top 100 ranks correlated well between Takara Bio TCRv2 replicates (Figure 9, Panel B, left plot) but not between replicates of the other technologies (data not shown). These results demonstrate that the Takara Bio TCRv2 method has greater reproducibility than the other mRNA and gDNA methods.
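The top-rank comparison across methods can be expressed as a set intersection of each method's top-N list. The Python sketch below uses invented counts and hypothetical names (`top_n`, `shared_top_n`, `method_1` to `method_3`); it does not reproduce any vendor's pipeline or the analysis behind Figure 9.

```python
def top_n(counts, n):
    """Top-N clonotypes by count; ties are broken alphabetically for determinism."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [clonotype for clonotype, _ in ranked[:n]]

def shared_top_n(datasets, n):
    """Clonotypes present in the top-N list of every dataset."""
    tops = [set(top_n(d, n)) for d in datasets]
    return set.intersection(*tops)

# Invented clonotype counts for three hypothetical methods.
method_1 = {"c1": 900, "c2": 500, "c3": 300, "c4": 50, "c5": 20}
method_2 = {"c1": 700, "c2": 650, "c3": 100, "c6": 90, "c4": 10}
method_3 = {"c2": 800, "c1": 600, "c3": 200, "c7": 40, "c4": 30}

for n in (3, 4):
    shared = shared_top_n([method_1, method_2, method_3], n)
    print(f"top {n}: {sorted(shared)} shared by all methods")
```

Agreement that holds for small N but weakens as N grows mirrors the pattern described above, where the most abundant clonotypes are found by every method and differences emerge further down the ranking.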
The SMARTer Human TCR a/b Profiling Kit v2 is a powerful tool for profiling human T-cell receptors. By leveraging SMART technology and combining a 5'-RACE approach with gene-specific amplification, this workflow captures complete V(D)J variable regions of TCRs and is optimized for highly sensitive and specific clonotype detection. With primers that incorporate Illumina-specific adapter sequences during cDNA amplification, the protocol generates indexed libraries ready for sequencing on Illumina platforms. This optimized method also includes a unique PCR cycling and pooling workflow, which reduces sequencing costs while still enabling accurate clonotype identification. By avoiding multiplex PCR, this kit also avoids the pitfalls of amplification biases of certain sequences, helping to provide a complete and accurate view of human TCR repertoires. Incorporating UMIs into the libraries makes it possible to remove reads derived from PCR or sequencing errors, thus ensuring more accurate and reliable results. Incorporating UDIs into the libraries allows for both pooling of multiple samples and sequencing on patterned flow cells without worrying about index hopping. Lastly, our Cogent NGS Immune Profiler Software provides an easy-to-use method for analyzing the immune repertoire at your fingertips.
Materials and methods
CD3+ T-cell RNA was purchased from AllCells (Cat. #LP, CR, CD3+, NS, 25M). PBMC RNA from single donors and Jurkat RNA were purchased from Biochain (Cat. # R1255815-50); in addition, in-house RNA was extracted from PBMCs acquired from AllCells. RNA was extracted using the Macherey-Nagel NucleoSpin RNA PLUS kit (available from Takara Bio, Cat. # 740984.50).
Jurkat RNA (10 ng, 1 ng, 100 pg, 10 pg, and 1 pg) was spiked into 100 ng of a single donor's T-cell RNA. All libraries containing TCRa/b sequences were generated using the SMARTer Human TCR a/b Profiling Kit v2, as per the user manual. Following purification and size selection, libraries were quantified using a Qubit fluorometer and the Agilent 2100 Bioanalyzer. Pooled libraries were quantified with the Library Quantification Kit (Cat. # 638324) and sequenced on either an Illumina MiSeq platform with 600-cycle V3 cartridges (Illumina, Cat. # MS-102-3003) or a NextSeq platform with 300-cycle cartridges (Illumina, Cat. # 20024905). Sequencing data analysis was completed using the Cogent NGS Immune Profiler Software. The report of the top 9,999 clonotypes generated by the software was uploaded to the VDJviz browser (https://vdjviz.cdr3.net/) for chord diagram visualization.
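The relationship between the spike-in masses listed above and the nominal spike-in percentages can be checked with a short Python sketch. It assumes that the nominal percentage is the mass of Jurkat RNA relative to the 100 ng of background RNA; the function name `spike_percentages` is hypothetical, and the share of the total RNA mass is shown alongside for comparison.

```python
def spike_percentages(spike_ng, background_ng=100.0):
    """Return (spike relative to background, spike as share of total RNA), both in %."""
    relative_to_background = 100.0 * spike_ng / background_ng
    share_of_total = 100.0 * spike_ng / (spike_ng + background_ng)
    return relative_to_background, share_of_total

# Spike-in masses from the dilution series, converted to ng (100 pg = 0.1 ng, etc.).
for spike_ng in (10, 1, 0.1, 0.01, 0.001):
    nominal, of_total = spike_percentages(spike_ng)
    print(f"{spike_ng:>6} ng spike: {nominal:.4g}% of background, {of_total:.4g}% of total")
```

The two definitions differ noticeably only for the largest spike (10 ng in 100 ng is 10% of the background but about 9.1% of the total) and converge as the spike becomes small.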