T-cell receptor (TCR) profiling of bulk samples has improved our understanding of TCR repertoire diversity, TCR-mediated antigen specificity, and the mechanisms of the adaptive immune response (Becattini et al. 2015). However, while bulk studies can indicate which clonotypes are expressed and their relative frequencies in the population, it is nearly impossible to determine the proper alpha-beta (αβ) pairing of specific receptor chains (Figure 1A) in individual cells, with the exception of some very rare cell populations (Figure 1B). The lack of pairing information makes it difficult for these studies to elucidate how particular chain pairings contribute to antigenic specificity (Stubbington et al. 2016). Sequencing single T cells allows the αβ pairings of TCRs to be determined, which provides context for antigen specificity, informs the efficient design of TCRs for targeted immunotherapy, and helps establish ancestral relationships that indicate the clonality of cells.
While single-cell sequencing is a powerful approach, most methods require sequencing of a large number of cells, making the experiments prohibitively expensive and the analyses difficult. The SMARTer Human scTCR a/b Profiling Kit (referred to below as the "human scTCR kit") tackles this problem by combining SMART cDNA synthesis and RACE-based gene-specific priming, followed by TCR-specific PCR. This approach fully captures and amplifies the TCR-α and TCR-β variable regions and generates Illumina-ready libraries, providing a highly sensitive method for sequencing TCRs (Figure 2).
Optimal index design allows for unbiased library construction and accurate multiplexing
To streamline the workflow and the processing of samples, we devised a strategy that uses in-line indexes (SMART-Seq Indexed Oligos) as cell barcodes, allowing the 96 cells to be pooled into twelve sample pools after the cDNA amplification step (Figure 3). Before proceeding with an analysis of large data sets, we wanted to validate two key aspects of our assay:
Do all the SMART-Seq Indexed Oligos behave equivalently?
How many errors should we allow when demultiplexing the data?
Control Jurkat Total RNA was processed with the human scTCR kit to generate libraries for each SMART-Seq Indexed Oligo. The libraries were then sequenced on an Illumina MiSeq® sequencer (see Methods section for details) to assess the performance of the barcodes. In particular, we wanted to assess whether any barcode introduced bias resulting from sequence-specific differences in template switching or amplification. The sequencing results from the final libraries showed that all of the SMART-Seq Indexed Oligos performed equivalently and showed no bias, as evidenced by the similar percentage of total reads obtained from each indexed library (Figure 4A). Additionally, only a small fraction of reads (~3%) could not be assigned to any of the index sequences (Figure 4A, Bad barcode).
We next examined how the number of allowed errors affects demultiplexing. Data for one of the pools sequenced for Panel A was run through the SMARTer Human scTCR Demultiplexer software, allowing either zero or one error in the assignment of reads to the SMART-Seq Indexed Oligo barcodes. When no errors were allowed, 6.4% of reads for the pool could not be assigned to a specific index (Figure 4B, Bad barcode). When one error was allowed, the unassignable reads were reduced to 2.9%, with a corresponding, evenly distributed increase in the percentage of reads mapping to each of the indexes (Figure 4B). Furthermore, analysis of the clonotype calls for the corresponding reads revealed that the correct TCR-α and TCR-β clonotypes were identified for all Control Jurkat Total RNA wells at the same frequency, whether zero or one error was allowed (Figure 4C). Since allowing one error in the demultiplexing process had no adverse effect on the data, we performed our data analysis using this setting. Taken together, these results speak to the robustness of the design of the in-line indexes in the SMART-Seq Indexed Oligos.
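The effect of the error allowance can be illustrated with a minimal sketch of mismatch-tolerant barcode assignment. This is not the SMARTer Human scTCR Demultiplexer's implementation, which is not described here. The sketch assumes the in-line index sits at the start of the read, counts only substitutions (Hamming distance), and uses hypothetical 6-nt barcodes and well names.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: Dict[str, str],
                   max_errors: int = 1) -> Optional[str]:
    """Return the well whose barcode matches the read prefix.

    A read is assigned only if exactly one barcode lies within
    max_errors substitutions; otherwise None is returned, which
    corresponds to the "Bad barcode" category in the text.
    """
    hits = []
    for well, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_errors:
            hits.append(well)
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes for illustration only (not the kit's sequences).
BARCODES = {"well_1": "ACGTAC", "well_2": "TGCATG", "well_3": "GATCGA"}

if __name__ == "__main__":
    for read in ["ACGTACGGGTTT", "ACGAACGGGTTT", "CCCCCCGGGTTT"]:
        strict = assign_barcode(read, BARCODES, max_errors=0)
        relaxed = assign_barcode(read, BARCODES, max_errors=1)
        print(read, strict, relaxed)
```

Allowing one error is only safe when every pair of barcodes differs at three or more positions, so that a read with a single substitution cannot be equally close to two barcodes. The ambiguity check in `assign_barcode` guards against that case.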
Specific amplification of TCR a/b chains in the final libraries
We next evaluated the human scTCR kit's performance using cells. We mixed Jurkat and CCRF-CEM cells at a 1:1 ratio, sorted the cells via FACS into a 96-well plate, and generated TCR libraries per the kit protocol (see Methods section). These cell lines were chosen because they are known to express different clonotypes that can be easily distinguished. To confirm the success of library amplification and purification, samples were run on a Bioanalyzer. The electropherograms show distinct peaks at ~700 bp and ~900 bp, the expected positions for TCR-β and TCR-α sequences, respectively. The variations in the peak profiles, however, suggest pool-to-pool differences in the abundances of the two chains.
Making accurate clonotype calls in cells
We next set out to establish analysis criteria for the confident calling of clonotypes. This was done by examining the number of reads mapping to the top TCR-α or the top TCR-β clonotype in negative control wells (wells with no cells). Since these wells are truly empty, any reads assigned to them can be considered background. For this experiment, libraries were generated and sequenced for three pools (each containing seven wells with cells and one empty well as a negative control). The FASTQ files for each pool were down-sampled to 300,000 reads and demultiplexed with one error allowed, and individual wells were analyzed using MiXCR 2.0.2. To ensure that our data did not contain any clonotype calls within the background range, we calculated the mean number of clonotype reads in the negative wells (mean = 74 reads) and omitted from further analysis any clonotype call for a cell-containing well that fell within three standard deviations of that mean (mean + 3 SD = 238 reads). The data from cell-containing wells (Figure 6, blue dots on the left) were evaluated using this threshold of 238 reads to make a clonotype call. With the threshold applied (shown by the dashed red line), low-read clonotype calls were eliminated, leaving data that were above the background range (purple dots, right).
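The background cutoff described above (negative-well mean plus three standard deviations) can be sketched as follows. The read counts are hypothetical values chosen for illustration and are not the data behind the reported mean of 74 and threshold of 238. The text does not state whether a sample or population standard deviation was used, so the sample standard deviation here is an assumption, as is treating a call exactly at the threshold as background.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def background_threshold(negative_counts: Sequence[float],
                         n_sd: float = 3.0) -> float:
    """Mean of the negative-well read counts plus n_sd sample SDs."""
    return mean(negative_counts) + n_sd * stdev(negative_counts)


def filter_calls(calls: Dict[str, int], threshold: float) -> Dict[str, int]:
    """Keep only clonotype calls with read counts above the threshold."""
    return {well: n for well, n in calls.items() if n > threshold}


# Hypothetical top-clonotype read counts from empty wells (illustration only).
negative_counts = [20, 40, 60, 80, 100, 120]
threshold = background_threshold(negative_counts)

# Hypothetical top-clonotype read counts from cell-containing wells.
calls = {"A1": 1500, "A2": 90, "A3": 300}
kept = filter_calls(calls, threshold)

if __name__ == "__main__":
    print(f"threshold = {threshold:.1f} reads")
    print("wells retained:", sorted(kept))
```

A fixed multiple of the standard deviation is simple to apply, but it relies on having enough negative wells for the mean and spread to be estimated reliably.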
Analysis of a mixed population of single cells reveals pairing information
Following sequencing of the mixed Jurkat-CCRF pools, we sought to demultiplex the data and determine which cell type was present in each well of the pool, and subsequently which TCR clonotypes were present in each of the cells. The FASTQ files for each pool were down-sampled to 300,000 reads and demultiplexed with one error allowed for barcode assignment, and individual cells were analyzed using MiXCR 2.0.2. The threshold of 238 reads for clonotype calling that was established in previous experiments (Figure 6) was applied to this dataset, and clonotype calls that fell below the threshold were omitted.
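The text does not say which tool was used for down-sampling. As one possible approach, the sketch below randomly subsamples paired-end FASTQ records while keeping read 1 and read 2 in step. The function names are hypothetical, and the records are held in memory, which assumes the input is small enough for that.

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples from a FASTQ handle."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def downsample_pairs(r1_records, r2_records, n, seed=0):
    """Randomly keep n read pairs, preserving input order and R1/R2 pairing."""
    r1 = list(r1_records)
    r2 = list(r2_records)
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of records")
    if n >= len(r1):
        return r1, r2
    # The same indices are used for both mates so pairs stay together.
    keep = sorted(random.Random(seed).sample(range(len(r1)), n))
    return [r1[i] for i in keep], [r2[i] for i in keep]


if __name__ == "__main__":
    # Tiny synthetic input for illustration; real use would open the
    # pool's R1 and R2 FASTQ files and request 300,000 pairs.
    fq1 = "".join(f"@r{i}/1\nACGT\n+\nIIII\n" for i in range(10))
    fq2 = "".join(f"@r{i}/2\nTGCA\n+\nIIII\n" for i in range(10))
    s1, s2 = downsample_pairs(read_fastq(io.StringIO(fq1)),
                              read_fastq(io.StringIO(fq2)), n=4, seed=1)
    for a, b in zip(s1, s2):
        print(a[0], b[0])
```

Down-sampling every pool to the same depth before demultiplexing makes per-well read counts comparable across pools, which matters when a single read-count threshold is applied to all of them.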
When the clonotype sequences were examined to determine which cell type they were characteristic of, 43 wells were assigned as Jurkat and 46 wells were assigned as CCRF-CEM (Figure 7A). This observed distribution corresponds well with the 1:1 cell mixture prepared for these experiments. A total of seven wells could not be assigned a cell type because they did not have clonotype calls for either TCR-α or TCR-β chains with read numbers above the cutoff threshold.
Digging further into the data, we next sought to examine which chains were detected in each cell (Figure 7B, 7C). A total of 38% of the cells were assigned an αβ pairing, 16% of the cells had only an α chain, and 46% of the cells showed only a β chain. A careful comparison of the electropherograms in Figure 4 with the data in Pools 9, 10, and 11 shows a clear correlation between the observed peaks on the trace and the relative abundance of TCR-α or TCR-β clonotypes in the resulting data. For example, Pool 9 was largely composed of cells for which only a β chain was sequenced, and the pool's electropherogram had a predominant peak at ~700 bp, which is characteristic of TCR-β chains. Meanwhile, Pool 11's electropherogram shows peaks for both TCR-α and TCR-β chains, with the TCR-β peak slightly more prominent, as expected for a pool in which eight cells had a sequenced β chain while only five had a sequenced α chain.
It should be noted that for cells where only one chain was detected, the data do not demonstrate that the other chain is absent. One possibility is that, since the cells were not activated, the missing chain was not expressed at high enough levels to be detected. Another is that there were not enough reads (i.e., reads fell below the threshold) to confidently call the clonotype for the other chain. The low input concentration for these single cells may be another contributing factor. The observation of a high proportion of cells containing only TCR-β reads is not unexpected, since the literature indicates that beta chains are more highly expressed.
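The pairing categories discussed above (αβ pair, α only, β only, or no call) follow directly from applying the 238-read threshold to the top TCR-α and TCR-β clonotype of each well. A minimal sketch is shown below. The well names and read counts are hypothetical, and treating a count exactly at the threshold as background is an assumption.

```python
from collections import Counter

THRESHOLD = 238  # reads; the cutoff established from the negative control wells


def pairing_status(alpha_reads: int, beta_reads: int,
                   threshold: int = THRESHOLD) -> str:
    """Classify a well by which chains have a call above the threshold."""
    has_alpha = alpha_reads > threshold
    has_beta = beta_reads > threshold
    if has_alpha and has_beta:
        return "paired"
    if has_alpha:
        return "alpha_only"
    if has_beta:
        return "beta_only"
    return "unassigned"


# Hypothetical (top alpha reads, top beta reads) per well, for illustration.
wells = {
    "A1": (1200, 5400),
    "A2": (50, 3100),
    "A3": (900, 120),
    "A4": (10, 30),
}

summary = Counter(pairing_status(a, b) for a, b in wells.values())

if __name__ == "__main__":
    for status, count in sorted(summary.items()):
        print(f"{status}: {count}")
```

An "alpha_only" or "beta_only" result in this scheme means only that the other chain did not pass the read threshold, which is consistent with the caveats above about expression level and read depth.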
The SMARTer Human scTCR a/b Profiling Kit provides a powerful solution for sensitive profiling of TCR chains from single cells, allowing the identification of alpha-beta chain pairings in each cell. Our validated assay design is optimized to pool 96 samples into 12 sequencing libraries, making it easy to use. In addition, the 12 libraries can be further multiplexed to run in a single flow-cell lane. Sequencing data can be demultiplexed using the SMARTer Human scTCR Demultiplexer software to assign reads and make accurate clonotype calls for each cell. The sensitivity of the kit allowed identification of alpha-beta pairings in Jurkat and CCRF-CEM cells that are in agreement with single-cell sequencing reports by other groups.
Libraries containing TCR-α and TCR-β sequences were generated using the SMARTer Human scTCR a/b Profiling Kit per the protocol given in the user manual. For assay validation experiments, libraries were generated from Control Jurkat Total RNA (Takara Bio). Each pool contained 8 x 5 pg of the Control Jurkat Total RNA, and seven replicate pools were run across three experiments. For the mixed-cell population studies, Jurkat and CCRF-CEM cells (ATCC) were mixed at a 1:1 ratio. Cells were sorted into a 96-well plate using FACS prior to processing with the human scTCR kit.
Samples were pooled at a final concentration of 4 nM. The final library pool was diluted to 13.5 pM, including a 5–10% PhiX Control v3 (Illumina) spike-in for sequencing. While not essential, the addition of the PhiX control allows for detection of sequencing errors and increases the nucleotide diversity and thus aids in high-quality data generation. Sequencing was performed on an Illumina MiSeq sequencer using the 600-cycle MiSeq Reagent Kit v3 (Illumina) with paired-end, 2 x 300 base pair reads.
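The overall dilution from the 4 nM pool to the 13.5 pM loading concentration follows from C1 × V1 = C2 × V2. The sketch below covers the arithmetic only and ignores intermediate steps such as denaturation and the PhiX spike-in. The 600 µL final volume is an example value, not a figure taken from this protocol.

```python
def fold_dilution(stock_pm: float, target_pm: float) -> float:
    """Overall fold dilution needed to go from stock to target concentration."""
    if stock_pm <= 0 or target_pm <= 0:
        raise ValueError("concentrations must be positive")
    return stock_pm / target_pm


def stock_volume_ul(stock_pm: float, target_pm: float,
                    final_volume_ul: float) -> float:
    """Volume of stock needed for a given final volume (C1*V1 = C2*V2)."""
    return final_volume_ul * target_pm / stock_pm


STOCK_PM = 4.0 * 1000   # 4 nM pool expressed in pM
TARGET_PM = 13.5        # loading concentration in pM
FINAL_VOLUME_UL = 600   # example final volume (assumption)

if __name__ == "__main__":
    print(f"fold dilution: {fold_dilution(STOCK_PM, TARGET_PM):.1f}x")
    print(f"stock needed: "
          f"{stock_volume_ul(STOCK_PM, TARGET_PM, FINAL_VOLUME_UL):.3f} uL")
```

Because the stock volume needed for a single-step dilution is very small, dilutions of this magnitude are normally carried out in two or more steps to keep pipetting volumes practical.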
After sequencing, the FASTQ files for each pool were demultiplexed using the SMARTer Human scTCR Demultiplexer (available on the Takara Bio website), and reads were assigned to each in-line index/sample well. Unless otherwise stated, demultiplexing was performed allowing for one error in the in-line index. Repertoire analysis for each sample well was performed using MiXCR 2.0.2 (Bolotin et al. 2015).
Becattini, S. et al. Functional heterogeneity of human memory CD4+ T cell clones primed by pathogens or vaccines. Science 347, 400–406 (2015).
Bolotin, D. A. et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat. Methods 12, 380–381 (2015).
Stubbington, M. J. T. et al. T cell fate and clonality inference from single-cell transcriptomes. Nat. Methods 13, 329–332 (2016).