Unlock insights: Comprehensive genome and transcriptome information from a single cell
Revolutionizing biomarker discovery in cancer research requires deeper insights into metastasis, therapy resistance, and tumor evolution—achievable through single-cell multiomic analysis.
Advancing the understanding of cancer progression through the integrated study of the genome and transcriptome of circulating tumor cells (CTCs) facilitates the development of non-invasive liquid biopsy approaches.
Enabling comprehensive genomic and transcriptomic analysis from single cells, the Embgenix GT-omics Kit and its accompanying bioinformatics tools further advance cancer and CTC research.
Single-cell multiomic approaches, such as combined genomic and transcriptomic analysis, enable precise correlation of genotype and phenotype, offering a detailed view of cellular heterogeneity and unlocking novel biological insights.
Multiomic approaches are transforming oncology by offering deeper insights into cellular heterogeneity, a key driver of cancer evolution, treatment responses, and resistance to therapy (Chakraborty et al., 2024). In this field, circulating tumor cells (CTCs) have become a key focus for researchers investigating the mechanisms of cancer metastasis and those advancing non-invasive liquid biopsy strategies for early cancer detection, patient stratification, and real-time disease monitoring. A recent analysis of CTCs from prostate cancer patients and associated cell models used genomic, transcriptomic, and immunohistochemical (IHC) approaches to identify correlations between polyploidization and novel RNA and protein biomarkers linked to chemotherapy resistance and cancer recurrence (Schmidt et al., 2024).
To advance oncology biomarker discovery, the Embgenix GT-omics Kit combines the reproducibility of PicoPLEX whole-genome amplification (WGA) with the exceptional sensitivity of SMART cDNA synthesis and a streamlined NGS library preparation method. Accompanying bioinformatics tools provide a seamless end-to-end solution for single-cell genome and transcriptome analysis.
Here, we present data demonstrating that the Embgenix GT-omics workflow performs comparably to standalone single-cell DNA-seq and RNA-seq methods, enabling applications in cancer and CTC research, including combined copy number variation (CNV) and differential expression analysis.
Figure 1. Embgenix GT-omics assay workflow. Polyadenylated mRNA is captured using oligo(dT)-coated magnetic beads, while gDNA remains in the supernatant. cDNA is synthesized from bead-bound mRNA, and gDNA undergoes whole-genome amplification in separate tubes, followed by library preparation. RNA-seq and DNA-seq libraries are pooled, sequenced, and analyzed using Embgenix Analysis Software and Cogent NGS software.
Results
Reliable CNV calling from single cells
Figure 2. Single-cell CNV analysis using the Embgenix GT-omics Kit and Embgenix Analysis Software. Panel A. CNV plots displaying normalized counts of sequencing reads mapped to 1 Mb bins across each chromosome for six replicates of the GM08331 cell line. A segmental loss at chromosome 13 was identified in all replicates. Panel B. Automated sample classification, karyotype calls, and corresponding QC metrics for each replicate. Number of total reads denotes the total number of sequencing reads submitted for analysis, while Number of informative reads denotes the number of sequencing reads that were successfully mapped and used for CNV analysis. Derivative log ratio spread (DLRS) quantifies signal noise and serves as a key metric for evaluating whether the data are suitable for accurate CNV analysis. QC status indicates whether a sample met predefined thresholds for informative reads (%) and DLRS, ensuring data quality for downstream analysis.
The Embgenix GT-omics Kit was used to generate DNA-seq and RNA-seq libraries from six single-cell replicates derived from the well-characterized lymphoblastoid cell line, GM08331. DNA-seq libraries were sequenced at a depth of 1.5 million paired-end reads per cell and analyzed using Embgenix Analysis Software (Figure 2, Panels A and B). Sequencing data from all replicates met the software's QC thresholds, and the assay accurately determined the cell line's karyotype—detecting a known segmental aneuploidy, a 12.1 Mb loss at chromosome 13, in each replicate. These results demonstrate that single-cell DNA-seq data generated with the Embgenix GT-omics Kit exhibit accuracy and reproducibility comparable to standalone approaches, enabling reliable CNV analysis with minimal background noise and a low likelihood of false positive calls.
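To illustrate the DLRS metric referenced in Figure 2, the following is a minimal sketch of one common formulation: the standard deviation of differences between adjacent bin log2 ratios, divided by the square root of 2. The formula and thresholds used by Embgenix Analysis Software are not described here, so the function names, the median normalization, and the pseudocount are assumptions made for illustration only, and the bin counts are toy values rather than data from this study.

```python
import math
import statistics


def log2_ratios_from_counts(bin_counts, pseudocount=0.5):
    """Convert per-bin read counts to log2 ratios relative to the median bin.

    Assumption: median normalization with a small pseudocount; real pipelines
    typically also correct for GC content and mappability.
    """
    median_count = statistics.median(bin_counts)
    return [
        math.log2((count + pseudocount) / (median_count + pseudocount))
        for count in bin_counts
    ]


def dlrs(log2_ratios):
    """Derivative log ratio spread for bins listed in genomic order.

    Differences between neighboring bins cancel out the copy number signal
    everywhere except at breakpoints, so their spread mostly reflects noise.
    """
    if len(log2_ratios) < 3:
        raise ValueError("DLRS needs at least three bins")
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    return statistics.stdev(diffs) / math.sqrt(2)


if __name__ == "__main__":
    # Toy profile: ten 1 Mb bins with a three-bin single-copy loss.
    toy_counts = [100, 104, 97, 101, 52, 49, 51, 99, 102, 98]
    ratios = log2_ratios_from_counts(toy_counts)
    print(f"DLRS = {dlrs(ratios):.3f}")
```

Because the metric is built from bin-to-bin differences, a true segmental loss contributes only at its two breakpoints, whereas noisy amplification inflates every difference. Implementations that use an interquartile range instead of the standard deviation are more robust to those breakpoint outliers.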
RNA-seq libraries were sequenced and analyzed at a depth of 4.0 x 10⁶ paired-end reads per cell using the Cogent NGS Analysis Pipeline (CogentAP). The distribution of reads mapping to exonic, intronic, intergenic, ribosomal RNA, and mitochondrial regions was consistent across replicates. Each replicate yielded over 11,150 unique detected genes, with an average of 12,195 genes identified across all six replicates (Figure 3). These results highlight the high quality and sensitivity of single-cell transcriptome data generated using the Embgenix GT-omics Kit.
High-quality single-cell transcriptome data
Figure 3. Single-cell transcriptome analysis using the Embgenix GT-omics Kit and Cogent software. Single-cell RNA-seq data for each of six replicates from the GM08331 cell line. The bar charts depict the distributions of sequencing reads mapped to exonic, intronic, intergenic, ribosomal, and mitochondrial regions for each sample. The number of unique genes detected for each replicate based on mapping of RNA-seq data is shown on the top.
While the analysis of GM08331 cells (Figures 2 and 3) provided insights into data quality obtained with the Embgenix GT-omics Kit and its suitability for CNV characterization, it did not directly assess the accuracy or reproducibility of the corresponding transcriptome data. To address this, synthetic RNA reference standards from the External RNA Controls Consortium (ERCC) were utilized. These standards consisted of 92 distinct polyadenylated RNA species with known sequences, each present at defined concentrations in one of two formulations (Mix1 or Mix2). To simulate real-world sample processing conditions, ERCC standards were combined with five-cell samples derived from the GM05067 cell line at two different dilution levels (low and high), generating four distinct RNA concentration conditions. Samples were processed in triplicate using the Embgenix GT-omics Kit and resulting RNA-seq libraries were sequenced and analyzed at a depth of 4 x 10⁶ reads per sample using a custom analysis pipeline. Measured RNA quantities were compared to expected values to assess the correlation across the four concentration levels, validating the accuracy and reproducibility of the transcriptomic data obtained.
Accurate transcript quantitation at levels relevant to single-cell analysis
Figure 4. Assessing transcriptomic accuracy with synthetic RNA spike-in standards. Panel A. Pearson correlation matrix illustrating the relationships between measured quantities of 92 synthetic ERCC spike-in RNA species across two formulations (Mix1 vs. Mix2). Each spike-in mix was added to GM05067 cells at two dilution levels (low vs. high) in triplicate or measured directly using the Embgenix GT-omics Kit and a custom analysis pipeline. Panel B. Scatter plot comparing the measured fold changes of 92 ERCC spike-in RNA species (Y-axis) to their expected fold changes (X-axis) at four different concentrations. Each dot represents an individual RNA species, with spiked-in concentrations indicated by the color gradient to the right of the plot. The plot was generated using the Embgenix GT-omics Kit and a custom analysis pipeline.
The resulting profiles for synthetic ERCC transcripts exhibited strong linear correlations (R > 0.99 for intra-mix comparisons and R > 0.93 for comparisons with expected ERCC values), as visualized in a Pearson correlation matrix (Figure 4, Panel A), highlighting the high reproducibility provided by the GT-omics assay. Comparison of measured vs. expected ERCC transcript counts at each of four concentrations (Figure 4, Panel B) demonstrated the assay’s ability to detect transcripts at an abundance of 100 copies or more with a sequencing depth of 4 x 10⁶ reads. While measured fold changes did not precisely match expected values, particularly at lower abundance levels, a clear correlation was observed between expected and measured fold changes. These results highlight the ability of the Embgenix GT-omics Kit to generate reliable data for single-cell differential expression analysis, including quantification of low-abundance transcripts.
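The comparison of measured and expected fold changes can be sketched as a Pearson correlation on log2-transformed ratios. The example below is a minimal illustration and not the custom pipeline used in this study. The expected Mix1:Mix2 ratios (4, 1, 0.67, and 0.5) are the nominal subgroup ratios of the ERCC ExFold design, while the measured values and the function names are hypothetical and were chosen only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log2_fold_change_correlation(expected_ratios, measured_ratios):
    """Correlate expected and measured fold changes on the log2 scale."""
    expected_log2 = [math.log2(r) for r in expected_ratios]
    measured_log2 = [math.log2(r) for r in measured_ratios]
    return pearson_r(expected_log2, measured_log2)


if __name__ == "__main__":
    # Nominal ExFold subgroup ratios, two hypothetical transcripts per subgroup.
    expected = [4.0, 4.0, 1.0, 1.0, 0.67, 0.67, 0.5, 0.5]
    # Hypothetical measured ratios, for illustration only.
    measured = [3.1, 4.6, 1.2, 0.9, 0.8, 0.6, 0.45, 0.6]
    r = log2_fold_change_correlation(expected, measured)
    print(f"Pearson r (log2 scale) = {r:.3f}")
```

Working on the log2 scale treats a fourfold increase and a fourfold decrease symmetrically, which keeps the highest ratios from dominating the coefficient.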
Conclusions
Using a previously characterized lymphoblastoid cell line and synthetic RNA reference standards as benchmarks, we demonstrate that the Embgenix GT-omics Kit enables simultaneous genomic and transcriptomic profiling from single cells with sensitivity, accuracy, and reproducibility comparable to standalone methods. This is accomplished through a simple, streamlined workflow compatible with standard molecular biology equipment.
The simultaneous analysis of genomic and transcriptomic profiles at the single-cell level is particularly valuable in cancer research, with CTC studies serving as a prime example. Analysis of genomic alterations such as CNVs provides critical insights into subclonal architecture, aiding in the definition of sublineages and assessment of tumor evolution within heterogeneous CTC populations. Integrating these data with transcriptomic profiling and differential gene expression analysis facilitates the identification of biomarkers and molecular mechanisms linked to disease progression and metastasis, including epithelial-mesenchymal transition (EMT) status, treatment responses, and therapy resistance. A deeper understanding of these processes and their associated biomarkers would support the development of novel therapeutics in addition to non-invasive diagnostic and prognostic assays.
Methods
Samples
Cells isolated from the GM08331 and GM05067 lymphoblastoid cell lines (Coriell Institute for Medical Research) were included in the study. Single cells were sorted into a 96-well PCR plate containing 1X Lysis Buffer from the Embgenix GT-omics Kit using the Sony SH800S Cell Sorter. To assess the transcriptomic accuracy of the method, ERCC ExFold RNA Spike-In Mixes (Thermo Fisher Scientific; Cat. # 4456739) containing blends of 92 different synthetic RNA species in two different formulations were added to the GM05067 cell samples at two different concentrations.
Embgenix GT-omics assay workflow
Lysates were processed according to the Embgenix GT-omics Kit protocol shown in Figure 1. cDNA and WGA products were subjected to enzymatic fragmentation, adapter ligation, and PCR amplification to yield RNA-seq and DNA-seq libraries, respectively. Following clean-up, RNA-seq and DNA-seq libraries were pooled and sequenced. The data were analyzed concurrently to identify genomic variants and perform differential expression analysis.
Sequencing
All libraries were sequenced on an Illumina® NextSeq 550 using 2 x 75 bp paired-end reads with a NextSeq 500/550 Mid Output v2.5 Kit (150 Cycles; Cat. # 20024904).
Analysis software
Following library preparation and sequencing, DNA-seq data were downsampled to 1.5 x 10⁶ paired-end reads per cell and copy number analysis was performed using Embgenix Analysis Software. RNA-seq data were downsampled to 4.0 x 10⁶ paired-end reads per cell and analyzed using the Cogent NGS Analysis Pipeline (CogentAP). For analysis of sequencing data from the ERCC spike-in mixes, Illumina adapter sequences were trimmed using Trimmomatic, and reads were aligned with Bowtie2. Mapped read counts were then used to calculate fold changes between samples using a custom analysis pipeline.
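The final step, deriving fold changes from mapped read counts, can be sketched as follows. The custom pipeline itself is not described in this note, so this is only one plausible approach: counts are scaled to counts per million (CPM) within each sample and compared on the log2 scale with a pseudocount. The function names, the pseudocount value, the choice of normalization, and the transcript identifiers are all assumptions made for illustration.

```python
import math


def counts_per_million(counts):
    """Scale a {transcript: mapped_read_count} mapping to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: value * 1e6 / total for name, value in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(CPM_a / CPM_b) for transcripts present in both samples.

    Assumption: a pseudocount of 1 CPM is added to both values so that
    transcripts with zero reads in one sample do not cause division by zero.
    """
    cpm_a = counts_per_million(counts_a)
    cpm_b = counts_per_million(counts_b)
    shared = sorted(set(cpm_a) & set(cpm_b))
    return {
        name: math.log2((cpm_a[name] + pseudocount) / (cpm_b[name] + pseudocount))
        for name in shared
    }


if __name__ == "__main__":
    # Hypothetical mapped read counts for two samples (not data from this study).
    sample_mix1 = {"spike_A": 400, "spike_B": 600, "spike_C": 0}
    sample_mix2 = {"spike_A": 100, "spike_B": 880, "spike_C": 20}
    for name, lfc in log2_fold_changes(sample_mix1, sample_mix2).items():
        print(f"{name}: log2 fold change = {lfc:.2f}")
```

Whether to normalize against spike-in reads alone or against all mapped reads in the library is a design choice that changes the resulting ratios, and the pseudocount compresses fold changes for the lowest-abundance transcripts.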
References
Chakraborty, S. et al. Multi-OMICS approaches in cancer biology: New era in cancer therapy. BBA Molecular Basis of Disease 1870, 5, 167120 (2024).
Schmidt, M.J. et al. Polyploid cancer cells reveal signatures of chemotherapy resistance. Oncogene 44, 439–449 (2025).