CogentAP FAQs
Refer to the FAQs below for best practices and helpful tips on Cogent NGS Analysis Pipeline (CogentAP).
FAQs:
Introduction
What is Cogent NGS Analysis Pipeline?
Cogent NGS Analysis Pipeline (CogentAP) is a command-line tool that uses Nextflow as its pipeline manager. It analyzes single-cell RNA-seq, bulk RNA-seq, and single-cell DNA-seq data generated with Takara Bio next-generation sequencing (NGS) reagent kits on Illumina® sequencing platforms. For a complete list of compatible kits, see Table 1 of the Cogent NGS Analysis Pipeline User Manual.
CogentAP processes FASTQ files and outputs a .rds object. For RNA-seq data, processing involves trimming, alignment, and quantification. For DNA-seq data, it involves trimming, alignment, and copy number variant (CNV) analysis. The .rds output from CogentAP can be input into Cogent NGS Discovery Software (CogentDS) for additional data processing and visualization.
Installation and running the tool
Which operating systems are compatible with CogentAP?
CogentAP is designed to run on a Linux server. The following versions are supported:
- CentOS 8 or higher
- Red Hat 8 or higher
- Ubuntu 18.04 or higher
Can I use HPC for installing and running CogentAP?
CogentAP is currently supported only on a single-node Linux server. Although it can be run on a high-performance computing (HPC) system, it has not been tested in that environment and is therefore not supported.
Does CogentAP support parallel processing?
CogentAP runs on a single node. Module-specific memory and CPU settings (e.g., for STAR or Salmon) are defined in COGENT_AP_HOME/config/nextflow/process.config and are applied when each module in the CogentAP workflow is launched. During execution, Nextflow detects the available system cores and automatically parallelizes tasks across them, within the resource limits specified in that file.
I have the older version of CogentAP. Should I uninstall it before installing the latest version?
It is advisable to uninstall the older version of CogentAP before installing the newer one to avoid version conflicts among the required packages.
Getting started
Does CogentAP support short-read or long-read data?
CogentAP supports short-read sequencing data from Illumina’s MiniSeq®, MiSeq®, NextSeq®, HiSeq®, and NovaSeq® platforms. For RNA-seq, short reads of 50, 75, 100, and 150 bp can be used. For DNA-seq, analysis is restricted to read lengths of 75 and 150 bp. CogentAP does not support long-read data from PacBio or Oxford Nanopore technologies.
Can I input BCL files in CogentAP?
CogentAP takes raw FASTQ files with index sequences in the header as input. It does not take binary base call (BCL) files as input. You will need to convert your BCL files to FASTQ using either bcl2fastq or BCL Convert. The resulting raw R1 and R2 (undetermined) files can be input into CogentAP.
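A minimal sketch of the conversion step, assuming a bash shell and that bcl2fastq or bcl-convert (the BCL Convert executable) is on your PATH. The function name convert_run and the example paths are hypothetical; the flags shown are the standard run-folder, output-directory, and sample-sheet options of each tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an Illumina run folder to FASTQ so that the
# undetermined R1/R2 files can be passed to CogentAP.
# Usage: convert_run <bcl2fastq|bcl-convert> RUN_DIR OUT_DIR SAMPLE_SHEET
convert_run() {
  local tool="$1" run_dir="$2" out_dir="$3" sheet="$4"
  case "$tool" in
    bcl2fastq)
      bcl2fastq --runfolder-dir "$run_dir" \
                --output-dir "$out_dir" \
                --sample-sheet "$sheet"
      ;;
    bcl-convert)
      bcl-convert --bcl-input-directory "$run_dir" \
                  --output-directory "$out_dir" \
                  --sample-sheet "$sheet"
      ;;
    *)
      echo "unknown tool: $tool (expected bcl2fastq or bcl-convert)" >&2
      return 1
      ;;
  esac
}

# Example (placeholder paths; use the dummy sample sheet shipped with CogentAP):
# convert_run bcl2fastq /data/runs/RUN_FOLDER /data/fastq /path/to/dummy_sample_sheet.csv
```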
Where can I find the dummy sample sheet for use with bcl2fastq or BCL Convert?
The dummy sample sheet for use with bcl2fastq and BCL Convert (for NextSeq 2000 and NovaSeq data) is available in the config folder of the directory where CogentAP is installed.
In the dummy sample sheet, the read cycle count is listed as 76. However, I am sequencing with a read length of 150 bp. Do I need to change 76 to 150 in the sample sheet to reflect my sequencing run?
When using bcl2fastq, you do not need to change the parameters in the dummy sample sheet, because bcl2fastq determines the sequencing parameters from the XML files in the run folder.
However, in the case of BCL Convert, the dummy sample sheet SampleSheet_dummy_bclconvert_NovaSeq.csv must exactly match the parameters from the RunInfo.xml file to avoid errors.
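Because BCL Convert requires the dummy sample sheet to match RunInfo.xml, it can help to list the read structure recorded for the run before editing the sheet. The sketch below assumes each read appears as a single-line <Read .../> element carrying Number, NumCycles, and IsIndexedRead attributes, as in typical RunInfo.xml files; the function name runinfo_reads is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the reads described in an Illumina RunInfo.xml.
# Assumes one <Read .../> element per line; attribute order may vary.
# Usage: runinfo_reads RUNINFO_XML
runinfo_reads() {
  local xml="$1" elem num cycles indexed
  grep -o '<Read [^>]*>' "$xml" | while IFS= read -r elem; do
    num=$(printf '%s\n' "$elem" | sed -n 's/.*Number="\([0-9]*\)".*/\1/p')
    cycles=$(printf '%s\n' "$elem" | sed -n 's/.*NumCycles="\([0-9]*\)".*/\1/p')
    indexed=$(printf '%s\n' "$elem" | sed -n 's/.*IsIndexedRead="\([YN]\)".*/\1/p')
    printf 'Read%s\t%s cycles\tindexed=%s\n' "$num" "$cycles" "$indexed"
  done
}
```

Compare the cycle counts printed for each read with the read and index cycle settings in SampleSheet_dummy_bclconvert_NovaSeq.csv.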
The sequencing core facility at my institute provided me with demultiplexed FASTQ files instead of BCL files. Can I bypass the demultiplexer (demuxer) step and move straight to the analyze step?
Input into CogentAP consists of a pair of raw, non-demultiplexed FASTQ files. CogentAP demultiplexes these into barcode-level gzipped FASTQ files; during this step, the demultiplexer also extracts each read's barcode and appends it to the end of the read name. These modified, demultiplexed files serve as input to the CogentAP analyzer. Files demultiplexed by any other software are not recognized by CogentAP.
If your FASTQ files have already been demultiplexed by Illumina barcodes, combine all R1 files and all R2 files separately using the commands below:
cat *R1* > combo_R1.fastq.gz
cat *R2* > combo_R2.fastq.gz
These cat commands assume that all of your FASTQ files are in the current directory; adjust the paths if FASTQ files for different samples are stored in separate locations. Demux can then be performed in CogentAP using the concatenated combo FASTQ files and the sample sheet from Illumina. In this case, use the --random_pick flag to ensure that all barcodes are represented.
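A slightly more defensive version of the concatenation, as a sketch. It assumes Illumina-style file names (for example, SampleA_S1_L001_R1_001.fastq.gz), matches only *_R1_* and *_R2_* so that a combo file left over from a previous run is never concatenated into itself, and checks that the R1 and R2 lists pair up file by file. Concatenating gzip files directly is valid because a sequence of gzip members is itself a valid gzip stream. The function name combine_fastqs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge per-sample FASTQ files into one R1 and one R2
# file for CogentAP demux.
# Usage: combine_fastqs INPUT_DIR OUTPUT_PREFIX
combine_fastqs() {
  local dir="$1" prefix="$2" i
  local -a r1 r2
  shopt -s nullglob
  r1=( "$dir"/*_R1_*.fastq.gz )
  r2=( "$dir"/*_R2_*.fastq.gz )
  shopt -u nullglob

  if (( ${#r1[@]} == 0 || ${#r1[@]} != ${#r2[@]} )); then
    echo "need equal, non-zero numbers of R1 and R2 files in $dir" >&2
    return 1
  fi

  # Globs expand in sorted order; confirm each R2 is the mate of the R1 at
  # the same position so reads stay paired after concatenation.
  for (( i = 0; i < ${#r1[@]}; i++ )); do
    if [[ "${r1[i]/_R1_/_R2_}" != "${r2[i]}" ]]; then
      echo "unpaired files: ${r1[i]} / ${r2[i]}" >&2
      return 1
    fi
  done

  # gzip members can be concatenated as-is; no recompression is needed.
  cat "${r1[@]}" > "${prefix}_R1.fastq.gz"
  cat "${r2[@]}" > "${prefix}_R2.fastq.gz"
}

# Example: combine_fastqs . combo   # writes combo_R1.fastq.gz and combo_R2.fastq.gz
```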
I have a single sample and do not need to demux. Can I directly input the single set of files and proceed with CogentAP analyze?
During demultiplexing, the CogentAP demultiplexer extracts the barcode and appends it to the end of each read name in the FASTQ files. These modified files serve as input to the CogentAP analyzer. If demux is skipped for a single barcode, this modification does not occur and CogentAP analyze will not function correctly.
Can I perform RNA-seq analysis for a single barcode using CogentAP?
Seurat object creation fails when there is a single barcode. A single sample (for bulk) or a single barcode (for single cell) therefore cannot be run in CogentAP. The downstream analysis with CogentDS also requires a minimum of 20 barcodes.
What is the format of a well list used during demux?
For the demux step, you will need to use the undetermined paired-end raw FASTQ files and a well list. A well list is essentially a 2-column tab-separated text file with sample names in one column and the i7+i5 index in the other column, as shown in the example below:
Sample	Barcode
WT	CGTTGGTT+AACCGGTT
WT	CGTTGGTT+TCTAGGTT
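A small validation sketch, assuming the layout described above: a tab-separated file with an optional Sample/Barcode header row, sample names in the first column, and i7+i5 barcodes in the second. The function name check_well_list is hypothetical. A stray comma instead of a tab, or a Windows line ending (which leaves a trailing carriage return on each barcode), is reported as an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a well list before demux.
# Usage: check_well_list WELL_LIST   (exit status 0 = no problems found)
check_well_list() {
  awk -F'\t' '
    # Skip an optional header row.
    NR == 1 && $1 == "Sample" && $2 == "Barcode" { next }
    NF != 2 {
      printf "line %d: expected 2 tab-separated columns, found %d\n", NR, NF
      bad = 1
      next
    }
    # Barcode must be i7+i5 using only A, C, G, T or N.
    $2 !~ /^[ACGTN]+[+][ACGTN]+$/ {
      printf "line %d: malformed barcode \"%s\"\n", NR, $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```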
After demux, why are all my reads going into undetermined FASTQ?
If the index barcodes in your well list are correct, CogentAP demux demultiplexes the raw data into sample-barcoded FASTQ files, allowing one mismatch in the barcodes. If the barcodes are incorrect, however, all of your reads may end up in the Undetermined FASTQ files. Check that the barcode index sequences in the well list correspond to those used in the experiment.
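To illustrate what "one mismatch" means, the sketch below counts differing positions (the Hamming distance) between an observed index and an expected barcode. It is not CogentAP's implementation, and this FAQ does not say whether the allowed mismatch applies to each index separately or to the combined i7+i5 string; the sketch compares the combined string. The function names hamming and within_one_mismatch are hypothetical.

```bash
#!/usr/bin/env bash
# Hamming distance between two equal-length strings; prints -1 if lengths differ.
hamming() {
  local a="$1" b="$2" i d=0
  if (( ${#a} != ${#b} )); then
    echo -1
    return
  fi
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ "${a:i:1}" != "${b:i:1}" ]]; then
      d=$(( d + 1 ))
    fi
  done
  echo "$d"
}

# Succeeds when OBSERVED is within one mismatch of EXPECTED.
# Usage: within_one_mismatch OBSERVED EXPECTED
within_one_mismatch() {
  local d
  d=$(hamming "$1" "$2")
  (( d >= 0 && d <= 1 ))
}
```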
Do I need to manually set the index orientation when starting demux?
Orientation of the index sequences depends on the sequencer used. CogentAP will automatically detect i7/i5 orientation and there is no need to manually set it. However, parameters --i7_rc and --i5_rc can be set to false to override auto-detection.
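CogentAP detects index orientation automatically, but when troubleshooting it can be useful to check by hand which orientation of an i5 sequence appears in your FASTQ headers: depending on the instrument, the i5 index may be reported as written in the well list or as its reverse complement. A minimal reverse-complement helper in bash (the name revcomp is hypothetical):

```bash
#!/usr/bin/env bash
# Reverse-complement a DNA sequence (A<->T, C<->G; N is unchanged).
revcomp() {
  local s="$1" out="" i
  for (( i = ${#s} - 1; i >= 0; i-- )); do
    out+="${s:i:1}"
  done
  printf '%s\n' "$out" | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Example: count headers (first 10,000 reads) ending in each i5 orientation.
# i5=TCTAGGTT
# gzip -dc combo_R1.fastq.gz | head -n 40000 | awk 'NR % 4 == 1' | grep -c "+${i5}\$"
# gzip -dc combo_R1.fastq.gz | head -n 40000 | awk 'NR % 4 == 1' | grep -c "+$(revcomp "$i5")\$"
```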
I have single-end data. Can I duplicate the single-end read and create pseudo paired-end data to input into CogentAP?
CogentAP does not support single-end data. A pseudo paired-end read will not be recognized.
Not all of my barcodes are represented after running "rna demux". FASTQ files were generated for only 2 of my 4 barcodes.
By default, CogentAP estimates barcodes from the first 200 million reads of the input. If your FASTQ files were already demultiplexed and then concatenated for input into CogentAP, the reads are grouped by barcode. In that case, if the first 2 barcodes account for approximately 200 million reads, the last 2 barcodes will not be detected. To capture all barcodes (especially low-abundance barcodes), use the --random_pick parameter during CogentAP demux. The --random_pick parameter, which is disabled by default, picks reads at random for barcode estimation.
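To see why grouped input is a problem, you can tally the index sequences in the first N reads of a concatenated file. This sketch assumes standard Illumina read headers, in which the index (i7+i5) is the last colon-separated field; it only illustrates the sampling effect and is not how CogentAP counts barcodes internally. The function name index_tally is hypothetical.

```bash
#!/usr/bin/env bash
# Count index sequences among the first N reads of a gzipped FASTQ file,
# most frequent first.
# Usage: index_tally FASTQ_GZ N_READS
index_tally() {
  local fastq="$1" n_reads="$2"
  gzip -dc "$fastq" \
    | head -n $(( 4 * n_reads )) \
    | awk 'NR % 4 == 1 { n = split($0, f, ":"); print f[n] }' \
    | sort | uniq -c | sort -k1,1nr
}
```

On a file built by concatenating per-sample FASTQ files, a small N typically shows only the barcodes of the first samples in the file.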
What is a dry run in the demux step for Shasta Total RNA-Seq experiments?
In Shasta Total RNA-Seq experiments, the high throughput makes a dry run necessary to identify which barcodes to carry forward to downstream analysis. A dry run is performed with the --dry_run flag, which outputs estimated counts for all barcodes without writing demultiplexed FASTQ files. These estimated counts can then be used in CogentDS to generate a barcode rank plot (knee plot). Based on the inflection point of the knee plot, you can decide, for the demux step in CogentAP, either to use a specified number of barcodes or to retain or discard barcodes based on a specified number of reads.
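Once you have estimated counts from a dry run, the two selection strategies described above (keep the top N barcodes, or keep barcodes above a read threshold) can be sketched as follows. The input layout, one barcode and its estimated count per tab-separated line with no header, is an assumption for illustration and may not match the actual dry-run output; the function name select_barcodes is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical input: "barcode<TAB>estimated_count", one barcode per line, no header.
# Usage: select_barcodes COUNTS_FILE top N     -> the N barcodes with the most reads
#        select_barcodes COUNTS_FILE min READS -> barcodes with at least READS reads
select_barcodes() {
  local counts="$1" mode="$2" value="$3"
  case "$mode" in
    top) sort -t $'\t' -k2,2nr "$counts" | head -n "$value" | cut -f1 ;;
    min) awk -F'\t' -v m="$value" '$2 + 0 >= m + 0 { print $1 }' "$counts" ;;
    *)   echo "mode must be 'top' or 'min'" >&2; return 1 ;;
  esac
}
```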
How do I optimize the demux process?
For the demux workflow, the number of CPUs used on a single node (n_processes) is set to 15 by default. You can increase it to suit your system, either in the COGENT_AP_HOME/config/nextflow/params.config file or with the --n_processes flag when running cogent demux.
Internally, these CPUs are further divided into workers and writers, using the formula:
n_workers = n_processes - n_writers - 1
The workflow aims for an approximately 1:1 split between workers and writers. Too few workers with too many writers slows barcode processing; too few writers with too many workers slows the writing of demultiplexed files. n_writers can be changed in COGENT_AP_HOME/config/nextflow/params.config or specified with the --n_writers flag when running the demux command.
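The formula above can be turned into a quick calculation of an approximately 1:1 split for a given n_processes. This is a sketch only: the function name suggest_split is hypothetical, and CogentAP's own default for n_writers is not stated here. Keep n_processes at or below the number of cores reported by nproc.

```bash
#!/usr/bin/env bash
# Suggest n_writers/n_workers for a given n_processes, using
# n_workers = n_processes - n_writers - 1 and aiming for workers ~= writers.
# Usage: suggest_split N_PROCESSES
suggest_split() {
  local n_processes="$1"
  if (( n_processes < 3 )); then
    echo "n_processes must be at least 3 so both counts are positive" >&2
    return 1
  fi
  local n_writers=$(( (n_processes - 1) / 2 ))
  local n_workers=$(( n_processes - n_writers - 1 ))
  printf 'n_writers=%d n_workers=%d\n' "$n_writers" "$n_workers"
}

# Example: suggest_split 16 prints n_writers=7 n_workers=8,
# i.e. pass --n_processes 16 --n_writers 7 to cogent demux.
```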
Is the whole-genome analysis supported by CogentAP specific to single cells?
Yes. The Shasta Whole-Genome Amplification Kit supported by CogentAP is specific for single-cell analysis.
Analysis
Which genomes are supported by CogentAP?
Pre-indexed human (hg38) and mouse (mm39) genome builds are supported. These are downloaded and installed during CogentAP setup. Custom genomes for other organisms can also be added; refer to the “Adding a genome build” section of the manual for more details. Immune profiling and gene fusion functionalities are not supported for custom genomes.
Does CogentAP analyze retain intermediate files produced during the analysis step?
Intermediate files (*.fastq, *.bam, count_matrices, *.sf) are stored in the standard RNA-seq analysis workflow. However, they are not stored during Shasta Total RNA-Seq analysis because of their large size and the increased processing time. To retain the intermediate files produced by the default RNA-seq analyze workflow for Shasta Total RNA-Seq, use the --keep_intermediate option.
How is the percent rRNA estimated and ribodepletion done in CogentAP?
Ribodepletion is performed computationally using SortMeRNA. The analyze stats file reports the total number of rRNA reads, and the percentage is given in the HTML output report.
Is ribodepletion a default parameter or does it have to be explicitly specified?
Ribodepletion is enabled for all RNA-seq kits by default, except for Shasta Total RNA-Seq. It can be enabled manually using the --ribodepletion true option.
Are the analyze stats computed considering rRNA?
The analyze stats are generated after filtering out the rRNAs using the SortMeRNA algorithm.
How is the genome alignment done?
For RNA-seq, alignment is done using the STAR aligner. STAR indexing creates both gene- and transcript-based maps, and CogentAP alignment generates .Aligned.out.bam and .Aligned.toTranscriptome.out.bam files.
For DNA-seq experiments, Bowtie2 is used for alignment.
For UMI-based analysis, what information do the deduplicated BAM files contain?
BAM files produced by STAR alignment (transcriptome BAM) are processed with Samtools to extract unique molecular identifier (UMI) tags and append them to the reads. The STAR-aligned BAM files are then deduplicated with UMI-tools to generate deduplicated BAM files, which contain a single representative read for each combination of UMI and unique start/stop position (USS).
Is there an appropriate method to normalize the deduplicated BAM files and make the samples comparable?
The deduplicated BAM files are passed to Salmon (salmon quant), which outputs raw count matrices without normalization. To normalize, export the .rds object from CogentAP and import it into CogentDS, which offers various normalization options.
I need Reads Per Kilobase per Million mapped reads (RPKM) values. Does CogentAP calculate RPKM?
CogentAP uses Salmon for transcript-level quantification and summarizes the data to gene-level counts. It does not support RPKM normalization. Downstream of CogentAP, CogentDS accepts .rds or .csv files containing read count data and provides normalization options for both single-cell and bulk experiments.
If RPKM values are required, they can be computed using the formula:
RPKM = NumReads / ((geneLength / 1000) * (totalNumReads / 1,000,000))
Where:
- NumReads = number of reads mapped to the gene
- geneLength = length of the gene, in bases
- totalNumReads = total number of mapped reads in the sample
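The formula above can be applied to a count table with awk. This is a sketch under stated assumptions: the tab-separated input (gene, gene length in bases, read count, no header) is hypothetical, gene lengths must come from your own annotation, and totalNumReads is taken as the sum of the counts in the table, which equals the sample's mapped reads only if every mapped read is counted there. The function name rpkm is hypothetical.

```bash
#!/usr/bin/env bash
# Compute RPKM per gene from "gene<TAB>length_bp<TAB>read_count" lines.
# Usage: rpkm COUNTS_TABLE
rpkm() {
  awk -F'\t' '
    # First pass: total reads in the sample (sum of the count column).
    NR == FNR { total += $3; next }
    FNR == 1 && total == 0 { print "total read count is zero" > "/dev/stderr"; exit 1 }
    # Second pass: RPKM = reads / ((length / 1000) * (total / 1e6)).
    { printf "%s\t%.2f\n", $1, $3 / (($2 / 1000) * (total / 1e6)) }
  ' "$1" "$1"
}
```

For example, for geneA (2,000 bp, 100 reads) and geneB (1,000 bp, 300 reads), totalNumReads is 400, giving RPKM values of 125,000 for geneA and 750,000 for geneB.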
What is the resolution of CNV detection using Ginkgo in CogentAP?
The detectable size range is determined by the bin size selected for analysis and the minimum number of bins per segment. The CogentAP pipeline supports 500 kb and 1 Mb bins, and at least 5 consecutive bins are required for a CNV to be detected. The smallest CNV the pipeline can theoretically detect is therefore 2.5 Mb (5 × 500 kb). The default is 5 Mb, which corresponds to 5 × 1 Mb bins.
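The minimum detectable CNV size is the bin size multiplied by the minimum number of consecutive bins. A tiny sketch of that arithmetic (the function name min_cnv_size is hypothetical):

```bash
#!/usr/bin/env bash
# Minimum detectable CNV size = bin size (bp) x minimum consecutive bins (default 5).
# Usage: min_cnv_size BIN_SIZE_BP [MIN_BINS]
min_cnv_size() {
  awk -v bin="$1" -v n="${2:-5}" 'BEGIN { printf "%.1f Mb\n", bin * n / 1e6 }'
}

# for bin in 500000 1000000; do echo "$bin bp bins: $(min_cnv_size "$bin")"; done
# 500000 bp bins: 2.5 Mb
# 1000000 bp bins: 5.0 Mb
```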
Advanced questions
Our company doesn’t have an Anaconda license. Is it possible to install the software without Anaconda?
Miniforge is a minimal installer for conda (an open-source package and environment management system) that is lighter than Anaconda and uses conda-forge as its primary package channel instead of the Anaconda “defaults” channel. Visit https://github.com/conda-forge/miniforge for instructions on installing Miniforge. With Miniforge, CogentAP setup installs its dependencies from the conda-forge and bioconda channels. Refer to the CogentAP user manual for further instructions.
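As a sketch, the download and install commands below follow the pattern given in the conda-forge/miniforge README; check that page for the current instructions. The helper name miniforge_installer_url is hypothetical. After installation, conda config --show channels lets you confirm that conda-forge is the configured channel.

```bash
#!/usr/bin/env bash
# Build the URL of the latest Miniforge installer for this OS/architecture.
# Usage: miniforge_installer_url [OS] [ARCH]   (defaults: uname, uname -m)
miniforge_installer_url() {
  local os="${1:-$(uname)}" arch="${2:-$(uname -m)}"
  echo "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-${os}-${arch}.sh"
}

# Example installation (interactive; review the installer prompts):
# curl -L -O "$(miniforge_installer_url)"
# bash "Miniforge3-$(uname)-$(uname -m).sh"
# conda config --show channels
```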
The CogentAP user manual recommends uninstalling existing conda and installing a new version before installing CogentAP. Since I need conda for my other applications, can I bypass this requirement?
This would involve setting up the bash environment and removing all paths related to the existing conda installation. Please contact bioinformatics_support@takarabio.com for more details.
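As a rough illustration of removing paths related to an existing conda installation for the current session only, the sketch below prints PATH with every entry containing "conda" or "miniforge" removed. It does not uninstall anything or edit startup files such as ~/.bashrc, where conda init typically adds a block between "# >>> conda initialize >>>" and "# <<< conda initialize <<<" markers. The function name strip_conda_from_path is hypothetical.

```bash
#!/usr/bin/env bash
# Print PATH (or the given PATH-like string) without conda-related entries.
# Usage: strip_conda_from_path [PATH_STRING]
strip_conda_from_path() {
  local path_in="${1:-$PATH}" entry out=""
  local -a parts
  IFS=':' read -ra parts <<< "$path_in"
  for entry in "${parts[@]}"; do
    case "$entry" in
      *conda*|*miniforge*) ;;   # skip anaconda3, miniconda3, condabin, ...
      *) out="${out:+$out:}$entry" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Apply to the current shell session only:
# export PATH="$(strip_conda_from_path)"
```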
Product information
Cogent NGS Analysis Pipeline takes input data from sequencing and a well-list file and outputs an HTML report and other files, such as a gene matrix, to continue further analysis.
Takara Bio USA, Inc.
United States/Canada: +1.800.662.2566 • Asia Pacific: +1.650.919.7300 • Europe: +33.(0)1.3904.6880 • Japan: +81.(0)77.565.6999
FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES. © 2026 Takara Bio Inc. All Rights Reserved. All trademarks are the property of Takara Bio Inc. or its affiliate(s) in the U.S. and/or other countries or their respective owners. Certain trademarks may not be registered in all jurisdictions. Additional product, intellectual property, and restricted use information is available at takarabio.com.

