Cogent NGS Immune Profiler Software notices
The issues described below for the Cogent NGS Immune Profiler Software are ones that have been seen or reported. If you encounter other errors using Immune Profiler or are unable to resolve the problem using the suggestions below, please:
- Capture a screenshot of the text of the error you see on the screen.
- Gather the relevant log file(s). Refer to the FAQ entry below or the Cogent NGS Immune Profiler User Manual, Appendix B, for more detailed information about the logs.
- Send the screenshot and gathered log file(s) to Technical Support along with a brief description of the issue.
Software issues FAQs
Common error messages on Linux
SyntaxError: invalid syntax
Problem description
If the script is called at the command line in the format python immune_profiler.py [variables], the message "SyntaxError: invalid syntax" is displayed.
Cause
Python 2 is the default Python version on the machine, so the python command invokes a Python 2 interpreter.
Fix
Use the syntax: python3 immune_profiler.py [variables]
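To confirm which major version a given interpreter provides, a minimal sketch follows. The helper name is_python3 is hypothetical and is not part of Immune Profiler; it only inspects the standard sys.version_info value.

```python
import sys


def is_python3(version_info=None):
    """Return True if the given (or the running) interpreter is Python 3 or later.

    is_python3 is a hypothetical helper name, not part of Immune Profiler.
    """
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] >= 3


if __name__ == "__main__":
    # Show which interpreter is running and whether it is Python 3 or later.
    print(sys.executable, "Python 3 or later:", is_python3())
```

Saving this to a file and running it once with python and once with python3 shows which of the two commands resolves to a Python 3 interpreter on the machine.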
immune_profiler.py: error: the following arguments are required
Problem description
The immune_profiler.py script is returning a message similar to the following:
"immune_profiler.py: error: the following arguments are required: [variables]"
Cause
Immune Profiler has five required arguments:
Flag | Parameter input |
-f | fastq_dir |
-m | meta_file |
-o | output_name |
-r | receptor_type |
-t | target_region |
If any of the required arguments is missing, this error is displayed and indicates which argument is missing.
Fix
Make sure to specify all five required arguments for the script command. Please see the Quick Start Guide or User Manual if more information is needed.
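As a rough illustration of the check described above, the sketch below lists which of the five required flags are absent from a command before it is run. The flags and parameter names come from the table in this entry; the helper names (missing_required_flags, build_command) and the angle-bracket placeholder values are hypothetical, and the accepted values for -r and -t should be taken from the User Manual.

```python
# Required flags and their parameter names, as listed in the table above.
REQUIRED_FLAGS = {
    "-f": "fastq_dir",
    "-m": "meta_file",
    "-o": "output_name",
    "-r": "receptor_type",
    "-t": "target_region",
}


def missing_required_flags(argv):
    """Return the required flags that do not appear in argv, in table order."""
    present = set(argv)
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


def build_command(values):
    """Build the python3 command as a list from a {flag: value} mapping.

    Raises KeyError if a required flag has no value in the mapping.
    """
    command = ["python3", "immune_profiler.py"]
    for flag in REQUIRED_FLAGS:
        command += [flag, values[flag]]
    return command


# Placeholder values only; substitute real paths and options.
example = {flag: "<%s>" % name for flag, name in REQUIRED_FLAGS.items()}
print(" ".join(build_command(example)))
print(missing_required_flags(["-f", "<fastq_dir>", "-m", "<meta_file>"]))
```

The second print call reports the three flags that the incomplete argument list lacks, which mirrors the information given in the error message.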
IndexError: list index out of range
Problem description
The immune_profiler.py script is returning the message:
"IndexError: list index out of range"
Cause
This error might be seen if the value of the -o parameter includes a directory path in addition to the output name. The output name specified by -o must be an alphanumeric string with no special characters other than hyphens.
Fix
Change the value of the -o parameter to the output name only, with no directory path in it.
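The rule for the -o value can be checked before running the script. The sketch below is one possible check, assuming that "alphanumeric" means the ASCII letters A-Z, a-z, and digits 0-9; the helper names and the example value results/run-01 are hypothetical.

```python
import os
import re

# Assumption: "alphanumeric" means ASCII letters and digits; hyphens are also allowed.
_VALID_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_valid_output_name(name):
    """Return True if name contains only letters, digits, and hyphens."""
    return _VALID_NAME.fullmatch(name) is not None


def strip_directory(value):
    """Drop any leading directory path, keeping only the final component."""
    return os.path.basename(value.rstrip("/"))


candidate = "results/run-01"  # hypothetical value that includes a directory path
print(is_valid_output_name(candidate))                   # False: contains "/"
print(is_valid_output_name(strip_directory(candidate)))  # True: "run-01"
```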
[ERROR] Found duplicate sample ID
Problem description
The immune_profiler.py script is returning the message:
"[ERROR] Found duplicate sample ID: [variable]"
Cause
A duplicate sample ID value was detected in the metadata file.
Fix
Use the ID value stated in the error message to identify the duplicate in the metadata CSV file and update it to a value that would not duplicate any other values in the file.
[ERROR] Found _ in sample ID
Problem description
The immune_profiler.py script is returning the message:
"[ERROR] Found _ in sample ID: [variable]"
Cause
A sample ID in the metadata CSV file was found to include an underscore character (_).
Fix
Edit the metadata file to change the sample ID name to remove the underscore. Valid characters are alphanumerics and hyphens; no other special characters are allowed.
[WARN] Delete empty sample ID in meta file
Problem description
The immune_profiler.py script is returning the message:
"[WARN] Delete empty sample ID in meta file: [variable]"
Cause
The script found one or more empty lines in the metadata CSV file.
Fix
No action is required; this is a warning message only. In addition to displaying the warning, the script deletes the empty line from the file and reports the actual number of samples to process (non-empty rows).
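The three metadata conditions described above (duplicate sample IDs, invalid characters such as underscores, and empty rows) can be checked before a run. The sketch below is not the check Immune Profiler itself performs. It assumes the metadata CSV has a header row and that the caller supplies the name of the sample ID column, because the column layout is not described here; all function names are hypothetical.

```python
import csv
import re
from collections import Counter

# Valid sample IDs: alphanumerics and hyphens only (ASCII assumed).
_VALID_ID = re.compile(r"[A-Za-z0-9-]+")


def read_column(csv_path, column_name):
    """Read one column from a CSV file that has a header row.

    column_name is supplied by the caller; this sketch does not know the
    real metadata column names.
    """
    with open(csv_path, newline="") as handle:
        return [(row.get(column_name) or "").strip() for row in csv.DictReader(handle)]


def check_sample_ids(sample_ids):
    """Summarize empty rows, duplicate IDs, and IDs with invalid characters."""
    non_empty = [sample_id for sample_id in sample_ids if sample_id]
    counts = Counter(non_empty)
    return {
        "empty_rows": len(sample_ids) - len(non_empty),
        "duplicates": sorted(s for s, n in counts.items() if n > 1),
        "invalid": sorted({s for s in non_empty if not _VALID_ID.fullmatch(s)}),
    }


# Hypothetical IDs: one duplicate, one underscore, one empty row.
print(check_sample_ids(["s1", "s2", "s1", "bad_id", ""]))
```

Any ID listed under "duplicates" or "invalid" would need to be edited in the metadata file, whereas empty rows only produce the warning.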
[ERROR] The input FASTQ: [filename] is not found.
Problem description
The immune_profiler.py script is returning the message:
"[ERROR] The input FASTQ: [filename] is not found."
Cause
The FASTQ file name(s) configured in the metadata CSV file do not match or do not exist in the directory specified by the path parameter of the -f flag of the script.
Fix
- If the path statement to the FASTQ files is incorrect in the script command, correct the path value.
- Check the name of the FASTQ file(s) listed in the metadata file, and update the FASTQ file name(s) to exactly match between the metadata file configuration and the actual file name.
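To find which file names fail to match, the names listed in the metadata file can be compared with the actual directory contents. The sketch below is a minimal example with a hypothetical function name; it compares against the directory listing so that the match is exact and case sensitive even on a file system that ignores case.

```python
import os


def find_missing_fastqs(fastq_dir, fastq_names):
    """Return the names in fastq_names that have no exact match in fastq_dir.

    Raises FileNotFoundError if fastq_dir itself does not exist, which
    corresponds to an incorrect path value.
    """
    present = set(os.listdir(fastq_dir))
    return [name for name in fastq_names if name not in present]
```

An empty list means every listed file was found. Any returned name should be compared character by character, including case, against the files in the directory.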
Common error messages on MacOSX
[Error] Parameter definition not found/needs to be defined/selected
Problem description
The user interface status screen is showing the message:
"Finished
An error has occurred."
and one or more of the following explanations:
- "[Error] The fastq directory is not found"
- "[Error] The metadata file is not found"
- "[ERROR] Please define output name"
- "[ERROR] Please select receptor type"
- "[ERROR] Please select target region"
Cause
Immune Profiler has five required arguments:
- FASTQ directory
- Metafile location
- Output name
- Receptor type
- Target region
These messages may be seen if any of the required fields are not configured in the user interface.
Fix
Make sure to configure all five required arguments on the input fields screen. Please see the Quick Start Guide or User Manual if more information is needed.
[ERROR] Found duplicate sample ID
Problem description
The user interface status screen is showing the message:
"[ERROR] Found duplicate sample ID: [variable]
[INFO] Immune profiler analysis ends with error"
Cause
A duplicate sample ID value was detected in the metadata file.
Fix
Use the ID value stated in the error message to identify the duplicate in the metadata CSV file, and update it to a value that would not duplicate any other values in the file.
[ERROR] Found _ in sample ID
Problem description
The user interface status screen is showing the message:
"[ERROR] Found _ in sample ID: [variable]
[INFO] Immune profiler analysis ends with error"
Cause
A sample ID in the metadata CSV file was found to include an underscore character (_).
Fix
Edit the metadata file to change the sample ID name to remove the underscore. Valid characters are alphanumerics and hyphens; no other special characters are allowed.
[WARN] Delete empty sample ID in meta file
Problem description
The user interface status screen is showing the message:
"[WARN] Delete empty sample ID in meta file: [variable]
[WARN] After deletion, ## samples to process
[WARN] Please check and modify meta file if the sample number is incorrect
[WARN] Check if there is any extra empty line in meta file"
Cause
Immune Profiler found one or more empty lines in the metadata CSV file.
Fix
No action is required; this is a warning message only. In addition to displaying the warning, the program deletes the empty line from the file and reports the actual number of samples to process (non-empty rows).
[ERROR] The input FASTQ: [filename] is not found.
Problem description
The user interface status screen is showing the message:
"[ERROR] The input FASTQ: [filename] is not found.
Please make sure the FASTQ directory & FASTQ name in metadata file is correct (case sensitive)
Immune profiler error
[INFO] Immune profiler analysis ends with error"
Cause
The specified FASTQ file name(s) configured in the metadata CSV file do not match or do not exist in the directory specified in the required arguments configuration screen.
Fix
- If the path to the FASTQ directory is incorrect on the input fields screen, correct the path value.
- Check the name of the FASTQ file(s) listed in the metadata file, and update the FASTQ file name(s) to exactly match between the metadata file configuration and the actual file name.
List & description of log files
The table lists each log filename, the sub-folder where it can be found within the immune_profiler/ directory, and a brief description of the information stored in it. This information is also documented in the Cogent NGS Immune Profiler User Manual.
The string [cdr3|fl] means either CDR3 or Full_length. E.g., run_mixcr_fl.log
Log filename | Sub-folder | Description |
-- | Immune Profiler analysis progress and important notes | |
mig_run_migec.log | run_migec/ | MIGEC analysis progress |
mig_run_migec.error | run_migec/ | MIGEC error messages (if any) |
assemble.log.txt | run_migec/assemble/ | MIGEC process status |
assemble.cmd.txt | run_migec/assemble/ | The command call to MIGEC for the assemble function |
checkout.cmd.txt | run_migec/checkout_all/ | MIGEC analysis commands used |
checkout.log.txt | run_migec/checkout_all/ | MIGEC analysis progress and findings |
checkout.filelist.txt | run_migec/checkout_all/ | Documents all intermediate files while MIGEC is processing |
run_mixcr_[cdr3|fl].log | run_mixcr/ | MIXCR analysis progress and important notes. NOTE: Different files are created depending on whether the CDR3 or full-length target regions are selected during configuration |
run_mixcr_[cdr3|fl].error | run_mixcr/ | MIXCR error messages (if any). NOTE: Different files are created depending on whether the CDR3 or full-length target regions are selected during configuration |
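When gathering logs for Technical Support, the locations in the table can be checked programmatically. The sketch below is a minimal example with a hypothetical function name. It assumes base_dir is the immune_profiler/ directory mentioned above and that the cdr3 variants follow the same naming pattern as the run_mixcr_fl.log example. The first table row is omitted because its filename is not given here.

```python
from pathlib import Path

# Relative log locations taken from the table above. The cdr3 names are
# inferred from the [cdr3|fl] pattern and the run_mixcr_fl.log example.
LOG_PATHS = [
    "run_migec/mig_run_migec.log",
    "run_migec/mig_run_migec.error",
    "run_migec/assemble/assemble.log.txt",
    "run_migec/assemble/assemble.cmd.txt",
    "run_migec/checkout_all/checkout.cmd.txt",
    "run_migec/checkout_all/checkout.log.txt",
    "run_migec/checkout_all/checkout.filelist.txt",
    "run_mixcr/run_mixcr_cdr3.log",
    "run_mixcr/run_mixcr_fl.log",
    "run_mixcr/run_mixcr_cdr3.error",
    "run_mixcr/run_mixcr_fl.error",
]


def find_logs(base_dir):
    """Return the paths of the listed log files that exist under base_dir."""
    base = Path(base_dir)
    return [base / rel for rel in LOG_PATHS if (base / rel).is_file()]
```

Only one of the cdr3 or fl pairs is expected to exist for a given run, depending on the target region selected.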
Data analysis FAQs
How to determine if BCR input data is of low sequencing quality
Example: good sequencing data
In this sequencing run, we loaded a pool of IgM and IgL libraries, from 1,000 ng PBMC RNA input at ~10 million reads per library, on the same flow cell.
As shown in the QC stats table, IgM is the main chain identified in Sample_ID 1000ng-IgM, and IgL is the main chain identified in 1000ng-IgL. The other chains read in the sample are less than 0.1% of the total reads, indicating that index hopping and cross-contamination from library preparation of IgG, IgK, and the nondominant Ig type for the sample (IgL for the IgM sample, IgM for the IgL sample) are below an acceptable level.
Sample_ID | IGG | IGM | IGK | IGL | Short | Undetermined | flc | Total |
1000ng‑IgM | 0 (0.0%) | 8,530,650 (94.8%) | 10 (0.0%) | 1,982 (0.0%) | 0 (0.0%) | 116,026 (1.3%) | 351,332 (3.9%) | 9,000,000 (100.0%) |
1000ng‑IgL | 38 (0.0%) | 1,895 (0.0%) | 15 (0.0%) | 8,456,630 (94.0%) | 0 (0.0%) | 192,472 (2.1%) | 348,950 (3.9%) | 9,000,000 (100.0%) |
Table 3. Example from sample_QC_stats.csv displaying the results of good sequencing data.
In the mapping_stats.csv report, we notice that the "total reads" are roughly equivalent when compared to the value in sample_QC_stats.csv.
Sample type | Total reads | Total MIG | UMI threshold | Number of reads after MIG collapse | Aligned | Pair‑read overlap | Overlapped and aligned | Clonotype count |
1000ng‑IgM_IGM | 8,530,650 | 444,674 | 6 | 122,399 | 103,787 | 105,836 | 93,609 | 53,063 |
1000ng‑IgM_IGK | 10 | 2 | 1 | 2 | 1 | 2 | 1 | 0 |
1000ng‑IgM_IGL | 1,982 | 926 | 1 | 926 | 865 | 711 | 661 | 62 |
1000ng‑IgL_IGG | 38 | 22 | 1 | 22 | 22 | 18 | 18 | 3 |
1000ng‑IgL_IGM | 1,895 | 615 | 1 | 615 | 480 | 273 | 202 | 40 |
1000ng‑IgL_IGK | 15 | 8 | 1 | 8 | 3 | 2 | 2 | 1 |
1000ng‑IgL_IGL | 8,456,630 | 669,152 | 4 | 275,140 | 262,494 | 271,499 | 259,397 | 29,335 |
Table 4. Example from mapping_stats.csv reads displaying the results of good sequencing data.
Example #1:
1000ng-IgM
sample_QC_stats.csv reads: 8,530,650
mapping_stats.csv reads: 8,530,650
Example #2:
1000ng-IgL
sample_QC_stats.csv reads: 8,456,630
mapping_stats.csv reads: 8,456,630
The equivalency of total reads between the two files indicates that few or no reads were discarded during the threshold check by MIGEC during processing and that MIGEC has determined that the input sequencing data is of good quality.
Example: poor sequencing data
In the sequencing results shown below, we loaded a pool of IgG and IgK libraries, from 1,000 ng PBMC RNA input at ~10 million reads per library, on the same flow cell.
As shown in the QC stats table, IgG is the main chain identified in Sample_ID 1000ng-IGG, and IgK is the main chain identified in 1000ng-IGK.
Sample_ID | IGG | IGK | IGL | IGM | Short | Undetermined | flc | Total |
1000ng‑IGG | 9,443,333 (94.4%) | 708 (0.0%) | 104 (0.0%) | 0 (0.0%) | 0 (0.0%) | 204,032 (2.0%) | 351,823 (3.5%) | 10,000,000 (100.0%) |
1000ng‑IGK | 5,197 (0.1%) | 8,973,560 (94.7%) | 1 (0.0%) | 2 (0.0%) | 0 (0.0%) | 167,327 (1.8%) | 326,252 (3.4%) | 9,472,339 (100.0%) |
Table 5. Example from sample_QC_stats.csv displaying the results of using poor-quality sequencing data.
In the mapping_stats.csv report, we notice that the "total reads" dropped significantly compared to the value in sample_QC_stats.csv.
Sample type | Total reads | Total MIG | UMI threshold | Number of reads after MIG collapse | Aligned | Pair‑read overlap | Overlapped and aligned | Clonotype count |
1000ng‑IGG_IGG | 4,844,692 | 247,936 | 6 | 74,221 | 66,623 | 68,411 | 63,275 | 5,884 |
1000ng‑IGG_IGK | 403 | 251 | 1 | 251 | 245 | 114 | 109 | 1 |
1000ng‑IGG_IGL | 41 | 21 | 1 | 21 | 15 | 6 | 4 | 1 |
1000ng‑IGK_IGG | 3,219 | 2,018 | 1 | 2,018 | 1,859 | 472 | 396 | 10 |
1000ng‑IGK_IGM | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 1 |
1000ng‑IGK_IGK | 4,618,169 | 655,183 | 3 | 365,885 | 354,951 | 360,270 | 353,187 | 30,486 |
1000ng‑IGK_IGL | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Table 6. Example from mapping_stats.csv reads displaying the results of using poor-quality sequencing data.
Example #3:
1000ng-IGG
sample_QC_stats.csv reads: 9,443,333
mapping_stats.csv reads: 4,844,692
Example #4:
1000ng-IGK
sample_QC_stats.csv reads: 8,973,560
mapping_stats.csv reads: 4,618,169
This discrepancy in total reads between the two reports is due to MIGEC excluding reads based on the calculated exclusion threshold. The quality of the UMI region is determined to be poor, so the reads are not processed or reported in the mapping stats report.
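The comparison between the two reports can be expressed as the fraction of reads retained. The sketch below uses the read counts from the good and poor examples above. The function names are hypothetical, and no pass/fail cutoff is implied because none is defined here.

```python
def parse_count(cell):
    """Parse a report cell such as '9,443,333 (94.4%)' or '4,844,692' into an int."""
    return int(cell.split()[0].replace(",", ""))


def retained_fraction(qc_reads, mapping_reads):
    """Fraction of sample_QC_stats.csv reads that also appear in mapping_stats.csv."""
    if qc_reads <= 0:
        raise ValueError("qc_reads must be positive")
    return mapping_reads / qc_reads


# (sample_QC_stats.csv cell, mapping_stats.csv cell) for the dominant chain.
examples = {
    "1000ng-IgM": ("8,530,650 (94.8%)", "8,530,650"),
    "1000ng-IgL": ("8,456,630 (94.0%)", "8,456,630"),
    "1000ng-IGG": ("9,443,333 (94.4%)", "4,844,692"),
    "1000ng-IGK": ("8,973,560 (94.7%)", "4,618,169"),
}

for sample, (qc_cell, mapping_cell) in examples.items():
    fraction = retained_fraction(parse_count(qc_cell), parse_count(mapping_cell))
    print(f"{sample}: {fraction:.1%} of reads retained")
# 1000ng-IgM: 100.0% of reads retained
# 1000ng-IgL: 100.0% of reads retained
# 1000ng-IGG: 51.3% of reads retained
# 1000ng-IGK: 51.5% of reads retained
```

In the poor-quality examples, roughly half of the reads assigned to the dominant chain are absent from the mapping stats report.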
Takara Bio USA, Inc.
United States/Canada: +1.800.662.2566 • Asia Pacific: +1.650.919.7300 • Europe: +33.(0)1.3904.6880 • Japan: +81.(0)77.565.6999
FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES. © 2023 Takara Bio Inc. All Rights Reserved. All trademarks are the property of Takara Bio Inc. or its affiliate(s) in the U.S. and/or other countries or their respective owners. Certain trademarks may not be registered in all jurisdictions. Additional product, intellectual property, and restricted use information is available at takarabio.com.