Cogent NGS Immune Profiler Software notices

The issues described below for the Cogent NGS Immune Profiler Software are ones that have been seen or reported. If you encounter other errors using Immune Profiler or are unable to resolve the problem using the suggestions below, please:

  1. Capture a screenshot of the text of the error you see on the screen.
  2. Gather the relevant log file(s). Refer to the FAQ entry below or the Cogent NGS Immune Profiler User Manual, Appendix B, for more detailed information about the logs.
  3. Send the screenshot and gathered log file(s) to Technical Support along with a brief description of the issue.

Software issues FAQs

Common error messages on Linux

SyntaxError: invalid syntax

Problem description

When the script is called at the command line in the format python immune_profiler.py [variables], the message "SyntaxError: invalid syntax" is displayed.

Cause

Python 2 is the default Python version on the machine, so the python command runs the script with Python 2 instead of Python 3.

Fix

Use the syntax: python3 immune_profiler.py [variables]

immune_profiler.py: error: the following arguments are required

Problem description

The immune_profiler.py script returns a message similar to the following:

"immune_profiler.py: error: the following arguments are required: [variables]"

Cause

Immune Profiler has five required arguments:

Flag Parameter input
-f fastq_dir
-m meta_file
-o output_name
-r receptor_type
-t target_region

If any of the required arguments is missing, this error is displayed and names the missing argument(s).

Fix

Make sure to specify all five required arguments for the script command. Please see the Quick Start Guide or User Manual if more information is needed.
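As a minimal sketch, the command can be assembled in Python so that a missing value is caught before the script is launched. The five flags are the ones listed above; the helper name build_command and the dictionary keys are hypothetical, and the accepted values for -r and -t should be taken from the User Manual.

```python
# The five required flags listed above, mapped to their parameter names.
REQUIRED_FLAGS = {
    "-f": "fastq_dir",
    "-m": "meta_file",
    "-o": "output_name",
    "-r": "receptor_type",
    "-t": "target_region",
}


def build_command(values, script="immune_profiler.py"):
    """Return an argument list for the script, or raise if a value is missing.

    `values` maps parameter names (e.g. "fastq_dir") to non-empty strings.
    This helper is illustrative only; it is not part of Immune Profiler.
    """
    missing = [name for name in REQUIRED_FLAGS.values() if not values.get(name)]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    command = ["python3", script]
    for flag, name in REQUIRED_FLAGS.items():
        command += [flag, values[name]]
    return command
```

The returned list can be passed to subprocess.run(command, check=True) to start the analysis.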

IndexError: list index out of range

Problem description

The immune_profiler.py script is returning the message:

"IndexError: list index out of range"

Cause

This error might be seen if the value of the -o parameter is given as a directory path followed by the output name. The output name specified by -o must be an alphanumeric string with no special characters other than hyphens.

Fix

Change the value of the -o parameter to a plain string that contains no directory path.
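The rule for the -o value can be checked before running the script. The following is a minimal sketch that assumes "alphanumeric" means ASCII letters and digits; the function name is_valid_output_name is hypothetical.

```python
import re

# Letters, digits, and hyphens only; this also rejects path separators.
OUTPUT_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_output_name(name):
    """Return True if `name` contains only letters, digits, and hyphens."""
    return OUTPUT_NAME_PATTERN.fullmatch(name) is not None
```

With this check, a value such as "run-01" passes, while "results/run-01" is rejected because it contains a directory path.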

[ERROR] Found duplicate sample ID

Problem description

The immune_profiler.py script is returning the message:

"[ERROR] Found duplicate sample ID: [variable]"

Cause

A duplicate sample ID value was detected in the metadata file.

Fix

Use the ID value stated in the error message to identify the duplicate in the metadata CSV file and update it to a value that would not duplicate any other values in the file.
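When the metadata file is large, duplicates can be located programmatically. The sketch below assumes that the sample ID is in the first column of the metadata CSV file and that the first row is a header; both are assumptions to adjust to the actual file layout, and the function names are hypothetical.

```python
import csv
from collections import Counter


def read_sample_ids(path, id_column=0, has_header=True):
    """Read sample IDs from a metadata CSV file, skipping fully empty lines.

    The column position and header row are assumptions about the file layout.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    if has_header:
        rows = rows[1:]
    return [row[id_column].strip() for row in rows]


def find_duplicate_ids(sample_ids):
    """Return the sample IDs that occur more than once, in first-seen order."""
    counts = Counter(sample_ids)
    return [sample_id for sample_id in counts if counts[sample_id] > 1]
```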

[ERROR] Found _ in sample ID

Problem description

The immune_profiler.py script is returning the message: 

"[ERROR] Found _ in sample ID: [variable]"

Cause

A sample ID in the metadata CSV file was found to include an underscore character (_).

Fix

Edit the metadata file to remove the underscore from the sample ID. Valid characters are alphanumerics and hyphens; no other special characters are allowed.
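A sample ID can be screened for invalid characters before the run. This is a minimal sketch that assumes "alphanumerics" means ASCII letters and digits; the function names are hypothetical, and replacing invalid characters with hyphens is only one possible fix (check that the result does not duplicate another sample ID).

```python
import re

# Anything other than ASCII letters, digits, and hyphens is treated as invalid.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9-]")


def invalid_characters(sample_id):
    """Return the distinct invalid characters in `sample_id`, sorted."""
    return sorted(set(INVALID_CHARS.findall(sample_id)))


def suggest_sample_id(sample_id):
    """Replace each invalid character with a hyphen (one possible fix)."""
    return INVALID_CHARS.sub("-", sample_id)
```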

[WARN] Delete empty sample ID in meta file

Problem description

The immune_profiler.py script is returning the message:

"[WARN] Delete empty sample ID in meta file: [variable]"

Cause

The script found one or more empty lines in the metadata CSV file.

Fix

No action is required; this is a warning message only. In addition to issuing the warning, the script deletes the empty line from the file and reports the actual number of samples to process (non-empty rows).
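To confirm that the reported sample number is correct, the non-empty rows can be counted independently. The sketch below assumes that the sample ID is the first field of each row and that the first row is a header; the function name count_sample_rows is hypothetical.

```python
import csv
import io


def count_sample_rows(csv_text, has_header=True):
    """Count rows whose first field (assumed to be the sample ID) is non-empty."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    return sum(1 for row in rows if row and row[0].strip())
```

The text of the metadata file can be supplied with, for example, open(path, newline="").read().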

[ERROR] The input FASTQ: [filename] is not found.

Problem description

The immune_profiler.py script is returning the message:

"[ERROR] The input FASTQ: [filename] is not found."

Cause

The FASTQ file name(s) configured in the metadata CSV file do not match or do not exist in the directory specified by the path parameter of the -f flag of the script.

Fix

  1. If the path statement to the FASTQ files is incorrect in the script command, correct the path value.
  2. Check the name of the FASTQ file(s) listed in the metadata file, and update each name so that it exactly matches the actual file name (file names are case sensitive).
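Both checks above can be sketched in a few lines of Python. The function name missing_fastq_files is hypothetical, and the list of FASTQ names is assumed to have been copied from the metadata file.

```python
import os


def missing_fastq_files(fastq_dir, fastq_names):
    """Return the names in `fastq_names` that are not entries of `fastq_dir`.

    Comparing against the directory listing keeps the check case sensitive
    even on file systems that ignore case.
    """
    present = set(os.listdir(fastq_dir))
    return [name for name in fastq_names if name not in present]
```

If the directory passed as fastq_dir does not exist, os.listdir raises FileNotFoundError, which points to an incorrect path rather than an incorrect file name.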

Common error messages on MacOSX

[Error] Parameter definition not found/needs to be defined/selected

Problem description

The user interface status screen is showing the message:

"Finished
An error has occurred."

and one or more of the following explanations:

  • "[Error] The fastq directory is not found"
  • "[Error] The metadata file is not found"
  • "[ERROR] Please define output name"
  • "[ERROR] Please select receptor type"
  • "[ERROR] Please select target region"

Cause

Immune Profiler has five required arguments:

  1. FASTQ directory
  2. Metafile location
  3. Output name
  4. Receptor type
  5. Target region

These messages may be seen if any of the required fields are not configured in the user interface.

Fix

Make sure to configure all five required arguments on the input fields screen. Please see the Quick Start Guide or User Manual if more information is needed.

[ERROR] Found duplicate sample ID

Problem description

The user interface status screen is showing the message:

"[ERROR] Found duplicate sample ID: [variable]
[INFO] Immune profiler analysis ends with error"

Cause

A duplicate sample ID value was detected in the metadata file.

Fix

Use the ID value stated in the error message to identify the duplicate in the metadata CSV file, and update it to a value that would not duplicate any other values in the file.

[ERROR] Found _ in sample ID

Problem description

The user interface status screen is showing the message:

"[ERROR] Found _ in sample ID: [variable]
[INFO] Immune profiler analysis ends with error"

Cause

A sample ID in the metadata CSV file was found to include an underscore character (_).

Fix

Edit the metadata file to remove the underscore from the sample ID. Valid characters are alphanumerics and hyphens; no other special characters are allowed.

[WARN] Delete empty sample ID in meta file

Problem description

The user interface status screen is showing the message:

"[WARN] Delete empty sample ID in meta file: [variable]
[WARN] After deletion, ## samples to process
[WARN] Please check and modify meta file if the sample number is incorrect
[WARN] Check if there is any extra empty line in meta file"

Cause

Immune Profiler found one or more empty lines in the metadata CSV file.

Fix

No action is required; this is a warning message only. In addition to issuing the warning, the program deletes the empty line from the file and reports the actual number of samples to process (non-empty rows).

[ERROR] The input FASTQ: [filename] is not found.

Problem description

The user interface status screen is showing the message:

"[ERROR] The input FASTQ: [filename] is not found.
Please make sure the FASTQ directory & FASTQ name in metadata file is correct (case sensitive)
Immune profiler error
[INFO] Immune profiler analysis ends with error"

Cause

The specified FASTQ file name(s) configured in the metadata CSV file do not match or do not exist in the directory specified in the required arguments configuration screen.

Fix

  1. If the path to the FASTQ files is incorrect in the FASTQ directory field of the user interface, correct the path value.
  2. Check the name of the FASTQ file(s) listed in the metadata file, and update each name so that it exactly matches the actual file name (file names are case sensitive).

List & description of log files

The table below lists each log filename, the sub-folder of the immune_profiler/ directory in which it is found, and a brief description of the information it stores. This information is also documented in the Cogent NGS Immune Profiler User Manual.

In a filename, the string [cdr3|fl] stands for either cdr3 (CDR3) or fl (Full_length), depending on the target region; e.g., run_mixcr_fl.log.

Log filename Sub-folder Description
_immune_profiler.log -- Immune Profiler analysis progress and important notes
mig_run_migec.log run_migec/ MIGEC analysis progress
mig_run_migec.error run_migec/ MIGEC error messages (if any)
assemble.log.txt run_migec/assemble/ MIGEC process status
assemble.cmd.txt run_migec/assemble/ The command call to MIGEC for the assemble function
checkout.cmd.txt run_migec/checkout_all/ MIGEC analysis commands used
checkout.log.txt run_migec/checkout_all/ MIGEC analysis progress and findings
checkout.filelist.txt run_migec/checkout_all/ Documents all intermediate files while MIGEC is processing
run_mixcr_[cdr3|fl].log run_mixcr/ MIXCR analysis progress and important notes
run_mixcr_[cdr3|fl].error run_mixcr/ MIXCR error messages (if any)
NOTE: For both run_mixcr files, a different file is created depending on whether the CDR3 or full-length target region is selected during configuration.
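When gathering logs for Technical Support, the locations in the table can be turned into a list of paths to check. This is a sketch only: it assumes that the filenames are exactly as listed in the table and that "--" means the top level of the immune_profiler/ directory; the function names are hypothetical.

```python
from pathlib import Path

# Relative locations taken from the table above.
FIXED_LOGS = [
    "_immune_profiler.log",
    "run_migec/mig_run_migec.log",
    "run_migec/mig_run_migec.error",
    "run_migec/assemble/assemble.log.txt",
    "run_migec/assemble/assemble.cmd.txt",
    "run_migec/checkout_all/checkout.cmd.txt",
    "run_migec/checkout_all/checkout.log.txt",
    "run_migec/checkout_all/checkout.filelist.txt",
]


def expected_log_paths(base_dir, region_tag):
    """Return the expected log paths; `region_tag` is "cdr3" or "fl"."""
    if region_tag not in ("cdr3", "fl"):
        raise ValueError("region_tag must be 'cdr3' or 'fl'")
    mixcr_logs = [
        f"run_mixcr/run_mixcr_{region_tag}.log",
        f"run_mixcr/run_mixcr_{region_tag}.error",
    ]
    return [Path(base_dir) / name for name in FIXED_LOGS + mixcr_logs]


def existing_logs(base_dir, region_tag):
    """Return only the expected log files that are present on disk."""
    return [p for p in expected_log_paths(base_dir, region_tag) if p.is_file()]
```

Files that are absent, such as the .error files when no error occurred, are simply left out of the result of existing_logs.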

Data analysis FAQs

How to determine if BCR input data is of low sequencing quality

Example: good sequencing data

In this sequencing run, we loaded a pool of IgM and IgL libraries, from 1,000 ng PBMC RNA input at ~10 million reads per library, on the same flow cell.

As shown in the QC stats table, IgM is the main chain identified in Sample_ID 1000ng-IgM, and IgL is the main chain identified in 1000ng-IgL. Each of the other chains accounts for less than 0.1% of the total reads in its sample. This indicates that index hopping and cross-contamination during library preparation from IgG, IgK, and the nondominant Ig type for the sample (IgL for the IgM sample, IgM for the IgL sample) are within acceptable levels.

Sample_ID IGG IGM IGK IGL Short Undetermined flc Total
1000ng‑IgM 0 (0.0%) 8,530,650 (94.8%) 10 (0.0%) 1,982 (0.0%) 0 (0.0%) 116,026 (1.3%) 351,332 (3.9%) 9,000,000 (100.0%)
1000ng‑IgL 38 (0.0%) 1,895 (0.0%) 15 (0.0%) 8,456,630 (94.0%) 0 (0.0%) 192,472 (2.1%) 348,950 (3.9%) 9,000,000 (100.0%)

Table 3. Example from sample_QC_stats.csv displaying the results of good sequencing data.

In the mapping_stats.csv report, we notice that the "total reads" values are roughly equivalent to the corresponding values in sample_QC_stats.csv.

Sample type Total reads Total MIG UMI threshold Number of reads after MIG collapse Aligned Pair‑read overlap Overlapped and aligned Clonotype count
1000ng‑IgM_IGM 8,530,650 444,674 6 122,399 103,787 105,836 93,609 53,063
1000ng‑IgM_IGK 10 2 1 2 1 2 1 0
1000ng‑IgM_IGL 1,982 926 1 926 865 711 661 62
1000ng‑IgL_IGG 38 22 1 22 22 18 18 3
1000ng‑IgL_IGM 1,895 615 1 615 480 273 202 40
1000ng‑IgL_IGK 15 8 1 8 3 2 2 1
1000ng‑IgL_IGL 8,456,630 669,152 4 275,140 262,494 271,499 259,397 29,335

Table 4. Example from mapping_stats.csv reads displaying the results of good sequencing data.

Example #1:
1000ng-IgM
sample_QC_stats.csv reads: 8,530,650
mapping_stats.csv reads: 8,530,650

Example #2:
1000ng-IgL
sample_QC_stats.csv reads: 8,456,630
mapping_stats.csv reads: 8,456,630

The equivalency of total reads between the two files indicates that few or no reads were discarded during the threshold check by MIGEC during processing and that MIGEC has determined that the input sequencing data is of good quality.

Example: poor sequencing data

In the sequencing results shown below, we loaded a pool of IgG and IgK libraries, from 1,000 ng PBMC RNA input at ~10 million reads per library, on the same flow cell.

As shown in the QC stats table, IgG is the main chain identified in Sample_ID 1000ng-IGG, and IgK is the main chain identified in 1000ng-IGK.

Sample_ID IGG IGK IGL IGM Short Undetermined flc Total
1000ng‑IGG 9,443,333 (94.4%) 708 (0.0%) 104 (0.0%) 0 (0.0%) 0 (0.0%) 204,032 (2.0%) 351,823 (3.5%) 10,000,000 (100.0%)
1000ng‑IGK 5,197 (0.1%) 8,973,560 (94.7%) 1 (0.0%) 2 (0.0%) 0 (0.0%) 167,327 (1.8%) 326,252 (3.4%) 9,472,339 (100.0%)

Table 5. Example from sample_QC_stats.csv displaying the results of using poor-quality sequencing data.

In the mapping_stats.csv report, we notice that the "total reads" values dropped significantly compared to the corresponding values in sample_QC_stats.csv.

Sample type Total reads Total MIG UMI threshold Number of reads after MIG collapse Aligned Pair‑read overlap Overlapped and aligned Clonotype count
1000ng‑IGG_IGG 4,844,692 247,936 6 74,221 66,623 68,411 63,275 5,884
1000ng‑IGG_IGK 403 251 1 251 245 114 109 1
1000ng‑IGG_IGL 41 21 1 21 15 6 4 1
1000ng‑IGK_IGG 3,219 2,018 1 2,018 1,859 472 396 10
1000ng‑IGK_IGM 2 2 1 2 2 2 2 1
1000ng‑IGK_IGK 4,618,169 655,183 3 365,885 354,951 360,270 353,187 30,486
1000ng‑IGK_IGL 0 0 1 0 0 0 0 0

Table 6. Example from mapping_stats.csv reads displaying the results of using poor-quality sequencing data.

Example #3:
1000ng-IGG
sample_QC_stats.csv reads: 9,443,333
mapping_stats.csv reads: 4,844,692

Example #4:
1000ng-IGK
sample_QC_stats.csv reads: 8,973,560
mapping_stats.csv reads: 4,618,169

This discrepancy in total reads between the two reports is due to MIGEC excluding reads based on the calculated exclusion threshold. The quality of the UMI region is determined to be poor, so the reads are not processed or reported in the mapping stats report.
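The comparison between the two reports can be expressed as the fraction of reads retained. The sketch below uses the dominant-chain read counts from the tables above; the function name retained_fraction is hypothetical, and no pass/fail cutoff is implied because none is stated here.

```python
def retained_fraction(qc_reads, mapping_reads):
    """Fraction of reads in sample_QC_stats.csv also reported in mapping_stats.csv."""
    if qc_reads <= 0:
        raise ValueError("qc_reads must be positive")
    return mapping_reads / qc_reads


# Dominant-chain read counts copied from the tables above:
# (sample_QC_stats.csv reads, mapping_stats.csv total reads)
examples = {
    "1000ng-IgM": (8_530_650, 8_530_650),
    "1000ng-IgL": (8_456_630, 8_456_630),
    "1000ng-IGG": (9_443_333, 4_844_692),
    "1000ng-IGK": (8_973_560, 4_618_169),
}

for sample, (qc_reads, mapping_reads) in examples.items():
    fraction = retained_fraction(qc_reads, mapping_reads)
    print(f"{sample}: {fraction:.1%} of reads retained")
```

For these values, both samples from the good run retain 100.0% of their reads, while both samples from the poor run retain roughly 51%.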