Seeker Bioinformatics Solutions

Seeker Bioinformatics Solutions comprise the Seeker Primary Analysis Pipeline and the Secondary Data Analysis Workflow.

Ways to run Seeker Bioinformatics Solutions

  • Local: via installation on local workstations or clusters. Installation and usage instructions are available by completing a request form.
  • Cloud: via cloud analysis on the Takara Bio Spatial Bioinformatics Portal.

The Seeker Primary Analysis Pipeline processes FASTQ files generated from Seeker sequencing libraries for high-resolution spatial transcriptomic analysis in a single, efficient package. The pipeline reconstructs a spatial map of gene expression and provides quality control metrics, preliminary clustering, and spatially variable gene analyses in a simple .html report for initial interpretation of the Seeker experiment.

The Seeker Primary Analysis Pipeline requires:

  • Paired-end FASTQ files (without adapter trimming) generated from the sequenced libraries
  • The tile bead barcode file for the Seeker tile used
  • A sample sheet that associates sample metadata with the inputs above and specifies the reference genome for the species surveyed

The Secondary Data Analysis Workflow summarizes popular tools developed by the scientific community for processing the output of the Trekker Primary Analysis Pipeline and the Seeker Primary Analysis Pipeline. Its guidelines are designed specifically for Trekker and Seeker data.

More Information

Supported operating systems

Linux platforms with ≥ 256 GB RAM and 32 cores.

IMPORTANT: The pipeline does NOT support macOS or Windows operating systems.
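
A quick preflight check against the stated minimums (256 GB RAM, 32 cores) can be scripted by reading /proc/meminfo, which exists only on Linux. This is an illustrative sketch, not part of the Seeker software: the function names are hypothetical, "256 GB" is interpreted as GiB, and the 95% tolerance is an assumption to allow for memory reserved by the kernel and firmware, since MemTotal usually reads somewhat below installed RAM. Note that os.cpu_count() counts logical CPUs, so hyperthreads are included.

```python
import os

# Thresholds from the stated requirements; TOLERANCE is an assumed margin.
REQUIRED_MEM_GIB = 256
REQUIRED_CPUS = 32
TOLERANCE = 0.95


def parse_memtotal_kib(meminfo_text):
    """Return MemTotal from /proc/meminfo contents (reported in KiB, labelled 'kB')."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def meets_requirements(mem_kib, cpus):
    """True if memory (within TOLERANCE) and CPU count reach the stated minimums."""
    mem_gib = mem_kib / 1024 ** 2
    return mem_gib >= REQUIRED_MEM_GIB * TOLERANCE and cpus >= REQUIRED_CPUS


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        mem_kib = parse_memtotal_kib(fh.read())
    # os.cpu_count() reports logical CPUs (hyperthreads count separately).
    cpus = os.cpu_count() or 0
    print(f"MemTotal: {mem_kib / 1024 ** 2:.1f} GiB, logical CPUs: {cpus}")
    print("OK" if meets_requirements(mem_kib, cpus) else "Below the stated minimum")
```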

Additional third-party software dependencies*

The Seeker bioinformatics pipeline is written in Nextflow. Its dependencies can be executed through either Singularity (Apptainer) or Docker.

*Not included with the software; must be downloaded and installed separately.
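
Because Nextflow and a container engine must be installed separately, a short script can confirm they are on PATH before a run. This is a hypothetical helper, not part of the pipeline; it checks only the standard executable names nextflow, apptainer, singularity, and docker, and finding an executable does not prove the engine is usable.

```python
import shutil

CONTAINER_ENGINES = ("apptainer", "singularity", "docker")


def find_tools(which=shutil.which):
    """Map Nextflow and each container engine to its path on PATH, or None if absent."""
    return {name: which(name) for name in ("nextflow",) + CONTAINER_ENGINES}


def ready_to_run(found):
    """True if Nextflow and at least one container engine were found."""
    return bool(found["nextflow"]) and any(found[name] for name in CONTAINER_ENGINES)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:12} {path or 'not found'}")
    # Being on PATH is only a first check: the Docker daemon may be stopped,
    # or the current user may lack permission to use it.
    print("ready" if ready_to_run(found) else "missing Nextflow or a container engine")
```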

Seeker Primary Analysis Pipeline version history

v3.1.0 (release date: 2025-08)

New features

  • STAR Alignment and FORMATCLEANUP: Now automatically retry with progressively increased memory on failure.
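
This retry behavior resembles a common Nextflow pattern in which a failed task is resubmitted with memory scaled by the attempt number, typically written as memory { 20.GB * task.attempt } together with errorStrategy 'retry' and maxRetries. The pipeline's actual base values and retry limits are not stated here, so the Python sketch below only models the idea; TaskOutOfMemory, run_with_memory_retry, and the 20 GB base are hypothetical.

```python
class TaskOutOfMemory(Exception):
    """Raised by a task when its memory allocation is too small (hypothetical)."""


def run_with_memory_retry(task, base_gb, max_attempts=3):
    """Call task(memory_gb), scaling memory linearly with the attempt number."""
    for attempt in range(1, max_attempts + 1):
        memory_gb = base_gb * attempt  # 1x, 2x, 3x ... the base allocation
        try:
            return task(memory_gb)
        except TaskOutOfMemory:
            if attempt == max_attempts:
                raise  # give up after the final attempt


def needs_50_gb(memory_gb):
    """Toy task that fails unless it receives at least 50 GB."""
    if memory_gb < 50:
        raise TaskOutOfMemory(f"{memory_gb} GB is not enough")
    return memory_gb


print(run_with_memory_retry(needs_50_gb, base_gb=20))  # 60 (succeeds on attempt 3)
```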

Bug fixes

  • FORMATCLEANUP: Fixed edge case handling for last count matrix chunk containing only one bead barcode.
  • GENEREPORT: Dynamically adjusts color palette size to support datasets with many clusters.
  • ANALYSIS: Resolved issue with reticulate failing to locate the anndata module by enforcing use of the Seeker conda environment.
  • STAR Alignment: Updated container from star:2.6.1d--0 to star:2.6.1d--h9ee0642_1 for compatibility with newer Docker/Singularity versions.
  • FEATURECOUNT: Updated Subread container from 2.0.1 to 2.0.8 to fix segmentation faults on large genomes.
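
The GENEREPORT palette fix is described only at a high level. One common way to make a palette scale with the number of clusters is to space hues evenly around the HSV color wheel, sketched here in Python; make_palette and its saturation and value defaults are hypothetical, and the report may use a different scheme. Evenly spaced hues become hard to tell apart once there are several dozen clusters.

```python
import colorsys


def make_palette(n_clusters, saturation=0.65, value=0.9):
    """Return n_clusters hex colors with hues spaced evenly around the color wheel."""
    palette = []
    for i in range(n_clusters):
        hue = i / n_clusters  # 0 <= hue < 1, so the first and last colors never coincide
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


# The palette grows with the data instead of being capped at a fixed size.
print(len(make_palette(45)))  # 45
```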

Workflow simplification

  • Bundled Seeker, STAR, Samtools, and Subread Singularity and Docker containers directly into the codebase. This removes the need to pull containers separately or to configure Singularity paths and cache directories.

v3.0.0 (release date: 2024-06)

Impact

  • Algorithmic improvements, chunk size optimization, and reorganization of steps lead to lower memory requirements (256 GB for up to 3 billion reads) and 32-63% shorter execution times compared with v2.0.0.
  • Minimal changes to outputs compared with v2.0.0.
  • Now compatible with the Seqera Nextflow Tower platform.
  • Fixed bugs behind commonly encountered user issues arising from different storage setups, data sizes, and data quality, resulting in a higher percentage of successful executions without user intervention and an improved user experience.
  • Six new prebuilt genome references are available for download.
  • Optional cleanup of intermediate files, leading to a smaller output footprint.
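
As a worked example of the reported 32-63% reduction: the v3.0.0 runtime is the v2.0.0 runtime multiplied by (1 - reduction), so a run that took 10 hours under v2.0.0 would be expected to take roughly 3.7 to 6.8 hours. Actual gains depend on data size and infrastructure; the helper name is hypothetical.

```python
def estimated_v3_runtime(v2_hours, min_reduction=0.32, max_reduction=0.63):
    """Return the (shortest, longest) v3.0.0 runtime implied by a 32-63% reduction."""
    return v2_hours * (1 - max_reduction), v2_hours * (1 - min_reduction)


shortest, longest = estimated_v3_runtime(10.0)
print(f"{shortest:.1f} to {longest:.1f} hours")  # 3.7 to 6.8 hours
```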

Optimization and efficiency

  • Merged processes GENBARCODES and CALCULATETOPBB into a single process, CALCULATETOPBB_COMBINED.
  • Removed process MERGE_READ1_READ2_DB.
  • Removed process FORMATCONVERT.
  • Renamed process UMITOOLSCOUNTSX to UMITOOLSCOUNTSG.
  • Process CALCULATETOPBB_COMBINED was optimized for efficient memory and CPU utilization.
  • Process FORMATCLEANUP was optimized for more efficient memory utilization by reducing the chunk size from 165,000 to 100,000.
  • Optimized resource allocation strategy for faster and less expensive distributed cloud computing.
  • Optimized default resource allocation and chunk sizes for adaptability to a wider range of dataset sizes (up to 3 billion reads).
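
The chunking changes can be illustrated with a small NumPy sketch, assuming (hypothetically) that FORMATCLEANUP splits the count matrix by bead barcode with one barcode per row. Under that assumption, 100,001 barcodes split at the new chunk size of 100,000 leave a final chunk holding a single barcode, the edge case addressed in v3.1.0. Slicing keeps such a chunk two-dimensional, whereas integer indexing would collapse it to a vector. The function names and toy matrix are illustrative only.

```python
import numpy as np


def n_chunks(n_barcodes, chunk_size):
    """Ceiling division: how many chunks are needed to cover every barcode."""
    return (n_barcodes + chunk_size - 1) // chunk_size


def iter_row_chunks(matrix, chunk_size):
    """Yield consecutive row blocks, assuming one row per bead barcode."""
    for start in range(0, matrix.shape[0], chunk_size):
        # Slicing keeps a 2-D block even when a single row is left over;
        # integer indexing (matrix[start]) would return a 1-D vector instead.
        yield matrix[start:start + chunk_size]


counts = np.arange(21).reshape(7, 3)  # toy matrix: 7 barcodes x 3 genes
print([block.shape for block in iter_row_chunks(counts, 3)])  # [(3, 3), (3, 3), (1, 3)]
print(n_chunks(100_001, 100_000), n_chunks(100_001, 165_000))  # 2 1
```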

Bug fixes

  • Assorted container mounts were replaced with dedicated Nextflow channels, eliminating the need for explicitly declared volumes.
  • STAR alignment: genome file inaccessible bug fixed.
  • samplesheet.csv inaccessible bug fixed.
  • GEN_GENE_BARCODE_UMI_DB: "pyarrow.lib.ArrowNotImplementedError: Unsupported cast from string to null using function cast_null" bug fixed.
  • FORMATCONVERT: "RuntimeError: cannot cache function 'sparse_mean_var_minor_axis': no locator available for file" bug fixed.
  • FORMATCLEANUP: 'out_of_memory' bug fixed.

New Features

  • Full Nextflow Tower compatibility.
  • --clean_work option added: removes intermediate and temporary work files after a successful pipeline execution.

Execution command and input changes

  • --igenomes_base: no longer required when using a custom genome reference.
  • samplesheet.csv: when using a custom genome reference, only two additional columns (star_index, gtf) are required.
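
A custom-reference samplesheet can be checked for the two required columns before launching a run. This is a hypothetical helper, not part of the pipeline: only the star_index and gtf column names come from these notes, while the sample column, sample names, and paths in the example are placeholders.

```python
import csv
import io

# The two columns required for a custom genome reference.
CUSTOM_REFERENCE_COLUMNS = ("star_index", "gtf")


def missing_reference_columns(samplesheet_text):
    """Return the custom-reference columns absent from the samplesheet header."""
    header = csv.DictReader(io.StringIO(samplesheet_text)).fieldnames or []
    return [col for col in CUSTOM_REFERENCE_COLUMNS if col not in header]


def rows_with_blank_reference(samplesheet_text):
    """Return line numbers (header = line 1) whose star_index or gtf field is empty.

    Line numbers assume no blank lines or multi-line quoted fields.
    """
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    blank = []
    for line_no, row in enumerate(reader, start=2):
        if any(not (row.get(col) or "").strip() for col in CUSTOM_REFERENCE_COLUMNS):
            blank.append(line_no)
    return blank


# Hypothetical samplesheet: the "sample" column and all paths are placeholders.
example = (
    "sample,star_index,gtf\n"
    "mouse_brain,/refs/mm_star,/refs/mm.gtf\n"
    "mouse_kidney,,/refs/mm.gtf\n"
)
print(missing_reference_columns(example))  # []
print(rows_with_blank_reference(example))  # [3]
```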

Output changes

  • ${sample}_anndata.h5ad: This output now also includes dimensional reduction results, cluster assignments, and quality control metrics. These are generated by the FORMATCLEANUP and ANALYSIS processes and visualized in the HTML report.
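
To see which dimensional reductions, cluster assignments, and QC metrics a given ${sample}_anndata.h5ad contains, it is safer to list its contents than to assume key names, which are not documented here. The sketch uses the anndata package's read_h5ad; summarize_anndata and load_summary are hypothetical helpers, and the stand-in object exists only so the summary function can run without a real file.

```python
from types import SimpleNamespace


def summarize_anndata(adata):
    """List what an AnnData-like object stores, without assuming specific key names."""
    return {
        "n_obs": adata.n_obs,
        "obs_columns": list(adata.obs.columns),  # per-observation annotations
        "obsm_keys": list(adata.obsm.keys()),  # multi-dimensional annotations, e.g. embeddings
        "uns_keys": list(adata.uns.keys()),  # unstructured metadata
    }


def load_summary(path):
    import anndata  # third-party; imported here so summarize_anndata has no dependencies

    # backed="r" leaves the expression matrix X on disk; annotations are loaded.
    adata = anndata.read_h5ad(path, backed="r")
    return summarize_anndata(adata)


# Stand-in object for demonstration only; these key names are hypothetical.
demo = SimpleNamespace(
    n_obs=2,
    obs=SimpleNamespace(columns=["total_counts", "cluster"]),
    obsm={"X_umap": None},
    uns={},
)
print(summarize_anndata(demo))
```

With a real output, call load_summary("sample1_anndata.h5ad"), substituting the actual sample name for the placeholder sample1.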

New prebuilt genome references

  • Drosophila melanogaster
  • Macaca mulatta
  • Callithrix jacchus
  • Poecilia reticulata
  • Sorghum bicolor
  • Xenopus laevis