Pipeline Module

The pipeline module orchestrates the complete VariantCentrifuge workflow.

Pipeline orchestration module for variantcentrifuge.

This module provides high-level orchestration of the analysis steps:
  • Checks external tools

  • Normalizes genes

  • Loads phenotypes and phenotype terms

  • Extracts variants and applies filters

  • Performs optional genotype replacement

  • Integrates phenotype data

  • Runs variant-level and gene-burden analyses using analyze_variants

  • Generates metadata and optionally converts results to Excel

  • Cleans up intermediates if requested

All steps are coordinated within the run_pipeline function.

variantcentrifuge.pipeline.remove_vcf_extensions(filename)

Remove common VCF-related extensions from a filename.

Parameters:

filename (str) – The input filename, possibly ending in .vcf, .vcf.gz, or .gz.

Returns:

The filename base without VCF-related extensions.

Return type:

str
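The behaviour described above can be approximated with a short sketch. This is not the module's source: strip_vcf_extensions is a hypothetical stand-in, and the case-sensitive matching and the "longest suffix first" order are assumptions.

```python
def strip_vcf_extensions(filename: str) -> str:
    """Illustrative stand-in: drop a trailing .vcf.gz, .vcf, or .gz."""
    # Test the longest suffix first so "cohort.vcf.gz" is not reduced to "cohort.vcf".
    for suffix in (".vcf.gz", ".vcf", ".gz"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    # No VCF-related suffix (assumption: matching is case-sensitive): return unchanged.
    return filename


for name in ("cohort.vcf.gz", "cohort.vcf", "cohort.gz", "cohort.txt"):
    print(name, "->", strip_vcf_extensions(name))
```

Only one suffix is removed per call in this sketch, so a name such as "cohort.vcf.gz" loses the combined ".vcf.gz" in a single step.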

variantcentrifuge.pipeline.compute_base_name(vcf_path, gene_name)

Compute a base name for output files based on the VCF filename and genes.

If multiple genes are specified, create a hash to represent them. If ‘all’ is specified, append ‘.all’. Otherwise, append the gene name if it’s not already in the VCF base name.

Parameters:
  • vcf_path (str) – Path to the VCF file.

  • gene_name (str) – The normalized gene name string.

Returns:

A base name for output files.

Return type:

str
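The three naming rules can be sketched as follows. Everything specific here is an assumption rather than the library's documented behaviour: the function name base_name_sketch, whitespace as the gene separator, the case-insensitive comparison, the SHA-256 digest truncated to 8 hex characters, and the "genes-" label.

```python
import hashlib
import os


def _strip_vcf_suffix(name: str) -> str:
    """Hypothetical helper: drop a trailing .vcf.gz, .vcf, or .gz."""
    for suffix in (".vcf.gz", ".vcf", ".gz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def base_name_sketch(vcf_path: str, gene_name: str) -> str:
    """Illustrative stand-in for the base-name rules described above."""
    vcf_base = _strip_vcf_suffix(os.path.basename(vcf_path))
    genes = gene_name.split()  # assumption: genes are whitespace-separated

    # Rule 1: the special value 'all' is appended as '.all'.
    if gene_name.strip().lower() == "all":
        return f"{vcf_base}.all"

    # Rule 2: several genes are collapsed into a short, deterministic hash.
    if len(genes) > 1:
        digest = hashlib.sha256(" ".join(genes).encode("utf-8")).hexdigest()[:8]
        return f"{vcf_base}.genes-{digest}"

    # Rule 3: a single gene is appended unless the VCF base name already contains it.
    if genes and genes[0].lower() not in vcf_base.lower():
        return f"{vcf_base}.{genes[0]}"
    return vcf_base


print(base_name_sketch("/data/cohort.vcf.gz", "all"))
print(base_name_sketch("/data/cohort.vcf.gz", "BRCA1"))
print(base_name_sketch("/data/cohort_BRCA1.vcf", "BRCA1"))
print(base_name_sketch("/data/cohort.vcf.gz", "BRCA1 TP53"))
```

Hashing the gene list keeps output file names short and stable when many genes are requested, at the cost of the name no longer being human-readable.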

variantcentrifuge.pipeline.load_terms_from_file(file_path, logger)

Load terms (HPO terms, sample IDs, etc.) from a file, one per line.

Parameters:
  • file_path (str or None) – Path to a file containing one term per line.

  • logger (logging.Logger) – Logger instance for error logging.

Returns:

A list of terms loaded from the file.

Return type:

list of str

Raises:

SystemExit – If the file is missing or empty and a file_path was specified.
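A minimal sketch of this contract is shown below. It is not the module's source: load_terms_sketch is a hypothetical name, and returning an empty list when file_path is None, skipping blank lines, and exiting with status 1 are assumptions inferred from the description.

```python
import logging
import os
import sys
import tempfile
from typing import List, Optional


def load_terms_sketch(file_path: Optional[str], logger: logging.Logger) -> List[str]:
    """Illustrative stand-in: read one term per line, exiting on a missing or empty file."""
    # Assumption: no path means "no terms requested", which is not an error.
    if not file_path:
        return []
    if not os.path.exists(file_path):
        logger.error("Terms file not found: %s", file_path)
        sys.exit(1)
    with open(file_path, encoding="utf-8") as handle:
        # Strip whitespace and skip blank lines (assumption).
        terms = [line.strip() for line in handle if line.strip()]
    if not terms:
        logger.error("Terms file is empty: %s", file_path)
        sys.exit(1)
    return terms


logger = logging.getLogger("terms_demo")
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hpo_terms.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("HP:0000001\n\nHP:0000118\n")
    print(load_terms_sketch(path, logger))
```

Because sys.exit raises SystemExit, a caller that wants to recover from a bad terms file would have to catch that exception explicitly.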

variantcentrifuge.pipeline.parse_samples_from_vcf(vcf_file)

Parse sample names from a VCF file by reading its header line.

Parameters:

vcf_file (str) – Path to the VCF file.

Returns:

A list of sample names extracted from the VCF header.

Return type:

list of str
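In the VCF format, the header line begins with #CHROM and lists eight fixed columns plus FORMAT before the sample names, so the samples are the tab-separated fields from the tenth column onward. The sketch below reads that line directly; it is an illustration, not the module's implementation (which may, for example, delegate to an external tool). The name samples_from_header_sketch and the gzip handling for ".gz" paths are assumptions.

```python
import gzip
import os
import tempfile
from typing import List


def samples_from_header_sketch(vcf_file: str) -> List[str]:
    """Illustrative stand-in: return the sample columns of the #CHROM header line."""
    # Assumption: a ".gz" suffix means gzip/bgzip compression.
    opener = gzip.open if vcf_file.endswith(".gz") else open
    with opener(vcf_file, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Columns 0-8 are CHROM..FORMAT; everything after is a sample name.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a header line
    return []


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.vcf")
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("##fileformat=VCFv4.2\n")
        handle.write("\t".join(header + ["S1", "S2"]) + "\n")
    print(samples_from_header_sketch(path))
```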

variantcentrifuge.pipeline.run_pipeline(args, cfg, start_time)

High-level orchestration of the pipeline steps.

Parameters:
  • args (argparse.Namespace) – Parsed command-line arguments.

  • cfg (dict) – Configuration dictionary merged from CLI and config file.

  • start_time (datetime.datetime) – The start time of the run.

Returns:

Nothing. As side effects, the function writes output files and may print results to stdout.

Return type:

None
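As a rough illustration of how the three arguments might be assembled before calling run_pipeline, the sketch below merges a config-file dictionary with CLI values. The merge rule (CLI values override the file unless they are None), the helper merge_cfg_sketch, and the field names vcf_file, gene_name, and xlsx are all hypothetical; the attributes that run_pipeline actually reads are not listed in this section.

```python
import argparse
import datetime
from typing import Any, Dict


def merge_cfg_sketch(file_cfg: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Illustrative merge: start from the config file, let explicit CLI values win."""
    cfg = dict(file_cfg)  # copy so the caller's dictionary is not mutated
    for key, value in vars(args).items():
        if value is not None:  # assumption: None means "not given on the CLI"
            cfg[key] = value
    return cfg


# Hypothetical field names, chosen only for this example.
args = argparse.Namespace(vcf_file="cohort.vcf.gz", gene_name="BRCA1", xlsx=None)
file_cfg = {"gene_name": "all", "xlsx": False}

cfg = merge_cfg_sketch(file_cfg, args)
start_time = datetime.datetime.now()
print(cfg)

# With the package installed, the call documented above would then be:
# from variantcentrifuge.pipeline import run_pipeline
# run_pipeline(args, cfg, start_time)
```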