Pipeline Module
The pipeline module orchestrates the complete VariantCentrifuge workflow.
Pipeline orchestration module for variantcentrifuge.
This module provides high-level orchestration of the analysis steps:
- Checks external tools
- Normalizes genes
- Loads phenotypes and phenotype terms
- Extracts variants and applies filters
- Performs optional genotype replacement
- Integrates phenotype data
- Runs variant-level and gene-burden analyses using analyze_variants
- Generates metadata and optionally converts results to Excel
- Cleans up intermediates if requested
All steps are coordinated within the run_pipeline function.
- variantcentrifuge.pipeline.remove_vcf_extensions(filename)
Remove common VCF-related extensions from a filename.
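The following is a minimal sketch of how such suffix stripping might work, not the library's actual implementation. The function name `remove_vcf_extensions_sketch` and the exact list of extensions (`.vcf.gz`, `.vcf`, `.gz`) are assumptions made for illustration.

```python
def remove_vcf_extensions_sketch(filename: str) -> str:
    """Illustrative stand-in; the real function may handle other suffixes."""
    # Assumed extension list. Longer suffixes come first so that
    # ".vcf.gz" is removed as a whole rather than leaving ".vcf" behind.
    for ext in (".vcf.gz", ".vcf", ".gz"):
        if filename.endswith(ext):
            return filename[: -len(ext)]
    # No known suffix: return the name unchanged.
    return filename
```

Checking the longest suffix first is the main design point: testing `.gz` before `.vcf.gz` would strip only part of a compressed VCF name.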
- variantcentrifuge.pipeline.compute_base_name(vcf_path, gene_name)
Compute a base name for output files based on the VCF filename and genes.
If multiple genes are specified, create a hash to represent them. If ‘all’ is specified, append ‘.all’. Otherwise, append the gene name if it’s not already in the VCF base name.
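The three naming rules described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the name `compute_base_name_sketch`, the whitespace-separated gene format, the `multiple-genes-` label, and the choice of a truncated MD5 digest are all hypothetical.

```python
import hashlib
import os


def compute_base_name_sketch(vcf_path: str, gene_name: str) -> str:
    """Illustrative stand-in for the naming rules; details are assumed."""
    base = os.path.basename(vcf_path)
    # Assumed extension list, longest suffix first.
    for ext in (".vcf.gz", ".vcf", ".gz"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break

    # Assumption: multiple genes arrive as one whitespace-separated string.
    genes = gene_name.split()
    if len(genes) > 1:
        # Sort before hashing so the same gene set always maps to the
        # same name, regardless of the order given.
        joined = " ".join(sorted(genes))
        digest = hashlib.md5(joined.encode("utf-8")).hexdigest()[:8]
        return f"{base}.multiple-genes-{digest}"

    gene = gene_name.strip()
    if gene.lower() == "all":
        return f"{base}.all"
    # Skip the suffix when the gene already appears in the VCF base name.
    if gene.lower() in base.lower():
        return base
    return f"{base}.{gene}"
```

Hashing a long gene list keeps output filenames short and within filesystem limits, at the cost of the name no longer being human-readable.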
- variantcentrifuge.pipeline.load_terms_from_file(file_path, logger)
Load terms (HPO terms, sample IDs, etc.) from a file, one per line.
- Parameters:
file_path (str or None) – Path to a file containing one term per line.
logger (logging.Logger) – Logger instance for error logging.
- Returns:
A list of terms loaded from the file.
- Return type:
list of str
- Raises:
SystemExit – If the file is missing or empty and a file_path was specified.
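The documented contract (one term per line, exit on a missing or empty file) could be implemented roughly as below. This is a sketch only: the name `load_terms_sketch`, the exit code `1`, the skipping of blank lines, and the empty-list result when no path is given are assumptions, not confirmed library behavior.

```python
import logging
import os
import sys
from typing import List, Optional


def load_terms_sketch(file_path: Optional[str], logger: logging.Logger) -> List[str]:
    """Illustrative stand-in for loading one term per line."""
    # Assumption: no path means the caller did not request any terms.
    if not file_path:
        return []

    if not os.path.exists(file_path):
        logger.error("Terms file not found: %s", file_path)
        sys.exit(1)  # raises SystemExit

    with open(file_path, encoding="utf-8") as handle:
        # Strip whitespace and drop blank lines (assumed behavior).
        terms = [line.strip() for line in handle if line.strip()]

    if not terms:
        logger.error("Terms file is empty: %s", file_path)
        sys.exit(1)  # raises SystemExit
    return terms
```

Because `sys.exit` raises `SystemExit`, callers that embed this logic in a larger program can still catch the exception instead of terminating.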
- variantcentrifuge.pipeline.parse_samples_from_vcf(vcf_file)
Parse sample names from a VCF file by reading its header line.
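In the VCF format, the header line begins with `#CHROM`, and when genotype data is present, sample names are the tab-separated columns after the nine fixed fields ending with `FORMAT`. A minimal sketch of reading them is shown below; the name `parse_samples_sketch`, the gzip detection by file extension, and the empty-list result when no header is found are assumptions rather than the library's documented behavior.

```python
import gzip
from typing import List


def parse_samples_sketch(vcf_file: str) -> List[str]:
    """Illustrative stand-in: read sample names from the #CHROM header line."""
    # Assumption: compressed input is identified by a ".gz" suffix.
    opener = gzip.open if vcf_file.endswith(".gz") else open
    with opener(vcf_file, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 0-8 are CHROM..FORMAT; samples follow.
                return fields[9:]
            if not line.startswith("#"):
                # Reached data records without seeing a header line.
                break
    return []
```

Stopping at the first non-comment line avoids scanning the whole file when the header is absent.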
- variantcentrifuge.pipeline.run_pipeline(args, cfg, start_time)
High-level orchestration of the pipeline steps.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
cfg (dict) – Configuration dictionary merged from CLI and config file.
start_time (datetime.datetime) – The start time of the run.
- Returns:
Nothing. As side effects, the function writes output files and may print results to stdout.
- Return type:
None