Utils Module
The utils module contains common utility functions and external tool integration.
It provides helper functions for logging, running commands, checking tool availability, and retrieving tool versions.
- variantcentrifuge.utils.run_command(cmd, output_file=None)
Run a shell command and write stdout to output_file if provided, else return stdout.
- Parameters:
- Returns:
If output_file is None, returns the command stdout as a string. If output_file is provided, returns output_file after completion.
- Return type:
- Raises:
subprocess.CalledProcessError – If the command returns a non-zero exit code.
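The behavior described above can be illustrated with a minimal sketch built on the standard library. The name run_command_sketch is hypothetical, and passing the command as a list of arguments is an assumption; the actual function in variantcentrifuge.utils may differ in details such as logging.

```python
import subprocess
from typing import Optional, Sequence


def run_command_sketch(cmd: Sequence[str], output_file: Optional[str] = None) -> str:
    """Hypothetical stand-in for run_command, following the documented contract."""
    if output_file is None:
        # Capture stdout and return it as a string.
        result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, text=True)
        return result.stdout
    # Stream stdout straight into the file, then return the file path.
    with open(output_file, "w", encoding="utf-8") as handle:
        subprocess.run(cmd, check=True, stdout=handle, text=True)
    return output_file
```

Because check=True is set, a non-zero exit code raises subprocess.CalledProcessError, which matches the documented Raises entry.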
- variantcentrifuge.utils.normalize_vcf_headers(lines)
Normalize header lines from tools like SnpEff and SnpSift by:
1. Removing known prefixes (e.g., “ANN[*].”, “ANN[0].”)
2. Converting indexed genotype fields from the format GEN[index].FIELD to FIELD_index (e.g., “GEN[0].AF” -> “AF_0”, “GEN[1].DP” -> “DP_1”)
- Parameters:
lines (List[str]) – A list of lines (e.g., lines from a file) whose first line may contain SnpEff/SnpSift-generated prefixes in column headers.
- Returns:
The updated list of lines where the first line has had matching prefixes removed or replaced and indexed fields normalized.
- Return type:
List[str]
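The two normalization steps can be sketched roughly as follows. This is an illustration only: normalize_headers_sketch is a hypothetical name, and the prefix list is assumed to contain just the two prefixes mentioned above, whereas the real function may know about more.

```python
import re
from typing import List

# Assumption: only the two prefixes named in the documentation.
KNOWN_PREFIXES = ("ANN[*].", "ANN[0].")

# Matches indexed genotype fields such as GEN[0].AF or GEN[1].DP.
GEN_FIELD = re.compile(r"GEN\[(\d+)\]\.([A-Za-z0-9_]+)")


def normalize_headers_sketch(lines: List[str]) -> List[str]:
    """Hypothetical illustration of the header normalization described above."""
    if not lines:
        return lines
    header = lines[0]
    # Step 1: strip known annotation prefixes.
    for prefix in KNOWN_PREFIXES:
        header = header.replace(prefix, "")
    # Step 2: rewrite GEN[index].FIELD as FIELD_index.
    header = GEN_FIELD.sub(r"\2_\1", header)
    # Only the first line is changed; data lines pass through untouched.
    return [header] + list(lines[1:])
```

Only the first line is rewritten, in line with the description of the return value.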
- variantcentrifuge.utils.normalize_snpeff_headers(lines)
Alias for normalize_vcf_headers, kept for backward compatibility.
This function is deprecated; use normalize_vcf_headers instead.
- variantcentrifuge.utils.check_external_tools()
Check if required external tools are installed and in the PATH.
Tools checked:
bcftools
snpEff
SnpSift
bedtools
If any are missing, log an error and exit.
- Raises:
SystemExit – If any required tool is missing.
- Return type:
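A check of this kind could be sketched with shutil.which, as below. The helper names find_missing_tools and check_external_tools_sketch are hypothetical, and the exit code of 1 and the log message wording are assumptions rather than documented behavior.

```python
import logging
import shutil
import sys
from typing import List, Sequence

logger = logging.getLogger(__name__)

# Tool names as listed in the documentation above.
REQUIRED_TOOLS = ("bcftools", "snpEff", "SnpSift", "bedtools")


def find_missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_external_tools_sketch(tools: Sequence[str] = REQUIRED_TOOLS) -> None:
    """Hypothetical illustration: log an error and exit if any tool is missing."""
    missing = find_missing_tools(tools)
    if missing:
        logger.error("Missing required external tools: %s", ", ".join(missing))
        # Assumption: exit status 1; sys.exit raises SystemExit.
        sys.exit(1)
```

Separating the lookup from the exit makes the lookup easy to test without terminating the interpreter.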
- variantcentrifuge.utils.get_tool_version(tool_name)
Retrieve the version of a given tool.
Supported tools:
snpEff
bcftools
SnpSift
bedtools
- variantcentrifuge.utils.sanitize_metadata_field(value)
Sanitize a metadata field for TSV output by replacing tabs and newlines with spaces.
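A minimal sketch of this sanitization is shown below. The name sanitize_metadata_field_sketch is hypothetical, and treating carriage returns like newlines is an assumption not stated in the documentation.

```python
def sanitize_metadata_field_sketch(value: str) -> str:
    """Hypothetical illustration: make a value safe for a single TSV cell."""
    # Tabs would split the cell; newlines would split the row.
    # Assumption: carriage returns are handled the same way as newlines.
    return value.replace("\t", " ").replace("\r", " ").replace("\n", " ")
```

Each offending character is replaced one-for-one, so the length of the value is preserved.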
- variantcentrifuge.utils.ensure_fields_in_extract(base_fields_str, extra_fields)
Ensure each item in extra_fields is present in the space-delimited base_fields_str.
Notes
We no longer normalize extra_fields here, so that raw columns like “GEN[*].DP” remain unmodified.
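The idea can be sketched as follows. The name ensure_fields_sketch is hypothetical, and returning the updated space-delimited string (with missing fields appended at the end) is an assumption, since the return value is not documented here.

```python
from typing import Iterable


def ensure_fields_sketch(base_fields_str: str, extra_fields: Iterable[str]) -> str:
    """Hypothetical illustration: append any extra field not already present."""
    fields = base_fields_str.split()
    for field in extra_fields:
        # Compare raw strings; names like "GEN[*].DP" are left unmodified.
        if field not in fields:
            fields.append(field)
    return " ".join(fields)
```

For example, calling it with "CHROM POS" and ["POS", "GEN[*].DP"] leaves POS alone and appends the raw GEN[*].DP column.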
- variantcentrifuge.utils.generate_igv_safe_filename_base(sample_id, chrom, pos, ref, alt, max_allele_len=10, hash_len=6, max_variant_part_len=50)
Generate a safe, shortened filename base for IGV reports to prevent “File name too long” errors.
Long REF/ALT alleles are truncated, and a hash of the original allele is appended to maintain uniqueness. The returned base filename (without extension) is intended to be filesystem-safe.
- Parameters:
sample_id (str) – Sample identifier
chrom (str) – Chromosome name/identifier
ref (str) – Reference allele
alt (str) – Alternate allele
max_allele_len (int, default=10) – Maximum length for each allele in the filename
hash_len (int, default=6) – Length of hash to append when truncating an allele
max_variant_part_len (int, default=50) – Maximum length for the variant part of the filename (chr_pos_ref_alt)
- Returns:
A safe, shortened filename base
- Return type: