Gene BED Module

Gene coordinate processing and BED file generation

Gene BED extraction and gene normalization module.

This module provides:

  • normalize_genes: For normalizing gene inputs (single gene, multiple genes, or file).

  • get_gene_bed: For generating a BED file corresponding to specified genes via snpEff genes2bed.

variantcentrifuge.gene_bed.normalize_genes(gene_name_str, gene_file_str, logger)

Normalize genes from either a single gene name, a list of genes, or a file.

If ‘all’ is provided, or if no genes remain after filtering, the function returns “all”.

Parameters:
  • gene_name_str (str or None) – The gene name(s) provided via the CLI: a single gene, or multiple genes separated by spaces or commas.

  • gene_file_str (str or None) – Path to a file containing gene names, one per line.

  • logger (logging.Logger) – Logger instance for logging messages.

Returns:

A normalized, space-separated string of gene names, or “all”.

Return type:

str

Raises:

SystemExit – If no gene name or file is provided, or if the specified file does not exist.
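The behaviour described above can be illustrated with a minimal sketch. It is a hypothetical re-implementation, not the module's actual code: the name normalize_genes_sketch is invented, and several details are assumptions not stated in the docs. It assumes that genes from the CLI string and the file are combined when both are given, that "filtering" means dropping empty tokens and blank lines, that ‘all’ is matched case-insensitively, and that the exit status is 1.

```python
import logging
import os
import re
import sys


def normalize_genes_sketch(gene_name_str, gene_file_str, logger):
    """Hypothetical sketch of the documented normalize_genes behaviour."""
    if not gene_name_str and not gene_file_str:
        logger.error("No gene name or gene file provided.")
        sys.exit(1)  # raises SystemExit, as documented

    genes = []
    if gene_name_str:
        # Accept spaces, commas, or any mix of the two as separators;
        # empty tokens (e.g. from leading separators) are dropped.
        genes.extend(g for g in re.split(r"[,\s]+", gene_name_str) if g)

    if gene_file_str:
        if not os.path.exists(gene_file_str):
            logger.error("Gene file %s does not exist.", gene_file_str)
            sys.exit(1)
        with open(gene_file_str, encoding="utf-8") as fh:
            # One gene per line; blank lines are skipped (assumption).
            genes.extend(line.strip() for line in fh if line.strip())

    # 'all' anywhere in the input, or nothing left after filtering, means all genes.
    if not genes or any(g.lower() == "all" for g in genes):
        return "all"
    return " ".join(genes)
```

For example, normalize_genes_sketch("BRCA1,BRCA2 TP53", None, logger) yields "BRCA1 BRCA2 TP53", a single space-separated string that can be split again downstream.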

variantcentrifuge.gene_bed.get_gene_bed(reference, gene_name, interval_expand=0, add_chr=True, output_dir='output')

Generate a BED file for the given gene(s) using snpEff genes2bed.

If gene_name is “all”, the command runs without a gene list. If multiple genes are provided, each one is passed to genes2bed as a separate argument.
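The command construction described above can be sketched as an argument list. The helper build_genes2bed_cmd is hypothetical, and the sketch assumes snpEff is invoked through a wrapper script named snpEff on PATH. A plain installation may instead need java -jar snpEff.jar. "GRCh38.105" in the usage note is only an example reference name.

```python
def build_genes2bed_cmd(reference, gene_name):
    """Return the argv list for a snpEff genes2bed call (hypothetical helper)."""
    # Assumes a `snpEff` wrapper script on PATH.
    cmd = ["snpEff", "genes2bed", reference]
    if gene_name != "all":
        # Each gene becomes its own argument rather than one quoted string.
        cmd.extend(gene_name.split())
    return cmd
```

build_genes2bed_cmd("GRCh38.105", "BRCA1 TP53") gives ["snpEff", "genes2bed", "GRCh38.105", "BRCA1", "TP53"]. Passing a list to subprocess.run, rather than a shell string, avoids quoting problems with unusual gene names.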

Parameters:
  • reference (str) – The reference genome name compatible with snpEff.

  • gene_name (str) – “all” or space-separated list of gene names.

  • interval_expand (int, optional) – Number of bases by which to extend each gene region both upstream and downstream. Default is 0.

  • add_chr (bool, optional) – Whether to add a ‘chr’ prefix to chromosome names in the BED file. Default is True.

  • output_dir (str, optional) – Directory to store cached BED files. Default is “output”.
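The documentation does not say whether interval_expand and add_chr are applied by snpEff or by post-processing the genes2bed output. The sketch below assumes a post-processing step on each tab-separated BED record. The function name transform_bed_line is hypothetical. It clamps starts at 0 because BED coordinates are 0-based. It does not clamp ends, which would require chromosome lengths. It also does not remap names such as MT to chrM.

```python
def transform_bed_line(line, interval_expand=0, add_chr=True):
    """Expand one BED record and optionally add a 'chr' prefix (hypothetical sketch)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # BED starts are 0-based, so clamp at 0 instead of going negative.
    start = max(0, start - interval_expand)
    # The end is not clamped: that would need chromosome lengths.
    end = end + interval_expand
    if add_chr and not chrom.startswith("chr"):
        chrom = "chr" + chrom
    # Any extra columns (e.g. the gene name) are kept unchanged.
    return "\t".join([chrom, str(start), str(end)] + fields[3:])
```

With interval_expand=50, the record "1\t100\t200\tBRCA1" becomes "chr1\t50\t250\tBRCA1".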

Returns:

Path to the final BED file.

Return type:

str

Raises:

subprocess.CalledProcessError – If the snpEff genes2bed or sorting command fails.
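The last two concerns above, caching in output_dir and the sorting step, can be sketched together. Both helpers are hypothetical. The docs do not say how cached files are named, so cached_bed_path simply hashes the call parameters into a deterministic filename. The docs also do not name the sorting tool, so sort_bed assumes a coreutils-style sort. Running it with check=True is what turns a failing command into subprocess.CalledProcessError, matching the exception listed under Raises.

```python
import hashlib
import os
import subprocess


def cached_bed_path(output_dir, reference, gene_name, interval_expand, add_chr):
    """Derive a deterministic cache filename from the call parameters (hypothetical)."""
    key = f"{reference}|{gene_name}|{interval_expand}|{add_chr}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(output_dir, f"genes_{digest}.bed")


def sort_bed(unsorted_path, sorted_path):
    """Sort a BED file by chromosome, then numerically by start (assumes coreutils sort)."""
    env = dict(os.environ, LC_ALL="C")  # byte-wise, locale-independent ordering
    with open(sorted_path, "w", encoding="utf-8") as out:
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run(
            ["sort", "-k1,1", "-k2,2n", unsorted_path],
            stdout=out,
            env=env,
            check=True,
        )
    return sorted_path
```

Because the cache key includes every parameter, changing interval_expand or add_chr produces a new file rather than silently reusing a BED built with different settings.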