Usage Guide

Basic Usage

The most basic command to run VariantCentrifuge:

variantcentrifuge \
  --gene-name BICC1 \
  --vcf-file path/to/your.vcf \
  --output-file output.tsv

Pipeline Flow

VariantCentrifuge processes variants through a multi-stage pipeline. The stages executed depend on the options you provide:

    flowchart TD
    VCF["VCF File"] --> BED["Gene BED Creation"]
    BED --> PRE{"bcftools prefilter?"}
    PRE -->|yes| BCF["bcftools Prefilter"]
    PRE -->|no| EXT["Variant Extraction"]
    BCF --> EXT
    EXT --> FILT["SnpSift Filter"]
    FILT --> SPLIT{"Split snpEff lines?"}
    SPLIT -->|yes| SPL["Annotation Splitter"]
    SPLIT -->|no| FE["Field Extraction"]
    SPL --> FE
    FE --> SORT["Data Sorting"]
    SORT --> GT{"Genotype replacement?"}
    GT -->|yes| REP["Genotype Replacement"]
    GT -->|no| DF["DataFrame Loading"]
    REP --> DF
    DF --> INH{"Inheritance analysis?"}
    INH -->|yes| PED["Inheritance Analysis"]
    INH -->|no| ANN["Custom Annotations"]
    PED --> ANN
    ANN --> SCR{"Scoring model?"}
    SCR -->|yes| SCORE["Variant Scoring"]
    SCR -->|no| STAT["Statistics"]
    SCORE --> STAT
    STAT --> GB{"Gene burden?"}
    GB -->|yes| BURDEN["Gene Burden Analysis"]
    GB -->|no| FF{"Final filter?"}
    BURDEN --> FF
    FF -->|yes| FFILT["Final Filter"]
    FF -->|no| PSE{"Pseudonymize?"}
    FFILT --> PSE
    PSE -->|yes| PSEUDO["Pseudonymization"]
    PSE -->|no| OUT["TSV Output"]
    PSEUDO --> OUT
    OUT --> XLSX{"Excel?"}
    OUT --> HTML{"HTML report?"}
    OUT --> IGV{"IGV report?"}
    XLSX -->|yes| XL["Excel Report"]
    HTML -->|yes| HR["HTML Report"]
    IGV -->|yes| IR["IGV Report"]

    style VCF fill:#4a90d9,color:#fff
    style OUT fill:#2ecc71,color:#fff
    style XL fill:#2ecc71,color:#fff
    style HR fill:#2ecc71,color:#fff
    style IR fill:#2ecc71,color:#fff
    

Command Line Options

General Options

--version
    Show version and exit
--log-level LEVEL (default: INFO)
    Logging level: DEBUG, INFO, WARN, ERROR
--log-file PATH
    Write logs to file in addition to stderr
-c / --config PATH
    Path to JSON configuration file

Core Input/Output

-v / --vcf-file PATH (required)
    Input VCF file (can be gzip-compressed)
-o / --output-file PATH
    Output TSV file path, or stdout/- for stdout
--output-dir DIR (default: output)
    Directory for intermediate and final output
--xlsx (default: false)
    Also produce Excel output
--keep-intermediates (default: false)
    Keep all intermediate files after run
--archive-results (default: false)
    Create timestamped .tar.gz archive of results

Gene Selection

-g / --gene-name GENES
    Gene name(s) (space-separated string)
-G / --gene-file PATH
    File with gene names, one per line
--transcript-list IDS
    Comma-separated transcript IDs to filter for (e.g., NM_007294.4,NM_000059.4)
--transcript-file PATH
    File with transcript IDs, one per line
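
For example, a minimal sketch restricting output to specific BRCA1/BRCA2 transcripts; the transcript IDs are the ones shown above, so substitute those that match your annotation:

variantcentrifuge \
  --gene-name "BRCA1 BRCA2" \
  --transcript-list NM_007294.4,NM_000059.4 \
  --vcf-file samples.vcf.gz \
  --output-file brca_selected_transcripts.tsv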

Filtering Options

-r / --reference DB (default: from config)
    snpEff reference database (e.g., GRCh37.p13)
-f / --filters EXPR (default: from config)
    SnpSift filter expression
--bcftools-prefilter EXPR
    bcftools expression for early variant pre-filtering during extraction
--preset NAME
    Apply named filter preset from config (repeatable, combined with AND). Use commas within a single --preset for multiple: --preset rare,coding
--field-profile PROFILE (default: dbnsfp4)
    Field profile for annotation database compatibility (e.g., dbnsfp4, dbnsfp5). See Field Profiles
--list-field-profiles
    List available field profiles and exit
--late-filtering (default: false)
    Apply SnpSift filters after scoring (allows filtering on computed scores)
--final-filter EXPR
    Pandas query() expression applied after all annotations and scores
--split-snpeff-lines MODE
    Split multi-annotation lines: before_filters or after_filters. Omit to skip
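
As a quick sketch (assuming --list-field-profiles needs no other arguments), you can check which annotation field profiles are available and then run with a specific one:

# List the field profiles bundled with your installation
variantcentrifuge --list-field-profiles

# Run with the dbNSFP 5 profile and preset filters
variantcentrifuge \
  --gene-name BICC1 \
  --vcf-file samples.vcf.gz \
  --field-profile dbnsfp5 \
  --preset rare,coding \
  --output-file bicc1_rare_coding.tsv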

Tumor-Normal Filtering

These flags configure somatic variant presets (somatic, loh, tumor_only) by expanding template variables:

--tumor-sample-index N (default: 1)
    0-based index of tumor sample in VCF
--normal-sample-index N (default: 0)
    0-based index of normal sample in VCF
--tumor-dp-min N (default: 20)
    Minimum read depth for tumor sample
--normal-dp-min N (default: 20)
    Minimum read depth for normal sample
--tumor-af-min F (default: 0.05)
    Minimum allele frequency for tumor sample
--normal-af-max F (default: 0.03)
    Maximum allele frequency for normal sample

VCF Annotation Inspection

--show-vcf-annotations (default: false)
    Print VCF INFO/FORMAT fields and exit
--annotation-filter TEXT
    Filter annotation output by substring (case-insensitive)
--annotation-format FMT (default: table)
    Output format: table or json
--info-only (default: false)
    Show only INFO fields (mutually exclusive with --format-only)
--format-only (default: false)
    Show only FORMAT fields (mutually exclusive with --info-only)
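
A minimal inspection sketch, assuming --vcf-file is the only other argument these flags need; it lists INFO fields whose names contain "CADD" as JSON before you commit to a filter expression:

variantcentrifuge \
  --vcf-file samples.vcf.gz \
  --show-vcf-annotations \
  --info-only \
  --annotation-filter CADD \
  --annotation-format json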

Field Extraction & Formatting

-e / --fields FIELDS (default: from config)
    Fields to extract with SnpSift extractFields
--no-replacement (default: false)
    Skip genotype replacement step
--add-column NAME
    Add blank column(s) to final output (repeatable)
--no-links (default: false)
    Disable adding URL link columns
--add-chr (default: false)
    Add chr prefix to chromosome names in BED
--remove-sample-substring TEXT
    Strip this substring from all sample names
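
An illustrative sketch of custom field extraction; the field list uses SnpSift extractFields notation and must match the annotations actually present in your VCF, and passing it as a single quoted string is an assumption (check --help for the exact form):

# Field names below are illustrative SnpSift identifiers
variantcentrifuge \
  --gene-name BICC1 \
  --vcf-file samples.vcf.gz \
  --fields "CHROM POS REF ALT ANN[0].GENE ANN[0].IMPACT GEN[*].GT" \
  --add-column Reviewer_Comment \
  --output-file bicc1_custom_fields.tsv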

Genotype Analysis

--genotype-filter FILTER
    Genotype filter: het, hom, comp_het, or comma-separated (e.g., het,comp_het)
--gene-genotype-file PATH
    TSV with per-gene genotype rules (columns: GENE, GENOTYPES). Overrides global --genotype-filter
--append-extra-sample-fields [FIELDS...]
    Append extra FORMAT fields (e.g., DP AD) next to genotypes
--extra-sample-field-delimiter CHAR (default: ":")
    Delimiter between genotype and extra fields
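
For instance, a sketch that keeps heterozygous and candidate compound-heterozygous genotypes and appends read depth and allele depth next to each genotype (the extra fields must exist in your FORMAT column):

variantcentrifuge \
  --gene-file disease_genes.txt \
  --vcf-file cohort.vcf.gz \
  --genotype-filter het,comp_het \
  --append-extra-sample-fields DP AD \
  --output-file het_and_comphet.tsv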

Inheritance Analysis

--ped PATH
    PED file defining family structure for inheritance analysis
--inheritance-mode MODE (default: simple)
    Output format: simple (pattern only), columns (separate columns), full (complete JSON)
--no-vectorized-comp-het (default: false)
    Use the original compound-het implementation, which is 10-50x slower than the vectorized one

Inheritance analysis is triggered when --ped or --inheritance-mode is specified. Without --ped, all samples are treated as affected singletons.

Supported inheritance patterns: de novo, autosomal dominant (AD), autosomal recessive (AR), X-linked recessive (XLR), X-linked dominant (XLD), compound heterozygous, mitochondrial.
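
A sketch of cohort-wide inheritance analysis without a PED file, relying on the affected-singleton behavior described above:

variantcentrifuge \
  --gene-file disease_genes.txt \
  --vcf-file cohort.vcf.gz \
  --inheritance-mode columns \
  --preset rare,coding \
  --output-file cohort_inheritance.tsv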

ClinVar PM5 Annotation

--clinvar-pm5-lookup PATH
    Path to pre-built PM5 lookup table. Adds PM5, PM5_evidence_count, and PM5_known_variants columns
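
Assuming you already have a pre-built lookup table (the file name and format below are placeholders), attaching it looks like this:

# clinvar_pm5_lookup.tsv is a placeholder path to a pre-built PM5 table
variantcentrifuge \
  --gene-file cancer_genes.txt \
  --vcf-file samples.vcf.gz \
  --clinvar-pm5-lookup clinvar_pm5_lookup.tsv \
  --output-file pm5_annotated.tsv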

Phenotype & Sample Groups

--phenotype-file PATH
    Path to phenotype file (.csv or .tsv)
--phenotype-sample-column NAME
    Column name for sample IDs in phenotype file
--phenotype-value-column NAME
    Column name for phenotype values
--case-phenotypes TERMS
    Comma-separated HPO terms defining case group
--control-phenotypes TERMS
    Comma-separated HPO terms defining control group
--case-phenotypes-file PATH
    File with HPO terms for case group
--control-phenotypes-file PATH
    File with HPO terms for control group
--case-samples IDS
    Comma-separated sample IDs for case group
--control-samples IDS
    Comma-separated sample IDs for control group
--case-samples-file PATH
    File with sample IDs for case group
--control-samples-file PATH
    File with sample IDs for control group
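
A sketch of phenotype-driven case/control assignment; the file names and column names are placeholders and must match your phenotype file:

# phenotypes.tsv, its columns, case_hpo.txt, and control_hpo.txt are placeholders
# (the HPO term files hold one term per line)
variantcentrifuge \
  --gene-file kidney_genes.txt \
  --vcf-file cohort.vcf.gz \
  --phenotype-file phenotypes.tsv \
  --phenotype-sample-column sample_id \
  --phenotype-value-column hpo_terms \
  --case-phenotypes-file case_hpo.txt \
  --control-phenotypes-file control_hpo.txt \
  --output-file case_control_variants.tsv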

Statistical Analysis

--perform-gene-burden (default: false)
    Run gene burden analysis
--gene-burden-mode MODE (default: alleles)
    Gene burden mode: samples or alleles
--correction-method METHOD (default: fdr)
    Multiple testing correction: fdr or bonferroni
--no-stats (default: false)
    Skip statistics computation step
--stats-output-file PATH
    File to write analysis statistics
--stats-config PATH
    Path to custom statistics configuration JSON
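
For example, a sketch running gene burden in per-sample mode with Bonferroni correction and writing summary statistics to their own file; case/control assignment comes from the phenotype or sample-group flags shown earlier, and the file names are placeholders:

variantcentrifuge \
  --gene-file cancer_genes.txt \
  --vcf-file cohort.vcf.gz \
  --case-samples-file cases.txt \
  --control-samples-file controls.txt \
  --perform-gene-burden \
  --gene-burden-mode samples \
  --correction-method bonferroni \
  --stats-output-file burden_stats.tsv \
  --output-file burden_variants.tsv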

Association Testing

For modular rare variant association testing beyond the basic Fisher’s exact test provided by --perform-gene-burden:

--perform-association (default: false)
    Run the modular association testing framework
--association-tests TESTS (default: fisher)
    Comma-separated tests: fisher, logistic_burden, linear_burden, skat, coast
--skat-backend BACKEND (default: python)
    SKAT backend: python (default, thread-safe) or r (deprecated)
--coast-backend BACKEND (default: python)
    COAST backend: python (default) or r (deprecated)
--covariate-file PATH
    TSV/CSV covariate file (first column = sample ID, header required)
--covariates NAMES
    Comma-separated covariate column names to include (default: all columns)
--categorical-covariates NAMES
    Comma-separated columns to treat as categorical (auto-detected otherwise)
--trait-type TYPE (default: binary)
    Trait type for burden tests: binary or quantitative
--pca-file PATH
    PCA file (PLINK .eigenvec, AKT output, or generic TSV)
--pca-tool TOOL
    Set to akt to invoke AKT subprocess for PCA computation
--pca-components N (default: 10)
    Number of PCA components to include as covariates
--variant-weights SCHEME (default: beta:1,25)
    Variant weight scheme: beta:a,b, uniform, cadd, revel, combined
--variant-weight-params JSON
    JSON string of weight scheme parameters (e.g., '{"cadd_cap": 30}')
--coast-weights WEIGHTS
    COAST category weights as comma-separated floats (BMV,DMV,PTV)
--diagnostics-output DIR
    Directory for diagnostics output: lambda_gc.tsv, qq_data.tsv, summary.txt

For detailed usage, test selection guidance, and examples, see the Association Testing Guide.
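
An illustrative sketch only (see the guide for proper test selection): a Fisher, logistic burden, and SKAT run with covariates and PCA components; the input file names are placeholders:

# cases.txt, controls.txt, covariates.tsv, and cohort.eigenvec are placeholders
variantcentrifuge \
  --gene-file candidate_genes.txt \
  --vcf-file cohort.vcf.gz \
  --case-samples-file cases.txt \
  --control-samples-file controls.txt \
  --perform-association \
  --association-tests fisher,logistic_burden,skat \
  --covariate-file covariates.tsv \
  --covariates age,sex \
  --pca-file cohort.eigenvec \
  --pca-components 5 \
  --diagnostics-output assoc_diagnostics/ \
  --output-file association_results.tsv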

Scoring & Custom Annotations

--scoring-config-path DIR
    Directory containing scoring model (variable_assignment_config.json + formula_config.json)
--annotate-bed PATH
    BED file for region annotation (repeatable)
--annotate-gene-list PATH
    Gene list file; adds a yes/no column per file (repeatable)
--annotate-json-genes PATH
    JSON gene data file (repeatable, requires --json-gene-mapping)
--json-gene-mapping JSON
    JSON string mapping fields: '{"identifier":"gene_symbol","dataFields":["panel","inheritance"]}'
--json-genes-as-columns (default: false)
    Add each dataField as its own column instead of Custom_Annotation

Reporting & Visualization

--html-report (default: false)
    Generate interactive HTML report with filtering, charts, and summary dashboard
--igv (default: false)
    Enable IGV.js integration (requires --bam-mapping-file and a genome reference)
--bam-mapping-file PATH
    TSV/CSV mapping sample IDs to BAM files
--igv-reference REF
    Genome reference for IGV (e.g., hg19, hg38)
--igv-fasta PATH
    Local FASTA file for IGV (overrides --igv-reference)
--igv-ideogram PATH
    Ideogram file for chromosome visualization
--igv-flanking N (default: 50)
    Flanking region in bp for IGV reports

Performance & Processing

--threads N (default: auto)
    CPU cores for parallel processing (auto detects available cores)
--no-chunked-processing (default: false)
    Disable chunked processing (may cause memory issues on large files)
--force-chunked-processing (default: false)
    Force chunked processing even for small files
--sort-memory-limit SIZE (default: auto)
    Memory for external sort (e.g., 4G; auto = 10% of available)
--sort-parallel N (default: 4)
    Parallel threads for sorting
--genotype-replacement-method METHOD (default: auto)
    Method: auto, sequential, vectorized, chunked-vectorized, parallel, streaming-parallel
--max-memory-gb N (default: auto-detected)
    Maximum memory in GB for processing
--force-inheritance-processing (default: false)
    Force inheritance analysis even if exceeding memory limits
--sample-column-creation-method METHOD (default: auto)
    Column creation: iterative, vectorized, or auto
--memory-safety-factor F (default: 0.92)
    Fraction of allocated memory to use (0-1)
--inheritance-memory-fraction F (default: 0.85)
    Fraction of safe memory for inheritance analysis
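
A sketch for a large cohort on a memory-constrained machine, combining the main performance knobs; the values are illustrative starting points, not recommendations:

variantcentrifuge \
  --gene-file all_protein_coding_genes.txt \
  --vcf-file large_cohort.vcf.gz \
  --threads 8 \
  --max-memory-gb 32 \
  --sort-memory-limit 4G \
  --genotype-replacement-method chunked-vectorized \
  --output-file large_cohort_variants.tsv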

Checkpoint & Resume

--enable-checkpoint (default: false)
    Enable checkpoint tracking for pipeline state
--resume (default: false)
    Resume from last successful checkpoint
--resume-from STAGE
    Restart from a specific pipeline stage
--checkpoint-checksum (default: false)
    Use SHA256 checksums for file validation (recommended for production)
--show-checkpoint-status
    Show checkpoint status and exit
--list-stages
    List all available stages for the current configuration and exit
--list-checkpoints
    List completed stages from the checkpoint file and exit
--interactive-resume
    Interactively select the resume point

See the Resume System documentation for details.
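
To inspect a checkpointed run and restart it from a chosen stage, something along these lines should work; the stage name is a placeholder, and whether the listing command needs more than --output-dir is an assumption:

# List completed stages recorded in the checkpoint file
variantcentrifuge --list-checkpoints --output-dir previous_analysis/

# Restart from a specific stage (stage name is a placeholder; use --list-stages for real names)
variantcentrifuge \
  --gene-file all_protein_coding_genes.txt \
  --vcf-file large_cohort.vcf.gz \
  --enable-checkpoint \
  --resume-from variant_extraction \
  --threads 16 \
  --output-file all_genes_analysis.tsv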

Data Privacy

--pseudonymize (default: false)
    Enable sample pseudonymization
--pseudonymize-schema SCHEMA (default: sequential)
    Schema: sequential, categorical, anonymous, custom
--pseudonymize-prefix TEXT (default: SAMPLE)
    Prefix for the sequential schema
--pseudonymize-pattern PATTERN
    Custom pattern for the custom schema
--pseudonymize-category-field FIELD (default: phenotype)
    Metadata field for the categorical schema
--pseudonymize-table PATH
    Path to save the mapping table (required with --pseudonymize)
--pseudonymize-ped (default: false)
    Also create a pseudonymized PED file

See Privacy and Pseudonymization for details.
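
A minimal pseudonymization sketch using the sequential schema; keep the mapping table separate from anything you share, since it links pseudonyms back to real sample IDs (prefix and file names are illustrative):

variantcentrifuge \
  --gene-file disease_genes.txt \
  --vcf-file trio.vcf.gz \
  --ped family.ped \
  --pseudonymize \
  --pseudonymize-schema sequential \
  --pseudonymize-prefix CASE \
  --pseudonymize-table pseudonym_mapping.tsv \
  --pseudonymize-ped \
  --output-file pseudonymized_variants.tsv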

Miscellaneous

--gzip-intermediates (default: true)
    Compress intermediate TSV files (fast level-1 gzip)
--no-gzip-intermediates
    Disable intermediate compression

Examples

Basic Gene Analysis

variantcentrifuge \
  --gene-name BRCA1 \
  --vcf-file samples.vcf.gz \
  --output-file brca1_variants.tsv

Filtered Analysis with Presets

variantcentrifuge \
  --gene-file cancer_genes.txt \
  --vcf-file samples.vcf.gz \
  --preset rare,coding \
  --html-report \
  --xlsx \
  --output-file cancer_variants.tsv

Family Trio with Inheritance Analysis

variantcentrifuge \
  --gene-file disease_genes.txt \
  --vcf-file trio.vcf.gz \
  --ped family.ped \
  --inheritance-mode columns \
  --preset rare,coding \
  --html-report \
  --output-file trio_analysis.tsv

Comprehensive Analysis with Reports

variantcentrifuge \
  --gene-name BRCA1 \
  --vcf-file samples.vcf.gz \
  --phenotype-file patient_data.tsv \
  --phenotype-sample-column "sample_id" \
  --phenotype-value-column "disease_status" \
  --perform-gene-burden \
  --html-report \
  --xlsx \
  --output-file brca1_analysis.tsv

IGV Integration

variantcentrifuge \
  --gene-name TP53 \
  --vcf-file samples.vcf.gz \
  --igv \
  --bam-mapping-file bam_files.tsv \
  --igv-reference hg38 \
  --html-report \
  --output-file tp53_variants.tsv

Variant Scoring

variantcentrifuge \
  --gene-file kidney_genes.txt \
  --vcf-file patient.vcf.gz \
  --scoring-config-path scoring/nephro_candidate_score \
  --preset rare,coding \
  --html-report \
  --output-file scored_variants.tsv

Tumor-Normal Somatic Analysis

variantcentrifuge \
  --gene-file oncogenes.txt \
  --vcf-file tumor_normal.vcf.gz \
  --preset somatic,coding \
  --tumor-sample-index 1 \
  --normal-sample-index 0 \
  --tumor-dp-min 30 \
  --tumor-af-min 0.05 \
  --normal-af-max 0.02 \
  --html-report \
  --output-file somatic_variants.tsv

Custom Annotations

# Annotate with JSON gene information
variantcentrifuge \
  --gene-name BRCA1 \
  --vcf-file samples.vcf.gz \
  --annotate-json-genes gene_metadata.json \
  --json-gene-mapping '{"identifier":"gene_symbol","dataFields":["panel","inheritance","function"]}' \
  --output-file annotated_variants.tsv

# Multiple annotation sources
variantcentrifuge \
  --gene-file cancer_genes.txt \
  --vcf-file samples.vcf.gz \
  --annotate-bed cancer_hotspots.bed \
  --annotate-gene-list actionable_genes.txt \
  --annotate-json-genes gene_panels.json \
  --json-gene-mapping '{"identifier":"symbol","dataFields":["panel_name","evidence_level"]}' \
  --html-report \
  --output-file multi_annotated.tsv

Advanced Filtering

# Pre-filter with bcftools for performance
variantcentrifuge \
  --gene-file large_gene_list.txt \
  --vcf-file large_cohort.vcf.gz \
  --bcftools-prefilter 'FILTER="PASS" && INFO/AC<10' \
  --preset rare,coding \
  --output-file filtered_variants.tsv

# Final filter using pandas query syntax
variantcentrifuge \
  --gene-file cancer_genes.txt \
  --vcf-file samples.vcf.gz \
  --preset rare,coding \
  --scoring-config-path scoring/nephro_candidate_score \
  --final-filter 'nephro_candidate_score > 5 and IMPACT == "HIGH"' \
  --output-file high_priority_variants.tsv

# Filter on inheritance patterns
variantcentrifuge \
  --gene-file disease_genes.txt \
  --vcf-file trio.vcf.gz \
  --ped family.ped \
  --inheritance-mode columns \
  --final-filter 'Inheritance_Pattern in ["de_novo", "compound_heterozygous"]' \
  --output-file denovo_and_compound_het.tsv

Checkpoint and Resume

# Run large analysis with checkpoint tracking
variantcentrifuge \
  --gene-file all_protein_coding_genes.txt \
  --vcf-file large_cohort.vcf.gz \
  --enable-checkpoint \
  --threads 16 \
  --preset rare,coding \
  --html-report \
  --output-file all_genes_analysis.tsv

# If interrupted, resume from last checkpoint
variantcentrifuge \
  --gene-file all_protein_coding_genes.txt \
  --vcf-file large_cohort.vcf.gz \
  --enable-checkpoint \
  --resume \
  --threads 16 \
  --preset rare,coding \
  --html-report \
  --output-file all_genes_analysis.tsv

# Check status of a previous run
variantcentrifuge \
  --show-checkpoint-status \
  --output-dir previous_analysis/

Docker

All examples above work inside the Docker container by mounting your data directory:

docker run --rm -v "$(pwd)/data:/data" \
  ghcr.io/scholl-lab/variantcentrifuge:latest \
  --gene-name BRCA1 \
  --vcf-file /data/input.vcf.gz \
  --preset rare,coding \
  --html-report \
  --output-file /data/output.tsv

See the Installation Guide for setup details.

Input File Formats

VCF Files

  • Standard VCF format (v4.0 or later)

  • Can be compressed with gzip (.vcf.gz)

  • Should be annotated with snpEff for optimal functionality

Gene Files

Text file with one gene name per line:

BRCA1
BRCA2
TP53
ATM

PED Files

Standard PLINK pedigree format (tab-separated, 6 columns):

#Family  Individual  Father  Mother  Sex  Affected
FAM001   proband     father  mother  1    2
FAM001   father      0       0       1    1
FAM001   mother      0       0       2    1
  • Sex: 1=male, 2=female, 0=unknown

  • Affected: 1=unaffected, 2=affected, 0=unknown

Sample Mapping Files

Tab-separated file for genotype replacement:

original_id	new_id
sample_001	Patient_A
sample_002	Patient_B
sample_003	Control_001

Phenotype Files

Tab or comma-separated file with sample information:

sample_id	disease_status	age	sex
Patient_A	case	45	F
Patient_B	case	52	M
Control_001	control	48	F

BAM Mapping Files

For IGV integration, provide a mapping from sample IDs to BAM file paths:

sample_id	bam_path
Patient_A	/path/to/patient_a.bam
Patient_B	/path/to/patient_b.bam
Control_001	/path/to/control_001.bam

JSON Gene Files

For gene annotation, provide a JSON file containing an array of gene objects:

[
  {
    "gene_symbol": "BRCA1",
    "panel": "HereditaryCancer",
    "inheritance": "AD",
    "function": "DNA repair"
  },
  {
    "gene_symbol": "TP53",
    "panel": "HereditaryCancer",
    "inheritance": "AD",
    "function": "Tumor suppressor"
  }
]

The --json-gene-mapping parameter specifies:

  • identifier: The field containing the gene symbol (e.g., “gene_symbol”)

  • dataFields: Array of fields to include as annotations (e.g., [“panel”, “inheritance”, “function”])

Output Files

Main Output

  • TSV file — Tab-separated variant table with extracted fields

  • XLSX file — Excel format (if --xlsx specified)

  • Metadata file — Analysis parameters and tool versions

Optional Outputs

  • HTML report — Interactive variant browser with filtering, summary dashboard, and charts (if --html-report specified)

  • IGV reports — Individual variant visualization (if --igv specified)

  • Gene burden results — Statistical analysis (if --perform-gene-burden specified)

  • Pseudonymization mapping — Sample ID mapping table (if --pseudonymize specified)

Configuration

See the Configuration Guide for detailed information about setting up configuration files and customizing VariantCentrifuge behavior.

Troubleshooting

Common Issues

  1. No variants found:

    • Check that your VCF file contains variants in the specified gene regions

    • Verify gene names are correct and match your reference annotation

    • Review filter expressions — they may be too restrictive

  2. External tool errors:

    • Ensure all required tools are installed and in PATH

    • Check that snpEff database matches your VCF reference

    • Verify file permissions and disk space

  3. Memory issues (a combined example follows this list):

    • Use --bcftools-prefilter to reduce data early

    • Try --genotype-replacement-method chunked-vectorized

    • Set --max-memory-gb to limit memory usage

    • Use --threads 1 to reduce concurrent memory pressure
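
Putting these suggestions together, an illustrative rerun for a memory-constrained machine (the numbers are starting points, not recommendations):

variantcentrifuge \
  --gene-file large_gene_list.txt \
  --vcf-file large_cohort.vcf.gz \
  --bcftools-prefilter 'FILTER="PASS" && INFO/AC<10' \
  --genotype-replacement-method chunked-vectorized \
  --max-memory-gb 16 \
  --threads 1 \
  --output-file filtered_variants.tsv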

Getting Help

  • Use variantcentrifuge --help for command-line options

  • Check the API Reference for detailed function documentation

  • Report issues on GitHub