Usage Guide
Basic Usage
The simplest VariantCentrifuge run takes a gene name, an input VCF file, and an output path:
variantcentrifuge \
--gene-name BICC1 \
--vcf-file path/to/your.vcf \
--output-file output.tsv
Command Line Options
Required Arguments
--vcf-file
- Input VCF file (can be compressed with gzip)
--output-file
- Output TSV file path
Gene Selection (choose one)
--gene-name GENE
- Single gene name
--gene-file GENES.TXT
- File containing multiple genes (one per line)
Configuration
--config CONFIG_FILE
- Load custom parameters from JSON config file
--reference REFERENCE
- snpEff reference database (overrides config)
--filters "FILTER_EXPRESSION"
- Custom SnpSift filters (overrides config)
--fields "FIELD_LIST"
- Custom fields to extract (overrides config)
Input/Output Options
--samples-file SAMPLES.TXT
- Sample ID mapping for genotype replacement
--phenotype-file PHENO.TSV
- Phenotype data file
--phenotype-sample-column
- Column name for sample IDs in phenotype file
--phenotype-value-column
- Column name for phenotype values
--xlsx
- Convert final output TSV to XLSX format
--keep-intermediates
- Retain intermediate files after successful run
Analysis Options
--perform-gene-burden
- Run gene burden analysis
--html-report
- Generate interactive HTML report
--igv
- Enable IGV.js integration (requires additional options)
--bam-mapping-file
- TSV/CSV file mapping sample IDs to BAM files
--igv-reference
- Genome reference for IGV (e.g., 'hg19', 'hg38')
Scoring Options
--scoring-config-path
- Path to scoring configuration directory containing variable_assignment_config.json and formula_config.json
Annotation Options
--annotate-bed BED_FILE
- Annotate variants with genomic regions from BED files (can specify multiple)
--annotate-gene-list GENE_LIST
- Check if variants affect genes in custom gene lists (can specify multiple)
--annotate-json-genes JSON_FILE
- Annotate variants with gene information from JSON file
--json-gene-mapping MAPPING
- Specify JSON field mapping for gene annotations (required with --annotate-json-genes)
Other Options
--version
- Show version and exit
--help
- Show help message
Examples
Basic Gene Analysis
variantcentrifuge \
--gene-name BRCA1 \
--vcf-file samples.vcf.gz \
--output-file brca1_variants.tsv
Multiple Genes with Custom Filters
variantcentrifuge \
--gene-file cancer_genes.txt \
--vcf-file samples.vcf.gz \
--filters "(( dbNSFP_gnomAD_exomes_AC[0] <= 2 ) | ( na dbNSFP_gnomAD_exomes_AC[0] )) & ((ANN[ANY].IMPACT has 'HIGH') | (ANN[ANY].IMPACT has 'MODERATE'))" \
--output-file cancer_variants.tsv \
--xlsx
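A long filter expression is easier to review when it is assembled from named parts. The following is a minimal shell sketch that rebuilds exactly the expression used above; the variable names `rare`, `impact`, and `filters` are illustrative and not part of VariantCentrifuge.

```shell
# Frequency part: gnomAD exome allele count of at most 2, or no gnomAD value at all.
rare="(( dbNSFP_gnomAD_exomes_AC[0] <= 2 ) | ( na dbNSFP_gnomAD_exomes_AC[0] ))"

# Impact part: at least one annotation with HIGH or MODERATE impact.
impact="((ANN[ANY].IMPACT has 'HIGH') | (ANN[ANY].IMPACT has 'MODERATE'))"

# Both conditions must hold.
filters="$rare & $impact"

printf '%s\n' "$filters"
```

The combined string can then be passed as `--filters "$filters"`. Keep the double quotes so the shell hands the whole expression over as a single argument.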
Comprehensive Analysis with Reports
variantcentrifuge \
--gene-name BRCA1 \
--vcf-file samples.vcf.gz \
--samples-file sample_mapping.txt \
--phenotype-file patient_data.tsv \
--phenotype-sample-column "sample_id" \
--phenotype-value-column "disease_status" \
--perform-gene-burden \
--html-report \
--xlsx \
--output-file brca1_analysis.tsv
IGV Integration
variantcentrifuge \
--gene-name TP53 \
--vcf-file samples.vcf.gz \
--igv \
--bam-mapping-file bam_files.tsv \
--igv-reference hg38 \
--html-report \
--output-file tp53_variants.tsv
Variant Scoring
# Apply custom scoring model to variants
variantcentrifuge \
--gene-file kidney_genes.txt \
--vcf-file patient.vcf.gz \
--scoring-config-path scoring/nephro_variant_score \
--preset rare,coding \
--html-report \
--output-file scored_variants.tsv
Custom Annotations
# Annotate with JSON gene information
variantcentrifuge \
--gene-name BRCA1 \
--vcf-file samples.vcf.gz \
--annotate-json-genes gene_metadata.json \
--json-gene-mapping '{"identifier":"gene_symbol","dataFields":["panel","inheritance","function"]}' \
--output-file annotated_variants.tsv
# Multiple annotation sources
variantcentrifuge \
--gene-file cancer_genes.txt \
--vcf-file samples.vcf.gz \
--annotate-bed cancer_hotspots.bed \
--annotate-gene-list actionable_genes.txt \
--annotate-json-genes gene_panels.json \
--json-gene-mapping '{"identifier":"symbol","dataFields":["panel_name","evidence_level"]}' \
--html-report \
--output-file multi_annotated.tsv
Input File Formats
VCF Files
Standard VCF format (v4.0 or later)
Can be compressed with gzip (.vcf.gz)
Should be annotated with snpEff for optimal functionality
Gene Files
Text file with one gene name per line:
BRCA1
BRCA2
TP53
ATM
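Gene lists copied from spreadsheets or panels often carry blank lines, stray whitespace, Windows line endings, or duplicates. The sketch below shows one way to normalize such a list into the one-gene-per-line layout. The file names `raw_genes.txt` and `genes.txt` are hypothetical, and whether VariantCentrifuge itself tolerates such irregularities is not stated here, so treat the cleanup as a precaution.

```shell
# Hypothetical raw list with a blank line, trailing whitespace, and a duplicate.
printf 'BRCA1\nBRCA2 \n\nTP53\nATM\nBRCA1\n' > raw_genes.txt

# Remove carriage returns and surrounding whitespace, drop empty lines,
# and keep only the first occurrence of each gene (input order is preserved).
tr -d '\r' < raw_genes.txt \
  | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
  | awk 'NF && !seen[$0]++' > genes.txt

cat genes.txt
```

The resulting `genes.txt` can be passed to `--gene-file`.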
Sample Mapping Files
Tab-separated file for genotype replacement:
original_id new_id
sample_001 Patient_A
sample_002 Patient_B
sample_003 Control_001
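A mapping file that mixes spaces and tabs is hard to spot by eye. As a rough pre-flight check, the following sketch reports any line that does not consist of exactly two non-empty, tab-separated fields. The file name `sample_mapping.txt` and its contents mirror the example above and are only illustrative.

```shell
# Hypothetical mapping file in the layout shown above (tab-separated).
printf 'original_id\tnew_id\nsample_001\tPatient_A\nsample_002\tPatient_B\nsample_003\tControl_001\n' > sample_mapping.txt

# Collect the line numbers of rows that do not have exactly two non-empty fields.
bad_lines=$(awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { print NR }' sample_mapping.txt)

if [ -z "$bad_lines" ]; then
  echo "sample_mapping.txt: OK"
else
  echo "sample_mapping.txt: malformed line(s): $bad_lines"
fi
```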
Phenotype Files
Tab- or comma-separated file with sample information:
sample_id disease_status age sex
Patient_A case 45 F
Patient_B case 52 M
Control_001 control 48 F
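The values given to `--phenotype-sample-column` and `--phenotype-value-column` have to match header names in the phenotype file. The sketch below checks this before a run, assuming a tab-separated file; for a comma-separated file, replace the tab in the `tr` call with a comma. The file name and the shell variable names are hypothetical.

```shell
# Hypothetical phenotype file mirroring the example above (tab-separated).
printf 'sample_id\tdisease_status\tage\tsex\nPatient_A\tcase\t45\tF\nPatient_B\tcase\t52\tM\nControl_001\tcontrol\t48\tF\n' > patient_data.tsv

sample_col="sample_id"       # intended value for --phenotype-sample-column
value_col="disease_status"   # intended value for --phenotype-value-column

# Split the header row into one column name per line, then look for exact matches.
header=$(head -n 1 patient_data.tsv | tr '\t' '\n')
missing=""
for col in "$sample_col" "$value_col"; do
  if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
    missing="$missing $col"
  fi
done

if [ -z "$missing" ]; then
  echo "All phenotype columns found"
else
  echo "Missing column(s):$missing"
fi
```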
BAM Mapping Files
For IGV integration, provide a mapping from sample IDs to BAM file paths:
sample_id bam_path
Patient_A /path/to/patient_a.bam
Patient_B /path/to/patient_b.bam
Control_001 /path/to/control_001.bam
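Paths in a BAM mapping file go stale easily when data is moved. The following sketch lists samples whose BAM file is not readable. It assumes a tab-separated file with a header row, as in the example above, and creates a throwaway directory only so that the sketch runs on its own.

```shell
# Hypothetical mapping file; only the first BAM path is created on disk.
tmp_dir=$(mktemp -d)
touch "$tmp_dir/patient_a.bam"
printf 'sample_id\tbam_path\nPatient_A\t%s\nPatient_B\t%s\n' \
  "$tmp_dir/patient_a.bam" "$tmp_dir/patient_b.bam" > bam_files.tsv

# Skip the header row, then print every sample whose BAM file is not readable.
missing_bams=$(tail -n +2 bam_files.tsv | while IFS="$(printf '\t')" read -r sample bam; do
  [ -r "$bam" ] || echo "$sample"
done)

echo "Samples without a readable BAM: ${missing_bams:-none}"
```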
JSON Gene Files
For gene annotation, provide a JSON file containing an array of gene objects:
[
{
"gene_symbol": "BRCA1",
"panel": "HereditaryCancer",
"inheritance": "AD",
"function": "DNA repair"
},
{
"gene_symbol": "TP53",
"panel": "HereditaryCancer",
"inheritance": "AD",
"function": "Tumor suppressor"
}
]
The --json-gene-mapping parameter specifies:
identifier: The field containing the gene symbol (e.g., "gene_symbol")
dataFields: Array of fields to include as annotations (e.g., ["panel", "inheritance", "function"])
Output Files
Main Output
TSV file - Tab-separated variant table with extracted fields
XLSX file - Excel format (if --xlsx specified)
Metadata file - Analysis parameters and tool versions
Optional Outputs
HTML report - Interactive variant browser (if --html-report specified)
IGV reports - Individual variant visualization (if --igv specified)
Gene burden results - Statistical analysis (if --perform-gene-burden specified)
Configuration
See the Configuration Guide for detailed information about setting up configuration files and customizing VariantCentrifuge behavior.
Troubleshooting
Common Issues
No variants found:
Check that your VCF file contains variants in the specified gene regions
Verify gene names are correct and match your reference annotation
Review filter expressions - they may be too restrictive
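A quick way to test the first two points is to count how many records in the VCF mention the gene at all. The sketch below relies on the snpEff `ANN` layout, in which the gene name is a `|`-delimited subfield. The tiny `demo.vcf.gz` file is fabricated purely so that the commands can be run as written; substitute your own file and gene symbol.

```shell
# Tiny hypothetical snpEff-annotated VCF, created only to make the sketch runnable.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf
printf '17\t43045712\t.\tG\tA\t.\tPASS\tANN=A|missense_variant|MODERATE|BRCA1|ENSG00000012048|transcript\n' >> demo.vcf
gzip -f demo.vcf

gene="BRCA1"

# Count data lines (header lines start with '#') whose annotation contains |GENE|.
# '|| true' keeps the script going when grep finds nothing and exits non-zero.
hits=$(gzip -cd demo.vcf.gz | grep -v '^#' | grep -cF -- "|${gene}|" || true)

echo "$gene: $hits annotated variant line(s)"
```

A count of zero suggests that the VCF has no annotated variants for that symbol, or that the symbol is spelled differently in the annotation.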
External tool errors:
Ensure all required tools are installed and in PATH
Check that snpEff database matches your VCF reference
Verify file permissions and disk space
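The first and last checks can be scripted. In the sketch below, the function name `missing_tools` is hypothetical, and the tool list (`bcftools`, `snpEff`, `SnpSift`, `bedtools`) is an assumption; adjust it to whatever your installation actually requires.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list; edit it to match your setup.
not_found=$(missing_tools bcftools snpEff SnpSift bedtools)
echo "Tools not on PATH: ${not_found:-none}"

# Free space (in KiB) on the file system that holds the current directory.
df -Pk . | tail -n 1 | awk '{ print "Free KiB: " $4 }'
```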
Memory issues:
Large VCF files may require more memory
Consider filtering your VCF file beforehand to reduce size
Use --keep-intermediates to debug intermediate file sizes
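Pre-filtering can be as simple as keeping the header plus the records inside a region of interest. The sketch below does this with standard shell tools on a fabricated two-record file; the file names and coordinates are hypothetical. For large, indexed files a dedicated tool such as bcftools is likely the better choice.

```shell
# Small hypothetical VCF with variants on two chromosomes.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > big.vcf
printf '10\t58512000\t.\tC\tT\t.\tPASS\t.\n17\t43045712\t.\tG\tA\t.\tPASS\t.\n' >> big.vcf
gzip -f big.vcf

# Hypothetical region of interest; replace with the coordinates of your genes.
chrom="17"; start=43000000; end=43200000

# Keep all header lines plus the records that fall inside the region.
gzip -cd big.vcf.gz \
  | awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" \
      '/^#/ || ($1 == c && $2 >= s && $2 <= e)' \
  | gzip > subset.vcf.gz

# Number of variant records that remain after filtering.
gzip -cd subset.vcf.gz | grep -vc '^#'
```

The reduced `subset.vcf.gz` can then be used as the `--vcf-file` input.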
Getting Help
Use variantcentrifuge --help for command-line options
Check the API Reference for detailed function documentation
Report issues on GitHub