Cancer Genomics Analysis¶
This guide covers somatic variant analysis workflows for cancer genomics using VariantCentrifuge, including tumor-normal paired analysis, tumor-only analysis, and loss of heterozygosity (LOH) detection.
Overview¶
VariantCentrifuge supports several cancer analysis modes through built-in somatic presets and configurable tumor-normal parameters:
Tumor-normal paired — Compare tumor and matched normal samples to identify somatic variants
Tumor-only — Analyze tumor samples without a matched normal
LOH detection — Identify loss of heterozygosity events
Germline shared — Find variants present in both tumor and normal
Tumor-Normal Paired Analysis¶
Prerequisites¶
Your VCF must contain both tumor and normal samples, typically produced by a somatic variant caller such as Mutect2, Strelka2, or VarDict.
Basic Workflow¶
variantcentrifuge \
--gene-file oncogenes_tsg.txt \
--vcf-file tumor_normal.vcf.gz \
--preset somatic,coding \
--tumor-sample-index 1 \
--normal-sample-index 0 \
--html-report \
--output-file somatic_variants.tsv
Configuring Sample Indices¶
VCF files list samples in a specific order. Use --tumor-sample-index and --normal-sample-index to identify which sample is which:
| Flag | Default | Description |
|---|---|---|
| --tumor-sample-index | | 0-based index of the tumor sample |
| --normal-sample-index | | 0-based index of the normal sample |
To check sample order in your VCF:
bcftools query -l tumor_normal.vcf.gz
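Because the indices are positional, it can be safer to derive them from sample names than to count by eye. The sketch below is one way to do that; `sample_index` is a hypothetical helper (not part of VariantCentrifuge or bcftools), and the sample names `NORMAL` and `TUMOR` are placeholders for whatever names your caller wrote into the VCF header.

```shell
# sample_index NAME: read sample names (one per line) on stdin and print the
# 0-based position of NAME. Returns non-zero if NAME is not found.
# Hypothetical helper for illustration; not part of VariantCentrifuge.
sample_index() {
  awk -v name="$1" '
    $0 == name { print NR - 1; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Demo on an inline sample list in the format printed by `bcftools query -l`.
# The names NORMAL and TUMOR are placeholders.
samples=$(printf '%s\n' NORMAL TUMOR)
tumor_idx=$(printf '%s\n' "$samples" | sample_index TUMOR)
normal_idx=$(printf '%s\n' "$samples" | sample_index NORMAL)
echo "--tumor-sample-index $tumor_idx --normal-sample-index $normal_idx"
```

With a real file, the inline list would be replaced by the output of `bcftools query -l tumor_normal.vcf.gz`, and the two resulting values passed to the flags above.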
Quality Thresholds¶
Tune the somatic filter thresholds with these parameters:
| Flag | Default | Description |
|---|---|---|
| --tumor-dp-min | | Minimum read depth in tumor |
| --normal-dp-min | | Minimum read depth in normal |
| --tumor-af-min | 0.05 | Minimum allele frequency in tumor (5%) |
| --normal-af-max | 0.03 | Maximum allele frequency in normal (3%) |
These values are substituted into preset filter expressions via template variables ({tumor_idx}, {normal_idx}, {tumor_dp_min}, etc.).
Strict Somatic Filtering¶
For high-confidence somatic calls with stringent thresholds:
variantcentrifuge \
--gene-file cancer_panel.txt \
--vcf-file tumor_normal.vcf.gz \
--preset somatic_strict,coding \
--tumor-sample-index 1 \
--normal-sample-index 0 \
--tumor-dp-min 30 \
--tumor-af-min 0.10 \
--normal-af-max 0.01 \
--html-report \
--xlsx \
--output-file strict_somatic.tsv
Tumor-Only Analysis¶
When a matched normal sample is not available, use the tumor_only preset, which filters on population frequency databases to remove common germline variants:
variantcentrifuge \
--gene-file oncogenes_tsg.txt \
--vcf-file tumor_only.vcf.gz \
--preset tumor_only,coding \
--tumor-sample-index 0 \
--tumor-dp-min 30 \
--tumor-af-min 0.05 \
--html-report \
--output-file tumor_only_variants.tsv
Note
Tumor-only analysis has a higher false-positive rate for somatic calls because rare germline variants may pass population frequency filters. Consider using stricter frequency thresholds.
Loss of Heterozygosity (LOH)¶
Detect LOH events where the normal sample is heterozygous but the tumor shows allelic imbalance:
variantcentrifuge \
--gene-file tsg_panel.txt \
--vcf-file tumor_normal.vcf.gz \
--preset loh \
--tumor-sample-index 1 \
--normal-sample-index 0 \
--output-file loh_events.tsv
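The exact expression behind the loh preset is not reproduced here, but the idea (normal heterozygous, tumor allele frequency far from 0.5) can be sketched on a small table. Everything below is an assumption made for illustration: the helper name `flag_loh`, the three-column input layout, the variant identifiers, and the 0.3 deviation cutoff.

```shell
# flag_loh [DELTA]: read lines of "variant<TAB>normal_GT<TAB>tumor_AF" and
# print variants where the normal is heterozygous and the tumor AF deviates
# from 0.5 by at least DELTA. Hypothetical helper; the cutoff is an
# illustrative assumption, not the threshold used by the loh preset.
flag_loh() {
  awk -F'\t' -v delta="${1:-0.3}" '
    ($2 == "0/1" || $2 == "0|1" || $2 == "1|0") {
      d = $3 - 0.5
      if (d < 0) d = -d
      if (d >= delta + 0) print $1
    }
  '
}

# Demo rows (made-up identifiers and values).
loh_hits=$(printf '%s\t%s\t%s\n' \
  var1 0/1 0.92 \
  var2 0/1 0.48 \
  var3 0/0 0.95 | flag_loh 0.3)
echo "$loh_hits"
```

Only var1 is reported: var2 stays close to 0.5, and var3 is not heterozygous in the normal, so a high tumor AF there says nothing about LOH.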
Available Somatic Presets¶
VariantCentrifuge ships with several somatic-focused presets:
| Preset | Description |
|---|---|
| somatic | Standard somatic filter: tumor AF >= threshold, normal AF <= threshold, minimum depth |
| somatic_pass | Somatic filter restricted to PASS variants |
| somatic_strict | Stringent somatic filter with higher depth and AF requirements |
| loh | Loss of heterozygosity: normal is het (0/1), tumor shows imbalance |
| | Variants present in both tumor and normal (shared germline) |
| tumor_only | Tumor-only mode: filters by population frequency without matched normal |
| | Mutect2-specific: PASS filter for tumor-normal pairs |
| | Mutect2-specific: includes non-PASS variants |
| | Mutect2-specific: PASS filter for tumor-only |
Combine with impact presets for targeted analysis: --preset somatic,coding or --preset somatic,high.
Custom Somatic Configuration¶
For non-standard somatic calling pipelines, define custom presets in your config file:
{
"reference": "GRCh38.99",
"presets": {
"my_somatic": "(GEN[{normal_idx}].AF < {normal_af_max}) & (GEN[{tumor_idx}].AF >= {tumor_af_min}) & (GEN[{tumor_idx}].DP >= {tumor_dp_min}) & (GEN[{normal_idx}].DP >= {normal_dp_min})",
"cosmic_hotspot": "((exists ID) & (ID =~ 'COS'))",
"somatic_rare": "((dbNSFP_gnomAD_exomes_AC[0] <= 2) | (na dbNSFP_gnomAD_exomes_AC[0]))"
}
}
Template variables ({tumor_idx}, {normal_idx}, {tumor_dp_min}, etc.) are expanded at runtime using the CLI flags.
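To make the substitution concrete, the sketch below expands the placeholders of the my_somatic expression with sed. It only mimics the result: VariantCentrifuge performs its own expansion internally, `expand_template` is a hypothetical helper, and the value 20 for normal_dp_min is an assumption chosen for the demo (the other values match the strict example earlier in this guide).

```shell
# expand_template TEMPLATE: replace {name} placeholders with the values of the
# shell variables of the same name. Illustration only; VariantCentrifuge does
# its own expansion at runtime.
expand_template() {
  printf '%s\n' "$1" | sed \
    -e "s/{tumor_idx}/$tumor_idx/g" \
    -e "s/{normal_idx}/$normal_idx/g" \
    -e "s/{tumor_dp_min}/$tumor_dp_min/g" \
    -e "s/{normal_dp_min}/$normal_dp_min/g" \
    -e "s/{tumor_af_min}/$tumor_af_min/g" \
    -e "s/{normal_af_max}/$normal_af_max/g"
}

# Values as they might arrive from the CLI flags (normal_dp_min is assumed).
tumor_idx=1
normal_idx=0
tumor_dp_min=30
normal_dp_min=20
tumor_af_min=0.10
normal_af_max=0.01

template='(GEN[{normal_idx}].AF < {normal_af_max}) & (GEN[{tumor_idx}].AF >= {tumor_af_min}) & (GEN[{tumor_idx}].DP >= {tumor_dp_min}) & (GEN[{normal_idx}].DP >= {normal_dp_min})'
expanded=$(expand_template "$template")
echo "$expanded"
```

Printing the expanded string this way can help confirm that the tumor and normal indices end up on the intended side of each comparison before running a full analysis.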
Worked Example: Comprehensive Cancer Panel¶
# Step 1: Somatic variant analysis with scoring
variantcentrifuge \
--gene-file comprehensive_cancer_panel.txt \
--vcf-file tumor_normal.vcf.gz \
--preset somatic,coding \
--tumor-sample-index 1 \
--normal-sample-index 0 \
--tumor-dp-min 30 \
--tumor-af-min 0.05 \
--scoring-config-path scoring/nephro_candidate_score \
--final-filter 'IMPACT in ["HIGH", "MODERATE"]' \
--html-report \
--xlsx \
--output-file cancer_panel_results.tsv
Interpretation Guidelines¶
Variant Prioritization for Cancer¶
Tier 1 — Known oncogenic: Variants in COSMIC hotspots or known driver mutations
Tier 2 — Likely oncogenic: HIGH impact variants in known cancer genes with low population frequency
Tier 3 — Uncertain significance: MODERATE impact variants requiring further evidence
Tier 4 — Likely passenger: Common variants or LOW impact changes
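The tiers above can be approximated as an ordered set of rules. The sketch below is one possible reading, not a VariantCentrifuge feature: the helper `assign_tier`, the five-column input layout, the 0.001 cutoff for a "common" variant, and the choice to place HIGH impact variants outside known cancer genes in Tier 3 are all assumptions.

```shell
# assign_tier [MAX_AF]: read lines of
#   variant<TAB>cosmic_hotspot(yes/no)<TAB>cancer_gene(yes/no)<TAB>IMPACT<TAB>gnomAD_AF
# and print "variant<TAB>Tier N". A missing AF (".") is treated as 0.
# Hypothetical helper; column layout and cutoff are illustrative assumptions.
assign_tier() {
  awk -F'\t' -v max_af="${1:-0.001}" '
    {
      hotspot = $2; cancer_gene = $3; impact = $4; af = $5 + 0
      if (hotspot == "yes")                              tier = 1
      else if (af > max_af + 0)                          tier = 4
      else if (impact == "HIGH" && cancer_gene == "yes") tier = 2
      else if (impact == "HIGH" || impact == "MODERATE") tier = 3
      else                                               tier = 4
      print $1 "\tTier " tier
    }
  '
}

# Demo rows (made-up variants).
tiers=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  v1 yes yes MODERATE 0 \
  v2 no yes HIGH 0.00001 \
  v3 no no MODERATE . \
  v4 no yes HIGH 0.05 \
  v5 no no LOW 0 | assign_tier 0.001)
echo "$tiers"
```

The rule order matters: the hotspot check runs first so that a known driver is never demoted by a frequency filter, and the frequency check runs before the impact checks so that common variants fall to Tier 4 regardless of predicted impact.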
Key Columns for Cancer Analysis¶
IMPACT — Functional impact (HIGH, MODERATE, LOW, MODIFIER)
gnomAD AF — Population allele frequency (lower = more likely somatic)
CADD — Combined Annotation Dependent Depletion score
ClinVar — Clinical significance from ClinVar database
Tumor AF — Variant allele frequency in tumor (from genotype field)
Normal AF — Variant allele frequency in normal (should be ~0 for true somatic)
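Some callers report allelic depths (AD) without a per-sample AF value. In that case the allele frequency can be derived as alt / (ref + alt). The sketch below assumes a biallelic site with AD written as "ref,alt"; `vaf_from_ad` is a hypothetical helper and the read counts are made up.

```shell
# vaf_from_ad AD: print alt / (ref + alt) for an AD value "ref,alt",
# or NA when the total depth is zero. Assumes a biallelic site.
# Hypothetical helper for illustration.
vaf_from_ad() {
  printf '%s\n' "$1" | awk -F',' '
    {
      depth = $1 + $2
      if (depth == 0) print "NA"
      else printf "%.4f\n", $2 / depth
    }
  '
}

tumor_vaf=$(vaf_from_ad "45,5")    # 5 alt reads out of 50
normal_vaf=$(vaf_from_ad "60,0")   # no alt reads in the normal
echo "tumor=$tumor_vaf normal=$normal_vaf"
```

A tumor value at or above the tumor AF threshold together with a normal value near zero is the pattern the somatic presets look for.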
Best Practices¶
Verify sample order in your VCF before running — swapped indices silently produce wrong results
Use PASS variants (somatic_pass preset) for clinical reporting; include non-PASS for research
Adjust depth thresholds based on your sequencing coverage (WGS ~30x, WES ~100x, panel ~500x)
Combine with population filters, since true somatic variants are usually absent from, or very rare in, population databases
Review LOH alongside somatic — LOH in tumor suppressors is a common second hit
Use --bcftools-prefilter on large WGS VCFs to speed up analysis
See Also¶
Configuration Guide — Preset definitions and custom config
Custom Filters — Writing custom filter expressions
Variant Scoring — Configuring scoring models
Usage Guide — Complete CLI reference