Inheritance Analysis
VariantCentrifuge includes a comprehensive Mendelian inheritance analysis system for identifying disease-causing variants in family studies. It supports de novo, autosomal dominant, autosomal recessive, X-linked, compound heterozygous, and mitochondrial inheritance patterns.
Overview
The inheritance analysis pipeline uses a three-pass approach:
Deduction — Determine the most likely inheritance pattern for each variant based on genotypes and family structure
Compound heterozygous detection — Identify pairs of heterozygous variants in the same gene inherited from different parents (trans configuration)
Prioritization — Rank patterns by clinical significance when multiple patterns are possible
Quick Start
Family Trio Analysis
variantcentrifuge \
--gene-file disease_genes.txt \
--vcf-file trio.vcf.gz \
--ped family.ped \
--inheritance-mode columns \
--preset rare,coding \
--html-report \
--output-file trio_results.tsv
Singleton Analysis (No PED File)
Without a PED file, all samples are treated as affected singletons:
variantcentrifuge \
--gene-file genes.txt \
--vcf-file proband.vcf.gz \
--inheritance-mode simple \
--preset rare,coding \
--output-file singleton_results.tsv
PED File Format
VariantCentrifuge uses the standard PLINK PED format (tab-separated, 6 columns):
#Family_ID Individual_ID Father_ID Mother_ID Sex Affected
FAM001 proband father mother 1 2
FAM001 father 0 0 1 1
FAM001 mother 0 0 2 1
| Column | Values |
|---|---|
| Family_ID | Family identifier (groups related individuals) |
| Individual_ID | Must match sample IDs in VCF |
| Father_ID | Father’s Individual_ID, or 0 if unknown |
| Mother_ID | Mother’s Individual_ID, or 0 if unknown |
| Sex | 1 = male, 2 = female, 0 = unknown |
| Affected | 1 = unaffected, 2 = affected, 0 = unknown |
Tip
Sample IDs in the PED file must exactly match the sample names in the VCF file. Use bcftools query -l your.vcf.gz to check sample names.
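The check described in the tip can be sketched in a few lines of Python. This is a minimal illustration, not VariantCentrifuge's own parser: the helper names `parse_ped` and `missing_from_vcf` are hypothetical, and the VCF sample list is hard-coded where you would normally paste the output of `bcftools query -l`.

```python
PED_COLUMNS = ["family_id", "individual_id", "father_id", "mother_id", "sex", "affected"]


def parse_ped(text):
    """Parse PED text into a list of dicts, skipping comments and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # accepts tabs or runs of spaces
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        rows.append(dict(zip(PED_COLUMNS, fields)))
    return rows


def missing_from_vcf(ped_rows, vcf_samples):
    """Return PED individual IDs that are absent from the VCF sample list."""
    known = set(vcf_samples)
    return [r["individual_id"] for r in ped_rows if r["individual_id"] not in known]


ped_text = """#Family_ID Individual_ID Father_ID Mother_ID Sex Affected
FAM001 proband father mother 1 2
FAM001 father 0 0 1 1
FAM001 mother 0 0 2 1
"""
rows = parse_ped(ped_text)

# Hypothetical sample list with one case mismatch ("Mother" vs "mother")
vcf_samples = ["proband", "father", "Mother"]
print(missing_from_vcf(rows, vcf_samples))
```

Because the comparison is exact, a difference in case or a stray suffix is enough to make a family member invisible to the inheritance analysis.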
Inheritance Patterns
Supported Patterns
| Pattern | Description | Typical Genotypes |
|---|---|---|
| de_novo | Not present in either parent | Proband: 0/1, Parents: 0/0 |
| autosomal_dominant | Present in one affected parent | Proband: 0/1, Affected parent: 0/1, Unaffected parent: 0/0 |
| autosomal_recessive | Both parents are carriers | Proband: 1/1, Parents: 0/1 |
| x_linked_recessive | X-linked, male affected | Male proband: 1 (hemizygous), Mother: 0/1 (carrier) |
| x_linked_dominant | X-linked, both sexes affected | Proband: 0/1 or 1/1 on chrX |
| compound_heterozygous | Two het variants in same gene from different parents | Proband: 0/1 at two loci, each parent carries one |
| mitochondrial | Mitochondrial (chrM) inheritance | Proband: variant on chrM |
| unknown | Pattern cannot be determined | Missing data or ambiguous genotypes |
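To make the typical genotypes above concrete, here is a simplified sketch of single-variant deduction for a trio. It is an illustration under stated assumptions, not the tool's actual logic: the function names, the `"male"`/`"female"` sex strings, the `affected_parents` argument, and the `mitochondrial` label are all hypothetical, only biallelic genotypes are handled, and compound heterozygous detection is left out because it needs more than one variant.

```python
def alt_count(gt):
    """Count non-reference alleles; return None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def deduce(chrom, proband_sex, proband, father, mother, affected_parents=()):
    """Return one pattern name for a single variant in a trio (simplified)."""
    p, f, m = alt_count(proband), alt_count(father), alt_count(mother)
    if None in (p, f, m) or p == 0:
        return "unknown"  # missing data, or proband does not carry the variant
    if chrom in ("chrM", "MT"):
        return "mitochondrial"
    if f == 0 and m == 0:
        return "de_novo"  # absent from both parents
    if chrom in ("chrX", "X"):
        if proband_sex == "male" and f == 0 and m == 1:
            return "x_linked_recessive"  # carrier mother, hemizygous son
        return "x_linked_dominant"
    if p == 2 and f == 1 and m == 1:
        return "autosomal_recessive"  # both parents are carriers
    if p == 1 and {f, m} == {0, 1}:
        carrier = "father" if f == 1 else "mother"
        if carrier in affected_parents:
            return "autosomal_dominant"  # inherited from the affected parent
    return "unknown"


print(deduce("chr17", "male", "0/1", "0/0", "0/0"))
print(deduce("chr16", "female", "1/1", "0/1", "0/1"))
print(deduce("chrX", "male", "1", "0", "0/1"))
```

A real implementation would also report a confidence value and could return several candidate patterns for one variant, which is why a prioritization step is needed.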
Pattern Prioritization
When multiple patterns are possible, they are ranked by clinical significance:
de_novo (highest priority)
compound_heterozygous
autosomal_recessive / x_linked_recessive
x_linked_dominant
autosomal_dominant
unknown (lowest priority)
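The ranking above amounts to a lookup followed by a minimum. The sketch below encodes the listed order as tiers; the names `PRIORITY`, `RANK`, and `top_pattern` are hypothetical, and treating unlisted patterns as lowest priority is an assumption rather than documented behaviour.

```python
# Tiers copied from the list above; patterns on the same line share a tier.
PRIORITY = [
    ("de_novo",),
    ("compound_heterozygous",),
    ("autosomal_recessive", "x_linked_recessive"),
    ("x_linked_dominant",),
    ("autosomal_dominant",),
    ("unknown",),
]
RANK = {name: tier for tier, names in enumerate(PRIORITY) for name in names}


def top_pattern(candidates):
    """Return the highest-priority candidate; unlisted names rank last (assumption)."""
    if not candidates:
        return "unknown"
    return min(candidates, key=lambda name: RANK.get(name, len(PRIORITY)))


print(top_pattern(["autosomal_dominant", "de_novo"]))
print(top_pattern(["x_linked_dominant", "x_linked_recessive"]))
```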
Output Modes
The --inheritance-mode flag controls how inheritance results appear in the output:
Simple Mode (--inheritance-mode simple)
Adds a single Inheritance_Pattern column with the pattern name:
GENE POS Inheritance_Pattern
BRCA1 41276 de_novo
PKD1 2138 autosomal_recessive
Columns Mode (--inheritance-mode columns)
Expands inheritance results into separate columns for easier filtering:
GENE POS Inheritance_Pattern Inheritance_Confidence Inheritance_Samples Inheritance_Details
BRCA1 41276 de_novo 0.95 proband(0/1) Father: 0/0, Mother: 0/0
PKD1 2138 autosomal_recessive 0.90 proband(1/1) Father: 0/1, Mother: 0/1
Full Mode (--inheritance-mode full)
Outputs the complete inheritance analysis as a JSON object in a single column. Useful for programmatic downstream analysis.
Compound Heterozygous Detection
Compound heterozygous variants are two heterozygous variants in the same gene, one inherited from each parent (trans configuration).
How It Works
For each gene, identify samples with 2+ heterozygous variants
Check parental genotypes to confirm trans configuration (one variant from father, one from mother)
When parents are unavailable, variants are marked as compound_het_possible
Multiple compound het pairs in a gene are ranked by variant impact and allele frequency
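The first three steps can be sketched as a plain pairwise check. This is a hedged illustration rather than the tool's vectorized implementation: the dictionary keys, the helper names, and the `"ambiguous"` and `"unavailable"` labels are hypothetical, genotypes are assumed biallelic, and the final ranking by impact and allele frequency is omitted.

```python
from collections import defaultdict
from itertools import combinations


def alleles(gt):
    return gt.replace("|", "/").split("/")


def is_het(gt):
    return sorted(alleles(gt)) == ["0", "1"]


def parent_of_origin(variant):
    """Return 'father', 'mother', 'ambiguous', or 'unavailable'."""
    father, mother = variant.get("father"), variant.get("mother")
    if father is None or mother is None or "." in alleles(father) + alleles(mother):
        return "unavailable"
    f, m = "1" in alleles(father), "1" in alleles(mother)
    if f and not m:
        return "father"
    if m and not f:
        return "mother"
    return "ambiguous"  # both or neither parent carries the allele


def compound_het_calls(variants):
    """Return (gene, id_a, id_b, label) for each candidate pair."""
    by_gene = defaultdict(list)
    for v in variants:
        if is_het(v["proband"]):
            by_gene[v["gene"]].append(v)
    calls = []
    for gene, hets in by_gene.items():
        for a, b in combinations(hets, 2):
            origins = {parent_of_origin(a), parent_of_origin(b)}
            if origins == {"father", "mother"}:
                label = "compound_heterozygous"  # confirmed trans
            elif "unavailable" in origins:
                label = "compound_het_possible"  # phase cannot be confirmed
            else:
                continue  # same parent or ambiguous origin
            calls.append((gene, a["id"], b["id"], label))
    return calls


variants = [
    {"id": "v1", "gene": "PKD1", "proband": "0/1", "father": "0/1", "mother": "0/0"},
    {"id": "v2", "gene": "PKD1", "proband": "0/1", "father": "0/0", "mother": "0/1"},
    {"id": "v3", "gene": "PKD1", "proband": "0/1", "father": "0/1", "mother": "0/0"},
    {"id": "v4", "gene": "BRCA1", "proband": "0/1", "father": "0/0", "mother": "0/1"},
]
for call in compound_het_calls(variants):
    print(call)
```

In this toy input, v1 and v3 both come from the father and are therefore not paired with each other, while each pairs with the maternally inherited v2.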
Performance
The default vectorized implementation is roughly 10x to 70x faster than the original in the benchmarks below, with the largest gains for genes with many variants:
| Variants per Gene | Original | Vectorized | Speedup |
|---|---|---|---|
| 10 | 0.01s | 0.001s | 10x |
| 50 | 0.19s | 0.004s | 48x |
| 100 | 1.23s | 0.017s | 72x |
| 500 | 31.0s | 0.55s | 56x |
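The Speedup column is the ratio of the two timings in each row. A short check, using only the numbers from the table (the `speedup` helper is just for illustration):

```python
# (variants per gene, original seconds, vectorized seconds) copied from the table
timings = [
    (10, 0.01, 0.001),
    (50, 0.19, 0.004),
    (100, 1.23, 0.017),
    (500, 31.0, 0.55),
]


def speedup(original_s, vectorized_s):
    """Ratio of original to vectorized runtime."""
    return original_s / vectorized_s


for n, original_s, vectorized_s in timings:
    print(f"{n:>4} variants: {speedup(original_s, vectorized_s):.1f}x")
```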
To use the original implementation (for debugging or comparison):
--no-vectorized-comp-het
Integration with Scoring
Inheritance patterns integrate with the variant scoring system. The built-in inheritance_score model assigns scores based on clinical significance:
| Pattern | Score |
|---|---|
| de_novo | 0.95 |
| autosomal_recessive / compound_het / homozygous | 0.80 |
| x_linked_recessive | 0.70 |
| x_linked_dominant | 0.50 |
| autosomal_dominant | 0.40 |
| compound_het_possible | 0.40 |
| unknown | 0.10 |
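Conceptually, the table above is a lookup from pattern name to score. The sketch below reproduces it as a dictionary; the names `INHERITANCE_SCORES` and `inheritance_score` are hypothetical, the keys are spelled as they appear in the table, and falling back to the `unknown` score for unlisted patterns is an assumption.

```python
# Scores copied from the table; keys use the table's own spellings.
INHERITANCE_SCORES = {
    "de_novo": 0.95,
    "autosomal_recessive": 0.80,
    "compound_het": 0.80,
    "homozygous": 0.80,
    "x_linked_recessive": 0.70,
    "x_linked_dominant": 0.50,
    "autosomal_dominant": 0.40,
    "compound_het_possible": 0.40,
    "unknown": 0.10,
}


def inheritance_score(pattern):
    """Look up a pattern's score; unlisted patterns get the 'unknown' score (assumption)."""
    return INHERITANCE_SCORES.get(pattern, INHERITANCE_SCORES["unknown"])


# Rank a few hypothetical variants by their inheritance score, highest first
variants = [("PKD1", "autosomal_recessive"), ("BRCA1", "de_novo"), ("TTN", "unknown")]
ranked = sorted(variants, key=lambda v: inheritance_score(v[1]), reverse=True)
print(ranked)
```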
Use inheritance in scoring formulas:
variantcentrifuge \
--gene-file genes.txt \
--vcf-file trio.vcf.gz \
--ped family.ped \
--inheritance-mode columns \
--scoring-config-path scoring/nephro_candidate_score \
--output-file scored_trio.tsv
Filtering on Inheritance
Use --final-filter to filter results by inheritance pattern:
# Only de novo and compound het variants
--final-filter 'Inheritance_Pattern in ["de_novo", "compound_heterozygous"]'
# High-confidence inheritance calls
--final-filter 'Inheritance_Confidence > 0.8'
# Recessive patterns only
--final-filter 'Inheritance_Pattern in ["autosomal_recessive", "compound_heterozygous", "x_linked_recessive"]'
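The same conditions can also be applied after the run, on a TSV written in columns mode. The sketch below uses only Python's standard `csv` module and is not how `--final-filter` is implemented; the `filter_rows` helper is hypothetical and the inline TSV mirrors the columns-mode example shown earlier.

```python
import csv
import io

RECESSIVE = {"autosomal_recessive", "compound_heterozygous", "x_linked_recessive"}


def filter_rows(tsv_text, keep):
    """Return the rows of a tab-separated table for which keep(row) is true."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if keep(row)]


tsv_text = (
    "GENE\tPOS\tInheritance_Pattern\tInheritance_Confidence\n"
    "BRCA1\t41276\tde_novo\t0.95\n"
    "PKD1\t2138\tautosomal_recessive\t0.90\n"
)

# High-confidence inheritance calls
high_conf = filter_rows(tsv_text, lambda r: float(r["Inheritance_Confidence"]) > 0.8)
# Recessive patterns only
recessive = filter_rows(tsv_text, lambda r: r["Inheritance_Pattern"] in RECESSIVE)

print([r["GENE"] for r in high_conf])
print([r["GENE"] for r in recessive])
```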
Memory Management
For large multi-sample VCFs, inheritance analysis memory usage is automatically managed:
System memory is detected (supports SLURM, PBS, cgroups, and bare metal)
Chunk sizes are calculated based on available memory
Use --max-memory-gb to set explicit limits
Use --force-inheritance-processing to override memory safety checks
--memory-safety-factor (default 0.92) controls how much of detected memory to use
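As a rough mental model of how a safety factor and a memory cap could translate into a chunk size, consider the sketch below. Everything except the 0.92 default is an assumption made for illustration: the function names, the way the cap and the factor are combined, and the cost model (variants × samples × bytes per genotype) are hypothetical and may differ from what the tool actually does.

```python
def usable_memory_gb(detected_gb, safety_factor=0.92, max_memory_gb=None):
    """Memory budget after applying an optional cap and the safety factor (assumed order)."""
    limit = detected_gb if max_memory_gb is None else min(detected_gb, max_memory_gb)
    return limit * safety_factor


def chunk_size(usable_gb, n_samples, bytes_per_genotype=8, min_chunk=100):
    """Variants per chunk under a hypothetical variants x samples x bytes cost model."""
    budget_bytes = usable_gb * 1024**3
    return max(min_chunk, int(budget_bytes // (n_samples * bytes_per_genotype)))


budget = usable_memory_gb(64)  # e.g. a 64 GB node with the default factor
print(f"{budget:.2f} GB usable")
print(chunk_size(budget, n_samples=5000), "variants per chunk")
```

The point of the sketch is the shape of the calculation: a lower safety factor or an explicit cap shrinks the budget, and more samples shrink the number of variants that fit in one chunk.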
Worked Example: Rare Disease Trio
# Full trio analysis with inheritance, scoring, and reports
variantcentrifuge \
--gene-file intellectual_disability_genes.txt \
--vcf-file trio_annotated.vcf.gz \
--ped trio.ped \
--inheritance-mode columns \
--preset rare,coding \
--scoring-config-path scoring/nephro_candidate_score \
--final-filter 'Inheritance_Pattern != "unknown" and IMPACT in ["HIGH", "MODERATE"]' \
--html-report \
--xlsx \
--output-file trio_id_analysis.tsv
Troubleshooting
No inheritance patterns detected
Verify PED file sample IDs match VCF sample names exactly
Check that the PED file has correct family relationships
Ensure the VCF contains genotype data (GT field) for all family members
All patterns show as “unknown”
Missing genotype data (./.) for key family members
Parent samples not present in VCF
Incorrect sex assignments (affects X-linked analysis)
Compound het not detected
Need at least 2 heterozygous variants per gene per affected sample
Parent genotypes required to confirm trans configuration (without parents, compound_het_possible is reported)
Check that gene names are consistent between variants
See Also
Rare Disease Workflow — End-to-end rare disease analysis
Variant Scoring — Integrating inheritance with scoring models
Performance Tips — Memory management for large datasets
Usage Guide — Complete CLI reference