Custom Filter Development¶
Learn how to create custom filter expressions for specialized analysis needs. VariantCentrifuge supports multiple filtering approaches at different stages of the pipeline.
Filtering Options Overview¶
1. SnpSift Filters (--filters
)¶
Applied early in the pipeline on VCF data. Use SnpSift’s Java-like expression syntax.
2. bcftools Pre-filters (--bcftools-prefilter
)¶
Applied during variant extraction for performance. Uses bcftools expression syntax.
3. Final Filters (--final-filter
)¶
Applied at the end of the pipeline on the final TSV. Uses pandas query syntax.
SnpSift Filter Syntax¶
SnpSift uses a Java-like expression syntax:
# Basic comparisons
"QUAL >= 30"
"AC[0] <= 5"
"AF[0] < 0.01"
# Field existence
"exists ClinVar_CLNSIG"
"na gnomAD_exomes_AF"
# String matching
"ClinVar_CLNSIG =~ '[Pp]athogenic'"
"ANN[ANY].IMPACT has 'HIGH'"
# Complex combinations
"((QUAL >= 30) & (AC[0] <= 5)) | (ClinVar_CLNSIG =~ '[Pp]athogenic')"
bcftools Pre-filter Syntax¶
bcftools uses a different syntax optimized for speed:
# Basic filters
--bcftools-prefilter 'FILTER="PASS"'
--bcftools-prefilter 'QUAL>30'
--bcftools-prefilter 'INFO/AC<10'
--bcftools-prefilter 'INFO/AF<0.01'
# Combined filters
--bcftools-prefilter 'FILTER="PASS" && QUAL>30 && INFO/AC<10'
--bcftools-prefilter 'INFO/AF<0.001 || INFO/AC<5'
Final Filter Syntax (pandas query)¶
The --final-filter
option uses pandas query syntax, allowing filtering on any column including computed values:
# Numeric comparisons
--final-filter 'inheritance_score > 0.8'
--final-filter 'CADD_phred >= 20'
--final-filter 'Inheritance_Confidence > 0.9'
# String equality
--final-filter 'IMPACT == "HIGH"'
--final-filter 'Inheritance_Pattern == "de_novo"'
# String contains
--final-filter 'Custom_Annotation.str.contains("cancer_panel")'
--final-filter 'GENE.str.startswith("BRCA")'
# IN operator
--final-filter 'Inheritance_Pattern in ["de_novo", "compound_heterozygous"]'
--final-filter 'GENE in ["BRCA1", "BRCA2", "TP53"]'
# Complex expressions
--final-filter '(IMPACT == "HIGH" and inheritance_score > 0.7) or CADD_phred > 30'
--final-filter 'Inheritance_Pattern == "de_novo" and Inheritance_Confidence > 0.8'
Filtering Strategy Examples¶
Performance-Optimized Filtering¶
# Use bcftools pre-filter to reduce data early
variantcentrifuge \
--bcftools-prefilter 'FILTER="PASS" && INFO/AC<20' \
--preset rare,coding \
--final-filter 'inheritance_score > 0.5' \
...
Score-Based Filtering¶
# Use late filtering or final filter for computed columns
variantcentrifuge \
--scoring-config-path scoring/my_model \
--late-filtering \
--filters "my_score > 0.7" \
...
# Or use final filter
variantcentrifuge \
--scoring-config-path scoring/my_model \
--final-filter 'my_score > 0.7 and IMPACT != "LOW"' \
...
Inheritance-Based Filtering¶
# Filter for high-confidence de novo variants
variantcentrifuge \
--ped family.ped \
--inheritance-mode columns \
--final-filter 'Inheritance_Pattern == "de_novo" and Inheritance_Confidence > 0.9' \
...
Example Custom Filter Presets¶
{
"presets": {
"custom_rare_high_impact": "(((gnomAD_exomes_AF < 0.0001) | (na gnomAD_exomes_AF)) & ((ANN[ANY].IMPACT has 'HIGH') | ((ANN[ANY].IMPACT has 'MODERATE') & (dbNSFP_CADD_phred >= 25))))"
}
}