Development Guide

This guide provides information for developers who want to contribute to VariantCentrifuge.

Development Setup

Prerequisites

  • Python 3.10+

  • Git

  • uv (recommended) or pip

  • External bioinformatics tools (bcftools, snpEff, SnpSift, bedtools)

Setting Up Development Environment

  1. Clone the repository:

    git clone https://github.com/scholl-lab/variantcentrifuge.git
    cd variantcentrifuge
    
  2. Create development environment:

    # Using conda (recommended)
    mamba env create -f conda/environment.yml
    mamba activate annotation
    
    # Or using uv with a virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install in development mode:

    uv pip install -e ".[dev]"
    
  4. Install pre-commit hooks:

    pre-commit install
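
To verify the setup, run a quick smoke check. The console entry point name and the __version__ attribute are assumptions here; adjust to the actual package layout:

variantcentrifuge --help
python -c "from variantcentrifuge.version import __version__; print(__version__)"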
    

Code Quality

VariantCentrifuge maintains high code quality standards using automated tools:

Formatting and Linting

VariantCentrifuge uses Ruff for linting and formatting, and mypy for type checking:

Tools Used

  • Ruff: Linting and formatting (replaces black, isort, flake8 — 10-100x faster)

  • mypy: Static type checking (gradual adoption mode)

  • pre-commit: Automated quality checks before commits

Running Code Quality Tools

# Run all CI checks locally (recommended)
make ci-check

# Lint with ruff
make lint

# Format code with ruff (auto-fix)
make format

# Check formatting without modifying files
make format-check

# Run mypy type checker
make typecheck

# Run all pre-commit hooks on all files
pre-commit run --all-files

# Install pre-commit hooks (run once after cloning)
pre-commit install

Automated Quality Assurance

Pre-commit hooks automatically run Ruff on every commit to maintain code quality; a configuration sketch follows the list below. The hooks will:

  1. Lint and auto-fix issues with ruff check --fix

  2. Format code with ruff format

  3. Check YAML/JSON/TOML for syntax errors
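
These hooks are declared in .pre-commit-config.yaml. A sketch of the relevant entries (hook revisions are placeholders; check the actual file):

# Sketch of .pre-commit-config.yaml entries; rev values are placeholders
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4          # placeholder revision
    hooks:
      - id: ruff
        args: [--fix]    # lint and auto-fix
      - id: ruff-format  # format code
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0          # placeholder revision
    hooks:
      - id: check-yaml
      - id: check-json
      - id: check-toml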

Linting Configuration

All linting behavior is configured in pyproject.toml; a sketch of these sections follows the list below:

  • [tool.ruff]: Line length, target Python version

  • [tool.ruff.lint]: Rule selection, per-file ignores

  • [tool.mypy]: Type checking options
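
A minimal sketch of how these sections are typically laid out. The values below are illustrative; the authoritative settings live in the repository's pyproject.toml:

[tool.ruff]
line-length = 100          # illustrative value
target-version = "py310"   # matches the Python 3.10+ requirement

[tool.ruff.lint]
select = ["E", "F", "I"]   # illustrative rule groups: pycodestyle, pyflakes, import sorting

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D"]          # e.g. relax docstring rules in tests

[tool.mypy]
ignore_missing_imports = true   # a common choice for gradual adoption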

Common Linting Issues and Solutions

| Issue               | Solution                                                 |
|---------------------|----------------------------------------------------------|
| Line too long       | make format will auto-fix, or break long lines manually  |
| Missing docstring   | Add a numpy-style docstring to the function/class        |
| Import order        | make format will fix automatically                       |
| Unused imports      | Remove them, or add a # noqa comment (see example below) |
| Trailing whitespace | Pre-commit removes automatically                         |
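
For the rare case where an unused import is intentional (for example, a module re-exporting a name), suppress only the specific rule rather than using a bare # noqa:

from variantcentrifuge.filters import apply_filter  # noqa: F401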

Docstring Requirements

All functions and classes must have docstrings following numpy style:

def example_function(param1: str, param2: int = 10) -> bool:
    """
    Brief description of the function.

    Longer description if needed, explaining the function's purpose,
    algorithm, or important implementation details.

    Parameters
    ----------
    param1 : str
        Description of the first parameter.
    param2 : int, default=10
        Description of the second parameter with default value.

    Returns
    -------
    bool
        Description of the return value.

    Raises
    ------
    ValueError
        When param1 is empty or param2 is negative.

    Examples
    --------
    >>> example_function("test", 5)
    True
    """
    # Implementation here
    pass

Configuration

Code quality settings are configured in:

  • pyproject.toml - Ruff, mypy, pytest, and coverage configuration

  • .pre-commit-config.yaml - Pre-commit hook definitions

Testing

Running Tests

VariantCentrifuge's test suite is organized into categories with pytest markers and supports coverage reporting.

Basic Test Commands

# Run all tests
pytest

# Run tests with verbose output
pytest -v

# Run tests and stop on first failure
pytest -x

# Run tests in parallel (if pytest-xdist installed)
pytest -n auto

Test Categories

Tests are organized using pytest markers:

# Run only unit tests (fast, isolated tests)
pytest -m unit

# Run only integration tests (test component interactions)
pytest -m integration

# Run slow tests specifically (may involve large files or external tools)
pytest -m slow

# Run all tests except slow ones
pytest -m "not slow"

# Combine markers
pytest -m "unit or integration"

Test Coverage

# Run tests with coverage reporting
pytest --cov=variantcentrifuge

# Generate HTML coverage report
pytest --cov=variantcentrifuge --cov-report=html

# Generate XML coverage report (for CI)
pytest --cov=variantcentrifuge --cov-report=xml

# Show missing lines in coverage
pytest --cov=variantcentrifuge --cov-report=term-missing

Running Specific Tests

# Run specific test file
pytest tests/test_cli.py

# Run specific test class
pytest tests/test_cli.py::TestCLI

# Run specific test function
pytest tests/test_cli.py::TestCLI::test_basic_functionality

# Run tests matching a pattern
pytest -k "test_filter"

# Run tests with specific substring in name
pytest -k "vcf and not slow"

Test Output and Debugging

# Show print statements and logging output
pytest -s

# Drop into debugger on failures
pytest --pdb

# Drop into debugger on first failure
pytest --pdb -x

# Show shortened tracebacks for failures
pytest --tb=short

# Show local variables in tracebacks
pytest --showlocals

Test Organization

Tests are organized by functionality in the tests/ directory:

Test Files Structure

  • test_cli.py - Command-line interface tests

  • test_filters.py - Variant filtering tests

  • test_gene_lists.py - Gene list processing tests

  • test_igv.py - IGV integration tests

  • test_utils.py - Utility function tests

  • conftest.py - Pytest configuration and shared fixtures

Test Data

Test data is organized in subdirectories:

  • tests/data/ - Sample VCF files, configuration files, expected outputs

  • tests/fixtures/ - Pytest fixtures for common test setup

  • tests/integration/ - Integration test data and scenarios

Test Categories and Markers

| Marker                      | Purpose                          | Examples                              |
|-----------------------------|----------------------------------|---------------------------------------|
| @pytest.mark.unit           | Fast, isolated unit tests        | Function input/output validation      |
| @pytest.mark.integration    | Component interaction tests      | Pipeline workflow tests               |
| @pytest.mark.slow           | Tests that take significant time | Large file processing, external tools |
| @pytest.mark.external_tools | Tests requiring external tools   | bcftools, snpEff integration tests    |

Configuration

Pytest is configured in pyproject.toml under [tool.pytest.ini_options].
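
The markers above must be registered there so pytest does not warn about unknown marks; a sketch (the actual entries live in pyproject.toml):

[tool.pytest.ini_options]
markers = [
    "unit: fast, isolated unit tests",
    "integration: component interaction tests",
    "slow: tests that take significant time",
    "external_tools: tests requiring external tools such as bcftools and snpEff",
]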

Writing Tests

Follow these guidelines when writing tests:

Test Design Principles

  1. Use descriptive test names that explain what is being tested

  2. Follow the AAA pattern (Arrange, Act, Assert)

  3. Use pytest fixtures for setup and teardown

  4. Mock external dependencies (file system, external tools)

  5. Test both success and failure cases

  6. Keep tests independent - each test should be able to run in isolation

  7. Use appropriate markers to categorize tests

Test Naming Conventions

Use descriptive names that explain the scenario:

# Good test names
def test_filter_variants_with_valid_quality_threshold():
def test_filter_variants_raises_error_with_invalid_expression():
def test_gene_normalization_handles_case_insensitive_input():
def test_vcf_extraction_preserves_header_order():

# Avoid generic names
def test_filter():  # Too vague
def test_success():  # Doesn't describe what succeeds
def test_error():   # Doesn't describe what causes error

Test Structure and Examples

Basic Unit Test:

import pytest
from variantcentrifuge.filters import apply_filter

@pytest.mark.unit
def test_filter_variants_with_valid_quality_threshold():
    # Arrange
    vcf_file = "test_input.vcf"
    filter_expr = "QUAL >= 30"
    expected_variant_count = 5

    # Act
    result = apply_filter(vcf_file, filter_expr)

    # Assert
    assert result.returncode == 0
    assert "filtered variants" in result.output
    assert result.variant_count == expected_variant_count

Integration Test with Fixtures:

@pytest.mark.integration
def test_full_pipeline_with_sample_data(sample_vcf, temp_output_dir):
    # Arrange
    config = {
        "gene_name": "BRCA1",
        "filters": ["rare", "coding"],
        "output_format": "tsv"
    }

    # Act
    result = run_pipeline(sample_vcf, config, temp_output_dir)

    # Assert
    assert result.success
    assert (temp_output_dir / "output.tsv").exists()
    assert result.variant_count > 0

Error Testing:

@pytest.mark.unit
def test_filter_raises_error_with_invalid_expression():
    # Arrange
    vcf_file = "test_input.vcf"
    invalid_filter = "INVALID_FIELD >= 30"

    # Act & Assert
    with pytest.raises(ValueError, match="Invalid filter expression"):
        apply_filter(vcf_file, invalid_filter)

Parametrized Tests:

@pytest.mark.unit
@pytest.mark.parametrize("input_gene,expected_output", [
    ("brca1", "BRCA1"),
    ("BRCA1", "BRCA1"),
    ("BrCa1", "BRCA1"),
    ("tp53", "TP53"),
])
def test_gene_name_normalization(input_gene, expected_output):
    # Act
    result = normalize_gene_name(input_gene)

    # Assert
    assert result == expected_output

Mocking External Dependencies

import subprocess
from unittest.mock import patch

from variantcentrifuge.utils import run_command

@pytest.mark.unit
@patch('variantcentrifuge.utils.subprocess.run')
def test_run_command_handles_tool_failure(mock_subprocess):
    # Arrange
    mock_subprocess.return_value.returncode = 1
    mock_subprocess.return_value.stderr = "Tool error"

    # Act & Assert
    with pytest.raises(subprocess.CalledProcessError):
        run_command(["failing_tool", "--option"])

Fixtures for Common Setup

# In conftest.py
@pytest.fixture
def sample_vcf(tmp_path):
    """Create a sample VCF file for testing."""
    vcf_content = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tT\t50\tPASS\tAC=1\n"
    )
    vcf_file = tmp_path / "sample.vcf"
    vcf_file.write_text(vcf_content)
    return str(vcf_file)

@pytest.fixture
def temp_output_dir(tmp_path):
    """Create a temporary output directory."""
    output_dir = tmp_path / "output"
    output_dir.mkdir()
    return output_dir

Test Quality Checklist

Before submitting tests, verify:

  • Test names clearly describe the scenario

  • Tests follow AAA pattern

  • External dependencies are mocked appropriately

  • Both success and failure cases are covered

  • Tests use appropriate markers (@pytest.mark.unit, etc.)

  • Tests are independent and can run in any order

  • Test data is properly cleaned up (use fixtures)

  • Assertions are specific and meaningful

  • Coverage includes edge cases and error conditions

Documentation

Building Documentation Locally

cd docs
pip install -r requirements.txt
sphinx-build -b html source build/html

Open docs/build/html/index.html in your browser.

Documentation Structure

  • docs/source/ - Documentation source files

  • docs/source/api/ - Auto-generated API documentation

  • docs/source/guides/ - User guides and tutorials

Writing Documentation

  • Use Markdown for all documentation files

  • Follow numpy-style docstrings in Python code (see Docstring Requirements above)

  • Include code examples in user-facing documentation

  • Update API documentation when adding new modules or functions

Architecture Overview

Core Design Principles

  1. Modularity - Each module has a single, well-defined responsibility

  2. Separation of Concerns - Clear boundaries between data processing, analysis, and reporting

  3. Testability - Code is structured to enable comprehensive testing

  4. Extensibility - New functionality can be added without breaking existing code

Key Components

variantcentrifuge/
├── cli.py              # Command-line interface
├── pipeline.py         # Main workflow orchestration
├── config.py           # Configuration management
├── filters.py          # Variant filtering
├── extractor.py        # Field extraction
├── analyze_variants.py # Statistical analysis
├── generate_*_report.py # Report generation
└── utils.py            # Common utilities

Data Flow

  1. Input Validation - CLI validates arguments and files

  2. Configuration Loading - Load and merge config from file and CLI

  3. Gene Processing - Convert genes to BED regions

  4. Variant Extraction - Extract variants from VCF using external tools

  5. Filtering - Apply SnpSift filters

  6. Field Extraction - Extract specified fields

  7. Analysis - Perform statistical analysis and gene burden testing

  8. Report Generation - Create output files and reports
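
Conceptually, pipeline.py chains these stages. The sketch below uses hypothetical function names to show the shape of the flow, not the actual API; see the module itself for the real signatures:

# Hypothetical sketch of stage chaining in pipeline.py; names are illustrative
def run_pipeline(args):
    config = load_config(args.config, cli_overrides=vars(args))    # steps 1-2
    bed_file = genes_to_bed(config["gene_name"], config)           # step 3
    variants = extract_variants(args.vcf_file, bed_file)           # step 4
    variants = apply_snpsift_filters(variants, config["filters"])  # step 5
    table = extract_fields(variants, config["fields"])             # step 6
    results = analyze_variants(table, config)                      # step 7
    write_reports(results, config["output_dir"])                   # step 8
    return results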

Contributing

Workflow

  1. Fork the repository on GitHub

  2. Create a feature branch from main: git checkout -b feature/my-feature

  3. Make changes following the coding standards

  4. Add tests for new functionality

  5. Update documentation as needed

  6. Run tests and quality checks

  7. Commit changes with descriptive messages

  8. Push to your fork and create a pull request

Pull Request Guidelines

  • Describe the change clearly in the PR description

  • Reference any issues that the PR addresses

  • Include tests for new functionality

  • Update documentation for user-facing changes

  • Ensure CI passes (tests, linting, documentation build)

Commit Message Format

Use clear, concise commit messages:

type(scope): brief description

Longer explanation if needed

Fixes #123

Types: feat, fix, docs, style, refactor, test, chore

Release Process

Version Management

Versions are managed in variantcentrifuge/version.py following semantic versioning (MAJOR.MINOR.PATCH).
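
version.py is typically a single assignment; a sketch (the actual file may differ):

# variantcentrifuge/version.py
__version__ = "0.5.0"  # MAJOR.MINOR.PATCH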

Creating a Release

  1. Update version in version.py

  2. Update CHANGELOG.md with release notes

  3. Create a release tag: git tag -a v0.5.0 -m "Release v0.5.0"

  4. Push tag: git push origin v0.5.0

  5. Create GitHub release with release notes

Debugging

Common Development Issues

  1. Import errors - Check that package is installed in development mode

  2. Test failures - Ensure external tools are available in PATH

  3. Documentation build failures - Check for syntax errors in docstrings

  4. Pre-commit failures - Run tools manually to fix issues

Debugging Tools

  • pdb - Python debugger for interactive debugging

  • pytest --pdb - Drop into debugger on test failures

  • logging - Use appropriate log levels for debugging

  • --keep-intermediates - Retain intermediate files for inspection
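
For example, debug-level logging can be enabled during development like this. The logger name "variantcentrifuge" is an assumption; modules commonly create loggers with logging.getLogger(__name__):

import logging

# Route log records to stderr and raise the package's level to DEBUG
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("variantcentrifuge").setLevel(logging.DEBUG)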

Getting Help

  • GitHub Issues - Report bugs and request features

  • GitHub Discussions - Ask questions and discuss development

  • Code Review - Request feedback on complex changes

  • Documentation - Check existing docs before asking questions