Phenotype Module
Phenotype data loading and integration
Phenotype integration module.
This module loads a phenotype file containing sample-to-phenotype mappings and provides a function to aggregate phenotypes for a given list of samples.
The phenotype file must be .csv or .tsv (detected by extension).
The specified sample and phenotype columns must be present in the file.
Phenotypes are stored in a dictionary (sample -> set of phenotypes).
Given a list of samples, phenotypes are aggregated as follows:
- For each sample, join multiple phenotypes with “,”.
- For multiple samples, join each sample’s phenotype string with “;”.
- variantcentrifuge.phenotype.load_phenotypes(phenotype_file, sample_column, phenotype_column)
Load phenotypes from a .csv or .tsv file into a dictionary.
- Parameters:
- Returns:
A dictionary mapping each sample to a set of associated phenotypes.
- Return type:
dict of {str: set of str}
- Raises:
ValueError – If the file is not .csv or .tsv, or if the required columns are not found.
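The loading behavior described above can be illustrated with a minimal sketch. This is not the library's source: `load_phenotypes_sketch` is a hypothetical stand-in written only from the description here, and it assumes a header row, UTF-8 encoding, and that rows with an empty sample or phenotype cell are skipped.

```python
import csv
import os
from collections import defaultdict


def load_phenotypes_sketch(phenotype_file, sample_column, phenotype_column):
    """Hypothetical stand-in that mirrors the documented behavior."""
    # Pick the delimiter from the file extension, as the docs describe.
    ext = os.path.splitext(phenotype_file)[1].lower()
    if ext == ".csv":
        delimiter = ","
    elif ext == ".tsv":
        delimiter = "\t"
    else:
        raise ValueError("Phenotype file must be .csv or .tsv")

    phenotypes = defaultdict(set)
    with open(phenotype_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        fieldnames = reader.fieldnames or []
        # Both requested columns must exist in the header.
        for column in (sample_column, phenotype_column):
            if column not in fieldnames:
                raise ValueError(f"Column '{column}' not found in file")
        for row in reader:
            # Assumption: blank cells are ignored rather than stored.
            sample = (row[sample_column] or "").strip()
            phenotype = (row[phenotype_column] or "").strip()
            if sample and phenotype:
                phenotypes[sample].add(phenotype)
    return dict(phenotypes)
```

A sample that appears on several rows ends up with one set holding all of its phenotypes, which is the shape the other functions in this module take as their `phenotypes` argument.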
- variantcentrifuge.phenotype.aggregate_phenotypes_for_samples(samples, phenotypes)
Aggregate phenotypes for a given list of samples into a single string.
- For each sample: join multiple phenotypes with “,”.
- For multiple samples: join each sample’s phenotype string with “;”.
- Parameters:
- Returns:
A string aggregating all phenotypes for the given samples, with phenotypes comma-separated per sample, and samples separated by “;”.
- Return type:
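The aggregation rule can be sketched as follows. The function name `aggregate_sketch` is hypothetical, and two details are assumptions that the description does not specify: phenotypes are sorted to make the output deterministic, and a sample with no entry contributes an empty field.

```python
def aggregate_sketch(samples, phenotypes):
    """Hypothetical illustration of the documented join rules."""
    per_sample = []
    for sample in samples:
        # Assumption: sort the set so the output order is stable.
        values = sorted(phenotypes.get(sample, set()))
        # Phenotypes of one sample are comma-separated.
        per_sample.append(",".join(values))
    # Per-sample strings are separated by ";".
    return ";".join(per_sample)


example = {"S1": {"HP:0000002", "HP:0000001"}, "S2": {"HP:0000003"}}
print(aggregate_sketch(["S1", "S2"], example))
# HP:0000001,HP:0000002;HP:0000003
```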
- variantcentrifuge.phenotype.format_phenotypes_like_gt_column(samples, phenotypes)
Format phenotypes in the same style as the GT column with sample IDs.
Creates a string similar to the GT column format: “SampleID(phenotype1,phenotype2);SampleID(phenotype3);…”
This matches the format used in genotype replacement, where each sample’s data follows its sample ID, enclosed in parentheses.
- Parameters:
- Returns:
A string with phenotypes formatted like the GT column: “Sample1(pheno1,pheno2);Sample2(pheno3);…”. Samples without phenotypes get empty parentheses: “Sample3()”.
- Return type:
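A minimal sketch of the GT-style formatting described above. `format_like_gt_sketch` is a hypothetical name, not the library function, and sorting the phenotypes within each sample is an assumption made for reproducible output.

```python
def format_like_gt_sketch(samples, phenotypes):
    """Hypothetical illustration of the 'SampleID(p1,p2);...' format."""
    parts = []
    for sample in samples:
        # Assumption: sorted order inside the parentheses.
        values = sorted(phenotypes.get(sample, set()))
        # A sample without phenotypes yields empty parentheses.
        parts.append(sample + "(" + ",".join(values) + ")")
    return ";".join(parts)


example = {"Sample1": {"pheno2", "pheno1"}, "Sample2": {"pheno3"}}
print(format_like_gt_sketch(["Sample1", "Sample2", "Sample3"], example))
# Sample1(pheno1,pheno2);Sample2(pheno3);Sample3()
```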
- variantcentrifuge.phenotype.extract_phenotypes_for_gt_row(gt_value, phenotypes)
Extract phenotypes for samples that have variants in a specific GT row.
Parses the GT column value to find which samples have variants, then returns their phenotypes in the same format as the GT column.
- Parameters:
- Returns:
Phenotypes for samples with variants, for example “Sample1(pheno1,pheno2);Sample2(pheno3)”. Samples with no variants (./. or 0/0) are excluded.
- Return type:
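The row-level extraction can be sketched as below. This is an illustration under stated assumptions, not the library's parser: `extract_for_gt_row_sketch` is a hypothetical name, the GT value is assumed to look like “Sample1(0/1);Sample2(0/0)”, any text after a “:” inside the parentheses is assumed to be extra fields and is ignored, and only the two genotypes named above (./. and 0/0) are treated as "no variant".

```python
import re

# Assumed entry shape: SampleID(genotype[:extra fields])
_ENTRY = re.compile(r"^(?P<sample>[^()]+)\((?P<genotype>[^()]*)\)$")
_NO_VARIANT = {"./.", "0/0"}


def extract_for_gt_row_sketch(gt_value, phenotypes):
    """Hypothetical illustration: keep only samples carrying a variant."""
    parts = []
    for entry in gt_value.split(";"):
        match = _ENTRY.match(entry.strip())
        if match is None:
            continue  # skip empty or unparseable entries
        sample = match.group("sample").strip()
        genotype = match.group("genotype").split(":", 1)[0]
        if genotype in _NO_VARIANT:
            continue  # sample has no variant in this row
        # Assumption: sorted phenotypes, empty parentheses if none known.
        values = sorted(phenotypes.get(sample, set()))
        parts.append(sample + "(" + ",".join(values) + ")")
    return ";".join(parts)


example = {"Sample1": {"pheno2", "pheno1"}, "Sample2": {"pheno3"}}
print(extract_for_gt_row_sketch("Sample1(0/1);Sample2(0/0);Sample3(1/1)", example))
# Sample1(pheno1,pheno2);Sample3()
```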