Phenotype Module

Phenotype data loading and integration

Phenotype integration module.

This module loads a phenotype file containing sample-to-phenotype mappings and provides a function to aggregate phenotypes for a given list of samples.

The phenotype file must be .csv or .tsv (detected by extension).
The specified sample and phenotype columns must be present in the file.
Phenotypes are stored in a dictionary (sample -> set of phenotypes).
Given a list of samples, phenotypes are aggregated as follows:

- For each sample, join multiple phenotypes by “,”.
- For multiple samples, join each sample’s phenotype string by “;”.

variantcentrifuge.phenotype.load_phenotypes(phenotype_file, sample_column, phenotype_column)

Load phenotypes from a .csv or .tsv file into a dictionary.

Parameters:

phenotype_file (str) – Path to the phenotype file (must be .csv or .tsv).
sample_column (str) – Name of the column containing sample IDs.
phenotype_column (str) – Name of the column containing phenotype values.

Returns:

A dictionary mapping each sample to a set of associated phenotypes.

Return type:

dict of {str: set of str}

Raises:

ValueError – If the file is not .csv or .tsv, or if the required columns are not found.
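
The loading behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: the function name `load_phenotypes_sketch`, the file contents, and the column names are hypothetical, and skipping rows with empty values is an assumption the documentation does not confirm.

```python
import csv
import os
import tempfile


def load_phenotypes_sketch(phenotype_file, sample_column, phenotype_column):
    """Illustrative stand-in for load_phenotypes (hypothetical, not the real code)."""
    # Pick the delimiter from the file extension, as the documentation describes.
    if phenotype_file.endswith(".csv"):
        delimiter = ","
    elif phenotype_file.endswith(".tsv"):
        delimiter = "\t"
    else:
        raise ValueError("Phenotype file must be .csv or .tsv")

    phenotypes = {}
    with open(phenotype_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        fieldnames = reader.fieldnames or []
        # Both requested columns must exist in the header.
        for column in (sample_column, phenotype_column):
            if column not in fieldnames:
                raise ValueError(f"Column {column!r} not found in {phenotype_file}")
        for row in reader:
            sample = (row[sample_column] or "").strip()
            phenotype = (row[phenotype_column] or "").strip()
            # Assumption: rows with an empty sample or phenotype are skipped.
            if sample and phenotype:
                phenotypes.setdefault(sample, set()).add(phenotype)
    return phenotypes


# Usage: write a small hypothetical TSV file and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "phenotypes.tsv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("sample\tphenotype\nS1\tseizures\nS1\tataxia\nS2\tmicrocephaly\n")
    loaded = load_phenotypes_sketch(path, "sample", "phenotype")
```

Storing a set per sample means a phenotype listed twice for the same sample is kept only once.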

variantcentrifuge.phenotype.aggregate_phenotypes_for_samples(samples, phenotypes)

Aggregate phenotypes for a given list of samples into a single string.

- For each sample: join multiple phenotypes with “,”.
- For multiple samples: join each sample’s phenotype string with “;”.

Parameters:

samples (list of str) – List of sample IDs.
phenotypes (dict of {str: set of str}) – Dictionary mapping sample IDs to a set of phenotypes.

Returns:

A string aggregating all phenotypes for the given samples, with phenotypes comma-separated per sample, and samples separated by “;”.

Return type:

str
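
The aggregation rule can be illustrated with a short sketch. It is not the library's code: `aggregate_phenotypes_sketch` is a hypothetical name, the sample data is made up, and two behaviours are assumptions the documentation does not specify: phenotypes are sorted within each sample (sets have no inherent order), and samples without phenotypes are skipped.

```python
def aggregate_phenotypes_sketch(samples, phenotypes):
    """Illustrative stand-in for aggregate_phenotypes_for_samples (hypothetical)."""
    per_sample = []
    for sample in samples:
        sample_phenotypes = phenotypes.get(sample)
        # Assumption: samples with no recorded phenotypes contribute nothing.
        if sample_phenotypes:
            # Assumption: sort so the output is deterministic.
            per_sample.append(",".join(sorted(sample_phenotypes)))
    # Separate the per-sample strings with ";".
    return ";".join(per_sample)


# Hypothetical mapping in the shape returned by load_phenotypes.
example_phenotypes = {
    "S1": {"seizures", "ataxia"},
    "S2": {"microcephaly"},
}
aggregated = aggregate_phenotypes_sketch(["S1", "S2", "S3"], example_phenotypes)
print(aggregated)  # ataxia,seizures;microcephaly
```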