Phenotype Module

Phenotype data loading and integration

Phenotype integration module.

This module loads a phenotype file containing sample-to-phenotype mappings and provides a function to aggregate phenotypes for a given list of samples.

  • The phenotype file must be .csv or .tsv (detected by extension).

  • The specified sample and phenotype columns must be present in the file.

  • Phenotypes are stored in a dictionary (sample -> set of phenotypes).

  • Given a list of samples, phenotypes are aggregated as follows:

      - For each sample, join multiple phenotypes by “,”.

      - For multiple samples, join each sample’s phenotype string by “;”.

variantcentrifuge.phenotype.load_phenotypes(phenotype_file, sample_column, phenotype_column)

Load phenotypes from a .csv or .tsv file into a dictionary.

Parameters:
  • phenotype_file (str) – Path to the phenotype file (must be .csv or .tsv).

  • sample_column (str) – Name of the column containing sample IDs.

  • phenotype_column (str) – Name of the column containing phenotype values.

Returns:

A dictionary mapping each sample to a set of associated phenotypes.

Return type:

dict of {str: set of str}

Raises:

ValueError – If the file is not .csv or .tsv, or if the required columns are not found.
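The loading behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the module's actual implementation: the name load_phenotypes_sketch is hypothetical, and the case-insensitive extension check, the whitespace stripping, and the skipping of rows with an empty sample ID or phenotype are assumptions that the documentation does not confirm.

```python
import csv
import os
from collections import defaultdict


def load_phenotypes_sketch(phenotype_file, sample_column, phenotype_column):
    """Illustrative stand-in for load_phenotypes (hypothetical, not library code)."""
    # Pick the delimiter from the file extension.
    # Assumption: the extension is matched case-insensitively.
    ext = os.path.splitext(phenotype_file)[1].lower()
    if ext == ".csv":
        delimiter = ","
    elif ext == ".tsv":
        delimiter = "\t"
    else:
        raise ValueError("Phenotype file must be .csv or .tsv")

    phenotypes = defaultdict(set)
    with open(phenotype_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        header = reader.fieldnames or []
        # Both requested columns must appear in the header row.
        for column in (sample_column, phenotype_column):
            if column not in header:
                raise ValueError(f"Column '{column}' not found in {phenotype_file}")
        for row in reader:
            sample = (row[sample_column] or "").strip()
            phenotype = (row[phenotype_column] or "").strip()
            # Assumption: rows with an empty sample ID or phenotype are skipped.
            if sample and phenotype:
                phenotypes[sample].add(phenotype)
    return dict(phenotypes)
```

Because each sample maps to a set, a phenotype listed twice for the same sample is stored only once.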

variantcentrifuge.phenotype.aggregate_phenotypes_for_samples(samples, phenotypes)

Aggregate phenotypes for a given list of samples into a single string.

For each sample:
  - Join multiple phenotypes with “,”.

For multiple samples:
  - Join each sample’s phenotype string with “;”.

Parameters:
  • samples (list of str) – List of sample IDs.

  • phenotypes (dict of {str: set of str}) – Dictionary mapping sample IDs to a set of phenotypes.

Returns:

A string aggregating all phenotypes for the given samples, with phenotypes comma-separated per sample, and samples separated by “;”.

Return type:

str
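
The two-level join can be illustrated with a short stand-alone function. This is a sketch under stated assumptions rather than the module's code: aggregate_phenotypes_sketch is a hypothetical name, phenotypes are sorted only to make the output deterministic (sets are unordered), and a sample missing from the dictionary is assumed to contribute an empty string so that positions stay aligned with the input list.

```python
def aggregate_phenotypes_sketch(samples, phenotypes):
    """Illustrative stand-in for aggregate_phenotypes_for_samples (hypothetical)."""
    per_sample = []
    for sample in samples:
        # Assumption: sort for a stable order; a sample absent from the
        # dictionary yields an empty string instead of being dropped.
        per_sample.append(",".join(sorted(phenotypes.get(sample, set()))))
    # Samples are separated by ";", phenotypes within a sample by ",".
    return ";".join(per_sample)


example = {"S1": {"HP:0000002", "HP:0000001"}, "S2": {"HP:0000003"}}
print(aggregate_phenotypes_sketch(["S1", "S2"], example))
# prints: HP:0000001,HP:0000002;HP:0000003
```

With these assumptions, calling the sketch with ["S1", "S3"] returns "HP:0000001,HP:0000002;", where the trailing empty field stands for the unknown sample S3.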