pyhiv.split package

Submodules

pyhiv.split.split module

pyhiv.split.split.get_gene_region(test_aligned, ref_aligned, aligned_gene_ranges)

Identify the gene region(s) with the highest alignment score.

Parameters:

test_aligned (str) – The aligned test sequence (with gaps).
ref_aligned (str) – The aligned reference sequence (with gaps).
aligned_gene_ranges (dict) – Dictionary mapping gene names to (start, end) positions in the alignment coordinates (0-based).

Returns:

A list of gene names corresponding to the region(s) with the highest alignment score. If multiple genes share the same maximum score, all of them are returned.

Return type:

list
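
The selection logic described above can be illustrated with a minimal sketch. This is not the pyhiv implementation: the function name best_gene_regions_sketch is hypothetical, the score is assumed to be the number of columns where test and reference carry the same non-gap character, and (start, end) is assumed to be a half-open range. The actual scoring scheme and range convention are not stated in this reference and may differ.

```python
def best_gene_regions_sketch(test_aligned, ref_aligned, aligned_gene_ranges):
    """Return the gene name(s) whose region scores highest (illustrative only)."""
    scores = {}
    for gene, (start, end) in aligned_gene_ranges.items():
        # Assumption: end is exclusive, so a plain slice covers the region.
        test_region = test_aligned[start:end]
        ref_region = ref_aligned[start:end]
        # Assumption: score = count of identical, non-gap columns.
        scores[gene] = sum(
            1 for t, r in zip(test_region, ref_region) if t == r and t != "-"
        )
    if not scores:
        return []
    best = max(scores.values())
    # Keep every gene that ties for the maximum score.
    return [gene for gene, score in scores.items() if score == best]


ref = "ACGTACGTACGT"
ranges = {"gag": (0, 4), "pol": (4, 8), "env": (8, 12)}

single = best_gene_regions_sketch("----ACGTACTT", ref, ranges)  # ['pol']
tied = best_gene_regions_sketch("ACGTACGT----", ref, ranges)    # ['gag', 'pol']
```

The second call shows the tie case: both regions score equally, so both names are returned, which matches the documented behaviour of returning all genes that share the maximum.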

pyhiv.split.split.get_present_gene_regions(test_aligned, aligned_gene_ranges)

Identify gene regions that contain at least one base (non-gap) in the aligned test sequence.

Parameters:

test_aligned (str) – The aligned test sequence (with gaps).
aligned_gene_ranges (dict) – Dictionary mapping gene names to (start, end) positions in the alignment coordinates (0-based).

Returns:

A list of gene names where the test sequence contains non-gap characters within the region.

Return type:

list
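
As a rough illustration of the presence check described above, the following sketch reports every gene whose region holds at least one non-gap character. It is not the pyhiv implementation: the name present_gene_regions_sketch is hypothetical, and (start, end) is assumed to be half-open, which this reference does not confirm.

```python
def present_gene_regions_sketch(test_aligned, aligned_gene_ranges):
    """Return gene names whose region has at least one non-gap character."""
    present = []
    for gene, (start, end) in aligned_gene_ranges.items():
        # Assumption: end is exclusive (the documented convention may differ).
        region = test_aligned[start:end]
        if any(char != "-" for char in region):
            present.append(gene)
    return present


test_aligned = "----ACGT----"
ranges = {"gag": (0, 4), "pol": (4, 8), "env": (6, 12)}

found = present_gene_regions_sketch(test_aligned, ranges)  # ['pol', 'env']
```

Here "gag" is dropped because its region is all gaps, while "env" is kept even though only two of its six columns contain bases.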

pyhiv.split.split.map_ref_coords_to_alignment(ref_aligned)

Build a mapping from ungapped reference coordinates (GenBank numbering) to gapped alignment columns.

Parameters:

ref_aligned (str) – The aligned reference sequence (may contain '-' characters representing gaps).

Returns:

A dictionary mapping 1-based reference positions (without gaps) to 0-based alignment positions (with gaps).

Return type:

dict
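
The coordinate mapping can be sketched as a single pass over the aligned reference. This is an illustrative reimplementation under the conventions stated above (1-based ungapped keys, 0-based alignment values), not the pyhiv source; the name map_ref_coords_sketch is hypothetical.

```python
def map_ref_coords_sketch(ref_aligned):
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    mapping = {}
    ref_pos = 0  # counts reference bases seen so far (gaps are skipped)
    for column, char in enumerate(ref_aligned):
        if char != "-":
            ref_pos += 1
            mapping[ref_pos] = column
    return mapping


mapping = map_ref_coords_sketch("AC--GT")  # {1: 0, 2: 1, 3: 4, 4: 5}
```

Reference base 3 lands on alignment column 4 because the two gap columns before it shift every later position by two.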

Module contents