pyhiv.report package

Submodules

pyhiv.report.constants module

Constants and configuration for PyHIV reporting module.

class pyhiv.report.constants.GenePanelConfig[source]

Bases: object

Gene panel spacing and positioning configuration.

ALIGNMENT_CLEARANCE = 0.243
BOTTOM_MARGIN = 0.55
NON_K03455_X_MAX_DEFAULT = 10000
REV_CONNECTOR = 0.6075
TAT_CONNECTOR = 0.09450000000000001
TOP_MARGIN = 0.3
X_PAD_MIN = 60
Y_SCALE = 1.35

class pyhiv.report.constants.K03455Config[source]

Bases: object

Configuration for K03455 special reference handling.

DEFAULT_K03455_OFFSETS = (-0.15, 0.35)
K03455_NUMERIC_OFFSETS = {"3' LTR": (0.25, -0.2), "5' LTR": (-0.15, -0.2), 'env': (-0.15, -0.2), 'gag': (0.3, 0.2), 'nef': (-0.15, -0.2), 'pol': (-0.15, -0.2), 'rev 1': (0.2, 0.25), 'rev 2': (0.2, 0.25), 'tat 1': (-0.15, 0.15), 'tat 2': (-0.15, 0.2), 'vif': (-0.15, -0.2), 'vpr': (0.2, -0.2), 'vpu': (0.35, 0.15)}
TARGET_REGIONS = ["5' LTR", 'gag', 'pol', 'vif', 'vpr', 'vpu', 'tat 1', 'tat 2', 'rev 1', 'rev 2', 'env', 'nef', "3' LTR"]
Y_POSITIONS = {"3' LTR": 0.0, "5' LTR": 0.0, 'env': 0.0, 'gag': 0.4, 'nef': 0.8, 'pol': 0.0, 'rev 1': 1.0, 'rev 2': 1.0, 'tat 1': 0.2, 'tat 2': 0.4, 'vif': 0.4, 'vpr': 0.8, 'vpu': 0.6}

classmethod get_k03455_offsets(gene)[source]

Get K03455-specific offsets for a gene.

Parameters:

gene (str) – Gene name.

Return type:

tuple[float, float]
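
The lookup described above can be sketched as a dictionary access with a fallback. This is a minimal sketch rather than the library's implementation: it assumes that genes missing from K03455_NUMERIC_OFFSETS fall back to DEFAULT_K03455_OFFSETS, and the helper name lookup_k03455_offsets is hypothetical. The values are a subset copied from the attributes listed above.

```python
from __future__ import annotations

# Subset of the K03455Config attributes documented above.
K03455_NUMERIC_OFFSETS = {
    "5' LTR": (-0.15, -0.2),
    "gag": (0.3, 0.2),
    "vpu": (0.35, 0.15),
    "3' LTR": (0.25, -0.2),
}
DEFAULT_K03455_OFFSETS = (-0.15, 0.35)


def lookup_k03455_offsets(gene: str) -> tuple[float, float]:
    """Hypothetical stand-in for K03455Config.get_k03455_offsets.

    Assumption: genes without an entry use DEFAULT_K03455_OFFSETS.
    """
    return K03455_NUMERIC_OFFSETS.get(gene, DEFAULT_K03455_OFFSETS)


print(lookup_k03455_offsets("gag"))      # (0.3, 0.2)
print(lookup_k03455_offsets("unknown"))  # (-0.15, 0.35)
```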

class pyhiv.report.constants.MetadataConfig[source]

Bases: object

Metadata block display configuration.

FONTSIZE = 9.5
INFO_TOP_Y = 0.78
TITLE_Y = 1.06
WRAP = 75

class pyhiv.report.constants.NumericOffsets[source]

Bases: object

Vertical offsets for numeric labels (non-K03455 only).

DEFAULT_OFFSETS = (-0.15, 0.35)
GENE_OFFSET_MAP = {"3' ltr": (0.25, -0.2), "5' ltr": (-0.15, -0.2), 'env': (0.15, 0.15), 'gag': (-0.15, -0.15), 'gag-pol': (-0.15, -0.2), 'nef': (-0.15, -0.2), 'pol': (-0.15, -0.2), 'rev 1': (0.2, 0.25), 'rev 2': (0.2, 0.25), 'tat 1': (0.15, 0.15), 'tat 2': (0.15, 0.2), 'vif': (-0.15, -0.2), 'vpr': (0.15, -0.2), 'vpu': (0.15, 0.15)}

classmethod get_offsets(gene)[source]

Get offsets for a gene (case-insensitive).

Parameters:

gene (str) – Gene name.

Return type:

tuple[float, float]
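
Because the keys of GENE_OFFSET_MAP are lowercase, a case-insensitive lookup can be sketched by normalizing the gene name before the dictionary access. This is an illustrative sketch only: the helper name lookup_offsets is hypothetical, and both the whitespace stripping and the fallback to DEFAULT_OFFSETS are assumptions, not documented behavior.

```python
from __future__ import annotations

# Subset of the NumericOffsets attributes documented above.
GENE_OFFSET_MAP = {
    "3' ltr": (0.25, -0.2),
    "gag": (-0.15, -0.15),
    "env": (0.15, 0.15),
}
DEFAULT_OFFSETS = (-0.15, 0.35)


def lookup_offsets(gene: str) -> tuple[float, float]:
    """Hypothetical stand-in for NumericOffsets.get_offsets.

    Assumptions: names are stripped and lowercased before lookup, and
    unknown genes use DEFAULT_OFFSETS.
    """
    key = gene.strip().lower()
    return GENE_OFFSET_MAP.get(key, DEFAULT_OFFSETS)


print(lookup_offsets("GAG"))     # (-0.15, -0.15)
print(lookup_offsets("3' LTR"))  # (0.25, -0.2)
```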

class pyhiv.report.constants.PageLayout[source]

Bases: object

Page layout and spacing configuration.

FIGSIZE = (11.69, 9.2)
GRID_HEIGHT_RATIOS = [0.9, 1.9]
HSPACE = 0.42

pyhiv.report.pdf_generator module

PDF report generation for PyHIV results.

pyhiv.report.pdf_generator.render_sequence_page(pdf, sequence, accession, subtype, mm_region, present_regions, features_aln, ref_seq_aligned, user_seq_aligned, y_positions=None)[source]

Render a single sequence page in the PDF report.

Parameters:
  • pdf (PdfPages) – The PdfPages object to save the figure into.

  • sequence (str) – The name or identifier of the sequence.

  • accession (str) – The accession number of the sequence.

  • subtype (str) – The subtype of the sequence.

  • mm_region (str) – The best-matching region of the sequence.

  • present_regions (List[str]) – List of present regions in the sequence.

  • features_aln (Dict[str, Tuple[int, int]]) – Dictionary of gene features with their alignment coordinate ranges.

  • ref_seq_aligned (str) – The reference sequence aligned (with gaps).

  • user_seq_aligned (str) – The user’s sequence aligned (with gaps).

  • y_positions (Optional[Dict[str, float]], optional) – Fixed y-positions for gene lanes, by default None (auto lanes).

pyhiv.report.reporter module

Main reporting class for PyHIV results.

class pyhiv.report.reporter.PyHIVReporter(output_dir, log_level=20)[source]

Bases: object

Main class for generating PyHIV PDF reports.

Parameters:

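  • output_dir (Path) – Directory in which the PDF report is written.

  • log_level (int, optional) – Logging level, by default 20 (the value of logging.INFO).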

generate_report(final_table_path, sequences_with_locations_path, output_pdf_name='PyHIV_report_all_sequences.pdf')[source]

Generate PDF report from PyHIV results.

Parameters:
  • final_table_path (Path) – Path to final_table.tsv file.

  • sequences_with_locations_path (Path) – Path to sequences_with_locations.tsv file.

  • output_pdf_name (str, optional) – Name of the output PDF file, by default 'PyHIV_report_all_sequences.pdf'.

Returns:

Path to the generated PDF report.

Return type:

Path

pyhiv.report.utils module

Utility functions for PyHIV reporting module.

pyhiv.report.utils.build_alignment_path(sequence, alignments_dir)[source]

Build path to alignment FASTA file.

Parameters:
  • sequence (str) – The name or identifier of the sequence.

  • alignments_dir (Path) – The directory containing alignment FASTA files.

Returns:

The path to the alignment FASTA file.

Return type:

Path

pyhiv.report.utils.build_ref_to_alignment_map(ref_aligned)[source]

Build mapping from reference coordinates to alignment coordinates.

Parameters:

ref_aligned (str) – The reference sequence with alignment gaps.

Returns:

A tuple containing:
  • A dictionary mapping reference positions to alignment indices.

  • The length of the aligned reference sequence.

Return type:

Tuple[Dict[int, int], int]
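
One way to build such a mapping is to walk the aligned reference once and record the alignment index of every non-gap character. The sketch below is hypothetical and makes several assumptions the documentation does not confirm: reference positions are 1-based, alignment indices are 0-based, the gap character is "-", and the returned length counts gap columns.

```python
from __future__ import annotations


def ref_to_alignment_map(ref_aligned: str, gap: str = "-") -> tuple[dict[int, int], int]:
    """Hypothetical sketch of a reference-to-alignment coordinate map.

    Assumptions: 1-based reference positions, 0-based alignment indices.
    """
    mapping: dict[int, int] = {}
    ref_pos = 0
    for aln_idx, char in enumerate(ref_aligned):
        if char != gap:
            ref_pos += 1                # advance only on real reference bases
            mapping[ref_pos] = aln_idx  # remember where that base sits
    return mapping, len(ref_aligned)


print(ref_to_alignment_map("AC--GT"))  # ({1: 0, 2: 1, 3: 4, 4: 5}, 6)
```

Gap columns in the reference correspond to insertions in the user sequence, so they receive no reference position and are skipped.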

pyhiv.report.utils.canon_label(label)[source]

Canonicalize gene label for K03455.

Parameters:

label (str) – The input gene label.

Returns:

The canonical gene label, or None if not recognized.

Return type:

Optional[str]

pyhiv.report.utils.first_last_nongap_idx(seq)[source]

Return the first and last indices of non-gap characters in a sequence.

Parameters:

seq (str) – The input sequence with gaps.

Returns:

A tuple containing the first and last indices of non-gap characters.

Return type:

Tuple[int, int]
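
A minimal sketch of this computation follows. It is not the library's code: the helper name is hypothetical, the gap character is assumed to be "-", indices are assumed to be 0-based, and raising ValueError for an all-gap sequence is a choice made for this example only.

```python
from __future__ import annotations


def first_last_nongap(seq: str, gap: str = "-") -> tuple[int, int]:
    """Hypothetical sketch: 0-based indices of the first and last non-gap characters."""
    indices = [i for i, char in enumerate(seq) if char != gap]
    if not indices:
        # Assumption for this sketch; the real function may behave differently.
        raise ValueError("sequence contains only gaps")
    return indices[0], indices[-1]


print(first_last_nongap("--ACG-T--"))  # (2, 6)
```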

pyhiv.report.utils.get_numeric_offsets_non_special(gene)[source]

Get numeric offsets for non-K03455 references using NumericOffsets.

Parameters:

gene (str) – The gene name.

Returns:

A tuple containing (start_offset, end_offset).

Return type:

tuple[float, float]

pyhiv.report.utils.is_special_reference(accession, ref_header)[source]

Check if reference is special (K03455).

Parameters:
  • accession (str) – The accession number of the reference.

  • ref_header (str) – The header of the reference sequence.

Returns:

True if the reference is K03455, False otherwise.

Return type:

bool
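
The check can be pictured as a search for the K03455 accession in either argument. The sketch below is an assumption about how such a test might work, not the documented matching rule: the helper name is hypothetical, and the case-insensitive substring match is chosen purely for illustration.

```python
def looks_like_k03455(accession: str, ref_header: str) -> bool:
    """Hypothetical sketch: True if either string mentions K03455.

    Assumption: a case-insensitive substring match is sufficient.
    """
    target = "K03455"
    return target in accession.upper() or target in ref_header.upper()


print(looks_like_k03455("K03455.1", "reference"))  # True
print(looks_like_k03455("AB123456", "AB123456"))   # False
```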

pyhiv.report.utils.normalize_features(raw_features, special)[source]

Normalize features based on reference type.

Parameters:
  • raw_features (Dict[str, Tuple[int, int]]) – Raw features mapping.

  • special (bool) – Whether the reference is special (K03455).

Returns:

Normalized features mapping.

Return type:

Dict[str, Tuple[int, int]]

pyhiv.report.utils.normalize_present_regions(regions, special)[source]

Normalize present regions based on reference type.

Parameters:
  • regions (List[str]) – List of raw present regions.

  • special (bool) – Whether the reference is special (K03455).

Returns:

Normalized list of present regions.

Return type:

List[str]

pyhiv.report.utils.parse_features(cell)[source]

Parse features from table cell.

Parameters:

cell (Any) – The table cell containing features.

Returns:

A dictionary mapping feature names to (start, end) tuples.

Return type:

Dict[str, Tuple[int, int]]

pyhiv.report.utils.parse_present_regions(cell)[source]

Parse present regions from a table cell into a list of region strings.

Parameters:

cell (Any) – The table cell containing present regions.

Returns:

A list of present region strings.

Return type:

List[str]

pyhiv.report.utils.project_features_to_alignment(features_genomic, ref_map)[source]

Project genomic features to alignment coordinates.

Parameters:
  • features_genomic (Dict[str, Tuple[int, int]]) – Genomic features mapping.

  • ref_map (Dict[int, int]) – Reference to alignment mapping.

Returns:

Features projected to alignment coordinates.

Return type:

Dict[str, Tuple[int, int]]
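
Given a reference-to-alignment map, projection amounts to translating each feature's start and end through that map. The following sketch is hypothetical: the helper name is invented, and skipping features whose endpoints are absent from the map is an assumption about how missing coordinates might be handled.

```python
from __future__ import annotations


def project_features(
    features_genomic: dict[str, tuple[int, int]],
    ref_map: dict[int, int],
) -> dict[str, tuple[int, int]]:
    """Hypothetical sketch: translate (start, end) pairs into alignment coordinates."""
    projected: dict[str, tuple[int, int]] = {}
    for name, (start, end) in features_genomic.items():
        # Assumption: features that fall outside the mapped range are dropped.
        if start in ref_map and end in ref_map:
            projected[name] = (ref_map[start], ref_map[end])
    return projected


ref_map = {1: 0, 2: 1, 3: 4, 4: 5}  # e.g. from an aligned reference "AC--GT"
features = {"gag": (2, 3), "pol": (3, 9)}
print(project_features(features, ref_map))  # {'gag': (1, 4)}
```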

pyhiv.report.utils.read_alignment_fasta(fpath)[source]

Read alignment FASTA file and return headers and sequences.

Parameters:

fpath (Path) – Path to the alignment FASTA file.

Returns:

A tuple containing:
  • Reference header

  • Reference sequence (aligned)

  • User header

  • User sequence (aligned)

Return type:

Tuple[str, str, str, str]
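
To illustrate the shape of the returned tuple, here is a small stand-alone reader for a two-record FASTA file. It is a sketch under stated assumptions, not the library's parser: the helper name is hypothetical, the reference record is assumed to come first, and headers are returned without the leading ">".

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_two_record_fasta(fpath: Path) -> tuple[str, str, str, str]:
    """Hypothetical sketch: return (ref_header, ref_seq, user_header, user_seq)."""
    headers: list[str] = []
    chunks: list[list[str]] = []
    for line in Path(fpath).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            headers.append(line[1:])  # drop the ">" marker
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)   # sequence lines may wrap
    if len(headers) < 2:
        raise ValueError("expected at least two FASTA records")
    seqs = ["".join(parts) for parts in chunks]
    return headers[0], seqs[0], headers[1], seqs[1]


# Demo with a made-up alignment written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fasta"
    demo.write_text(">K03455\nAC--\nGT\n>user_seq\nACTT\nG-\n")
    result = read_two_record_fasta(demo)

print(result)  # ('K03455', 'AC--GT', 'user_seq', 'ACTTG-')
```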

pyhiv.report.utils.ungap(seq)[source]

Remove gaps from sequence.

Parameters:

seq (str) – The input sequence with gaps.

Returns:

The ungapped sequence.

Return type:

str
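
Assuming gaps are written as "-", gap removal reduces to a single string replacement. The helper name below is hypothetical and the gap character is an assumption.

```python
def strip_gaps(seq: str, gap: str = "-") -> str:
    """Hypothetical sketch: drop every gap character from an aligned sequence."""
    return seq.replace(gap, "")


print(strip_gaps("AC--GT-"))  # ACGT
```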

pyhiv.report.visualization module

Gene visualization and plotting functions for PyHIV reporting module.

pyhiv.report.visualization.plot_gene_axes(ax, genes_ranges, alignment_start, alignment_end, y_positions=None)[source]

Plot gene visualization with alignment information.

Parameters:
  • ax (matplotlib.axes.Axes) – The matplotlib Axes to plot on.

  • genes_ranges (Dict[str, Tuple[int, int]]) – Mapping of gene names to their (start, end) positions.

  • alignment_start (int) – Start position of the alignment span.

  • alignment_end (int) – End position of the alignment span.

  • y_positions (Optional[Dict[str, float]], optional) – Fixed y-positions for gene lanes, by default None (auto lanes).

Module contents

PyHIV reporting module.

This module provides functionality to generate PDF reports from PyHIV analysis results. The reports include sequence metadata and gene visualization plots.

class pyhiv.report.GenePanelConfig[source]

Bases: object

Gene panel spacing and positioning configuration.

ALIGNMENT_CLEARANCE = 0.243
BOTTOM_MARGIN = 0.55
NON_K03455_X_MAX_DEFAULT = 10000
REV_CONNECTOR = 0.6075
TAT_CONNECTOR = 0.09450000000000001
TOP_MARGIN = 0.3
X_PAD_MIN = 60
Y_SCALE = 1.35

class pyhiv.report.K03455Config[source]

Bases: object

Configuration for K03455 special reference handling.

DEFAULT_K03455_OFFSETS = (-0.15, 0.35)
K03455_NUMERIC_OFFSETS = {"3' LTR": (0.25, -0.2), "5' LTR": (-0.15, -0.2), 'env': (-0.15, -0.2), 'gag': (0.3, 0.2), 'nef': (-0.15, -0.2), 'pol': (-0.15, -0.2), 'rev 1': (0.2, 0.25), 'rev 2': (0.2, 0.25), 'tat 1': (-0.15, 0.15), 'tat 2': (-0.15, 0.2), 'vif': (-0.15, -0.2), 'vpr': (0.2, -0.2), 'vpu': (0.35, 0.15)}
TARGET_REGIONS = ["5' LTR", 'gag', 'pol', 'vif', 'vpr', 'vpu', 'tat 1', 'tat 2', 'rev 1', 'rev 2', 'env', 'nef', "3' LTR"]
Y_POSITIONS = {"3' LTR": 0.0, "5' LTR": 0.0, 'env': 0.0, 'gag': 0.4, 'nef': 0.8, 'pol': 0.0, 'rev 1': 1.0, 'rev 2': 1.0, 'tat 1': 0.2, 'tat 2': 0.4, 'vif': 0.4, 'vpr': 0.8, 'vpu': 0.6}

classmethod get_k03455_offsets(gene)[source]

Get K03455-specific offsets for a gene.

Parameters:

gene (str) – Gene name.

Return type:

tuple[float, float]

class pyhiv.report.MetadataConfig[source]

Bases: object

Metadata block display configuration.

FONTSIZE = 9.5
INFO_TOP_Y = 0.78
TITLE_Y = 1.06
WRAP = 75

class pyhiv.report.NumericOffsets[source]

Bases: object

Vertical offsets for numeric labels (non-K03455 only).

DEFAULT_OFFSETS = (-0.15, 0.35)
GENE_OFFSET_MAP = {"3' ltr": (0.25, -0.2), "5' ltr": (-0.15, -0.2), 'env': (0.15, 0.15), 'gag': (-0.15, -0.15), 'gag-pol': (-0.15, -0.2), 'nef': (-0.15, -0.2), 'pol': (-0.15, -0.2), 'rev 1': (0.2, 0.25), 'rev 2': (0.2, 0.25), 'tat 1': (0.15, 0.15), 'tat 2': (0.15, 0.2), 'vif': (-0.15, -0.2), 'vpr': (0.15, -0.2), 'vpu': (0.15, 0.15)}

classmethod get_offsets(gene)[source]

Get offsets for a gene (case-insensitive).

Parameters:

gene (str) – Gene name.

Return type:

tuple[float, float]

class pyhiv.report.PageLayout[source]

Bases: object

Page layout and spacing configuration.

FIGSIZE = (11.69, 9.2)
GRID_HEIGHT_RATIOS = [0.9, 1.9]
HSPACE = 0.42

class pyhiv.report.PyHIVReporter(output_dir, log_level=20)[source]

Bases: object

Main class for generating PyHIV PDF reports.

Parameters:

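  • output_dir (Path) – Directory in which the PDF report is written.

  • log_level (int, optional) – Logging level, by default 20 (the value of logging.INFO).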

generate_report(final_table_path, sequences_with_locations_path, output_pdf_name='PyHIV_report_all_sequences.pdf')[source]

Generate PDF report from PyHIV results.

Parameters:
  • final_table_path (Path) – Path to final_table.tsv file.

  • sequences_with_locations_path (Path) – Path to sequences_with_locations.tsv file.

  • output_pdf_name (str, optional) – Name of the output PDF file, by default 'PyHIV_report_all_sequences.pdf'.

Returns:

Path to the generated PDF report.

Return type:

Path

pyhiv.report.build_alignment_path(sequence, alignments_dir)[source]

Build path to alignment FASTA file.

Parameters:
  • sequence (str) – The name or identifier of the sequence.

  • alignments_dir (Path) – The directory containing alignment FASTA files.

Returns:

The path to the alignment FASTA file.

Return type:

Path

pyhiv.report.build_ref_to_alignment_map(ref_aligned)[source]

Build mapping from reference coordinates to alignment coordinates.

Parameters:

ref_aligned (str) – The reference sequence with alignment gaps.

Returns:

A tuple containing:
  • A dictionary mapping reference positions to alignment indices.

  • The length of the aligned reference sequence.

Return type:

Tuple[Dict[int, int], int]

pyhiv.report.canon_label(label)[source]

Canonicalize gene label for K03455.

Parameters:

label (str) – The input gene label.

Returns:

The canonical gene label, or None if not recognized.

Return type:

Optional[str]

pyhiv.report.first_last_nongap_idx(seq)[source]

Return the first and last indices of non-gap characters in a sequence.

Parameters:

seq (str) – The input sequence with gaps.

Returns:

A tuple containing the first and last indices of non-gap characters.

Return type:

Tuple[int, int]

pyhiv.report.get_numeric_offsets_non_special(gene)[source]

Get numeric offsets for non-K03455 references using NumericOffsets.

Parameters:

gene (str) – The gene name.

Returns:

A tuple containing (start_offset, end_offset).

Return type:

tuple[float, float]

pyhiv.report.is_special_reference(accession, ref_header)[source]

Check if reference is special (K03455).

Parameters:
  • accession (str) – The accession number of the reference.

  • ref_header (str) – The header of the reference sequence.

Returns:

True if the reference is K03455, False otherwise.

Return type:

bool

pyhiv.report.normalize_features(raw_features, special)[source]

Normalize features based on reference type.

Parameters:
  • raw_features (Dict[str, Tuple[int, int]]) – Raw features mapping.

  • special (bool) – Whether the reference is special (K03455).

Returns:

Normalized features mapping.

Return type:

Dict[str, Tuple[int, int]]

pyhiv.report.normalize_present_regions(regions, special)[source]

Normalize present regions based on reference type.

Parameters:
  • regions (List[str]) – List of raw present regions.

  • special (bool) – Whether the reference is special (K03455).

Returns:

Normalized list of present regions.

Return type:

List[str]

pyhiv.report.parse_features(cell)[source]

Parse features from table cell.

Parameters:

cell (Any) – The table cell containing features.

Returns:

A dictionary mapping feature names to (start, end) tuples.

Return type:

Dict[str, Tuple[int, int]]

pyhiv.report.parse_present_regions(cell)[source]

Parse present regions from a table cell into a list of region strings.

Parameters:

cell (Any) – The table cell containing present regions.

Returns:

A list of present region strings.

Return type:

List[str]

pyhiv.report.plot_gene_axes(ax, genes_ranges, alignment_start, alignment_end, y_positions=None)[source]

Plot gene visualization with alignment information.

Parameters:
  • ax (matplotlib.axes.Axes) – The matplotlib Axes to plot on.

  • genes_ranges (Dict[str, Tuple[int, int]]) – Mapping of gene names to their (start, end) positions.

  • alignment_start (int) – Start position of the alignment span.

  • alignment_end (int) – End position of the alignment span.

  • y_positions (Optional[Dict[str, float]], optional) – Fixed y-positions for gene lanes, by default None (auto lanes).

pyhiv.report.project_features_to_alignment(features_genomic, ref_map)[source]

Project genomic features to alignment coordinates.

Parameters:
  • features_genomic (Dict[str, Tuple[int, int]]) – Genomic features mapping.

  • ref_map (Dict[int, int]) – Reference to alignment mapping.

Returns:

Features projected to alignment coordinates.

Return type:

Dict[str, Tuple[int, int]]

pyhiv.report.read_alignment_fasta(fpath)[source]

Read alignment FASTA file and return headers and sequences.

Parameters:

fpath (Path) – Path to the alignment FASTA file.

Returns:

A tuple containing:
  • Reference header

  • Reference sequence (aligned)

  • User header

  • User sequence (aligned)

Return type:

Tuple[str, str, str, str]

pyhiv.report.render_sequence_page(pdf, sequence, accession, subtype, mm_region, present_regions, features_aln, ref_seq_aligned, user_seq_aligned, y_positions=None)[source]

Render a single sequence page in the PDF report.

Parameters:
  • pdf (PdfPages) – The PdfPages object to save the figure into.

  • sequence (str) – The name or identifier of the sequence.

  • accession (str) – The accession number of the sequence.

  • subtype (str) – The subtype of the sequence.

  • mm_region (str) – The best-matching region of the sequence.

  • present_regions (List[str]) – List of present regions in the sequence.

  • features_aln (Dict[str, Tuple[int, int]]) – Dictionary of gene features with their alignment coordinate ranges.

  • ref_seq_aligned (str) – The reference sequence aligned (with gaps).

  • user_seq_aligned (str) – The user’s sequence aligned (with gaps).

  • y_positions (Optional[Dict[str, float]], optional) – Fixed y-positions for gene lanes, by default None (auto lanes).

pyhiv.report.ungap(seq)[source]

Remove gaps from sequence.

Parameters:

seq (str) – The input sequence with gaps.

Returns:

The ungapped sequence.

Return type:

str