🧬 PyHIV: A Python Package for Local HIV‑1 Sequence Alignment, Subtyping and Gene Splitting



📖 Overview

PyHIV is a Python tool that aligns HIV-1 nucleotide sequences against reference genomes to identify the most similar subtype and, optionally, split the aligned sequences into gene regions.

It produces:

  • Best reference alignment per sequence

  • Subtype and reference metadata

  • Gene-region–specific FASTA files (optional)

  • A final summary table (final_table.tsv)


⚙️ How It Works

┌─────────────────────────────────────────────┐
│  User FASTA sequences                       │
└─────────────────────────────────────────────┘
                │
                ▼
       Read and preprocess input
                │
                ▼
 Align sequences against reference genomes
                │
                ▼
    Identify best matching reference
                │
                ▼
     (Optional) Split by gene region
                │
                ▼
  Save results and summary table (.tsv)
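
The "align, then pick the best reference" steps can be illustrated with a minimal sketch. This is not PyHIV's implementation: `difflib.SequenceMatcher` stands in for a real sequence aligner, the helper names `read_fasta` and `best_reference` are hypothetical, and the reference sequences are toy strings invented for the example.

```python
from difflib import SequenceMatcher


def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def best_reference(query, references):
    """Return (reference_name, score) for the most similar reference.

    SequenceMatcher.ratio() is only a stand-in for an alignment score.
    """
    scores = {
        ref_name: SequenceMatcher(None, query, ref_seq, autojunk=False).ratio()
        for ref_name, ref_seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy references, invented for illustration (not real HIV-1 genomes).
references = {
    "ref_A": "ATGGGTGCGAGAGCGTCA",
    "ref_B": "ATGCCTTTTAGAGCGGGG",
}
user_fasta = ">sample1\nATGGGTGCGAGAGCGTCT\n"

for seq_name, seq in read_fasta(user_fasta).items():
    ref_name, score = best_reference(seq, references)
    print(f"{seq_name}\t{ref_name}\t{score:.2f}")
```

A similarity ratio is a crude proxy. A real pipeline would score pairwise alignments against full-length reference genomes.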


📦 Installation

You can install PyHIV using pip:

pip install pyhiv-tools

Alternatively, you can clone the repository and install it manually:

git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
python setup.py install

🚀 Getting Started

Basic usage:

from pyhiv import PyHIV

PyHIV(
    fastas_dir="path/to/fasta/files",
    subtyping=True,
    splitting=True,
    output_dir="results_folder",
    n_jobs=4
)

Parameters:

| Parameter  | Type | Default         | Description                                                              |
|------------|------|-----------------|--------------------------------------------------------------------------|
| fastas_dir | str  | Required        | Directory containing user FASTA files.                                   |
| subtyping  | bool | True            | Aligns against subtype reference genomes. If False, aligns only to HXB2. |
| splitting  | bool | True            | Splits aligned sequences into gene regions.                              |
| output_dir | str  | "PyHIV_results" | Output directory for results.                                            |
| n_jobs     | int  | None            | Number of parallel jobs for alignment.                                   |

📂 Output Structure

After running PyHIV, your output directory (default: PyHIV_results/) will contain:

PyHIV_results/
│
├── best_alignment_<sequence>.fasta     # Alignment to best reference
├── final_table.tsv                     # Summary of results
│
├── gag/
│   ├── <sequence>_gag.fasta
│   └── ...
├── pol/
│   ├── <sequence>_pol.fasta
│   └── ...
└── env/
    ├── <sequence>_env.fasta
    └── ...
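
For downstream scripts, the layout above can be traversed with `pathlib`. The sketch below assumes that every subdirectory of the output folder is a gene-region folder holding `.fasta` files, as in the tree shown. The helper name `collect_gene_fastas` is hypothetical and not part of PyHIV.

```python
from pathlib import Path


def collect_gene_fastas(output_dir):
    """Map each gene-region folder name to its sorted list of FASTA files."""
    output_dir = Path(output_dir)
    regions = {}
    for sub in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        fastas = sorted(sub.glob("*.fasta"))
        if fastas:  # skip folders that contain no FASTA files
            regions[sub.name] = fastas
    return regions


# Only runs if a results folder from a previous PyHIV run is present.
results = Path("PyHIV_results")
if results.is_dir():
    for region, files in collect_gene_fastas(results).items():
        print(f"{region}: {len(files)} file(s)")
```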

Final Table Columns:

| Column                    | Description                                     |
|---------------------------|-------------------------------------------------|
| Sequence                  | Input sequence name                             |
| Reference                 | Best matching reference accession               |
| Subtype                   | Predicted HIV-1 subtype                         |
| Most Matching Gene Region | Region with highest similarity                  |
| Present Gene Regions      | All detected gene regions with valid alignments |
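
The summary table can be read with Python's standard `csv` module. This sketch assumes that `final_table.tsv` is tab-separated with a header row whose names match the columns listed above. The helper `subtype_counts` is hypothetical and not part of PyHIV.

```python
import csv
from collections import Counter
from pathlib import Path


def subtype_counts(tsv_path):
    """Count how many sequences were assigned to each subtype."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Assumes the header contains a "Subtype" column.
        return Counter(row["Subtype"] for row in reader)


# Only runs if a summary table from a previous PyHIV run is present.
table = Path("PyHIV_results") / "final_table.tsv"
if table.exists():
    for subtype, count in subtype_counts(table).most_common():
        print(f"{subtype}\t{count}")
```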


📟 Command Line Interface

PyHIV provides a user-friendly CLI for HIV-1 sequence analysis.

🚀 Getting Started

# Basic usage
pyhiv run sequences/

# With custom options
pyhiv run sequences/ -o results/ -j 4 -v

# Validate inputs first
pyhiv validate sequences/

⚙️ Main Options

| Option                       | Description                                             |
|------------------------------|---------------------------------------------------------|
| --subtyping / --no-subtyping | Enable/disable HIV-1 subtyping (default: enabled)       |
| --splitting / --no-splitting | Enable/disable gene region splitting (default: enabled) |
| -o, --output-dir PATH        | Output directory (default: PyHIV_results)               |
| -j, --n-jobs INTEGER         | Number of parallel jobs (default: all CPUs)             |
| -v, --verbose                | Detailed output                                         |
| -q, --quiet                  | Suppress non-error output                               |

💼 Common Use Cases

Full analysis with subtyping and splitting:

pyhiv run data/sequences/

Alignment only:

pyhiv run data/sequences/ --no-subtyping --no-splitting

Parallel processing:

pyhiv run data/sequences/ -j 8 -o results/batch1/

Validation:

pyhiv validate data/sequences/
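
When several batches are scripted from Python, it can help to assemble the command line programmatically. The sketch below uses only the flags listed under Main Options. The helper `build_run_command` is hypothetical and not part of PyHIV, and the example only prints the command rather than running it.

```python
import shlex


def build_run_command(input_dir, output_dir=None, n_jobs=None,
                      subtyping=True, splitting=True, verbose=False):
    """Assemble a `pyhiv run` invocation as an argument list."""
    cmd = ["pyhiv", "run", str(input_dir)]
    if not subtyping:
        cmd.append("--no-subtyping")
    if not splitting:
        cmd.append("--no-splitting")
    if n_jobs is not None:
        cmd += ["-j", str(n_jobs)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_run_command("data/sequences/", output_dir="results/batch1/", n_jobs=8)
print(shlex.join(cmd))
```

With PyHIV installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.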

📤 Output

PyHIV generates:

  • final_table.tsv - Summary with sequence IDs, references, subtypes, and gene regions

  • best_alignment_*.fasta - Best alignment for each sequence

  • Gene-specific folders (when --splitting is enabled) with extracted regions

🆘 Getting Help

pyhiv --help           # Show all commands
pyhiv run --help       # Show options for run command
pyhiv --version        # Show version

For comprehensive documentation, see CLI_README.md.


🗂️ Citation

Manuscript in preparation. Please cite this repository if you use PyHIV in your research.


🧾 License

This project is licensed under the MIT License; see the LICENSE file for details.