PyHIV: A Python Package for Local HIV-1 Sequence Alignment, Subtyping and Gene Splitting
Overview
PyHIV is a Python tool that aligns HIV nucleotide sequences against reference genomes to determine the most similar subtype and optionally split the aligned sequences into gene regions.
It produces:
Best reference alignment per sequence
Subtype and reference metadata
Gene-region-specific FASTA files (optional)
A final summary table (final_table.tsv)
How It Works
┌─────────────────────────────────────────────┐
│            User FASTA sequences             │
└─────────────────────────────────────────────┘
                       │
                       ▼
           Read and preprocess input
                       │
                       ▼
   Align sequences against reference genomes
                       │
                       ▼
       Identify best matching reference
                       │
                       ▼
        (Optional) Split by gene region
                       │
                       ▼
     Save results and summary table (.tsv)
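As a rough illustration of the "identify best matching reference" step, the sketch below scores a query against a few toy reference strings and keeps the highest-scoring one. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; PyHIV's actual aligner and scoring are not described in this README. The function name `best_reference`, the reference labels, and the sequences are all made up for the example.

```python
from difflib import SequenceMatcher


def best_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the (name, score) of the reference most similar to the query.

    The score is difflib's ratio in [0, 1]. It is a stand-in for a real
    alignment score, not PyHIV's own metric.
    """
    scored = {
        # autojunk=False stops difflib from discarding very frequent characters
        name: SequenceMatcher(None, query, ref, autojunk=False).ratio()
        for name, ref in references.items()
    }
    name = max(scored, key=scored.get)
    return name, scored[name]


# Toy references (hypothetical labels and sequences, not real HIV-1 genomes)
toy_refs = {
    "ref_A": "ATGGGTGCGAGAGCGTCAGTATTAAGC",
    "ref_B": "ATGGGTGCGAGAGCGTCGGTATTAAGT",
    "ref_C": "TTTTTTCCCCCCGGGGGGAAAAAATTT",
}
name, score = best_reference("ATGGGTGCGAGAGCGTCAGTATTAAGC", toy_refs)
print(name, round(score, 3))
```

Because the query is identical to `ref_A`, that reference wins with a ratio of 1.0. A real pipeline would replace the ratio with a proper pairwise alignment score.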
Installation
You can install PyHIV using pip:
pip install pyhiv-tools
Alternatively, you can clone the repository and install it manually:
git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
python setup.py install
Getting Started
Basic usage:
from pyhiv import PyHIV

PyHIV(
    fastas_dir="path/to/fasta/files",
    subtyping=True,
    splitting=True,
    output_dir="results_folder",
    n_jobs=4
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fastas_dir` | | Required | Directory containing user FASTA files. |
| `subtyping` | | | Aligns against subtype reference genomes. If ... |
| `splitting` | | | Splits aligned sequences into gene regions. |
| `output_dir` | | `PyHIV_results/` | Output directory for results. |
| `n_jobs` | | | Number of parallel jobs for alignment. |
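Since `fastas_dir` is the only required argument, it can help to check what the directory contains before starting a run. The helper below is a minimal sketch using only the standard library; `list_fasta_files` and the set of accepted extensions are assumptions made for this example and are not part of PyHIV.

```python
from pathlib import Path

# Extensions treated as FASTA here are an assumption for this sketch;
# PyHIV itself may accept a different set.
FASTA_SUFFIXES = {".fasta", ".fa", ".fas", ".fna"}


def list_fasta_files(fastas_dir: str) -> list[Path]:
    """Return the FASTA-like files directly inside fastas_dir, sorted by path."""
    directory = Path(fastas_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )
```

Calling `list_fasta_files("path/to/fasta/files")` before `PyHIV(...)` gives an early error for a wrong path and an empty list for a directory with no matching files.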
Output Structure
After running PyHIV, your output directory (default: PyHIV_results/) will contain:
PyHIV_results/
│
├── best_alignment_<sequence>.fasta    # Alignment to best reference
├── final_table.tsv                    # Summary of results
│
├── gag/
│   ├── <sequence>_gag.fasta
│   └── ...
├── pol/
│   ├── <sequence>_pol.fasta
│   └── ...
└── env/
    ├── <sequence>_env.fasta
    └── ...
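The layout above can be walked with `pathlib` to see which gene-region files a run produced. This is a small sketch that assumes the folder structure shown here (one sub-folder per gene region containing `.fasta` files); `gene_files_by_region` is a hypothetical helper, not a PyHIV function.

```python
from pathlib import Path


def gene_files_by_region(output_dir: str) -> dict[str, list[str]]:
    """Map each gene sub-folder name to the sorted FASTA file names inside it."""
    regions = {}
    for sub in sorted(Path(output_dir).iterdir()):
        # Top-level files such as final_table.tsv are skipped; only folders count
        if sub.is_dir():
            regions[sub.name] = sorted(p.name for p in sub.glob("*.fasta"))
    return regions
```

For example, `gene_files_by_region("PyHIV_results")` would return a dictionary keyed by folder names such as `gag`, `pol`, and `env`.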
Final Table Columns:
| Column | Description |
|---|---|
| Sequence | Input sequence name |
| Reference | Best matching reference accession |
| Subtype | Predicted HIV-1 subtype |
| Most Matching Gene Region | Region with highest similarity |
| Present Gene Regions | All detected gene regions with valid alignments |
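The summary table is tab-separated, so it can be read with the standard `csv` module. The sketch below counts sequences per subtype. It assumes the file has a header row using exactly the column names listed above; the sample rows, reference labels, and the `subtype_counts` helper are invented for illustration.

```python
import csv
import io
from collections import Counter


def subtype_counts(tsv_handle) -> Counter:
    """Count how many sequences were assigned to each subtype."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row["Subtype"] for row in reader)


# Made-up example rows; a real table comes from a PyHIV run.
sample = io.StringIO(
    "Sequence\tReference\tSubtype\tMost Matching Gene Region\tPresent Gene Regions\n"
    "seq1\tREF_X\tB\tpol\tgag, pol\n"
    "seq2\tREF_Y\tC\tenv\tenv\n"
    "seq3\tREF_X\tB\tgag\tgag\n"
)
counts = subtype_counts(sample)
print(counts.most_common())
```

With a real run, the same function can be given an open file, for example `open("PyHIV_results/final_table.tsv", newline="")`.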
Command Line Interface
PyHIV provides a user-friendly CLI for HIV-1 sequence analysis.
Getting Started
# Basic usage
pyhiv run sequences/
# With custom options
pyhiv run sequences/ -o results/ -j 4 -v
# Validate inputs first
pyhiv validate sequences/
Main Options
| Option | Description |
|---|---|
| `--no-subtyping` | Enable/disable HIV-1 subtyping (default: enabled) |
| `--splitting` / `--no-splitting` | Enable/disable gene region splitting (default: enabled) |
| `-o` | Output directory (default: `PyHIV_results/`) |
| `-j` | Number of parallel jobs (default: all CPUs) |
| `-v` | Detailed output |
| | Suppress non-error output |
Common Use Cases
Full analysis with subtyping and splitting:
pyhiv run data/sequences/
Alignment only:
pyhiv run data/sequences/ --no-subtyping --no-splitting
Parallel processing:
pyhiv run data/sequences/ -j 8 -o results/batch1/
Validation:
pyhiv validate data/sequences/
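When the CLI is driven from a script, it can be convenient to assemble the command from Python. The sketch below builds an argument list using only the flags that appear in this README (`run`, `-o`, `-j`, `--no-subtyping`, `--no-splitting`); `build_run_command` is a hypothetical helper, and it only constructs the command without executing it.

```python
import shlex


def build_run_command(input_dir, output_dir=None, jobs=None,
                      subtyping=True, splitting=True):
    """Build the argument list for `pyhiv run` from the flags shown in this README."""
    cmd = ["pyhiv", "run", input_dir]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    if not subtyping:
        cmd.append("--no-subtyping")
    if not splitting:
        cmd.append("--no-splitting")
    return cmd


cmd = build_run_command("data/sequences/", output_dir="results/batch1/", jobs=8)
print(shlex.join(cmd))  # prints: pyhiv run data/sequences/ -o results/batch1/ -j 8
```

With PyHIV installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.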
Output
PyHIV generates:
final_table.tsv - Summary with sequence IDs, references, subtypes, and gene regions
best_alignment_*.fasta - Best alignment for each sequence
Gene-specific folders (when --splitting is enabled) with extracted regions
Getting Help
pyhiv --help # Show all commands
pyhiv run --help # Show options for run command
pyhiv --version # Show version
For comprehensive documentation, see CLI_README.md.
Citation
Manuscript in preparation. Please cite this repository if you use PyHIV in your research.
License
This project is licensed under the MIT License; see the LICENSE file for details.