Metadata-Version: 2.4
Name: activepathways
Version: 2.0.6
Summary: Integrative pathway enrichment analysis of multivariate omics data (Python port of the ActivePathways R package).
Author: Mykhaylo Slobodyanyuk, Jonathan Barenboim
Author-email: Juri Reimand <juri.reimand@utoronto.ca>
License-Expression: GPL-3.0-or-later
Project-URL: Homepage, https://github.com/abahcheli/ActivePathways_python
Project-URL: Source, https://github.com/abahcheli/ActivePathways_python
Project-URL: Issues, https://github.com/abahcheli/ActivePathways_python/issues
Project-URL: Upstream, https://github.com/reimandlab/ActivePathways
Keywords: pathway,enrichment,omics,p-value,Brown,Fisher,Stouffer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.5
Requires-Dist: scipy>=1.9
Provides-Extra: plot
Requires-Dist: matplotlib>=3.5; extra == "plot"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: matplotlib>=3.5; extra == "dev"
Dynamic: license-file

# ActivePathways (Python)

ActivePathways is a tool for **integrative pathway enrichment analysis** of multi-omics data. It identifies gene sets (such as pathways or Gene Ontology terms) that are enriched among genes ranked by p-values merged across multiple omics datasets. Because evidence from several datasets is fused through p-value merging, ActivePathways can detect pathways whose signal is too weak to reach significance in any single dataset.

This is a Python port of the [R ActivePathways package](https://github.com/reimandlab/ActivePathways), preserving exact functionality and numerical output.

## Citation

If you use ActivePathways, please cite:

**ActivePathways 2.0** (directional integration): Mykhaylo Slobodyanyuk, Alexander T. Bahcheli, et al. *Directional integration and pathway enrichment analysis for multi-omics data.* Nature Communications 15, 5690 (2024). [doi:10.1038/s41467-024-49986-4](https://doi.org/10.1038/s41467-024-49986-4)

**ActivePathways 1.0**: Marta Paczkowska, Jonathan Barenboim, et al. *Integrative pathway enrichment analysis of multivariate omics data.* Nature Communications 11, 735 (2020). [doi:10.1038/s41467-019-13983-9](https://doi.org/10.1038/s41467-019-13983-9)

## Installation

```bash
git clone https://github.com/abahcheli/ActivePathways_python.git
cd ActivePathways_python
pip install -e .
```

To build a conda package instead (the recipe builds from the PyPI source tarball):

```bash
conda build conda.recipe/
conda install --use-local activepathways
```

> **Note:** `conda.recipe/meta.yaml` fetches the package from PyPI. Before the package is published to PyPI, the placeholder SHA256 in the recipe must be replaced with the real tarball hash, or the `source` section temporarily switched to `path: ..` for a local build.

**Dependencies:** `numpy>=1.22`, `pandas>=1.5`, `scipy>=1.9`. Python 3.9+ required.

## Quick start

The two required inputs are a **p-value matrix** (genes × omics datasets, TSV) and a **GMT file** of gene sets. GMT files for common pathway databases can be downloaded from the [Bader Lab gene sets page](https://download.baderlab.org/EM_Genesets/current_release/). Gene symbols must match between the matrix and the GMT file.

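GMT is a tab-separated text format in which each line holds a term ID, a term name, and then the member genes. As a rough illustration of that layout, the standard-library sketch below writes a tiny hypothetical GMT file, parses it into a dictionary, and takes the union of all member genes, which is the gene universe that `make_background()` is documented to return. The helper `parse_gmt`, the term IDs and the genes are made up for this example; use `read_gmt()` for real files.

```python
import os
import tempfile


def parse_gmt(path):
    """Hypothetical helper: parse a GMT file into {term_id: (term_name, set_of_genes)}."""
    terms = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            term_id, term_name, *genes = fields
            terms[term_id] = (term_name, {g for g in genes if g})
    return terms


# Hypothetical two-term GMT file (IDs and genes are illustrative only).
gmt_text = (
    "TERM:0001\tToy pathway A\tTP53\tMDM2\tCDKN1A\n"
    "TERM:0002\tToy pathway B\tEGFR\tGRB2\tSOS1\tTP53\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".gmt", delete=False) as tmp:
    tmp.write(gmt_text)
    gmt_path = tmp.name

terms = parse_gmt(gmt_path)
os.remove(gmt_path)

# Background = union of all genes across all terms.
background = set().union(*(genes for _, genes in terms.values()))
print(sorted(background))  # ['CDKN1A', 'EGFR', 'GRB2', 'MDM2', 'SOS1', 'TP53']
```

Gene symbols in the p-value matrix index must use the same identifiers as the GMT file, otherwise they fall outside the background.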
### Command line

```bash
activepathways \
  --scores data/Adenocarcinoma_scores_subset.tsv \
  --gmt    data/hsapiens_REAC_subset.gmt \
  --output results.csv
```

Every option other than `--scores`, `--gmt`, and `--output` is optional. The example below sets several of them explicitly:

```bash
activepathways \
  --scores            data/Adenocarcinoma_scores_subset.tsv \
  --gmt               data/hsapiens_REAC_subset.gmt \
  --output            results.csv \
  --merge_method      Brown \
  --cutoff            0.1 \
  --significant       0.05 \
  --correction_method holm \
  --geneset_filter_min 5 \
  --geneset_filter_max 1000
```

Run `activepathways --help` for a full listing.

### Python

```python
import pandas as pd
from activepathways import active_pathways, export_as_csv

scores = pd.read_csv("data/Adenocarcinoma_scores_subset.tsv", sep="\t").set_index("Gene")
scores = scores.fillna(1.0)  # replace missing p-values with 1 (not significant)

results = active_pathways(scores, "data/hsapiens_REAC_subset.gmt")
export_as_csv(results, "results.csv")

print(results[["term_id", "term_name", "adjusted_p_val", "term_size"]].head())
```

```
        term_id            term_name  adjusted_p_val  term_size
0  REAC:2424491      DAP12 signaling    4.491268e-05        358
1   REAC:422475        Axon guidance    2.028966e-02        555
2   REAC:177929    Signaling by EGFR    6.245734e-04        366
3  REAC:2559583  Cellular Senescence    6.636060e-05        196
4   REAC:180292     GAB1 signalosome    1.215316e-02        133
```

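The `overlap` column lists the genes from the ranked list that drive each pathway's enrichment. The `evidence` column lists the input datasets in which the pathway is also enriched on its own, or `combined` if it is detected only through the merged p-values.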

For extended examples (directional integration, GMT utilities, Cytoscape output, and merging results), see [**docs/python_examples.md**](docs/python_examples.md).

## Key parameters

| Parameter | Default | Description |
| --- | --- | --- |
| scores | required | DataFrame of p-values (genes × datasets). Must not contain NAs; replace missing values with 1.0. |
| gmt | required | Path to a GMT file, or a GMT object from read_gmt(). |
| background | all GMT genes | Custom gene universe for the hypergeometric test. |
| geneset_filter | (5, 1000) | (min, max) pathway size to retain. |
| cutoff | 0.1 | Maximum merged p-value for a gene to be included in the ranked list. |
| significant | 0.05 | Adjusted p-value threshold for reporting pathways. |
| merge_method | "Fisher" | Method for combining p-values across datasets (see table below). |
| correction_method | "holm" | Multiple testing correction (holm, BH, bonferroni, etc.). |
| cytoscape_file_tag | None | File prefix for writing Cytoscape output files. |
| scores_direction | None | Matrix of directional values (e.g. log2 fold-changes) with the same genes and datasets as scores; required for directional methods. |
| constraints_vector | None | Expected directional relationships between datasets (1, -1, 0). |

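`cutoff` and `significant` apply at different stages: `cutoff` decides which genes (by merged p-value) enter the ranked list, and `significant` is compared against each pathway's p-value after multiple-testing correction. The sketch below walks through both steps with invented numbers and a NumPy implementation of Holm's step-down correction (the default `correction_method`), written to follow the definition used by R's `p.adjust(method = "holm")`. The helper `holm` is hypothetical and is not the package's `p_adjust()`; whether each threshold comparison is inclusive is also an assumption.

```python
import numpy as np


def holm(pvals):
    """Hypothetical helper: Holm step-down adjustment, following R's p.adjust(method = "holm")."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value (0-based i) is multiplied by (m - i).
    scaled = (m - np.arange(m)) * p[order]
    # A running maximum keeps adjusted values monotone; then cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Step 1: hypothetical merged gene p-values; genes passing `cutoff` are ranked.
merged = {"TP53": 1e-6, "EGFR": 0.02, "GRB2": 0.3, "SOS1": 0.08}
cutoff = 0.1
ranked = [g for g, p in sorted(merged.items(), key=lambda kv: kv[1]) if p <= cutoff]
print(ranked)  # ['TP53', 'EGFR', 'SOS1']

# Step 2: hypothetical raw enrichment p-values for four pathways.
term_p = np.array([0.01, 0.04, 0.03, 0.005])
adjusted = holm(term_p)
print(adjusted)  # approximately [0.03, 0.06, 0.06, 0.02]
significant = 0.05
print(np.flatnonzero(adjusted <= significant))  # [0 3]
```
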
### P-value merging methods

| Method | Directional | Description |
| --- | --- | --- |
| "Fisher" | No | Chi-squared combination of log p-values |
| "Brown" | No | Fisher's method corrected for between-dataset correlation |
| "Stouffer" | No | Z-score combination |
| "Strube" | No | Stouffer corrected for correlation |
| "Fisher_directional" | Yes | Fisher penalising directional conflicts |
| "DPM" | Yes | Brown's method with directional penalty (recommended) |
| "Stouffer_directional" | Yes | Stouffer with directional penalty |
| "Strube_directional" | Yes | Strube with directional penalty |

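The non-directional Fisher and Stouffer formulas are short enough to write out directly. The sketch below computes both for a single gene with SciPy and, for the Fisher case, adds a directional penalty in the form described in the ActivePathways 2.0 paper: each directional dataset's log p-value is signed by its observed direction times the expected direction from `constraints_vector`, so agreeing datasets reinforce each other and conflicting ones partly cancel. This is an illustration under stated assumptions, not this package's code: Brown's and Strube's corrections (which need a covariance estimate across genes) are not reproduced, treating a constraint of 0 as non-directional is an assumption, and edge cases such as p-values of exactly 0 are not handled. The function names are hypothetical; use `merge_p_values()` for real analyses.

```python
import numpy as np
from scipy import stats


def fisher(p):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared with 2k df under H0."""
    p = np.asarray(p, dtype=float)
    x = -2.0 * np.log(p).sum()
    return stats.chi2.sf(x, df=2 * p.size)


def stouffer(p):
    """Stouffer's method: Z = sum(Phi^-1(1 - p_i)) / sqrt(k), one-sided."""
    p = np.asarray(p, dtype=float)
    z = stats.norm.isf(p).sum() / np.sqrt(p.size)
    return stats.norm.sf(z)


def fisher_directional(p, direction, constraints):
    """Fisher's method with a directional penalty (sketch, not the package code).

    direction:   observed directions, e.g. log2 fold-changes (only the sign is used)
    constraints: expected directions (+1/-1); 0 marks a non-directional dataset
    """
    p = np.asarray(p, dtype=float)
    constraints = np.asarray(constraints)
    signs = np.sign(direction) * constraints
    is_dir = constraints != 0
    log_p = np.log(p)
    # Agreeing directions keep their full weight; conflicting ones partly cancel.
    directional_term = -abs(np.sum(log_p[is_dir] * signs[is_dir]))
    x = -2.0 * (directional_term + log_p[~is_dir].sum())
    return stats.chi2.sf(x, df=2 * p.size)


p_gene = [0.01, 0.02]  # hypothetical p-values for one gene in two datasets
print(fisher(p_gene))                                   # ≈ 0.0019
print(stouffer(p_gene))
print(fisher_directional(p_gene, [1.5, 2.0], [1, 1]))   # same as fisher(): directions agree
print(fisher_directional(p_gene, [1.5, -2.0], [1, 1]))  # ≈ 0.85: conflict is penalised
```

With `constraints=[1, -1]` the same opposite-signed fold-changes count as agreement, which is how an expected inverse relationship between two datasets would be encoded.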
## API reference

| Function | Description |
| --- | --- |
| active_pathways(scores, gmt, ...) | Main enrichment function |
| merge_p_values(scores, method, ...) | Combine p-values across datasets |
| read_gmt(filename) | Load a GMT file → GMT object |
| write_gmt(gmt, filename) | Write a GMT object to file |
| make_background(gmt) | Union of all genes across all GMT terms |
| export_as_csv(results, filename) | Save results table to CSV |
| merge_results(...) | Merge standard and directional results for Cytoscape |
| p_adjust(p, method) | Multiple testing correction (mirrors R's p.adjust) |
| enrichment_analysis(genelist, gmt, background) | Ordered hypergeometric test per term |

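`enrichment_analysis()` is listed above as an ordered hypergeometric test. In the R package this means walking down the gene list ranked by merged p-value, computing a hypergeometric p-value for the term's overlap with every prefix of the list, and keeping the smallest; the term's genes within that best prefix correspond to the `overlap` column. The sketch below implements that idea with `scipy.stats.hypergeom`. The function `ordered_hypergeometric` and all gene names are hypothetical, and the sketch skips details such as term-size filtering and how the package restricts the ranked list to the background.

```python
from scipy.stats import hypergeom


def ordered_hypergeometric(ranked_genes, term_genes, background):
    """Smallest hypergeometric p-value over all prefixes of a ranked gene list.

    Returns (p_value, prefix_length); the term's genes inside that prefix
    form the overlap.
    """
    background = set(background)
    term = set(term_genes) & background
    population, successes = len(background), len(term)
    best_p, best_k, hits = 1.0, 0, 0
    for k, gene in enumerate(ranked_genes, start=1):
        if gene in term:
            hits += 1
        # P(X >= hits) for X ~ Hypergeom(population, successes, k draws).
        p = hypergeom.sf(hits - 1, population, successes, k)
        if p < best_p:
            best_p, best_k = p, k
    return best_p, best_k


# Hypothetical background of 20 genes and a 4-gene term.
background = [f"G{i}" for i in range(1, 21)]
term = {"G1", "G2", "G3", "G10"}
ranked = ["G1", "G2", "G5", "G3", "G7"]  # ranked by merged p-value

p, k = ordered_hypergeometric(ranked, term, background)
overlap = [g for g in ranked[:k] if g in term]
print(k, overlap)  # 4 ['G1', 'G2', 'G3']
```

Here the best prefix is the first four genes (p = 65/4845 ≈ 0.013): adding G7 lengthens the list without adding a term member, so the p-value rises again.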
## Differences from the R package

| R | Python |
| --- | --- |
| ActivePathways() | active_pathways() |
| export_as_CSV() | export_as_csv() |
| read.GMT() | read_gmt() |
| write.GMT() | write_gmt() |
| data.table output | pandas.DataFrame output |
| scores[is.na(scores)] <- 1 | scores.fillna(1.0) |

## References

- Slobodyanyuk M*, Bahcheli AT*, et al. *Directional integration and pathway enrichment analysis for multi-omics data.* Nature Communications (2024). [doi:10.1038/s41467-024-49986-4](https://doi.org/10.1038/s41467-024-49986-4)
- Paczkowska M*, Barenboim J*, et al. *Integrative pathway enrichment analysis of multivariate omics data.* Nature Communications (2020). [doi:10.1038/s41467-019-13983-9](https://doi.org/10.1038/s41467-019-13983-9)
- Reimand J*, Isserlin R*, et al. *Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap.* Nature Protocols (2019). [doi:10.1038/s41596-018-0103-9](https://doi.org/10.1038/s41596-018-0103-9)
