PegasusIO for reading / writing single-cell genomics data

PegasusIO is the IO package for Pegasus.

Read documentation

Version 0.10.0 May 22, 2025

The read_input function supports overwriting the genome and modality for 10x v3 hdf5 format data, but this takes effect only when there is exactly one matrix, i.e. one (genome, feature_type) combination.
Add read_molecule_info function to support loading the hdf5 format UMI metadata table generated by cumulus_feature_barcoding.
Make anndata an optional dependency, as it’s only required for I/O with h5ad format files.
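
An optional dependency of this kind is commonly handled by deferring the import until the feature that needs it is used. The block below is a minimal sketch of that general pattern using only the standard library. It is not PegasusIO's actual code: the helper name require_optional and the error message are hypothetical.

```python
import importlib


def require_optional(module_name: str, purpose: str):
    """Import an optional module on demand, or fail with an actionable message.

    Hypothetical helper for illustration; not part of the PegasusIO API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for {purpose}; "
            f"install it with: pip install {module_name}"
        ) from exc


# A standard-library module stands in for an installed optional package.
json_module = require_optional("json", "demonstration purposes")
print(json_module.dumps({"ok": True}))
```

With this pattern, users who never touch the feature in question do not need the extra package installed, and those who do get a clear message instead of a bare import failure.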

Version 0.9.1 June 11, 2024

The copy function of MultimodalData and UnimodalData removes unused categories in obs_names and var_names. [PR #114]

Version 0.9.0 January 19, 2024

UnimodalData’s copy function now returns a MultimodalData object by default. Use cases:
- data[indices, :].copy() returns a MultimodalData object;
- data[indices, :].copy(to_unidata=True) still returns a UnimodalData object.

Version 0.8.2 January 5, 2024

Add pop_matrix, update_matrix and as_int functions for manipulating matrices. [PR #107]
Improve aggregate_matrices function. [PR #108]

Version 0.8.1 July 13, 2023

Bug fix and backward compatibility support

Version 0.8.0 February 7, 2023

Fix loading of 10x Space Ranger v2 input data. [PR #98 and #99]
Improve support for mtx format input data. [PR #102]
Make PegasusIO compatible with NumPy v1.24+. [PR #101]

Version 0.7.1 August 7, 2022

Bug fix in to_anndata function. [PR #97]

Version 0.7.0 July 25, 2022

The default count matrix of a UnimodalData object now has key counts instead of X.
Add uid option to UnimodalData constructor.
Add get_uid function to UnimodalData class.

Version 0.6.2 July 5, 2022

In the read_input function, add a transpose option to transpose the loaded count matrix. This option only works for CSV or TSV format files.
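
To illustrate what transposing a text-format matrix means, here is a small standard-library sketch. It is not how PegasusIO implements the option, and the helper name transpose_delimited and the sample data are hypothetical. These notes do not state which orientation read_input expects by default, so the example only shows the row/column swap itself.

```python
import csv
import io


def transpose_delimited(text: str, delimiter: str = ",") -> str:
    """Swap rows and columns of a small delimited table held in memory."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    transposed = zip(*rows)  # column i of the input becomes row i of the output
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Hypothetical input: one row per gene, one column per cell barcode.
genes_by_cells = "gene,cell1,cell2\nGeneA,1,0\nGeneB,3,5\n"
print(transpose_delimited(genes_by_cells))
```

For a TSV file, the same helper would be called with delimiter="\t".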

Version 0.6.1 May 18, 2022

Make the generated h5 format count matrix readable by the read10xCounts function in the R DropletUtils package. [PR #93]

Version 0.6.0 May 14, 2022

write_output function supports 10x hdf5 format. [PR #92]

Version 0.5.1 February 10, 2022

Make PegasusIO work with Zarr v2.11.0.
Bug fix in quality control. [PR #89 by hoondy]

Version 0.5.0 January 24, 2022

Add support for 10x Visium spatial data:
- Read the data folder with the read_input function using the file_type="visium" option.
- Write 10x Visium data to Zarr format with the write_output function by giving an output file name with the .zarr.zip extension.

Version 0.4.1 November 4, 2021

Fix issues on UnimodalData object construction.

Version 0.4.0 October 19, 2021

Add obsp and varp fields to store graph representation in terms of square matrices.
Allow copying from a View of AnnData.
In MultimodalData, add register_attr function to register an attribute of a specified type in obs or obsm fields. This can be useful for adding information like gene signatures, etc.

Version 0.3.1 July 16, 2021

For the aggregate_matrices function, allow sample-specific filtration on the minimum number of UMIs (nUMI column in the sample sheet) and the minimum number of genes (nGene column in the sample sheet). These values override the corresponding function parameters for those samples.
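
The precedence rule described here can be sketched in plain Python. This is an illustration only, not PegasusIO's implementation: the helper resolve_thresholds and its default_min_umis / default_min_genes parameters are hypothetical names, and only the nUMI and nGene column names come from the note above.

```python
from typing import Optional, Tuple


def resolve_thresholds(
    sample_row: dict,
    default_min_umis: Optional[int] = None,
    default_min_genes: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min_umis, min_genes) for one sample.

    A non-empty nUMI or nGene entry in the sample sheet row takes precedence
    over the function-level default. Hypothetical helper, not PegasusIO code.
    """
    def pick(column: str, default: Optional[int]) -> Optional[int]:
        value = sample_row.get(column)
        if value is None or str(value).strip() == "":
            return default
        return int(value)

    return pick("nUMI", default_min_umis), pick("nGene", default_min_genes)


# Hypothetical sample sheet rows; only the first sets its own thresholds.
rows = [
    {"Sample": "s1", "nUMI": "500", "nGene": "200"},
    {"Sample": "s2"},
]
for row in rows:
    print(row["Sample"], resolve_thresholds(row, default_min_umis=100, default_min_genes=50))
```

In this sketch, sample s1 uses its own thresholds while s2 falls back to the function-level values.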

Version 0.3.0 July 6, 2021

Add support for composite list (e.g. [0, pd.DataFrame, np.ndarray]) in data.uns field for Zarr read/write.

Version 0.2.14 June 28, 2021

Add parameter uns_white_list in filter_data function to keep QC statistics if needed.

Version 0.2.13 June 24, 2021

The aggregate_matrices function now accepts a sample sheet in Python dictionary format, in addition to a CSV file path string. See its description in the API panel for details.
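
As a rough illustration of how a CSV sample sheet maps to a dictionary, the sketch below converts CSV text into a column-oriented dictionary with the standard library. The column-oriented layout, the Sample and Location column names, and the file paths are assumptions made for this example; the API description remains the authority on the exact format that aggregate_matrices expects.

```python
import csv
import io


def sample_sheet_to_dict(csv_text: str) -> dict:
    """Convert CSV sample sheet text into a {column: [values...]} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Hypothetical sample sheet; the column names and paths are assumptions.
csv_text = "Sample,Location\ns1,/path/to/s1.h5\ns2,/path/to/s2.h5\n"
sheet = sample_sheet_to_dict(csv_text)
print(sheet)
```

Building the sample sheet in memory this way avoids writing a temporary CSV file when the list of samples is generated by a script.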

Version 0.2.12 May 28, 2021

Bug fix.

Version 0.2.11 May 17, 2021

Bug fix.

Version 0.2.10 February 2, 2021

Feature enhancement.

Version 0.2.9 December 25, 2020

Fix a bug in caching the percent mito rate.
Improve write_mtx function.

Version 0.2.8 December 7, 2020

Add support for loading loom files with Seurat-style cell barcode and feature key names.
Bug fix: resolve a dimension inconsistency between the count matrix and the feature metadata during data aggregation, which occurred when the last feature had zero counts across all cell barcodes. Thanks to Mikhail Alperovich for reporting this issue.
Other bug fixes and performance improvements.
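
The note above does not describe the root cause of the dimension inconsistency. One common way this class of bug arises in sparse-matrix code is inferring the number of features from the largest index that has a nonzero entry, which silently drops trailing all-zero features. The sketch below illustrates that general pitfall with hypothetical data; it is not taken from PegasusIO's code.

```python
def infer_n_features(feature_indices):
    """Infer the feature count from the largest index that has a nonzero entry."""
    return max(feature_indices) + 1 if feature_indices else 0


# Hypothetical data: three features, but the last one has no nonzero counts.
feature_names = ["GeneA", "GeneB", "GeneC"]
nonzero_feature_indices = [0, 1, 1, 0]  # feature index of each stored nonzero entry

inferred = infer_n_features(nonzero_feature_indices)  # misses the all-zero GeneC
explicit = len(feature_names)                         # taken from the metadata

print(inferred, explicit)
```

Passing the shape explicitly, based on the feature metadata, keeps the matrix and the metadata consistent regardless of which entries happen to be zero.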

Version 0.2.7 October 13, 2020

Add support for Nanostring GeoMx data format.
Fix bugs.

Version 0.2.6 September 28, 2020

Fix a bug in SCP-compatible output generation.

Version 0.2.5 August 19, 2020

Adjustment for Pegasus command usage.

Version 0.2.2 June 16, 2020

Fix bugs in data aggregation.

Version 0.2.1 June 8, 2020

Fix bug in processing single h5 file.

Version 0.2.0 June 7, 2020

Initial release.