PegasusIO Tutorial

In [1]:
import pegasusio as io
import pandas as pd

Case 1: Read h5ad file

We use the pbmc3k h5ad file from https://cellxgene-example-data.czi.technology/pbmc3k.h5ad as a demo. First, read it using PegasusIO:

In [2]:
data1 = io.read_input("pegasusio_test_cases/case1/pbmc3k.h5ad", genome = 'hg19')
data1
2020-06-05 09:35:14,667 - pegasusio.readwrite - INFO - h5ad file 'pegasusio_test_cases/case1/pbmc3k.h5ad' is loaded.
2020-06-05 09:35:14,668 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.28s.
Out[2]:
MultimodalData object with 1 UnimodalData: 'hg19-rna'
    It currently binds to UnimodalData object hg19-rna

UnimodalData object with n_obs x n_vars = 2638 x 1838
    Genome: hg19; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'n_genes', 'percent_mito', 'n_counts', 'louvain', 'leiden'
    var: 'featureid', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    obsm: 'X_pca', 'X_umap', 'X_tsne', 'X_draw_graph_fr'
    varm: 'PCs'
    uns: 'draw_graph', 'leiden', 'louvain', 'neighbors', 'pca', 'genome', 'modality'

The gene-count matrix has 2638 cell barcodes and 1838 genes, and PegasusIO stores it as the default UnimodalData element within a MultimodalData object.

We can generate SCP-compatible outputs from it. These files are needed to import the data into the Single Cell Portal (SCP):

In [3]:
io.write_output(data1, "pegasusio_test_cases/case1/pbmc3k", file_type = 'scp')
2020-06-05 09:35:14,721 - pegasusio.text_utils - INFO - Metadata file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.metadata.txt is written.
2020-06-05 09:35:14,736 - pegasusio.text_utils - INFO - Coordinate file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.X_pca.coords.txt is written.
2020-06-05 09:35:14,746 - pegasusio.text_utils - INFO - Coordinate file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.X_umap.coords.txt is written.
2020-06-05 09:35:14,758 - pegasusio.text_utils - INFO - Coordinate file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.X_tsne.coords.txt is written.
2020-06-05 09:35:14,774 - pegasusio.text_utils - INFO - Coordinate file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.X_draw_graph_fr.coords.txt is written.
2020-06-05 09:35:14,778 - pegasusio.text_utils - INFO - Barcode file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.barcodes.tsv is written.
2020-06-05 09:35:14,786 - pegasusio.text_utils - INFO - Feature file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.features.tsv is written.
2020-06-05 09:35:16,179 - pegasusio.text_utils - INFO - Matrix file /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k.scp.matrix.mtx is written.
2020-06-05 09:35:16,180 - pegasusio.text_utils - INFO - write_scp_file is done.
2020-06-05 09:35:16,180 - pegasusio.readwrite - INFO - scp file 'pegasusio_test_cases/case1/pbmc3k' is written.
2020-06-05 09:35:16,181 - pegasusio.readwrite - INFO - Function 'write_output' finished in 1.50s.

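We can also write the data in mtx format. Here the output path is a directory; as the log shows, the matrix, barcode, and feature files are written to a subfolder named after the UnimodalData element (hg19-rna):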

In [4]:
io.write_output(data1, "pegasusio_test_cases/case1/pbmc3k_mtx")
2020-06-05 09:35:19,647 - pegasusio.text_utils - INFO - /Users/yy939/GitHub/pegasusio/notebooks/pegasusio_test_cases/case1/pbmc3k_mtx/hg19-rna/matrix.mtx.gz is written.
2020-06-05 09:35:19,709 - pegasusio.text_utils - INFO - barcodes.tsv.gz is written.
2020-06-05 09:35:19,743 - pegasusio.text_utils - INFO - features.tsv.gz is written.
2020-06-05 09:35:19,744 - pegasusio.text_utils - INFO - Mtx for hg19-rna is written.
2020-06-05 09:35:19,744 - pegasusio.text_utils - INFO - Mtx files are written.
2020-06-05 09:35:19,745 - pegasusio.readwrite - INFO - mtx file 'pegasusio_test_cases/case1/pbmc3k_mtx' is written.
2020-06-05 09:35:19,745 - pegasusio.readwrite - INFO - Function 'write_output' finished in 3.56s.
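
For orientation, a matrix.mtx.gz file follows the Matrix Market coordinate format: a header line, a size line (rows, columns, non-zero entries), then one 1-based "row column value" triple per non-zero entry. The sketch below writes and re-reads a tiny matrix with the standard library only. The helper names and the toy counts are illustrative assumptions; this is not PegasusIO's own writer.

```python
import gzip
import os
import tempfile

def write_mtx_gz(path, n_rows, n_cols, entries):
    """Write 0-based (row, col, value) triples as a gzipped Matrix Market file."""
    with gzip.open(path, "wt") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{n_rows} {n_cols} {len(entries)}\n")
        for r, c, v in entries:
            fh.write(f"{r + 1} {c + 1} {v}\n")  # Matrix Market indices are 1-based

def read_mtx_gz(path):
    """Read the file back as (n_rows, n_cols, {(row, col): value}) with 0-based keys."""
    with gzip.open(path, "rt") as fh:
        lines = [ln for ln in fh if not ln.startswith("%")]  # skip header/comments
    n_rows, n_cols, nnz = (int(x) for x in lines[0].split())
    values = {}
    for ln in lines[1:]:
        r, c, v = ln.split()
        values[(int(r) - 1, int(c) - 1)] = int(v)
    if len(values) != nnz:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, values

# Hypothetical toy matrix: 3 rows, 4 columns, 3 non-zero counts.
toy_entries = [(0, 0, 3), (1, 2, 1), (2, 1, 7)]
with tempfile.TemporaryDirectory() as tmp:
    mtx_path = os.path.join(tmp, "matrix.mtx.gz")
    write_mtx_gz(mtx_path, 3, 4, toy_entries)
    shape_and_values = read_mtx_gz(mtx_path)
```

The accompanying barcodes.tsv.gz and features.tsv.gz files supply the labels for the two matrix axes.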

To generate output in loom format:

In [5]:
io.write_output(data1, "pegasusio_test_cases/case1/pbmc3k.loom")
2020-06-05 09:35:21,730 - pegasusio.hdf5_utils - INFO - pegasusio_test_cases/case1/pbmc3k.loom is written.
2020-06-05 09:35:21,730 - pegasusio.readwrite - INFO - loom file 'pegasusio_test_cases/case1/pbmc3k.loom' is written.
2020-06-05 09:35:21,731 - pegasusio.readwrite - INFO - Function 'write_output' finished in 1.98s.

To generate output in zarr.zip format:

In [6]:
io.write_output(data1, "pegasusio_test_cases/case1/pbmc3k.zarr.zip")
2020-06-05 09:35:21,736 - pegasusio.zarr_utils - WARNING - Detected and removed pre-existing file pegasusio_test_cases/case1/pbmc3k.zarr.zip.
2020-06-05 09:35:21,834 - pegasusio.readwrite - INFO - zarr.zip file 'pegasusio_test_cases/case1/pbmc3k.zarr.zip' is written.
2020-06-05 09:35:21,835 - pegasusio.readwrite - INFO - Function 'write_output' finished in 0.10s.

Case 2: Process human and mouse mixture data with V3 chemistry

We use 10X data from http://cf.10xgenomics.com/samples/cell-exp/3.0.2/1k_hgmm_v3/1k_hgmm_v3_filtered_feature_bc_matrix.h5 for the demo.

In [7]:
data2 = io.read_input("pegasusio_test_cases/case2/1k_hgmm_v3_filtered_feature_bc_matrix.h5")
data2
2020-06-05 09:35:22,736 - pegasusio.readwrite - INFO - 10x file 'pegasusio_test_cases/case2/1k_hgmm_v3_filtered_feature_bc_matrix.h5' is loaded.
2020-06-05 09:35:22,737 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.90s.
Out[7]:
MultimodalData object with 2 UnimodalData: 'hg19-rna', 'mm10-rna'
    It currently binds to UnimodalData object hg19-rna

UnimodalData object with n_obs x n_vars = 1063 x 57905
    Genome: hg19; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 
    var: 'featureid'
    obsm: 
    varm: 
    uns: 'genome', 'modality'

The MultimodalData object data2 contains two UnimodalData elements: hg19-rna, which holds the human data, and mm10-rna, which holds the mouse data. The default UnimodalData it currently binds to is the human data.

To reset the default UnimodalData to mouse data, use the following method:

In [8]:
data2.select_data('mm10-rna')
data2
Out[8]:
MultimodalData object with 2 UnimodalData: 'hg19-rna', 'mm10-rna'
    It currently binds to UnimodalData object mm10-rna

UnimodalData object with n_obs x n_vars = 1063 x 54232
    Genome: mm10; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 
    var: 'featureid'
    obsm: 
    varm: 
    uns: 'genome', 'modality'

Now write_output will generate output for the mouse data matrix:

In [9]:
io.write_output(data2, "pegasusio_test_cases/case2/mouse.h5ad")
2020-06-05 09:35:23,727 - pegasusio.readwrite - INFO - h5ad file 'pegasusio_test_cases/case2/mouse.h5ad' is written.
2020-06-05 09:35:23,727 - pegasusio.readwrite - INFO - Function 'write_output' finished in 0.98s.
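
The binding behaviour used in this case can be pictured with a toy container that holds several keyed elements and remembers which one is currently selected. ToyMultimodal below is a hypothetical stand-in written for illustration only; it is not PegasusIO's MultimodalData class, and it stores just the matrix shapes reported above.

```python
class ToyMultimodal:
    """Toy keyed container with one 'currently selected' element."""

    def __init__(self, elements):
        if not elements:
            raise ValueError("need at least one element")
        self._elements = dict(elements)               # key -> (n_obs, n_vars)
        self._selected = next(iter(self._elements))   # first key is the default

    def select_data(self, key):
        """Switch the default element; unknown keys are rejected."""
        if key not in self._elements:
            raise KeyError(key)
        self._selected = key

    @property
    def shape(self):
        """Shape of whichever element is currently selected."""
        return self._elements[self._selected]

toy = ToyMultimodal({"hg19-rna": (1063, 57905), "mm10-rna": (1063, 54232)})
shape_before = toy.shape          # human element is the default
toy.select_data("mm10-rna")
shape_after = toy.shape           # now the mouse element
```

Any operation that acts on "the current data", such as writing output, then follows the selection.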

Case 3: Read different file formats

PegasusIO can read data matrices in different formats. In this case, we demonstrate the csv, mtx, and loom formats. The data we use is from https://data.humancellatlas.org/explore/projects/cddab57b-6868-4be4-806f-395ed9dd635a/m/expression-matrices.

In [10]:
data3_csv = io.read_input("pegasusio_test_cases/case3/19dc248b-2e9d-4c52-8065-e681a61d1514.csv/expression.csv", genome = 'hg19')
data3_csv
2020-06-05 09:35:31,824 - pegasusio.readwrite - INFO - csv file 'pegasusio_test_cases/case3/19dc248b-2e9d-4c52-8065-e681a61d1514.csv/expression.csv' is loaded.
2020-06-05 09:35:31,825 - pegasusio.readwrite - INFO - Function 'read_input' finished in 8.09s.
Out[10]:
MultimodalData object with 1 UnimodalData: 'hg19-rna'
    It currently binds to UnimodalData object hg19-rna

UnimodalData object with n_obs x n_vars = 2544 x 58347
    Genome: hg19; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'genes_detected', 'file_uuid', 'file_version', 'total_umis', 'emptydrops_is_cell', 'barcode', 'cell_suspension.provenance.document_id', 'specimen_from_organism.provenance.document_id', 'derived_organ_ontology', 'derived_organ_label', 'derived_organ_parts_ontology', 'derived_organ_parts_label', 'cell_suspension.genus_species.ontology', 'cell_suspension.genus_species.ontology_label', 'donor_organism.provenance.document_id', 'donor_organism.human_specific.ethnicity.ontology', 'donor_organism.human_specific.ethnicity.ontology_label', 'donor_organism.diseases.ontology', 'donor_organism.diseases.ontology_label', 'donor_organism.development_stage.ontology', 'donor_organism.development_stage.ontology_label', 'donor_organism.sex', 'donor_organism.is_living', 'specimen_from_organism.organ.ontology', 'specimen_from_organism.organ.ontology_label', 'specimen_from_organism.organ_parts.ontology', 'specimen_from_organism.organ_parts.ontology_label', 'library_preparation_protocol.provenance.document_id', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology_label', 'library_preparation_protocol.library_construction_method.ontology', 'library_preparation_protocol.library_construction_method.ontology_label', 'library_preparation_protocol.end_bias', 'library_preparation_protocol.strand', 'project.provenance.document_id', 'project.project_core.project_short_name', 'project.project_core.project_title', 'analysis_protocol.provenance.document_id', 'dss_bundle_fqid', 'bundle_uuid', 'bundle_version', 'analysis_protocol.protocol_core.protocol_id', 'analysis_working_group_approval_status'
    var: 'featureid', 'featuretype', 'chromosome', 'featurestart', 'featureend', 'isgene', 'genus_species'
    obsm: 
    varm: 
    uns: 'genome', 'modality'
In [11]:
data3_mtx = io.read_input("pegasusio_test_cases/case3/42468c97-1c5a-4c9f-86ea-9eaa1239445a.mtx", genome = 'hg19')
data3_mtx
2020-06-05 09:35:32,031 - pegasusio.text_utils - INFO - Detected mtx file in HCA DCP format.
2020-06-05 09:35:36,287 - pegasusio.readwrite - INFO - mtx file 'pegasusio_test_cases/case3/42468c97-1c5a-4c9f-86ea-9eaa1239445a.mtx' is loaded.
2020-06-05 09:35:36,288 - pegasusio.readwrite - INFO - Function 'read_input' finished in 4.46s.
Out[11]:
MultimodalData object with 1 UnimodalData: 'hg19-rna'
    It currently binds to UnimodalData object hg19-rna

UnimodalData object with n_obs x n_vars = 2544 x 58347
    Genome: hg19; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'genes_detected', 'file_uuid', 'file_version', 'total_umis', 'emptydrops_is_cell', 'barcode', 'cell_suspension.provenance.document_id', 'specimen_from_organism.provenance.document_id', 'derived_organ_ontology', 'derived_organ_label', 'derived_organ_parts_ontology', 'derived_organ_parts_label', 'cell_suspension.genus_species.ontology', 'cell_suspension.genus_species.ontology_label', 'donor_organism.provenance.document_id', 'donor_organism.human_specific.ethnicity.ontology', 'donor_organism.human_specific.ethnicity.ontology_label', 'donor_organism.diseases.ontology', 'donor_organism.diseases.ontology_label', 'donor_organism.development_stage.ontology', 'donor_organism.development_stage.ontology_label', 'donor_organism.sex', 'donor_organism.is_living', 'specimen_from_organism.organ.ontology', 'specimen_from_organism.organ.ontology_label', 'specimen_from_organism.organ_parts.ontology', 'specimen_from_organism.organ_parts.ontology_label', 'library_preparation_protocol.provenance.document_id', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology_label', 'library_preparation_protocol.library_construction_method.ontology', 'library_preparation_protocol.library_construction_method.ontology_label', 'library_preparation_protocol.end_bias', 'library_preparation_protocol.strand', 'project.provenance.document_id', 'project.project_core.project_short_name', 'project.project_core.project_title', 'analysis_protocol.provenance.document_id', 'dss_bundle_fqid', 'bundle_uuid', 'bundle_version', 'analysis_protocol.protocol_core.protocol_id', 'analysis_working_group_approval_status'
    var: 'featureid', 'featuretype', 'chromosome', 'featurestart', 'featureend', 'isgene', 'genus_species'
    obsm: 
    varm: 
    uns: 'genome', 'modality'
In [12]:
data3_loom = io.read_input("pegasusio_test_cases/case3/pancreas.loom", genome = 'hg19')
data3_loom
2020-06-05 09:35:41,315 - pegasusio.readwrite - INFO - loom file 'pegasusio_test_cases/case3/pancreas.loom' is loaded.
2020-06-05 09:35:41,315 - pegasusio.readwrite - INFO - Function 'read_input' finished in 5.02s.
Out[12]:
MultimodalData object with 1 UnimodalData: 'hg19-rna'
    It currently binds to UnimodalData object hg19-rna

UnimodalData object with n_obs x n_vars = 2544 x 58347
    Genome: hg19; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'analysis_protocol.protocol_core.protocol_id', 'analysis_protocol.provenance.document_id', 'analysis_working_group_approval_status', 'barcode', 'bundle_uuid', 'bundle_version', 'cell_suspension.genus_species.ontology', 'cell_suspension.genus_species.ontology_label', 'cell_suspension.provenance.document_id', 'derived_organ_label', 'derived_organ_ontology', 'derived_organ_parts_label', 'derived_organ_parts_ontology', 'donor_organism.development_stage.ontology', 'donor_organism.development_stage.ontology_label', 'donor_organism.diseases.ontology', 'donor_organism.diseases.ontology_label', 'donor_organism.human_specific.ethnicity.ontology', 'donor_organism.human_specific.ethnicity.ontology_label', 'donor_organism.is_living', 'donor_organism.provenance.document_id', 'donor_organism.sex', 'dss_bundle_fqid', 'emptydrops_is_cell', 'file_uuid', 'file_version', 'genes_detected', 'library_preparation_protocol.end_bias', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology', 'library_preparation_protocol.input_nucleic_acid_molecule.ontology_label', 'library_preparation_protocol.library_construction_method.ontology', 'library_preparation_protocol.library_construction_method.ontology_label', 'library_preparation_protocol.provenance.document_id', 'library_preparation_protocol.strand', 'project.project_core.project_short_name', 'project.project_core.project_title', 'project.provenance.document_id', 'specimen_from_organism.organ.ontology', 'specimen_from_organism.organ.ontology_label', 'specimen_from_organism.organ_parts.ontology', 'specimen_from_organism.organ_parts.ontology_label', 'specimen_from_organism.provenance.document_id', 'total_umis'
    var: 'featureid', 'chromosome', 'featureend', 'featurestart', 'featuretype', 'genus_species', 'isgene'
    obsm: 
    varm: 
    uns: 'CreationDate', 'LOOM_SPEC_VERSION', 'last_modified', 'genome', 'modality'

Each of data3_csv, data3_mtx, and data3_loom is a MultimodalData object holding a single UnimodalData element, hg19-rna, with the same 2544 x 58347 matrix shape regardless of the input format.

We can then write the object into AnnData h5ad format:

In [13]:
io.write_output(data3_csv, "pegasusio_test_cases/case3/pancreas.h5ad")
... storing 'genes_detected' as categorical
... storing 'total_umis' as categorical
... storing 'emptydrops_is_cell' as categorical
... storing 'barcode' as categorical
... storing 'specimen_from_organism.provenance.document_id' as categorical
... storing 'derived_organ_ontology' as categorical
... storing 'derived_organ_label' as categorical
... storing 'derived_organ_parts_ontology' as categorical
... storing 'derived_organ_parts_label' as categorical
... storing 'cell_suspension.genus_species.ontology' as categorical
... storing 'cell_suspension.genus_species.ontology_label' as categorical
... storing 'donor_organism.provenance.document_id' as categorical
... storing 'donor_organism.human_specific.ethnicity.ontology' as categorical
... storing 'donor_organism.human_specific.ethnicity.ontology_label' as categorical
... storing 'donor_organism.diseases.ontology' as categorical
... storing 'donor_organism.diseases.ontology_label' as categorical
... storing 'donor_organism.development_stage.ontology' as categorical
... storing 'donor_organism.development_stage.ontology_label' as categorical
... storing 'donor_organism.sex' as categorical
... storing 'donor_organism.is_living' as categorical
... storing 'specimen_from_organism.organ.ontology' as categorical
... storing 'specimen_from_organism.organ.ontology_label' as categorical
... storing 'specimen_from_organism.organ_parts.ontology' as categorical
... storing 'specimen_from_organism.organ_parts.ontology_label' as categorical
... storing 'library_preparation_protocol.provenance.document_id' as categorical
... storing 'library_preparation_protocol.input_nucleic_acid_molecule.ontology' as categorical
... storing 'library_preparation_protocol.input_nucleic_acid_molecule.ontology_label' as categorical
... storing 'library_preparation_protocol.library_construction_method.ontology' as categorical
... storing 'library_preparation_protocol.library_construction_method.ontology_label' as categorical
... storing 'library_preparation_protocol.end_bias' as categorical
... storing 'library_preparation_protocol.strand' as categorical
... storing 'project.provenance.document_id' as categorical
... storing 'project.project_core.project_short_name' as categorical
... storing 'project.project_core.project_title' as categorical
... storing 'bundle_version' as categorical
... storing 'analysis_protocol.protocol_core.protocol_id' as categorical
... storing 'analysis_working_group_approval_status' as categorical
... storing 'featuretype' as categorical
... storing 'chromosome' as categorical
... storing 'featurestart' as categorical
... storing 'featureend' as categorical
... storing 'isgene' as categorical
... storing 'genus_species' as categorical
2020-06-05 09:35:45,457 - pegasusio.readwrite - INFO - h5ad file 'pegasusio_test_cases/case3/pancreas.h5ad' is written.
2020-06-05 09:35:45,457 - pegasusio.readwrite - INFO - Function 'write_output' finished in 4.14s.

Case 4: Process multiple zarr files with Scrublet scores

In this case, we use two channels from human bone marrow data at https://data.humancellatlas.org/explore/projects/cc95ff89-2e68-4a08-a234-480eca21ce79: donor 1 channel 1, and donor 8 channel 8. Both channels have been processed with Scrublet to estimate doublet scores, and are stored in zarr format.

First, load the two files into memory:

In [14]:
data4_1 = io.read_input("pegasusio_test_cases/case4/MantonBM1_1_dbls.zarr")
data4_1
2020-06-05 09:35:45,507 - pegasusio.readwrite - INFO - zarr file 'pegasusio_test_cases/case4/MantonBM1_1_dbls.zarr' is loaded.
2020-06-05 09:35:45,508 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.05s.
Out[14]:
MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 4274 x 19360
    Genome: GRCh38; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'passed_qc', 'n_genes', 'n_counts', 'percent_mito', 'scrublet_scores'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features'
    obsm: 
    varm: 
    uns: 'genome', 'modality', 'scrublet_stats'
In [15]:
data4_2 = io.read_input("pegasusio_test_cases/case4/MantonBM8_8_dbls.zarr")
data4_2
2020-06-05 09:35:45,562 - pegasusio.readwrite - INFO - zarr file 'pegasusio_test_cases/case4/MantonBM8_8_dbls.zarr' is loaded.
2020-06-05 09:35:45,563 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.05s.
Out[15]:
MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 4162 x 18178
    Genome: GRCh38; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'passed_qc', 'n_genes', 'n_counts', 'percent_mito', 'scrublet_scores'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features'
    obsm: 
    varm: 
    uns: 'genome', 'modality', 'scrublet_stats'

Both channels have over 4000 cell barcodes. Below are the Scrublet scores of the first channel:

In [16]:
data4_1.obs['scrublet_scores']
Out[16]:
barcodekey
AAACCTGAGCAGGTCA    0.022873
AAACCTGCACACTGCG    0.007703
AAACCTGCACCGGAAA    0.023813
AAACCTGCATAGACTC    0.054320
AAACCTGCATCGATGT    0.044107
                      ...   
TTTGTCAGTCCGCTGA    0.060842
TTTGTCATCAGTCAGT    0.010380
TTTGTCATCATGTAGC    0.005234
TTTGTCATCCGCTGTT    0.010812
TTTGTCATCCTCTAGC    0.012691
Name: scrublet_scores, Length: 4274, dtype: float64

Its Scrublet statistics can be retrieved as follows:

In [17]:
data4_1.uns['scrublet_stats']
Out[17]:
{'detectable_doublet_fraction': 0.35423490875058494,
 'detected_doublet_rate': 0.013102480112306972,
 'overall_doublet_rate': 0.03698811096433289,
 'threshold': 0.24325254050006934}
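
As a rough illustration of how these statistics relate to the scores, the sketch below assumes Scrublet's usual convention that a barcode is called a doublet when its score exceeds threshold, and that detected_doublet_rate is the fraction of barcodes above it. It uses only the ten scores displayed in Out[16]; the variable names are illustrative.

```python
# Values copied from Out[17].
scrublet_stats = {
    "detected_doublet_rate": 0.013102480112306972,
    "threshold": 0.24325254050006934,
}

# The ten scores displayed in Out[16]; the remaining ones are elided there.
shown_scores = {
    "AAACCTGAGCAGGTCA": 0.022873,
    "AAACCTGCACACTGCG": 0.007703,
    "AAACCTGCACCGGAAA": 0.023813,
    "AAACCTGCATAGACTC": 0.054320,
    "AAACCTGCATCGATGT": 0.044107,
    "TTTGTCAGTCCGCTGA": 0.060842,
    "TTTGTCATCAGTCAGT": 0.010380,
    "TTTGTCATCATGTAGC": 0.005234,
    "TTTGTCATCCGCTGTT": 0.010812,
    "TTTGTCATCCTCTAGC": 0.012691,
}
n_cells = 4274  # n_obs of the first channel

# Assumed rule: score above threshold means predicted doublet.
flagged = [bc for bc, score in shown_scores.items()
           if score > scrublet_stats["threshold"]]

# Under the assumed meaning of detected_doublet_rate, this is the number of
# barcodes in the whole channel that exceed the threshold.
n_detected = round(scrublet_stats["detected_doublet_rate"] * n_cells)
```

None of the ten displayed barcodes exceeds the threshold, and the detected rate corresponds to 56 of the 4274 barcodes.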

We can also aggregate two channels into one data matrix using PegasusIO's aggregate_matrices function. To do that, we need to first prepare a sample sheet in csv format:

In [18]:
sheet4 = pd.read_csv("pegasusio_test_cases/case4/count_matrix.csv")
sheet4
Out[18]:
    Sample   Location
0   sample1  pegasusio_test_cases/case4/MantonBM1_1_dbls.zarr
1   sample2  pegasusio_test_cases/case4/MantonBM8_8_dbls.zarr

The sample sheet should have at least two columns: Sample, specifying sample name; Location, specifying location of the sample's gene-count matrix file.
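
A sample sheet with these two columns can be produced and sanity-checked with the standard library, as in the minimal sketch below. The helper names build_sample_sheet and check_sample_sheet are hypothetical and not part of PegasusIO; the two rows reuse the paths shown above.

```python
import csv
from io import StringIO

REQUIRED_COLUMNS = ("Sample", "Location")

def build_sample_sheet(rows):
    """Return CSV text with the required Sample and Location columns."""
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def check_sample_sheet(text):
    """Parse CSV text; raise ValueError if a required column is missing or empty."""
    reader = csv.DictReader(StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for i, row in enumerate(rows):
        if not row["Sample"] or not row["Location"]:
            raise ValueError(f"row {i} has an empty Sample or Location")
    return rows

sheet_text = build_sample_sheet([
    {"Sample": "sample1",
     "Location": "pegasusio_test_cases/case4/MantonBM1_1_dbls.zarr"},
    {"Sample": "sample2",
     "Location": "pegasusio_test_cases/case4/MantonBM8_8_dbls.zarr"},
])
sheet_rows = check_sample_sheet(sheet_text)
```

Writing sheet_text to a .csv file gives a sheet of the same shape as the one loaded in In [18].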

Then use this sample sheet for data aggregation:

In [19]:
data4 = io.aggregate_matrices("pegasusio_test_cases/case4/count_matrix.csv")
data4
2020-06-05 09:35:45,673 - pegasusio.readwrite - INFO - zarr file 'pegasusio_test_cases/case4/MantonBM1_1_dbls.zarr' is loaded.
2020-06-05 09:35:45,674 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.05s.
2020-06-05 09:35:45,725 - pegasusio.readwrite - INFO - zarr file 'pegasusio_test_cases/case4/MantonBM8_8_dbls.zarr' is loaded.
2020-06-05 09:35:45,725 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.05s.
2020-06-05 09:35:46,296 - pegasusio.aggr_data - INFO - Function 'aggregate' finished in 0.56s.
2020-06-05 09:35:46,297 - pegasusio.data_aggregation - INFO - Aggregated 2 files.
2020-06-05 09:35:46,298 - pegasusio.data_aggregation - INFO - Function 'aggregate_matrices' finished in 0.68s.
Out[19]:
MultimodalData object with 1 UnimodalData: 'GRCh38-rna'
    It currently binds to UnimodalData object GRCh38-rna

UnimodalData object with n_obs x n_vars = 8436 x 20381
    Genome: GRCh38; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'passed_qc', 'n_genes', 'n_counts', 'percent_mito', 'scrublet_scores', 'Channel'
    var: 'featureid'
    obsm: 
    varm: 
    uns: 'genome', 'modality', 'var_dict', 'uns_dict'

data4 contains 8436 cells in total, the sum of the two channels (4274 + 4162).
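
The shapes reported in this case are consistent with stacking cells over the union of genes: the cell counts add up (4274 + 4162 = 8436), while the 20381 genes are fewer than 19360 + 18178. The toy sketch below illustrates that idea with plain dictionaries. It is not PegasusIO's implementation, and filling genes missing from a channel with zeros is an assumption made for the illustration.

```python
def aggregate_channels(channels):
    """Stack cells from several channels over the union of their genes.

    `channels` maps channel name -> {barcode: {gene: count}}.
    Returns (genes, rows); each row is (channel, barcode, counts aligned to genes).
    """
    genes = sorted({g for cells in channels.values()
                    for counts in cells.values() for g in counts})
    rows = []
    for name, cells in channels.items():
        for barcode, counts in cells.items():
            # Assumption: a gene absent from this channel gets a zero count.
            rows.append((name, barcode, [counts.get(g, 0) for g in genes]))
    return genes, rows

# Hypothetical toy channels with partially overlapping genes.
toy_channels = {
    "sample1": {"AAAC": {"GeneA": 2, "GeneB": 1}, "AAAG": {"GeneB": 4}},
    "sample2": {"TTTG": {"GeneB": 3, "GeneC": 5}},
}
agg_genes, agg_rows = aggregate_channels(toy_channels)
```

The first element of each row plays the role of the Channel column that appears in obs after aggregation.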

Case 5: Data aggregation with filtering

In this case, we demonstrate aggregating data matrices with quality-control filtering settings. We use mouse lung cells from the Mouse Cell Atlas paper, specifically samples "Lung 1", "Lung 2", and "Lung 3" in DGE format.

As in Case 4, first prepare a sample sheet:

In [20]:
sheet5 = pd.read_csv("pegasusio_test_cases/case5/count_matrix.csv")
sheet5
Out[20]:
    Sample  Location
0   lung1   pegasusio_test_cases/case5/Lung1_rm.batch_dge....
1   lung2   pegasusio_test_cases/case5/Lung2_rm.batch_dge....
2   lung3   pegasusio_test_cases/case5/Lung3_rm.batch_dge....

In detail, the Location column lists the following three files:

In [21]:
for _, row in sheet5.iterrows():
    print(row['Location'])
pegasusio_test_cases/case5/Lung1_rm.batch_dge.txt.gz
pegasusio_test_cases/case5/Lung2_rm.batch_dge.txt.gz
pegasusio_test_cases/case5/Lung3_rm.batch_dge.txt.gz

Now we can aggregate the three samples with quality-control filtering settings:

In [22]:
data5 = io.aggregate_matrices("pegasusio_test_cases/case5/count_matrix.csv", 
                              default_ref = 'mm10', 
                              append_sample_name = False,
                              min_genes = 500,
                              max_genes = 6000,
                              mito_prefix = 'mt-',
                              percent_mito = 20)
data5
2020-06-05 09:35:47,452 - pegasusio.readwrite - INFO - tsv file 'pegasusio_test_cases/case5/Lung1_rm.batch_dge.txt.gz' is loaded.
2020-06-05 09:35:47,453 - pegasusio.readwrite - INFO - Function 'read_input' finished in 1.12s.
2020-06-05 09:35:48,050 - pegasusio.readwrite - INFO - tsv file 'pegasusio_test_cases/case5/Lung2_rm.batch_dge.txt.gz' is loaded.
2020-06-05 09:35:48,051 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.60s.
2020-06-05 09:35:48,146 - pegasusio.qc_utils - INFO - After filtration, 1589 out of 2835 cell barcodes are kept in UnimodalData object mm10-rna.
2020-06-05 09:35:49,487 - pegasusio.readwrite - INFO - tsv file 'pegasusio_test_cases/case5/Lung3_rm.batch_dge.txt.gz' is loaded.
2020-06-05 09:35:49,488 - pegasusio.readwrite - INFO - Function 'read_input' finished in 1.34s.
2020-06-05 09:35:49,575 - pegasusio.qc_utils - INFO - After filtration, 860 out of 1796 cell barcodes are kept in UnimodalData object mm10-rna.
2020-06-05 09:35:49,662 - pegasusio.qc_utils - INFO - After filtration, 1308 out of 4485 cell barcodes are kept in UnimodalData object mm10-rna.
2020-06-05 09:35:49,954 - pegasusio.aggr_data - INFO - Function 'aggregate' finished in 0.29s.
2020-06-05 09:35:49,955 - pegasusio.data_aggregation - INFO - Aggregated 3 files.
2020-06-05 09:35:49,960 - pegasusio.data_aggregation - INFO - Function 'aggregate_matrices' finished in 3.64s.
Out[22]:
MultimodalData object with 1 UnimodalData: 'mm10-rna'
    It currently binds to UnimodalData object mm10-rna

UnimodalData object with n_obs x n_vars = 3757 x 23450
    Genome: mm10; Modality: rna
    It contains 1 matrices: 'X'
    It currently binds to matrix 'X' as X

    obs: 'n_genes', 'n_counts', 'percent_mito', 'Channel'
    var: 
    obsm: 
    varm: 
    uns: 'genome', 'modality'

We keep cells with:

  • 500 $\leq$ Number of expressed genes $<$ 6000, and
  • Percent of mitochondrial genes $\leq$ 20%

We also need to specify the name prefix of mitochondrial genes (mito_prefix = 'mt-' here) so that the second criterion can be computed.

We also set append_sample_name = False so that sample names are not prepended to cell barcodes during aggregation; in this case, all barcodes are already distinct.
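
The two criteria listed above can be written as a per-cell predicate. The sketch below is a toy version that works on one cell's gene-to-count dictionary. It assumes that "percent of mitochondrial genes" means the share of a cell's counts coming from genes whose names start with the prefix; the function name passes_qc and the toy cells are hypothetical, and the code is not PegasusIO's own.

```python
def passes_qc(counts, min_genes=500, max_genes=6000,
              mito_prefix="mt-", percent_mito=20.0):
    """Return True if one cell's {gene: count} dict meets both criteria."""
    n_genes = sum(1 for v in counts.values() if v > 0)   # expressed genes
    n_counts = sum(counts.values())
    mito = sum(v for g, v in counts.items() if g.startswith(mito_prefix))
    # Assumption: percent mito is the share of counts from prefixed genes.
    pct_mito = 100.0 * mito / n_counts if n_counts > 0 else 0.0
    return (min_genes <= n_genes < max_genes) and (pct_mito <= percent_mito)

# Toy cells, checked with small gene thresholds so the example stays readable.
cell_ok = {"Actb": 5, "Gapdh": 3, "mt-Co1": 1}         # 3 genes, about 11% mito
cell_high_mito = {"Actb": 1, "Gapdh": 1, "mt-Co1": 8}  # 3 genes, 80% mito
cell_few_genes = {"Actb": 5}                           # 1 gene

results = [passes_qc(c, min_genes=2, max_genes=4)
           for c in (cell_ok, cell_high_mito, cell_few_genes)]
```

With the settings used in In [22], the same predicate would be called with its default arguments.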

For details on these parameters, please see the PegasusIO documentation.

Case 6: Aggregate multi-modality data

In this case, we aggregate gene expression, CITE-Seq, TCR, and BCR data from the same sample. As before, first load the sample sheet:

In [23]:
sheet6 = pd.read_csv("pegasusio_test_cases/case6/count_matrix.csv")
sheet6
Out[23]:
    Sample  Location
0   health  pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_5ge...
1   health  pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_t_f...
2   health  pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_b_f...

Notice that the sample names are the same this time: all three files belong to one sample, health.

In [24]:
data6 = io.aggregate_matrices("pegasusio_test_cases/case6/count_matrix.csv")
data6
2020-06-05 09:35:50,970 - pegasusio.readwrite - INFO - 10x file 'pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_5gex_protein_filtered_feature_bc_matrix.h5' is loaded.
2020-06-05 09:35:50,970 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.98s.
2020-06-05 09:35:51,043 - pegasusio.readwrite - INFO - csv file 'pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_t_filtered_contig_annotations.csv' is loaded.
2020-06-05 09:35:51,044 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.07s.
2020-06-05 09:35:51,083 - pegasusio.readwrite - INFO - csv file 'pegasusio_test_cases/case6/vdj_v1_hs_pbmc2_b_filtered_contig_annotations.csv' is loaded.
2020-06-05 09:35:51,084 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.04s.
2020-06-05 09:35:51,137 - pegasusio.multimodal_data - INFO - After filtration, 8258 out of 8258 cell barcodes are kept in UnimodalData object GRCh38-citeseq.
2020-06-05 09:35:51,193 - pegasusio.multimodal_data - INFO - After filtration, 2987 out of 3009 cell barcodes are kept in UnimodalData object GRCh38-tcr.
2020-06-05 09:35:51,250 - pegasusio.multimodal_data - INFO - After filtration, 1185 out of 1202 cell barcodes are kept in UnimodalData object GRCh38-bcr.
2020-06-05 09:35:51,459 - pegasusio.aggr_data - INFO - Function 'aggregate' finished in 0.20s.
2020-06-05 09:35:51,460 - pegasusio.data_aggregation - INFO - Aggregated 3 files.
2020-06-05 09:35:51,460 - pegasusio.data_aggregation - INFO - Function 'aggregate_matrices' finished in 1.47s.
Out[24]:
MultimodalData object with 4 UnimodalData: 'GRCh38-citeseq', 'GRCh38-rna', 'GRCh38-tcr', 'GRCh38-bcr'
    It currently binds to CITESeqData object GRCh38-citeseq

CITESeqData object with n_obs x n_vars = 8258 x 17
    Genome: GRCh38; Modality: citeseq
    It contains 1 matrices: 'raw.count'
    It currently binds to matrix 'raw.count' as X

    obs: 'Channel'
    var: 'featureid', '_control_id'
    obsm: 
    varm: 
    uns: 'genome', 'modality', '_control_names', '_control_counts', '_obs_keys'

data6 has 4 UnimodalData elements: GRCh38-citeseq for CITE-Seq data, GRCh38-rna for RNA data, GRCh38-tcr for TCR data, and GRCh38-bcr for BCR data.
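
Because all three rows share the sample name health, their contents end up in one object, and the barcodes shown later in this case carry a "health-" prefix. The toy sketch below illustrates that bookkeeping: grouping sheet rows by sample and prefixing barcodes with the sample name. The helper names are hypothetical and the code is not PegasusIO's; the file paths are those reported in the log above.

```python
from collections import defaultdict

# (Sample, Location) pairs, with full paths taken from the log of In [24].
case6_rows = [
    ("health", "pegasusio_test_cases/case6/"
               "vdj_v1_hs_pbmc2_5gex_protein_filtered_feature_bc_matrix.h5"),
    ("health", "pegasusio_test_cases/case6/"
               "vdj_v1_hs_pbmc2_t_filtered_contig_annotations.csv"),
    ("health", "pegasusio_test_cases/case6/"
               "vdj_v1_hs_pbmc2_b_filtered_contig_annotations.csv"),
]

def group_by_sample(rows):
    """Collect the locations listed for each sample name."""
    groups = defaultdict(list)
    for sample, location in rows:
        groups[sample].append(location)
    return dict(groups)

def prefix_barcode(sample, barcode):
    """Build a barcode key of the form '<sample>-<barcode>'."""
    return f"{sample}-{barcode}"

case6_groups = group_by_sample(case6_rows)
example_key = prefix_barcode("health", "AAACCTGAGACCACGA")
```

A shared prefix keeps barcodes from the same sample aligned across modalities while still separating them from other samples.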

6.1. CITE-Seq

We first check the CITE-Seq data. Its antibody control list is constructed based on information from the BioLegend website, as shown below:

In [25]:
antibody_control_sheet = pd.read_csv("pegasusio_test_cases/case6/antibody_control.csv")
antibody_control_sheet
Out[25]:
    Antibody          Control
0   CD3_TotalSeqC     IgG1_control_TotalSeqC
1   CD19_TotalSeqC    IgG1_control_TotalSeqC
2   CD45RA_TotalSeqC  IgG2b_control_TotalSeqC
3   CD4_TotalSeqC     IgG1_control_TotalSeqC
4   CD8a_TotalSeqC    IgG1_control_TotalSeqC
5   CD14_TotalSeqC    IgG2a_control_TotalSeqC
6   CD16_TotalSeqC    IgG1_control_TotalSeqC
7   CD56_TotalSeqC    IgG1_control_TotalSeqC
8   CD25_TotalSeqC    IgG1_control_TotalSeqC
9   CD45RO_TotalSeqC  IgG2a_control_TotalSeqC
10  PD-1_TotalSeqC    IgG1_control_TotalSeqC
11  TIGIT_TotalSeqC   IgG2a_control_TotalSeqC
12  CD127_TotalSeqC   IgG1_control_TotalSeqC
13  CD15_TotalSeqC    IgG1_control_TotalSeqC

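Now load the control list into the CITE-Seq data and apply the arcsinh transform. Note that n_vars drops from 17 to 14 afterwards, since the three control antibodies are no longer counted as features: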

In [26]:
data6.select_data('GRCh38-citeseq')
data6.load_control_list("pegasusio_test_cases/case6/antibody_control.csv")
data6.arcsinh_transform()
data6
Out[26]:
MultimodalData object with 4 UnimodalData: 'GRCh38-citeseq', 'GRCh38-rna', 'GRCh38-tcr', 'GRCh38-bcr'
    It currently binds to CITESeqData object GRCh38-citeseq

CITESeqData object with n_obs x n_vars = 8258 x 14
    Genome: GRCh38; Modality: citeseq
    It contains 2 matrices: 'raw.count', 'arcsinh.transformed'
    It currently binds to matrix 'arcsinh.transformed' as X

    obs: 'Channel'
    var: 'featureid', '_control_id'
    obsm: 
    varm: 
    uns: 'genome', 'modality', '_control_names', '_control_counts', '_obs_keys'
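
For intuition about the arcsinh.transformed matrix, the sketch below applies a plain inverse hyperbolic sine with a cofactor to a few toy counts. The cofactor of 5 and the absence of any control subtraction are assumptions made for this illustration; consult the PegasusIO documentation for what arcsinh_transform actually does.

```python
import math

def arcsinh_counts(counts, cofactor=5.0):
    """Return asinh(count / cofactor) for each count (cofactor is an assumption)."""
    return [math.asinh(c / cofactor) for c in counts]

toy_counts = [0, 5, 50, 500]          # hypothetical antibody counts
transformed = arcsinh_counts(toy_counts)

# For large counts, asinh(x) approaches log(2x), so the transform behaves
# like a logarithm there while staying linear (and defined) near zero.
large_count_gap = abs(transformed[3] - math.log(2 * 500 / 5.0))
```

This is why the transform is popular for antibody counts: zeros stay at zero, yet high counts are compressed much as a log transform would compress them.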

6.2. TCR

In [27]:
data6.select_data('GRCh38-tcr')
data6
Out[27]:
MultimodalData object with 4 UnimodalData: 'GRCh38-citeseq', 'GRCh38-rna', 'GRCh38-tcr', 'GRCh38-bcr'
    It currently binds to VDJData object GRCh38-tcr

VDJData object with n_obs x n_vars = 2987 x 50
    Genome: GRCh38; Modality: tcr
    It contains 10 matrices: 'high_confidence', 'length', 'reads', 'umis', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3', 'cdr3_nt'
    It currently binds to matrix 'umis' as X

    obs: 'is_cell', 'nTRA', 'nTRB', 'nTRD', 'nTRG', 'nMulti', 'Channel'
    var: 
    obsm: 
    varm: 
    uns: 'genome', 'modality', '_v_gene', '_d_gene', '_j_gene', '_c_gene', '_cdr3', '_cdr3_nt'
In [28]:
data6.get_chain('TRA')
Out[28]:
high_confidence length reads umis v_gene d_gene j_gene c_gene cdr3 cdr3_nt
barcodekey
health-AAACCTGAGACCACGA True 521 1569 2 TRAV1-2 None TRAJ12 TRAC CAVMDSSYKLIF TGTGCTGTGATGGATAGCAGCTATAAATTGATCTTC
health-AAACCTGAGGCTCTTA True 518 2019 2 TRAV1-2 None TRAJ33 TRAC CAVKDSNYQLIW TGTGCTGTGAAGGATAGCAACTATCAGTTAATCTGG
health-AAACCTGAGTGAACGC True 504 2665 2 TRAV1-2 None TRAJ35 TRAC CAVCTI TGTGCTGTCTGTACGATA
health-AAACCTGAGTTGTCGT True 557 7528 7 TRAV12-2 None TRAJ54 TRAC CAVNLEIQGAQKLVF TGTGCCGTGAACCTCGAAATTCAGGGAGCCCAGAAGCTGGTATTT
health-AAACCTGCAAACGTGG False 0 0 0 None None None None None None
... ... ... ... ... ... ... ... ... ... ...
health-TTTGTCAGTTGCCTCT True 585 5663 5 TRAV14DV4 None TRAJ49 TRAC CAMREAGTGNQFYF TGTGCAATGAGAGAGGCCGGGACCGGTAACCAGTTCTATTTT
health-TTTGTCAGTTTAGCTG True 486 2727 3 TRAV35 None TRAJ7 TRAC CAGQLCYGNNRLAF TGTGCTGGGCAGCTCTGCTATGGGAACAACAGACTCGCTTTT
health-TTTGTCATCAAGGCTT True 741 1327 3 TRAV1-2 None TRAJ28 TRAC CAVRSTGTGAGSYQLTF TGTGCTGTGAGATCGACGGGGACTGGGGCTGGGAGTTACCAACTCA...
health-TTTGTCATCATGGTCA True 942 5027 6 TRAV1-2 None TRAJ33 TRAC CAALDSNYQLIW TGTGCTGCCCTGGATAGCAACTATCAGTTAATCTGG
health-TTTGTCATCTCGTTTA True 527 4486 4 TRAV1-2 None TRAJ33 TRAC CAVMDSNYQLIW TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG

2987 rows × 10 columns

In [29]:
data6.get_chain('TRB')
Out[29]:
high_confidence length reads umis v_gene d_gene j_gene c_gene cdr3 cdr3_nt
barcodekey
health-AAACCTGAGACCACGA True 584 5238 7 TRBV6-1 TRBD2 TRBJ2-1 TRBC2 CASSGLAGGYNEQFF TGTGCCAGCAGTGGACTAGCGGGGGGCTACAATGAGCAGTTCTTC
health-AAACCTGAGGCTCTTA True 551 3846 4 TRBV6-4 TRBD2 TRBJ2-3 TRBC2 CASSGVAGGTDTQYF TGTGCCAGCAGTGGGGTAGCGGGAGGCACAGATACGCAGTATTTT
health-AAACCTGAGTGAACGC True 674 3002 6 TRBV2 TRBD1 TRBJ1-2 TRBC1 CASNQGLNYGYTF TGTGCCAGCAATCAGGGCCTTAACTATGGCTACACCTTC
health-AAACCTGAGTTGTCGT True 676 8576 10 TRBV9 TRBD1 TRBJ1-6 TRBC1 CASSATGSGSPLHF TGTGCCAGCAGCGCTACAGGGTCGGGTTCACCCCTCCACTTT
health-AAACCTGCAAACGTGG True 695 17409 24 TRBV20-1 TRBD1 TRBJ2-3 TRBC2 CSGKGGTDTQYF TGCAGTGGAAAGGGTGGCACAGATACGCAGTATTTT
... ... ... ... ... ... ... ... ... ... ...
health-TTTGTCAGTTGCCTCT False 0 0 0 None None None None None None
health-TTTGTCAGTTTAGCTG True 764 21059 26 TRBV4-3 TRBD2 TRBJ2-5 TRBC2 CASSQAPISGAGETQYF TGCGCCAGCAGCCAAGCCCCAATTAGCGGGGCCGGAGAGACCCAGT...
health-TTTGTCATCAAGGCTT True 521 715 2 TRBV24-1 TRBD2 TRBJ2-5 TRBC2 CATSDPTSGGSQTQYF TGTGCCACCAGTGACCCCACTAGCGGGGGGTCGCAGACCCAGTACTTC
health-TTTGTCATCATGGTCA True 527 6829 6 TRBV20-1 TRBD2 TRBJ1-1 TRBC1 CSARGDGHTEAFF TGCAGTGCTAGAGGGGACGGACACACTGAAGCTTTCTTT
health-TTTGTCATCTCGTTTA True 542 6173 8 TRBV20-1 TRBD2 TRBJ2-5 TRBC2 CSATRLGREQETQYF TGCAGTGCTACGCGACTAGGCCGAGAACAAGAGACCCAGTACTTC

2987 rows × 10 columns
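
The two chain tables share the same barcode index, so they can be combined to find cells with both a TRA and a TRB chain. The sketch below assumes `get_chain` returns a pandas DataFrame indexed by `barcodekey`, as the outputs above suggest; to stay self-contained it rebuilds a small toy version of both tables from three rows shown above rather than calling PegasusIO.

```python
import pandas as pd

# Toy tables: a few columns and rows copied from the TRA and TRB outputs above.
barcodes = pd.Index(
    ["health-AAACCTGAGACCACGA", "health-AAACCTGAGGCTCTTA", "health-AAACCTGCAAACGTGG"],
    name="barcodekey",
)
tra = pd.DataFrame(
    {"umis": [2, 2, 0], "cdr3": ["CAVMDSSYKLIF", "CAVKDSNYQLIW", None]},
    index=barcodes,
)
trb = pd.DataFrame(
    {"umis": [7, 4, 24], "cdr3": ["CASSGLAGGYNEQFF", "CASSGVAGGTDTQYF", "CSGKGGTDTQYF"]},
    index=barcodes,
)

# Join on the shared barcode index; the suffixes keep the two chains apart.
paired = tra.join(trb, lsuffix="_TRA", rsuffix="_TRB")

# Keep only cells in which both chains were detected (at least one UMI each).
has_both = paired[(paired["umis_TRA"] > 0) & (paired["umis_TRB"] > 0)]
print(has_both[["cdr3_TRA", "cdr3_TRB"]])
```

In this toy example the third barcode is dropped because it has no TRA chain.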

6.3. BCR

In [30]:
data6.select_data('GRCh38-bcr')
data6
Out[30]:
MultimodalData object with 4 UnimodalData: 'GRCh38-citeseq', 'GRCh38-rna', 'GRCh38-tcr', 'GRCh38-bcr'
    It currently binds to VDJData object GRCh38-bcr

VDJData object with n_obs x n_vars = 1185 x 40
    Genome: GRCh38; Modality: bcr
    It contains 10 matrices: 'high_confidence', 'length', 'reads', 'umis', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3', 'cdr3_nt'
    It currently binds to matrix 'umis' as X

    obs: 'is_cell', 'nIGK', 'nIGL', 'nIGH', 'nMulti', 'Channel'
    var: 
    obsm: 
    varm: 
    uns: 'genome', 'modality', '_v_gene', '_d_gene', '_j_gene', '_c_gene', '_cdr3', '_cdr3_nt'
In [31]:
data6.get_chain('IGK')
Out[31]:
high_confidence length reads umis v_gene d_gene j_gene c_gene cdr3 cdr3_nt
barcodekey
health-AAACCTGAGAGCAATT True 626 931 7 IGKV4-1 None IGKJ4 IGKC CQQYYSTPLTF TGTCAGCAGTATTATAGTACTCCTCTCACTTTC
health-AAAGCAACATCACAAC True 587 1823 16 IGKV1-12 None IGKJ2 IGKC CQQADSPPLF TGTCAACAGGCTGACAGTCCCCCTCTTTTT
health-AAAGTAGAGTGACATA True 570 3604 35 IGKV4-1 None IGKJ3 IGKC CQQYYSTPFTF TGTCAGCAATATTATAGTACTCCATTCACTTTC
health-AAATGCCAGTGTTGAA True 695 12547 113 IGKV4-1 None IGKJ1 IGKC CQQYYSTHRTF TGTCAGCAATATTATAGCACTCATCGGACGTTC
health-AAATGCCCACCGCTAG True 671 2229 24 IGKV1D-17 None IGKJ2 IGKC CLQHNSYPYTF TGTCTACAGCATAATAGTTACCCGTACACTTTT
... ... ... ... ... ... ... ... ... ... ...
health-TTTGTCAAGTCATGCT True 671 8218 77 IGKV3-15 None IGKJ4 IGKC CQQYNNWPPLTF TGTCAGCAGTATAATAACTGGCCTCCCCTCACTTTC
health-TTTGTCACAAACCCAT False 0 0 0 None None None None None None
health-TTTGTCACAGCTCGAC True 677 15418 142 IGKV1-12 None IGKJ2 IGKC CQQARSLPYTF TGTCAACAGGCTCGCAGCCTCCCGTACACTTTT
health-TTTGTCACAGTAAGAT True 679 5729 59 IGKV2D-40 None IGKJ2 IGKC CMQRIEFPYTF TGCATGCAACGTATAGAGTTCCCGTACACTTTT
health-TTTGTCAGTCCAGTAT False 0 0 0 None None None None None None

1185 rows × 10 columns

In [32]:
data6.get_chain('IGL')
Out[32]:
high_confidence length reads umis v_gene d_gene j_gene c_gene cdr3 cdr3_nt
barcodekey
health-AAACCTGAGAGCAATT True 636 569 12 IGLV2-14 None IGLJ2 IGLC2 CSSYTSSSPLF TGCAGCTCATATACAAGCAGCAGCCCCTTATTC
health-AAAGCAACATCACAAC False 0 0 0 None None None None None None
health-AAAGTAGAGTGACATA False 0 0 0 None None None None None None
health-AAATGCCAGTGTTGAA False 0 0 0 None None None None None None
health-AAATGCCCACCGCTAG False 0 0 0 None None None None None None
... ... ... ... ... ... ... ... ... ... ...
health-TTTGTCAAGTCATGCT False 0 0 0 None None None None None None
health-TTTGTCACAAACCCAT True 705 3737 69 IGLV3-19 None IGLJ2 IGLC2 CNSRDSSGNHVVF TGTAACTCCCGGGACAGCAGTGGTAACCATGTGGTATTC
health-TTTGTCACAGCTCGAC False 0 0 0 None None None None None None
health-TTTGTCACAGTAAGAT False 0 0 0 None None None None None None
health-TTTGTCAGTCCAGTAT True 674 1748 36 IGLV2-14 None IGLJ2 IGLC2 CSSYTSSSTVF TGCAGCTCATATACAAGCAGCAGCACGGTATTC

1185 rows × 10 columns

In [33]:
data6.get_chain('IGH')
Out[33]:
high_confidence length reads umis v_gene d_gene j_gene c_gene cdr3 cdr3_nt
barcodekey
health-AAACCTGAGAGCAATT True 594 327 2 IGHV1-2 IGHD1-26 IGHJ4 IGHD CARGNSGSYNRNWFFDYW TGTGCGAGAGGCAATAGTGGGAGCTACAATCGAAATTGGTTCTTTG...
health-AAAGCAACATCACAAC False 0 0 0 None None None None None None
health-AAAGTAGAGTGACATA True 652 3640 38 IGHV1-2 IGHD3-16 IGHJ5 IGHA1 CARVPGWGHNYFDPW TGTGCGAGAGTCCCCGGTTGGGGACACAACTACTTCGACCCCTGG
health-AAATGCCAGTGTTGAA True 691 2631 39 IGHV1-69-2 IGHD6-25 IGHJ4 IGHG2 CARDVPEGKAAILGYFDWW TGTGCGAGAGATGTCCCAGAGGGAAAAGCGGCCATTTTAGGGTACT...
health-AAATGCCCACCGCTAG True 521 3041 30 IGHV2-5 IGHD4-17 IGHJ4 IGHM CAHRRYGDYDGDFDYW TGTGCACACAGACGTTACGGTGACTACGACGGAGACTTTGACTACTGG
... ... ... ... ... ... ... ... ... ... ...
health-TTTGTCAAGTCATGCT True 574 1161 16 IGHV3-7 IGHD4-4 IGHJ1 IGHM CARAYFTVTTEGCFQHW TGTGCGAGAGCTTACTTTACAGTAACTACCGAAGGATGCTTCCAGC...
health-TTTGTCACAAACCCAT True 534 2474 31 IGHV4-39 IGHD6-19 IGHJ3 IGHM CARDSSGWYADAFDIW TGTGCGAGAGATAGCAGTGGCTGGTACGCGGATGCTTTTGATATCTGG
health-TTTGTCACAGCTCGAC True 653 1886 35 IGHV1-24 IGHD4-17 IGHJ4 IGHG1 CVGQNGDYFDYW TGTGTGGGGCAGAACGGTGACTACTTTGACTACTGG
health-TTTGTCACAGTAAGAT True 587 2273 28 IGHV3-23 IGHD6-13 IGHJ4 IGHM CAKRPDHSSSWYGRGFDYW TGTGCGAAAAGGCCCGATCATAGCAGCAGCTGGTACGGTAGGGGTT...
health-TTTGTCAGTCCAGTAT True 567 1351 14 IGHV4-31 IGHD6-6 IGHJ4 IGHM CARDLGQLGHFDYW TGTGCCAGAGATCTAGGGCAGCTCGGCCATTTTGACTACTGG

1185 rows × 10 columns
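
The `c_gene` column of the heavy chain table gives the constant region gene, so a quick isotype summary is a one-liner in pandas. The sketch below is a hedged illustration on a toy table built from the ten rows displayed above; it assumes the real `get_chain('IGH')` result behaves like a regular pandas DataFrame.

```python
import pandas as pd

# Toy table: 'umis' and 'c_gene' values copied from the ten IGH rows displayed above.
igh = pd.DataFrame({
    "umis": [2, 0, 38, 39, 30, 16, 31, 35, 28, 14],
    "c_gene": ["IGHD", None, "IGHA1", "IGHG2", "IGHM",
               "IGHM", "IGHM", "IGHG1", "IGHM", "IGHM"],
})

# Filter on UMI count instead of relying on how missing chains are encoded.
detected = igh[igh["umis"] > 0]

# Count how many cells carry each constant region gene.
isotype_counts = detected["c_gene"].value_counts()
print(isotype_counts)
```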

Case 7: Process Flow Cytometry data

In [34]:
sheet7 = pd.read_csv("pegasusio_test_cases/case7/count_matrix.csv")
sheet7
Out[34]:
Sample Location
0 PBMC1 pegasusio_test_cases/case7/PBMC8_30min_patient...
1 PBMC2 pegasusio_test_cases/case7/PBMC8_30min_patient...
2 PBMC3 pegasusio_test_cases/case7/PBMC8_30min_patient...
In [35]:
for location in sheet7['Location']:
    print(location)
pegasusio_test_cases/case7/PBMC8_30min_patient1_Reference.fcs
pegasusio_test_cases/case7/PBMC8_30min_patient2_Reference.fcs
pegasusio_test_cases/case7/PBMC8_30min_patient3_Reference.fcs
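
A sample sheet like this one can be written with pandas. The sketch below assumes that the two columns displayed above, `Sample` and `Location`, are all the sheet needs; it writes to a temporary directory (a hypothetical output location) so it does not touch the tutorial files.

```python
import os
import tempfile

import pandas as pd

# Sample names and FCS paths as printed in the cells above.
sheet = pd.DataFrame({
    "Sample": ["PBMC1", "PBMC2", "PBMC3"],
    "Location": [
        f"pegasusio_test_cases/case7/PBMC8_30min_patient{i}_Reference.fcs"
        for i in (1, 2, 3)
    ],
})

# Hypothetical output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "count_matrix.csv")
sheet.to_csv(out_path, index=False)  # index=False avoids an extra unnamed column

# Read the file back to confirm it has the expected layout.
roundtrip = pd.read_csv(out_path)
print(roundtrip)
```
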
In [36]:
data7 = io.aggregate_matrices("pegasusio_test_cases/case7/count_matrix.csv")
data7.arcsinh_transform()
data7
2020-06-05 09:35:51,708 - pegasusio.readwrite - INFO - fcs file 'pegasusio_test_cases/case7/PBMC8_30min_patient1_Reference.fcs' is loaded.
2020-06-05 09:35:51,709 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.01s.
2020-06-05 09:35:51,727 - pegasusio.readwrite - INFO - fcs file 'pegasusio_test_cases/case7/PBMC8_30min_patient2_Reference.fcs' is loaded.
2020-06-05 09:35:51,727 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.02s.
2020-06-05 09:35:51,740 - pegasusio.readwrite - INFO - fcs file 'pegasusio_test_cases/case7/PBMC8_30min_patient3_Reference.fcs' is loaded.
2020-06-05 09:35:51,741 - pegasusio.readwrite - INFO - Function 'read_input' finished in 0.01s.
2020-06-05 09:35:51,866 - pegasusio.aggr_data - INFO - Function 'aggregate' finished in 0.11s.
2020-06-05 09:35:51,867 - pegasusio.data_aggregation - INFO - Aggregated 3 files.
2020-06-05 09:35:51,868 - pegasusio.data_aggregation - INFO - Function 'aggregate_matrices' finished in 0.17s.
Out[36]:
MultimodalData object with 1 UnimodalData: 'unknown-cyto'
    It currently binds to CytoData object unknown-cyto

CytoData object with n_obs x n_vars = 28898 x 35
    Genome: unknown; Modality: cyto
    It contains 2 matrices: 'raw.data', 'arcsinh.transformed'
    It currently binds to matrix 'arcsinh.transformed' as X

    obs: 'Channel'
    var: 'featureid', '_control_id'
    obsm: '_controls'
    varm: 
    uns: 'genome', 'modality', 'uns_dict', '_control_names'