Abbreviations Key | |
ADT | antibody-derived tag |
ARC | ATAC + RNA Chromium |
archR | ArchR: analysis of regulatory chromatin in R |
BMMCs | bone marrow mononuclear cells |
CITE-seq | cellular indexing of transcriptomes and epitopes by sequencing |
CSV | comma-separated value |
FCS | flow cytometry standard |
HTO | hashtag oligo |
MFI | mean or median fluorescence intensity |
MSD | Meso Scale Discovery |
PB | probability binning |
PBMCs | peripheral blood mononuclear cells |
scATAC-seq | single-cell assay for transposase-accessible chromatin using sequencing |
scRNA-seq | single-cell RNA sequencing |
TSS | transcription start site |
tsv.gz | tab-separated values, GNU zip |
V(D)J | variable, (diversity), and joining [gene segments] |
Produce Searchable Output Files
At a Glance
HISE supports a variety of analysis pipelines that generate searchable output files. These files contain analysis results plus one or more reports. This document discusses the specific files each analysis pipeline produces.
scRNA-seq
In the simple scRNA-seq pipeline, the core scientific analysis method is Cell Ranger alignment. This analysis produces a report and an output H5 file.
In the cell hashing pipeline, where a number of samples have been barcoded and then mixed, Cell Ranger alignment is also used. In addition, a barcode recognition and counting process identifies the sample of origin of each cell so that the results can be rearranged into one output H5 file per sample. Both pipelines end with a labeling process that uses a Seurat-based normalization and labeling method.
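To illustrate the idea behind hashtag-based demultiplexing, here is a minimal sketch that calls each cell from its HTO counts and groups barcodes by sample. The thresholds (`min_count`, `min_ratio`), the "Negative"/"Doublet" rule, and all names are hypothetical; this is not the HISE implementation, only one simple way such a rule can work.

```python
from typing import Dict, List


def assign_hashtag(hto_counts: Dict[str, int],
                   min_count: int = 10,
                   min_ratio: float = 3.0) -> str:
    """Call one cell from its hashtag (HTO) counts.

    Hypothetical rule for illustration only:
      - "Negative" if the strongest tag has fewer than min_count reads,
      - "Doublet" if the second-strongest tag is within min_ratio of the top,
      - otherwise the name of the strongest tag.
    """
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_count:
        return "Negative"
    top_tag, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if second > 0 and top / second < min_ratio:
        return "Doublet"
    return top_tag


def split_by_sample(cells: Dict[str, Dict[str, int]],
                    tag_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Group cell barcodes by sample so each sample can get its own output file."""
    groups: Dict[str, List[str]] = {}
    for barcode, counts in cells.items():
        call = assign_hashtag(counts)
        sample = tag_to_sample.get(call, call)  # "Negative"/"Doublet" kept as-is
        groups.setdefault(sample, []).append(barcode)
    return groups


# Made-up HTO counts for four cells.
cells = {
    "AAAC-1": {"HTO1": 250, "HTO2": 4, "HTO3": 2},
    "AAAG-1": {"HTO1": 120, "HTO2": 110, "HTO3": 1},
    "AACT-1": {"HTO1": 3, "HTO2": 2, "HTO3": 5},
    "AAGT-1": {"HTO1": 1, "HTO2": 6, "HTO3": 300},
}
tag_map = {"HTO1": "donor_A", "HTO2": "donor_B", "HTO3": "donor_C"}
print(split_by_sample(cells, tag_map))
```

Production demultiplexers typically model the background distribution of each tag rather than using fixed cutoffs; the fixed thresholds here only keep the example short.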
The specific searchable output files for the scRNA-seq pipeline are summarized in Table 1.
TABLE 1 | |
Output file type | Description |
scRNA-seq-CellHashing-Main-QC-report | Pipeline result report file that contains cell hashing and sample multiplexing information. |
scRNA-seq-labeled | A Seurat-based, labeled H5 file. Supports visualizing the metadata and clinical features of a sample and patient to investigate a single time point or examine longitudinal shifts in a patient population. |
scRNA-seq-tenx-report | A web_summary.html file generated by Cell Ranger containing QC metrics for 10x Genomics scRNA-seq data. |
scRNA-seq-merged | An H5 file containing scRNA-seq data from multiple batches. |
scATAC-seq
In the scATAC-seq pipeline, we run Cell Ranger alignment, followed by a rigorous quality control process that ensures the cells are high quality, reduces the number of doublets, and makes the cells available for downstream analysis in a variety of formats (.arrow, fragments.tsv.gz, and H5-formatted count matrices).
The pipeline result files are not immediately available for further analysis; they must first be reviewed and approved by a dedicated team of scientists. Once approved, the data are put through a labeling process that uses ArchR to assign an initial cell-type label to each cell.
The specific searchable output files for the scATAC-seq pipeline are summarized in Table 2.
TABLE 2 | |
Output file type | Description |
atac-archr-label-results | Results of cell-type labeling using ArchR's addGeneIntegrationMatrix against a Seurat scRNA-seq reference. |
atac-assembly-archr-arrow | An .arrow file generated by ArchR that can be used as an input to ArchR projects for downstream analysis. |
atac-assembly-filtered-fragments-tsv-gz | File containing unique fragment positions for cell barcodes that pass QC and doublet filtering. |
atac-assembly-read-counts-gene-bodies-h5 | A matrix in which each row represents a gene body and each column represents a cell. The values show how many ATAC-seq reads align to each gene body in each cell. This read count matrix is stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined by Ensembl v93 and filtered to match the scRNA-seq reference. |
atac-assembly-read-counts-per-region-h5 | A count matrix of TSS regions (±2 kb, rows) × cells (columns), stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined as for gene bodies, above. |
atac-assembly-read-counts-per-windows-h5 | A whole-genome 5 kb window count matrix of windows (rows) × cells (columns). This matrix is stored in HDF5 format, similar to the scRNA-seq outputs from 10x Genomics. |
cellranger-atac-possorted_genome | The web_summary.html report generated by Cell Ranger-ATAC. |
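The window and TSS-region count matrices above can be pictured as simple binning of fragment positions per cell barcode. The sketch below reads a fragments.tsv.gz file and counts fragment ends in 5 kb windows and in ±2 kb TSS regions. It assumes the 10x fragments layout (chrom, start, end, barcode, count; 0-based, end-exclusive, optional `#` header lines), counts the first and last base of each fragment as two events, and ignores the count column; these choices, the gene TSS coordinates, and the function names are illustrative assumptions, not the exact HISE method.

```python
import gzip
import os
import tempfile
from collections import Counter
from typing import Dict, Iterator, Tuple

WINDOW = 5_000     # genome-wide bin size (5 kb)
TSS_FLANK = 2_000  # +/- 2 kb around each TSS


def read_insertions(path: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, position, barcode) for the first and last base of each fragment."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            yield chrom, int(start), barcode
            yield chrom, int(end) - 1, barcode  # end is exclusive


def window_counts(path: str) -> Counter:
    """Count fragment ends per ((chrom, window_start), barcode)."""
    counts: Counter = Counter()
    for chrom, pos, barcode in read_insertions(path):
        counts[((chrom, pos // WINDOW * WINDOW), barcode)] += 1
    return counts


def tss_counts(path: str, tss: Dict[str, Tuple[str, int]]) -> Counter:
    """Count fragment ends within +/- TSS_FLANK of each gene's TSS, per barcode."""
    counts: Counter = Counter()
    for chrom, pos, barcode in read_insertions(path):
        for gene, (g_chrom, g_tss) in tss.items():
            if chrom == g_chrom and g_tss - TSS_FLANK <= pos < g_tss + TSS_FLANK:
                counts[(gene, barcode)] += 1
    return counts


# Tiny made-up fragments file for demonstration.
rows = [
    "# header line",
    "chr1\t100\t300\tAAAC-1\t1",
    "chr1\t4990\t5200\tAAAC-1\t2",
    "chr1\t10100\t10400\tAAGT-1\t1",
]
tmp = tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False)
tmp.close()
with gzip.open(tmp.name, "wt") as fh:
    fh.write("\n".join(rows) + "\n")

genes = {"GENE1": ("chr1", 1_000), "GENE2": ("chr1", 12_000)}  # hypothetical TSSs
windows = window_counts(tmp.name)
tss_hits = tss_counts(tmp.name, genes)
os.remove(tmp.name)
print(dict(windows))
print(dict(tss_hits))
```

A gene-body matrix follows the same pattern with each gene's start and end in place of the TSS flank. The nested loop over genes is fine for a toy example; a real pipeline would use an interval index.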
TEA-seq combinations
TEA-seq is a trimodal single-cell assay that simultaneously measures transcriptomics, protein epitopes, and chromatin accessibility. This assay identifies cell type–specific gene regulation and expression grounded in phenotypically defined cell types.
In the TEA-seq pipeline, we couple Cell Ranger ARC with a rigorous QC process to ensure that cells from the TEA-seq pipeline are high quality, to reduce the number of doublets, and to make cells available in a variety of formats for downstream analysis.
In addition to TEA-seq, HISE supports the analysis of data generated by hashed TEA-seq and by CITE-seq + scRNA-seq. These methods integrate multiple layers of biological information at the single-cell level to yield a comprehensive view of cellular functioning.
These techniques produce the same searchable output files as scRNA-seq and scATAC-seq (see above), as well as the files listed in Table 3.
TABLE 3 | |
Output file type | Description |
adt-batch-summary-report | A summary report of ADT data, including quality metrics and overall statistics. |
adt-tea-seq-well-report | A detailed report for individual wells in a TEA-seq experiment, including per-well quality metrics and protein expression data. |
tea-main-batch-summary-report | A summary report containing combined gene expression statistics, protein level metrics, and chromatin accessibility data. |
Supervised gating
OpenCyto
Supervised gating is the automated application of gating criteria approved by a subject matter expert. First, an R package called flowCut examines the ingested flow cytometry standard (FCS) files for irregularities introduced while the data were acquired on the instrument. This step produces a new QC'd file, with the irregularities removed, and a QC report. Next, an R package called OpenCyto applies the supervised gating itself to each QC'd file. This step produces a report, cell population statistics, and MFI files.
These result files are not immediately available for further analysis; they must first be reviewed and approved by a dedicated team of scientists. If a pipeline run is rejected, the team of scientists adjusts the data before making it available to others.
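To make the population statistics and MFI outputs concrete, here is a minimal sketch of a rectangular gate followed by count, frequency-of-parent, and mean/median fluorescence for one channel. The channel names, gate bounds, and event values are made up, and the functions are hypothetical; OpenCyto itself defines gates through its own gating templates and R API, which this sketch does not reproduce.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

Event = Dict[str, float]  # one cell: channel name -> fluorescence value


def rectangle_gate(events: List[Event],
                   bounds: Dict[str, Tuple[float, float]]) -> List[Event]:
    """Keep events whose value in every gated channel lies in [low, high)."""
    return [e for e in events
            if all(low <= e[ch] < high for ch, (low, high) in bounds.items())]


def population_stats(parent: List[Event], gated: List[Event],
                     channel: str) -> Dict[str, float]:
    """Count, frequency of parent, and mean/median fluorescence for one channel."""
    values = [e[channel] for e in gated]
    return {
        "count": len(gated),
        "freq_of_parent": len(gated) / len(parent) if parent else 0.0,
        "mean_fi": mean(values) if values else float("nan"),
        "median_fi": median(values) if values else float("nan"),
    }


# Made-up events with hypothetical channel names.
events = [
    {"CD3": 1500.0, "CD19": 100.0, "CD4": 800.0},
    {"CD3": 2500.0, "CD19": 200.0, "CD4": 1200.0},
    {"CD3": 300.0, "CD19": 3000.0, "CD4": 50.0},
    {"CD3": 1800.0, "CD19": 900.0, "CD4": 400.0},
    {"CD3": 4000.0, "CD19": 50.0, "CD4": 2000.0},
]
t_cells = rectangle_gate(events, {"CD3": (1000.0, float("inf")), "CD19": (0.0, 500.0)})
stats = population_stats(events, t_cells, "CD4")
print(stats)
```

Reporting both the mean and the median reflects the abbreviation key's definition of MFI as mean or median fluorescence intensity; the median is less sensitive to a few very bright events.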
The specific searchable output files for flow cytometry are summarized in Table 4.
TABLE 4 | |
Output file type | Description |
FCS file (.fcs) | A file that has been QC'd using flowCut to remove irregularities. |
FlowCytometry-decoration-report-html | A pipeline result report file. |
FlowCytometry-decoration-report-csv | The output from the flow-qc-report. |
FlowCytometry-supervised-stats | A CSV file of the population counts for a kit generated with OpenCyto packages. Used to visualize population-based studies. |
FlowCytometry-supervised-report | A report generated by OpenCyto that includes plots showing the gates. |
FlowCytometry-supervised-mfis | A CSV file containing MFI values for cell populations, generated using OpenCyto supervised gating methods. |
FlowCytometry-supervised-hierarchy-report | A PNG graph showing the hierarchical gating structure. |
FlowCytometry-supervised-gating-set-pb | One of three output files generated with the save_gs function, so that the gating set can be loaded into your IDE. |
FlowCytometry-supervised-gating-set-h5 | The second of three output files generated with the save_gs function, so that the gating set can be loaded into your IDE. |
FlowCytometry-supervised-gating-set-gs | The third of three output files generated with the save_gs function, so that the gating set can be loaded into your IDE. |
FlowCytometry-supervised-comp | A CSV file that captures the compensation used during the supervised gating process. |
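The compensation CSV records how fluorescence spillover between channels was corrected. A common convention, assumed here, is that observed values equal true values multiplied by a spillover matrix, so compensated values are observed values multiplied by the inverse of that matrix. The sketch uses two channels and made-up spillover coefficients for brevity; the layout of the HISE compensation CSV is not specified here.

```python
from typing import List

Matrix = List[List[float]]


def invert_2x2(m: Matrix) -> Matrix:
    """Invert a 2x2 matrix; raise if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(event: List[float], spillover: Matrix) -> List[float]:
    """Return event (row vector) multiplied by the inverse spillover matrix."""
    inv = invert_2x2(spillover)
    return [sum(event[i] * inv[i][j] for i in range(2)) for j in range(2)]


# Row i = fluorochrome i, column j = fraction of its signal seen in channel j.
spillover = [[1.0, 0.10],
             [0.05, 1.0]]
true_signal = [1000.0, 200.0]
observed = [sum(true_signal[i] * spillover[i][j] for i in range(2)) for j in range(2)]
compensated = compensate(observed, spillover)
print(observed, compensated)
```

Panels with more fluorochromes use the same idea with an n × n matrix and a general matrix inverse.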
CyAnno (default)
The CyAnno pipeline is a machine learning framework that uses panel-specific models to label the cell types in a dataset. Unlike the OpenCyto pipeline, the CyAnno pipeline has no intermediate QC step.
The specific searchable output files for this pipeline are shown in Table 5.
TABLE 5 | |
Output file type | Description |
FlowCytometry-labeled-expr-csv | A CSV report of each cell and its labeled cell type. |
FlowCytometry-prediction-report | A collection of plots visualizing the cell population reports. |
FlowCytometry-summary-frequency-stats | A CSV file containing cell population summaries. |
FlowCytometry-decoration-report-csv | The input file of the CyAnno process. |
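Population summaries such as those in the frequency stats file can be derived from a per-cell labeled table. The sketch below computes the fraction of cells per label from a CSV; the column name `cell_type` and the example rows are hypothetical, since the exact layout of FlowCytometry-labeled-expr-csv is not described here.

```python
import csv
import io
from collections import Counter
from typing import Dict


def frequency_summary(labeled_csv: str, label_col: str = "cell_type") -> Dict[str, float]:
    """Return the fraction of cells assigned to each label, sorted by label."""
    reader = csv.DictReader(io.StringIO(labeled_csv))
    counts = Counter(row[label_col] for row in reader)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


# Made-up labeled cells (hypothetical columns).
example = """cell_id,CD3,CD19,cell_type
1,2.1,0.1,T cell
2,0.2,3.4,B cell
3,1.9,0.3,T cell
4,0.1,0.2,Monocyte
"""
print(frequency_summary(example))
```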
Olink Proteomics
In the Olink Proteomics pipeline, Olink provides a raw results file and a PDF report describing any data that are missing because of an analysis problem. Each sample is associated with either the results file or the missing-data report.
The specific searchable output files are shown in Table 6.
TABLE 6 | |
Output file type | Description |
Olink | An Excel file with the results for an Olink batch. |
OlinkReport | A PDF certificate of analysis for an Olink batch. |
5-prime V(D)J
In the 5-prime V(D)J pipeline, T cell receptor (TCR) and B cell receptor (BCR) contig information is generated together with scRNA-seq data. As in the scRNA-seq pipeline, the core scientific analysis method is Cell Ranger alignment. Because this is a multimodal pipeline, Cell Ranger multi aligns the scRNA and contig sequences together, which improves cell calling. The alignment produces an H5 output file for scRNA and CSV files for TCR and BCR contig information.
In the cell hashing pipeline, scRNA-seq data are processed in the same way as in the simple scRNA-seq pipeline. The TCR/BCR contig CSV file is demultiplexed by the HTO barcodes and merged with each sample in the pool.
After the scRNA and contig files are dehashed and merged, the 5-prime V(D)J pipeline adds the TCR/BCR contig information, arranged by cell, to the metadata of an H5 file. The 5-prime V(D)J pipeline also produces TCR/BCR CSV files arranged by contig. The labeling pipeline is the same as for scRNA-seq, and the labels are based on the scRNA-seq data.
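Demultiplexing the contig CSV amounts to routing each contig row to the sample its cell barcode was assigned to by HTO calling. In this sketch the barcode-to-sample mapping is assumed to come from an earlier HTO step, the columns `barcode`, `chain`, and `cdr3` are assumed, and the CDR3 strings are made up; cells without a sample assignment (for example, doublets) are dropped.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def demultiplex_contigs(contig_csv: str,
                        barcode_to_sample: Dict[str, str]) -> Dict[str, List[dict]]:
    """Split contig rows by the sample assigned to each cell barcode."""
    per_sample: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(contig_csv)):
        sample = barcode_to_sample.get(row["barcode"])
        if sample is not None:  # drop Negative/Doublet/unassigned cells
            per_sample[sample].append(row)
    return dict(per_sample)


# Made-up contigs; AAAG-1 has no sample assignment (e.g., called as a doublet).
contigs = """barcode,chain,cdr3
AAAC-1,TRA,CAVRDSNYQLIW
AAAC-1,TRB,CASSLGQAYEQYF
AAGT-1,TRB,CASSPGTGELFF
AAAG-1,TRB,CASSIRSSYEQYF
"""
assignments = {"AAAC-1": "donor_A", "AAGT-1": "donor_C"}
by_sample = demultiplex_contigs(contigs, assignments)
print({s: [r["chain"] for r in rows] for s, rows in by_sample.items()})
```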
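Arranging contigs by cell means collapsing the one-row-per-contig CSV into one value per cell barcode, which can then be stored as a metadata column. The sketch below joins each cell's chains into a single string; the `chain:cdr3` format, the column names, and the example data are hypothetical choices, not the format HISE writes into the H5 metadata.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def contigs_by_cell(contig_csv: str) -> Dict[str, str]:
    """Collapse per-contig rows into one 'chain:cdr3;...' string per cell barcode."""
    chains: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(contig_csv)):
        chains[row["barcode"]].append(f'{row["chain"]}:{row["cdr3"]}')
    return {barcode: ";".join(sorted(items)) for barcode, items in chains.items()}


# Made-up contigs for two cells.
contigs = """barcode,chain,cdr3
AAAC-1,TRB,CASSLGQAYEQYF
AAAC-1,TRA,CAVRDSNYQLIW
AAGT-1,TRB,CASSPGTGELFF
"""
print(contigs_by_cell(contigs))
```

Sorting the chains gives a stable value per cell regardless of the row order in the contig file.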
The specific searchable output files are shown in Table 7.
TABLE 7 | |
Output file type | Description |
vdj-main-batch-summary-report | A file containing all quality metrics from Cell Ranger output, HTO QC, ADT QC, and scRNA QC, as well as some basic QC of TCR/BCR contigs. |
scRNA HTO merge summary | A report generated by merging all the wells in the batch. |
scRNA HTO count processing report | A report quantifying the HTO reads from multiplexed single-cell RNA sequencing experiments, including metrics on cell barcode identification and HTO assignment. |
scRNA seq labeled | A Seurat-based, labeled H5 file for a sample. The contig information of TCR/BCR is stored in the metadata. |
scRNA seq labeled report | A Seurat-based, labeled report for the entire batch. |
scRNA CellRanger summary | The web_summary.html file generated by 10x Genomics Cell Ranger multi. |
TCR contig | A CSV file containing the contig information of T cell receptors, arranged by chain type. |
BCR contig | A CSV file containing the contig information of B cell receptors, arranged by chain type. |
Vizgen
The Vizgen MERSCOPE is a spatial transcriptomics platform that simultaneously detects multiple RNA transcripts within single cells and maps their precise locations within the tissue sample. This analysis provides unique insights into the physical location and functional interactions of different cell types within a tissue.
The Vizgen pipeline uses decoded transcripts and cell segmentation information from the Vizgen instrument to generate an AnnData object containing cell-type annotations and marker genes. To generate these results, the pipeline first runs cell segmentation, which produces QC reports and filtered Parquet files. Next, each QC'd file undergoes a decomposition process to identify marker genes. Finally, a metrics process generates CSV files that describe the spatial properties of the transcripts.
The specific searchable output files are shown in Table 8.
TABLE 8 | |
Output file type | Description |
spatial-segment-cell-boundaries-parquet | A file containing segmented cell boundary information from spatial transcriptomics data, stored in Parquet format. |
spatial-segment-cell-bygene | A CSV file containing gene expression counts for each segmented cell. Rows represent cells, and columns represent genes. |
spatial-segment-cell-metadata | A CSV file containing spatial metadata, including cell ID, field of view, volume, and centroid coordinates, for each segmented cell. |
spatial-segment-detected-transcripts | A CSV file containing all detected transcripts, including their spatial coordinates, gene identity, and field of view. |
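As an example of working with the cell-by-gene and cell metadata CSVs, the sketch below computes total transcripts per cell and each cell's nearest-neighbor distance from its centroid. The column names `cell`, `center_x`, and `center_y` are assumptions about the file layout, and the data are made up; check the actual headers before reusing this.

```python
import csv
import io
import math
from typing import Dict


def per_cell_totals(cell_by_gene_csv: str, id_col: str = "cell") -> Dict[str, int]:
    """Sum gene counts across columns for each segmented cell."""
    totals: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(cell_by_gene_csv)):
        cell = row.pop(id_col)
        totals[cell] = sum(int(float(v)) for v in row.values())
    return totals


def nearest_neighbor_distance(metadata_csv: str, id_col: str = "cell",
                              x_col: str = "center_x",
                              y_col: str = "center_y") -> Dict[str, float]:
    """Euclidean distance from each cell centroid to the closest other centroid."""
    rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    points = {r[id_col]: (float(r[x_col]), float(r[y_col])) for r in rows}
    result: Dict[str, float] = {}
    for cid, p in points.items():
        others = [math.dist(p, q) for oid, q in points.items() if oid != cid]
        result[cid] = min(others) if others else float("nan")
    return result


# Made-up data with hypothetical headers.
by_gene = "cell,CD3E,MS4A1,LYZ\nc1,5,0,1\nc2,0,7,0\nc3,2,0,9\n"
meta = "cell,fov,volume,center_x,center_y\nc1,0,500,0,0\nc2,0,480,3,4\nc3,1,520,10,0\n"
totals = per_cell_totals(by_gene)
nn = nearest_neighbor_distance(meta)
print(totals, nn)
```

The all-pairs loop is quadratic in the number of cells; for a full tissue section a spatial index such as a k-d tree would be the usual choice.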
Fixed RNA
The fixed-RNA pipeline takes FASTQ files from short-read sequencers and applies a number of steps to generate a decorated H5 file. The initial processing is done by the 10x Genomics Cell Ranger tool, which produces a cell-by-gene matrix and performs demultiplexing if necessary. The pipeline then adds extra metadata to these H5 files and generates a QC report, which is used to determine whether any samples, wells, or pools should be excluded from downstream processing.
Once the samples are approved, the pipeline merges the multiplexed samples into a single H5 file per sample. This consolidation is followed by cell-type labeling, using one of the currently available references (PBMCs or BMMCs). This final step in the pipeline produces a decorated H5 file with additional metadata and cell-type labels.
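Merging multiplexed samples into one file per sample can be thought of as regrouping cells from every retained well by their sample assignment. In this sketch, wells excluded by QC are skipped, and barcodes are prefixed with the well ID because the same barcode can occur in different wells. The data structures, the prefixing scheme, and all names are hypothetical; the sketch uses plain dictionaries rather than H5 files.

```python
from typing import Dict, Iterable, Set, Tuple

CellCounts = Dict[str, Dict[str, int]]  # barcode -> {gene: count}


def merge_by_sample(wells: Dict[str, CellCounts],
                    cell_to_sample: Dict[Tuple[str, str], str],
                    excluded_wells: Iterable[str] = ()) -> Dict[str, CellCounts]:
    """Collect cells from all non-excluded wells into one matrix per sample."""
    excluded: Set[str] = set(excluded_wells)
    merged: Dict[str, CellCounts] = {}
    for well, cells in wells.items():
        if well in excluded:
            continue  # well rejected during QC review
        for barcode, counts in cells.items():
            sample = cell_to_sample.get((well, barcode))
            if sample is None:
                continue  # cell not assigned to any sample
            # Prefix with the well ID so identical barcodes from different wells stay distinct.
            merged.setdefault(sample, {})[f"{well}_{barcode}"] = counts
    return merged


# Made-up wells and assignments.
wells = {
    "W1": {"AAAC-1": {"CD3E": 4}, "AAGT-1": {"MS4A1": 6}},
    "W2": {"AAAC-1": {"CD3E": 2, "LYZ": 1}},
    "W3": {"AAAC-1": {"LYZ": 9}},
}
cell_to_sample = {
    ("W1", "AAAC-1"): "donor_A",
    ("W1", "AAGT-1"): "donor_B",
    ("W2", "AAAC-1"): "donor_A",
    ("W3", "AAAC-1"): "donor_B",
}
merged = merge_by_sample(wells, cell_to_sample, excluded_wells=["W3"])
print({sample: sorted(cells) for sample, cells in merged.items()})
```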
The searchable output files are summarized in Table 9.
TABLE 9 | |
Output file type | Description |
frna-labeled-h5 | A labeled H5 file for a given sample. |
frna-qc-report | An fRNA QC report. |
celltypist-csv | A CellTypist CSV file. |
celltypist-labeled-h5 | An fRNA CellTypist H5 file. |
fRNA-Seq-tenx-report | A pipeline result report file. |
Meso Scale Discovery
MSD uses electrochemiluminescence to detect and quantify multiple biomarkers in a single assay. MSD assays analyze multiple cytokines simultaneously in complex biological samples. They are highly sensitive, have a broad dynamic range, and perform better in serum than certain other multiplex platforms.
The specific searchable output files are shown in Table 10.
TABLE 10 | |
Output file type | Description |
MSD-analyzed-data-CSV | A CSV file containing protein concentrations calculated from standard curves. Uses MSD raw signal data as input. |
MSD-raw-data | A text file containing MSD raw signal data. |
MSD-standard-curve-data-CSV | A CSV file containing standard curve metrics. Uses MSD raw signal data as input. |
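Calculating concentrations from standard curves means fitting a curve to the standards' signals and then inverting it for each sample signal. A four-parameter logistic (4PL) model is a common choice for immunoassay standard curves; whether the HISE pipeline uses 4PL, and the parameter values below, are assumptions. The sketch shows only the back-calculation step, given already-fitted parameters.

```python
from typing import Optional


def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4PL signal at concentration x.

    a: response at zero concentration, d: response at infinite concentration,
    c: inflection point (EC50), b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> Optional[float]:
    """Back-calculate concentration from signal y; None if y is outside (a, d)."""
    low, high = min(a, d), max(a, d)
    if not low < y < high:
        return None  # beyond the curve's asymptotes: cannot be back-calculated
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (signal rises with concentration).
params = dict(a=50.0, b=1.2, c=100.0, d=200_000.0)
signal = four_pl(25.0, **params)
concentration = inverse_four_pl(signal, **params)
print(round(signal, 2), concentration)
```

Signals at or beyond an asymptote cannot be mapped to a finite concentration, which is why the sketch returns `None` for them instead of extrapolating.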
Spatial analysis
Spatial analysis uses spatial and molecular data to map the distribution of biomarkers. These maps yield insights into tissue architecture and cellular phenotype, helping scientists study molecular processes and mechanisms of disease.
The specific searchable output files are shown in Table 11.
TABLE 11 | |
Output file type | Description |
spatial-analysis-metric-view | Visualization of spatial patterns, relationships, and metrics. |
spatial-analysis-qc-report | A pipeline result report file. |