Produce Searchable Output Files

	Abbreviations Key
ADT	antibody-derived tag
ARC	ATAC + RNA chromium
ArchR	analysis of regulatory chromatin in R
BMMCs	bone marrow mononuclear cells
CITE-seq	cellular indexing of transcriptomes and epitopes by sequencing
CSV	comma-separated value
FCS	flow cytometry standard
HTO	hashtag oligo
MFI	mean or median fluorescence intensity
MSD	Meso Scale Discovery
PB	probability binning
PBMCs	peripheral blood mononuclear cells
scATAC-seq	single-cell assay for transposase-accessible chromatin using sequencing
scRNA-seq	single-cell RNA sequencing
TSS	transcription start site
tsv.gz	tab-separated values, GNU zip
V(D)J	variable, (diversity), and joining [gene segments]

At a Glance

HISE supports a variety of analysis pipelines that generate searchable output files. These files contain analysis results plus one or more reports. This document discusses the specific files each analysis pipeline produces.

scRNA-seq

In a simple scRNA-seq pipeline, the core scientific analysis method is Cell Ranger alignment. This analysis produces a report and an output H5 file.

In the cell hashing pipeline, where several samples have been barcoded and then pooled, Cell Ranger alignment is also required. In addition, a barcode recognition and counting process identifies the sample of origin for each cell, so that the results can be split into one output H5 file per sample. Both pipelines end with a labeling process that uses a Seurat-based normalization and labeling method.
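The barcode recognition and counting step can be pictured with a minimal sketch. This is not the HISE implementation: the function names, the thresholds (`min_count`, `min_ratio`), and the "top count versus runner-up" rule are all assumptions chosen only to illustrate how per-cell HTO counts might map each cell back to a sample.

```python
from collections import defaultdict


def assign_samples(hto_counts, hto_to_sample, min_count=10, min_ratio=3.0):
    """Assign each cell barcode to a sample from its HTO read counts.

    hto_counts: dict of cell barcode -> dict of HTO name -> count.
    hto_to_sample: dict of HTO name -> sample ID.
    Thresholds are illustrative assumptions, not pipeline defaults.
    """
    assignments = {}
    for cell, counts in hto_counts.items():
        # Rank HTOs for this cell from highest to lowest count.
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        top_hto, top = ranked[0] if ranked else (None, 0)
        second = ranked[1][1] if len(ranked) > 1 else 0
        if top < min_count:
            assignments[cell] = "no_hash"      # too few HTO reads to call
        elif second > 0 and top / second < min_ratio:
            assignments[cell] = "multiplet"    # two HTOs at similar levels
        else:
            assignments[cell] = hto_to_sample[top_hto]
    return assignments


def split_by_sample(assignments):
    """Group cell barcodes by assigned sample (one group per output file)."""
    groups = defaultdict(list)
    for cell, sample in assignments.items():
        groups[sample].append(cell)
    return dict(groups)
```

Grouping the barcodes by sample mirrors the idea of producing one output H5 file per sample; writing the H5 files themselves is outside the scope of this sketch.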

The specific searchable output files for the scRNA-seq pipeline are summarized in Table 1.

TABLE 1
Output file type	Description
scRNA-seq-CellHashing-Main-QC-report	Pipeline result report file that contains cell hashing and sample multiplexing information.
scRNA-seq-labeled	A Seurat-based, labeled H5 file. Used to visualize metadata and clinical features of a sample and patient, either to investigate a single time point or to examine longitudinal shifts in a patient population.
scRNA-seq-tenx-report	A `web_summary.html` file generated by Cell Ranger containing QC metrics for 10x Genomics scRNA-seq data.
scRNA-seq-merged	An H5 file containing scRNA-seq data from multiple batches.

scATAC-seq

In the scATAC-seq pipeline, we implement Cell Ranger alignment, followed by a rigorous quality control process to ensure that cells from the scATAC-seq pipeline are high quality, to reduce the number of doublets, and to make the cells available for downstream analysis in a variety of formats (.arrow, fragments.tsv.gz, and H5-formatted count matrices).

The pipeline result files are not immediately available for further analysis. They must first be reviewed and approved by a dedicated team of scientists. Once approved, the data are put through a labeling process that uses ArchR to provide an initial cell-type label for each cell.

The specific searchable output files for the scATAC-seq pipeline are summarized in Table 2.

TABLE 2
Output file type	Description
atac-archr-label-results	Results of cell-type labeling using ArchR's `addGeneIntegrationMatrix` against a Seurat scRNA-seq reference.
atac-assembly archr-arrow	An `.arrow` file generated by ArchR that can be used as an input to ArchR projects for downstream analysis.
atac-assembly-filtered-fragments-tsv-gz	File containing unique fragment positions for cell barcodes that pass QC and doublet filtering.
atac-assembly-read-counts-gene bodies-h5	A matrix in which each row represents a gene body, and each column represents a cell. The values show how many ATAC-seq reads align to each gene body in each cell. This read count matrix is stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined by Ensembl v93 and filtered to match the scRNA-seq reference.
atac-assembly-read-counts-per-region-h5	A count matrix of TSS regions (±2 kb, rows) x cells (columns), stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined as for gene bodies, above.
atac-assembly-read-counts-per-windows-h5	A whole-genome count matrix of 5 kb windows (rows) x cells (columns), stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs.
cellranger-atac-possorted_genome	The `web_summary.html` report generated by Cell Ranger-ATAC.
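To make the relationship between the filtered fragments file and the window-count matrix more concrete, here is a minimal sketch that bins fragments into 5 kb windows per cell barcode. It assumes the standard 10x Genomics fragments layout (tab-separated chromosome, start, end, barcode, duplicate count) and, as a simplification, assigns each fragment to the window containing its start position. The actual pipeline may count reads differently, and the function names are hypothetical.

```python
import csv
import gzip
from collections import Counter

WINDOW_SIZE = 5000  # 5 kb windows, as in the window-count matrix description


def count_fragments_per_window(lines, window_size=WINDOW_SIZE):
    """Count fragments per (chromosome, window start, cell barcode).

    lines: iterable of tab-separated rows with columns
    chrom, start, end, barcode, duplicate_count.
    Simplification: a fragment is counted once, in the window that
    contains its start coordinate.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header comments
        chrom, start, barcode = row[0], int(row[1]), row[3]
        window_start = (start // window_size) * window_size
        counts[(chrom, window_start, barcode)] += 1
    return counts


def count_from_file(path, window_size=WINDOW_SIZE):
    """Read a gzip-compressed fragments file and bin its fragments."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_fragments_per_window(handle, window_size)
```

The resulting sparse counts correspond conceptually to a windows x cells matrix; converting them to HDF5 would require an additional library and is omitted here.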

TEA-seq combinations

TEA-seq is a trimodal single-cell assay that simultaneously measures transcriptomics, protein epitopes, and chromatin accessibility. This assay identifies cell type–specific gene regulation and expression grounded in phenotypically defined cell types.

In the TEA-seq pipeline, we couple Cell Ranger ARC with a rigorous QC process to ensure that cells from the TEA-seq pipeline are high quality, to reduce the number of doublets, and to make cells available in a variety of formats for downstream analysis.

In addition to TEA-seq, HISE supports the analysis of data generated by hashed TEA-seq and by CITE-seq + scRNA-seq. These methods integrate multiple layers of biological information at the single-cell level to yield a comprehensive view of cellular functioning.

These techniques produce the same searchable output files as scRNA-seq and scATAC-seq (see above), as well as the files listed in Table 3.

TABLE 3
Output file type	Description
adt-batch-summary-report	A summary report of ADT data, including quality metrics and overall statistics.
adt-tea-seq-well-report	A detailed report for individual wells in a TEA-seq experiment, including per-well quality metrics and protein expression data.
tea-main-batch-summary-report	A summary report containing combined gene expression statistics, protein level metrics, and chromatin accessibility data.

Supervised gating

OpenCyto

Supervised gating is the automated application of gating criteria approved by a subject matter expert. First, an R package called FlowCut is used to examine ingested flow cytometry (FCS) files for irregularities introduced when the data was generated on the instrument. This step produces a new QC'd file, with irregularities removed, and a QC report. Next, an R package called OpenCyto is used to put each QC'd file through the supervised gating step itself. This step produces a report, cell population stats, and MFI files.

These result files are not immediately available for further analysis. First, they must be reviewed and approved by a dedicated team of scientists. If a pipeline run is rejected, the team of scientists adjusts the data before making it available to others.

The specific searchable output files for flow cytometry are summarized in Table 4.

TABLE 4
Output file type	Description
FCS file (`.fcs`)	A file that has been QC'd using FlowCut to remove irregularities.
FlowCytometry-decoration-report-html	A pipeline result report file.
FlowCytometry-decoration-report-csv	The output from the flow-qc-report.
FlowCytometry-supervised-stats	A CSV file of the population counts for a kit generated with OpenCyto packages. Used to visualize population-based studies.
FlowCytometry-supervised-report	A report generated by OpenCyto that includes plots showing the gates.
FlowCytometry-supervised-mfis	A CSV file containing MFI values for cell populations, generated using OpenCyto supervised gating methods.
FlowCytometry-supervised-hierarchy-report	A PNG graph showing the hierarchical gating structure.
FlowCytometry-supervised-gating-set-pb	One of three output files generated with the `save_gs` function, so that the gating set can be loaded into your IDE.
FlowCytometry-supervised-gating-set-h5	The second of three output files generated with the `save_gs` function, so that the gating set can be loaded into your IDE.
FlowCytometry-supervised-gating-set-gs	The third of three output files generated with the `save_gs` function, so that the gating set can be loaded into your IDE.
FlowCytometry-supervised-comp	A CSV file that captures the compensation used during the supervised gating process.
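As an illustration of what an MFI file summarizes, the sketch below computes a median fluorescence intensity per gated population from per-cell rows. The column names (`population` and the marker name) and the function name are hypothetical; the real OpenCyto outputs may be organized differently.

```python
from collections import defaultdict
from statistics import median


def median_intensity_by_population(rows, marker):
    """Return the median intensity of `marker` for each population.

    rows: iterable of dicts, one per cell, with a "population" key and
    a numeric (or numeric string) value under the marker's name.
    Both column names are assumptions made for this example.
    """
    values = defaultdict(list)
    for row in rows:
        values[row["population"]].append(float(row[marker]))
    return {pop: median(vals) for pop, vals in values.items()}
```

Rows in this shape could come from `csv.DictReader` applied to a per-cell export. The median is used here because it is less sensitive than the mean to a few very bright events.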

CyAnno (default)

The CyAnno pipeline is a machine learning framework that uses panel-specific models to label the cell types in a dataset. Unlike the OpenCyto pipeline, the CyAnno pipeline has no intermediate QC step.

The specific searchable output files for this pipeline are shown in Table 5.

TABLE 5
Output file type	Description
FlowCytometry-labeled-expr-csv	A CSV report of each cell and its labeled cell type.
FlowCytometry-prediction-report	A collection of plots visualizing the cell population reports.
FlowCytometry-summary-frequency-stats	A CSV file containing cell population summaries.
FlowCytometry-decoration-report-csv	The input file of the CyAnno process.
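The link between the per-cell labels and the summary frequency statistics can be sketched as follows. This is only an illustration under assumptions: the function name and the output fields (`count`, `percent`) are hypothetical, and the actual CyAnno summary file may contain different columns.

```python
from collections import Counter


def population_frequencies(labels):
    """Summarize per-cell labels as counts and percentages per cell type.

    labels: iterable of cell-type labels, one per cell.
    Returns a dict of label -> {"count": int, "percent": float}.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: {"count": n, "percent": 100.0 * n / total}
        for label, n in counts.items()
    }
```

An empty input yields an empty summary, because the percentage is only computed for labels that occur at least once.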

Olink Proteomics

In the pipeline for Olink Proteomics, Olink provides a raw results file and a PDF report on data that's missing because of an analysis problem. The appropriate samples are associated with either the results file or the missing data report.

The specific searchable output files are shown in Table 6.

TABLE 6
Output file type	Description
Olink	An Excel file with the results for an Olink batch.
OlinkReport	A PDF certificate of analysis for an Olink batch.

5-prime V(D)J

In the 5-prime V(D)J pipeline, T cell receptor (TCR) and B cell receptor (BCR) contig information accompanies the scRNA-seq data. This is a multimodal pipeline whose core scientific analysis method is, again, Cell Ranger alignment. Cell Ranger multi aligns the scRNA-seq and contig sequences together, which improves cell calling. The alignment produces an H5 output file for scRNA-seq and CSV files for both TCR and BCR contig information.

In the cell hashing pipeline, scRNA-seq data are processed the same way as in the simple scRNA-seq pipeline. The TCR/BCR contig CSV file is demultiplexed by HTO barcode and merged with each sample in the pool.

After the scRNA-seq and contig files are dehashed and merged, the 5-prime V(D)J pipeline adds the TCR/BCR contig information, arranged by cell, to the metadata of an H5 file. The pipeline also produces TCR/BCR CSV files arranged by contig. The labeling pipeline is the same as for scRNA-seq, and the labels are based on scRNA-seq data.
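The dehash-and-merge step for contigs can be sketched in a few lines. The example assumes contig rows carry the `barcode`, `chain`, and `cdr3` columns found in Cell Ranger contig annotation CSVs, and that a barcode-to-sample mapping is already available from HTO demultiplexing. The function name and the output structure are hypothetical and are not the pipeline's actual metadata layout.

```python
from collections import defaultdict


def contigs_by_cell(contig_rows, cell_to_sample):
    """Group contig records by sample, then by cell barcode.

    contig_rows: iterable of dicts with "barcode", "chain", and "cdr3".
    cell_to_sample: dict of cell barcode -> sample ID from dehashing.
    Contigs whose barcode has no sample assignment are dropped.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for row in contig_rows:
        sample = cell_to_sample.get(row["barcode"])
        if sample is None:
            continue  # barcode was not assigned to any sample
        merged[sample][row["barcode"]].append(
            {"chain": row["chain"], "cdr3": row["cdr3"]}
        )
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(cells) for sample, cells in merged.items()}
```

Arranging the records by cell in this way parallels the per-cell contig information stored in the H5 metadata, while the original row order corresponds to the CSV files arranged by contig.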

The specific searchable output files are shown in Table 7.

TABLE 7
Output file type	Description
vdj-main-batch-summary-report	A file containing all quality metrics from Cell Ranger output, HTO QC, ADT QC, and scRNA QC, as well as some basic QC of TCR/BCR contig.
scRNA HTO merge summary	A report generated by merging all the wells in the batch.
scRNA HTO count processing report	A report quantifying the HTO reads from multiplexed single-cell RNA sequencing experiments, including metrics on cell barcode identification and HTO assignment.
scRNA seq labeled	A Seurat-based, labeled H5 file for a sample. The contig information of TCR/BCR is stored in the metadata.
scRNA seq labeled report	A Seurat-based, labeled report for the entire batch.
scRNA CellRanger summary	The `web_summary.html` file generated by 10x Genomics Cell Ranger multi.
TCR contig	A CSV file containing the contig information of T cell receptors, arranged by chain type.
BCR contig	A CSV file containing the contig information of B cell receptors, arranged by chain type.

Fixed RNA

The fixed-RNA pipeline takes FASTQ files from short-read sequencers and applies a number of steps to generate a decorated H5 file. The initial processing is done by the 10x Genomics Cell Ranger tool, which produces a cell-by-gene matrix and performs demultiplexing if necessary. The pipeline then adds extra metadata to these H5 files and generates a QC report. This QC report determines if any samples, wells, or pools should be excluded from downstream processing.

Once the samples are approved, the pipeline merges the multiplexed samples into a single H5 file per sample. This consolidation is followed by cell-type labeling, using one of the currently available references (PBMCs or BMMCs). This final step in the pipeline produces a decorated H5 file with additional metadata and cell-type labels.

The searchable output files are summarized in Table 8.

TABLE 8
Output file type	Description
frna-labeled-h5	A labeled H5 file for a given sample.
frna-qc-report	An fRNA QC report.
celltypist-csv	A CellTypist CSV file.
celltypist-labeled-h5	An fRNA CellTypist H5 file.
fRNA-Seq-tenx-report	A pipeline result report file.

Meso Scale Discovery

MSD uses electrochemiluminescence to detect and quantify multiple biomarkers at once. MSD assays analyze multiple cytokines simultaneously in complex biological samples. They are highly sensitive, have a broad dynamic range, and perform better in serum than some other multiplex platforms.

The specific searchable output files are shown in Table 9.

TABLE 9
Output file type	Description
MSD-analyzed-data-CSV	A CSV file containing protein concentrations calculated from standard curves. Uses MSD raw signal data as input.
MSD-raw-data	A text file output containing MSD raw signal data.
MSD-standard-curve-data-CSV	A CSV file containing standard curve metrics. Uses MSD raw signal data as input.
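To show how a concentration can be read off a standard curve, the sketch below uses a four-parameter logistic (4PL) model, a common choice for immunoassay curves. That HISE fits a 4PL curve is an assumption made here for illustration; the parameter names and both function names are hypothetical.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Predicted signal for a concentration under a 4PL curve.

    Assumes conc > 0. The signal rises from `bottom` toward `top`
    and reaches the midpoint when conc equals ec50.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def concentration_from_signal(signal, bottom, top, ec50, hill):
    """Invert the 4PL curve to estimate concentration from raw signal.

    Returns None when the signal lies outside the open interval
    (bottom, top), where the curve cannot be inverted.
    """
    if not bottom < signal < top:
        return None
    ratio = (top - bottom) / (signal - bottom) - 1.0
    return ec50 / ratio ** (1.0 / hill)
```

Returning `None` for out-of-range signals is one way to flag samples that fall below or above the quantifiable range, instead of extrapolating beyond the fitted curve.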

Spatial analysis

Spatial analysis uses spatial and molecular data to map the distribution of biomarkers. These maps yield insights into tissue architecture and cellular phenotype, helping scientists study molecular processes and mechanisms of disease.

The specific searchable output files are shown in Table 10.

TABLE 10
Output file type	Description
spatial-analysis-metric-view	Visualization of spatial patterns, relationships, and metrics.
spatial-analysis-qc-report	Pipeline result report file.

Related Resources

Use Advanced Search (Tutorial)