Attach Metadata To Files
Automated pipelines enable HISE’s pipeline service to generate Result Files with Subject and Sample metadata attached to each file; pipelines standardize the data processing for an assay. However, there are situations in which data is experimental or part of a pilot, making it unsuitable for pipelines.
For situations where files don’t need to be processed in an automated pipeline, you can upload your files directly to a Project Store and then download them onto your IDE. By default, no metadata is attached to these files, but you can attach metadata either at ingest or later, after some experimentation has been done. We highly recommend attaching metadata to your files at ingest so that other HISE users can easily find your files via Advanced Search.
Attaching Metadata at Ingest
Create Manifest
In order to attach metadata to files going to a Project Store at ingest, you first create a manifest file that defines all the files you are ingesting, along with their File Types and Sample reference. This must be named “manifest.csv” in order for it to be properly ingested into HISE.
Below is a sample manifest.csv file. The first two lines show the account and project to which this data belongs. The third line is a static line showing three column headers: file, samples, fileType. The remaining lines define the name of the file, the associated sample, and the file type. If a file is not associated with a sample, the special keyword “reference” can be used.
accountGuid: 10f58583-1cdf-4f18-8de4-dc1ca94783e2 | ||
projectGuid: urn:hise:project:any | ||
file | samples | fileType |
EXP-00236-vizgen-submission.xlsx | KT00970;KT00971;KT00972 | Eln-Vizgen-Metadata |
EXP-0422_LN_region_0.vzg | KT00970;KT00971;KT00972 | vizgen_file |
subdirectory/population-stats.csv | KT00970;KT00971;KT00972 | FlowCytometry |
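If you prefer to build the manifest in your IDE rather than in LIMS, the layout shown above can be sketched in Python. This is a minimal sketch, not HISE tooling: the function names are hypothetical, and it assumes the file is plain comma-separated text in which the optional account and project rows occupy the first cell of an otherwise empty row.

```python
import csv
import io


def manifest_rows(entries, account_guid=None, project_guid=None):
    """Return manifest rows as lists of three cells.

    entries: iterable of (relative_path, sample_ids, file_type).
    An empty sample_ids list is written as the special keyword "reference".
    """
    rows = []
    # Optional first two rows (assumed layout: value in the first cell).
    if account_guid:
        rows.append([f"accountGuid: {account_guid}", "", ""])
    if project_guid:
        rows.append([f"projectGuid: {project_guid}", "", ""])
    rows.append(["file", "samples", "fileType"])  # static header line
    for path, samples, file_type in entries:
        sample_cell = ";".join(samples) if samples else "reference"
        rows.append([path, sample_cell, file_type])
    return rows


def manifest_text(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = manifest_rows(
    [
        ("EXP-0422_LN_region_0.vzg", ["KT00970", "KT00971", "KT00972"], "vizgen_file"),
        # Illustrative entry with no associated sample.
        ("subdirectory/population-stats.csv", [], "FlowCytometry"),
    ],
    account_guid="10f58583-1cdf-4f18-8de4-dc1ca94783e2",
    project_guid="urn:hise:project:any",
)
text = manifest_text(rows)
```

Writing `text` to a file named manifest.csv in the folder you plan to archive gives you a starting point; compare it against a manifest generated by LIMS before relying on it.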
Note that for the manifest.csv file:
- The first two lines contain the Account and Project this data belongs to
- Note that the first two rows are not required, but can be used if you want to ensure the data is put in the correct Account and Project (in case, for example, the file was dropped in the wrong watchfolder).
- The third line is a static line "file, samples, fileType"
- The remaining lines define the name of the file, the associated Sample, and the File Type of the file
- Multiple Samples use the delimiter “;”
- Adhere to the folder structure when defining files in the manifest
- Note that if a file is not associated with a Sample, the special keyword "reference" can be used.
To create a manifest.csv from our LIMS (SLIMS) you must be using a Simplified ELN experiment. Select the block with your content:
From the top menu select “Generic Generate HISE Manifest”:
Select the Manifest you would like to create and click “Finish”:
If the File Type that you want to declare does not exist in the Project or the manifest is not available in LIMS, please contact immunology-support@alleninstitute.org.
Create Tar File
After creating a manifest file, you need to tarball all of the files listed in your manifest file along with the manifest file. Once the tar file is dropped in the watchfolder, HISE’s decorator service will untar everything and upload data to the Project store that the watchfolder is linked to, and will add the Samples and/or File Types declared in the manifest.
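Because every file listed in the manifest has to travel in the same tar file, it can help to check the folder before archiving it. The following is a minimal sketch with hypothetical helper names; it assumes manifest.csv is comma-separated and sits at the top of the folder you are about to archive.

```python
import csv
from pathlib import Path


def listed_files(manifest_path):
    """Return the file paths listed below the 'file,samples,fileType' header."""
    with open(manifest_path, newline="") as handle:
        rows = [[cell.strip() for cell in row] for row in csv.reader(handle)]
    header = ["file", "samples", "fileType"]
    heads = [row[:3] for row in rows]
    if header not in heads:
        raise ValueError("header line 'file,samples,fileType' not found")
    start = heads.index(header) + 1
    # Skip blank rows; the first cell of each remaining row is the file path.
    return [row[0] for row in rows[start:] if row and row[0]]


def missing_files(folder):
    """Return manifest entries that do not exist relative to the folder."""
    folder = Path(folder)
    return [
        name
        for name in listed_files(folder / "manifest.csv")
        if not (folder / name).is_file()
    ]
```

Calling `missing_files("/home/jupyter/bulk_upload")` returns an empty list when every listed file is present in the folder, including files in subdirectories.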
To create a tar file, first move all of your files into a single folder. Then open a terminal and run the following command:
tar -cvf desired_tarball_name.tar source-directory-name
For example, to archive a directory called /home/jupyter/bulk_upload, you can run the following tar command:
tar -cvf project_store_batch.tar /home/jupyter/bulk_upload
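If you are working in a notebook rather than a terminal, the same archive can be produced with Python's standard tarfile module. This is a sketch only: `make_tar` is a hypothetical helper, and it stores members relative to the source folder so that manifest.csv sits at the top level of the archive. Whether the decorator expects that layout or the folder itself is an assumption to verify with a small test upload.

```python
import tarfile
from pathlib import Path


def make_tar(source_dir, tar_path):
    """Write an uncompressed tar of every file inside source_dir.

    Write tar_path outside source_dir so the archive does not include itself.
    Returns the member names as stored in the archive.
    """
    source_dir = Path(source_dir)
    with tarfile.open(str(tar_path), "w") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir (assumed layout).
                arcname = path.relative_to(source_dir).as_posix()
                archive.add(str(path), arcname=arcname)
    with tarfile.open(str(tar_path)) as archive:
        return archive.getnames()
```

The returned member names should match the paths in the file column of the manifest, plus manifest.csv itself.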
Ingest Tar into Watch Folder
Once you have the tar file, you can upload it into the watch folder. Navigate to the watch folder that is configured for the project you’re working on and drag and drop the tar file into it.
After a brief time has passed, you should see your files in the project store with the samples and file types you specified. To confirm your files were ingested properly, navigate to Data Processing and select Ingest Receipts from the dropdown selection.
You should see that each file has a "status==success". If this is not the case and you are unable to resolve the error, don’t hesitate to reach out to the support team by emailing immunology-support@alleninstitute.org.
After the tar file has been ingested successfully, you will also receive an email listing how the files were tagged and where they are available.
Use the Project Store UI to Attach Metadata
Before you can add file metadata, you first need to select the projects you are working on. Navigate to your Personal Space and select Projects and make sure your desired projects are selected.
Now navigate to your Personal Space again, but this time select Project Stores. Then select your project.
Now select all the files and click “Add file metadata” in the top right corner of the screen. This opens a new prompt where you can add metadata like file type, sample kit GUID, or batch ID.
Lastly, click “Submit” and you should now see your selected files with values for the fields that were filled in.
cohort | cohortDescription | subjectGuid | sampleKitGuid | birthYear | daysSinceFirstVisit | ethnicity | race | sex | specimenGuid | totalCellCount | visitDetails | visitName |
subjectA | sampleBB | 100 | NA | LastVisit | ||||||||
subjectA | sampleBC | 50 | NA | moreVisit | ||||||||
sampleBB | specimenB | 10000 | ||||||||||
sampleBC | specimenC | 8888 |
The demographics scheme mapping must be set up by an admin prior to ingesting a demographics file. Each Project has its own demographics scheme. Subject demographics for associated samples are provided as part of the manifest delivered to the HISE wet lab, and this data is automatically transferred to HISE. However, it is also possible to submit some demographics data through a watchfolder instead (for example, as part of a set of survey data).
The following table shows a sample demographics scheme:
Variable in CSV | Variable in HISE |
birthYear | subject.birthYear |
cohort | cohort.cohortGuid |
cohortDescription | cohort.description |
daysSinceFirstVisit | sample.daySinceFirstVisit |
draw date | sample.drawDate |
ethnicity | subject.ethnicity |
race | subject.race |
sampleKitGuid | sample.sampleKitGuid |
sex | subject.biologicalSex |
specimenGuid | specimen.specimenGuid |
subjectGuid | subject.subjectGuid |
totalCellCount | specimen.totalViableCellCount |
visitDetails | sample.visitDetails |
visitName | sample.visitName |
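Before submitting a demographics file, you may want to confirm that every column in your CSV has an entry in the scheme. The sketch below copies the sample scheme from the table above into a dictionary and reports unmapped columns. It is illustrative only: each Project's real scheme is configured by an admin and may differ, and `unmapped_columns` is a hypothetical helper, not a HISE function.

```python
import csv
import io

# Sample scheme from the table above: CSV variable -> HISE variable.
DEMOGRAPHICS_SCHEME = {
    "birthYear": "subject.birthYear",
    "cohort": "cohort.cohortGuid",
    "cohortDescription": "cohort.description",
    "daysSinceFirstVisit": "sample.daySinceFirstVisit",
    "draw date": "sample.drawDate",
    "ethnicity": "subject.ethnicity",
    "race": "subject.race",
    "sampleKitGuid": "sample.sampleKitGuid",
    "sex": "subject.biologicalSex",
    "specimenGuid": "specimen.specimenGuid",
    "subjectGuid": "subject.subjectGuid",
    "totalCellCount": "specimen.totalViableCellCount",
    "visitDetails": "sample.visitDetails",
    "visitName": "sample.visitName",
}


def unmapped_columns(csv_text, scheme=DEMOGRAPHICS_SCHEME):
    """Return header names from the CSV text that the scheme does not map."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name.strip() for name in header if name.strip() not in scheme]
```

An empty result means every column header in the file has a mapping in this sample scheme.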
Vizgen Submission
Submission sheets are the means of ingest into HISE that allow a pipeline to be run. A Vizgen Submission can be created from our LIMS using a Vizgen Simplified ELN (Electronic Lab Notebook) Experiment in the same way a HISE Manifest is created. Alternatively, a Submission can be created offline, but care must be taken that it matches the format described in this section.
The following fields are used to populate the Vizgen Submission sheet:
AifiBarcode: Unique Identifier of the Tissue content generated by LIMS.
Reagent Panel: Gene panel reagent used
The basic file name has a syntax as follows:
Prefix_Vizgen_Submission_#.xlsx
The Prefix can be any combination of letters and numbers and can include “-”. It is recommended that this conform to the experiment unique identifier in LIMS or a Batch ID. “Submission” can have a capital letter s or lower case letter s. The ending number “#” must be a single digit 0-9.
Example File Names that would be handled by the decorator as a Vizgen Submission:
EXP-0123_Vizgen_submission_1.xlsx
EXP-500_Vizgen_Submission_2.xlsx
B150_Vizgen_Submission_1.xlsx
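When a Submission is created offline, the naming rule above can be checked before upload. The pattern below is one reading of that rule (prefix of letters, digits, and “-”, then “_Vizgen_”, “Submission” with either case of s, and a single trailing digit). It is a sketch, not the decorator's actual matching logic, which is not shown here.

```python
import re

# Prefix_Vizgen_Submission_#.xlsx, as described in the naming rule.
VIZGEN_SUBMISSION_NAME = re.compile(
    r"[A-Za-z0-9-]+_Vizgen_[Ss]ubmission_[0-9]\.xlsx"
)


def is_vizgen_submission(file_name):
    """Return True if the whole file name follows the naming rule."""
    return VIZGEN_SUBMISSION_NAME.fullmatch(file_name) is not None
```

Under this reading, the three example names above match, while a name ending in a two-digit number such as `_10.xlsx` does not.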
The Vizgen Submission Workbook has two Worksheets, “Header” and “Tissue”:
Example Header Worksheet:
SubmissionBC | Type |
NDRI1020 | FFPE MERSCOPE slide |
- SubmissionBC: Subject of the tissue(s)
- Type: The Tissue sub-type from LIMS
Example Tissue Worksheet:
Tissue | Region | Specimen Type | Sample | Vizgen Panel | Section Distance |
TIS04416-001-004 | 1;2 | Tissue - Intact | KT04416 | VA00183 | 10 μm |
- Tissue: AifiBarcode of the tissue content in LIMS.
- Region: Region for image processing in the pipeline; multiple regions are separated by a “;” delimiter.
- Specimen Type: Content Type from LIMS
- Sample: Sample Aliquot Kit AifiBarcode that the Tissue content is from.
- Vizgen Panel: Gene panel that has been linked to the Vizgen Experiment in LIMS.
- Section Distance: Value entered in LIMS as the distance from the previous tissue section.
Note: A submission ID is created from the SubmissionBC on the header tab and the Batch ID or Experiment ID in the prefix of the file name. The Submission ID must be unique in order for the Submission to be created in HISE.
Last updated 2024-12-05