Understand Certificates of Reproducibility
At a Glance
A Certificate of Reproducibility (CertPro) is a detailed record of your scientific research, generated in real time to foster reliable reproducibility of study results. This step-by-step trace includes your data, scientific algorithms, computational environment, tool stack, and results.
Abbreviations Key
Abbreviation | Meaning
AI | artificial intelligence
CertPro | Certificate of Reproducibility
HISE | Human Immune System Explorer
IDE | integrated development environment
ML | machine learning
Background
Scientific research has proliferated in the past decade, generating a huge volume of data. Advances in AI/ML and the wider availability of computing resources have driven a rise in large-scale computational analysis. But the pressure to publish, stiff competition for funding, natural human biases, and other factors encourage inaccurate reporting in published work. As a result, research findings often can't be reproduced, which calls into question the validity of the original studies. Closing this reproducibility gap requires traceability: a record of exactly how each result was produced.
Description
Detailed documentation is essential for reproducibility of large-scale data analyses. Our framework generates a reproducible trace called a Certificate of Reproducibility (CertPro). CertPro documents your data, methods, and tools so that other scientists can verify your analysis. A certificate is a badge of honor showing that your work meets the highest standards of scientific rigor, transparency, and integrity.
The data trace includes the data itself, enabling other scientists to reproduce your original results. These executable traces are set up in a dedicated collaboration space where the steps can be re-run. CertPro offers a transparent, literate analysis of your thought process as you carry out your research. This clear thread promotes team communication and reduces duplication of effort. The accompanying image shows part of a certificate.
Real-time tracking
In a retroactive approach, the scientific team backtracks in an effort to reconstruct its analysis path. In our proactive approach, the scientific team actively tracks its own analysis methodology as the study unfolds.
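To illustrate the difference between reconstructing a path afterward and recording it as you go, the sketch below wraps analysis functions so that each call is logged as soon as it completes. It is a minimal illustration, not how HISE or CertPro captures steps; the `tracked` decorator, the in-memory `TRACE` list, and the example functions `normalize` and `top_fraction` are all hypothetical.

```python
import functools
import time

# Hypothetical in-memory log of executed steps, filled during the analysis.
TRACE: list[dict] = []


def tracked(func):
    """Append a record as each call to func completes, so the log is built
    while the analysis runs rather than reconstructed afterward."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        result = func(*args, **kwargs)
        TRACE.append({
            "step": func.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "started": started,
            "seconds": time.time() - started,
        })
        return result
    return wrapper


@tracked
def normalize(values):
    # Hypothetical processing step: scale values so they sum to 1.
    total = sum(values)
    return [v / total for v in values]


@tracked
def top_fraction(values):
    # Hypothetical summary step: report the largest fraction.
    return max(values)
```

Calling `top_fraction(normalize([1, 3]))` leaves two entries in `TRACE`, one for `normalize` and one for `top_fraction`, in the order they ran. A call that raises an exception is not recorded in this simplified version.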
Data provenance
Provenance (origin) tracing based on file metadata shows the precise path of your data, from samples to results. The data trace shows an uninterrupted thread from ingest receipt to published result, with at least one processing step in between.
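The "uninterrupted thread" requirement can be pictured as a walk from a result back through each file's recorded parent. The sketch below is a simplified illustration under stated assumptions: each file has a single parent, and the `Step` record, its `kind` labels ("ingest", "process", "result"), and `trace_is_complete` are hypothetical names, not CertPro's data model. Real provenance graphs often have several inputs per step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Hypothetical trace entry: a file and the input it was derived from."""
    file_id: str
    kind: str                     # "ingest", "process", or "result" (illustrative)
    parent: Optional[str] = None  # file_id of the input, None for ingested data


def trace_is_complete(steps: dict[str, Step], result_id: str) -> bool:
    """Walk parent links from a result back to ingest.

    Returns True only if every parent exists (no gaps), the chain ends at an
    ingest record, and at least one processing step lies in between.
    """
    current = steps.get(result_id)
    if current is None or current.kind != "result":
        return False
    seen = set()
    processing_steps = 0
    while current.parent is not None:
        if current.file_id in seen:  # guard against cyclic metadata
            return False
        seen.add(current.file_id)
        parent = steps.get(current.parent)
        if parent is None:  # missing link: the thread is interrupted
            return False
        if parent.kind == "process":
            processing_steps += 1
        current = parent
    return current.kind == "ingest" and processing_steps >= 1
```

A chain of ingest, then process, then result passes this check; a result derived directly from ingested data, or one whose parent record is missing, does not.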
Environment
CertPro captures the operating system, dependencies, and packages you used so that your analysis can be reliably reproduced.
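As a rough idea of what an environment record can contain, the sketch below collects the operating system, Python version, and installed package versions using only the standard library. It is an illustration, not CertPro's capture mechanism, and `capture_environment` is a hypothetical name. It covers only the Python layer; system libraries, containers, and non-Python tools would need separate handling.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Return a JSON-serializable snapshot of the current Python environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with incomplete metadata
            packages[name] = dist.version
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
        # Sort case-insensitively so snapshots are easy to diff.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(capture_environment(), indent=2))
```

Saving this output alongside results makes it possible to compare two runs and spot a changed package version.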
Immutable file storage
CertPro generates immutable files to create a reproducible data trace that can't be accidentally overwritten.
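One common way to make stored files effectively immutable is content addressing: name each file by a hash of its bytes, refuse to overwrite existing files, and remove write permission. The sketch below shows that technique under the assumption of a local directory store; it is not a description of how CertPro implements immutability, and `store_immutable` is a hypothetical helper.

```python
import hashlib
import os
import stat
from pathlib import Path


def store_immutable(data: bytes, store_dir: Path) -> Path:
    """Write data under its SHA-256 digest and make the file read-only.

    Because the filename is derived from the content, the same bytes always
    map to the same path, and different bytes can never replace a stored file.
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / hashlib.sha256(data).hexdigest()
    if path.exists():
        return path  # identical content is already stored
    # "xb" mode raises FileExistsError instead of clobbering an existing file.
    with open(path, "xb") as f:
        f.write(data)
    # Drop write permission for everyone to guard against accidental edits.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return path
```

Storing the same bytes twice returns the same path, so references to a file in a trace stay valid for as long as the store exists.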
Components
Certificates are awarded for file sets, visualizations, and notebooks. Reports and GitHub repositories are not eligible for certification. The vertices that represent these components in the certificate graph are color coded, as shown in the accompanying key.
Visibility and Access
A certificate can grow as research progresses or be pared down as publication assets are removed. You can mask sensitive fields or delete metadata as necessary, and you can preview your certificate repeatedly as your analysis proceeds. You can assign access privileges to others, such as reviewers. After publication, any HISE user can execute all or part of your public trace.
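Masking a field and deleting metadata can be thought of as transformations applied to a copy of the record before others view it. The sketch below assumes metadata is a flat Python dictionary; `prepare_for_sharing` and the `"***"` placeholder are hypothetical and do not reflect HISE's interface.

```python
from copy import deepcopy

MASK = "***"  # hypothetical placeholder shown in place of a masked value


def prepare_for_sharing(metadata: dict, mask: set[str], drop: set[str]) -> dict:
    """Return a copy of metadata with some fields masked and others removed.

    The original dict is left untouched, so the owner keeps the full record
    while the shared view hides sensitive values.
    """
    shared = deepcopy(metadata)
    for key in drop:
        shared.pop(key, None)  # remove the field entirely
    for key in mask:
        if key in shared:
            shared[key] = MASK  # keep the field name, hide its value
    return shared
```

Keeping masked fields as placeholders, rather than deleting them, lets a reviewer see that a value exists without seeing the value itself.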
Certificate Generation
Certificates are generated in a dedicated workspace within your organization's account. (New users and guests work in guest accounts.) HISE creates a trace as you work in the collaboration space to transform your data into analyses, insights, and new assets, such as visualizations. CertPro moves backward through the steps to create a graph composed of vertices (nodes) and connections that outline the processing pipeline and note the availability of data. As a result of this process, transparent, traceable analyses are awarded CertPro credentials. To get started, see Related Resources.
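The backward pass can be sketched as a breadth-first traversal that starts at a finished asset and follows each step's recorded inputs back to the source data, marking whether each asset's data is still available. This is a minimal illustration, assuming every asset lists the inputs its producing step consumed; `TraceGraph`, `build_trace`, `inputs_of`, and `available` are hypothetical names, and CertPro's actual algorithm is not documented here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TraceGraph:
    # Asset id -> whether its underlying data is available.
    vertices: dict[str, bool] = field(default_factory=dict)
    # Directed connections from an input asset to the asset derived from it.
    edges: set[tuple[str, str]] = field(default_factory=set)


def build_trace(final_asset: str,
                inputs_of: dict[str, list[str]],
                available: set[str]) -> TraceGraph:
    """Walk backward from a final asset through the steps that produced it."""
    graph = TraceGraph()
    queue = deque([final_asset])
    while queue:
        asset = queue.popleft()
        if asset in graph.vertices:
            continue  # already reached via another path
        graph.vertices[asset] = asset in available
        for parent in inputs_of.get(asset, []):
            graph.edges.add((parent, asset))
            queue.append(parent)
    return graph
```

Starting from the finished asset, rather than from every ingested file, keeps the graph limited to the steps that actually contributed to that result.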
Related Resources
Explore Certificates of Reproducibility