Metadata standards and repositories for Applied Sciences/TNW

Open Data
Research Data Management
FAIR
Author

Esther Plomp

Published

September 4, 2022

Relevant metadata standards

ISA-Tab

MIBBI (Minimum Information for Biological and Biomedical Investigations)

NeXus

OME-XML (Open Microscopy Environment XML)

PDBx/mmCIF (Protein Data Bank Exchange Dictionary and the Macromolecular Crystallographic Information Framework)

CIF (Crystallographic Information Framework)

QuDEx (Qualitative Data Exchange Format)

SDMX (Statistical Data and Metadata Exchange)

The Synthetic Biology Open Language (SBOL) & SBOL Visual

STRENDA

Croissant

MeSH (Medical Subject Headings), a controlled vocabulary used to index biomedical information (see also MeSH on Demand, a tool that automatically detects MeSH terms in a text).

Standard for Learning Object Metadata

Potentially relevant metadata standards

Genome Metadata

CF (Climate and Forecast) Metadata Conventions

CIM (Common Information Model)

CSMD (Core Scientific Metadata Model)

DIF (Directory Interchange Format)

ISO 19115

Observations and Measurements

CERIF (Common European Research Information Format)

Data Package

ABCD (Access to Biological Collection Data)

MODS (Metadata Object Description Schema)

EDAM-bioimaging
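
Several of the standards listed above boil down to a small machine-readable descriptor stored next to the data. As a rough illustration, the sketch below builds a minimal Data Package style descriptor (normally saved as datapackage.json) using only the Python standard library. The property names (`name`, `title`, `resources`, `path`, `schema`, `fields`) reflect one reading of the Data Package specification and should be checked against it; the dataset name, file path, and column names are hypothetical.

```python
import json


def build_descriptor(name, title, csv_path, fields):
    """Return a minimal Data Package style descriptor as a plain dict.

    `fields` is a list of (column_name, column_type) tuples.
    """
    return {
        "name": name,
        "title": title,
        "resources": [
            {
                # One resource per data file; this sketch describes a single CSV.
                "name": "measurements",
                "path": csv_path,
                "format": "csv",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical example dataset: names, path, and columns are placeholders.
descriptor = build_descriptor(
    name="example-growth-curves",
    title="Example growth curve measurements",
    csv_path="data/measurements.csv",
    fields=[("sample_id", "string"), ("time_h", "number"), ("od600", "number")],
)

# Serialise to the JSON text that would be written to datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Keeping the descriptor as a plain dictionary makes it easy to generate from a lab notebook export or a script, and to validate later with dedicated tooling.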

Repositories

DNA DataBank of Japan (DDBJ)

European Nucleotide Archive (ENA)

GenBank

dbSNP

European Variation Archive (EVA)

dbVar

Database of Genomic Variants Archive (DGVa)

EBI Metagenomics

NCBI Trace Archive

NCBI Sequence Read Archive (SRA)

International Nucleotide Sequence Database Collaboration

NCBI Assembly

Structural Biology Data Grid

Worldwide Protein Data Bank (wwPDB)

UniProtKB

Biological Magnetic Resonance Data Bank (BMRB)

FlowRepository

LIPID MAPS

PRIDE

SynBioHub

Open Energy Data Initiative (OEDI)

Hugging Face ML datasets (see blog post for more info)

Reporting standards

RNA-seq/qPCR

Microscopy

Genomics

  • GenBank, EMBL-Bank, DDBJ (processed sequence data), Sequence Read Archive (raw sequencing data)

  • Minimum Information about Highly Multiplexed Tissue Imaging (MITI), a standard that applies best practices developed for genomics and for other microscopy data to highly multiplexed tissue images and traditional histology (Schapiro et al. 2022)

  • MIAME standard for microarray data (Brazma et al. 2001)

  • Minimum information about a genome sequence (MIGS) specification (Field et al. 2008)

  • Minimum reporting guidelines for biological and biomedical investigations: the MIBBI project (Taylor et al. 2008)

  • Minimum Information about any (x) Sequence (MIxS), which established a community-based mechanism for sharing genomic data through a common framework

  • FAIR genomes for the medical setting

  • MINSEQE describes the Minimum Information about a high-throughput nucleotide SEQuencing Experiment that is needed to enable the unambiguous interpretation and facilitate reproduction of the results of the experiment.

  • MinSCe is a minimum set of single-cell metadata categories and a checklist of information that can be used to describe a single-cell assay in sufficient detail to enable the analysis of transcriptomic data

  • Ongoing projects such as: https://www.ga4gh.org/product/experiments-metadata-standard/

  • https://genomicsstandardsconsortium.github.io/mixs/

  • From FASTQ to BAM
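
The minimum-information standards listed above (MIAME, MIGS, MINSEQE and similar) are, in practice, checklists of metadata fields that must accompany a dataset. A minimal sketch of such a check is shown below. The field names are hypothetical placeholders and are not the official terms of any of these standards; a real check would load the required terms from the relevant checklist.

```python
# Hypothetical required fields; NOT the official terms of MIxS, MINSEQE, etc.
REQUIRED_FIELDS = (
    "organism",
    "collection_date",
    "sequencing_platform",
    "library_strategy",
)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in `record`."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical sample record with one empty and one absent field.
sample_record = {
    "organism": "Escherichia coli",
    "collection_date": " ",
    "sequencing_platform": "Illumina NovaSeq 6000",
}

problems = missing_fields(sample_record)
if problems:
    print("Missing metadata:", ", ".join(problems))
else:
    print("Record satisfies the checklist.")
```

Running a check like this before deposition catches incomplete records early, when the missing information is still easy to recover.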

Tools

Simulation outputs

Open Research

Open Research: Examples of good practice, and resources across disciplines

Other

MRI data

Image Data

Publications & articles

Battery

“[..E]arly independent battery-data activities include those of Battery Archive, BIG-MAP [(BattINFO ontology)], Batteries Europe, and the Faraday Institution. Their diversity of locations and formats underscores the critical need for a singular approach to improve uniformity.” (Ward et al. 2022)

  • BEEP: A Python library for Battery Evaluation and Early Prediction

  • Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database

  • Battery Microstructures Library

  • MATBOX: Microstructure Analysis Toolbox (Li-ion batteries)

  • BruggemanEstimator: Open Source Code for Estimating the Tortuosity of Lithium-Ion Battery Porous Electrodes from SEM images

  • TauFactor is an application for calculating tortuosity factors from tomographic data.

  • Battery Data Toolkit converts battery testing data from native formats to a standardized HDF5 file.

  • Ontologies for battery data (BattINFO & BVCO)

    • “There are currently two major ongoing initiatives dedicated to ontologizing the battery domain: The Battery Interface Ontology (BattINFO) and the Battery Value Chain Ontology (BVCO). BattINFO describes batteries on the cell level and below, including not only components, materials, and their interfaces, but also electrochemical processes, models, and characterization data. The objective of BattINFO is to support AI workflows and interoperability of battery data in the research and development community. On the other hand, BVCO describes aspects of the battery value chain with a strong focus on battery manufacturing and recycling. Both BattINFO and BVCO stem from the top-level ontology EMMO and are publicly available under open-source licenses.”
  • BIG-MAP Data Management Plan article + deliverable

  • Repositories:

Bio–nano

Minimum information reporting in bio–nano experimental literature (MIRIBEL)

CFD

More information