Metadata files are in JSON format. JSON provides a good compromise between the ability to store structured data and ease of use. Templates and a validator are provided.
entry.json
This file contains information about the entry and the team that is submitting
it. It sits at the top of an entry tree (see Submission structure).
See entry.json for an example. An empty template is also available: entry.json.
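As a rough illustration of the shape of this file, the sketch below builds an entry as a Python dictionary and serializes it to JSON. Only the field names are taken from this section; every value is a hypothetical placeholder, the use of lists for the "one or more of" fields is an assumption, and treating every field except notes as required is also an assumption. The provided validator remains the authority on what is accepted.

```python
import json

# Hypothetical values throughout; only the field names come from this document.
entry = {
    "entry_id": "example_entry",
    "challenge_id": "example_challenge",  # placeholder, not a real challenge identifier
    "team_name": "example_team",          # should match the name in Synapse
    "experiment_ids": ["example_experiment"],
    "data_category": "long_only",
    "samples": ["WTC11"],
    "library_preps": ["cDNA"],
    "platforms": ["PacBio"],
    "notes": "",
    "contacts": [
        # The first contact is considered the primary contact.
        {"name": "Example Person", "email": "person@example.org", "notes": "Example Institute"},
    ],
}

# Assumption: every top-level field except the optional notes must be present.
REQUIRED_ENTRY_FIELDS = [
    "entry_id", "challenge_id", "team_name", "experiment_ids", "data_category",
    "samples", "library_preps", "platforms", "contacts",
]


def missing_entry_fields(entry):
    """Return the assumed-required top-level fields that are absent from an entry."""
    return [field for field in REQUIRED_ENTRY_FIELDS if field not in entry]


entry_json_text = json.dumps(entry, indent=4)
```

Writing `entry_json_text` to a file named entry.json at the top of the entry tree would give a file of the expected shape, which can then be checked with the provided validator.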
entry_id - submitter-assigned symbolic identifier for this entry.
challenge_id - challenge to which this entry is being submitted. See LRGASP Challenge identifiers.
team_name - the name of the Synapse team or Synapse user submitting the entry. This should exactly match the name in Synapse.
experiment_ids - experiment ids; each is also the name of the directory containing the experiment and is a symbolic identifier for this entry.
data_category - one of long_only, short_only, long_short, long_genome, or freestyle. See Experiment data categories.
samples - one or more of WTC11, H1_mix, ES, blood, mouse_simulation, or human_simulation. See Sample identifiers.
library_preps - one or more of CapTrap, dRNA, R2C2, or cDNA, as allowed by data category. See Library prep.
platforms - one or more of Illumina, PacBio, or ONT, as allowed by data category. See Sequencing platform.
notes - notes (optional)
contacts - an array of contacts, with the first entry considered the primary contact
  name - name of the contact
  email - e-mail of the contact, which can be an e-mail list
  notes - notes about the contact (optional). Institutional information can be included here.
experiment.json
This file describes the experiment, specifying all data files. One is created
in each experiment directory (see Experiment structure).
Data files are either in the experiment directory or in its sub-directories. All file paths in
experiment.json are relative to the directory containing experiment.json.
See experiment.json for an example.
An empty template is also available: experiment.json.
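The relative-path rule can be sketched as follows. The helper name `resolve_data_file` and the example directory and file names are hypothetical and chosen only for illustration; the sketch simply joins a path taken from experiment.json onto the directory that contains that file.

```python
from pathlib import Path


def resolve_data_file(experiment_json, relative_path):
    """Resolve a path listed in experiment.json against the directory that holds it."""
    return Path(experiment_json).parent / relative_path


# Hypothetical layout: an experiment directory with a data file in a sub-directory.
resolved = resolve_data_file("my_entry/my_experiment/experiment.json", "results/data_file.gz")
```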
experiment_id - symbolic identifier for this experiment, defined by the submitter.
challenge_id - challenge to which this entry is being submitted, see LRGASP Challenge identifiers. This must match the value in entry.json.
description - description of experiment
notes - notes (optional)
species - one of human, mouse, manatee, or simulated, see Species identifiers.
data_category - one of long_only, short_only, long_short, long_genome, or freestyle. See Experiment data categories.
samples - one or more of WTC11, H1_mix, ES, blood, mouse_simulation, or human_simulation. See Sample identifiers.
library_preps - one or more of CapTrap, dRNA, R2C2, or cDNA, as allowed by data category. See Library prep.
platforms - one or more of Illumina, PacBio, or ONT, as allowed by data category. See Sequencing platform.
libraries - list of LRGASP RNA-Seq file accessions used in the experiment. The file accessions are those found in the LRGASP RNA-Seq Data Matrix. For non-freestyle experiments, only replicates of the same sample and library preparation may be used, and they must come from one sequencing method, or one sequencing method plus Illumina short-read sequencing. For freestyle experiments, any combination of LRGASP libraries may be specified, with at least one LRGASP library being used. For paired-end Illumina experiments, both files of the pair must be specified.
extra_libraries - list of accessions of non-LRGASP library files that were used. Optional; should be empty or omitted for non-freestyle experiments. For Challenge 3, it may also include any external transcripts that are used.
  repository - public repository where the data was obtained; one of the values in Public repository identifiers
  acc - accession in a public repository for the input data file.
  notes - notes about the file (optional)
software - list of software used by the pipeline:
  name - the name of the software package
  description - description of software (optional)
  version - version of the software
  url - URL to the software repository
  config - command line and/or configuration options
  notes - notes about the software or how it was used (optional)
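A few of the constraints described for experiment.json can be checked with a short script before running the provided validator. The sketch below is not that validator: the function name `check_experiment` and all example values are hypothetical, and the check that experiment_id appears in the entry's experiment_ids is an assumption drawn from the description of that field.

```python
DATA_CATEGORIES = {"long_only", "short_only", "long_short", "long_genome", "freestyle"}


def check_experiment(entry, experiment):
    """Return a list of problems found when comparing experiment.json to entry.json."""
    problems = []
    # The challenge_id must match the value in entry.json.
    if experiment.get("challenge_id") != entry.get("challenge_id"):
        problems.append("challenge_id does not match entry.json")
    # Assumption: each experiment is listed in the entry's experiment_ids.
    if experiment.get("experiment_id") not in entry.get("experiment_ids", []):
        problems.append("experiment_id is not listed in entry.json experiment_ids")
    if experiment.get("data_category") not in DATA_CATEGORIES:
        problems.append("unknown data_category")
    # extra_libraries should be empty or omitted unless the experiment is freestyle.
    if experiment.get("data_category") != "freestyle" and experiment.get("extra_libraries"):
        problems.append("extra_libraries given for a non-freestyle experiment")
    return problems


# Hypothetical, heavily abbreviated metadata used only to exercise the checks.
example_entry = {"challenge_id": "example_challenge", "experiment_ids": ["example_experiment"]}
example_experiment = {
    "experiment_id": "example_experiment",
    "challenge_id": "example_challenge",
    "data_category": "long_only",
}
example_problems = check_experiment(example_entry, example_experiment)
```

Returning a list of messages rather than raising on the first failure lets a submitter see every mismatch in one pass.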