This page describes various identifiers used in the metadata.
Various components of the submission system have symbolic identifiers. These identifiers consist of ASCII upper- and lower-case alphabetic characters, numbers, and underscores. They must not begin with numbers. In some cases, specific prefixes are required.
Feature identifiers (transcripts and genes) and read identifiers may contain any combination of printable, non-whitespace ASCII characters. This is more restrictive than what GTF allows.
Identifiers assigned by Synapse are in the form `syn123456`.
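The identifier rules above can be checked with regular expressions. The following is a minimal sketch, not the validation code used by the submission system; the function names are hypothetical, and the Synapse pattern assumes only "syn followed by one or more digits", since the page gives a single example rather than a digit count.

```python
import re

# Symbolic identifiers: ASCII letters, digits, and underscores,
# and the first character must not be a digit.
SYMBOLIC_ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Feature and read identifiers: one or more printable, non-whitespace
# ASCII characters (code points 0x21 through 0x7E).
FEATURE_ID_RE = re.compile(r"[\x21-\x7e]+")

# Synapse identifiers: "syn" followed by digits (digit count is an assumption).
SYNAPSE_ID_RE = re.compile(r"syn[0-9]+")


def is_symbolic_id(text: str) -> bool:
    """Return True if text follows the symbolic identifier rules."""
    return SYMBOLIC_ID_RE.fullmatch(text) is not None


def is_feature_id(text: str) -> bool:
    """Return True if text is a valid feature or read identifier."""
    return FEATURE_ID_RE.fullmatch(text) is not None


def is_synapse_id(text: str) -> bool:
    """Return True if text looks like a Synapse-assigned identifier."""
    return SYNAPSE_ID_RE.fullmatch(text) is not None
```

Explicit character classes are used instead of `\w` and `\d` because, on Python `str` patterns, those shorthands also match non-ASCII letters and digits.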
iso_detect_ref
- Challenge 1: transcript isoform detection with a high-quality genome
iso_quant
- Challenge 2: transcript isoform quantification
iso_detect_de_novo
- Challenge 3: de novo transcript isoform detection

human
- taxon 9606
mouse
- taxon 10090
manatee
- taxon 127582
simulated
- simulated reads

The following symbols identify the data category to which an entry belongs:
long_only
- uses only LRGASP-provided long-read RNA-Seq data from a single sample, library preparation method, and sequencing platform.
short_only
- uses only LRGASP-provided short-read Illumina RNA-Seq data from a single sample. This category exists for comparison with long-read approaches.
long_short
- uses only LRGASP-provided long-read and short-read RNA-Seq data from a single long-read library preparation method and the Illumina platform. Additional accessioned data in public genomics data repositories can also be used.
long_genome
- uses only LRGASP-provided long-read RNA-Seq data from a single long-read library preparation method. A non-reference-quality genome sequence can be used.
freestyle
- any combination of at least one LRGASP data set as well as any other accessioned data in public genomics data repositories. For example, multiple library methods can be combined (e.g. PacBio cDNA + PacBio CapTrap, ONT cDNA + ONT CapTrap + ONT R2C2 + ONT dRNA, all data, etc.). LRGASP simulated reads may not be used in freestyle experiments.

Library preparation method
CapTrap
dRNA
R2C2
cDNA
Simplified RNA sequencing platform
Illumina
PacBio
ONT
WTC11
- human WTC11 iPSC cell line
H1_mix
- human H1 ES cell line mixed with human Definitive Endoderm derived from H1
ES
- mouse Castaneus X S129/SvJae F121-9 ES cell line
blood
- manatee whole blood
mouse_simulation
- simulated mouse RNAs
human_simulation
- simulated human RNAs
H1
- human H1 ES cell line for second-phase quantification experiments.
endodermal
- human Definitive Endoderm derived from H1 for second-phase quantification experiments.

The following public data repository symbols are used to specify where non-LRGASP
data used in experiments has been obtained, as specified in the
experiment JSON `extra_libraries` field.
If another public archive is needed, please create an issue in the
LRGASP submissions GitHub tracker.
SRA
- NCBI SRA. The SRA and ENA share an accession namespace and are periodically synchronized. Please use the repository from which you obtained the data.
ENA
- EMBL-EBI ENA. The ENA and SRA share an accession namespace and are periodically synchronized. Please use the repository from which you obtained the data.
INSDC
- One of the INSDC databases (DDBJ, EMBL-EBI/ENA, or NCBI). These share an accession namespace and are synchronized daily.
ENC
- ENCODE DCC.
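A pre-submission check for repository symbols might look like the sketch below. It is illustrative only: this page does not describe the layout of `extra_libraries` entries, so the `"repository"` key and the function name are hypothetical, and the comparison is assumed to be case-sensitive.

```python
# Repository symbols listed on this page.
REPOSITORIES = frozenset({"SRA", "ENA", "INSDC", "ENC"})


def unknown_repositories(extra_libraries):
    """Return, sorted, the repository symbols that are not recognized.

    Each entry is assumed to be a dict with a "repository" key.
    That key name is hypothetical, not taken from the LRGASP schema.
    """
    used = {entry["repository"] for entry in extra_libraries}
    return sorted(used - REPOSITORIES)
```

An empty result means every entry names a known repository. Any symbol that is returned would need to be requested through the GitHub tracker mentioned above.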