For details on result files that must be submitted, see Submission structure.
Challenge 1 (iso_detect_ref)

Samples:
- WTC11 (human iPSC cell line)
- H1_mix (human H1 ES cell line mixed with human Definitive Endoderm derived from H1)
- ES (mouse ES cell line)
- human_simulation (simulated human reads: Illumina, ONT cDNA, and PacBio cDNA)
- mouse_simulation (simulated mouse reads: Illumina and PacBio cDNA, ONT dRNA)

Files:
- models.gtf.gz
- read_model_map.tsv.gz
Challenge 2 (iso_quant)

Samples:
- WTC11 (human iPSC cell line)
- H1_mix (human H1 ES cell line mixed with human Definitive Endoderm derived from H1)
- human_simulation (simulated human reads: Illumina, ONT cDNA, and PacBio cDNA)
- mouse_simulation (simulated mouse reads: Illumina and PacBio cDNA, ONT dRNA)

Files:
- expression.tsv.gz
- models.gtf.gz
Challenge 3 (iso_detect_de_novo)

Samples:
- Manatee (manatee whole blood)
- ES (mouse ES cell line)

Files:
- rna.fasta.gz
- read_model_map.tsv.gz
Computational methods may have been developed and tuned for a specific sequencing platform, library prep approach (e.g., ONT dRNA), or use of additional orthogonal data; therefore, entries are organized so that different tools can be compared using the same type of data. It is also important to evaluate how robust computational tools are to transcript analysis in different species or biological samples. Thus, for each entry to a challenge, a team will select a data category, library prep, and sequencing platform and submit experiments for all samples that are available for that challenge + library prep + sequencing platform combination. The available samples for each combination can be found in the LRGASP RNA-seq Data Matrix. Note that simulated samples are also available and should be selected for Challenges 1 and 2.
Each entry must meet the following requirements:
In all the above categories, the genome and transcriptome references specified by LRGASP should be used. For the long and short category and the freestyle category, additional transcriptome references can be used.
All replicates must be used in each experiment. For Challenge 1, different pipelines may treat replicates differently: for example, the data could be combined, or replicate information could be used to determine high-confidence transcripts. The choice is up to the submitter.
Each team can submit multiple entries for each challenge; however, they can only submit one entry per challenge + data type + library prep + sequencing platform combination. This is to encourage tool development that is robust to different library preps and sequencing platforms, while preventing multiple entries that differ only by subtle parameter changes.
For Challenge 1, the submitted GTF file should only contain transcripts that have been assigned a read.
For Challenge 2, submitters have the option of quantifying against the reference transcriptome or a transcriptome derived from the data (i.e., results from Challenge 1). The GTF used for quantification is included as part of the Challenge 2 submission. Submitters are allowed to submit one quantification using the reference and one quantification using a custom GTF.
Challenge 2 submissions must report the quantification of each replicate separately in the expression matrix.
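The Challenge 1 requirement that the GTF contain only transcripts with an assigned read can be checked before submission. The following is a minimal sketch, not an official validator. It assumes that read_model_map.tsv.gz is a tab-separated file with a header line, the read ID in the first column, and the transcript ID in the second, and that GTF lines carry a `transcript_id "..."` attribute in the ninth column. Both layouts are assumptions here; see Submission structure for the authoritative formats. All function names are hypothetical.

```python
import gzip
import re

# Assumed GTF attribute syntax: transcript_id "XYZ";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_path):
    """Collect transcript IDs from the attribute column of a gzipped GTF."""
    ids = set()
    with gzip.open(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # not a well-formed GTF record
            match = _TRANSCRIPT_ID.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def mapped_transcript_ids(map_path, has_header=True):
    """Collect transcript IDs from column 2 of the read map (assumed layout)."""
    ids = set()
    with gzip.open(map_path, "rt") as handle:
        if has_header:
            next(handle, None)  # drop the assumed header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[1]:
                ids.add(fields[1])
    return ids


def transcripts_without_reads(gtf_path, map_path):
    """Return transcript IDs present in the GTF but absent from the read map."""
    return sorted(gtf_transcript_ids(gtf_path) - mapped_transcript_ids(map_path))
```

An empty list from `transcripts_without_reads("models.gtf.gz", "read_model_map.tsv.gz")` would suggest that every transcript in the GTF has at least one assigned read, under the layout assumptions stated above.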
Due to the challenges of isoform-level quantification and the lack of a gold standard, we devised a mixture sample, H1_mix, in which an undisclosed ratio of two samples is mixed before sequencing. For validation, we sequenced H1 and endodermal cell (derived from H1) samples individually to establish the isoforms present in only one or the other sample before mixing. In essence, the pre-mixed sample represents the “ground truth” of isoform expression before the mix. After the close of LRGASP submissions on October 1, the H1 and endodermal cell data will be released. [...] H1_mix. Libraries and computational pipelines can then be evaluated based on how well the transcript quantification in the H1_mix sample represents the expected ratios determined from quantification from the individual cell lines, H1 and endodermal cell. These submissions will be due 1 month after the main close of submissions, November 24.
In all the above categories, except for freestyle, a transcriptome reference CANNOT be used.
The submitted FASTA file should only contain transcripts that have been assigned a read.
Each team can submit multiple entries for each challenge; however, they can only submit one entry per challenge + data type + library prep + sequencing platform combination. This is to encourage tool development that is robust to different library preps and sequencing platforms, while preventing multiple entries that differ only by subtle parameter changes.
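The one-entry-per-combination rule can be sketched as a uniqueness check over a team's planned entries. This is an illustration only: the `Entry` record, its field names, and the function name are hypothetical and are not part of the LRGASP submission format, and the example values are placeholders built from names that appear above.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical record describing one planned entry."""
    challenge: str
    data_category: str
    library_prep: str
    platform: str


def duplicate_combinations(entries):
    """Return the combinations that occur more than once among the entries."""
    counts = Counter(entries)
    return sorted(combo for combo, n in counts.items() if n > 1)


# Placeholder entries; the third repeats the first combination.
entries = [
    Entry("iso_detect_ref", "freestyle", "cDNA", "ONT"),
    Entry("iso_detect_ref", "freestyle", "cDNA", "PacBio"),
    Entry("iso_detect_ref", "freestyle", "cDNA", "ONT"),
]

for combo in duplicate_combinations(entries):
    print("more than one entry for:", " + ".join(combo))
```

Using a tuple of the four fields as the key mirrors the rule as stated: two entries conflict only when all four fields match, so entries that differ in platform or library prep remain allowed.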