lrgasp-submissions

FAQ: Frequently asked questions

How do I submit predictions for the LRGASP challenges?

I would like to submit a Challenge 1 entry for the “long-only” category from PacBio sequencing. Can you clarify how many files, and for which samples, would be included in the entry?

You should select the challenge, data category, library prep, and sequencing platform, and then submit predictions for all samples that fit those selection criteria (see Challenges for more details). In almost all cases there are multiple replicates per sample; however, only one GTF should be returned for each sample. Different pipelines may treat replicates differently: for example, the data could be combined, or replicate information could be used to determine high-confidence transcripts. That is up to the submitter.

For this inquiry, you also need to specify the library prep, because CapTrap libraries were also sequenced with PacBio. Let’s assume the selection is “Challenge 1, long-reads only, cDNA, PacBio”. A helpful way to find out which samples are included is to search the HTML table of the LRGASP data.

On the HTML table, you can input a “1” to filter the “challenges” column, “cdna” to filter the “library_prep” column, and “pacbio” to filter the “platform” column. We also have this data matrix in TSV format to help with programmatic selection.
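For programmatic selection, the same three filters can be applied to the TSV data matrix. The following is a minimal sketch, not an official tool. The column names “challenges”, “library_prep”, and “platform” come from the table described above; the “sample” column, the comma-separated layout of the “challenges” values, and the example rows are assumptions made purely for illustration, so check them against the real file before relying on this.

```python
import csv
import io


def select_samples(tsv_text, challenge="1", library_prep="cdna", platform="pacbio"):
    """Return the data-matrix rows that match the given selection.

    Assumes the 'challenges' column holds comma-separated challenge numbers
    (e.g. "1,2"); this layout is an assumption, not confirmed by the docs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        challenges = [c.strip() for c in row["challenges"].split(",")]
        if (
            challenge in challenges
            and row["library_prep"].lower() == library_prep.lower()
            and row["platform"].lower() == platform.lower()
        ):
            selected.append(row)
    return selected


# Made-up rows for illustration only; these are NOT real LRGASP metadata.
EXAMPLE_TSV = (
    "sample\tchallenges\tlibrary_prep\tplatform\n"
    "sampleA\t1,2\tcDNA\tPacBio\n"
    "sampleB\t1,2\tCapTrap\tPacBio\n"
    "sampleC\t3\tcDNA\tPacBio\n"
    "sampleD\t1,2\tcDNA\tONT\n"
)

matches = select_samples(EXAMPLE_TSV)
print([row["sample"] for row in matches])
```

To use this on the real data matrix, read the downloaded TSV file into a string and pass it to `select_samples` in place of `EXAMPLE_TSV`.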

These selections give the following samples that are used for Challenge 1 with a cDNA prep, sequenced on PacBio.

We want to collect results on both simulated and real data from each computational pipeline, and we would like to ensure that tools are robust across different organisms.

So in this scenario, you would submit five GTF files and five corresponding read map files, one of each per sample. This page gives an overview of the submission file structure, which contains a subfolder for each of these samples: https://lrgasp.github.io/lrgasp-submissions/docs/submission.html.

The R2C2 libraries have both size-selected and non-size-selected libraries. How do I submit results?

The data from sequencing of the different library preparations were kept separate; however, the data for each bioreplicate should be combined. For example, when submitting R2C2 libraries for quantification, the data from the size-selected and non-size-selected libraries for the same bioreplicate should be combined, and the file accessions should be separated by a comma (,) in the header of the transcript expression matrix.
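As an illustration of the comma-delimited header, here is a small sketch that builds a header line for a tab-separated expression matrix. The accession strings are hypothetical placeholders, and the leading “ID” column and tab-separated layout are assumptions; consult the submission documentation for the exact format.

```python
def combined_column_name(accessions):
    """Join the file accessions of one bioreplicate with commas."""
    return ",".join(accessions)


# Hypothetical placeholder accessions, one list per bioreplicate:
# a size-selected and a non-size-selected library each.
replicates = [
    ["ACC_REP1_SIZE_SELECTED", "ACC_REP1_NON_SIZE_SELECTED"],
    ["ACC_REP2_SIZE_SELECTED", "ACC_REP2_NON_SIZE_SELECTED"],
]

# "ID" as the first column is an assumption made for this sketch.
header_fields = ["ID"] + [combined_column_name(r) for r in replicates]
header_line = "\t".join(header_fields)
print(header_line)
```

Each bioreplicate then gets a single expression column whose name lists every file that contributed to it.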

The manatee samples have 2 PacBio and 2 ONT runs (size selected and normal, but both cDNA). Do you know if I should make 4 submissions for those or combine size-selected with normal ones?

Data from different long-read sequencing platforms should not be combined unless the submission is “freestyle”. For the manatee submissions, the data from the size-selected and non-size-selected libraries should be combined and included in the submission. It is up to the submitter how to combine the information, e.g. by simply merging the reads or by treating size-selected and non-size-selected libraries differently during processing.

I notice some libraries have multiple files. I also notice other oddities of the data. Are these known issues?

What is the plan for authorship for the LRGASP study?

The LRGASP Consortium effort is being submitted as a Registered Report. The author list of Stage 1 of the LRGASP Registered Report consists of the LRGASP Organizers. The Stage 2, and final, version of the manuscript will include submission participants as co-authors, primarily as “LRGASP Submission Participants”.