Data was simulated using the following tools:
Transcripts were generated with the same reference genomes and annotations that are used for the LRGASP challenge. Prior to simulation, polyA tails were appended to all transcript sequences, and artificial novel isoforms were inserted into the reference transcriptome.
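The polyA-appending step can be illustrated with a minimal sketch. It is not the code used to produce the LRGASP data. The tail length (`tail_len`), the helper names `read_fasta` and `append_polya`, and the toy sequences are all hypothetical, since the actual parameters are not disclosed.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def append_polya(seq: str, tail_len: int = 30) -> str:
    """Return the sequence with a polyA tail of tail_len bases (assumed length)."""
    return seq + "A" * tail_len


# Toy transcriptome, for illustration only.
example = [">tx1", "ACGT", "GGCC", ">tx2", "TTTT"]
tailed = {name: append_polya(seq, tail_len=5) for name, seq in read_fasta(example)}
print(tailed)  # {'tx1': 'ACGTGGCCAAAAA', 'tx2': 'TTTTAAAAA'}
```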
The PacBio error rate was estimated from the real LRGASP PacBio CCS data and is expected to be ~1.6%. For ONT data, the default pre-trained models included in the NanoSim package were used. Expected error rates are ~16% for ONT cDNA and ~11% for ONT dRNA.
The detailed parameters used for data simulation will remain undisclosed to all participants for the entire duration of the LRGASP challenge.
For the convenience of LRGASP challenge participants, we created a simulation wrapper that makes it easy to generate synthetic data with the described characteristics. Thus, any participant may simulate data and perform their own benchmarks prior to submission. The wrapper is available here.
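To give a sense of scale, the short sketch below converts the approximate error rates quoted above into the expected number of erroneous positions per read. The 1000 nt read length and the name `expected_errors` are illustrative assumptions. The calculation treats the rate as a uniform per-base average, which is a simplification of real error profiles.

```python
# Approximate expected error rates quoted in the text.
ERROR_RATES = {
    "PacBio CCS": 0.016,
    "ONT cDNA": 0.16,
    "ONT dRNA": 0.11,
}


def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of erroneous positions, assuming a uniform per-base rate."""
    return read_length * error_rate


READ_LENGTH = 1000  # hypothetical read length, for illustration only
for platform, rate in ERROR_RATES.items():
    errors = expected_errors(READ_LENGTH, rate)
    print(f"{platform}: ~{errors:.0f} errors per {READ_LENGTH} nt read")
```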