[DOI] 10.5524/100952
[Title] Supporting data for "A nuclear genome for the Fever tree (Cinchona pubescens Vahl) built from extensive short and long-read DNA"
[Release Date] 2022-07-28
[Citation] Canales, N, A; Pérez-Escobar, O, A; Powell, R, F; Töpel, M; Kidner, C; Nesbitt, M; Maldonado, C; Barnes, C, J; Rønsted, N; Leitch, I, J; Antonelli, A (2022): Supporting data for "A nuclear genome for the Fever tree (Cinchona pubescens Vahl) built from extensive short and long-read DNA" GigaScience Database. http://dx.doi.org/10.5524/100952
[Data Type] Genomic, Transcriptomic
[Dataset Summary] The Andean fever tree (Cinchona L.; Rubiaceae) is the iconic source of bioactive quinine alkaloids, which have been key to treating malaria for centuries. In particular, C. pubescens Vahl has been an important source of income for several countries in its native range in north-western South America. However, the genomic resources required to place Cinchona species in the tree of life and to explore the evolution and biosynthesis of alkaloids are meagre.
Using a combination of ~120 Gb of long reads from the Oxford Nanopore PromethION platform and 142 Gb of short-read Illumina data, we address this gap by providing the first highly contiguous nuclear and organelle genome assemblies and their corresponding annotations. Our nuclear genome assembly consists of 603 scaffolds with a total length of 903 Mb, representing ~85% of the genome size (1.1 Gb/1C). This draft genome sequence was complemented by annotating 72,305 CDSs using a combination of de novo and reference-based transcriptome assemblies. Completeness analysis showed that the assembly recovers 83% of the BUSCO gene set, with a small fraction of genes (4.6%) classified as fragmented. We demonstrate the utility of these novel genomic resources by placing C. pubescens in the order Gentianales using plastid and nuclear datasets.
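The scaffold count and total length reported above can be checked against the released nuclear assembly with a few lines of Python. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and it assumes the assembly file (mergedAss_sizeSelec_4PAD_P9018_3KbpINSERT_raconPOLISH.fasta.gz in the file list) has already been downloaded to a local path and is a standard gzip-compressed FASTA.

```python
import gzip
from typing import Iterable, Tuple


def assembly_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sequences, total bases) for FASTA-formatted lines."""
    n_seqs = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new scaffold
        else:
            total_bases += len(line)  # sequence lines contribute bases
    return n_seqs, total_bases


def assembly_stats_from_gz(path: str) -> Tuple[int, int]:
    """Same as assembly_stats, reading a gzip-compressed FASTA from disk.

    `path` is a local file path chosen by the user (an assumption here).
    """
    with gzip.open(path, "rt") as handle:
        return assembly_stats(handle)


if __name__ == "__main__":
    # Toy input: two scaffolds of 11 and 4 bases.
    toy = [">scaffold_1", "ACGTACGT", "ACG", ">scaffold_2", "GGCC"]
    print(assembly_stats(toy))  # (2, 15)
```

Run on the downloaded assembly, `assembly_stats_from_gz` should return values comparable to the 603 scaffolds and ~903 Mb stated in the summary. The total counts every character on sequence lines, including any N gap padding.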
Our study provides the first genomic resource for C. pubescens, thus opening new research avenues, including the unravelling of the gene toolkit for alkaloid biosynthesis in the fever tree.
[File Location] https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100952/
[File name] - [File Description]
loci_alignments.tar.gz - the loci alignments (353 loci) before building the phylogenomic tree.
mergedAss_P9018_3KbpINSERT_raconPOLISH_longer.codingseq - structural prediction and identity of the nuclear genes from the comprehensive transcriptome assembly showing the coding sequences
mergedAss_P9018_3KbpINSERT_raconPOLISH_longer.mrna - structural prediction and identity of the nuclear genes from the comprehensive transcriptome assembly showing the mRNA sequences
mergedAss_P9018_3KbpINSERT_raconPOLISH_longer.gff - structural prediction and identity of the nuclear genes from the comprehensive transcriptome assembly in General Feature Format (GFF).
mergedAss_P9018_3KbpINSERT_raconPOLISH_longer.cdsexons - structural prediction and identity of the nuclear genes from the comprehensive transcriptome assembly showing the CDS exons
genome_full_table.tabular - Genome BUSCO results. Full table
genome_short_summary.txt - Genome BUSCO results. Short summary
genome__missing_buscos.tabular - Genome BUSCO results. Missing BUSCOs table
transcript_short_summary.txt - Transcriptome BUSCO results. Short summary
transcript_missing_buscos.tabular - Transcriptome BUSCO results. Missing BUSCOs table
transcript_full_table.tabular - Transcriptome BUSCO results. Full table
CP9104_hyb_HP1_LSC_IR_SCC_IR_right_dir.fasta - plastid genome assembly
compreh_init_build.fasta.gz - reference-based and de novo assembly of transcripts using Trinity 2.8 on the trimmed and filtered RNA-seq data.
mergedAss_sizeSelec_4PAD_P9018_3KbpINSERT_raconPOLISH.fasta.gz - nuclear genome assembly
RAxML_bootstrap-90gaps_consensus.boot.tre - bootstrapped trees
rub_chl_genome_align.fasta - multifasta for all the taxa used
rub_chl_genome_align.nex - nexus alignment
rub_chl_genome_align_Tree_new.newick - alignment in newick version
rub_gaps90_RAxML_MajorityRuleConsensusTree.con.tre.tre - consensus tree
bestTree.tar.gz - Best tree for each locus
bipartitions.tar.gz - Bipartition values
brlabels.tar.gz - Branch labels of all alignments
LB20.tar.gz - The gene trees (353 loci) input to build the nuclear phylogenomic tree.
species_genti353.tre - file representing the rooted phylogenomic species tree for the nuclear data within 19 Gentianales species
species_genti353_rooted.tre - file representing the phylogenomic species tree for the nuclear data within 19 Gentianales species
mergedAss_P9018_3KbpINSERT_raconPOLISH_longer.aa - structural prediction and identity of the nuclear genes from the comprehensive transcriptome assembly showing the amino acid sequences
hints_CP.est.gff - gff file used as hints for the nuclear genome annotation with Augustus.
Table_2_supplementary_material.csv - Supplementary Table 2. Overview of the samples from the Tree of Life Explorer (Royal Botanic Gardens, Kew) that were used in the phylogenetic analysis to construct the coalescent tree.
[License] All files and data are distributed under the Creative Commons Attribution-CC0 License unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]