[DOI] 10.5524/100777
[Title] Supporting data for "Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana"
[Release Date] 2020-07-30
[Citation] Hamilton, J, P; Godden, G, T; Lanier, E; Bhat, W, W; Kinser, T, J; Vaillancourt, B; Wang, H; Wood, J, C; Jiang, J; Soltis, P, S; Soltis, D, E; Hamberger, B; Buell, C, R (2020): Supporting data for "Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana" GigaScience Database. http://dx.doi.org/10.5524/100777
[Data Type] Genomic, Transcriptomic
[Dataset Summary] Plants exhibit wide chemical diversity due to the production of specialized metabolites which function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs as well as sources of insect repellents, health-promoting compounds, and fragrance.
We report the chromosome-scale genome assembly of Callicarpa americana L. (American beautyberry), a species within the early-diverging Callicarpoideae clade of Lamiaceae, known for its metallic purple fruits and use as an insect repellent due to its production of terpenoids. Using long-read sequencing and Hi-C scaffolding, we generated a 506.1 Mb assembly spanning 17 pseudomolecules with N50 contig and N50 scaffold sizes of 7.5 Mb and 29.0 Mb, respectively. In all, 32,164 genes were annotated, including 53 candidate terpene synthases and 47 putative clusters of specialized metabolite biosynthetic pathways. Our analyses revealed three putative whole-genome duplication events, which, together with local tandem duplications, contributed to the expansion of the terpene synthase gene family. Kolavenyl diphosphate is a gateway to many of the bioactive terpenoids in C. americana; experimental validation confirmed that CamTPS2 encodes kolavenyl diphosphate synthase. Syntenic analyses with Tectona grandis L. f. (teak), a member of the Tectonoideae clade of Lamiaceae known for exceptionally strong wood resistant to insects, revealed 963 collinear blocks and 21,297 C. americana syntelogs.
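The contig and scaffold N50 values quoted above use the standard definition: sort sequence lengths in descending order and report the length at which the running total first reaches half of the total assembly size. The sketch below is a minimal illustration, not the authors' pipeline. It assumes that car_asm.fa is an uncompressed FASTA file and that contigs can be recovered by splitting scaffolds at runs of N. The function names and the `min_gap` threshold are hypothetical, so the output may differ from the published 7.5 Mb figure if the authors used a different gap convention.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an uncompressed FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first whitespace-delimited token of the header.
                name = line[1:].split()[0] if len(line) > 1 else ""
                chunks = []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def contig_lengths(sequence: str, min_gap: int = 1) -> List[int]:
    """Split a scaffold at runs of >= min_gap Ns (assumed gaps); return contig lengths."""
    pattern = "[Nn]{%d,}" % min_gap
    return [len(piece) for piece in re.split(pattern, sequence) if piece]


def assembly_stats(path: str, min_gap: int = 1) -> Tuple[int, int]:
    """Return (contig N50, scaffold N50) in bp for a FASTA assembly."""
    scaffold_lens: List[int] = []
    contig_lens: List[int] = []
    for _, seq in read_fasta(path):
        scaffold_lens.append(len(seq))
        contig_lens.extend(contig_lengths(seq, min_gap))
    return n50(contig_lens), n50(scaffold_lens)
```

For example, `assembly_stats("car_asm.fa")` would return both values in base pairs. Treating every N as a gap (`min_gap=1`) is the most conservative choice. Raising it, for example to the fixed gap length inserted during Hi-C scaffolding, keeps short ambiguous stretches inside contigs.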
Access to the C. americana genome provides a roadmap for rapid discovery of genes encoding plant-derived agrichemicals and a key resource for understanding the evolution of chemical diversity in Lamiaceae.
[File Location] ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100777/
[File name] - [File Description]
Callicarpa_RNA-Seq_TPM_expression_matrix.txt - Gene expression matrix containing the high confidence gene model transcript abundance values (TPM) for each library of the tissue RNA-seq atlas - text file
car_asm.fa - Pseudomolecules and unanchored scaffolds
car.hc_gene_models.cdna.fa - Transcript sequences (cDNA) of high confidence gene models
car.hc_gene_models.cds.fa - Coding sequences (CDS) of high confidence gene models
car.hc_gene_models.gff3 - High confidence gene models annotation in GFF3 format
car.hc_gene_models.pep.fa - Protein sequences of high confidence gene models
car.hc_gene_models.repr.gene_model.list.txt - List of representative high confidence gene model ids
car.hc_gene_models.repr.gff3 - Representative high confidence gene models annotation in GFF3 format
car.hc_gene_models.repr.iprscan.txt - InterProScan output file
car.hc_gene_models.repr.pep.fa - Protein sequences of the representative high confidence gene models
car.working_models.cdna.fa - Transcript sequences (cDNA) of working gene models
car.working_models.cds.fa - Coding sequences (CDS) of working gene models
car.working_models.func_anno.txt - Functional annotation for the working gene models
car.working_models.gff3 - Working gene models annotation in GFF3 format
car.working_models.pep.fa - Protein sequences of working gene models
ca_tg.collinearity - MCScanX output file
full_table_car_final_asm_busco.tsv - BUSCO output files for the final assembly - full table
missing_busco_list_car_final_asm_busco.tsv - BUSCO output files for the final assembly - missing list
Orthogroups.GeneCount.tsv - Orthofinder2 output files - gene count
Orthogroups_SingleCopyOrthologues.txt - Orthofinder2 output files - single copy
Orthogroups.tsv - Orthofinder2 output files - tsv format
Orthogroups.txt - Orthofinder2 output files - txt format
Orthogroups_UnassignedGenes.tsv - Orthofinder2 output files - unassigned genes
short_summary_car_final_asm_busco.txt - BUSCO output files for the final assembly - short summary
[License] All files and data are distributed under the Creative Commons Attribution-CC0 License unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]