[DOI] 10.5524/100777
[Title] Supporting data for "Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana"
[Release Date] 2020-07-30
[Citation] Hamilton, J, P; Godden, G, T; Lanier, E; Bhat, W, W; Kinser, T, J; Vaillancourt, B; Wang, H; Wood, J, C; Jiang, J; Soltis, P, S; Soltis, D, E; Hamberger, B; Buell, C, R (2020): Supporting data for "Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana" GigaScience Database. http://dx.doi.org/10.5524/100777
[Data Type] Genomic, Transcriptomic
[Dataset Summary] Plants exhibit wide chemical diversity due to the production of specialized metabolites which function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs as well as sources of insect repellents, health-promoting compounds, and fragrance.
We report the chromosome-scale genome assembly of Callicarpa americana L. (American beautyberry), a species within the early-diverging Callicarpoideae clade of Lamiaceae, known for its metallic purple fruits and use as an insect repellent due to its production of terpenoids. Using long-read sequencing and Hi-C scaffolding, we generated a 506.1 Mb assembly spanning 17 pseudomolecules with N50 contig and N50 scaffold sizes of 7.5 Mb and 29.0 Mb, respectively. In all, 32,164 genes were annotated, including 53 candidate terpene synthases and 47 putative clusters of specialized metabolite biosynthetic pathways. Our analyses revealed three putative whole-genome duplication events, which, together with local tandem duplications, contributed to gene family expansion of terpene synthases. Kolavenyl diphosphate is a gateway to many of the bioactive terpenoids in C. americana; experimental validation confirmed that CamTPS2 encodes kolavenyl diphosphate synthase. Syntenic analyses with Tectona grandis L. f. (teak), a member of the Tectonoideae clade of Lamiaceae known for exceptionally strong wood resistant to insects, revealed 963 collinear blocks and 21,297 C. americana syntelogs.
Access to the C. americana genome provides a roadmap for rapid discovery of genes encoding plant-derived agrichemicals and a key resource for understanding the evolution of chemical diversity in Lamiaceae.
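The contig and scaffold N50 values quoted in the summary follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is given below; the function name n50 and the toy lengths in the comment are illustrative assumptions, not values or code from this dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together make up at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical lengths, not taken from car_asm.fa):
# total = 20, half = 10; 6 + 5 = 11 reaches the halfway point at 5.
toy_n50 = n50([2, 3, 4, 5, 6])
```

Applied to the sequence lengths in car_asm.fa, a function like this would be expected to give the scaffold N50; the contig N50 would additionally require splitting scaffolds at gaps, which this sketch does not do.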
[File Location] ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100777/
[File name] - [File Description]
Callicarpa_RNA-Seq_TPM_expression_matrix.txt - Gene expression matrix containing the high confidence gene model transcript abundance values (TPM) for each library of the tissue RNA-seq atlas - text file
car_asm.fa - Pseudomolecules and unanchored scaffolds
car.hc_gene_models.cdna.fa - Transcript sequences (cDNA) of high confidence gene models
car.hc_gene_models.cds.fa - Coding sequences (CDS) of high confidence gene models
car.hc_gene_models.gff3 - High confidence gene models annotation in GFF3 format
car.hc_gene_models.pep.fa - Protein sequences of high confidence gene models
car.hc_gene_models.repr.gene_model.list.txt - List of representative high confidence gene model ids
car.hc_gene_models.repr.gff3 - Representative high confidence gene models annotation in GFF3 format
car.hc_gene_models.repr.iprscan.txt - InterProScan output file
car.hc_gene_models.repr.pep.fa - Protein sequences of the representative high confidence gene models
car.working_models.cdna.fa - Transcript sequences (cDNA) of working gene models
car.working_models.cds.fa - Coding sequences (CDS) of working gene models
car.working_models.func_anno.txt - Functional annotation for the working gene models
car.working_models.gff3 - Working gene models annotation in GFF3 format
car.working_models.pep.fa - Protein sequences of working gene models
ca_tg.collinearity - MCScanX output file
full_table_car_final_asm_busco.tsv - BUSCO output files for the final assembly - full table
missing_busco_list_car_final_asm_busco.tsv - BUSCO output files for the final assembly - missing list
Orthogroups.GeneCount.tsv - Orthofinder2 output files - gene count
Orthogroups_SingleCopyOrthologues.txt - Orthofinder2 output files - single copy
Orthogroups.tsv - Orthofinder2 output files - tsv format
Orthogroups.txt - Orthofinder2 output files - txt format
Orthogroups_UnassignedGenes.tsv - Orthofinder2 output files - unassigned genes
short_summary_car_final_asm_busco.txt - BUSCO output files for the final assembly - short summary
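The GFF3 files listed above can be inspected with a few lines of code. The sketch below tallies feature types from the third column of a GFF3 file, relying only on the standard nine-column, tab-delimited layout; the function name count_feature_types is an illustrative assumption, and the feature types actually present in these files have not been checked here.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 features by type (column 3) from an iterable of lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # An embedded FASTA section, if present, ends the annotation part.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts
```

With a local copy of one of the annotation files, for example car.hc_gene_models.repr.gff3, passing the open file handle to count_feature_types would return a per-type tally that can be compared against the gene count reported in the summary.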
[License]
All files and data are distributed under the Creative Commons Attribution-CC0 License unless specifically stated otherwise; see http://gigadb.org/site/term for more details.
[Comments]
[End]