[DOI] 10.5524/102378
[Title] Supporting data for "Genome assembly of an Australian native grass species reveals a recent whole genome duplication and biased gene retention of genes involved in stress response"
[Release Date] 2023-04-21
[Citation] De Silva, N, P; Lee, C; Battlay, P; Fournier-Level, A; Moore, J, L; Hodgins, K, A (2023): Supporting data for "Genome assembly of an Australian native grass species reveals a recent whole genome duplication and biased gene retention of genes involved in stress response" GigaScience Database. http://dx.doi.org/10.5524/102378
[Data Type] Genomic,Transcriptomic
[Dataset Summary] The adaptive significance of polyploidy has been extensively debated, and chromosome-level genome assemblies of polyploids can provide insight into this topic. The Australian grass, Bothriochloa decipiens, belongs to the BCD clade, a group with a complex history of hybridization and polyploidy. This is the first genome assembly and annotation of a species that belongs to this fascinating yet complex group.
Using a combination of Illumina short reads, 10X Genomics linked reads and Hi-C sequencing data, we assembled a highly contiguous genome of Bothriochloa decipiens, with a total length of 1,218.22 Mb and a scaffold N50 of 42.637 Mb. Comparative analysis revealed that the species is a diploidized allotetraploid. We clustered the 20 major scaffolds, representing the 20 chromosomes, into the two subgenomes of the parental species using unique repeat signatures. We found evidence of biased fractionation and of differences in the activity of transposable elements between the subgenomes prior to hybridization. Duplicates were enriched for genes involved in transcription and response to external stimuli like drought, supporting a biased retention of duplicated genes following whole genome duplication.
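For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `scaffold_n50` and the toy lengths are illustrative assumptions; this is not the authors' pipeline and does not reproduce the 42.637 Mb figure.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy lengths, not taken from the B. decipiens assembly.
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))  # 40: 50 + 40 = 90 covers at least half of 150
```

In practice the lengths would be read from the assembly fasta file; that step is omitted here to keep the sketch free of assumptions about tooling.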
Our results support hypotheses explaining the biased retention of duplicated genes following polyploidy and point to differences in repeat activity associated with subgenome dominance. Bothriochloa decipiens is a widespread species with the ability to establish across many soil types, making it useful for ecological restoration of Australian grasslands. This reference genome is a valuable resource for future population genomic research involving Australian grasses, which may be helpful in ecological restoration projects.
[File Location] https://ftp.cngb.org/pub/gigadb/pub/10.5524/102001_103000/102378/
[File name] - [File Description]
Bdec_final_genome_assembly.fasta - Final genome assembly fasta file
Bdec_genes.fasta - Coding gene nucleotide sequences fasta file
Bdec_genes.gff - Coding gene annotations gff file
Bdec_proteins.fasta - Coding gene translated sequences protein fasta file
BUSCO_full_table.tsv - BUSCO results output file of full table
BUSCO_missing_table.tsv - BUSCO results output file of missing table
BUSCO_short_summary.txt - BUSCO results output file of short summary
fig1_all_repeats.txt - The text file with the start, end and the chromosome of all annotated repeats in the 20 longest scaffolds of the genome
fig1_cytoband.csv - The csv file with the lengths of the 20 longest scaffolds of the genome used to draw the circlize plot in fig1
fig1_DNA_reps_cytoband.csv - The text file with the start, end and the chromosome of all annotated DNA
fig1_GC_content_cytobamd.csv - The text file with the start, end and the chromosome of all annotated GC content
fig1_genes.csv - The text file with the start, end and the chromosome of all annotated genes in the 20 longest scaffolds of the genome missing data
fig1_LTR_reps_cytobad.csv - The text file with the start, end and the chromosome of all annotated LTR
fig2A_bd.collinearity - The collinear blocks from the Bothriochloa decipiens genome aligned against itself
fig2A_bd.gff - The gff file with the gene annotations of the Bothriochloa decipiens genome
fig2B_so_bd.collinearity - The collinear blocks from the Bothriochloa decipiens genome aligned against the Sorghum bicolor genome
fig2B_so_bd.gff - The gff file with the gene annotations of the Bothriochloa decipiens genome and the Sorghum bicolor genome
fig3_all.kmer.table - All kmers and kmer locations in the 20 longest scaffolds of the genome
fig4B_repeat_family_45_alingn.txt - The alignment of repeat family 45 using gblocks software
fig4B_repeat_family_47_alingn.txt - The alignment of repeat family 47 using gblocks software
fig4B_repeat_family_53_alingn.txt - The alignment of repeat family 53 using gblocks software
fig4B_repeat_family_61_alingn.txt - The alignment of repeat family 61 using gblocks software
fig4B_repeat_family_63_alingn.txt - The alignment of repeat family 63 using gblocks software
fig4B_repeat_family_68_alingn.txt - The alignment of repeat family 68 using gblocks software
fig4B_repeat_family_91_alingn.txt - The alignment of repeat family 91 using gblocks software
FigS1_scaffold_alingmnet.txt - The self-alignment of the genome assembly using minimap
figS2_bd.collinearity - The collinear blocks from the Bothriochloa decipiens genome aligned against itself
figS2_bd.gff - The gff file with the gene annotations of the Bothriochloa decipiens genome
figS3_subgenomeA_kmer_location.bed - The bed file with the location of kmers in subgenome A
figS3_subgenomeB_kmer_location.bed - The bed file with the location of kmers in subgenome B
figS4_GOterms_for_all_genes.txt - The Gene Ontology terms associated with all the genes of the genome
figS4_GOterms_for_duplicated_genes.txt - The Gene Ontology terms associated with duplicated genes of the genome
fig4B_repeat_family_70_alingn.txt - The alignment of repeat family 70 using gblocks software
figS4_GOterms_for_singlecopy_genes.txt - The Gene Ontology terms associated with single-copy genes of the genome
FigS6_flowcytometry_data.csv - The summary of the flow cytometry runs of all the samples mentioned in the methods section of the study
FigS7_manual_corrected_final_assembly.bed - The bed file of the manual corrections of the genome assembly
FigS7_manual_corrected_final_assembly.break_report.txt - The information on where the misjoins were and how breaks were included in the manually corrected assembly
FigS7_manually_corrected_assembly.agp - The juicebox output of the manual correction of the assembly
FigS8_filtered_manual.assembly - The coordinates of the scaffold starts and ends of the manually corrected assembly
Phylo_tree.Newick - Newick file for phylogenetic tree of the Andropogoneae showing the time of divergence (MYA) between taxa
readme_102378.txt - None
B_decipiens_genome-main.zip - Archival copy of the GitHub repository https://github.com/NissankaPD/B_decipiens_genome downloaded 04-April-2023. R scripts used in the manuscript titled "Genome assembly of an Australian native grass species reveals a recent whole genome duplication and biased gene retention of genes involved in stress response". This project is licensed under the MIT license. Please refer to the GitHub repo for most recent updates.
Bdec_repeats.gff - Repeat annotation gff file
Bdec_transcriptome_assembly.fasta - De novo transcriptome assembly fasta file
fig4B_repeat_family_29_alingn.txt - The alignment of repeat family 29 using gblocks software
[License] All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]