[DOI] 10.5524/102193
[Title] The haplotype-resolved chromosome pairs and transcriptome data of a heterozygous diploid African cassava cultivar
[Release Date] 2022-02-11
[Citation] Qi, W; Lim, Y; Patrignani, A; Schläpfer, P; Bratus-Neuenschwander, A; Grüter, S; Chanez, C; Rodde, N; Prat, E; Vautrin, S; Fustier, M; Pratas, D; Schlapbach, R; Gruissem, W (2022): The haplotype-resolved chromosome pairs and transcriptome data of a heterozygous diploid African cassava cultivar. GigaScience Database. http://dx.doi.org/10.5524/102193
[Data Type] Genomic, Transcriptomic
[Dataset Summary] Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome.
Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler Hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present two chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy above QV46, contig N50 above 18 Mbp, BUSCO completeness of 99%, and 35 K phased gene loci, it is the most accurate, continuous, complete and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue-specific and inconsistent across different tissues. Direction-shifting was observed in less than 2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR-retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
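For readers unfamiliar with the two headline metrics quoted above, the following is a minimal sketch of how a contig N50 and a Phred-scaled consensus quality value (QV) are conventionally computed. It is not taken from the dataset's scripts; the function names `n50` and `qv_to_error_rate` and the toy contig lengths are hypothetical and used only for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not from the TME204 assembly).
toy_lengths = [40, 30, 20, 10]
print("N50 of toy contigs:", n50(toy_lengths))

# QV46 corresponds to roughly one expected error per 10**4.6 bases.
rate = qv_to_error_rate(46)
print("Per-base error rate at QV46:", rate)
print("Bases per expected error:", round(1 / rate))
```

Under this convention, a higher QV means fewer expected consensus errors, and a higher N50 means that half of the assembly sits in longer contigs.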
The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.
[File Location] https://ftp.cngb.org/pub/gigadb/pub/10.5524/102001_103000/102193/
[File name] - [File Description]
readme_102193.txt - None
TME204.HiFi_HiC_allmap.hap1.agp - TME204 hap1 allmap AGP (A Golden Path) file
TME204.HiFi_HiC_allmap.hap1.fasta - TME204 hap1 chromosome, scaffold, and haplotig sequences
TME204.HiFi_HiC_allmap.hap2.agp - TME204 hap2 allmap AGP file
TME204.HiFi_HiC_allmap.hap2.fasta - TME204 hap2 chromosome, scaffold, and haplotig sequences
TME204.HiFi_HiC_allmap.hap1.bed - Unique alignments of genetic markers against TME204 hap1
TME204.HiFi_HiC_allmap.hap2.bed - Unique alignments of genetic markers against TME204 hap2
TME204.HiFi_HiC.haplotig1.fa - TME204 hap1 phased contigs
TME204.HiFi_HiC.haplotig2.fa - TME204 hap2 phased contigs
weights.txt - weight of genetic maps used by allmap
full_table.viridiplantae_odb10.busco_hap1.tsv - TME204 hap1 busco full table
full_table.viridiplantae_odb10.busco_hap2.tsv - TME204 hap2 busco full table
missing_busco_list.viridiplantae_odb10.busco_hap1.tsv - TME204 hap1 busco missing genes
missing_busco_list.viridiplantae_odb10.busco_hap2.tsv - TME204 hap2 busco missing genes
short_summary.viridiplantae_odb10.busco_hap1.txt - TME204 hap1 busco summary
short_summary.viridiplantae_odb10.busco_hap2.txt - TME204 hap2 busco summary
hap1_hap2.Assemblytics_results.zip - assemblytics comparison of TME204 hap1 and hap2 haplotigs
hap1_hap2.delta - nucmer alignment of TME204 hap1 and hap2 haplotigs
ref_hap1.Assemblytics_results.zip - assemblytics comparison of AM560 contigs and TME204 hap1 haplotigs
ref_hap1.delta - nucmer alignment of AM560 contigs and TME204 hap1 haplotigs
TME204_AM560.sv.vcf.gz - SV between AM560 inbred genome and TME204 diploid genome, identified by alignments of TME204 HiFi reads against AM560 contigs
ref_hap2.delta - nucmer alignment of AM560 contigs and TME204 hap2 haplotigs
hap1_hap2.report - dnadiff comparison of TME204 hap1 and hap2 haplotigs
ref_hap1.report - dnadiff comparison of AM560 contigs and TME204 hap1 haplotigs
ref_hap2.report - dnadiff comparison of AM560 contigs and TME204 hap2 haplotigs
TME204_hap1_hap2.sv.vcf.gz - SV between TME204 hap1 and hap2, identified by alignments of TME204 HiFi reads against TME204 hap1 haplotigs
TME204.rep-families.fa - Consensus sequences for each repeat family identified by RepeatModeler
TME204.HiFi_HiC_allmap.hap1.RepeatMasker.gff - Repeat annotation, TME204 hap1
TME204_HiFi_HiC_allmap.hap1.soft_masked.fasta - Repeat masked sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap2.RepeatMasker.gff - Repeat annotation, TME204 hap2
TME204_HiFi_HiC_allmap.hap2.soft_masked.fasta - Repeat masked sequences, TME204 hap2
TME204.HiFi_HiC_allmap.hap1.liftoff.cds.fna - Lifted reference gene sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap1.liftoff.pep.faa - Lifted reference protein sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap2.liftoff.cds.fna - Lifted reference gene sequences, TME204 hap2
TME204.HiFi_HiC_allmap.hap2.liftoff.curated.gff3 - Lifted gene models, TME204 hap2
TME204.HiFi_HiC_allmap.hap2.liftoff.pep.faa - Lifted protein sequences, TME204 hap2
TME204.hap1.v1.1.cds.all.fasta - All ab initio coding sequences, TME204 hap1
TME204.hap1.v1.1.cds.complete.fasta - Complete ab initio coding sequences, TME204 hap1
TME204.hap1.v1.1.gtf - Ab initio gene models, TME204 hap1
TME204.hap1.v1.1.mRNA.all.fasta - All ab initio transcript sequences, TME204 hap1
TME204.hap1.v1.1.pep.all.fasta - All ab initio protein sequences, TME204 hap1
TME204.hap1.v1.1.pep.complete.fasta - Complete ab initio protein sequences, TME204 hap1
TME204.hap2.v1.1.cds.all.fasta - All ab initio coding sequences, TME204 hap2
TME204.hap2.v1.1.cds.complete.fasta - Complete ab initio coding sequences, TME204 hap2
TME204.hap2.v1.1.gtf - Ab initio gene models, TME204 hap2
TME204.hap2.v1.1.mRNA.all.fasta - All ab initio transcript sequences, TME204 hap2
TME204.hap2.v1.1.pep.all.fasta - All ab initio protein sequences, TME204 hap2
hap1.GO_mapping.txt - GO annotation, TME204 hap1
hap2.GO_mapping.txt - GO annotation, TME204 hap2
TME204.hap2.v1.1.pep.complete.fasta - Complete ab initio protein sequences, TME204 hap2
cd-hit-est.clstr - Information about the transcript clusters with the associated sequences per cluster
cd-hit-est.fasta - Reference transcriptome
cd-hit-est.kallisto - kallisto index of the reference transcriptome
TME204.rep-families.stk - Seed alignments for each repeat family identified by RepeatModeler
sample_describe.txt - experiment definition for DE analysis. The list of sample names and their paired SRA run accessions
FibrousRoot_describe.txt - experiment definition for DE analysis. The list of FibrousRoot sample names and their paired SRA run accessions
kallisto.ase.isoform.counts.matrix - kallisto count matrix, bi-allelic transcripts only
LateralBud_describe.txt - experiment definition for DE analysis. The list of LateralBud sample names and their paired SRA run accessions
Leaf_describe.txt - experiment definition for DE analysis. The list of Leaf sample names and their paired SRA run accessions
Midvein_describe.txt - experiment definition for DE analysis. The list of Midvein sample names and their paired SRA run accessions
Petiole_describe.txt - experiment definition for DE analysis. The list of Petiole sample names and their paired SRA run accessions
RAM_describe.txt - experiment definition for DE analysis. The list of RAM sample names and their paired SRA run accessions
SAM_describe.txt - experiment definition for DE analysis. The list of SAM sample names and their paired SRA run accessions
Stem_describe.txt - experiment definition for DE analysis. The list of Stem sample names and their paired SRA run accessions
StorageRoot_describe.txt - experiment definition for DE analysis. The list of StorageRoot sample names and their paired SRA run accessions
00_assembly.sh - command line(s) used to run hifiasm assembly and allmap scaffolding
01_busco.sh - command line(s) used to run busco analysis
02_haplotype_differences.sh - command line(s) used to run genome comparison
03_repeatmodeler.sh - command line(s) used to run repeat prediction
04_repeatmasker.sh - command line(s) used to run repeat masking
05_liftoff.sh - command line(s) used to run transfer of reference genes
06_AUGUSTUS.sh - command line(s) used to run ab initio gene prediction
07_differentially_expressed_transcript.sh - command line(s) used to run differential expression
08_allele_specific_expression.sh - command line(s) used to run allele specific expression
Supplementary_File_2.agp - SALSA2 Hi-C scaffolding H2 AGP file
Supplementary_File_3.agp - allmap scaffolding H1 AGP file
Supplementary_File_4.agp - allmap scaffolding H2 AGP file
Supplementary_File_5.zip - allmap scaffolding chromosome maps
Supplementary_File_6.zip - smashpp chromosome maps
Supplementary_File_1.agp - SALSA2 Hi-C scaffolding H1 AGP file
TME204.HiFi_HiC_allmap.hap1.liftoff.curated.gff3 - Lifted reference gene models, TME204 hap1
kallisto.isoform.counts.matrix - kallisto count matrix
kallisto.ase.isoform.id-map.txt - ID mapping of bi-allelic transcripts
TME204.rep-families.noProtFinal.fa - Consensus sequences for each repeat family identified by RepeatModeler, after ProtExcluder filtering
ref_hap2.Assemblytics_results.zip - assemblytics comparison of AM560 contigs and TME204 hap2 haplotigs
[License] All files and data are distributed under the Creative Commons Attribution-CC0 License unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]