[DOI] 10.5524/102193
[Title] The haplotype-resolved chromosome pairs and transcriptome data of a heterozygous diploid African cassava cultivar
[Release Date] 2022-02-11
[Citation] Qi, W; Lim, Y; Patrignani, A; Schläpfer, P; Bratus-Neuenschwander, A; Grüter, S; Chanez, C; Rodde, N; Prat, E; Vautrin, S; Fustier, M; Pratas, D; Schlapbach, R; Gruissem, W (2022): The haplotype-resolved chromosome pairs and transcriptome data of a heterozygous diploid African cassava cultivar. GigaScience Database. http://dx.doi.org/10.5524/102193
[Data Type] Genomic, Transcriptomic
[Dataset Summary] Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome.
Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler Hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present two chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy above QV46, contig N50 above 18 Mbp, BUSCO completeness of 99%, and 35 K phased gene loci, it is the most accurate, continuous, complete and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue-specific and inconsistent across different tissues. Direction-shifting was observed in less than 2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome re-arrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR-retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.
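As a reading aid for the headline metrics quoted in the summary, the minimal Python sketch below shows how a Phred-scaled consensus quality value (QV) corresponds to a per-base error rate and how a contig N50 is conventionally computed. The function names and the toy contig lengths are illustrative assumptions; they are not taken from the dataset or from the authors' scripts.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an expected per-base error rate."""
    return 10 ** (-qv / 10)


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, not from the TME204 assembly).
    toy_contigs = [50, 40, 30, 20, 10]
    print("N50 of toy contigs:", n50(toy_contigs))
    # QV46 is the lower bound quoted in the summary.
    print("Error rate at QV46:", qv_to_error_rate(46))
```

Under this definition, QV46 corresponds to roughly one expected consensus error per 40,000 bases.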
[File Location] https://ftp.cngb.org/pub/gigadb/pub/10.5524/102001_103000/102193/
[File name] - [File Description]
readme_102193.txt - None
TME204.HiFi_HiC_allmap.hap1.agp - TME204 hap1 allmap AGP (A Golden Path) file
TME204.HiFi_HiC_allmap.hap1.fasta - TME204 hap1 chromosome, scaffold, and haplotig sequences
TME204.HiFi_HiC_allmap.hap2.agp - TME204 hap2 allmap AGP file
TME204.HiFi_HiC_allmap.hap2.fasta - TME204 hap2 chromosome, scaffold, and haplotig sequences
TME204.HiFi_HiC_allmap.hap1.bed - Unique alignments of genetic markers against TME204 hap1
TME204.HiFi_HiC_allmap.hap2.bed - Unique alignments of genetic markers against TME204 hap2
TME204.HiFi_HiC.haplotig1.fa - TME204 hap1 phased contigs
TME204.HiFi_HiC.haplotig2.fa - TME204 hap2 phased contigs
weights.txt - weight of genetic maps used by allmap
full_table.viridiplantae_odb10.busco_hap1.tsv - TME204 hap1 BUSCO full table
full_table.viridiplantae_odb10.busco_hap2.tsv - TME204 hap2 BUSCO full table
missing_busco_list.viridiplantae_odb10.busco_hap1.tsv - TME204 hap1 BUSCO missing genes
missing_busco_list.viridiplantae_odb10.busco_hap2.tsv - TME204 hap2 BUSCO missing genes
short_summary.viridiplantae_odb10.busco_hap1.txt - TME204 hap1 BUSCO summary
short_summary.viridiplantae_odb10.busco_hap2.txt - TME204 hap2 BUSCO summary
hap1_hap2.Assemblytics_results.zip - assemblytics comparison of TME204 hap1 and hap2 haplotigs
hap1_hap2.delta - nucmer alignment of TME204 hap1 and hap2 haplotigs
ref_hap1.Assemblytics_results.zip - assemblytics comparison of AM560 contigs and TME204 hap1 haplotigs
ref_hap1.delta - nucmer alignment of AM560 contigs and TME204 hap1 haplotigs
TME204_AM560.sv.vcf.gz - SV between AM560 inbred genome and TME204 diploid genome, identified by alignments of TME204 HiFi reads against AM560 contigs
ref_hap2.delta - nucmer alignment of AM560 contigs and TME204 hap2 haplotigs
hap1_hap2.report - dnadiff comparison of TME204 hap1 and hap2 haplotigs
ref_hap1.report - dnadiff comparison of AM560 contigs and TME204 hap1 haplotigs
ref_hap2.report - dnadiff comparison of AM560 contigs and TME204 hap2 haplotigs
TME204_hap1_hap2.sv.vcf.gz - SV between TME204 hap1 and hap2, identified by alignments of TME204 HiFi reads against TME204 hap1 haplotigs
TME204.rep-families.fa - Consensus sequences for each repeat family identified by RepeatModeler
TME204.HiFi_HiC_allmap.hap1.RepeatMasker.gff - Repeat annotation, TME204 hap1
TME204_HiFi_HiC_allmap.hap1.soft_masked.fasta - Repeat masked sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap2.RepeatMasker.gff - Repeat annotation, TME204 hap2
TME204_HiFi_HiC_allmap.hap2.soft_masked.fasta - Repeat masked sequences, TME204 hap2
TME204.HiFi_HiC_allmap.hap1.liftoff.cds.fna - Lifted reference gene sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap1.liftoff.pep.faa - Lifted reference protein sequences, TME204 hap1
TME204.HiFi_HiC_allmap.hap2.liftoff.cds.fna - Lifted reference gene sequences, TME204 hap2
TME204.HiFi_HiC_allmap.hap2.liftoff.curated.gff3 - Lifted reference gene models, TME204 hap2
TME204.HiFi_HiC_allmap.hap2.liftoff.pep.faa - Lifted reference protein sequences, TME204 hap2
TME204.hap1.v1.1.cds.all.fasta - All ab initio coding sequences, TME204 hap1
TME204.hap1.v1.1.cds.complete.fasta - Complete ab initio coding sequences, TME204 hap1
TME204.hap1.v1.1.gtf - Ab initio gene models, TME204 hap1
TME204.hap1.v1.1.mRNA.all.fasta - All ab initio transcript sequences, TME204 hap1
TME204.hap1.v1.1.pep.all.fasta - All ab initio protein sequences, TME204 hap1
TME204.hap1.v1.1.pep.complete.fasta - Complete ab initio protein sequences, TME204 hap1
TME204.hap2.v1.1.cds.all.fasta - All ab initio coding sequences, TME204 hap2
TME204.hap2.v1.1.cds.complete.fasta - Complete ab initio coding sequences, TME204 hap2
TME204.hap2.v1.1.gtf - Ab initio gene models, TME204 hap2
TME204.hap2.v1.1.mRNA.all.fasta - All ab initio transcript sequences, TME204 hap2
TME204.hap2.v1.1.pep.all.fasta - All ab initio protein sequences, TME204 hap2
hap1.GO_mapping.txt - GO annotation, TME204 hap1
hap2.GO_mapping.txt - GO annotation, TME204 hap2
TME204.hap2.v1.1.pep.complete.fasta - Complete ab initio protein sequences, TME204 hap2
cd-hit-est.clstr - Information about the transcript clusters with the associated sequences per cluster
cd-hit-est.fasta - Reference transcriptome
cd-hit-est.kallisto - kallisto index of the reference transcriptome
TME204.rep-families.stk - Seed alignments for each repeat family identified by RepeatModeler
sample_describe.txt - experiment definition for DE analysis. list of sample names and their paired SRA run accessions
FibrousRoot_describe.txt - experiment definition for DE analysis. The list of FibrousRoot sample names and their paired SRA run accessions
kallisto.ase.isoform.counts.matrix - kallisto count matrix, bi-allelic transcripts only
LateralBud_describe.txt - experiment definition for DE analysis. The list of LateralBud sample names and their paired SRA run accessions
Leaf_describe.txt - experiment definition for DE analysis. The list of Leaf sample names and their paired SRA run accessions
Midvein_describe.txt - experiment definition for DE analysis. The list of Midvein sample names and their paired SRA run accessions
Petiole_describe.txt - experiment definition for DE analysis. The list of Petiole sample names and their paired SRA run accessions
RAM_describe.txt - experiment definition for DE analysis. The list of RAM sample names and their paired SRA run accessions
SAM_describe.txt - experiment definition for DE analysis. The list of SAM sample names and their paired SRA run accessions
Stem_describe.txt - experiment definition for DE analysis. The list of Stem sample names and their paired SRA run accessions
StorageRoot_describe.txt - experiment definition for DE analysis. The list of StorageRoot sample names and their paired SRA run accessions
00_assembly.sh - command line(s) used to run hifiasm assembly and allmap scaffolding
01_busco.sh - command line(s) used to run busco analysis
02_haplotype_differences.sh - command line(s) used to run genome comparison
03_repeatmodeler.sh - command line(s) used to run repeat prediction
04_repeatmasker.sh - command line(s) used to run repeat masking
05_liftoff.sh - command line(s) used to run transfer of reference genes
06_AUGUSTUS.sh - command line(s) used to run ab initio gene prediction
07_differentially_expressed_transcript.sh - command line(s) used to run differential expression
08_allele_specific_expression.sh - command line(s) used to run allele specific expression
Supplementary_File_2.agp - SALSA2 Hi-C scaffolding H2 AGP file
Supplementary_File_3.agp - allmap scaffolding H1 AGP file
Supplementary_File_4.agp - allmap scaffolding H2 AGP file
Supplementary_File_5.zip - allmap scaffolding chromosome maps
Supplementary_File_6.zip - smashpp chromosome maps
Supplementary_File_1.agp - SALSA2 Hi-C scaffolding H1 AGP file
TME204.HiFi_HiC_allmap.hap1.liftoff.curated.gff3 - Lifted reference gene models, TME204 hap1
kallisto.isoform.counts.matrix - kallisto count matrix
kallisto.ase.isoform.id-map.txt - ID mapping of bi-allelic transcripts
TME204.rep-families.noProtFinal.fa - Consensus sequences for each repeat family identified by RepeatModeler, after ProtExcluder filtering
ref_hap2.Assemblytics_results.zip - assemblytics comparison of AM560 contigs and TME204 hap2 haplotigs
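The listing above follows a simple "file name - description" pattern, so it can be turned into a lookup table programmatically. The Python sketch below is one possible way to do that. The helper names are hypothetical, and the assumption that each file sits directly under the [File Location] directory is not confirmed by this record.

```python
from urllib.parse import urljoin


def parse_file_listing(lines):
    """Build a {file name: description} mapping from 'name - description' lines.

    Bracketed header lines and lines without the ' - ' separator are skipped.
    If a file name is listed more than once, the first description is kept.
    """
    files = {}
    for line in lines:
        name, sep, description = line.strip().partition(" - ")
        if not sep or name.startswith("["):
            continue
        files.setdefault(name, description)
    return files


def file_url(base, name):
    """Join the [File Location] base URL and a file name.

    Assumption: files are stored directly under the base directory.
    """
    return urljoin(base, name)


if __name__ == "__main__":
    base = "https://ftp.cngb.org/pub/gigadb/pub/10.5524/102001_103000/102193/"
    sample = [
        "[File name] - [File Description]",
        "weights.txt - weight of genetic maps used by allmap",
        "hap1.GO_mapping.txt - GO annotation, TME204 hap1",
    ]
    for name, description in parse_file_listing(sample).items():
        print(file_url(base, name), "|", description)
```

Splitting on the first " - " only keeps descriptions intact even if they contain further hyphens.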
[License]
All files and data are distributed under the Creative Commons Attribution-CC0 License unless specifically stated otherwise, see http://gigadb.org/site/term for more details.
[Comments]
[End]