Genome Sequence of Chrysotila roscoffensis, a Coccolithphore Contributed to Global Biogeochemical Cycles

Meng, Ran; Zhang, Lin; Zhou, Chengxu; Liao, Kai; Xiao, Peng; Luo, Qijun; Xu, Jilin; Cui, Yanze; Hu, Xiaodi; Yan, Xiaojun

doi:10.3390/genes13010040

by Ran Meng ¹,²,³, Lin Zhang ³, Chengxu Zhou ¹,², Kai Liao ³, Peng Xiao ³, Qijun Luo ³, Jilin Xu ³, Yanze Cui ⁴, Xiaodi Hu ⁴,* and Xiaojun Yan ³,⁵,*

¹ College of Food and Pharmaceutical Sciences, Ningbo University, Ningbo 315211, China
² Li Dak Sum Yip Yio Chin Kenneth Li Marine Biopharmaceutical Research Center, Ningbo University, Ningbo 315211, China
³ School of Marine Science, Ningbo University, Ningbo 315211, China
⁴ Novogene Bioinformatics Institute, Beijing 100083, China
⁵ School of Marine Science, Zhejiang Ocean University, Zhoushan 316022, China
* Authors to whom correspondence should be addressed.

Genes 2022, 13(1), 40; https://doi.org/10.3390/genes13010040

Submission received: 15 October 2021 / Revised: 16 December 2021 / Accepted: 21 December 2021 / Published: 23 December 2021

(This article belongs to the Section Plant Genetics and Genomics)

Abstract:

Chrysotila is a genus of coccolithophores. Together with Emiliania, it is one of the representative, extensively studied genera in the Haptophyta. Coccolithophores are photosynthetic unicellular marine algae that share the production of CaCO₃ platelets (coccoliths) on the cell surface and are crucial contributors to global biogeochemical cycles. Here, we report the genome assembly of Chrysotila roscoffensis. The assembled genome was ~636 Mb, distributed across 769 scaffolds with a scaffold N50 of 1.63 Mb and a maximum contig length of ~2.6 Mb. Repetitive elements accounted for approximately 59% of the genome. A total of 23,341 genes were predicted from the C. roscoffensis genome. The divergence time between C. roscoffensis and Emiliania huxleyi was estimated to be around 537.6 Mya. Gene families related to the cytoskeleton, cellular motility and morphology, and ion transport were expanded. The genome of C. roscoffensis will provide a foundation for understanding the genetic and phenotypic diversification and calcification mechanisms of coccolithophores.

Keywords:

coccolithophores; Chrysotila roscoffensis; phenotypic diversification; calcification

1. Introduction

Coccolithophores, belonging to the Haptophyta, are photosynthetic unicellular marine algae that share the production of CaCO₃ platelets (coccoliths) on the surface of their cells. They are globally distributed across all oceans except the polar ones, with some species forming blooms that can be observed from satellites [1]. Coccolithophores play a fundamental role in the marine carbon cycle through the fixation of inorganic carbon by photosynthesis (the organic carbon pump) and the export of CO₂ during calcification (the carbonate counter pump) [2]. Consequently, they are thought to be responsible for about 10% of global carbon fixation [3] and to produce up to 50% of oceanic CaCO₃ [4]. Coccolithophores also affect the global sulfur cycle through their production of dimethylsulfoniopropionate (DMSP), the major precursor of atmospheric dimethyl sulfide (DMS) [5]. In addition, coccoliths provide ballast that drives the transfer of particulate organic matter to the deep ocean [6].

Given the ecological and biogeochemical importance of coccolith formation, the mechanisms of calcification have attracted considerable research interest. Calcification is common to many organisms, including unicellular organisms, invertebrates, and vertebrates, but it has unique cellular and biochemical characteristics in the coccolithophores. Firstly, the sites of calcification in the coccolithophores differ from those in other organisms. In most biological calcification these sites are either extracellular or intercellular, whereas coccoliths are produced intracellularly in a Golgi-derived coccolith vesicle (CV) [7]. Secondly, the composition of the organic matrix differs between the coccolithophores and other organisms. Acidic polysaccharides, in contrast to the proteins found in other organisms such as bivalve mollusks [8], crayfish [9], pearl oyster [10], and fish otolith [11], are the main component of the organic matrix and are predominantly associated with coccolith formation in the coccolithophores [12]. To date, the molecular mechanisms and regulators underlying these characteristics of calcification in the coccolithophores have not been fully elucidated.

There are approximately 200 extant species of coccolithophores [13]; Emiliania and Chrysotila (formerly Pleurochrysis) are the two most explored genera. To our knowledge, the Emiliania huxleyi genome is the only one available among the coccolithophores, and indeed among the Haptophyta [14]. Moreover, these two genera exhibit a high degree of genetic and phenotypic variation. For example, the gene content varied from 10% to 30% among E. huxleyi strains [15]. Calcification in C. carterae (formerly Pleurochrysis carterae) takes place at night, whereas E. huxleyi coccoliths are mainly formed during the day [16]. In Emiliania, the reticular body (RB) is closely connected to the CV and is important in providing raw material for calcification [17], but it appears to be absent in Chrysotila [7]. Three types of acidic polysaccharides (PS1, PS2, and PS3) were identified in Chrysotila, but Emiliania lacks PS1 and PS2, which deliver Ca²⁺ to the CV in Chrysotila [12]. Very limited data exist on the evolution and mechanisms of this variability in the coccolithophores.

While Emiliania species are distributed globally in almost all ocean ecosystems, species of the genus Chrysotila are mainly found in coastal, estuarine, and brackish waters and in marine aquaculture pools. Notorious foaming blooms of Chrysotila species frequently occur in these areas. Some species in the genus Chrysotila are lethal to brine shrimp [18], a model organism in many toxicological studies. However, the mechanism of the lethal effects has not been unraveled. Non-calcified filamentous colonies in the life cycle of Chrysotila species are a typical heteromorphic characteristic of this genus [19]. In the present study, we report the assembly and annotation of the C. roscoffensis genome. The data will provide a foundation for understanding the genetic and phenotypic diversification and calcification mechanisms of coccolithophores, key players in the global biogeochemical cycles.

2. Materials and Methods

2.1. C. roscoffensis Strain and DNA Extraction

Genomic DNA from C. roscoffensis (strain NMBjih026-8, Figure 1) was used for library construction and sequencing. The strain was originally isolated from coastal waters in ** the clean short insert size reads to the scaffolds. Finally, we also evaluated the level of genome completeness of the final genome assembly using CEGMA [23].

2.4. Repetitive Sequences Annotation

Repeat sequences were identified and classified using a combination of de novo and homology-based approaches. The ab initio prediction program RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html) was employed to construct a de novo repeat library from the C. roscoffensis genome. The homology-based annotation was performed by mapping the C. roscoffensis genome against the Repbase database (http://www.girinst.org/) and a TE protein database using RepeatMasker (http://www.repeatmasker.org/RMDownload.html) and RepeatProteinMask software [24], respectively. Tandem repeats were identified using Tandem Repeats Finder [25].

2.5. Genome Annotation

Homology-based, de novo, and transcriptome-based methods were used to construct the gene model set. Homologous protein sequences of E. huxleyi, Phaeodactylum tricornutum, Chlamydomonas reinhardtii, Chlorella variabilis, Arabidopsis thaliana, and Volvox carteri were downloaded from Ensembl (http://plants.ensembl.org/index.html) and NCBI (https://www.ncbi.nlm.nih.gov/). The gene models were extracted using GeneWise [26] according to the alignments of the homologous protein sequences to the repeat-masked genome. We used five ab initio gene-prediction programs, Augustus (version 2.5.5) [27], Genscan (version 1.0) [28], GlimmerHMM (version 3.0.1) [29], Geneid [30], and SNAP [31], to perform the de novo gene model predictions. RNA-seq data were mapped to the repeat-masked genome using TopHat (version 2.0.8) [32] and Cufflinks (version 2.1.1) [33] (http://cufflinks.cbcb.umd.edu/). In addition, we de novo assembled the RNA-seq data into several pseudo-ESTs with Trinity [34]. These pseudo-ESTs were also aligned to the repeat-masked genome, and gene models were predicted by PASA [35]. EVidenceModeler (EVM) [36] was used to combine the Homo-set, Cufflinks-set, PASA-T-set, and ab initio gene sets into a consensus, non-redundant reference gene set.

We annotated gene functions according to alignments to two integrated protein sequence databases (SwissProt and NR) by BLASTP with an E-value cutoff of 1e⁻⁵. InterProScan [37] was used to search for motifs and conserved functional domains using the Pfam and GO databases. The pathways involved in interactions, reactions, and relationships among genes were assigned by BLAST searches against the KEGG databases [38], with an E-value cutoff of 1e⁻⁵.
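The E-value cutoff described above can be illustrated with a short Python sketch. It assumes BLAST tabular output (`-outfmt 6`), in which the eleventh column holds the E-value. Keeping only the single best hit per query is an assumption made for illustration, as are the function name and the made-up gene and protein identifiers; the source does not describe how hits were selected beyond the cutoff.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest-E-value hit per query.

    Assumes tab-separated BLAST output (-outfmt 6): column 1 is the query,
    column 2 the subject, column 11 the E-value. Hits above the cutoff are
    discarded. Best-hit selection is an illustrative assumption.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows; identifiers and scores are invented for the example.
example_rows = [
    "gene1\tP001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "gene1\tP002\t60.0\t150\t60\t0\t1\t150\t1\t150\t1e-10\t120",
    "gene2\tP003\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
hits = filter_hits(example_rows)
print(hits)
```

In this toy input, `gene2` has no hit below the cutoff and would therefore remain unannotated by this step.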

2.6. Phylogenetic and Comparative Genomic Analysis

We performed a comparative analysis between the C. roscoffensis genes and the genes identified from C. reinhardtii, C. eustigma, Chromochloris zofingiensis, Micromonas pusilla, Chlorella sorokiniana, Chara braunii, Thalassiosira oceanica, Thalassiosira pseudonana, P. tricornutum, Aureococcus anophagefferens, Saccharina japonica, E. huxleyi, Symbiodinium microadriaticum, Porphyra umbilicalis, Galdieria sulphuraria, Chondrus crispus, Bigelowiella natans, A. thaliana and Oryza sativa (Table S1). The genes of each species were filtered as follows: first, only the longest transcript was retained when a gene had multiple transcripts; second, only genes encoding proteins longer than 50 amino acids were retained. Then, the similarity of protein sequences between all pairs of species was obtained by BLASTP with an E-value cutoff of 1e⁻⁵. OrthoMCL (http://orthomcl.org/orthomcl/) [39] was applied to cluster the protein datasets of the 20 species into orthologous and paralogous groups with an inflation parameter of 1.5. MUSCLE [40] (http://www.drive5.com/muscle/) was used to align the protein sequences of each of the 25 one-to-one single-copy gene families shared by all species, and the alignments were combined into a super alignment matrix. Then, the 20-species phylogenetic tree was constructed using RAxML [41] (http://sco.h-its.org/exelixis/web/software/raxml/index.html) with the maximum likelihood method and 100 bootstrap replicates. B. natans was selected as the outgroup. We performed divergence dating based on the phylogenetic analysis using MCMCtree in the PAML package [42,43].
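The two gene-filtering rules stated above (longest transcript per gene, then a minimum protein length) can be sketched in Python as follows. The record layout, function name, and identifiers are hypothetical and chosen only for illustration; the original pipeline scripts are not described in the source.

```python
MIN_PROTEIN_LENGTH = 50  # amino acids; proteins must be longer than this


def filter_proteins(records, min_length=MIN_PROTEIN_LENGTH):
    """Apply the two filters to (gene_id, transcript_id, protein_seq) records.

    Step 1: keep only the longest protein per gene.
    Step 2: keep only genes whose retained protein exceeds min_length.
    Returns {gene_id: (transcript_id, protein_seq)}.
    """
    longest = {}
    for gene_id, transcript_id, seq in records:
        current = longest.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            longest[gene_id] = (transcript_id, seq)
    return {
        gene: (tid, seq)
        for gene, (tid, seq) in longest.items()
        if len(seq) > min_length
    }


# Hypothetical records: sequences are placeholders of the stated lengths.
records = [
    ("geneA", "geneA.t1", "M" * 120),
    ("geneA", "geneA.t2", "M" * 300),  # longest isoform of geneA
    ("geneB", "geneB.t1", "M" * 40),   # too short, dropped
    ("geneC", "geneC.t1", "M" * 51),   # just above the threshold
]
kept = filter_proteins(records)
print(sorted(kept))
```

When two isoforms have equal length, this sketch keeps the first one encountered; how ties were handled in the actual analysis is not stated.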

The gene families that expanded and contracted in all genomes were identified using CAFE [44] based on the phylogenetic analysis. To functionally annotate the expanded gene families, gene ontology (GO) terms were retrieved from the InterProScan results and an enrichment analysis was performed.
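The text does not state which statistical test was used for the enrichment analysis. Purely as an assumption, the sketch below uses a one-sided hypergeometric test, a common choice for GO enrichment; the function name, parameters, and toy counts are all hypothetical and do not reproduce the study's analysis.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_term, sample_size, sample_with_term):
    """One-sided hypergeometric p-value P(X >= sample_with_term).

    total_genes:      annotated genes in the background set
    genes_with_term:  background genes carrying the GO term
    sample_size:      genes in the set of interest (e.g., expanded families)
    sample_with_term: genes in that set carrying the GO term
    """
    upper = min(genes_with_term, sample_size)
    numerator = sum(
        comb(genes_with_term, k) * comb(total_genes - genes_with_term, sample_size - k)
        for k in range(sample_with_term, upper + 1)
    )
    return numerator / comb(total_genes, sample_size)


# Toy numbers: 20 background genes, 5 with the term, 4 sampled, 2 of them with the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(p_value)
```

In practice such p-values are usually corrected for multiple testing across GO terms; whether and how that was done here is not described in the source.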

3. Results and Discussion

3.1. Genome Analysis of C. roscoffensis

Based on the total number of k-mers (26,900,644,184), the C. roscoffensis genome size was estimated to be approximately 674.07 Mb and the heterozygosity was 0.64%, which indicated relatively lower intraspecific variation compared to E. huxleyi [14] (Figure 2 and Table 1). To prepare for the de novo assembly, we filtered out low-quality, duplicated, and adapter-containing reads generated by the Illumina HiSeq X Ten platform to ensure high accuracy. After filtering, a total of 35.33 Gb (52.41-fold coverage of the genome) of data were retained (Table 2). A total of 53.12 Gb (78.80-fold coverage of the genome) of PacBio sequencing data were produced for the assembly (Table 2). In addition, a 10X Genomics library yielded 93.22 Gb of 150 bp paired-end reads generated on an Illumina HiSeq X Ten platform (Table 2). The assembled genome size was ~636 Mb distributed across 769 scaffolds (Table 3), close to the genome size estimated from the 17-mer analysis. About 85.30% of reads could be aligned to the final assembly (Table S2). CEGMA analysis showed that 81.05% of conserved core eukaryotic genes could be captured in our genome, of which 75.00% were complete (Table S3). These results indicated that the assembled genome contained comprehensive genomic information.
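The fold-coverage figures reported here and in Table 2 follow from dividing each data volume by the estimated genome size. A minimal Python check, assuming the revised k-mer estimate of 674.07 Mb (Table 1) is the denominator and that 1 Gb = 1000 Mb; the variable and function names are illustrative:

```python
GENOME_SIZE_MB = 674.07  # revised k-mer-based estimate (Table 1)

# Data volumes in Gb, as listed in Table 2.
data_gb = {"Illumina": 35.33, "PacBio": 53.12, "10X Genomics": 93.22}


def fold_coverage(volume_gb, genome_mb=GENOME_SIZE_MB):
    """Sequencing depth = bases sequenced / genome size (assumes 1 Gb = 1000 Mb)."""
    return volume_gb * 1000.0 / genome_mb


coverage = {name: round(fold_coverage(gb), 2) for name, gb in data_gb.items()}
total_coverage = round(fold_coverage(sum(data_gb.values())), 2)

for name, depth in coverage.items():
    print(f"{name}: {depth:.2f}x")
print(f"Total: {total_coverage:.2f}x")
```

Under these assumptions the computed depths agree with the values in Table 2 (52.41, 78.80, 138.29, and 269.51 in total).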

3.2. Genome Annotation

The results show that 58.54% of the C. roscoffensis genome consists of repetitive elements (Table 4). Among these repeats, 53.67% could be assigned to known repeat families. Long terminal repeats (LTRs) were the most abundant repeat family, accounting for 37.04% of the genome size (Table 5). The second largest family in C. roscoffensis was DNA elements, which accounted for 5.66% of the genome size. A total of 23,341 genes were predicted from the C. roscoffensis genome, and the average lengths of CDS, exons, and introns were 1596 bp, 277 bp, and 719 bp, respectively (Table 6). Finally, a total of 23,216 genes were predicted to be functional, accounting for 99.5% of all genes in the C. roscoffensis genome (Table 7).

3.3. Phylogenetic and Comparative Genomic Analysis

The distribution of genes in C. roscoffensis and the other 19 species is shown in Figure 3. Additionally, common and unique gene families in C. roscoffensis, E. huxleyi, S. japonica, T. oceanica, and T. pseudonana are presented in Figure 4. Phylogenetic analysis showed that the divergence time between C. roscoffensis and E. huxleyi is estimated to be around 537.6 Mya (Figure S1). This result suggests that the divergence between C. roscoffensis and E. huxleyi was much earlier than previously predicted (approximately 250 Mya) [45].

3.4. Expanded Coccoliths-Related Gene Families

Compared with E. huxleyi, 22 significantly expanded gene families and 39 significantly contracted gene families were identified in C. roscoffensis (Figure S1). A total of 60 GO terms were significantly enriched among the expanded gene families (p ≤ 0.05, Table S4). Among these significantly enriched GO terms, 16 are associated with the cytoskeleton, cellular motility, and morphology, namely ‘dynein complex’, ‘cellular component movement’, ‘microtubule motor activity’, ‘microtubule-based movement’, ‘microtubule-based process’, ‘motor activity’, ‘microtubule cytoskeleton’, ‘microtubule associated complex’, ‘cytoskeletal part’, ‘cytoskeleton’, ‘anatomical structure morphogenesis’, ‘cilium or flagellum-dependent cell motility’, ‘axonemal dynein complex’, ‘cell morphogenesis’, ‘anatomical structure development’, and ‘non-membrane-bounded organelle’. The cytoskeleton plays fundamental roles in intracellular transport, secretion of cell wall materials, and the regulation of cell morphology in many eukaryotes [46]. In several species, disruption of the cytoskeleton prevents the secretion of coccoliths and results in the formation of malformed coccoliths [47,48]. Roles of the cytoskeleton in calcification, such as regulating the shape of the coccolith vesicle and controlling vesicle and cell movements by interacting with the membrane trafficking system, have been proposed [5]. Thus, the significant expansion of gene families associated with the cytoskeleton in C. roscoffensis leads to the hypothesis that its calcification and morphological characteristics are associated with the cytoskeleton and cellular motility.

Here, we also identified a set of significantly enriched GO terms associated with ion transport. The coccolith is produced in a Golgi-derived CV and is then secreted to the cell surface through exocytotic pathways [5]. The calcification process presents a remarkable case of transport physiology, requiring rapid uptake of Ca²⁺ and HCO₃⁻ from the surrounding seawater into the CV and simultaneous removal of the H⁺ produced, which may otherwise exert pressure on the internal pH homeostasis of the cell [49,50]. The expansion of genes related to ion transport could reflect the demand for delivery of substrates and removal of products during calcification in C. roscoffensis.

4. Conclusions

In conclusion, we report the genome sequencing, assembly, and annotation of the coccolithophore C. roscoffensis. The assembled genome was ~636 Mb, distributed across 769 scaffolds with a scaffold N50 of 1.63 Mb and a maximum contig length of ~2.6 Mb. Repetitive elements accounted for approximately 59% of the genome. A total of 23,341 genes were predicted from the C. roscoffensis genome. The divergence time between C. roscoffensis and E. huxleyi was estimated to be around 537.6 Mya. Gene families related to the cytoskeleton, cellular motility and morphology, and ion transport were expanded. These data are a valuable genetic resource for elucidating coccolithophore biology.

Supplementary Materials

The following are available online at https://mdpi.longhoe.net/article/10.3390/genes13010040/s1, Figure S1: Estimation of divergence time and expansion and contraction of gene families in C. roscoffensis, Table S1: Basic statistical results of C. roscoffensis and related species, Table S2: Coverage statistics of the C. roscoffensis genome, Table S3: Assessment of the gene coverage rate using CEGMA, and Table S4: Enriched GO terms of expanded genes in the C. roscoffensis genome assembly.

Author Contributions

X.Y. and C.Z. designed the experiments and managed the project. R.M., L.Z. and P.X. prepared the materials. X.H. and Y.C. performed genome assembly and data analysis. R.M., K.L. and C.Z. mainly wrote the manuscript. Q.L. and J.X. advised and coordinated the study. All authors contributed to manuscript writing and reviewing and approved the final version for submission. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (2018YFD0900702; 2018YFD0901504; 2018YFA0903000), the National Science and Technology Basic Resources Investigation Program of China (2018FY100206), Ningbo Science and Technology Research Projects, China (2019B10006), the Earmarked Fund for Modern Agro-industry Technology Research System, China (CARS-49), Ningbo Science and Technology Research Projects, China (2019C10023), State Key Laboratory of Marine Geology, Tongji University (MGK202013), and was partially sponsored by K. C. Wong Magna Fund in Ningbo University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw sequencing data of the genomic and the transcriptome are available via NCBI with the BioProject accession number PRJNA648277 and BioSample accession number SAMN15644355. The assembly data have been deposited in NCBI under project accession No. SAMN15637150.

Conflicts of Interest

The authors declare that they have no competing interest.

References

1. Holligan, P.M.; Fernández, E.; Aiken, J.; Balch, W.M.; Boyd, P.; Burkill, P.H.; Finch, M.; Groom, S.B.; Malin, G.; Muller, K.; et al. A biogeochemical study of the coccolithophore, Emiliania huxleyi, in the North Atlantic. Glob. Biogeochem. Cycles 1993, 7, 879–900.
2. Rost, B.; Riebesell, U. Coccolithophores and the biological pump: Responses to environmental changes. Coccolithophores 2004, 99–125.
3. Poulton, A.J.; Adey, T.R.; Balch, W.M.; Holligan, P.M. Relating coccolithophore calcification rates to phytoplankton community dynamics: Regional differences and implications for carbon export. Deep. Sea Res. Part II Top. Stud. Oceanogr. 2007, 54, 538–557.
4. Milliman, J.D. Production and accumulation of calcium carbonate in the ocean: Budget of a nonsteady state. Glob. Biogeochem. Cycles 1993, 7, 927–957.
5. Taylor, A.R.; Brownlee, C.; Wheeler, G. Coccolithophore Cell Biology: Chalking Up Progress. Annu. Rev. Mar. Sci. 2017, 9, 283–310.
6. Klaas, C.; Archer, D.E. Association of sinking organic matter with various types of mineral ballast in the deep sea: Implications for the rain ratio. Glob. Biogeochem. Cycles 2002, 16, 63-1–63-14.
7. Van der Wal, P.; de Jong, E.; Westbroek, P.; de Bruijn, W.; Mulder-Stapel, A. Polysaccharide localization, coccolith formation, and golgi dynamics in the coccolithophorid Hymenomonas carterae. J. Ultrastruct. Res. 1983, 85, 139–158.
8. Marsh, M.E. Biomineralization in the presence of calcium-binding phosphoprotein particles. J. Exp. Zoöl. 1986, 239, 207–220.
9. Inoue, H.; Ohira, T.; Ozaki, N.; Nagasawa, H. A novel calcium-binding peptide from the cuticle of the crayfish, Procambarus clarkii. Biochem. Biophys. Res. Commun. 2004, 318, 649–654.
10. Miyamoto, H.; Miyashita, T.; Okushima, M.; Nakano, S.; Morita, T.; Matsushiro, A. A carbonic anhydrase from the nacreous layer in oyster pearls. Proc. Natl. Acad. Sci. USA 1996, 93, 9657–9660.
11. Murayama, E.; Takagi, Y.; Ohira, T.; Davis, J.G.; Greene, M.I.; Nagasawa, H. Fish otolith contains a unique structural protein, otolin-1. Eur. J. Biochem. 2002, 269, 688–696.
12. Marsh, M. Regulation of CaCO₃ formation in coccolithophores. Comp. Biochem. Physiol. Part B Biochem. Mol. Biol. 2003, 136, 743–754.
13. Young, J.; Geisen, M.; Cros, L.; Kleijne, A.; Sprengel, C.; Probert, I.; Østergaard, J. A guide to extant coccolithophore taxonomy. J. Nannoplankton Res. 2003, 1, 1–125.
14. Read, B.A.; Kegel, J.; Klute, M.J.; Kuo, A.; Lefebvre, S.C.; Maumus, F.; Grigoriev, I.V. Pan genome of the phytoplankton Emiliania underpins its global distribution. Nature 2013, 499, 209–213.
15. Kegel, J.U.; John, U.; Valentin, K.; Frickenhaus, S. Genome Variations Associated with Viral Susceptibility and Calcification in Emiliania huxleyi. PLoS ONE 2013, 8, e80684.
16. Moheimani, N.R.; Borowitzka, M.A. Increased CO₂ and the effect of pH on growth and calcification of Pleurochrysis carterae and Emiliania huxleyi (Haptophyta) in semicontinuous cultures. Appl. Microbiol. Biotechnol. 2011, 90, 1399–1407.
17. Westbroek, P.; Young, J.R.; Linschooten, K. Coccolith Production (Biomineralization) in the Marine Alga Emiliania huxleyi. J. Protozool. 1989, 36, 368–373.
18. Houdan, A.; Bonnard, A.; Fresnel, J.; Fouchard, S.; Billard, C.; Probert, I. Toxicity of coastal coccolithophores (Prymnesiophyceae, Haptophyta). J. Plankton Res. 2004, 26, 875–883.
19. Hawkins, E.K.; Lee, J.J.; Fimiarz, D.K. Colony Formation and Sexual Morphogenesis in the Coccolithophore Pleurochrysis sp. (Haptophyta). J. Phycol. 2011, 47, 1344–1349.
20. Luo, R.; Liu, B.; Xie, Y.; Li, Z.; Huang, W.; Yuan, J.; He, G.; Chen, Y.; Pan, Q.; Liu, Y.; et al. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. GigaScience 2012, 1, 18.
21. Walker, B.J.; Abeel, T.; Shea, T.; Priest, M.; Abouelliel, A.; Sakthikumar, S.; Cuomo, C.A.; Zeng, Q.; Wortman, J.; Young, S.K.; et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 2014, 9, e112963.
22. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013, arXiv:1303.3997.
23. Parra, G.; Bradnam, K.; Korf, I. CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 2007, 23, 1061–1067.
24. Tarailo-Graovac, M.; Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinform. 2009, 25, 4.10.1–4.10.14.
25. Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999, 27, 573–580.
26. Birney, E.; Clamp, M.; Durbin, R. GeneWise and Genomewise. Genome Res. 2004, 14, 988–995.
27. Stanke, M.; Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 2003, 19, ii215–ii225.
28. Burge, C.; Karlin, S. Prediction of complete gene structure in human genomic DNA. J. Mol. Biol. 1997, 268, 78–94.
29. Majoros, W.H.; Pertea, M.; Salzberg, S. TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics 2004, 20, 2878–2879.
30. Alioto, T.; Blanco, E.; Parra, G.; Guigó, R. Using geneid to Identify Genes. Curr. Protoc. Bioinform. 2018, 64, e56.
31. Korf, I. Gene finding in novel genomes. BMC Bioinform. 2004, 5, 59.
32. Trapnell, C.; Pachter, L.; Salzberg, S.L. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25, 1105–1111.
33. Trapnell, C.; Roberts, A.; Goff, L.; Pertea, G.; Kim, D.; Kelley, D.R.; Pachter, L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 2012, 7, 562–578.
34. Haas, B.J.; Papanicolaou, A.; Yassour, M.; Grabherr, M.; Blood, P.D.; Bowden, J.; Regev, A. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013, 8, 1494–1512.
35. Haas, B.J.; Delcher, A.L.; Mount, S.M.; Wortman, J.R.; Smith Jr, R.K.; Hannick, L.I.; White, O. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003, 31, 5654–5666.
36. Haas, B.J.; Salzberg, S.L.; Zhu, W.; Pertea, M.; E Allen, J.; Orvis, J.; White, O.; Buell, C.R.; Wortman, J.R. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008, 9, R7.
37. Jones, P.; Binns, D.; Chang, H.Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 2014, 30, 1236–1240.
38. Kanehisa, M.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016, 44, D457–D462.
39. Li, L.; Stoeckert, C.J., Jr.; Roos, D.S. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003, 13, 2178–2189.
40. Edgar, R.C. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 2004, 5, 113.
41. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313.
42. Yang, Z. PAML: A program package for phylogenetic analysis by maximum likelihood. Bioinformatics 1997, 13, 555–556.
43. Yang, Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 2007, 24, 1586–1591.
44. De Bie, T.; Cristianini, N.; DeMuth, J.P.; Hahn, M. CAFE: A computational tool for the study of gene family evolution. Bioinformatics 2006, 22, 1269–1271.
45. Liu, H.; Aris-Brosou, S.; Probert, I.; de Vargas, C. A Time line of the Environmental Genetics of the Haptophytes. Mol. Biol. Evol. 2010, 27, 161–176.
46. Szymanski, D.; Staiger, C.J. The Actin Cytoskeleton: Functional Arrays for Cytoplasmic Organization and Cell Shape Control. Plant Physiol. 2018, 176, 106–118.
47. Langer, G.; De Nooijer, L.J.; Oetjen, K. On the role of the cytoskeleton in coccolith morphogenesis: The effect of cytoskeleton inhibitors. J. Phycol. 2010, 46, 1252–1256.
48. Durak, G.M.; Brownlee, C.; Wheeler, G.L. The role of the cytoskeleton in biomineralisation in haptophyte algae. Sci. Rep. 2017, 7, 15409.
49. Suffrian, K.; Schulz, K.; Gutowska, M.A.; Riebesell, U.; Bleich, M. Cellular pH measurements in Emiliania huxleyi reveal pronounced membrane proton permeability. New Phytol. 2011, 190, 595–608.
50. Brownlee, C.; Taylor, A. Calcification in coccolithophores: A cellular perspective. Coccolithophores 2004, 31–49.

Figure 1. Microscopic images of Chrysotila roscoffensis (strain NMBjih026-8). (a) Motile coccolith-bearing cell, showing two flagella (arrow) and coccoliths (arrowhead). (b) Nonmotile coccolith-bearing cells. (c) Non-calcified filamentous colonies. (d) Scanning electron microscope (SEM) image of a coccolith-bearing cell. (e) SEM image of coccoliths. (f) Transmission electron microscope (TEM) image of a coccolith-bearing cell. Chl: chloroplast; G: Golgi apparatus; M: mitochondrion; N: nucleus; P: pyrenoid; and V: vacuole.

Figure 2. 17-mer analysis for estimating the genome size of C. roscoffensis. The 17-mer distribution was calculated using Jellyfish (version 2.1.3) based on the sequencing data from short insert size libraries, and the genome size was estimated with the formula genome size = total_kmer_num / kmer_depth, where total_kmer_num is the total number of k-mers and kmer_depth is the peak position on the k-mer frequency distribution. The heterozygous peak indicates genome heterozygosity, and the repeat peak represents the repeat rate of the genome.
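The formula in the Figure 2 caption can be checked against the values in Table 1 with a few lines of Python. The function name is illustrative. The sketch reproduces only the uncorrected estimate (689.76 Mb); how the revised value of 674.07 Mb was derived is not described in the source and is not attempted here.

```python
def estimate_genome_size(total_kmer_num, kmer_depth):
    """genome size = total_kmer_num / kmer_depth (formula from the Figure 2 caption)."""
    return total_kmer_num / kmer_depth


# Values from Table 1: total 17-mer count and main peak depth.
size_bp = estimate_genome_size(26_900_644_184, 39)
size_mb = round(size_bp / 1e6, 2)
print(f"Estimated genome size: {size_mb} Mb")
```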

Figure 3. The distribution of genes in Aureococcus anophagefferens, Arabidopsis thaliana, Bigelowiella natans, Chondrus crispus, Chlamydomonas eustigma, Chromochloris zofingiensis, Chlamydomonas reinhardtii, Chlorella sorokiniana, Emiliania huxleyi, Galdieria sulphuraria, Micromonas pusilla, Oryza sativa, Chrysotila roscoffensis, Phaeodactylum tricornutum, Porphyra umbilicalis, Saccharina japonica, Symbiodinium microadriaticum, Thalassiosira oceanica, Thalassiosira pseudonana and Chara braunii.

Figure 4. Common and unique gene families in five groups. Venn diagram showing comparison of shared and unique protein-coding genes among Chrysotila roscoffensis, Emiliania huxleyi, Thalassiosira pseudonana, Thalassiosira oceanica, and Saccharina japonica based on orthology analysis.

Table 1. Survey statistic results of C. roscoffensis.

Species	Total Base (Gb)	K-Mer	K-Mer Number	K-Mer Depth	Genome Size (Mb)	Revised Genome Size (Mb)	Heterozygous Ratio (%)	Repeat Ratio (%)
C. roscoffensis	34.24	17	26,900,644,184	39	689.76	674.07	0.64	69.45

Table 2. Sequencing data statistics of C. roscoffensis.

Pair-End Libraries	Insert Size	Total Data (Gb)	Read Length (bp)	Sequence Coverage (×)
Illumina reads	350 bp	35.33	150	52.41
Pacbio reads		53.12		78.80
10X Genomics		93.22	150	138.29
Total		181.67		269.51

Table 3. Summary of the final genome assembly of C. roscoffensis.

Sample ID	Length		Number
Sample ID	Contig ** (bp)	Scaffold (bp)	Contig **	Scaffold
Total	629,886,791	635,699,922	2167	769
Max	2,590,224	12,677,996
Number ≥ 2000			2167	769
N50	441,430	1,631,423	434	111
N60	354,170	1,228,002	593	156
N70	281,606	954,517	791	215
N80	208,186	651,419	1053	296
N90	141,820	391,115	1414	420

** Contig after scaffolding.
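
For readers unfamiliar with the N50 to N90 rows of Table 3: under the usual definition, Nx is the length L such that sequences of length at least L together cover at least x% of the total assembly, and the accompanying number is how many sequences are needed to reach that point. The following minimal Python sketch implements that definition on made-up lengths (not the C. roscoffensis contigs); the function name `nx_stat` is illustrative, and the assembler's own implementation may differ in details.

```python
def nx_stat(lengths, fraction=0.5):
    """Return (Nx length, number of sequences needed) for a list of lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = sum(lengths) * fraction
    running = 0
    # Walk from the longest sequence down until the threshold is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count


toy_lengths = [100, 80, 60, 40, 20]  # hypothetical lengths, total 300
print(nx_stat(toy_lengths, 0.5))  # (80, 2): 100 + 80 = 180 >= 150
print(nx_stat(toy_lengths, 0.9))  # (40, 4): 100 + 80 + 60 + 40 = 280 >= 270
```

Read this way, the N50 row of Table 3 says that the 434 longest contigs, each at least 441,430 bp long, cover half of the contig total.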

Table 4. Summary of repeat contents in C. roscoffensis genome.

Type	Repeat Size (bp)	% of Genome
TRF	74,813,341	11.493833
RepeatMasker	327,015,645	50.240549
ProteinMask	67,002,054	10.293758
Total	381,019,300	58.537318

Table 5. Statistics of transposable element (TE) classification in C. roscoffensis genome.

	De novo + Repbase		TE Proteins		Combined TEs (All without TRF)
	Length (bp)	% in Genome	Length (bp)	% in Genome	Length (bp)	% in Genome
DNA	33,824,343	5.196551	4,008,971	0.615912	36,809,695	5.655201
LINE	7,142,576	1.097339	2,411,479	0.370484	8,374,515	1.286606
SINE	196,696	0.030219	0	0	196,696	0.030219
LTR	236,201,808	36.288504	60,676,043	9.321871	241,112,694	37.042981
Other	0	0	0	0	0	0
Satellite	3,083,747	0.473767	0	0	3,083,747	0.473767
Simple_repeat	25,608,316	3.934295	0	0	25,608,316	3.934295
Unknown	31,651,266	4.862694	0	0	31,651,266	4.862694
Total	327,015,645	50.240549	67,002,054	10.293758	331,759,778	50.969407

Table 6. Basic statistical results of gene structure prediction of C. roscoffensis genome.

Gene Set		Number	Average Gene Length (bp)	Average CDS Length (bp)	Average Exons Per Gene	Average Exon Length (bp)	Average Intron Length (bp)
De novo	Augustus	43,490	3611.96	1504.35	4.12	365.32	675.96
	GlimmerHMM	313,490	1985.29	1123.67	3.85	292.11	302.67
	SNAP	102,913	1468.91	842.4	2.1	401.8	571.35
	Geneid	104,522	2507.4	1130.75	2.74	412.95	791.97
	Genscan	55,474	8837.72	2586.56	8.02	322.45	890.29
Homolog	Emiliania huxleyi	21,246	1339.63	695.35	1.75	397.92	861.93
	Phaeodactylum tricornutum	5755	1577.79	782.52	2.07	377.49	741.18
	Chlamydomonas reinhardtii	12,700	1608.62	938.93	1.92	489.67	729.92
	Chlorella variabilis	5117	1463.96	732.1	1.98	369.66	746.45
	Volvox carteri	13,333	922.29	609.61	1.47	413.34	658.52
	Arabidopsis thaliana	13,684	1312.01	892.26	1.47	609.02	902.56
RNA-seq	Cufflinks	43,799	7548.43	2585.25	6.42	402.61	915.52
RNA-seq	PASA	76,439	3568.24	1093.39	4.32	253.27	746.1
EVM		47,323	3839.76	1523.32	4.34	351	693.55
PASA-update		46,875	3848.09	1550.63	4.33	357.92	689.43
Final set		23,341	5013.31	1596.61	5.75	277.68	719.32

Table 7. The statistical results of gene function annotation of C. roscoffensis genome.

Database		Annotated Num	Annotated Percent (%)
NR		16,841	72.2
Swiss-Prot		11,919	51.1
KEGG		11,807	50.6
InterPro	All	23,179	99.3
	Pfam	12,799	54.8
	GO	21,194	90.8
Annotated		23,216	99.5
Total		23,341	-

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
