Article
Peer-Review Record

The EpiDiverse Plant Epigenome-Wide Association Studies (EWAS) Pipeline

by Sultan Nilay Can 1, Adam Nunn 2,3, Dario Galanti 4, David Langenberger 2, Claude Becker 5,6, Katharina Volmer 7, Katrin Heer 8,9, Lars Opgenoorth 10, Noe Fernandez-Pozo 1 and Stefan A. Rensing 1,10,11,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 15 March 2021 / Revised: 16 April 2021 / Accepted: 20 April 2021 / Published: 4 May 2021
(This article belongs to the Special Issue Advances in Plant Epigenetics and Epigenomics)

Round 1

Reviewer 1 Report

Can et al. present a very interesting and timely report which covers the use of GWAS to determine the impacts of so-called ‘epigenetic’ effects (in the meaning of DNA cytosine methylation) on plant phenotypes, so-called EWAS. Specifically, bisulfite-seq data from two species of tree, Picea abies (Norway spruce) and Quercus lobata (valley oak), are analysed with a new “EpiDiverse EWAS pipeline” and comparisons to previous analyses made.

This tool will be an important platform for advancing EWAS in plant species, including in non-model species, NUS and wild species where the basis of diversity is poorly described; in this regard, the use of a model conifer vs. a non-model eudicot angiosperm is particularly appealing.

I do have concerns about the write-up however, as some definitions and terms seem to be used quite loosely and it is not always clear what exactly is meant. The paper is also written very much as a report on the technique, and general readers would probably benefit from a summary of the main biological findings.

The English is sometimes stilted but almost always understandable – I have suggested a few minor corrections where it seems ambiguous below.

In conclusion the report is of wide interest and should definitely be published subject to some bits of re-writing to make the description of the study (and its assumptions) more coherent. Thanks to the authors for making this very useful tool available!

Introduction

This generally sets the background to the study well. My main comment is that more clarity is needed on the hypothesis that is being tested. The background to plant GWAS and the potential additional value of EWAS is not clear, and the term ‘epigenetic’ is used very loosely.

Line 57: it is a personal preference, but ‘Next generation sequencing (NGS)’ is not a helpful term – it has been the standard technique for over 15 years now. If RNA sequencing is meant, then it is better to just say that.

Lines 61 and onwards – the discussion of plant GWAS is not very complete and the examples are a bit curious. A recent review (https://acsess.onlinelibrary.wiley.com/doi/full/10.1002/tpg2.20077) and references therein could be used to make this section more comprehensive.

Lines 70-73:

“Epigenetic changes are dynamic, making it difficult to discriminate significant relationship between phenotype and epigenetic mechanisms-a major challenge of EWAS[29].”

  • this could be expanded, especially with respect to the heritability of different marks (do they need to be common across generations?) and to the importance of variation between somatic tissues

“Common issues observed both in GWAS and EWAS are missing and big data[30].”

  • is this sentence incomplete? Or does it mean the issues are ‘missing data’ and ‘the need to handle big data’?

The background to EWAS itself also needs more depth. In particular, the meaning of ‘epigenetic’ is not clear – does it refer specifically to meiotically heritable DNA methylation? If so, the connection to human tumours does not necessarily follow, when presumably somatic changes to DNA methylation patterns or histone variants/modifications at oncogenes are the main factor. Better to give a brief summary of the evidence for heritable DNA methylation in plant populations e.g. in epialleles (including paramutable and imprinted loci) and epigenetic components to heterosis. Quadrana and Colot (2016) Plant transgenerational epigenetics. Annual Review of Genetics. 50:467-91 and refs therein would be a good addition to this section and make what is actually being looked for clearer. This section should finish with a clear statement of the hypothesis being tested.

 

Results

These are generally clear, and the technical elements are very user-friendly which is good to see!

Line 98 – the authors state they ‘thought’ there is bias in the FDR calling… the reasons for this could be summarized (probably in the Introduction as the background to the hypothesis mentioned above).

Lines 100-102 – ‘extra graphs’ – it is not clear if these provide additional analyses, or just a bigger choice of visualisations.

Figure 1 – this is very useful and practical. Please define all acronyms in the legend, but otherwise this is nice. Similarly, Fig 2 and Table 1 take the user through the steps of application very coherently.

Line 134 – I think ‘submit’ is meant rather than ‘commit’.

Line 194 – the fact that the GxE term allows determination of “correlation between genetic variation and environmental factors that may affect DNA methylation” is a potentially really powerful hypothesis-testing tool for plant adaptation to given environmental variables, but I am not sure how ‘correlation’ is being used here, is this about making a specific statistical test for explanatory power? Again, is DNA methylation here only referring to heritable methylation patterns?

Line 268 – at the start of this section it would be nice to summarise the biology and distribution of the two species, and the locations from which they were sampled. I note that only one SMV is correlated to minimum temperatures – I would expect this to be very different in say a boreal species…

 

Conclusions

These provide a good technical summary but what do the GO terms imply? Do they allow new conclusions to be drawn or are these solely intended as hypothesis-generating tools?

Line 508 – ‘could be reproduced’ is meant I think

 

Methods

These are generally written in a way which is hard to follow; terms are used without definition, no clear reason for methodological choices is made, and the English sometimes seems to be muddled. I would suggest a rewrite to make it more intelligible to general audiences.

Line 554: “These varying filter_SD values were found by several trials to get the same amount of data in the paper” –

  • what does this mean?

Lines 562-567 – this is hard to follow, ‘Germany’ is repeated, ortets are not defined.

Lines 578-582 – I am not familiar with the options of these programs, please can the reason for these parameters being chosen be briefly stated in this section?

Line 600 – how are ‘averages’ of the trees defined? Mean positions?

Author Response

We are very grateful to the reviewers for their critical review of the manuscript, and we sincerely appreciate the time spent. The reviewers’ suggestions were adopted in the revised version. Below, we provide a point-by-point response.

Introduction

This generally sets the background to the study well. My main comment is that more clarity is needed on the hypothesis that is being tested. The background to plant GWAS and the potential additional value of EWAS is not clear, and the term ‘epigenetic’ is used very loosely.

RESPONSE: We rewrote parts of the introduction to improve the background about plant GWAS, EWAS and epigenetics. 

Line 57: it is a personal preference, but ‘Next generation sequencing (NGS)’ is not a helpful term – it has been the standard technique for over 15 years now. If RNA sequencing is meant, then it is better to just say that.

RESPONSE: The text was removed in the process of rewriting the introduction.

Lines 61 and onwards – the discussion of plant GWAS is not very complete and the examples are a bit curious. A recent review (https://acsess.onlinelibrary.wiley.com/doi/full/10.1002/tpg2.20077) and references therein could be used to make this section more comprehensive.

RESPONSE: We thank the reviewer for their suggestion. We have modified the text to include more recent citations, including the one suggested.

Lines 70-73:

“Epigenetic changes are dynamic, making it difficult to discriminate significant relationships between phenotype and epigenetic mechanisms-a major challenge of EWAS[29].”

this could be expanded, especially with respect to the heritability of different marks (do they need to be common across generations?) and to the importance of variation between somatic tissues

RESPONSE: We have expanded this section as suggested.

“Common issues observed both in GWAS and EWAS are missing and big data[30].”

is this sentence incomplete? Or does it mean the issues are ‘missing data’ and ‘the need to handle big data’

RESPONSE: Common issues for both GWAS and EWAS are dealing with missing data and handling big data. We clarified that in the revised manuscript.

The background to EWAS itself also needs more depth. In particular, the meaning of ‘epigenetic’ is not clear – does it refer specifically to meiotically heritable DNA methylation? If so, the connection to human tumours does not necessarily follow, when presumably somatic changes to DNA methylation patterns or histone variants/modifications at oncogenes are the main factor. Better to give a brief summary of the evidence for heritable DNA methylation in plant populations e.g. in epialleles (including paramutable and imprinted loci) and epigenetic components to heterosis. Quadrana and Colot (2016) Plant transgenerational epigenetics. Annual Review of Genetics. 50:467-91 and refs therein would be a good addition to this section and make what is actually being looked for clearer. This section should finish with a clear statement of the hypothesis being tested.

RESPONSE: We thank the reviewer for their suggestion and have modified the text appropriately.

Results

These are generally clear, and the technical elements are very user-friendly which is good to see!

Line 98 – the authors state they ‘thought’ there is bias in the FDR calling… the reasons for this could be summarized (probably in the Introduction as the background to the hypothesis mentioned above).

RESPONSE: We have modified the text as suggested.

Lines 100-102 – ‘extra graphs’ – it is not clear if these provide additional analyses, or just a bigger choice of visualisations.

RESPONSE: We have modified the text as suggested.

Figure 1 – this is very useful and practical. Please define all acronyms in the legend, but otherwise this is nice. Similarly, Fig 2 and Table 1 take the user through the steps of application very coherently.

RESPONSE: The legend of the figures and tables was modified to include the acronyms.

Line 134 – I think ‘submit’ is meant rather than ‘commit’.

RESPONSE: We have modified the text as suggested.

Line 194 – the fact that the GxE term allows determination of “correlation between genetic variation and environmental factors that may affect DNA methylation” is a potentially really powerful hypothesis-testing tool for plant adaptation to given environmental variables, but I am not sure how ‘correlation’ is being used here, is this about making a specific statistical test for explanatory power? Again, is DNA methylation here only referring to heritable methylation patterns?

RESPONSE: We did not want to convey statistical correlation and hence rephrased as association. DNA methylation refers not only to heritable methylation patterns, but all methylation sites observed in the methylation call after bisulfite sequencing. We modified the text accordingly.

Line 268 – at the start of this section it would be nice to summarise the biology and distribution of the two species, and the locations from which they were sampled. I note that only one SMV is correlated to minimum temperatures – I would expect this to be very different in say a boreal species…

RESPONSE: We have added these parts at the start of this section as suggested.

Conclusions

These provide a good technical summary but what do the GO terms imply? Do they allow new conclusions to be drawn or are these solely intended as hypothesis-generating tools?

RESPONSE: As a hierarchical controlled vocabulary, Gene Ontology helps to group meaningful biological functions that might be missed in individual gene descriptions. Different genes related to the same biological function may have GO terms in common. Finding that most of the GO terms overlap between different analyses shows that a large part of the findings of these analyses is shared at the level of the ontological vocabulary and its underlying functionality, e.g. the biological process enacted. We added this explanation to the revised version of the manuscript.
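The overlap argument in this response can be illustrated with a minimal sketch. It is not the authors' code: the function name `go_overlap` and the two term sets are hypothetical placeholders, not results from the manuscript, and the shared fraction is only one simple way to quantify overlap.

```python
def go_overlap(terms_a, terms_b):
    """Return the shared GO terms and the fraction of each set that is shared."""
    a, b = set(terms_a), set(terms_b)
    shared = a & b
    # Guard against empty inputs so the fractions are always defined.
    frac_a = len(shared) / len(a) if a else 0.0
    frac_b = len(shared) / len(b) if b else 0.0
    return shared, frac_a, frac_b


# Hypothetical term sets for two analyses (placeholders, not data from the study).
pipeline_terms = {"GO:0006950", "GO:0009409", "GO:0016301", "GO:0006468"}
published_terms = {"GO:0006950", "GO:0009409", "GO:0016301",
                   "GO:0009651", "GO:0006355"}

shared, frac_pipeline, frac_published = go_overlap(pipeline_terms, published_terms)
print(sorted(shared), frac_pipeline, frac_published)
```

Two analyses can share most GO terms even when the individual genes differ, which is the point the response makes about shared functionality.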

Line 508 – ‘could be reproduced’ is meant I think

RESPONSE: We have modified the text as suggested.

Methods

These are generally written in a way which is hard to follow; terms are used without definition, no clear reason for methodological choices is made, and the English sometimes seems to be muddled. I would suggest a rewrite to make it more intelligible to general audiences.

RESPONSE: We rephrased major parts of the methods, including more details on parameters and method choices.

Line 554: “These varying filter_SD values were found by several trials to get the same amount of data in the paper” –

what does this mean?

RESPONSE: To mirror these analyses, the EWAS pipeline run was performed with 10x coverage, q value < 0.1, a maximum of 10% missing data, and different standard deviation values per position (0.028, 0.0176, and 0.0197 for CG, CHG, and CHH, respectively) to replicate the results in the previously published study [45].
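A minimal sketch of how two of the filters named in this response could act on one methylation position is given below. The thresholds are the values quoted above; everything else is an assumption made for illustration: the function name `keep_position`, the use of `None` for missing calls, the sample standard deviation, and the choice to keep positions whose standard deviation is at or above the threshold. The pipeline's actual implementation may differ, and the coverage and q-value filters are not shown.

```python
import statistics

# Per-context standard deviation thresholds quoted in the response.
SD_THRESHOLDS = {"CG": 0.028, "CHG": 0.0176, "CHH": 0.0197}
MAX_MISSING = 0.10  # at most 10% missing data per position


def keep_position(values, context, max_missing=MAX_MISSING):
    """Decide whether a position passes the missing-data and SD filters.

    values: methylation levels across samples, with None for a missing call.
    Assumption: positions with SD at or above the threshold are kept.
    """
    observed = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(observed)) / len(values)
    if missing_fraction > max_missing:
        return False
    if len(observed) < 2:
        return False  # the sample SD is undefined for fewer than two values
    return statistics.stdev(observed) >= SD_THRESHOLDS[context]


variable = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, None]  # 10% missing
flat = [0.5] * 10                                               # no variation
sparse = [0.1, 0.9, None, None, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]   # 20% missing

print(keep_position(variable, "CG"),
      keep_position(flat, "CG"),
      keep_position(sparse, "CG"))
```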

Lines 562-567 – this is hard to follow, ‘Germany’ is repeated, ortets are not defined.

RESPONSE: We have modified the text to make it clearer.

Lines 578-582 – I am not familiar with the options of these programs, please can the reason for these parameters being chosen be briefly stated in this section?

RESPONSE: Parameters were simplified, and reasons were briefly stated in this section.

 Line 600 – how are ‘averages’ of the trees defined? Mean positions?

RESPONSE: We have modified the text to be more precise.

Reviewer 2 Report

Can et al. present their newly-developed bioinformatic pipeline for conducting association analyses of DNA methylation with genetic variation, environmental factors and gene by environment interactions. This reviewer is not a bioinformatics expert, and therefore cannot comment on the underlying quality of the pipeline and software. However, as an experienced user of GWAS analyses, this reviewer finds that the pipeline as presented appears a valuable addition to the toolsets available to those interested in the role of DNA methylation in phenotypic diversity.

I only have one comment on the science presented: the P. abies dataset is a targeted capture dataset, and this is relevant for the GO analysis in 2.2.2.5 and 2.2.2.6. Does the original targeted dataset of genes show any bias in the GO terms present that could influence the subsequent analysis? If so, do the authors still find the same relative enrichment after this bias has been taken into account?

Otherwise, the paper could benefit from some improvement in the writing. In particular, the authors should give the introduction a careful and thorough revision. At present, it is poorly structured, the wrong examples are used, and a lot of the references are 15 years or more old. Some detailed notes are below. The results and discussion section is much better written, but again a quick revision to check for some odd English phraseology would be wise (some examples from the Introduction, Results and the supplementary ‘Blastx’ section are noted below, but there are others). A little more information is also required in the Materials and Methods on the sequencing (see below).

Introduction

  • Definition of epigenetics – usually includes the specification that the modifications are inherited through mitosis. Otherwise they are just modifications.
  • Line 34 – midifications?!
  • Line 36 - “Methylation is currently the most intensively studied”… most extensively, possibly. Intensively – there is a lot of excellent work on histone modifications too. Specify DNA methylation here, as histone methylation is not discussed.
  • Line 43 – the vernalization response is given as an example for DNA methylation, but it is driven by histone modification and DNA methylation is not involved. Remove and cite an appropriate example. (Also – that review is now 21 years old! Hence the mistake).
  • Line 63 – “to request a perspective on genomics-based crop design” – I don’t understand this sentence.
  • More information is required on what GEM, GLINT and EWAS are and do, and how they do it, to introduce the relevance of the modifications made in this software.
  • Line 80 – The EpiDiverse ITN is a great initiative but this section reads a little like an advert for it, without introducing what it is. In particular “share comprehensive pipelines with the EpiDiverse ITN network” is not relevant to most readers.

Results

  • Table 1: Column “Required for which runs?” Row “DMRs” – is it not “required to run the pipeline with DMRs”?
  • Line 439 - CHELSA database – what does the acronym stand for? Add a reference.
  • Line 456 - G and GxEGO - presumably G and GxE GO?

Supplemental:

  • Figure S9. “the blue line is called the significance threshold. The red line is set to 10⁻⁶ as default and the blue line’s threshold is calculated as dividing the red line threshold with 100. The header indicates the context.input_type_filtered_output_FDR (default=0.05).” This is confusing – is the blue line a 0.05 FDR threshold or is it not? If it is not, it should not be presented as such anywhere.
  • Line 399 – “receptor-like kinases seem to be efficient for the survival of plants” – check phrasing.

Materials and Methods

More information is required on the sequencing of the datasets analysed – for example, this reviewer found no mention that the valley oak dataset is RRBS and that the P. abies dataset is a targeted capture dataset, what type of sequencing was done, etc.

Author Response

We are very grateful to the reviewers for their critical review of the manuscript, and we sincerely appreciate the time spent. The reviewers’ suggestions were adopted in the revised version. Below, we provide a point-by-point response.

Can et al. present their newly-developed bioinformatic pipeline for conducting association analyses of DNA methylation with genetic variation, environmental factors and gene by environment interactions. This reviewer is not a bioinformatics expert, and therefore cannot comment on the underlying quality of the pipeline and software. However, as an experienced user of GWAS analyses, this reviewer finds that the pipeline as presented appears a valuable addition to the toolsets available to those interested in the role of DNA methylation in phenotypic diversity.

I only have one comment on the science presented: the P. abies dataset is a targeted capture dataset, and this is relevant for the GO analysis in 2.2.2.5 and 2.2.2.6. Does the original targeted dataset of genes show any bias in the GO terms present that could influence the subsequent analysis? If so, do the authors still find the same relative enrichment after this bias has been taken into account?

RESPONSE: The GO bias analysis used as reference only the GO terms of the genes targeted in the capture data set and hence no bias is expected. 
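Why the choice of reference set matters can be sketched with a one-sided hypergeometric test. This is an assumption for illustration only: the response does not state which enrichment test was used, and the function name and all counts below are hypothetical. The same observed counts are tested once against a background restricted to the targeted genes and once against a genome-wide background.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_background_with_term,
                           n_selected, n_selected_with_term):
    """Upper-tail hypergeometric p-value, P(X >= n_selected_with_term).

    X is the number of term-annotated genes among n_selected genes drawn
    without replacement from a background of n_background genes, of which
    n_background_with_term carry the GO term.
    """
    N, K = n_background, n_background_with_term
    n, k = n_selected, n_selected_with_term
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 50 associated genes, 10 of them annotated with the term.
p_targeted = hypergeom_enrichment_p(1000, 100, 50, 10)     # capture targets only
p_genome_wide = hypergeom_enrichment_p(20000, 400, 50, 10)  # whole genome

print(p_targeted, p_genome_wide)
```

With these made-up counts the term is far more common among the targeted genes (10%) than genome-wide (2%), so the genome-wide background yields a much smaller p-value. Restricting the reference to the targeted genes, as the response describes, avoids attributing the capture design's own GO composition to the association results.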

Otherwise, the paper could benefit from some improvement in the writing. In particular, the authors should give the introduction a careful and thorough revision. At present, it is poorly structured, the wrong examples are used, and a lot of the references are 15 years or more old. Some detailed notes are below. The results and discussion section is much better written, but again a quick revision to check for some odd English phraseology would be wise (some examples from the Introduction, Results and the supplementary ‘Blastx’ section are noted below, but there are others). A little more information is also required in the Materials and Methods on the sequencing (see below).

RESPONSE: We removed many of the older references and replaced them with more recent ones. We worked extensively on the Introduction and Methods and hope that they are more comprehensible in the revised version. We also applied changes to many other sections, including the supplementary blastx paragraph.

Introduction

Definition of epigenetics – usually includes the specification that the modifications are inherited through mitosis. Otherwise they are just modifications.

RESPONSE: We have modified the text as suggested.

 Line 34 – midifications?!

RESPONSE: This was corrected.

 Line 36 - “Methylation is currently the most intensively studied”… most extensively, possibly. Intensively – there is a lot of excellent work on histone modifications too. Specify DNA methylation here, as histone methylation is not discussed.

RESPONSE: We have rephrased the text to make clear that we talk about DNA methylation.

 Line 43 – the vernalization response is given as an example for DNA methylation, but it is driven by histone modification and DNA methylation is not involved. Remove and cite an appropriate example. (Also – that review is now 21 years old! Hence the mistake).

RESPONSE: We now provide a more recent citation.

 Line 63 – “to request a perspective on genomics-based crop design” – I don’t understand this sentence.

RESPONSE: We have removed that sentence in the process of rewriting the introduction.

 More information is required on what GEM, GLINT and EWAS are and do, and how they do it, to introduce the relevance of the modifications made in this software.

RESPONSE: We have modified the text to better describe the limitations that led to not considering some of the tools.

 Line 80 – The EpiDiverse ITN is a great initiative but this section reads a little like an advert for it, without introducing what it is. In particular “share comprehensive pipelines with the EpiDiverse ITN network” is not relevant to most readers.

RESPONSE: This section was rewritten and expanded.

Results

Table 1: Column “Required for which runs?” Row “DMRs” – is it not “required to run the pipeline with DMRs”?

RESPONSE: Yes, thank you. We have corrected it in the text: DMPs was replaced by DMRs in the row DMRs, which now reads “Required to run the pipeline with DMRs”.

Line 439 - CHELSA database – what does the acronym stand for? Add a reference.

RESPONSE: The acronym definition and a reference were added.

 Line 456 - G and GxEGO - presumably G and GxE GO?

RESPONSE: Thank you, we corrected the typo.

Supplemental:

Figure S9. “the blue line is called the significance threshold. The red line is set to 10⁻⁶ as default and the blue line’s threshold is calculated as dividing the red line threshold with 100. The header indicates the context.input_type_filtered_output_FDR (default=0.05).” This is confusing – is the blue line a 0.05 FDR threshold or is it not? If it is not, it should not be presented as such anywhere.

RESPONSE: We corrected the legend of this figure.

Line 399 – “receptor-like kinases seem to be efficient for the survival of plants” – check phrasing.

RESPONSE: We rephrased this sentence.

Materials and Methods

More information is required on the sequencing of the datasets analysed – for example, this reviewer found no mention that the valley oak dataset is RRBS and that the P. abies dataset is a targeted capture dataset, what type of sequencing was done, etc.

RESPONSE: We have added these parts at the start of this section as suggested.
