Article
Peer-Review Record

Multimodal Fake News Detection

Information 2022, 13(6), 284; https://doi.org/10.3390/info13060284
by Isabel Segura-Bedmar *,† and Santiago Alonso-Bartolome †
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 18 April 2022 / Revised: 30 May 2022 / Accepted: 31 May 2022 / Published: 2 June 2022
(This article belongs to the Special Issue Sentiment Analysis and Affective Computing)

Round 1

Reviewer 1 Report

The authors raise a very important topic related to the spread of fake news in society. It is especially important nowadays, during the barbaric attack of Russia on Ukraine. The authors briefly present the state of the art. The key word is "briefly". Although the bibliography of the work consists of more than 20 items, few of them relate exactly to the topic of the article. From this point of view, the authors should refer more to research related to the detection of fake news or to the analysis of texts corresponding to their research, or state directly that no one else has done it. The idea of multimodality itself is not new and concerns many areas, wherever some analysis and inference must be carried out. In this case, too, one could have expected the obtained results to be more promising. One thing is supposition and another is proof. Nevertheless, the authors' work deserves a positive evaluation. Therefore, I believe that the article is worth publishing, but only after addressing the remarks related to the references on this subject.

Author Response

Reviewer #1

Response

The authors briefly present the state of the art. The key word is "briefly". Although the bibliography of the work consists of more than 20 items, few of them relate exactly to the topic of the article. From this point of view, the authors should refer more to research related to the detection of fake news or to the analysis of texts corresponding to their research, or state directly that no one else has done it.

We have extended our state of the art (see the Introduction section).

Reviewer 2 Report

Title: Multimodal fake news detection

Authors: Isabel Segura-Bedmar and Santiago Alonso-Bartolome

 

The authors proposed a fake news detection algorithm using a fusion strategy for multimodal information. My comments are listed below.

  1. The percentages of training, validation, and test data sets in Fig. 3 are all the same for each given length. There is no difference between them. Similarly, the sizes of the six classes are not clear in the bar chart representations in Figs. 1 & 2. I suggest the authors present the size of the six classes for each dataset using the numbers listed in tables.
  2. The labels of the horizontal axis are not exactly aligned with the items in Figs. 1 & 2. They should be improved.
  3. What is the percentage of the image and text materials in the dataset?
  4. The architecture of the BiLSTM as shown in Fig. 5 is composed of two parts. The first part converts an input matrix of 15 by 300 into a reduced matrix of 15 by 140. Second, the reduced matrix is processed by the CNN, which has the same architecture as in Fig. 4. In my opinion, the first part seems redundant except for the dimension reduction.
  5. Some data are missing, as listed at LINES 185 & 189 on PAGE 7.
  6. The output of the CNN for image data is flattened into a vector of length 56307. It is then concatenated with the text, as described at LINE 205 on PAGE 7. However, the output of the CNN for text material is the classification result over the six classes. Is this operation reasonable?
  7. The micro and macro averages are not defined on page 8.
  8. The authors implement an SVM classifier for a fair comparison. What are the input features for the SVM? Text only or both?


Author Response

Reviewer #2

Responses

The percentages of training, validation, and test data sets in Fig. 3 are all the same for each given length. There is no difference between them. Similarly, the sizes of the six classes are not clear in the bar chart representations in Figs. 1 & 2. I suggest the authors present the size of the six classes for each dataset using the numbers listed in tables.

We have removed Figure 3. We have also replaced Figures 1 and 2 with their corresponding tables.

The labels of the horizontal axis are not exactly aligned with the items in Figs. 1 & 2. They should be improved.

We have removed these figures.

What is the percentage of the image and text materials in the dataset?

The dataset contains a total of 682,661 texts with images. In addition to these texts, there are almost 290,000 additional texts without images. Therefore, 70% of the instances include both texts and images, while 30% contain only texts. This is explained in the paper (see the Dataset subsection).
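The 70/30 split quoted above follows directly from the two counts (the 290,000 figure is "almost", so the result is approximate):

```python
# Quick check of the image/text split reported for the dataset.
with_images = 682_661
without_images = 290_000  # "almost 290,000", so this is approximate

total = with_images + without_images
pct_with_images = 100 * with_images / total

print(f"{pct_with_images:.0f}% of instances include both text and image")
```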

The architecture of the BiLSTM as shown in Fig. 5 is composed of two parts. The first part converts an input matrix of 15 by 300 into a reduced matrix of 15 by 140. Second, the reduced matrix is processed by the CNN, which has the same architecture as in Fig. 4. In my opinion, the first part seems redundant except for the dimension reduction.

The reviewer is right. The first part of this architecture might seem a bit redundant. 

However, we think that this figure can help to clearly show that we are using a hybrid approach combining BiLSTM and CNN. We prefer to keep this figure. 

We have removed the figure for the CNN architecture.

Some data are missing, as listed at LINES 185 & 189 on PAGE 7.

We have fixed these errors.

The output of the CNN for image data is flattened into a vector of length 56307. It is then concatenated with the text, as described at LINE 205 on PAGE 7. However, the output of the CNN for text material is the classification result over the six classes. Is this operation reasonable?

We believe there is a misunderstanding. We concatenate the output of the dense layer of the CNN model for text classification with the output provided by the CNN model for images. The concatenated vector is then passed to a softmax layer. We have improved this explanation in the section describing the multimodal approach, and we have also included a new figure.
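The late-fusion step described above can be sketched in a few lines of numpy. Only the 56,307-length image vector comes from the paper; the text feature size (128) and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature vectors; only the image length (56,307) is from the
# paper -- the text feature size (128) is an assumed placeholder.
text_features = rng.standard_normal(128)      # output of the text CNN's dense layer
image_features = rng.standard_normal(56_307)  # flattened output of the image CNN

# Late fusion: concatenate both representations...
fused = np.concatenate([text_features, image_features])

# ...and pass the result through a softmax layer over the six classes.
n_classes = 6
W = rng.standard_normal((n_classes, fused.size)) * 0.01
logits = W @ fused

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(logits)
print(probs.shape)  # (6,)
```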

The micro and macro averages are not defined on page 8.

We have added their definitions at the beginning of the Results section.

The authors implement an SVM classifier for a fair comparison. What are the input features for the SVM? Text only or both?

We use tf-idf for text representation. We do not apply the SVM to images.
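A text-only SVM baseline over tf-idf features of this kind is typically a short scikit-learn pipeline. A minimal sketch (the toy corpus and labels below are invented, not the paper's data):

```python
# Sketch of an SVM text classifier over tf-idf features, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy corpus for illustration only.
texts = [
    "scientists confirm the study results",
    "shocking miracle cure doctors hate",
    "official report released by the agency",
    "you will not believe this one trick",
]
labels = ["true", "false", "true", "false"]

# tf-idf vectorization followed by a linear SVM.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["miracle trick they hate"]))
```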

Reviewer 3 Report

The authors consider the very important and very timely issue of fake news.

The procedure of CNN is explained well and the Figure is helpful, but it would be instructive to supplement the Figure to see how the text is encoded at the input layer (at least schematically) and how the (presumably) numerical data propagates through the network.

To enhance clarity, the authors are encouraged to briefly explain key terms such as “unimodal”, “multimodal”, and “tokens”. It would be nice to include a figure for BERT. A bit more discussion on the significance of the results obtained by BERT would be helpful; for example, what does 78% signify in simple terms?

Author Response

Reviewer #3

Responses

The procedure of CNN is explained well and the Figure is helpful, but it would be instructive to supplement the Figure to see how the text is encoded at the input layer (at least schematically) and how the (presumably) numerical data propagates through the network.

It is really difficult to show how the numerical data propagates through the network in a small figure. We would need several figures to explain this clearly. However, we have included some explanations of the convolution and max-pooling operations used by the CNN.
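The two operations mentioned above can be illustrated in a few lines of numpy. This is a toy 1-D example with a hand-picked input and filter, not the paper's actual filter sizes:

```python
import numpy as np

def conv1d_valid(x, w):
    """1-D 'valid' convolution (really cross-correlation, as in CNNs):
    slide the filter w over x and take a dot product at each position."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def max_pool(x, size):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

x = np.array([1.0, 3.0, 2.0, 5.0, 0.0, 1.0])  # toy input sequence
w = np.array([1.0, -1.0])                      # toy edge-detecting filter

feat = conv1d_valid(x, w)   # length 6 - 2 + 1 = 5
pooled = max_pool(feat, 2)  # one max per window of two

print(feat)    # [-2.  1. -3.  5. -1.]
print(pooled)  # [1. 5.]
```

The pooled output keeps only the strongest response in each window, which is how the CNN reduces the feature map before the dense layers.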

To enhance clarity, the authors are encouraged to briefly explain key terms such as “unimodal”, “multimodal”, and “tokens”.

We have added explanations for them in the introduction section. 

It would be nice to include the Figure for BERT. 

We have added a figure for BERT.

A bit more discussion on the significance of the results obtained by BERT would be helpful, for example, what does 78% signify in simple terms?

We have extended the discussion about the results of BERT (see the Discussion section).
