Text Mining and Data Mining

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 November 2024 | Viewed by 1499

Special Issue Editor


Prof. Dr. Ming Liu
Guest Editor
Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
Interests: text mining; entity relationship calculation; text clustering; data mining

Special Issue Information

Dear Colleagues,

In today's world, gathering, analyzing, and extracting information from huge datasets is a complex task, and many efficient methods are therefore used to integrate such data in practice. Text mining, also termed knowledge discovery in text or text data mining, applies knowledge discovery techniques to unstructured text. Data mining technology, in turn, gives us the ability to extract meaningful patterns from large quantities of structured data. Information retrieval systems have made large quantities of textual data available and have broadened the scope to multimodal data. Related tasks include multimodal information extraction, which focuses knowledge discovery on data from various modalities such as image, text, and video; image-aided information extraction, which uses image information to improve the performance of information extraction; and image information extraction, which extracts structured data from images.

Text mining and data mining use diverse techniques such as natural language processing, computer vision, machine learning, information retrieval, and knowledge management for the automated analysis of digital content. By doing so, text mining and data mining can extract information, identify patterns, and discover new trends, insights, and correlations.

This Special Issue seeks original, unpublished articles that address recent advances in data mining and text mining techniques as well as their applications. Topics of interest include, but are not limited to, the following: text mining; knowledge discovery in text; content analysis; text analysis; text classification; audio-to-text mining; video-to-text mining; image-to-text mining; big data; data mining; artificial intelligence; machine learning; information retrieval; applications; multimodal information extraction; image-aided information extraction; image information extraction.

Prof. Dr. Ming Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at mdpi.longhoe.net by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • text mining
  • knowledge discovery in text
  • content analysis
  • text analysis
  • text classification
  • audio-to-text mining
  • video-to-text mining
  • image-to-text mining
  • big data
  • data mining
  • artificial intelligence
  • machine learning
  • information retrieval
  • applications
  • multimodal information extraction
  • image-aided information extraction
  • image information extraction

Published Papers (3 papers)

Research

20 pages, 7509 KiB  
Article
How Does Digital Technology Inspire Global Fashion Design Trends? Big Data Analysis on Design Elements
by Nahyun Lee and Sungeun Suh
Appl. Sci. 2024, 14(13), 5693; https://doi.org/10.3390/app14135693 - 29 Jun 2024
Viewed by 373
Abstract
Digital technology has significantly changed every process of the fashion industry. Using big data analysis methods such as text-mining, network, CONCOR, and content analyses, this study aims to understand the impact of digital technology trends from the fashion design perspective. The influence of digital technology on fashion design elements (e.g., color, print and graphic, textiles, and style and details) was evident through various keywords related to digital technology, humans, and nature, and the relationships between these keywords were confirmed. The analysis of the implicit meanings and directions of the derived keywords resulted in four clusters: (1) human- and nature-oriented design in the digital world as a new reality; (2) new textiles reflecting digital technology; (3) sustainable design technology; and (4) new utility fashion in the digital space. This study proposed a new design research methodology that incorporates big data and could be applied to educational curricula, allowing students to derive practical design elements through big data analysis and serving as a guide for planning and developing technology-inspired designs. Practically, it provided specific information on the direction of digital-technology-inspired fashion design trends, which could assist fashion designers and aspiring entrepreneurs in planning.
(This article belongs to the Special Issue Text Mining and Data Mining)
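The abstract above combines text mining with network analysis of keywords. As a rough illustration only, the sketch below builds a keyword co-occurrence network from a few documents and ranks keywords by degree centrality. It is not the authors' pipeline: CONCOR clustering is not implemented, the function names `cooccurrence_network` and `degree_centrality` are hypothetical, and the tokenization is deliberately naive.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs, keywords):
    """Count how often each pair of keywords appears in the same document (hypothetical helper)."""
    keywords = {k.lower() for k in keywords}
    edges = Counter()
    for doc in docs:
        # Naive whitespace tokenization with punctuation stripped; no lemmatization.
        tokens = {t.strip(".,;:!?()").lower() for t in doc.split()}
        present = sorted(tokens & keywords)
        # Every unordered pair of keywords found in this document gets one edge count.
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Distinct neighbours of each keyword, normalised by (number of keywords - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {k: 0.0 for k in neighbours}
    return {k: len(v) / (n - 1) for k, v in neighbours.items()}
```

For example, on three toy documents about digital prints, sustainable textiles, and nature-inspired colors, a keyword such as "digital" that co-occurs with every other keyword would receive centrality 1.0. A real study would typically also weight edges by frequency and normalize word forms before building the network.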

19 pages, 1398 KiB  
Article
An Undersampling Method Approaching the Ideal Classification Boundary for Imbalance Problems
by Wensheng Zhou, Chen Liu, Peng Yuan and Lei Jiang
Appl. Sci. 2024, 14(13), 5421; https://doi.org/10.3390/app14135421 - 22 Jun 2024
Viewed by 236
Abstract
Data imbalance is a common problem in most practical classification applications of machine learning, and it may lead to classification results that are biased towards the majority class if not dealt with properly. An effective means of solving this problem is undersampling in the borderline area; however, it is difficult to find the area that fits the classification boundary. In this paper, we present a novel undersampling framework in which the majority-class samples are clustered and the boundary area is then segmented according to the resulting clusters; random sampling in the borderline area of these segments yields a shape that better fits the classification boundary. In addition, we hypothesize that there is an optimal number of classifiers to integrate when the algorithm is strengthened by ensemble learning over multiple classifiers obtained via sampling. After this hypothesis passes testing, we apply the improved algorithm to the newly developed method. The experimental results show that the proposed method works well.
(This article belongs to the Special Issue Text Mining and Data Mining)
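The core idea of this undersampling framework, clustering the majority class and then sampling near the class boundary, can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the authors' algorithm: it uses plain k-means (Lloyd's algorithm), treats the majority samples closest to any minority sample as the borderline area of each cluster, and omits the segmentation details and the ensemble step. The helper names, the `border_frac` parameter, and the proportional per-cluster quota are all hypothetical choices.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the points grouped by cluster."""
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each centre to its cluster mean; keep the old centre if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups


def borderline_undersample(majority, minority, k=3, border_frac=0.5, seed=0):
    """Hypothetical cluster-then-borderline undersampling of the majority class."""
    rng = random.Random(seed)
    target = len(minority)  # aim for roughly as many majority samples as minority ones
    kept = []
    for cluster in kmeans(majority, k, rng):
        if not cluster:
            continue
        # Rank members by distance to the nearest minority sample; the closest
        # fraction stands in for this cluster's borderline area (an assumption).
        ranked = sorted(cluster, key=lambda p: min(math.dist(p, q) for q in minority))
        border = ranked[: max(1, int(len(ranked) * border_frac))]
        # Quota proportional to cluster size, capped by the size of the border set.
        quota = max(1, round(target * len(cluster) / len(majority)))
        kept.extend(rng.sample(border, min(quota, len(border))))
    return kept
```

With `border_frac=0.5`, only the half of each cluster nearest the minority class is eligible for sampling, so the retained majority samples concentrate near the decision boundary while the class ratio moves toward 1:1. The retained samples would then be combined with the minority samples to train a classifier.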

25 pages, 811 KiB  
Article
Contextual Hypergraph Networks for Enhanced Extractive Summarization: Introducing Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES)
by Aytuğ Onan and Hesham Alhumyani
Appl. Sci. 2024, 14(11), 4671; https://doi.org/10.3390/app14114671 - 29 May 2024
Viewed by 346
Abstract
Extractive summarization, a pivotal task in natural language processing, aims to distill essential content from lengthy documents efficiently. Traditional methods often struggle with capturing the nuanced interdependencies between different document elements, which is crucial to producing coherent and contextually rich summaries. This paper introduces Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES), a novel framework designed to address these challenges through an advanced hypergraph-based approach. MCHES constructs a contextual hypergraph where sentences form nodes interconnected by multiple types of hyperedges, including semantic, narrative, and discourse hyperedges. This structure captures complex relationships and maintains narrative flow, enhancing semantic coherence across the summary. The framework incorporates a Contextual Homogenization Module (CHM), which harmonizes features from diverse hyperedges, and a Hypergraph Contextual Attention Module (HCA), which employs a dual-level attention mechanism to focus on the most salient information. The innovative Extractive Read-out Strategy selects the optimal set of sentences to compose the final summary, ensuring that the latter reflects the core themes and logical structure of the original text. Our extensive evaluations demonstrate significant improvements over existing methods. Specifically, MCHES achieves an average ROUGE-1 score of 44.756, a ROUGE-2 score of 24.963, and a ROUGE-L score of 42.477 on the CNN/DailyMail dataset, surpassing the best-performing baseline by 3.662%, 3.395%, and 2.166% respectively. Furthermore, MCHES achieves BERTScore values of 59.995 on CNN/DailyMail, 88.424 on XSum, and 89.285 on PubMed, indicating superior semantic alignment with human-generated summaries. Additionally, MCHES achieves MoverScore values of 87.432 on CNN/DailyMail, 60.549 on XSum, and 59.739 on PubMed, highlighting its effectiveness in maintaining content movement and ordering. 
These results confirm that the MCHES framework sets a new standard for extractive summarization by leveraging contextual hypergraphs for better narrative and thematic fidelity.
(This article belongs to the Special Issue Text Mining and Data Mining)
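The ROUGE-1 scores reported above measure unigram overlap between a generated summary and a reference summary, reported on a 0 to 100 scale. The sketch below computes a simplified ROUGE-1 precision, recall, and F1 using lowercase whitespace tokens and clipped counts via `collections.Counter` intersection. Official ROUGE toolkits apply their own tokenization and optional stemming, so this sketch will not exactly reproduce published scores, and the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1 (unigram overlap) precision, recall and F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per token, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, the candidate "the cat sat on the mat" against the reference "the cat is on the mat" shares 5 of 6 unigrams in each direction, giving precision, recall, and F1 of 5/6 each (about 83.3 on the 0 to 100 scale).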
