Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter
Abstract
1. Introduction
Significance of Research
2. Background: Feature Engineering and Machine Learning
2.1. Feature Engineering
2.1.1. Network-Based Features
2.1.2. Activity Features
2.1.3. User Features
2.1.4. Content-Based Features
2.1.5. Personality Features
2.1.6. Master Feature (PMI-SO)
2.1.7. Features Summary
2.2. Machine Learning
3. Material and Methods
3.1. Data Input Step
3.1.1. Data Accessibility, Collection, and Annotation
3.1.2. Manual Data Annotation
3.2. Pre-Processing Step
3.3. Feature Extraction Step
3.4. Feature Generation Step
3.4.1. Pointwise Mutual Information
3.5. Feature Engineering
3.6. Class Imbalance Distribution
3.7. Feature Analysis
3.8. Machine Learning Algorithms
3.9. Performance Evaluation
4. Results and Discussion
4.1. Results Achieved with Bag of Words (BoW) (Baseline 1)
4.2. Results Achieved with Word to Vector (Baseline 2)
4.3. Results Achieved with Our Proposed Method (PMI-SO)
4.4. Results Summary
4.5. In-Depth Analysis of Results
- (a) What types of personality traits fall under the different severity levels (Low, Medium, or High)?
- (b) What is their gender?
- (c) What age group do they belong to?
- (d) How long has the user been using Twitter?
- (e) At what time do they post tweets?
- (f) What makes their tweets count as cyberbullying?
4.5.1. Low-Level Cyberbullying Severity
4.5.2. Medium Cyberbullying Severity
4.5.3. High Cyberbullying Severity
5. Contribution and Limitations
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Talpur, B.A.; O’Sullivan, D. Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter. Informatics 2020, 7, 52. https://doi.org/10.3390/informatics7040052