Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches
Abstract
1. Introduction
- RQ1—Which sources of passive sensing data are most effective for supporting the detection of MH disorders?
- RQ2—Which data fusion approaches are most effective for combining data features of varying modalities to prepare for training ML models to detect MH disorders?
- RQ3—What ML approaches have previous researchers used to successfully detect MH disorders from multimodal data?
2. Materials and Methods
2.1. Search Strategy
2.2. Inclusion and Exclusion Criteria
- The study collects data passively via ubiquitous or wearable devices, considering their cost-effectiveness and general accessibility.
- The data is human-generated, i.e., derived from individuals’ actions in an environment or interactions with specific platforms or devices.
- The data source involves at least two different modalities.
- The study adopts ML algorithms intending to detect one or more MH disorders.
- The study is written in English.
- The study was published from the year 2015 onwards (further details in the following section).
- The study investigates data sources of a single modality or exclusively focuses on a specific modality, e.g., text-based approaches.
- The study specifically targets the pediatric population, i.e., toddlers and children below ten years old, which falls outside the suggested adolescent age range of 10–24 years [39].
- The study targets a particular symptom of specific MH disorders, e.g., low mood, which is a common sign of depression.
- Data collection requires dedicated equipment or authorized resources:
  - Brain neuroimaging data, e.g., functional magnetic resonance imaging (fMRI), structural MRI (sMRI), electroencephalogram (EEG), electromyography (EMG), and photoplethysmography (PPG) signals
  - Clinical data, e.g., electronic health records (EHRs) and clinical notes
  - Genomic data
  - Body motions collected using specialized motion capture platforms or motor sensors
  - Use of Augmented Reality (AR) or Virtual Reality (VR) technology
- The study does not employ ML algorithms for detection/prediction, e.g., focusing on correlation/association analysis, treatment/intervention strategies, or proposing study protocols.
- The study is a survey, book, conference proceeding, workshop, or magazine.
- The study is unpublished or non-peer-reviewed.
2.3. Selection Process
2.4. Data Extraction
2.5. Quality Assessment
3. Results
3.1. Data Source
3.1.1. Audio and Video Recordings
3.1.2. Social Media
3.1.3. Smartphones
3.1.4. Wearable Devices
3.2. Data Ground Truth
3.2.1. Clinical Assessments
3.2.2. Self-Reports
3.3. Modality and Features
3.3.1. Audio
3.3.2. Visual
3.3.3. Textual
3.3.4. Social Media
3.3.5. Smartphone and Wearable Sensors
- (1) Physical Mobility Features: Studies have shown that negative MH states and greater depression severity are associated with lower levels of physical activity, demonstrated via fewer footsteps, less exercise [154], being stationary for a greater proportion of time [205], and less motion variability [149], whereas a study on a student population showed the opposite trend of increased physical activity [157]. Movements across locations in terms of distance, location variability, significant locations (deduced through location clusters) [177], and time spent in these places [164] were also valuable. For instance, researchers found greater depression severity or negative MH states associated with less distance variance, lower normalized location entropy [154,158], a lower number of significant visited places with an increased average length of stay [158], and fewer visits to new places [205]. In contrast, Kim et al.’s [162] investigation of adolescents with major depressive disorder (MDD) found that they traveled longer distances than healthy controls. Timing and location semantics could contribute further insights, such as the discoveries that individuals with negative MH states stayed stationary more in the morning but less in the evening [205], that those with more severe depression spent more time at home [154,175], and that schizophrenia patients visited more places in the morning [206]. Researchers also acquired sleep information, either inferred from a combination of sensor information relating to physical movement, environment, and phone-locked states, or obtained through the sleep-inference APIs of wearable devices. Sleep patterns and regularity were demonstrated to correlate with depressive symptoms [150,158], with individuals in positive MH states waking up earlier [205], whereas MDD patients showed more irregular sleep (inferred from the sleep regularity index) [149].
- (2) Phone Interaction Features: Phone usage (i.e., inferred from the frequency and duration of screen unlocks) and application usage were potentially helpful. For instance, several studies [158] found a high frequency of screen unlocks and a low average duration per unlock to be potential depressive symptoms. However, while Wang et al. [205] demonstrated an association between negative MH states and lower phone usage, the opposite trend was observed in students and adolescents with depressive symptoms, who used smartphones longer [150,162,164]. Researchers also investigated more fine-grained features, such as phone usage at different times of the day, finding that schizophrenic patients exhibited less phone usage at night but more in the afternoon [206]. Additionally, individuals with MH disorders also showed distinctive application engagement, such as Opoku Asare et al.’s [166] findings that individuals with depressive symptoms used social applications more frequently and for a longer duration. Generally, they also showed more active application engagement in the early hours or at midnight compared to healthy controls, who showed diluted engagement patterns throughout the day. Meanwhile, Choudhary et al. [212] revealed that individuals with anxiety exhibited more frequent usage of applications from the “passive information consumption apps”, “games”, and “health and fitness” categories.
- (3) Sociability Features: Sociability features, such as the number of incoming/outgoing phone calls and text messages and the duration of phone calls, were also potential indicators of MH disorders [164,175]. For instance, negative MH states are associated with making more phone calls and sending more text messages [205,222] and reaching out to more new contacts [222]. On the other hand, adult and adolescent populations suffering from MDD were found to receive fewer incoming messages [149] and more phone calls [162], respectively. Lastly, ambient environments could also play a role, since individuals with schizophrenia were found to be in louder acoustic environments with human voices [206], whereas those with negative MH states demonstrated a higher tendency to be around fewer conversations [205] than healthy controls.
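To make the mobility and phone-interaction features above concrete, the sketch below computes two of them from toy inputs: normalized location entropy over location-cluster labels and screen-unlock count with mean unlock duration from lock/unlock events. The function names and input formats are illustrative assumptions, not taken from any reviewed study.

```python
import math
from collections import Counter

def location_entropy(cluster_labels):
    """Shannon entropy over time spent in location clusters,
    normalized by the number of distinct clusters (0 = always in
    one place, 1 = time spread evenly across clusters)."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    n = len(counts)
    return entropy / math.log(n) if n > 1 else 0.0

def unlock_features(events):
    """events: list of (timestamp_s, 'unlock'|'lock') tuples sorted by time.
    Returns (number of unlock sessions, mean session duration in seconds)."""
    durations, last_unlock = [], None
    for t, kind in events:
        if kind == 'unlock':
            last_unlock = t
        elif kind == 'lock' and last_unlock is not None:
            durations.append(t - last_unlock)
            last_unlock = None
    count = len(durations)
    mean = sum(durations) / count if count else 0.0
    return count, mean
```

In the reviewed studies such features are typically aggregated per day or per epoch of the day before model training.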
3.3.6. Demographics and Personalities
3.4. Modality Fusion
3.4.1. Feature Transformation to Prepare for Fusion
3.4.2. Multimodal Fusion Techniques
3.5. Machine Learning Models
- Supervised learning—trained on labeled input–output pairs to learn patterns for mapping unseen inputs to outputs.
- Ensemble learning—combines multiple base learners of any kind (e.g., linear, tree-based or NN models) to obtain better predictive performance, assuming that errors of a single base learner will be compensated by the others [292].
- Multi-task learning—attempts to solve multiple tasks simultaneously by taking advantage of the similarities between tasks [289].
- Others—incorporates semi-supervised, unsupervised, or a combination of approaches from various categories.
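The ensemble idea above—that the errors of one base learner are compensated by the others—can be sketched with a minimal majority-vote combiner; the callables standing in for trained base learners are hypothetical placeholders.

```python
from collections import Counter

def majority_vote(base_predictions):
    """Combine per-model class predictions for one sample by
    majority vote; ties are broken by the first-seen label."""
    return Counter(base_predictions).most_common(1)[0][0]

def ensemble_predict(models, x):
    """models: list of callables, each mapping a feature vector
    to a class label (stand-ins for trained base learners)."""
    return majority_vote([m(x) for m in models])
```

A single base learner may misclassify a sample, but as long as the majority of learners are correct, the ensemble prediction is correct—this is the error-compensation assumption stated above [292].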
3.5.1. Supervised Learning
3.5.2. Neural-Network-Based Supervised Learning
3.5.3. Ensemble Learning
3.5.4. Multi-Task Learning (MTL)
3.5.5. Others
3.6. Additional Findings
3.6.1. Modality and Feature Comparisons
3.6.2. Personalized Machine Learning Models
4. Discussion
4.1. Principal Findings
4.1.1. RQ1—Which Sources of Data Are Most Effective for Supporting the Detection of MH Disorders?
4.1.2. RQ2—Which Data Fusion Approaches Are Most Effective for Combining Data Features of Varying Modalities to Prepare for Training ML Models to Detect MH Disorders?
4.1.3. RQ3—What ML Approaches Have Previous Researchers Used to Successfully Detect MH Disorders from Multimodal Data?
4.2. Evaluation of Data Sources
4.2.1. Criterion 1—Reliability of Data
4.2.2. Criterion 2—Validity of Ground Truth Acquisition
4.2.3. Criterion 3—Cost
4.2.4. Criterion 4—General Acceptance
4.2.5. Overall Findings
4.3. Guidelines for Data Source Selection
- Define research objectives and scope: Clearly defined research objectives and questions can guide researchers to determine the kind of information required to achieve the research goals and, subsequently, to evaluate the extent of the data source in accurately representing or capturing relevant information. Determining the scope of the study is crucial to pinpoint and assess the relevance of data information to ensure that collected data effectively contributes to the desired outcomes.
- Determine the target population: Identifying the target population and its characteristics involves various aspects, including the targeted MH disorders, demographics, cultural backgrounds, and geographical distribution. These aspects are mutually influential since individuals’ behaviors and data may vary based on reactions to different MH disorders, with further influence caused by cultural backgrounds and demographics, such as age, gender, and occupation. Additionally, geographical distribution and economic backgrounds may influence an individual’s accessibility to a specific data collection tool. This consideration ensures that the data collected is representative and applicable to the population of interest, enhancing the overall effectiveness of the approach.
- Identify candidate data sources and evaluate their feasibility: Evaluating the feasibility of each data source in light of the research objectives and target population identified above assists researchers in making informed decisions. Given the contexts and environments in which the target population is situated, researchers can assess which data source is the most practical and relevant. For example, researchers may consider employing remote sensing to introduce the unobtrusiveness of data collection for high-risk MH disorders or overcome geographical challenges. This assessment should consider its feasibility in terms of cost and accessibility, and it should be informed by Figure 5 to ensure that the selected data source can effectively capture relevant MH symptoms.
- Consult stakeholders: Engaging stakeholders, including healthcare professionals, patients, and families, provides various perspectives of parties involved in supporting individuals with MH disorders. These consultations verify and offer insights into the acceptability and feasibility of data sources and help ensure that researchers’ decisions align with ethical considerations and stakeholders’ comfort.
- Ethical considerations and guidelines: Researchers should further consult institutional review boards and established guidelines to ensure the compliance of data collection procedures with ethical standards and research practices. This step is crucial to safeguard participants’ rights and privacy, enhancing the credibility of the study.
- Assess the significance of ground truth information: Evaluating the significance of ground truth information informs how researchers gauge its impact on the study and whether specific workarounds are necessary to enhance ground truth reliability and validity during data collection. This evaluation will then aid researchers in designing the data collection procedure and determining the extent of reliance on ground truth to support future analysis, reasoning, and deductions.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AdaBoost | Adaptive Boosting |
ADHD | Attention Deficit Hyperactivity Disorder |
BDI | Beck Depression Inventory |
CES-D | Center for Epidemiological Studies Depression Scale |
CNN | Convolutional neural network |
DNN | Deep neural network |
DSM-V | Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition |
ED | Eating disorder |
GAD-7 | General Anxiety Disorder-7 |
GPS | Global Positioning System |
GRU | Gated recurrent unit |
HDRS | Hamilton Depression Rating Scale |
LSTM | Long short-term memory |
MDD | Major depressive disorder |
MFCC | Mel frequency cepstral coefficients |
MH | Mental health |
ML | Machine learning |
MLP | Multi-layer perceptron |
MRI | Magnetic Resonance Imaging |
MTL | Multi-task learning |
NN | Neural network |
OCD | Obsessive-compulsive disorder |
PHQ-9 | Patient Health Questionnaire-9 |
PTSD | Post-traumatic stress disorder |
RF | Random forest |
SLR | Systematic literature review |
SVM | Support vector machine |
XGBoost | eXtreme Gradient Boosting |
Appendix A. Existing Modality Features
Features | Tools | Studies | Feature Category |
---|---|---|---|
Low-level descriptors: jitter, shimmer, amplitude, pitch perturbation quotients, Mel-frequency cepstral coefficients (MFCCs), Teager-energy cepstrum coefficients (TECCs) [320], Discrete Cosine Transform (DCT) coefficients | OpenSmile [267], COVAREP [321], YAAFE [322], Praat [323], Python libraries (pyAudioAnalysis [324], DisVoice [325]), My-Voice Analysis [326], Surfboard [327], librosa [328] | [12,48,51,72,74,78,81,87,88,90,91,92,94,97,99,101,104,107,108,184,192,195,196,197,198,199,211,214] | Voice |
Existing acoustic feature sets: Interspeech 2010 Paralinguistics [329], Interspeech 2013 ComParE [330], extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [331] | OpenSmile [267] | [51,57,59,61,63,74,81,97,103,192,194,195,196,197,198] | Voice |
Speech, pause, laughter, utterances, articulation, phonation, intent expressivity | Praat [323], DeepSpeech [332] | [12,48,79,107,184,193,203,211] | Speech |
Vocal tract physiology features | N/A | [49] | Speech |
Embeddings of audio samples | VGG-16 [261], VGGish [333], DeepSpeech [332], DenseNet [334], SoundNet [259], SincNet [335], Wav2Vec [336], sentence embedding model [337], HuBERT [338], convolutional neural network (CNN), bidirectional LSTM (BiLSTM), ResNet [308], graph temporal convolution neural network (GTCN) [339] | [59,60,67,72,74,80,86,89,93,96,100,136,174,195] | Representations |
Graph features: average degree, clustering coefficient and shortest path, density, transitivity, diameter, local and global efficiency | Visibility graph (two data points visible to each other are connected with an edge) | [81] | Representations |
Statistical descriptors of voice/speech features: mean, standard deviation, variance, extreme values, kurtosis, 1st and 99th percentiles, skewness, quartiles, interquartile range, range, total, duration rate, occurrences, coefficient of variation (CV) | Manual computation, histograms, DeepSpeech [332] | [12,55,56,91,92,99,107,193,197,214] | Derived |
Bag-of-AudioWords (BoAW) representations of voice/speech features | openXBOW [340] | [59,74] | Representations, Derived |
High-level representations of features/representations (capture spatial and temporal information) | Gated recurrent unit (GRU) [341], LSTM, BiLSTM, combination of CNN residual and LSTM-based encoder–decoder networks [75], time-distributed CNN (T-CNN), multi-scale temporal dilated convolution (MS-TDConv) blocks, denoising autoencoder | [61,65,67,73,75,77,87,94,100,199] | Representations, Derived |
Session-level representations from segment-level features/representations | Simple concatenation, Fisher vector encoding, Gaussian Mixture Model (GMM) | [192,199,214] | Representations, Derived |
Facial/body appearance, landmarks, eye gaze, head pose | OpenFace [269], OpenCV [270], Viola Jones’ face detector [342], CascadeObjectDetector function in MATLAB’s vision toolbox, Haar classifier [270], Gauss–Newton Deformable Part Model (GN-DPM) [343], OpenPose [271], ZFace [344], CNN [46], Faster-RCNN (Region CNN) [147], multilevel convolutional coarse-to-fine network cascade [345], Inception-ResNet-V2 [346], VGG-Face [68], DenseNet [334], Affectiva https://go.affectiva.com/affdex-for-market-research (accessed on 10 December 2023), DBFace https://github.com/dlunion/DBFace (accessed on 10 December 2023), FaceMesh https://developers.google.com/android/reference/com/google/mlkit/vision/facemesh/FaceMesh (accessed on 10 December 2023), dlib [347] | [25,53,55,56,68,79,80,83,91,92,94,98,99,101,108,174,193,199,201,203,220] | Subject/Object |
Appearance coefficients of facial image and shape | Active Orientation Model (AOM) [348] | [50] | Subject/Object |
Probability distribution of 365 common scenes | Places365-CNN [349] | [220] | Subject/Object |
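The statistical descriptors of voice/speech features listed in the audio rows above (mean, standard deviation, range, coefficient of variation) can be computed with the standard library alone; the function below is a simplified stand-in for the manual computations the studies describe, and the feature name is illustrative.

```python
import statistics

def voice_descriptors(series):
    """Statistical descriptors of a per-frame voice feature
    (e.g., pitch in Hz): mean, population standard deviation,
    range, and coefficient of variation (CV)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    rng = max(series) - min(series)
    cv = std / mean if mean else float('nan')
    return {'mean': mean, 'std': std, 'range': rng, 'cv': cv}
```

In practice, the low-level descriptors themselves (MFCCs, jitter, shimmer) would come from tools such as OpenSmile, Praat, or librosa cited above, with descriptors like these computed per recording session.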
Features | Tools | Studies | Feature Category |
---|---|---|---|
Feature descriptors: local binary patterns, Edge Orientation Histogram, Local Phase Quantization, Histogram of Oriented Gradients (HOG) | OpenFace [269] | [47,48,53,195] | Subject/Object, Derived |
Geometric features: displacement, mean shape of chosen points, difference between coordinates of specific landmarks, Euclidean distance, angle between landmarks, angular orientation | Manual computation, subject-specific active appearance model (AMM), AFAR toolbox [350] | [47,51,54,56,70,79,83,91,99,195,198,203] | Subject/Object, Derived |
Motion features: movement across video frames, range and speed of displacements (facial landmarks, eye gaze direction, eye open and close, head pose, upper body points) | 3D convolutional layers on persons detected at frame-level, Motion history histogram (MHH) [351], feature dynamic history histogram (FDHH), residual network-based dynamic feature descriptor [75] | [52,53,68,75,147,193] | Subject/Object, Derived |
Facial action units (FAUs), facial expressions | OpenFace [269], Face++ [352], FACET software [353], AU detection module of AFAR [350] | [25,79,91,99,194,196,198] | Subject/Object, Emotion-related |
FAU features: occurrences, intensities, facial expressivity, peak expressivity, behavioral entropy | MHH, Modulation spectrum (MS), Fast Fourier transform (FFT) | [83,99,101,107,194,354] | Emotion-related, Derived |
Emotion profiles (EPs) | SVM-based EP detector [355] | [101] | Emotion-related |
Sentiment score | ResNeXt [356] | [186] | Emotion-related |
Turbulence features capturing sudden erratic changes in behaviors | N/A | [192] | Derived |
Deep visual representations from images or video frames | VGG-16 [261], VGG-Face [357], VGGNet [261], AlexNet [358], ResNet [308], ResNet-50 [359], ResNeXt [356], EfficientNet [360], InceptionResNetV2 [346], CNN, dense201 [195], self-supervised DINO (self-distillation with no labels) [361], GTCN [339], unsupervised Convolutional Auto-Encoder (CAE) (replaces autoencoder’s fully connected layer with CNN) [195] | [53,58,60,74,82,84,85,89,93,95,98,106,111,115,116,117,122,125,126,129,131,132,135,160,187,195,201,220] | Representations |
High-level (frame-level) representations of low-level features (LLDs, facial landmarks, FAUs) | Stacked Denoising Autoencoders (SDAE) [306], DenseXception block-based CNN [221] (replace DenseNet’s convolution layer with Xception layer), CNN-LSTM, denoising autoencoder, LSTM-based multitask learning modality encoder [62], 3D convolutional layers, LSTM | [55,62,87,98,199,221] | Representations, Derived |
Session-level representations from frame-level features/representations | Average of frame-level representations, Fisher vector (FV) encoding, improved FV coding [265], GMM, Temporal Attentive Pooling (TAP) [75] | [55,75,117,199] | Representations, Derived |
Texts extracted from images | python-tesseract [362] | [25,126,128,140] | Textual |
Image labels/tags | Deep CNN-based multi-label classifier [113], Contrastive Language Image Pre-training (CLIP) [363], Imagga [364] (CNN-based automatic tagging system) | [113,124,128,129] | Textual |
Bag-of-Visual-Words (BoVW) features | Multi-scale Dense SIFT features (MSDF) [365] | [124,195] | Textual, Derived |
Color distribution (cool, clear, and dominant colors), pixel intensities | Probabilistic Latent Semantic Analysis model [366] (assigns a color to each image pixel), cold color range [367], RGB histogram | [20,140,145,204,220] | Color-related |
Brightness, saturation, hue, value, sharpness, contrast, correlation, energy, homogeneity | HSV (Hue, Saturation, Value) [368] color model | [20,106,109,113,145,204,220] | Color-related |
Statistical descriptors for each HSV distribution: quantiles, mean, variance, skewness, kurtosis | N/A | [145,204] | Color-related, Derived |
Pleasure, arousal, and dominance | Compute from brightness and saturation values [276] | [220] | Emotion-related, Derived |
Number of pixels, width, height, whether the image was modified (indicated via the EXIF file) | N/A | [204] | Image metadata |
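The color-related features in the table above (brightness/value and saturation derived from the HSV color model) can be sketched with the standard library; this is a simplified stand-in for the HSV histograms and per-channel statistics used in the studies, and the pixel-list input format is an assumption for illustration.

```python
import colorsys
import statistics

def hsv_stats(rgb_pixels):
    """Mean saturation and mean value (brightness) of an image
    given as a list of (r, g, b) tuples in [0, 255]."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
           for r, g, b in rgb_pixels]
    sats = [s for _, s, _ in hsv]
    vals = [v for _, _, v in hsv]
    return {'mean_saturation': statistics.fmean(sats),
            'mean_value': statistics.fmean(vals)}
```

Studies in this category associated duller, darker color profiles (lower value and saturation) with depressive symptoms, so descriptors like these would be computed per posted image and aggregated per user.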
Features | Tools | Studies | Feature Category |
---|---|---|---|
Count of words: general, condition-specific (depressed, suicidal, eating disorder-related) keywords, emojis | N/A | [20,104,109,123,126,127,130,133,134,137,145,146,187,188,218,219] | Linguistic |
Words referring to social processes (e.g., reference to family, friends, social affiliation), and psychological states (e.g., negative/positive emotions) | Linguistic Inquiry and Word Count (LIWC) [278], LIWC 2007 Spanish dictionary [369], Chinese Suicide Dictionary [370], Chinese LIWC [371], TextMind [372], Suite of Automatic Linguistic Analysis Tools (SALAT) [279]—Simple Natural Language Processing (SiNLP) [373] | [20,79,109,118,121,128,186,194,196,197,198,204,211,219,374] | Linguistic |
Part-of-speech (POS) tags: adjectives, nouns, pronouns | Jieba [375], Natural Language Toolkit (NLTK) [280], TextBlob [376], spaCy, Penn Treebank [377], Empath [378] | [61,100,104,123,126,135,184,185,189,195,218,219] | Linguistic |
Word count-related representations: Term Frequency–Inverse Document Frequency (TF-IDF), Bag of Words (BoW), n-grams, Term Frequency–Category Ratio (TF-CR) [379] | Word2Vec embeddings, language models | [115,116,118,124,128,130,140,143,144,148,185,186,188,198,217,374] | Linguistic, Representations |
Readability metrics: Automated Readability Index (ARI), Simple Measure of Gobbledygook (SMOG), Coleman–Liau Index (CLI), Flesch reading ease, Gunning fog index, syllable count scores | Textstat [380] | [218,220] | Linguistic |
Lexicon-based representations [381] | Depression domain lexicon [382], Chinese suicide dictionary [370] | [120,135,189] | Representations |
Sentiment scores, valence, arousal, and dominance (VAD) ratings | NLTK [280], IBM Watson Tone Analyzer, Azure Text Analytics, Google NLP, NRC emotion lexicon [383], senti-py [384], Stanford NLP toolkit [281], Sentiment Analysis and Cognition Engine (SEANCE) [282], text SA API of Baidu Intelligent Cloud Platform [123], Valence Aware Dictionary and Sentiment Reasoner (VADER) [385], Chinese emotion lexicons DUTIR [386], Affective Norms for English Words ratings (ANEW) [283], EmoLex [387], SenticNet [388], Lasswell [389], AFINN SA tool [390], LabMT [391], text2emotion [392], BERT [266] | [20,54,61,86,110,115,118,119,121,123,126,127,128,130,132,133,137,143,144,145,146,148,184,185,186,188,194,196,197,198,218,219,374] | Sentiment-related |
Happiness scores of emojis | Emoji sentiment scale [393] | [110] | Sentiment-related |
Emotion transitions from love to joy, from love to anxiety/sorrow (inspired by [394]) | Chinese emotion lexicons DUTIR [386] | [187] | Sentiment-related |
Word representations | Global vectors for word representation (GloVe) [395], Word2Vec [396], FastText [397], Embeddings from Language Models (ELMo) [398], BERT [266], ALBERT [297], XLNet [285], bidirectional gated recurrent unit (BiGRU) [341], itwiki (Italian Wikipedia2Vec model), Spanish model [399], EmoBERTa [298] (incorporate linguistic and emotional information), MiniLM [400] (supports multiple languages), GPT [401], TextCNN [402], Bi-LSTM [294] | [49,60,65,67,69,72,73,77,78,81,82,87,88,90,95,96,97,98,100,106,111,112,113,116,122,125,128,129,131,135,136,138,142,145,147,148,185,186,187,201,214,218,308] | Semantic-related, Representations |
Sentence representations | Paragraph Vector (PV) [284], Universal Sentence Encoder [403], Sentence-BERT [404] | [52,59,70,71,89,102,103,174,199] | Semantic-related, Representations |
Topic modeling, topic-level features | Scikit-learn’s Latent Dirichlet Allocation module [405], Biterm Topic Model [406] | [20,43,114,118,119,126,130,134,136,137,146,185,188,194,217,219] | Semantic-related |
Description categories | IBM Watson’s Natural Language Understanding tool (https://cloud.ibm.com/apidocs/natural-language-understanding#text-analytics-features (accessed on 10 December 2023)) | [132] | Semantic-related |
High-level representations from low-level features/representations (e.g., sentence-level from word-level, to capture sequential and/or significant information) | BiLSTM with an attention layer, stacked CNN and BiGRU with attention, summarization [119] using K-means clustering and BART [407], combination of LSTM with attention mechanism and CNN, BiGRU with attention | [73,95,97,119,136,145,159,201] | Representations, Derived |
User-level representations from post-level representations | CNN-based triplet network [408] from existing Siamese network [409] (consider cosine similarities between post-level representations between each individual and others in the same and different target groups), LSTM with attention mechanism | [128,138] | Representations, Derived |
Session-level representations from segment-level representations | Fisher vector encoding | [199] | Representations, Derived |
Subject-level average, median, standard deviation of sentiment scores, representations, POS counts | N/A | [110,185,186] | Derived |
Subject-level representations in conversation | Graph attention network—vertex as question/answer pair incorporating LeakyReLU on neighbors with respective attention coefficients, edge between adjacent questions | [97] | Representations, Derived |
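Among the word-count-related representations in the table above, TF–IDF is simple enough to sketch directly; the implementation below uses raw term frequency and an unsmoothed log inverse document frequency, which is one common variant rather than the exact formulation of any reviewed study.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists (one per document/post).
    Returns one {term: tf-idf weight} dict per document, using
    raw term frequency and idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

Terms that appear in every document (here, a word shared by all posts) receive zero weight, while condition-specific keywords concentrated in a few posts are weighted highly—the property that makes TF–IDF useful for the keyword-based features listed above.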
Features | Tools | Studies | Feature Category |
---|---|---|---|
Post distribution (original posts, posts with images, posts of specific emotions/sentiments): frequency, time | N/A | [109,112,122,123,126,130,134,137,142,145,188,218,219] | Post metadata |
Username, followers, followings, status/bio description, profile header and background images, location, time zone | N/A | [109,115,118,122,123,126,130,134,137,142,145,171,188,218,219] | User metadata |
Likes, comments, hashtags, mentions, retweets (Twitter), favourites (Twitter) | N/A | [115,126,135,137,142,171,185,189] | Social interactions, Post metadata |
Stressful periods with stress level and category (study, work, family, interpersonal relation, romantic relation, or self-cognition) | Algorithm [410] applied on users’ posting behaviors | [187] | Post metadata, Derived |
Aggregate posting time by 4 seasons, 7 days of the week, 4 epochs of the day (morning, afternoon, evening, midnight), or specific times (daytime, sleep time, weekdays, weekends) | N/A | [125,130,135,186,188,189,219] | Post metadata, Derived |
Encoding of numerical features | Categorize into quartiles (low, below average, average, high) | [115] | Representations, Derived |
Social interaction graph. Node: user-level representations concatenated from post-level representations; edge: actions of following, mentioning, replying to comments, quoting | node2vec [411], Ego-network [412] | [139,185] | Social interactions |
Personalized graph. User node: user-level representation made up of property nodes; property nodes (per individual): personal information, personality, mental health experience, posting behavior, emotion expression, and social interactions; user–user edge: mutual following–follower relationship; user–property edge: user’s characteristics | Attention mechanism to weigh each property by its contribution to the individual’s mental health condition (user–property edge) and emotional influence (user–user edge) | [187] | Social interactions |
Retweet network. Node: user-level representations; directed edge: a user’s tweets are retweeted by the pointed-to user | Clustering-based neighborhood recognition: form communities with densely connected nodes, then expand communities using similarity with adjacent nodes | [141] | Representations |
Features | Tools | Studies | Feature Category |
---|---|---|---|
Phone calls and text messages: frequency, duration, entropy | N/A | [104,105,149,155,156,159,161,162,166,169,170,171,175,190,205,206,209,222,223] | Calls and messages |
Phone unlocks: frequency, duration | Manual computation, RAPIDS [413], a tool for data pre-processing and biomarker computation | [99,149,150,155,156,158,160,161,162,166,167,171,176,190,205,206,208,212,374] | Phone interactions |
Phone charge duration | N/A | [163] | Phone interactions |
Running applications: type, frequency, duration of usage | N/A | [99,149,150,155,156,158,160,161,162,166,169,170,171,190,205,206,208,212,374] | Phone interactions |
Activity states (e.g., walking, stationary, exercising, running, unknown): frequency, duration | Android activity recognition API, activity recognition model (LSTM-RNN [414], SVM), Google Activity Recognition Transition API (using gyroscope and accelerometer) | [150,152,154,160,163,169,170,176,177,190,205,206] | Physical mobility |
Footsteps | API of mobile devices, Euclidean norm of accelerometer data | [154,169,170] | Physical mobility |
Distance traveled, displacement from home, location variance and entropy, time spent at specific places, transitions | Manual computation, RAPIDS [413] | [99,150,151,153,154,155,158,160,161,162,165,166,175,176,177,200,205,206,208,209] | Physical mobility |
Location cluster features: number of clusters, largest cluster as primary location, most and least visited clusters | DBSCAN clustering [415], Adaptive K-means clustering [416] | [150,151,153,154,160,165,176,177,205,208] | Physical mobility |
Speed | Compute from GPS and/or accelerometer | [153,165,166,209] | Physical mobility |
Intensity of action | Compute rotational momentum from GPS and gyroscope | [162] | Physical mobility |
GPS sensor, calls and phone screen unlock features | RAPIDS [413], a tool for data pre-processing and biomarker computation | [158,164] | Physical mobility, Calls and messages, Phone interactions |
WiFi association events (when a smartphone is associated or dissociated with a nearby access point at a location’s WiFi network) | N/A | [153] | Connectivity |
Occurrences of unique Bluetooth addresses, most/least frequently detected devices | N/A | [99,151,155,156,175] | Connectivity |
Surrounding sound: amplitude, conversations, human/non-human voices | N/A | [150,163,166,205,206,207,208,209] | Ambient environment |
Surrounding illuminance: amplitude, mean, variance, standard deviation | N/A | [99,163,190,205,208,209] | Ambient environment |
Silent and noise episodes: count, sum, minimum decibels | Detect via intermittent samples until noise state changes | [166] | Ambient environment |
Sleep duration, wake and sleep onset | Infer from ambient light, audio amplitude, activity state, and screen on/off | [150,160,161,167,169,170,175,176,206] | Derived, Physical mobility |
Keystroke features: count, transitions, time between two consecutive keystrokes | N/A | [166,202] | Phone interactions |
Time between two successive touch interactions (tap, long tap, touch) | N/A | [166] | Phone interactions |
Day-level features | Statistical functions (mean, median, mode, standard deviation, interquartile range) at the day-level or day of the week (weekdays, weekends) | [151,152,154,156,159,163,164,170,176,206] | Derived |
Epoch-level features | Statistical functions at partitions of the day (morning, afternoon, evening, night) | [149,151,152,156,159,163,166,176,206] | Derived |
Hour-level features | Statistical functions at each hour of the day | [208,209] | Derived |
Week-level features | Statistical functions at the week-level, distance from weekly mean | [162,164] | Derived |
Rhythm-related features: ultradian, circadian, and infradian rhythms, regularity index [417], periodicity based on time windows | Manual computation, Cosinor [418], a rhythmic regression function | [151,152,153,155,157,158,176,207] | Derived |
Degrees of complexity and irregularity | Shannon entropy of sensor features | [166] | Derived |
Statistical, temporal and spectral time series features | Time Series Feature Extraction Library (TSFEL) [419] | [104,105] | Derived |
High-level cluster-based features: cluster labels, likelihood scores, distance scores, transitions | Gaussian mixture model (GMM) [420], partitioning around medoids (PAM) clustering model [421] | [208,209] | Derived |
Network of social interactions and personal characteristics: node type corresponds to a modality/category (e.g., individual, personality traits, social status, physical health, well-being, mental health status) | Heterogeneous Information Network (HIN) [422] | [173] | Representations |
Representations capturing important patterns across timestamps | Transformer encoder [295] | [179] | Representations |
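Many of the "Derived" rows above reduce raw sensor streams to summary statistics or irregularity measures. As a minimal illustration (not the exact pipeline of any cited study; function names are ours), the sketch below computes day-level statistics and a Shannon-entropy irregularity score in plain Python:

```python
import math
import statistics

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete feature stream, a simple
    proxy for behavioural irregularity (cf. the entropy row above)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def day_level_features(samples):
    """Day-level statistical functions over one day's sensor samples."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.pstdev(samples),
        "iqr": q3 - q1,  # interquartile range
    }
```

The same statistical functions are typically re-applied at epoch, hour, and week granularity to produce the other derived-feature rows.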
Features | Tools | Studies | Feature Category |
---|---|---|---|
Duration and onset of sleep status (asleep, restless, awake, unknown), sleep efficiency, sleep debt | API of wristband | [149,151,155,156,164,171,180,181,182,191,374] | Physical mobility |
Number of steps, active and sedentary bouts, floor climb | API of wristband | [150,151,155,156,164,171,179,180,181,182,191,374] | Physical mobility |
Heart rate (HR), galvanic skin response (GSR), skin temperature (ST), electrodermal activity (EDA) | API of wristband | [149,150,164,169,170,172,178,179,182,191] | Physiological |
Outliers of systolic and diastolic periods: central tendency, spreading degree, distribution shape and symmetry degree values from blood volume pulse | N/A | [178] | Physiological, Derived |
Motion features from accelerometer data: acceleration, motion | N/A | [149] | Physical mobility |
Heart rate variability (HRV), rapid eye movement, wake after sleep onset, metabolic equivalent for task (MET) for physical activity | API of Oura ring | [158] | Physiological, Physical mobility |
High-level features from HR, GSR, and ST signals | CNN-LSTM | [215] | Representations |
Basal metabolic rate (BMR) calories | API of wristband | [179,180] | Physiological |
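Wearable APIs usually expose metrics such as HRV directly, but the underlying computation is straightforward. As an illustration only (hypothetical function name, not a vendor API), the standard time-domain HRV summary RMSSD can be derived from consecutive RR intervals:

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences (RMSSD), a standard
    time-domain heart rate variability metric, computed from
    consecutive RR (inter-beat) intervals in milliseconds."""
    # successive differences between adjacent RR intervals
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```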
Features | Tools | Studies | Feature Category |
---|---|---|---|
Gender, age, location | Sina microblog user account | [187] | Demographic |
Gender, age, relationships, education levels | bBridge [423], a big data platform for social multimedia analytics | [20] | Demographic |
Age, gender | Age and gender lexica [424], M3-inference model [425] performs multimodal analysis on profile images, usernames, and descriptions on social media profiles | [121,143,144] | Demographic |
Big 5 personality scores | IBM’s Personality Insights [426], BERT-MLP model [427] on textual content | [57,121,130,143,144,188] | Personality |
Proportion of perfection and ruminant thinking-related words in textual content (inspired by [287]) | Perfection and ruminant-thinking-related lexicons | [187] | Personality |
Interpersonal sensitivity: number of stressful periods associated with interpersonal relations | Algorithm [410] applied to users’ posting behaviors | [187] | Personality |
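Lexicon-based personality features such as the perfection- and rumination-related word proportions above reduce to counting lexicon hits over the token count. A minimal sketch (our own hypothetical function, with a toy lexicon rather than the cited word lists):

```python
def lexicon_proportion(text, lexicon):
    """Share of tokens in `text` that appear in a topic lexicon,
    e.g. a perfectionism- or rumination-related word list."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    # strip common trailing punctuation before matching
    hits = sum(1 for t in tokens if t.strip(".,!?;:") in lexicon)
    return hits / len(tokens)
```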
Appendix B. Existing Modality Fusion Techniques
Category | Method | Tools | Studies |
---|---|---|---|
Feature level | Concatenate into a single representation | N/A | [67,84,85,89,96,97,105,132,142,143,145,146,166,170,179,197,199,200,201,217] |
Score/Decision level | Sum-rule, product-rule, max-rule, AND and OR operations, or majority voting on modality-level scores | N/A | [48,51,56,77,87,98,126,173,193,198,201] |
Weighted average or sum of modality-level scores | N/A | [51,68,147,198,200] | |
Average confidence scores from lower-level prediction | N/A | [121] | |
Combine predictions of individual modalities as inputs to secondary ML models | SVM, decision tree, random forest, novel ML models | [48,52,56,64,71,72,74,103,122,155,193] | |
Hierarchical score/decision-level fusion | Weighted voting fusion network [428] | [122,195] | |
Summation of question-level scores from rules enforced on modality-specific predictions | N/A | [88] | |
Model level | Map multiple features into a single vector | LSTM-based encoder–decoder network, LSTM-based neural network, BiLSTM, LSTM, fully connected layer, tensor fusion network | [46,59,75,80,86,95,187] |
Concatenate feature representations as a single input to learn high-level representations | Dense and fully connected layers with attention mechanisms, CNN, multi-head attention network, transformer [295], novel time-aware LSTM | [70,73,77,89,91,92,94,125,189,214] | |
Learn shared representations from weighted modality-specific representations | Gated Multimodal Unit (GMU) [429], parallel attention model, attention layer, sparse MLP (mix vertical and horizontal information via weight sharing and sparse connection), multimodal encoder–decoder, multimodal factorized bilinear pooling (combines compact output features of multi-modal low-rank bilinear [430] and robustness of multi-modal compact bilinear [431]), multi-head intermodal attention fusion, transformer [295], feed-forward network, low-rank multimodal fusion network [432] | [62,65,67,76,93,100,102,106,113,117,131,135,136,142,143,144,174,218,433] | |
Learn joint sparse representations | Dictionary learning | [20] | |
Learn and fuse outputs from different modality-specific parts at fixed time steps | Cell-coupled LSTM with L-skip fusion mechanism | [101] | |
Learn cross-modality representations that incorporate interactions between modalities | LXMERT [434], transformer encoder with cross-attention layers (representations of a modality as query and the other as key/value, and vice versa), memory fusion network [435] | [82,92,129] | |
Horizontal and vertical kernels to capture patterns across different levels | CASER [309] | [170] |
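The three fusion levels in the table differ mainly in where modalities are combined. The sketch below illustrates the two simplest families, feature-level concatenation and score/decision-level fusion, in plain Python (hypothetical function names; model-level fusion requires learned components and is omitted):

```python
def feature_level_fusion(*modality_features):
    """Feature-level (early) fusion: concatenate per-modality
    feature vectors into a single representation."""
    fused = []
    for features in modality_features:
        fused.extend(features)
    return fused

def majority_vote(modality_predictions):
    """Decision-level (late) fusion: majority vote over class labels
    predicted independently for each modality."""
    return max(set(modality_predictions), key=modality_predictions.count)

def weighted_score_fusion(modality_scores, weights):
    """Score-level fusion: weighted average of per-modality scores."""
    total = sum(weights)
    return sum(s * w for s, w in zip(modality_scores, weights)) / total
```

Feeding the per-modality predictions into a secondary classifier (SVM, random forest, etc.), as several cited studies do, is the stacked variant of the decision-level approach.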
Appendix C. Existing Machine Learning Models
Category | Machine Learning Models | Application Method | Studies |
---|---|---|---|
Supervised learning | Linear regression, logistic regression, least absolute shrinkage and selection operator (Lasso) regularized linear regression [436], ElasticNet regression [437], stochastic gradient descent (SGD) regression, Gaussian staircase model, partial least squares (PLS) regression [438] (useful for collinear features), generalized linear models | Learn relationships between features to predict continuous values (scores of assessment scales) or probabilities (corresponding to output classes) | [20,43,49,53,55,68,70,99,104,105,126,130,134,140,150,154,163,164,167,175,179,182,188,200,211,212,213,219,222,223]
SVM | Find a hyperplane that best fits features (regression) or divides features into classes (classification), secondary model in score-level fusion | [47,50,79,99,104,105,115,121,130,134,140,148,162,163,169,178,179,188,198,210,219,223] | |
One class SVM [439] | Anomaly detection by treating outliers as points on the other side of hyperplane | [165] | |
Three-step hierarchical logistic regression | Incremental inclusion of three feature groups in conventional logistic regression | [181] | |
Discriminant functions: Naive Bayes, quadratic discriminant analysis (QDA), linear discriminant analysis (LDA), Gaussian naive Bayes | Determine class based on Bayesian probabilities, detect state changes | [12,99,104,140,148,152,163,222] | |
Decision tree | Construct a tree that splits into leaf nodes based on feature values | [99,134,140,148,164,178] | |
Mixed-effect classification and regression trees-generalized linear mixed-effects model (GLMM) trees [440] | Capture interactions and nonlinearity among features while accounting for longitudinal structure | [191] | |
Neural network | Fully connected (FC) layers, multilayer perceptron (MLP), CNN, LSTM, BiLSTM, GRU, temporal convolutional network (TCN) [441] (with dilation for long sequences)-with activation function like Sigmoid, Softmax, ReLU, LeakyReLU, and GeLU | Predict scores of assessment scales (regression) or probability distribution over classes (classification) | [60,78,80,84,85,86,87,88,90,91,92,93,94,96,98,105,111,113,117,131,133,135,136,142,143,144,146,162,163,167,168,170,172,174,178,179,190,197,199,201,218,219,221,223,308] |
DCNN-DNN (combination of deep CNN and DNN), GCNN-LSTM (combination of gated convolutional neural network, which replaces a convolution block in CNN with a gated convolution block, and LSTM) | The latter neural network makes predictions based on high-level global features learned by the former | [52,308] | |
Cross-domain DNN with feature adaptive transformation and combination strategy (DNN-FATC) | Enhance detection in the target domain by transferring information from a heterogeneous source domain | [109] | |
Attention-based TCN | Classify features using relational classification attention [442] | [72] | |
One-hot transformer (one-hot positional encoding with lower complexity than the original sine and cosine encodings) | Apply one-hot encoding on features for classification | [72] | |
Transformer [295] | Apply self-attention across post-level representations, attention masking masks missing information | [129] | |
Transformer-based sequence classification models: BERT, RoBERTa [296], XLNet [285], Informer [443] (for long sequences) | Perform classification using custom pre-trained tokenizers augmented with special tokens for tokenization | [121,179] | |
Hierarchical attention network (HAN) [444] | Predict on user-level representations derived from stacked attention-based post-level representations, each made up of attention-based word-level representations | [128] | |
LSTM-based encoder and decoder | Learn factorized joint distributions to generate modality-specific generative factors and multimodal discriminative factors to reconstruct unimodal inputs and predict labels respectively | [82] | |
GRU-RNN as baseline model with FC layers as personalized model | Train baseline model using data from all samples and fine-tune personalized model on individual samples | [161] | |
CNN-based triplet network [408] | Incorporate representations of homogeneous users | [138] | |
Stacked graph convolutional network | Perform classification on heterogeneous graphs by learning embeddings, sorting graph nodes, and performing graph comparisons | [139] | |
GRU-D (introduces decay rates into the conventional GRU to control the decay mechanism) | Learn feature-specific hidden decay rates from inputs | [171] | |
Ensemble learning | Random forest (RF) [300], eXtreme Gradient Boosting (XGBoost), AdaBoost [301], Gradient Boosted Regression Tree (GBDT) [302] (less sensitive to outliers and more robust to overfitting) | Predict based on numerical input features | [51,99,104,105,114,126,130,134,140,148,151,155,157,160,163,164,167,169,178,179,182,183,188,203,204,206,212,219,222,223]
RF | Secondary model that predicts from regression scores and binary outputs of individual modality predictions | [71,81] | |
Balanced RF [445] (RF on imbalanced data) | Aggregate predictions of ensemble on balanced down-sampled data | [209] | |
XGBoost-based subject-specific hierarchical recall network | Deduce subject-level labels based on whether the output probability of XGBoost at a specific layer exceeds a predetermined threshold | [194] | |
Stacked ensemble learning architecture | Obtain the first level of predictions from KNN, naive Bayes, Lasso regression, ridge regression, and SVM, then use them as features of a second-layer logistic regression | [123] | |
Feature-stacking (a meta-learning approach) [303] | Use logistic regression as an L1 learner to combine predictions of weak L0 learners on different feature sets | [185] | |
Greedy Ensembles of Weighted Extreme Learning Machines (GEWELMs), WELM [446] (weighted mapping for unbalanced classes), Kernel ELM | ELM [447] as a building block that maps inputs to class-based outputs via least-squares regression | [63,127,192] | |
Stacked ensemble classifier | Use MLP as meta learner to integrate outputs of CNN base learners | [126] | |
Cost-sensitive boosting pruning trees (AdaBoost with pruned decision trees) | Weighted pruning prunes redundant leaves to increase generalization and robustness | [137] | |
Weighted voting model | Weight predictions of baseline ML models (DT, Naive Bayes, KNN, SVM, generalized linear models, GBDT) based on class probabilities and deduce the final outcome from the highest weighted class | [140] | |
Ensemble of SVM, DT, and naive Bayes | N/A | [89] | |
Combination of personalized LSTM-based and RF models | Train personalized LSTM on hourly time series data (of another sample most similar to the sample of concern based on demographic characteristics and baseline MH states), and RF on statistical and cluster-based features | [208] | |
Multi-task learning | CNN | Train jointly to produce two output branches: a regression score and a probability distribution for classification | [61,62]
LSTM-RNN, attention-based LSTM subnetwork, MLP with shared and task-specific layers | Train for depression prediction with emotion recognition as the secondary task | [46,106,132] | |
LSTM with Swish [448] activation function (speeds up training with the advantages of linear and ReLU activation), GRU with FC layers, DNN with multi-task loss function | Perform both regression and classification simultaneously | [74,102,118,141,193] | |
Multi-task FC layers | Train jointly to predict severity level and discrete probability distribution | [97] | |
Multi-output least-squares support vector regression machines (m-SVR) [304] | Map multivariate inputs to a multivariate output space to predict several tasks | [207] | |
2-layer MLP with shared and task-specific dense layers with dynamic weight tuning technique | Train to perform individual predictions for positive and control groups | [180] | |
Bi-LSTM-based DNNs to provide auxiliary outputs into DNN for main output | Auxiliary outputs correspond to additional predictions to incorporate additional information | [176] | |
DNN (FC layers with Softmax activation) for auxiliary and main outputs | Train DNNs individually on different feature combinations as individual tasks to obtain auxiliary losses for joint optimization function of main output | [145] | |
Multi-task neural network with shared LSTM layer and two task-specific LSTM layers | Train to predict male and female samples individually | [70] | |
Others | Semi-supervised learning: ladder network classifier [305] built from a stacked noisy encoder and a denoising autoencoder [306] | Reconstruct the input using outputs of the noisy encoder in the current layer and the decoder from the previous layer, combine with MLP (inspired by [449]) | [196]
DMF [450], RESCAL [451], DEDICOM [452], HERec [453] | Perform recommender system [307] approach on features modeled using HIN | [173] | |
Graphlets [454], colored graphlets [455], DeepWalk [456], Metapath2vec++ [457] | Perform node classification on features modeled using HIN | [173] | |
Combination of DBSCAN and K-Means | Density-based clustering | [78] | |
Clustering-based: KNN | Deduce predicted class through voting of the K nearest data points | [140,163,166,178,212,223] | |
Linear superimpose of modality-specific features | Learn fitting parameters (between 0 and 1) that adjust the proportions of modality-specific features in the final outcome | [83] | |
Two-stage prediction with outlier detection | A baseline ML model (LR, SVM, KNN, DT, GBDT, AdaBoost, RF, Gaussian naive Bayes, LDA, QDA, DNN, CNN) performs day-level predictions; a t-test detects outliers in the first-stage outputs | [163] | |
Label association mechanism | Apply to one-hot vectors of predictions from modality-specific DNNs | [189] | |
Isolation Forest (ISOFOR) [458], Local Outlier Factor (LOF) [459], Connectivity-Based Outlier Factor (COF) [460] | Unsupervised anomaly detection | [166] | |
Similarity and threshold relative to the model of normality (MoN) (from the average of deep representations of training instances in respective target groups) | Deduce predicted class based on higher similarity with corresponding MoN | [85] | |
Federated learning based on DNN | Train global model on all data and fine-tune the last layer locally | [168] |
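Several ensemble rows above (the weighted voting model in particular) reduce to accumulating weighted class-probability vectors and taking the argmax. A minimal sketch under that reading (hypothetical function name, not any study's released code):

```python
def weighted_voting(model_probs, model_weights):
    """Weighted voting across baseline classifiers: each model contributes
    its class-probability vector scaled by a per-model weight, and the
    class with the highest accumulated weight is the final prediction."""
    n_classes = len(model_probs[0])
    scores = [0.0] * n_classes
    for probs, weight in zip(model_probs, model_weights):
        for cls, p in enumerate(probs):
            scores[cls] += weight * p
    # index of the highest weighted class
    return max(range(n_classes), key=scores.__getitem__)
```

With uniform weights this degenerates to the sum-rule score fusion listed in Appendix B; the weights are what let stronger baseline models dominate the vote.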
References
- Institute for Health Metrics and Evaluation. Global Health Data Exchange (GHDx); Institute for Health Metrics and Evaluation: Seattle, WA, USA, 2019. [Google Scholar]
- World Health Organization. Mental Health and COVID-19: Early Evidence of the Pandemic’s Impact: Scientific Brief, 2 March 2022; Technical Report; World Health Organization: Geneva, Switzerland, 2022. [Google Scholar]
- Australian Bureau of Statistics (2020–2022). National Study of Mental Health and Wellbeing. 2022. Available online: https://www.abs.gov.au/statistics/health/mental-health/national-study-mental-health-and-wellbeing/latest-release (accessed on 10 December 2023).
- National Institute of Mental Health. Statistics of Mental Illness. 2021. Available online: https://www.nimh.nih.gov/health/statistics/mental-illness (accessed on 10 December 2023).
- Bloom, D.; Cafiero, E.; Jané-Llopis, E.; Abrahams-Gessel, S.; Bloom, L.; Fathima, S.; Feigl, A.; Gaziano, T.; Hamandi, A.; Mowafi, M.; et al. The Global Economic Burden of Noncommunicable Diseases; Technical Report; Harvard School of Public Health: Boston, MA, USA, 2011. [Google Scholar]
- World Health Organization. Mental Health and Substance Use. In Comprehensive Mental Health Action Plan 2013–2030; World Health Organization: Geneva, Switzerland, 2021. [Google Scholar]
- Borg, M. The Nature of Recovery as Lived in Everyday Life: Perspectives of Individuals Recovering from Severe Mental Health Problems. Ph.D. Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2007. [Google Scholar]
- Barge-Schaapveld, D.Q.; Nicolson, N.A.; Berkhof, J.; Devries, M.W. Quality of life in depression: Daily life determinants and variability. Psychiatry Res. 1999, 88, 173–189. [Google Scholar] [CrossRef] [PubMed]
- Rapee, R.M.; Heimberg, R.G. A cognitive-behavioral model of anxiety in social phobia. Behav. Res. Ther. 1997, 35, 741–756. [Google Scholar] [CrossRef]
- Stewart-Brown, S. Emotional wellbeing and its relation to health. BMJ 1998, 317, 1608–1609. [Google Scholar] [CrossRef]
- Goldman, L.S.; Nielsen, N.H.; Champion, H.C.; The Council on American Medical Association Council on Scientific Affairs. Awareness, Diagnosis, and Treatment of Depression. J. Gen. Intern. Med. 1999, 14, 569–580. [Google Scholar] [CrossRef]
- Grünerbl, A.; Muaremi, A.; Osmani, V.; Bahle, G.; Öhler, S.; Tröster, G.; Mayora, O.; Haring, C.; Lukowicz, P. Smartphone-Based Recognition of States and State Changes in Bipolar Disorder Patients. IEEE J. Biomed. Health Inform. 2015, 19, 140–148. [Google Scholar] [CrossRef]
- Kakuma, R.; Minas, H.; Ginneken, N.; Dal Poz, M.; Desiraju, K.; Morris, J.; Saxena, S.; Scheffler, R. Human resources for mental health care: Current situation and strategies for action. Lancet 2011, 378, 1654–1663. [Google Scholar] [CrossRef]
- Le Glaz, A.; Haralambous, Y.; Kim-Dufor, D.H.; Lenca, P.; Billot, R.; Ryan, T.C.; Marsh, J.; DeVylder, J.; Walter, M.; Berrouiguet, S.; et al. Machine Learning and Natural Language Processing in Mental Health: Systematic Review. J. Med. Internet Res. 2021, 23, e15708. [Google Scholar] [CrossRef]
- Rahman, R.A.; Omar, K.; Mohd Noah, S.A.; Danuri, M.S.N.M.; Al-Garadi, M.A. Application of Machine Learning Methods in Mental Health Detection: A Systematic Review. IEEE Access 2020, 8, 183952–183964. [Google Scholar] [CrossRef]
- Graham, S.; Depp, C.; Lee, E.E.; Nebeker, C.; Tu, X.; Kim, H.C.; Jeste, D.V. Artificial Intelligence for Mental Health and Mental Illnesses: An Overview. Curr. Psychiatry Rep. 2019, 21, 116. [Google Scholar] [CrossRef]
- Thieme, A.; Belgrave, D.; Doherty, G. Machine Learning in Mental Health: A Systematic Review of the HCI Literature to Support the Development of Effective and Implementable ML Systems. ACM Trans. Comput. Hum. Interact. 2020, 27, 1–53. [Google Scholar] [CrossRef]
- Javaid, M.; Haleem, A.; Pratap Singh, R.; Suman, R.; Rab, S. Significance of machine learning in healthcare: Features, pillars and applications. Int. J. Intell. Netw. 2022, 3, 58–73. [Google Scholar] [CrossRef]
- Choudhry, F.R.; Mani, V.; Ming, L.C.; Khan, T.M. Beliefs and perception about mental health issues: A meta-synthesis. Neuropsychiatr. Dis. Treat. 2016, 12, 2807–2818. [Google Scholar] [CrossRef] [PubMed]
- Shen, G.; Jia, J.; Nie, L.; Feng, F.; Zhang, C.; Hu, T.; Chua, T.S.; Zhu, W. Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3838–3844. [Google Scholar]
- Manickam, P.; Mariappan, S.A.; Murugesan, S.M.; Hansda, S.; Kaushik, A.; Shinde, R.; Thipperudraswamy, S.P. Artificial Intelligence (AI) and Internet of Medical Things (IoMT) Assisted Biomedical Systems for Intelligent Healthcare. Biosensors 2022, 12, 562. [Google Scholar] [CrossRef]
- Skaik, R.; Inkpen, D. Using Social Media for Mental Health Surveillance: A Review. ACM Comput. Surv. 2020, 53, 1–31. [Google Scholar] [CrossRef]
- Chen, X.; Genc, Y. A Systematic Review of Artificial Intelligence and Mental Health in the Context of Social Media. In Proceedings of the Artificial Intelligence in HCI, Virtual, 26 June–1 July 2022; pp. 353–368. [Google Scholar]
- Deshmukh, V.M.; Rajalakshmi, B.; Dash, S.; Kulkarni, P.; Gupta, S.K. Analysis and Characterization of Mental Health Conditions based on User Content on Social Media. In Proceedings of the 2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 28–29 January 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Yazdavar, A.H.; Mahdavinejad, M.S.; Bajaj, G.; Romine, W.; Sheth, A.; Monadjemi, A.H.; Thirunarayan, K.; Meddar, J.M.; Myers, A.; Pathak, J.; et al. Multimodal mental health analysis in social media. PLoS ONE 2020, 15, e0226248. [Google Scholar] [CrossRef]
- Garcia Ceja, E.; Riegler, M.; Nordgreen, T.; Jakobsen, P.; Oedegaard, K.; Torresen, J. Mental health monitoring with multimodal sensing and machine learning: A survey. Pervasive Mob. Comput. 2018, 51, 1–26. [Google Scholar] [CrossRef]
- Hickey, B.A.; Chalmers, T.; Newton, P.; Lin, C.T.; Sibbritt, D.; McLachlan, C.S.; Clifton-Bligh, R.; Morley, J.; Lal, S. Smart Devices and Wearable Technologies to Detect and Monitor Mental Health Conditions and Stress: A Systematic Review. Sensors 2021, 21, 3461. [Google Scholar] [CrossRef]
- Woodward, K.; Kanjo, E.; Brown, D.J.; McGinnity, T.M.; Inkster, B.; Macintyre, D.J.; Tsanas, A. Beyond Mobile Apps: A Survey of Technologies for Mental Well-Being. IEEE Trans. Affect. Comput. 2022, 13, 1216–1235. [Google Scholar] [CrossRef]
- Craik, K.H. The lived day of an individual: A person-environment perspective. Pers. Environ. Psychol. New Dir. Perspect. 2000, 2, 233–266. [Google Scholar]
- Harari, G.M.; Müller, S.R.; Aung, M.S.; Rentfrow, P.J. Smartphone sensing methods for studying behavior in everyday life. Curr. Opin. Behav. Sci. 2017, 18, 83–90. [Google Scholar] [CrossRef]
- Stucki, R.A.; Urwyler, P.; Rampa, L.; Müri, R.; Mosimann, U.P.; Nef, T. A Web-Based Non-Intrusive Ambient System to Measure and Classify Activities of Daily Living. J. Med. Internet Res. 2014, 16, e175. [Google Scholar] [CrossRef]
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
- Liberati, A.; Altman, D.G.; Tetzlaff, J.; Mulrow, C.; Gøtzsche, P.C.; Ioannidis, J.P.A.; Clarke, M.; Devereaux, P.J.; Kleijnen, J.; Moher, D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. BMJ 2009, 339, W-65–W-94. [Google Scholar] [CrossRef]
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009, 6, 336–341. [Google Scholar] [CrossRef] [PubMed]
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Technical Report; University of Durham: Durham, UK, 2007. [Google Scholar]
- Zhang, T.; Schoene, A.; Ji, S.; Ananiadou, S. Natural language processing applied to mental illness detection: A narrative review. npj Digit. Med. 2022, 5, 46. [Google Scholar] [CrossRef]
- Valstar, M.; Schuller, B.; Smith, K.; Eyben, F.; Jiang, B.; Bilakhia, S.; Schnieder, S.; Cowie, R.; Pantic, M. AVEC 2013: The Continuous Audio/Visual Emotion and Depression Recognition Challenge. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge (AVEC ’13), Barcelona, Spain, 21 October 2013; pp. 3–10. [Google Scholar] [CrossRef]
- Valstar, M.; Schuller, B.; Smith, K.; Almaev, T.; Eyben, F.; Krajewski, J.; Cowie, R.; Pantic, M. AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (AVEC ’14), Orlando, FL, USA, 7 November 2014; pp. 3–10. [Google Scholar] [CrossRef]
- Sawyer, S.M.; Azzopardi, P.S.; Wickremarathne, D.; Patton, G.C. The age of adolescence. Lancet Child Adolesc. Health 2018, 2, 223–228. [Google Scholar] [CrossRef]
- Semrud-Clikeman, M.; Goldenring Fine, J. Pediatric versus adult psychopathology: Differences in neurological and clinical presentations. In The Neuropsychology of Psychopathology; Contemporary Neuropsychology; Springer: New York, NY, USA, 2013; pp. 11–27. [Google Scholar]
- Cobham, V.E.; McDermott, B.; Haslam, D.; Sanders, M.R. The Role of Parents, Parenting and the Family Environment in Children’s Post-Disaster Mental Health. Curr. Psychiatry Rep. 2016, 18, 53. [Google Scholar] [CrossRef]
- Tuma, J.M. Mental health services for children: The state of the art. Am. Psychol. 1989, 44, 188–199. [Google Scholar] [CrossRef]
- Gong, Y.; Poellabauer, C. Topic Modeling Based Multi-Modal Depression Detection. In Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge (AVEC ’17), Mountain View, CA, USA, 23 October 2017; pp. 69–76. [Google Scholar] [CrossRef]
- Van Praag, H. Can stress cause depression? Prog. Neuro-Psychopharmacol. Biol. Psychiatry 2004, 28, 891–907. [Google Scholar] [CrossRef]
- Power, M.J.; Tarsia, M. Basic and complex emotions in depression and anxiety. Clin. Psychol. Psychother. 2007, 14, 19–31. [Google Scholar] [CrossRef]
- Chao, L.; Tao, J.; Yang, M.; Li, Y. Multi task sequence learning for depression scale prediction from video. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi’an, China, 21–24 September 2015. [Google Scholar]
- […] correlations in larger mental health samples: Analysis and replication. BJPsych Open 2022, 8, e106. [Google Scholar] [CrossRef]
- Wang, W.; Nepal, S.; Huckins, J.F.; Hernandez, L.; Vojdanovski, V.; Mack, D.; Plomp, J.; Pillai, A.; Obuchi, M.; daSilva, A.; et al. First-Gen Lens: Assessing Mental Health of First-Generation Students across Their First Year at College Using Mobile Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–32. [Google Scholar] [CrossRef]
- Thakur, S.S.; Roy, R.B. Predicting mental health using smart-phone usage and sensor data. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 9145–9161. [Google Scholar] [CrossRef]
- Choi, J.; Lee, S.; Kim, S.; Kim, D.; Kim, H. Depressed Mood Prediction of Elderly People with a Wearable Band. Sensors 2022, 22, 4174. [Google Scholar] [CrossRef]
- Dai, R.; Kannampallil, T.; Kim, S.; Thornton, V.; Bierut, L.; Lu, C. Detecting Mental Disorders with Wearables: A Large Cohort Study. In Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI ’23), San Antonio, TX, USA, 9–12 May 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 39–51. [Google Scholar] [CrossRef]
- Dai, R.; Kannampallil, T.; Zhang, J.; Lv, N.; Ma, J.; Lu, C. Multi-Task Learning for Randomized Controlled Trials: A Case Study on Predicting Depression with Wearable Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–23. [Google Scholar] [CrossRef]
- Horwitz, A.; Czyz, E.; Al-Dajani, N.; Dempsey, W.; Zhao, Z.; Nahum-Shani, I.; Sen, S. Utilizing daily mood diaries and wearable sensor data to predict depression and suicidal ideation among medical interns. J. Affect. Disord. 2022, 313, 1–7. [Google Scholar] [CrossRef] [PubMed]
- Horwitz, A.G.; Kentopp, S.D.; Cleary, J.; Ross, K.; Wu, Z.; Sen, S.; Czyz, E.K. Using machine learning with intensive longitudinal data to predict depression and suicidal ideation among medical interns over time. Psychol. Med. 2022, 53, 5778–5785. [Google Scholar] [CrossRef]
- Shah, A.P.; Vaibhav, V.; Sharma, V.; Al Ismail, M.; Girard, J.; Morency, L.P. Multimodal Behavioral Markers Exploring Suicidal Intent in Social Media Videos. In Proceedings of the 2019 International Conference on Multimodal Interaction (ICMI ’19), Suzhou, China, 14–18 October 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 409–413. [Google Scholar] [CrossRef]
- Belouali, A.; Gupta, S.; Sourirajan, V.; Yu, J.; Allen, N.; Alaoui, A.; Dutton, M.A.; Reinhard, M.J. Acoustic and language analysis of speech for suicidal ideation among US veterans. BioData Min. 2021, 14, 11. [Google Scholar] [CrossRef] [PubMed]
- Mishra, R.; Prakhar Sinha, P.; Sawhney, R.; Mahata, D.; Mathur, P.; Ratn Shah, R. SNAP-BATNET: Cascading Author Profiling and Social Network Graphs for Suicide Ideation Detection on Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 147–156. [Google Scholar] [CrossRef]
- Ramírez-Cifuentes, D.; Freire, A.; Baeza-Yates, R.; Puntí, J.; Medina-Bravo, P.; Velazquez, D.A.; Gonfaus, J.M.; Gonzàlez, J. Detection of suicidal ideation on social media: Multimodal, relational, and behavioral analysis. J. Med. Internet Res. 2020, 22, e17758. [Google Scholar] [CrossRef]
- Cao, L.; Zhang, H.; Feng, L. Building and Using Personal Knowledge Graph to Improve Suicidal Ideation Detection on Social Media. IEEE Trans. Multimed. 2022, 24, 87–102. [Google Scholar] [CrossRef]
- Chatterjee, M.; Kumar, P.; Samanta, P.; Sarkar, D. Suicide ideation detection from online social media: A multi-modal feature based technique. Int. J. Inf. Manag. Data Insights 2022, 2, 100103. [Google Scholar] [CrossRef]
- Li, Z.; Cheng, W.; Zhou, J.; An, Z.; Hu, B. Deep learning model with multi-feature fusion and label association for suicide detection. Multimed. Syst. 2023, 29, 2193–2203. [Google Scholar] [CrossRef]
- Heckler, W.F.; Feijó, L.P.; de Carvalho, J.V.; Barbosa, J.L.V. Thoth: An intelligent model for assisting individuals with suicidal ideation. Expert Syst. Appl. 2023, 233, 120918. [Google Scholar] [CrossRef]
- Czyz, E.K.; King, C.A.; Al-Dajani, N.; Zimmermann, L.; Hong, V.; Nahum-Shani, I. Ecological Momentary Assessments and Passive Sensing in the Prediction of Short-Term Suicidal Ideation in Young Adults. JAMA Netw. Open 2023, 6, e2328005. [Google Scholar] [CrossRef]
- Syed, Z.S.; Sidorov, K.; Marshall, D. Automated Screening for Bipolar Disorder from Audio/Visual Modalities. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, Seoul, Republic of Korea, 22 October 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 39–45. [Google Scholar] [CrossRef]
- Yang, L.; Li, Y.; Chen, H.; Jiang, D.; Oveneke, M.C.; Sahli, H. Bipolar Disorder Recognition with Histogram Features of Arousal and Body Gestures. In Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop (AVEC’18), Seoul, Republic of Korea, 22 October 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 15–21. [Google Scholar] [CrossRef]
- **. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 4260–4264. [Google Scholar] [CrossRef]
- Duwairi, R.; Halloush, Z. A Multi-View Learning Approach for Detecting Personality Disorders Among Arab Social Media Users. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 1–19. [Google Scholar] [CrossRef]
- Bennett, C.C.; Ross, M.K.; Baek, E.; Kim, D.; Leow, A.D. Predicting clinically relevant changes in bipolar disorder outside the clinic walls based on pervasive technology interactions via smartphone typing dynamics. Pervasive Mob. Comput. 2022, 83, 101598. [Google Scholar] [CrossRef]
- Richter, V.; Neumann, M.; Kothare, H.; Roesler, O.; Liscombe, J.; Suendermann-Oeft, D.; Prokop, S.; Khan, A.; Yavorsky, C.; Lindenmayer, J.P.; et al. Towards Multimodal Dialog-Based Speech & Facial Biomarkers of Schizophrenia. In Proceedings of the Companion Publication of the 2022 International Conference on Multimodal Interaction (ICMI ’22 Companion), Montreal, QC, Canada, 18–22 October 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 171–176. [Google Scholar] [CrossRef]
- Birnbaum, M.L.; Norel, R.; Van Meter, A.; Ali, A.F.; Arenare, E.; Eyigoz, E.; Agurto, C.; Germano, N.; Kane, J.M.; Cecchi, G.A. Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook. npj Schizophr. 2020, 6, 38. [Google Scholar] [CrossRef]
- Wang, R.; Aung, M.S.H.; Abdullah, S.; Brian, R.; Campbell, A.T.; Choudhury, T.; Hauser, M.; Kane, J.; Merrill, M.; Scherer, E.A.; et al. CrossCheck: Toward Passive Sensing and Detection of Mental Health Changes in People with Schizophrenia. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’16), Heidelberg, Germany, 12–16 September 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 886–897. [Google Scholar] [CrossRef]
- Wang, R.; Wang, W.; Aung, M.S.H.; Ben-Zeev, D.; Brian, R.; Campbell, A.T.; Choudhury, T.; Hauser, M.; Kane, J.; Scherer, E.A.; et al. Predicting Symptom Trajectories of Schizophrenia Using Mobile Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–24. [Google Scholar] [CrossRef]
- Tseng, V.W.S.; Sano, A.; Ben-Zeev, D.; Brian, R.; Campbell, A.T.; Hauser, M.; Kane, J.M.; Scherer, E.A.; Wang, R.; Wang, W.; et al. Using behavioral rhythms and multi-task learning to predict fine-grained symptoms of schizophrenia. Sci. Rep. 2020, 10, 15100. [Google Scholar] [CrossRef]
- Lamichhane, B.; Zhou, J.; Sano, A. Psychotic Relapse Prediction in Schizophrenia Patients Using A Personalized Mobile Sensing-Based Supervised Deep Learning Model. IEEE J. Biomed. Health Inform. 2023, 27, 3246–3257. [Google Scholar] [CrossRef]
- Zhou, J.; Lamichhane, B.; Ben-Zeev, D.; Campbell, A.; Sano, A. Predicting Psychotic Relapse in Schizophrenia With Mobile Sensor Data: Routine Cluster Analysis. JMIR mHealth uHealth 2022, 10, e31006. [Google Scholar] [CrossRef]
- Osipov, M.; Behzadi, Y.; Kane, J.M.; Petrides, G.; Clifford, G.D. Objective identification and analysis of physiological and behavioral signs of schizophrenia. J. Ment. Health 2015, 24, 276–282. [Google Scholar] [CrossRef]
- Teferra, B.G.; Borwein, S.; DeSouza, D.D.; Rose, J. Screening for Generalized Anxiety Disorder From Acoustic and Linguistic Features of Impromptu Speech: Prediction Model Evaluation Study. JMIR Form. Res. 2022, 6, e39998. [Google Scholar] [CrossRef]
- Choudhary, S.; Thomas, N.; Alshamrani, S.; Srinivasan, G.; Ellenberger, J.; Nawaz, U.; Cohen, R. A Machine Learning Approach for Continuous Mining of Nonidentifiable Smartphone Data to Create a Novel Digital Biomarker Detecting Generalized Anxiety Disorder: Prospective Cohort Study. JMIR Med. Inform. 2022, 10, e38943. [Google Scholar] [CrossRef] [PubMed]
- Ding, Y.; Liu, J.; Zhang, X.; Yang, Z. Dynamic Tracking of State Anxiety via Multi-Modal Data and Machine Learning. Front. Psychiatry 2022, 13, 757961. [Google Scholar] [CrossRef] [PubMed]
- Chen, C.P.; Gau, S.S.F.; Lee, C.C. Learning Converse-Level Multimodal Embedding to Assess Social Deficit Severity for Autism Spectrum Disorder. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Khullar, V.; Singh, H.P.; Bala, M. Meltdown/Tantrum Detection System for Individuals with Autism Spectrum Disorder. Appl. Artif. Intell. 2021, 35, 1708–1732. [Google Scholar] [CrossRef]
- Mallol-Ragolta, A.; Dhamija, S.; Boult, T.E. A Multimodal Approach for Predicting Changes in PTSD Symptom Severity. In Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI ’18), Boulder, CO, USA, 16–18 October 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 324–333. [Google Scholar] [CrossRef]
- Tébar, B.; Gopalan, A. Early Detection of Eating Disorders using Social Media. In Proceedings of the 2021 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Orlando, FL, USA, 16–17 December 2021; pp. 193–198. [Google Scholar] [CrossRef]
- Abuhassan, M.; Anwar, T.; Liu, C.; Jarman, H.K.; Fuller-Tyszkiewicz, M. EDNet: Attention-Based Multimodal Representation for Classification of Twitter Users Related to Eating Disorders. In Proceedings of the ACM Web Conference 2023 (WWW ’23), Austin, TX, USA, 30 April–4 May 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 4065–4074. [Google Scholar] [CrossRef]
- Noguero, D.S.; Ramírez-Cifuentes, D.; Ríssola, E.A.; Freire, A. Gender Bias When Using Artificial Intelligence to Assess Anorexia Nervosa on Social Media: Data-Driven Study. J. Med. Internet Res. 2023, 25, e45184. [Google Scholar] [CrossRef] [PubMed]
- Xu, Z.; Pérez-Rosas, V.; Mihalcea, R. Inferring Social Media Users’ Mental Health Status from Multimodal Information. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Marseille, France, 2020; pp. 6292–6299. [Google Scholar]
- Meng, X.; Zhang, J.; Ren, G. The evaluation model of college students’ mental health in the environment of independent entrepreneurship using neural network technology. J. Healthc. Eng. 2021, 2021, 4379623. [Google Scholar] [CrossRef] [PubMed]
- Singh, V.K.; Long, T. Automatic assessment of mental health using phone metadata. Proc. Assoc. Inf. Sci. Technol. 2018, 55, 450–459. [Google Scholar] [CrossRef]
- Park, J.; Arunachalam, R.; Silenzio, V.; Singh, V.K. Fairness in Mobile Phone–Based Mental Health Assessment Algorithms: Exploratory Study. JMIR Form. Res. 2022, 6, e34366. [Google Scholar] [CrossRef]
- Liu, S. 3D Illustration of Cartoon Characters Talking And Discussing. Communication and Talking Concept. 3D Rendering on White Background. 2022. Available online: https://www.istockphoto.com/photo/3d-illustration-of-cartoon-characters-talking-and-discussing-communication-and-gm1428415103-471910717 (accessed on 22 November 2023).
- Arefin, S. Social Media. 2014. Available online: https://www.flickr.com/photos/54888897@N05/5102912860/ (accessed on 10 December 2023).
- Secret, A. Hand Holding Phone with Social Media Icon Stock Photo. 2021. Available online: https://www.istockphoto.com/photo/hand-holding-phone-with-social-media-icon-gm1351107098-426983736?phrase=smartphone+cartoon (accessed on 10 December 2023).
- Adventtr. Health Monitoring Information on Generic Smartwatch Screen Stock Photo. 2021. Available online: https://www.istockphoto.com/photo/health-monitoring-information-on-generic-smartwatch-screen-gm1307154121-397513158?utm_source=flickr&utm_medium=affiliate&utm_campaign=srp_photos_top&utm_term=smartphone+and+wearable+cartoon&utm_content=https%3A%2F%2Fwww.flickr.com%2Fsearch%2F&ref=sponsored (accessed on 10 December 2023).
- Gratch, J.; Artstein, R.; Lucas, G.; Stratou, G.; Scherer, S.; Nazarian, A.; Wood, R.; Boberg, J.; DeVault, D.; Marsella, S.; et al. The Distress Analysis Interview Corpus of human and computer interviews. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; European Language Resources Association (ELRA): Reykjavik, Iceland, 2014; pp. 3123–3128. [Google Scholar]
- Suendermann-Oeft, D.; Robinson, A.; Cornish, A.; Habberstad, D.; Pautler, D.; Schnelle-Walka, D.; Haller, F.; Liscombe, J.; Neumann, M.; Merrill, M.; et al. NEMSI: A Multimodal Dialog System for Screening of Neurological or Mental Conditions. In Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (IVA ’19), Paris, France, 2–5 July 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 245–247. [Google Scholar] [CrossRef]
- Çiftçi, E.; Kaya, H.; Güleç, H.; Salah, A.A. The Turkish Audio-Visual Bipolar Disorder Corpus. In Proceedings of the 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), Beijing, China, 20–22 May 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Yates, A.; Cohan, A.; Goharian, N. Depression and Self-Harm Risk Assessment in Online Forums. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 2958–2968. [Google Scholar]
- Schueller, S.M.; Begale, M.; Penedo, F.J.; Mohr, D.C. Purple: A Modular System for Developing and Deploying Behavioral Intervention Technologies. J. Med. Internet Res. 2014, 16, e181. [Google Scholar] [CrossRef]
- Farhan, A.A.; Yue, C.; Morillo, R.; Ware, S.; Lu, J.; Bi, J.; Kamath, J.; Russell, A.; Bamis, A.; Wang, B. Behavior vs. introspection: Refining prediction of clinical depression via smartphone sensing data. In Proceedings of the 2016 IEEE Wireless Health (WH), Bethesda, MD, USA, 25–27 October 2016; pp. 1–8. [Google Scholar] [CrossRef]
- Montag, C.; Baumeister, H.; Kannen, C.; Sariyska, R.; Meßner, E.M.; Brand, M. Concept, Possibilities and Pilot-Testing of a New Smartphone Application for the Social and Life Sciences to Study Human Behavior Including Validation Data from Personality Psychology. J 2019, 2, 102–115. [Google Scholar] [CrossRef]
- Bai, R.; **
- Gao, R.; Hao, B.; Li, H.; Gao, Y.; Zhu, T. Developing Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblog. In Proceedings of the Brain and Health Informatics: International Conference (BHI 2013), Maebashi, Japan, 29–31 October 2013; Imamura, K., Usui, S., Shirao, T., Kasamatsu, T., Schwabe, L., Zhong, N., Eds.; Springer: Cham, Switzerland, 2013; pp. 359–368. [Google Scholar]
- Crossley, S.A.; Allen, L.K.; Kyle, K.; McNamara, D.S. Analyzing Discourse Processing Using a Simple Natural Language Processing Tool. Discourse Process. 2014, 51, 511–534. [Google Scholar] [CrossRef]
- Das Swain, V.; Chen, V.; Mishra, S.; Mattingly, S.M.; Abowd, G.D.; De Choudhury, M. Semantic Gap in Predicting Mental Wellbeing through Passive Sensing. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22), New Orleans, LA, USA, 3–5 May 2022; Association for Computing Machinery: New York, NY, USA, 2022. [Google Scholar] [CrossRef]
- Sun, J. Jieba Chinese Word Segmentation Tool; ACM: New York, NY, USA, 2012. [Google Scholar]
- Loria, S.; Keen, P.; Honnibal, M.; Yankovsky, R.; Karesh, D.; Dempsey, E.; Childs, W.; Schnurr, J.; Qalieh, A.; Ragnarsson, L.; et al. TextBlob: Simplified Text Processing. 2013. Available online: https://textblob.readthedocs.io/en/dev/ (accessed on 10 December 2023).
- Marcus, M.P.; Santorini, B.; Marcinkiewicz, M.A. Building a Large Annotated Corpus of English: The Penn Treebank. Comput. Linguist. 1993, 19, 313–330. [Google Scholar]
- Fast, E.; Chen, B.; Bernstein, M.S. Empath: Understanding Topic Signals in Large-Scale Text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16), San Jose, CA, USA, 7–12 May 2016; Association for Computing Machinery: New York, NY, USA, 2016. [Google Scholar] [CrossRef]
- Zubiaga, A. TF-CR: Weighting Embeddings for Text Classification. arXiv 2020. [Google Scholar]
- Sap, M.; Park, G.; Eichstaedt, J.; Kern, M.; Stillwell, D.; Kosinski, M.; Ungar, L.; Schwartz, H.A. Developing age and gender predictive lexica over social media. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar]
- Wang, Z.; Hale, S.; Adelani, D.I.; Grabowicz, P.; Hartman, T.; Flöck, F.; Jurgens, D. Demographic Inference and Representative Population Estimates from Multilingual Social Media Data. In Proceedings of The World Wide Web Conference (WWW ’19), San Francisco, CA, USA, 13–17 May 2019; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
- The International Business Machines Corporation (IBM). IBM Watson Natural Language Understanding. 2021. Available online: https://www.ibm.com/products/natural-language-understanding (accessed on 20 September 2023).
- Mehta, Y.; Fatehi, S.; Kazameini, A.; Stachl, C.; Cambria, E.; Eetemadi, S. Bottom-Up and Top-Down: Predicting Personality with Psycholinguistic and Language Model Features. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 1184–1189. [Google Scholar] [CrossRef]
- Sun, B.; Li, L.; Wu, X.; Zuo, T.; Chen, Y.; Zhou, G.; He, J.; Zhu, X. Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild. J. Multimodal User Interfaces 2016, 10, 125–137. [Google Scholar] [CrossRef]
- Arevalo, J.; Solorio, T.; Montes-y Gómez, M.; González, F.A. Gated multimodal networks. Neural Comput. Appl. 2020, 32, 10209–10228. [Google Scholar] [CrossRef]
- Kim, J.H.; On, K.W.; Lim, W.; Kim, J.; Ha, J.W.; Zhang, B.T. Hadamard Product for Low-rank Bilinear Pooling. arXiv 2016, arXiv:1610.04325. [Google Scholar] [CrossRef]
- Fukui, A.; Park, D.H.; Yang, D.; Rohrbach, A.; Darrell, T.; Rohrbach, M. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, TX, USA, 1–5 November 2016; Association for Computational Linguistics: Austin, TX, USA, 2016. [Google Scholar] [CrossRef]
- Liu, Z.; Shen, Y.; Lakshminarasimhan, V.B.; Liang, P.P.; Zadeh, A.B.; Morency, L.P. Efficient Low-rank Multimodal Fusion With Modality-Specific Factors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Toronto, ON, Canada, 2018. [Google Scholar] [CrossRef]
- Yu, Z.; Yu, J.; Fan, J.; Tao, D. Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
- Tan, H.; Bansal, M. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 5100–5111. [Google Scholar] [CrossRef]
- Zadeh, A.; Liang, P.P.; Mazumder, N.; Poria, S.; Cambria, E.; Morency, L.P. Memory Fusion Network for Multi-View Sequential Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI’18/IAAI’18/EAAI’18), New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018. [Google Scholar]
- Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T.J. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. (Statistical Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef]
- de Jong, S. SIMPLS: An alternative approach to partial least squares regression. Chemom. Intell. Lab. Syst. 1993, 18, 251–263. [Google Scholar] [CrossRef]
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471. [Google Scholar] [CrossRef] [PubMed]
- Fokkema, M.; Smits, N.; Zeileis, A.; Hothorn, T.; Kelderman, H. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behav. Res. Methods 2017, 50, 2016–2034. [Google Scholar] [CrossRef] [PubMed]
- van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv 2016, arXiv:1609.03499. [Google Scholar]
- Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 207–212. [Google Scholar] [CrossRef]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; Association for Computational Linguistics: San Diego, CA, USA, 2016; pp. 1480–1489. [Google Scholar] [CrossRef]
- Chen, C.; Liaw, A.; Breiman, L. Using Random Forest to Learn Imbalanced Data; University of California: Berkeley, CA, USA, 2004. [Google Scholar]
- Zong, W.; Huang, G.B.; Chen, Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242. [Google Scholar] [CrossRef]
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, 25–29 July 2004; Volume 2, pp. 985–990. [Google Scholar] [CrossRef]
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2018, arXiv:1710.05941. [Google Scholar]
- Pezeshki, M.; Fan, L.; Brakel, P.; Courville, A.; Bengio, Y. Deconstructing the Ladder Network Architecture. In Proceedings of the 33rd International Conference on Machine Learning (PMLR), New York, NY, USA, 20–22 June 2016; Volume 48, pp. 2368–2376. [Google Scholar]
- Drumond, L.R.; Diaz-Aviles, E.; Schmidt-Thieme, L.; Nejdl, W. Optimizing Multi-Relational Factorization Models for Multiple Target Relations. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM ’14), Shanghai, China, 3–7 November 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 191–200. [Google Scholar] [CrossRef]
- Nickel, M.; Tresp, V.; Kriegel, H.P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML’11), Bellevue, WA, USA, 28 June–2 July 2011; Omnipress: Madison, WI, USA, 2011; pp. 809–816. [Google Scholar]
- Bader, B.W.; Harshman, R.A.; Kolda, T.G. Temporal Analysis of Semantic Graphs Using ASALSAN. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 33–42. [Google Scholar] [CrossRef]
- Shi, C.; Hu, B.; Zhao, W.; Yu, P. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans. Knowl. Data Eng. 2017, 31, 357–370. [Google Scholar] [CrossRef]
- Milenković, T.; Przulj, N. Uncovering biological network function via graphlet degree signatures. Cancer Inform. 2008, 6, 257–273. [Google Scholar] [CrossRef]
- Gu, S.; Johnson, J.; Faisal, F.E.; Milenković, T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci. Rep. 2017, 8, 12524. [Google Scholar] [CrossRef]
- Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), New York, NY, USA, 24–27 August 2014; Association for Computing Machinery: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
- Dong, Y.; Chawla, N.V.; Swami, A. Metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17), Halifax, NS, USA, 13–17 August 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 135–144. [Google Scholar] [CrossRef]
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation-Based Anomaly Detection. ACM Trans. Knowl. Discov. Data 2012, 6, 1–39. [Google Scholar] [CrossRef]
- Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying Density-Based Local Outliers. SIGMOD Rec. 2000, 29, 93–104. [Google Scholar] [CrossRef]
- Feasel, K. Connectivity-Based Outlier Factor (COF). In Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python; Apress: Berkeley, CA, USA, 2022; pp. 185–201. [Google Scholar] [CrossRef]
Category | Keywords |
---|---|
Mental disorder | Mental health, mental disorder, mental illness, mental wellness, mental wellbeing |
Method | Artificial intelligence, machine learning, model |
Outcome | Detect, predict, classify, monitor, recognize, identify |
Data source/modality | Social media, text, speech, voice, audio, visual, image, video, smartphone, mobile, wearable, sensor |
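A search string built from these categories requires a match on at least one keyword from every category. A minimal sketch of how such a boolean query can be assembled (the `build_query` helper and the exact query syntax are illustrative, not the review's actual search string):

```python
# Illustrative sketch: build a boolean search string from the keyword
# categories (OR within a category, AND across categories).

CATEGORIES = {
    "mental_disorder": ["mental health", "mental disorder", "mental illness",
                        "mental wellness", "mental wellbeing"],
    "method": ["artificial intelligence", "machine learning", "model"],
    "outcome": ["detect", "predict", "classify", "monitor",
                "recognize", "identify"],
    "modality": ["social media", "text", "speech", "voice", "audio",
                 "visual", "image", "video", "smartphone", "mobile",
                 "wearable", "sensor"],
}

def build_query(categories):
    """Join keywords with OR inside a category and AND across categories."""
    groups = ['(' + ' OR '.join(f'"{kw}"' for kw in kws) + ')'
              for kws in categories.values()]
    return ' AND '.join(groups)

query = build_query(CATEGORIES)
```

Database-specific syntax (field tags, truncation wildcards) would be layered on top of this skeleton per search engine.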
ID | Item | RQ |
---|---|---|
I1 | Reference (authors and year) | N/A |
I2 | Title | N/A |
I3 | Mental health disorder investigated | N/A |
I4 | Data collection process | RQ1 |
I5 | Ground truth/data labeling | RQ1 |
I6 | Feature extraction process | RQ2 |
I7 | Feature transformation process if any | RQ2 |
I8 | Feature fusion process | RQ2 |
I9 | Machine learning model | RQ3 |
I10 | Results achieved | N/A |
I11 | Analysis findings if any | N/A |
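The extraction items above map naturally onto a simple record type, which keeps extracted entries uniform across reviewers. A sketch under that assumption (the class and field names are hypothetical, chosen to mirror items I1–I11):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record for one extracted study (items I1-I11).
@dataclass
class ExtractionRecord:
    reference: str                  # I1: authors and year
    title: str                      # I2
    disorder: str                   # I3: mental health disorder investigated
    data_collection: str            # I4 (RQ1)
    ground_truth: str               # I5 (RQ1): labeling strategy
    feature_extraction: str         # I6 (RQ2)
    feature_fusion: str             # I8 (RQ2)
    ml_model: str                   # I9 (RQ3)
    results: str                    # I10
    feature_transformation: Optional[str] = None  # I7 (RQ2), if any
    findings: Optional[str] = None  # I11, if any
```

Optional fields default to `None`, matching the "if any" qualifiers on I7 and I11.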
ID | Criteria | Scoring |
---|---|---|
QC1 | Was there an adequate description of the context in which the research was carried out? | The design, setup, and experimental procedure are adequately (1), partially (0.5), or poorly described (0) |
QC2 | Were the participants representative of the population to which the results will generalize? | The participants fully (1), partially (0.5), or do not (0) represent the stated target population |
QC3 | Was there a control group for comparison? | Control group has (1) or has not (0) been included |
QC4 | Were the measures used in the research relevant for answering the research questions? | Adopted methodology and evaluation methods are fully (1), partially (0.5), or not (0) aligned with research objectives |
QC5 | Were the data collection methods adequately described? | Data collection methods are adequately (1), partially (0.5), or poorly (0) described |
QC6 | Were the data types (continuous, ordinal, categorical) and/or structures (dimensions) explained? | All (1), some (0.5), or none (0) of the data types and structures of various modalities are explained |
QC7 | Were the feature extraction methods adequately described? | Feature extraction methods are adequately (1), partially (0.5), or poorly (0) described |
QC8 | Were the machine learning approaches adequately described? | Machine learning models and architectures are adequately (1), partially (0.5), or poorly (0) described |
QC9 | On a scale of 1–5, how reliable/effective was the machine learning approach? | Effectiveness, reliability and consistency of machine learning approach is well (5), partially (3), or poorly (0) justified through evaluation, analysis and baseline comparison |
QC10 | Was there a clear statement of findings? | Experimental findings are well (1), partially (0.5), or poorly (0) described |
QC11 | Were limitations to the results discussed? | Result limitations are well (1), partially (0.5), or poorly (0) identified |
QC12 | Was the study of value for research or practice? | Research methodology or outcomes well (1), partially (0.5), or poorly (0) contribute valuable findings or application |
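Summing the criteria gives each study a quality score out of 16 (eleven criteria scored 0–1 plus QC9's 0–5 scale). A small illustrative sketch of that aggregation (the `total_quality` helper is hypothetical):

```python
# Illustrative sketch of aggregating the quality-criteria scores.
# QC1-QC8 and QC10-QC12 score 0-1; QC9 scores 0-5, so the maximum
# total is 11 + 5 = 16.

MAX_SCORES = {f"QC{i}": (5.0 if i == 9 else 1.0) for i in range(1, 13)}

def total_quality(scores):
    """Sum per-criterion scores after checking each is within its range."""
    total = 0.0
    for qc, value in scores.items():
        if not 0.0 <= value <= MAX_SCORES[qc]:
            raise ValueError(f"{qc} score {value} out of range")
        total += value
    return total

max_total = total_quality(MAX_SCORES)  # every criterion at its maximum
```

A per-study total can then be compared against a cut-off (e.g., a fraction of the maximum) when deciding whether a study's evidence is weighted in the synthesis.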
Dataset | Description | Mental Health Disorders | Source Category |
---|---|---|---|
Distress Analysis Interview Corpus—Wizard of Oz (DAIC-WOZ) [228] | Video recordings and text transcriptions of interviews conducted by a virtual interviewer on individual participants (used in Audio-Visual Emotion Challenge (AVEC) 2014 [38], 2016 [47], 2017 [238], and 2019 [239]) | Post-traumatic stress disorder (PTSD), depression, anxiety | AV |
Turkish Audio-Visual Bipolar Disorder Corpus [230] | Video recordings of patients during follow-ups in a hospital | Bipolar disorder | AV |
Engagement Arousal Self-Efficacy (EASE) [240] | Video recordings of individuals undergoing self-regulated tasks by interacting with a website | PTSD | AV |
Well-being [241] | Video recordings of conversational interviews conducted by a computer science researcher | Depression, anxiety | AV |
Emotional Audio-Textual Depression Corpus (EATD-Corpus) [73] | Audio responses and text transcripts extracted from student interviews conducted by a virtual interviewer through an application | Depression | AV |
Reddit Self-Reported Depression Diagnosis Corpus (RSDD) [231] | Reddit posts of self-claimed and control users | Depression | SM |
Self-Reported Mental Health Diagnosis Corpus (SMHD) [242] | Twitter posts of users with one or multiple mental health conditions and control users | ADHD, anxiety, autism, bipolar disorder, borderline personality disorder, depression, eating disorder, OCD, PTSD, schizophrenia, seasonal affective disorder | SM |
Multi-modal Getty Image depression and emotion (MGID) dataset [106] | Textual and visual documents from Getty Image with equal amount of depressive and non-depressive samples | Depression | SM |
Sina-Weibo suicidal dataset [243] | Sina microblog posts of suicidal and control users | Suicidal ideation | SM |
Weibo User Depression Detection dataset (WU3D) [112] | Sina microblog posts of depressed candidates and control users, and user information such as nickname, gender and profile description | Depression | SM |
Chinese Microblog depression dataset [244] | Sina microblog posts following the last posts of individuals who have committed suicide | Depression | SM |
eRisk 2016 dataset [245] | Textual posts and comments of depressed and control users from Twitter, MTV’s A Thin Line (ATL) and Reddit | Depression | SM |
eRisk 2018 dataset [246] | Textual posts and comments from Twitter, MTV’s A Thin Line (ATL) and Reddit | Depression, anorexia | SM |
StudentLife [237] | Smartphone sensor data of students from a college | Mental wellbeing, stress, depression | SS |
CrossCheck [205] | Smartphone sensor data of schizophrenia patients | Schizophrenia | SS |
Student Suicidal Ideation and Depression Detection (StudentSADD) [100] | Voice recordings and textual responses obtained using smartphone microphones and keyboards | Suicidal ideation, depression | AV, SS
BiAffect dataset [247] | Keyboard typing dynamics captured by a mobile application | Depression | SS
Tesserae dataset [248] | Smartphone and smartwatch sensor data, Bluetooth beacon signals, and Instagram and Twitter data of information workers | Mood, anxiety, stress | SS, WS, SM |
CLPsych 2015 Shared Task dataset [249] | Twitter posts of users who publicly stated a diagnosis of depression or PTSD with corresponding control users of the same estimated gender with the closest estimated age | Depression, PTSD | SM |
multiRedditDep dataset [128] | Reddit images posted by users who posted at least once in the /r/depression forum | Depression | SM |
Fitbit Bring-Your-Own-Device (BYOD) project by “All of Us” research program [250] | Fitbit data (e.g., steps, calories, and active duration), clinical assessments, demographics | Depression, anxiety | WS |
PsycheNet dataset [138] | Social contagion-based dataset containing timelines of Twitter users and those with whom they maintain bidirectional friendships | Depression | SM |
PsycheNet-G dataset [139] | Extends PsycheNet dataset [138] by incorporating users’ social interactions, including bidirectional replies, mentions, and quote-tweets | Depression | SM |
Spanish Twitter Anorexia Nervosa (AN)-related dataset [251] | Tweets posted by users whom clinical experts identified to fall into categories of AN (at early and advanced stages of AN but do not undergo treatment), treatment, recovered, focused control (control users that used AN-related vocabulary), and random control | AN | SM |
Audio-visual depressive language corpus (AViD-Corpus) [37] | Video clips of individuals performing PowerPoint-guided tasks, such as sustained vowel, loud vowel, and smiling vowel phonations, and speaking out loud while solving a task (used in AVEC 2013 [37]) | Depression | AV |
Existing call log dataset [222] | Call and text messaging logs and GPS data collected via mobile application and in-person demographic and mental wellbeing surveys | Mental wellbeing | SS |
Speech dataset [252] | Audio recordings of individuals performing two speech tasks via an external web application and demographics obtained from recruitment platform, Prolific [253] | Anxiety | AV |
Early Mental Health Uncovering (EMU) dataset [104] | Data gathered via a mobile application that collects sensor data (i.e., text and call logs, calendar logs, and GPS), Twitter posts, and audio samples from scripted and unscripted prompts and administers PHQ-9 and GAD-7 questionnaires and demographic (i.e., gender, age, and student status) questions | Depression, anxiety | SS, AV, SM |
Depression Stereotype Threat Call and Text log subset (DepreST-CAT) [105] | Data gathered via modifying the EMU application [104] to collect additional demographic (i.e., gender, age, student status, history of depression treatment, and racial/ethnic identity) and COVID-19 related questions | Depression, anxiety | SS, AV, SM |
D-vlog dataset [92] | YouTube videos with equal amounts of depressed and non-depressed vlogs | Depression | AV |
| Modality | Category | Description | Examples |
|---|---|---|---|
| Audio | Voice | Characteristics of audio signals | Mel-frequency cepstral coefficients (MFCCs), pitch, energy, harmonic-to-noise ratio (HNR), zero-crossing rate (ZCR) |
| | Speech | Speech characteristics | Utterances, pauses, articulation |
| | Representations | Extracted by model architectures applied to audio samples or representations | Features extracted from specific layers of the pre-trained deep SoundNet [259] network applied to audio samples |
| | Derived | Derived from other features via computational methods or models | High-level features extracted from a long short-term memory (LSTM) [260] model applied to SoundNet representations to capture temporal information |
| Visual | Subject/object | Presence or features of a person or object | Face appearance, facial landmarks, upper-body points |
| | Representations | Extracted by model architectures applied to image frames or representations | Features extracted from specific layers of the VGG-16 network [261] (pre-trained on ImageNet [262]) applied to visual frames |
| | Emotion-related | Capture emotions associated with facial expressions or image sentiment | Facial action units (FAUs) corresponding to Ekman's model of six emotions [263], i.e., anger, disgust, fear, joy, sadness, and surprise, or eight basic emotions [264] that additionally include trust, negative, and positive |
| | Textual | Textual content or labels | Quotes in images identified via optical character recognition (OCR) |
| | Color-related | Color information | Hue, saturation, color |
| | Image metadata | Image characteristics and format | Width, height, presence of an exchangeable image file format (EXIF) file |
| | Derived | Derived from other features via computational methods or models | Fisher vector (FV) encoding [265] of facial landmarks |
| Textual | Linguistic | Language in terms of word choice and sentence structure | Pronouns, verbs, suicide-related keywords |
| | Sentiment-related | Emotion and sentiment components extracted via sentiment analysis (SA) tools | Valence, arousal, and dominance (VAD) ratings |
| | Semantic-related | Meaning of texts | Topics and categories describing text content |
| | Representations | Vector representations generated using language models | Features extracted from pre-trained Bidirectional Encoder Representations from Transformers (BERT) [266] applied to texts |
| | Derived | Derived from other features via computational methods or models | Features extracted from an LSTM with an attention mechanism applied to textual representations to emphasize significant words |
| Social media | Post metadata | Information associated with a social media post | Posting time, likes received |
| | User metadata | Information associated with a social media user account | Profile description and image, followers, followings |
| | Representations | Representations of the social network and interactions with other users | Graph network representing each user as a node and connecting pairs of users who mutually follow each other |
| | Derived | Derived from other features via aggregation or encoding | Number of posts made on weekends |
| Smartphone sensor | Calls and messages | Relating to phone calls and text messaging | Frequency and duration of incoming/outgoing phone calls |
| | Physical mobility | Inferences from accelerometer, gyroscope, and GPS data | Walking duration, distance traveled |
| | Phone interactions | Accessing the phone, applications, and keyboard | Duration of phone unlocks, frequency of using specific applications, keystroke transitions |
| | Ambient environment | Surrounding illumination and noise | Brightness, human conversations |
| | Connectivity | Connections with external devices and the environment | Association events with WiFi access points, occurrences of nearby Bluetooth devices |
| | Representations | High-level representations of time-series sensor data | Features extracted from a transformer to capture temporal patterns |
| | Derived | Derived from low-level features via computation or aggregation | Average weekly visited location clusters; sleep duration estimated from the phone being locked and stationary in a dark environment at night |
| Wearable sensor | Physical mobility | Inferences related to physical motion and sleep | Number of steps, sleep duration and onset time |
| | Physiological | Physiological signals | Heart rate, skin temperature |
| | Representations | High-level representations of time-series sensor data | Features extracted from an LSTM applied to heart rate signals |
| Demographics and personalities | Demographic | Personal demographic information | Age, gender |
| | Personality | An individual's personality traits | Big Five personality scores |
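To make the "Derived" smartphone-sensor category concrete, the following minimal sketch estimates nightly sleep duration from phone lock, ambient light, and motion samples, as the table describes. The per-minute sample format, lux threshold, and night-time window are illustrative assumptions, not values taken from the reviewed studies.

```python
from datetime import datetime, timedelta

def estimate_sleep_minutes(samples, night_start=22, night_end=8, dark_lux=10):
    """Estimate sleep duration (in minutes) as the longest contiguous
    night-time run in which the phone is locked, dark, and stationary.

    samples: iterable of (timestamp, screen_locked, lux, is_stationary),
    one sample per minute (a hypothetical logging format).
    """
    longest = current = 0
    for ts, locked, lux, stationary in samples:
        at_night = ts.hour >= night_start or ts.hour < night_end
        if at_night and locked and lux < dark_lux and stationary:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # run broken by unlock, light, or movement
    return longest

# Example: phone locked, dark, and still from 23:00 to 06:00
start = datetime(2024, 1, 1, 23, 0)
night = [(start + timedelta(minutes=i), True, 0.5, True) for i in range(420)]
print(estimate_sleep_minutes(night))  # 420
```

Real deployments typically smooth over brief interruptions (e.g., a momentary screen check) rather than resetting the run outright; this sketch keeps the simplest rule for clarity.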
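Likewise, the social-media "Representations" row can be sketched as a graph in which each user is a node and an undirected edge connects two users who follow each other. The `follows` mapping below is hypothetical data used only for illustration.

```python
def mutual_follow_graph(follows):
    """Build adjacency sets containing only mutual-follow edges.

    follows: dict mapping each user to the set of users they follow.
    Returns a dict mapping each user to their mutual-follow neighbors.
    """
    graph = {user: set() for user in follows}
    for user, followees in follows.items():
        for other in followees:
            # Keep the edge only if the relationship is reciprocated
            if user in follows.get(other, set()):
                graph[user].add(other)
    return graph

# Example: "a" and "b" follow each other; "a" follows "c" one-way
follows = {"a": {"b", "c"}, "b": {"a"}, "c": set()}
print(mutual_follow_graph(follows))  # {'a': {'b'}, 'b': {'a'}, 'c': set()}
```

Studies using graph neural networks would feed such an adjacency structure (plus per-node features like user metadata) into the model; the dict-of-sets form here is just the simplest self-contained representation.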
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khoo, L.S.; Lim, M.K.; Chong, C.Y.; McNaney, R. Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 2024, 24, 348. https://doi.org/10.3390/s24020348