1. Introduction
Schizophrenia is a severe mental illness that affects millions of people worldwide and can profoundly impact the affected individual, their families, and society as a whole [1, 2, 3]. Chronic psychotic symptoms can make it difficult for individuals with the illness to maintain relationships, hold a job, and have a fulfilling life [3]. Moreover, living with schizophrenia leads to a significantly reduced life expectancy due to a much higher risk of completing suicide and suffering from chronic physical conditions such as cardiovascular diseases or diabetes [4, 5]. The societal burden of this illness is high given the loss of productivity and the substantial costs associated with treating schizophrenia (i.e., hospitalizations, regular healthcare appointments, and medications) [2, 6]. Most distressing acute symptoms can be substantially reduced using antipsychotic medications; however, up to a third of patients fail to improve, making them resistant to treatment [7]. These individuals often have a poorer quality of life, experience more frequent hospitalizations, and have higher rates of suicide, leading to significantly higher societal costs compared to those who respond appropriately to antipsychotics [8]. The most effective medication for this condition, clozapine, is not always an option since it has poor tolerability and requires careful monitoring for severe side effects [9, 10]. Moreover, a significant subset of treatment-resistant patients also fails to respond to clozapine; these patients are often referred to as being “ultra-resistant” to treatment [11].
To learn how to cope with their persistent symptoms, patients with treatment-resistant schizophrenia are generally encouraged to undergo psychotherapy [12]. The most prevalent and distressing symptom is auditory verbal hallucinations (AVH; i.e., hearing voices); therefore, this specific component of schizophrenia is targeted by a few psychotherapeutic approaches [13, 14]. The most studied and widespread one is cognitive–behavioral therapy (CBT), which has been shown to be significantly more effective than a control condition in reducing the frequency and distress associated with AVH in this population [15]. However, the effect size is only moderate and the symptoms of only a small subset of patients are reduced in a clinically significant manner [16, 17, 18, 19]. Additionally, according to a recent meta-analysis, CBT for psychosis might have little to no impact on quality of life [19]. This could be due to the fact that this therapy, largely based on psychoeducation and mindfulness, does not offer the patient an opportunity to practice interacting with their voices and finding new coping strategies under therapeutic supervision. To address this gap, a few novel therapeutic approaches are now focused on having the patient improve their relationship with their voice(s), notably by entering a dialogue with them [20]. This can be achieved using different techniques such as chairwork (i.e., having the patient take the role of the voice in one chair and then answer it from a different chair), through role-play with the therapist, or by dialoguing with an avatar representing the distressing voice [20]. Avatar therapy (AT), which was initially developed using an avatar on a 2D screen, has now been adapted to virtual reality (VR), thereby increasing the immersive aspect of psychotherapy [18, 21, 22]. In this therapy, patients with treatment-resistant schizophrenia are first invited to create and personalize an avatar resembling the mental image that they have of their most distressing hallucination, both in terms of physical appearance and tone of voice. Afterward, patients undergo six to ten one-hour weekly therapeutic sessions, all of which include approximately 5 to 20 min of dialogue with their avatar in VR. The avatar is animated by the therapist, who is located in a separate room and has their voice modified in real time. In addition to role-playing the voice, the therapist also has control over the facial expressions as well as the distance between the avatar and the patient. During the first few sessions, the therapist starts by repeating verbatim what the patient reports that their voice usually says, and mostly uses provocative techniques. For example, the therapist, animating the avatar, might repeat “you are worthless”. However, the avatar gradually opens up to the patient and starts using more and more positive techniques [18, 21, 22]. The different themes addressed during AT have been described in detail in a previous qualitative study by Beaudoin and her team [23]. Notably, the avatar mainly used techniques that were classified as provocative (e.g., threats, accusations, affirmations of omnipotence) or positive (e.g., reinforcement, empathetic listening). The patients responded in a few different ways: with an emotional response (positive, neutral, or negative), by mentioning beliefs about the voices and/or schizophrenia (e.g., omnipotence, malevolence), self-perceptions (i.e., self-appraisal or self-deprecation), coping mechanisms (e.g., self-affirmation, counterattack), or aspirations (e.g., prevention strategies) [23, 24].
While previous qualitative studies highlight promising avenues to better comprehend the inner psychotherapeutic processes that might be linked to a positive outcome, it is possible that some elements are underexamined or prone to subjective biases, which are prevalent in such studies [25]. Artificial-intelligence-driven approaches, such as unsupervised machine learning, are increasingly used in various medical fields to derive objective data from textual datasets and other data sources [26]. Unsupervised machine learning is a technique in which unlabeled data are used to conduct different types of tasks such as hierarchical learning, data clustering, latent variable modeling, dimensionality reduction (on large datasets), and outlier detection [27]. A few implementations of such algorithms are found in psychiatry. For example, recent research conducted by Kung et al. (2022) used unsupervised learning to identify qualitative subtypes of depression based on the clinical data from 18,314 patients with depression [28]. Another recent example is the use of clustering methods to identify five subgroups of psychosis (affective, suicidal, high functioning, depressive, and severe psychosis) amongst 765 individuals with DSM-IV diagnoses of schizophrenia, bipolar affective disorder (I/II), schizoaffective disorder, schizophreniform disorder, and brief psychotic disorder [29]. In the field of psychotherapy and psychotherapeutic approaches, the latest literature review on the subject identified nine studies that used unsupervised machine learning [30]. Most of these applications were used to generate human-like responses to interact with patients after learning from datasets of multiple interactions derived from thousands of therapy sessions. An example of such an application is the development of ClientBot, by Tanana et al. (2019), which used natural-language-processing methods for automated coding rather than human coders to perform interactions with the patients [31]. To our knowledge, unsupervised machine learning has never been used to objectively assess verbatims from AT. Natural-language-processing (a subset of machine learning) approaches for patients suffering from schizophrenia are currently being studied and demonstrate promising avenues to characterize sub-clinical linguistic differences in schizophrenia-spectrum disorders which might be clinically relevant [32]. Analysis of verbatims using unsupervised learning might therefore provide insights into the different types of interactions taking place during the immersive sessions.
This study’s primary aim was to conduct an unsupervised machine-learning analysis of verbatims of treatment-resistant schizophrenia patients who had undergone AT. The second aim of the study was to compare the data clusters obtained by the unsupervised machine-learning analysis with the main themes identified by Beaudoin et al. (2021) through human-driven qualitative analysis. The hypothesis was that unsupervised machine-learning analysis would provide clusters similar to the main themes identified by Beaudoin et al., while providing insight as to how certain themes might be sub-divided.
3. Results
Vectorization and data reduction were successfully conducted for all the data points of the unlabeled dataset prior to performing clustering. Interactions were identified from 922 text files for the avatar and 1140 text files for the patient.
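As a rough illustration of what the vectorization step involves, the sketch below turns short utterances into TF-IDF vectors using only the Python standard library. The vectorizer and data-reduction method actually used in the study are not specified in this section, so the choice of TF-IDF, the helper names `tokenize` and `tfidf_vectors`, and the example sentences are assumptions made for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [t for t in tokens if t]

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vector = []
        for term in vocabulary:
            tf = counts[term] / len(doc) if doc else 0.0
            # Adding 1 keeps a nonzero weight for terms found in every document.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors

# Illustrative utterances only (assumed input format: one string per interaction).
utterances = ["Let's make peace.", "I have weaknesses.", "Do you believe in yourself?"]
vocabulary, vectors = tfidf_vectors(utterances)
```

Each utterance becomes a vector with one entry per vocabulary term; a dimensionality-reduction step would then compress these vectors before clustering.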
3.1. Clustering
Figure 3 shows that the avatar elbow curve indicated that the optimal number of clusters should be between two and four. Therefore, three clusters were selected as the initialization parameter for the k-means algorithm.
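The elbow approach described above can be sketched as follows: k-means is run for several values of k, and the within-cluster sum of squares (inertia) is inspected for the point where adding clusters stops paying off. This is a minimal standard-library sketch, not the study's implementation; the synthetic two-dimensional points, the helper names, and the deterministic farthest-point seeding are all assumptions chosen to keep the example reproducible.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

# Hypothetical toy data: three well-separated 2D blobs of 20 points each.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for cx, cy in centers for _ in range(20)]

inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

On data with three clear groups, the inertia drops sharply up to k = 3 and only marginally afterwards, which is the "elbow". On real utterance vectors the bend is usually softer, which is consistent with an optimal range rather than a single value being reported.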
As displayed in Figure 4, data points were scattered across the three different clusters. The red cluster appeared to have more homogeneous data points, whereas the blue cluster had data that were very far apart and more heterogeneous. In the middle of the graph, there appeared to be no clear delimitation across the three clusters, which might indicate that these data points were not clearly divisible into different clusters. These interactions would likely be open to diverging interpretations if they were qualitatively assessed by human coders.
Examples of verbatims from the different clusters can be found below (translated from Canadian French to English):
Blue cluster:
“You are supposed to let me win.”
“They are right, you are the one that stole it.”
“I don’t believe you; you can’t be right.”
Green cluster:
“Do you believe in yourself?”
“Maybe you are becoming crazy? Are you?”
“Let’s make peace.”
Red cluster:
“How will you do it? What is it that you will do?”
“Do you want me to stay? Should I leave?”
“What could they do for you?”
As depicted in Figure 5, the patient elbow curve indicated that the optimal number of clusters should be around four. Therefore, four clusters were selected as the initialization parameter for the k-means algorithm.
As displayed in Figure 6, data points were scattered across the four different clusters. The yellow and green clusters appeared to overlap, whereas the red and blue clusters were well delimited from all the other clusters. This indicated that some interactions clearly belonged together, whereas it was difficult to discriminate between interactions belonging to the yellow and the green clusters. The green cluster had very homogeneous interactions, whereas the blue and red clusters were heterogeneous.
Examples of verbatims from the different clusters can be found below.
Blue cluster:
“I have weaknesses.”
“You are right, I need to call my mother. It is important that I call her very soon.”
“Yes, it is a fact. I’m not a good person.”
Yellow cluster:
“I’d like you to stop talking to me.”
“No, you can’t. You are not allowed to do this to me.”
“I would like you to give me positive energy and please stop trying to destroy me all the time.”
Green cluster:
“Life is great, my friend.”
“You are not so much in my head anymore.”
“This week you left me alone. I like that.”
Red cluster:
“I’ll confront you and tell you that you are very ill.”
“I need to stop playing slot machines.”
“The doctor is helping me. He is my ally. He is telling me what to do.”
3.2. Comparison with Previously Labeled Data
Cross-labeling of the unlabeled dataset with the labeled dataset was conducted. Table 1 presents the original division of the text files and their classification (labels) for the labeled dataset, whereas Table 2 displays textual entities’ mapping and their classification per the unlabeled dataset. Compared to Beaudoin et al. (2021), the clustering analysis identified three clusters (labels) for the avatar interactions and four clusters for the patient interactions.
With the mapping of the labels on the clustering data, it can be observed for the avatar that some of the confrontational techniques appear to have been shared across the blue and green clusters. In contrast, the positive techniques were mostly spread across the green and the red clusters. Clustering highlights the heterogeneity of the interactions across these categories previously defined as confrontational techniques and positive techniques.
As for the patient, most of the interactions previously assigned to the five labels appear to have been clustered into the green and yellow clusters, especially emotional responses in the yellow cluster. These two clusters mostly group interactions that were previously classified as coping mechanisms, aspirations, and beliefs about voices and schizophrenia. The blue and red clusters appear to group interactions that were mainly scattered across the previously defined labels. Interactions previously labeled as coping mechanisms appear to be less present in the blue cluster, whereas they were more prevalent in the red cluster. The opposite pattern can be observed with the interactions previously labeled as aspirations.
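The cross-labeling described above amounts to a contingency table between cluster assignments and the earlier qualitative labels. The sketch below shows one way such a table could be built; the function names and the six toy records are hypothetical and are not taken from Table 1 or Table 2.

```python
from collections import Counter

def cross_tabulate(cluster_ids, theme_labels):
    """Count how many utterances fall in each (cluster, theme) cell."""
    if len(cluster_ids) != len(theme_labels):
        raise ValueError("need one cluster id and one theme label per utterance")
    return Counter(zip(cluster_ids, theme_labels))

def row_proportions(table):
    """For each cluster, the share of its utterances carrying each theme."""
    totals = Counter()
    for (cluster, _theme), count in table.items():
        totals[cluster] += count
    return {cell: count / totals[cell[0]] for cell, count in table.items()}

# Hypothetical toy data, not taken from the study.
clusters = ["green", "green", "yellow", "yellow", "yellow", "red"]
themes = [
    "coping mechanisms",
    "aspirations",
    "emotional responses",
    "emotional responses",
    "coping mechanisms",
    "coping mechanisms",
]

table = cross_tabulate(clusters, themes)
shares = row_proportions(table)
for (cluster, theme), share in sorted(shares.items()):
    print(f"{cluster:>6} | {theme:<20} | {share:.2f}")
```

Reading the table row by row shows which themes dominate a cluster, while reading it column by column shows whether a theme is concentrated in one cluster or scattered across several.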
4. Discussion
The main goal of this study was to conduct an unsupervised machine-learning analysis of verbatims of treatment-resistant schizophrenia patients who had undergone AT. This was done by vectorizing textual interactions of the avatar and the patient during immersive sessions of AT, reducing the complexity of the data, and performing a cluster classification of unlabeled data. That enabled the identification of three clusters for the avatar’s interactions and four clusters for the patient’s interactions. These unlabeled clustered data were then remapped as per the previous qualitative study on the same verbatims (Beaudoin et al., 2021).
It was possible to observe three distinct clusters for the avatar interactions. Considering the variety of potential interactions that the therapist must employ during the immersive session to personalize the experience for each patient, it is possible that this provides a distinction between the confrontational techniques classified in the blue cluster and those in the red cluster. As indicated in O’Brien et al. (2021), in AT, the therapist must consider a formulation to inform the direction of the therapy as well as respond quickly as the characterized avatar [42]. Several studies outline the use of direct confrontation in psychotherapy as well as empathic confrontation. Empathic confrontation is often observed as part of schema therapy to address patients’ maladaptive behaviors and it also serves emotional regulation [43]. On the other hand, direct confrontation can be seen in AT to provoke the patient by mimicking their experience with AVH. Since these two techniques differ in terms of interactions and delivery, this might explain the division of these interactions between two clusters. As for positive techniques, they were also scattered across two clusters (red and green). A previous study on integrative psychotherapy indicated that therapeutic alliance had the most evidence as a predictor of patient change [44]. One challenge of AT is, therefore, to bring forward the personification of the AVH while maintaining the therapeutic alliance and inducing positive changes, which may imply different types of positive techniques. In CBT, psychotherapeutic approaches for patients with schizophrenia include the development of trust, normalization, coping strategy enhancement, and reality testing. In the green cluster, some of the interactions previously classified as positive techniques appear to partly include elements of confrontational techniques. That might be part of reality testing, which might appear confrontational while potentially assessing the self-perceptions or beliefs of the patient about their AVH.
The patient interactions were classified into four different clusters, meaning that the interactions might have been less heterogeneous than what was found in the previous qualitative study on the same dataset [23]. The blue cluster contained very few interactions, which suggests there were outliers in the interactions between the patients and their avatars. A recent study assessing 499 language samples with a natural language processing algorithm on patients with schizophrenia or bipolar disorder outlined that sociodemographic and individual differences should be considered while conducting language analysis for psychosis [45]. These, as well as relationships with others, were not specifically captured with the previously conducted qualitative analysis. That might account for the outliers identified in the blue cluster, but further analyses should be conducted. Most of the coping mechanism interactions were found in the green cluster, whereas the emotional responses were found in the yellow cluster, and these two clusters seemed to intersect. That is not surprising considering that coping mechanisms and emotional responses are two strong components of psychotherapeutic approaches and are often tied together, since coping mechanisms involuntarily manifest when strong emotions are involved [46]. The overlap of these two clusters might therefore indicate that interactions reflecting coping mechanisms could be further detailed as per other characteristics. For example, coping mechanisms found in the green cluster might be more tied to aspirations and beliefs about voices and schizophrenia, whereas the ones found in the yellow cluster might be more tied to emotional responses.
Limitations
While using the k-means algorithm enabled clustering, a larger dataset would have been preferred to account for the errors linked to the centroids of the clusters being dragged by interactions that are outliers. It should be noted that the transcripts examined in our research were typed in Canadian French, and it was not possible to locate vectorizers that included stopwords for that language. Because insignificant terms may therefore have been included in the word vectors, accuracy may have been affected. Another limitation was the small sample of patients involved in the present study, which limits the generalizability of the findings, since the interactions identified came from a small number of participants.
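To make the stopword limitation concrete, the sketch below filters function words out of a French utterance before vectorization. The stopword list is a small hand-written, deliberately incomplete assumption, and the helper `remove_stopwords` is hypothetical; it does not handle elisions such as "j'ai" and is not a substitute for a curated Canadian French resource.

```python
# Hand-picked, incomplete list of common French function words (assumption).
FRENCH_STOPWORDS = frozenset({
    "le", "la", "les", "de", "des", "du", "un", "une", "et",
    "je", "tu", "il", "elle", "me", "te", "que", "qui", "ne", "pas",
})

def remove_stopwords(text, stopwords=FRENCH_STOPWORDS):
    """Lowercase, strip punctuation, and drop tokens found in the stopword list."""
    tokens = (word.strip(".,;:!?\"()") for word in text.lower().split())
    return [t for t in tokens if t and t not in stopwords]

content_words = remove_stopwords("Je veux que tu me laisses tranquille.")
print(content_words)
```

Without such filtering, frequent function words occupy dimensions of every utterance vector and can pull unrelated utterances closer together during clustering.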