Holmusk publishes first known study to leverage novel NLP model on unstructured clinician notes for extraction of multiple symptoms in major depressive disorder

December 11, 2023

NLP-enriched longitudinal healthcare data has the potential to offer significantly enhanced insights into changes in symptom burden over time and lead to the development of larger patient cohorts to better inform real-world clinical studies.

Overview:

Holmusk has published a peer-reviewed paper on novel natural language processing (NLP) methods in Natural Language Processing Journal, entitled “Enhancing pre-trained contextual embeddings with triplet loss as an effective fine-tuning method for extracting clinical features from electronic health record derived mental health clinical notes.” The publication reports on a novel language model that was pre-trained and then further fine-tuned on real-world clinical mental health data, using ‘triplet loss’ as a fine-tuning method that proved more effective than a traditional BERT classifier. The model was developed to extract highly contextual information on crucial clinical features in patients with major depressive disorder (MDD), such as ‘Anhedonia’ and ‘Suicidality,’ together with supporting evidence of the observed symptoms in clinical notes. The model outperformed existing models in mental health in identifying and categorizing symptoms, extracting relevant data, and creating structured data for further analysis.

Impact:

Real-world evidence in mental health is limited due to the scarcity of structured symptom information recorded in the form of measurement scales. The majority of patient information remains locked in unstructured clinical notes in real-world clinical practice. Existing NLP models have not been trained on unstructured clinical data that capture the nuances of different dimensions of mental health (e.g., symptomology, social history, etc.).

This publication discusses how the Holmusk team developed a novel transformer-based NLP model to capture core clinical features of patients with MDD. The model combines insights from existing research with learnings derived from real-world data annotated by clinical experts, making it robust in handling downstream tasks in real-world settings. The model was trained using ‘triplet loss,’ which, in contrast to conventional sequence classifiers and syntactic models, produced robust, well-trained, specialized embedders across a range of mental disorders. These embedders could better handle downstream tasks when coupled with any simple classifier.
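For readers unfamiliar with the term, the general idea behind triplet loss can be sketched in a few lines of Python. This is a minimal illustration of the standard formulation, not the model or training code from the paper: the Euclidean distance, the margin of 1.0, the two-dimensional toy vectors, and the function names are all assumptions made for the example. The loss compares an anchor embedding with a positive example (same label) and a negative example (different label), and it is zero only when the positive sits closer to the anchor than the negative by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value and the distance function are illustrative
    assumptions, not settings taken from the publication.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings (hypothetical values, chosen only for easy arithmetic).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Positive is already much closer than the negative: no penalty.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapping the roles puts the "positive" far away: a large penalty.
loss_hard = triplet_loss(anchor, negative, positive)

print(loss_easy, loss_hard)
```

Minimizing this quantity over many triplets pulls notes describing the same clinical feature together in embedding space and pushes unrelated notes apart, which is why a simple classifier on top of such embeddings can then separate the classes.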

The model can be further scaled to capture more granular clinical features and time references, such as the history of illness, across multiple mental disorders. NLP-derived insights can contribute to a more comprehensive, easily accessible estimate of patient phenotypes in the real world and aid in the generation of real-world evidence in the mental health domain, ultimately benefiting patient care.

To read the full manuscript, visit the journal online.

Deepali Kulkarni, Abhijit Ghosh, Amey Girdhari, Shaomin Liu, L. Alexander Vance, Melissa Unruh, Joydeep Sarkar, Enhancing pre-trained contextual embeddings with triplet loss as an effective fine-tuning method for extracting clinical features from electronic health record derived mental health clinical notes, Natural Language Processing Journal, Volume 6, 2024, 100045, ISSN 2949-7191, https://doi.org/10.1016/j.nlp.2023.100045.
