Research


Clinical depression in speech

Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behavior, feelings and sense of well-being. Depressed people can feel sad, anxious, empty, hopeless, helpless, worthless, guilty, irritable or restless. They may lose interest in activities that were once pleasurable, experience loss of appetite or overeating, have problems concentrating, remembering details or making decisions, and may contemplate, attempt or commit suicide. Insomnia, excessive sleeping, fatigue, aches, pains, digestive problems or reduced energy may also be present.

Depression is a feature of some psychiatric syndromes, such as major depressive disorder, but it may also be a normal reaction to life events such as bereavement, a symptom of some bodily ailments, or a side effect of some drugs and medical treatments. The mood disorders are a group of disorders considered to be primary disturbances of mood. These include major depressive disorder (MDD; commonly called major depression or clinical depression), where a person has at least two weeks of depressed mood or a loss of interest or pleasure in nearly all activities, and dysthymia, a state of chronic depressed mood whose symptoms do not meet the severity of a major depressive episode. Another mood disorder, bipolar disorder, features one or more episodes of abnormally elevated mood, cognition and energy levels, but may also involve one or more episodes of depression. When the course of depressive episodes follows a seasonal pattern, the disorder (major depressive disorder, bipolar disorder, etc.) may be described as a seasonal affective disorder.

The main goal of our research is to understand how depression modifies human speech. Physicians often describe depressed speech as “faded”, “slow”, “monotonous”, “lifeless” and “metallic”. We aim to identify the acoustic-phonetic parameters, separately at the segmental and supra-segmental levels, that characterize the speech of depressed people.
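As an illustration of what a supra-segmental parameter can look like, the sketch below computes the variability of a fundamental frequency (F0) contour in semitones. This is one plausible correlate of the “monotonous” impression mentioned above, not necessarily a parameter used in the project. The function name, the convention that 0 marks an unvoiced frame, and the two example contours are all hypothetical.

```python
import math
import statistics

def f0_variability_semitones(f0_hz):
    """Standard deviation of an F0 contour, in semitones around its median.

    f0_hz: per-frame F0 values in Hz; 0 or None marks an unvoiced frame
    (an assumed convention for this sketch).
    """
    voiced = [f for f in f0_hz if f]  # drop unvoiced frames
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    ref = statistics.median(voiced)
    # Convert to semitones so speakers with different pitch ranges are comparable
    semitones = [12.0 * math.log2(f / ref) for f in voiced]
    return statistics.stdev(semitones)

# Hypothetical contours for illustration only (not measured data)
lively = [180, 220, 0, 160, 240, 200, 0, 150]
flat = [198, 202, 0, 200, 201, 199, 0, 200]

print(f0_variability_semitones(lively))
print(f0_variability_semitones(flat))
```

A lower value indicates a flatter pitch contour. In practice the F0 values would come from a pitch tracker, and such a measure would have to be validated against clinical ratings before any interpretation.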

Development of Automatic Pathological Speech Recognition and Classification System

It is well known that disorders and malfunctions of human voice production cause detectable changes in the acoustic parameters of speech. The main goal of the research is to define protocols that automatically detect these changes and determine whether they have neurological (functional dysphonia, recurrent paresis, etc.) or morphological (cancer of the vocal tract, throat or tongue) root causes. Furthermore, we aim to design and develop a medical decision support system that doctors and specialists can use to collect data, support diagnosis and detect vocal tract diseases early, thereby improving prevention.

FORENSICspeech: Forensic Voice analysis using Hungarian follow-up voice database

In forensic voice comparison, a voice sample needs to be compared with the voices of known speakers, much as a DNA sample is matched against the DNA profiles of known persons. Presenting expert testimony in court or during pre-trial investigation requires an exact, reproducible and objective procedure; if that procedure is statistical, it relies on both a database and a set of suitable tools. Research in the field therefore needs a reliable forensic voice comparison database for Hungarian, one that meets forensic requirements and takes phonetic, linguistic and speech technology interests into account. The planned database should contain more than 120 speakers, with voice samples collected under a strict protocol and non-contemporaneous recordings of each speaker in various speech styles. The database is important because it would make it possible to examine speaker-specific acoustic-phonetic features and would support further research, benefiting both science and society: the database and the tools developed from it would help the work of police, criminology and secret service experts. The results will contribute to more accurate and reliable forensic voice comparison methods and systems. In addition, the project would provide an opportunity to develop a new forensic voice comparison system for Hungarian and to fund a forensic research team of young scientists from different backgrounds.

Detection of Parkinson's Disease Using Speech

Parkinson’s disease (PD) is one of the most common neurodegenerative disorders. Clinical evidence suggests that most PD patients have some form of speech disorder. In fact, changes in speech can be an early sign of PD. In addition, tremor, rigidity and loss of muscle control are among the most fundamental symptoms of PD. These symptoms may be measured with mobile devices such as smartphones. Analysis of the recorded signal gives information about the level and lateralization of tremor, which are related to the severity of PD.
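The kind of signal analysis described above can be sketched as follows: find the strongest spectral component of a hand accelerometer recording within a tremor band, then compare the two hands. This is a minimal sketch under stated assumptions, not the project's method. The function names are hypothetical, the signals are synthetic, and the 3 to 8 Hz band is an assumption chosen to bracket the range commonly reported for parkinsonian rest tremor.

```python
import math

def band_peak(signal, fs, f_lo=3.0, f_hi=8.0):
    """Return (peak_frequency_hz, peak_amplitude) within [f_lo, f_hi].

    Uses a direct DFT over the bins inside the band only. The default
    band limits are an assumption of this sketch.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the constant (gravity) offset
    best_f, best_a = None, -1.0
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    for k in range(k_lo, k_hi + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        amp = 2.0 * math.hypot(re, im) / n  # amplitude of the sinusoid at bin k
        if amp > best_a:
            best_f, best_a = k * fs / n, amp
    return best_f, best_a

def lateralization_index(left_amp, right_amp):
    """(R - L) / (R + L): +1 means right hand only, -1 left only, 0 symmetric."""
    total = left_amp + right_amp
    return 0.0 if total == 0 else (right_amp - left_amp) / total

# Synthetic 5 s recordings at 50 Hz: a 5 Hz oscillation on top of gravity
fs, n = 50, 250
left = [9.8 + 0.1 * math.sin(2 * math.pi * 5 * i / fs) for i in range(n)]
right = [9.8 + 0.3 * math.sin(2 * math.pi * 5 * i / fs) for i in range(n)]

_, left_amp = band_peak(left, fs)
_, right_amp = band_peak(right, fs)
print(band_peak(right, fs), lateralization_index(left_amp, right_amp))
```

Real recordings would need windowing, a combination of the three accelerometer axes, and handling of voluntary movement; the direct DFT is used here only to keep the example free of external dependencies.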

The main goal of the project is to identify speech and hand movement features that characterize PD, and to develop easy-to-use, inexpensive multimodal (mobile) software that can monitor the speech and hand movement of a subject in order to diagnose PD (or estimate its likelihood) at an early stage of the disease and to track changes in the patient’s condition. Additionally, the database will contain material from PD patients who underwent brain surgery to implant electrodes intended to improve their condition (for example by reducing tremor). The proposed software will be able to monitor the improvement in their speech and movement brought about by the surgical treatment.

Using the proposed system, people can calculate severity scores at home with inexpensive devices such as tablets and smartphones. In addition, the diagnostic tool can serve as a supplementary tool that helps medical staff measure PD severity during follow-up treatment.

Emotions in speech

Considerable effort has gone into understanding the verbal channel of speech over the last decade. The non-verbal channel has received less research attention so far, and its operation is still less well understood. Besides semantic content, human speech conveys changes in tone, modulation and rhythm. These can express the emotional intent, health, mood and speech style of the speaker. Emotion helps to inform the listener, even when it is not expressed in words. We examined the expression of emotions and their automatic classification without taking semantic content into account. A statistical investigation of spectral and prosodic acoustic parameters provided a basis for automatic recognition. Automatic classification experiments assigned speech segments to four basic emotion categories with a performance of 80%. By combining machine learning algorithms with an automatic speech segmentation system, we created an emotion recognition engine that can be used in call centers and in human-machine interaction.

COALA project (ESA)

The project examines the sensitivity of acoustic-phonetic parameters of speech to hypoxia and to Seasonal Affective Disorder (SAD), and develops a metric for automatic detection that alerts crews at an early stage of cognitive dysfunction, using the Concordia Antarctic Station as a human exploration analogue.