EmoVoice – Emotional Speech Recognition
EmoVoice is a comprehensive framework for the real-time recognition of emotions from the acoustic properties of speech (word information is not used). Its output can easily be linked to other applications, which makes it straightforward to implement prototypes of affective interfaces.
The acoustic analysis largely relies on algorithms from the Praat phonetics software and the ESMERALDA speech recognition environment. Features are global statistics derived from pitch, energy, MFCCs, duration, voice quality and spectral information. Two classifiers are currently integrated into the framework: Naive Bayes, a fast but simple classifier, and Support Vector Machines, a more sophisticated one.
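To illustrate the idea behind global-statistics features, the sketch below maps a variable-length series of frame-wise measurements (e.g. energy or pitch values of one voice segment) to a fixed-length statistics vector. This is a minimal illustration of the general technique, not EmoVoice's actual feature code; the function and value names are made up for the example.

```python
# Illustrative sketch: derive fixed-length global statistics from a
# variable-length series of frame-wise acoustic values, as is done
# for pitch, energy, MFCCs etc. (names are hypothetical).
from statistics import mean, stdev

def global_stats(frames):
    """Map frame-wise values of one segment to global statistics."""
    return {
        "mean": mean(frames),
        "std": stdev(frames) if len(frames) > 1 else 0.0,
        "min": min(frames),
        "max": max(frames),
        "range": max(frames) - min(frames),
    }

# e.g. frame-wise energy values of one detected voice segment
energy = [0.12, 0.35, 0.40, 0.28, 0.15]
features = global_stats(energy)
```

Because every segment yields the same fixed set of statistics regardless of its duration, the resulting vectors can be fed directly to classifiers such as Naive Bayes or Support Vector Machines.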
Online recognition runs as a command-line application that writes its results to the console or sends them over a socket using the Open Sound Control (OSC) protocol. The tool continuously reads from the microphone and extracts suitable voice segments by voice activity detection. After feature extraction, each segment is directly assigned an emotion label by a previously trained classifier.
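A client consuming the socket output needs to decode OSC packets. The sketch below parses a minimal OSC message (address, type tags, string/float/int arguments) according to the OSC 1.0 encoding rules. The address `/emo` and the `,sf` argument layout (emotion label plus a float score) are assumptions chosen for illustration, not EmoVoice's documented message format.

```python
# Minimal OSC 1.0 message decoder (sketch; /emo and ",sf" layout are
# assumptions, not EmoVoice's documented format).
import struct

def _read_padded_string(data, pos):
    # OSC strings are NUL-terminated and padded to a multiple of 4 bytes.
    end = data.index(b"\x00", pos)
    s = data[pos:end].decode("ascii")
    pos = end + 1
    pos += (-pos) % 4
    return s, pos

def parse_osc(packet):
    """Decode one OSC message into (address, [arguments])."""
    address, pos = _read_padded_string(packet, 0)
    tags, pos = _read_padded_string(packet, pos)
    args = []
    for tag in tags.lstrip(","):
        if tag == "s":                       # padded string
            s, pos = _read_padded_string(packet, pos)
            args.append(s)
        elif tag == "f":                     # big-endian float32
            (f,) = struct.unpack_from(">f", packet, pos)
            args.append(f)
            pos += 4
        elif tag == "i":                     # big-endian int32
            (i,) = struct.unpack_from(">i", packet, pos)
            args.append(i)
            pos += 4
    return address, args
```

In practice such a decoder would run in a loop over datagrams received from the socket EmoVoice writes to, dispatching each decoded label to the affective application.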
To create personalized models, EmoVoice comes with a graphical user interface (ModelUI) that lets you build your own emotional speech database. Stimuli to elicit emotions can be provided by the interface, for example a set of emotional sentences to read aloud. We have defined a set of sentences loosely based on the Velten mood induction technique (Velten, E. (1968), A laboratory task for induction of mood states, Behavior Research & Therapy, (6):473-482), which is intended to help speakers actually experience the emotions. The sentences can also be personalised so that readers can immerse themselves more easily in the emotional states. This procedure reduces the effort of building a prototypical personalised emotion recogniser to just a few minutes. Of course, already available emotional speech databases can also be used with EmoVoice.
Downloads
Name | Version | Date | Download URL |
Publication
T. Vogt, E. André and N. Bee, “EmoVoice – A framework for online recognition of emotions from voice,” in Proceedings of Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008. [pdf]
@inproceedings{Vogt:2008,
  author    = {Vogt, Thurid and Andr\'{e}, Elisabeth and Bee, Nikolaus},
  title     = {EmoVoice -- A Framework for Online Recognition of Emotions from Voice},
  booktitle = {Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems},
  series    = {PIT '08},
  year      = {2008},
  location  = {Kloster Irsee, Germany},
  pages     = {188--199},
  publisher = {Springer-Verlag},
  address   = {Berlin, Heidelberg},
}
Pipeline
<?xml version="1.0" ?>
<pipeline ssi-v="1">

	<register>
		<load name="ssiaudio.dll"/>
		<load name="ssiemovoice.dll"/>
		<load name="ssiioput.dll"/>
		<load name="ssigraphic.dll"/>
		<load name="ssisignal.dll"/>
		<load name="ssimodel.dll"/>
	</register>

	<!-- set framework options -->
	<framework console="true" cpos="400,400,400,400"/>

	<!-- set painter options -->
	<painter arrange="true" apos="1,2,0,0,400,800"/>

	<!-- sensor -->
	<sensor create="ssi_sensor_Audio" option="audio" scale="false">
		<provider channel="audio" pin="audio"/>
	</sensor>

	<!-- voice activity detection -->
	<transformer create="ssi_feature_SNRatio">
		<input pin="audio" frame="0.05s"/>
		<output pin="audio_snr"/>
	</transformer>
	<consumer create="ssi_consumer_ZeroEventSender" mindur="1.0" hangin="3" hangout="3" ename="speech">
		<input pin="audio_snr" frame="0.1s"/>
	</consumer>

	<!-- emo voice classifier -->
	<consumer create="ssi_consumer_Classifier" trainer="emovoice">
		<input pin="audio" listen="speech@">
			<transformer create="ssi_feature_EmoVoiceFeat" maj="1"/>
		</input>
	</consumer>

	<!-- visualization -->
	<consumer create="ssi_consumer_SignalPainter" name="audio (tr)" type="2">
		<input pin="audio" listen="speech@"/>
	</consumer>
	<consumer create="ssi_consumer_SignalPainter" name="audio" size="10.0" type="2">
		<input pin="audio" frame="0.2s"/>
	</consumer>

	<!-- listener -->
	<listener create="ssi_listener_EventMonitor" mpos="400,0,400,400">
		<input listen="@" span="20000"/>
	</listener>

</pipeline>