Hands On
The following article, published in ACM SIGMM Records, provides a good starting point for developers who want to learn more about SSI. Basic concepts and important functions of the framework are introduced and explained by means of a simple pipeline example.
SSI: an Open Source Platform for Social Signal Interpretation (ACM SIGMM Records)
Highlights
- Synchronized reading from multiple sensor devices, e.g. microphone, ASIO audio interface, web cam, DV cam, Wiimote, Kinect, and physiological sensors
- General filter and feature algorithms, such as image processing, signal filtering, frequency analysis, and statistical measurements, in real time
- Event-based signal processing to combine and interpret high level information, such as gestures, keywords, or emotional user states
- Pattern recognition and machine learning tools for on-line and off-line processing, including various algorithms for feature selection, clustering and classification
- Patch-based pipeline design (C++-API or easy-to-use XML editor) and a plug-in system to integrate new components
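The patch-based idea behind the last point can be illustrated with a small, self-contained sketch. The code below is plain C++ and not the actual SSI C++ API; the names `Component`, `Gain`, and `Smooth` are made up for illustration. Components share a common interface and are chained so that the output of one stage feeds the next; a plug-in is simply another class implementing the same interface.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Illustrative only: a minimal component interface, not an SSI type.
struct Component {
    virtual ~Component() = default;
    // Consume a block of samples and return the transformed block.
    virtual std::vector<float> process(const std::vector<float>& in) = 0;
};

// Example "plug-in": a simple gain stage.
struct Gain : Component {
    float factor;
    explicit Gain(float f) : factor(f) {}
    std::vector<float> process(const std::vector<float>& in) override {
        std::vector<float> out(in);
        for (float& v : out) v *= factor;
        return out;
    }
};

// Example "plug-in": a moving-average smoother.
struct Smooth : Component {
    std::vector<float> process(const std::vector<float>& in) override {
        std::vector<float> out(in.size(), 0.0f);
        for (size_t i = 0; i < in.size(); ++i) {
            float sum = 0.0f;
            int n = 0;
            for (size_t j = (i > 0 ? i - 1 : 0); j <= i + 1 && j < in.size(); ++j) {
                sum += in[j];
                ++n;
            }
            out[i] = sum / n;
        }
        return out;
    }
};

int main() {
    // "Patching" components together: data flows through the chain in order.
    std::vector<std::unique_ptr<Component>> pipeline;
    pipeline.push_back(std::make_unique<Gain>(2.0f));
    pipeline.push_back(std::make_unique<Smooth>());

    std::vector<float> block = {0.1f, 0.4f, 0.2f, 0.8f};
    for (auto& c : pipeline) block = c->process(block);

    for (float v : block) std::cout << v << " ";
    std::cout << "\n";
}
```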
Recording

SSI supports synchronized recordings from different sensor devices. This allows us to capture user behavior during interaction with a software application, a virtual agent, or some other type of stimulus. In a typical setting we might decide to use separate cameras for body and face, capture speech with a wireless headset, and record user movements with one or more Wiimotes. In order to track the user's physiological condition, we may also include sensors that measure skin conductivity and heart rate. Since the user is interacting with a virtual character, it is useful to capture the screen so that observed user behavior can later be related to the actions of the agent. Finally, we can also apply some real-time signal processing, such as face detection and noise filtering, and store the processed signals along with the raw streams.
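The core idea that makes such recordings usable later is that every stream is stamped against one shared clock. The following minimal sketch (plain C++, not SSI code; the sensors are mocked) shows two devices running at different rates whose samples can nevertheless be aligned because they reference the same recording start time.

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// One timestamped sample from some device (illustrative type, not an SSI type).
struct Sample {
    double time_s;   // seconds since the shared recording start
    int    sensor;   // which device produced the sample
    float  value;    // the measured value (mock data here)
};

int main() {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();        // one clock shared by all devices
    std::mutex log_mutex;
    std::vector<Sample> log;

    // Simulate two sensors running at different rates but stamped on the same clock.
    auto record = [&](int sensor_id, int rate_hz, int n_samples) {
        for (int i = 0; i < n_samples; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1000 / rate_hz));
            double t = std::chrono::duration<double>(clock::now() - start).count();
            std::lock_guard<std::mutex> lock(log_mutex);
            log.push_back({t, sensor_id, static_cast<float>(i)});
        }
    };

    std::thread audio(record, 0, 50, 10);   // "microphone" at 50 Hz (mock)
    std::thread skin(record, 1, 10, 2);     // "skin conductivity" at 10 Hz (mock)
    audio.join();
    skin.join();

    // Because all samples carry timestamps from the same clock,
    // the streams can be aligned and replayed together later.
    for (const Sample& s : log)
        std::printf("t=%.3f s  sensor=%d  value=%.1f\n", s.time_s, s.sensor, s.value);
}
```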
Training

Once a considerable amount of data has been collected (often covering several users recorded in different sessions), we are ready to inspect the signals. This can be done with a graphical interface that allows us to replay the recorded signals and add descriptions to them. This step is known as annotation. Since the recordings are synchronized, we can look for correlations between the signals and share the same annotation across several modalities.
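Sharing one annotation across modalities works because an annotation lives on the time axis, not on any particular stream. The sketch below (plain C++, not the SSI annotation format; `Segment` and `to_samples` are invented names) shows how a labeled time interval maps onto sample indices of streams recorded at different rates.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// An annotation segment: a label attached to a time interval (illustrative type).
struct Segment {
    double start_s, end_s;
    std::string label;
};

// Convert a time interval to sample indices for a stream with a given sample rate.
// Because all streams were recorded against the same clock, one annotation
// can be applied to every modality this way.
std::pair<size_t, size_t> to_samples(const Segment& seg, double sample_rate_hz) {
    return {static_cast<size_t>(seg.start_s * sample_rate_hz),
            static_cast<size_t>(seg.end_s * sample_rate_hz)};
}

int main() {
    std::vector<Segment> annotation = {
        {1.20, 2.75, "laughter"},
        {4.00, 5.10, "head nod"},
    };

    for (const Segment& seg : annotation) {
        auto audio = to_samples(seg, 16000.0);  // audio stream at 16 kHz
        auto video = to_samples(seg, 25.0);     // video stream at 25 fps
        std::printf("%-9s audio samples %zu-%zu, video frames %zu-%zu\n",
                    seg.label.c_str(), audio.first, audio.second,
                    video.first, video.second);
    }
}
```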
Recognition

After describing the observed behavior in a set of annotation files, we can now train models that automatically detect and classify that behavior. To do so, we first apply filter and feature extraction methods that carve out important characteristics of the signals; in the case of audio, for example, we might use a compact representation of the frequency spectrum. Second, we present the feature chunks together with the corresponding descriptions to a classifier, whose task is to find a good separation between the categories. Finally, we add the trained classifier to our processing pipeline to classify user behavior in real time.
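The two steps can be condensed into a toy example. The sketch below is plain C++ and does not use SSI's feature or model classes: it computes a compact spectral feature (energy in a few frequency bands via a naive DFT) and classifies new frames with a nearest-centroid rule, here trained from a single example per class just to keep it short.

```cpp
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

const double PI = 3.141592653589793;

// Compact spectral feature: energy in a few frequency bands, computed with a
// naive DFT. Illustration of the idea only, not an SSI transformer.
std::vector<double> band_energies(const std::vector<double>& frame, int n_bands) {
    const size_t N = frame.size();
    std::vector<double> feat(n_bands, 0.0);
    for (size_t k = 1; k < N / 2; ++k) {              // DFT bins (skip DC)
        double re = 0.0, im = 0.0;
        for (size_t n = 0; n < N; ++n) {
            double phi = 2.0 * PI * k * n / N;
            re += frame[n] * std::cos(phi);
            im -= frame[n] * std::sin(phi);
        }
        int band = static_cast<int>(k * n_bands / (N / 2));
        feat[band] += re * re + im * im;              // accumulate band energy
    }
    return feat;
}

// Minimal nearest-centroid classifier: each class is represented by the mean
// of its training feature vectors; a new sample gets the closest class.
struct Centroid { std::string label; std::vector<double> mean; };

std::string classify(const std::vector<Centroid>& model, const std::vector<double>& x) {
    std::string best;
    double best_d = 1e300;
    for (const Centroid& c : model) {
        double d = 0.0;
        for (size_t i = 0; i < x.size(); ++i) d += (x[i] - c.mean[i]) * (x[i] - c.mean[i]);
        if (d < best_d) { best_d = d; best = c.label; }
    }
    return best;
}

int main() {
    // Two toy "audio" frames: a low-frequency and a high-frequency tone.
    const size_t N = 64;
    std::vector<double> low(N), high(N);
    for (size_t n = 0; n < N; ++n) {
        low[n]  = std::sin(2.0 * PI * 2.0 * n / N);
        high[n] = std::sin(2.0 * PI * 14.0 * n / N);
    }

    // "Training": one centroid per class, built from a single example each.
    std::vector<Centroid> model = {
        {"low tone",  band_energies(low, 4)},
        {"high tone", band_energies(high, 4)},
    };

    // "Recognition": classify a new frame (here the high-frequency tone again).
    std::printf("predicted class: %s\n", classify(model, band_energies(high, 4)).c_str());
}
```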
Dissemination
SSI was presented in September 2011 at INTERSPEECH 2011 in Florence as part of the special event “Speech Processing Tools”. In July 2013 it was chosen as the processing platform for the special neuroscience track of the Music Hack Day in Barcelona. In October 2013, SSI was accepted for oral presentation at the ACM Multimedia Open Source Software Competition in Barcelona and received an “honorable mention” from the competition judges. SSI has been, and continues to be, used in several EU-funded projects; please visit our project page for details.