Error on Ubuntu 12.04 with InstantPlayer and FaceLab

If you are attempting to use InstantReality’s InstantPlayer as a 3D environment, and you are using Seeing Machines’s FaceLab as an eye tracker (via a FaceLab plugin), then you may have run into the “undefined symbol” problem while running InstantPlayer.

This is a hashed message that can be filtered through “c++filt” (pass the hashed string as an argument), which gives the following:


That is, there is a problem during runtime that the above constructor cannot be found in the dynamic linked library, even if that library is in the correct path. The problem is that when the FaceLab node was compiled, it required the coredata sdk.h file, which pointed to the other .h files. More specifically, during compile the FaceLab plugin for InstantPlayer only had a definition of the constructor and not the constructor itself. During runtime, it never finds the constructor. A fix for this is to do the following.

1. Open and edit the coredata/include/eod/io/sdk.h file
2. Change the #include references for three .h files, namely:
3. Instead, use their .cpp counterparts:
i.e., #include “../src/eod/io/datagramsocket.cpp”
and do the same for the other two files.
4. Recompile the coredata (you may need to run “clean” first):
>make -f Makefile.linux32_ia32 all
This will make the new libraries where the constructor is accessible. This is our proposed fix until the Seeing Machines developers find a better solution.

5 more 2013 papers!

Timo Baumann and David Schlangen, Interactional Adequacy as a Factor in the Perception of Synthesized Speech, to appear in Proceedings of the 8th ISCA Speech Synthesis Workshop, Barcelona, Spain, August 2013


Speaking as part of a conversation is different from reading out aloud.
Speech synthesis systems, however, are typically developed using assumptions
(at least implicitly) that are more true of the latter than the former situation.
We address one particular aspect, which is the assumption that
a fully formulated sentence is available for synthesis.
We have built a system that does not make this assumption
but rather can synthesize speech given incrementally extended input.
In an evaluation experiment, we found that in a dynamic domain
where what is talked about changes quickly,
subjects rated the output of this system as more `naturally pronounced’ than that
of a baseline system that employed standard synthesis,
despite the synthesis quality objectively being degraded.
Our results highlight the importance of considering a synthesizer’s ability to
support interactive use-cases when determining the adequacy of synthesized

Casey Kennington and David Schlangen, Situated Incremental Natural Language Understanding Using Markov Logic Networks, to appear in Computer Speech and Language


We present work on understanding natural language in a situated domain in an incremental, word-by-word fashion.
We explore a set of models specified as Markov Logic Networks and show that a model that has access to information about the visual context during an utterance, its discourse context, the words of the utterance, as well as the linguistic structure of the utterance performs best and is robust to noisy speech input. We explore the incremental properties of the models and offer some analysis. We conclude that MLNs provide a promising framework for specifying such models in a general, possibly domain-independent way.

Casey Kennington, Spyros Kousidis and David Schlangen, Interpreting Situated Dialogue Utterances: an Update Model that Uses Speech, Gaze, and Gesture Information, to appear in Proceedings of SIGdial 2013, Metz, France, August 2013


In situated dialogue, speakers share time and space. We present a statistical model for understanding natural language that works incrementally (i.e., in real, shared time) and is grounded (i.e., links to entities in the shared space). We describe our model with an example, then establish that our model works well on non-situated, telephony application-type utterances, show that it is effective in grounding language in a situated environment, and further show that it can make good use of embodied cues such as gaze and pointing in a fully multi-modal setting.

Spyros Kousidis, Casey Kennington and David Schlangen, Investigating speaker gaze and pointing behaviour in human-computer interaction with the collection, to appear in Proceedings of SIGdial 2013, Metz, France, August 2013


Can speaker gaze and speaker arm movements be used as a practical information source for naturalistic conversational human–computer interfaces? To investigate this question, we’ve recorded (with eye tracking and motion capture) a corpus of interactions with a (wizarded) system. In this paper, we describe the recording and analysis infrastructure that we’ve built for such studies, and the particular analyses we performed on these data.
We find that with some initial calibration, a “minimally invasive”, stationary camera-based setting provides data of sufficient quality to support interaction.

Timo Baumann and David Schlangen, Open-ended, Extensible System Utterances Are Preferred, Even If They Require Filled Pauses, to appear in Proceedings of SIGdial 2013, Metz, France, August 2013


In many environments (e.g. sports commentary), situations incrementally unfold over time and often the future appearance of a relevant event can be predicted, but not in all its details or precise timing. We have built a simulation framework that uses our incremental
speech synthesis component to assemble in a timely manner complex commentary
utterances. In our evaluation, the resulting output is preferred over that from a baseline system that uses a simpler commenting strategy. Even in cases where the incremental
system overcommits temporally and requires a filled pause to wait for
the upcoming event, the system is preferred over the baseline.

PDFs to follow (or, should I forget to add links here, will be available under “Publications” in any case).