Robots face considerable uncertainty when working out from speech what users want them to do, and they also have difficulty knowing when their own goals have become common knowledge with the user. We recently presented a model of how best to make a robot’s goal, and its uncertainty about the user’s goal, common knowledge with the user. It appeared in the main conference of the 2017 Conference on Human-Robot Interaction (HRI2017):
Julian Hough and David Schlangen. It’s Not What You Do, It’s How You Do It: Grounding Uncertainty for a Simple Robot
and in the HRI workshop on intentions:
Julian Hough and David Schlangen. A Model of Continuous Intention Grounding for HRI.
Any comments on this are welcome.
2016 was a successful year for the DSG, with papers accepted at multiple international venues (see our 2016 publications).
We also won two ‘best paper’ awards, one at INLG and one at the EISE workshop at ICMI:
Sina Zarrieß and David Schlangen. Towards Generating Colour Terms for Referents in Photographs: Prefer the Expected or the Unexpected?
In: Proceedings of the 9th International Natural Language Generation conference. Edinburgh, UK: Association for Computational Linguistics: 246–255.
Birte Carlmeyer, David Schlangen, Britta Wrede. Exploring self-interruptions as a strategy for regaining the attention of distracted users.
In: Proceedings of the 1st Workshop on Embodied Interaction with Smart Environments – EISE ’16. Association for Computing Machinery (ACM).
Check out the papers and let us know your thoughts in the comments!
We’re going to the 10th edition of the Language Resources and Evaluation Conference (LREC) in Portorož (Slovenia) this month to present the following two papers:
Sina Zarrieß, Julian Hough, Casey Kennington, Ramesh Manuvinakurike, David DeVault, Raquel Fernández, and David Schlangen. PentoRef: A Corpus of Spoken References in Task-oriented Dialogues (PUB: https://pub.uni-bielefeld.de/publication/2903076)
Julian Hough, Ye Tian, Laura de Ruiter, Simon Betz, David Schlangen and Jonathan Ginzburg. DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter (PUB: https://pub.uni-bielefeld.de/publication/2903080)
It should be a lot of fun to interact with the linguistic resources community and share the cool stuff we’ve built up over the last few years!
Feel free to ask us (see the corresponding authors’ email addresses at the top of the papers) if you’re interested or need help getting the data.
We had two papers accepted to the Interspeech Conference in Dresden in August:
Title: Micro-Structure of Disfluencies: Basics for Conversational Speech Synthesis
Authors: Simon Betz, Petra Wagner and David Schlangen
Abstract: Incremental dialogue systems can produce fast responses and can interact in a human-like fashion. However, these systems occasionally produce erroneous material or run out of things to say. Humans in such situations use disfluencies to remedy their ongoing production and signal this to the listener. We devised a new model for inserting disfluencies into synthesis and evaluated this approach in a perception test. It showed that lengthenings and silent pauses can be built for speech synthesis with low effort and high output quality. Synthesized word fragments and filled pauses, while potentially useful in incremental dialogue systems, appear more difficult to handle for listeners. While we were able to get consistently high ratings for certain types of disfluencies, the need for more basic research on their micro structure became apparent in order to be able to synthesize the fine phonetic detail of disfluencies. For this, we analysed corpus data with regard to distributional and durational aspects of lengthenings, word fragments and pauses. Based on these natural speaking strategies, we explored further to what extent speech can be delayed using disfluency strategies, and how to handle difficult disfluency elements by determining the appropriate amount of durational variation applicable.
Title: Recurrent Neural Networks for Incremental Disfluency Detection
Authors: Julian Hough and David Schlangen
Abstract: For dialogue systems to become robust, they must be able to detect disfluencies accurately and with minimal latency. To meet this challenge, here we frame incremental disfluency detection as a word-by-word tagging task and, following their recent success in Spoken Language Understanding tasks, we test the performance of Recurrent Neural Networks (RNNs). We experiment with different inputs for RNNs to explore the effect of context on their ability to detect edit terms and repair disfluencies effectively, and also experiment with different tagging schemes. Although not eclipsing the state of the art in terms of utterance-final performance, RNNs achieve good detection results, requiring no feature engineering and using simple input vectors representing the incoming utterance as their training input. Furthermore, RNNs show very good incremental properties with low latency and very good output stability, surpassing previously reported results in these measures.
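To make the "word-by-word tagging" framing in the abstract above a bit more concrete, here is a minimal sketch of what incremental tagging looks like as an interface. It is not the RNN from the paper: the tagger is a toy rule we made up for illustration, and the tag names (`f`, `e`, `rm`), the filled-pause lexicon and the revision count are all assumptions rather than the paper's tagging scheme or evaluation measures. The point is only that the tagger sees one more word at a time and may revise tags it has already output, which is what stability measures are concerned with.

```python
from typing import Callable, List

# Hypothetical toy lexicon of filled pauses (an assumption for this sketch).
FILLED_PAUSES = {"uh", "um"}


def toy_tagger(prefix: List[str]) -> List[str]:
    """Tag every word in the prefix seen so far.

    Hypothetical tags: 'e' = edit term, 'rm' = repaired word, 'f' = fluent.
    Toy rule: a filled pause is an edit term; if the word right after a
    filled pause repeats the word right before it, the earlier word is
    marked as repaired.
    """
    tags = ["e" if word in FILLED_PAUSES else "f" for word in prefix]
    for i in range(1, len(prefix) - 1):
        if tags[i] == "e" and tags[i - 1] == "f" and prefix[i - 1] == prefix[i + 1]:
            tags[i - 1] = "rm"
    return tags


def run_incrementally(
    words: List[str], tagger: Callable[[List[str]], List[str]]
) -> List[List[str]]:
    """Feed the utterance one word at a time, keeping each intermediate output."""
    return [tagger(words[:n]) for n in range(1, len(words) + 1)]


def count_revisions(outputs: List[List[str]]) -> int:
    """Count how often a tag already output for a word is changed later on."""
    revisions = 0
    for previous, current in zip(outputs, outputs[1:]):
        revisions += sum(1 for old, new in zip(previous, current) if old != new)
    return revisions


if __name__ == "__main__":
    utterance = "i uh i really want to go".split()
    steps = run_incrementally(utterance, toy_tagger)
    for n, tags in enumerate(steps, start=1):
        print(list(zip(utterance[:n], tags)))
    print("revisions:", count_revisions(steps))
```

In this toy run the first "i" is tagged fluent at first and only relabelled once the repeated "i" arrives, so one earlier output gets revised. A real incremental detector faces the same trade-off between committing early and having to change its mind.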
Last month, DUEL (“Disfluencies, exclamations and laughter in dialogue”), a joint project between the Bielefeld DSG and Université Paris Diderot (Paris 7), launched in Paris.
The project aims to investigate how and why people’s talk is filled with disfluent material such as filled pauses (“um”, “uh”), repairs (e.g. “I, uh, I really want to go”), exclamations such as “oops” and laughter of all different kinds, from the chortle to the titter.
Traditionally in theoretical linguistics, such phenomena have been placed outside the human linguistic faculty, a view held since the dawn of the modern field, owing particularly to Chomsky’s early distinction between competence and performance (Chomsky, 1965). However, as Jonathan Ginzburg and our own group head David Schlangen argue in their recent paper, disfluency is analogous to friction in physics: while an idealized theory of language can do without it, one that purports to model what actually happens in dialogue cannot throw these frequent phenomena aside.
The project will examine the interactive contribution of the disfluency and laughter that fill our every conversation through a three-pronged attack: empirical observation, theory building and, of course, dialogue system implementation. We will investigate how the phenomena vary across languages and use the insights gained from data analysis and formal modelling to incorporate them into the interpretation and generation components of a working spoken dialogue system. We aim to build a system that can be disfluent in a natural way and is also capable of interactionally appropriate laughter when talking with users. These are milestones on the way to more natural spoken conversations between humans and machines, which, despite recent questionable press claims of a leap forward, remain a far-from-solved problem.
You can follow the progress of the DUEL project on its new website. Which- uh, I mean, haha, watch this space..