Demo of Words-as-Classifiers Model of Reference Resolution

The short video below shows how the Words-as-Classifiers (WAC) model [2,3] of reference resolution (RR) resolves spoken references to objects that are visually present and tangible.

The setup is as follows: several Pentomino puzzle tiles (i.e., geometric shapes) are on a table, and a camera is placed above them. The video feed is passed to computer vision software that segments the objects and extracts a set of low-level features for each one. These features are sent to the WAC module of RR, which is implemented as a module in InproTK [4]. Another module uses Kaldi to recognize speech captured by a microphone, and its output is also fed to the WAC module. Using the visual features and the recognized words, the WAC module determines which object is being referred to by producing a probability distribution over the objects; the object with the highest probability is chosen as the referent.
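The last step of this pipeline, turning per-object features and recognized words into a distribution over objects, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the InproTK implementation: each word is assumed to have a logistic-regression classifier over an object's feature vector, the per-word scores for an object are assumed to be combined by multiplication and then normalized, and the feature vectors, weights, and the function names `apply_word` and `resolve` are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_word(classifier, features):
    """Apply one (hypothetical) word classifier to one object's features.

    The classifier is a (weights, bias) pair for logistic regression; the
    result in (0, 1) says how well the object fits the word."""
    weights, bias = classifier
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def resolve(words, classifiers, objects):
    """Score every object against the words, normalize, and pick the argmax.

    Assumption: word scores are composed by multiplication; words without
    a trained classifier (e.g. "the") are skipped."""
    scores = {}
    for obj_id, features in objects.items():
        score = 1.0
        for word in words:
            if word in classifiers:
                score *= apply_word(classifiers[word], features)
        scores[obj_id] = score
    total = sum(scores.values())
    distribution = {obj_id: s / total for obj_id, s in scores.items()}
    referent = max(distribution, key=distribution.get)
    return distribution, referent


# Hypothetical low-level features per tile: [redness, cross-likeness, x-position].
objects = {
    "tile1": [1.0, 1.0, 0.2],  # red cross
    "tile2": [1.0, 0.0, 0.8],  # red, not a cross
    "tile3": [0.0, 1.0, 0.5],  # cross, not red
}
# Hand-set (weights, bias) pairs standing in for trained word classifiers.
classifiers = {
    "red": ([4.0, 0.0, 0.0], -2.0),
    "cross": ([0.0, 4.0, 0.0], -2.0),
}

distribution, referent = resolve(["take", "the", "red", "cross"], classifiers, objects)
print(referent)  # prints tile1
```

Multiplying the word scores means an object must fit every content word to stay probable; other composition rules (e.g. averaging) are equally possible, and the choice here is only for illustration.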

The module works incrementally, i.e., it processes the speech word by word, as each word is recognized. The WAC model is trained on examples of similar interactions (i.e., Pentomino tiles paired with the referring expressions used to describe them).
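Both ideas, training one classifier per word and updating the distribution after every recognized word, can be sketched together. This is a hedged illustration, not the InproTK code (InproTK itself is written in Java): it assumes a simple training scheme in which, for each word of a referring expression, the referred tile is a positive example and every other tile is a negative example; plain full-batch gradient descent stands in for whatever optimizer the real system uses; and the per-word update multiplies the current distribution by the word's scores and renormalizes. The names `train_wac` and `IncrementalResolver`, the two-dimensional features, and the toy training episodes are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_wac(episodes, n_features, epochs=500, lr=0.5):
    """Train one logistic-regression classifier per word (hypothetical helper).

    episodes: list of (words, objects, target_id); objects maps id -> features.
    For each word, the referent is a positive example and all other objects
    in the same scene are negative examples (an assumed scheme)."""
    examples = defaultdict(list)
    for words, objects, target in episodes:
        for word in words:
            for obj_id, features in objects.items():
                examples[word].append((features, 1.0 if obj_id == target else 0.0))

    classifiers = {}
    for word, data in examples.items():
        weights, bias = [0.0] * n_features, 0.0
        for _ in range(epochs):
            # Full-batch gradient descent on the mean logistic loss.
            grad_w, grad_b = [0.0] * n_features, 0.0
            for features, label in data:
                err = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias) - label
                for i, f in enumerate(features):
                    grad_w[i] += err * f
                grad_b += err
            weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b / len(data)
        classifiers[word] = (weights, bias)
    return classifiers


class IncrementalResolver:
    """Keeps a distribution over objects and updates it word by word."""

    def __init__(self, classifiers, objects):
        self.classifiers = classifiers
        self.objects = objects
        # Before any word is heard, every object is equally likely.
        self.distribution = {obj_id: 1.0 / len(objects) for obj_id in objects}

    def add_word(self, word):
        """Fold one newly recognized word into the current distribution."""
        if word not in self.classifiers:
            return self.distribution  # no classifier: distribution unchanged
        weights, bias = self.classifiers[word]
        scores = {
            obj_id: p * sigmoid(sum(w * f for w, f in zip(weights, self.objects[obj_id])) + bias)
            for obj_id, p in self.distribution.items()
        }
        total = sum(scores.values())
        self.distribution = {obj_id: s / total for obj_id, s in scores.items()}
        return self.distribution

    def referent(self):
        """Current best guess: the most probable object."""
        return max(self.distribution, key=self.distribution.get)


# Toy episodes with hypothetical [redness, cross-likeness] features.
episodes = [
    (["red"], {"p": [1.0, 0.0], "q": [0.0, 1.0]}, "p"),
    (["red"], {"p": [1.0, 1.0], "q": [0.0, 0.0]}, "p"),
    (["cross"], {"p": [0.0, 1.0], "q": [1.0, 0.0]}, "p"),
    (["cross"], {"p": [1.0, 1.0], "q": [0.0, 0.0]}, "p"),
]
classifiers = train_wac(episodes, n_features=2)

scene = {"t1": [1.0, 1.0], "t2": [1.0, 0.0], "t3": [0.0, 1.0]}
resolver = IncrementalResolver(classifiers, scene)
for word in ["the", "red", "cross"]:
    resolver.add_word(word)
    print(word, {k: round(v, 3) for k, v in resolver.distribution.items()})
```

With a multiplicative update, feeding the words one at a time yields the same final distribution as scoring the whole utterance at once; the incremental form simply makes a current best guess available after every word, so a system can react before the speaker has finished.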

This demo corresponds to the one presented by Soledad Lopez at the SemDial 2015 conference, held in Gothenburg, Sweden [1]. That demo also included a feature not shown here: the system could produce simple spoken utterances, including utterances to confirm whether the object selected by WAC was the intended one.

References

[1] Kennington, C., Lopez Gambino, M. S., & Schlangen, D. (2015). Real-world Reference Game using the Words-as-Classifiers Model of Reference Resolution. In Proceedings of SemDial 2015 (pp. 188–189).

[2] Kennington, C., & Schlangen, D. (2015). Simple Learning and Compositional Application of Perceptually Grounded Word Meanings for Incremental Reference Resolution. In Proceedings of ACL. Beijing, China: Association for Computational Linguistics.

[3] Kennington, C., Dia, L., & Schlangen, D. (2015). A Discriminative Model for Perceptually-Grounded Incremental Reference Resolution. In Proceedings of IWCS. Association for Computational Linguistics.

[4] Baumann, T., & Schlangen, D. (2012). The InproTK 2012 Release. In Proceedings of NAACL-HLT.