Vinush Vigneswaran

Collaborative Question-guided Visual Storyteller

September 2021 | Posted January 2022 | The University of Edinburgh

" Cronus and the dragons , everything that could , this world with this world was so powerful , it was all of this world . There was no sense in this world or the things that would cause her to listen to him as much as he wanted her to . There was so many times in the past thousand years that he 'd never allowed her to be the cause of this world or the world that existed in this world . There was such an acute sense of foreboding that knocked the heart out of her , and that was the only thing that could ever happen to you . We are the only other vampires in this world , are we , Raven . "

- The AI Storyteller

Abstract

Storytelling is a powerful mechanism for communication, central to human socialising and information transfer. This project introduces a novel collaborative visual storytelling task: Collaborative Question-guided Visual Storytelling (CQ-VS). In this task, the user asks multi-turn questions about an image, and the model responds to each question with a continuation of a coherent narrative. The user thus guides the story according to their interests, imitating human interaction during storytelling. The task extends the popular Visual Question Answering (VQA) task; to answer a question in CQ-VS, however, the model must attend to three inputs: the question, the story generated so far, and the image.

This project introduces a benchmark model that leverages state-of-the-art VQA systems, an encoder-decoder story generation model, and a history modelling technique to support user interaction. The project focuses on the quality of the generated story, the user experience, and the impact of the application on helping users write creative stories. Three automatic metrics are proposed to evaluate fluency, coherence and lexical diversity. Perplexity is computed using an n-gram language model to compare the fluency of generated text with that of human-authored text. The coherence metric leverages sentence embeddings generated as part of the story generation model to compute semantic relatedness between sentences. We show that the proposed automatic coherence metric correlates strongly with human judgement ratings, with a Spearman rank correlation coefficient of 0.76. Additionally, analysis of the user study shows great interest in the application and collective enthusiasm for its potential, although major improvements are required in its technical aspects.
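As a rough illustration of how an embedding-based coherence score and its rank correlation with human ratings might be computed, the sketch below makes two assumptions that the abstract does not confirm: coherence is taken as the mean cosine similarity between consecutive sentence embeddings, and the embeddings are plain lists of floats. The function names are hypothetical, and the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(embeddings: Sequence[Sequence[float]]) -> float:
    """ASSUMPTION: mean similarity of adjacent sentence embeddings."""
    if len(embeddings) < 2:
        return 0.0
    sims = [cosine(a, b) for a, b in zip(embeddings, embeddings[1:])]
    return sum(sims) / len(sims)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

In use, one would score each generated story with `coherence`, collect the corresponding human ratings, and pass the two lists to `spearman`; a value near 1 indicates that the automatic metric orders stories much as the human raters do.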

Throughout this report, we identify potential improvements to the design and implementation of the system and the user study that could improve the quality and relevance of the generated text. The proposed baseline model provides a constructive foundation for future work.

“The value of information does not survive the moment in which it was new. It lives only at that moment; it has to surrender to it completely and explain itself to it without losing any time. A story is different. It does not expend itself. It preserves and concentrates its strength and is capable of releasing it even after a long time.”

– Walter Benjamin

The Complete Dissertation

This project was part of my Master's Dissertation at The University of Edinburgh under the supervision of Pavlos Andreadis.

Collaborative Question-guided Visual Storyteller