- The AI Storyteller
Storytelling is a powerful mechanism for communication that is central to human socialising and information transfer. This project therefore introduces a novel collaborative visual storytelling task: Collaborative Question-guided Visual Storytelling (CQ-VS). In this task, the user asks multi-turn questions about an image, and the model responds to each question by generating a coherent continuation of the narrative. The user thus guides the story according to their interest, imitating human interaction during storytelling. The task extends the popular Visual Question Answering (VQA) task; in CQ-VS, however, the model must attend to three inputs to answer a question: the question, the story generated so far, and the image.
This project introduces a benchmark model that leverages state-of-the-art VQA systems, an encoder-decoder story generation model, and a history modelling technique to support user interaction. The project focuses on the quality of the generated story, the user experience, and the impact of the application on helping users write creative stories. Three automatic metrics are proposed to evaluate fluency, coherence and lexical diversity. Fluency is measured by perplexity, computed with an n-gram language model, so that generated text can be compared with human-authored text. The coherence metric leverages sentence embeddings generated as part of the story generation model to compute semantic relatedness between sentences. We show that the proposed automatic coherence metric correlates strongly with human judgement ratings, with a Spearman rank correlation coefficient of 0.76. Additionally, analysis of the user study shows great interest in the application and shared enthusiasm for its potential, although major improvements are required in the technical aspects.
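To make the three metrics concrete, the following is a minimal sketch and not the implementation used in this project. It assumes an add-one smoothed bigram model for perplexity, the mean cosine similarity of adjacent sentence embeddings for coherence, and a distinct-n ratio for lexical diversity. The function names, the choice of n, the smoothing scheme and the aggregation over sentence pairs are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def bigram_perplexity(train: List[List[str]], test: List[str]) -> float:
    """Perplexity of `test` under an add-one smoothed bigram model (assumed setup)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sent in train:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        contexts.update(toks[:-1])            # counts of each token as a context
        bigrams.update(zip(toks, toks[1:]))
    v = len(vocab) + 1                        # +1 reserves mass for unseen tokens
    toks = ["<s>"] + test + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(toks, toks[1:]):
        log_prob += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
    return math.exp(-log_prob / (len(toks) - 1))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coherence(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between consecutive sentence embeddings (assumed)."""
    pairs = list(zip(embeddings, embeddings[1:]))
    if not pairs:
        return 0.0                            # fewer than two sentences
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def distinct_n(tokens: Sequence[str], n: int = 1) -> float:
    """Lexical diversity as the ratio of unique n-grams to total n-grams (assumed)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

In this sketch, lower perplexity indicates text closer to the training corpus, while higher coherence and distinct-n values indicate more semantically related sentences and less repetition, respectively. The sentence embeddings would be supplied by the story generation model; here they are plain lists of floats.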
Throughout this report we identify potential improvements to the design and implementation of the system and the user study that would improve the quality and relevance of the generated text. The proposed baseline model provides a constructive foundation for future work.
“The value of information does not survive the moment in which it was new. It lives only at that moment; it has to surrender to it completely and explain itself to it without losing any time. A story is different. It does not expend itself. It preserves and concentrates its strength and is capable of releasing it even after a long time.”
- Walter Benjamin

This project was part of my Master's Dissertation at The University of Edinburgh under the supervision of Pavlos Andreadis.