Movie-DiC: a Movie Dialogue Corpus for Research and Development
Rafael E. Banchs
Human Language Technology, Institute for Infocomm Research, Singapore 138632
rembanchs@

Abstract

This paper describes Movie-DiC, a Movie Dialogue Corpus recently collected for research and development purposes. The collected dataset comprises 132,229 dialogues containing a total of 764,146 turns that have been extracted from 753 movies. Details on how the data collection has been created and how it is structured are provided, along with its main statistics and characteristics.

1 Introduction

Data-driven applications have proliferated in Computational Linguistics during the last decade. Several factors, such as the availability of more powerful computers, an almost unlimited storage capacity, the availability of large volumes of data in digital format, as well as the recent advances in machine learning theory, have significantly contributed to such a proliferation.

Among the many applications that have benefited from this data-driven boom, probably the most representative examples are information retrieval (Qin et al., 2008), machine translation (Brown et al., 1993), question answering (Molla-Aliod and Vicedo, 2010) and dialogue systems (Rieser and Lemon, 2011).

In the specific case of dialogue systems, data acquisition can impose some challenges depending on the specific domain and task the dialogue system is targeted for.
In some specific domains in which human-human dialogue applications already exist, data collection is generally straightforward, while in some other cases data design and collection can constitute a complex problem (Williams and Young, 2003; Zue, 2007; Misu et al., 2009).

Depending on the objective being pursued, dialogue systems can be grouped into two major categories: task-oriented and chat-oriented systems. In the first case, the system is required to help the user to accomplish a specific goal or objective (Busemann et al., 1997; Stallard, 2000). In the second case, the system ...