Scientific report: "French TimeBank: An ISO-TimeML Annotated Reference Corpus"
Andre Bittar (Alpage, Univ. Paris Diderot), Pascal Amsili (LLF, Univ. Paris Diderot, amsili@), Pascal Denis (Alpage, INRIA), Laurence Danlos (Alpage, Univ. Paris Diderot, danlos@)

Abstract

This article presents the main points in the creation of the French TimeBank (Bittar, 2010), a reference corpus annotated according to the ISO-TimeML standard for temporal annotation. A number of improvements were made to the markup language to deal with linguistic phenomena not yet covered by ISO-TimeML, including cross-language modifications and others specific to French. An automatic preannotation system was used to speed up the annotation process. A preliminary evaluation of the methodology adopted for this project yields positive results in terms of data quality and annotation time.

1 Introduction

The processing of temporal information (events, time expressions and relations between these entities) is essential for overall comprehension of natural language discourse. Determining the temporal structure of a text can bring added value to numerous NLP applications (information extraction, Q&A systems, summarization, etc.). Progress has been made in recent years in the processing of temporal data, notably through the ISO-TimeML standard (ISO, 2008) and the creation of the TimeBank corpus (Pustejovsky et al., 2006) for English. Here we present the French TimeBank (FTiB), a corpus for French annotated in ISO-TimeML.
We also present the methodology adopted for the creation of this resource, which may be generalized to other annotation tasks. We evaluate the effects of our methodology on the quality of the corpus and the time taken in the task.

2 ISO-TimeML

ISO-TimeML (ISO, 2008) is a surface-based language for the marking of events (EVENT tag) and temporal expressions (TIMEX3), as well as the realization of the temporal (TLINK), aspectual (ALINK) and modal subordination (SLINK) relations.
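
To make the tag inventory above concrete, the following is a minimal sketch of how an annotated sentence might be read programmatically. The French sentence, the identifiers and the attribute values are invented for illustration and are not taken from the French TimeBank; the attribute names (eid, tid, lid, eventID, relatedToTime, relType) follow common TimeML usage and should be treated as assumptions here, as should the helper name `summarize`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence ("Marie arrived on Monday").
# Tags come from the text above; attributes and values are illustrative assumptions.
SAMPLE = """<TimeML>
<s>Marie est <EVENT eid="e1" class="OCCURRENCE">arrivée</EVENT> <TIMEX3 tid="t1" type="DATE" value="XXXX-WXX-1">lundi</TIMEX3>.</s>
<TLINK lid="l1" eventID="e1" relatedToTime="t1" relType="IS_INCLUDED"/>
</TimeML>"""


def summarize(xml_text):
    """Collect events, temporal expressions and the temporal links between them."""
    root = ET.fromstring(xml_text)
    # Map each identifier to the surface text it marks.
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    timexes = {t.get("tid"): t.text for t in root.iter("TIMEX3")}
    # Resolve each TLINK to an (event text, relation type, time text) triple.
    links = [
        (events[link.get("eventID")], link.get("relType"), timexes[link.get("relatedToTime")])
        for link in root.iter("TLINK")
    ]
    return events, timexes, links


events, timexes, links = summarize(SAMPLE)
for event_text, rel_type, time_text in links:
    print(event_text, rel_type, time_text)
```

The sketch only covers an event-to-time TLINK; ALINK and SLINK elements, which relate events to other events, would need their own handling.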