Demonstration of the UAM CorpusTool for text and image annotation
Mick O'Donnell
Escuela Politécnica Superior
Universidad Autónoma de Madrid
28049 Cantoblanco, Madrid, Spain

Abstract

This paper introduces the main features of the UAM CorpusTool, software for human and semi-automatic annotation of text and images. The demonstration will show how to set up an annotation project, how to annotate text files at multiple annotation levels, how to automatically assign tags to segments matching lexical patterns, and how to perform cross-layer searches of the corpus.

1 Introduction

In the last 20 years, a number of tools have been developed to facilitate the human annotation of text. These have been necessary where software for automatic annotation has not been available, e.g., for linguistic patterns which are not easily identified by machine, or for languages without sufficient linguistic resources.

The vast majority of these annotation tools have been developed for particular projects and thus have not been readily adaptable to different annotation problems. Often the annotation scheme has been built into the software, or the software has been limited in that it allows only certain types of annotation to take place.

A small number of systems have, however, been developed as general-purpose text annotation systems, e.g., MMAX-2 (Müller and Strube, 2006), GATE (Cunningham et al., 2002), WordFreak (Morton and LaCivita, 2003) and Knowtator (Ogren, 2006). With the exception of the last of these, however, these systems are generally aimed at technically advanced users.
WordFreak, for instance, requires writing Java code to adapt it to a different annotation scheme. Users of MMAX-2 need to edit XML by hand to provide annotation schemes. GATE allows editing of annotation schemes within the tool, but it is a very complex system and lacks clear documentation to help the novice user become competent.

The UAM CorpusTool is a text annotation tool primarily aimed at the linguist or ...