Towards A Modular Data Model For Multi-Layer Annotated Corpora

Richard Eckart
Department of English Linguistics, Darmstadt University of Technology, 64289 Darmstadt, Germany
eckart@

Abstract

In this paper we discuss current methods for representing corpora annotated at multiple levels of linguistic organization (so-called multi-level or multi-layer corpora). Taking five approaches that are representative of current practice in this area, we discuss the commonalities and differences between them, focusing on their underlying data models. The goal of the paper is to identify the common concerns in multi-layer corpus representation and processing so as to lay a foundation for a unifying, modular data model.

1 Introduction

Five approaches to representing multi-layer annotated corpora are reviewed in this paper. They reflect current practice in the field and show the requirements typically placed on multi-layer corpus applications.

Multi-layer annotated corpora keep annotations at different levels of linguistic organization separate from each other. Figure 1 illustrates two annotation layers on a transcription of an audio-video signal. One layer contains a functional annotation of a sentence in the transcription; the other contains a phrase-structure annotation and part-of-speech tags for each word. Layers and signals are coordinated by a common timeline.
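To make the idea of separate layers coordinated by a common timeline more concrete, here is a minimal Python sketch. It assumes a stand-off design in which every annotation stores only a start and end time on the shared timeline, so layers never refer to each other directly. The classes `Annotation`, `Layer`, and `MultiLayerCorpus`, the layer names, and the time values are all hypothetical illustrations, not the data model proposed in the paper or used by any of the five reviewed approaches.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One annotation anchored to the shared timeline (hypothetical times in seconds)."""
    start: float
    end: float
    label: str


@dataclass
class Layer:
    """A named annotation layer; it holds no links to other layers."""
    name: str
    annotations: list[Annotation] = field(default_factory=list)

    def add(self, start: float, end: float, label: str) -> None:
        if start > end:
            raise ValueError("annotation must not end before it starts")
        self.annotations.append(Annotation(start, end, label))


@dataclass
class MultiLayerCorpus:
    """Keeps layers separate; the common timeline is the only link between them."""
    layers: dict[str, Layer] = field(default_factory=dict)

    def layer(self, name: str) -> Layer:
        # Create the layer on first access so callers can add to it directly.
        return self.layers.setdefault(name, Layer(name))

    def overlapping(self, name: str, start: float, end: float) -> list[Annotation]:
        """Annotations on layer `name` that overlap the half-open interval [start, end)."""
        return [a for a in self.layer(name).annotations
                if a.start < end and start < a.end]


# Hypothetical example: three layers over one transcribed sentence "the cat sleeps".
corpus = MultiLayerCorpus()
pos = corpus.layer("pos")
pos.add(0.0, 0.4, "DT")   # "the"
pos.add(0.4, 0.9, "NN")   # "cat"
pos.add(0.9, 1.6, "VBZ")  # "sleeps"
corpus.layer("phrase").add(0.0, 0.9, "NP")
corpus.layer("phrase").add(0.9, 1.6, "VP")
corpus.layer("function").add(0.0, 0.9, "subject")

# Cross-layer query: which POS tags fall under the functional "subject" span?
subject = corpus.layer("function").annotations[0]
tags = [a.label for a in corpus.overlapping("pos", subject.start, subject.end)]
print(tags)  # ['DT', 'NN']
```

In this sketch, relations between layers are computed from time overlap rather than stored as explicit links, so a new layer can be added without modifying the existing ones. Real multi-layer models typically need richer anchoring (for example, character offsets or token references in addition to time), which this sketch deliberately omits.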
Figure 1: Multi-layer annotation on multi-modal base data

The motivation for this research is rooted in finding a suitable data model for PACE-Ling (Sec. ). The ultimate goal of our research is to create a modular, extensible data model for multi-layer annotated corpora. To achieve this, we aim to create a data model, based on the current state of the art, that covers all current requirements, and then to decompose it into exchangeable components. We identify and discuss objects contained in four tiers that commonly play an important role in multi-layer corpus scenarios (see Fig. 2): medial
