Scientific report: "An Integrated Environment for Computational Linguistics Experimentation"

LinguaStream: An Integrated Environment for Computational Linguistics Experimentation

Frederik Bilhaut, GREYC-CNRS, University of Caen, fbilhaut@
Antoine Widlocher, GREYC-CNRS, University of Caen, awidloch@

Abstract

By presenting the LinguaStream platform, we introduce different methodological principles and analysis models which make it possible to build hybrid experimental NLP systems by articulating corpus processing tasks.

1 Introduction

Several important tendencies have been emerging recently in the NLP community. First of all, work on corpora tends to become the norm, which constitutes a fruitful convergence area between task-driven, computational approaches and descriptive linguistic ones. Validation on corpora becomes more and more important for theoretical models, and the accuracy of these models can be evaluated either with regard to their ability to account for the reality of a given corpus (pursuing descriptive aims) or with regard to their ability to analyse it accurately (pursuing operational aims). From this point of view, important questions have to be considered regarding which methods should be used in order to project linguistic models on corpora efficiently and accurately. It is indeed less and less appropriate to consider corpora as raw materials to which models and processes could be immediately applicable.
On the contrary, the multiplicity of approaches, whether they are lexical, syntactic, semantic, rhetorical or pragmatic, and whether they focus on one of these dimensions or cross them, raises questions about how these different levels can be articulated within operational models, and how the related processing systems can be assembled, applied to a corpus and evaluated within an experimental process. New NLP concerns confirm these needs: recent work on automatic discourse structure analysis, for example regarding thematic structures or rhetorical ones (Bilhaut, 2005; Widlocher, 2004), shows that the results obtained from lower-grained .
