Scientific report: "Using Search Engines for Robust Cross-Domain Named Entity Recognition"

We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries. The key novelty of the method is that we submit a token with context to a search engine and use similar contexts in the search results as additional information for correctly classifying the token. We achieve strong gains in NER performance on news, in-domain and out-of-domain, and on web queries.

Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

Stefan Rüd, Institute for NLP, University of Stuttgart, Germany
Massimiliano Ciaramita, Google Research, Zurich, Switzerland
Jens Müller and Hinrich Schütze, Institute for NLP, University of Stuttgart, Germany

1 Introduction

As statistical Natural Language Processing (NLP) matures, NLP components are increasingly used in real-world applications. In many cases, this means that some form of cross-domain adaptation is necessary because there are distributional differences between the labeled training set that is available and the real-world data in the application. To address this problem, we propose a new type of features for NLP data: features extracted from search engine results. Our motivation is that search engine results can be viewed as a substitute for the world knowledge that is required in NLP tasks but that can only be extracted from a standard training set or precompiled resources to a limited extent.
For example, a named entity (NE) recognizer trained on news text may tag the NE London in an out-of-domain web query like "London Klondike gold rush" as a location. But if we train the recognizer on features derived from search results for the sentence to be tagged, correct classification as person is possible. This is because the search results for "London Klondike gold rush" contain snippets in which the first name Jack precedes London; this is a sure indicator of a last name and hence an NE of type person. We call our .
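
To make the Jack London example concrete, the following minimal Python sketch shows one way a snippet-derived feature could be computed. It is not the authors' implementation: the snippets are invented stand-ins for real search results, the first-name list is a toy assumption, and `FIRST_NAMES`, `preceding_word_counts`, and `first_name_feature` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Toy first-name list (assumption); a real system would use a large gazetteer.
FIRST_NAMES = {"jack", "john", "mary"}

# Invented snippets standing in for search results returned for the query
# "London Klondike gold rush" (assumption, not real search engine output).
SNIPPETS = [
    "Jack London joined the Klondike gold rush in 1897.",
    "The Klondike stories of Jack London made him famous.",
    "Flights from London to Dawson City, Klondike.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def preceding_word_counts(target, snippets):
    """Count the words that immediately precede `target` in the snippets."""
    target = target.lower()
    counts = Counter()
    for snippet in snippets:
        tokens = tokenize(snippet)
        for prev, cur in zip(tokens, tokens[1:]):
            if cur == target:
                counts[prev] += 1
    return counts


def first_name_feature(target, snippets):
    """Fraction of `target` occurrences (with a preceding word) that
    directly follow a known first name; 0.0 if there are none."""
    counts = preceding_word_counts(target, snippets)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for word, n in counts.items() if word in FIRST_NAMES)
    return hits / total


if __name__ == "__main__":
    print(preceding_word_counts("London", SNIPPETS))
    print(first_name_feature("London", SNIPPETS))
```

In this sketch, a high value of the feature for the token London suggests that it is used as a last name in contexts similar to the query. A tagger could take such a value as one input among many; how the features are actually defined and combined is left open here.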
