tailieunhanh - Scientific report: "Private Access to Phrase Tables for Statistical Machine Translation"

Private Access to Phrase Tables for Statistical Machine Translation

Nicola Cancedda
Xerox Research Centre Europe
6 chemin de Maupertuis, 38240 Meylan, France

Abstract

Some Statistical Machine Translation systems never see the light of day because the owner of the appropriate training data cannot release it, and the potential user of the system cannot disclose what should be translated. We propose a simple and practical encryption-based method addressing this barrier.

1 Introduction

It is generally taken for granted that whoever is deploying a Statistical Machine Translation (SMT) system has unrestricted rights to access and use the parallel data required for its training. This is not always the case. The ideal resources for training SMT models are Translation Memories (TM), especially when they are large, well maintained, coherent in genre and topic, and aligned with the application of interest. Such TMs are cherished as valuable assets by their owners, who rarely agree to give away wholesale rights to their use. At the same time, the prospective user of the SMT system that could be derived from such a TM might be subject to confidentiality constraints on the text stream needing translation, so that sending text out for translation to an SMT system deployed by the owner of the PT is not an option.

We propose an encryption-based method that addresses such conflicting constraints. In this method, the owner of the TM generates a Phrase Table (PT) from it and makes it accessible to the user following a special procedure. An SMT decoder is deployed by the user, with all the required resources to operate except the PT.
As a result of following the proposed procedure:
- The user acquires all and only the phrase table entries required to perform the decoding of a specific file, thus avoiding complete transfer of the TM to the user.
- The owner of the PT does not learn anything about what is being translated, thus satisfying the user's confidentiality constraints.
- The owner
