Scientific report: "What Makes Evaluation Hard?"
Classes of users of database query systems:

V    Familiar with the database and its software
IV   Familiar with the database and the interaction language
III  Familiar with the contents of the database
II   Familiar with the domain of application
I    Passing knowledge of the domain of application

Of course, as users gain experience with a system, they will continually attempt to adapt to its quirks. If the purpose of the evaluation is to demonstrate that the natural language understanding system is merely usable, adaptation presents no problem.

What Makes Evaluation Hard?
Harry Tennant
PO Box 225621, M/S 371
Texas Instruments, Inc.
Dallas, Texas 75265

THE GOAL OF EVALUATION

Ideally, an evaluation technique should describe an algorithm that an evaluator could follow to produce a score, or a vector of scores, that depicts the level of performance of the natural language system under test. The scores should mirror the subjective evaluation of the system that a qualified judge would make. The technique should yield consistent scores across multiple tests of one system, and the scores for several systems should serve as a means of comparison among them. Unfortunately, there is no such evaluation technique for natural language understanding systems. In the following sections I will attempt to highlight some of the difficulties.

PERSPECTIVE OF THE EVALUATION

The first problem is to determine who the qualified judge is whose judgements are to be modeled by the evaluation. One view is that he be an expert in language understanding. As such, his primary interest would be in the linguistic and conceptual coverage of the system. He may attach the greatest weight to the coverage of constructions and concepts which he knows to be difficult to include in a computer program. Another view of the judge is that he is a user of the system. His primary interest is in whether the system can understand him well enough to satisfy his needs.
This judge will put the greatest weight on the system's ability to handle his most critical linguistic and conceptual requirements: those used most frequently, and those which occur infrequently but must be satisfied. This judge will also want to compare the natural language system to other technologies. Furthermore, he may attach strong weight to systems which can be learned quickly, or whose use may be easily remembered, or which take time to learn but provide the user with considerable power once learned. The characteristics of the judge are not an impediment to evaluation.
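
To illustrate how the choice of judge changes the outcome, here is a minimal sketch of a weighted score computed from one set of test results under two perspectives. Everything in it is an assumption made for illustration: the construction names, the coverage figures, the weights, and the function `weighted_score` are hypothetical and do not come from the paper, which proposes no scoring formula.

```python
from typing import Dict

# Hypothetical test results: fraction of test queries the system
# handled correctly for each kind of construction (made-up numbers).
coverage: Dict[str, float] = {
    "simple_lookup": 1.0,
    "conjunction": 0.5,
    "ellipsis": 0.0,
}

# Hypothetical weights. The expert judge weights constructions he
# knows to be hard to implement; the user judge weights what he
# needs most often.
expert_weights: Dict[str, float] = {
    "simple_lookup": 1.0,
    "conjunction": 3.0,
    "ellipsis": 6.0,
}
user_weights: Dict[str, float] = {
    "simple_lookup": 8.0,
    "conjunction": 1.0,
    "ellipsis": 1.0,
}


def weighted_score(results: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weight-normalized average of per-construction coverage."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Constructions missing from the results count as not covered.
    return sum(w * results.get(name, 0.0) for name, w in weights.items()) / total


expert_score = weighted_score(coverage, expert_weights)
user_score = weighted_score(coverage, user_weights)
print(f"expert judge: {expert_score:.2f}, user judge: {user_score:.2f}")
```

With these made-up numbers the same system scores low for the expert judge and high for the user judge, which is why the perspective has to be fixed before scores can be compared.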