This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and experimented on a sample test case. More precisely, it first presents a brief survey of the major works in the area of NLP systems evaluation. Then, after introducing the notion of the life cycle of an NIP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems of the investigation. The tools generally used within computer science to assess the quality of a software system are briefly reviewed, and their applicability to the task of evaluation of NLP systems is discussed. Particular attention is devoted to the concepts of efficiency, correctness, reliability, and adequacy, and how all of them basically fail in capturing the peculiar features of performance evaluation of an NLP system is discussed. Two main approaches to performance evaluation are later introduced; namely, black-box- and modelbased, and their most important characteristics are presented. Finally, a specific model for performance evaluation proposed by the authors is illustrated, and the results of an experiment with a sample application are reported. The paper concludes with a discussion on research perspectwes, open problems, and importance of performance evaluation to industrial applications.
Mauri, G., Guida, G. (1986). Evaluation of natural language processing systems: issues and approaches. PROCEEDINGS OF THE IEEE, 74(7), 1026-1035 [10.1109/PROC.1986.13580].
Evaluation of natural language processing systems: issues and approaches
MAURI, GIANCARLO;
1986
Abstract
This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and experimented on a sample test case. More precisely, it first presents a brief survey of the major works in the area of NLP systems evaluation. Then, after introducing the notion of the life cycle of an NIP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems of the investigation. The tools generally used within computer science to assess the quality of a software system are briefly reviewed, and their applicability to the task of evaluation of NLP systems is discussed. Particular attention is devoted to the concepts of efficiency, correctness, reliability, and adequacy, and how all of them basically fail in capturing the peculiar features of performance evaluation of an NLP system is discussed. Two main approaches to performance evaluation are later introduced; namely, black-box- and modelbased, and their most important characteristics are presented. Finally, a specific model for performance evaluation proposed by the authors is illustrated, and the results of an experiment with a sample application are reported. The paper concludes with a discussion on research perspectwes, open problems, and importance of performance evaluation to industrial applications.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.