Digesting Virtual “Geek” Culture: The Summarization of Technical Internet Relay Chats

Liang Zhou and Eduard Hovy
University of Southern California
Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
{liangz, hovy}@…

Abstract

This paper describes a summarization system for technical chats and emails on the Linux kernel. To reflect the complexity and sophistication of these discussions, the system clusters them by subtopic at the sub-message level and identifies immediate responding pairs using machine learning methods. The resulting summary consists of one or more mini-summaries, each covering one subtopic of the discussion.

1 Introduction

The availability of many chat forums reflects the formation of globally dispersed virtual communities. Among them, we select the very active and growing movement of Open Source Software (OSS) development. Working together as a virtual community in non-collocated environments, OSS developers communicate and collaborate using a wide range of web-based tools, including Internet Relay Chat (IRC), electronic mailing lists, and more (Elliott and Scacchi, 2004). In contrast to conventional instant-message chats, IRC conversations carry engaged and focused discussions on collaborative software development. Even though every OSS participant is individually technically savvy, summaries of IRC content are necessary within a virtual organization, both as a resource and as an organizational memory of its activities (Ackerman and Halverson, 2000). Such summaries are regularly produced manually by volunteers.
These summaries can be used to analyze the impact of virtual social interactions and virtual organizational culture on software product development. The emergence of email thread discussions and chat logs as a major information source has prompted increased interest in thread summarization within the Natural Language Processing (NLP) community. One might assume a smooth transition from text-based summarization to email- and chat-based …