A Comprehensive Gold Standard for the Enron Organizational Hierarchy

Apoorv Agarwal¹, Adinoyi Omuya¹, Aaron Harnly², Owen Rambow³

¹ Department of Computer Science, Columbia University, New York, NY, USA
² Wireless Generation Inc., Brooklyn, NY, USA
³ Center for Computational Learning Systems, Columbia University, New York, NY, USA

Abstract

Many researchers have attempted to predict the Enron corporate hierarchy from the data. This work, however, has been hampered by a lack of data. We present a new, large, and freely available gold-standard hierarchy. Using our new gold standard, we show that a simple lower bound for social network-based systems outperforms an upper bound on the approach taken by current NLP systems.

1 Introduction

Since the release of the Enron email corpus, many researchers have attempted to predict the Enron corporate hierarchy from the email data. This work, however, has been hampered by a lack of data about the organizational hierarchy. Most researchers have used the job titles assembled by Shetty and Adibi (2004) and have then attempted to predict the relative ranking of two people's job titles (Rowe et al., 2007; Palus et al., 2011). A major limitation of the list compiled by Shetty and Adibi (2004) is that it covers only those core employees for whom complete email inboxes are available in the Enron dataset. However, it is also interesting to determine whether we can predict the hierarchy of other employees, for whom we have only an incomplete set of emails (those they sent to or received from the core employees).
This is particularly difficult because some dominance relations hold between two employees for whom no mutual email is available in the Enron dataset. These difficulties with the existing data have meant that researchers have either not performed quantitative analyses (Rowe et al., 2007) or have performed them on very small sets (for example, Bramsen …).
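The coverage problem described above can be made concrete with a small sketch. It assumes that a content-based (NLP) method can judge a pair of people only from the messages they exchanged with each other, and that every gold pair without such messages is scored as an error; under that assumption, the share of gold dominance pairs that exchanged any email is an upper bound on that method's accuracy. For contrast, the sketch includes one hypothetical network-based baseline (guess that the person with more distinct correspondents is the superior), which can make a prediction for any two people, even those who never wrote to each other. All names (`EMAILS`, `GOLD`, `content_upper_bound`, `degree_baseline_accuracy`) and all data are invented for illustration; this is not the paper's code or data, and not necessarily its baseline.

```python
from collections import defaultdict

# Hypothetical toy email log: (sender, recipient) pairs.
EMAILS = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("carol", "dave"),
]

# Hypothetical gold dominance pairs: (superior, subordinate).
GOLD = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),    # no email between bob and dave
    ("alice", "erin"),  # erin never appears in the log
]


def communicated_pairs(emails):
    """Unordered pairs of people who exchanged at least one email."""
    return {frozenset((s, r)) for s, r in emails if s != r}


def content_upper_bound(gold, emails):
    """Best possible accuracy for a method that only reads messages exchanged
    between the two people: perfect on covered pairs, wrong on all others."""
    talked = communicated_pairs(emails)
    covered = sum(1 for sup, sub in gold if frozenset((sup, sub)) in talked)
    return covered / len(gold)


def degree(emails):
    """Number of distinct correspondents per person (undirected degree)."""
    neighbours = defaultdict(set)
    for s, r in emails:
        if s != r:
            neighbours[s].add(r)
            neighbours[r].add(s)
    return {person: len(ns) for person, ns in neighbours.items()}


def degree_baseline_accuracy(gold, emails):
    """Hypothetical baseline: the person with more correspondents is predicted
    to be the superior; ties count as errors, unseen people have degree 0."""
    deg = degree(emails)
    correct = sum(1 for sup, sub in gold if deg.get(sup, 0) > deg.get(sub, 0))
    return correct / len(gold)


if __name__ == "__main__":
    print(content_upper_bound(GOLD, EMAILS))
    print(degree_baseline_accuracy(GOLD, EMAILS))
```

On this invented data the content-based upper bound is 0.5 (only two of the four gold pairs ever exchanged email), while the degree baseline scores 0.75. The toy numbers only mirror the shape of the argument and say nothing about the actual Enron results.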
