Scientific report: "Structuring E-Commerce Inventory"
Karin Mauge
eBay Research Labs, 2145 Hamilton Avenue, San Jose, CA 95125
kmauge@

Khash Rohanimanesh
eBay Research Labs, 2145 Hamilton Avenue, San Jose, CA 95125
krohanimanesh@

Jean-David Ruvini
eBay Research Labs, 2145 Hamilton Avenue, San Jose, CA 95125
jruvini@

Abstract

Large e-commerce enterprises feature millions of items entered daily by a large variety of sellers. While some sellers provide rich, structured descriptions of their items, a vast majority of them provide unstructured natural language descriptions. In this paper we present a two-step method for structuring items into descriptive properties. The first step consists of unsupervised property discovery and extraction. The second step involves supervised property synonym discovery using a maximum-entropy-based clustering algorithm. We evaluate our method on a year's worth of e-commerce data and show that it achieves excellent precision with good recall.

1 Introduction

Online commerce has gained a lot of popularity over the past decade. Large online C2C marketplaces like eBay and Amazon feature a very large, long-tail inventory, with millions of items (product offers) entered into the marketplace every day by a large variety of sellers. While some sellers (generally large professional ones) provide rich, structured descriptions of their products (using schemas or via a global trade item number), the vast majority only provide unstructured natural language descriptions.

To manage items effectively and provide the best user experience, it is critical for these marketplaces to structure their inventory into descriptive name-value pairs (called properties) and ensure that items of the same kind (digital cameras, for instance) are described using a unique set of property names (brand, model, zoom, resolution, etc.) and values. For example, this is important for measuring item similarity and complementarity in merchandising, providing faceted navigation, and various business .
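
To make the notion of properties concrete, here is a minimal sketch of items described as name-value pairs, mapped onto one canonical set of property names and then used for faceted counts and a simple similarity score. It is only an illustration: the synonym table, the sample items, and the choice of Jaccard overlap as the similarity measure are assumptions made for this example, not the method or data of the paper.

```python
from collections import Counter

# Hypothetical synonym table (an assumption for illustration):
# seller-specific property names mapped to one canonical name.
CANONICAL = {
    "brand": "brand",
    "manufacturer": "brand",
    "make": "brand",
    "model": "model",
    "zoom": "zoom",
    "optical zoom": "zoom",
    "resolution": "resolution",
}


def normalize(item):
    """Map an item's raw name-value pairs onto canonical property names."""
    out = {}
    for name, value in item.items():
        key = CANONICAL.get(name.strip().lower())
        if key is not None:  # names with no known canonical form are dropped
            out[key] = value.strip().lower()
    return out


def facet_counts(items, prop):
    """Count items per value of one property, as a faceted navigation would."""
    return Counter(item[prop] for item in items if prop in item)


def jaccard(a, b):
    """Similarity of two items as the overlap of their name-value pairs."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


# Hypothetical sample listings written by three different sellers.
raw_items = [
    {"Manufacturer": "Canon", "Model": "A2200"},
    {"brand": "canon", "model": "A2200", "Optical Zoom": "4x"},
    {"Make": "Nikon"},
]
items = [normalize(item) for item in raw_items]
```

Once every item uses the same property names, `facet_counts(items, "brand")` groups the two Canon listings together even though their sellers used different names for the property, and `jaccard(items[0], items[1])` compares the listings on shared name-value pairs rather than on free text.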