MANULEX: A grade-level lexical database from French elementary school readers

Behavior Research Methods, Instruments, & Computers, 2004, 36 (1), 156-166

BERNARD LÉTÉ, INRP, CNRS UMR 6057, and Université de Provence, Aix-en-Provence, France
LILIANE SPRENGER-CHAROLLES, CNRS UMR 8606 and Université de Paris 5, Paris, France
PASCALE COLÉ, CNRS UMR 5105 and Université de Savoie, Chambéry, France

This article presents MANULEX, a Web-accessible database that provides grade-level word frequency lists of nonlemmatized and lemmatized words (48,886 and 23,812 entries, respectively) computed from the million words taken from 54 French elementary school readers. Word frequencies are provided for four levels: first grade (G1), second grade (G2), third to fifth grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll, Davies, and Richman (1971) and Zeno, Ivenz, Millard, and Duvvuri (1995), with four statistics at each level: F (overall word frequency), D (index of dispersion across the selected readers), U (estimated frequency per million words), and SFI (standard frequency index). The database also provides the number of letters in the word and syntactic category information. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms.
Researchers in artificial intelligence can also use it as a source of information on natural language processing to simulate written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.

This article presents MANULEX,¹ the first French linguistic tool that provides grade-based frequency lists of the million words found in first-grade, second-grade, and third- to fifth-grade French elementary school readers. The database contains 48,886 nonlemmatized entries and 23,812 lemmatized entries. It was compiled to supply the French counterpart to such works on the English language as Carroll, Davies, and Richman