Unsupervised Discovery of Rhyme Schemes

Sravana Reddy, Department of Computer Science, The University of Chicago, Chicago, IL 60637, sravana@
Kevin Knight, Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, knight@

Abstract

This paper describes an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.

1 Introduction

Rhyming stanzas of poetry are characterized by rhyme schemes: patterns that specify how the lines in the stanza rhyme with one another. The question we raise in this paper is: can we infer the rhyme scheme of a stanza given no information about pronunciations or rhyming relations among words?

Background. A rhyme scheme is represented as a string corresponding to the sequence of lines that make up the stanza, in which rhyming lines are denoted by the same letter. For example, the limerick's rhyme scheme is aabba, indicating that the 1st, 2nd, and 5th lines rhyme, as do the 3rd and 4th.

Motivation. Automatic rhyme scheme annotation would benefit several research areas, including:

Machine Translation of Poetry. There has been a growing interest in translation under constraints of rhyme and meter, which requires training on a large amount of annotated poetry data in various languages.

Culturomics. The field of digital humanities is growing, with a focus on statistics to track cultural and literary trends, partially spurred by projects like the Google Books Ngrams.¹ Rhyming corpora could be extremely useful for large-scale statistical analyses of poetic texts.

¹ http
Historical Linguistics/Study of Dialects. Rhymes of a word in poetry of a given time period or dialect region provide clues about its pronunciation in that time or dialect, a fact that linguists often take advantage of (Wyld, 1923). One could automate this task given enough annotated data.

An obvious approach to finding rhyme schemes is to use word ...
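The string notation for rhyme schemes defined under Background can be made concrete with a short sketch. This is only an illustration of the notation, not the paper's model. The helper names `rhyme_groups` and `canonical_scheme` are hypothetical, and the sketch assumes the usual convention that letters are assigned in order of first appearance (as in aabba), with at most 26 distinct rhymes per stanza.

```python
import string


def rhyme_groups(scheme: str) -> list[list[int]]:
    """Group 1-based line numbers by rhyme letter (hypothetical helper).

    Groups are returned in the order their letter first appears in the scheme.
    """
    groups: dict[str, list[int]] = {}
    for line_no, letter in enumerate(scheme, start=1):
        # Lines that share a letter rhyme with one another.
        groups.setdefault(letter, []).append(line_no)
    return list(groups.values())


def canonical_scheme(labels: list) -> str:
    """Rewrite arbitrary per-line rhyme labels as a standard scheme string.

    Assumption: the first new label becomes 'a', the next new label 'b', and
    so on, so equivalent labelings such as 'bbaab' and 'aabba' coincide.
    Supports at most 26 distinct labels.
    """
    mapping: dict = {}
    letters = []
    for label in labels:
        if label not in mapping:
            mapping[label] = string.ascii_lowercase[len(mapping)]
        letters.append(mapping[label])
    return "".join(letters)


# The limerick example from the text.
print(rhyme_groups("aabba"))              # [[1, 2, 5], [3, 4]]
print(canonical_scheme([7, 7, 3, 3, 7]))  # aabba
```

Canonicalizing in this way matters for any system that proposes rhyme clusters with arbitrary cluster IDs. Two outputs that group the lines identically then map to the same scheme string and can be compared directly.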