Data Analysis, Machine Learning and Applications, Episode 3, Part 6

Quantitative Text Analysis Using L-, F- and T-Segments
Reinhard Köhler and Sven Naumann

Table 1. Text numbers in the corpus with respect to genre and author

          Brentano  Goethe  Rilke  Schnitzler   Σ
poetry          10      10     10           -  30
prose            2       9     10          15  36

3 Distribution of segment types

Starting from the hypothesis that L-, F- and T-segments are not only units which are easily defined and easy to determine but also possess a certain psychological reality, i.e. that they play a role in the process of text generation, it seems plausible to assume that these units display a lawful distributional behaviour similar to that of well-known linguistic units such as words or syntactic constructions (cf. Köhler 1999). A first confirmation, albeit on data from only a single Russian text, was found in Köhler (2007). A corresponding test on the data of the present study corroborates the hypothesis. Each of the 66 texts shows a rank-frequency distribution of the 3 kinds of segment patterns according to the Zipf-Mandelbrot distribution, which was fitted to the data in the following form:

$$P_x = \frac{(b + x)^{-a}}{F(n)}, \quad x = 1, 2, 3, \ldots, n, \quad a \in \mathbb{R}, \; b > -1, \; n \in \mathbb{N}, \quad F(n) = \sum_{i=1}^{n} (b + i)^{-a}$$

Figure 1 shows the fit of this distribution to the data of one of the texts on the basis of L-segments on a log-log scale.

Fig. 1. Rank-Frequency Distribution of L-Segments

In this case the goodness-of-fit test yielded P(χ²) [...] with 92 degrees of freedom. N = 941 L-segments were found in the text, forming x_max = 112 different patterns. Similar results were obtained for all three kinds of segments and all texts.

Various experiments with the frequency distributions show promising differences between authors and genres. However, these differences alone do not yet allow for a crisp discrimination.
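The fitting step described in this section can be illustrated with a minimal sketch. It assumes the Zipf-Mandelbrot form P_x = (b + x)^(-a) / F(n) with F(n) = sum over i = 1..n of (b + i)^(-a), and it uses a plain grid search over (a, b) as a stand-in for the authors' fitting procedure, which the text does not specify. All function names and the toy patterns are hypothetical, and the chi-square statistic is computed without pooling sparse classes, so it is not meant to reproduce the degrees of freedom reported above.

```python
import math
from collections import Counter


def zipf_mandelbrot_pmf(a, b, n):
    """Return [P_1, ..., P_n] with P_x = (b + x)^(-a) / F(n)."""
    if b <= -1:
        raise ValueError("b must be greater than -1")
    weights = [(b + x) ** (-a) for x in range(1, n + 1)]
    norm = sum(weights)  # F(n) = sum_{i=1}^{n} (b + i)^(-a)
    return [w / norm for w in weights]


def rank_frequencies(patterns):
    """Count each pattern and return the counts in descending order (rank 1 first)."""
    return sorted(Counter(patterns).values(), reverse=True)


def log_likelihood(freqs, a, b):
    """Log-likelihood of ranked frequencies under the model, with n = len(freqs)."""
    pmf = zipf_mandelbrot_pmf(a, b, len(freqs))
    return sum(f * math.log(p) for f, p in zip(freqs, pmf))


def fit_grid(freqs, a_grid, b_grid):
    """Pick the (a, b) grid point with the highest log-likelihood.

    Assumption: a plain grid search stands in for whatever fitting
    procedure the authors used, which the text does not specify.
    """
    candidates = [(a, b) for a in a_grid for b in b_grid]
    return max(candidates, key=lambda ab: log_likelihood(freqs, ab[0], ab[1]))


def chi_square(freqs, a, b):
    """Pearson chi-square statistic, without pooling of sparse classes."""
    total = sum(freqs)
    pmf = zipf_mandelbrot_pmf(a, b, len(freqs))
    return sum((f - total * p) ** 2 / (total * p) for f, p in zip(freqs, pmf))


if __name__ == "__main__":
    # Hypothetical toy patterns, not data from the study.
    toy = ["2-3", "2-3", "2-3", "2-3", "1-4", "1-4", "3", "3", "2-2-5", "4-4"]
    freqs = rank_frequencies(toy)
    a_grid = [0.1 * i for i in range(1, 31)]
    b_grid = [0.5 * j for j in range(0, 21)]
    a_hat, b_hat = fit_grid(freqs, a_grid, b_grid)
    print("ranked frequencies:", freqs)
    print("a =", round(a_hat, 2), "b =", b_hat)
    print("chi-square =", round(chi_square(freqs, a_hat, b_hat), 4))
```

For real data a continuous optimiser would normally replace the grid search, and classes with small expected frequencies would be pooled before the chi-square test is applied.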
4 Length distribution of L-segments

As a consequence of our general hypothesis, not only the segment types but also the length of the segments should follow lawful patterns. Here we study the distribution of L-segment length. First, a theoretical model is set up on the basis of three ...
