tailieunhanh - Beginning Regular Expressions 2005, Part 3
The regular expression engine looks for a numeric digit. If the first character it tests is not a digit, it moves one character along the test string and then tests whether that character matches a digit. If not, it moves on one more character and tests again.

Character Classes

However, in PowerGrep the regular expression pattern [t-r]ight won't compile and produces the error shown in Figure 5-14.

Figure 5-14

There is typically no advantage in attempting to use reverse ranges in character classes, and I suggest that you avoid them.

A Potential Range Trap

Suppose that you want to allow for different separators in dates occurring in a document or set of documents. Among the issues this problem throws up is a possible trap in expressing character ranges. As a first test document, we will use the one shown here:

2004-12-31
2001 09 11
2002 04 29
2000 10 19
2005 08 28
2006 09 18

As you can see in this file, the dates are in YYYY MM DD format, but sometimes the dates use the hyphen as a separator, sometimes the forward slash, and sometimes the period. Your task is to select all occurrences of sequences of characters that represent dates (assume for this example that dates are expressed only using digits and separators, and are not expressed using names of months, for example).

So if you wanted to select all dates, whether they use hyphens, forward slashes, or periods as separators, you might try a regular expression pattern like this:

(20|19)[0-9]{2}[.-/][01][0-9][.-/][0123][0-9]

In the character class [.-/], which you attempt to use to match the separator, the sequence of characters period followed by hyphen followed by forward slash is interpreted as the range from the period to the forward slash. However, as you can see in the top row of Figure 5-15, the hyphen is U+002D, and the period, U+002E, is the character immediately before the forward slash, U+002F.
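The code point ordering of the three separators can be checked without a character map. The following is a minimal sketch in Python (an assumption; the chapter itself works in PowerGrep) that uses only the built-in ord and chr functions. The helper name code_point is hypothetical and introduced purely for illustration.

```python
def code_point(ch: str) -> str:
    """Return the U+XXXX notation for one character (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


# Print the code point of each separator discussed in the text.
for name, ch in [("hyphen", "-"), ("period", "."), ("forward slash", "/")]:
    print(f"{name:14} {ch!r} {code_point(ch)}")

# Every character from the period up to and including the forward slash.
in_range = [chr(c) for c in range(ord("."), ord("/") + 1)]
print(in_range)
```

The list in_range holds every character that a range starting at the period and ending at the forward slash can cover, so it shows at a glance whether the hyphen falls inside that range.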
So, undesirably, the pattern [.-/] specifies a range that contains only the period and forward-slash characters.

Figure 5-15

Characters can be expressed using Unicode numeric references. The period is U+002E; uppercase A is U+0041. The Windows Character Map shows this syntax for characters if you hover the mouse over characters of interest.
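The range trap in the character class [.-/] can be reproduced outside PowerGrep. What follows is a minimal sketch using Python's re module (an assumption; other engines may differ in detail). The sample dates and the names TRAP, FIXED, and samples are hypothetical. The corrected class [-./], with the hyphen placed first so that it is read as a literal, is one common way to avoid the trap and is not necessarily the fix the book goes on to present.

```python
import re

# Pattern as written in the text: [.-/] is read as the range U+002E..U+002F,
# so it matches only the period and the forward slash.
TRAP = re.compile(r"(20|19)[0-9]{2}[.-/][01][0-9][.-/][0123][0-9]")

# Assumed fix: a hyphen placed first in the class is a literal hyphen.
FIXED = re.compile(r"(20|19)[0-9]{2}[-./][01][0-9][-./][0123][0-9]")

# Hypothetical sample dates, one per separator.
samples = ["2004-12-31", "2001/09/11", "2002.04.29"]

trap_hits = [s for s in samples if TRAP.fullmatch(s)]
fixed_hits = [s for s in samples if FIXED.fullmatch(s)]

print(trap_hits)   # the hyphen-separated date is missing
print(fixed_hits)  # all three dates match
```

Placing the hyphen first (or last) in the class keeps the pattern readable; escaping it as \- is an alternative in engines that support that escape inside character classes.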