tailieunhanh - Lecture Programming languages (2/e): Chapter 3b - Tucker, Noonan

Chapter 3 - Lexical and syntactic analysis: Lexical analysis. This chapter presents the following content: Example tokens, other sequences, regular expressions, clite lexical syntax, finite state automata, some conventions, lexer code, from design to code,. | Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic sugar causes cancer of the semicolon. A. Perlis Contents Chomsky Hierarchy Lexical Analysis Syntactic Analysis Lexical Analysis Purpose: transform program representation Input: printable Ascii characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol. Example Tokens Identifiers Literals: 123, , 'x', true Keywords: bool char . Operators: + - * / . Punctuation: ; , ( ) { } Other Sequences Whitespace: space tab Comments // any-char* end-of-line End-of-line End-of-file Why a Separate Phase? Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs Regular Expressions RegExpr Meaning x a character x \x an escaped character, ., \n { name } a reference to a name M | N M or N M N M followed by N M* zero or more occurrences of M RegExpr Meaning M+ One or more occurrences of M M? Zero or one occurrence of M [aeiou] the set of vowels [0-9] the set of digits . Any single character Clite Lexical Syntax Category Definition anyChar [ -~] Letter [a-zA-Z] Digit [0-9] Whitespace [ \t] Eol \n Eof \004 Category Definition Keyword bool | char | else | false | float | if | int | main | true | while Identifier {Letter}({Letter} | {Digit})* integerLit {Digit}+ floatLit {Digit}+\.{Digit}+ charLit ‘{anyChar}’ Category Definition Operator = | || | && | == | != | | >= | + | - | * | / |! | [ | ] Separator : | . | { | } | ( | ) Comment // ({anyChar} | {Whitespace})* {eol} Generators Input: usually regular expression Output: table (slow), code C/C++: Lex, Flex Java: JLex Finite State Automata Set of states: representation – graph nodes Input alphabet + unique end symbol State transition function Labelled (using alphabet) arcs in graph Unique start state One or more final . | Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic sugar causes cancer of the semicolon. A. Perlis Contents Chomsky Hierarchy Lexical Analysis Syntactic Analysis Lexical Analysis Purpose: transform program representation Input: printable Ascii characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol. Example Tokens Identifiers Literals: 123, , 'x', true Keywords: bool char . Operators: + - * / . Punctuation: ; , ( ) { } Other Sequences Whitespace: space tab Comments // any-char* end-of-line End-of-line End-of-file Why a Separate Phase? Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs Regular Expressions RegExpr Meaning x a character x \x an escaped character, ., \n { name } a reference to a name M | N M or N M N M .

TÀI LIỆU MỚI ĐĂNG
crossorigin="anonymous">
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.