Logic & Computation Regular Expression Lecture 1

A regular expression (RE) is a string of symbols that specifies a formal language. It consists of symbols from an alphabet that are “glued” together using the following operations. If R and S are strings in the language: Union: R | S. (R or S) Concatenation: RS (R followed by S) Closure: R* (R repeated zero or more times) Parenthesis: (R), used to specify operations within a string, I.E, (R | S)* For example, A(B | C)D* is a regular expression. As a regular expression, it specifies “A followed by B or C followed by zero or more D’s.” For example, AB, AC, ABD, ACD, ABDDDDDD, and ACDDDDDDD are all accepted strings. Additional notations: 1. A period (.), or the wild card, representing any character in the alphabet. In other words, for the alphabet of lower case letters: . = (a | b | c | d | … | x | y | z). 2. A list or range of symbols (if there is an understood “order” in the alphabet). As a list, [c–f] is all letters between and including c and f. In other words [c–f] = (c | d | e | f), and [4–9] = (4 | 5 | 6 | 7 | 8 | 9). As a range, [abcde] = (a | b | c | d | e) 3. If the symbol ^ is in braces, then it refers to the symbols not in the range. For example, [^4–7] = (0 | 1 | 2 | 3 | 8 | 9). It can also be used on lists: [^agh] is any character that is not a, g, or h. 4. The symbol + means “one or more”. For example, a+ = a(a*). 5. The symbol ? means “zero or one”. For example, colou?r = (colour | color) 6. The combination of symbols {n} means “exactly n of these things”. For example, a{4} = aaaa {m, n} means “between m and n of these things, inclusive”. For example, a{3,5} = (aaa | aaaa | aaaaa) 7. / is used to indicate an escape character, to use when you need a character that is also a notation character in REs: /( /) /* /^ /| /{ /} /[ /[ and of course //