Automata | Regular Expression
Regular Expressions:
A regular expression can be defined as a language or string accepted by a finite automata. We know that a finite automata consists of tuples (Q, Σ, δ, q0, F ).
Among them that a regular expression is a string on Σ, i.e. it will consist only with input alphabets.
Now , we switch our attention from machine-like descriptions of languages deterministic and nondeterministic finite automata to an algebraic description: the "regular expression." We shall find that regular expressions can define exactly the same languages that the various forms of automata describe: the regular languages, However, regular expressions offer something that automata do not: a declarative way to express the strings we want to accept. Thus regular expressions serve as the input language for many systems that process strings. Examples include:
- Search command such as the UNIX grep or equivalent commands for finding string that one sees in web browsers or text-formatting systems. These systems use a regular-expression-like notation for describing patterns that the user wants to find in a file. Different search systems convert the regular expression into either a DFA ( Deterministic Finite Automata) or an NFA ( Nondeterministic Finite Automata).
- Lexical-analyzer generators, such as Lex or Flex. Recall that a lexical analyzer is the component of a compiler that breaks the source program into logical units(called tokens) of one or more characters that have a shared significance. Examples of tokens include keywords (e.g. while ), identifiers (e.g., any letter followed by zero or more letter and / or digits), and signs, such as + or <=. A lexical-analyzer generator accepts descriptions of the forms of tokens, which are essentially regular expressions, and produces a DFA that recognizes which token appears next on the input.
The Operator Of Regular Expressions:
Regular expressions denote languages. for a simple example, the regular expression 01* + 10* denotes the language consisting of all strings that are either a single 0 followed by any number of 1's or a single 1 followed by any number of a 0's. We do not expect you to know at this point how to interpret regular expressions, so our statement about the language of this expression must be accepted on faith for the moment. We shortly shall define all the symbols used in this expression, so you can see why our interpretation of this regular expression is the correct one. Before describing the regular-expression notation, we need to learn the three operations on languages that the operators of regular expressions represented. These operations are:
- The union of two languages L and M, denoted L ∪ M , is the set of strings that are in either L or M, or both. For example, if L = { 001, 10, 111} and M = { Ñ”, 001 }, then L ∪ M = {Ñ”, 10, 001, 111 }.
- The concatenation of language L and M is the set of strings that an be formed by taking any string in L and concatenating it with any string in M. Recall Section 1.5.2, where we define the concatenation of a pair of strings; one string is followed by the other to from the result of the concatenation. We denote concatenation of language either with a dot or with no operator at all, although the concatenation operator is frequently called "dot." For example, if L = { 001, 10, 111 } and M = (Ñ”, 001 }, then L.M or just LM, is {001, 10, 111, 001001, 10001, 111001 }. The first three strings in LM are the strings in L concatenated with Ñ”. Since Ñ” is the identity for concatenation, the resulting strings are the same as the strings of L. However, the last three strings in LM are formed by taking each string in L and concatenating it with the second string in M, which is 011. For instance, 10 from L concatenated with 001 from M gives us10001 for LM.
Build Regular Expressions:
Algebras of all kinds start with some elementary expressions, usually constants and / or variables. Algebras then allow us to construct more expressions by applying a certain set of operators to these elementary expressions and to previously constructed expressions. Usually, some method of grouping operators with their operands, such as parentheses, is required as well. For instance, the familiar arithmetic algebra starts with constants such as integers and real numbers, plus variables, and builds more complex expressions with arithmetic operators such as + and x.
Use of the star operator |
The algebra of regular expressions follows this pattern, using constants and variables that denote languages, and operators for the three operations of Section 3.1.1 -union, dot, and star. We can describe the regular expression recursively as follows. In this definition, we not only describe what the legal regular expressions are, but for each regular expression E, we describe the language it represents, which we denote L(E).
BAISIS: The basis consists of three parts:
- The constants Ñ” and ∅ are regular expressions, denoting the languages{Ñ”} and ∅ , respectively. That is L(Ñ”) ={Ñ”}, and L(∅) = ∅.
- If a is any symbol, then a is a regular expression. This expression denotes the language {a}. That is , L(a) = {a}. Note that we use boldface font to denote an expression corresponding to a symbol. The corresponding, e.g. that a refers to a, should be obvious.
- A variable, usually capitalized and italic such as L, is a variable, representing any language.
- If E and F are regular expressions, then E+F is a regular expression denoting the union of L(E) and L(F). That is L(E+F) = L(E) ∪ L(F).
- If E and F regular expression, the EF is a regular expression denoting the concatenation of L(E). That is F(EF) = L(E) L(F). Note that the dot can optionally be used to denote the concatenation operator, either as an operation on language or as the operator in a regular expression. For instance, 0.1 is a regular expression meaning the same as 01 and representing The language {01}. However, we shall avoid the dot as concatenation in regular expressions.
- If E is a regular expression, then E* is a regular expression, denoting the closure of L(E). That is, L(E*) = (L(E))*.
- If E is a regular expression, then (E), a parenthesized E, is also a regular expression, denoting the same language as E. formally; L((E)) = L(E).
Precedence Of Regular-Expression Operators:
Like other algebras, the regular-expression operators have an assumed order of "precedence," which means that operators are associated with their operands in a particular order. We are familiar with the notion of precedence from ordinary arithmetic expressions. For instance, we know that xy +z groups the product xy before the sum, so it is equivalent to the parenthesized expression (xy) +z and not to the expression x(y+z). Similarly, we group two of the same not to x-(y-z). For regular expressions, the following is the order of precedence for the operators:
- The star operator is of highest precedence. That is, it applies only to the smallest sequence of symbols to its left that is a well-formed regular expression.
- Next in precedence comes the concatenation or "dot" operator. After grouping all stars to their operands, we group concatenation operator to their operands. That is, all expressions that are juxtaposed ( adjacent, with no intervening operator ) are grouped together.
Automata | Regular Expression
Reviewed by For Learnig
on
May 18, 2023
Rating:
No comments:
If you have any doubts, please tell me know