espressione regolare (di un linguaggio)
e' un procedimento applicabile ad ogni stringa, con un risultato binario si/no, interpretato come: la stringa ha la struttura descritta dall'espressione, oppure no.
alfabeto di un'espressione regolare i caratteri con cui viene formulata.
linguaggio di un'espressione regolare l'insieme delle stringhe che la soddisfano.
| regex | matches, generate |
|---|---|
| a|b | a b |
| b* | ε "b" "bb" "bbb" ... |
| b+ | "b" "bb" "bbb" ... |
| a.b | axb x is any character |
| ab*c | "ac" "abc" "abbc" "abbbc" ... |
| ab+c | "abc", "abbc" "abbbc" ... |
| a*bc* | the set of all strings consisting of: arbitrarily many "a"s,
followed by a single "b", followed by arbitrarily many "c"s |
| (ab)* | ε ab abab ababab ... |
| (a|b)(a|b) | aa, ab, ba, bb |
| (a|b)* | ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, … |
Most formalisms provide
A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".
Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey".
A quantifier after an element (such as a token, character, or group) specifies how many times the preceding element is allowed to repeat.
| ? | question mark indicates 0 or 1 occurrences of the preceding element. ex: colou?r matches both "color" and "colour". |
|---|---|
| * | asterisk (derived from the Kleene star) indicates 0 or more
occurrences of the preceding element.
ex: ab*c matches "ac", "abc", "abbc", "abbbc", and so on. |
| + | plus sign (Kleene plus) indicates 1 or more occurrences of the
preceding element. ex: ab+c matches "abc", "abbc", "abbbc", ... , but not "ac". |
| . | full stop is wildcard ex: a.b matches any string that contains an "a", and then any character and then "b" ex: a.*b matches any string that contains an "a", and then the character "b" at some later point. |
| {n} | The preceding item is matched exactly n times |
1968 Regular expressions entered popular use in pattern matching in a text editor.
“Regular expressions aren’t random jumbles of punctuation—they’re carefully thought-out jumbles of punctuation!” -- The Perl Cookbook
puo' essere definito in 3 modi
| Grammar | Grammar |
|---|---|
| A → ε A → aA |
A → a A → aA |
| Productions | Productions |
| ε aA → a aA → aaA → aa aA → aaA → aaaA → aaa aA → aaA → aaaA → aaaaA → aaaa ecc ... |
a aA → aa aA → aaA → aaa aA → aaA → aaaA → aaaa aA → aaA → aaaA → aaaaA → aaaaa ecc ... |
| Language | Language |
| L= {an | n≥0 } | L= {an | n>0 } |
| regex | regex |
| a* | a+ |
quello che voglio notare e': la produzione A → aA
A → ε
A → a
A → aB
parte sx: solo 1 non terminale, come per Context-free grammar.
parte dx: max 1 non terminale, sempre a dx; o 1 terminale, o stringa vuota.
| Grammar | Grammar |
|---|---|
| A → a A → aA A → b A → bA |
A → a A → aA A → bB B → bB B → ε |
| Productions | Productions |
| a b
aA → aa aA → ab bA → ba bA → bb
aA → aaA → aaa aA → abA → aba bA → baA → baa bA → bbA → bba
aA → aaA → aab aA → abA → abb bA → baA → bab bA → bbA → bbb
ecc ... |
A → a A → bB → b
A → aA → aa A → aA → abB → ab A → bB → bbB → bb
A → aA → aaA → aaa A → aA → aaA → aabB → aab A → aA → abB → abbB → abb A → bB → bbB → bbbB → bbb
A → aA → aaA → aaaA → aaaa A → aA → aaA → aaaA → aaabB → aaab A → aA → aaA → aabB → aabbB → aabb A → aA → abB → abbB → abbbB → abbb A → bB → bbB → bbbB → bbbbB → bbbb
ecc ... |
| regex | regex |
| (a|b)+ | a*b* |
o un nr qualsiasi di volte
| Grammar | Grammar |
|---|---|
| A → xB B → yC C → z |
A → xB B → yC C → zA C → z |
| Productions | Productions |
| A → xB → xyC → xyz | A → xB → xyC → xyzA →
xyzxB → xyzxyC → xyzxyz |
| regex | regex |
| xyz | (xyz)+ |
per poter scrivere la grammatica in modo piu' compatto, si introduce una notazione piu' compatta-potente, che pero' genera gli stessi linguaggi. Analogo a BNF Backus–Naur Form . EBNF Extended BNF.
2 regole
A → w A, B non-terminal, w∈Σ*
A → wB
In particolare e' possibile scrivere una produzione del tipo
A → B
| Grammar | Grammar | Grammar | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| A → a A → aA
|
|
|
||||||||
| Language | Language | Language | ||||||||
| L= {an
| n>0 } |
L= {anbm | n,m>0 } |
L= {aibjck | i,j,k>0 } |
||||||||
| regex | regex | regex | ||||||||
| a+ | a+b+ | a+b+c+ |
Progressivo
| Grammar | Language | regex |
|---|---|---|
| A → ε | L= {ε} | |
| A → aA | L= {an | n≥0 } | a* |
| A → B
B → ε |
idem | |
| B → bB | L= {anbm | n,m≥0 } | a*b* |
| B → C
C → ε |
idem | |
| C → cC | L= {aibjck | i,j,k≥0 } | a*b*c* |
| Grammar | Grammar | Language | regex |
|---|---|---|---|
| A → ε A → aA B → bC |
A → ε A → aA B → bC |
L= {aibck | i,k≥0 } |
a*bc* |
| Grammar | Language | regex |
|---|---|---|
| A1 → aA2 A2 → aA3 A3 → a |
L= {a3} | a{3} |
| A1 → aA2 A2 → aB1 B1 → bB2 B3 → bC1 C1 → c |
L= {a2b3c} | a{2}b{3}c |
wp/Regular_expression#Formal_definition
Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation.
es: riconoscere la foto di un cane.
pattern matching algorithms look for exact matches in the input with pre-existing patterns.
es: pattern-matching by regex.
espressione regolare (di un linguaggio) descrive un tipo di struttura di stringhe tramite un procedimento che permette di calcolare se una qualsiasi stringa ha o no tale struttura.
| A → ε A → aA L= {an |n≥0 }
regex a* |
A → ε A → aA
B → ε L= {anbm
regex a*b*
|
A → ε A → aA A → B B → ε B → C C → ε L= {aibjck
regex a*b*c* |
| Grammar | Grammar | Grammar |
|---|---|---|
| A → a A → aA L= {an | n>0 }
regex a+ |
A → a A → aA
B → b L= {anbm
regex a+b+
|
A → a A → aA A → B B → b B → C C → c L= {aibjck
regex a+b+c+ |