A grammar defining formal language is a quadruple , where is a finite set of nonterminals, is a finite set of terminal symbols, is a finite set of productions, and is an element of .
The set of terminal symbols is 's alphabet. Nonterminals are symbols representing language constructs. The sets and should not intersect. is called the start symbol. Productions are rules of the form: , where both and are strings of terminals and nonterminals, contains at least one nonterminal.
Sentential forms for grammar are defined by the following rules: is a sentential form and if is a sentential form and production belongs to , then is a sentential form as well.
is the set of all strings which are sentential forms consisting entirely of terminal symbols. For a language defined by a grammar, recognition whether a given string (expression) belongs to that language is, in general, a non-trivial task. All languages defined by grammars are recursively enumerable sets.
1. A grammar is called right linear if all its productions have the form or , where and is a string of terminal symbols.
2. A grammar is called context-free if all its productions have the form , where and is a string of terminal and nonterminal symbols.
3. A grammar is called context-sensitive if all its productions have the form , where both and are strings of terminal and nonterminal symbols and the length of is not more than the length of .
4. A grammar is called unrestricted if it does not belong to categories 1 through 3.
This hierarchy of grammars was introduced by N. Chomsky. The set of languages defined by grammars of every category is a proper superset of that for the previous category. The languages defined by grammars of categories 1 through 3 are recursive sets. A language can be defined by a grammar of category 1 iff it is defined by a regular expression.