A grammar defining formal language is a quadruple
, where
is a finite set of nonterminals,
is a finite set of terminal symbols,
is a finite set of productions, and
is an element of
.
The set of terminal symbols is
's alphabet. Nonterminals are symbols representing language
constructs. The sets
and
should not intersect.
is called the start symbol. Productions are rules of the form:
, where both
and
are strings of terminals and nonterminals,
contains at least one nonterminal.
Sentential forms for grammar
are defined by the following rules:
is a sentential form and if
is a sentential form and production
belongs to
, then
is a sentential form as well.
is the set of all strings which are
sentential forms consisting entirely of terminal symbols. For a language defined
by a grammar, recognition whether a given string (expression) belongs to that language
is, in general, a non-trivial task. All languages defined by grammars are recursively
enumerable sets.
1. A grammar
is called right linear if all its productions have the form
or
, where
and
is a string of terminal symbols.
2. A grammar
is called context-free if all its productions have the form
, where
and
is a string of terminal and nonterminal symbols.
3. A grammar
is called context-sensitive if all its productions have the form
, where both
and
are strings of terminal and nonterminal symbols and the
length of
is not more than the length of
.
4. A grammar
is called unrestricted if it does not belong to categories 1 through 3.
This hierarchy of grammars was introduced by N. Chomsky. The set of languages defined by grammars of every category is a proper superset of that for the previous category. The languages defined by grammars of categories 1 through 3 are recursive sets. A language can be defined by a grammar of category 1 iff it is defined by a regular expression.