Finite State Machines and Parsing Sentences

A FSM reads symbols and has the goal of getting to the final state of acceptance. FSMs are used to represent programming language grammars and they might also be used to represent natural language.

In English a simple sentence might be <subject> <verb> <object> while in Latin or German it might be <subject> <object> <verb>. In state 1 for English the FSM would expect input of a subject such as 'John' or 'A'. Input of a valid subject would lead to a second state where a verb was expected, for example 'kicks' or 'B'. Acceptance of a valid verb would lead to a third state where an object was expected, such as 'the ball' or 'C'. Input of C would lead to acceptance of the sentence as valid. Thus 'John kicks the ball' and 'ABC' are valid inputs for this FSM.

If ABC (subject, verb, object) is a valid input to an English sentence what might AABC be? In what way might AABBCC be valid? Or DABDC, or DDABDDC, DAEBDC and so on (what might D and E be?).

In natural languages this process is similar to parsing - here is an example in Latin. In Latin you are often required to parse a sentence as a prelude to understanding it, breaking it down into its constituent parts and decoding each part helps you to piece together the whole sentence and understand what it means. A sentence becomes valid when you understand what it means, in other words the FSM reaches acceptance. A sentence that lacks common elements such as subject, verb or object may be difficult to understand and would fail when passed through a FSM.

Natural languages are very complex and FSMs are more commonly used in programming languages where the rules are simpler. An FSM might be used for building a valid expression such as '2 + 2' - even here the rules can be quite complex e.g. 3 * (-2 + 3).