Klotho Grammar

What is a Graph Grammar?
First, What is a Grammar?

A grammar is a set of rules which can

  • generate a construct from a list of terminals;
  • recognize that a construct obeys the grammar's rules (is ``well-formed'').

In natural languages such as English, the construct is a sentence, and the terminals are words. The grammar can specify such notions as subject-verb agreement, the declension of nouns and the conjugation of verbs, and the syntactic structure of sentences (for example, interrogative sentences can begin with verbs). Computational linguistics (also called natural language processing) seeks to formally describe natural language grammars so that machines can recognize and interpret sentences (and perhaps someday even paragraphs). This is decidedly nontrivial, because human languages freely use a variety of rhetorical devices and omissions, and it remains a very active field of research. It is common programming practice to split the generation and recognition functions into separate modules (generators and parsers), but this is not essential.
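
To make the split between generation and recognition concrete, here is a minimal Python sketch of a toy word grammar; the rules and vocabulary are invented for illustration and have nothing to do with Klotho's actual grammar.

    # A minimal sketch (not Klotho code): a toy grammar over word terminals,
    # with generation and recognition split into separate functions as the
    # text describes.  The rules and vocabulary are invented for illustration.
    import random

    RULES = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"]],
        "VP": [["V", "NP"]],
        "N":  [["enzyme"], ["substrate"]],
        "V":  [["binds"], ["cleaves"]],
    }

    def generate(symbol="S"):
        """Expand a nonterminal into a flat list of word terminals."""
        if symbol not in RULES:                     # a terminal: emit it unchanged
            return [symbol]
        expansion = random.choice(RULES[symbol])    # pick one production
        return [word for part in expansion for word in generate(part)]

    def recognize(words, symbol="S"):
        """Return True if the word list is well-formed under the grammar."""
        def derives(symbols, remaining):
            if not symbols:
                return not remaining                # all symbols and words consumed
            head, tail = symbols[0], symbols[1:]
            if head not in RULES:                   # terminal must match the next word
                return bool(remaining) and remaining[0] == head and derives(tail, remaining[1:])
            return any(derives(body + tail, remaining) for body in RULES[head])
        return derives([symbol], list(words))

    print(" ".join(generate()))                                    # e.g. "the enzyme binds the substrate"
    print(recognize("the substrate cleaves the enzyme".split()))   # True
    print(recognize("enzyme the binds".split()))                   # False

The generator expands nonterminals until only word terminals remain, while the recognizer checks that a given word list can be derived from the start symbol; the same rule table serves both functions.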
Languages, and by extension the grammars which encode them, are often classified in a hierarchy due to Noam Chomsky. Much of Chomsky's work was concerned with the formal classification and properties of types of languages; the description below is decidedly informal.

  • The simplest are the regular languages (those described by regular expressions), such as those used in string-searching utilities like grep and, often, to search databases of DNA and protein primary structures (a short illustration follows this list);
  • next come the context-free languages, more complicated than regular languages in that they have nested syntactic structure, although the parse of a given term does not depend on its context;
  • then come the context-dependent (context-sensitive) languages, more complex still because of these contextual dependencies (think of relative clauses to get an approximate idea);
  • and finally come the natural languages, like the ones we normally speak.
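
As a small illustration of the first class, the following Python fragment searches a DNA string with a regular-expression motif; the sequence and the query motif are made up for this example.

    # Illustrative only: a regular-expression search of the sort used on
    # sequence databases.  The DNA string and the query motif (GAATTC with
    # an optional run of leading A's) are made up for this example.
    import re

    dna = "CCTAAGAATTCGGATCCAAAGAATTCTT"
    motif = re.compile(r"A*GAATTC")      # a regular language: no nesting, no context

    for match in motif.finditer(dna):
        print(match.start(), match.group())
    # 3 AAGAATTC
    # 17 AAAGAATTC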

So What is a Graph Grammar?

Those who have ``diagrammed'' sentences in American schools know that even fairly simple sentences can look like trees with all the branches on one side. Sentences with relative pronouns or clauses can even be cyclic. If one imagines each word as a dot (node) placed at the end of its line (arc), one suddenly has a connect-the-dots diagram which looks very much like an abstract molecule. Such a diagram is mathematically a graph, and a graph grammar is a grammar which operates on a graph. Naturally the grammar can be more complex, since the outcome of an operation at a node may depend critically on the node's context, so intuitively one might expect its power to be greater than context-free.
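
A minimal sketch of that picture, with atoms as nodes and bonds as arcs in an adjacency list; the molecule and the labelling are illustrative, not Klotho's internal representation.

    # A minimal sketch of the connect-the-dots idea: atoms as nodes, bonds as
    # arcs, stored as an adjacency list.  The molecule (ethanol, heavy atoms
    # only) and the labelling are illustrative, not Klotho's representation.
    ethanol = {
        "C1": ["C2"],          # methyl carbon
        "C2": ["C1", "O1"],    # methylene carbon
        "O1": ["C2"],          # hydroxyl oxygen
    }

    def context(graph, node):
        """The context of a node: the atoms bonded to it."""
        return graph.get(node, [])

    print(context(ethanol, "C2"))   # ['C1', 'O1']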

What is Klotho's Grammar?


Like a Chomsky grammar, Klotho's graph grammar transforms one representation into another. Unlike many grammars, however, Klotho's is layered, with each layer corresponding to equivalent internal representations and the code needed to generate them.

Since the outcome of running the grammar on a particular part of a molecule will depend directly on the atomic and bonding context, we expect the grammar to be at least context-dependent; however, this is an intuition, not a formal proof! As an aside, Klotho describes the nouns of the reaction construct; Atropos describes the verbs (mechanism). The analogy is somewhat imperfect, since Atropos also includes representations of reaction equations, but is otherwise very close.
The grammar exploits the idea of a high-level description to lump atoms together into substituent groups (formally, to contract the graph); the groups are described in terminals, and the grammar substitutes the terminals and expands the high-level description appropriately to generate the other representations. Other parts of the grammar convert Klotho's internal representations to isomeric SMILES strings for output to CONCORD(TM).
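
The following hedged sketch illustrates only the contract-and-expand idea; the group names, the dictionary layout, and the omission of the bonds joining neighbouring groups are simplifications for illustration and do not reflect Klotho's actual Terminals format or code.

    # A hedged sketch of the contract-and-expand idea: substituent groups are
    # named terminals that expand into atom-level subgraphs.  The group names,
    # the dictionary layout, and the omission of the bonds joining the groups
    # are all illustrative simplifications, not Klotho's actual Terminals format.
    GROUP_TERMINALS = {
        # group name -> (atoms, bonds within the group as (i, j, order))
        "methyl":   (["C"], []),
        "carboxyl": (["C", "O", "O"], [(0, 1, "double"), (0, 2, "single")]),
    }

    def expand(high_level):
        """Expand a high-level description (a list of group names) into an
        atom-level graph: a list of atoms and a list of bonds with global indices."""
        atoms, bonds = [], []
        for group in high_level:
            group_atoms, group_bonds = GROUP_TERMINALS[group]
            offset = len(atoms)                 # renumber atoms into the whole graph
            atoms.extend(group_atoms)
            bonds.extend((i + offset, j + offset, order) for i, j, order in group_bonds)
        return atoms, bonds

    print(expand(["methyl", "carboxyl"]))
    # (['C', 'C', 'O', 'O'], [(1, 2, 'double'), (1, 3, 'single')])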

The terminals Klotho employs are listed in the Terminals file, and we are now writing up a more thorough description of the grammar. A consequence of using a grammar, which operates naturally on constructs having both variable and instantiated (defined, ground) terms, is that it easily accommodates variable structures, such as R groups (Markush structures), and can operate with them in all computations. To date we have concentrated mostly on the generator, but have also done some preliminary work on a parser to classify compounds. Not surprisingly, we find that much of the code is shared between the two grammatical functions.
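
As an illustration of what it means to compute with variable structures, the sketch below carries an unbound R group symbolically through a simple atom count; the fragment notation and the formula computation are invented for this example and are not how Klotho represents or processes R groups.

    # A minimal sketch of operating on variable structures: a Markush-style
    # fragment list in which "R" is a variable term that may stay unbound or
    # be instantiated later.  The notation is illustrative only.
    from collections import Counter

    def formula(fragments, bindings=None):
        """Sum atom counts over fragments, carrying unbound variables symbolically."""
        bindings = bindings or {}
        counts = Counter()
        for frag in fragments:
            if isinstance(frag, str):              # a variable term such as "R"
                frag = bindings.get(frag, frag)    # instantiate it if a binding exists
            if isinstance(frag, str):
                counts[frag] += 1                  # still unbound: keep it as a variable
            else:
                for atom, n in frag:               # an instantiated fragment: count atoms
                    counts[atom] += n
        return dict(counts)

    r_cooh = [[("C", 1), ("O", 2), ("H", 1)], "R"]           # R-COOH

    print(formula(r_cooh))                                    # {'C': 1, 'O': 2, 'H': 1, 'R': 1}
    print(formula(r_cooh, {"R": [("C", 1), ("H", 3)]}))       # {'C': 2, 'O': 2, 'H': 4} (acetic acid)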