Monday, March 6, 2006

Welcome Address By School

DCG: notation of grammar rules in Prolog parsers and tokenization

grammatical notation DCG (Definite Clause Grammar, Castilian grammar of clauses defined ) is a syntactic variant or extension of the regular syntax Prolog, which aims to implement gram ; formal policies for short, and therefore simpler and more readable parser written in this language, its ability to handle information (meaning by this "variables") implied. His expression is a shorthand notation syntax Prolog language, collected by almost all parsers and current for this language interpreters, which transform automatically and internal standard clauses. In the notation DCG the infix operator "-" is replaced by the operator of the same character "-->", and can establish, by way of example, the following correspondence between notations, the first Prolog clause in ordinary syntax, and the second DCG notation equivalent:

sentence (S0, S) - sintagma_nominal (S0, S1) sintagma_verbal (S1, S).
prayer -> sintagma_nominal, sintagma_verbal.

The above Prolog clause is expressed in words as follows:

"There is a prayer between S0 and S if there is a noun phrase between S0 and S1, and there is a verb phrase between S1 and S"

The DCG notation is based on that used for context-free grammars (context-free grammars or CFGs, N. Chomsky), indicating hierarchical relationships constructions language, described by BNF ( Backus Normal Form or Backus-Naur form ), originally created to define the syntactic structure of language ALGOL60 (1960, John Backus), and widely used both to define the syntactic structure of programming languages n, to define syntactic structures of languages \u200b\u200bin general. Therefore, using DCG notation is possible to handle directly in Prolog context-free grammars. Context-free grammars are to be a simplification of ordinary grammar, which, for example, and referring to Castilian language, could take the following expression in DCG notation :

prayer -> sintagma_nominal, sintagma_verbal.

sintagma_nominal -> decisive name.

sintagma_verbal -> verb, sintagma_nominal.
sintagma_verbal -> verb.

decisive -> [a].
decisive -> [la].

name -> [man].
name -> [apple].

verb -> [eats].
verb -> [sings]. That

Prolog system (in this case SWI-Prolog ) automatically translates the next set of clauses:

prayer (A, B): - sintagma_nominal (A, C),
; sintagma_verbal (C, B).

sintagma_nominal (A, B): - determiner (A, C),
name (C, B).

sintagma_verbal (A, B): - verb (A, C),
sintagma_nominal (C, B).
sintagma_verbal (A, B): - verb (A, B).

determiner ([the

As is easily deduced, the program based on this limited Pequena grammar, only accepted as grammatically correct "man eats the apple", "the man sings," and similar combinations, including phrases such obviously grammatically incorrect as "the singing apple," the man eats, etc., returning an empty list, but will not accept such "dancing man" (the verb "dance" is not within the basis of facts or knowledge of the program.) It is therefore understood that a complete grammar, covering all possible structures, relationships and combinations of the constituent parts of a language requires a very complex and difficult to implement. In the example above, you can get all accepted as grammatically correct sentences by the grammar of the program, launched in the interpreter's next target:

? - Sentence (X ,[]).


Meanwhile, a very simple grammar for English, written by DCG

notation takes the following expression:

sentence -> noun_phrase, verb_phrase.


noun_phrase --> noun.
noun_phrase --> determiner, noun, rel_clause.

verb_phrase --> verb.
verb_phrase --> verb, noun_phrase.


rel_clause --> [].

rel_clause --> [that], verb_phrase.

determiner --> [the].

determiner --> [a].

noun --> [john].

noun --> [annie].

noun --> [man].

noun --> [men].

noun --> [woman].

noun --> [women].

...



verb --> [like].

verb --> [likes].

...

In the notation

DCG deleted

two parameters, an input list, and an output list. The first can be represented, for example, by prayer [the dog bite] and the second by an empty list [] is the result of checking the syntactic structure of that sentence by function n the parser

used. Additionally, you can add variables to grammars represented by DCG

notation, for example, to specify the gender and number. As

above, the DCG notation to add additional arguments, and also incorporate Prolog goals in the body of rules, the latter enclosed in braces {}. Additional arguments allow us to introduce eg correct grammar rules based on the conventions adopted by the target language representation, and avoid mistakes in the matching gender / number, etc. Also, by DCG notation is also possible to construct programs that can to represent parse trees of sentences (representing grammatical categories involved in the breakdown of its parts) and

syntactic trees (parse trees

) than a graphically show the structure as a tree falling in a certain grammar. As explained in the previous notation, the parser or parsers are responsible for the construction of parse trees of sentences of a language able to represent the phrase structure of the same. Following the example of the simple grammar in Castilian, as described above, the graphical representation of the parse tree of the sentence "the man eats the apple" looks like this , making use of this program it : ? - draw (or (sn (d (el), n (man)), sv (v (come), sn (d (la), n (apple ))))). or         sn

  
   Where

apple "or" = sentence, "sn" = noun phrase, "d" = Determinant, "n" = Name, "sv" = verb phrase, and "v" = Word. It should be noted that this program does the tree of analysis itself, which is:

or (sn (d (el), n (man)), sv (v (come), sn (d ( la), n (apple))))

simply holds its graphic representation in chart form
ASCII characters
. The source code of that program, originally called draw.swi

has been copied to a plain text editor and saved with the extension. "pl" to be executed by the SWI-Prolog interpreter

, although another option is to associate files with extension ". swi "with the executable of the interpreter. Its author is
Ludäscher
Bertram, the "University of California Davis."

Examples of notation

DCG and Prolog in Castilian are taken from Chapter 9, "Using Prolog grammar rules", paragraph 9.1, "The problem of parsing" of the WF Clocksin and work Mellish CS Programming in Prolog "(2 nd ed., Barcelona: Gustavo Gili, 1993, ISBN: 84-252-1339-8), original English translation of" Programming in Prolog ", in I refer to a detailed explanation of grammar rules, parsing, and notation

DCG in Prolog, given that the authors reviewed are a clear and perfectly understandable the particular, perfectly accessible to people not too versed in this language. Meanwhile

example grammar English has been obtained from Chapter 17, "Language Processing with Grammar Rules", from the book "Prolog - Programming for Artificial Intelligence" (Ivan Bratko, 2nd ed., Addison-Wesley, 1990, ISBN: 0 -201-41606-9). This is a highly recommended book of more advanced and sophisticated than the Clocksin and Mellish, clearly more introductory, and as its name indicates it on one side the general Prolog language, and secondly its apply the main techniques in the field of IA (search and classification, expert systems and representation knowledge, PLN , Machine Learning, etc.). Finally, we mention another example of practical use of the notation

DCG and Prolog programming language, the library Html-write [1

] [ 2] [3

] shell and development environment

SWI-Prolog that translates Prolog terms in HTML , generating output in the latter language, for which uses precisely structures written in DCG notation

.



Más información ( PLN
y
DCG s
en Prolog):




Adventure
in Prolog
- 15.
Natural Language
.


En

Learn
Prolog Now!

: 7.
Definite Clause Grammars
, 8.
More Definite Clause Grammars
(

P.
Blackburn

another location).

Perl Style Regular Expressions in Prolog
    (
  • another location). Introduction to Definite Clause Grammars
  • .
  • CFGs and BNF:
  • Programming Languages: BNF notation and other grammars . What is BNF notation?
  • Backus-Naur Form (in

    FOLDOC).
  • Context-free grammar (in
  • Wikipedia ).
  • are part of the same set of class readings : Four Concepts in Programming Language Description: Syntax, Semantics, Pragmatics and Metalanguage
  • .
  • Grammatical Description of Syntax: The BNF Metalanguage .
Derivations, Ambiguity and Semantics

. Nonrecursive

Syntax: The Regular Expression
    Metalanguage. Between
  • Lexics and Syntax: Whitespace, Layout, Comment
  • Conventions.