Moving to RELAX NG

RELAX NG improves on a DTD, as far as an XML author is concerned.
We can describe RELAX NG as a schema language that, for many authors, does a better job than XML Schema. It is also a grammar-based language, but unlike XML Schema its compact syntax does not use tags (< >). It uses braces, which most programmers are familiar with.
RELAX NG is a pattern-based schema language. The name stands for REgular LAnguage for XML Next Generation. It is based on RELAX and on TREX, a language developed by James Clark.
Some of the benefits attributed to RELAX NG are that you write fewer lines of code and that the code is less cryptic. It handles namespaces well, and it supports both XML Schema data types and user-defined data types.
RELAX NG uses patterns as the main idea for developing schemas. Table 1 shows some simple patterns:

Pattern name             Pattern
Element pattern          element nameClass { pattern }
Attribute pattern        attribute nameClass { pattern }
Text pattern             text
Sequence pattern         pattern, pattern
Zero-or-more pattern     pattern*

Table 1

Now let us create a RELAX NG schema for the file address.xml, which we created in the previous article.

element Address {
    element Name {text},
    element Street {text},
    element City   {text},
    element Zip    {text}
}

address.rnc

Open the XML Distilled Editor. Type in the code shown above and save it as address.rnc. Next, open address.xml, check that it is well-formed, and then validate the document against the RELAX NG file.
You will get a message that no errors were found.
You can also write a RELAX NG schema in an XML syntax that uses tags; this is optional. However, it defeats our purpose of making the markup less cryptic to read.
Unlike a DTD or XML Schema, an RNC document cannot be referenced from within the XML document. You can write a bit of code or use editing tools such as XML Undistilled to validate XML documents.
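The "bit of code" route can be sketched in Python. This is not a RELAX NG validator: it is a hand-written check, using only the standard library, that tests the same shape address.rnc describes (an Address element containing Name, Street, City and Zip in that order, each holding only text). The contents of ADDRESS_XML and the function name matches_address_schema are assumptions made for illustration; a real project would hand the schema to a RELAX NG processor instead.

```python
import xml.etree.ElementTree as ET

# Assumed contents of address.xml (the original file is not reproduced here).
ADDRESS_XML = """<Address>
<Name>Tony Blair</Name>
<Street>Downing Street</Street>
<City>London</City>
<Zip>2W112EA</Zip>
</Address>"""

# The child elements address.rnc requires, in schema order.
EXPECTED_CHILDREN = ["Name", "Street", "City", "Zip"]


def matches_address_schema(xml_text):
    """Return True if the document has the shape address.rnc describes."""
    root = ET.fromstring(xml_text)
    if root.tag != "Address" or root.attrib:
        return False
    children = list(root)
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # Only whitespace may appear between the child elements.
    between = [root.text] + [child.tail for child in children]
    if any((chunk or "").strip() for chunk in between):
        return False
    # Each child holds only text: no attributes and no nested elements.
    return all(len(child) == 0 and not child.attrib for child in children)


print(matches_address_schema(ADDRESS_XML))
```

A hand-written check like this has to be rewritten whenever the schema changes, which is the maintenance cost a schema language removes.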
Now alter the XML document address.xml a bit and save the file as address2.xml. If you try to validate the new XML document, you will get errors, because address2.xml has the <Zip> tag missing.

<?xml version="1.0"?>
<Address>
<Name>Tony Blair</Name>
<Street>Downing Street</Street>
<City>London</City>    
</Address>

address2.xml

To make this document valid, you apply the cardinality rules that you may have learned while coming to terms with regular expressions or shell programming.
For example, you can validate the address2.xml file by making a small alteration to address.rnc.

    element Zip    {text}*

By adding an asterisk (*) to the line that declares the Zip element, you apply the cardinality rule “zero or more occurrences”. Similarly, the question mark (?) denotes an optional pattern, and the plus sign (+) denotes “one or more” occurrences of a pattern.
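Because these operators are borrowed from regular expressions, their effect can be illustrated with Python's re module. The sketch below is an analogy, not RELAX NG validation: it flattens the child element names of address2.xml into a string and matches that string against three regular expressions that differ only in the operator applied to Zip. The helper name child_sequence is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# address2.xml from the article: the Zip element is missing.
ADDRESS2_XML = """<Address>
<Name>Tony Blair</Name>
<Street>Downing Street</Street>
<City>London</City>
</Address>"""


def child_sequence(xml_text):
    """Return the child element names as one string, each followed by a space."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)


# The same three operators, applied to the "Zip " token.
ZIP_ZERO_OR_MORE = re.compile(r"Name Street City (Zip )*")
ZIP_OPTIONAL = re.compile(r"Name Street City (Zip )?")
ZIP_ONE_OR_MORE = re.compile(r"Name Street City (Zip )+")

sequence = child_sequence(ADDRESS2_XML)  # "Name Street City "
print(bool(ZIP_ZERO_OR_MORE.fullmatch(sequence)))  # True: zero Zip elements allowed
print(bool(ZIP_OPTIONAL.fullmatch(sequence)))      # True: Zip may be absent
print(bool(ZIP_ONE_OR_MORE.fullmatch(sequence)))   # False: at least one Zip required
```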
If you add attributes to address.xml, such as a first and last name on the Name element, you can still validate the XML document, provided the schema declares them. See address3.xml.

<?xml version="1.0"?>
<Address>
<Name first_name="Tony" last_name="Blair"/>
<Street>Downing Street</Street>
<City>London</City>
<Zip> 2W112EA </Zip>
</Address>

address3.xml

To validate this XML file using RELAX NG, you need an RNC file that declares the attribute names. An example of such an RNC file is given below.

element Address {
    element Name {
        attribute first_name {text},
        attribute last_name {text}
    },
    element Street {text},
    element City   {text},
    element Zip    {text}
}

address3.rnc

You can offer a choice of patterns in an RNC document. For example, in some cases the order of two elements can be interchanged: some forms are filled in with first_name first, and others accept last_name first. If you want a single RNC file that validates documents with either ordering, use the choice pattern, written with the ‘|’ operator. An example (Code 1) is given below.

element Name {
    (element first_name {text},
     element last_name {text}) |
    (element last_name {text},
     element first_name {text})
}

Code 1

Note: Comments inside an RNC document start with the ‘#’ sign.
Another point: although the sequence of elements in an XML document matters when validating with RELAX NG, the order of attributes does not.

Grammar pattern
Most of the time, your RELAX NG schemas will look very similar to the examples we have discussed so far. However, you can also define named patterns in RELAX NG and reuse them elsewhere in the file. This works somewhat like a function in a programming language.

Consider the RNC file in Code 2:

element address {
  element name {
    element first_name { text },
    element second_name { text }
  }*
}

Code 2

The above RNC file can be rewritten as follows. Refer to Code 3.

grammar {
  start =
    element address {
      element name { NameInfo }*
    }
  NameInfo =
    element first_name { text },
    element second_name { text }
}

Code 3

A grammar pattern contains one or more definitions. Each definition associates a name with a pattern. Inside a grammar, a pattern consisting of just a name references the definition of that name in the grammar. The name start is special. A grammar pattern is checked by matching the definition of start. A grammar pattern must define start. See Code 4.

start = Address
Address = element address { Name* }
Name = element name { first_name, last_name }
first_name = element first_name { text }
last_name = element last_name { text }

Code 4

Lists
The list pattern matches a whitespace-separated sequence of tokens; it contains a pattern that the sequence of individual tokens must match. The list pattern splits a string into a list of strings and then matches the resulting list of strings against the pattern inside the list pattern.
Let us consider the example of a sentence made up of exactly two words.

element sentence {
  list { xsd:token, xsd:token }
}

Lists are useful when writing RNC schemas that validate text-based patterns.
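The split-then-match behaviour of the list pattern can be mimicked in a few lines of Python. This is a sketch of the idea only, under the assumption that each token is checked by a simple predicate; the names matches_list and is_token are hypothetical and do not belong to any RELAX NG library.

```python
def is_token(token):
    """Stand-in for a datatype check: accept any non-empty token."""
    return len(token) > 0


def matches_list(value, token_checks):
    """Split value on whitespace, then match the tokens one by one.

    token_checks holds one predicate per expected token, so a list
    pattern with two entries becomes [is_token, is_token].
    """
    tokens = value.split()  # splits on any run of whitespace
    if len(tokens) != len(token_checks):
        return False
    return all(check(token) for check, token in zip(token_checks, tokens))


two_tokens = [is_token, is_token]
print(matches_list("Hello world", two_tokens))          # True
print(matches_list("  Hello \n world  ", two_tokens))   # True: extra whitespace is ignored
print(matches_list("Hello big world", two_tokens))      # False: three tokens
```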
A major problem when validating XML documents is that the order of elements can vary across a set of documents. To write an RNC schema that validates all of them, whether or not the elements appear in a fixed order, you need the interleave pattern, written with the ‘&’ operator.
As an example, suppose we have hundreds of XML files that store the same data with the elements in different orders. If you modify address.rnc as shown in address4.rnc, you will be able to validate such a set of documents.

element Address {
  element Name { text }
  & element Street { text }
  & element Zip { text }
  & element City { text }
}

address4.rnc
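The effect of interleaving in address4.rnc (each element exactly once, in any order) can be illustrated with a small standard-library Python check. It is an analogy for this one schema rather than a general implementation of interleave, and the function name matches_interleave is hypothetical.

```python
import xml.etree.ElementTree as ET

# The four elements address4.rnc requires, each exactly once.
REQUIRED = ["Name", "Street", "Zip", "City"]


def matches_interleave(xml_text):
    """Return True if Address holds each required child once, in any order."""
    root = ET.fromstring(xml_text)
    if root.tag != "Address":
        return False
    # Sorting both sides removes the ordering but keeps the counts.
    return sorted(child.tag for child in root) == sorted(REQUIRED)


in_schema_order = (
    "<Address><Name>Tony Blair</Name><Street>Downing Street</Street>"
    "<Zip>2W112EA</Zip><City>London</City></Address>"
)
shuffled = (
    "<Address><City>London</City><Zip>2W112EA</Zip>"
    "<Name>Tony Blair</Name><Street>Downing Street</Street></Address>"
)
missing_zip = (
    "<Address><Name>Tony Blair</Name><Street>Downing Street</Street>"
    "<City>London</City></Address>"
)

print(matches_interleave(in_schema_order))  # True
print(matches_interleave(shuffled))         # True
print(matches_interleave(missing_zip))      # False
```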

Namespaces
RELAX NG handles namespaces cleanly. You bind a prefix to a namespace URI with a namespace declaration. Here is a simple example:

namespace ag = "http://www.developeriq.com/RNC"
namespace bg = "http://www.microsoft.com"

You then qualify element names (and attribute names, where needed) with a declared prefix. You can also use multiple namespaces in one schema. Let’s see an example:

element ag:address {
  element ag:name {
    element ag:first_name { text },
    element ag:second_name { text }
  },
  element bg:Zip { text }
}

If you do not want to laboriously add a prefix to each element name, declare a default namespace with the keyword default. It applies to unprefixed element names; unprefixed attribute names stay in no namespace.

default namespace = "http://www.developeriq.com/RNC"
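What a schema matches is the namespace URI plus the local name, not the prefix, so a prefixed document and one that uses a default namespace carry the same element names. The Python sketch below shows this with the standard library parser, which reports names as {namespace-uri}local-name. Both instance documents and the helper name qualified_names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance documents: one uses the ag prefix,
# the other declares the same URI as its default namespace.
PREFIXED = (
    '<ag:address xmlns:ag="http://www.developeriq.com/RNC">'
    "<ag:name><ag:first_name>Tony</ag:first_name></ag:name>"
    "</ag:address>"
)
DEFAULTED = (
    '<address xmlns="http://www.developeriq.com/RNC">'
    "<name><first_name>Tony</first_name></name>"
    "</address>"
)


def qualified_names(xml_text):
    """List every element name in document order as {namespace-uri}local-name."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


print(qualified_names(DEFAULTED)[0])  # {http://www.developeriq.com/RNC}address
print(qualified_names(PREFIXED) == qualified_names(DEFAULTED))  # True
```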

End Note:
Readers may remember the articles by Mr. Ramaswamy on RELAX NG and Java. In next month’s edition we will look at RELAX NG programming on alternative platforms. You can find some of the files used in this article at http://www.developeriq.com/mag/AugustXML.zip.



Added on June 27, 2007