An introduction to XML Schemas
Posted On August 5, 2005 by Madhu AP filed under Internet
XML authors use Document Type Definitions (DTDs) to validate the XML documents they create. The DTD has long been the standard and offers several benefits. However, as the use of XML has grown rapidly, some limitations of DTDs have surfaced. These include:
1. DTD syntax differs from XML syntax, so a DTD cannot be read or checked with an ordinary XML parser; a separate DTD-aware parser is needed.
2. DTDs support XML Namespaces only in a complicated way. For example, whenever an element or attribute is declared, the namespace associated with it also has to be declared. This creates long-winded code in many cases and can confuse a reader.
3. The DTD grammar is limited (for example, it offers very little control over data types), which makes it hard to describe flexible documents.
All these reasons prompted vendors and industry experts to suggest other approaches to document validation. XML Schema and RELAX NG are prime examples of the models that evolved during this period. In this series of articles we will look at both models.
First drafted in 1999 and published as a W3C Recommendation in 2001, XML Schema was a natural progression from the DTD. It was meant to resolve the problems faced by XML programmers who had run into the limitations of DTDs. An XML Schema is simply a way to describe the structure of an XML document. A DTD is also a schema in this general sense, but XML Schema is a different and richer language.
Before looking at exactly what an XML Schema is, let us consider its advantages:
1. XML Schemas are themselves written in XML. Hence you need not learn a new syntax or use a separate parser;
2. XML Schemas handle Namespaces very well, simply because the standard came into existence after XML Namespaces were standardized by the W3C;
3. XML Schemas have fewer limitations in their grammar and syntax than DTDs; and
4. You can even build and reuse your own content models.
Now let us try to understand what an XML Schema really is. Bear in mind that this article provides only an introduction; a complete explanation of XML Schemas is beyond its scope. Let us get started!
An XML Schema has the following capabilities:
· Define elements that appear in a document;
· Define attributes that can appear in a document;
· Define the relationships, number and order of child elements;
· Define data types for elements and attributes; and
· Define default and fixed values for elements and attributes.
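The capabilities listed above can be seen together in one small schema. This is only a sketch: the Order vocabulary, its element names and the currency attribute are made up for illustration and do not appear elsewhere in this article.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "Order" vocabulary, used only to illustrate the list above -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <!-- Order of children: Customer, then Item, then Quantity -->
      <xs:sequence>
        <!-- An element with a data type -->
        <xs:element name="Customer" type="xs:string"/>
        <!-- Number of children: at least one Item, no upper limit -->
        <xs:element name="Item" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
        <!-- Default value: an empty Quantity element is treated as 1 -->
        <xs:element name="Quantity" type="xs:integer" default="1"/>
      </xs:sequence>
      <!-- An attribute with a fixed value -->
      <xs:attribute name="currency" type="xs:string" fixed="GBP"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Each of these constructs is introduced more slowly in the rest of the article.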
Why is it not enough for an XML document to be well formed? Because a well-formed XML document does not guarantee that the data in it is accurate. XML Schemas are useful in ensuring that at least some of the data entered is correct. We will now create a simple XML file (code 1).
<?xml version="1.0"?>
<Address>
<Name>Tony Blair</Name>
<Street>Downing Street</Street>
<City>London</City>
<Zip>SW1A 2AA</Zip>
</Address>
address.xml
Now let us create a simple DTD for the above XML. Note how short the DTD is.
<!ELEMENT Address (Name, Street, City, Zip)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT Zip (#PCDATA)>
<!ELEMENT City (#PCDATA)>
address.dtd
Now let us create an XML Schema for the XML document.
The first step in creating an XML Schema is to declare the schema element.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.developeriq.com"
xmlns="http://www.developeriq.com"
elementFormDefault="qualified">
As you are aware, the first line always declares the document as an XML document. This is followed by the opening tag of the <xs:schema> element, which is the root of every schema.
The XML Schema namespace is declared by the statement xmlns:xs="http://www.w3.org/2001/XMLSchema"
This statement indicates that the elements, attributes and data types prefixed with xs: in the document come from the namespace http://www.w3.org/2001/XMLSchema.
Then you will notice two more namespace-related attributes. The targetNamespace attribute indicates that the elements defined by this schema belong to the namespace http://www.developeriq.com. The next line makes that same URL the default namespace of the schema document.
We also include the elementFormDefault attribute with its value set to "qualified".
This attribute controls how locally declared elements are handled: with "qualified", every element declared in this schema must be namespace qualified when it appears in an XML document. Readers are advised to use this attribute by habit while creating XML Schema documents.
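The effect of elementFormDefault is easiest to see in the documents being validated. The sketch below contrasts the two settings for the Address vocabulary; the Examples wrapper element and the a: prefix are assumptions added only so that both variants fit in one well-formed file.

```xml
<?xml version="1.0"?>
<!-- "Examples" is a hypothetical wrapper, not part of the Address vocabulary -->
<Examples>
  <!-- Shape expected with elementFormDefault="qualified":
       the default namespace places Address and all of its children
       in the target namespace -->
  <Address xmlns="http://www.developeriq.com">
    <Name>Tony Blair</Name>
    <Street>Downing Street</Street>
    <City>London</City>
    <Zip>SW1A 2AA</Zip>
  </Address>

  <!-- Shape expected with elementFormDefault="unqualified" (the default setting):
       only the globally declared Address is namespace qualified,
       while its locally declared children are in no namespace -->
  <a:Address xmlns:a="http://www.developeriq.com">
    <Name>Tony Blair</Name>
    <Street>Downing Street</Street>
    <City>London</City>
    <Zip>SW1A 2AA</Zip>
  </a:Address>
</Examples>
```

The first shape is usually the less surprising one for authors, which is one reason to set the attribute to "qualified" by habit.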
The file address.xsd has the complete schema (code 2).
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.developeriq.com"
xmlns="http://www.developeriq.com"
elementFormDefault="qualified">
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Street" type="xs:string"/>
<xs:element name="City" type="xs:string"/>
<xs:element name="Zip" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
address.xsd
We define the root element as "Address".
Then we declare that the element is of the XML Schema type complexType. Since the Address element contains child elements, it must be declared as a complexType.
As we expect valid XML documents to contain Name, Street, City and Zip in exactly that order, we include the <xs:sequence> element.
Note that we have used the prefix 'xs:' in this example. The prefix itself is your choice: you may use a different one, or omit it altogether by making the XML Schema namespace the default namespace of the schema document.
You declare each child element by specifying its name and data type.
<xs:element name="Name" type="xs:string"/>
The above line indicates that the name of the element is Name and its type is string. The rest of the XSD document should be easy to decipher.
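For comparison, here is how the same schema might look without any prefix. This is a sketch rather than a replacement for address.xsd: it assumes that the XML Schema namespace is made the default namespace of the schema document, so that element, complexType and string can be written unprefixed.

```xml
<?xml version="1.0"?>
<!-- Sketch: the XML Schema namespace is the default namespace,
     so no xs: prefix is needed anywhere in this file -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.developeriq.com"
        elementFormDefault="qualified">
  <element name="Address">
    <complexType>
      <sequence>
        <element name="Name" type="string"/>
        <element name="Street" type="string"/>
        <element name="City" type="string"/>
        <element name="Zip" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>
```

The prefixed form is more common in practice because it leaves the default namespace free for the vocabulary being defined.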
Validating the Schema
To validate a document against the schema, you need to reference the schema from the XML document. The file address_xsd.xml shows an XML document with such a reference (code 3).
<?xml version="1.0"?>
<Address
xmlns="http://www.developeriq.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.developeriq.com address.xsd">
<Name>Tony Blair</Name>
<Street>Downing Street</Street>
<City>London</City>
<Zip>SW1A 2AA</Zip>
</Address>
address_xsd.xml
Look at the file address_xsd.xml. It starts with the Address element and declares its namespaces. The schema location is declared using the xsi:schemaLocation attribute, whose value pairs the target namespace with the location of the schema file, separated by a space.
Note that all files have to be saved under the same directory or folder on your system.
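The xsi:schemaLocation attribute applies to schemas that declare a target namespace. When a schema has no targetNamespace, the document refers to it with xsi:noNamespaceSchemaLocation instead. A minimal sketch, assuming a hypothetical file address_nons.xsd that holds the same declarations as address.xsd but without the targetNamespace attribute:

```xml
<?xml version="1.0"?>
<!-- address_nons.xsd is a hypothetical file name used only for this sketch -->
<Address
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="address_nons.xsd">
  <Name>Tony Blair</Name>
  <Street>Downing Street</Street>
  <City>London</City>
  <Zip>SW1A 2AA</Zip>
</Address>
```

Here the attribute value is a single location rather than a namespace and location pair, because there is no namespace to pair it with.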
Schema Concepts
It is advisable to start using an XML editor. The XML Distilled editor supplied on the CDs is an excellent choice for checking that a schema is correctly formed.
XML Schema has many built-in data types. Here is a list of the most common ones:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
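As a quick illustration, each of these types can be attached to an element through the type attribute. The Record element and its children below are hypothetical names chosen for this sketch; the comments show the kind of value each type accepts.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "Record" element with one child per common built-in type -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>     <!-- e.g. An introduction -->
        <xs:element name="Price" type="xs:decimal"/>    <!-- e.g. 19.99 -->
        <xs:element name="Copies" type="xs:integer"/>   <!-- e.g. 250 -->
        <xs:element name="InStock" type="xs:boolean"/>  <!-- true, false, 1 or 0 -->
        <xs:element name="Published" type="xs:date"/>   <!-- e.g. 2005-08-05 -->
        <xs:element name="Opens" type="xs:time"/>       <!-- e.g. 09:30:00 -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```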
The string data type can contain characters, line feeds, carriage returns and tab characters. The following is an example of a string declaration in a schema:
<xs:element name="Color" type="xs:string"/>
The date data type is used to specify a date. The date is specified in the form "YYYY-MM-DD", where YYYY indicates the year, MM the month and DD the day.
All three components are required.
The following is an example of a date declaration in a schema:
<xs:element name="birth_day" type="xs:date"/>
An element in your document might look like this:
<birth_day>1933-09-26</birth_day>
There are two types of XSD elements: simple and complex. Simple elements contain only text; they have no child elements and cannot have attributes. Here are a couple of simple elements.
<xs:element name="Name" type="xs:string" default="GB Shah"/>
<xs:element name="Name" type="xs:string" fixed="GB Shah"/>
The default attribute supplies a value that is used when the element is left empty, while the fixed attribute sets a value that the document is not allowed to change.
XML documents consist of elements as well as attributes. You can declare an attribute in a schema as follows:
<xs:attribute name="title" type="xs:string"/>
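An attribute declaration normally sits inside the complex type of the element that carries it. The sketch below attaches the title attribute to a hypothetical Article element; the element names and the use="required" setting are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "Article" element that carries the title attribute -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Article">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Author" type="xs:string"/>
      </xs:sequence>
      <!-- Attribute declarations follow the content model;
           use="required" makes the attribute mandatory (it is optional by default) -->
      <xs:attribute name="title" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A matching document would contain an Article element with a title attribute and a single Author child.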
To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint.
This example defines an element called "HolyBook".
<xs:element name="HolyBook">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Bible"/>
<xs:enumeration value="Koran"/>
<xs:enumeration value="Gita"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Complex elements usually have child elements. However, any element that has attributes is also a complex element in XML Schema, so empty elements with attributes and elements that carry only text together with attributes are described as complex elements too. Here are some examples:
<Name f_name="Yogesh"/>
<Book type="Holy">The Holy Bible</Book>
The element Address described in address.xsd is a prime example of a complex element.
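The two one-line examples above, Name and Book, could be declared roughly as follows. This is a sketch: the article does not give their declarations, so the choice of xs:string for the attributes and for the text content is an assumption.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Empty element: no content model, only an attribute -->
  <xs:element name="Name">
    <xs:complexType>
      <xs:attribute name="f_name" type="xs:string"/>
    </xs:complexType>
  </xs:element>

  <!-- Text-only element with an attribute:
       simple content (a string) extended with an attribute -->
  <xs:element name="Book">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="type" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```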
Limitations of XML Schema
We have only touched the tip of the iceberg as far as XML Schemas go. I advise you to read through the XML Schema specifications at http://www.w3.org/XML/Schema
However, XML Schema has several limitations, which James Clark, author of RELAX NG, described in a path-breaking paper that he made public through a newsgroup posting. I reproduce some of his comments here in condensed form.
1. XML Schema definitions require considerable expertise to understand and can contain quite a few surprises.
For example, if you derive a complex type by restriction you have to specify the new restricted content model explicitly. However, attributes are treated in the opposite way; by default you get all the attributes and you have to explicitly rule out the ones you don't want.
2. The XML Schema Recommendation is hard to read and understand.
To avoid the possible misinterpretations mentioned above, you might have to consult the specification in order to fully understand a specific schema definition. Consider, for example, the DTD for validating the simple address.xml file and the XSD document for validating the same file. You will note that the XSD document is harder to read and comprehend.
3. W3C XML Schema's support for attributes provides no advance over DTDs.
As with DTDs, W3C XML Schema only allows you to specify whether attributes are required or optional. There is no way to specify more complex constraints between attributes, or between attributes and elements; for instance, that either attribute X or attribute Y is allowed, or that either attribute X or element Y is allowed.
4. W3C XML Schema provides very weak support for unordered content.
When the designer of an XML vocabulary does not wish to force child elements to occur in a particular order, it can be impractical to describe the vocabulary using XML Schema, because its construct for unordered content (xs:all) is heavily restricted: for example, each child element may appear at most once.
5. Data type handling in W3C XML Schema lacks modularity.
W3C XML Schema is tied to the single collection of data types defined in Part 2 of W3C XML Schema. However, this is a rather ad hoc collection: it includes data types of highly debatable relevance (gYearMonth, gDay, etc.) yet lacks many data types that are important for many applications. A modular approach, where a schema language can be combined with one or more standard collections of data types, some general-purpose and some domain-specific, is called for here.
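Points 1 and 4 in the list above are easier to judge with a concrete schema in hand. The sketch below is not taken from James Clark's paper; the type and element names (BaseAddress, StreetOnlyAddress, Contact) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Point 1: a base type with an optional City and two optional attributes -->
  <xs:complexType name="BaseAddress">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="country" type="xs:string"/>
    <xs:attribute name="verified" type="xs:boolean"/>
  </xs:complexType>

  <xs:complexType name="StreetOnlyAddress">
    <xs:complexContent>
      <xs:restriction base="BaseAddress">
        <!-- The restricted content model has to be written out again in full -->
        <xs:sequence>
          <xs:element name="Street" type="xs:string"/>
        </xs:sequence>
        <!-- "country" is inherited without being mentioned;
             "verified" has to be ruled out explicitly -->
        <xs:attribute name="verified" use="prohibited"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Point 4: unordered children with xs:all.
       In XML Schema 1.0 each child may occur at most once, and xs:all
       cannot be nested inside xs:sequence or xs:choice -->
  <xs:element name="Contact">
    <xs:complexType>
      <xs:all>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Phone" type="xs:string" minOccurs="0"/>
        <xs:element name="Email" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema a Contact may list Name, Phone and Email in any order, but there is no simple way to allow, say, several Phone elements mixed freely among the others.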
All these limitations prompted James Clark, leader of the RELAX NG Technical Committee at OASIS (Organization for the Advancement of Structured Information Standards), to work on RELAX NG. In the next article we will see how RELAX NG improves on XML Schemas and DTDs.
References:
User group listing: http://www.imc.org/ietf-xml-use/mail-archive/maillist.html
XML Schema by Eric van der Vlist, O'Reilly.
