XML DOM AND SAX using C#
Posted On February 2, 2007 by Anush filed under Internet
The Document Object Model (DOM) has several benefits, the biggest of which is its structure. Programmers are familiar with trees and can easily visualize an XML document as one. The model works well for documents that are naturally hierarchical.
In many cases, however, a tree is overkill, and its main cost is memory. Imagine an XML file that is essentially flat but holds millions of records. Loading the whole document into a DOM tree consumes a lot of memory before the program has processed a single record, and walking every node adds to that pressure and slows the program down further.
In such cases, XML developers use an XML SAX API.
SAX stands for Simple API for XML. It is best illustrated through the analogy of a train passing by you. You first notice that a train is approaching. Then you see the engine (in XML parlance, the start tag), the bogies (the contents) and finally the last bogie, usually with a guard inside (the end tag).
XML parsing with the SAX API is very similar. The parser reads a document as a series of events and keeps only the current item in memory. This has a huge limitation: you cannot go back and forth over an XML document. However, it saves time and memory, which are very important in real-time scenarios. SAX is not a W3C recommendation; it evolved as a de facto standard through discussion on the XML-DEV mailing list.
Our popular author RS Ramaswamy has discussed SAX in earlier articles from a Java developer’s point of view. Hence, in this article, we will look at the XML SAX API from the .NET point of view. We will also use an open source language (Python) to understand some of the nuances.
Before we delve further, let us explore the Microsoft .NET Framework’s System.Xml namespace and write a few programs to familiarize ourselves with the concepts.
We will first create a simple XML file using the XML DOM model, since this is an idea that we know already.
The System.Xml namespace, together with its child namespaces, contains a large number of classes that help you build, read and parse XML documents using your favorite .NET language.
Here is a simple program that creates an XML file containing a NAME element and its child elements first_name and surname.
If you are familiar with the ideas of XML DOM model and know a bit of C#, you can easily understand code 1 given below.
Code 1
using System;
using System.Xml;

class Class1
{
    XmlDocument xmldoc;
    XmlNode xmlnode;
    XmlElement xmlelem1;
    XmlElement xmlelem2;
    XmlElement xmlelem3;
    XmlText xmltext;

    static void Main(string[] args)
    {
        Class1 app = new Class1();
    }

    public Class1() //constructor
    {
        xmldoc = new XmlDocument();
        //let's add the XML declaration section
        xmlnode = xmldoc.CreateNode(XmlNodeType.XmlDeclaration, "", "");
        xmldoc.AppendChild(xmlnode);
        //let's add the root element
        xmlelem1 = xmldoc.CreateElement("", "NAME", "");
        xmltext = xmldoc.CreateTextNode("The NAME is split into 2 elements");
        xmlelem1.AppendChild(xmltext);
        xmldoc.AppendChild(xmlelem1);
        //let's add another element (child of the NAME element)
        xmlelem2 = xmldoc.CreateElement("", "first_name", "");
        xmltext = xmldoc.CreateTextNode("Sherlock");
        xmlelem2.AppendChild(xmltext);
        xmlelem1.AppendChild(xmlelem2);
        //let's add another element (child of the NAME element)
        xmlelem3 = xmldoc.CreateElement("", "surname", "");
        xmltext = xmldoc.CreateTextNode("Holmes");
        xmlelem3.AppendChild(xmltext);
        xmlelem1.AppendChild(xmlelem3);
        //let's try to save the XML document in a file: C:\temp\xmldoc1.xml
        try
        {
            //verbatim string (@), so the backslashes are not read as escape sequences
            xmldoc.Save(@"C:\temp\xmldoc1.xml");
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
        Console.ReadLine();
    }
}
The code creates an XML file in the location C:\temp. The contents are seen in code 2. Change the code to suit your requirements.
Code 2
<?xml version="1.0"?>
<NAME>
    The NAME is split into 2 elements
    <first_name>Sherlock</first_name>
    <surname>Holmes</surname>
</NAME>
Back to SAX
As mentioned earlier, in the SAX model an XML document is not viewed as a tree data structure but as a stream of events generated by the parser. The kinds of events are:
· start of the document is encountered;
· end of the document is encountered;
· start tag of an element is encountered;
· end tag of an element is encountered;
· character data is encountered; and
· a processing instruction is encountered.
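Since this article also uses Python for some of the nuances, here is a minimal sketch of the event list above using the xml.sax module from Python's standard library. The EventLogger class, the collect_events function and the sample document (including its processing instruction) are assumptions made up for this illustration; only the callback names come from the SAX ContentHandler interface.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback as a tuple so the event order is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("start_document",))

    def endDocument(self):
        self.events.append(("end_document",))

    def startElement(self, name, attrs):
        self.events.append(("start_element", name))

    def endElement(self, name):
        self.events.append(("end_element", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("characters", content))

    def processingInstruction(self, target, data):
        self.events.append(("processing_instruction", target, data))


# Hypothetical sample document, modelled on the Name example in this article.
SAMPLE = (b"<?note draft?><Name><first_name>Sherlock</first_name>"
          b"<surname>Holmes</surname></Name>")


def collect_events(xml_bytes):
    """Parse the bytes once, front to back, and return the recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in collect_events(SAMPLE):
        print(event)
```

The parser calls the handler; the handler never asks for anything. That inversion is what keeps memory use flat, because nothing is retained unless the handler chooses to store it.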
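The .NET Framework does not ship a SAX parser as such. Its streaming classes follow a pull model, in which your code asks for the next node instead of receiving callbacks, but they share SAX's forward-only, non-cached character. To create and read XML files in this style, the .NET Framework normally uses the following classes: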
· XmlTextWriter - Represents a writer that provides a fast, non-cached, forward-only way of generating streams or files containing XML data that conform to the W3C Extensible Markup Language (XML) 1.0 and the Namespaces in XML recommendations; and
· XmlTextReader - Represents a reader that provides fast, non-cached and forward-only access to XML data.
There are other classes, but knowledge of these two is good enough to start off. Now, let us create an XML document very similar to the example used in the earlier program. Refer to code 3.
Code 3
using System;
using System.Xml;

class xmlDocument
{
    static void Main(string[] args)
    {
        xmlDocument xmlDoc = new xmlDocument();
        xmlDoc.CreateXmlDocument();
    }

    XmlTextWriter writer;

    public void CreateXmlDocument()
    {
        string xmlFilename = "Name.xml";
        writer = new XmlTextWriter(xmlFilename, System.Text.Encoding.UTF8);
        // You can indent the XML document for readability
        writer.Formatting = System.Xml.Formatting.Indented;
        // The first step: start the document
        writer.WriteStartDocument();
        // Write the root element - a document must contain at least a root element
        writer.WriteStartElement("Name");
        writer.WriteComment("The game is afoot");
        writer.WriteElementString("first_name", "Sherlock");
        writer.WriteElementString("surname", "Holmes");
        // Write the end of the root element
        writer.WriteEndElement();
        // Write the end of the document
        writer.WriteEndDocument();
        writer.Close();
    }
}
Let us dissect the code.
We create a class called xmlDocument, which has a function to create an XML document. The class holds an XmlTextWriter object called writer.
The writer object is instantiated in the function CreateXmlDocument. After that, the document is created in a very straightforward, linear way. First, on the writer object, you call the method that starts the document (WriteStartDocument).
In the next step, we create the root element using the method WriteStartElement.
Then you create a comment (WriteComment), followed by the child elements (WriteElementString), and then close the root element, Name (WriteEndElement). Finally, you end the document (WriteEndDocument) and close the writer (Close). Call the function from the Main method of the class and you will generate the document. It is that simple.
The document will have contents as in the code snippet given below:
<?xml version="1.0" encoding="utf-8"?>
<Name>
  <!--The game is afoot-->
  <first_name>Sherlock</first_name>
  <surname>Holmes</surname>
</Name>
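For comparison, the same linear, write-as-you-go style can be sketched in Python with XMLGenerator from the standard xml.sax.saxutils module, which turns SAX-style calls into XML text. This is only an illustrative analogue of the XmlTextWriter program, not a port of it: the function name write_name_document is made up for this example, the comment node is left out, and the output is not indented.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_name_document():
    """Emit the Name document as a forward-only series of write calls."""
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    writer.startDocument()              # writes the XML declaration
    writer.startElement("Name", {})     # open the root element, no attributes
    for tag, text in (("first_name", "Sherlock"), ("surname", "Holmes")):
        writer.startElement(tag, {})
        writer.characters(text)         # element text, escaped if needed
        writer.endElement(tag)
    writer.endElement("Name")           # close the root element
    writer.endDocument()
    return out.getvalue()


if __name__ == "__main__":
    print(write_name_document())
```

As with XmlTextWriter, each call appends to the output immediately, so nothing written earlier can be revisited.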
Now we will read XML documents using the XmlTextReader class mentioned earlier. XmlTextReader works much like a file reader, with added support for reading XML nodes in a structured manner, and is used to read a well-formed XML document. It derives from the base class XmlReader. Like the writer, XmlTextReader is sequential and forward-only, meaning that you cannot jump directly to an arbitrary node; you must read the nodes in order from the beginning of the file.
Here is a small program that reads the XML document created by code 3. It is tailored to that file, so you need to customize it according to the XML file you are reading. Check out code 4.
Code 4
using System;
using System.Xml;

public class ReadXML
{
    XmlTextReader reader;

    static void Main(string[] args)
    {
        ReadXML app = new ReadXML();
        app.ReadXmlDocument();
    }

    public void ReadXmlDocument()
    {
        reader = new XmlTextReader("Name.xml");
        // Skip the XML declaration and move to the root element
        reader.MoveToContent();
        // Report the attributes of the root element, if it has any
        if (reader.NodeType == XmlNodeType.Element &&
            reader.Name == "Name" &&
            reader.AttributeCount > 0)
        {
            WriteToFile("Name has " + reader.AttributeCount + " attribute(s)");
        }
        // Consume the <Name> start tag; throws if the current element is not Name
        reader.ReadStartElement("Name");
        // ReadElementString skips comments and white space before the element
        WriteToFile("First Name : " + reader.ReadElementString("first_name"));
        WriteToFile("Surname : " + reader.ReadElementString("surname"));
        // Consume the </Name> end tag
        reader.ReadEndElement();
        reader.Close();
    }

    private void WriteToFile(string contents)
    {
        System.Console.WriteLine(contents);
    }
}
The above method prints the elements of the sample file Name.xml to the console. The logic mirrors that of the XmlTextWriter example: the calls follow the order of the nodes in the document. You therefore need to understand the structure of the XML document to write a parser in this way.
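The same dependence on document structure shows up in a true SAX parser. Here is a minimal Python sketch, using the standard xml.sax module, that extracts the two name fields from a document shaped like Name.xml. The NameHandler class, the read_name function and the inline sample are assumptions made for this illustration. Because SAX cannot look back, the handler has to remember which element it is currently inside.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Keeps only the current element name and the fields of interest."""

    def __init__(self):
        super().__init__()
        self.current = None   # name of the field being read, if any
        self.fields = {}

    def startElement(self, name, attrs):
        if name in ("first_name", "surname"):
            self.current = name
            self.fields[name] = ""

    def characters(self, content):
        # Text may arrive in more than one chunk, so append rather than assign.
        if self.current is not None:
            self.fields[self.current] += content

    def endElement(self, name):
        if name == self.current:
            self.current = None


def read_name(xml_bytes):
    """Return a dict with the first_name and surname found in the document."""
    handler = NameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.fields


# Inline copy of a document shaped like Name.xml (assumed for this sketch).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<Name>
  <!--The game is afoot-->
  <first_name>Sherlock</first_name>
  <surname>Holmes</surname>
</Name>"""

if __name__ == "__main__":
    fields = read_name(SAMPLE)
    print("First Name : " + fields["first_name"])
    print("Surname : " + fields["surname"])
```

Compared with code 4, the order of the elements no longer matters here, but the handler still has to know the element names in advance.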
