Working with XML in Microsoft .NET
Introduction
Nowadays XML is a widespread format for storing and processing data. It is not
just "one more feature" for Web pages: it is a good means of storing data and
transferring it between applications or between parts of the same application.
No wonder Microsoft .NET provides a variety of ways to work with .xml files. For
instance, the DataSet object includes methods that load the contents of an .xml
file into its tables and save those tables to an .xml file. The .NET class
libraries also include specific classes for parsing .xml files.
Normally, XML parsers use either the Document Object Model (DOM) or the Simple API
for XML (SAX). DOM parsers read the whole document and build a tree of its nodes
in memory. They provide random access to the elements of an .xml file and let
developers both read and modify the document.
SAX provides forward-only, read-only access to .xml files. However, it typically
uses less memory and works faster than DOM, because it does not keep the whole
document in memory.
SAX is an event-driven push parser: as it reads an XML document, it generates
events such as the start of an element, the end of an element, or a run of
character data. To handle these events, an application implements an
event-handler interface whose methods the parser calls for the corresponding events.
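.NET does not include a SAX parser, so the following C# sketch only imitates the push model to show how control flows. IXmlEventHandler, PushParser and RecordingHandler are hypothetical names, not .NET or SAX APIs; the "parser" is built on XmlReader and calls the handler's methods whenever it meets a start tag, an end tag or a piece of text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical SAX-style handler interface: the parser calls these methods.
public interface IXmlEventHandler
{
    void StartElement(string name);
    void EndElement(string name);
    void Characters(string text);
}

// Hypothetical push parser, built on XmlReader for illustration only.
public static class PushParser
{
    public static void Parse(string xml, IXmlEventHandler handler)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        handler.StartElement(reader.Name);
                        // <c/> produces no EndElement node, so report it here
                        if (reader.IsEmptyElement)
                            handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        handler.Characters(reader.Value);
                        break;
                }
            }
        }
    }
}

// Hypothetical handler that simply records the events it receives.
public class RecordingHandler : IXmlEventHandler
{
    public readonly List<string> Events = new List<string>();
    public void StartElement(string name) { Events.Add("start " + name); }
    public void EndElement(string name) { Events.Add("end " + name); }
    public void Characters(string text) { Events.Add("text " + text); }
}
```

The application never asks for the next node; it only reacts when the parser calls it, which is the essence of the push model.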
An alternative to the push model is the pull model. A pull parser does not "push" events to the application for processing. Instead, the application pulls data from the parser by calling the parser's methods, for example a method that moves to the next node.
In many cases the pull model is more convenient than the push model, because the application, rather than the parser, controls the parsing loop. The .NET XmlReader class uses the pull model.
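By contrast, here is a minimal pull-model sketch with XmlReader. The PullExample class and ReadIds method are hypothetical names. The application itself decides when to advance and, when it meets an <ID> element, reads that element's content on the spot.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class PullExample
{
    // Collects the text of every <ID> element in the document.
    public static List<string> ReadIds(string xml)
    {
        var ids = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "ID")
                    // Reads the text and moves past </ID> in one call
                    ids.Add(reader.ReadElementContentAsString());
                else
                    // The application asks for the next node itself
                    reader.Read();
            }
        }
        return ids;
    }
}
```

Because ReadElementContentAsString already moves past the end tag, the loop calls Read only for other nodes; calling Read unconditionally would skip the node that follows each <ID>.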
This article describes how to use .NET objects to process .xml files. It also
presents the results of timing the performance of these objects.
We will work with an .xml file, dataset.xml, which is generated from a Microsoft
SQL Server table named Test. The table has an identity column, ID, and three data
columns: Text, VarChar and Data.
We used the following code to fill this table with data:
using System;
using System.Data;
using System.Data.SqlClient;
. . .
private void FillTableWithData(int NumberOfRecords)
{
    // Creates a connection to the database and an INSERT command.
    // Note that since the ID column is an identity one, we don't fill it.
    using (SqlConnection connection = new SqlConnection("server=(local);" +
        "database=MyTestDB;Trusted_Connection=yes"))
    using (SqlCommand command = new SqlCommand(
        "INSERT INTO Test (Text, [VarChar], Data) VALUES (@Text, @VarChar, @Data)",
        connection))
    {
        // Declares the parameters once; only their values change per row
        SqlParameter text = command.Parameters.Add("@Text", SqlDbType.Text);
        SqlParameter varChar = command.Parameters.Add("@VarChar", SqlDbType.VarChar, 100);
        SqlParameter data = command.Parameters.Add("@Data", SqlDbType.Decimal);
        connection.Open();
        // Adds records to the table
        for (int i = 0; i < NumberOfRecords; i++)
        {
            // Fills the Text, VarChar and Data columns with data
            text.Value = "Text for row #" + i;
            varChar.Value = "VarChar text for row #" + i;
            data.Value = i / 0.54257;
            command.ExecuteNonQuery();
        }
    }
    // The using blocks close the command and the connection
}
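After this procedure runs, the Test table contains NumberOfRecords rows. For row #i, the Text column holds "Text for row #i", the VarChar column holds "VarChar text for row #i", and the Data column holds a decimal value computed as i / 0.54257.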
The following code loads the Test table into a DataSet and saves the dataset to an .xml file:
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
. . .
private void GenerateXML()
{
    // Creates a new connection to the same database FillTableWithData uses
    using (SqlConnection connection = new SqlConnection("server=(local);" +
        "database=MyTestDB;Trusted_Connection=yes"))
    {
        // Creates a new adapter
        SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM [Test]",
            connection);
        // Creates a new dataset and fills it with data through the adapter.
        // Fill opens the connection and closes it again when it is done.
        DataSet data = new DataSet("Test");
        adapter.Fill(data);
        // Exports the dataset to an .xml file without the XML schema
        using (FileStream stream = File.Create("dataset.xml"))
        {
            data.WriteXml(stream, XmlWriteMode.IgnoreSchema);
        }
    }
}
The resultant dataset.xml looks as follows:
<Test>
<Table>
<ID>1</ID>
<Text>Text for row #0</Text>
<VarChar>VarChar text for row #0</VarChar>
<Data>0</Data>
</Table>
<Table>
<ID>2</ID>
<Text>Text for row #1</Text>
<VarChar>VarChar text for row #1</VarChar>
<Data>2</Data>
</Table>
. . .
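To see where the <Test> and <Table> element names come from without a running SQL Server, the sketch below builds a similar DataSet in memory and writes it with the same XmlWriteMode.IgnoreSchema option. The DataSet name "Test" becomes the root element, and "Table" is the default name SqlDataAdapter.Fill gives the table it fills. The DataSetXmlShape class, the column types and the placeholder Data values are assumptions for illustration, not the real table definition.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class DataSetXmlShape
{
    // Builds an in-memory DataSet shaped like the one GenerateXML fills
    // and returns the XML that WriteXml produces for it.
    public static string BuildXml(int rows)
    {
        DataSet data = new DataSet("Test");
        DataTable table = data.Tables.Add("Table");
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Text", typeof(string));
        table.Columns.Add("VarChar", typeof(string));
        table.Columns.Add("Data", typeof(decimal)); // assumed column type
        for (int i = 0; i < rows; i++)
            // Data holds placeholder values, not i / 0.54257
            table.Rows.Add(i + 1, "Text for row #" + i,
                "VarChar text for row #" + i, (decimal)i);
        using (StringWriter writer = new StringWriter())
        {
            data.WriteXml(writer, XmlWriteMode.IgnoreSchema);
            return writer.ToString();
        }
    }
}
```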
Now we'll describe how to parse this file using different .NET objects.
XmlReader
Like SAX parsers, XmlReader provides forward-only, read-only access to .xml files. Unlike SAX parsers, however, it uses the pull model. Note that XmlReader is an abstract class (MustInherit in Visual Basic), so to parse .xml files we use its descendant, XmlTextReader. The following procedure parses an .xml file and saves its contents to a .txt file.
using System;
using System.IO;
using System.Xml;
. . .
private void CalcXMLReader()
{
    // Opens the source file and creates the output file
    using (FileStream stream = File.Open("test.xml", FileMode.Open))
    using (StreamWriter writer = new StreamWriter(File.Create("output.txt")))
    // Creates a new instance of the XmlTextReader object
    using (XmlReader reader = new XmlTextReader(stream))
    {
        // Reads the nodes one by one and processes each of them
        while (reader.Read())
        {
            // Skips insignificant whitespace between elements
            if (reader.NodeType == XmlNodeType.Whitespace)
                continue;
            if (reader.HasValue)
            {
                // Writes the node value to the output file
                writer.Write(reader.Value);
            }
            else if (reader.NodeType != XmlNodeType.EndElement)
            {
                // Writes the node name to the output file
                writer.Write("\n" + reader.Name + ": ");
            }
            // Writes node attributes to the file
            if (reader.HasAttributes)
            {
                writer.Write("\nAttributes:\n");
                reader.MoveToFirstAttribute();
                for (int i = 0; i < reader.AttributeCount; i++)
                {
                    writer.Write("\t" + reader.Name + "=" + reader.Value + "\n");
                    reader.MoveToNextAttribute();
                }
                // Returns from the last attribute to the element that owns it
                reader.MoveToElement();
            }
        }
    }
}
The CalcXMLReader function outputs the node's name, its attributes
and value for each node in the parsed .xml file.
First, we will run this routine for a simple file, test.xml:
<test name="XML Test" version="1.0">
<node name="node_1">
<inner_node name="inner">Sample text</inner_node><self_closing/>
</node>
</test>
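In the resulting output.txt, each element name is followed by a colon, each element's attributes are listed under an "Attributes:" line (for <test>, these are name=XML Test and version=1.0), and the text value Sample text appears after the attributes of <inner_node>.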
XmlTextReader has several constructors for different sources of XML data, including a Stream, a TextReader, and a URL or file name. In our example, we use the Stream constructor:
FileStream stream = File.Open("test.xml", FileMode.Open);
XmlReader reader = new XmlTextReader(stream);
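Note that the XmlTextReader constructor that takes a string expects a URL or file name, not XML text. XML that is already held in a string can be wrapped in a StringReader and passed to the TextReader constructor, as in this sketch (the StringSourceExample class is a hypothetical name):

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class StringSourceExample
{
    // Returns the name of the root element of XML held in a string.
    public static string FirstElementName(string xml)
    {
        // new XmlTextReader(xml) would treat xml as a URL, so wrap it instead
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // MoveToContent skips the declaration, comments and whitespace
            reader.MoveToContent();
            return reader.Name;
        }
    }
}
```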
To read XML nodes from the stream, we call the XmlReader.Read method, which XmlTextReader overrides:
[C#]
public abstract bool Read();
[Visual Basic]
MustOverride Public Function Read() As Boolean
[C++]
public: virtual bool Read() = 0;
[JScript]
public abstract function Read() : Boolean;
Read advances the reader to the next node. It returns true if the node was read successfully and false when there are no more nodes to read. When the reader is created, it is positioned before the first node, so you must call Read at least once to move to the first node.
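The following sketch makes this contract visible: it records the node type before the first call to Read, which is XmlNodeType.None, and counts element nodes until Read returns false. The ReadLoopExample class is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class ReadLoopExample
{
    public static int CountElements(string xml, out XmlNodeType initialNodeType)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Before the first Read the reader is not positioned on any node
            initialNodeType = reader.NodeType;
            // Read returns false once there are no more nodes
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        return count;
    }
}
```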
To get the type of the current node, we check the NodeType property, which returns one of the values of the XmlNodeType enumeration. If the node is whitespace, we skip to the next node:
. . .
if (reader.NodeType == XmlNodeType.Whitespace)
continue;
. . .
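As an alternative to skipping Whitespace nodes by hand, a reader created with XmlReader.Create can be told to drop insignificant whitespace through XmlReaderSettings.IgnoreWhitespace (XmlTextReader offers a similar WhitespaceHandling property). This sketch, with a hypothetical WhitespaceExample class, counts the Whitespace nodes reported with and without that setting.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class WhitespaceExample
{
    // Returns how many Whitespace nodes the reader reports for the input
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                    count++;
            }
        }
        return count;
    }
}
```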
If the current node has a value (for example, a text node), HasValue returns true, and we write this value to the .txt file:
. . .
if (reader.HasValue)
writer.Write(reader.Value);
. . .
The XmlReader.Name property returns the name of the current node. We write the node's name when we reach its opening tag. Element and end-element nodes (the opening and closing tags) have no values, which is why the code that writes the node's name goes in the else branch and skips EndElement nodes:
if (reader.HasValue)
. . .
else
if (reader.NodeType != XmlNodeType.EndElement )
writer.Write("\n" + reader.Name + ": ");
. . .
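A node can have several attributes. To access them, we use the HasAttributes and AttributeCount properties, the MoveToFirstAttribute and MoveToNextAttribute methods, and the Name and Value properties of the attribute on which the reader is positioned.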
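The loop in CalcXMLReader walks the attributes by index. The sketch below shows two other ways to reach them: GetAttribute reads a single attribute by name, and MoveToNextAttribute iterates over all of them, after which MoveToElement returns the reader to the element. The AttributeExample class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class AttributeExample
{
    // Lists "name=value" for every attribute of the root element and
    // returns the value of its version attribute through the out parameter.
    public static List<string> ReadAttributes(string xml, out string version)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                    // root element
            version = reader.GetAttribute("version");  // null if absent
            // On an element, the first call moves to the first attribute
            while (reader.MoveToNextAttribute())
                result.Add(reader.Name + "=" + reader.Value);
            reader.MoveToElement();                    // back to the element
        }
        return result;
    }
}
```", out version);
if (attrs.Count != 2 || attrs[0] != "name=XML Test" || attrs[1] != "version=1.0") throw new Exception("unexpected attributes");
if (version != "1.0") throw new Exception("unexpected version");
</test>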
In addition to the methods and properties mentioned above, XmlReader contains the ReadStartElement and ReadEndElement methods, which read nodes by their names. To use these methods, you must know the structure of the .xml file being processed. For more information on them, see MSDN (the on-line version is available at www.msdn.microsoft.com).
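When the structure is known, these methods can read dataset.xml-style records directly. The sketch below assumes that every <Table> element lists its ID, Text, VarChar and Data children in exactly that order; it also uses IsStartElement and ReadElementString, two related XmlReader methods. The KnownStructureReader class and the "|"-joined record format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class KnownStructureReader
{
    // Reads <Test><Table>...</Table>...</Test>, assuming a fixed child order
    public static List<string> ReadRecords(string xml)
    {
        var records = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement("Test");
            // IsStartElement skips whitespace and checks for <Table>
            while (reader.IsStartElement("Table"))
            {
                reader.ReadStartElement("Table");
                string id = reader.ReadElementString("ID");
                string text = reader.ReadElementString("Text");
                string varChar = reader.ReadElementString("VarChar");
                string data = reader.ReadElementString("Data");
                reader.ReadEndElement(); // </Table>
                records.Add(id + "|" + text + "|" + varChar + "|" + data);
            }
            reader.ReadEndElement(); // </Test>
        }
        return records;
    }
}
```

If an element appears out of the expected order, these methods throw an XmlException, which is the price of relying on a fixed structure.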
Now that we have parsed test.xml, let's process dataset.xml, the file with our data. Only one line of the code needs to change. Instead of
FileStream stream = File.Open("test.xml", FileMode.Open);
use
FileStream stream = File.Open("dataset.xml", FileMode.Open);
The nodes of dataset.xml have no attributes, so the code that processes attributes is never executed while dataset.xml is being parsed. For dataset.xml, the CalcXMLReader function produces the following output:
Test:
Table:
ID: 1
Text: Text for row #0
VarChar: VarChar text for row #0
Data: 0
Table:
ID: 2
Text: Text for row #1
VarChar: VarChar text for row #1
Data: 2
. . .
XmlDocument
The XmlDocument object works with .xml files using the Document Object Model (DOM): it implements the W3C DOM Level 1 Core and DOM Level 2 Core recommendations. XmlDocument contains methods for reading and writing XML documents and for navigating the DOM tree. The following code illustrates the use of XmlDocument:
using System;
using System.IO;
using System.Xml;
. . .
private void CalcXMLDocument()
{
    // Opens the input file and creates the output file
    using (FileStream stream = File.Open("dataset.xml", FileMode.Open))
    using (StreamWriter writer = new StreamWriter(File.Create("output2.txt")))
    {
        // Creates a new XmlDocument object and
        // fills it with data
        XmlDocument document = new XmlDocument();
        document.Load(stream);
        // Gets a list of the <Table> elements that are
        // children of the root <Test> element
        XmlNodeList nodes = document.SelectNodes("Test/Table");
        // Passes through all the nodes in the list
        foreach (XmlNode node in nodes)
        {
            writer.WriteLine("Record:");
            // Scans the child nodes and outputs them to the file
            for (int i = 0; i < node.ChildNodes.Count; i++)
                writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name,
                    node.ChildNodes[i].InnerText);
            // Processes the attributes of the <Table> node, if any
            if (node.Attributes.Count > 0)
            {
                writer.WriteLine("\n\tAttributes:\n");
                for (int j = 0; j < node.Attributes.Count; j++)
                    writer.WriteLine("\tAttribute:{0} = [{1}]",
                        node.Attributes.Item(j).Name, node.Attributes.Item(j).InnerText);
            }
        }
    }
}
For dataset.xml the program produces the following output:
Record:
ID: 1
Text: Text for row #0
VarChar: VarChar text for row #0
Data: 0
Record:
ID: 2
Text: Text for row #1
VarChar: VarChar text for row #1
Data: 2
Record:
ID: 3
Text: Text for row #2
VarChar: VarChar text for row #2
Data: 5
. . .
Like XmlTextReader, XmlDocument can process data from streams, strings or URLs. It has two methods that open the needed source, Load and LoadXml. Load reads data from a stream, a URL or file name, a TextReader or an XmlReader; LoadXml parses XML held in a string. Because we process data from a stream, we use Load:
XmlDocument document = new XmlDocument();
document.Load(stream);
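For XML held in a string, LoadXml replaces Load. In this sketch (the LoadXmlExample class is a hypothetical name), the document is loaded from a string, and SelectNodes collects the text of every <Text> element inside a <Table> record.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class LoadXmlExample
{
    public static List<string> TableTexts(string xml)
    {
        XmlDocument document = new XmlDocument();
        // LoadXml parses the string itself; Load would treat a string as a URL
        document.LoadXml(xml);
        var texts = new List<string>();
        foreach (XmlNode node in document.SelectNodes("Test/Table/Text"))
            texts.Add(node.InnerText);
        return texts;
    }
}
```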
After we open the source, we call SelectNodes:
XmlNodeList nodes = document.SelectNodes("Test/Table");
SelectNodes returns a collection of the nodes that match the specified XPath expression. "Test/Table" selects the <Table> elements that are children of the root <Test> element. The XmlNodeList collection consists of XmlNode objects, each of which corresponds to a node of the .xml file being parsed (in our case, dataset.xml).
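SelectNodes and SelectSingleNode accept full XPath expressions, so a predicate can pick out a single record. The sketch below, with a hypothetical XPathExample class, finds the <Table> record with a given ID and reads its <Text> child through the XmlNode string indexer, which returns the first child element with that name.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XPathExample
{
    // Returns the <Text> value of the record whose <ID> equals id,
    // or null if there is no such record.
    public static string TextForId(XmlDocument document, int id)
    {
        XmlNode table = document.SelectSingleNode("Test/Table[ID='" + id + "']");
        return table == null ? null : table["Text"].InnerText;
    }
}
```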
We process each Table node using the following code:
foreach (XmlNode node in nodes)
{
writer.WriteLine("Record:");
for (int i = 0; i < node.ChildNodes.Count; i++)
writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name,
node.ChildNodes[i].InnerText);
}
Because XmlDocument is a DOM parser, it treats each <Table> node as a parent node, and the <ID>, <Text>, <VarChar> and <Data> nodes as its children. To access them, we use the ChildNodes collection:
for (int i = 0; i < node.ChildNodes.Count; i++)
writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name,
node.ChildNodes[i].InnerText);
The Name property returns the name of a child node (ID, Text,
VarChar or Data). InnerText returns the text of the child node; for these
elements, which contain only text, that is their value.
To process node attributes, we use the XmlNode.Attributes property.
It returns a collection of all the attributes of a node (an XmlAttributeCollection
object). Note that in our case this code is not executed, since the nodes
of the dataset.xml file have no attributes:
if (node.Attributes.Count > 0 )
{
writer.WriteLine("\n\tAttributes:\n");
for(int j = 0; j < node.Attributes.Count; j++)
writer.WriteLine("\tAttribute:{0} = [{1}]",
node.Attributes.Item(j).Name, node.Attributes.Item(j).InnerText);
}
The XmlAttributeCollection.Count property returns the total number of the node's attributes. To obtain an attribute, we call XmlAttributeCollection.Item. To get the name and value of an attribute we use the Name and InnerText properties.
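On an element node, attributes can also be read by name rather than by position. This sketch (the DomAttributeExample class and its "|"-joined result are hypothetical) uses XmlElement.GetAttribute, which returns an empty string for a missing attribute, and the XmlAttributeCollection string indexer, which returns null.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class DomAttributeExample
{
    // Returns "name|version|attributeCount" for the root element
    public static string Describe(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        XmlElement root = document.DocumentElement;
        // Empty string if the attribute is missing
        string name = root.GetAttribute("name");
        // null if the attribute is missing
        XmlAttribute version = root.Attributes["version"];
        return name + "|" + (version == null ? "(none)" : version.Value)
            + "|" + root.Attributes.Count;
    }
}
```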
DataSet
One way to process data stored in an .xml file is to load the file into a dataset and then work with the data in the dataset's tables. When processing is over, the dataset records can be exported back to an .xml file. The following code illustrates the use of the DataSet object for processing .xml files:
using System;
using System.Data;
using System.IO;
. . .
private void CalcDataset()
{
    // Opens the input file and creates the output file
    using (FileStream stream = File.Open("dataset.xml", FileMode.Open))
    using (StreamWriter writer = new StreamWriter(File.Create("output3.txt")))
    {
        // Creates a new DataSet object and
        // fills it with data from the .xml file
        DataSet dataset = new DataSet();
        dataset.ReadXml(stream);
        // Processes the records of every table in the dataset
        foreach (DataTable table in dataset.Tables)
        {
            foreach (DataRow row in table.Rows)
            {
                // Outputs data to the file
                writer.WriteLine("Record: ");
                writer.WriteLine("ID: {0}", row["ID"]);
                writer.WriteLine("Text: {0}", row["Text"]);
                writer.WriteLine("VarChar: {0}", row["VarChar"]);
                writer.WriteLine("Data: {0}", row["Data"]);
            }
        }
    }
    // The using blocks close the files
}
For dataset.xml this program outputs the following:
Record:
ID: 1
Text: Text for row #0
VarChar: VarChar text for row #0
Data: 0
Record:
ID: 2
Text: Text for row #1
VarChar: VarChar text for row #1
Data: 2
. . .
To fill a dataset with data from an .xml file, we call the DataSet.ReadXml method. This method is overloaded, so it can load data from a Stream, a file name, a TextReader or an XmlReader. Once the data is loaded, we can work with it through the dataset's tables, rows and columns; here, we walk through the dataset records and save them to a .txt file.
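Because ReadXml accepts a TextReader, XML held in a string can be loaded through a StringReader. When no schema is supplied, the DataSet infers one: in the sketch below (the ReadXmlExample class is a hypothetical name), the repeated <Table> elements become a DataTable named Table, and their child elements become string columns.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class ReadXmlExample
{
    public static DataSet Load(string xml)
    {
        DataSet dataset = new DataSet();
        // No schema is given, so ReadXml infers tables and columns
        dataset.ReadXml(new StringReader(xml));
        return dataset;
    }
}
```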
The DataSet object has other methods for working with XML, for instance ReadXmlSchema, WriteXml, WriteXmlSchema and GetXml. For more information on them, see MSDN (www.msdn.microsoft.com).
Profiling
To measure the performance of the XmlReader, XmlDocument and DataSet objects, we created several tables with different numbers of records and timed each routine with AQtime .NET Edition. The results are summarized below.
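AQtime is the profiler used for these measurements. For a rough, tool-independent check, a Stopwatch-based sketch like the one below can time any of the routines above; the Timing class and AverageMilliseconds method are hypothetical names, and such wall-clock numbers depend on the machine, the file size and the .NET version, so they are no substitute for a profiler.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class, for illustration only.
public static class Timing
{
    // Runs the action once to warm up, then returns the average
    // wall-clock time of the following runs in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        action(); // warm-up run, so JIT compilation is not measured
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            action();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / runs;
    }
}
```

Inside the form class, a call such as Timing.AverageMilliseconds(CalcXMLDocument, 10) would time the XmlDocument routine.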
Summary
As we can see, XmlReader is faster than XmlDocument
and DataSet. However, it provides only forward-only, read-only access
to .xml files.
XmlDocument is two to four times slower than XmlReader.
However, XmlDocument provides random access to the elements of an .xml
file and lets you both read and modify XML documents.
DataSet is the slowest of the three. Still, it can be a convenient tool for processing .xml files that store data from a relational database.
We've demonstrated how the XML parser performance depends on the number of
records. Using AQtime .NET Edition you can do your own performance testing and
find some interesting dependencies.