AutomatedQA: Award-winning tools for software development and quality assurance

Working with XML in Microsoft .NET

Introduction

XML is now a widespread format for storing and processing data. It is more than "one more feature" for Web pages: it is a convenient means of storing data and of transferring it between applications or between parts of the same application. It is no wonder that Microsoft .NET provides a variety of ways to work with .xml files. For instance, the DataSet object includes methods that load the contents of an .xml file into a dataset table or save the table to an .xml file. The .NET libraries also include classes designed specifically for parsing .xml files.

Normally, XML parsers use either the Document Object Model (DOM) or the Simple API for XML (SAX). DOM parsers read the whole document and build a tree that represents it in memory. They provide random access to the elements of an .xml file and allow developers both to read and to modify its contents.

SAX provides read-only access to .xml files. However, it uses less memory and works faster than DOM.

SAX is an event-driven push parser. As it reads an XML document, it generates specific events (for instance, when it meets an opening tag, a closing tag or text). To handle these events, an application must implement an event-handler interface whose methods are called for the appropriate events.

An alternative to the push model is the pull model. A pull parser does not "push" events to the application for processing. Instead, the application pulls data from the parser by calling methods of the parser object.

In many cases the pull model is more convenient than the push one, because the application itself decides when to read the next node and can skip the data it does not need.
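
The difference between the two models can be sketched in C#. Note the assumptions: .NET has no built-in SAX parser, so the IXmlEvents interface and the EventLog and PushPullDemo classes below are hypothetical names invented for this illustration, and the push side is simulated on top of XmlTextReader (which is itself a pull parser).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical SAX-style callback interface (not part of .NET).
public interface IXmlEvents
{
    void StartElement(string name);
    void Characters(string text);
    void EndElement(string name);
}

// Records every event it receives, to show what a push parser delivers.
public class EventLog : IXmlEvents
{
    public readonly List<string> Events = new List<string>();
    public void StartElement(string name) { Events.Add("start " + name); }
    public void Characters(string text)   { Events.Add("text " + text); }
    public void EndElement(string name)   { Events.Add("end " + name); }
}

public static class PushPullDemo
{
    // Push model: the parser owns the loop and calls the handler.
    public static void Push(string xml, IXmlEvents handler)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        handler.StartElement(reader.Name);
                        // A self-closing tag produces no separate EndElement node.
                        if (reader.IsEmptyElement)
                            handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                        handler.Characters(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                }
            }
        }
    }

    // Pull model: the application owns the loop and asks for the next node.
    public static List<string> PullElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
        }
        return names;
    }
}
```

In Push the application only supplies callbacks, while in PullElementNames the calling code decides which nodes to look at and could stop reading at any point.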

This article describes how to use .NET objects to process .xml files and compares the performance of these objects.

We will work with an .xml file, dataset.xml, which is generated from the following table (MS SQL Server):

Field    Type
ID       INTEGER (the identity column)
Text     TEXT
VarChar  VARCHAR(100)
Data     DECIMAL

We used the following code to fill this table with data:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;

 . . .

private void FillTableWithData(int NumberOfRecords)
{
  // Creates a connection to the database
  // (the same XMLTest database that GenerateXML reads from)
  SqlConnection connection = new SqlConnection("server=(local);"+
                      "database=XMLTest;Trusted_Connection=yes");

  SqlCommand command = new SqlCommand();
  command.Connection = connection;

  connection.Open();

  // Adds records to the table
  for (int i = 0; i < NumberOfRecords; i++)
  {
    // Fills the Text, VarChar and
    // Data columns with data.
    // Note that since the ID column is an identity one,
    // we don't fill it.
    command.CommandText = "INSERT INTO Test (Text,[VarChar], Data)"+
                     " VALUES (@Text, @VarChar, @Data)";
    command.Parameters.Clear();
    command.Parameters.Add("@Text",SqlDbType.Text).Value="Text for row #"+ i;
    command.Parameters.Add("@VarChar",SqlDbType.VarChar,100).Value=
                     "VarChar text for row #" + i;
    command.Parameters.Add("@Data", SqlDbType.Decimal).Value = i / 0.54257;
    command.ExecuteNonQuery();
  }
  connection.Close();
}

Upon running this procedure, the table holds the following data:

ID   Text             VarChar                  Data
1    Text for row #0  VarChar text for row #0  0
2    Text for row #1  VarChar text for row #1  2
...  ...              ...                      ...

The following code saves the dataset to an .xml file:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;

 . . .

private void GenerateXML()
{
  // Creates a new connection to the database
  SqlConnection connection = new SqlConnection("server=(local);"+
                          "database=XMLTest;Trusted_Connection=yes");
  connection.Open();

  // Creates a new adapter
  SqlDataAdapter adapter= new SqlDataAdapter("SELECT * FROM [Test]",
                           connection);
  // Creates a new dataset and 
  // fills it with data through the adapter
  DataSet data = new DataSet("Test");
  adapter.Fill(data);

  connection.Close();

  // Exports a dataset to an .xml file
  FileStream stream = File.Create("dataset.xml");
  data.WriteXml(stream, XmlWriteMode.IgnoreSchema);
  stream.Close();
}

The resultant dataset.xml looks as follows:

<Test>
<Table>
<ID>1</ID>
<Text>Text for row #0</Text>
<VarChar>VarChar text for row #0</VarChar>
<Data>0</Data>
</Table>
<Table>
<ID>2</ID>
<Text>Text for row #1</Text>
<VarChar>VarChar text for row #1</VarChar>
<Data>2</Data>
</Table>
. . .

Now we'll describe how to parse this file using different .NET objects.

 

XmlReader

Like SAX parsers, XmlReader provides forward-only, read-only access to .xml files. However, unlike SAX parsers, it uses the pull model. Note that XmlReader is an abstract class (MustInherit in VB). To parse .xml files we will use its descendant, XmlTextReader. The following procedure parses an .xml file and saves its contents to a .txt file.

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.IO;
using System.Xml;
 . . .

private void CalcXMLReader()
{
  // Opens the source and output files as streams
  FileStream stream = File.Open("test.xml", FileMode.Open);
  FileStream output = File.Create("output.txt");
  StreamWriter writer = new StreamWriter(output);

  // Creates a new instance of the XmlTextReader object
  XmlReader reader = new XmlTextReader(stream);

  // Reads the current element and processes it
  while (reader.Read())
  {
    if (reader.NodeType == XmlNodeType.Whitespace)
      continue;

    if (reader.HasValue)
      // Writes the node value to the output file
      writer.Write(reader.Value);
    else 
      if (reader.NodeType != XmlNodeType.EndElement )
        // Writes the node name to the output file
        writer.Write("\n" + reader.Name + ": ");

    // Writes node attributes to the file
    if (reader.HasAttributes)
    {
      writer.Write("\nAttributes:\n");
      reader.MoveToFirstAttribute();
      for (int i = 0; i < reader.AttributeCount; i++)
      {
        writer.Write("\t" + reader.Name + "=" + reader.Value + "\n");
        reader.MoveToNextAttribute();
      }
    }
  }

  reader.Close();
  writer.Close();

  stream.Close();
  output.Close();
}

The CalcXMLReader function outputs the node's name, its attributes and value for each node in the parsed .xml file.

First, we will run this routine for a simple file, test.xml:

<test name="XML Test" version="1.0">
<node name="node_1">
<inner_node name="inner">Sample text</inner_node><self_closing/>
</node>
</test>

The resultant .txt file holds the following data:

test: <-- node name
Attributes: <-- node attributes
     name=XML Test  
     version=1.0  
  <-- node value (the <test> node has no value, so the function outputs an empty string)
node:  
Attributes:  
     name=node_1  
   
inner_node:  
Attributes:  
     name=inner  
Sample text  
self_closing:  

XmlTextReader has several constructors that accept different sources of XML data: a Stream, a TextReader, or a string that holds the file name or URL of the document. In our example, we used the stream constructor:

FileStream stream = File.Open("test.xml", FileMode.Open);
XmlReader reader = new XmlTextReader(stream);
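
The other constructors can be used in the same way. The following is a minimal sketch; the ReaderSources class and its method names are made up for this example. Note that a string passed directly to the constructor is treated as a file name or URL, so XML text held in a string is wrapped in a StringReader first.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReaderSources
{
    // Returns the name of the first element the reader meets, then closes it.
    private static string FirstElementName(XmlTextReader reader)
    {
        try
        {
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.Name;
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    // XML text held in a string: use the TextReader constructor.
    public static string FromText(string xml)
    {
        return FirstElementName(new XmlTextReader(new StringReader(xml)));
    }

    // Any Stream (a file, memory, a network response and so on).
    public static string FromStream(Stream stream)
    {
        return FirstElementName(new XmlTextReader(stream));
    }

    // A string argument is interpreted as a file name or URL.
    public static string FromFile(string path)
    {
        return FirstElementName(new XmlTextReader(path));
    }
}
```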

To read XML nodes from the stream, we call the XmlTextReader.Read method:

[C#]
public abstract bool Read();

[Visual Basic]
MustOverride Public Function Read() As Boolean

[C++]
public: virtual bool Read() = 0;

[JScript]
public abstract function Read() : Boolean; 

Read moves the cursor to the next node in the file and makes it the current node. If the node was read successfully, Read returns true; if there are no more nodes to read, it returns false. Upon opening an .xml file the current node is not defined (the cursor is before the first node), so you must call Read at least once to move to the first node.

To get the type of the current node, we read the NodeType property. (NodeType can be one of the values of the XmlNodeType enumeration.) If the node is of the Whitespace type, we move to the next node:

. . .
if (reader.NodeType == XmlNodeType.Whitespace)
  continue;
. . .

If the current node can have a value (for instance, it is a text node), the HasValue property returns true and we output this value to the .txt file using the following code:

. . .
if (reader.HasValue)
  writer.Write(reader.Value);
. . .

The name of the current node is specified by the XmlReader.Name property. We output the node's name when we meet the node's opening tag. Note that neither the opening nor the closing tag has a value. That is why the code that outputs the node's name is placed in the else branch, which also skips closing tags:

if (reader.HasValue)
 . . .
else
  if (reader.NodeType != XmlNodeType.EndElement )
    writer.Write("\n" + reader.Name + ": ");
 . . .

A node can have a number of attributes. To access them, we use the following methods and properties:

HasAttributes - Specifies whether a node has attributes.
AttributeCount - Returns the number of the node's attributes.
MoveToFirstAttribute - Moves the cursor to the first attribute of the node. To obtain the attribute value, use the Value property of XmlReader. If you know the name of the attribute, you can get its value by calling the GetAttribute method.
MoveToNextAttribute - Moves the cursor to the next attribute of the node. To position the cursor on an attribute by its index, use MoveToAttribute.
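
As an illustration of the members listed above, here is a small sketch that reads attributes both by index and by name. The AttributeDemo class is a hypothetical helper written for this example; it looks only at the first element of the document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    // Returns "name=value" for every attribute of the first element.
    public static List<string> ListAttributes(string xml)
    {
        List<string> result = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skips the XML declaration, comments and white space
            // and stops at the first element.
            reader.MoveToContent();
            for (int i = 0; i < reader.AttributeCount; i++)
            {
                reader.MoveToAttribute(i);   // positions the cursor by index
                result.Add(reader.Name + "=" + reader.Value);
            }
            reader.MoveToElement();          // returns to the owning element
        }
        return result;
    }

    // Returns the value of the named attribute, or null if it is absent.
    public static string GetByName(string xml, string attributeName)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.GetAttribute(attributeName);
        }
    }
}
```

GetAttribute does not move the cursor, so it is the simpler choice when only one or two known attributes are needed.";
List<string> attrs = AttributeDemo.ListAttributes(x);
if (attrs.Count != 2 || attrs[0] != "name=XML Test" || attrs[1] != "version=1.0") throw new Exception("ListAttributes");
if (AttributeDemo.GetByName(x, "version") != "1.0") throw new Exception("GetByName");
if (AttributeDemo.GetByName(x, "missing") != null) throw new Exception("GetByName missing");
</test>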

In addition to the methods and properties mentioned above, XmlReader also contains methods, such as ReadStartElement and ReadEndElement, used for reading nodes by their names. To use these methods you should know the structure of the processed .xml file. For more information on them, see MSDN (the on-line version is available at www.msdn.microsoft.com).
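
For a document whose layout is known in advance, these methods might be used as in the sketch below. It assumes a single <Table> record with only the ID and Text fields; the KnownStructureDemo class is hypothetical, and ReadElementString is another XmlReader method that is not discussed in this article.

```csharp
using System;
using System.IO;
using System.Xml;

public static class KnownStructureDemo
{
    // Parses <Table><ID>...</ID><Text>...</Text></Table>
    // and returns the two values. Throws XmlException if the
    // document does not have this structure.
    public static string[] ReadRecord(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // Do not report white space between the tags.
            reader.WhitespaceHandling = WhitespaceHandling.None;

            reader.ReadStartElement("Table");             // checks the name, moves inside
            string id = reader.ReadElementString("ID");   // reads <ID>text</ID>
            string text = reader.ReadElementString("Text");
            reader.ReadEndElement();                      // expects </Table>

            return new string[] { id, text };
        }
    }
}
```

Because each call checks the name of the node it expects, a file with a different structure fails immediately with an XmlException instead of producing wrong data.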

Ok, we have parsed test.xml. To process dataset.xml (the file with our data), you should modify the code a little. Instead of

FileStream stream = File.Open("test.xml", FileMode.Open);

use

FileStream stream = File.Open("dataset.xml", FileMode.Open);

Note that nodes of the dataset.xml file don't have attributes. So, the code that processes the attributes is not executed when dataset.xml is being parsed. The CalcXMLReader function produces the following output for dataset.xml:

Test:
Table:
ID: 1
Text: Text for row #0
VarChar: VarChar text for row #0
Data: 0
Table:
ID: 2
Text: Text for row #1
VarChar: VarChar text for row #1
Data: 2
. . .

 

XmlDocument

The XmlDocument object works with .xml files using the Document Object Model (DOM): it implements the W3C DOM Level 1 Core and the Core DOM Level 2. XmlDocument contains methods used to read and write XML documents and to navigate the DOM tree. The following code illustrates the use of XmlDocument:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.IO;
using System.Xml;

 . . .

private void CalcXMLDocument()
{
  // Opens the input file and creates the output file
  FileStream stream = File.Open("dataset.xml", FileMode.Open);
  FileStream output = File.Create("output2.txt");
  StreamWriter writer = new StreamWriter(output);

  // Creates a new XmlDocument object and
  // fills it with data
  XmlDocument document = new XmlDocument();
  document.Load(stream);

  // Gets a list of <Table> elements located within
  // the <Test> and </Test> tags
  XmlNodeList nodes = document.SelectNodes("Test/Table");

  // Passes through all the nodes in the list
  foreach (XmlNode node in nodes)
  {
    writer.WriteLine("Record:");

    // Scans the child nodes and outputs them to the file
    for (int i = 0; i < node.ChildNodes.Count; i++)
      writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name,
                       node.ChildNodes[i].InnerText);

    // Processes the node attributes
    if (node.Attributes.Count > 0)
    {
      writer.WriteLine("\n\tAttributes:\n");
      for (int j = 0; j < node.Attributes.Count; j++)
        writer.WriteLine("\tAttribute:{0} = [{1}]",
          node.Attributes.Item(j).Name, node.Attributes.Item(j).InnerText);
    }
  }

  // Closes files
  writer.Close();
  stream.Close();
  output.Close();
}

For dataset.xml the program produces the following output:

Record:
	ID: 1
	Text: Text for row #0
	VarChar: VarChar text for row #0
	Data: 0
Record:
	ID: 2
	Text: Text for row #1
	VarChar: VarChar text for row #1
	Data: 2
Record:
	ID: 3
	Text: Text for row #2
	VarChar: VarChar text for row #2
	Data: 5
. . .

Like XmlTextReader, XmlDocument can read data from several kinds of sources. It has two methods, Load and LoadXml: Load obtains data from a stream, a TextReader, an XmlReader, or a file name or URL, while LoadXml parses a string that contains the XML text itself. Since we process data from a stream, we used Load:

XmlDocument document = new XmlDocument();
document.Load(stream);
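
The difference between the two methods is easy to get wrong, so here is a short sketch of each. The LoadDemo class and its method names are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class LoadDemo
{
    // LoadXml: the argument is the XML text itself.
    public static string RootFromText(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        return document.DocumentElement.Name;
    }

    // Load(string): the argument is a file name or URL, not XML text.
    public static string RootFromFile(string path)
    {
        XmlDocument document = new XmlDocument();
        document.Load(path);
        return document.DocumentElement.Name;
    }

    // Load(Stream): reads the document from an open stream.
    public static string RootFromStream(Stream stream)
    {
        XmlDocument document = new XmlDocument();
        document.Load(stream);
        return document.DocumentElement.Name;
    }
}
```

Passing XML text to Load, or a file name to LoadXml, compiles without errors but fails at run time, because both methods accept a plain string.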

After we open the source, we call SelectNodes:

XmlNodeList nodes = document.SelectNodes("Test/Table");

This method creates a collection of nodes that match the specified XPath expression. "Test/Table" means that the collection will include <Table> nodes located within the <Test> and </Test> tags. The XmlNodeList collection consists of XmlNode objects. Each of them corresponds to a node from the .xml file being parsed (in our case, from dataset.xml).
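
XPath expressions can also filter nodes, which is where the random access of DOM becomes useful. The sketch below is only an illustration: the XPathDemo class is hypothetical and works on a two-record sample shaped like dataset.xml (with the VarChar and Data fields left out for brevity). It uses SelectSingleNode, a companion of SelectNodes that returns the first match or null.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // A two-record sample shaped like dataset.xml.
    public const string Sample =
        "<Test>" +
        "<Table><ID>1</ID><Text>Text for row #0</Text></Table>" +
        "<Table><ID>2</ID><Text>Text for row #1</Text></Table>" +
        "</Test>";

    public static XmlDocument LoadSample()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Sample);
        return document;
    }

    // Counts the <Table> nodes, as the article's SelectNodes call does.
    public static int CountTables(XmlDocument document)
    {
        return document.SelectNodes("Test/Table").Count;
    }

    // Finds the <Text> value of the record with the given ID,
    // or returns null if there is no such record.
    public static string TextForId(XmlDocument document, int id)
    {
        XmlNode node = document.SelectSingleNode(
            "Test/Table[ID='" + id + "']/Text");
        return node == null ? null : node.InnerText;
    }
}
```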

We process each Table node using the following code:

foreach (XmlNode node in nodes)
{
  writer.WriteLine("Record:");
    for (int i = 0; i < node.ChildNodes.Count; i++)
      writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name, 
                                   node.ChildNodes[i].InnerText);
}

As XmlDocument is a DOM parser, it considers <Table> nodes as parent nodes, and <ID>, <Text>, <VarChar> and <Data> nodes as children of <Table>. To get access to them, we used the ChildNodes collection:

for (int i = 0; i < node.ChildNodes.Count; i++)
  writer.WriteLine("\t{0}: {1}", node.ChildNodes[i].Name, 
                    node.ChildNodes[i].InnerText);

The Name property specifies the name of a child node (ID, Text, VarChar or Data). InnerText returns the value of the child node.

To process node attributes, we used the XmlNode.Attributes property. It returns a collection of all the attributes of a node (an XmlAttributeCollection object). Note that in our case the code that outputs the attributes is not executed, since the nodes of the dataset.xml file have no attributes:

if (node.Attributes.Count > 0 ) 
{
  writer.WriteLine("\n\tAttributes:\n");
  for(int j = 0; j < node.Attributes.Count; j++)
    writer.WriteLine("\tAttribute:{0} = [{1}]", 
     node.Attributes.Item(j).Name, node.Attributes.Item(j).InnerText);
}

The XmlAttributeCollection.Count property returns the total number of the node's attributes. To obtain an attribute, we call XmlAttributeCollection.Item. To get the name and value of an attribute we use the Name and InnerText properties.
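
Since dataset.xml has no attributes, the attribute code can be tried on a document like test.xml instead. The following sketch uses a hypothetical DomAttributeDemo helper; it iterates the collection with foreach rather than by index and reads the Value property, which for an attribute holds the same text as InnerText.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DomAttributeDemo
{
    // Returns "name=value" for each attribute of the first node
    // that matches the XPath expression (an empty list if none).
    public static List<string> Describe(string xml, string xpath)
    {
        List<string> result = new List<string>();

        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);

        XmlNode node = document.SelectSingleNode(xpath);
        // Attributes is null for nodes that are not elements.
        if (node == null || node.Attributes == null)
            return result;

        foreach (XmlAttribute attribute in node.Attributes)
            result.Add(attribute.Name + "=" + attribute.Value);
        return result;
    }
}
```";
List<string> a = DomAttributeDemo.Describe(x, "/test");
if (a.Count != 2 || a[0] != "name=XML Test" || a[1] != "version=1.0") throw new Exception("root attributes");
List<string> b = DomAttributeDemo.Describe(x, "/test/node");
if (b.Count != 1 || b[0] != "name=node_1") throw new Exception("node attributes");
if (DomAttributeDemo.Describe(x, "/test/none").Count != 0) throw new Exception("no match");
</test>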

 

DataSet

One of the ways to process data stored in an .xml file is to load this file into a dataset and then process the data in the dataset fields. When the processing is over, the dataset records can be exported to an .xml file. The following code illustrates the use of the DataSet object for processing .xml files:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.IO;
using System.Xml;

 . . .

private void CalcDataset()
{
  // Opens the input and output files as streams
  FileStream stream = File.Open("dataset.xml", FileMode.Open);
  FileStream output = File.Create("output3.txt");
  StreamWriter writer = new StreamWriter(output);

  // Creates a new Dataset object and
  // fills it with data from the .xml file
  DataSet dataset = new DataSet();
  dataset.ReadXml(stream);

  // Processes records of the dataset
  foreach (DataTable table in dataset.Tables)
  {
    foreach (DataRow row in table.Rows)
    {
      // Outputs data to the file
      writer.WriteLine("Record: ");
      writer.WriteLine("ID: {0}", row["ID"]);
      writer.WriteLine("Text: {0}", row["Text"]);
      writer.WriteLine("VarChar: {0}", row["VarChar"]);
      writer.WriteLine("Data: {0}", row["Data"]);
    }
  }

  // Closes files
  writer.Close();
  stream.Close();
  output.Close();
}

For dataset.xml this program outputs the following:

Record:
ID: 1
Text: Text for row #0
VarChar: VarChar text for row #0
Data: 0
Record:
ID: 2
Text: Text for row #1
VarChar: VarChar text for row #1
Data: 2
. . .

To fill a dataset with data from an .xml file, we called the DataSet.ReadXml method. This method is overloaded, so it can load data into the dataset from a stream, a file (specified by name), a TextReader or an XmlReader object. After loading the data, we can process the values in the dataset fields. Here, we walk through the dataset records and save the data to a .txt file.

Note that the DataSet object contains more methods for working with .xml files, for instance, ReadXmlSchema, WriteXml and GetXml. For more information on them, see MSDN (www.msdn.microsoft.com).
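
As a rough illustration of the full cycle (load, modify, export), the sketch below reads XML text into a dataset, changes one field in every record and returns the result with GetXml. The DataSetDemo class is hypothetical, and the sample is assumed to have the same <Test>/<Table> layout as dataset.xml. Since no schema is supplied, ReadXml infers one and treats every column as a string.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetDemo
{
    // Loads the XML, converts the Text field of every record
    // to upper case and returns the modified data as XML text.
    public static string UppercaseText(string xml)
    {
        DataSet dataset = new DataSet();
        // The TextReader overload; the schema is inferred from the data.
        dataset.ReadXml(new StringReader(xml));

        // Each <Table> element became a row of the "Table" data table.
        foreach (DataRow row in dataset.Tables["Table"].Rows)
            row["Text"] = row["Text"].ToString().ToUpperInvariant();

        return dataset.GetXml();
    }
}
```

To write the result to a file instead of a string, WriteXml can be called in the same way as in the GenerateXML routine at the beginning of this article.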

 

Profiling

To measure the performance of the XmlReader, XmlDocument and DataSet objects, we created several tables with different numbers of records. To time the routines, we used AQtime .NET Edition. Here are the results:

Records   XmlReader (s)   XmlDocument (s)   DataSet (s)
100       0.02            0.08              0.09
1000      0.19            0.30              0.94
5000      0.91            1.99              8.12
10000     1.72            3.88              30.85
40000     6.89            15.60             645.98
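
The figures above were obtained with a profiler. For a rough cross-check without one, a routine can be timed with the System.Diagnostics.Stopwatch class. The sketch below is not the method used for the table: the ParserTiming class, the in-memory sample data and the warm-up run are all assumptions made for this example, and its results will differ from the profiler's.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

public static class ParserTiming
{
    // Builds an in-memory document shaped like dataset.xml (sample data only).
    public static string BuildSample(int records)
    {
        StringBuilder xml = new StringBuilder("<Test>");
        for (int i = 0; i < records; i++)
            xml.Append("<Table><ID>").Append(i + 1)
               .Append("</ID><Text>Text for row #").Append(i)
               .Append("</Text></Table>");
        return xml.Append("</Test>").ToString();
    }

    // Counts the records with the pull parser.
    public static int CountWithReader(string xml)
    {
        int count = 0;
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Table")
                    count++;
        return count;
    }

    // Counts the records with the DOM parser.
    public static int CountWithDocument(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        return document.SelectNodes("Test/Table").Count;
    }

    // Runs the routine once as a warm-up (so that JIT compilation is not
    // measured), then returns the average time of the timed runs.
    public static TimeSpan Measure(Action routine, int repetitions)
    {
        routine();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
            routine();
        watch.Stop();
        return TimeSpan.FromTicks(watch.Elapsed.Ticks / repetitions);
    }
}
```

A call such as ParserTiming.Measure(() => ParserTiming.CountWithReader(xml), 10) returns the average time of ten runs for one parser; repeating it for several document sizes gives a table similar to the one above.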

 

Summary

As we can see, XmlReader is faster than XmlDocument or DataSet. However, it provides only forward-only, read-only access to .xml files.

XmlDocument is roughly two to four times slower than XmlReader. However, XmlDocument provides random access to the elements of an .xml file and allows you both to read and to modify them.

DataSet is the slowest method, and its processing time grows much faster than the number of records: for 40,000 records it took about 646 seconds, against about 7 seconds for XmlReader. Still, it may be a convenient tool for processing .xml files that store data from a relational database.

We've demonstrated how the XML parser performance depends on the number of records. Using AQtime .NET Edition you can do your own performance testing and find some interesting dependencies.

Copyright © 1999- 2008, AutomatedQA, Corp. All Rights Reserved.