AutomatedQA: Award-winning tools for software development and quality assurance

Working with text files in Microsoft .NET

Visual Studio .NET includes not only RAD systems for creating Web services, but a complete set of tools for creating any kind of application (with the exception of low-level apps such as device drivers). The base .NET class library serves as a bridge to the operating system. It provides access to text files through a clean, object-oriented API.

In this article we will describe the different methods you can use to work with text files. We'll analyze several techniques and compare them, and finally we will compare the C# and Perl versions of a text file parser.

The Simplest Way to Read Text Files

We will start with the classes used to work with text files. The easiest way to read data from a text file is to use the StreamReader class. A StreamReader is returned by the OpenText method of the File class:

public static StreamReader OpenText(string path);

Let's have a look at a simple example: text file parsing code taken from a DDLConverter utility. (We use this utility to customize DDL scripts generated by a CASE tool.)

The following ProcessFile routine first reads the content of a text file specified by the fileName parameter. It then calls a function to process this content, and finally outputs the resulting string back to the same file:

private void ProcessFile(string fileName)
{
  if (!File.Exists(fileName))
  {
    MessageBox.Show(this,"Incorrect file name");
    return;
  }    

  StreamReader sr=File.OpenText(fileName);
  String input=sr.ReadToEnd();
  sr.Close();

  StreamWriter sw=File.CreateText(fileName);
  sw.Write(MakeDDLConversions(input));
  sw.Close();
}

The File.Exists() method checks whether the specified file exists. If it does, we read the file using StreamReader.ReadToEnd. This method reads the entire file into memory. This is necessary because we will use a regular expression to parse the file, so the text to be processed must be in memory. After we've read the text file, we close it using StreamReader.Close.
To process the text, we call the MakeDDLConversions function. (Its implementation is discussed below.)

After processing, we can save the resulting text to a file. For simplicity we output the result to the same file. To create a text file, we call File.CreateText and pass it the name of the file to be created. This method returns a StreamWriter object that deals with character output. Then, we call StreamWriter.Write to save the result to the file. Finally, we close the output stream by calling StreamWriter.Close.

Note that for simplicity, we did not use the try...finally block here. This does not mean you should not use it in your applications.
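
As a minimal sketch of what a more defensive version could look like, the routine below wraps both streams in using blocks, which the C# compiler expands into try...finally. The class name SafeFileRewrite and the convert parameter (standing in for MakeDDLConversions) are assumptions made for illustration, and the MessageBox call is replaced with an exception so that the sketch does not depend on Windows Forms.

```csharp
using System;
using System.IO;

public static class SafeFileRewrite
{
    // Reads the whole file, applies 'convert' to the text, and writes the
    // result back to the same file. 'convert' is a hypothetical stand-in
    // for MakeDDLConversions.
    public static void ProcessFile(string fileName, Func<string, string> convert)
    {
        if (!File.Exists(fileName))
            throw new FileNotFoundException("Incorrect file name", fileName);

        string input;
        // 'using' expands to try...finally and disposes (closes) the reader
        // even if ReadToEnd throws.
        using (StreamReader sr = File.OpenText(fileName))
        {
            input = sr.ReadToEnd();
        }

        // Convert before reopening the file, so that a failure in the
        // conversion leaves the original content untouched.
        string output = convert(input);

        using (StreamWriter sw = File.CreateText(fileName))
        {
            sw.Write(output);
        }
    }
}
```

A call such as SafeFileRewrite.ProcessFile(path, MakeDDLConversions) would then behave like the routine above, except that the file handles are released on every code path.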

Below is the simplest implementation of the MakeDDLConversions function. It illustrates regular expressions - a well-known technique for processing text files. Of course, regular expressions are extremely useful for file parsing, but they are not the topic of this article, so we will not describe them in full detail.

private string MakeDDLConversions(string input)
{
  // Both pattern fragments must be verbatim (@) strings; otherwise
  // escapes such as \s are rejected by the C# compiler.
  Regex dropsRegex=new
    Regex(@"DROP\s+TABLE\s+(?<tablename>\S+)"+
    @"\s+CREATE\s+TABLE\s+\k<tablename>\s*\(",
    RegexOptions.Multiline | RegexOptions.Compiled |
    RegexOptions.ExplicitCapture);
  return dropsRegex.Replace(input,
          new MatchEvaluator(matchEvaluator));
}

private string matchEvaluator(Match match)
{
  String tablename=match.Groups["tablename"].Value;
  return "IF EXISTS(SELECT name from sysobjects "+
         "WHERE name='"+tablename+
         "' AND type='U')\n DROP TABLE "+tablename+
         "\nGO\n CREATE TABLE "+tablename+"(";
}
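
To make the effect of the replacement concrete, here is a small self-contained sketch that applies the same pattern to a made-up script. The class name DdlConversionDemo, the Rewrite method and the Customers table are assumptions introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DdlConversionDemo
{
    // Same pattern as above: a DROP TABLE statement immediately followed by
    // a CREATE TABLE statement for the same table name.
    private static readonly Regex DropsRegex = new Regex(
        @"DROP\s+TABLE\s+(?<tablename>\S+)" +
        @"\s+CREATE\s+TABLE\s+\k<tablename>\s*\(",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    // Guards the DROP with an existence check and inserts a GO separator.
    public static string Rewrite(string input)
    {
        return DropsRegex.Replace(input, delegate(Match match)
        {
            string tablename = match.Groups["tablename"].Value;
            return "IF EXISTS(SELECT name from sysobjects " +
                   "WHERE name='" + tablename +
                   "' AND type='U')\n DROP TABLE " + tablename +
                   "\nGO\n CREATE TABLE " + tablename + "(";
        });
    }

    // Hypothetical input, not taken from a real CASE tool.
    public static string Sample()
    {
        return "DROP TABLE Customers\nCREATE TABLE Customers (\n  id int\n)\n";
    }
}
```

Calling DdlConversionDemo.Rewrite(DdlConversionDemo.Sample()) yields a script in which the unconditional DROP TABLE Customers is preceded by the IF EXISTS check, while the column list after the opening parenthesis is left unchanged.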

We have analysed a simple example. Now we'll compare this technique with "manual" reading of a text file. But before this, we would like to remind you of some methods of the StreamReader class that may be useful in your everyday work (to get the full description of this class, see MSDN. The on-line version is available at www.msdn.microsoft.com):

public override int Peek();

Returns the next character in a file without advancing the current position in the file, or -1 if there are no more characters to read. This method is extremely useful for algorithmic tasks such as recursive descent parsing, but it is of no use when the whole text has already been read into memory. The simplest example of its usage is:

while (sr.Peek()>-1)
{
  String input = sr.ReadLine();
  Console.WriteLine (input);
}

public override int Read();

Reads the next character from a text file and advances the character position in the file. Returns -1 if no more characters are available.

public override int Read(char[]buffer, int index, int count);

Reads at most count characters from the file into buffer. Index specifies the position within the buffer where writing starts. Read returns the number of characters actually read, which is less than or equal to count. It returns 0 when the end of the stream has been reached. A value smaller than count does not by itself mean the stream is exhausted, so keep calling Read until it returns 0.
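
A typical way to use this overload is a loop that keeps calling Read until it returns 0. The following is a minimal sketch; the class name ChunkedReadDemo and the chunkSize parameter are assumptions made for illustration. The method accepts a TextReader, the base class of StreamReader, so a StreamReader can be passed in directly.

```csharp
using System.IO;
using System.Text;

public static class ChunkedReadDemo
{
    // Reads everything from 'reader' in blocks of at most 'chunkSize' characters.
    public static string ReadAll(TextReader reader, int chunkSize)
    {
        char[] buffer = new char[chunkSize];
        StringBuilder result = new StringBuilder();
        int read;
        // Read returns the number of characters placed in the buffer,
        // and 0 once the end of the stream has been reached.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Append only the characters that were actually read.
            result.Append(buffer, 0, read);
        }
        return result.ToString();
    }
}
```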

The next method, ReadBlock, is very similar to the last Read and takes the same parameters. The difference between them is that ReadBlock blocks until count characters have been read or the end of the stream is reached, whereas Read may return as soon as some characters are available. In both cases the buffer is allocated by the caller. The declaration is as follows:

public virtual int ReadBlock(char[] buffer, int index, int count);

Buffer is the character array to write data to. Index specifies the position in the buffer where writing starts. Count specifies the number of characters to read.

public override string ReadLine();

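Reads a line from a text file. A line is a sequence of characters terminated by a line feed (0x0A), a carriage return (0x0D), or a carriage return immediately followed by a line feed. The returned string does not include the terminator, and ReadLine returns null when the end of the file is reached.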
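
As an alternative to the Peek-based loop shown earlier, a file can be read line by line by testing the return value of ReadLine, which is null once the end of the input is reached. The sketch below assumes a hypothetical helper class, LineReadDemo, and works with any TextReader, including StreamReader.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineReadDemo
{
    // Collects every line of 'reader' into a list, without line terminators.
    public static List<string> ReadLines(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        // ReadLine returns null when there is nothing left to read.
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

With a file opened through File.OpenText, the call would be LineReadDemo.ReadLines(sr).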

Another Way to Read Text Files

Using StreamReader is very simple. Is there any other way to read data from text files? Might reading directly from the file be faster? The BaseStream property of StreamReader returns the underlying stream from which the reader gets its data. When reading a text file, it returns a FileStream object. The other descendants of the Stream class are BufferedStream, MemoryStream, NetworkStream and CryptoStream.

FileStream is used to read data from or write data to any file in the system. It supports both synchronous and asynchronous operations on files and, most importantly, provides buffered access to files. This class has several constructors. By default, FileStream uses an 8 KB buffer. You may call a constructor which allows you to specify a different buffer size. Some constructors require additional parameters that specify the opening mode, access type, sharing mode, etc.
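
For illustration, the sketch below opens a file through a FileStream constructor that takes an explicit buffer size and then layers a StreamReader on top of it. The class name BufferedOpenDemo and the choice of ASCII encoding are assumptions made for this example, not something the measurements in this article rely on.

```csharp
using System.IO;
using System.Text;

public static class BufferedOpenDemo
{
    // Opens 'fileName' for reading with a caller-chosen buffer size (in bytes)
    // and returns its content decoded as ASCII.
    public static string ReadWithBuffer(string fileName, int bufferSize)
    {
        using (FileStream fs = new FileStream(fileName, FileMode.Open,
                   FileAccess.Read, FileShare.Read, bufferSize))
        using (StreamReader sr = new StreamReader(fs, Encoding.ASCII))
        {
            return sr.ReadToEnd();
        }
    }
}
```

For example, BufferedOpenDemo.ReadWithBuffer(path, 65536) reads the file through a 64 KB buffer instead of the default one.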

Now we will create a function that will read characters from the text file, but it will not use StreamReader. Note that we use the ASCIIEncoding object to convert the sequence of bytes into a string.

private void ProcessFile(string fileName)
{
  FileStream fs=new FileStream(fileName, FileMode.Open, FileAccess.ReadWrite,
    FileShare.None, 8192 ); //we use the default buffer
  
  StringBuilder result=new StringBuilder(8192);
  byte[] buffer=new byte[8192];
  int count=fs.Read(buffer,0,buffer.Length); //reading a block from the file

  ASCIIEncoding encoding=new ASCIIEncoding();
  while (count!=0) //stop when Read returns 0
  {
    //decode only the bytes actually read; the rest of the buffer is stale
    result.Append(encoding.GetString(buffer,0,count));
    count=fs.Read(buffer,0,buffer.Length);
  }

  byte[] output=encoding.GetBytes(MakeDDLConversions(result.ToString()));

  fs.Seek(0, SeekOrigin.Begin);
  fs.SetLength(0); //drop the old content in case the output is shorter

  fs.Write(output, 0, output.Length);
  fs.Close();
}

Now let's compare this version of ProcessFile with the previous version that used StreamReader (the variant proposed in MSDN). So that the measurements reflect only the file I/O and not the regular expression processing, replace the MakeDDLConversions function with the following code in both cases:

private string MakeDDLConversions(string input)
{
  return input;
}

To compare the variants, we used AQtime .NET Edition beta 2. We performed several tests and found that the results depended on the size of the tested files. For relatively small files (< 10 MB), the variant that uses FileStream was faster than the one offered in the documentation (StreamReader). On the other hand, for large files (> 10 MB) the StreamReader version was about two times faster than FileStream. Sounds incredible. Of course, there are lots of factors that may slow down the FileStream version. But try it yourself and you'll see the difference. For us it was quite enough to use the standard method, which in addition makes the sources clearer and simpler.
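
If you do not have a profiler at hand, a rough comparison can be made with the Stopwatch class. The sketch below is not the measurement setup used for the figures above; the class name ReadTimingDemo and its methods are assumptions for illustration, and wall-clock timings of this kind vary with file caching, disk speed and runtime version.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class ReadTimingDemo
{
    // Variant 1: the StreamReader approach.
    public static string ReadWithStreamReader(string fileName)
    {
        using (StreamReader sr = File.OpenText(fileName))
        {
            return sr.ReadToEnd();
        }
    }

    // Variant 2: reading raw bytes through FileStream and decoding as ASCII.
    public static string ReadWithFileStream(string fileName)
    {
        Encoding encoding = Encoding.ASCII;
        StringBuilder result = new StringBuilder();
        byte[] buffer = new byte[8192];
        using (FileStream fs = new FileStream(fileName, FileMode.Open,
                   FileAccess.Read, FileShare.Read, 8192))
        {
            int count;
            while ((count = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Append(encoding.GetString(buffer, 0, count));
            }
        }
        return result.ToString();
    }

    // Returns the wall-clock time taken by one call of 'read' on 'fileName'.
    public static TimeSpan Time(Func<string, string> read, string fileName)
    {
        Stopwatch watch = Stopwatch.StartNew();
        read(fileName);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling ReadTimingDemo.Time(ReadTimingDemo.ReadWithStreamReader, path) and ReadTimingDemo.Time(ReadTimingDemo.ReadWithFileStream, path) several times on files of different sizes, and discarding the first run, gives a first impression of how the two variants behave on your machine.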

What about Perl?

The next application we are going to review is also a text file parser. Originally it was written in Perl. As you may know, Perl was specifically designed to process reports and it is considered to be one of the best text processing languages. It would be interesting to compare the execution time of text parsers written in Perl versus one of the modern languages used for constructing web services.

As true .NET fans, we are working to port the Perl parser to Perl.NET as soon as possible. However, Perl.NET is in beta stage, so we've come across some problems. Having some experience in using regular expressions in C#, we decided to "translate" the Perl application to C#. As we progressed, an idea came up - we decided to compare timing and memory characteristics of C# and Perl parsers. We ran AQtime .NET Edition profilers and got some results which were in line with our expectations. You can perform these same tests for yourself using AQtime .NET's profilers.

You can download Perl.NET from http://www.activestate.com/Initiatives/NET/Research.html and visit our download page to download AQtime .NET Edition. To get the test C# and Perl.NET code, follow these links: C# code and Perl code.

Additional Notes

If you store your application data in a file, have a look at IsolatedStorageFile and IsolatedStorageFileStream classes. They provide access to an isolated storage within the operating system. If you are working with binary data you can use BinaryReader and BinaryWriter classes. With their help, you can read and write any primitive C# type.
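
As a brief illustration of the binary classes, the following sketch writes a few primitive values with BinaryWriter and reads them back with BinaryReader. A MemoryStream is used so that the example does not touch the disk; the class name BinaryRoundTripDemo is an assumption made for this example.

```csharp
using System.IO;

public static class BinaryRoundTripDemo
{
    // Writes three values to an in-memory stream and checks that reading
    // them back in the same order returns the same values.
    public static bool RoundTrip(int number, double ratio, string label)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(number);  // 4 bytes
            writer.Write(ratio);   // 8 bytes
            writer.Write(label);   // length-prefixed string
            writer.Flush();

            // Rewind and read the values back in the order they were written.
            stream.Position = 0;
            BinaryReader reader = new BinaryReader(stream);
            return reader.ReadInt32() == number
                && reader.ReadDouble() == ratio
                && reader.ReadString() == label;
        }
    }
}
```

The same code works against a FileStream; only the stream passed to the BinaryWriter and BinaryReader constructors changes.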

Summary

We hope this article helped you understand how to work with text files in Microsoft .NET. We've written two variants of a text file parser and found that, for large files, the standard way of working with a text file (i.e. the way proposed in the documentation) is faster than ours, and it keeps the source code simpler as well.

Copyright © 1999- 2008, AutomatedQA, Corp. All Rights Reserved.