Visual Studio .NET includes not only RAD tools for creating Web services, but a
complete set of tools for creating almost any kind of application (the exception
being low-level code such as device drivers). The base .NET class library serves as
a bridge to the operating system. Among other things, it provides access to text files,
and does so in a clean, object-oriented way.
In this article we will describe the different methods you can use to work with
text files. We'll analyze and compare several techniques, and finally compare the
C# and Perl versions of a text file parser.
The Simplest Way to Read Text Files
We will start with the classes used to work with text files. The easiest way to read
data from a text file is to use the StreamReader class. A StreamReader is returned
by the OpenText method of the File class:
public static StreamReader OpenText(
string path
);
Let's look at a simple example: text file parsing code taken from DDLConverter, a
utility we use to customize DDL scripts generated by a CASE tool. The following
ProcessFile routine first reads the content of the text file specified by the
fileName parameter. It then calls a function to process this content, and finally
writes the resulting string back to the same file:
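// Member of a Windows Forms form; requires using System.IO and System.Windows.Forms
private void ProcessFile(string fileName)
{
    // Give up if the file does not exist
    if (!File.Exists(fileName))
    {
        MessageBox.Show(this, "Incorrect file name");
        return;
    }
    // Read the whole file into memory
    StreamReader sr = File.OpenText(fileName);
    String input = sr.ReadToEnd();
    sr.Close();
    // Convert the text and overwrite the original file with the result
    StreamWriter sw = File.CreateText(fileName);
    sw.Write(MakeDDLConversions(input));
    sw.Close();
}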
The File.Exists method checks whether the specified file exists. If it does, we read
the file using StreamReader.ReadToEnd, which reads the entire file into memory.
Reading the whole file at once is necessary because we use a regular expression to
parse it, so all the text to be processed must be in memory. After reading the text,
we close the file using StreamReader.Close.
To process the text, we call the MakeDDLConversions function (its implementation is
discussed below).
After processing, we save the resulting text to a file. For simplicity, we write the
result back to the same file. To create a text file, we call File.CreateText and pass
it the name of the file; if the file already exists, its contents are overwritten.
This method returns a StreamWriter object that handles character output. We then
call StreamWriter.Write to save the result to the file, and finally close the output
stream by calling StreamWriter.Close.
Note that, for simplicity, we did not use a try...finally block here. This does not
mean you should omit it in your applications.
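As a sketch of what that might look like, here is the same read, convert and write sequence with each stream closed in a finally block. It is written as a console-friendly static class, so it returns false instead of showing a MessageBox, and MakeDDLConversions is replaced by a hypothetical stand-in that returns its input unchanged.

```csharp
using System.IO;

public static class SafeDdlFile
{
    // Hypothetical stand-in for MakeDDLConversions; it returns the text unchanged.
    static string MakeDDLConversions(string input)
    {
        return input;
    }

    // Same steps as ProcessFile, but each stream is closed in a finally block,
    // so an exception while reading or writing cannot leave the file open.
    // Returns false (instead of showing a MessageBox) when the file is missing.
    public static bool ProcessFile(string fileName)
    {
        if (!File.Exists(fileName))
            return false;

        string input;
        StreamReader sr = File.OpenText(fileName);
        try
        {
            input = sr.ReadToEnd();
        }
        finally
        {
            sr.Close();
        }

        StreamWriter sw = File.CreateText(fileName);
        try
        {
            sw.Write(MakeDDLConversions(input));
        }
        finally
        {
            sw.Close();
        }
        return true;
    }
}
```

In C#, a using statement such as using (StreamReader sr = File.OpenText(fileName)) { ... } expands to the same try...finally pattern, calling Dispose (which closes the reader) in the finally block.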
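Below is the simplest implementation of the MakeDDLConversions function. It uses
regular expressions, a well-known technique for processing text. Regular expressions
are extremely useful for file parsing, but they are not the topic of this article,
so we will not describe them in detail. In short, the pattern finds a DROP TABLE
statement immediately followed by a CREATE TABLE statement for the same table, and
the match evaluator replaces the unconditional DROP with one that runs only if the
table already exists.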
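// Requires using System.Text.RegularExpressions
private string MakeDDLConversions(string input)
{
    // Matches "DROP TABLE name" followed by "CREATE TABLE name (" for the same table.
    // \k<tablename> is a backreference to the group captured after DROP TABLE.
    // Both halves are verbatim (@) strings so that \s and \( reach the regex engine.
    Regex dropsRegex = new Regex(@"DROP\s+TABLE\s+(?<tablename>\S+)" +
        @"\s+CREATE\s+TABLE\s+\k<tablename>\s*\(",
        RegexOptions.Multiline | RegexOptions.Compiled |
        RegexOptions.ExplicitCapture);
    return dropsRegex.Replace(input, new MatchEvaluator(matchEvaluator));
}

// Builds the replacement text: drop the table only if it already exists
private string matchEvaluator(Match match)
{
    String tablename = match.Groups["tablename"].Value;
    return "IF EXISTS(SELECT name from sysobjects " +
        "WHERE name='" + tablename +
        "' AND type='U')\n DROP TABLE " + tablename +
        "\nGO\n CREATE TABLE " + tablename + "(";
}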
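The key part of the pattern is the backreference \k<tablename>: it matches only the exact text captured by the named group, so a DROP TABLE for one table followed by a CREATE TABLE for a different table is left alone. A small sketch that reuses the pattern to show this (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // The pattern used in MakeDDLConversions. \k<tablename> must repeat exactly
    // the text captured by (?<tablename>\S+) after DROP TABLE.
    static readonly Regex DropCreate = new Regex(
        @"DROP\s+TABLE\s+(?<tablename>\S+)\s+CREATE\s+TABLE\s+\k<tablename>\s*\(",
        RegexOptions.ExplicitCapture);

    // True when the script drops and then recreates the same table.
    public static bool IsDropThenCreate(string script)
    {
        return DropCreate.IsMatch(script);
    }

    // Returns the captured table name, or null when there is no match.
    public static string TableName(string script)
    {
        Match m = DropCreate.Match(script);
        return m.Success ? m.Groups["tablename"].Value : null;
    }
}
```

With ExplicitCapture, unnamed parentheses do not create groups, but named groups such as tablename are still captured and can be backreferenced.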
We have analyzed a simple example. Now we'll compare this technique with "manual"
reading of a text file. But first, here is a reminder of some methods of the
StreamReader class that may be useful in your everyday work (for a full description
of this class, see MSDN; the online version is available at www.msdn.microsoft.com):
public override int Peek();
Returns the next character in the file (or -1 if there are no more characters)
without advancing the current position. This method is extremely useful for
algorithmic tasks such as recursive descent parsing, where you need to look one
character ahead, but it is of little use when the whole text is parsed in memory.
The simplest example of its usage is:
while (sr.Peek()>-1)
{
String input = sr.ReadLine();
Console.WriteLine (input);
}
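To illustrate the lookahead use mentioned above, here is a sketch of one tokenizer step. ReadDigits is a hypothetical helper that collects a run of digits and leaves the first non-digit character unread. It takes a TextReader, the base class of both StreamReader and StringReader, so the same code works on a file or on a string.

```csharp
using System.IO;
using System.Text;

public static class PeekDemo
{
    // Collects a run of decimal digits. Peek inspects the next character
    // without consuming it, so the first non-digit stays in the reader for
    // the next step of the parser. Peek returns -1 at the end of the input.
    public static string ReadDigits(TextReader reader)
    {
        var digits = new StringBuilder();
        int next;
        while ((next = reader.Peek()) != -1 && char.IsDigit((char)next))
        {
            digits.Append((char)reader.Read());
        }
        return digits.ToString();
    }
}
```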
public override int Read();
Reads the next character from the text file and advances the character position by
one. Returns -1 at the end of the file.
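Because Read returns the character as an int, with -1 meaning the end of the file, the usual character-by-character loop assigns and tests in one expression. A minimal sketch (CountLineFeeds is a hypothetical example) that counts line feeds:

```csharp
using System.IO;

public static class ReadCharDemo
{
    // Counts line feed characters by reading one character at a time.
    // Read() returns the next character as an int, or -1 at the end.
    public static int CountLineFeeds(TextReader reader)
    {
        int count = 0;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\n')
                count++;
        }
        return count;
    }
}
```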
public override int Read(char[] buffer, int index, int count);
Reads at most count characters from the file into buffer. index specifies the
position in the buffer at which writing starts. Read returns the number of characters
actually read, which is less than or equal to count. A return value of 0 means the
end of the file has been reached; a smaller positive value does not by itself mean
the end of the file, only that fewer characters were available at that moment.
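Since only the first n characters of the buffer are valid after each call, a loop over this overload should use the returned count rather than the buffer length. A minimal sketch (ReadAllInChunks is a hypothetical name):

```csharp
using System.IO;
using System.Text;

public static class ReadBufferDemo
{
    // Reads everything from the reader in chunks of up to bufferSize characters.
    // After each call only the first 'read' characters of the buffer are valid,
    // and a return value of 0 signals the end of the input.
    public static string ReadAllInChunks(TextReader reader, int bufferSize)
    {
        var result = new StringBuilder();
        char[] buffer = new char[bufferSize];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            result.Append(buffer, 0, read);
        }
        return result.ToString();
    }
}
```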
The next method, ReadBlock, is very similar to the previous Read overload and takes
the same parameters. The difference is that ReadBlock is a blocking version of Read:
it keeps reading until count characters have been read or the end of the file is
reached, so a return value smaller than count reliably means the end of the file.
As with Read, the caller allocates the buffer. The declaration (inherited from
TextReader) is as follows:
public virtual int ReadBlock(char[] buffer, int index, int count);
buffer is the character array to write data to, index specifies the position in the
buffer where writing starts, and count specifies the maximum number of characters
to read.
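A sketch of calling ReadBlock to read fixed-length records from any TextReader; the buffer is created by the caller and passed in. ReadRecord is a hypothetical helper.

```csharp
using System.IO;

public static class ReadBlockDemo
{
    // Reads one fixed-length record. ReadBlock keeps reading until 'length'
    // characters have been copied or the input ends, so a shorter result
    // means the data ran out.
    public static string ReadRecord(TextReader reader, int length)
    {
        char[] buffer = new char[length]; // allocated here, by the caller
        int read = reader.ReadBlock(buffer, 0, length);
        return new string(buffer, 0, read);
    }
}
```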
public override string ReadLine();
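Reads a line of characters from the text file and returns it as a string. A line
ends with a line feed (0x0A), a carriage return (0x0D), or a carriage return
immediately followed by a line feed; the terminator is not included in the returned
string. At the end of the file, ReadLine returns null.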
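Since ReadLine returns null at the end of the file, it can drive a loop on its own, without a separate Peek call. A minimal sketch (ReadLines is a hypothetical helper):

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects all lines from the reader. ReadLine returns null at the end,
    // and the returned strings never contain the "\r", "\n" or "\r\n" terminators.
    public static List<string> ReadLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```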
Another Way to Read Text Files
Using StreamReader is very simple. Is there any other way to read data from text
files? Maybe reading directly from the file would be faster? The BaseStream property
of StreamReader returns the underlying stream from which the reader gets its data.
When reading a text file, it returns a FileStream object. Other descendants of the
Stream class include BufferedStream, MemoryStream, NetworkStream and CryptoStream.
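A small sketch of inspecting BaseStream: a reader created by File.OpenText is backed by a FileStream, while a reader built over a MemoryStream is not. ReadsFromFile is a hypothetical helper.

```csharp
using System.IO;

public static class BaseStreamDemo
{
    // True when the reader pulls its bytes from a file on disk.
    // BaseStream exposes the Stream the reader decodes characters from.
    public static bool ReadsFromFile(StreamReader reader)
    {
        return reader.BaseStream is FileStream;
    }
}
```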
FileStream is used to read data from or write data to any file in the system. It
supports both synchronous and asynchronous operations and, most importantly,
provides buffered access to files. This class has several constructors. By default,
FileStream uses a 4 KB (4096-byte) buffer; some constructors let you specify a
different buffer size, and some take additional parameters that specify the opening
mode, access type, sharing mode, and so on.
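Now we will create a function that reads characters from the text file without using
StreamReader. Note that we use an ASCIIEncoding object to convert the sequence of
bytes into a string. This is only correct for files that contain plain ASCII text;
bytes outside the ASCII range are not decoded correctly, whereas File.OpenText
decodes UTF-8 by default.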
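// Requires using System.IO and System.Text
private void ProcessFile(string fileName)
{
    // Open for reading and writing; the last argument is the buffer size in bytes
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.ReadWrite,
        FileShare.None, 8192);
    StringBuilder result = new StringBuilder(8192);
    byte[] buffer = new byte[8192];
    ASCIIEncoding encoding = new ASCIIEncoding();
    int count = fs.Read(buffer, 0, 8192);   // read the first block from the file
    while (count != 0)                      // Read returns 0 at the end of the file
    {
        // Decode only the bytes actually read; the rest of the buffer may hold stale data
        result.Append(encoding.GetString(buffer, 0, count));
        count = fs.Read(buffer, 0, 8192);
    }
    // Rewind and overwrite the file with the converted text
    fs.Seek(0, SeekOrigin.Begin);
    byte[] output = encoding.GetBytes(MakeDDLConversions(result.ToString()));
    fs.Write(output, 0, output.Length);
    fs.SetLength(output.Length);            // truncate in case the new text is shorter
    fs.Close();
}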
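The ASCIIEncoding approach can decode each block on its own because every ASCII character is exactly one byte. If you adapted the function to UTF-8 (an assumption, not what the code above does), a multi-byte character could be split across two blocks and calling Encoding.GetString on each block would corrupt it. A sketch of the usual remedy, a Decoder, which carries incomplete bytes over to the next call (ReadUtf8InBlocks is a hypothetical name):

```csharp
using System.IO;
using System.Text;

public static class BlockDecoding
{
    // Reads a stream in fixed-size byte blocks and decodes them as UTF-8.
    // The Decoder keeps the bytes of a character that is split between
    // two blocks and completes it on the next GetChars call.
    public static string ReadUtf8InBlocks(Stream stream, int blockSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        byte[] bytes = new byte[blockSize];
        // GetMaxCharCount accounts for bytes left over from a previous call
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(blockSize)];
        var result = new StringBuilder();
        int count;
        while ((count = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            int charCount = decoder.GetChars(bytes, 0, count, chars, 0);
            result.Append(chars, 0, charCount);
        }
        // Flush: emits a replacement character if the input ended mid-sequence
        int tail = decoder.GetChars(bytes, 0, 0, chars, 0, true);
        result.Append(chars, 0, tail);
        return result.ToString();
    }
}
```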
Now let's compare this version of ProcessFile with the previous version that used
StreamReader (the variant proposed in MSDN). To measure only the file reading and
writing, replace the MakeDDLConversions function with the following code in both
cases:
private string MakeDDLConversions(string input)
{
    return input;
}
To compare the variants, we used AQtime .NET Edition beta 2. We ran several tests
and found that the results depended on the size of the test files. For relatively
small files (< 10 MB), the FileStream variant was faster than the one offered in the
documentation (StreamReader). For large files (> 10 MB), on the other hand, the
StreamReader version was about twice as fast as the FileStream one. Sounds
incredible. Of course, there are many factors that may slow down the FileStream
version. But try it yourself and you'll see the difference. For us it was quite
enough to use the standard method, which, in addition, makes the sources clearer
and simpler.
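If you want a rough check without a profiler, a simple timing harness is one option. This is a swapped-in technique, not how the measurements above were made (those used AQtime), and it relies on System.Diagnostics.Stopwatch, which is available from .NET Framework 2.0 on. BestOfMilliseconds is a hypothetical helper:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs 'action' the given number of times (runs must be at least 1)
    // and returns the fastest run in milliseconds. Taking the minimum
    // filters out one-off costs such as JIT compilation of the first call.
    public static double BestOfMilliseconds(Action action, int runs)
    {
        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

For example, Timing.BestOfMilliseconds(() => ProcessFile(path), 5) would time five runs of either variant. Keep in mind that after the first run the file is usually served from the operating system's cache, so repeated runs measure cached reads.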
What about Perl?
The next application we are going to review is also a text file parser. It was
originally written in Perl. As you may know, Perl was designed specifically for
processing text and producing reports, and it is considered one of the best text
processing languages. It would be interesting to compare the execution time of a
text parser written in Perl with one written in a modern language used for building
Web services.
As true .NET fans, we are working on porting the Perl parser to Perl.NET as soon as
possible. However, Perl.NET is still in beta, and we have run into some problems.
Having some experience with regular expressions in C#, we decided to "translate"
the Perl application to C#. As we progressed, an idea came up: compare the timing
and memory characteristics of the C# and Perl parsers. We ran the AQtime .NET
Edition profilers and got results that were in line with our expectations. You can
perform the same tests yourself using AQtime .NET's profilers.
You can download Perl.NET from
http://www.activestate.com/Initiatives/NET/Research.html, and AQtime .NET Edition
from our download page. To get the test C# and Perl.NET code, follow these links:
C# code and Perl code.
Additional Notes
If you store your application data in a file, have a look at the IsolatedStorageFile
and IsolatedStorageFileStream classes. They provide access to isolated storage, a
data compartment that .NET keeps separate for each user and assembly.
If you are working with binary data, you can use the BinaryReader and BinaryWriter
classes. With their help, you can read and write values of any primitive C# type,
as well as strings.
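A sketch of a binary round trip with these classes; BinaryRecord and its fields are hypothetical. The values must be read back with the matching Read methods, in the same order they were written.

```csharp
using System.IO;

public static class BinaryRecord
{
    // Writes an int, a double and a string in binary form.
    // The writer is flushed but not closed, because closing a BinaryWriter
    // also closes the stream it wraps.
    public static void Write(Stream stream, int id, double price, string name)
    {
        var writer = new BinaryWriter(stream);
        writer.Write(id);     // 4 bytes
        writer.Write(price);  // 8 bytes
        writer.Write(name);   // length prefix followed by UTF-8 bytes
        writer.Flush();
    }

    // Reads the values back; order and types must match Write exactly.
    public static void Read(Stream stream, out int id, out double price, out string name)
    {
        var reader = new BinaryReader(stream);
        id = reader.ReadInt32();
        price = reader.ReadDouble();
        name = reader.ReadString();
    }
}
```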
Summary
We hope this article has helped you understand how to work with text files in
Microsoft .NET. We wrote two variants of a text file parser and found that, for large
files, the standard way of working with text files (i.e. the one proposed in the
documentation) is faster than ours.