Post on 15-Jul-2015
XML Writers
XML documents are text-based files
The XML Writer Programming Interface:
An XML writer represents a component that provides a fast, forward-only way of outputting XML data to streams or files.
void CreateXmlFile(String[] theArray, string filename)
{
StringBuilder sb = new StringBuilder("");
// Loop through the array and build the file
sb.Append("<array>");
foreach(string s in theArray)
{
sb.Append("<element value=\"");
sb.Append(s);
sb.Append("\"/>");
}
sb.Append("</array>");
// Create the file
StreamWriter sw = new StreamWriter(filename);
sw.Write(sb.ToString());
sw.Close();
}
Let's rewrite our sample file using .NET XML writers, as shown in the following code. A
.NET XML writer features ad hoc write methods for each possible XML node type and
makes the creation of XML output more logical and much less dependent on the
intricacies, and even the quirkiness, of the markup languages.
void CreateXmlFileUsingWriters(String[] theArray,
string filename)
{
// Open the XML writer (default encoding charset)
XmlTextWriter xmlw = new XmlTextWriter(filename, null);
xmlw.Formatting = Formatting.Indented;
xmlw.WriteStartDocument();
xmlw.WriteStartElement("array");
foreach(string s in theArray)
{
xmlw.WriteStartElement("element");
xmlw.WriteAttributeString("value", s);
xmlw.WriteEndElement();
}
xmlw.WriteEndDocument();
// Close the writer
xmlw.Close();
}
<?xml version="1.0"?>
<array>
<element value="Rome" />
<element value="New York" />
<element value="Sydney" />
<element value="Stockholm" />
<element value="Paris" />
</array>
An XML writer is a specialized class that knows only how to write XML data to a variety
of storage media. It features ad hoc methods to write any special item that
characterizes XML documents—from character entities to processing instructions, from
comments to attributes, and from element nodes to plain text. In addition, and more
important, an XML writer guarantees well-formed XML 1.0–compliant output. And you
don't have to worry about a single angle bracket or the last element node that you left
open.
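As a minimal sketch of those node-specific methods, the following writes a comment, a processing instruction, an attribute, escaped text, and a CDATA section to an in-memory string. The class name WriterNodeSketch and the element names are illustrative assumptions, not part of the original sample.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class WriterNodeSketch
{
    // Builds a small fragment in memory using one write method per node type.
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter xmlw = new XmlTextWriter(sw);

        xmlw.WriteStartElement("cities");
        xmlw.WriteComment("sample data");                        // comment node
        xmlw.WriteProcessingInstruction("app", "sort=\"asc\"");  // processing instruction
        xmlw.WriteStartElement("city");
        xmlw.WriteAttributeString("country", "Italy");           // attribute node
        xmlw.WriteString("Rome & Milan");                        // text; the ampersand is escaped
        xmlw.WriteEndElement();
        xmlw.WriteStartElement("note");
        xmlw.WriteCData("raw <text>");                           // CDATA section, written unescaped
        xmlw.WriteEndElement();
        xmlw.WriteEndElement();

        xmlw.Close();
        return sw.ToString();
    }
}
```

Calling WriterNodeSketch.Build() returns the markup as a string, so no angle bracket is typed by hand anywhere in the calling code.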
The XmlWriter Base Class
XML writers are based on the XmlWriter abstract class that defines the .NET
Framework interface for writing XML. The XmlWriter class is not directly creatable from
user applications, but it can be used as a reference type for objects that are instances
of classes derived from XmlWriter. Actually, the .NET Framework provides just one
class that gives a concrete implementation of the XmlWriter interface—the
XmlTextWriter class.
Writing Well-Formed XML Text
The XmlTextWriter class takes a number of precautions to ensure that the final XML
code is perfectly compliant with the XML 1.0 standard of well-formedness. In particular,
the class verifies that any special character found in the passed text is automatically
escaped and that no elements are written in the wrong order (such as attributes outside
nodes, or CDATA sections within attributes). Finally, the Close method performs a full
check of well-formedness immediately prior to return. If the verification is successful,
the method ends gracefully; otherwise, an exception is thrown.
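The two precautions described above can be observed with a short sketch. It assumes an in-memory StringWriter as the output target, and the class and method names are hypothetical. The first method shows special characters being escaped in an attribute value. The second tries to write an attribute before any element is open, which the writer is expected to reject with an InvalidOperationException.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class WellFormednessSketch
{
    // Writes <element value="..."/> and returns the markup,
    // so the escaping applied to the attribute value can be inspected.
    public static string WriteEscaped(string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter xmlw = new XmlTextWriter(sw);
        xmlw.WriteStartElement("element");
        xmlw.WriteAttributeString("value", value);
        xmlw.WriteEndElement();
        xmlw.Close();
        return sw.ToString();
    }

    // Returns true if writing an attribute outside any element is rejected.
    public static bool AttributeOutsideElementThrows()
    {
        XmlTextWriter xmlw = new XmlTextWriter(new StringWriter());
        try
        {
            xmlw.WriteAttributeString("value", "x"); // no element is open yet
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

For example, WriteEscaped("AT&T <HQ>") yields an attribute value in which the ampersand and the angle brackets appear as entity references.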
Other controls that the XmlTextWriter class performs on the generated XML output
ensure that each document starts with the standard XML prolog, shown in the following
code, and that any DOCTYPE node always precedes the document root node:
<?xml version="1.0" ?>
The following code demonstrates how to write two identical attributes for a specified
node:
xmlw.WriteStartElement("element");
xmlw.WriteAttributeString("value", s);
xmlw.WriteAttributeString("value", s);
xmlw.WriteEndElement();
In the check made just before dumping data out, the writer neither verifies the names
and semantics of the attributes nor validates the schema of the resultant document,
thus authorizing this code to generate bad XML.
Building an XML Document
Initialize the document
Write data
Close the document
Writing the XML Prolog
Once you have a living and functional instance of the XmlTextWriter class, the first XML
element you add to it is the official XML 1.0 signature. You obtain this signature in a
very natural and transparent way simply by calling the WriteStartDocument method.
This method starts a new document and marks the XML declaration with the version
attribute set to "1.0", as shown in the following code:
// produces: <?xml version="1.0"?>
writer.WriteStartDocument();
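As a sketch of the initialize, write, and close steps, the following uses the WriteStartDocument overload that takes a Boolean and adds a standalone attribute to the declaration. The PrologSketch class and the root element are assumptions made for illustration. Because the target here is a StringWriter, the declaration may also carry an encoding attribute.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class PrologSketch
{
    // Initializes a document, writes one element, and closes the document.
    public static string Build(bool standalone)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        writer.WriteStartDocument(standalone);      // 1. initialize: XML declaration
        writer.WriteElementString("root", "data");  // 2. write data
        writer.WriteEndDocument();                  // 3. close the document
        writer.Close();

        return sw.ToString();
    }
}
```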
Decoding Base64 and BinHex Data
Reading encoded data is a bit trickier, but not because the ReadBase64 and
ReadBinHex methods feature a more complex interface. The difficulty lies in the fact
that you have to allocate a buffer to hold the data and make some decision about its
size. If the buffer is too large, you can easily waste memory; if the buffer is too small,
you must set up a potentially lengthy loop to read all the data. In addition, if you can't
process data as you read it, you need another buffer or stream in which you can
accumulate incoming data.
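One way to handle the buffer-size trade-off is to read in small chunks and accumulate the decoded bytes in a MemoryStream. The sketch below assumes the encoded data sits in an element named "element" and that the XML is supplied as a string. The class name, method name, and bufferSize parameter are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class Base64ReadSketch
{
    // Decodes the Base64 content of the first <element> node, bufferSize bytes at a time.
    public static byte[] ReadAll(string xml, int bufferSize)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        MemoryStream output = new MemoryStream();   // accumulates the decoded chunks
        byte[] buffer = new byte[bufferSize];

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "element")
            {
                int n;
                // ReadBase64 returns the number of bytes decoded; 0 means no more data
                while ((n = reader.ReadBase64(buffer, 0, buffer.Length)) > 0)
                    output.Write(buffer, 0, n);
                break;
            }
        }

        reader.Close();
        return output.ToArray();
    }
}
```

A small buffer keeps memory use low at the cost of more loop iterations; a larger one does the opposite.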
Encoding-derived classes also provide a method, GetString, to transform an array of bytes into a string, as shown here:
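XmlTextReader reader = new XmlTextReader(filename);
while (reader.Read())
{
    // ReadBase64 must be called while positioned on the element's start tag
    if (reader.NodeType == XmlNodeType.Element &&
        reader.LocalName == "element")
    {
        byte[] bytes = new byte[1000];
        // n is the number of bytes actually decoded
        int n = reader.ReadBase64(bytes, 0, 1000);
        // Convert only the n decoded bytes, not the whole buffer
        string buf = Encoding.Unicode.GetString(bytes, 0, n);
        // Output the decoded data
        Console.WriteLine(buf);
    }
}
reader.Close();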
Embedding Images in XML Documents
The structure of the sample XML document is extremely simple. It will consist of a
single <jpeg> node holding the BinHex data plus an attribute containing the original
name, as shown here:
writer.WriteStartDocument();
writer.WriteComment("Contains a BinHex JPEG image");
writer.WriteStartElement("jpeg");
writer.WriteAttributeString("FileName", filename);
// Read the whole JPEG file into memory
FileStream fs = new FileStream(jpegFileName, FileMode.Open, FileAccess.Read);
BinaryReader f = new BinaryReader(fs);
byte[] img = f.ReadBytes((int) fs.Length);
f.Close();
// Write the JPEG data as BinHex text
writer.WriteBinHex(img, 0, img.Length);
// Close the document
writer.WriteEndElement();
writer.WriteEndDocument();
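To see what the BinHex content looks like, the following sketch encodes a few bytes in memory instead of reading a real image file. The BinHexSketch class, the sample bytes, and the file name passed in are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class BinHexSketch
{
    // Writes <jpeg FileName="...">hex digits</jpeg> and returns the markup.
    public static string Encode(byte[] data, string name)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        writer.WriteStartElement("jpeg");
        writer.WriteAttributeString("FileName", name);
        writer.WriteBinHex(data, 0, data.Length);   // two hexadecimal digits per byte
        writer.WriteEndElement();
        writer.Close();

        return sw.ToString();
    }
}
```

Each input byte becomes two hexadecimal digits, so the encoded text is roughly twice the size of the original binary data.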
public void WriteContent(DataTable dt)
{
// Write data
Writer.WriteStartElement("rs", "data", null);
foreach(DataRow row in dt.Rows)
{
Writer.WriteStartElement("z", "row", null);
foreach(DataColumn dc in dt.Columns)
Writer.WriteAttributeString(dc.ColumnName,
row[dc.ColumnName].ToString());
Writer.WriteEndElement();
}
Writer.WriteEndElement();
}
ADO Recordset objects do not support embedding multiple result sets in a single XML file.
For this reason, you must either develop a new XML format or use separate files, one
for each result set.
Testing the XmlRecordsetWriter Class
For .NET Framework applications, using the XmlRecordsetWriter class is no big deal.
You simply instantiate the class and call its methods, as shown here:
void ButtonLoad_Click(object sender, System.EventArgs e)
{
// Create and display the XML document
CreateDocument("adors.xml");
UpdateUI("adors.xml");
}
void CreateDocument(string filename)
{
DataSet ds = LoadDataFromDatabase();
XmlRecordsetWriter writer = new
XmlRecordsetWriter(filename);
writer.WriteRecordset(ds);
}
A Read/Write XML Streaming Parser
XML readers and writers work in separate compartments and in an extremely
specialized way. Readers just read, and writers just write. There is no way to force
things to go differently, and in fact, the underlying streams are read-only or write-only
as required. Suppose that your application manages lengthy XML documents that
contain rather volatile data. Readers provide a powerful and effective way to read that
content.
Designing a Writer on Top of a Reader
In the .NET Framework, the XML DOM classes make intensive use of streaming
readers and writers to build the in-memory tree and to flush it out to disk. Thus, readers
and writers are definitely the only XML primitives available in the .NET Framework.
Consequently, to build up a sort of lightweight XML DOM parser, we can only rely, once
more, on readers and writers.
The inspiration for designing such a read/write streaming parser is database server
cursors. With database server cursors, you visit records one after the next and, if
needed, can apply changes on the fly. Database changes are immediately effective,
and actually the canvas on which your code operates is simply the database table. The
same model can be arranged to work with XML documents.
You will use a normal XML (validating) reader to visit the nodes in sequence. While
reading, however, you are given the opportunity to change attribute values and node
contents. Unlike the XML DOM, changes will have immediate effect. How can you
obtain these results? The idea is to use an XML writer on top of the reader.
Built-In Support for Read/Write Operations
When I first began thinking about this lightweight XML DOM component, one of the key
points I identified was an efficient way to copy (in bulk) blocks of nodes from the
read-only stream to the write stream. Luckily enough, two somewhat underappreciated
XmlTextWriter methods just happen to cover this tricky but boring aspect of two-way
streaming: WriteAttributes and WriteNode.
The WriteAttributes method reads all the attributes available on the currently selected
node in the specified reader. It then copies them as a single string to the current output
stream. Likewise, the WriteNode method does the same for any other type of node.
Note that WriteNode does nothing if the node type is XmlNodeType.Attribute.
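A minimal sketch of the two methods working together: it copies the root element of an input document, carries over all of its attributes with WriteAttributes, adds one extra attribute, and then copies the child nodes with WriteNode. The class name, the "copied" attribute, and the use of in-memory strings instead of files are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper used only for illustration.
public static class CopyNodesSketch
{
    // Copies the root element, its attributes, and its children,
    // adding a marker attribute along the way.
    public static string CopyRoot(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        reader.MoveToContent();                         // position on the root element
        writer.WriteStartElement(reader.LocalName);
        writer.WriteAttributes(reader, false);          // copy every attribute of the root
        writer.WriteAttributeString("copied", "true");  // a change applied on the fly

        // Copy each child of the root; WriteNode also advances the reader
        reader.Read();
        while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            writer.WriteNode(reader, false);

        writer.WriteEndElement();
        writer.Close();
        reader.Close();
        return sw.ToString();
    }
}
```

Because WriteNode moves the reader past the node it copies, the loop does not call Read itself after the first child has been reached.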
The following code shows how to use these methods to create a copy of the original
XML file, modified to skip some nodes. The XML tree is visited in document order
using an XML reader. Each child of the root is then either written out to the
associated XML writer or skipped, according to its index. This code scans a document
and writes out every other node.
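XmlTextReader reader = new XmlTextReader(inputFile);
// A null encoding selects the default encoding
XmlTextWriter writer = new XmlTextWriter(outputFile, null);
// Configure reader and writer
writer.Formatting = Formatting.Indented;
reader.WhitespaceHandling = WhitespaceHandling.None;
reader.MoveToContent();
// Write the root
writer.WriteStartElement(reader.LocalName);
// Move to the first child of the root
reader.Read();
// Read and output every other node
int i = 0;
while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
{
    if (i % 2 != 0)
        writer.WriteNode(reader, false);  // copies the node and moves past it
    else
        reader.Skip();                    // moves past the node without copying
    i++;
}
// Close the root
writer.WriteEndElement();
// Close reader and writer
writer.Close();
reader.Close();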
The CSV Reader/Writer in Action
Let's take a sample CSV file, read it, and apply some changes to the contents so that
they will automatically be persisted when the reader is closed. Here is the source CSV
file:
LastName,FirstName,Title,Country
Davolio,Nancy,Sales Representative,USA
Fuller,Andrew,Sales Manager,USA
Leverling,Janet,Sales Representative,UK
Suyama,Michael,Sales Representative,UK
// Instantiate the reader on a CSV file
XmlCsvReadWriter reader;
reader = new XmlCsvReadWriter("employees.csv",
hasHeader.Checked);
reader.EnableOutput = true;
reader.Read();
// Define the schema of the table to bind to the grid
DataTable dt = new DataTable();
for(int i=0; i<reader.AttributeCount; i++)
{
reader.MoveToAttribute(i);
DataColumn col = new DataColumn(reader.Name,
typeof(string));
dt.Columns.Add(col);
}
reader.MoveToElement();
// Loop through the CSV rows and populate the DataTable
do
{
DataRow row = dt.NewRow();
for(int i=0; i<reader.AttributeCount; i++)
{
if (reader[i] == "Sales Representative")
reader[i] = "Sales Force";
row[i] = reader[i].ToString();
}
dt.Rows.Add(row);
}
while (reader.Read());
// Flushes the changes to disk
reader.Close();
// Bind the table to the grid
dataGrid1.DataSource = dt;
Readers and writers are at the foundation of every I/O operation in the .NET
Framework. You find them at work when you operate on disk and on network files,
when you serialize and deserialize, while you perform data access, even when you
read and write configuration settings.
XML writers are ad hoc tools for creating XML documents using a higher-level metaphor
and putting more abstraction between your code and the markup. By using XML
writers, you go far beyond markup to reach a node-oriented dimension in which, instead
of just accumulating bytes in a block of contiguous memory, you assemble nodes and
entities to create the desired schema and infoset.
.NET XML writers only ensure the well-formedness of each individual XML element
being generated. Writers can in no way guarantee the well-formedness of the entire
document and can do even less to validate a document against a DTD or a schema.
Although badly formed XML documents can only result from actual gross programming
errors, the need for an extra step of validation is often felt in production environments,
especially when the creation of the document depends on a number of variable factors
and run-time conditions. For this reason, we've also examined the key points involved
in the design and implementation of a validating XML writer.