Document Toolkit: a better document service
This is part three of a series of blog posts on Document Toolkit. Document Toolkit is a Silverlight library offering a range of features that enable easy document access and document display in Silverlight 2 and Silverlight 3 applications.
Related Links
- Part 1: Customizing the document viewer
- Part 2: Using the WebPackageReader
- Document Toolkit Evaluation version
- Source code related to this post
Document Service
In the previous post on the WebPackageReader I presented a simple document service capable of serving XPS document parts. The client-side WebPackageReader issues an HTTP request for every document part it needs (XAML, metadata, images, etc.), and the service on the server is responsible for fetching the requested parts from the XPS document and returning them as fast as possible. The following image provides an overview of the client and server architecture.

In this post I want to discuss a number of improvements in order to build a more reliable and better document service. Please note that the samples in this post are based on ASP.NET technology, but any server-side technology such as PHP or Java can be used.
Content Types
The document service presented in the previous post uses the generic MIME type 'application/octet-stream' for all XPS part responses. That is not correct: all content returned from the server should have a proper content type. The Open Packaging Conventions explicitly define content types for all parts in an XPS document. The content types are available in the XPS document itself, in the part named [Content_Types].xml.
<?xml version="1.0" encoding="utf-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="fdseq" ContentType="application/vnd.ms-package.xps-fixeddocumentsequence+xml"/>
  <Default Extension="fdoc" ContentType="application/vnd.ms-package.xps-fixeddocument+xml"/>
  <Default Extension="xml" ContentType="application/vnd.ms-printing.printticket+xml"/>
  <Default Extension="JPG" ContentType="image/jpeg"/>
  <Default Extension="fpage" ContentType="application/vnd.ms-package.xps-fixedpage+xml"/>
  <Default Extension="odttf" ContentType="application/vnd.ms-package.obfuscated-opentype"/>
  <Default Extension="PNG" ContentType="image/png"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
All we need to do is read the content types from the document and use the appropriate type for each requested document part. If we read the content types from the document only once and cache them in memory on the server, we do not have to read the file for every part request.
The following sample demonstrates the use of blistering fast LINQ-to-XML queries to look up a content type.
XNamespace ns = "http://schemas.openxmlformats.org/package/2006/content-types";

// look up an override for this part name (OPC part names are compared case-insensitively)
var contentType = (from o in document.Descendants(ns + "Override")
                   where string.Equals((string)o.Attribute("PartName"), partName, StringComparison.OrdinalIgnoreCase)
                   select (string)o.Attribute("ContentType")).FirstOrDefault();
if (contentType == null) {
    // look up the default for the extension; TrimStart loses the dot and,
    // unlike Substring(1), does not throw when the part name has no extension
    string extension = Path.GetExtension(partName).TrimStart('.');
    contentType = (from type in document.Descendants(ns + "Default")
                   where string.Equals((string)type.Attribute("Extension"), extension, StringComparison.OrdinalIgnoreCase)
                   select (string)type.Attribute("ContentType")).FirstOrDefault();
}
// not found, set generic content type
if (contentType == null) {
    contentType = "application/octet-stream";
}
For more information on the [Content_Types].xml part, please see here. Please note it is not possible to hard-wire the XPS content types into the IIS metabase. Each XPS document defines its own set of content types.
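The "read once and cache in memory" idea can be sketched as a small lookup class. This is a minimal sketch rather than the code from the download: the class name ContentTypeMap and its Lookup method are made up for illustration, and it assumes the [Content_Types].xml part has already been loaded into an XDocument. One instance per document could be kept in a static dictionary keyed by the document path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper: parses [Content_Types].xml once and answers lookups from memory.
public sealed class ContentTypeMap
{
    private static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    // Part names and extensions are matched case-insensitively.
    private readonly Dictionary<string, string> overrides =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, string> defaults =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public ContentTypeMap(XDocument contentTypes)
    {
        foreach (XElement e in contentTypes.Descendants(Ns + "Override"))
        {
            Add(overrides, (string)e.Attribute("PartName"), (string)e.Attribute("ContentType"));
        }
        foreach (XElement e in contentTypes.Descendants(Ns + "Default"))
        {
            Add(defaults, (string)e.Attribute("Extension"), (string)e.Attribute("ContentType"));
        }
    }

    // Skips incomplete entries instead of failing on them.
    private static void Add(Dictionary<string, string> map, string key, string value)
    {
        if (key != null && value != null)
        {
            map[key] = value;
        }
    }

    // Override first, then the default for the extension, then the generic type.
    public string Lookup(string partName)
    {
        string contentType;
        if (overrides.TryGetValue(partName, out contentType))
        {
            return contentType;
        }
        string extension = Path.GetExtension(partName).TrimStart('.');
        if (defaults.TryGetValue(extension, out contentType))
        {
            return contentType;
        }
        return "application/octet-stream";
    }
}
```

With the sample part shown earlier, a call such as new ContentTypeMap(document).Lookup("/docProps/core.xml") would take the override branch, while a page part ending in .fpage would fall through to the defaults.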
Caching
In the current situation every part request causes a full round-trip to the server. That is not needed when the requested part is available in the local browser cache. Let's add some HTTP caching headers to ensure that the client doesn't issue HTTP requests for cached parts. The following sample demonstrates the use of HTTP last modified and expiration headers. In the sample the HTTP response expires in one day.
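private bool IsModified(HttpContext context, DateTime lastModified)
{
    string value = context.Request.Headers["If-Modified-Since"];
    if (value != null) {
        DateTime ifModifiedSince;
        if (DateTime.TryParse(value, out ifModifiedSince)) {
            // HTTP dates have one-second resolution, so an exact comparison with a
            // file timestamp would usually report a modification; compare in UTC
            // and treat differences below one second as "not modified"
            TimeSpan difference = lastModified.ToUniversalTime() - ifModifiedSince.ToUniversalTime();
            return Math.Abs(difference.TotalSeconds) >= 1;
        }
    }
    return true;
}

// in the request handler, before the part is written to the response
if (!IsModified(context, lastModified)) {
    context.Response.StatusCode = (int)HttpStatusCode.NotModified;
    context.Response.StatusDescription = "Not Modified";
}
else {
    context.Response.Cache.SetLastModified(lastModified);
    context.Response.Cache.SetExpires(DateTime.UtcNow.AddDays(1));
}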
Compression
The number of bytes sent over the wire can be greatly reduced by employing HTTP compression using the Content-Encoding header. We use the System.IO.Compression streams in combination with HTTP response filters to compress part responses. XML-based parts typically get an excellent compression ratio. Compressing an image usually doesn't gain a lot, because formats such as JPEG and PNG are already compressed, so we exclude images from compression. Compressing HTTP responses increases the server load, but we gain a reduction in HTTP traffic.
private void SetCompressionFilter(HttpContext context)
{
    if (context.Response.ContentType.StartsWith("image/")) {
        // do not perform compression on images
        return;
    }
    string acceptEncoding = context.Request.Headers["Accept-Encoding"];
    if (acceptEncoding != null) {
        if (acceptEncoding.Contains("gzip")) {
            context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
        else if (acceptEncoding.Contains("deflate")) {
            context.Response.Filter = new DeflateStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "deflate");
        }
    }
}
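To get a feel for the compression ratio of XML-based parts outside of a web request, the same GZipStream class can be applied to an in-memory buffer. The following is a minimal sketch; the class name PartCompression and its two methods are made up for illustration and are not part of the document service.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for experimenting with gzip on part content.
public static class PartCompression
{
    // Compresses a buffer in memory and returns the gzip bytes.
    public static byte[] GZip(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            } // disposing the GZipStream writes the final block and the gzip footer

            // ToArray still works after the underlying stream has been closed
            return output.ToArray();
        }
    }

    // Reverses GZip; useful to verify that nothing was lost.
    public static byte[] GUnzip(byte[] input)
    {
        using (var source = new GZipStream(new MemoryStream(input), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            source.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Passing the bytes of a FixedPage part to GZip and comparing the length of the result with the original length shows how much a text-heavy page shrinks; doing the same with a JPEG part should show little or no gain.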
Improving disk IO
Caching and compression reduce the number of bytes sent over the wire. However, the document service still opens and searches the XPS ZIP archive for every part request. This is a CPU and disk IO intensive operation that should be avoided in order to reduce the server load.
A good solution is to extract an XPS document once and store the XPS parts in a temporary folder on disk on the server. Now we only need to look up a part on disk and send it to the response stream right away. If we combine this with HTTP compression, we might even choose to write a compressed version of each part to disk. In this case we do not need to dynamically compress each part before it is sent to the response stream.
Extracting an XPS document to the server disk and (optionally) compressing each part is a one-time operation. Once the XPS document is extracted, the requested parts can be served quickly without any compression/decompression per request.
You may choose to unzip the XPS documents once and leave them on disk forever for documents that are not subject to updates.
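The one-time extraction could look roughly like the sketch below. It is an illustration under several assumptions, not the code from the download: the class XpsExtractor and its methods are hypothetical, the ZipFile class requires .NET 4.5 or later (on the .NET Framework also a reference to System.IO.Compression.FileSystem), and the package is assumed not to use interleaved parts, so that every part maps to exactly one ZIP entry.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: extracts an XPS package once and maps part names to files.
public static class XpsExtractor
{
    // Returns true when this call performed the extraction,
    // false when the cache folder already existed.
    public static bool EnsureExtracted(string xpsPath, string cacheFolder)
    {
        if (Directory.Exists(cacheFolder))
        {
            return false;
        }

        // Extract to a staging folder and rename it afterwards, so that a
        // half-extracted document is never visible under cacheFolder.
        string staging = cacheFolder + ".tmp-" + Guid.NewGuid().ToString("N");
        ZipFile.ExtractToDirectory(xpsPath, staging);
        try
        {
            Directory.Move(staging, cacheFolder);
            return true;
        }
        catch (IOException)
        {
            Directory.Delete(staging, true);
            if (Directory.Exists(cacheFolder))
            {
                return false; // a concurrent request finished first
            }
            throw;
        }
    }

    // Maps a part name such as "/Documents/1/Pages/1.fpage" to a file below
    // cacheFolder and rejects names that would point outside of it.
    public static string GetPartPath(string cacheFolder, string partName)
    {
        string root = Path.GetFullPath(cacheFolder);
        string full = Path.GetFullPath(Path.Combine(root, partName.TrimStart('/')));
        if (!full.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
        {
            throw new ArgumentException("Part name escapes the cache folder.", "partName");
        }
        return full;
    }
}
```

The request handler would call EnsureExtracted for the requested document and then stream the file returned by GetPartPath to the response. The check in GetPartPath matters because the part name comes straight from the client request.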
Summary
Building a more reliable and better document service requires us to provide a correct content type for each requested document part. Significant performance improvements are achieved when HTTP caching, HTTP compression and smart ZIP extraction are implemented.
Download the document service source code.
Published: May 27, 2009