CollapseXML as a serializer?

CollapseXML is our new simple XML templating engine which you can download from NuGet and use in your projects. Before reading any further, you should at least take a peek at it to know what we’re talking about in this post.

The reason for writing this post is to explain why CollapseXML is not a serializer but a templating engine, and why that matters.

Serialization is the process of converting data structures into a format that can be stored for later use or transmitted across a network. The main goal is to be able to recreate the original data structures from that format at another location in the network or at another point in time.

CollapseXML can be used as an XML serializer, but that is not its main job. The reason is simple: it requires a template. Using it as a serializer would require you to write a template describing every property of every object in the object graph, which would simply be too tedious. This is not CollapseXML’s strength. What, then, is the reason for CollapseXML’s existence?

There are several points.

CollapseXML allows multiple views on the same data

Most serializers work on an all-or-nothing principle. They will usually serialize every public property to an unbounded depth. Some do allow you to set a maximum depth, but that does not stop them from repeating elements that have already been output. In addition, serializing every public property rules out different ‘views’ on the same set of data.

CollapseXML was created to let you output only a part of the object graph, the part that makes sense in a given context. It also lets you have multiple views on the same set of data by writing more than one template. This is useful in many scenarios: a view per user, a view per customer, a view that depends on the current month or some other parameter, whatever drives your data. Another scenario is a service that returns XML as part of some protocol. The same set of data can then be exported using several different XML protocols.
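
To make the idea of views concrete, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. It is not CollapseXML syntax. The `Employee` class and the two view functions are hypothetical and only illustrate how one set of objects can yield different XML depending on which "template" is applied.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Employee:  # hypothetical data class, not part of CollapseXML
    name: str
    salary: int
    email: str


def public_view(employees):
    """View for outsiders: names only."""
    root = ET.Element("Staff")
    for e in employees:
        ET.SubElement(root, "Person", name=e.name)
    return ET.tostring(root, encoding="unicode")


def payroll_view(employees):
    """View for payroll: names and salaries, no contact details."""
    root = ET.Element("Payroll")
    for e in employees:
        ET.SubElement(root, "Entry", name=e.name, salary=str(e.salary))
    return ET.tostring(root, encoding="unicode")


# The same list of objects feeds both views.
staff = [Employee("Mark Johnson", 5000, "mark@example.com")]
public_xml = public_view(staff)
payroll_xml = payroll_view(staff)
```

Each function plays the role of one template: the data stays the same, and only the description of what to output changes.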

Another problem with serializing every property is the occurrence of circular references.

CollapseXML does not have to handle circular references at all

Some serializers handle circular references by ignoring references they already visited. This is indeed the most reasonable action one can take. However, there are two problems.

The first is that the serializer must pollute the output with identity metadata and back-references. Objects that are referenced at multiple points in the document must carry identity metadata, and objects that reference them must do so through that metadata. This is acceptable only if deserialization is performed by the same engine. The second is that not every situation is symmetrical. Sometimes you don’t have control over the format. Sometimes a third-party service requires you to send your data in a schema-specific XML format. With regular serializers, it is often impossible to match the schema with the default settings. Either you have to create additional classes that mimic the schema, or you have to export XML by hand from the data you have. To keep your own classes and still match the schema, you’d need a templating engine.

CollapseXML does not need to handle circular references at all. It outputs only what the template describes, so they can’t be produced.
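
The difference can be sketched in a few lines of Python with xml.etree.ElementTree. This is an illustration of the principle, not CollapseXML code. The `Person` class, `naive_serialize` and `templated` are hypothetical names. The naive function follows every reference, as an all-properties serializer without cycle detection would, while the template-like function walks only the path it was written to walk.

```python
import xml.etree.ElementTree as ET


class Person:  # hypothetical class with a back-reference to its manager
    def __init__(self, name):
        self.name = name
        self.manager = None
        self.reports = []


def naive_serialize(obj):
    """Follow every reference, with no cycle detection."""
    el = ET.Element("Person", name=obj.name)
    if obj.manager is not None:
        el.append(naive_serialize(obj.manager))
    for report in obj.reports:
        el.append(naive_serialize(report))
    return el


def templated(manager):
    """Template-like output: manager, then reports, never back up."""
    el = ET.Element("Manager", name=manager.name)
    for report in manager.reports:
        ET.SubElement(el, "Report", name=report.name)
    return ET.tostring(el, encoding="unicode")


boss = Person("Mark Johnson")
dev = Person("Gretta Mayer")
dev.manager = boss          # dev points up to boss...
boss.reports.append(dev)    # ...and boss points down to dev: a cycle

try:
    naive_serialize(boss)
    naive_failed = False
except RecursionError:
    naive_failed = True     # the cycle makes the naive walk recurse forever

templated_xml = templated(boss)
```

The template-like function never reads `manager` on a report, so the cycle in the data is simply never visited.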

CollapseXML allows the data to be output in multiple locations in XML

This is mostly useful for communication protocols and XML-based APIs. Once a serializer has output an element, it does not output it again. The output is also almost always depth-first, with data enumerated one item after another. You almost never have any control over the order, and you usually can’t output an element multiple times.

Take a look at this example:

<Company>
  <TeamLeaders>
    <TeamLeader name="Mark Johnson" />
    <TeamLeader name="Gretta Mayer" />
    <TeamLeader name="Jenny Garfield" />
  </TeamLeaders>
  <Programmers>
    <Programmer name="Gretta Mayer" alias="Gretty" />
  </Programmers>
  <Designers>
    <Designer name="Jenny Garfield" favApp="Photoshop" />
  </Designers>
</Company>

Several people appear in multiple parts of the output XML even though the data came from the same in-memory objects. Each role an employee can take shows only the data related to that role. Of course, the same could be done with serializers, but the output they produce would look a lot more like this (the choice of metadata in the example is arbitrary):

<Company>
  <People>
    <Person name="Mark Johnson" _id="212CE434F436" />
    <Person name="Gretta Mayer" alias="Gretty" _id="2353E2412CD" />
    <Person name="Jenny Garfield" favApp="Photoshop" _id="BC42353444" />
  </People>
  <TeamLeaders>
    <Person _refId="212CE434F436" />
    <Person _refId="2353E2412CD" />
    <Person _refId="BC42353444" />
  </TeamLeaders>
  <Programmers>
    <Person _refId="2353E2412CD" />
  </Programmers>
  <Designers>
    <Person _refId="BC42353444" />
  </Designers>
</Company>
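
As a rough sketch of how the first, role-oriented document could be built from shared objects, here is a Python example using xml.etree.ElementTree. It does not use CollapseXML. The dictionaries and the `section` helper are assumptions made for illustration. The point is that the same person object is written out once per role, each time with only the fields that role needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory data: one dict per person, shared by all roles.
mark = {"name": "Mark Johnson"}
gretta = {"name": "Gretta Mayer", "alias": "Gretty"}
jenny = {"name": "Jenny Garfield", "favApp": "Photoshop"}

team_leaders = [mark, gretta, jenny]
programmers = [gretta]
designers = [jenny]


def section(parent, group_tag, item_tag, people, fields):
    """Emit one role section, copying only the fields relevant to that role."""
    group = ET.SubElement(parent, group_tag)
    for person in people:
        ET.SubElement(group, item_tag, {f: person[f] for f in fields})


company = ET.Element("Company")
section(company, "TeamLeaders", "TeamLeader", team_leaders, ["name"])
section(company, "Programmers", "Programmer", programmers, ["name", "alias"])
section(company, "Designers", "Designer", designers, ["name", "favApp"])

company_xml = ET.tostring(company, encoding="unicode")
```

No identifiers or back-references are needed, because nothing ever has to be reconstructed from the output.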

CollapseXML does not require data annotations

Some serializers let you exclude properties you don’t want serialized by using data annotations. In .NET, for example, one could use attributes. That’s a nice approach, but there are cases where you don’t own the classes. Your data may come from another library or framework, so you can’t add attributes to its classes. The only solution then is to do things manually or to create wrapper classes. CollapseXML does not care about attributes. It only takes what is described in the template.
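
A small Python sketch shows the same principle with a class nobody can annotate: `datetime.date` from the standard library. The `DATE_TEMPLATE` dictionary and the `render` function are hypothetical and have nothing to do with CollapseXML’s actual template format. They only show that the description of what to output can live entirely outside the class.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Stand-in for a class we do not own. The external "template" lists what
# to output; the class itself is left untouched.
DATE_TEMPLATE = {"tag": "Date", "attributes": ["year", "month"]}


def render(obj, template):
    """Build one element from the attributes named in the template."""
    attrs = {a: str(getattr(obj, a)) for a in template["attributes"]}
    return ET.tostring(ET.Element(template["tag"], attrs), encoding="unicode")


date_xml = render(date(2014, 3, 7), DATE_TEMPLATE)
```

The `day` attribute exists on the object but never reaches the output, because the template does not mention it.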

Conclusion

There is more to it, but to keep this post from getting too long, this will be enough.

As you can see, CollapseXML may not be a good choice if all you need is plain serialization and deserialization. But if you need something more configurable, give CollapseXML a try.
