BLOB

As a regular programmer I was quite accustomed to storing program information as blob files (binary large objects). In C# this is really easy: you mark a class as serializable with the [Serializable()] attribute, and you mark any fields you don't want serialized with [NonSerialized]. Finally you create a BinaryFormatter to do the actual serialization and de-serialization, and you're done. The source code would look something like this (note: to keep things short, none of the examples contain any error handling):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable()]
public class SerializeBLOB
{
	public string name = "Hello my name is...";

	[NonSerialized]
	public SerializeBLOB circularReference;

	public static void Serialize(SerializeBLOB obj, Stream stream)
	{
		BinaryFormatter bf = new BinaryFormatter();
		bf.Serialize(stream, obj);
	}

	public static SerializeBLOB DeSerialize(Stream stream)
	{
		BinaryFormatter bf = new BinaryFormatter();
		return (SerializeBLOB)bf.Deserialize(stream);
	}
}

Pointing the stream at a file generates a binary file. Blobs are very easy to use: it takes only a few lines of code to store and retrieve an object. However, the generated output is not human readable, and a program written in a different language, or one without access to the original class (compiled or not), will not be able to understand the file, let alone modify it correctly, without a lot of effort.
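BinaryFormatter has since been marked obsolete (since .NET 5) and no longer works in .NET 9, so as a sketch of the same idea, a compact binary blob, here is a hand-written version using BinaryWriter and BinaryReader instead. This is a swapped-in technique, not what BinaryFormatter does internally, and the BlobSketch class and its field layout are hypothetical. It also shows why another program struggles with such a file: nothing in the bytes says which value is the name and which is the number, so the reader simply has to know the order.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical hand-written blob format: a length-prefixed UTF-8 string
// followed by a 4-byte little-endian int. Only the writer and the reader
// agree on this layout; the bytes themselves carry no field names.
public static class BlobSketch
{
    public static byte[] Save(string name, int number)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(name);   // 7-bit encoded byte length, then the UTF-8 bytes
                writer.Write(number); // always 4 bytes, little-endian
            }
            return ms.ToArray();
        }
    }

    public static (string Name, int Number) Load(byte[] blob)
    {
        using (var reader = new BinaryReader(new MemoryStream(blob), Encoding.UTF8))
        {
            // Fields must be read back in exactly the order they were written.
            string name = reader.ReadString();
            int number = reader.ReadInt32();
            return (name, number);
        }
    }
}
```

Saved this way, "Hello my name is..." and 18 take 24 bytes: one length byte, 19 bytes of text and 4 bytes for the integer.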

INI

A different approach is the INI file, in which each line names an attribute, followed by '=' and its value. An INI file could look like this:

name=someName
number=18

This file is human readable and easily understood and edited by both computers and humans. It was generated by the following code, which is not really complex, but it has some big problems that I haven't solved yet.

using System;
using System.IO;
using System.Text;

public class SerializeINI
{
	public string name = "someName";
	public int number = 18;

	public static void Serialize(SerializeINI obj, Stream stream)
	{
		StringBuilder sb = new StringBuilder();
		sb.AppendLine("name=" + obj.name);
		sb.AppendLine("number=" + obj.number.ToString());
		StreamWriter writer = new StreamWriter(stream);
		writer.Write(sb.ToString());
		writer.Flush(); // without this the buffered text may never reach the stream
	}

	public static SerializeINI DeSerialize(Stream stream)
	{
		StreamReader reader = new StreamReader(stream);
		SerializeINI obj = new SerializeINI();
		string line;
		// ReadLine returns null (not String.Empty) at the end of the stream
		while ((line = reader.ReadLine()) != null)
		{
			string[] id_value = line.Split('=');
			if (id_value.Length < 2)
				continue; // skip blank or malformed lines
			switch (id_value[0])
			{
				case "name":
					obj.name = id_value[1];
					break;
				case "number":
					obj.number = Int32.Parse(id_value[1]);
					break;
			}
		}

		return obj;
	}
}

As you can see, serialization and de-serialization are not as automated as you'd hope. We have to write the code for each field ourselves, which becomes cumbersome with large and/or numerous classes. Also, the classes we serialize must either expose every field we want to store as public, or offer some kind of constructor (perhaps one that accepts a struct) that lets us set all fields. There's another problem: there is no standard syntax for arrays. We could of course invent one, but it would differ from program to program. And how do we save text that contains a '=' character? The code above splits on every '=', so such a value would be cut off. We could solve this with escaping or with a special placeholder character. Although all of these problems are solvable, the solutions differ from application to application, which makes it hard to make applications interact with each other.
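One common way to handle a '=' inside a value, sketched here under conventions of my own choosing: split each line only on the first '=' so the rest of the line is the value, and escape backslashes and line breaks so every entry stays on a single line. The IniLine helper and its escape rules are hypothetical, keys still must not contain '=', and any other program would have to follow exactly the same rules, which is the interoperability problem described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: one "key=value" entry per line, split on the FIRST '='.
public static class IniLine
{
    // Escape backslashes first, then line breaks, so the value fits on one line.
    public static string Format(string key, string value) =>
        key + "=" + value.Replace("\\", "\\\\").Replace("\r", "\\r").Replace("\n", "\\n");

    public static KeyValuePair<string, string> Parse(string line)
    {
        int eq = line.IndexOf('=');
        if (eq < 0) throw new FormatException("Missing '=' in: " + line);
        string key = line.Substring(0, eq);
        // Everything after the first '=' belongs to the value, including further '='.
        return new KeyValuePair<string, string>(key, Unescape(line.Substring(eq + 1)));
    }

    private static string Unescape(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length)
            {
                // "\n" -> newline, "\r" -> carriage return, "\\" -> backslash.
                char next = s[++i];
                sb.Append(next == 'n' ? '\n' : next == 'r' ? '\r' : next);
            }
            else
            {
                sb.Append(s[i]);
            }
        }
        return sb.ToString();
    }
}
```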

In the comments Alex correctly points out that with reflection you could get all of an object's members automatically; you can even introduce your own attributes or respond to existing ones like [NonSerialized]. Of course this would make INI serialization and de-serialization a lot easier. However, reflection code can get complex quickly, so I won't put a full implementation here. Maybe I'll write a separate tutorial about reflection.
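That said, the core idea fits in a short sketch. The version below handles only public instance fields whose types Convert.ChangeType understands (string, int, double, bool and similar) and skips fields marked with a hypothetical [IniIgnore] attribute, to show how a custom attribute can steer the process. The names ReflectionIni, IniIgnoreAttribute and Settings are made up for this example; arrays, nested objects and the escaping issues above are not handled.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Reflection;

// Hypothetical marker attribute: fields carrying it are neither written nor read.
[AttributeUsage(AttributeTargets.Field)]
public class IniIgnoreAttribute : Attribute { }

// Hypothetical example type, shaped like SerializeINI above.
public class Settings
{
    public string name = "someName";
    public int number = 18;
    [IniIgnore] public string cache = "temp";
}

public static class ReflectionIni
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    public static void Serialize<T>(T obj, TextWriter writer) where T : class
    {
        foreach (FieldInfo field in typeof(T).GetFields(Flags))
        {
            if (field.IsDefined(typeof(IniIgnoreAttribute), false)) continue;
            // Invariant culture so numbers are written the same on every machine.
            string text = Convert.ToString(field.GetValue(obj), CultureInfo.InvariantCulture);
            writer.WriteLine(field.Name + "=" + text);
        }
    }

    public static T Deserialize<T>(TextReader reader) where T : class, new()
    {
        T obj = new T();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue; // skip blank or malformed lines
            FieldInfo field = typeof(T).GetField(line.Substring(0, eq), Flags);
            if (field == null || field.IsDefined(typeof(IniIgnoreAttribute), false)) continue;
            // Convert the text back to the field's declared type (string, int, ...).
            object value = Convert.ChangeType(line.Substring(eq + 1), field.FieldType,
                CultureInfo.InvariantCulture);
            field.SetValue(obj, value);
        }
        return obj;
    }
}
```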

XML

So let's try another type of serialization: XML (eXtensible Markup Language). XML has been quite the buzz lately: it's used to create web pages (XHTML), and a lot of web services and web APIs communicate via XML. Some databases even let you store and search through XML, and .NET has a separate namespace for it (System.Xml). As with most things, it's easy to serialize to XML in .NET. An XML document could look like this:

<Human>
  <Name>Anthony</Name>
  <Age>38</Age>
  <Stuff test='123'>something</Stuff>
</Human>

XML is human readable and easy for machines to process. However, the syntax is sometimes a bit complex. The tree structure is easy to understand, but when do we store a value as a child element, like 'Name', and when as an attribute, like 'test' on 'Stuff'? Or both? There's also a lot of confusion about arrays: we could use the tree structure and create a new child element for each item in the array, but there are other solutions. The code for serializing to XML looks much like the BLOB code. XmlSerializer works on public classes with a parameterless constructor; [XmlRootAttribute(…)] sets the name of the root element, and fields we don't want to serialize we mark with [XmlIgnore]. To beautify the XML output we can add further attributes, such as [XmlElement] and [XmlAttribute], to individual fields to control whether each one becomes an element or an attribute, and under which name and type.
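To illustrate those formatting attributes, here is a sketch of one way to make XmlSerializer produce the Human document from earlier: [XmlElement] maps a field to a child element, [XmlAttribute] to an attribute and [XmlText] to the element's text content. The Human, Stuff and HumanXml types are hypothetical names for this example, and the real output will typically also contain an XML declaration and namespace declarations on the root element.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Human")]
public class Human
{
    [XmlElement("Name")] public string Name = "Anthony";
    [XmlElement("Age")] public int Age = 38;
    [XmlElement("Stuff")] public Stuff Stuff = new Stuff();
}

public class Stuff
{
    [XmlAttribute("test")] public string Test = "123";  // becomes <Stuff test="123">
    [XmlText] public string Value = "something";        // becomes the text inside <Stuff>
}

public static class HumanXml
{
    public static string ToXml(Human h)
    {
        var xs = new XmlSerializer(typeof(Human));
        using (var sw = new StringWriter())
        {
            xs.Serialize(sw, h);
            return sw.ToString();
        }
    }

    public static Human FromXml(string xml)
    {
        var xs = new XmlSerializer(typeof(Human));
        using (var sr = new StringReader(xml))
            return (Human)xs.Deserialize(sr);
    }
}
```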

using System.IO;
using System.Xml.Serialization;

[XmlRootAttribute("TestRoot")]
public class SerializeXML
{
	public string name = "Hello my name is...";

	[XmlIgnore]
	public SerializeXML circularReference;

	public static void Serialize(SerializeXML obj, Stream stream)
	{
		XmlSerializer xs = new XmlSerializer(typeof(SerializeXML));
		xs.Serialize(stream, obj);
	}

	public static SerializeXML DeSerialize(Stream stream)
	{
		XmlSerializer xs = new XmlSerializer(typeof(SerializeXML));
		return (SerializeXML)xs.Deserialize(stream);
	}
}

JSON

JSON (JavaScript Object Notation) is not as well known as the previously mentioned techniques. Don't mind the name: JSON is usable in any programming language. JSON is slightly easier to read and requires less typing than XML, the notation is much cleaner, and there is essentially only one way to store a piece of data: as a name/value pair, with no choice between attributes and child elements. The diagrams at http://json.org/ show how easy it is to learn to read and write JSON (although the website is quite biased). A JSON file could look like this (example from Wikipedia):

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

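Something I hadn't known: .NET has built-in JSON support through the DataContractJsonSerializer class in the System.Runtime.Serialization.Json namespace. In .NET 3.5 you have to add references to the System.ServiceModel.Web and System.Runtime.Serialization assemblies to your project yourself (the latter contains the XmlObjectSerializer base class); from .NET 4.0 on, System.Runtime.Serialization alone is enough. Take a look at the following source code: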

using System;
using System.IO;
using System.Runtime.Serialization.Json;

[Serializable()]
public class SerializeJSON
{
	// .NET 3.5: reference System.ServiceModel.Web and System.Runtime.Serialization.
	// .NET 4.0 and later: DataContractJsonSerializer lives in System.Runtime.Serialization.
	public string name = "Hello my name is...";

	[NonSerialized]
	public SerializeJSON circularReference;

	public static void Serialize(SerializeJSON obj, Stream stream)
	{
		DataContractJsonSerializer js = new DataContractJsonSerializer(typeof(SerializeJSON));
		js.WriteObject(stream, obj);
	}

	public static SerializeJSON DeSerialize(Stream stream)
	{
		DataContractJsonSerializer js = new DataContractJsonSerializer(typeof(SerializeJSON));
		return (SerializeJSON)js.ReadObject(stream);
	}
}

As you can see, the JSON serialization code closely matches the BLOB (BinaryFormatter) code, which makes it easy to use, but there is less control over the output than with XML.
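That said, DataContractJsonSerializer does offer some control when a class uses [DataContract] and [DataMember] instead of [Serializable]: only marked members are written, and the Name property of [DataMember] sets the JSON key. The sketch below maps part of the Wikipedia example onto hypothetical Person, PhoneNumber and PersonJson types. The output is compact (no whitespace), and the exact member order should not be relied on.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

[DataContract]
public class PhoneNumber
{
    [DataMember(Name = "type")] public string Type;
    [DataMember(Name = "number")] public string Number;
}

[DataContract]
public class Person
{
    [DataMember(Name = "firstName")] public string FirstName;
    [DataMember(Name = "lastName")] public string LastName;
    [DataMember(Name = "age")] public int Age;
    [DataMember(Name = "phoneNumber")] public PhoneNumber[] PhoneNumbers;

    // Not marked with [DataMember], so it is never written to JSON.
    public string Cache = "not serialized";
}

public static class PersonJson
{
    public static string ToJson(Person p)
    {
        var js = new DataContractJsonSerializer(typeof(Person));
        using (var ms = new MemoryStream())
        {
            js.WriteObject(ms, p);
            return Encoding.UTF8.GetString(ms.ToArray()); // the serializer writes UTF-8
        }
    }

    public static Person FromJson(string json)
    {
        var js = new DataContractJsonSerializer(typeof(Person));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(json)))
            return (Person)js.ReadObject(ms);
    }
}
```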

Conclusion

BLOB files are handy for data that doesn't need to be edited by anything other than your own application. INI files are something from the past, but they might still have a right to exist because they are so easy to edit by hand; they are still heavily used in the game industry to configure game settings, and big engines like the Unreal Engine still use a lot of INI files. XML has a slightly more difficult syntax, but you can give a lot of meaning to attributes and it is very customizable how data is stored. Also, the support from the .NET Framework is incredible, and XML is a widely appreciated standard. JSON is less widely adopted than XML, except in JavaScript-driven web applications, but it has a very clear syntax. XML is more expressive, since it is harder to give extra meaning to a value in JSON than in XML, but JSON is a purer form of storing data, whereas XML might be a bit too multi-purpose sometimes. As with all techniques there is no 'winner': all of them have their uses, and I look forward to using all of them.

Other cool things to know: JSON can be evaluated directly in JavaScript with the eval(…) function, although JSON.parse(…) is the safer choice because it does not execute code. And for XML there is a full query language called XPath, which allows you to query your XML files much like you would query a database.
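A small sketch of XPath from C#, using XmlDocument.SelectNodes and SelectSingleNode on data shaped like the Human document shown earlier. The XPathSketch class and the sample with two people are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathSketch
{
    // Hypothetical sample document: two <Human> elements under one root.
    public const string Sample =
        "<People>" +
        "<Human><Name>Anthony</Name><Age>38</Age><Stuff test='123'>something</Stuff></Human>" +
        "<Human><Name>Beatrice</Name><Age>25</Age></Human>" +
        "</People>";

    // Returns the names of everyone whose Age is greater than minAge.
    public static List<string> NamesOlderThan(string xml, int minAge)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var names = new List<string>();
        // The predicate in [...] filters Human elements; /Name selects their Name child.
        foreach (XmlNode node in doc.SelectNodes("/People/Human[Age > " + minAge + "]/Name"))
            names.Add(node.InnerText);
        return names;
    }

    // Reads the test attribute of the first Stuff element anywhere, or null if absent.
    public static string StuffTest(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("//Stuff/@test")?.Value;
    }
}
```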

Special thanks to creator1988 for pointing me to the JSON serializer.