F# ASCII85 module

Sometimes, you need to store binary data in places where only text data is welcome, such as XML files. The quick & dirty solution is to code your data into hexadecimal strings, which you can easily get away with when storing a few bytes. However, sometimes you want better space efficiency and this is where ASCII85 comes to the rescue.

ASCII85 is an encoding that represents binary data, using only printable ASCII characters. Using the 85 characters from ‘!’ to ‘u’ to represent values between 0 and 84, you get a base 85 number system. In other words, if we write a number with the digit sequence d4 d3 d2 d1 d0, the d0 digit has a weight of 850 (= 1), the d1 digit weighs 851 (= 85), the d2 digit 852 etc.

Adding all five digit values together, you’ll get 855-1 = 4 437 053 125, which conveniently happens to be slightly greater than 232. This enables us to store a 32 bit integer neatly in a 5 character ASCII85 string. For a longer and better explanation, head over to Wikipedia.

As I couldn’t find an F# implementation of ASCII85, I had a go at writing one. I needed it in order to store GUIDs, and beholding ye olde ASCII table for the first time in years was quite a nice retro trip.

Download source code

Download VS2010 solution here.

The solution includes a test project.

How to use the module

encode : byte [] -> string
decode : string -> byte []

Hardly surprising, these functions do what you think they do. decode will strip “<~” and “~>” delimiters before decoding, if there are any.

addDelimiters : string -> string

Encloses the string between “<~” and “~>”.

breakIntoLines : string -> int -> string

Inserts line breaks to keep line length within the supplied number.

isDelimited : string -> bool

Returns true if the string is enclosed between “<~” and “~>”.

stripDelimiters : string -> string

Removes the enclosing “<~” and “~>” if they’re present, and raises an exception if they’re not.

stripOptionalDelimiters : string -> string

Removes the enclosing “<~” and “~>” if they’re present, and does nothing if they’re not.

A word about handling delimiters and line breaks

Honouring the functional style of doing things, adding line breaks and the optional ASCII85 delimiters <~ and ~> is done by composing small functions, rather than specifying options as arguments to a big one. Example:

byteArray |> encode |> addDelimiters |> breakIntoLines 75

Performance

As I wrote the ASCII85 module with clarity rather than performance in mind, accepting a little extra copying of bytes here and there for the sake of neatness, it’s fair to suspect that my code runs slower than an imperative solution. Compared to Jeff Atwood’s C# implementation, encoding is about 2-3 times slower, but decoding is surprisingly a little less than twice as fast. I haven’t given performance much thought except for the compromises explained below, lacking a use case where ASCII85 encoding could be a bottle neck. Functional programming isn’t about maximum performance anyway. That’s what C++ and other low level languages are for.

Compromises

The purely functional way of encoding binary data into ASCII85, would IMHO be to consume a list of bytes as you build the encoded string. Since most binary data will probably reside in a .NET array, rather than a list, we would need to convert it into a list. This not only takes time, but also increases memory requirements 9 fold, since each single byte would have a pointer (the list being a linked list) to the next byte, and pointers are 8 bytes long these days. As for the decoding, the same applies to the string, which should ideally have been a list of characters.

A nine fold memory cost is a bit rich even for a functional program, so as a concession to the limited nature of memory resources and CPU performance, the encoder increments an index into a byte array instead of consuming a list and conversely, the decoder uses an index into a string. This solution smells slightly imperative, and as if to emphasize the perils of everything imperative, I had track down a pesky little bug related to the index housekeeping.

The joys of statelessness

Apart from the use of the mutable StringBuilder in breakIntoLines, the ASCII85 module is stateless, as functional programs tend to be. The advantages of not having a state, are best examplified by the disadvantages of having one. This can be demonstrated by calling Jeff Atwood’s C# version from the following code:

static void Destabilize()
{
    var guid = Guid.NewGuid();
    Ascii85 a = new Ascii85();
    var encodedGuid = a.Encode(guid.ToByteArray());
    string encoded = "<~blocz~>";  // invalid coded string, will cause exception
                                   // and place a in an unstable state.
    try {
        a.Decode(encoded);
    }
    catch (Exception) {
        // do nothing...
    }

    var decodedGuid = new Guid(a.Decode(encodedGuid));
    Console.WriteLine(guid != decodedGuid ? "Decoding error." : "Decoded correctly.");
}

The exception will leave residues of an old decoding state in a, causing it to decode incorrectly (unless you get the result “Decoded correctly”, in which case Atwood has fixed the glitch).

Speaking of glitches, if you find any bugs in my code, please post a comment so that I can fix them.

  1. No comments yet.

  1. November 7th, 2014
    Trackback from : fresh staff