Considerations about strings

J. Zebedee edited this page Oct 5, 2015 · 1 revision

This document describes how MsgPack-CLI handles strings, covering both the design and its implementation.

The de facto standard interpretation of the MessagePack specification is that a Unicode string should be encoded as UTF-8 without a BOM and stored in the Raw type.
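As a language-neutral illustration of this convention (it is not MsgPack-CLI code), the following Python sketch encodes a string as UTF-8 without a BOM and prefixes it with a MessagePack raw header. The helper name `pack_raw_utf8` is hypothetical, and the sketch only handles the short "fix raw" form, which covers payloads of up to 31 bytes.

```python
def pack_raw_utf8(text: str) -> bytes:
    """Hypothetical helper: pack text as a MessagePack fix raw value."""
    # Python's "utf-8" codec never emits a BOM (unlike "utf-8-sig").
    payload = text.encode("utf-8")
    if len(payload) > 31:
        raise ValueError("this sketch only handles payloads of up to 31 bytes")
    # Fix raw header: bit pattern 101 followed by the 5-bit payload length.
    return bytes([0xA0 | len(payload)]) + payload


packed = pack_raw_utf8("héllo")
print(packed.hex())  # a668c3a96c6c6f
```

The header byte `a6` carries the length 6 because "é" occupies two bytes in UTF-8; the length always counts bytes, not characters.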

MsgPack-CLI is implemented as follows:

  • Packer packs a String (or a Char sequence) as UTF-8 bytes in the Raw type. Note that Packer also provides overloaded methods that accept a System.Text.Encoding to specify a custom character encoding.
  • Unpacker and MessagePackObject handle a Raw type value as byte[], and they provide ReadString or AsString methods that decode the unpacked Raw type value into characters.
  • MessagePackSerializer<T> uses the primitive APIs above according to the following rules:
      • If the target field or property is of type String, UTF-8 encoding is used. If the stream being deserialized contains a byte sequence that is invalid as UTF-8, an exception is thrown.
      • If the target field or property is of type Byte[], the raw bytes are stored as is.
      • If you want to handle another encoding, such as Latin-1 or Shift-JIS strings, you must build a custom serializer by hand.
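The three serializer rules can be illustrated at the byte level with a short Python sketch. This is not MsgPack-CLI code: the helper `as_string` is hypothetical and simply stands in for "decode a Raw payload strictly with a given encoding". The sample payload is "café" encoded as Latin-1, which is not valid UTF-8.

```python
def as_string(raw_bytes: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: strictly decode a Raw payload."""
    return raw_bytes.decode(encoding, errors="strict")


raw = b"caf\xe9"  # "café" in Latin-1; the trailing 0xE9 is invalid as UTF-8

# String target: strict UTF-8 decoding rejects the invalid byte sequence.
try:
    as_string(raw)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Byte[] target: the payload is kept exactly as received.
as_bytes = bytes(raw)

# Other encodings: the caller must choose the decoder explicitly.
latin1_text = as_string(raw, "latin-1")

print(utf8_ok, as_bytes, latin1_text)  # False b'caf\xe9' café
```

The point of the sketch is that the Raw payload itself carries no encoding information, so anything other than UTF-8 has to be agreed on out of band and applied by custom code.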