Madly Disproportionate Enums

Rust enums are beautiful. I don't know a ton about algebraic data types and such, but enums are super useful, especially for storing tons of variable data. As some know, I work extensively on modding tools for The Legend of Zelda: Breath of the Wild, and one of the most important components is roead, a Rust library I ported from oead, a C++ library for handling common Nintendo data types.

A great deal of Breath of the Wild's actor configuration is stored in AAMP files, a somewhat funky format which can store parameters of several different types. Naively, I originally represented an AAMP parameter as an enum like this:

#[derive(Debug, Clone)]
pub enum Parameter {
    /// Boolean.
    Bool(bool),
    /// Float.
    F32(f32),
    /// Int.
    Int(i32),
    /// 2D vector.
    Vec2(Vector2f),
    /// 3D vector.
    Vec3(Vector3f),
    /// 4D vector.
    Vec4(Vector4f),
    /// Color.
    Color(Color),
    /// String (max length 32 bytes).
    String32(FixedSafeString<32>),
    /// String (max length 64 bytes).
    String64(FixedSafeString<64>),
    /// A single curve.
    Curve1([Curve; 1]),
    /// Two curves.
    Curve2([Curve; 2]),
    /// Three curves.
    Curve3([Curve; 3]),
    /// Four curves.
    Curve4([Curve; 4]),
    /// Buffer of signed ints.
    BufferInt(Vec<i32>),
    /// Buffer of floats.
    BufferF32(Vec<f32>),
    /// String (max length 256 bytes).
    String256(FixedSafeString<256>),
    /// Quaternion.
    Quat(Quat),
    /// Unsigned int.
    U32(u32),
    /// Buffer of unsigned ints.
    BufferU32(Vec<u32>),
    /// Buffer of binary data.
    BufferBinary(Vec<u8>),
    /// String (no length limit).
    StringRef(String),
}

Simple, right? So thought I, until later, when I was trying to stick a parameter value into another enum. There I got a helpful suggestion from Clippy: "warning: large size difference between variants." Oh? Curious. I looked at the details. It said that although my smallest variant was only a byte or two, the largest variant was 544 bytes, and it just so happened to be the one that stored a Parameter value. At first I was slightly surprised. How could it be so big?
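The mechanism behind Clippy's warning is easy to reproduce. Here is a minimal sketch using a hypothetical toy enum (not from roead): a Rust enum value reserves room for its largest variant plus a discriminant, so even the one-byte variant costs as much as the big one.

```rust
use std::mem::size_of;

// Hypothetical toy enum, not part of roead: one tiny variant and one huge one.
#[allow(dead_code)]
enum Lopsided {
    Flag(bool),
    Blob([u8; 512]),
}

fn main() {
    // The payload of `Flag` is a single byte...
    println!("bool: {} byte", size_of::<bool>());
    // ...but every `Lopsided` value, including `Lopsided::Flag(true)`,
    // is sized to hold `Blob` plus a discriminant.
    println!("Lopsided: {} bytes", size_of::<Lopsided>());
}
```

This is exactly what happens when a large enum like Parameter is nested inside another enum: the outer enum inherits the full size of its biggest payload.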

When I looked back at the definition of Parameter, I quickly realized the problem. It lay in the Curve4 variant, which stores an array of four Curve values. I double-checked the definition of Curve:

pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30]
}

Aha! Each Curve is 128 bytes (two u32 fields plus thirty f32 values, at 4 bytes apiece), which meant that an array of four of them added up to a honking 512 bytes, much larger than the vast majority of variants. Indeed, most of the variant types are between 4 and 24 bytes, the chief exceptions being the fixed-size strings.
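The arithmetic can be checked directly with `std::mem::size_of`. This sketch simply reuses the Curve definition shown above:

```rust
use std::mem::size_of;

// Curve as defined above: two u32 fields and thirty f32 values.
#[allow(dead_code)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

fn main() {
    // 4 + 4 + 30 * 4 = 128: every field is 4-byte aligned, so there is no padding.
    println!("Curve: {} bytes", size_of::<Curve>());
    // Arrays add no per-element overhead: 4 * 128 = 512.
    println!("[Curve; 4]: {} bytes", size_of::<[Curve; 4]>());
}
```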

Fortunately, Clippy also helpfully reminded me of the solution to such a problem: "help: consider boxing the large fields to reduce the total size of the enum." This I dutifully prepared to do, but alas, it raised another complication. See, the thing about using Box is, of course, that boxed values are heap-allocated. Doing this extensively is not ideal when processing potentially thousands of parameter files, each containing dozens or even hundreds of parameters. Sticking all the larger variants on the heap is no good.
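Here is a minimal sketch of what Clippy's suggestion buys and costs. The two enums are hypothetical cut-downs for comparison, not roead's actual types: boxing shrinks the variant to a single pointer, but every boxed value now needs its own heap allocation.

```rust
use std::mem::size_of;

// Same Curve layout as above (128 bytes).
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

// Hypothetical two-variant enums for comparison.
#[allow(dead_code)]
enum Inline {
    Int(i32),
    Curve4([Curve; 4]), // 512 bytes stored directly in the enum
}

#[allow(dead_code)]
enum Boxed {
    Int(i32),
    Curve4(Box<[Curve; 4]>), // only a pointer; the curves live on the heap
}

fn main() {
    println!("Inline: {} bytes", size_of::<Inline>());
    println!("Boxed: {} bytes", size_of::<Boxed>());

    // The trade-off: constructing the boxed variant performs a heap allocation.
    let empty = Curve { a: 0, b: 0, floats: [0.0; 30] };
    let param = Boxed::Curve4(Box::new([empty; 4]));
    if let Boxed::Curve4(curves) = &param {
        println!("first curve has {} floats", curves[0].floats.len());
    }
}
```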

I had to find a balance, but fortunately that was not particularly difficult. The largest variants are the Curve arrays, and these are also, happily, the rarest parameter type in Breath of the Wild. I can easily box them on the very rare occasion they're used and let the enum shave off hundreds of bytes for its other 10,000 instantiations at any given time. But even here the matter is not settled, because there are the string types. Everything left except the fixed strings is at most 24 bytes. The string types, however, are much more common than Curve values, especially String32. How ought this to be balanced?
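To put numbers on that dilemma, here is a sketch of a cut-down enum with only the curves boxed. It includes just a few representative variants, and FixedSafeString is a hypothetical stand-in assumed to be N bytes of storage plus a usize length, not roead's real definition. With the curves out of the way, String256 becomes the new largest variant:

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

// Hypothetical stand-in for FixedSafeString<N>: assumed to be N bytes
// of inline storage plus a usize length.
#[allow(dead_code)]
pub struct FixedSafeString<const N: usize> {
    data: [u8; N],
    len: usize,
}

// A cut-down Parameter with only the curves boxed (illustrative subset of variants).
#[allow(dead_code)]
enum CurvesBoxed {
    Int(i32),
    StringRef(String),               // 24 bytes on x64
    String32(FixedSafeString<32>),   // 40 bytes on x64
    String64(FixedSafeString<64>),   // 72 bytes on x64
    String256(FixedSafeString<256>), // 264 bytes on x64
    Curve4(Box<[Curve; 4]>),         // one pointer
}

fn main() {
    // String256 now dominates, so the enum is still well over 256 bytes.
    println!("CurvesBoxed: {} bytes", size_of::<CurvesBoxed>());
}
```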

Well, it came down to a judgment call. String32 is by far the most common string variant in BOTW. Nothing else comes close. String64 is not exactly rare but far from everywhere. String256 is used in only a few specific places. It's not as simple as an inverse linear relationship between string size and frequency, but that's not the worst way to picture it.

Anyway, the FixedSafeString types all consist of N bytes of string storage plus a usize length. This means that on x64 (the only architecture I need to worry about here), FixedSafeString<32> is exactly 40 bytes. This is not much more than the 24 bytes used by the existing dynamically allocated variants (like StringRef or the buffer types).
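A quick way to confirm those figures is to measure a stand-in type. The struct below is a hypothetical sketch that follows the layout described above (N bytes plus a usize), not roead's actual source:

```rust
use std::mem::size_of;

// Hypothetical stand-in for FixedSafeString<N>: N bytes of storage plus a usize length.
#[allow(dead_code)]
pub struct FixedSafeString<const N: usize> {
    data: [u8; N],
    len: usize,
}

fn main() {
    // On x64 (8-byte usize): 32 + 8 = 40, 64 + 8 = 72, 256 + 8 = 264.
    println!("FixedSafeString<32>:  {}", size_of::<FixedSafeString<32>>());
    println!("FixedSafeString<64>:  {}", size_of::<FixedSafeString<64>>());
    println!("FixedSafeString<256>: {}", size_of::<FixedSafeString<256>>());
    // Heap-backed types keep only a pointer, capacity, and length inline: 24 bytes on x64.
    println!("String:   {}", size_of::<String>());
    println!("Vec<f32>: {}", size_of::<Vec<f32>>());
}
```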

All these factors together, then, suggested that I box everything larger than String32. These relatively small strings can still enjoy the perks of being stored inline without adding too much baggage for all their other inline friends, while the much rarer fat variants can avoid blowing up the space. In the end, the adjusted Parameter enum is only 48 bytes (the 40-byte String32 payload plus the discriminant, padded out to 8-byte alignment), over 10x smaller than the original but with almost no real-world overhead from the relatively rare heap allocations. I'm happy.
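For reference, here is a sketch of what the adjusted enum might look like under the "box everything larger than String32" rule. The vector, color, quaternion, and string types are hypothetical stand-ins with assumed layouts (plain f32 fields, and N bytes plus a usize for FixedSafeString); roead's real definitions may differ in detail.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for roead's types; layouts are assumptions for illustration.
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Vector2f { x: f32, y: f32 }
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Vector3f { x: f32, y: f32, z: f32 }
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Vector4f { x: f32, y: f32, z: f32, w: f32 }
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Color { r: f32, g: f32, b: f32, a: f32 }
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Quat { a: f32, b: f32, c: f32, d: f32 }
#[allow(dead_code)] #[derive(Debug, Clone, Copy)]
pub struct Curve { a: u32, b: u32, floats: [f32; 30] }
#[allow(dead_code)] #[derive(Debug, Clone)]
pub struct FixedSafeString<const N: usize> { data: [u8; N], len: usize }

/// Parameter with every variant larger than String32 boxed.
#[allow(dead_code)]
#[derive(Debug, Clone)]
pub enum Parameter {
    Bool(bool),
    F32(f32),
    Int(i32),
    Vec2(Vector2f),
    Vec3(Vector3f),
    Vec4(Vector4f),
    Color(Color),
    String32(FixedSafeString<32>),        // largest inline variant: 40 bytes on x64
    String64(Box<FixedSafeString<64>>),   // would be 72 bytes inline
    Curve1(Box<[Curve; 1]>),              // curves are rare, so boxing is cheap overall
    Curve2(Box<[Curve; 2]>),
    Curve3(Box<[Curve; 3]>),
    Curve4(Box<[Curve; 4]>),
    BufferInt(Vec<i32>),
    BufferF32(Vec<f32>),
    String256(Box<FixedSafeString<256>>), // would be 264 bytes inline
    Quat(Quat),
    U32(u32),
    BufferU32(Vec<u32>),
    BufferBinary(Vec<u8>),
    StringRef(String),
}

fn main() {
    // On x64: 40 bytes for the String32 payload plus an 8-byte-aligned discriminant.
    println!("Parameter: {} bytes", size_of::<Parameter>());

    // A rare Curve4 value now costs one heap allocation but keeps the enum small.
    let empty = Curve { a: 0, b: 0, floats: [0.0; 30] };
    let p = Parameter::Curve4(Box::new([empty; 4]));
    if let Parameter::Curve4(curves) = &p {
        println!("curves stored: {}", curves.len());
    }
}
```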

If for any reason you have further interest in roead, check out the repo. If instead you are interested in what I use roead for when modding BOTW, check out BCML or its work-in-progress replacement, UKMM.