Madly Disproportionate Enums
Rust enums are beautiful. I don't know a ton about algebraic data types and such, but they're super useful, especially for storing tons of variable data. I, as some know, work extensively on developing modding tools for The Legend of Zelda: Breath of the Wild, and one of the most important components is roead, a Rust library I ported from oead, which is a C++ library to handle common Nintendo data types.
A great deal of Breath of the Wild's actor configuration is stored in AAMP files, a somewhat funky format which can store parameters of several different types. Naively, I originally represented an AAMP parameter as an enum like this:
#[derive(Debug, Clone)]
pub enum Parameter {
    /// Boolean.
    Bool(bool),
    /// Float.
    F32(f32),
    /// Int.
    Int(i32),
    /// 2D vector.
    Vec2(Vector2f),
    /// 3D vector.
    Vec3(Vector3f),
    /// 4D vector.
    Vec4(Vector4f),
    /// Color.
    Color(Color),
    /// String (max length 32 bytes).
    String32(FixedSafeString<32>),
    /// String (max length 64 bytes).
    String64(FixedSafeString<64>),
    /// A single curve.
    Curve1([Curve; 1]),
    /// Two curves.
    Curve2([Curve; 2]),
    /// Three curves.
    Curve3([Curve; 3]),
    /// Four curves.
    Curve4([Curve; 4]),
    /// Buffer of signed ints.
    BufferInt(Vec<i32>),
    /// Buffer of floats.
    BufferF32(Vec<f32>),
    /// String (max length 256 bytes).
    String256(FixedSafeString<256>),
    /// Quaternion.
    Quat(Quat),
    /// Unsigned int.
    U32(u32),
    /// Buffer of unsigned ints.
    BufferU32(Vec<u32>),
    /// Buffer of binary data.
    BufferBinary(Vec<u8>),
    /// String (no length limit).
    StringRef(String),
}
Simple, right? So thought I, until later when I was trying to stick a parameter value into another enum.
There I got a helpful suggestion from Clippy: "warning: large size difference between variants". Oh? Curious.
I looked at the details. It said that though my smallest variant was only a byte or two, the largest variant was 544 bytes, and it just so happened to be the one which stored a Parameter value. I was at first slightly surprised. How could it be so big?
When I looked back at the definition of Parameter, I quickly realized the problem. It lay in the Curve4 variant, which stores an array of four Curve values. I double-checked the definition of Curve:
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}
Aha! Each Curve is 128 bytes (two 4-byte u32 fields plus thirty 4-byte floats), which meant that an array of four of them added up to a honking 512 bytes, much larger than the vast majority of variants. Indeed, most of the variant types are between 4 and 24 bytes, except chiefly for the fixed-size strings.
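If you want to check that arithmetic yourself, std::mem::size_of reports these sizes directly. What follows is only a minimal sketch, not roead's actual code: Curve is copied from above, while the Parameter enum here is a cut-down stand-in with just four variants kept for illustration.

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

/// Cut-down stand-in for the real enum (an assumption for illustration).
#[allow(dead_code)]
pub enum Parameter {
    Bool(bool),
    Int(i32),
    BufferInt(Vec<i32>),
    Curve4([Curve; 4]),
}

fn main() {
    // 2 * 4 bytes (u32) + 30 * 4 bytes (f32) = 128 bytes.
    println!("Curve:      {} bytes", size_of::<Curve>());
    // 4 * 128 = 512 bytes.
    println!("[Curve; 4]: {} bytes", size_of::<[Curve; 4]>());
    // The enum has to be able to hold its largest variant, so every value
    // occupies at least 512 bytes, even when it only stores a bool.
    println!("Parameter:  {} bytes", size_of::<Parameter>());
}
```

The point to notice is the last line: a Parameter holding nothing but a bool still costs as much as one holding four curves.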
Fortunately, Clippy also helpfully reminded me of the solution to such a problem: "help: consider boxing the large fields to reduce the total size of the enum". This I dutifully prepared to do, but alas, it raised another complication. See, the thing about using Box is, of course, that such values are heap-allocated. To do this extensively is not ideal in the case of processing potentially thousands of parameter files, each containing dozens or even hundreds of parameters. To stick all the larger variants on the heap is no good.
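For a sense of what boxing buys and what it costs, here is a small sketch. The Inline and Boxed enums are hypothetical two-variant stand-ins, not the real Parameter. A Box pointing at a sized value is a single pointer, so the variant shrinks from 512 bytes to pointer size, but every such value now needs its own heap allocation.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

/// Hypothetical stand-in: the curves are stored inline.
#[allow(dead_code)]
pub enum Inline {
    Int(i32),
    Curve4([Curve; 4]),
}

/// Hypothetical stand-in: the curves are stored behind a pointer.
#[allow(dead_code)]
pub enum Boxed {
    Int(i32),
    Curve4(Box<[Curve; 4]>),
}

fn main() {
    let zero = Curve { a: 0, b: 0, floats: [0.0; 30] };
    // Box::new moves the 512-byte array to the heap: one allocation per value.
    let value = Boxed::Curve4(Box::new([zero; 4]));
    if let Boxed::Curve4(curves) = &value {
        println!("boxed variant holds {} curves on the heap", curves.len());
    }
    println!("Inline: {} bytes", size_of::<Inline>());
    println!("Boxed:  {} bytes", size_of::<Boxed>());
}
```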
I had to find a balance, but fortunately that was not particularly difficult. The largest variants are the Curve arrays, and these are also happily the rarest parameter type in Breath of the Wild. I can easily box them on the very rare occasion they're used and let the enum shave off hundreds of bytes for its other 10,000 instantiations at any given time. But even here, the matter is not settled, because there are the string types. Everything left except for the fixed strings is at most 24 bytes. These string types, however, are much more common than Curve values, especially String32. How ought this to be balanced?
Well, it came down to a judgment call. String32 is by far the most common string variant in BOTW. Nothing else comes close. String64 is not exactly rare but far from everywhere. String256 is used in only a few specific places. It's not as simple as an inverse linear relationship between string size and frequency, but that's not the worst way to picture it.
Anyway, the FixedSafeString types all consist of N bytes of string storage plus a usize length. This means that on x64 (the only architecture I need to worry about here), FixedSafeString<32> is exactly 40 bytes. This is not too much more than the 24 bytes used for the existing dynamically allocated variants (like StringRef or the buffer types).
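Those numbers are easy to reproduce. The struct below is a hypothetical stand-in built only from the description above (N bytes of storage plus a usize length); the field names are my assumption and the real type in roead may be laid out differently.

```rust
use std::mem::size_of;

/// Hypothetical stand-in: N bytes of string storage plus a length.
#[allow(dead_code)]
pub struct FixedSafeString<const N: usize> {
    data: [u8; N],
    len: usize,
}

fn main() {
    // On a 64-bit target usize is 8 bytes, so these come to 40, 72 and 264.
    println!("FixedSafeString<32>:  {} bytes", size_of::<FixedSafeString<32>>());
    println!("FixedSafeString<64>:  {} bytes", size_of::<FixedSafeString<64>>());
    println!("FixedSafeString<256>: {} bytes", size_of::<FixedSafeString<256>>());
    // String and Vec are three machine words each: 24 bytes on a 64-bit target.
    println!("String:               {} bytes", size_of::<String>());
    println!("Vec<u8>:              {} bytes", size_of::<Vec<u8>>());
}
```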
All these factors together, then, suggested that I box everything larger than String32. These relatively small inline strings can still enjoy the perks of being on the stack without adding too much baggage for all their other inline friends, while all of the much rarer fat variants can avoid blowing up the space. In the end, the adjusted Parameter enum is only 48 bytes, over 10x smaller than the original but with almost no real-world overhead for the relatively rare heap allocations. I'm happy.
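For illustration, the adjusted enum might look roughly like the sketch below. This is not the actual roead definition: several variants are left out, the plain [f32; 4] stands in for Vector4f, and FixedSafeString is the same hypothetical stand-in built from the description above. Only the boxing rule (box everything larger than String32) comes from the text.

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub struct Curve {
    pub a: u32,
    pub b: u32,
    pub floats: [f32; 30],
}

/// Hypothetical stand-in: N bytes of string storage plus a length.
#[allow(dead_code)]
pub struct FixedSafeString<const N: usize> {
    data: [u8; N],
    len: usize,
}

/// Sketch of the adjusted enum; some variants omitted for brevity.
#[allow(dead_code)]
pub enum Parameter {
    Bool(bool),
    F32(f32),
    Int(i32),
    U32(u32),
    /// Stand-in for Vector4f (an assumption).
    Vec4([f32; 4]),
    /// Common and only 40 bytes, so it stays inline.
    String32(FixedSafeString<32>),
    /// Everything larger than String32 goes behind a Box.
    String64(Box<FixedSafeString<64>>),
    String256(Box<FixedSafeString<256>>),
    Curve1(Box<[Curve; 1]>),
    Curve2(Box<[Curve; 2]>),
    Curve3(Box<[Curve; 3]>),
    Curve4(Box<[Curve; 4]>),
    BufferInt(Vec<i32>),
    BufferBinary(Vec<u8>),
    StringRef(String),
}

fn main() {
    // The largest inline payload is now FixedSafeString<32>; adding the
    // discriminant and alignment padding gives the final enum size.
    println!("largest payload: {} bytes", size_of::<FixedSafeString<32>>());
    println!("Parameter:       {} bytes", size_of::<Parameter>());
}
```

With this layout the String32 payload sets the size of the whole enum, which is why boxing anything smaller would gain nothing.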
If for any reason you have further interest in roead, check out the repo. If instead you are interested in what I use roead for when modding BOTW, check out BCML or its WIP impending replacement UKMM.