A Naive Misadventure with Newtype Structs

Read the docs. While this is obvious advice, and indeed advice which I often insist upon for the users of my own tools, it is nonetheless perilously easy to neglect, especially when you already feel like you know what you're doing. Today I recount a recent incident where doing so could have saved me from a great deal of trouble.
The case at hand: the roead library, which I use in BCML and UKMM to handle the most important file formats in Breath of the Wild. Specifically, I was working on the SARC module. For those who know it not, SARC is a Nintendo archive format which has been around for quite some time (going back at least, as far as I know myself, to the 3DS, but probably older). It's a very simple format, fairly easy to parse and still relatively easy to write.
Recently I ran into an issue with roead. When creating a new SARC, you use a
SarcWriter
struct and add files to the files
field, which is a
BTreeMap<FileName, Vec<u8>>
(where FileName
is a newtype wrapper around String
, about which I will have more to say in a moment). In this case I was using that to build a bunch of archives using files from multiple sources, and I was perplexed by my tests constantly failing.
The error came at a point where I needed to check the archive to access files which had already been added to it. So I naturally reached for the get()
method, and was flabbergasted when this continued to fail, even after extensive debugging assured to me that all the files I was failing to access were definitely in the the file map.
This investigation brought me back to the FileName
type, a simple newtype wrapper around a String
. What did I need this for? Well, SARC archives make use of a Nintendo-specific (probably completely ad hoc) hashing algorithm for file paths, and Nintendo's parser uses a binary search on the file hashes to find them. The hashing algorithm as I have implemented it in roead looks like this (note that it could be more concisely written except that I wanted it to be usable as a const fn
):1
const fn hash_name(multiplier: u32, name: &str) -> u32 {
let mut hash = 0u32;
let bytes = name.as_bytes();
let mut i = 0;
while i < name.len() {
hash = hash
.wrapping_mul(multiplier)
.wrapping_add(bytes[i] as u32);
i += 1;
}
hash
}
Given this setup, I decided to make a FileName
struct that simply holds a string but implements Hash
and Ord
using this hashing algorithm. Basically works like this:
/// Newtype wrapper for [`String`] that hashes with the Nintendo algorithm and
/// is ordered by that hash value. Used only for the file map in [`SarcWriter`].
#[derive(Debug, Clone, Eq)]
#[repr(transparent)]
pub struct FileName(String);
impl std::hash::Hash for FileName {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
hash_name(HASH_MULTIPLIER, &self.0).hash(state)
}
}
impl PartialOrd for FileName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
hash_name(HASH_MULTIPLIER, &self.0).partial_cmp(&hash_name(HASH_MULTIPLIER, &other.0))
}
}
impl Ord for FileName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
hash_name(HASH_MULTIPLIER, &self.0).cmp(&hash_name(HASH_MULTIPLIER, &other.0))
}
}
// There's lots of other stuff, too, but this is what's relevant.
Finally, in order to improve ergonomics, I tried to make sure the Borrow
trait was implemented such that a string literal could be used for lookups and other key-based operations on the file map.
impl std::borrow::Borrow<str> for FileName {
fn borrow(&self) -> &str {
self.0.borrow()
}
}
As some of you well-versed in these matters might have already realized, therein lie my trouble. From the documentation for BTreeMap::get()
:
Returns a reference to the value corresponding to the key.
The key may be any borrowed form of the map’s key type, but the ordering on the borrowed form must match the ordering on the key type.
My problems all began from my neglect to note this little requirement. While in the abstract I really should have known this could be an issue, it escaped my notice that the different orderings for str
and FileName
could kick in to prevent correct lookups from static strings. So, for example, a common case of something like sarc.files.get("Actor/ResidentActors.byml")
becomes prone to fail.
I've been undecided on the preferred solution, but at least I finally tracked down the problem. The moral of the story: read the docs.
Also, if you're curious enough to care, the multiplier is in practice always 0x65
.