Bit fields for everyone
2025-10-27
This is an edited version of the talk I gave at RustFest Zurich 2024, titled “PSA: You too can pack structs”.
Bit fields are a way of packing data together to save memory. Not merely 10%: we are talking about dividing memory usage by 2, by 10, and in one personal case, by 100!
How to do it? Read on!
Here is why you would want to use bit fields:
- Reduce memory usage
- Reduce battery usage
- Reduce bandwidth usage
- Support older and cheaper devices
Why not use bit fields? Well, mostly, you should use them.
Yet, the technique isn’t widely known. It should be, as it’s an easy solution to a lot of problems, and it has been around since the dawn of computing.
Let’s start with some code:
pub struct MeshPipelineKey {
    hdr: bool,
    tonemap_in_shader: bool,
    deband_dither: bool,
    depth_prepass: bool,
    normal_prepass: bool,
    deferred_prepass: bool,
    motion_vector_prepass: bool,
    may_discard: bool,
    environment_map: bool,
    screen_space_ambient_occlusion: bool,
    depth_clamp_ortho: bool,
    temporal_jitter: bool,
    morph_targets: bool,
    reads_view_transmission_texture: bool,
    lightmapped: bool,
    irradiance_volume: bool,
    blend: Blend,
    msaa: Msaa,
    primitive_topology: PrimitiveTopology,
    shadow_filter_method: ShadowFilterMethod,
    screen_space_specular_transmission: SsstQuality,
    view_projection: ViewProjection,
    tonemap_method: TonemapMethod,
}
This is taken from the Bevy game engine.
Bevy is an amazing engine. It lets you write games in Rust, like pygame, but with type safety and performance. It’s open source, so no nasty license backstabs, and if the documentation is lacking in places, you just press enter in the IDE and get a direct look at the code.
In Bevy, MeshPipelineKey is part of the rendering engine; it associates a mesh to a material. Each frame – so, probably 120 times per second – we check the material of all our meshes, and a Bevy game can have anywhere from 4 meshes to 10K, even 100K meshes.
That is about a million hashes and comparisons per second. And MeshPipelineKey is just a tiny little bit of the whole engine, and beyond the engine, we still have your whole game to run. This needs to go damn fast.
With this struct, it doesn’t go damn fast. Why? CPUs are great at hashing (especially with the right algorithm) and comparing; comparing two registers takes a single instruction! But MeshPipelineKey is 23 bytes long, and a CPU can’t compare 23 bytes in one go: it has to load the data into several registers, hash each piece, and combine the results.
This is quite fast, but it’s not enough, it’s not damn fast. What would be damn fast? Using a primitive like u32: a hash and a comparison would each be one instruction, if not less than one!
Can we use a u32 instead of this large struct? And why does it take 23 bytes in the first place?
Each field takes one byte. bool in Rust takes one byte, not one bit, as you might expect. This is because bools are addressable, and addresses work on bytes, not bits. Otherwise, how would you do &bool?
Ok, this isn’t a good reason, since references in Rust aren’t addresses.
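A quick way to convince yourself (a minimal sketch; the 23-byte total assumes the seven enum fields are plain fieldless enums, which makes each of them one byte as well):
fn main() {
    // A bool is stored as a full byte...
    assert_eq!(std::mem::size_of::<bool>(), 1);
    // ...so 16 one-byte bools plus 7 one-byte enums add up to 23 bytes.
    assert_eq!(std::mem::size_of::<MeshPipelineKey>(), 23);
}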
Here is how it looks in memory:
The same is true for the other fields. They are all enums with a tiny number of variants; they could be encoded in two or three bits each. For example, TonemapMethod has eight variants, so three bits suffice:
enum TonemapMethod {
    None = 0b000,
    Reinhard = 0b001,
    ReinhardLuminance = 0b010,
    AcesFitted = 0b011,
    Agx = 0b100,
    SomewhatBoringDisplayTransform = 0b101,
    TonyMcMapface = 0b110,
    BlenderFilmic = 0b111,
}
For the sixteen boolean fields alone, that represents 7*16=112 unused bits (ie bits we are going to hash but don’t care about): 87.5% of those sixteen bytes is wasted, and the enum fields are hardly better. So many wasted bits!
Can we do better than Rust here? Of course.
Instead of declaring a whole struct, let’s use a u32. Then, we can assign each field to one, two, or three bits, as opposed to a whole byte per field.
Nice visual and all, but in code, what does that look like?
One way of doing it is with masks. We define a constant for each of our “fields” that represents the bits used to store that field inside the u32.
To read a field, we “single out” the field from the rest of the struct by masking with the & operator. Then, if we chose our representation carefully, we already have the final value.
To set a field, we first clear its bits (masking with the complement of the constant) and then set the new bits with the | operator.
Well, this will take a couple more CPU instructions than directly loading from memory. But memory loads already take a couple of instructions, and can take hundreds of cycles if you hit something out of cache.
In code, what does that look like?
#[repr(u32)]
enum Blend {
    Opaque = 0b00 << 11,
    PremultipliedAlpha = 0b01 << 11,
    Multiply = 0b10 << 11,
    Alpha = 0b11 << 11,
}

const BLEND_MASK: u32 = 0b11 << 11;

impl TryFrom<u32> for Blend {
    type Error = ();

    fn try_from(value: u32) -> Result<Blend, ()> {
        // ...
    }
}

fn set_blend(mesh_pipeline_key: u32, blend: Blend) -> u32 {
    // The discriminants are already shifted into place, so casting is enough.
    let blend_offset = blend as u32;
    (mesh_pipeline_key & !BLEND_MASK) | blend_offset
}

fn get_blend(mesh_pipeline_key: u32) -> Blend {
    let masked = mesh_pipeline_key & BLEND_MASK;
    masked.try_into().unwrap()
}
Gosh, what horror. This sure is code, but it looks dangerous and error prone.
In fact, the original code I used in the presentation contained an error that I only corrected while cleaning up for this blog post.
So, what does this look like in the wild? Maybe Bevy has better code quality?
Bevy uses the bitflags crate, let’s see how that looks:
bitflags::bitflags! {
    pub struct MeshPipelineKey: u32 {
        const HDR = 1 << 0;
        const TONEMAP_IN_SHADER = 1 << 1;
        const DEBAND_DITHER = 1 << 2;
        const DEPTH_PREPASS = 1 << 3;
        const NORMAL_PREPASS = 1 << 4;
        const DEFERRED_PREPASS = 1 << 5;
        const MOTION_VECTOR_PREPASS = 1 << 6;
        const MAY_DISCARD = 1 << 7;
        const ENVIRONMENT_MAP = 1 << 8;
        const SCREEN_SPACE_AMBIENT_OCCLUSION = 1 << 9;
        const DEPTH_CLAMP_ORTHO = 1 << 10;
        const TEMPORAL_JITTER = 1 << 11;
        const MORPH_TARGETS = 1 << 12;
        const READS_VIEW_TRANSMISSION_TEXTURE = 1 << 13;
        const LIGHTMAPPED = 1 << 14;
        const IRRADIANCE_VOLUME = 1 << 15;
        // etc.
    }
}
Starting with the boolean fields, we convert the struct into a newtype and declare associated constants, one per field. Then, to read and write the packed struct, we do as we did earlier, but with named constants instead of magic numbers. Here is how it looks when reading/writing a MeshPipelineKey:
let key = MeshPipelineKey::HDR
    | MeshPipelineKey::MAY_DISCARD
    | MeshPipelineKey::IRRADIANCE_VOLUME;

// Note that unlike C, and other C-derived languages,
// operations follow the correct operator precedence.
if key & MeshPipelineKey::IRRADIANCE_VOLUME
    == MeshPipelineKey::IRRADIANCE_VOLUME
{
    // do irradiance volume
}
As a developer, I don’t like this code. While it is “type safe”, the code doesn’t reflect what we are doing. It is ostensibly manipulating bits – bit twiddling – but what we mean to do is not manipulate bits, it’s to check a boolean value, no?
In any case, we should look at how to handle enums with this.
Please keep in mind that this is an anti-pattern. Do not reproduce it at home or at work, not even in the presence of a trained professional.
const BLEND_RESERVED_BITS = Self::BLEND_MASK_BITS << Self::BLEND_SHIFT_BITS;
const BLEND_OPAQUE = 0 << Self::BLEND_SHIFT_BITS;
const BLEND_PREMULTIPLIED_ALPHA = 1 << Self::BLEND_SHIFT_BITS;
const BLEND_MULTIPLY = 2 << Self::BLEND_SHIFT_BITS;
const BLEND_ALPHA = 3 << Self::BLEND_SHIFT_BITS;
const BLEND_MASK_BITS: u32 = 0b11;
const BLEND_SHIFT_BITS: u32 =
Self::PRIMITIVE_TOPOLOGY_SHIFT_BITS - Self::BLEND_MASK_BITS.count_ones();
For enums, we just define additional constants for each variant. We then check for the variant this way:
key |= MeshPipelineKey::BLEND_MULTIPLY;

if key & MeshPipelineKey::BLEND_RESERVED_BITS
    == MeshPipelineKey::BLEND_OPAQUE
{
    // do opaque blend
} else {
    // ...
}
Very similar to booleans, just checking against variants. Straightforward, isn’t it? But have you noticed something?
First off, the BLEND_ prefix is very reminiscent of C enums. Second, we have lost exhaustiveness checking!
Consider what happens when we add a new blend mode. We add a new constant, and then, of course, we should update every place that checks the value of the blend bits. But since this is an if/else, the compiler can’t remind us if we forget. Furthermore, in the listing below we forgot to add a bit to BLEND_MASK_BITS, even though we now need 3 bits to encode our blend variants, and again, the compiler couldn’t help us.
const BLEND_RESERVED_BITS = Self::BLEND_MASK_BITS << Self::BLEND_SHIFT_BITS;
const BLEND_OPAQUE = 0 << Self::BLEND_SHIFT_BITS;
const BLEND_PREMULTIPLIED_ALPHA = 1 << Self::BLEND_SHIFT_BITS;
const BLEND_MULTIPLY = 2 << Self::BLEND_SHIFT_BITS;
const BLEND_ALPHA = 3 << Self::BLEND_SHIFT_BITS;
// vvv new variant
const BLEND_ADDITIVE = 4 << Self::BLEND_SHIFT_BITS;

const BLEND_MASK_BITS: u32 = 0b11;
const BLEND_SHIFT_BITS: u32 =
    Self::PRIMITIVE_TOPOLOGY_SHIFT_BITS - Self::BLEND_MASK_BITS.count_ones();

// Then when accessing:
key |= MeshPipelineKey::BLEND_MULTIPLY;

if key & MeshPipelineKey::BLEND_RESERVED_BITS == MeshPipelineKey::BLEND_OPAQUE {
    // do opaque blend
// vvv new variant check
} else if key & MeshPipelineKey::BLEND_RESERVED_BITS == MeshPipelineKey::BLEND_ADDITIVE {
    // do additive blend
} else {
    // ...
}
I can tell you, the Bevy rendering code is littered with bit twiddling like that. And every time there is a change to the rendering code, there is a massive thousand-line diff that conflicts with everything else. It makes for unanimity: ask anyone, they will tell you they dislike this code.
Also, personally, I found this extremely confusing. I had to learn (on top of learning graphics programming) what the hell was going on, and why it was done this way.
But we finally managed to use a u32 for our MeshPipelineKey. Our hash is indeed damn fast; in fact, it’s orders of magnitude faster. But that came at the cost of maintainability, which we still pay to this day.
Seemingly, there is a tradeoff between maintainability and performance. We got performance, but at a price:
- Readability: accessing a field is now a boolean operation with C-style manual namespacing.
- We lost exhaustiveness checking.
- It’s easy to forget to update field offsets and masks.
- We now depend on the bitflags crate.
But dang! We are using the Rust programming language! Rust tells us that no, we don’t have to choose between performance and maintainability, we can have both! Emphatically!
Engineering is not just about making decisions, it’s about knowing which decisions you can make.
So let’s make the right decision here and use the nearly forgotten art of bit fields.
I’ll be using a Rust crate called bitbybit. There is a variety of bit field crates, but I chose this one because I can vouch for it: I’ve thoroughly reviewed it. I’m told, however, that there are better crates you can use.
Remember the list of drawbacks above? One of them applies here too. Can you guess which? The additional dependency. Still, compared to bitflags, it’s just a different dependency. But you should be careful about which dependencies you use, and, like me, verify them before using them.
So let’s see how it looks with bitbybit:
#[bitfield(u32)]
pub struct MeshPipelineKey {
    #[bit(0)] hdr: bool,
    #[bit(1)] tonemap_in_shader: bool,
    #[bit(2)] deband_dither: bool,
    #[bit(3)] depth_prepass: bool,
    #[bit(4)] normal_prepass: bool,
    #[bit(5)] deferred_prepass: bool,
    #[bit(6)] motion_vector_prepass: bool,
    #[bit(7)] may_discard: bool,
    #[bit(8)] environment_map: bool,
    #[bit(9)] screen_space_ambient_occlusion: bool,
    #[bit(10)] depth_clamp_ortho: bool,
    #[bit(11)] temporal_jitter: bool,
    #[bit(12)] morph_targets: bool,
    #[bit(13)] reads_view_transmission_texture: bool,
    #[bit(14)] lightmapped: bool,
    #[bit(15)] irradiance_volume: bool,
    // etc.
}
This looks very similar to the original struct declaration. Under the hood though, we are doing the exact same bit twiddling as before. In fact, the macro declares MeshPipelineKey as a newtype over a u32.
Notice that we have an attribute #[bit(x)]. Here, x is the position of the field inside the u32.
Do we really care where our fields go? Actually, yes: network and hardware protocols use bit fields, where the precise location of bits and their meaning is specified in an RFC or buried in dusty manuals.
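For example (a hypothetical sketch, not from the talk): RFC 791 fixes the layout of the first byte of an IPv4 header, with the version in the high four bits and the header length in the low four bits, so the exact positions are not ours to choose.
// Hypothetical helper: split the first byte of an IPv4 header as laid out in RFC 791.
fn parse_ipv4_first_byte(byte: u8) -> (u8, u8) {
    let version = byte >> 4;      // bits 4..=7: protocol version (4 for IPv4)
    let ihl = byte & 0b0000_1111; // bits 0..=3: header length in 32-bit words
    (version, ihl)
}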
C has a special syntax just for bit fields. But because you cannot control where each bit goes (it is implementation defined, because of course everything in C is implementation defined), everybody will tell you to never use bit fields in C. This is likely the reason nobody uses bit fields nowadays.
But here, unlike in C, we have control.
How do we read and write fields now? The fields become methods on MeshPipelineKey. If we were to expand the bitfield macro, it would look as follows:
pub struct MeshPipelineKey(u32);

impl MeshPipelineKey {
    pub fn set_irradiance_volume(self, irradiance_volume: bool) -> Self {
        // ...
    }

    pub fn irradiance_volume(&self) -> bool {
        // ...
    }

    // etc.
}
To use them, we just call the methods:
let key = MeshPipelineKey::builder()
    .with_irradiance_volume(true)
    // ...
    .build();

if key.irradiance_volume() {
    // do irradiance volume
}
No more bit twiddling, no more weird constants. You just access data with a getter, like the wise Java sages taught you at school. The fields work the way you expect them to work.
What about enums? bitflags’ devious limitations were brought to light by enums, so how does bitbybit handle them?
This is how the rest of the MeshPipelineKey struct looks:
#[bitfield(u32)]
pub struct MeshPipelineKey {
    // ... the boolean bits from before ...
    #[bits(16..=17)]
    blend: Blend,
    #[bits(18..=20)]
    msaa: Msaa,
    #[bits(21..=23)]
    primitive_topology: PrimitiveTopology,
    #[bits(24..=26)]
    tonemap_method: TonemapMethod,
    #[bits(27..=28)]
    shadow_filter_method: ShadowFilterMethod,
    // etc.
}

// The enum declarations:
#[bitenum(u2, exhaustive = true)]
enum Blend {
    Opaque = 0b00,
    PremultipliedAlpha = 0b01,
    Multiply = 0b10,
    Alpha = 0b11,
}
Instead of specifying a single bit, we specify a range of bits. We also have to add the bitenum attribute to our enums. With this, bitbybit can tell whether the range is large enough to hold the enum in question, and it refuses to compile if not – with nice and detailed error messages at compile time (I would know, I wrote them). bitbybit can also tell you if the backing type of the struct (here, u32) is not large enough to hold all the fields.
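For instance, here is the kind of mistake that now fails at compile time (a sketch, assuming TonemapMethod is declared with #[bitenum(u3, exhaustive = true)] since it has eight variants):
#[bitfield(u32)]
pub struct BrokenKey {
    // Two bits cannot hold an eight-variant enum: bitbybit rejects this at compile time.
    #[bits(24..=25)]
    tonemap_method: TonemapMethod,
}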
And we access an enum field as we would expect:
let key = key.set_blend(Blend::Opaque);

match key.blend() {
    Blend::Opaque => { /* do opaque blend */ }
    // other blend modes
}
We match on an enum, which means we still have exhaustiveness checking!
OK, I’ll admit it: technically, it’s slightly less efficient.
To find out why, I’ll let you scroll up to the third code listing, the one where we do the masking manually, without any external crate. Notice the values of the variants of the Blend enum: they are “pre-shifted”, so when comparing against the value extracted from mesh_pipeline_key, we don’t have to do any more shifting.
That’s one less instruction. In this specific case though, the compiler should be able to optimize it away.
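Concretely, reading the blend bits in the two styles looks something like this (a sketch built on the BLEND_MASK constant from the manual listing; a generic bit field crate stores the variant values unshifted, hence the extra shift):
// Pre-shifted discriminants: a single mask, the result compares directly
// against Blend::Opaque as u32, Blend::Alpha as u32, and so on.
fn blend_bits_preshifted(mesh_pipeline_key: u32) -> u32 {
    mesh_pipeline_key & BLEND_MASK
}

// Unshifted discriminants: mask, then shift down before comparing or
// converting back into the enum.
fn blend_bits_unshifted(mesh_pipeline_key: u32) -> u32 {
    (mesh_pipeline_key & BLEND_MASK) >> 11
}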
The moral of the story is that (rarely, but indeed sometimes) we don’t have to choose between maintainability and performance. This time, we picked both.
Don’t let yourself be blinded by false dichotomies.
Remember: engineering is not just about making choices. It’s also about knowing which choices you can make.
On packed representation
The original talk title was “PSA: you too can pack structs”.
Here, for clarity, I kept using “bit fields”. But in the literature, “packed” is often used interchangeably with “bit fields” (it wouldn’t be the first time CS has a confusing naming scheme).
In Rust, you might know of the #[repr(packed)] attribute. It’s a weaker version of bit fields.
I focused on MeshPipelineKey – where packed isn’t relevant – because it was a practical use case with real-world applications. Here is a different scenario:
We are building the cheese panopticon: a database of the cheese stashes of every person on Earth. We start by making a struct to represent cheese stashes:
enum CheeseType {
    Gruyere,
    Vacherin,
    Raclette,
}

struct Cheese {
    kind: CheeseType,
    aged: bool,
    weight_kg: u64,
}
We are going to have billions of copies of Cheese in memory, so it had better not take too much space. Let’s see how many bytes it takes:
fn main() {
    println!("{}", std::mem::size_of::<Cheese>());
}
// -> 16
Sixteen! This means that Rust uses 128 bits to represent a Cheese data structure. Let’s see why. We have three fields:
- kind is an enum with three variants: 1 byte
- aged is a boolean value: 1 byte
- weight_kg is a u64: 8 bytes
I count 10 bytes, so what’s up? Why is Rust wasting 6 whole bytes on my Cheese?!
Let me explain. You know that in Rust, a reference cannot be null. More precisely, references are always well formed. What does it take for a reference to be well formed?
- It can’t be null
- It always points to valid data
- It respects the alignment of the data type it’s pointing to
- Some other hooey-gooey stuff that is a constant headache if you ever use unsafe.
So what’s alignment? In short, some machine instructions only work (or are much faster) if the address of the integer they are loading is a multiple of 1, 2, 4, 8, 16, etc. That multiple is what we call “alignment”. If you have ever written C code for ARM devices, you might have encountered a mysterious “Bus Error”. It comes from alignment.
All Rust types have an alignment. For u8 it’s 1, for u16 it’s 2, for u32 it’s 4, and for u64 it’s 8 (the same as their sizes). When it comes to composite types (ie structs), the size of the composite must be divisible by the alignment of each of its components. In particular, since all alignments are powers of two, it just needs to be divisible by the highest component alignment, and that highest alignment becomes the composite’s own alignment.
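A quick check of these rules (a minimal sketch; the numbers are for a typical 64-bit target):
use std::mem::{align_of, size_of};

struct Composite {
    a: u8,  // alignment 1
    b: u32, // alignment 4
}

fn main() {
    assert_eq!(align_of::<u8>(), 1);
    assert_eq!(align_of::<u64>(), 8);
    // The composite takes the highest alignment among its fields...
    assert_eq!(align_of::<Composite>(), 4);
    // ...and its size is rounded up to a multiple of that alignment:
    // 5 bytes of fields plus 3 bytes of padding.
    assert_eq!(size_of::<Composite>(), 8);
}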
In the case of Cheese, weight_kg: u64 has an alignment of 8. So Cheese, which contains weight_kg, must itself have a size that is divisible by 8. 10 is not divisible by 8, so the size gets rounded up to 16. Rust adds 6 bytes of padding to each instance of the Cheese struct. Padding is just blank data with no purpose (in fact, reading it is undefined behavior).
Thanks to this, when your struct is in an array, its fields are always aligned (ie you can create valid references to them that won’t trigger bus errors, which are about as fun to debug as segmentation faults).
In Rust, #[repr(packed)] allows you to eliminate padding:
#[repr(packed)]
struct Cheese {
    kind: CheeseType,
    aged: bool,
    weight_kg: u64,
}
// -> size: 10 bytes
But that means we lose alignment. Since it’s undefined behavior to read from unaligned memory, we can no longer create references to weight_kg inside a Cheese: &cheese.weight_kg is a compiler error.
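For example (a sketch; the usual workaround is to copy the field out by value):
fn main() {
    // Assuming the #[repr(packed)] Cheese definition above.
    let cheese = Cheese { kind: CheeseType::Gruyere, aged: true, weight_kg: 3 };
    // let weight_ref = &cheese.weight_kg; // error: taking a reference to a packed field is rejected
    let weight = cheese.weight_kg; // fine: the value is copied out of the struct
    println!("{weight}");
}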
When the struct is not packed, we have this:
With the #[repr(packed)] attribute, we have the following:
So, indeed, #[repr(packed)] can drastically reduce memory usage! In the diagrams above, the same chunk of memory that held 8 padded structs now holds 10 packed ones.
You’ll notice that bitbybit requires a backing storage type; we used u32. But imagine for a second that your data fits in exactly 33 bits. Would you need a u64? There would be no padding, but that’s still a lot of wasted memory.
Some bit field crates allow using arrays as backing storage, making it possible to have – for example – [u8; 5] as the backing storage. That minimizes memory usage, at the cost of slightly more complex generated code.
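To put rough numbers on the 33-bit example:
// 33 bits of payload:
//   u64 backing:     64 - 33 = 31 wasted bits (almost half the storage)
//   [u8; 5] backing: 40 - 33 =  7 wasted bits (less than a fifth of the storage)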