Warlock's Gambit trace analysis

I did some analysis on the traces of warlock’s gambit.

I captured traces both using Chromium’s dev tools and bevy’s trace_chrome features. I usually use the trace_tracy feature, and run tracy, but this time, for the benefit of a potential employer (and my curiosity), I decided to try out the chrome version.

Sadly, the trace as captured by chrome in the WASM stack do not help much.

The rust symbols show up, but bevy relies heavily on type erasure both for performance and ergonomics.

Specifically the game systems are added to a scheduler as an erased function, and ran each frame. Using the argument types of the function, the scheduler knows which systems touches which components, and only runs in parallel systems that do not touch mutably the same components. This allows automatic concurrency, but at the cost of decoupling the running of game systems from their declaration, thus losing a lot of “free” debugging tools.

However, bevy smartly sets up custom tracing events, actually associating the system ran with the name of the function of the system. Bevy uses the tracing-wasm crate for tracing on wasm, sadly, I didn’t manage to get it working. It seems chrome has difficulties properly understanding the trace event hierarchy.

Follows the tracing files for native and chrome telemetry:

Both traces are in the chrome tracing format, so they work in the same tools.

I’ve been recommended to use https://ui.perfetto.dev/ to inspect chrome traces, but I fond in the chromium dev website https://speedscope.app and I find it much better. The sandwich view makes understanding what takes most time a piece of cake (or sandwich as the name would have it).

speedscope has three views accessible through the top left bar:

“self” time is the time a span ran minus inner span runtime.

fn fun1() {
    takes_a_lot_of_time();
    span2();
    span3();
}
fn fun2() {
    span2();
    span3();
}

Here, fun2 does no work beside calling other functions with spans, meaning its self time is very low. While fun1 will have a long self time. Self time tells you which function to optimize.

I recommend you get familiar with the sandwich view, because it is the fastest way to figure out what is slowing down your app.

Following is the “left heavy” view in speedscope.app for warlock’s traces. Click on the heading to download the file.

native traces

native trace generated with speedscope.app

chrome traces

chrome trace generated with speedscope.app

Analysis

The native trace will take about 5 minutes to open on https://speedscope.app, but here is what I can get out of it:

With this insight, I’ll probably be able to help optimize bevy a little bit. I won’t touch the scheduling, since it is currently being rewritten. But it seems some low hanging fruits are: