This Week In Veloren 125

This week, we get an in-depth view into what was required for the wgpu
integration. We see some changes to econsim, performance, and caves.

– AngelOnFira, TWiV Editor

Contributor Work

Thanks to this week’s contributors, @YuriMomo, @zesterer, @Slipped, @xMAC94x,
@SWilliamsGames, @Christof, @aweinstock, @Sam, @juliancoffee, @Yusdacra,
@imbris, @DaforLynx, @donovanlank, @AngelOnFira, @Capucho, @Snowram, @Zakru, and
@Kisa!

@Sam worked on a harvester rework. It’ll have 4 unique attacks and all-new AI.
To see it, go visit your local T0 dungeon (after it merges).

This week’s meeting minutes.

Lava and performance by @aweinstock

This week, I added lava to caves. It glows, and it sets you on fire when you
enter it.

I also improved the performance of entity-terrain collisions (one of the main
parts of physics) by an average of 20 microseconds per frame by adding a check
for a more precise bounding box before looking up voxels in the terrain.
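
The early-out idea can be sketched as below. This is a minimal illustration under assumptions, not Veloren's physics code: the `Aabb` type, the `terrain_bounds` parameter, and the `is_solid` callback are all hypothetical names, and the real check may compare different boxes.

```rust
/// Axis-aligned bounding box in world space (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

impl Aabb {
    /// True if the two boxes overlap on every axis.
    pub fn intersects(&self, other: &Aabb) -> bool {
        (0..3).all(|i| self.min[i] <= other.max[i] && self.max[i] >= other.min[i])
    }
}

/// Returns true if `entity` overlaps any solid voxel.
/// `terrain_bounds` is assumed to be a box enclosing every nearby solid voxel;
/// `is_solid` stands in for the comparatively expensive voxel lookup.
pub fn collides_with_terrain(
    entity: &Aabb,
    terrain_bounds: &Aabb,
    mut is_solid: impl FnMut(i32, i32, i32) -> bool,
) -> bool {
    // Cheap check first: if the boxes cannot overlap, skip every voxel lookup.
    if !entity.intersects(terrain_bounds) {
        return false;
    }
    // Otherwise visit each voxel cell the entity's box touches.
    let lo = entity.min.map(|v| v.floor() as i32);
    let hi = entity.max.map(|v| v.floor() as i32);
    for x in lo[0]..=hi[0] {
        for y in lo[1]..=hi[1] {
            for z in lo[2]..=hi[2] {
                if is_solid(x, y, z) {
                    return true;
                }
            }
        }
    }
    false
}
```

The saving comes from the first branch: an entity well clear of the terrain costs one box comparison instead of a loop of voxel lookups.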

Econsim improvements by @Christof

I finally found the mistake in the faster economy simulation code and the code
got merged – server startup should be faster by several seconds.

After that, I started an RFC about short and long-term goals. Econsim will soon
get more wares and professions to match the increased number of items and the
much more powerful crafting recipes. Also, prices will probably become
independent of loot probabilities and be derived more from raw materials.

All about wgpu

A section by @imbris and @Sharp, edited by @Capucho and @AngelOnFira

We now use the wgpu crate for our graphics!

After a long journey of refactoring our rendering code for wgpu, rebasing,
refactoring new graphics features to use wgpu, updating the code across wgpu
versions, testing on a plethora of platforms, fixing miscellaneous bugs, and
rewriting parts of our CI cache generation, with efforts from @imbris, @Capucho,
@Sharp, and others, we have finally merged the long-awaited 129-commit branch
that switches our rendering code to this new graphics crate!

wgpu is the spiritual successor to the gfx (pre low level) crate we were
using for graphics, so it sits at a similar level of abstraction. However, wgpu
provides a cleaner, modern API that gives us more control and the ability to
leverage the power of modern backends including vulkan, metal, and dx12. With
gfx we only used the OpenGL backend; with wgpu we now have support for multiple
backends, including automatic runtime backend selection.

The currently functioning backends are vulkan, dx12, metal, and dx11 (support
for OpenGL is lost for the moment). We saw improvements in performance,
especially for Windows users with decent graphics cards who previously
experienced a significant discrepancy in performance compared to Linux users
with similar cards.

Now that this period of refactoring has been finished, we’re excited to see the
optimizations and features we can develop on this fresh graphics API.

Many thanks to @kvark and @cwfitzgerald ❤️ for answering the questions we had
about wgpu, resolving bugs we encountered, and being extremely receptive to
issue reports and pull requests!

Thanks to all the testers who helped us track down platform-specific bugs and
make sure everything was mostly working everywhere!

Notable previous mentions of the wgpu refactor are in devblog-64 and
devblog-100.

Presentation modes option

We added a new option in the graphics menu with the three presentation modes
exposed by wgpu: Immediate, Mailbox, and Fifo. Selecting Fifo or Mailbox can
be used to avoid screen tearing. Fifo corresponds to VSync and as such caps
the frame rate at the display refresh rate. Immediate and Mailbox both fall back
to Fifo within wgpu if that option isn’t available on the current platform.
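
The behaviour of the three modes, including the fallback, can be summarized in a small sketch. The enum below is defined locally so the example needs no crates; it only mirrors the mode names above. The `supported` slice and the `effective_mode` helper are hypothetical, since in practice wgpu performs the fallback internally.

```rust
/// Local mirror of the three presentation modes named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PresentMode {
    Immediate,
    Mailbox,
    Fifo,
}

impl PresentMode {
    /// Fifo and Mailbox can be used to avoid screen tearing.
    pub fn avoids_tearing(self) -> bool {
        matches!(self, PresentMode::Mailbox | PresentMode::Fifo)
    }

    /// Only Fifo (VSync) caps the frame rate at the display refresh rate.
    pub fn caps_frame_rate(self) -> bool {
        self == PresentMode::Fifo
    }
}

/// Hypothetical helper: the mode that ends up in use, given what the platform offers.
/// Falls back to Fifo when the requested mode is unavailable.
pub fn effective_mode(requested: PresentMode, supported: &[PresentMode]) -> PresentMode {
    if supported.contains(&requested) {
        requested
    } else {
        PresentMode::Fifo
    }
}
```

For example, requesting Mailbox on a platform that only offers Fifo yields Fifo, so the frame rate is capped even though the menu shows Mailbox.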

GPU timing with wgpu-profiler

While working on this refactor I discovered, via lurking on the Rust gamedev
Discord and the wgpu matrix channels, that wgpu provided an API for collecting
timestamps between sections of work on the GPU. This piqued my interest as I’ve
previously had much success identifying and optimizing our CPU bottlenecks with
the use of flamegraph and Tracy, while our GPU side performance was mostly
opaque to me.

Thus, I started working on writing an abstraction to make it easier to
instrument our rendering code with timestamps. While I was in the process of
this, Wumpf released wgpu-profiler. I discovered it had everything I needed
and nicely solved the complications of identifying timestamps which I was too
focused on over-optimizing. It even included the ability to write out to a
chrome trace which was pretty neat!

So I switched to wgpu-profiler, made a PR adding drop-based scope wrappers
around wgpu types, and deleted all the extra code I had for this in the Veloren
codebase, which was pretty satisfying. I then added scopes around all the major
portions of our rendering (essentially one for each render pass and one for each
pipeline).
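
For readers unfamiliar with the pattern, a drop-based scope is a guard value that marks the start of a section when it is created and the end when it goes out of scope. The sketch below shows only that general idea using the standard library; it is not the wgpu-profiler API. The `Scope` and `Log` names are hypothetical, and a string log stands in for GPU timestamp queries.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared event log; a stand-in for GPU timestamp queries (hypothetical).
pub type Log = Rc<RefCell<Vec<String>>>;

/// Guard that records "begin" when created and "end" when dropped.
pub struct Scope {
    label: &'static str,
    log: Log,
}

impl Scope {
    pub fn start(label: &'static str, log: &Log) -> Self {
        log.borrow_mut().push(format!("begin {}", label));
        Scope { label, log: Rc::clone(log) }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        // Runs automatically at the end of the enclosing block.
        self.log.borrow_mut().push(format!("end {}", self.label));
    }
}

/// Nested scopes close in reverse order without any explicit "end" calls.
pub fn record_frame(log: &Log) {
    let _pass = Scope::start("first_pass", log);
    {
        let _terrain = Scope::start("terrain", log);
        // ... terrain draw calls would go here ...
    }
    let _sprites = Scope::start("sprites", log);
    // `_sprites` is dropped first, then `_pass`, when the function returns.
}
```

Because the end marker is tied to `Drop`, an early return or `?` inside a scope cannot leave a section unclosed, which is what makes this style convenient for instrumenting rendering code.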

To make it easy to access this new information I added a new option in the
graphics menu to enable GPU timing. The timing values can be viewed in the HUD
debug info (F3) and will be saved as chrome trace files in the working directory
when taking a screenshot. It’s mostly just supported on the Vulkan backend at
the moment so if you don’t see anything new in the debug info then the feature
isn’t available on your current backend/platform combination. I intend to
eventually make the checkbox gray out in this case instead of having no
indication. It would be neat to plot these values over time in the future.

Chrome trace

GPU timing in HUD debug info

We can also now use tooling for profiling vulkan and dx12 such as this profiler
from AMD: https://gpuopen.com/rgp/

Example RGP profile on Ryzen 3 3200U integrated graphics

Sprite optimization

High sprite view distances previously had a really large impact on the CPU. We
used instancing for sprites to draw multiple ones with the same model within a
single draw call. However, within a chunk there are several different sprite
kinds (some with multiple variations in their models), leading to a large number
of draw calls and significant CPU overhead when trying to draw more sprites
farther away.

To reduce the number of draw calls, I combined all the sprite models into a
single buffer of vertices. I then used a technique called vertex pulling: the
vertex shader fetches the vertex data from this buffer itself, determining the
location in the buffer from the index of the current vertex combined with the
mesh offset for the current instance.
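
The indexing involved can be shown on the CPU. The following is a sketch under assumptions rather than the actual shader: `Vertex`, `Instance`, `pack_models`, and `pull_vertex` are hypothetical names, and the real vertex format carries more than a position.

```rust
/// One vertex of a sprite model (position only, for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vertex {
    pub pos: [f32; 3],
}

/// Per-instance data: where this sprite's model starts in the shared buffer
/// and where the sprite sits in the world. Field names are hypothetical.
#[derive(Clone, Copy, Debug)]
pub struct Instance {
    pub mesh_offset: u32,
    pub world_pos: [f32; 3],
}

/// Concatenates all models into one buffer and returns each model's start offset.
pub fn pack_models(models: &[Vec<Vertex>]) -> (Vec<Vertex>, Vec<u32>) {
    let mut buffer = Vec::new();
    let mut offsets = Vec::with_capacity(models.len());
    for model in models {
        offsets.push(buffer.len() as u32);
        buffer.extend_from_slice(model);
    }
    (buffer, offsets)
}

/// What the vertex shader would do: fetch the vertex by computed index,
/// then translate it to the instance's position.
pub fn pull_vertex(buffer: &[Vertex], instance: &Instance, vertex_index: u32) -> [f32; 3] {
    let v = buffer[(instance.mesh_offset + vertex_index) as usize];
    [
        v.pos[0] + instance.world_pos[0],
        v.pos[1] + instance.world_pos[1],
        v.pos[2] + instance.world_pos[2],
    ]
}
```

Since every instance reads from the same buffer, sprites with different models no longer need separate vertex buffer bindings, which is what allows them to share draw calls.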

This moved the bottleneck for sprites into the vertex shader both because this
introduced a bit more work in the shader and because sprites simply have a lot
of vertices and were no longer limited by the CPU. To help alleviate this I
converted all our quad drawing (i.e. what all our terrain, figures, and sprites
are made of) to use an index buffer to reduce the number of vertices that need
to be processed per quad from 6 to 4 (as the GPU should cache vertices with the
same index).
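
A quad index buffer of this kind can be generated once and shared by every mesh, since the pattern repeats every four vertices. Below is a minimal sketch; the function names are hypothetical and the winding order `(0, 1, 2), (2, 3, 0)` is an assumption, not necessarily the one Veloren uses.

```rust
/// Builds indices for `quad_count` quads whose vertices are stored four at a time.
/// Each quad becomes two triangles, (0, 1, 2) and (2, 3, 0), offset by 4 per quad.
pub fn quad_indices(quad_count: u32) -> Vec<u32> {
    (0..quad_count)
        .flat_map(|q| {
            let base = q * 4;
            [base, base + 1, base + 2, base + 2, base + 3, base]
        })
        .collect()
}

/// Number of indices to draw for a vertex buffer holding `vertex_count` quad
/// vertices: 4 stored vertices per quad expand to 6 indices.
pub fn index_count(vertex_count: u32) -> u32 {
    vertex_count / 4 * 6
}
```

Without indexing, each quad would need six full vertices in the buffer. With it, only four are stored and the two shared corners are referenced twice by index.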

This helped and also led to small gains for shadow rendering times. Presumably
terrain and figures also benefited, but the cost of the fragment shader for
those is far greater in typical scenes. I think we can squeeze more performance
out of
sprites by optimizing the vertex shader further and reducing the amount of
per-instance data. Another area where we can tackle this is reducing the
vertices used for sprites that are far off and take up only a few pixels. We
already have different LOD levels that reduce the number of vertices but we may
be able to go further and simply render 2D billboard-style sprites for these.

Reversing the depth buffer

One of the bigger wins from switching to wgpu is that it lets us abandon the
OpenGL depth buffer. To make a long story short, the standard method for
calculating depth buffers is the worst possible way to do it. They should be
computed “backwards” from how people normally set them up, as this takes far
better advantage of the way floating point precision works (in fact, it is
nearly optimal for floating point)!

Unfortunately, since it takes some fairly involved math to realize this, the
OpenGL spec was published in a way that makes it impossible to realize the
benefits of reversed depth buffers, due to setting depth planes to go from -1 to
1 instead of 0 to 1; the conversion to 0 to 1 must then be done in the drivers
by computing depth * 0.5 + 0.5, which is pretty much the worst possible thing
you can do for floating point accuracy and negates the reversal effect. wgpu
follows the far more sensible Vulkan, Metal, and DirectX option, and sets depth
buffers from 0 to 1, enabling us to realize the effect.

The practical upshot of this is that we can set the near plane much, much closer
than before, while extending the far plane to (effectively) infinity, while
retaining better precision than before at literally every point in the whole
range (except maybe within a very tiny fraction of a block in front of the near
plane). This should help us improve clipping since it solves the most annoying
aspect of it (near plane) and also lets us render much bigger maps without
noticeable artifacts (unless you zoom all the way out).
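
The precision argument can be checked numerically. The sketch below uses the textbook formulas for a 0 to 1 depth range, which are an assumption here and not taken from Veloren's projection code: conventional depth maps the near plane to 0 and the far plane to 1, while reversed depth with an infinite far plane is simply `near / z`. Depth is computed in `f64` and then rounded to `f32` to model a 32-bit float depth buffer.

```rust
use std::collections::BTreeSet;

/// Conventional [0, 1] depth: 0 at the near plane, 1 at the far plane.
pub fn standard_depth(z: f64, near: f64, far: f64) -> f32 {
    (far / (far - near) * (1.0 - near / z)) as f32
}

/// Reversed depth with an infinite far plane: 1 at the near plane,
/// approaching 0 as the distance grows.
pub fn reversed_depth(z: f64, near: f64) -> f32 {
    (near / z) as f32
}

/// Counts how many distinct f32 values a series of depths occupies.
/// Surfaces that share a value cannot be ordered by the depth test.
pub fn distinct_depths(depths: impl Iterator<Item = f32>) -> usize {
    depths.map(f32::to_bits).collect::<BTreeSet<u32>>().len()
}
```

With a near plane of 0.1 and surfaces one unit apart around a distance of 10,000, the conventional values all crowd just below 1.0, where `f32` steps are coarse, so many surfaces share a depth value. The reversed values sit near 0, where `f32` is much denser, and stay distinct.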

Organizing bind group and pipeline switching

To avoid redundant state changes (setting pipelines, bind groups, vertex
buffers) as much as possible, we created a series of “Drawer” types that wrap a
reference to the render pass and guarantee, for instance, that a certain
pipeline is in use. We have a top level Drawer that exists transiently during
the recording of the frame:

pub struct Drawer<'frame> {
    encoder: Option<ManualOwningScope<'frame, wgpu::CommandEncoder>>,
    borrow: RendererBorrow<'frame>,
    swap_tex: wgpu::SwapChainTexture,
    globals: &'frame GlobalsBindGroup,
    // Texture and other info for taking a screenshot
    // Writes to this instead in the third pass if it is present
    taking_screenshot: Option<super::screenshot::TakeScreenshot>,
}

This holds references to bits of the render state that we need and owns the
command encoder. Then, for example, there is a method to start the first render
pass that returns a FirstPassDrawer:

    /// Returns None if all the pipelines are not available
    pub fn first_pass(&mut self) -> Option<FirstPassDrawer> {
        let pipelines = self.borrow.pipelines.all()?;
        // Note: this becomes Some once pipeline creation is complete even if shadows
        // are not enabled
        let shadow = self.borrow.shadow?;

        let encoder = self.encoder.as_mut().unwrap();
        let device = self.borrow.device;
        let mut render_pass = encoder.scoped_render_pass("first_pass", /* snip */ });

        render_pass.set_bind_group(0, &self.globals.bind_group, &[]);
        render_pass.set_bind_group(1, &shadow.bind.bind_group, &[]);

        Some(FirstPassDrawer {
            render_pass,
            borrow: &self.borrow,
            pipelines,
            globals: self.globals,
            shadows: &shadow.bind,
        })
    }

But before returning it, two bind groups are set:

        render_pass.set_bind_group(0, &self.globals.bind_group, &[]);
        render_pass.set_bind_group(1, &shadow.bind.bind_group, &[]);

In subsequent drawing, we can then rely on these bind groups being set, as long
as nothing overrides them. We ensure that by keeping all the logic which sets
bind groups (and other things) in this one module. Then, for example, to start
drawing terrain we create a TerrainDrawer, setting the pipeline and the index
buffer in the process:

    pub fn draw_terrain(&mut self) -> TerrainDrawer<'_, 'pass> {
        let mut render_pass = self.render_pass.scope("terrain", self.borrow.device);

        render_pass.set_pipeline(&self.pipelines.terrain.pipeline);
        set_quad_index_buffer::<terrain::Vertex>(&mut render_pass, &self.borrow);

        TerrainDrawer {
            render_pass,
            col_lights: None,
        }
    }

That way, when we draw all the chunks, we only need to set the additional bind
groups and vertex buffer specific to each chunk:

impl<'pass_ref, 'pass: 'pass_ref> TerrainDrawer<'pass_ref, 'pass> {
    pub fn draw<'data: 'pass>(
        &mut self,
        model: &'data Model<terrain::Vertex>,
        col_lights: &'data Arc<ColLights<terrain::Locals>>,
        locals: &'data terrain::BoundLocals,
    ) {
        if self.col_lights
            // Check if we are still using the same atlas texture as the previous drawn chunk
            .filter(|current_col_lights| Arc::ptr_eq(current_col_lights, col_lights))
            .is_none()
        {
            self.render_pass
                .set_bind_group(2, &col_lights.bind_group, &[]);
            self.col_lights = Some(col_lights);
        };

        self.render_pass.set_bind_group(3, &locals.bind_group, &[]);
        self.render_pass.set_vertex_buffer(0, model.buf().slice(..));
        self.render_pass
            .draw_indexed(0..model.len() as u32 / 4 * 6, 0, 0..1);
    }
}

Future of OpenGL support

With this transition, we gain access to a lot of new graphics backends, but we
now no longer have OpenGL support. This mainly affects Linux users as support
for older Windows machines is provided through dx11. Nevertheless, we may regain
some form of OpenGL support on Linux in the future. The backend abstraction
layer for wgpu was recently rewritten and moved into the main wgpu repo,
removing a lot of the barriers to getting an OpenGL backend working.

Shaderc adventures

To use our glsl shaders with wgpu we have to compile them to SPIR-V. Currently,
the only viable option for this is shaderc. However, shaderc is a C++
dependency and thus has the potential to complicate the build process, which it
does. Building shaderc requires git, cmake, and python to be installed, and
also ninja on Windows. This adds extra steps for new contributors looking to
compile Veloren.

Additionally, shaderc introduced complications into our cross compiled CI builds
(for non-Linux platforms). Cross compiling to Windows, I ran into this issue,
which I was only able to solve by using the posix threads version of MinGW,
which necessitated distributing some DLL files.

Cross compiling to macOS also had various issues with compiling and/or linking
shaderc. After several attempts, including trying to use prebuilt versions of
shaderc, we eventually switched to natively building on a macOS runner. This was
admittedly a somewhat useful change outside of getting shaderc working, as we
have had other issues with cross compiling to macOS. Shaderc is also pretty
massive and appears to add at least ~10 MB to our binary size. Hence, we are
really looking forward to when naga, a shader translator library written in pure
Rust, will be able to replace shaderc for our purposes.

CPU profiling

Since we had a major refactor I thought it would be neat to examine the CPU
profiles of the work being done during rendering. Most of the CPU side time for
rendering appears to be spent in run_render_pass:

Tracy profile showing ~2 ms to queue rendering of everything for a frame

Flamegraph snippet showing where `run_render_pass` spends time

Settings used for CPU profiling

Future directions

Our future with wgpu is bright. Because its API is much closer to that of modern
graphics drivers, it should be much easier to incorporate newer hardware
features (at least if they have a little bit of cross-platform support). We also
have a great opportunity to help influence the shape of not only wgpu, but the
WebGPU specification itself, to make it better for games in general.

Finally, having made this switch lets us focus on other rendering improvements
we’d been putting off in order not to block wgpu work (or which were blocked by
lack of wgpu), such as:

  • Separate the voxygen renderer module into its own crate (e.g. we can use it in
    a voxel editor)
  • Create a small renderer engine test/demo for rapid iteration and testing.
  • Rendering on multiple threads (impossible before wgpu, currently waiting on
    removal of locks in wgpu).
  • Deferred rendering (making large numbers of unshaded point lights cheap).
  • Move completely to naga once it has glsl support and can handle our shaders.
  • SMAA
  • Improved LOD rendering (including things like trees, sites, and maybe even
    some form of sprites).
  • Smarter and more accurate shadows with occlusion queries (useful for many
    other purposes as well) and Z-partitioning.
  • Faster and smarter chunk updates, both for initial rendering and updates to
    things like sprites, to reduce apparent load times.
  • More physically correct HDR, allowing us to remove various lighting hacks
    (should especially help at night).
  • Take advantage of compute shaders and GPU-side buffer updates to speed up
    CPU-limited operations.
  • (Maybe) explicit tiled rendering, tessellation shaders, and multiple queues,
    if/when supported!
  • And much, much more!

Out for an early morning dip. See you next week!

Support our devs!

Veloren Open Collective

This Week In Veloren 124

This week, 0.10 released! We hear from @Slipped about caves, plugins from
@zesterer, and other improvements by @aweinstock and @juliancoffee.

– AngelOnFira, TWiV Editor

Contributor Work

Thanks to this week’s contributors, @AsumFace, @xMAC94x, @aweinstock,
@SpecificProtagonist, @juliancoffee, @Pfau, @Yusdacra, @Kisa, @XVar,
@AngelOnFira, @Cr0ss0vr, @Slipped, @zesterer, @Capucho, @jvck10, @Scott, @tygyh,
@Sam, and @Christof!

This past week, 0.10 released! At peak, there were 118 unique players online.
Read more about the release in the writeup below.

@Nomagno has been pondering the feasibility of a partial WASM port of
veloren-server, veloren-common, and veloren-client, and hopes to put some
work into it in the future. @Scott adjusted most of the NPCs’ hitboxes to better
fit their models and lowered a lot of the smaller creatures’ health values to
make early game easier.

Crafting exp with a different colour and an extra symbol

During the debugging of the meat MR, @aweinstock made a graphviz generator that
processes the recipe book:

@Snowram added some new npcs, including a roc npc with a new tornado summoning
attack. Look out for it in the sky! @Slipped merged some rebalancing to correct
things after the armor MR made everyone stronger. Hopefully, now we have better
progression barricades for the tiers of monsters that drop the materials you
need to advance through the crafting system, by making lots of stuff stronger.
And we have a basilisk petrify attack so that’s pretty rad.

@juliancoffee worked on loading EntityInfo from assets. EntityInfo is a
builder-like structure that is used to populate chunks; the server then spawns
creatures from it. Now you can describe the body (only random for now), loadout,
skillset, name, loot, and weapons in an asset and then just load them where you
need them. Currently, it works only for single entities (so if you need to spawn
multiple entities, you use a vector for this), but this is the base for things
like loading a whole dungeon crowd from an asset where you can specify enemies
for each room, or describing what pack of animals will be spawned somewhere in
the overworld.
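
To give a rough feel for the shape of this, here is a heavily simplified sketch. Everything in it is hypothetical: the real `EntityInfo` has different fields and methods, and real assets are parsed from files, whereas `EntityConfig` here just stands in for an already-parsed description.

```rust
/// Hypothetical, heavily simplified stand-in for the builder-like `EntityInfo`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct EntityInfo {
    pub name: Option<String>,
    pub loadout: Option<String>,
    pub loot: Option<String>,
}

impl EntityInfo {
    pub fn with_name(mut self, name: &str) -> Self {
        self.name = Some(name.to_owned());
        self
    }

    pub fn with_loadout(mut self, loadout: &str) -> Self {
        self.loadout = Some(loadout.to_owned());
        self
    }

    pub fn with_loot(mut self, loot: &str) -> Self {
        self.loot = Some(loot.to_owned());
        self
    }
}

/// Hypothetical description of one entity, as it might look after being
/// parsed from an asset file.
#[derive(Clone, Debug)]
pub struct EntityConfig {
    pub name: String,
    pub loadout: String,
    pub loot: String,
}

/// Builds a single entity from its description.
pub fn from_config(config: &EntityConfig) -> EntityInfo {
    EntityInfo::default()
        .with_name(&config.name)
        .with_loadout(&config.loadout)
        .with_loot(&config.loot)
}

/// Several entities are, for now, just a vector of single ones.
pub fn pack(config: &EntityConfig, count: usize) -> Vec<EntityInfo> {
    (0..count).map(|_| from_config(config)).collect()
}
```

The point of moving the description into data is that spawn code no longer has to repeat the builder calls at every call site.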

Caves by @Slipped

One thing that became super clear once we decided to start adding more ores to
caves was that they were due for a tuneup. They took very little advantage of
static lighting, weren’t very deeply tied to progression and their difficulty
was a bit all over the place. However, their actual structure set up by
@zesterer was amazing and we were completely underselling that. Caves break down
to 4 points of hierarchy:

Layout

This is the existing canvas. Caves can be super vertical, super wide or
frighteningly narrow. They can have platforms, glider obstacle courses, and a
ton of other things. No changes were needed for the lowest level of the
hierarchy; it’s already great.

Structures

So far, caves have had some really cool stalactites for structure. We needed
more of that. First was color: there are now noisy colored floors and ceilings
in the caves, which helps break up the old gray. New caves also have different
structures depending on depth. Once you enter, there are ridges, which are huge
white scaffolds that hold up the ceiling of the cave. They cover some percent of
the ceiling at specific intervals of the cave depth, which creates clean
structures in narrow hallways and very chaotic bands of scaffold in bigger
rooms.

As you go deeper, we trigger the stalactites. Finally, at the lowest levels we
trigger pits, which can go 30 blocks deep, along with veins. Veins store the
best ores in the game, so look for orange sand and you can pickaxe through for
the ore deposits at the bottom.

Sprites

Sprites are the next level up the hierarchy. A few big changes here were making
sprite spawns width dependent so that they mostly spawn closer to the walls,
adding ceiling sprites, and adjusting the sprite glow color to be less fire-like
and more neutral. Thanks to the artists, we have different layers which use
mushroom, crystal, or orb sprites as you go deeper, and caves are very lit up
now. We also added dripping particles from the ceiling.

Enemies

Enemies are on the top level of the hierarchy, and got rebalanced to tune
difficulty. On entry, things are still passive so new players can gather basic
ore. Next, you hit moderate enemies like asps and salamanders, to give scales
and protect the iron. After that you reach the cobalt/coal layer, protected by
basilisks and firedrakes, upping the difficulty. Finally, the bottom layer has
ogres, cyclops, trolls, and others protecting gold, silver, and bloodstone, the
most useful ores in the game.

The next plans are to start getting into biome-specific cave themes, so look out
for that!

Plugin Tutorial by @zesterer

I finally got around to writing a tutorial for the new plugin API. It’s worth
noting that the plugin API is still under heavy development, but I’ll try to
update the tutorial regularly to reflect changes.

The tutorial takes budding modders all the way from setting up the project to
packing and playing in the game, demonstrating a series of API features along
the way.
Hopefully, this is going to accelerate activity around plugins and help us push
the API to the point that it becomes viable for community modding.

https://book.veloren.net/contributors/modders/writing-a-plugin.html

Mining and movement fixes by @aweinstock

I added a mining skill tree, which gives xp when mining ore (and not from
combat). Skill points in the mining skill tree can be spent on mining faster,
and on a chance to gain extra ore and gems when mining. @Scott made the icons
for the skill tree, and @Pfau helped to fine-tune the UI. @DrDystopia also
helped by refactoring voxygen::hud::diary, which made it easier for me to add
the new skill tree to the UI.

I also made it so that character abilities that apply forced movement (primarily
leaps and rolls) temporarily disable entity-entity pushback. This fixes a bug
with axe/hammer leap where you couldn’t deliver the hit because you’d be pushed
away on landing from the leap before swinging the axe or hammer. This also makes
it possible to roll through creatures, which means that it’s now possible to
escape being stuck in a corner by a minotaur.

0.10 Release Party by @AngelOnFira

With the release party last Saturday, lots of telemetry was recorded. In
comparison to the previous release, 0.10 went a lot smoother. Some of the devs
got into a voice chat to talk about the game’s development, and this was
streamed to Twitch. Check out the recording below!

One of the primary metrics that we can best use to determine server health
during releases is the tick time. In the case of this release, we saw healthy
tick times of between ~40-50ms. Ideally, it would be best if they were less than
33ms, as this gives the target tick rate of 30 ticks per second. During previous
releases, we saw it go as high as 100ms, which would tend to snowball while
trying to catch up on work from previous ticks.
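
The arithmetic behind these numbers is simple enough to spell out. The helpers below are hypothetical, and the backlog model (every tick overrunning by the same amount) is a simplifying assumption meant only to illustrate why overruns snowball.

```rust
/// Time available per tick, in milliseconds, for a target tick rate.
pub fn tick_budget_ms(ticks_per_second: f64) -> f64 {
    1000.0 / ticks_per_second
}

/// Ticks per second actually achieved if every tick takes `tick_ms`.
pub fn achieved_tps(tick_ms: f64) -> f64 {
    1000.0 / tick_ms
}

/// Work left over, in milliseconds, after `ticks` ticks that each take
/// `tick_ms` against the budget for `target_tps`. Zero if ticks fit the budget.
pub fn backlog_ms(tick_ms: f64, target_tps: f64, ticks: u32) -> f64 {
    (tick_ms - tick_budget_ms(target_tps)).max(0.0) * ticks as f64
}
```

A 30 ticks per second target leaves about 33.3 ms per tick. At 40 to 50 ms per tick the server manages 20 to 25 ticks per second. At 100 ms per tick, each tick overruns by roughly 67 ms, so under this model 30 ticks leave about two seconds of work still pending.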

For the release, we used a 32 core dedicated server from Hetzner. This time, we
made use of the new AMD CPU option, which turned out to help a lot.

As you can see, during the release party each core never really went above 30%
usage. This shows that although we can make use of parallel processing, we’re
still limited by how fast the main thread is working. A key point of future
optimizations will be to distribute work across the cores more efficiently.

Tests were done on both the 32 core AMD and Intel chips that could have been
provisioned for the dedicated server. Hetzner won’t tell us the exact chip
models that are used “for security reasons”, so instead we ran some of our own
tests. The tests were run with sysbench. Calculating primes doesn’t exactly
represent the more complex problems the server faces, but it still showed us
some useful information:

Overall, the release party went quite well. We definitely want to do a bigger
push for the next release, so we can hopefully blow past our record of 133
players on the server at once!

Balance and UX improvements by @juliancoffee

I was recently working on some balance and UX improvements. This includes
slightly buffing the bow, staff (especially at lower stages of the game), sword,
and nerfing the hammer. Dungeon enemies were also nerfed; primarily Tidal
Warrior and npc dashes. Also, the bow zoom is less annoying now. /kit and
/skill_preset were also updated.

Here is the changelog of weapons:

!!! DISCLAIMER: these values are just coefficients, which are then multiplied by
weapon power, weapon kind, skills, and other coefficients !!!

- Bow M1:
  * charge_time reduced from 1.2 to 1.0
  * damage range was changed from [20; 100] to [5; 125]
  * regen range was changed from [20; 140] to [5; 125]
  -> bow now shoots faster

- Staff M1:
  * regen was changed from 50 to 60
  * explosion effect was reworked to more smooth damage distribution over the area
  -> you now can get stamina to roll if you hit two enemies. You also do more AoE damage on lower stages of the game when your explosion radius isn't that great

**Tidal Warrior**
Bubbles:
  * damage reduced twice -> you have more chances to not die when doing melee
Totem:
  * much longer reload time of wave
  * wave now pushes you away of totem, not upward -> orcs aren't dying from fall damage while fighting Tidal Warrior now

**Dashes**
- Spear dash (T0-T4 enemies)
  * damage range was reduced from [60;160] to [10;90]
  * poise range was reduced from [45;45] to [0;25]
  * recover duration was increased from 0.5 to 0.8
- Animal Dashes (Lions, Bonerattlers, etc)
  * recover duration was increased from 0.5 to 1.0
  -> you now can move after getting dashed so it's less likely you will get damage twice

**UX**
- Bow Zoom:
  * starts with 50% of charge
  * more lite zoom at the end of charge
  -> aiming with bow M1 is now easier
- `/kit` command
  * new `tier-5` kit with yellow armour and tier-5 weapons.
  * kits from `tier-0` to `tier-4` were updated to match new armour update.
  * new `endgame` kit with cultist and legendary weapons, cultist rings, and gem of lesser protection
- `/skill_preset` command
  * temporary `middle` preset with random copypasta from `max` :P. I plan to add skill presets for each tier in future.

Bug fixes

Damage from melee weapons is now produced in the middle of swing animation, not
at the start. This fixes a bug where you could hit your enemy without swinging
and roll away. Now you can still cancel the swing animation, but you won't hit
your enemy.

Melee Balance changes
- Sword:
  * first stage buildup is boosted from 0.15 to 0.1
  -> you start your attack faster
- Hammer:
  * base poise damage is reduced from 25 to 20
  -> it's less likely hammer can stunlock enemies

Will you please listen? I’m not the Messiah! ( See you next week! )
