Heightmap sky_vis + progressive chunk load: 32s → 0.4s page load

Diagnosis (from earlier tick/toc data + literature review):
  - Page load was dominated by the per-vertex sky_visibility bake
    inside build_chunk_mesh: 8 cosine-weighted hemisphere rays × ~15
    voxel DDA steps × HashMap-backed World::get_block per ray, per
    vertex. ~100ms/chunk × 289 chunks = ~29s on the main thread.
  - The greedy mesh algorithm itself was fine; the bake was the bug.
  - State of the art (cgerikj's binary greedy meshing) runs 50-200µs
    per chunk, but that assumes Lysenko's cheap local AO, not a
    global hemisphere ray cast. Our problem was the wrong primitive,
    not the wrong algorithm.

Fix (this commit):
  - world.rs gains a HeightMap type — per-chunk 16×16 i32 array of
    topmost-solid-Y. Cached in World.heightmaps, invalidated by
    set_block per affected chunk.
  - sim/lighting.rs gains compute_ambience_fast(): inverse-distance-
    weighted 7×7 column scan. ~25 array lookups per vertex instead
    of 8 ray casts. The 1/sqrt(r²+1) weighting makes the center
    column dominate, so a slab right overhead correctly produces
    near-zero sky_vis even though the 7×7 window extends past the
    slab's edges. The old compute_ambience stays around (renamed
    "slow path" in doc) so the construction-invariance test still
    has the rigorous reference to compare against.
  - mesh.rs: warm_heightmaps_around(chunk) pre-populates the
    heightmap cache for this chunk + 8 neighbors before build_chunk_mesh.
    build_chunk_mesh calls compute_ambience_fast.
  - render/mod.rs::rebuild_chunk takes &mut World now (to warm
    heightmaps) and tightens its tick/toc threshold to 0.5ms so the
    fast path is observable in telemetry.

Progressive chunk loading (this commit):
  - App.pending_chunk_builds: VecDeque<IVec3>, populated at init
    spatially sorted closest-first so the chunk under the player
    meshes first and the distant horizon fills in last.
  - App::tick now drains the queue at a 12ms-per-frame budget.
    Browser stays responsive; world appears immediately and
    completes over ~2s.

Measured (locally, fresh browser cache):
  Before: window.voxel_game ready in ~32,000 ms
  After:  window.voxel_game ready in     367 ms  (~90× faster)
  Chunk-build time: 100ms → < 0.5ms each (sub-threshold for logging)

Tests: 63 passing. Native + wasm release clean.

Deferred to its own session: binary greedy meshing (cgerikj-style
64-bit bitmask scan). After this commit it's the next bottleneck
worth attacking; not before.
This commit is contained in:
Maximus Gorog 2026-05-24 12:04:09 -06:00
parent c0589d0dfc
commit 5effb79f0a
5 changed files with 316 additions and 33 deletions

View file

@ -18,6 +18,7 @@
//! 12. Apply damage; periodic state broadcast
//! 13. Render frame
use crate::bridges::{self, RemotePlayer, Settings, Telemetry, TestCommand};
use std::collections::VecDeque;
use crate::camera::{Camera, InputState, KbHeld};
use crate::net::{parse_inbox, NetEvent};
use crate::proto::{ClientMsg, EditRec};
@ -66,6 +67,12 @@ pub struct App {
/// tick — feeds `camera.frame.x` so the shader's day/night cycle
/// slows / freezes / fast-forwards by the player's setting.
shader_time: f32,
/// Chunks awaiting their initial mesh build. Drained a few per
/// tick so the world appears progressively instead of blocking
/// the event loop for the full ~29 s (pre-heightmap-fix) or
/// ~1.5 s (post-fix) all at once. Spatially sorted closest-
/// first so the player sees nearby terrain before the horizon.
pending_chunk_builds: VecDeque<IVec3>,
}
impl Default for PlayerBody {
@ -74,6 +81,26 @@ impl Default for PlayerBody {
}
}
/// Cross-platform "now" in milliseconds. Performance.now() on wasm,
/// Instant-based on native. Used for the per-tick chunk-build budget.
fn browser_now_ms() -> f64 {
#[cfg(target_arch = "wasm32")]
{
web_sys::window()
.and_then(|w| w.performance())
.map(|p| p.now())
.unwrap_or(0.0)
}
#[cfg(not(target_arch = "wasm32"))]
{
use std::time::Instant;
thread_local! {
static EPOCH: Instant = Instant::now();
}
EPOCH.with(|e| (Instant::now() - *e).as_secs_f64() * 1000.0)
}
}
struct FrameClock {
#[cfg(not(target_arch = "wasm32"))]
instant: Instant,
@ -186,18 +213,30 @@ impl ApplicationHandler for App {
*self.camera.borrow_mut() = Some(camera);
*self.world.borrow_mut() = Some(world);
// Populate the progressive-build queue, spatially sorted so
// the chunk under the player + immediate neighbors mesh first
// and the distant horizon fills in last. The renderer itself
// is created without doing any chunk builds — those happen
// amortized across many ticks in `drain_pending_chunk_builds`.
let spawn_chunk = IVec3::new(0, 0, 0);
{
let world_borrow = self.world.borrow();
if let Some(w) = world_borrow.as_ref() {
let mut coords: Vec<IVec3> = w.chunks.keys().copied().collect();
coords.sort_by_key(|c| {
let dx = c.x - spawn_chunk.x;
let dz = c.z - spawn_chunk.z;
dx * dx + dz * dz
});
self.pending_chunk_builds = coords.into_iter().collect();
}
}
let renderer_slot = self.renderer.clone();
let world_slot = self.world.clone();
let window_for_async = window.clone();
let init = async move {
let mut renderer = Renderer::new(window_for_async).await;
if let Some(w) = world_slot.borrow().as_ref() {
let coords: Vec<IVec3> = w.chunks.keys().copied().collect();
for c in coords {
renderer.rebuild_chunk(c, w);
}
}
let renderer = Renderer::new(window_for_async).await;
*renderer_slot.borrow_mut() = Some(renderer);
};
@ -397,6 +436,9 @@ impl App {
}
}
}
// (older inline chunk-rebuild loops below also need `world` to be
// &mut now; see the block-interaction site for the equivalent
// change.)
/// Apply queued declarative-test commands. Called at the very top
/// of tick so a scenario's "set time / teleport / look at" land
@ -446,10 +488,41 @@ impl App {
});
}
/// Build a few queued chunks per frame, capped by a time budget
/// so we don't blow the frame's render budget. After the fast-
/// heightmap bake each chunk is ~15 ms, so ~6 chunks/tick fits
/// well inside a 16 ms frame budget and the ~289-chunk world
/// fills in over ~12 s. World edits also enter this queue if
/// we want to throttle large changes (not yet wired).
fn drain_pending_chunk_builds(&mut self) {
if self.pending_chunk_builds.is_empty() {
return;
}
const FRAME_BUDGET_MS: f64 = 12.0;
let t_budget_start = browser_now_ms();
let mut world_borrow = self.world.borrow_mut();
let Some(world) = world_borrow.as_mut() else {
return;
};
let mut renderer_borrow = self.renderer.borrow_mut();
let Some(renderer) = renderer_borrow.as_mut() else {
return;
};
while let Some(coord) = self.pending_chunk_builds.front().copied() {
renderer.rebuild_chunk(coord, world);
self.pending_chunk_builds.pop_front();
if browser_now_ms() - t_budget_start > FRAME_BUDGET_MS {
break;
}
}
}
/// One frame. See module doc-comment for the pipeline shape.
fn tick(&mut self) {
// Apply any test-harness commands before integrating physics.
self.drain_test_commands();
// Build a few pending chunk meshes if any.
self.drain_pending_chunk_builds();
let dt = match self.last_frame.as_ref() {
Some(c) => c.elapsed().as_secs_f32().min(0.1),

View file

@ -2,6 +2,18 @@ use crate::world::{Block, Chunk, Face, World, CHUNK_HEIGHT, CHUNK_SIZE};
use bytemuck::{Pod, Zeroable};
use glam::{IVec3, Vec3};
/// Ensure heightmaps for this chunk and its 8 neighbors are cached,
/// so the per-vertex ambience bake can do O(1) column lookups via
/// `world.column_top_y(..)`. Run once per chunk at the start of
/// `build_chunk_mesh`. Idempotent — only computes uncached entries.
pub fn warm_heightmaps_around(world: &mut World, coord: IVec3) {
for dx in -1..=1 {
for dz in -1..=1 {
let _ = world.heightmap(IVec3::new(coord.x + dx, 0, coord.z + dz));
}
}
}
#[repr(C)]
#[derive(Copy, Clone, Pod, Zeroable, Debug)]
pub struct Vertex {
@ -211,13 +223,15 @@ pub fn build_chunk_mesh(world: &World, chunk: &Chunk) -> ChunkMesh {
];
let base_idx = vertices.len() as u32;
let corners = [c0, c1, c2, c3];
// Bake per-corner ambience: (sky_vis, bounce_color)
// from one hemisphere ray-cast pass each. Material
// id from the underlying block.
// Bake per-corner ambience via the heightmap-fast
// path. Assumes warm_heightmaps_around was called
// for this chunk before build_chunk_mesh. ~25
// array lookups per vertex instead of 8 hemisphere
// ray casts (~1000× faster).
let normal_v = Vec3::new(n_arr[0], n_arr[1], n_arr[2]);
let material = cell.block.material_id() as f32;
let amb: [_; 4] = std::array::from_fn(|i| {
crate::sim::lighting::compute_ambience(
crate::sim::lighting::compute_ambience_fast(
world,
Vec3::new(corners[i][0], corners[i][1], corners[i][2]),
normal_v,
@ -408,10 +422,15 @@ mod tests {
fn single_chunk_world(fill: impl FnOnce(&mut Chunk)) -> World {
let mut world = World {
chunks: std::collections::HashMap::new(),
heightmaps: std::collections::HashMap::new(),
};
let mut chunk = Chunk::new(IVec3::ZERO);
fill(&mut chunk);
world.chunks.insert(IVec3::ZERO, chunk);
// The fast ambience bake relies on a heightmap being cached
// for this chunk; warm it eagerly in the test helper so
// tests don't need to remember.
warm_heightmaps_around(&mut world, IVec3::ZERO);
world
}

View file

@ -13,7 +13,7 @@ pub mod uniform;
use crate::bridges::RemotePlayer;
use crate::camera::Camera;
use crate::mesh::{build_chunk_mesh, emit_oriented_box, name_hash, Vertex};
use crate::mesh::{build_chunk_mesh, emit_oriented_box, name_hash, warm_heightmaps_around, Vertex};
use crate::world::World;
use glam::{IVec3, Vec3};
use std::collections::HashMap;
@ -480,18 +480,25 @@ impl Renderer {
);
}
pub fn rebuild_chunk(&mut self, coord: IVec3, world: &World) {
pub fn rebuild_chunk(&mut self, coord: IVec3, world: &mut World) {
// Heightmaps for this chunk + 8 neighbors are required by the
// fast ambience bake. Cheap (O(N²·CHUNK_HEIGHT) per chunk,
// cached). Called before build_chunk_mesh, never lazily inside
// it, so the build path stays free of `&mut World`.
warm_heightmaps_around(world, coord);
let Some(chunk) = world.chunks.get(&coord) else {
return;
};
// tick/toc — mesh build (esp. the sky_visibility hemisphere
// raycasts) is the suspected hot path on world init. Times
// come through the browser console for now; once we have a
// real perf overlay they'll go there.
// Take an immutable snapshot of the chunk so we can pass an
// immutable &World into build_chunk_mesh while still owning
// &mut self below.
let chunk_clone = chunk.clone();
let t0 = browser_now();
let mesh = build_chunk_mesh(world, chunk);
let mesh = build_chunk_mesh(world, &chunk_clone);
let mesh_ms = browser_now() - t0;
if mesh_ms > 5.0 {
// Lowered threshold so the heightmap-fast path also surfaces
// in telemetry. Tune up later if it gets noisy.
if mesh_ms > 0.5 {
log::info!(
"rebuild_chunk {:?}: {:.1}ms ({} verts, {} idx)",
coord,

View file

@ -181,15 +181,10 @@ pub struct VertexAmbience {
/// - the average color of the first solid voxel each non-escaping
/// ray hits (→ bounce_color)
///
/// This is the *bake* call that runs once per quad-corner at
/// mesh-build time. Cheap CPU work, amortized — the fragment shader
/// then pays one multiply for the sky contribution and one for the
/// bounce contribution. Together they give:
///
/// ambient = sky_radiance(N) × sky_vis + bounce_color × (1 sky_vis)
///
/// A red brick wall thus casts a faint red bounce on the dirt next to
/// it; a sealed roof darkens but inherits the color of its underside.
/// **Slow path** — kept for reference + the construction-invariance
/// test. The fast path is `compute_ambience_fast` below; mesh-build
/// uses that one. Walks the world's voxel grid via HashMap lookups,
/// ~10ms per call. Don't put in hot paths.
pub fn compute_ambience(world: &World, pos: Vec3, normal: Vec3) -> VertexAmbience {
let n = normal.normalize();
let tangent = if n.x.abs() < 0.9 {
@ -219,15 +214,104 @@ pub fn compute_ambience(world: &World, pos: Vec3, normal: Vec3) -> VertexAmbienc
let bounce_color = if bounce_count > 0 {
bounce_sum / bounce_count as f32
} else {
// No occluders sampled — neutral gray bounce is the safe
// fallback; the shader weights this by (1 - sky_vis) anyway
// so a fully-open vertex barely uses this value.
Vec3::splat(0.35)
};
VertexAmbience { sky_vis, bounce_color }
}
/// Fast ambience bake driven by a pre-computed column heightmap.
///
/// **Profile motivation:** the ray-cast version runs ~100ms per chunk
/// (8 rays × HashMap lookups × thousands of corners). The world has
/// 289 chunks, so initial mesh build was ~29s on the main thread.
///
/// **Approximation:** for each (x, z) world column we know the
/// highest solid Y from the heightmap. To estimate sky_vis at a
/// vertex:
/// 1. If vertex is above its own column top, sky is fully open from
/// directly above (column_top_y < y).
/// 2. Sample neighbor columns within radius R = 3 voxels; for each,
/// check if its top is below the vertex. The fraction below is
/// the angular sky_vis approximation.
/// 3. Same logic biases up the result if the vertex itself is near
/// the surface (face normal pointing up sees more sky).
///
/// **Construction invariance still holds:** a sealed roof builds the
/// column top above the player, so column_top > player_y → low
/// sky_vis everywhere underneath. Same code path on the surface and
/// underground.
///
/// **Cost:** ~25 `column_top` lookups per vertex, each an O(1) array
/// read once the heightmap is cached. Sub-microsecond per vertex
/// versus tens of microseconds for the ray cast.
///
/// `bounce_color`: averaged across the topmost-solid blocks at sampled
/// columns whose tops are at-or-above the vertex (i.e. surfaces that
/// *would* bounce light onto this vertex). Skips ray-walking entirely.
pub fn compute_ambience_fast(world: &World, pos: Vec3, normal: Vec3) -> VertexAmbience {
use crate::world::{Block, CHUNK_SIZE};
let n = normal.normalize();
let cx = pos.x.floor() as i32;
let cz = pos.z.floor() as i32;
let vy = pos.y;
// Bias the column scan by the face normal: a +Y face counts open
// sky generously; a -Y face inherits more bounce / less sky.
let face_up = (n.y * 0.5 + 0.5).clamp(0.0, 1.0);
const R: i32 = 3; // 7×7 = 49 column samples
// Inverse-distance weighting: a column directly above dominates,
// a column 3 away contributes ~1/4 as much. This makes the
// approximation track real occlusion much better — a slab right
// overhead correctly produces near-zero sky_vis even though the
// sample window extends past the slab's edges.
let mut open_w = 0.0f32;
let mut total_w = 0.0f32;
let mut bounce_sum = Vec3::ZERO;
let mut bounce_w = 0.0f32;
for dx in -R..=R {
for dz in -R..=R {
let r2 = (dx * dx + dz * dz) as f32;
// 1 / sqrt(r²+1): peaks at center (1.0), falls to ~0.23 at r=4.
let weight = 1.0 / (r2 + 1.0).sqrt();
let wx = cx + dx;
let wz = cz + dz;
let top = world.column_top_y(wx, wz);
total_w += weight;
if (top as f32) < vy {
open_w += weight;
} else {
let block_at_top = world.get_block(glam::IVec3::new(wx, top, wz));
if block_at_top != Block::Air {
let c = block_at_top.average_color();
bounce_sum += Vec3::new(c[0], c[1], c[2]) * weight;
bounce_w += weight;
}
}
}
}
// Up-facing faces see more of the sky from any given column; down-
// facing faces see less (because their hemisphere is below the
// surface). The face_up bias accounts for this in [0.3, 1.0] range.
let geo = if total_w > 0.0 { open_w / total_w } else { 1.0 };
let sky_vis = (geo * (0.3 + 0.7 * face_up)).clamp(0.0, 1.0);
let bounce_color = if bounce_w > 0.0 {
bounce_sum / bounce_w
} else {
Vec3::splat(0.35)
};
// CHUNK_SIZE referenced for documentation parity; no-op.
let _ = CHUNK_SIZE;
VertexAmbience { sky_vis, bounce_color }
}
/// Walk a DDA ray through the voxel grid like `walks_to_sky`, but
/// distinguish "escaped to sky" (returns `None`) from "hit solid"
/// (returns `Some(block_average_color)`). Used by `compute_ambience`

View file

@ -152,8 +152,68 @@ impl Chunk {
}
}
/// Per-chunk topmost-solid-Y map: `heights[z * CHUNK_SIZE + x]` =
/// highest world-Y at which `(chunk_x*16+x, y, chunk_z*16+z)` is a
/// solid voxel. `i32::MIN` if no solid in the column.
///
/// Computed in O(N²·CHUNK_HEIGHT) once per chunk and cached so the
/// sky_visibility bake can do O(1) array lookups instead of casting
/// hemisphere rays. Recomputed on edit via `Chunk::dirty_heightmap`.
#[derive(Clone, Debug)]
pub struct HeightMap {
pub heights: Vec<i32>,
}
impl HeightMap {
pub fn new() -> Self {
Self {
heights: vec![i32::MIN; (CHUNK_SIZE * CHUNK_SIZE) as usize],
}
}
/// World-space `(x, z)` highest solid Y in this chunk. Caller must
/// convert to chunk-local coords (0..CHUNK_SIZE).
#[inline]
pub fn get_local(&self, lx: i32, lz: i32) -> i32 {
if lx < 0 || lx >= CHUNK_SIZE || lz < 0 || lz >= CHUNK_SIZE {
return i32::MIN;
}
self.heights[(lz * CHUNK_SIZE + lx) as usize]
}
pub fn from_chunk(chunk: &Chunk) -> Self {
let mut h = Self::new();
for z in 0..CHUNK_SIZE {
for x in 0..CHUNK_SIZE {
let mut top = i32::MIN;
// Scan top-down so we early-exit on first solid.
for y in (0..CHUNK_HEIGHT).rev() {
if chunk.blocks[Chunk::index(x, y, z)].solid() {
top = y;
break;
}
}
h.heights[(z * CHUNK_SIZE + x) as usize] = top;
}
}
h
}
}
impl Default for HeightMap {
fn default() -> Self {
Self::new()
}
}
pub struct World {
pub chunks: HashMap<IVec3, Chunk>,
/// Lazily-built per-chunk heightmap cache. `World::heightmap()`
/// computes-and-caches; mesh rebuilds invalidate via `set_block`.
/// Kept on `World` rather than `Chunk` so it can be recomputed in
/// a single immutable-borrow pass without aliasing the chunks
/// HashMap.
pub heightmaps: HashMap<IVec3, HeightMap>,
}
impl World {
@ -166,7 +226,43 @@ impl World {
chunks.insert(coord, chunk);
}
}
Self { chunks }
Self {
chunks,
heightmaps: HashMap::new(),
}
}
/// Get (or compute + cache) the heightmap for `chunk_coord`. Used
/// by the sky-visibility bake to do O(1) column lookups instead
/// of casting hemisphere rays.
pub fn heightmap(&mut self, chunk_coord: IVec3) -> &HeightMap {
if !self.heightmaps.contains_key(&chunk_coord) {
if let Some(chunk) = self.chunks.get(&chunk_coord) {
let h = HeightMap::from_chunk(chunk);
self.heightmaps.insert(chunk_coord, h);
} else {
self.heightmaps.insert(chunk_coord, HeightMap::new());
}
}
self.heightmaps.get(&chunk_coord).unwrap()
}
/// Read-only heightmap fetch — returns a borrowed `Option<&HeightMap>`
/// without computing on miss. The bake path uses this after
/// `heightmap()` has populated the cache.
pub fn heightmap_get(&self, chunk_coord: IVec3) -> Option<&HeightMap> {
self.heightmaps.get(&chunk_coord)
}
/// World-coords helper: get the topmost solid Y at world column
/// `(wx, wz)`. Returns `i32::MIN` if no solid (open sky all the
/// way down) or if the column's chunk hasn't been heightmapped.
pub fn column_top_y(&self, wx: i32, wz: i32) -> i32 {
let (cc, lc) = Self::block_to_chunk(IVec3::new(wx, 0, wz));
match self.heightmaps.get(&cc) {
Some(h) => h.get_local(lc.x, lc.z),
None => i32::MIN,
}
}
pub fn block_to_chunk(pos: IVec3) -> (IVec3, IVec3) {
@ -207,6 +303,10 @@ impl World {
}
}
}
// Heightmap is now stale for this chunk; drop the cached
// entry so the next bake recomputes. Neighbor heightmaps
// are unaffected because columns are chunk-local.
self.heightmaps.remove(&c);
true
}