3 times increase in voxel generation speed

TerraVol’s procedural terrain generation is thrashing its internal ChunkData cache!

The loops are ordered:

  1. outside: work across X & Z
  2. inside: down Y!

But this means that after every 8 Blocks (SIZE_Z) another ChunkData must be found (flushing the cache), and that repeats 8×8 = 64 times for a single Chunk2D column!  Better would be going down Y as the outer loop and across X & Z as the inner loops.  This should give 8×8×8 = 512 hits before a new ChunkData must be found.
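To make the difference concrete, here is a toy model of the two loop orders. It is only a sketch in Python, not TerraVol's C# code: the one-entry cache, `count_lookups` and `SIZE` are names I made up for illustration, and it assumes chunks are 8 blocks on every edge.

```python
# Toy model: a one-entry cache that remembers only the last chunk touched.
SIZE = 8  # assumed blocks per chunk edge (8 x 8 x 8 = 512 blocks per chunk)


def count_lookups(chunks_high, y_outer):
    """Count how often a different ChunkData has to be found for one column."""
    height = chunks_high * SIZE
    if y_outer:
        # Proposed order: Y outside, X and Z inside.
        visits = ((x, y, z)
                  for y in range(height)
                  for x in range(SIZE)
                  for z in range(SIZE))
    else:
        # Original order: X and Z outside, Y inside.
        visits = ((x, y, z)
                  for x in range(SIZE)
                  for z in range(SIZE)
                  for y in range(height))

    lookups = 0
    cached_chunk = None
    for _x, y, _z in visits:
        chunk = y // SIZE  # within a single column only Y selects the chunk
        if chunk != cached_chunk:
            lookups += 1  # cache miss: a new ChunkData must be found
            cached_chunk = chunk
    return lookups


if __name__ == "__main__":
    for chunks_high in (3, 20):
        print(chunks_high,
              count_lookups(chunks_high, y_outer=False),
              count_lookups(chunks_high, y_outer=True))
```

In this model a column that is N chunks tall (N of 2 or more) costs 64 × N lookups with Y on the inside and only N with Y on the outside, which is where the 512 blocks per lookup comes from.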

Let’s measure performance before and after.

This is all done in threads, so we’ll need thread-local stopwatches that, in a lock, add their value to a global total and increase a numDone count.  Periodically lock, average, dump and clear.
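The shape of that timing harness, roughly. This is a Python sketch rather than the C# Stopwatch code actually used, and `timed_column`, `dump_and_clear` and the two globals are hypothetical names chosen for the example.

```python
import threading
import time

_lock = threading.Lock()
_total_seconds = 0.0  # global accumulator shared by all builder threads
_num_done = 0         # number of columns timed since the last dump


def timed_column(work):
    """Time one column on the calling thread, then fold it into the globals."""
    global _total_seconds, _num_done
    start = time.perf_counter()  # local variable, so nothing is shared yet
    work()
    elapsed = time.perf_counter() - start
    with _lock:  # hold the lock only long enough to update the totals
        _total_seconds += elapsed
        _num_done += 1


def dump_and_clear():
    """Return (average seconds per column, count) and reset the totals."""
    global _total_seconds, _num_done
    with _lock:
        count = _num_done
        average = _total_seconds / count if count else 0.0
        _total_seconds = 0.0
        _num_done = 0
    return average, count


if __name__ == "__main__":
    def builder():
        for _ in range(5):
            timed_column(lambda: sum(range(100_000)))

    threads = [threading.Thread(target=builder) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(dump_and_clear())
```

The timing itself happens outside the lock, so the measurement should add very little contention between threads.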

Timing results before:

With minHeight 1.4 (default in PlayerSetupArena):

0.0001 seconds/column  = not terrible but just about the *best* case possible — barely 3 chunks high!

With minHeight 20:

 0.004490196+
 0.006598039+
 0.009725491+
 0.01311765 +
 0.02141177 +
 0.01319608 +
 0.02912745 +
 0.02162745 +
 0.01719608 + (note worryingly increasing!)
 ===
 Average: 0.01516

After (still minHeight 20):

 0.003186275 +
 0.004343137 +
 0.006480392 +
 0.004676471 +
 0.007215686 +
 0.004666667 +
 0.00472549  +
 0.008745098 +
 0.00472549  +
 0.004921569 +
 ===
 Average: 0.00537

I make that roughly a 3 times improvement (0.01516 / 0.00537 ≈ 2.8)!  Real-time use shows FPS now much closer to being pegged at 60!
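For anyone who wants to check the arithmetic, the samples above can be averaged in a few lines of Python. The figures are copied straight from the two lists; only the variable names are mine.

```python
# Seconds per column, minHeight 20, copied from the lists above.
before = [0.004490196, 0.006598039, 0.009725491, 0.01311765, 0.02141177,
          0.01319608, 0.02912745, 0.02162745, 0.01719608]
after = [0.003186275, 0.004343137, 0.006480392, 0.004676471, 0.007215686,
         0.004666667, 0.00472549, 0.008745098, 0.00472549, 0.004921569]

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
speedup = mean_before / mean_after

print(f"before: {mean_before:.5f} s/column")
print(f"after:  {mean_after:.5f} s/column")
print(f"speedup: {speedup:.2f}x")
```

The ratio comes out a little over 2.8, which I am happy to round to 3.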

This also paves the way to my next optimization which should drastically reduce the load on both the Builder threads *and* (most importantly) the Unity Main Thread!  (all this will improve the local-multiplayer game but, more importantly, make the campaign able to use much wider areas of voxel snow!)

Before that, a quick aside: a further improvement might be retrieving the ChunkData outside the X,Z loops, since it’s the same for all X & Z for a given Y!  It’s cached (thread-local), but the lookup is 4 deep in method calls and involves some struct allocations, etc. that just aren’t needed.  Think I’ll leave it for now but…
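A sketch of what that hoisting would look like, again in Python with made-up names (`CountingStore`, `get_chunk_data`, `fill_per_block`, `fill_hoisted`) and assuming 8×8×8 chunks. It is not TerraVol's API; it only shows the lookup moving from the innermost loop up to the Y loop.

```python
SIZE = 8  # assumed blocks per chunk edge


class CountingStore:
    """Hypothetical stand-in for the chunk store; counts every lookup."""

    def __init__(self):
        self.lookups = 0
        self.chunks = {}

    def get_chunk_data(self, chunk_y):
        self.lookups += 1
        return self.chunks.setdefault(chunk_y, [0] * SIZE ** 3)


def fill_per_block(store, height):
    """Y-outer order, but the chunk is still fetched for every block."""
    for y in range(height):
        for x in range(SIZE):
            for z in range(SIZE):
                data = store.get_chunk_data(y // SIZE)
                data[((y % SIZE) * SIZE + x) * SIZE + z] = 1


def fill_hoisted(store, height):
    """Same work, with the chunk fetched once per Y layer."""
    for y in range(height):
        data = store.get_chunk_data(y // SIZE)  # hoisted out of the X,Z loops
        base = (y % SIZE) * SIZE
        for x in range(SIZE):
            for z in range(SIZE):
                data[(base + x) * SIZE + z] = 1


if __name__ == "__main__":
    a, b = CountingStore(), CountingStore()
    fill_per_block(a, 16)
    fill_hoisted(b, 16)
    print(a.lookups, b.lookups)
```

Both versions write the same blocks; the hoisted one just makes 64 times fewer calls into the store, which is the saving the aside is hinting at.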

The difficulty here is that, since this is all threaded code, I cannot profile it!

I’ll repeat: No profilability!  Unprofilable!  Profligate!  (check it)  Gah!

Unity’s profiler doesn’t record data for anything but the Unity Main Thread!  (which it can do excellently btw and also does awesome things for GPU, memory (better coming), etc.)  Furthermore, since Unity doesn’t run on Microsoft’s .NET runtime, one cannot apply Visual Studio’s tools (or others aimed at Microsoft’s runtime).  As such, one’s reduced to personal timing tests like my Stopwatch use here.  That requires knowing where potential problems might be!  Recalling that “Premature optimization is the root of all evil”, this is a hard pill to unswallow!

By that I mean that, for gamedevs like me, we find it hard not to write things optimally as we go — but Knuth’s Adage means we must try to focus on code clarity (for maintainability).  That’s the hard pill.  Trying to unswallow it… yuck!  Even worse!

Just added all my remaining feedback points to the Unity Enhancement Request for Profiling MultiThreaded code.  Who’s with me!


