Tuesday, June 18, 2013

Goodbye Octrees

It's been a while since I posted an update on the Voxelus project, but it has been advancing slowly as time allows.  As with most hobby projects there never seems to be enough time, but it's certainly not defunct and I enjoy fiddling with it whenever I can pull myself away from Kerbal Space Program!

Framework Switch

Most of the changes, however, are fairly invisible from the outside.  One of the largest, for example, was switching C++ frameworks: moving from the commercial one that I use and help develop at work to an entirely new one of my own creation.  Although this was a considerable amount of work, I was finding the constantly moving goalposts of a framework being actively developed for other purposes frustrating, as was the ever-increasing complexity it brought.

By creating my own framework I am now entirely in control: I can make it work exactly as I want and keep it only as complex as necessary to achieve my goals.  Another benefit is that, because it's now 100% my own code, should I ever wish to distribute the source for any of my projects I can do so freely without worrying about copyright or IP ownership issues.

Goodbye Octrees

Apart from changes to the underlying code structure, the biggest change to the project - and definitely the largest consumer of development effort - is my move away from sparse voxel octrees (SVOs).  I had started out using SVOs as they have been receiving quite a lot of coverage recently and they felt like a good way to experiment with raycasting voxels.  While they work well for relatively small bounded areas, I was having trouble working out how to make them scale to encompass an entire planet - my stated goal for this project.

The main problem with the SVO approach was the number of levels of data necessary to represent a planet at sufficient fidelity.  Using some basic arithmetic I worked out that I would need 22 levels of voxel data to represent a cube of space 25,000 km on a side (roughly twice the size of the Earth) down to a resolution of 4.5 cm per voxel, which I felt would be sufficient for walking around on the surface.  Ignoring the issues of storage, having to recurse down as many as 22 levels of octree at every step along every pixel's ray, reading index and data textures at each iteration, felt like asking too much of the GPU.

The problem can be simplified somewhat by restricting the number of levels actually used in any given frame to a sensible range based on the distance of the viewpoint from the surface - my current guesstimate is ten of the 22 levels - but then you either still need to recurse down from the root to reach the first of the ten levels of interest, or you need some strategy for jumping into the octree structure at an arbitrary level and point in space.  Neither of these felt like problems I wanted to tackle.

I decided instead to ditch the octree structure altogether and move to a clipmap-based system.  I had used clipmaps previously on other terrain projects for rendering heightfields, where they provide a relatively simple and effective solution for storing, streaming and rendering multi-resolution data, so I thought I would be able to drop them into this project fairly easily.  Well...sort of.

Clipmap Background

Clipmaps have been around for a long time: there is an SGI paper on them from 1998, for example, and more recently an article on using them for terrain rendering in GPU Gems 2; Miguel Cepero also appears to use them in 3D in his Voxel Farm Engine project.  Essentially they provide a way to scroll a potentially infinite data field through a fixed-size multi-resolution cache.  If you are familiar with mip-maps, you can think of the 2D case as being somewhat like a reverse mip-map: instead of each level representing the same area of space at half the resolution, in a clipmap each level is the same resolution and instead represents an area of space twice the size of the preceding level.
Three levels of a 4x4 clipmap illustrating how each level is the same resolution but covers four times the area of the preceding level at quarter the data density
So for example, if the first level of the clipmap stores a 1 km square of terrain in a 512x512 grid, the second level would store a 2 km square also in a 512x512 grid, while the third level would store a 4 km square once again in a 512x512 grid.  It can easily be seen here that each level stores four times the area of the preceding one - a geometric progression that allows huge areas to be covered with a relatively small number of levels.
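The geometric progression above is simple enough to sketch in code.  This is purely my illustration of the example numbers (1 km base extent, 512x512 grids) - the function names are mine, not the engine's:

```cpp
// Per-level extents: level 0 covers `baseExtent` metres in a fixed-resolution
// grid, and every subsequent level doubles the linear extent (quadrupling the
// area covered) at the same grid resolution.
constexpr double levelExtent(double baseExtent, int level) {
    return level == 0 ? baseExtent : 2.0 * levelExtent(baseExtent, level - 1);
}

// World-space size of one grid cell at a given level - this is the data
// density dropping by a factor of four per level in the 2D case.
constexpr double cellSize(double baseExtent, int level, int gridRes) {
    return levelExtent(baseExtent, level) / gridRes;
}
```

With a 1 km first level, ten levels would already cover a 512 km square while storing only ten 512x512 grids.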

When rendering with clipmaps you centre each level around the viewpoint, so the highest-fidelity data from the first level represents the terrain closest to the camera, the next level the data a little further away, and so on.  As perspective is reducing the size of features on-screen anyway, the drop in resolution with distance provides a natural level-of-detail scheme.

A key trick for efficiency when scrolling clipmaps across your data set is to treat the mapping of terrain to the clipmap buffers in memory as toroidal: instead of having to copy all the data around as the viewpoint moves, you simply write the new data that's just moved into range over the old data that's just moved out of range.  In this way the minimum amount of memory is rewritten, and because the clipmaps never actually change size or move away from the viewpoint you avoid numerical precision issues during rendering, facilitating an effectively infinite data set.
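The toroidal mapping boils down to a modulo when converting a world-space cell coordinate into a buffer address.  A minimal sketch (the name is mine, and a real implementation would likely use a power-of-two buffer size and a bit mask instead):

```cpp
// Toroidal addressing: map an unbounded world-space cell coordinate into a
// fixed-size clipmap buffer.  As the viewpoint moves, a cell that scrolls
// into range lands exactly where the cell that scrolled out of range used
// to live, so only the newly exposed data needs rewriting.
constexpr int wrapIndex(int worldCell, int bufferSize) {
    // The extra +bufferSize keeps the result non-negative for negative world
    // coordinates (C++'s % operator can return negative values).
    return ((worldCell % bufferSize) + bufferSize) % bufferSize;
}
```

Applied per axis, this gives the buffer texel for any cell without ever moving existing data.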

For a system like Voxelus, where creating or loading the data for a region can take a number of frames and the viewpoint can be moved around rapidly, you inevitably get flicker if the data for a new region can't be copied in fast enough.  Unnecessary flickering can be minimised, though, by keeping a border around the edge of the clipmap that isn't normally rendered: when the viewpoint moves at a more measured pace, this extra data gives you something to render while the new data is paged in.  Of course, the faster your viewpoint moves the wider this band of pre-prepared data must be to avoid flickering, so it's a trade-off of memory versus visual artifacts.

Adding Dimensions...

As I said, I had used 2D clipmaps before in a fairly traditional heightfield renderer, but to use them for rendering a volumetric planet I would clearly have to extend them to work in 3D.  In fact, since I was looking to render a set of ten levels from my full planet-sized 22-level data set, I actually had to implement a four-dimensional clipmap arrangement: the renderable data set scrolls not just along the X, Y and Z spatial axes but also along a W axis representing which levels of data to use as the viewpoint moves closer to and further away from the surface.

This isn't conceptually that much more difficult, but it did give me some head-scratching moments trying to work out the shape of dirty data regions as the viewpoint moved around!
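One way the W-axis scrolling might be picked - this is my sketch of the idea, not necessarily the exact scheme used here - is to choose the finest resident level by comparing the viewpoint's distance from the surface against each level's voxel size, then keep the window of ten levels starting there:

```cpp
// Pick the finest of the `totalLevels` data levels to keep resident, given
// the viewpoint's distance from the surface.  Each level's voxels are twice
// the size of the previous level's, so we walk up until the voxel size
// reaches a distance-based detail threshold, clamping so that a full window
// of `windowSize` levels always fits inside the level range.
constexpr int finestLevelForDistance(double distance, double finestVoxelSize,
                                     int totalLevels, int windowSize) {
    int level = 0;
    double voxelSize = finestVoxelSize;
    while (voxelSize < distance && level < totalLevels - windowSize) {
        voxelSize *= 2.0;
        ++level;
    }
    return level;
}
```

As the viewpoint ascends, the window slides towards the coarse end of the 22 levels; descending slides it back, and the levels entering the window are exactly the dirty regions that need filling.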

It's well known that voxels can produce vast amounts of data, which can be difficult to manage efficiently in a real-time environment - trying to manage them individually makes this even worse - so although I dropped the octree data structure I decided to keep the brick concept from the earlier version.  In my current implementation each of the ten clipmap levels stores a 19^3 grid of bricks, each of which is a 14^3 array of voxels.
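For a sense of scale, the numbers above pin down how much voxel data is resident at once.  The one-byte-per-voxel figure in the comment is purely illustrative - the actual per-voxel format isn't specified here:

```cpp
// Bookkeeping for the quoted configuration: ten resident clipmap levels,
// each a 19^3 grid of bricks, each brick a 14^3 array of voxels.
constexpr long long bricksPerLevel = 19LL * 19 * 19;   // 6,859 bricks
constexpr long long voxelsPerBrick = 14LL * 14 * 14;   // 2,744 voxels
constexpr long long residentVoxels = 10 * bricksPerLevel * voxelsPerBrick;
// ~188 million voxels resident at once (~188 MB at a byte per voxel) -
// a fixed cost regardless of how large the full planet data set is.
```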

Wireframe bounds of each of the 10 clipmap levels when the viewpoint is located close to the planet surface on the left hand side
As can be seen here, each level encloses 1/8th of the volume of the preceding level.  Normally they are centred around the viewpoint, but that makes them hard to visualise, so for this screenshot I moved the viewpoint close to the surface on the left-hand side of the planet, locked the levels' positions, then flew back out for an overview.

Clipmap levels coloured by level
This second image shows the view from the original camera position near the surface, but here the voxels from each clipmap level are rendered in a different colour.  Note the smooth blending between voxels of different levels, achieved by sampling adjacent levels and blending the distance values.  This is one of the main benefits of raycasting voxels rather than triangulating them - seamless per-pixel level-of-detail blending - which I plan to expand on in another post.
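The blend itself is just a linear interpolation between the distance values sampled from two adjacent levels.  A minimal sketch - in the real shader the samples would come from volume textures and the weight from where the sample point sits between the two levels' ideal viewing distances; the names and signature here are my illustration:

```cpp
// Blend the distance value sampled from two adjacent clipmap levels.
// t = 0 -> purely the finer level, t = 1 -> purely the coarser level;
// sweeping t with distance gives the seamless per-pixel LOD transition.
constexpr float blendedDistance(float dFine, float dCoarse, float t) {
    return dFine * (1.0f - t) + dCoarse * t;
}
```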

Voxel brick visualisation
Finally, this image shows the same style of colour-coded level-of-detail view, but also illustrates the size of the voxel bricks.  Each of those patches is a 14^3 brick of voxels, which is the smallest unit of data I generate, store and transfer.  Only the higher-detail brick size of each pair is shown here; each pixel is in fact sampling from the illustrated brick and the equivalent one from the next level down (i.e. double the width, height and depth).  Note how the bricks automatically get larger with distance - a simple pixel-size-over-distance metric is used to pick the best pair of levels at each point along each pixel's ray.
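A pixel-size-over-distance metric of this sort might look like the following - my guess at the shape of it, not the engine's actual code.  Under perspective, a pixel's world-space footprint grows linearly with distance along the ray, so you step up levels until a level's voxels are at least pixel-sized:

```cpp
// Pick the finer level of the sampling pair at distance t along a ray.
// `pixelAngle` is the angular size of one pixel (radians, small-angle
// approximation), so t * pixelAngle is the pixel's world-space footprint.
// The coarser level of the pair is simply the next one up.
constexpr int levelForRayDistance(double t, double pixelAngle,
                                  double finestVoxel, int numLevels) {
    double footprint = t * pixelAngle;
    int level = 0;
    double voxelSize = finestVoxel;
    // Step up until this level's voxels are at least pixel-sized, leaving
    // room for the coarser level of the pair at the top of the range.
    while (voxelSize < footprint && level < numLevels - 2) {
        voxelSize *= 2.0;
        ++level;
    }
    return level;
}
```

Because the choice is made per sample point rather than per object, the transition between brick sizes in the screenshot is gradual rather than popping at fixed distance bands.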

As the viewpoint moves, bricks are pulled from a CPU-side cache and copied into the dirty regions.  By transferring voxel data around in brick-sized units it's possible to use the hardware far more efficiently than by handling voxels individually.

The Devil's in the Details

As anyone who's ever tried to implement complicated real-time graphics will know, even conceptually simple systems can end up taking plenty of effort to get working well, but to help stave off TL;DR syndrome I'll save further detail for another post.

Until then, anyone care to guess how many voxels I'll need to make up a planet?


  1. I'm guessing 4.7810528e+20 voxels.

  2. I think that a planet that size would contain about 9.8782084e+17 voxels. But I guess you don't really care about all the voxels inside it...

  3. Why not have pyramids instead of rays? You don't have to change anything except that, when you are dealing with a cube/octree & a point inside it on a ray, you paint the corresponding pixel with the cube's color & don't descend further if that ray's point has a view space z such that "sqrt(3) * e <= z * 2 / r" where e is the cube edges' size & r is the resolution of your viewport. (I assume a 90° view pyramid). Note that, as usual, you need no float, * or /. A sufficient condition is "e + (e >> 1) + (e >> 2) <= z << 1 >> log(r)".

  4. Even for an Earth sized planet, you need around 10^21 voxels - encoding each voxel as a byte, Wolfram Alpha claims that's about 200x the entirety of human knowledge. Bravo!

    What sort of performance are you seeing when raytracing this sort of dataset? I've been tempted to try a ray-tracing solution, but the performance numbers have never looked particularly good...

    1. Performance-wise, raycasting is certainly not the most efficient way to render such a data set, with GPUs being heavily specialised for triangle rasterisation, but it's surprisingly good with a reasonable graphics card (I'm using Radeon HD 6800/6900s, so not exactly cutting edge). Using the clipmaps instead of a tree structure reduces the number of volume texture samples required at each iteration along the ray to two, or even just one if you don't want to blend between LOD levels.

      The biggest performance problem I have at the moment is uploading new brick data from main memory to the GPU. I've tried a couple of techniques but so far haven't found one that doesn't cause stalls in rendering when under load - TBH, information on best practices for asynchronous uploading to the GPU with DirectX seems a bit thin on the ground, and PC GPU debugging tools for profiling the API & HW seem a bit...unstable...

