Impostor on Intel WebSite.

Introduction

Rendering dynamic, complex scenes in real time is a challenging task. Developers and gamers alike love to have environments that are as realistic in appearance as possible without sacrificing game-play or speed. One such environment that more and more games are adopting, and that is used as an example in this column, is the great outdoors. Unfortunately for performance, being outdoors means, among other things, rendering a lot of geometry. Terrain, water, clouds, and especially vegetation are all factors that contribute to the challenge. However, with a bit of creative thinking and smart use of the 3D API, be it OpenGL, Direct3D*, or any other API, developers can squeeze an amazing amount of performance out of today's hardware.

The example discussed in this column specifically pertains to rendering forests of trees, and uses Direct3D*, but the ideas presented could be applied to any 3D scene containing lots of objects with similar geometry using any 3D API. Trees provide an interesting example because they provide all the necessary ingredients for the rendering challenge. They are geometrically complex, they may require a detailed texture, they are abundant in nature, and they are affected by natural elements (e.g. gravity, wind, etc.). Because trees exist in nature, people know what they look like, how they behave, and how they fit into their environment. A consequence of this is that people also have an expectation of what trees will be in a game. Since the goal is to create as realistic an environment as possible, developers must take all of that general knowledge into consideration when creating virtual environments.

Feasibility of Rendering Full Geometry

A great many species of trees live in the natural world, and modeling every one would be very difficult and time consuming. Most of the trees in a given geographic area, however, are relatively similar, and detailed modeling of as few as four different trees, which are then rotated and scaled differently, can result in an impressively realistic forest. The best part is that you don't need, or need to be, a talented artist. Each tree in the example project corresponding to this column is procedurally generated on the fly, and contains more than 10,000 polygons! A noise function determines how many trees end up on the land, as well as each tree's type, scale, rotation, and position. A constant threshold determines the tree density of the forests, and is set to allow a total of about 400 trees into the example's initial scene.
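The placement scheme just described can be sketched as follows. This is only an illustration under stated assumptions: the column does not say which noise function the example project uses, so a simple integer hash stands in for it, and the names Tree, hashNoise, and placeTrees, along with the scale range, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tree record; field names are illustrative only.
struct Tree {
    float x, z;      // position on the terrain
    float scale;     // uniform scale factor
    float rotation;  // rotation about the vertical axis, in radians
    int   type;      // which of the four tree models to use (0..3)
};

// Integer hash mapped to [0, 1). Stands in for the unspecified noise function.
inline float hashNoise(std::uint32_t x, std::uint32_t z, std::uint32_t seed) {
    std::uint32_t h = x * 374761393u + z * 668265263u + seed * 2246822519u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h & 0xFFFFFFu) / 16777216.0f;  // keep 24 bits
}

// Visits every cell of a lattice and plants a tree wherever the noise value
// reaches the threshold, so a higher threshold gives a sparser forest.
std::vector<Tree> placeTrees(int cellsX, int cellsZ, float cellSize, float threshold) {
    assert(cellsX > 0 && cellsZ > 0 && cellSize > 0.0f);
    std::vector<Tree> trees;
    for (int z = 0; z < cellsZ; ++z) {
        for (int x = 0; x < cellsX; ++x) {
            const std::uint32_t ux = static_cast<std::uint32_t>(x);
            const std::uint32_t uz = static_cast<std::uint32_t>(z);
            if (hashNoise(ux, uz, 0u) < threshold) continue;  // density control
            Tree t;
            t.x = (static_cast<float>(x) + hashNoise(ux, uz, 1u)) * cellSize;  // jitter inside the cell
            t.z = (static_cast<float>(z) + hashNoise(ux, uz, 2u)) * cellSize;
            t.scale = 0.75f + 0.5f * hashNoise(ux, uz, 3u);          // assumed range 0.75..1.25
            t.rotation = 6.2831853f * hashNoise(ux, uz, 4u);         // 0..2*pi
            t.type = static_cast<int>(hashNoise(ux, uz, 5u) * 4.0f); // 0..3
            trees.push_back(t);
        }
    }
    return trees;
}
```

Because every property is derived from the cell coordinates, the same forest is regenerated on every run without storing any per-tree data on disk. Tuning the threshold until roughly 400 trees survive would mirror the density setting mentioned above.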

Assuming that all the trees in the initial scene are in place, and that each tree's properties are set (type of tree, scale, rotation, and position), the interesting question of how to render the trees comes into play. The answer to this question is by no means set in stone, and may in fact vary depending on the desired effect. Let's get into a couple of the possible approaches, discussing the pros and cons of each, and expand on some important implementation decisions. Our desired effect here is the fastest possible rendering of realistic trees.

This first approach is by far the slowest, and the run-time performance is unacceptable for any real-time gaming environment. The idea is simple: render the full geometry of everything in the scene. Four hundred trees at ten thousand plus polygons per tree, not to mention land, sky, and water, is a lot of geometry to render: over 4 million triangles per frame! On the other hand, the amount of work that must be done in software is extremely small. Our framework is not as efficient as the best engines available today, but at best we could only achieve 1 or 2 fps, which is unacceptable. This kind of performance quickly sparks thoughts of improvement, and as we will see next, huge improvements are possible without reducing the polygon count of the trees or the tree density of the forests.

The first obvious improvement is to apply some sort of visibility culling to the scene. The example divides the world into a 2-dimensional grid, and arranges each quadrant of the grid such that a quadrant contains a single patch of terrain, a single plane at a fixed level for water, and a linked list of trees. Any quadrant that falls outside the view volume of the camera is not rendered. Granted, this is a relatively simple approach to culling, and adds some work to be done in software, but it will prove to serve the desired effect well. From here on out, assume culling is on. With this small improvement, only 40% of the quadrants in a 10x10 grid are drawn, resulting in a maximum frame rate of 3 fps. A finer division of quadrants (i.e. putting more, smaller quadrants in the scene) will result in the percentage of quadrants drawn being lowered to about 25%. Nonetheless, rendering the full geometry of the visible scene is still extremely expensive, and the question of what really needs to be rendered remains.
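A minimal sketch of this kind of grid culling is shown below. It is not the example project's code: it assumes a simplified top-down view volume (two side edges plus a far limit in the XZ plane), and the names Plane2D, Cell, makeFrustum, cellVisible, and countVisibleCells are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// A line in the XZ plane; a point is inside when nx*x + nz*z + d >= 0.
struct Plane2D { float nx, nz, d; };

// Axis-aligned footprint of one grid quadrant.
struct Cell { float minX, minZ, maxX, maxZ; };

using Frustum2D = std::array<Plane2D, 3>;  // left edge, right edge, far limit

// Builds the top-down view volume for a camera at (camX, camZ) looking along
// the heading yaw (radians, measured from +x toward +z).
Frustum2D makeFrustum(float camX, float camZ, float yaw, float halfFov, float farDist) {
    assert(halfFov > 0.0f && halfFov < 1.5707963f && farDist > 0.0f);
    const float lx = std::cos(yaw + halfFov), lz = std::sin(yaw + halfFov);  // left edge direction
    const float rx = std::cos(yaw - halfFov), rz = std::sin(yaw - halfFov);  // right edge direction
    const float fx = std::cos(yaw), fz = std::sin(yaw);                      // view direction
    auto through = [&](float nx, float nz, float offset) {
        return Plane2D{nx, nz, -(nx * camX + nz * camZ) + offset};
    };
    // Normals point toward the inside of the view volume.
    return {{ through(lz, -lx, 0.0f), through(-rz, rx, 0.0f), through(-fx, -fz, farDist) }};
}

// Conservative test: a cell is culled only when it lies entirely outside one plane.
bool cellVisible(const Cell& c, const Frustum2D& frustum) {
    for (const Plane2D& p : frustum) {
        // Pick the corner that lies farthest along the plane normal.
        const float x = (p.nx >= 0.0f) ? c.maxX : c.minX;
        const float z = (p.nz >= 0.0f) ? c.maxZ : c.minZ;
        if (p.nx * x + p.nz * z + p.d < 0.0f) return false;
    }
    return true;
}

// Counts how many quadrants of a square grid would be submitted for drawing.
int countVisibleCells(int cellsPerSide, float cellSize, const Frustum2D& frustum) {
    int visible = 0;
    for (int z = 0; z < cellsPerSide; ++z) {
        for (int x = 0; x < cellsPerSide; ++x) {
            const Cell c{ static_cast<float>(x) * cellSize, static_cast<float>(z) * cellSize,
                          static_cast<float>(x + 1) * cellSize, static_cast<float>(z + 1) * cellSize };
            if (cellVisible(c, frustum)) ++visible;
        }
    }
    return visible;
}
```

The test is conservative on purpose: a quadrant near a corner of the view volume may be drawn even though it is just out of sight, but a visible quadrant is never dropped. A full implementation would test against the 3D view volume rather than this 2D simplification.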

Enter the Impostors

It turns out that it is not necessary to render the full geometry of a tree that is so far away from the camera that it takes up a single pixel of screen real estate. The user can barely see it anyway. It is not even necessary to render the full geometry of a tree unless that tree is close enough to the camera that the user can see the difference between a 3D tree and a texture that looks like a tree, or an impostor. An impostor, in the context here, is a billboard that always faces the camera with an applied texture that visually represents the geometric object it replaces, as figures 1 and 2 illustrate.

Figure 1: A solid view of a tree and an impostor (the alpha channel of the impostor allows omission of the black region when rendering).

Figure 2: A wireframe view of the same tree and impostor

Using impostors will drastically reduce the polygon count, but it raises other issues:

· How will billboards be used to represent different views of the individual trees?

· When to render an impostor vs. a real tree?

· How to go about creating the impostors?

Now things become more interesting from a developer's standpoint. Consider walking a full circle around a tree in real life. The branches of the tree remain stationary with respect to the tree trunk. In other words, they do not spin around the tree's trunk to face your eyes at all times. Now consider moving a camera around a tree, or any object for that matter, in a game. If the tree were to look exactly the same from any point of view, then the goal of simulating realism would not be met. To avoid this problem, the full geometry of any tree within a certain distance of the camera is rendered, while trees beyond that distance are rendered as impostors. But a problem remains with this solution.

As mentioned earlier, each tree in the scene has unique properties assigned to it at initialization. If the same texture were applied to all the impostors, then all the trees rendered as impostors would look exactly alike, which would effectively create a very unnatural forest. Also, when the distance between the camera and a tree became small enough to warrant rendering the tree's full geometry, a very noticeable popping would surely occur. All is not lost, however, for we have a powerful tool in our toolbox to help us maintain the illusion: render-to-texture.

Smart use of Render-to-Texture

Rendering just one view of a tree to a texture is not enough, for the reasons just discussed; therefore, a decision must be made as to how many views of a tree to render to texture, and whether to use one texture per view, or one texture representing all the chosen views of a tree. The trick to selecting how many views to use is careful selection of the swapping distance, the distance from the camera at which the full geometry of a tree is rendered instead of its impostor. In order to keep the memory footprint small, but maintain the appearance of realism, the example uses a swapping distance such that no noticeable popping occurs when using just eight views: N, NE, E, SE, S, SW, W, NW. Generally, just these eight are effective, but experiment with more or fewer views depending on your application's needs. Of course, keep in mind that it all depends on the scale of the world in which your trees live, but with some experimentation finding a suitable value is relatively easy.
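The two per-tree decisions described so far, geometry versus impostor and which of the eight views to show, can be sketched as below. The coordinate convention (+z is north, +x is east) and the names DrawMode, chooseDrawMode, and viewIndex are assumptions made for this illustration; the example project may organize this differently.

```cpp
#include <cassert>
#include <cmath>

enum class DrawMode { FullGeometry, Impostor };

// Trees closer to the camera than the swapping distance get full geometry.
DrawMode chooseDrawMode(float treeX, float treeZ, float camX, float camZ, float swapDistance) {
    assert(swapDistance >= 0.0f);
    const float dx = camX - treeX;
    const float dz = camZ - treeZ;
    // Compare squared distances to avoid a square root per tree.
    return (dx * dx + dz * dz < swapDistance * swapDistance) ? DrawMode::FullGeometry
                                                             : DrawMode::Impostor;
}

// Returns 0..7 for N, NE, E, SE, S, SW, W, NW: the side of the tree the
// camera is looking at, taking the tree's own rotation into account.
int viewIndex(float treeX, float treeZ, float treeRotation, float camX, float camZ) {
    const float twoPi = 6.2831853f;
    // Bearing from the tree to the camera, measured clockwise from north (+z),
    // then expressed relative to the tree's rotation.
    float angle = std::atan2(camX - treeX, camZ - treeZ) - treeRotation;
    angle = std::fmod(angle, twoPi);
    if (angle < 0.0f) angle += twoPi;  // wrap into [0, 2*pi]
    // Each view covers a 45 degree slice centered on its compass direction.
    return static_cast<int>(angle / (twoPi / 8.0f) + 0.5f) % 8;
}
```

Subtracting the tree's rotation matters: two trees of the same type that are rotated differently show different views from the same camera position, which helps the forest look less uniform.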

Next consider a second decision concerning the textures: whether to use one texture per view or one texture representing all eight views of a tree. Texture swapping can be quite expensive with modern APIs, so choosing to render each view of a tree to its own texture would add some penalty. Rendering all eight views of a tree to a single texture reduces the amount of texture swapping enormously. Taking one step further, rendering all eight views of all four trees to a single texture eliminates texture swapping between impostors entirely! In the interest of maintaining a realistic appearance while keeping resource usage down and performance up, the example uses this last technique.
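With all 32 images in one texture, each impostor only needs the rectangle of texture coordinates for its tree type and view. The sketch below assumes a layout of eight columns (one per view) by four rows (one per tree type); that layout and the names UvRect and atlasRect are hypothetical, since the column does not spell out how the example arranges its texture.

```cpp
#include <cassert>

// Texture-coordinate rectangle inside the shared impostor texture.
struct UvRect { float u0, v0, u1, v1; };

// Assumed layout: 8 columns (views N..NW) by 4 rows (tree types 0..3).
UvRect atlasRect(int treeType, int view) {
    assert(treeType >= 0 && treeType < 4);
    assert(view >= 0 && view < 8);
    const float cellW = 1.0f / 8.0f;
    const float cellH = 1.0f / 4.0f;
    return UvRect{ static_cast<float>(view) * cellW,
                   static_cast<float>(treeType) * cellH,
                   static_cast<float>(view + 1) * cellW,
                   static_cast<float>(treeType + 1) * cellH };
}
```

For example, atlasRect(2, 5) selects the SW view of the third tree type under this layout. Since only texture coordinates change from one impostor to the next, the texture itself stays bound for the whole batch.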

Though an elegant solution, choosing to render all possible views to a single texture using render-to-texture has its drawbacks. The first drawback concerns rendering to a texture with an alpha channel, and this caveat actually applies to all render-to-texture techniques (the alpha channel is necessary so alpha testing may be done against the background of the impostors at render time). The example code creates a render target with an alpha channel, which is something that not all graphics hardware supports. We tested the example on several graphics cards, and it worked well (running the latest publicly available drivers), but there is hardware out there on which the example will not execute correctly.

If supporting graphics hardware that does not support render targets with an alpha channel is a requirement, there is a way to work around the problem. Instead of creating a render target with an alpha channel, create a basic (lockable) render target. Also create a system memory texture with the same format as the render target, a system memory texture with an alpha channel, and a video memory texture with an alpha channel. An illustration best describes how this solution works, but we'll also step through it:

Figure 3: Creating an impostor with alpha for hardware that does not directly support Render Target Textures with alpha.

First, render an object to the target. Lock the render target, copy the bits to the system memory texture with the same format as the render target, and unlock the render target. At this point the render target may be discarded, as it is no longer needed. Copy the bits from the system memory texture to the system memory texture with an alpha channel, and take care to set the alpha bits as necessary (opaque where the object was rendered, transparent otherwise). Finally, lock the video memory texture with alpha, copy over the bits from the system memory texture with alpha, and unlock the video memory texture with alpha. The final result is an exact impostor of the object with alpha values set for alpha testing when the impostor is rendered. Since this is all done at startup time, performance is not an issue.
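The middle step, copying pixels into the alpha-capable surface while setting the alpha bits, is plain CPU work and can be sketched without any Direct3D calls. The sketch assumes 32-bit pixels laid out as 0xAARRGGBB and assumes the object was rendered over a known clear color that never appears in the object itself; the function name addAlphaFromBackground is hypothetical, and the locking and unlocking of surfaces is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts pixels copied from a render target without alpha into pixels with
// alpha: opaque where the object was drawn, transparent where the clear color
// is still showing. The top byte of each source pixel is ignored.
std::vector<std::uint32_t> addAlphaFromBackground(const std::vector<std::uint32_t>& source,
                                                  std::size_t width, std::size_t height,
                                                  std::uint32_t background) {
    assert(source.size() == width * height);
    const std::uint32_t key = background & 0x00FFFFFFu;
    std::vector<std::uint32_t> result;
    result.reserve(source.size());
    for (std::uint32_t pixel : source) {
        const std::uint32_t rgb = pixel & 0x00FFFFFFu;
        // Alpha 0x00 for background pixels, 0xFF everywhere else.
        result.push_back(rgb == key ? rgb : (0xFF000000u | rgb));
    }
    return result;
}
```

One caveat of keying on color is that any pixel of the object that happens to match the clear color also becomes transparent, so a clear color that the trees never use is the safer choice. Real surfaces may also have a row pitch larger than the width, which this tightly packed sketch does not handle.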

A second drawback to rendering all possible views to a single texture concerns a possible alternative to impostors: point sprites. All eight views of all four trees rendered to a single texture means that each impostor must apply some sort of texture transform (see figure 4). Point sprites do not support texture transforms as of DirectX* 8.0, therefore enabling point sprites would mean rendering each view of each tree to its own texture. That particular path was avoided for reasons discussed earlier, thus we must rule out point sprites even though they have the advantage of only storing a z-position.

Figure 4: Impostor Texture Transform (selecting texture coordinates)

MIP-mapped textures become substantially more difficult as well because we would have to maintain a buffer around each tree's view in order to keep the different views from bleeding together.
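One way such a buffer could be handled, shown here only as a sketch and not as something the example project does, is to leave a gutter of unused texels around every view and shrink the texture-coordinate rectangle accordingly. The names UvRect and paddedAtlasRect and all parameters are hypothetical.

```cpp
#include <cassert>

// Texture-coordinate rectangle inside the shared impostor texture.
struct UvRect { float u0, v0, u1, v1; };

// Returns the rectangle for the cell at (column, row), inset by gutterPixels
// on every side so that filtering near the edge samples the gutter rather
// than the neighboring view.
UvRect paddedAtlasRect(int column, int row, int cellPixels, int gutterPixels,
                       int atlasWidth, int atlasHeight) {
    assert(cellPixels > 0 && gutterPixels >= 0 && 2 * gutterPixels < cellPixels);
    assert(column >= 0 && (column + 1) * cellPixels <= atlasWidth);
    assert(row >= 0 && (row + 1) * cellPixels <= atlasHeight);
    const float w = static_cast<float>(atlasWidth);
    const float h = static_cast<float>(atlasHeight);
    return UvRect{ static_cast<float>(column * cellPixels + gutterPixels) / w,
                   static_cast<float>(row * cellPixels + gutterPixels) / h,
                   static_cast<float>((column + 1) * cellPixels - gutterPixels) / w,
                   static_cast<float>((row + 1) * cellPixels - gutterPixels) / h };
}
```

The gutter is only as wide as it is at the top MIP level; each smaller level halves it, so a fixed gutter protects only the first few levels. That trade-off between wasted texels and usable MIP levels is part of what makes this path more difficult.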

With a texturing scheme in place, it is time to move on to another important aspect to consider when rendering hundreds (potentially thousands) of small objects with Direct3D*: smart use of vertex buffers. Since all the impostors are essentially quads with specific texture coordinates based on each tree's rotation, it is possible to use:

· a single 32-vertex VB for all the impostors, or

· a 4-vertex VB per impostor

Doing either requires selecting the appropriate texture coordinates per impostor on the fly, but that will be true in any case using a single texture representing multiple views. The performance hit comes with the hundreds of calls to DrawIndexedPrimitive(), with each call only rendering 4 vertices.

A better approach is to do the transformations for all the impostors in software, and stuff all the transformed vertices into a single, large vertex buffer. The example project uses a vertex buffer large enough to accommodate 1000 tree impostors (4000 vertices), though it could potentially support up to ~16,000 tree impostors (~64,000 vertices). The example only supports 1000 tree impostors in a VB because anything over 4000 vertices had negligible performance gains. Although it may seem like a lot of work to use a single texture, do the impostor transforms (scale, rotate, position, calculate texture coordinates) in software, and use a single, large vertex buffer, the final result is really quite pleasing both visually and with respect to performance. Performance, as measured by the frame rate, jumps from a slow 3 fps to an awesome 90 fps, a 30-fold increase!
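The batching idea can be sketched without any Direct3D calls: build every camera-facing quad on the CPU, append it to one array, and share one index list. The structures and the names appendImpostorQuads and buildQuadIndices are hypothetical, the coordinate system is assumed to be left-handed with y up, and copying the arrays into an actual vertex buffer and index buffer is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z, u, v; };

// Hypothetical per-impostor input: base position, size, and texture rectangle.
struct Impostor {
    float x, y, z;
    float halfWidth, height;
    float u0, v0, u1, v1;
};

// Appends four world-space vertices per impostor, each quad turned about the
// vertical axis to face the camera. Returns the number of impostors written.
std::size_t appendImpostorQuads(const std::vector<Impostor>& impostors,
                                float camX, float camZ,
                                std::size_t maxImpostors,
                                std::vector<Vertex>& out) {
    std::size_t written = 0;
    for (const Impostor& imp : impostors) {
        if (written == maxImpostors) break;  // the buffer is full
        const float dirX = camX - imp.x;
        const float dirZ = camZ - imp.z;
        const float len = std::sqrt(dirX * dirX + dirZ * dirZ);
        float rightX = 1.0f, rightZ = 0.0f;  // fallback when the camera is on the tree
        if (len > 0.0f) { rightX = -dirZ / len; rightZ = dirX / len; }
        const float lx = imp.x - rightX * imp.halfWidth, lz = imp.z - rightZ * imp.halfWidth;
        const float rx = imp.x + rightX * imp.halfWidth, rz = imp.z + rightZ * imp.halfWidth;
        const float top = imp.y + imp.height;
        out.push_back(Vertex{lx, imp.y, lz, imp.u0, imp.v1});  // bottom left
        out.push_back(Vertex{lx, top,   lz, imp.u0, imp.v0});  // top left
        out.push_back(Vertex{rx, imp.y, rz, imp.u1, imp.v1});  // bottom right
        out.push_back(Vertex{rx, top,   rz, imp.u1, imp.v0});  // top right
        ++written;
    }
    return written;
}

// Two triangles per quad, wound clockwise as seen from the camera.
// Flip the order if your cull mode expects the opposite winding.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 16384);  // 16-bit indices address at most 65,536 vertices
    static const std::uint16_t pattern[6] = {0, 1, 2, 2, 1, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        for (std::uint16_t offset : pattern) {
            indices.push_back(static_cast<std::uint16_t>(q * 4 + offset));
        }
    }
    return indices;
}
```

With the vertices already in world space, the whole batch can go out in a single DrawIndexedPrimitive() call instead of one call per tree. The 16,384-quad limit in the sketch comes from 16-bit indices, which is presumably also where the ~16,000 impostor ceiling mentioned above comes from. The index list never changes, so it only needs to be built once.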

Scene rendered with no impostors

Similar scene rendered with impostors

Conclusion

As the numbers demonstrate, the fast rendering of easy impostors is a great tool to have in a developer's toolbox. Combining techniques like render-to-texture with procedural content alleviates the need for additional artwork, and saves in content development time. A careful thought process concerning visibility culling, texturing, and API usage will increase the overall rendering speed of your pipeline, and ultimately improve the overall visual quality. Hopefully the ideas presented here will prove useful the next time you are faced with the difficult problem of rendering reality.