This tutorial demonstrates how to use the render passes API to implement simple deferred shading.
Render passes are a feature of the next-generation APIs that lets applications define rendering commands in a way that maps better to the tiled-deferred rendering architectures used by virtually all mobile platforms. Unlike the immediate rendering architectures typical of desktop platforms, tiled-deferred renderers split the screen into small tiles (e.g. 64x64 pixels; the actual size depends on multiple factors including render target format, fast memory size, GPU vendor, etc.) and perform rendering operations tile by tile. This allows the GPU to keep all data in a fast GPU-local cache, which is both faster and more power-efficient. When the GPU is done processing one tile, it flushes all the data to the main memory and moves to the next tile.
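To get a feel for the scale of tile-by-tile rendering, the sketch below counts how many tiles cover a render target. The function names are hypothetical, and the 64x64 tile size is just the example figure from the text, not a property of any particular GPU.

```cpp
#include <cassert>
#include <cstdint>

// Number of tiles needed to cover `extent` pixels along one axis.
// Rounds up so that a partially covered tile at the edge is still counted.
inline std::uint32_t TilesAlongAxis(std::uint32_t extent, std::uint32_t tileSize)
{
    assert(tileSize > 0);
    return (extent + tileSize - 1) / tileSize;
}

// Total number of square tiles the GPU would iterate over for one render target.
inline std::uint32_t TileCount(std::uint32_t width, std::uint32_t height, std::uint32_t tileSize)
{
    return TilesAlongAxis(width, tileSize) * TilesAlongAxis(height, tileSize);
}
```

For a 1920x1080 target and 64x64 tiles, TileCount(1920, 1080, 64) gives 30 columns times 17 rows, i.e. 510 tiles, each of which is processed in fast memory and then flushed.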
Render passes were introduced to give applications explicit control over tile operations. A good mental model of a render pass is a set of operations that the GPU performs in a local tile cache before flushing the data to the main memory and moving to the next tile.
A render pass is defined by the following key components:
Diligent Engine enables applications to use and intermix the render target API and the render passes API. The former is more implicit, while the latter is more explicit and requires more effort from application developers. Most importantly, no state transitions are allowed within a render pass. As a result, an application must not use RESOURCE_STATE_TRANSITION_MODE_TRANSITION with any command while a render pass is active.
This tutorial demonstrates a simple deferred shading renderer implemented using the render passes API. The render pass consists of two subpasses. The first subpass is a G-buffer pass: it renders the scene and populates two buffers, color and depth. The second subpass is a lighting pass: it renders light volumes and applies simple distance-based lighting to the G-buffer. Using the render passes API lets the driver reorder the operations and fuse the G-buffer pass and the lighting pass into a single tile operation, thus avoiding the need to store intermediate G-buffer data to the main memory and read it back.
To create a render pass we need to prepare an instance of the RenderPassDesc struct. But first we need to define some auxiliary data.
The first piece of the information we need to define is the render pass attachments. In this tutorial we will be using 4 attachments:
The first attachment is the color G-Buffer:
Notice that we must specify the initial state that the corresponding texture will be in before the render pass begins, as well as the final state it will be in after the render pass ends. Also notice that we specify ATTACHMENT_LOAD_OP_CLEAR as the load operation. This tells the driver that the old contents of the texture are not needed and should not be loaded from the main memory. As the store operation we use ATTACHMENT_STORE_OP_DISCARD, which instructs the driver to discard all the data after the end of the render pass, thus avoiding the need to write it back to the main memory.
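The attachment description discussed above can be sketched in a self-contained form as follows. The enums and the struct are simplified stand-ins that reuse the names from the text; they are not the engine's actual declarations, and a real application would include the Diligent Engine headers instead. The format and the initial and final states chosen here are assumptions made for illustration, and MakeColorGBufferAttachment and IsTransient are hypothetical helpers.

```cpp
// Simplified stand-ins for the engine declarations. Enumerator values are arbitrary.
enum TEXTURE_FORMAT { TEX_FORMAT_UNKNOWN, TEX_FORMAT_RGBA8_UNORM };
enum RESOURCE_STATE { RESOURCE_STATE_UNKNOWN, RESOURCE_STATE_RENDER_TARGET, RESOURCE_STATE_INPUT_ATTACHMENT };
enum ATTACHMENT_LOAD_OP { ATTACHMENT_LOAD_OP_LOAD, ATTACHMENT_LOAD_OP_CLEAR };
enum ATTACHMENT_STORE_OP { ATTACHMENT_STORE_OP_STORE, ATTACHMENT_STORE_OP_DISCARD };

struct RenderPassAttachmentDesc
{
    TEXTURE_FORMAT      Format       = TEX_FORMAT_UNKNOWN;
    RESOURCE_STATE      InitialState = RESOURCE_STATE_UNKNOWN;
    RESOURCE_STATE      FinalState   = RESOURCE_STATE_UNKNOWN;
    ATTACHMENT_LOAD_OP  LoadOp       = ATTACHMENT_LOAD_OP_LOAD;
    ATTACHMENT_STORE_OP StoreOp      = ATTACHMENT_STORE_OP_STORE;
};

// Describes the color G-buffer attachment (hypothetical helper).
inline RenderPassAttachmentDesc MakeColorGBufferAttachment()
{
    RenderPassAttachmentDesc Desc;
    Desc.Format       = TEX_FORMAT_RGBA8_UNORM;          // assumed format
    Desc.InitialState = RESOURCE_STATE_RENDER_TARGET;    // assumed state before the pass
    Desc.FinalState   = RESOURCE_STATE_INPUT_ATTACHMENT; // assumed state after the pass
    Desc.LoadOp       = ATTACHMENT_LOAD_OP_CLEAR;        // old contents are not loaded
    Desc.StoreOp      = ATTACHMENT_STORE_OP_DISCARD;     // results are not written back
    return Desc;
}

// True when the attachment is neither loaded from nor stored to main memory,
// so its data can live entirely in tile memory.
inline bool IsTransient(const RenderPassAttachmentDesc& Desc)
{
    return Desc.LoadOp != ATTACHMENT_LOAD_OP_LOAD &&
           Desc.StoreOp == ATTACHMENT_STORE_OP_DISCARD;
}
```

With the clear/discard pair, IsTransient returns true for the color G-buffer, which is exactly the property that saves memory bandwidth on tiled GPUs.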
The second attachment is the normalized device Z coordinate. Note that we can't extract this from the depth buffer (the third attachment), as we can't use it as both depth-stencil and input attachment during the second, lighting subpass.
Note again that we use ATTACHMENT_LOAD_OP_CLEAR and ATTACHMENT_STORE_OP_DISCARD as load and store operations.
The third attachment is the depth buffer:
The last attachment is the final buffer where the shaded result will be written to:
Note that unlike the previous attachments, this time we use ATTACHMENT_STORE_OP_STORE because we need to keep the final image to display it on the screen.
As discussed above, the render pass will have two subpasses. The first subpass is the G-buffer pass, the second one is the lighting pass:
The first subpass uses attachments 0 and 1 as render targets, and attachment 2 as depth-stencil buffer.
The AttachmentReference struct defines the attachment number as well as its state during the subpass.
The second subpass uses attachments 0 and 1 as input attachments, attachment 2 as depth-stencil buffer, and attachment 3 as render target:
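The attachment usage of the two subpasses can be sketched as plain data. AttachmentReference below is a simplified stand-in for the engine struct of the same name, SubpassSketch and MakeSubpasses are hypothetical, and the resource states assigned to each reference are assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for the engine declarations. Enumerator values are arbitrary.
enum RESOURCE_STATE
{
    RESOURCE_STATE_RENDER_TARGET,
    RESOURCE_STATE_DEPTH_WRITE,
    RESOURCE_STATE_INPUT_ATTACHMENT
};

// Attachment number plus the state the attachment is in during the subpass.
struct AttachmentReference
{
    std::uint32_t  AttachmentIndex;
    RESOURCE_STATE State;
};

// Hypothetical container for the references used by one subpass.
struct SubpassSketch
{
    std::vector<AttachmentReference> InputAttachments;
    std::vector<AttachmentReference> RenderTargets;
    AttachmentReference              DepthStencil{0, RESOURCE_STATE_DEPTH_WRITE};
};

inline std::vector<SubpassSketch> MakeSubpasses()
{
    // Subpass 0 (G-buffer): attachments 0 and 1 are render targets, 2 is depth-stencil.
    SubpassSketch GBuffer;
    GBuffer.RenderTargets = {{0, RESOURCE_STATE_RENDER_TARGET}, {1, RESOURCE_STATE_RENDER_TARGET}};
    GBuffer.DepthStencil  = {2, RESOURCE_STATE_DEPTH_WRITE};

    // Subpass 1 (lighting): attachments 0 and 1 are inputs, 2 is depth-stencil,
    // 3 is the render target that receives the shaded result.
    SubpassSketch Lighting;
    Lighting.InputAttachments = {{0, RESOURCE_STATE_INPUT_ATTACHMENT}, {1, RESOURCE_STATE_INPUT_ATTACHMENT}};
    Lighting.RenderTargets    = {{3, RESOURCE_STATE_RENDER_TARGET}};
    Lighting.DepthStencil     = {2, RESOURCE_STATE_DEPTH_WRITE};

    return {GBuffer, Lighting};
}
```

Note how attachments 0 and 1 change role between the subpasses: they are written as render targets in the first and read as input attachments in the second.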
Each subpass defines the states of all its attachments, and the attachments are transitioned between the states when going from one subpass to the next. However, besides attachment states, a render pass must also specify execution dependencies. In our specific example, attachments 0 and 1 are used as render targets in the first subpass and as input attachments in the second. So we need to specify a dependency from the ACCESS_FLAG_RENDER_TARGET_WRITE access type performed by the PIPELINE_STAGE_FLAG_RENDER_TARGET pipeline stage of subpass 0 to the ACCESS_FLAG_SHADER_READ access type from the PIPELINE_STAGE_FLAG_PIXEL_SHADER pipeline stage of subpass 1.
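The dependency described above can be written down as a small record. The flag names come from the text, but the bit values below are placeholders, and the struct layout is an assumption modeled on the description rather than the engine's actual declaration; MakeGBufferToLightingDependency is a hypothetical helper.

```cpp
#include <cstdint>

// Stand-in flag enums; the bit values are placeholders, not the engine's values.
enum PIPELINE_STAGE_FLAGS : std::uint32_t
{
    PIPELINE_STAGE_FLAG_PIXEL_SHADER  = 1u << 0,
    PIPELINE_STAGE_FLAG_RENDER_TARGET = 1u << 1
};

enum ACCESS_FLAGS : std::uint32_t
{
    ACCESS_FLAG_SHADER_READ         = 1u << 0,
    ACCESS_FLAG_RENDER_TARGET_WRITE = 1u << 1
};

// Assumed layout: source and destination subpass, stage, and access type.
struct SubpassDependencyDesc
{
    std::uint32_t        SrcSubpass;
    std::uint32_t        DstSubpass;
    PIPELINE_STAGE_FLAGS SrcStageMask;
    PIPELINE_STAGE_FLAGS DstStageMask;
    ACCESS_FLAGS         SrcAccessMask;
    ACCESS_FLAGS         DstAccessMask;
};

inline SubpassDependencyDesc MakeGBufferToLightingDependency()
{
    SubpassDependencyDesc Dep{};
    // Render target writes of the G-buffer subpass ...
    Dep.SrcSubpass    = 0;
    Dep.SrcStageMask  = PIPELINE_STAGE_FLAG_RENDER_TARGET;
    Dep.SrcAccessMask = ACCESS_FLAG_RENDER_TARGET_WRITE;
    // ... must be visible to pixel shader reads of the lighting subpass.
    Dep.DstSubpass    = 1;
    Dep.DstStageMask  = PIPELINE_STAGE_FLAG_PIXEL_SHADER;
    Dep.DstAccessMask = ACCESS_FLAG_SHADER_READ;
    return Dep;
}
```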
Execution dependencies are a very complicated topic and are beyond the scope of this tutorial.
Finally, when we have all the pieces that describe the render pass, we can populate the RenderPassDesc structure and create the render pass object:
Creating a pipeline state object that uses an explicit render pass is mostly the same as creating a PSO that uses render targets, with one difference: the PSO description structure should use the pRenderPass and SubpassIndex members:
Note that when pRenderPass is not null, all render target formats as well as the depth-stencil format must be TEX_FORMAT_UNKNOWN, and the number of render targets must be 0.
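The rule above is easy to check before creating the pipeline. The sketch below uses a hypothetical GraphicsPipelineSketch struct whose member names are assumptions modeled on the text (only pRenderPass and SubpassIndex are named by the tutorial), and the check itself is an illustration, not the engine's validation code.

```cpp
#include <cstdint>

// Simplified stand-ins. Enumerator values are arbitrary.
enum TEXTURE_FORMAT { TEX_FORMAT_UNKNOWN, TEX_FORMAT_RGBA8_UNORM, TEX_FORMAT_D32_FLOAT };

constexpr std::uint32_t MAX_RENDER_TARGETS = 8; // assumed limit for this sketch

// Hypothetical subset of a graphics pipeline description.
struct GraphicsPipelineSketch
{
    const void*    pRenderPass      = nullptr; // stand-in for a render pass object pointer
    std::uint32_t  SubpassIndex     = 0;
    std::uint32_t  NumRenderTargets = 0;
    TEXTURE_FORMAT RTVFormats[MAX_RENDER_TARGETS] = {}; // all TEX_FORMAT_UNKNOWN
    TEXTURE_FORMAT DSVFormat        = TEX_FORMAT_UNKNOWN;
};

// Returns true when the description obeys the rule: with an explicit render pass,
// no render target or depth-stencil formats may be specified.
inline bool FormatsConsistentWithRenderPass(const GraphicsPipelineSketch& Desc)
{
    if (Desc.pRenderPass == nullptr)
        return true; // the rule only applies when an explicit render pass is used

    if (Desc.NumRenderTargets != 0 || Desc.DSVFormat != TEX_FORMAT_UNKNOWN)
        return false;

    for (TEXTURE_FORMAT Fmt : Desc.RTVFormats)
    {
        if (Fmt != TEX_FORMAT_UNKNOWN)
            return false;
    }
    return true;
}
```

The formats are left unknown because they are already fixed by the render pass attachments that the selected subpass references.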
The only backend that currently natively supports input attachments is Vulkan, and subpass attachments are only supported in GLSL. To define subpass inputs in the shader, use the following syntax:
In the shader, use the subpassLoad function to load the subpass data:
Note that the subpassLoad function does not take the position because it is implicitly defined by the position of the current fragment.
In all other backends input attachments should be defined as regular textures and accessed appropriately:
The final part of the render passes API is the framebuffer. The framebuffer encapsulates the actual textures that will be used as attachments in the render pass. The framebuffer must use exactly the same number of attachments as the render pass, and the texture view formats must exactly match the corresponding render pass attachment formats. To create a framebuffer, prepare a FramebufferDesc structure and call the IRenderDevice::CreateFramebuffer method:
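The compatibility requirement between a framebuffer and its render pass can be illustrated with a small check. This is a sketch of the rule only: FramebufferMatchesRenderPass is a hypothetical helper, the enum is a stand-in, and the formats used in the example are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in. Enumerator values are arbitrary.
enum TEXTURE_FORMAT
{
    TEX_FORMAT_UNKNOWN,
    TEX_FORMAT_RGBA8_UNORM,
    TEX_FORMAT_R32_FLOAT,
    TEX_FORMAT_D32_FLOAT
};

// Returns true when the framebuffer provides exactly one texture view per render
// pass attachment and every view format equals the attachment format.
inline bool FramebufferMatchesRenderPass(const std::vector<TEXTURE_FORMAT>& AttachmentFormats,
                                         const std::vector<TEXTURE_FORMAT>& ViewFormats)
{
    if (AttachmentFormats.size() != ViewFormats.size())
        return false; // attachment counts must be identical

    for (std::size_t i = 0; i < AttachmentFormats.size(); ++i)
    {
        if (AttachmentFormats[i] != ViewFormats[i])
            return false; // formats must match exactly, attachment by attachment
    }
    return true;
}
```

For the four-attachment render pass of this tutorial, a framebuffer with only three views, or with a view whose format differs from the attachment description, would fail this check.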
There are three main subpass commands: BeginRenderPass, NextSubpass, and EndRenderPass.
BeginRenderPass, as the name suggests, begins a render pass and starts the first subpass. To begin a render pass, besides the render pass itself we also need to specify a framebuffer, as well as clear values for all attachments that use the ATTACHMENT_LOAD_OP_CLEAR load operation:
In the first subpass of our render pass, we render the scene. Then we call NextSubpass to move to the lighting subpass and draw the lights. Finally, we call EndRenderPass to finish the render pass:
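The ordering of the three commands can be modeled with a tiny state tracker. RenderPassTracker below is a hypothetical class, not part of Diligent Engine; it only mirrors the command names from the text. The rule that the pass may end only after the last subpass has been reached is an assumption borrowed from Vulkan's behavior.

```cpp
#include <cstdint>

// Hypothetical tracker that mimics the Begin / Next / End command ordering.
class RenderPassTracker
{
public:
    // Starts the pass and makes subpass 0 current.
    // Fails if a pass is already active or the pass has no subpasses.
    bool BeginRenderPass(std::uint32_t SubpassCount)
    {
        if (m_Active || SubpassCount == 0)
            return false;
        m_Active       = true;
        m_SubpassCount = SubpassCount;
        m_Current      = 0;
        return true;
    }

    // Advances to the next subpass. Fails outside a pass or past the last subpass.
    bool NextSubpass()
    {
        if (!m_Active || m_Current + 1 >= m_SubpassCount)
            return false;
        ++m_Current;
        return true;
    }

    // Ends the pass. Assumed to be valid only while the last subpass is current.
    bool EndRenderPass()
    {
        if (!m_Active || m_Current + 1 != m_SubpassCount)
            return false;
        m_Active = false;
        return true;
    }

    bool          IsActive() const { return m_Active; }
    std::uint32_t CurrentSubpass() const { return m_Current; }

private:
    bool          m_Active       = false;
    std::uint32_t m_SubpassCount = 0;
    std::uint32_t m_Current      = 0;
};
```

For the two-subpass render pass of this tutorial, the valid sequence is BeginRenderPass(2), scene draw calls, NextSubpass(), light draw calls, EndRenderPass().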
A very important aspect of render passes that needs to be mentioned again is that state transitions are not allowed between the BeginRenderPass and EndRenderPass calls. The tutorial explicitly transitions all resources it uses to the correct states during initialization:
and then uses the RESOURCE_STATE_TRANSITION_MODE_VERIFY mode with every call that requires a state transition mode.
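The restriction on transition modes boils down to a one-line rule, sketched below. The enum is a simplified stand-in for the engine's transition mode enumeration, and IsTransitionModeAllowed is a hypothetical helper that illustrates the rule, not an engine function.

```cpp
// Simplified stand-in. Enumerator values are arbitrary.
enum RESOURCE_STATE_TRANSITION_MODE
{
    RESOURCE_STATE_TRANSITION_MODE_NONE,       // perform no transition and no check
    RESOURCE_STATE_TRANSITION_MODE_TRANSITION, // transition the resource to the required state
    RESOURCE_STATE_TRANSITION_MODE_VERIFY      // only verify that the state is already correct
};

// Returns false for the one combination the text forbids:
// requesting a state transition while a render pass is active.
inline bool IsTransitionModeAllowed(RESOURCE_STATE_TRANSITION_MODE Mode, bool RenderPassActive)
{
    return !(RenderPassActive && Mode == RESOURCE_STATE_TRANSITION_MODE_TRANSITION);
}
```

This is why the resources are transitioned up front: once they are in the right states, the verify mode can be used inside the render pass without triggering any transition.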
Diligent Engine's render passes API largely resembles Vulkan's, so the Vulkan specification provides the most comprehensive description. ARM Software maintains a list of Vulkan best practices for mobile developers that covers attachment load/store operations, attachment layout transitions, and subpasses.