This tutorial demonstrates how to implement order-independent transparency (OIT) methods to render transparent objects without sorting.
Transparent objects have long posed a challenge in computer graphics because the order in which they are rendered affects the final image. This is due to the non-commutative nature of the blending operation - i.e., the order in which transparent surfaces are composited over each other matters. Sorting objects by their depth is only a partial solution because in many cases there is no single correct order that works for all pixels (for example, when transparent objects intersect, for self-overlapping transparent objects with complex geometry, or for nested objects). This is where order-independent transparency (OIT) comes in. OIT methods allow transparent objects to be rendered in any order without producing visual artifacts. This tutorial shows how to implement OIT in Diligent Engine.
This tutorial demonstrates three different approaches to rendering transparent objects: unsorted alpha blending, weighted-blended OIT, and layered OIT.
Unsorted alpha blending is the simplest approach. It is used here to illustrate the kind of visual artifacts that appear when rendering transparent objects without sorting.
Weighted-blended OIT assigns a weight to each transparent surface based on its depth and transparency, giving higher weights to surfaces that are closer to the camera and more opaque. This method is easy to implement and very efficient, but because it is an approximation, it has limitations, most notably with high-opacity surfaces.
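As a rough illustration (not the tutorial's actual shaders), the accumulation pass of weighted-blended OIT might look like the sketch below. The render-target setup, the semantic names, and the particular weight function are assumptions: RT0 is assumed to use additive blending, and RT1 is assumed to multiply the destination by `(1 - a)`.

```hlsl
// Sketch of a weighted-blended OIT accumulation pixel shader (assumed setup):
// RT0 (e.g. RGBA16F) accumulates weighted premultiplied color and weighted alpha
// with additive blending; RT1 (e.g. R8) accumulates the product of (1 - alpha)
// via SrcBlend = ZERO, DestBlend = INV_SRC_COLOR.
struct PSOutput
{
    float4 Accum     : SV_Target0; // sum(w * a * rgb), sum(w * a)
    float4 Revealage : SV_Target1; // product of (1 - a), formed by the blend state
};

PSOutput AccumulatePS(in float4 Pos       : SV_Position,
                      in float4 Color     : COLOR, // surface color and opacity
                      in float  ViewDepth : DEPTH) // linear view-space depth
{
    float a = Color.a;

    // A depth- and opacity-based weight: closer and more opaque surfaces get
    // larger weights. This particular formula is only an example.
    float w = a * clamp(0.03 / (1e-5 + pow(ViewDepth / 200.0, 4.0)), 1e-2, 3e3);

    PSOutput Out;
    Out.Accum     = float4(Color.rgb * a, a) * w;
    Out.Revealage = float4(a, a, a, a);
    return Out;
}
```

A full-screen resolve pass then combines the accumulated color with the background using the revealage value.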
Layered OIT is based on the method of Fang Liu et al. However, instead of storing the individual fragments, the algorithm builds a transmittance function that allows the transparent objects to be rendered in any order with additive blending.
Consider three surfaces A, B, and C that are composited on top of each other in that order. The final color is given by:

$$RGB = RGB_C + T_C \cdot RGB_B + T_C \cdot T_B \cdot RGB_A$$

where $RGB_A$, $RGB_B$, and $RGB_C$ are the (alpha-premultiplied) colors of the surfaces, and $T_B$ and $T_C$ are the transmittances of the corresponding surfaces. The transmittance is the fraction of light that passes through a given surface.
We can rewrite this more generally as:
where Tc(X)
is the cumulative transmittance function from the camera to the surface X
. This cumulative transmittance is the product of the transmittances of all surfaces between the camera and X
. If we know the transmittance function for each pixel, we can render the transparent objects in any order using additive blending.
In our implementation, the transmittance function is represented by up to `K` closest layers, each storing that layer's transmittance and depth. Any additional layers are merged into a tail, which contains the total number of merged layers and the total transmittance. Each layer is packed into a 32-bit integer where the top 24 bits store the depth and the bottom 8 bits store the transmittance. This design allows us to sort layers by depth using atomic operations. The following function packs the layer data into a 32-bit integer:
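A minimal sketch of what such a packing function might look like (the name `PackOITLayer` and the exact scaling are assumptions):

```hlsl
// Pack depth into the top 24 bits and transmittance into the bottom 8 bits, so
// that integer comparison (and atomic min) orders layers front to back.
uint PackOITLayer(float Depth, float Transmittance)
{
    uint D = uint(clamp(Depth,         0.0, 1.0) * 16777215.0); // 24-bit depth
    uint T = uint(clamp(Transmittance, 0.0, 1.0) * 255.0);      // 8-bit transmittance
    return (D << 8u) | T;
}
```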
Layer data is stored in a structured buffer, and the tail is stored in an RGBA8 texture.
After rendering all opaque objects, the algorithm proceeds with the following steps (described in detail below):

1. Clear the layers buffer.
2. Render the transparent objects to build the transmittance function: the closest layers are inserted into the layers buffer, and any excess layers are merged into the tail.
3. Attenuate the background using the accumulated transmittance.
4. Render the transparent objects again with additive blending, using the transmittance data to composite them correctly.
The layers buffer is cleared by a compute shader that sets the value of each element to `0xFFFFFFFF`, which indicates an empty layer.
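A possible clear shader is sketched below; the buffer and constant names as well as the thread-group size are assumptions:

```hlsl
RWStructuredBuffer<uint> g_Layers; // NUM_OIT_LAYERS entries per pixel

cbuffer ClearConstants
{
    uint g_TotalElements; // width * height * NUM_OIT_LAYERS
};

[numthreads(64, 1, 1)]
void ClearLayersCS(uint3 Id : SV_DispatchThreadID)
{
    if (Id.x < g_TotalElements)
        g_Layers[Id.x] = 0xFFFFFFFFu; // empty layer
}
```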
The main challenge is designing an algorithm that merges layers in parallel, given that multiple surfaces covering the same pixel might be processed simultaneously in arbitrary order. We build upon the algorithm by Fang Liu et al. and use atomic operations. Recall that we pack the layer depth and transmittance into a 32-bit integer, making atomic min suitable for inserting a new layer into the buffer while keeping the buffer sorted by depth. To insert a new layer, the shader performs an atomic min operation against each stored layer in order. The operation returns the previous value; if the new layer is smaller, it replaces the stored value, and the returned previous value in turn needs to be inserted at a later position or merged into the tail.
Below is the key part of the pixel shader:
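The listing below is a sketch of how the beginning of such a shader might look; the resource and constant names (`g_Layers`, `g_ScreenWidth`, `NUM_OIT_LAYERS`) are placeholders rather than the tutorial's actual identifiers, and the listing continues in the next sketch:

```hlsl
#define NUM_OIT_LAYERS    4
#define OPACITY_THRESHOLD (1.0 / 255.0)

RWStructuredBuffer<uint> g_Layers; // NUM_OIT_LAYERS entries per pixel, sorted by packed depth

cbuffer Constants
{
    uint g_ScreenWidth; // used to flatten the pixel coordinate into a buffer offset
};

[earlydepthstencil]
float4 BuildLayersPS(in float4 Pos   : SV_Position,
                     in float4 Color : COLOR) : SV_Target // tail render target
{
    float D = Pos.z;   // pixel depth
    float A = Color.a; // object opacity

    // Opacities below 1/255 cannot be represented in 8 bits - treat as fully transparent
    if (A < OPACITY_THRESHOLD)
        discard;

    float T = 1.0 - A; // transmittance of this surface
```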
The algorithm starts by obtaining the pixel depth and the alpha value of the transparent object. Note that alpha represents the object's opacity, while we need the transmittance, which is `1 - A`. The opacity is then compared with the opacity threshold, set to `1.0/255.0`: since we use 8 bits to store transmittance, any opacity below this value corresponds to a fully transparent surface and the fragment is discarded.
The algorithm then packs the layer data and starts inserting it into the buffer:
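Continuing the sketch above, again under assumed names (the tail blend state is assumed to add the color channels and multiply the alpha channel, and `PackOITLayer` is the hypothetical helper shown earlier):

```hlsl
    // Pack this surface into a 32-bit layer value
    uint Layer  = PackOITLayer(D, T);
    uint Offset = (uint(Pos.y) * g_ScreenWidth + uint(Pos.x)) * NUM_OIT_LAYERS;

    for (uint i = 0u; i < NUM_OIT_LAYERS; ++i)
    {
        uint OrigLayer;
        // Atomically keep the smaller (closer) of the stored value and the new one
        InterlockedMin(g_Layers[Offset + i], Layer, OrigLayer);
        if (OrigLayer == 0xFFFFFFFFu)
        {
            // The slot was empty - the layer has been inserted, nothing else to do
            Layer = 0xFFFFFFFFu;
            break;
        }
        // The larger of the two values must be inserted at a later position
        Layer = max(Layer, OrigLayer);
    }

    // Neutral output: leaves the tail unchanged under the assumed blend state
    float4 TailColor = float4(0.0, 0.0, 0.0, 1.0);
    if (Layer != 0xFFFFFFFFu)
    {
        // No free slot: merge the remaining layer into the tail - add 1/255 to the
        // layer count (red channel) and multiply the total transmittance (alpha)
        TailColor = float4(1.0 / 255.0, 0.0, 0.0, float(Layer & 0xFFu) / 255.0);
    }
    return TailColor;
}
```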
For each position in the buffer, the shader performs an atomic min operation in an attempt to insert the new layer. The operation returns the previous value stored in the buffer (`OrigLayer`), and the following outcomes are possible:

- `OrigLayer` is `0xFFFFFFFF` - the layer was inserted into an empty slot in the buffer. In this case, nothing else needs to be done.
- `OrigLayer` contains another value larger than the new layer (the layer was inserted and displaced it), or the layer was not inserted and `OrigLayer` contains a value smaller than the new layer. In both cases, the algorithm needs to insert the maximum of the two values into the next position in the buffer.

Let's take a look at the following example. Suppose the buffer contains the values `10`, `20`, `30`, and we want to insert the value `15`. The algorithm performs the following steps:

1. Try to insert `15` into the first position. Since `10 < 15`, the buffer is not updated. `OrigLayer = 10`, `Layer = max(15, 10) = 15`.
2. Try to insert `15` into the second position. Since `15 < 20`, the buffer is updated. `OrigLayer = 20`, `Layer = max(15, 20) = 20`. The buffer now contains `10`, `15`, `30`.
3. Try to insert `20` into the third position. Since `20 < 30`, the buffer is updated. `OrigLayer = 30`, `Layer = max(20, 30) = 30`. The buffer now contains `10`, `15`, `20`, and we are left with the value `30` that needs to be merged into the tail.

Note that multiple shader invocations can attempt to insert different layers into the same place in the buffer. In general, the buffer contents may change between loop iterations. However, the following considerations ensure that the insertion algorithm works correctly:

- When an invocation is processing position `i`, the value it carries may only be inserted at position `i` or later, no matter what other invocations are doing.
- The value inserted at position `i` is larger than the value at position `i-1`. This ensures that the buffer is always sorted.

After the loop finishes, there are two possible outcomes: either the layer was inserted into an empty slot in the buffer, in which case `Layer == 0xFFFFFFFFu`, or we are left with a value that needs to be merged into the tail. The tail is updated using alpha blending: we accumulate the total number of tail layers in the color channel and the total transmittance in the alpha channel.
An important property of this algorithm is that it is stable: the order in which transparent objects are rendered does not affect the final result. The `K` closest layers are stored in the buffer, and all other layers are merged into the tail. The tail blending operations are commutative, so the order in which they are composited does not matter.
Note: the `earlydepthstencil` attribute is crucial because writing to a UAV normally disables early depth-stencil tests. We re-enable them to ensure opaque objects correctly occlude transparent ones behind them.
Before rendering the transparent objects, we first attenuate the background based on the computed transmittance function. We draw a full-screen quad that multiplies the background color by the product of all transmittances:
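A sketch of such a full-screen shader is shown below. The resource names and the blend setup are assumptions; the output is expected to be blended with `DestBlend = SRC_ALPHA` so that the background is multiplied by `T`:

```hlsl
#define NUM_OIT_LAYERS 4

StructuredBuffer<uint> g_Layers;
Texture2D<float4>      g_Tail;

cbuffer Constants
{
    uint g_ScreenWidth;
};

float4 AttenuateBackgroundPS(in float4 Pos : SV_Position) : SV_Target
{
    uint  Offset = (uint(Pos.y) * g_ScreenWidth + uint(Pos.x)) * NUM_OIT_LAYERS;
    float T      = 1.0; // total transmittance of all transparent layers

    uint i = 0u;
    for (; i < NUM_OIT_LAYERS; ++i)
    {
        uint Layer = g_Layers[Offset + i];
        if (Layer == 0xFFFFFFFFu)
            break; // no more layers for this pixel
        T *= float(Layer & 0xFFu) / 255.0;
    }

    // If every slot is occupied, there may be a tail - apply its total transmittance
    if (i == NUM_OIT_LAYERS)
        T *= g_Tail.Load(int3(int2(Pos.xy), 0)).a;

    if (T == 1.0)
        discard; // the background is unchanged

    // Alpha blending multiplies the background color by T
    return float4(0.0, 0.0, 0.0, T);
}
```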
This shader iterates over all layers in the buffer to accumulate the total transmittance value. If the maximum number of layers has been reached, the tail's transmittance is taken from the alpha channel of the tail texture. If `T` is 1.0 (meaning the background is unchanged), we simply discard the pixel. Otherwise, alpha blending scales the background color by `T`.
Finally, we render the transparent objects again, this time using the transmittance data to composite them correctly. The pixel shader computes how much light was transmitted from the current fragment by accumulating the transmittances of all layers closer to the camera:
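A sketch of such a pixel shader, under the same assumed names (the opaque depth buffer `g_OpaqueDepth` is also an assumption), might look like this:

```hlsl
#define NUM_OIT_LAYERS 4

StructuredBuffer<uint> g_Layers;
Texture2D<float4>      g_Tail;
Texture2D<float>       g_OpaqueDepth; // depth buffer from the opaque pass

cbuffer Constants
{
    uint g_ScreenWidth;
};

// Rendered with additive blending (SrcBlend = ONE, DestBlend = ONE)
float4 CompositePS(in float4 Pos   : SV_Position,
                   in float4 Color : COLOR) : SV_Target
{
    // Manual depth test against the opaque geometry (no earlydepthstencil here)
    if (Pos.z >= g_OpaqueDepth.Load(int3(int2(Pos.xy), 0)))
        discard;

    uint  Offset = (uint(Pos.y) * g_ScreenWidth + uint(Pos.x)) * NUM_OIT_LAYERS;
    uint  D      = uint(clamp(Pos.z, 0.0, 1.0) * 16777215.0); // this fragment's 24-bit depth
    float T      = 1.0; // transmittance from the camera to this fragment

    for (uint i = 0u; i < NUM_OIT_LAYERS; ++i)
    {
        uint Layer = g_Layers[Offset + i];
        // Stop at an empty slot or once we reach this fragment's own depth
        if (Layer == 0xFFFFFFFFu || (Layer >> 8u) >= D)
            break;
        T *= float(Layer & 0xFFu) / 255.0; // the layer is in front - attenuate
    }

    // Note: if the fragment was merged into the tail, an average contribution
    // derived from the tail texture (layer count in the color channel, total
    // transmittance in alpha) is additionally applied. Omitted in this sketch.

    // Premultiplied color attenuated by everything in front of it
    return float4(Color.rgb * Color.a * T, 1.0);
}
```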
Here, `T` accumulates the transmittances of all layers in front of the current fragment. If there are more than `K` layers, we take the average contribution from the tail. This ensures correct blending even when many layers overlap.
Note: since this pass does not use the `earlydepthstencil` attribute, the depth test must be done manually in the pixel shader.

The layered OIT algorithm is both efficient and stable. It produces correct results when the number of overlapping transparent objects does not exceed the number of layers `K`. It also handles high-opacity objects (even fully opaque ones) correctly. For `K` layers, the algorithm requires `K * 4 + 4` bytes per pixel; for four layers, that amounts to 20 bytes. By adjusting `K`, you can balance memory usage and performance against image quality.
However, the algorithm is less efficient for large numbers of overlapping, highly transparent objects (e.g., smoke). In such cases, the weighted-blended OIT or moment-based OIT may be more suitable.