New Project : Architecture and Performances.

Hello there :-).

As I have promised the last time, I have started again from the beginning my engine. I am going to explain why I had need to change my architecture, and explain my differents choices.

To begin, after thinking a lot about Illumination in real time, I found that it’s not useful to have Global Illumination, with no « correct » direct illumination. That’s why, I will implement a Phong Shading gamma correct.

About performances, my old code was very very bad (you can see the proof after), I had to implement a powerful multi stage culling. Moreover, I had to use the latest technologies of GPU to increase performances. Therefore, I am using Compute Shaders, Bindless Textures, Multi Drawing.

For the beginning, I will explain how my engine works internally. I am using several big buffers, for example one which contains all of Vertex, another one which contains all of Index, Command, and so on. The Matrix, Command, AABB3D buffers are filled in each frame. After, a view frustrum culling on GPU is performed, therefore, the Command Buffer is also bind to SHADER_STORAGE_BUFFER and INDIRECT_DRAW. After culling, a pre Depth pass is performed and, finally, the final model pass is performed. We will see the details later.

To have a certain coherence in the scene, like glasses on cupboard, if you move the cupboard, your glasses have to move as well, you have to have a scene graph. To do that, I already explained that I am using a system of Node. But now, I have improve this model with one obvious feature. If the Parent Node don’t have to be render, I will not render the children Nodes and Models as well. My system is decomposed in 3 Objects. One SceneManager, One Node, and one Part of this Node (There may be several kind of Part (lights for example is another part)) : ModelNode.

It is the Model Node. It is a part of a Node which contains a class like « pair » between one Model and One Matrix.
ModelNode contains :
– One bounding box in World Landmark (it depends of the Global Matrix of the parent Node).
– One matrix which contains a relative landmark on the Global Matrix of the Parent Node. If one transformation is performed by this Object, the bounding boxe for the ModelNode and the bounding boxes of « all » parents Node are recomputed.

The function « PushInPipeline » can be strange for you, that is why I am going to try to explain you its role. As I already said, I have several big buffers, this function will modify few buffers, like Matrix, bounding box, and commands buffer. You have exactly the same number of Command, AABB, and Matrix. It’s not obvious to have exactly the same number for Matrix and Command, for examples, if you have one model with 300 meshes, you have, normally, 300 commands, but one matrix. In my system, you have therefore 300 matrix. After you will see why.

The Node object is a root of a Sub-Tree in our Scene Graph. Each child of a Node depends of a Landmark of this root, and the bounding box of this root depends of childs bounding boxes, therefore, the building of this « tree » may be one down approach (if a Root transformation is performed) or one up approach(if a child transformation or Model transformation is performed).
This Node contains :
-a Parent
-a vector of children
-a vector of Model
-a globalFactor : It is a global scale factor (useful for the future lights if scaling Node).

As you can see, the Scene Manager contains two objects, one pointer on Node and one Camera, the last one is not important here, and it can may be move later to Node for example. The objective of this class is to give to user one interface to render all the scene, here, it doesn’t matter the lights, but later, we can add renderLights for example. The function render call succesively initialize, renderModels, renderFinal, the function initlalize update Camera Position and Frustrum, and « reset » our pipeline, the function renderModels render all Models and material information on FrameBuffer, the last one function draws the final texture on the screen with post processing.

Wait, it seems like highly to the « normal » pipeline, right?
Yes, but now, we can improve performances with pre Depth Pass and View Frustrum culling.

What is View Frustrum Culling? The idea behind this term is to reduce a number of « Draw Objects » in this frame. So, you will test if the object is or not into the Camera View, and if not, you don’t render it.
How to perform efficiently this culling?
For one unaccurate culling, you have to perform test on the CPU. Globally, this test is perform in the Node function pushModelsInPipeline.
To have a really accurate and efficient culling, because you have a high number of bounding boxes, you should use your GPU to perform culling. So, a good combinaison between Draw Indirects and Shader Buffer storage will be your friend with the Compute Shaders.

A pre Depth Pass is to avoid to render on the color buffers data that you will change with a nearest object. So, the rendering take 2 passes, but it is, in general, most efficient :).

There is a little diagram with and without optimizations. (ms / size)

As you can see on this picture, the performances gave by culling are almost constant, but a Pre Depth pass performances depends of the size of FrameBuffer’s textures.
We gain nearly twice performances with pre depth pass and culling, but if you have a higher number of objects (here is just Sponza drew), you will earn too much performances :-).

If you want to see the code or the documentation of the code :

See you next time :-).

Kisses !

Publicités

Ambient Occlusion revisited, reflections, global illumination and new project

Hello there.

Today, we are going to talk about 4 subjects.
The first one is a new computation of Ambient Occlusion, the second one is reflection, as I said the last time.

Even if we are going to talk of several subjects, it will be fast because I don’t have too much things to say on every field.

So, for the ambient occlusion

We remind to:

$\displaystyle{AO=\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{x},\overrightarrow{\omega})\cdot\cos(\theta)\cdot d\omega}$
$\displaystyle{AO=\frac{1}{\pi}\int_{[0,2\pi]}\int_{[0,\frac{\pi}{2}]}V(\overrightarrow{x},\overrightarrow{\omega})\cdot\cos(\theta)\sin(\theta)\cdot d\theta d\phi}$
$\displaystyle{AO=1-\frac{1}{2\pi}\int_{[0,2\pi]}\int_{[0,\frac{\pi}{2}]}\overline{V}(\overrightarrow{x},\overrightarrow{\omega})\cdot\sin(2\theta)\cdot d\theta d\phi}$
$\displaystyle{AO\approx 1-\frac{\pi}{2n}\sum_{i=0}^{n}\overline{V}(\overrightarrow{x},\overrightarrow{\omega_{i}})\cdot\sin(2\theta_{i})}$

where $\theta_{i}$ and $\omega_{i}$ are a « random » variable.

If you want the « real AO » (real is not just, because Ambient Occlusion is not one physic based measure), you have to raytrace this function, and take care about all of these members.

So, we want to approximate again this formula, and, now, we have 2 textures, and the second one is the normal map, another own the position map.

If you want to know the AO in one point, you have to take few samplers around this pixel (the sampling area is a disk in my code).
So, if your ray is in the hemisphere, the angle between ray and normal is $\theta\in[-\frac{\pi}{2},+\frac{\pi}{2}]$ . You can improve this formula
with a factor distance
We don’t need really the view function, indeed, if you take a sampler no « empty », you have one intersection, and if the cell is empty, $\overrightarrow{n}\cdot\overrightarrow{\omega_{i}}=0$, so doesn’t matter.

#version 440 core

/* Uniform */
#define CONTEXT 0

layout(shared, binding = CONTEXT) uniform Context
{
vec4 invSizeScreen;
vec4 cameraPosition;
};

layout(binding = 0) uniform sampler2D positionSampler;
layout(binding = 1) uniform sampler2D normalSampler;

// Poisson disk
vec2 samples[16] = {
vec2(-0.114895, -0.222034),
vec2(-0.0587226, -0.243005),
vec2(0.249325, 0.0183541),
vec2(0.13406, 0.211016),
vec2(-0.382147, -0.322433),
vec2(0.193765, -0.460928),
vec2(0.459788, -0.196457),
vec2(-0.0730352, 0.494637),
vec2(-0.323177, -0.676799),
vec2(0.198718, -0.723195),
vec2(0.722377, -0.201672),
vec2(0.396227, 0.636792),
vec2(-0.372639, -0.927976),
vec2(0.474483, -0.880265),
vec2(0.911652, 0.410962),
vec2(-0.0112949, 0.999936),
};

in vec2 texCoord;
out float ao;

void main(void)
{
vec3 positionAO = texture(positionSampler, texCoord).xyz;
vec3 N = texture(normalSampler, texCoord).xyz;

vec2 radius = vec2(4.0 * invSizeScreen.xy); // 4 pixels
float total = 0;

for(uint i = 0; i < 16; i ++)
{
vec2 texSampler = texCoord + samples[i] * radius;
vec3 dirRay = texture(positionSampler, texSampler).xyz - positionAO;
float cosTheta = dot(normalize(dirRay), N);

if(cosTheta > 0.01 && dot(dirRay, dirRay) < 100.0)
{
ao += sin(2.0 * acos(cosTheta));
total += 1.0;
}
}

// Don't multiply by Pi to get a better result
// addition is here if total == 0
ao = 1.0 - ao / (2.0 * total + 0.01);
}


For that, you perform the calculation over the screen divided by 4 (2 for width, and 2 for height), and you can apply a gaussian blur with separate pass thanks to compute shaders.
For one rendering in 1920 * 1080p, (Thus, the occlusion map size is : 1024 * 1024), the ambient occlusion pass take less than 2ms to be computed.

I have nothing to say about reflections, I don’t use the optimisations of László Szirmay-Kalos, Barnabás Aszódi, István Lazányi and Mátyás Premecz with their Approximate Ray-Tracing on the GPU with Distance Impostors.

So, it is only a environment map, like omnidirectional shadow (the prior article).

Global Illumination (GI), a very interesting subject where researchers spend a lot of time…

The GI, is bounces of light on different surfaces.

You can see blue, green, and red on the floor.

I don’t implement a very efficient model, so I gonna to present you only the way and the idea that I found.

So, for our mathematical model, we use $\overline{\cos}$ for cos clamped between 0 and 1.

We need to use BRDF (Bidirectional reflectance distribution function). The BRDF is a function (really :p? ) which is given 2 directions($\omega_{i}$ and $\omega_{o}$) and return a rapport of « power of outgoing vector on power of incident vector ». This function verify these properties :

$\displaystyle{\int_{\Omega}BRDF(\omega_{i},\omega_{o})\cdot\cos(\theta_{r})\cdot d\omega_{r}\leq 1}$
$\displaystyle{BRDF(\omega_{i},\omega_{o})=BRDF(\omega_{o},\omega_{i})}$

For a lambertian surface, the light is reflected with the same intensity all over the hemisphere, so, we can lay $BRDF=K$ and, with these previous property

$\displaystyle{\int_{\Omega}BRDF(\omega_{i},\omega_{o})\cdot\cos(\theta_{r})\cdot d\omega_{r}\leq 1}$
$\displaystyle{\int_{\Omega}K\cdot\cos(\theta_{r})d\omega_{r}\leq 1}$
$\displaystyle{K\pi\leq 1}$
$\displaystyle{K\leq\frac{1}{\pi}}$

so

$\displaystyle{BRDF_{lambertian}=\frac{\rho}{\pi}}$ where $\rho\in[0, 1]$.

$\rho$ is called « Albedo ». For the clean snow, $\rho$ is close to 1, for a very dark surface $\rho$ is close to 0.

The basic idea behind my algorithm is based on Instant Radiosity from Keller.

We render the scene from the light source, on a very little FrameBuffer (8*8, 16 * 16, 32 * 32, or 64 * 64 (but it’s already too much) ), and thanks to shader buffer storage, for each pixel, we can store on this buffer our lights.

$\displaystyle{L_{o}=\frac{\rho\cdot K_{d}}{N\cdot\pi}\cdot L_{i}\cdot\overline{\cos(\theta_{i})}PHONG}$
where PHONG (Phong lighting on google) is calculation of the power of light in x and Kd is the colour of the object touch by light.

After that, you just have to render lights like other.

So, for the new project, I will begin to write one open source graphic engine. I will try to do it totally bindless. (Zero bind texture : extension bindless texture, Few bind buffer : Only few big buffers). I will implement a basic tree of partitionning space totally in GPU thanks to compute shaders. I will implement phong shading or phong brdf (if I success ^^).

See you !