Ambient occlusion: explanations

Hello there. I am sorry that I have not written in a long time, but I had to take my exams ^^.

So, what is ambient occlusion?

[Figure: AO_firstResult]

Here we can see, respectively: no ambient occlusion, the occlusion map, and the rendering with ambient occlusion.

So, ambient occlusion improves the shading in the scene by darkening the areas that are partially hidden from ambient light.

Generally, ambient occlusion is defined by:

\displaystyle{ka=\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{\omega})\cdot \cos(\theta)\cdot d\omega}

Ohhhh, it’s a very difficult formula, with a strange integral.

We are going to see how we can arrive at a formula of this kind.

Firstly, ambient occlusion is a « ray tracing » technique. The idea behind ambient occlusion is to launch many rays through the hemisphere oriented along the normal \overrightarrow{n}.

Obviously, a ray orthogonal to the normal contributes less than a ray parallel to it, so we can introduce a dot product between the ray and the normal into the integral. We do not integrate over the whole hemisphere volume, but only over the hemisphere’s surface.

I remind you that the infinitesimal surface element of a sphere is d\omega = R^{2}\sin(\theta)\, d\theta\, d\phi .

So, we can write:

\displaystyle{ ka = K \cdot ka_{total}, \quad ka_{total} = \int_{\Omega} V(\overrightarrow{\omega})\cdot \cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\omega}

where \Omega is the hemisphere oriented along \overrightarrow{n}, \overrightarrow{\omega} is a ray « launched » in the hemisphere, and V(\overrightarrow{\omega}) is the view function defined by

\displaystyle{V(\overrightarrow{\omega}) = \left\{\begin{matrix} 0 & \text{if occluder}\\ 1 & \text{if no occluder} \end{matrix}\right.}

K is a constant that we have to compute, because ka has to stay between 0 and 1. So now we can compute K, taking the view function always equal to 1, because we consider the case where there is no occlusion at all.

\displaystyle{\begin{array}{lcl}&&\int_{\Omega} V(\overrightarrow{\omega})\cdot \cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\omega \\  &=&\int_{\Omega}\cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\omega\\  &=&\int_{0}^{2\pi}\int_{0}^{\frac{\pi}{2}}R^2\cdot\frac{\overrightarrow{n}}{{||\overrightarrow{n}||}}\cdot\frac{\overrightarrow{\omega}}{{||\overrightarrow{\omega}||}}\cdot \sin(\theta)\cdot d\theta\, d\phi \quad \left(||\overrightarrow{\omega}||=||\overrightarrow{n}|| = R = 1 \right ) \\  &=&\int_{0}^{2\pi}\int_{0}^{\frac{\pi}{2}}\cos(\theta)\cdot \sin(\theta)\cdot d\theta\, d\phi\\  &=&\int_{0}^{2\pi}\frac{1}{2}\cdot d\phi\\  &=&\pi \end{array}}

So K=\frac{1}{\pi}, and we recover the expression of the integral given at the beginning of this article:

\displaystyle{ka=\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{\omega})\cdot \cos(\theta)\cdot d\omega}

For people who really like rigorous mathematics, I know this is not very rigorous, but it is with this « method » that we will compute our occlusion factor :).
If you prefer a more accurate approach, you can integrate over the whole hemisphere volume (with a variable radius, and a view function that returns a value between 0 and 1 depending on the distance of the occluder from the origin of the ray), and you get exactly the same formula, because the radial part reduces to \frac{1}{R}\int_{0}^{R} dr = 1, where the \frac{1}{R} comes from the view function to keep the returned value between 0 and 1.
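
Written out, that volume version looks roughly like this (still not rigorous, and the inner bracket plays the role of the view function of the surface formula):

\displaystyle{ka=\frac{1}{\pi}\int_{\Omega}\left(\frac{1}{R}\int_{0}^{R}V(\overrightarrow{\omega},r)\cdot dr\right)\cdot\cos(\theta)\cdot d\omega}

When there is no occluder at all, V(\overrightarrow{\omega},r)=1 everywhere, the inner integral equals 1, and we are back to the surface formula above.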

So now we can try to approximate this integral. We have two problems: we cannot « launch » infinitely many rays, so we only launch a few of them, and we have to use the « inverse » of the view function, \bar{V}.

\displaystyle{\begin{array}{lcl}&&\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{\omega})\cdot(\overrightarrow{n}\cdot\overrightarrow{\omega})\cdot d\omega \\  &=&1-\frac{1}{\pi}\int_{\Omega}\bar{V}(\overrightarrow{\omega})\cdot(\overrightarrow{n}\cdot\overrightarrow{\omega})\cdot d\omega\\  &\approx&1-\frac{1}{N}\sum_{i=1}^{N}\bar{V}(\overrightarrow{\omega_i})\cdot(\overrightarrow{n}\cdot\overrightarrow{\omega_i}) \end{array}}

After that, we can blur the occlusion map to improve its rendering.

So we just use a simple blur.

#version 440 core

/* Uniform */
#define CONTEXT 0
#define MATRIX 1
#define MATERIAL 2
#define POINT_LIGHT 3
#define MATRIX_SHADOW 4

layout(local_size_x = 256) in;

layout(shared, binding = CONTEXT) uniform Context
{
    uvec4 sizeScreenFrameBuffer;
    vec4 posCamera;
    mat4 invProjectionViewMatrix;
};

layout(binding = 4) uniform sampler2D AO;
layout(binding = 4, r32f) uniform image2D imageAO;

void main(void)
{
    // Start with the current pixel of the occlusion map.
    float blur = texture(AO, vec2(gl_GlobalInvocationID.xy) / sizeScreenFrameBuffer.zw).x;

    // Add the 4 neighbours on the left...
    for(int i = 4; i > 0; --i)
        blur += texture(AO, vec2((ivec2(gl_GlobalInvocationID.xy) + ivec2(-i, 0))) / sizeScreenFrameBuffer.zw).x;

    // ... and the 4 neighbours on the right.
    for(int i = 4; i > 0; --i)
        blur += texture(AO, vec2((ivec2(gl_GlobalInvocationID.xy) + ivec2(i, 0))) / sizeScreenFrameBuffer.zw).x;

    // Average of the 9 horizontal samples.
    imageStore(imageAO, ivec2(gl_GlobalInvocationID.xy), vec4(blur / 9, 0, 0, 0));
}

Now, we have to code our ambient occlusion.

[Figure: ssao_sphere_samples]

This picture is very helpful for understanding how we can use our approximation.
Indeed, we can see that if the point is red, we have V(\overrightarrow{\omega}) = 1, so \bar{V}(\overrightarrow{\omega}) = 0. You just have to test the depth buffer to know whether the point is an occluder or not.

#version 440 core

/* Uniform */
#define CONTEXT 0
#define MATRIX 1
#define MATERIAL 2
#define POINT_LIGHT 3
#define MATRIX_SHADOW 4

layout(shared, binding = CONTEXT) uniform Context
{
    uvec4 sizeScreenFrameBuffer;
    vec4 posCamera;
    mat4 invProjectionViewMatrix;
};

layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 1) uniform sampler2D position;
layout(binding = 2) uniform sampler2D normal;
layout(binding = 3) uniform sampler2D distSquare;

writeonly layout(binding = 4, r16f) uniform image2D AO;

void main(void)
{
    float ao = 0.0;

    const ivec2 texCoord = ivec2(gl_GlobalInvocationID.xy);
    const vec2 texCoordAO = vec2(texCoord) / sizeScreenFrameBuffer.zw;

    // G-Buffer data for the pixel we are shading.
    vec3 positionAO = texture(position, texCoordAO).xyz;
    vec3 normalAO = texture(normal, texCoordAO).xyz;
    float distSquareAO = texture(distSquare, texCoordAO).x;

    // 5x5 neighbourhood: 25 « rays » towards the neighbouring pixels.
    for(int j = -2; j < 3; ++j)
    {
        for(int i = -2; i < 3; ++i)
        {
            vec2 texCoordRay = vec2(texCoord + ivec2(i, j)) / sizeScreenFrameBuffer.zw;

            vec3 positionRay = texture(position, texCoordRay).xyz;
            float distSquareRay = texture(distSquare, texCoordRay).x;

            // Cosine between the normal and the « ray » towards the neighbour.
            float c = dot(normalAO, normalize(positionRay - positionAO));

            if(c < 0.0)
                c = -c; // keep a positive contribution

            // The neighbour is closer to the camera than us: it acts as an occluder.
            if(distSquareRay < distSquareAO)
                ao += c;
        }
    }

    // ka is approximately 1 - (1/N) * sum, with N = 25 samples.
    imageStore(AO, texCoord, vec4((1 - ao / 25), 0.0, 0.0, 0.0));
}

Strangely, in this case, if I use shared memory I get worse results than with plain texture fetches. Maybe filling the shared memory takes too long and it is simply not worth it here :-).

I advise you to use texture instead of imageLoad; indeed, I get about 8 times better performance with texture ^^.
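
To make the comparison concrete, here is a minimal compute shader sketch of the two ways of fetching the same position texel; the position sampler matches the binding of the shader above, but the rgba32f image binding used by the imageLoad variant is an assumption, it is not part of my real code.

#version 440 core

layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 1) uniform sampler2D position;                    // fetched through a texture unit
layout(binding = 5, rgba32f) readonly uniform image2D positionImg; // hypothetical image binding, for comparison

void main(void)
{
    ivec2 texel = ivec2(gl_GlobalInvocationID.xy);

    // Variant 1: texture() with normalized coordinates (about 8 times faster for me).
    vec3 fromTexture = texture(position, (vec2(texel) + 0.5) / vec2(textureSize(position, 0))).xyz;

    // Variant 2: imageLoad() with integer texel coordinates.
    vec3 fromImage = imageLoad(positionImg, texel).xyz;
}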

Bye :). Next time, we will talk about shadows!


Return to Rasterization, Lighting, and Ambient Occlusion

Hello there. I have not written for a few weeks, so I’m sorry.

So now I’m going to explain why I am leaving the ray tracing approach for rasterization (again?).

To get efficient ray tracing, you need a good acceleration structure such as a tree, but the problem is that handling a tree on the GPU is not easy. Indeed, the GPU does not have a call stack or pointers.

So, for the rasterization part, I keep the same model, with 4 textures (diffuse, position, normal, distSquare) in the Frame Buffer.

Now, to add lighting, I can add another FBO with another texture.

This technique is called deferred lighting (or deferred shading). The main advantage of this technique is that, instead of computing lighting for every pixel on the screen, you compute lighting only for the pixels that are actually affected by a light.

For example, if your light sits in the center of your screen and only covers an area of about 100 × 100 pixels, the lighting is computed for roughly 10 000 pixels instead of about 2 000 000 (in Full HD).

So, I am going to explain how to create deferred lighting for point lights. Indeed, handling spot lights or global lights is more difficult. Honestly, I don’t know about global lights, but for spot lights you have to build a cone with the right « angle ». If you want to know more about global lights, you can have a look at Nvidia GPU Gems 3 : Tabula Rasa.

There are many deferred lighting techniques; I am going to explain how my engine works.

Let me begin by showing how my light system is implemented.

/* PointLight structure */
struct PointLight
{
    mat4 projectionViewModel; // Matrix for light
    vec4 positionRadius;
    vec4 color;
};

The projectionViewModel matrix is used to « project » our light volume onto the screen, to avoid computing lighting for « useless » pixels.

Remember, as I showed in the last article, I store my positions and normals in the Frame Buffer.

Now, I bind these textures and I draw a cube that encloses my light (I only use point lights), after configuring the cube.

Wait, what is the configuration of the cube?? It is simply the position of your light, together with its radius.

// Model matrix: scale a unit cube by the light radius, then move it to the light position.
ptr->projectionViewModel = projectionView * scale(translate(mat4(1.0f), light.positionRadius.xyz()), vec3(light.positionRadius.w));

Now I can draw my cube and do the lighting computation.

Yes, but I have a little problem. Indeed, a cube has « two opposite faces », so if I am outside the light volume, my lighting is computed twice, and that does not give a good result.

 

[Screenshots: the light seen from inside the volume (« In ») and from outside (« out »)]

You can see that the light looks more powerful when I am outside than when I am inside ^^.

So, how can I solve this issue? Simply by using the stencil buffer.
You clear the buffer with 0, and whenever a draw happens, you increment the value only if it is still 0. So once a pixel has been computed, you cannot compute it a second time.

glEnable(GL_STENCIL_TEST);              // Enable the stencil test
glStencilFunc(GL_EQUAL, 0, 0xFF);       // Pass only if the stencil value equals 0
glStencilOp(GL_KEEP, GL_KEEP, GL_INCR); // Increment only if the test passes

So now, I can introduce the actual computation 😀 .

Currently, my lighting algorithm is very simple, but I will make it more elaborate later (with quadratic attenuation, normal / height maps and so on).

Currently, I use a linear attenuation and the famous \displaystyle \overrightarrow{n} \cdot \overrightarrow{l}:

float computeFactorLight(vec3 posToLight, vec3 posToLightNorm, vec3 normal, float distToLight, float radius)
{
    // Linear attenuation: 1 at the light position, 0 at the radius.
    float attenuation = 1.0 - distToLight / radius;

    if(attenuation <= 0.0)
        return 0.0;

    // Classic n.l diffuse term.
    float nDotL = dot(normal, posToLightNorm);

    if(nDotL <= 0.0)
        return 0.0;

    return nDotL * attenuation;
}
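
To show where this factor fits, here is a minimal sketch of what the point-light fragment shader around it could look like; it is not my exact shader, and the uniform names, the bindings, the gl_FragCoord-based texture coordinates and the additive blending into the lighting texture are assumptions.

#version 440 core

layout(binding = 0) uniform sampler2D colorTex;  // diffuse color from the G-Buffer
layout(binding = 1) uniform sampler2D position;  // world-space positions from the G-Buffer
layout(binding = 2) uniform sampler2D normal;    // world-space normals from the G-Buffer

uniform vec4 positionRadius; // xyz = light position, w = radius (same data as the PointLight structure)
uniform vec4 lightColor;
uniform vec2 screenSize;

out vec4 fragColor;

// The function shown above, assumed to be defined in the same shader.
float computeFactorLight(vec3 posToLight, vec3 posToLightNorm, vec3 normal, float distToLight, float radius);

void main(void)
{
    // The cube only covers the pixels that the light can reach.
    vec2 texCoord = gl_FragCoord.xy / screenSize;

    vec3 pos = texture(position, texCoord).xyz;
    vec3 n   = normalize(texture(normal, texCoord).xyz);

    vec3 posToLight   = positionRadius.xyz - pos;
    float distToLight = length(posToLight);

    float factor = computeFactorLight(posToLight, posToLight / distToLight, n, distToLight, positionRadius.w);

    // Accumulated into the lighting texture with additive blending.
    fragColor = vec4(texture(colorTex, texCoord).rgb * lightColor.rgb * factor, 1.0);
}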

The title of this article contains one more term: Ambient Occlusion.

Here is the ambient occlusion map with an average blur:

[Screenshot: ambient occlusion map with an average blur]

But I will talk about that in the next article. I plan to cover the calculation and the optimisation. It is not the best formula, nor the best technique, but I will explain how I arrived at a small formula, and how I use it.

Spheres, Planes, Deferred Shading, compute shaders

Hello there, today we are going to talk about quite a few things.

Firstly, we are going to show how our engine works, and how to compute a ray-sphere or ray-plane intersection.

We are going to talk about optimization too.

So, to begin, we are going to explain how our engine is built.

We use a technique called Deferred Shading.

In our G-Buffer (that is the name of the buffer used in deferred shading), we have 4 + 1 textures.

[Figure: G-Buffer layout]

We can see that we have 4 important textures, and one less important.

So now I’m going to explain what each of them is used for.

The color texture is the basic texture that contains the color of each pixel. (RGBA8: normalized unsigned byte per component)
The position texture contains the position of each object projected onto the screen. (RGBA32F: float per component)
The normal texture contains the normal of each object projected onto the screen. (RGBA16_SNORM: normalized signed short per component)
The distance-to-camera texture contains, for each position written in the position texture, its distance to the camera, much like a depth map. (GL_R32F: one float component)

For example, here are 3 screenshots of the color, position, and normal textures.

[Screenshots: color, position, and normal textures]

So, now, we are going to explain the multiple passes in our rendering algorithm.

The first pass is a rasterization pass; we use a vertex and a fragment shader to do it.
As a reminder, the vertex shader processes vertices: it is invoked once for each vertex (if you have 1 000 triangles, it is invoked 3 000 times), and the fragment shader is invoked once for each pixel.

Our engine works in world space, so positions and normals are stored in world coordinates.

To rasterize, we use a simple shader together with a Frame Buffer Object (which replaces the screen, or even several « screens » at once thanks to multiple attachments).
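
As an illustration, here is a minimal sketch of what the fragment shader of this first pass could look like; the in/out names, the output locations and the idea that distSquare stores the squared distance to the camera are assumptions on my part, not the exact code of the engine.

#version 440 core

in vec3 worldPosition; // interpolated outputs of the vertex shader, in world space
in vec3 worldNormal;
in vec2 texCoord;

uniform sampler2D diffuseTexture;
uniform vec3 cameraPosition;

// One output per color attachment of the Frame Buffer Object.
layout(location = 0) out vec4 outColor;       // RGBA8
layout(location = 1) out vec4 outPosition;    // RGBA32F
layout(location = 2) out vec4 outNormal;      // RGBA16_SNORM
layout(location = 3) out float outDistSquare; // R32F

void main(void)
{
    vec3 toCamera = worldPosition - cameraPosition;

    outColor      = texture(diffuseTexture, texCoord);
    outPosition   = vec4(worldPosition, 1.0);
    outNormal     = vec4(normalize(worldNormal), 0.0);
    outDistSquare = dot(toCamera, toCamera); // squared distance to the camera
}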

So now, we will explain the maths behind the ray tracing part.

As a reminder, here are the respective equations of plane, and spheres.

\left\{\begin{matrix} a\cdot x + b\cdot y + c\cdot z + d = 0 \\  (x-pos.x)^2 + (y-pos.y)^2 + (z-pos.z)^2 = R^2 \Leftrightarrow \left \langle xyz - pos,\ xyz - pos \right \rangle = R^2  \end{matrix}\right.

So, your ray begins at position ro (the position of the camera) and goes in direction rd.

\left\{\begin{matrix}  x &=& ro.x + t\cdot rd.x \\  y &=& ro.y + t\cdot rd.y \\  z &=& ro.z + t\cdot rd.z  \end{matrix}\right.

Now, you just solve the above equations for t, and you get the position of the intersection with the object.
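
For the sphere, substituting the parametric ray into the sphere equation gives a quadratic in t. Here is a minimal GLSL sketch of that computation; the function name and the convention of returning -1.0 when there is no intersection are mine.

// Returns the nearest positive t such that ro + t * rd hits the sphere, or -1.0 if the ray misses it.
// rd is assumed to be normalized.
float raySphere(vec3 ro, vec3 rd, vec3 center, float radius)
{
    vec3 oc = ro - center;

    // <oc + t * rd, oc + t * rd> = R^2  becomes  t^2 + 2*b*t + c = 0
    float b = dot(oc, rd);
    float c = dot(oc, oc) - radius * radius;
    float discriminant = b * b - c;

    if(discriminant < 0.0)
        return -1.0; // no real solution: the ray misses the sphere

    float sqrtD = sqrt(discriminant);
    float t = -b - sqrtD; // nearest intersection

    if(t < 0.0)
        t = -b + sqrtD;   // the origin is inside the sphere

    return t;             // can still be negative if the sphere is behind the ray
}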

But, how can I get a rd vector??

It’s really easy.

Projection\cdot View\cdot rd = (x,y,1.0,1.0) \\  \Leftrightarrow rd = (Projection\cdot View)^{-1}\cdot(x,y,1.0,1.0)

The meaning of « 1.0 » for the z component here is « far plane ».
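
In GLSL, using the invProjectionViewMatrix and posCamera from the Context block shown earlier, this can be written roughly as below. Note that the homogeneous divide by w is left implicit in the formula above, and turning the result into a direction from the camera position is my own convention.

// x and y are the pixel coordinates mapped to [-1, 1] (normalized device coordinates).
// Assumes the Context uniform block (invProjectionViewMatrix, posCamera) is declared in this shader.
vec3 computeRayDirection(vec2 ndc)
{
    // Unproject a point lying on the far plane (z = 1.0).
    vec4 world = invProjectionViewMatrix * vec4(ndc, 1.0, 1.0);
    world /= world.w; // homogeneous divide

    return normalize(world.xyz - posCamera.xyz);
}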

Now, to optimize the sphere computation, you just have to draw a cube with rasterization and do the computation in its shader: instead of computing for all the pixels of the screen, you compute only in a small area. That is the power of deferred shading.

Ohhhh, I have a problem : Spheres are cut …

[Screenshot: the spheres are cut]

That is because you have to use an infinite projection (in glm, see infinitePerspective).

For the plane, my advice is to use compute shaders instead of fragment shaders. Indeed, if you put the plane data in shared memory (roughly the equivalent of an L1 cache), you get a small performance gain.
Indeed, shared memory is about 100 times faster than global memory, so if you access your buffers a large number of times, it is better to use shared memory ^^.
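
Here is a minimal sketch of that idea in a compute shader: each work group copies the plane data from a buffer into shared memory once, synchronizes, and then every invocation reads it from there. The Plane structure, the binding and the fixed size of 32 planes are assumptions, not my actual engine code.

#version 440 core

layout(local_size_x = 16, local_size_y = 16) in;

struct Plane
{
    vec4 equation; // a, b, c, d such that a*x + b*y + c*z + d = 0
};

layout(std430, binding = 6) readonly buffer Planes
{
    Plane planes[];
};

shared Plane localPlanes[32]; // per-work-group copy, lives in the fast on-chip memory

void main(void)
{
    uint localIndex = gl_LocalInvocationIndex;

    // The first invocations of the work group fill the shared copy.
    if(localIndex < 32u && localIndex < uint(planes.length()))
        localPlanes[localIndex] = planes[localIndex];

    barrier(); // wait until the shared copy is complete

    // ... every invocation can now test its ray against localPlanes[i] instead of the global buffer ...
}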

Now, we can draw lights and shadows ^^.

But it will be for the next article ^^.

Bye 🙂 .