How to make a photon mapper : Whitted Ray Tracer

Hello there,

Today, we are going to see how to make the first part of our photon mapper. We are unfortunately not going to talk about photons, but only about normal ray tracing.

Whitted Ray Tracer without shadows
Whitted Ray Tracer without shadows

This is ugly !!! There is no shadow…

It is utterly wanted, shadows will be draw by photon mapping with opposite flux. We will see that in the next article.


In this chapter, we are going to see one implementation for a whitted ray tracer

Direct Lighting

Direct lighting equation could be exprimed by :

\displaystyle{L_o(\mathbf{x}, \vec{\omega_o}) = \sum_{i\in \{Lights\}}f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o})\frac{\phi_{Light}}{\Omega r^2}cos(\theta)}

The main difficult is to compute the solid angle \Omega .
For a simple isotropic spot light, the solid angle could be compute as :

\displaystyle{\Omega=\int_{0}^{angleCutOff}\int_{0}^{2\pi}sin(\theta)d\theta d\phi =-2\pi(cos(angleCutOff)-1)}

with :

  1. \Omega the solid angle.
  2. \phi_{Light} the total flux carried by the light.
  3. cos(\theta) the attenuation get by projected light area on lighted surface area.
  4. angleCutOff \in [0; pi].

Refraction and reflection

Both are drawn by normal ray tracing.


Now, we are going to see how our ray tracer works :

Shapes :

Shapes are bases of renderer. Without any shapes, you can’t have any render. We can have many differents shape, so, we can use one object approach for our shapes.

* @brief This class provides an interface to manage differents shape
class AbstractShape {
    AbstractShape() = default;
    AbstractShape(std::unique_ptr &material);

    * @brief Return the distance between shape and ray emetter
    * @param ray
    * @return Negative value if not found. Distance between object and ray emetter
    virtual float intersect(Ray const &ray) const = 0;
    virtual glm::vec3 getNormal(glm::vec3 const &position) const = 0;

    * @brief Return the radiance returned by the material owned
    * @param ray
    * @return radiance
    glm::vec3 getReflectedRadiance(Ray const &ray);

    virtual ~AbstractShape() = default;

    std::unique_ptr mMaterial = std::make_unique(glm::vec3(1.0, 1.0, 1.0), 1.0);


For each shapes, we obviously have a particular material. The material have to give us a brdf and can reflect radiance.

* @brief This class describes a material
class AbstractMaterial {
    AbstractMaterial(float albedo);

    * @brief Get the reflected radiance
    * @param ray
    * @param shape Useful to get normal, UV for texture???
    * @return { description_of_the_return_value }
    virtual glm::vec3 getReflectedRadiance(Ray const &ray, AbstractShape const &shape) = 0;

    virtual ~AbstractMaterial() = default;

    float albedo;
    * @brief Get the Bidirectionnal Reflectance Distribution Function
    * @param ingoing
    * @param outgoing
    * @param normal
    * @return the brdf
    virtual float brdf(glm::vec3 const &ingoing, glm::vec3 const &outgoing, glm::vec3 const &normal) = 0;

Storage Shapes

To have a better ray tracing algorithm, we could use a spatial structure like Kd-tree or other like one :

* @brief This class provide a structure to store shapes
class AbstractShapeStorage {
    * @brief Add a shape in structure
    * @param shape
    virtual void addShape(std::shared_ptr const &shape) = 0;

    * @brief Get the nearest shape
    * @param ray
    * @return a tuple with shape and distance. Shape could be null if no shape found
    virtual std::tuple<std::shared_ptr, float> findNearest(Ray const &ray) = 0;

    ~AbstractShapeStorage() = default;


The main algorithm part is on the materials side. Below, a piece of code where I compute the reflected radiance for a lambertian material and a mirror. You could see that material part has access to other shape via the global variable world.

float LambertianMaterial::brdf(const glm::vec3 &, const glm::vec3 &, glm::vec3 const &) {
    return albedo / M_PI;

vec3 UniformLambertianMaterial::getReflectedRadiance(Ray const &ray, AbstractShape const &shape) {
    vec3 directLighting = getIrradianceFromDirectLighting(ray, shape);
    float f = brdf(vec3(), vec3(), vec3());

    return color * f * (directLighting);

float MirrorMaterial::brdf(const vec3 &, const vec3 &, const vec3 &) {
    return albedo;

vec3 MirrorMaterial::getReflectedRadiance(const Ray &ray, const AbstractShape &shape) {
    if(ray.recursionDeep >= MAX_BOUNCES)
        return vec3();

    Ray reflectedRay = getReflectedRay(ray, shape.getNormal(ray.origin + ray.direction * ray.distMax));

    auto nearest = World::world.findNearest(reflectedRay);

    if(get(nearest) != nullptr) {
        reflectedRay.distMax = get(nearest);
        return brdf(vec3(), vec3(), vec3()) * get(nearest)->getReflectedRadiance(reflectedRay);

    return vec3();


Lighting is a useful feature in a render. It's thanks to lights that you can see the relief. A light carry a flux. Irradiance is the flux received by a surface.

So, our interface is :

 * @brief      Interface for a light
class AbstractLight {
    AbstractLight(glm::vec3 const &flux);

     * @brief      Compute irradiance receive by the projected area in position
     * @param      position  surface's position
     * @param      normal    surface's normal
     * @return     irradiance
    virtual glm::vec3 getIrradiance(glm::vec3 const &position, glm::vec3 const &normal) = 0;

    virtual ~AbstractLight() = default;

    glm::vec3 mTotalFlux;

Below a piece of code about computing irradiance :

vec3 SpotLight::getIrradiance(const vec3 &position, const vec3 &normal) {
    vec3 posToLight = mPosition - position;
    vec3 posToLightNormalized = normalize(posToLight);

    if(dot(-posToLightNormalized, mDirection) > mCosCutoff) {
        float solidAngle = - 2.f * M_PI * (mCosCutoff - 1);

        return lambertCosineLaw(posToLightNormalized, normal) * mTotalFlux /
            (solidAngle * dot(posToLight, posToLight));

    return vec3();

The next time, we will see how to integrate a photon mapper to our photon mapper. If you want to have the complete code, you could get it here :

Bye my friends :).


How to make a Photon Mapper : Rendering Equation Debunked


Hi guys!

I have not written anything here since a long time, I had exams, professional mission in C++ with Qt and too many things to do. Don’t be afraid, activities in this blog will be start again soon.

In computer graphics, I didn’t do beautiful things, except a little ray tracer with photon mapping. I’m going to explain how to make a little photon mapper. Firstly, we’ll see the it on the CPU side, after we’ll try to implement it in the GPU side.

We will see several points :

  1. Explanations about rendering equations
  2. A raytracer
  3. A photon mapper
  4. A GPU side with Photons Volume

Perhaps, some chapters will be divided into several parts.

Rendering Equation :

Source : Wikipedia Scheme relating to the equation

The rendering equation is :

\displaystyle{L_o(\mathbf{x}, \vec{\omega_o}) = L_e(\mathbf{x}, \vec{\omega_o}) + \int_{\Omega}f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o})L_i(\mathbf{x}, \vec{\omega_i})(\vec{\omega_i}\cdot\vec{n}) d\omega_i =}
\displaystyle{L_e(\mathbf{x}, \vec{\omega_o}) + \int_{\Omega}f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o})dE_i(\vec{\omega_i}) =}
\displaystyle{L_e(\mathbf{x}, \vec{\omega_o}) + \int_{\Omega}f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o})\frac{d^2 \phi_i(\mathbf{x}, \vec{\omega_i})}{dA}}

What is it burried into this equation?

  1. L_o(\mathbf{x}, \vec{\omega_o}) is the outgoing radiance. It will be our pixel color in R, G, B space. It is exprimed in W\cdot sr^{-1}\cdot m^{-2}.
  2. L_e(\mathbf{x}, \vec{\omega_o}) is the emmited radiance. It should be zero for almost all materials, excepting those that emit light (fluorescent material or lights for examples).
  3. f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o}) is the Bidirectional Reflectance Distribution Function(aka BRDF). It defines how light is reflected on surface point \mathbf{x}. It is exprimed in sr^{-1}. Its integral over \Omega should be inferior or equal to one.
  4. L_i(\mathbf{x}, \vec{\omega_i}) is the incoming radiance. It is like the light received by \mathbf{x} in a direct or indirect way.
  5. (\vec{\omega_i}\cdot\vec{n}) is the lambert cosine law.
  6. d\omega_i is the differential solid angle. It is exprimed in sr.
  7. dE(\vec{\omega_i}) is differential irradiance. It is the radiant flux (or power) received by a surface per unit area. It is exprimed in W\cdot m^{-2}.
  8. \phi_i(\mathbf{x}, \vec{\omega_i}) is the radiant flux(or power) passing from \mathbf{x} and have for direction \omega_i. It could be seen as one of our photons. It is exprimed in R, G, B space as well. It is exprimed in W.
  9. dA is the area which received the radiant power.

Manipulate rendering equation for an approximation

This part is going to be a bit mathematical with « physical approximation 😀 ».

Simplification of irradiance

Zack Waters : Irradiance and flux.

\displaystyle{dE_i=\frac{d^2 \phi_i}{dA}\rightarrow E_i=\frac{d\phi_i}{dA}}

Assuming the « floor » of the hemisphere is flat and photon hitting this floor is like hit \mathbf{x}, we have :

\displaystyle{E_i \simeq \sum \frac{\phi_i}{\pi r^2}\rightarrow E \simeq \sum_i \sum \frac{\phi_i}{\pi r^2} = \sum \frac{\phi}{\pi r^2}}.

So, to have a better approximation, you must have to reduce the hemisphere’s radius and increase the photons number.

The rendering equation simplified

\displaystyle{L_o(\mathbf{x}, \vec{\omega_o}) = L_e(\mathbf{x}, \vec{\omega_o}) + \int_{\Omega}f_r(\mathbf{x}, \vec{\omega_i}, \vec{\omega_o})L_i(\mathbf{x}, \vec{\omega_i})(\vec{\omega_i}\cdot\vec{n}) d\omega_i =}
\displaystyle{L_o(\mathbf{x}, \vec{\omega_o}) = L_e(\mathbf{x}, \vec{\omega_o}) + \frac{1}{\pi r^2}\sum f_r(\mathbf{x}, \vec{\omega_{photon}}, \vec{\omega_o})\phi(\mathbf{x}, \vec{\omega_{photon}})}

Direct Illumination

Direct illumination need an accurate quality, so, we will not use the simplified equation seen above but the real one. The main difficulty is to compute the solid angle. For example, a point light with power \phi_L :

\displaystyle{L_r = f_r(\mathbf{x}, \vec{\omega_i},\vec{\omega_o})(\vec{\omega_i}\cdot\vec{n})\frac{\phi_l}{4\pi r^2}}

We remind that E = \frac{I}{r^2} = \frac{\phi_l}{\Omega r^2} with \Omega the solid angle.

Photon tracing, reflection and refractions

The photon tracing is easy. Imagine you have a 1000W lamp, and you want to share all of its power into 1 000 photons, each photon should have a 1W power (We could say the colour of this photon is (1.0, 1.0, 1.0) in RGB normalized space). When photon hit a surface, you store it into the photon maps and reflect it with the BRDF :).

If you didn’t get it all, don’t be afraid, in our futur implementation, we are going to see how it works. We will probably not do any optimization, but it should be good :).

Bye my friends 🙂 !

Lighting, lighting… Did you say something about Lights?

Hello there !

I have implemented many features inside the engine, Lighting, Shadows, and Indirect lighting (as I am going to talk with further details as the second last one article).

I have nothing specialy to say about lighting.
I have implemented one deferred shading with projection into the compute shaders.

The good news is the « Array Shadowing ». Indeed, it lets to have a totally lighting bindless shaders, buffers, textures and frameBuffer. The idea behind the « Array Shadowing », is to render all cube map in an ARB_TEXTURE_CUBE_MAP_ARRAY, so you can access to all of them in the direct lighting step. You store into the light the index of your shadow map. You have only to apply the formula saw in the third last one article (Variance Shadow maps) .
To improve the quality of your rendering, you can blur your shadow maps, but I didn’t implement that, because I would have implemented other features before. For example, I will try to implement specular reflections before glossy reflections, they are easier too implement, and faster to render.

Today, I don’t want to make a big article, and we have already talk about lighting, shadows, reflections, and a little bit about global illumination, that’s why we are going to explain many things about the global illumination.

What is Global Illumination ?




In this order, you can see, one bounce, two bounces, and three bounces. In this article, we will see only the first bounce, but you can iterate the following methods to approximate multi bounces global illumination.

James Kajiya lay in 1986 the rendering equation :

\displaystyle{L^{O}(\bold{x}\rightarrow\omega_o)=L^{e}(\bold{x}\rightarrow\omega_o)+\int_{\omega_i\in\Omega}BRDF(\bold{x},\omega_i\rightarrow\omega_o)L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)\cdot d\omega_i}

The main objectif is to approximate this equation, but, as you can see, this equation is infinity recursive (into the integral), so we have to stop the computing after few bounces.

Explanations of each terms :

The term L^{O}(\bold{x}\rightarrow\omega_o) is the radiance leaving \bold{x} in direction \omega_o, radiance is measured by sensor, and the colour is proportionnal at radiance, in our model, we will lay that radiance is the colour of pixel.

The term L^{e}(\bold{x}\rightarrow\omega_o) is as well the radiance leaving \bold{x} in direction \omega_o, but is the radiance emmited without the radiance reflected. This term will be almost everywhere null, except for the « direct lighting ».

The term \int_{\omega_i\in\Omega}BRDF(\bold{x},\omega_i\rightarrow\omega_o)L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)\cdot d\omega_i is more complicate, so I am going to explain every members one by one.

The \int_{\omega_i\in\Omega} means we integrate over all the hemisphere oriented by normal \bold{N} at the point \bold{x} .

The BRDF(\bold{x},\omega_i\rightarrow\omega_o) is the BRDF, you can refer to my article about Global Illumination.

The L^{i}(\omega_i\rightarrow\bold{x}) is the radiance coming in \bold{x}, it’s the most important term of this equation.

Inside the rendering equation : The direct lighting

The first bounce of light is also known as direct lighting.

We can write this :

\displaystyle{L^{O}(\bold{x}\rightarrow\omega_o)=L^{e}(\bold{x}\rightarrow\omega_o)+\int_{\omega_i\in\Omega}BRDF(\bold{x},\omega_i\rightarrow\omega_o)L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)\cdot d\omega_i}
\displaystyle{L^{O}(\bold{x}\rightarrow\omega_o)=\sum_{i\in \{Lights\}}BRDF(\bold{x},\omega_i\rightarrow\omega_o)L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)}
\displaystyle{L^{O}(\bold{x}\rightarrow\omega_o)=\sum_{i\in\{Lights\}}\bold{c_{diff}}\otimes L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)}

You can find the same result with the area formulation of the global rendering equation.

Inside the rendering equation : The indirect lighting.

The idea is to compute the differents L^{i}(\omega_i\rightarrow\bold{x}).
To make that in real time, we can use the Instant Radiosity (from Keller (in 1997)).

The idea is to put virtual point lights for each intersection between light ray and world geometry.
We use an atomic counter with a little framebuffer and we render the geometry to put at the good place virtual point lights.

Thus, we have to compute the « power » of the new light and use this new light as a normal light.
If we write the « differential » rendering equation, we find :

\displaystyle{dL^{O}(\bold{x}\rightarrow\omega_o)=BRDF(\bold{x},\omega_i\rightarrow\omega_o)L^{i}(\omega_i\rightarrow\bold{x})(\bold{N}\cdot\omega_i)\cdot d\omega_i}

For PointLights and lambertian surface, we have

\displaystyle{d\omega_i=\frac{4\pi}{6S},\; S=\frac{1}{width\cdot height},\; BRDF=\frac{\rho}{\pi}}

With a FrameBuffer of 8×8 pixels I have a good results and VPL from the ceiling (With an albedo of 1.0)



The first one is without global illumination provide by VPL, and the second one is with GI.

What will be the future?

  1. I will try to implement a photon mapper on the GPU
  2. I will continue my engine, the next article about it will be ray marching AO or improving models of VPLs.
  3. I will try to implement a voxel cone tracing (at least the ambient occlusion).

Thus, maybe future article are not talking about « real time » in the proper sense, but interractive rendering ^_^.

References :

The rendering equation(Kajiya)
Reflective Shadow Maps(Dachsbacher)
An introduction to BRDF
Instant Radiosity

Thanks for reading!

New Project : Architecture and Performances.

Hello there :-).

As I have promised the last time, I have started again from the beginning my engine. I am going to explain why I had need to change my architecture, and explain my differents choices.

To begin, after thinking a lot about Illumination in real time, I found that it’s not useful to have Global Illumination, with no « correct » direct illumination. That’s why, I will implement a Phong Shading gamma correct.

About performances, my old code was very very bad (you can see the proof after), I had to implement a powerful multi stage culling. Moreover, I had to use the latest technologies of GPU to increase performances. Therefore, I am using Compute Shaders, Bindless Textures, Multi Drawing.

For the beginning, I will explain how my engine works internally. I am using several big buffers, for example one which contains all of Vertex, another one which contains all of Index, Command, and so on. The Matrix, Command, AABB3D buffers are filled in each frame. After, a view frustrum culling on GPU is performed, therefore, the Command Buffer is also bind to SHADER_STORAGE_BUFFER and INDIRECT_DRAW. After culling, a pre Depth pass is performed and, finally, the final model pass is performed. We will see the details later.

To have a certain coherence in the scene, like glasses on cupboard, if you move the cupboard, your glasses have to move as well, you have to have a scene graph. To do that, I already explained that I am using a system of Node. But now, I have improve this model with one obvious feature. If the Parent Node don’t have to be render, I will not render the children Nodes and Models as well. My system is decomposed in 3 Objects. One SceneManager, One Node, and one Part of this Node (There may be several kind of Part (lights for example is another part)) : ModelNode.


It is the Model Node. It is a part of a Node which contains a class like « pair » between one Model and One Matrix.
ModelNode contains :
– One bounding box in World Landmark (it depends of the Global Matrix of the parent Node).
– One matrix which contains a relative landmark on the Global Matrix of the Parent Node. If one transformation is performed by this Object, the bounding boxe for the ModelNode and the bounding boxes of « all » parents Node are recomputed.

The function « PushInPipeline » can be strange for you, that is why I am going to try to explain you its role. As I already said, I have several big buffers, this function will modify few buffers, like Matrix, bounding box, and commands buffer. You have exactly the same number of Command, AABB, and Matrix. It’s not obvious to have exactly the same number for Matrix and Command, for examples, if you have one model with 300 meshes, you have, normally, 300 commands, but one matrix. In my system, you have therefore 300 matrix. After you will see why.


The Node object is a root of a Sub-Tree in our Scene Graph. Each child of a Node depends of a Landmark of this root, and the bounding box of this root depends of childs bounding boxes, therefore, the building of this « tree » may be one down approach (if a Root transformation is performed) or one up approach(if a child transformation or Model transformation is performed).
This Node contains :
-a Parent
-a vector of children
-a vector of Model
-a globalFactor : It is a global scale factor (useful for the future lights if scaling Node).


As you can see, the Scene Manager contains two objects, one pointer on Node and one Camera, the last one is not important here, and it can may be move later to Node for example. The objective of this class is to give to user one interface to render all the scene, here, it doesn’t matter the lights, but later, we can add renderLights for example. The function render call succesively initialize, renderModels, renderFinal, the function initlalize update Camera Position and Frustrum, and « reset » our pipeline, the function renderModels render all Models and material information on FrameBuffer, the last one function draws the final texture on the screen with post processing.

Wait, it seems like highly to the « normal » pipeline, right?
Yes, but now, we can improve performances with pre Depth Pass and View Frustrum culling.

What is View Frustrum Culling? The idea behind this term is to reduce a number of « Draw Objects » in this frame. So, you will test if the object is or not into the Camera View, and if not, you don’t render it.
How to perform efficiently this culling?
For one unaccurate culling, you have to perform test on the CPU. Globally, this test is perform in the Node function pushModelsInPipeline.
To have a really accurate and efficient culling, because you have a high number of bounding boxes, you should use your GPU to perform culling. So, a good combinaison between Draw Indirects and Shader Buffer storage will be your friend with the Compute Shaders.

A pre Depth Pass is to avoid to render on the color buffers data that you will change with a nearest object. So, the rendering take 2 passes, but it is, in general, most efficient :).

There is a little diagram with and without optimizations. (ms / size)


As you can see on this picture, the performances gave by culling are almost constant, but a Pre Depth pass performances depends of the size of FrameBuffer’s textures.
We gain nearly twice performances with pre depth pass and culling, but if you have a higher number of objects (here is just Sponza drew), you will earn too much performances :-).

If you want to see the code or the documentation of the code :

See you next time :-).

Kisses !

Ambient Occlusion revisited, reflections, global illumination and new project

Hello there.

Today, we are going to talk about 4 subjects.
The first one is a new computation of Ambient Occlusion, the second one is reflection, as I said the last time.

The final part of this article will be on GI and about a graphic engine.

Even if we are going to talk of several subjects, it will be fast because I don’t have too much things to say on every field.

So, for the ambient occlusion

We remind to:

\displaystyle{AO=\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{x},\overrightarrow{\omega})\cdot\cos(\theta)\cdot d\omega}
\displaystyle{AO=\frac{1}{\pi}\int_{[0,2\pi]}\int_{[0,\frac{\pi}{2}]}V(\overrightarrow{x},\overrightarrow{\omega})\cdot\cos(\theta)\sin(\theta)\cdot d\theta d\phi}
\displaystyle{AO=1-\frac{1}{2\pi}\int_{[0,2\pi]}\int_{[0,\frac{\pi}{2}]}\overline{V}(\overrightarrow{x},\overrightarrow{\omega})\cdot\sin(2\theta)\cdot d\theta d\phi}
\displaystyle{AO\approx 1-\frac{\pi}{2n}\sum_{i=0}^{n}\overline{V}(\overrightarrow{x},\overrightarrow{\omega_{i}})\cdot\sin(2\theta_{i})}

where \theta_{i} and \omega_{i} are a « random » variable.

If you want the « real AO » (real is not just, because Ambient Occlusion is not one physic based measure), you have to raytrace this function, and take care about all of these members.

So, we want to approximate again this formula, and, now, we have 2 textures, and the second one is the normal map, another own the position map.

If you want to know the AO in one point, you have to take few samplers around this pixel (the sampling area is a disk in my code).
So, if your ray is in the hemisphere, the angle between ray and normal is \theta\in[-\frac{\pi}{2},+\frac{\pi}{2}] . You can improve this formula
with a factor distance
We don’t need really the view function, indeed, if you take a sampler no « empty », you have one intersection, and if the cell is empty, \overrightarrow{n}\cdot\overrightarrow{\omega_{i}}=0, so doesn’t matter.

#version 440 core

/* Uniform */
#define CONTEXT 0

layout(shared, binding = CONTEXT) uniform Context
    vec4 invSizeScreen;
    vec4 cameraPosition;

layout(binding = 0) uniform sampler2D positionSampler;
layout(binding = 1) uniform sampler2D normalSampler;

// Poisson disk
vec2 samples[16] = {
vec2(-0.114895, -0.222034),
vec2(-0.0587226, -0.243005),
vec2(0.249325, 0.0183541),
vec2(0.13406, 0.211016),
vec2(-0.382147, -0.322433),
vec2(0.193765, -0.460928),
vec2(0.459788, -0.196457),
vec2(-0.0730352, 0.494637),
vec2(-0.323177, -0.676799),
vec2(0.198718, -0.723195),
vec2(0.722377, -0.201672),
vec2(0.396227, 0.636792),
vec2(-0.372639, -0.927976),
vec2(0.474483, -0.880265),
vec2(0.911652, 0.410962),
vec2(-0.0112949, 0.999936),

in vec2 texCoord;
out float ao;

void main(void)
    vec3 positionAO = texture(positionSampler, texCoord).xyz;
    vec3 N = texture(normalSampler, texCoord).xyz;

    vec2 radius = vec2(4.0 * invSizeScreen.xy); // 4 pixels
    float total = 0;

    for(uint i = 0; i < 16; i ++)
        vec2 texSampler = texCoord + samples[i] * radius;
        vec3 dirRay = texture(positionSampler, texSampler).xyz - positionAO;
        float cosTheta = dot(normalize(dirRay), N);

        if(cosTheta > 0.01 && dot(dirRay, dirRay) < 100.0)
            ao += sin(2.0 * acos(cosTheta));
            total += 1.0;

    // Don't multiply by Pi to get a better result
    // addition is here if total == 0
    ao = 1.0 - ao / (2.0 * total + 0.01);

For that, you perform the calculation over the screen divided by 4 (2 for width, and 2 for height), and you can apply a gaussian blur with separate pass thanks to compute shaders.
For one rendering in 1920 * 1080p, (Thus, the occlusion map size is : 1024 * 1024), the ambient occlusion pass take less than 2ms to be computed.

I have nothing to say about reflections, I don’t use the optimisations of László Szirmay-Kalos, Barnabás Aszódi, István Lazányi and Mátyás Premecz with their Approximate Ray-Tracing on the GPU with Distance Impostors.

So, it is only a environment map, like omnidirectional shadow (the prior article).


Global Illumination (GI), a very interesting subject where researchers spend a lot of time…

The GI, is bounces of light on different surfaces.


You can see blue, green, and red on the floor.

I don’t implement a very efficient model, so I gonna to present you only the way and the idea that I found.

So, for our mathematical model, we use \overline{\cos} for cos clamped between 0 and 1.

We need to use BRDF (Bidirectional reflectance distribution function). The BRDF is a function (really :p? ) which is given 2 directions(\omega_{i} and \omega_{o}) and return a rapport of « power of outgoing vector on power of incident vector ». This function verify these properties :

\displaystyle{\int_{\Omega}BRDF(\omega_{i},\omega_{o})\cdot\cos(\theta_{r})\cdot d\omega_{r}\leq 1}

For a lambertian surface, the light is reflected with the same intensity all over the hemisphere, so, we can lay BRDF=K and, with these previous property

\displaystyle{\int_{\Omega}BRDF(\omega_{i},\omega_{o})\cdot\cos(\theta_{r})\cdot d\omega_{r}\leq 1}
\displaystyle{\int_{\Omega}K\cdot\cos(\theta_{r})d\omega_{r}\leq 1}
\displaystyle{K\pi\leq 1}


\displaystyle{BRDF_{lambertian}=\frac{\rho}{\pi}} where \rho\in[0, 1].

\rho is called « Albedo ». For the clean snow, \rho is close to 1, for a very dark surface \rho is close to 0.

The basic idea behind my algorithm is based on Instant Radiosity from Keller.

We render the scene from the light source, on a very little FrameBuffer (8*8, 16 * 16, 32 * 32, or 64 * 64 (but it’s already too much) ), and thanks to shader buffer storage, for each pixel, we can store on this buffer our lights.

\displaystyle{L_{o}=\frac{\rho\cdot K_{d}}{N\cdot\pi}\cdot L_{i}\cdot\overline{\cos(\theta_{i})}PHONG}
where PHONG (Phong lighting on google) is calculation of the power of light in x and Kd is the colour of the object touch by light.

After that, you just have to render lights like other.

So, for the new project, I will begin to write one open source graphic engine. I will try to do it totally bindless. (Zero bind texture : extension bindless texture, Few bind buffer : Only few big buffers). I will implement a basic tree of partitionning space totally in GPU thanks to compute shaders. I will implement phong shading or phong brdf (if I success ^^).

See you !

Shadows of Darkness

Hello :-). Today we are going to talk about shadows in my engine. So, why shadows are very useful in video games?

Simply because they let to improve strongly the realism of one scene.One exemple with, and without shadows. Capture d'écran de 2014-12-30 14:47:55

Capture d'écran de 2014-12-30 14:49:25

The first one is better, right?
So, how can we draw shadows in real times?
You have two way to do it.

The first way, which we don’t talk really today, is the Stencil Shadow Volume. The main idea hide behind this concept, is you detect the silhouette, and extend the light ray, thus, you can know if you are in the shadow or not. If you want more explanations, this is some links to improve your knowledge about Stencil Shadow Volume.

Silhouette Detection
Stencil Shadow Volume
Efficient Shadow Volume

I don’t use shadow volume because they don’t allow soft shadowing, and they are more complicate than shadow mapping.

The second way is : Shadow Mapping. We are not going to talk soft shadow mapping yet, only hard shadow mapping, even if soft shadows improve more realism, but they are more complicate too.

So, what is the main idea behind Shadow Mapping ?
It is a multi pass algorithm.
In the first pass, you render the scene from the point of view of the light, and in the shadow map, you store the distance of mesh from the point light position.
In the second pass, you render the scene from the point of view of Camera, and you compare the current distance with the shadow map distance to know if you are in the shadow. If Shadow map distance is bigger, you are not in shadow, else, you are.
If you want to use a point light shadow, you have to render 6 times the scene from de light point of view in all directions. It’s my own code to render in a cube map.

static vec3 const look[] = {vec3(1.0, 0.0, 0.0),
                            vec3(-1.0, 0.0, 0.0),
                            vec3(0.0, 1.0, 0.0),
                            vec3(0.0, -1.0, 0.0),
                            vec3(0.0, 0.0, 1.0),
                            vec3(0.0, 0.0, -1.0)};

static vec3 const up[] = {vec3(0.0, -1.0, 0.0),
                          vec3(0.0, -1.0, 0.0),
                          vec3(0.0, 0.0, 1.0),
                          vec3(0.0, 0.0, -1.0),
                          vec3(0.0, -1.0, 0.0),
                          vec3(0.0, -1.0, 0.0)};

LightShadow *lightShadowPtr = (LightShadow*)shadowBuffer->map();

vec3 position = (matrix * vec4(, 1.0)).xyz();

lightShadowPtr->positionRadius = vec4(position, mPosRadius.w);


mat4 projection = perspective<float>(M_PI / 2.0, 1.0, 1.0, mPosRadius.w);

mShadowMaps.bindToDraw(true, device);

for(u32 i = 0; i < 6; ++i)
    mShadowMaps.attachCubeMapLayer(GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, 1);

    mat4 projectionView = projection * lookAt(position, position + look[i], up[i]);

    rootNode->renderModelsShadow(mat4(1.0f), projectionView, matrixBuffer);

mShadowMaps.bindToDraw(false, device);


Capture d'écran de 2014-12-28 18:39:16

It’s the Cube shadow map of my scene.

You have to apply a little bias, else, you can have some little issues ^^.

Capture d'écran de 2014-12-28 18:43:54

You can see many little artefacts, even if it’s beautiful, it’s not a realistic thing :D.

Capture d'écran de 2014-12-28 18:49:37

You can see a little bit aliasing.

Many solutions exist to avoid this aliasing. We are going to talk about variance shadow map.

I remind you esperance and variance

E(x)=\int_{-\infty}^{+\infty}xp(x)dx and Var(x)=\int_{-\infty}^{+\infty}x^2p(x)dx

Variance can be approximated by \displaystyle 0.25\left(\left(\frac{\partial f}{\partial x} \right)^{2} + \left(\frac{\partial f}{\partial y} \right)^{2}\right)

So, my code to store data into variance shadow map is :

moments.x = distance(position, / positionRadius.w;
float dx = dFdx(moments.x);
float dy = dFdy(moments.x);
moments.y = 0.25 * (dx * dx + dy * dy);

And to compute shadow factor :

vec2 moments = texture(depthCube, lightToVertex).xy;
float variance = max(moments.y, 0.005);
float diff = dist - moments.x;
float p = variance / (variance + diff * diff);
return smoothstep(0.5, 1.0, 2 * (p - 0.5));

Capture d'écran de 2014-12-28 20:29:07

No aliasing now :).

The next time, we will talk about reflections with environment map.

See you soon :).

References :

OpenGL-Tutorial Shadow mapping
Omnidirectional shadows
Variance Shadow Maps

Ambient occlusion : explanations

Hello there. I am sorry that I have not written since long time ago, but I had to pass my exams ^^.

So, what is ambient occlusion?


There, we can see respictively : No ambient occlusion, occlusion map, and rendering with ambient occlusion.

So, Ambient occlusion allow to improve shadow in the scene.

Generally, Ambient Occlusion is defined by


Ohhhh, it’s a very difficult formula, with a strange integrale.

We are going to see how we can get one formula in this kind.

Firstly, ambient occlusion is a « ray tracing » technique. The idea behind ambient occlusion is, you launch many rays throughout the hemisphere oriented by normal \overrightarrow{n}

Obviously, we can say that one orthogonal ray to normal is not influent compared to one parallel ray, so, we can introduce a dot product between ray and normal into the integral. We do not perform this integral in all hemisphere, but only in the hemisphere’s surface.

I remind you that the infinitesimal surface of sphere is d\omega = R^{2}\sin(\theta) d\theta d\phi .

So, we can lay down that :

\displaystyle{ ka = K \cdot ka_{total} = \int_{\Omega} V(\overrightarrow{\omega})\cdot \cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\overrightarrow{\omega}}

Where \Omega is the hemisphere oriented by \overrightarrow{n} and \overrightarrow{\omega} is the ray « launched » in the hemisphere, V(\overrightarrow{\omega}) is the view function defined by

\displaystyle{V(\overrightarrow{\omega}) =  \left\{\begin{matrix}  &0&\hspace{1mm}if\hspace{1mm}occluder\\  &1&\hspace{1mm}if\hspace{1mm}no\hspace{1mm}occluder  \end{matrix}\right.}

K is the constant as we have to compute, because ka have to be ranged between 0 and 1. So, now we can try to compute K, with the view function always to 1, because we compute the case with no occlusion is performed.

\displaystyle{\begin{array}{lcl}&&\int_{\Omega} V(\overrightarrow{\omega})\cdot \cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\omega \\  &=&\int_{\Omega}\cos(\overrightarrow{n},\overrightarrow{\omega})\cdot d\omega\\  &=&\int_{0}^{2\pi}\int_{0}^{\frac{\pi}{2}}R^2\cdot\frac{\overrightarrow{n}}{{||\overrightarrow{n}||}}\cdot\frac{\overrightarrow{\omega}}{{||{\overrightarrow{w}||}}}\cdot \sin(\theta)\cdot d\theta d\phi \left(||\overrightarrow{\omega}||=||\overrightarrow{n}|| = R \right ) \\  &=&\int_{0}^{2\pi}\int_{0}^{\frac{\pi}{2}}\cos(\theta)\cdot \sin(\theta)\cdot d\theta d\phi\\  &=&\int_{0}^{2\pi}\frac{1}{2}\cdot d\phi\\  &=&\pi \end{array}}

So, K=\frac{1}{\pi}, so, we have the same expression of the first integral in the beginning of this article :

\displaystyle{ka=\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{\omega})\cdot \cos(\theta)\cdot d\omega}

For people who like so much rigourous mathematics, I know it’s not very rigourous, but it’s with this « method » that we will compute our occlusion factor :).
If you prefer a more accurate technique, you can integrate in all Hemisphere (with the variable radius and take a view function who return one value between 0 and 1 according to the distance of occluder from the origin of ray) and, you get exactly the same formula cause you do one thing like this : \frac{1}{R}\int_{0}^{R} dr = 1 with \frac{1}{R} is from the View function to limit the return value from 0 to 1.

So, now, we can try to approximate this integral. We have two problems, we can’t perform « launching » infinite rays, so, we launch only few rays, and we have to use a « inverse » of view function \bar{V}

\displaystyle{\begin{array}{lcl}&&\frac{1}{\pi}\int_{\Omega}V(\overrightarrow{\omega})\cdot\overrightarrow{n}\cdot\overrightarrow{\omega}\cdot d\omega \\  &=&1-\frac{1}{\pi}\int_{\Omega}\bar{V}(\overrightarrow{\omega})\cdot\overrightarrow{n}\cdot\overrightarrow{\omega}\cdot d\omega\\  &\approx&1-\frac{1}{N}\sum_{\mathbb{N}}\bar{V}(\overrightarrow{\omega})\cdot\overrightarrow{n}\cdot\overrightarrow{\omega} \end{array}}

After that, we can blur the occlusion map to improve its rendering.

So we just use a simple blur.

#version 440 core

/* Uniform */
#define CONTEXT 0
#define MATRIX 1
#define MATERIAL 2
#define POINT_LIGHT 3

layout(local_size_x = 256)in;

layout(shared, binding = CONTEXT) uniform Context
    uvec4 sizeScreenFrameBuffer;
    vec4 posCamera;
    mat4 invProjectionViewMatrix;

layout(binding = 4) uniform sampler2D AO;
layout(binding = 4, r32f) uniform image2D imageAO;

void main(void)
    float blur = texture(AO, vec2(gl_GlobalInvocationID.xy) /;

    for(int i = 4; i > 0; --i)
        blur += texture(AO, vec2((ivec2(gl_GlobalInvocationID.xy) + ivec2(-i, 0))) /;

    for(int i = 4; i > 0; --i)
        blur += texture(AO, vec2((ivec2(gl_GlobalInvocationID.xy) + ivec2(i, 0))) /;

    imageStore(imageAO, ivec2(gl_GlobalInvocationID.xy), vec4(blur / 9, 0, 0, 0));

Now, we have to code our ambient occlusion.


This picture is very good to understand how we can use our approximation.
Indeed, there, we can see that if the point is red, we have V(\overrightarrow{\omega}) = 1, so \bar{V}\overrightarrow{\omega}) = 0 You just have to test the depth buffer to know if the point is occluder or no.

#version 440 core

/* Uniform */
#define CONTEXT 0
#define MATRIX 1
#define MATERIAL 2
#define POINT_LIGHT 3

layout(shared, binding = CONTEXT) uniform Context
    uvec4 sizeScreenFrameBuffer;
    vec4 posCamera;
    mat4 invProjectionViewMatrix;

layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 1) uniform sampler2D position;
layout(binding = 2) uniform sampler2D normal;
layout(binding = 3) uniform sampler2D distSquare;

writeonly layout(binding = 4, r16f) uniform image2D AO;

void main(void)
    float ao = 0.0;

    const ivec2 texCoord = ivec2(gl_GlobalInvocationID.xy);
    const vec2 texCoordAO = vec2(texCoord) /;

    vec3 positionAO = texture(position, texCoordAO).xyz;
    vec3 normalAO = texture(normal, texCoordAO).xyz;
    float distSquareAO = texture(distSquare, texCoordAO).x;

    for(int j = -2; j < 3; ++j)
        for(int i = -2; i < 3; ++i)
            vec2 texCoordRay = vec2(texCoord + ivec2(i, j)) /;

            vec3 positionRay = texture(position, texCoordRay).xyz;
            float distSquareRay = texture(distSquare, texCoordRay).x;

            float c = dot(normalAO, normalize(positionRay - positionAO));

            if(c < 0.0)
                c = -c;

            if(distSquareRay < distSquareAO)
                ao += c;

    imageStore(AO, texCoord, vec4((1 - ao / 25), 0.0, 0.0, 0.0));

Strangely, in this case, if I use shared memory, I have badder result than just texture. Maybe the allocation of shared memory is longer and it’s not very efficient here :-).

I advise you to use texture instead of imageLoad, indeed, I get 8 times performance better with texture ^^.

Bye :). The next time, we will talk about shadows !

Return to Rasterization, Lighting, and Ambient Occlusion

Hello there. I have not written since few weeks, so I’m sorry.

So, now, I’m going to explain why I leave the ray tracing concept for the rasterization (again?).

To have an efficient ray tracing, you need a good structure with tree, but, the problem is that having a tree on GPU is not easy. Indeed, GPU does not have a call stack and pointer.

So, for the rasterization part, I keep the same model, with 4 textures (Diffuse, pos, normal, distSquare) into FrameBuffer.

Now, to add lighting, I can add another FBO with another texture.

This technique is called : Deferred lighting (or shading). The main advantage of this technique, is, instead of computing lighting in all pixel in the screen, you compute lighting only in pixels that are affected by lighting.

For example, if your lights are placed in the center of your screen, with a radius of 100 pixels, your lighting is computing of 10 000 pixels instead of 2 000 000 (in FULL HD).

So. I am going to explain how to create a deferred lighting for point light. Indeed, to have a spot lights, or globe light, It’s more difficult. Really, I don’t know for globe lights, but for spot light, you have to create a cone, with a good « angle ». If you want to know more about globe lights, you can go here Nvidia GPU Gems 3 : Tabula Rasa .

So, it exists many techniques about deferred lighting. I am going to explain how my engine works.

I began to introduce how my light’s system it’s implemented.

/* PointLight structure */
struct PointLight
    mat4 projectionViewModel; // Matrix for light
    vec4 positionRadius;
    vec4 color;

The matrix projectionViewModel is a matrix used to « project » our lights in the screen to avoid compute lighting in pixels « useless ».

Remember, I saw on the last article, I store my Positions and normals in Frame Buffer.

Now, I bind these textures, and I draw a cube which own my lights (I use only point lights), and configure your cube.

Wait, What is a configuration for the cube?? It’s only the position of your light, with the radius.

ptr->projectionViewModel = projectionView * scale(translate(mat4(1.0f),, vec3(light.positionRadius.w));

Now I can draw my cube and do computing.

Yes but, I have a little problem. Indeed, a cube have « two opposite faces », so if I am out of your lights, my lighting is computing two times and it don’t provide a good result.





You can see that I have one light more powerful when I am out instead in ^^.

So, how can I solve this issue? Simply on using the Stencil Buffer.
You clear your buffer with 0, and if any draw is make, you increment this value only if this value is 0. So when you compute once time on one pixel, you can’t perform another computing in the same pixel.

glEnable(GL_STENCIL_TEST); // Active stencil test
glStencilFunc(GL_EQUAL, 0, 0xFF); // Pass only if equal to 0

// Increment only if pass

So, now, I can introduce computing 😀 .

Currently, my lighting algorithme is very simple, but I will complicate that later (with quadratique attenuation, normal / height map and other).

Currently, I use a linear attenuation and a famous \displaystyle \overrightarrow{n} \cdot \overrightarrow{l}. :

float computeFactorLight(vec3 posToLight, vec3 posToLightNorm, vec3 normal, float distToLight, float radius)
 float attenuation = 1.0 - distToLight / radius;

 if(attenuation <= 0.0)
 return 0.0;

 float nDotL = dot(normal, posToLightNorm);

 if(nDotL <= 0.0)
 return 0.0;

 return nDotL * attenuation;

The name of this article own a term : Ambient Occlusion

It is the Ambient Occlusion map with a average blur :

Capture du 2014-11-20 00:09:14

But, I will talk about that in the next article. It is planned to talk about calculation, and optimisation. It’s not the best formula, and not the best technique, but I will explain how can I get a little formula, and how can I use this formula.

Spheres, Planes, Deferred Shading, compute shaders

Hello there, today, we are going to talk about too many things.

Firstly, we are going to show you how our engine work, and how to compute a ray sphere, or ray plane intersection.

We are going to talk about optimization too.

So, to begin, we are going to explain how our engine is build.

We use a technique called Deferred Shading.

Into our GBuffer (it’s the name of buffer used in deferred shading), we have 4 + 1 textures.


We can see that we have 4 important textures, and one less important.

So, now, I’m going to explain what their utilities are.

The color Texture is the basic texture that contains color of each pixel. (RGBA8 : normalized unsigned byte per component)
The position texture is a texture that contains positions of each object projected in the screen. (RGBA32F : float per component)
The normal Texture is a texture that contains normal of each object projected on the screen. (RGBA16_SNORM : normalized signed short per component)
The Distance to the Camera Texture that contains the distance of camera for each position written in the position texture. The same applies to the depth map. (GL_R32F : float for one component).

For example, here are 3 screenshots of Color, Position, and Normal textures.

Capture d'écran de 2014-11-04 19:05:59 Capture d'écran de 2014-11-04 19:09:52 Capture d'écran de 2014-11-04 19:11:43

So, now, we are going to explain the multiple passes in our rendering algorithm.

The first pass, is a rasterization pass, we use a vertex and fragment shaders to do this.
As a reminder, vertex is a processor of vertex, it is invoked one time for each vertex (if you have 1 000 triangles, it’s invoked 3 000 times), and fragment shaders are invoked once time for each pixel.

Our engine works in world space, so, positions, normals are in world position.

To rasterize, we use a simple shader with Frame Buffers (which replace screen, or multiple screens)

So, now, we will explain the maths about Ray Tracing.

As a reminder, here are the respective equations of plane, and spheres.

\left\{\begin{matrix} a\cdot x + b\cdot y + c\cdot z + d &=& &0& \\  (x-pos.x)^2 + (y-pos.y)^2 + (z-pos.z)^2 &=& &R^2& \Leftrightarrow &<xyz - pos, xyz - pos> = R^2&  \end{matrix}\right.

So, your ray begin at position ro (position of Camera), and have a  rd direction.

\left\{\begin{matrix}  x &=& ro.x + t\cdot rd.x \\  y &=& ro.y + t\cdot rd.y \\  z &=& ro.z + t\cdot rd.z & &  \end{matrix}\right.

Now, you just can solve for t the above equations, and you have a position of the intersection object.

But, how can I get a rd vector??

It’s really easy.

Projection\cdot View\cdot rd = (x,y,1.0,1.0) \\  \Leftrightarrow rd = (Projection\cdot View)^{-1}\cdot(x,y,1.0,1.0)

The signification of « 1.0 » for the z component here is : « far plane ».

Now, to optimize the sphere computing, you only have to draw one cube with rasterization, and, in a shader, you can do a computing, so, instead of computing of all your pixels, you compute only in a little area in your screen. It’s a power of deferred shading.

Ohhhh, I have a problem : Spheres are cut …

Capture d'écran de 2014-11-04 22:23:09

It’s because you have to use a infiniteProjection (glm see around of infinitePerspective).

For the plane, my advise is to use Compute Shaders instead Fragment shaders. Indeed, if you put plane in shared memory (equivalence to L1 cache), you obtain a little gain in performance.
Indeed, shared memory is about 100 times faster than global memory, so if you have a big number of accesses of your buffers, it’s better to use shared memory ^^.

Now, we can draw lights and shadows ^^.

But it will be for the next article ^^.

Bye 🙂 .