World space is a very useful intermediary between camera space and model space. It makes it easy to position cameras and so forth. But there is a lingering issue when dealing with world space directly. Namely, the problem of large worlds and numerical precision.

Let us say that you're trying to model a very large area down to fairly small accuracy. Your units are inches, and you want precision to within 0.25 inches. You want to cover an area with a radius of 1,000 miles, or 63,360,000 inches.

Let us also say that the various pieces of this world all have their own model spaces and are transformed into their appropriate positions via a model-to-world transformation matrix. So the world is assembled out of various parts. This is almost always true to some degree.

Let us also say that, while you do have a large world, you are not concerned about
rendering *all* of it at any one time. The part of the world you're
interested in is the part within view from the camera. And you're not interested in
viewing incredibly distant objects; the far depth plane is going to cull out the world
beyond a certain point from the camera.

The problem is that a 32-bit floating-point number can only hold about 7 significant decimal digits. So towards the edges of the world, at around 63,000,000 inches, you only have precision to about ±10 inches at best. This means that vertex positions closer together than this will not be distinct from one another. Since your world is modeled down to 0.25 inches of precision, this is a substantial problem. Indeed, even at 6,000,000 inches, ten times closer to the origin, you still have only about ±1 inch, which is coarser than the tolerance you need.

One solution, if you have access to powerful hardware capable of OpenGL 4.0 or better, is to use double-precision floating-point values for your matrices and shader values. Double-precision floats, 64 bits in size, give you about 16 digits of precision, which is enough to measure the size of atoms in inches at more than 60 miles away from the origin.

However, you would be sacrificing a lot of performance to do this. Even though the
hardware *can* do double-precision math, it loses quite a bit of
performance in doing so (anywhere between 25% and 75% or more, depending on the GPU).
And why bother, when the real solution is much easier?

Let's look at our shader again.

```glsl
#version 330

layout(location = 0) in vec4 position;

uniform mat4 cameraToClipMatrix;
uniform mat4 worldToCameraMatrix;
uniform mat4 modelToWorldMatrix;

void main()
{
    vec4 worldPos = modelToWorldMatrix * position;
    vec4 cameraPos = worldToCameraMatrix * worldPos;
    gl_Position = cameraToClipMatrix * cameraPos;
}
```

The `position` input is relatively close to the origin, since model coordinates tend to be close to the model-space origin. So you have plenty of floating-point precision there. The `cameraPos` value is also close to the origin. Remember, the camera in camera space is *at* the origin. The world-to-camera matrix simply transforms the world to the camera's position. And as stated before, the only parts of the world that we are interested in seeing are the parts close to the camera. So there's quite a bit of precision available in `cameraPos`.

And in `gl_Position`, everything is in clip space, which is again relative to the camera. While you can have depth buffer precision problems, those only happen at far distances from the near plane. Again, since everything is relative to the camera, there is no precision problem.

The only precision problem is with `worldPos`. Or rather, in the `modelToWorldMatrix`.

Think about what `modelToWorldMatrix` and `worldToCameraMatrix` must look like, regardless of the precision of the values. The model-to-world transform would have a massive translational component: we're moving from model space, which is close to the origin, to world space, which is far away. However, almost all of that translation will be immediately *negated*, because everything you're drawing is close to the camera. The camera matrix will have another massive translational component, since the camera is also far from the origin.

This means that, if you *combined* the two matrices into one, you
would have one matrix with a relatively small translation component. Therefore, you
would not have a precision problem.

Now, 32-bit floats on the CPU are no more precise than on the GPU. However, on the CPU
you are guaranteed to be able to do double-precision math. And while it is slower than
single-precision math, the CPU is not doing as many computations. You are not doing
vector/matrix multiplies per vertex; you're doing them per *object*.
And since the final result would actually fit within 32-bit precision limitations, the
solution is obvious.

The take-home point is this: avoid presenting OpenGL with an explicit model-to-world
matrix. Instead, generate a matrix that goes straight from model space to
*camera* space. You can use double-precision computations on the CPU to do
this if you need to; simply convert the result back to single precision when uploading
the matrix to OpenGL.