Saturday, December 4, 2010

Graphics fundamentals - the why of "4-D" (homogenous) coordinates

(Note: This is not meant to be a comprehensive list of the why's of homogenous coordinates, just some of the bigger ones)

Apparently computer graphics is a bit of a black art when it comes to why things are represented the way they are, even if you "know" basic linear algebra.

This is a response to http://shawnpresser.blogspot.com/2010/10/how-to-become-game-programmer.html by way of http://www.mauriziopz.eu/from0togame/.

In this article the only prerequisite is knowing matrix multiplication is not commutative(i.e. order matters in multiplication of matrices) and how to multiply and add matrices.

If you've never been exposed to OpenGL or D3D you might assume everything is done with 3x3 matrices, but you'd be wrong.  Actually, everything is done in homogenous ("4-D") coordinates.  Now why the hell would you use "4-D" coordinate when you're going to be projecting 3-D objects onto a 2-D screen?  Well, it simplifies the math, that's why.

Suppose you have a line segment (from pt A=(1,1,1) to B=(0,0,0)) in space and you want to stretch it along the y-axis and move it out one unit from the origin on the x-axis. Well, using normal 3-D coordinates you'd need your scaling/stretching matrix (S) and your translation vector (T)

Translation is pretty straight forward just add how much you want to move it on each axis (negative for moving it closer to the origin, positive for further away).

T = (1, 0, 0)

The basic scaling matrix looks like this

  [stretches x 0 0]
S=[0 stretches y 0]
  [0 0 stretches z]

We'll stretch it along y by double, so we just make the second term 2 and the rest 1.

  [1 0 0]
S=[0 2 0]
  [0 0 1]

Then you apply it to your points (A and B).

(First stretch then move)
New A = SA + T and
New B = SB + T

Well, what's wrong with that? It seems simple enough. And it is until you realize that most graphics objects are going to be significantly more than two points and you need to apply those transformations (and usually many more) to each point in the same order (because remember multiplication is not commutative for matrices).

Now how can we make this process more fool proof? Well, if we had a way to represent all our transformations as a single object that would make it much easier to not make mistakes. And hey, while we're at it let's only have to worry about one type of object, matrices, for those transformations (rather than matrices and vectors/points).  It turns out that the 4x4 matrix space takes care of both of these problems for us. And for free it throws in elimination of an operation, addition. Yup, now we can do all of our transformations as matrix multiplication only.

Going back to our last example if we add another coordinate we have points that looks like P = (x,y,z,w). Now we're going to set w to 1, so terms don't disappear on us as we move our points around. This makes our original points A=(1,1,1,1) and B=(0,0,0,1).

First, in 4x4 space we have to notice that the identity matrix is


   [1 0 0 0]
I= [0 1 0 0]
   [0 0 1 0]
   [0 0 0 1]

The 1 in the bottom row is important to keep things consistant just like having w = 1. As I'll show in a couple of steps.

This makes scaling matrices look like

  [Sx 0  0  0]
S=[0  Sy 0  0]
  [0  0  Sz 0]
  [0  0  0 Sw]

For the most part we'll never scale w because it is just there as a place holder.

Moving our S from 3x3 to 4x4 we get

  [1 0 0 0]
S=[0 2 0 0]
  [0 0 1 0]
  [0 0 0 1]

Now let's put our T in 4x4 as a matrix this time

  [1 0 0 Tx]
T=[0 1 0 Ty]
  [0 0 1 Tz]
  [0 0 0 Tw]

Why are we doing it like this? Well we need to notice that if we multiply a matrix and a point (x1,y1,z1,w1) in 4d we get

[1(x1) + Tx(w1)]   [1 0 0 Tx][x1]
[1(y1) + Ty(w1)] = [0 1 0 Ty][y1]
[1(z1) + Tz(w1)]   [0 0 1 Tz][z1]
[Tw(w1)]           [0 0 0 Tw][w1]

This is almost what we want. Except that w1 is in the way.  Ah, that's why we set w=1. If we do that everything works out right.

Now we can do our formulas again, but this time we need to combine matrices before apply it to the point

New A = (T+S)A and
New B = (T+S)B

I thought you said we could eliminate addition I hear you crying. We can! IF you set Sw = 1 and Tw = 1

[1 0 0 Tx]   [Sx 0  0  0]   [1 0 0 Tx]   [Sx 0  0  0]
[0 1 0 Ty] * [0  Sy 0  0] = [0 1 0 Ty] + [0  Sy 0  0]
[0 0 1 Tz]   [0  0  Sz 0]   [0 0 1 Tz]   [0  0  Sz 0]
[0 0 0  1]   [0  0  0  1]   [0 0 0  1]   [0  0  0  1]

That gives us a single matrix that we can work with now

         [Sx 0  0  Tx]
TS = T*S=[0  Sy 0  Ty]
         [0  0  Sz Tz]
         [0  0  0   1]

Now we can take this TS matrix and apply (multiply) it to each point, rather than having to apply a formula to each point. This saves us coding and it also saves cycles (by only combining our transforms once, rather than each time we apply it to a point).

New A = (TS)A and
New B = (TS)B

Be careful with this example though, even though it doesn't matter what order we did the operation this time that's not true once you start rotating things in space, then it very much matters what order you build your transformation matrix in.

If you want to learn more about this subject let me know and I'll start posting about some more of the fundamentals.  Or you can go to check out this course on fundamental math used for graphics (http://bit.ly/hTJNA3). He keeps the linear algebra as simple as possible. For a more in-depth explination see chapter 4 of Essential Mathematics for Games and Interactive Applications, Second Edition: A Programmer's Guide (A great book, all the math you need for graphics and games, but in general WAAAY more than you need for most projects and it's pretty formal so you better remember your sophomore/junior level linear algebra. Overall, it makes a great reference though).