## Online Graphics Transforms 2: GluLookAt

July 30, 2019

In this final segment of lecture 4, we have all the tools needed to derive gluLookAt, which is a key OpenGL function for viewing. It is immediately relevant to homework 1, but it is really a case study in how you can use all of the ideas so far to derive a relatively complicated 4×4 transformation matrix that positions an arbitrary camera in the world and lets you view objects from it. Because this function positions the camera, it is fundamental to how we look at images.

Let's talk about the parameters for a moment. You have the x, y and z coordinates of the eye, which is where you place your eye, or your camera. You have the x, y and z coordinates of the center, which is the point the camera is looking at. And you have the up vector of the camera, which gives the camera's up direction. The up vector is important for determining which parts of the image lie along the X axis and the Y axis, that is, the orientation of the image with respect to the world. It corresponds to rotating your camera about the viewing direction, or equivalently to how you tilt your head.

The key thing is that this combines many of the concepts discussed in the last two lectures and gives you a practical example of how you solve problems. Deriving the matrix involves 3 main steps.
First, we need to create a coordinate frame for the camera. Then we need to create a rotation matrix corresponding to the camera's coordinate frame. And finally, we need to apply a translation for the camera, or the eye location.

In the basic math lecture, when we talked about vectors and orthonormal basis frames, we introduced the notation and the way in which we can create the coordinate frame. Essentially you have a vector a, which in this case is the difference between eye and center, and a vector b, which is the up direction. But of course these might be neither orthogonal nor of unit norm, and you also need to find the third axis of the coordinate frame. So first you create the w vector by normalizing a, w = a / |a|, and that plays the role of the Z axis. Then you have to remove the components of b that fall along w. Instead of doing that explicitly, a more elegant way is to take the u vector as b cross w, normalized to unit length, u = (b × w) / |b × w|; the cross product automatically removes the appropriate components. And finally, you can create v as w cross u, v = w × u, which is an identity that must hold in an orthonormal coordinate frame. So we go back to this construction that we saw in the basic math lecture.

In order to create the coordinate frame, of course, I have to know which a and b I'm given. I've rewritten what w, u and v are here, but I still need to know what a and b are. So let's go back to our problem statement. We want to position the camera at the origin, looking down the -Z direction. In OpenGL, you always have a camera at the origin looking at (0,0,-1), that is, looking in the -Z direction.
Therefore, you want to move center – eye, the direction in which the camera looks, onto the -Z direction, which means the +Z direction is given by eye – center. And that is the vector a: eye – center. The vector b is then simply the up vector. So those are your two vectors, a and b: eye – center and up. From them, you can compute the vectors u, v and w using the formulae discussed earlier.

The main reason for doing that is that you want to create a coordinate frame and a rotation matrix. So we go back to what we discussed earlier in this lecture, and also in the previous lecture: the rows of the rotation matrix are the 3 unit vectors of the new coordinate frame. Given u, v and w, you know the x, y and z coordinates of u, of v and of w, and the rows of the matrix are the vectors u, v and w. And so you have constructed a rotation matrix for the camera.

The final step is that you need to apply a translation for the camera, or the eye location. And this is a little bit of a trick question.
Do you do the rotation first or the translation first? That's what we talked about earlier in the lecture. The important point to note is that you cannot simply apply this translation after the rotation. Think about it: you have a camera, and you can think about placing it in 2 steps.
First, the camera starts at the origin, looking down the -Z axis. It is moved to the appropriate location in the world, so you have to apply the inverse translation to the world. And then it is oriented appropriately, which corresponds to the rotation. So the translation must come first, to bring the camera to the origin, before the rotation is applied.

How do we combine translations and rotations? Again, I'm just repeating slides from earlier in the lecture. The main point of this segment is to show an easy application of all of the knowledge I've been talking about. If I do the translation first and then the rotation, I end up with an effective translation, which is the rotation matrix times the translation vector. In this case, the translation vector is the
eye coordinate, or really the -eye coordinate, because I have to apply the inverse translation to the world. And the rows of the rotation matrix are the u, v and w vectors of the new coordinate frame.

So I take the rotation matrix, I take the translation by -eye, and I multiply them together. Of course, the rotation part doesn't change, so I'll just talk about how the translational components change. In the first row, I get x_u * -e_x + y_u * -e_y + z_u * -e_z. This is just the dot product of the vector u and the vector e (the eye position), with a negative sign, and therefore the translational components become -u dot e, -v dot e and -w dot e. Of course, do not confuse this w vector with the homogeneous coordinate w, which is just 1 in this case.

In fact, we derived this form in the earlier segments: rotating a particular vector is equivalent to taking its projections onto the new u, v and w. Writing this down explicitly, the 4×4 matrix has rows (x_u, y_u, z_u, -u dot e), (x_v, y_v, z_v, -v dot e), (x_w, y_w, z_w, -w dot e) and (0, 0, 0, 1). And this is the final form of gluLookAt. Some components of this you will need in homework 1.
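
The three steps above (build the camera frame, use u, v and w as the rows of the rotation, and fold in the translation by -eye) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the OpenGL implementation: the names `look_at` and `transform` and the small vector helpers are made up for this example, vectors are plain lists, and degenerate input (for instance an up vector parallel to the viewing direction) is not handled.

```python
import math

# Small vector helpers (hypothetical names, for illustration only).
def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def look_at(eye, center, up):
    """Return the 4x4 viewing matrix as a list of rows."""
    # Step 1: camera frame from a = eye - center and b = up.
    w = normalize(sub(eye, center))   # w = a / |a|
    u = normalize(cross(up, w))       # u = (b x w) / |b x w|
    v = cross(w, u)                   # v = w x u
    # Steps 2 and 3: rows u, v, w form the rotation; translating by -eye
    # first and then rotating gives the last column (-u.e, -v.e, -w.e).
    return [
        [u[0], u[1], u[2], -dot(u, eye)],
        [v[0], v[1], v[2], -dot(v, eye)],
        [w[0], w[1], w[2], -dot(w, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, p):
    """Apply the 4x4 matrix m to the 3D point p (homogeneous coordinate 1)."""
    h = [p[0], p[1], p[2], 1.0]
    return [sum(m[r][c] * h[c] for c in range(4)) for r in range(4)]

# Usage: a camera at (0, 0, 5) looking at the origin with +Y up.
m = look_at([0, 0, 5], [0, 0, 0], [0, 1, 0])
eye_in_camera = transform(m, [0, 0, 5])
center_in_camera = transform(m, [0, 0, 0])
```

In this usage example the eye maps to the origin and the center lands on the -Z axis at distance 5, which matches the convention of a camera at the origin looking down -Z.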

## 3 Replies to “Online Graphics Transforms 2: GluLookAt”

1. Ankur Rathore says:

so abstract, no elaboration, useless

2. Narek Aghekyan says:

Sorry, I could not understand why the translation must come first. What was the argument proving it? Looks like it is possible to do both ways RT or TR but the matrices will be different, right?

At 4:09 we say that the rows of the matrix are the 3 unit vectors of the new coordinate frame. But we came to this conclusion when we were rotating the coordinate frame around its origin (https://www.youtube.com/watch?v=gdoI2nM6Lio). Is this the reason why we translate first, to make sure that we rotate the frame only when the origins coincide?

3. Dren Kajm says:

@raviramamoorthi How would we implement an orbital camera / third-person camera, the one that rotates around a specific point or object?