Wednesday, November 30, 2011

Matching calibrated cameras with OpenGL

When working with calibrated cameras it is often useful to be able to display things on screen for debugging purposes.  However, the camera model used by OpenGL is quite different from the calibration parameters produced by, for example, OpenCV.  The linear parameters that OpenCV provides are the following:

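$$K = \begin{bmatrix} \alpha & s & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $s$ is the skew between the x and y axes, $(u_0, v_0)$ is the image principal point, and $\alpha$ and $\beta$ are the focal length $f$ multiplied by the scale factors relating pixels to distance along the x and y axes respectively.  Multiplying a point by this matrix and dividing by the resulting z-coordinate then gives the point projected into the image.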

The OpenGL parameters are quite different.  Generally the projection is set using the glFrustum command, which takes the left, right, bottom, top, near and far clip plane locations as parameters and maps these into "normalized device coordinates", which range over [-1, 1].  The normalized device coordinates are then transformed by the current viewport, which maps them onto the final image plane.  Because of these differences, obtaining an OpenGL projection matrix which matches a given set of intrinsic parameters is somewhat complicated.
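For reference, the viewport step can be sketched as follows. The helper name ndc_to_window is hypothetical; the viewport array is assumed to hold the same {x, y, width, height} values that are passed to glViewport.

```cpp
#include <array>

// Hypothetical helper: map normalized device coordinates in [-1, 1] to
// window coordinates for a viewport given as {x, y, width, height}.
std::array<double, 2> ndc_to_window(const std::array<int, 4> &viewport,
                                    double x_ndc, double y_ndc) {
    // -1 maps to the lower-left corner of the viewport, +1 to the upper-right.
    const double xw = viewport[0] + 0.5 * (x_ndc + 1.0) * viewport[2];
    const double yw = viewport[1] + 0.5 * (y_ndc + 1.0) * viewport[3];
    return { xw, yw };
}
```

With a 640x480 viewport anchored at the origin, the NDC origin lands in the middle of the image and the corners (-1, -1) and (1, 1) land on the image corners.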

Roughly following this post (update: a much-improved version from Kyle, the post's author, is available here), the following code will produce an OpenGL projection matrix and viewport.  I have tested this code against the OS-X OpenGL implementation (using gluProject) to verify that, for randomly generated intrinsic parameters, the corresponding OpenGL frustum and viewport reproduce the x and y coordinates of the projected point.  The code works by multiplying a perspective projection matrix by an orthographic projection to map into normalized device coordinates, and setting the appropriate box for the glViewport command.

/**
 @brief basic function to produce an OpenGL projection matrix and associated viewport parameters
 which match a given set of camera intrinsics. This is currently written for the Eigen linear
 algebra library, however it should be straightforward to port to any 4x4 matrix class.
 @param[out] frustum Eigen::Matrix4d projection matrix.  Eigen stores these matrices in column-major (i.e. OpenGL) order.
 @param[out] viewport 4-component OpenGL viewport values, as might be retrieved by glGetIntegerv( GL_VIEWPORT, &viewport[0] )
 @param[in]  alpha x-axis focal length, from camera intrinsic matrix
 @param[in]  beta  y-axis focal length, from camera intrinsic matrix
 @param[in]  skew  x and y axis skew, from camera intrinsic matrix
 @param[in]  u0 image origin x-coordinate, from camera intrinsic matrix
 @param[in]  v0 image origin y-coordinate, from camera intrinsic matrix
 @param[in]  img_width image width, in pixels
 @param[in]  img_height image height, in pixels
 @param[in]  near_clip near clipping plane z-location, can be set arbitrarily > 0, controls the mapping of z-coordinates for OpenGL
 @param[in]  far_clip  far clipping plane z-location, can be set arbitrarily > near_clip, controls the mapping of z-coordinates for OpenGL
*/
void build_opengl_projection_for_intrinsics( Eigen::Matrix4d &frustum, int *viewport, double alpha, double beta, double skew, double u0, double v0, int img_width, int img_height, double near_clip, double far_clip ){
    // These parameters define the final viewport that is rendered into by
    // the camera.
    double L = 0;
    double R = img_width;
    double B = 0;
    double T = img_height;
    // near and far clipping planes, these only matter for the mapping from
    // world-space z-coordinate into the depth coordinate for OpenGL
    double N = near_clip;
    double F = far_clip;
    // set the viewport parameters
    viewport[0] = L;
    viewport[1] = B;
    viewport[2] = R-L;
    viewport[3] = T-B;
    // construct an orthographic matrix which maps from projected
    // coordinates to normalized device coordinates in the range
    // [-1, 1].  OpenGL then maps coordinates in NDC to the current
    // viewport
    Eigen::Matrix4d ortho = Eigen::Matrix4d::Zero();
    ortho(0,0) =  2.0/(R-L); ortho(0,3) = -(R+L)/(R-L);
    ortho(1,1) =  2.0/(T-B); ortho(1,3) = -(T+B)/(T-B);
    ortho(2,2) = -2.0/(F-N); ortho(2,3) = -(F+N)/(F-N);
    ortho(3,3) =  1.0;
    // construct a projection matrix, this is identical to the
    // projection matrix computed for the intrinsics, except an
    // additional row is inserted to map the z-coordinate to
    // OpenGL.
    Eigen::Matrix4d tproj = Eigen::Matrix4d::Zero();
    tproj(0,0) = alpha; tproj(0,1) = skew; tproj(0,2) = u0;
                        tproj(1,1) = beta; tproj(1,2) = v0;
                                           tproj(2,2) = -(N+F); tproj(2,3) = -N*F;
                                           tproj(3,2) = 1.0;
    // resulting OpenGL frustum is the product of the orthographic
    // mapping to normalized device coordinates and the augmented
    // camera intrinsic matrix
    frustum = ortho*tproj;
}

The code uses the Eigen linear algebra library, which conveniently stores matrices in column-major order, so applying the resulting frustum matrix (with GL_PROJECTION as the current matrix mode) is as simple as:
glLoadMatrixd( &frustum(0,0) );


Maikon said...


How about modelview matrix? What do you think of this matrix?
|R t|
|0 1|

where R is 3x3 rotation matrix and t is a translation vector (t= -RC)

James Gregson said...


The focus of this post was really on just the projection matrix, but matching the modelview matrix is fairly straightforward if your cameras are calibrated.

For example, OpenCV has functions that will estimate the modelview parameters for the camera (camera extrinsic parameters) from checkerboard patterns in the images. These extrinsics are just the rotation and translation matrices you're referring to.

Hope this is helpful,


Maikon said...

Thank you for the answer. My cameras are calibrated. I have a rotation matrix R (3x3) and a translation vector t. My question is how I can convert these to a 4x4 modelview matrix for use in OpenGL. Do you know how?

Another question: why are both tproj(2,2) and tproj(2,3) negative?


James Gregson said...

Sorry for the delay,

To generate the modelview matrix you simply store your 3x3 rotation matrix R in the top-left submatrix and the translation t in the first three rows of the right column, and set the bottom row to [0,0,0,1].

Note that OpenGL stores the data by column first, then by row, so the entry at position [row,col] in the matrix is at index row+col*4.
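A minimal sketch of that packing, assuming R is indexed as R[row][col] (the function name build_modelview is just for illustration):

```cpp
#include <array>

// Hypothetical helper: pack a 3x3 rotation R (indexed R[row][col]) and a
// translation t into a column-major 4x4 array, i.e. entry [row,col] is
// stored at index row + col*4.
std::array<double, 16> build_modelview(const double R[3][3], const double t[3]) {
    std::array<double, 16> m{};  // all entries start at zero
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            m[row + col * 4] = R[row][col];  // rotation in the top-left 3x3 block
        }
        m[row + 3 * 4] = t[row];             // translation in the right column
    }
    m[3 + 3 * 4] = 1.0;                      // bottom row is [0,0,0,1]
    return m;
}
```

The returned array can then be passed as build_modelview(R, t).data() to glLoadMatrixd while GL_MODELVIEW is the current matrix mode.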

I don't recall offhand why the negatives are there except to make the frustum point the right way, i.e. so the camera looks forward rather than backwards.


Maikon said...

thank you very much James

Kyle said...

Hi, James. I'm glad my blog post was helpful. If you're interested, I've written an updated version of that post that better explains how to set the parameters to glOrtho and discusses why the third column is negated. It is available here:

James Gregson said...

Thanks! I've added a link to your new post. Great work by the way, the original post was the only useful reference that I was able to find and your new post improves on it considerably!

skylook said...

Seems that there is a difference between your implementation and Kyle's:
In his article top = 0.0 and bottom = height, but yours top = height and bottom = 0.0.
Does that matter?

b4silio said...

@skylook swapping the bottom and top is used to invert the Y axis depending on how you're working on your original space. (Basically: if things are upside down, then you probably need to swap the two of them).