Friday, August 10, 2012

Optical Tomography Setup

As noted in a previous post, my Stochastic Tomography paper was accepted to SIGGRAPH 2012.  Last Tuesday I was in Los Angeles for the conference to present the paper, which included presenting the synopsis in the conference 'fast-forward' to an appallingly large audience.  The photo below shows the seating, but during the actual event it was standing room only.

A bit nerve-wracking, to say the least.  However, it went well, and now that I have presented my main talk to a MUCH smaller crowd, I'd like to post some photos of the setup that we used for the paper.  I should point out that this project did not contribute much to the capture setup; it was already in place from work done by Borislav Trifonov, Michael Krimerman, Brad Atcheson, Derek Bradley and a slew of others.  My work on this paper focused primarily on the algorithms, but I thought people might be interested in a quick overview of the tomography capture apparatus.

The goal of the paper was to build 3D animated models of mixing liquids from multiview video.  To accomplish this, we used an array of roughly 16 Hi-Def Sony camcorders arranged in a semi-circle around our capture volume to record video streams of the two fluids mixing.

You can see the cameras in the photo above, all focused on the capture volume which is inside the glass cylinder.  Each of these records a video of the mixing process, producing 16 streams of video that look more or less like the photo shown below:

You can see one of the cameras peeking out at the right side of the frame.  The cameras are controlled by an Arduino-based box that talks to each camcorder using the SONY LANC protocol.  This is an unpublished protocol used by SONY editing consoles; however, it has been reverse-engineered by others to allow people to control SONY equipment.  We implemented this protocol on an Arduino, which allows us to start all the cameras recording, turn them on and off as arrays, switch them between photo and video mode and so on.  Unfortunately we can't easily set the exposure levels or transfer files to and from the device.  Instead we have to painstakingly do this by hand through the on-camera menus, which is error-prone and time-consuming.

The two fluids we use are water, for the clear liquid, and fluorescein sodium, a fluorescent dye, for the mixing liquid.  This dye is available as a powder that is soluble in water, which allows us to perform several types of capture.  The image above is dye powder dropped onto the surface of the water; this mixes with the water and is slightly denser, forming the Rayleigh-Taylor mixing process you see in that shot.  We can also pre-mix the dye powder and simply pour or inject it into the domain; this was the process used for the following two captures that were used in the paper.

This shows an unstable vortex propagating downwards, leaving a complex wake.  I recommend watching in high-def (720p or 1080p).  The next is alcohol mixed with the dye powder, injected into the cylinder from the bottom.  Since alcohol is less dense than water, it rises under buoyancy, mixing as it goes.

In the video above you can see a laminar to turbulent transition as well as lots of complex eddies that form as part of the mixing process.

The captures are illuminated with a set of white LED concert strobe panels.  These panels serve two purposes.  First, they let us get LOTS of light into the scene in a controlled fashion.  Second, we actually use a strobed illumination at about 30Hz to optically synchronize the cameras and remove rolling-shutter shear effects.

All captures start in darkness so we can tell the time offset from the start of the video to the first frame where there is significant illumination.  In fact we can do better than alignment to a single frame, since with the rolling shutter used by these cameras, we can actually determine the first scanline that is exposed.  Using a 30Hz illumination pattern, we can also determine the exposure setting of the camera by looking for the last scanline before the light goes off again.
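The idea of locating the first lit scanline can be sketched roughly as follows.  This is only an illustration, not the code used for the paper: it assumes each frame has already been reduced to one mean brightness value per scanline, and that the sensor reads scanlines out at a uniform rate over the whole frame period, which may not hold exactly for a real camcorder.  The function names and the threshold are made up for the example.

```python
def first_lit_scanline(frames, threshold):
    """Return (frame_index, row_index) of the first scanline above threshold.

    `frames` is a list of frames, each a list of per-scanline mean
    brightness values (a hypothetical, precomputed representation).
    Returns None if no scanline is ever brighter than the threshold.
    """
    for frame_index, rows in enumerate(frames):
        for row_index, brightness in enumerate(rows):
            if brightness > threshold:
                return frame_index, row_index
    return None


def start_offset_seconds(frame_index, row_index, rows_per_frame, fps=30.0):
    """Convert a (frame, scanline) position into a time offset in seconds.

    ASSUMPTION: scanlines are read out uniformly across the full frame
    period, so one scanline corresponds to 1 / (rows_per_frame * fps) s.
    """
    return (frame_index + row_index / rows_per_frame) / fps
```

With a toy video of two 4-scanline frames where the light first appears on the third scanline of the second frame, this gives a sub-frame offset of one and a half frame periods.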

We then have a rolling shutter compensation program that scans through each video and reassembles a new video from the exposed and dark scanlines.  The result is a set of videos that are optically synchronized and that have minimal shearing introduced.
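To give a feel for what such a reassembly might look like, here is a heavily simplified sketch.  It is not our actual compensation program.  It assumes that a single strobe flash lights the scanlines from some split row to the bottom of one frame and the scanlines above the split row in the next frame, and that this split row is the same for the whole video.  Both are simplifying assumptions, and the function name is made up.

```python
def reassemble(frames, split_row):
    """Stitch consecutive frames so each output frame comes from one flash.

    `frames` is a list of frames, each a list of scanlines.
    ASSUMPTION: a flash exposes rows [split_row:] of frame k and rows
    [:split_row] of frame k + 1, with a constant split_row.
    """
    output = []
    for current, following in zip(frames, frames[1:]):
        # Top part was exposed one frame later than the bottom part.
        output.append(following[:split_row] + current[split_row:])
    return output
```

Note that the output has one frame fewer than the input, since the last input frame has no successor to borrow its top scanlines from.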

This gives us a set of input data; however, we also need to perform some geometric calibration of the scene in order to know from what angle each video was recorded and to be able to obtain the ray inside of the capture volume that corresponds to every observed pixel.

To do this, we use an optical calibration library called CalTag that detects self-identifying marker patterns similar to QR codes in images.  We print a calibration target on overhead transparencies and mount this pattern to a 3D printed calibration jig that is placed in the glass cylinder.

This jig fits tightly in the cylinder and is registered with a set of detents that fit into recesses in a registration plate that is glued to the inside of the capture cylinder.  The marker pattern that you see in the photo above is also registered to a set of registration tabs.  We have a calibrated pattern on the front of the target as shown above, but also on the back.

When a camera takes an image of this jig after placing it into the capture domain (filled with water), an image similar to the following is obtained, although generally with far less blur due to condensation.

CalTag can then give us the corners of the marker patterns, which can be interpolated to associate every image pixel with the corresponding 3D point on the calibration target that it 'sees'.  We then rotate the target 180 degrees to face away from the camera and take a picture of an identical and carefully aligned target on the back side of the jig, giving another 3D point for each pixel.  Connecting the points gives a ray in 3D space inside the cylinder, without having to account for any optical interactions between the interior liquid and cylinder.
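Connecting the two points for a pixel amounts to the following small computation.  This is a minimal sketch in plain Python, with a made-up function name, that assumes both points are already expressed in the same 3D coordinate frame.

```python
import math


def ray_through(front_point, back_point):
    """Return (origin, unit_direction) of the ray from front to back point.

    Both points are 3D coordinates on the front and back calibration
    targets that were seen by the same camera pixel.
    """
    delta = [b - f for f, b in zip(front_point, back_point)]
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0.0:
        raise ValueError("front and back points coincide; no ray defined")
    return tuple(front_point), tuple(c / length for c in delta)
```

For example, the points (0, 0, 0) and (0, 3, 4) give a ray starting at the origin with direction (0, 0.6, 0.8).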

We do this for every camera by mounting the capture domain on a rotation stage, which is again controlled by an Arduino.  An automated calibration procedure rotates the stage and triggers each camera to image the front calibration plane, then rotates an additional 180 degrees to repeat the process.  The whole mess is controlled by a Python script using pySerial, including the strobes, the rotation stage and the embedded camera controller.
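The overall shape of such a calibration sweep might look something like the sketch below.  The `ROTATE` and `TRIGGER` text commands are entirely hypothetical stand-ins, not the commands our Arduino firmware actually understands, and the per-camera stage angles are assumed to be known in advance.  The `port` argument is anything with a `write(bytes)` method, such as an open pySerial `serial.Serial` object.

```python
def calibration_commands(camera_angles):
    """Build the command sequence for a front pass and a back pass.

    `camera_angles` lists, per camera, the stage angle in degrees at which
    the front target faces that camera (assumed known).
    NOTE: the ROTATE/TRIGGER command strings are hypothetical.
    """
    commands = []
    for flip in (0.0, 180.0):  # front target first, then back target
        for camera, angle in enumerate(camera_angles):
            stage_angle = (angle + flip) % 360.0
            commands.append(f"ROTATE {stage_angle:.1f}\n".encode("ascii"))
            commands.append(f"TRIGGER {camera}\n".encode("ascii"))
    return commands


def run_calibration(port, camera_angles):
    """Send the command sequence to an already-open serial-like port."""
    for command in calibration_commands(camera_angles):
        port.write(command)
```

Keeping the command generation separate from the port makes it easy to inspect the sequence without any hardware attached.  A real script would also need to wait for the stage to settle before each trigger.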

This gives us the needed calibration data to express our scene as a tomographic inverse problem.  Here we look for the scene content that would reproduce the measurements (videos) we obtained, given a physical model for the scene.  In this capture case, the scene is simply emissivity adding up along a ray-path, so we get a linear inverse problem, which we solve using our new Stochastic Tomography algorithm.  The result is volumetric 3D fields that you can animate, inspect, slice through and re-render however you like, as seen below in the submission video.
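To make the "emissivity adding up along a ray-path" model concrete, here is a toy version of the forward model only, not the Stochastic Tomography solver itself.  It assumes the emissivity field is given as a plain Python function of a 3D point and approximates the line integral with evenly spaced midpoint samples.  The names and the sampling scheme are illustrative choices.

```python
def integrate_emission(emissivity, origin, direction, length, num_samples):
    """Approximate the pixel value as the sum of emissivity along a ray.

    `emissivity` is a callable mapping a 3D point to a non-negative value.
    `direction` is assumed to be a unit vector; `length` is how far the
    ray travels through the capture volume.
    """
    step = length / num_samples
    total = 0.0
    for i in range(num_samples):
        t = (i + 0.5) * step  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        total += emissivity(point) * step
    return total
```

Because the result is a weighted sum of emissivity values, doubling the emissivity everywhere doubles every pixel value, which is what makes the inverse problem linear.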

Stochastic Tomography and its Applications in 3D Imaging of Mixing Fluids from al. et Hullin et al. on Vimeo.
