

Welcome
to the Computer Vision Research Laboratory at UCSB. The name is there for
historical reasons; however, our research agenda and project scope have
broadened to include a diverse set of interesting and challenging topics.
Today, students, visitors, and faculty are engaged in advanced research related
to computer vision, medical image analysis, computer graphics, and
bioinformatics. This page is constantly under construction and contains
descriptions of some sample projects in the Computer Vision Laboratory. If you
desire further information, please contact Professor Yuan-Fang Wang directly.
A Video Analysis Framework
for Soft Biometry Security Surveillance
We propose a distributed, multi-camera video
analysis paradigm for airport security surveillance. We propose to use a new
class of biometric signatures, called soft biometry, which includes a
person's height, build, skin tone, color of shirt and trousers, motion
pattern, trajectory history, etc., to identify and track errant passengers and
suspicious events without having to shut down a whole terminal building and
cancel multiple flights. One might suspect, and we concur, that it is not very
difficult to compute some of these soft biometric signatures from individual,
properly-segmented image frames. The real challenge, however, is in designing a
robust and intelligent video-analysis system to support the reliable
acquisition, maintenance, and correspondence of soft biometry signatures in a
coordinated manner from a large number of video streams gathered in a large
camera network. The intellectual merit of the proposed research is to address
three important video analysis problems in a distributed, multi-camera
surveillance network: sensor network calibration, peer-to-peer sensor data
fusion, and stationary-dynamic cooperative camera sensing.
• Sensor network calibration. In order to correctly correlate and fuse
information from multiple cameras, calibration is of paramount importance.
Cameras deployed in a large network have different physical characteristics,
such as location, field-of-view (FOV), spatial resolution, color sensitivity,
and notion of time. The difference makes answering even simple queries
exceedingly difficult. For example, if a subject moves from the FOV of one
camera to another, which has different color sensitivity and operates under
dissimilar lighting conditions, drastic changes in color signatures do occur. To
reliably compute soft biometry to assist the identification of subjects across
the FOVs of multiple cameras therefore requires careful color calibration. We
have developed and integrated a suite of algorithms for spatial, temporal, and
color calibration for cameras with both overlapped and non-overlapped FOVs.
• Peer-to-peer sensor data fusion. As cameras have limited FOVs, multiple
cameras are often stationed to monitor an extended surveillance area, such as
an indoor arrival/departure lounge or an outdoor parking lot. Collectively,
these cameras provide complete spatial coverage of the surveillance area. (A
small amount of occlusion by architectural fixtures, decorations, and plants
is often unavoidable.) Individually, the event description inferred from a
single camera is likely to be incomplete. (E.g., the trajectory of a vehicle
entering a parking lot is only partially observed from a certain vantage
point.) We have developed algorithms to fuse video data from multiple cameras
for reliable event detection using a hierarchy of Kalman filters.
• Stationary-dynamic cooperative camera sensing. To achieve
effective wide-area surveillance with limited hardware, a surveillance camera is
often configured to have a large FOV. However, once suspicious
persons/activities have been identified through video analysis, selected
cameras ought to obtain close-up views of these suspicious subjects for further
scrutiny and identification (e.g., to obtain a close-up view of the license
plate of a car or the face of a person). Our solution is to employ
stationary-dynamic camera assemblies to enable wide-area coverage and selective
focus-of-attention through cooperative sensing. That is, the stationary cameras
perform a global, wide FOV analysis of the motion patterns in the surveillance
zone. Based on some pre-specified criteria, the stationary cameras identify
suspicious behaviors or subjects that need further attention. The dynamic
camera, mounted on a mobile platform and equipped with a zoom lens, is then
used to obtain a close-up view of the subject to reliably compute soft biometry
signatures. We have studied research issues to enable cooperative camera
sensing, including dynamic camera calibration and stationary-dynamic camera
sensing using a visual feedback paradigm.
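The peer-to-peer fusion idea in the list above can be illustrated with a
minimal sketch. It assumes a scalar state (one coordinate of a subject's
position) and independent Gaussian measurement noise per camera; the function
names and the numbers are hypothetical, and the hierarchy of Kalman filters
mentioned above is not reproduced here, only the basic measurement update that
such a hierarchy would build on.

```python
def kalman_update(mean, var, measurement, meas_var):
    """One scalar Kalman measurement update; returns posterior mean and variance."""
    gain = var / (var + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def fuse_cameras(prior_mean, prior_var, observations):
    """Fold (measurement, variance) pairs from several cameras into one estimate."""
    mean, var = prior_mean, prior_var
    for measurement, meas_var in observations:
        mean, var = kalman_update(mean, var, measurement, meas_var)
    return mean, var


# Hypothetical data: two cameras report the x-position (in metres) of the same
# subject with equal noise variance; the prior is deliberately very vague.
fused_mean, fused_var = fuse_cameras(0.0, 1e6, [(10.2, 0.5), (9.8, 0.5)])
print(fused_mean, fused_var)
```

With equally reliable cameras the fused estimate lands close to the average of
the two readings, and its variance is smaller than that of either camera alone.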
Toward Automated Reconstruction of 2D and 3D Scenes from Video Images
In
this project, we study the problem of automated reconstruction of 2D and 3D
scenes (structure, appearance, and behavior) from video images. While there are
many similar projects being conducted in academia and industry, our project
addresses a number of difficult issues: (1) the 3D structures can vary
significantly from almost planar to highly complex with large variation in
depth, (2) the camera can be at varying distances to the scene, and (3) the
scene may show significant deformation over time. Some sample results are shown
below. In the first two examples, two images (the top row) were used to generate a 3D
model of the visible surface structure with correct texture mapping. Novel
views can then be synthesized from the reconstructed 3D model (the bottom
rows). These images are taken from inside a knee mockup and from real endoscopy
surgery. The fourth example shows the stitching of a terrain model from UAV
(unmanned aerial vehicle) flight data.




We
study a unified framework for achieving robust and real-time image
stabilization and rectification. While compensating for a small amount of image
jitter due to platform vibration and hand tremor is not a very difficult task,
canceling a large amount of image jitter, due to significant, long-range, and
purposeful camera motion (such as panning, zooming, and rotation), is much more
challenging. While the terms “significant,
long-range, and purposeful” may
imply that we should not cancel this motion, it should be remembered that in
many real-world imaging systems there may be multiple objectives with conflicting
solutions. By this we mean that while significant and purposeful camera
manipulation is needed to explore new perspectives and acquire novel views,
such manipulation often causes difficulty for the human operator in image
interpretation. Hence, an image stabilization algorithm should be designed to
allow significant freedom in image acquisition
while alleviating difficulty in image interpretation.
We mention here two practical
problems in diverse application domains that can make use of such an image
stabilization and rectification algorithm.
• The first application is in rectifying the video display in video-endoscopy.
Endoscopic procedures are minimally invasive surgical procedures in which several
small incisions are made on the patient to accommodate surgical instruments
such as scalpels, scissors, staple guns, and an endoscope. The scope acquires
images of the bodily cavity that are displayed in real time on a monitor to
provide the visual feedback to the surgeon to perform surgery. In order to view
the anatomy in a highly constrictive body cavity (e.g., nasal passage in
rhinoscopy and inner ear cavity in otoscopy) and subject to the entry point
constraint, the surgeon often manipulates the scope with large panning and
rotation motion to eliminate blind spots. The views acquired can be highly
non-intuitive, e.g., the anatomy can appear with large perspective distortion,
sideways, or even upside down. Hence, while this type of manipulation is
necessary to reveal anatomical details, it does cause significant difficulty in
image interpretation.
• The second application is in
rectifying the video display in an unmanned aerial vehicle (UAV). Under
the control of a ground operator, a UAV may purposefully pitch, roll, and
rotate to maneuver into certain positions or to evade ground fire. Executing
such maneuvers severely alters the capture angle of the on-board camera. Again,
the banking action purposefully aids the flight but makes the viewed video
less intuitive.
As should be clear from the preceding discussion, in both applications, the
operator manipulates the camera using large panning, zooming, and rotating
actions to obtain better views of the subjects (i.e., organs and ground
vehicles). The views thus displayed can be highly non-intuitive, may have large
perspective or other types of distortion, and may even be upside down. The
freedom in such manipulation is absolutely necessary and should not be
restricted solely for easing the difficulty in image interpretation. Instead,
the goal in designing image rectification algorithms for such applications
should be to maintain some consistency and uniformity in the display, while
allowing the operator to survey the scene as before.
Our framework selectively compensates for unwanted
camera motion to maintain a stable view of the scene. The rectified display has
the same information content, but is shown in a much more operator-friendly
way. Our contribution is three-fold: (1) proposing
a unified image rectification algorithm to cancel large and purposeful image
motion to achieve a stable display that is applicable for both far-field and
near-field image conditions, (2) improving the robustness and real-time
performance of these algorithms with extensive validation on real images, and
(3) illustrating the potential of these algorithms by applying them to
real-world problems in diverse application domains. The following
figures show some sample results (from real endoscopic surgery and UAV flight
data). Video sequences are shown from left to right and from top to bottom. The
display is grouped by showing the original, unrectified images on top and the
corresponding rectified images immediately below. As can be seen,
even with large panning, zooming and rotation (1st and 3rd
rows), our algorithm is able to maintain the orientation in the rectified
images (2nd and 4th rows).
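To make the idea of cancelling a purposeful camera rotation concrete, here is a
minimal sketch. It assumes matched 2D feature points between a reference frame
and the current frame are already available and that the unwanted motion is a
pure in-plane rotation; it then estimates the rotation angle by least squares
and rotates the current points back. This is a generic textbook estimate, not
the rectification algorithm described above, and all names are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def estimate_rotation(ref_pts, cur_pts):
    """Least-squares in-plane rotation (radians) that maps ref_pts onto cur_pts."""
    rcx, rcy = centroid(ref_pts)
    ccx, ccy = centroid(cur_pts)
    s_dot = s_cross = 0.0
    for (rx, ry), (cx, cy) in zip(ref_pts, cur_pts):
        ax, ay = rx - rcx, ry - rcy          # reference point, centered
        bx, by = cx - ccx, cy - ccy          # current point, centered
        s_dot += ax * bx + ay * by           # accumulates |a|^2 cos(theta)
        s_cross += ax * by - ay * bx         # accumulates |a|^2 sin(theta)
    return math.atan2(s_cross, s_dot)


def rotate_about(points, angle, center):
    """Rotate 2D points by angle (radians) about center."""
    c, s = math.cos(angle), math.sin(angle)
    ox, oy = center
    return [(ox + c * (x - ox) - s * (y - oy),
             oy + s * (x - ox) + c * (y - oy)) for x, y in points]


# Hypothetical feature points: the camera has rolled by 30 degrees.
reference = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
current = rotate_about(reference, math.radians(30.0), (2.0, 1.5))

angle = estimate_rotation(reference, current)
rectified = rotate_about(current, -angle, centroid(current))
```

Applying the inverse rotation to every pixel, rather than to a handful of
feature points, would give a display whose orientation stays fixed while the
camera rolls.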


We present a behavior simulation algorithm
that has the potential of enabling physically-correct, photo-realistic, and
real-time behavior simulation for soft tissues and organs. Our approach
combines a physically-correct formulation based on boundary element methods
with an efficient numeric solver. There are many scenarios, both in off-line
training of surgeons and on-line computer-assisted surgery, in which the proposed
technique can be useful. For example, simulators, used for the purpose of
training surgeons in the pre-operative stage, require physically-correct,
real-time response of the graphical rendering to inputs from the trainee.
However, it has long been recognized that it is difficult to simultaneously
satisfy the requirements of physical correctness and real-time performance. It is well
known that simulation of deformable behaviors is difficult. So far, approaches
to this problem can be roughly classified into two categories: those that aim more
at efficiency and those that aim more at accuracy. The former is typified by
many mass-spring, spline, and superquadric models while the latter mainly
comprises techniques based on the finite element methods (FEM). Our modeling
scheme aims to achieve the best of both worlds by providing necessary accuracy
at a speed comparable to that of the efficient models.
Our method is based on a physically correct
formulation using boundary element methods (BEM). Simply put, BEM for structure
simulation concentrates the analysis power on the boundary of the object (or
the surface of a 3D organ). Intuitively, for the same level of discrete
resolution, the BEM-based methods have the potential of being significantly
less expensive than their FEM-based counterparts. This is because the FEM
methods employ O(n^3) variables (representing the displacement of the body at a
particular point under externally applied forces and torques), scattered both
in the interior and on the boundary of the object. The BEM methods employ
O(n^2) variables (representing surface displacement and traction) only on the
boundary of the object, with n representing
the resolution along a particular dimension. Hence, BEM achieves significant
saving in terms of problem size. (One might argue that there are ways to reduce
the problem size for FEM, e.g., multi-resolution and adaptive grid. We are
aware of the possibilities. The above analysis serves only as an illustration
and does not mean to ignore these possibilities. Furthermore, similar efficient
numeric techniques are often applicable to BEM to achieve a corresponding
reduction in problem size as well.)
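The O(n^3) versus O(n^2) argument above can be checked with a small counting
sketch. It assumes a uniform cubic grid with n nodes along each dimension,
which is a simplification chosen for illustration (real organ meshes are not
cubes), and the function names are hypothetical.

```python
def volume_nodes(n):
    """Nodes in a full n x n x n grid, as a volume (FEM-like) discretization uses."""
    return n ** 3


def boundary_nodes(n):
    """Nodes on the surface only: all nodes minus the (n-2)^3 strictly interior ones."""
    if n < 2:
        return n ** 3
    return n ** 3 - (n - 2) ** 3


for n in (10, 100):
    print(n, volume_nodes(n), boundary_nodes(n))
```

For n = 100 this gives 1,000,000 volume nodes against 58,808 surface nodes,
since the surface count simplifies to 6n^2 - 12n + 8.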
While this simple analysis might look
promising, the reality is never this straightforward. Whether it is an FEM-
or a BEM-based simulation, the simulation ultimately comes down to solving
a system of linear equations of the form AX=B
over time (a time-marching problem), where A involves the material properties such as the Lamé constants (and
other quantities that have to do with the discretization and interpolation
functions in the elements), B
involves known boundary conditions (e.g., known displacement and traction at
certain surface and interior points), and X
are the unknown displacement and traction inside and on the surface of the
object. Numerical analysts will quickly point out that while BEM requires
fewer variables, which results in a much smaller system of equations
(O(n^2) for BEM vs. O(n^3) for FEM), the real complexity of the BEM solution
can be higher. This is because while FEM results in a bigger matrix A, A
is often well conditioned and sparse. Efficient solutions exist for many
classes of well conditioned, sparse matrices, which bring the complexity of
solution to O(n^3). BEM, on the other hand, always results in a dense matrix A, and the complexity of the solution
can be proportional to the cube of the matrix size, i.e., O(n^6) for a matrix with O(n^2) rows and columns.
Our proposed numeric algorithm exploits a
mechanism to reduce matrix update and solve complexity. Our observation is that
the kernel function in the fundamental solution of the BEM is usually smooth,
which results in the coefficient matrix having a block-wise low-rank structure,
which we call sequentially semi-separable (SSS) for the one-dimensional case,
and hierarchically semi-separable (HSS) for higher dimensions. Exploiting this
particular matrix structure in our simulation, we are able to achieve real-time
behavior simulation of fairly complex organs on an ordinary PC. Our simulation
allows large changes in boundary conditions, such as those resulting from
organ-organ and organ-body wall collision, which is not possible with current
state-of-the-art BEM techniques. This is a significant improvement that increases
the applicability of the BEM methods. Snapshots of two deformation sequences
are shown below.
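The low-rank observation in the paragraph above can be illustrated numerically.
The sketch below assumes a 1D stand-in kernel 1/|x - y| evaluated between two
well-separated point clusters, which plays the role of one off-diagonal block
of a dense BEM-like matrix, and estimates its numerical rank by Gaussian
elimination with complete pivoting. The kernel, the point sets, and the
tolerance are illustrative assumptions; this is not the SSS/HSS solver itself.

```python
def kernel_block(xs, ys):
    """Dense interaction block K[i][j] = 1 / |x_i - y_j| between two point sets."""
    return [[1.0 / abs(x - y) for y in ys] for x in xs]


def numerical_rank(matrix, rel_tol=1e-6):
    """Estimate rank via Gaussian elimination with complete pivoting."""
    a = [row[:] for row in matrix]
    rows, cols = len(a), len(a[0])
    rank, first_pivot = 0, None
    for k in range(min(rows, cols)):
        # Locate the largest remaining entry to use as the pivot.
        best, pr, pc = 0.0, k, k
        for i in range(k, rows):
            for j in range(k, cols):
                if abs(a[i][j]) > best:
                    best, pr, pc = abs(a[i][j]), i, j
        if first_pivot is None:
            first_pivot = best
        if best == 0.0 or best <= rel_tol * first_pivot:
            break                      # the remainder is negligible
        a[k], a[pr] = a[pr], a[k]      # row swap
        for row in a:                  # column swap
            row[k], row[pc] = row[pc], row[k]
        for i in range(k + 1, rows):
            factor = a[i][k] / a[k][k]
            for j in range(k, cols):
                a[i][j] -= factor * a[k][j]
        rank += 1
    return rank


size = 20
sources = [i / (size - 1) for i in range(size)]        # points in [0, 1]
targets = [2.0 + i / (size - 1) for i in range(size)]  # points in [2, 3]
rank = numerical_rank(kernel_block(sources, targets))
print(size, rank)
```

The estimated rank comes out well below the block size of 20, which is the
kind of structure that semi-separable representations are designed to exploit.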


Sample
video clips (These files are in Windows Media Player wmv
format. To save download bandwidth, the video clips are fairly short. They
demonstrate an organ being deformed by spatially- and temporally-varying large
disturbance with the volume preserved.)
A
spherical organ under a poking disturbance. (584k)
The
same spherical organ under a rubbing disturbance. (560k)
The
same spherical organ under a combined rubbing and poking disturbance. (631k)
Extremely
large deformation. (2.5M)
We
are investigating a system for sensing, modeling and control of an upper
extremity neural prosthesis. The sensing unit employs a computer vision
approach wherein one or more video cameras are used to detect movement of the
arm and provide the arm position information to a model. The model uses
kinematics and dynamics simulation to control the stimulation and animation of
the articulated links. The motion control unit integrates a priori knowledge
from the trained model and the observed sensor input, smooths the limb motion
tracking results and delivers a feedback signal to guide or correct the sensing
process. In our experiments we evaluated the accuracy of the elbow angle sensed by
our computer-vision-based system and developed a visualization system
for the arm model.
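As a small illustration of the sensed quantity mentioned above, the following
sketch computes an elbow angle from three joint positions. It assumes the
vision system already supplies 3D coordinates for the shoulder, elbow, and
wrist; the function name and the sample coordinates are hypothetical.

```python
import math


def joint_angle(shoulder, elbow, wrist):
    """Angle at the elbow (degrees) between the upper arm and the forearm."""
    upper = [s - e for s, e in zip(shoulder, elbow)]
    fore = [w - e for w, e in zip(wrist, elbow)]
    dot = sum(u * f for u, f in zip(upper, fore))
    norm_u = math.sqrt(sum(u * u for u in upper))
    norm_f = math.sqrt(sum(f * f for f in fore))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_f)))
    return math.degrees(math.acos(cosine))


# Hypothetical joint positions: arm bent at a right angle, then fully extended.
bent = joint_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
straight = joint_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```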


Protein Structure Alignment and Fast Similarity Search Using Local
Shape Signatures
The number of known protein structures is increasing
rapidly, as more researchers are joining the hunt for novel protein structures,
more experimental apparatus are deployed, and more theoretical frameworks and
software tools are developed for predicting protein structures. Protein
structure comparison tools play an important role in this enterprise. In
predicting a protein structure from its sequence, researchers usually form a
new candidate structure. To avoid potential exponential explosion of
structures, that new structure is compared with previously known structures for
verification/tuning/correction. Discovering similar folds or similar substructures
thus provides restrictions on the conformational space and serves as a starting
point for producing useful models.
We present a new method for conducting
protein structure similarity searches, which improves on the accuracy,
robustness, and efficiency of some existing techniques. Our method is grounded
in the theory of differential geometry on 3D space curve matching. We generate
shape signatures for proteins that are invariant, localized, robust, compact,
and biologically meaningful. To improve matching accuracy, we smooth the noisy
raw atomic coordinate data with spline fitting. To improve matching efficiency,
we adopt a hierarchical coarse-to-fine strategy. We use an efficient
hashing-based technique to screen out unlikely candidates and perform detailed
pairwise alignments only for a small number of candidates that survive the
screening process. In contrast to other hashing-based techniques, our technique
employs domain-specific information (not just geometric information) in
constructing the hash key and hence is better tuned to the domain of biology.
Furthermore, the invariance, localization, and compactness of the shape
signatures allow us to utilize a well-known local sequence alignment algorithm
for aligning two protein structures. One measure of the efficacy of the
proposed technique is that we were able to perform structure alignment queries
30 times faster than a well-known method while keeping the quality of the query
results at the same level.
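The text above does not spell out how the shape signatures are defined, so the
following is only a sketch of the general idea: curvature and torsion are the
classical rigid-motion invariants of a 3D space curve, and for a chain of
points (such as a backbone trace) their discrete analogues are the bend angle
and the dihedral angle. The helper names and the test helix are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bend_angle(p0, p1, p2):
    """Angle between consecutive segments: a discrete stand-in for curvature."""
    u, v = sub(p1, p0), sub(p2, p1)
    cosine = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cosine)))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle of four points: a discrete stand-in for torsion."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    length = norm(b1)
    m1 = cross(n1, (b1[0] / length, b1[1] / length, b1[2] / length))
    return math.atan2(dot(m1, n2), dot(n1, n2))


def shape_signature(points):
    """One (bend, dihedral) pair per sliding window of four consecutive points."""
    return [(bend_angle(*points[i:i + 3]), dihedral(*points[i:i + 4]))
            for i in range(len(points) - 3)]


def rigid_move(points, angle, shift):
    """Rotate about the z-axis by angle (radians), then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]


helix = [(math.cos(t), math.sin(t), 0.3 * t) for t in range(8)]
moved = rigid_move(helix, 0.7, (5.0, -2.0, 1.0))
sig_a, sig_b = shape_signature(helix), shape_signature(moved)
```

Because each pair depends only on a short window of points, the signature is
both local and unchanged by rotating or translating the whole structure, which
is what allows sequence-style alignment methods to be applied to it.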





FPV: Fast
Protein Visualization Using Java3D
The ability to visualize the 3D structure of
proteins is critical in many areas such as drug design and protein modeling.
This is because the 3D structure of a protein determines its interaction with
other molecules, hence its function, and the relation of the protein to other
known proteins. We have developed a protein visualization system based on Java
3D. There is a growing trend toward adopting Java technology in the fields of
bioinformatics and computational biology. The main advantages of Java are its
compatibility across different systems/platforms and its ability to be
run remotely through web browsers. Using Java 3D as a graphics engine also has
the additional advantage of rapid application development, because the Java 3D API
incorporates a high-level scene graph model that allows developers to focus on
the objects and the scene composition. Java 3D also promises high performance,
because it is capable of taking advantage of the graphics hardware in a system.
However, using Java 3D for visualization raises some performance issues.
The primary concerns about molecular visualization tools based on Java 3D are
their slow interaction speed and their inability to
load large molecules. This behavior is especially apparent when the number of
atoms to be displayed is huge, or when several proteins are to be displayed
simultaneously for comparison. In this project we present techniques for
organizing a Java 3D scene graph to tackle these problems. We demonstrate the
effectiveness of these techniques by comparing the visualization component of
our system with two other Java 3D based molecular visualization tools. In
particular, for van der Waals display mode, with the efficient organization of
the scene graph, we could achieve up to eight times improvement in rendering
speed and could load molecules three times as large as the previous systems
could.

Efficient Molecular
Surface Generation Using Level-Set Methods
Molecules interact through their surface
residues. Calculation of the molecular surface of a protein structure is thus
an important step for a detailed functional analysis. One of the main
considerations in comparing existing methods for molecular surface computations
is their speed. Most of the methods that produce satisfying results for small
molecules fail to do so for large complexes. In this project we present a
level-set-based approach to compute and visualize a molecular surface at a
desired resolution. The emerging level-set methods have been used for computing
evolving boundaries in several application areas from fluid mechanics to
computer vision. We use a level-set-based approach to compute the molecular
surface of a protein of known structure. Our method proceeds in three
stages: (1) an outward propagation step that generates the van der Waals surface
and the solvent-accessible surface, (2) an inward propagation step that
generates the re-entrant surfaces and contact surfaces, i.e., the solvent-
excluded or molecular surface, and (3) another inward propagation step to
determine the outer surface and interior cavities of the molecule. The novelty
of our algorithm is three-fold: First, we propose a unified framework for
solving all the tasks above based on the level-set front-propagation method;
second, our algorithm traverses each grid cell at most once and never visits grid cells that are outside the
sought-after surfaces to guarantee efficiency; and third, our algorithm
correctly detects interior cavities for all kinds of protein topologies. Our
method is able to calculate the surface and interior inaccessible cavities very
efficiently even for very large molecular complexes. We compared our method to
some of the most widely used molecular visualization tools (Swiss-PDBViewer,
PyMol, and Chimera) and our results show that we can calculate and display a
molecular surface 1.5 to 3.14 times faster on average than all three of the
compared programs. Furthermore, we demonstrate that our method is able to
detect all of the interior inaccessible cavities that can accommodate one or
more water molecules.
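The cavity-detection stage described above can be approximated with a much
simpler technique than level-set front propagation: a breadth-first flood fill
from the grid border, which visits each empty cell at most once and treats
every empty cell it cannot reach as an interior cavity. The sketch below
assumes the molecule has already been rasterized into a boolean occupancy grid;
the function name and the toy grid are hypothetical.

```python
from collections import deque


def find_cavities(occupied):
    """Return empty grid cells that cannot be reached from the grid border.

    occupied[x][y][z] is True where the cell lies inside the molecule.
    """
    nx, ny, nz = len(occupied), len(occupied[0]), len(occupied[0][0])
    outside, queue = set(), deque()
    # Seed the front with every empty cell on the border of the grid.
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                on_border = (x in (0, nx - 1) or y in (0, ny - 1)
                             or z in (0, nz - 1))
                if on_border and not occupied[x][y][z]:
                    outside.add((x, y, z))
                    queue.append((x, y, z))
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    # Propagate inward; each cell enters the queue at most once.
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            cx, cy, cz = x + dx, y + dy, z + dz
            if (0 <= cx < nx and 0 <= cy < ny and 0 <= cz < nz
                    and not occupied[cx][cy][cz]
                    and (cx, cy, cz) not in outside):
                outside.add((cx, cy, cz))
                queue.append((cx, cy, cz))
    return {(x, y, z)
            for x in range(nx) for y in range(ny) for z in range(nz)
            if not occupied[x][y][z] and (x, y, z) not in outside}


# Toy grid: a solid 3 x 3 x 3 block inside a 5 x 5 x 5 grid, hollow at its center.
grid = [[[1 <= x <= 3 and 1 <= y <= 3 and 1 <= z <= 3 and (x, y, z) != (2, 2, 2)
          for z in range(5)] for y in range(5)] for x in range(5)]
cavities = find_cavities(grid)
```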
Automated Protein Classification Using Consensus Decision
Protein classification is important as a
protein’s label often gives a good indication of its biological function.
Of many existing classification schemes, SCOP is probably the most trusted one
(as it involves significant manual inspection). However, SCOP classification is
labor intensive and is not updated frequently. Hence, there is a desire to be
able to predict, through some automated means, the SCOP classification of new
proteins. A multitude of techniques, based on both sequence and structure
similarity, can be used for predicting SCOP classification. However, their
applicability and accuracy vary, depending both on the level of taxonomy
(family, superfamily, or fold level) and the parameter settings of the
techniques. Hence, the classification results from multiple techniques often
show varying degrees of conformity with the manually-generated SCOP
classifications (the ground truth) and with one another.
This project is aimed at improving the
accuracy of automated SCOP classification by combining the decisions of
multiple methods in an intelligent manner, using the consensus of a committee (or
an ensemble) classifier. Our technique is rooted in results from machine learning showing
that by judiciously employing component classifiers, an ensemble classifier can
be constructed to outperform its components. We use two sequence- and three
structure-comparison tools as component classifiers. Given a protein structure,
using the joint hypothesis we first determine if the protein belongs to an
existing group (family, superfamily, or fold) in the SCOP hierarchy. For the
proteins that are predicted as members of the existing groups, we then compute
their family-, superfamily-, and fold-level classifications using the consensus
classifier.
We have shown that by using this method we
can significantly improve the classification accuracy compared to those of the
individual component classifiers. In particular, we have achieved error rates
that are 3 to 12 times less than the individual classifiers' error rates at the
family level, 1.5 to 4.5 times less at the superfamily level, and 1.1 to 2.4
times less at the fold level. Our method achieves 98% success for family
assignments, 87% success for superfamily assignments, and 61% success for fold
assignments. What is significant is that these accuracy numbers are very close
to the theoretical maximum performance achievable through ensemble
combination.
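The two-step consensus decision described above can be sketched as a weighted
vote with an abstain option. The combination rule, the weights, the support
threshold, the classifier names, and the labels below are all assumptions made
for illustration; the actual rule used in the project is not specified in the
text.

```python
from collections import defaultdict


def consensus_label(predictions, weights=None, min_support=0.5):
    """Weighted vote over component classifiers.

    predictions maps a classifier name to its predicted label, or to None when
    that classifier finds no match. Returns the winning label, or None when the
    winner's share of the total weight is below min_support (read here as
    "probably not a member of any existing group").
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    totals = defaultdict(float)
    for name, label in predictions.items():
        if label is not None:
            totals[label] += weights[name]
    if not totals:
        return None
    best = max(totals, key=totals.get)
    support = totals[best] / sum(weights[name] for name in predictions)
    return best if support >= min_support else None


# Hypothetical outputs of two sequence-based and three structure-based tools.
agreeing = {"seq_a": "b.1.1", "seq_b": "b.1.1", "struct_a": "b.1.2",
            "struct_b": "b.1.1", "struct_c": None}
scattered = {"seq_a": "a.1.1", "seq_b": "b.1.1", "struct_a": "c.1.1",
             "struct_b": "d.1.1", "struct_c": "e.1.1"}
print(consensus_label(agreeing), consensus_label(scattered))
```

Giving more weight to components that are more reliable at a given level of
the hierarchy is one natural refinement of this scheme.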