PUBLICATION ABSTRACTS
Autonomous Robots,
vol. 44, no. 5, pp. 1485-1503, November 2020.
In this paper, we present a system for estimating the trajectory of a moving RGB-D camera with applications to building maps of large indoor environments. Unlike most current research, we propose a ‘feature model’ based RGB-D visual odometry system for a computationally-constrained mobile platform, where the ‘feature model’ is persistent and dynamically updated from new observations using a Kalman filter. We first propose a mixture of Gaussians model for estimating the depth random noise, which is used to describe the spatial uncertainty of the feature point cloud. We also introduce a general depth calibration method to remove systematic errors in the depth readings of the RGB-D camera. We provide comprehensive theoretical and experimental analysis to demonstrate that our model based iterative-closest-point (ICP) algorithm achieves much higher localization accuracy than conventional ICP. The visual odometry runs at frequencies of 30 Hz or higher, on VGA images, in a single thread on a desktop CPU with no GPU acceleration required. Finally, we examine the problem of place recognition from RGB-D images in order to form a pose-graph SLAAM approach to refining the trajectory and closing loops. We evaluate the effectiveness of the system using publicly available datasets with ground-truth data. The entire system is available for free and open-source online.
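As a sketch of the persistent feature-model idea, a single Kalman update of one feature's 3D position might look like the following. This is a generic textbook update assuming a direct-observation measurement model, not the paper's actual implementation:

```python
import numpy as np

def kalman_update(mean, cov, obs, obs_cov):
    """One Kalman update of a persistent 3D feature (state = its position).

    mean, cov : current feature estimate (3-vector, 3x3 covariance)
    obs, obs_cov : new observation and its spatial uncertainty
    (an identity measurement model is assumed: the position is observed directly)
    """
    S = cov + obs_cov                      # innovation covariance
    K = cov @ np.linalg.inv(S)             # Kalman gain
    new_mean = mean + K @ (obs - mean)
    new_cov = (np.eye(3) - K) @ cov
    return new_mean, new_cov
```

With equal uncertainties, the update fuses the old estimate and the new observation symmetrically, halving the covariance.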
[Download ar20.pdf (4344599 bytes)]
Visual Computer,
vol. 34, no. 5, pp. 605-616, May 2018.
Online mapping services from Google, Apple, and Microsoft are exceedingly popular applications for exploring
cities in 3D. Their explosive growth provides impetus for photorealistic 3D modeling of urban scenes.
Although classical algorithms such as multiview stereo and laser range scanners are traditional sources for
detailed 3D models of existing structures, they generate heavyweight models that are not appropriate for the
streaming data that these navigation applications leverage. Instead, lightweight models as produced by interactive
image-based tools are better suited for this domain. The contribution of this work is that it merges the benefits
of multiview geometry, an intuitive sketching interface, and dynamic texture mapping to produce lightweight
photorealistic 3D models of buildings. We present experimental results from urban scenes using our PhotoSketch system.
[Download vc18.pdf (4155317 bytes)]
American J. Physiology Heart and Circulatory Physiology,
vol. 313, no. 5, pp. H1063-H1073, July 2017.
Numerous studies have examined the role of aquaporins in osmotic water transport in various systems, but virtually
none have focused on the role of aquaporin in hydrostatically driven water transport involving mammalian cells
save for our laboratory's recent study of aortic endothelial cells. Here, we investigated aquaporin-1 expression
and function in the aortic endothelium in two high-renin rat models of hypertension, the spontaneously hypertensive
genetically altered Wistar-Kyoto rat variant and Sprague-Dawley rats made hypertensive by two kidney, one clip
Goldblatt surgery. We measured aquaporin-1 expression in aortic endothelial cells from whole rat aortas by
quantitative immunohistochemistry and function by measuring the pressure-driven hydraulic conductivities of
excised rat aortas with both intact and denuded endothelia on the same vessel. We used these measurements to calculate the
effective intimal hydraulic conductivity, which is a combination of endothelial and subendothelial components.
We observed well-correlated enhancements in aquaporin-1 expression and function in both hypertensive rat models
as well as in aortas from normotensive rats whose expression was upregulated by 2 h of forskolin treatment.
Upregulated aquaporin-1 expression and function may be a response to hypertension that critically determines
conduit artery vessel wall viability and long-term susceptibility to atherosclerosis.
[Download ajpheart17.pdf (635430 bytes)]
Graphical Models,
vol. 75, pp. 157-176, July 2013.
We present a new multiscale surface representation for 3D shape matching that is based on
scale-space theory. The representation, Curvature Scale-Space 3D (CS3), is well-suited for
measuring dissimilarity between (partial) surfaces having unknown position, orientation,
and scale. The CS3 representation is obtained by evolving the surface curvatures according
to the heat equation. This evolution process yields a stack of increasingly smoothed surface
curvatures that is useful for keypoint extraction and descriptor computations. We augment
this information with an associated scale parameter at each stack level to define our multiscale
CS3 surface representation. The scale parameter is necessary for automatic scale
selection, which has proven to be successful in 2D scale-invariant shape matching applications.
We show that our keypoint and descriptor computation approach outperforms many
of the leading methods. The main advantages of our representation are its computational
efficiency, lower memory requirements, and ease of implementation.
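The evolution of curvatures under the heat equation can be illustrated on a 1-D curvature signal. This is a toy sketch only; the paper operates on surface curvatures, and the explicit-Euler discretization and boundary handling here are assumptions:

```python
import numpy as np

def curvature_scale_space(curv, levels=4, dt=0.2):
    """Build a stack of increasingly smoothed curvature values by explicit
    Euler steps of the 1-D heat equation (a toy stand-in for smoothing
    curvature over a surface mesh)."""
    stack = [curv.astype(float)]
    c = stack[0]
    for _ in range(levels):
        # Discrete Laplacian with replicated (Neumann-style) boundaries.
        lap = np.empty_like(c)
        lap[1:-1] = c[:-2] - 2 * c[1:-1] + c[2:]
        lap[0] = c[1] - c[0]
        lap[-1] = c[-2] - c[-1]
        c = c + dt * lap          # one heat-equation step
        stack.append(c)
    return np.stack(stack)
```

Each level of the stack is a smoother version of the one below it; the mean is preserved while variation is diffused away, which is what makes the stack useful for keypoint extraction across scales.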
[Download gmod13.pdf (2960968 bytes)]
Visual Computer,
vol. 29, no. 6, pp. 525-534, June 2013.
DOI: 10.1007/s00371-013-0816-2.
Presented at Computer Graphics International, June 2013.
We address the problem of warping 2D images of garments onto target
mannequins of arbitrary poses. The motivation for this work is to
enable an online shopper to drag and drop selected articles of
clothing onto a single mannequin to configure and visualize
outfits. Such a capability requires each garment to be available in
a pose that is consistent with the target mannequin. A 2D
deformation system is proposed, which enables a designer to quickly
deform images of clothing onto a target shape with both fine and
coarse controls over the deformation.
This system has retargeted thousands of images for retailers to
establish virtual dressing rooms for their online customers.
[Download vc13.pdf (1539393 bytes)]
WORK: A Journal of Prevention, Assessment, and Rehabilitation,
vol. 41, no. 1, pp. 37-52, January 2012.
Musicians have long been hampered by the challenge of turning sheet music
while their hands are occupied playing an instrument.
The sight of a human page turner assisting a pianist during a performance,
for instance, is not uncommon.
This need for a page turning solution is no less acute during practice
sessions, which account for the vast majority of playing time.
Despite widespread appreciation of the problem, there have been virtually
no robust and affordable products to assist the musician.
Recent progress in assistive technology and electronic reading devices
offer promising solutions to this long-standing problem.
The objective of this paper is to survey the technology landscape and
assess the benefits and drawbacks of page turning solutions for musicians.
A full range of mechanical and digital page turning products is reviewed.
[Download work12.pdf (2863525 bytes)]
Proc. 3D Data Imaging, Modeling, Processing, Visualization, and Transmission (3DIMPVT11),
Hangzhou, China, May 2011.
Partial 3D shape matching refers to the process of computing a
similarity measure between partial regions of 3D objects.
This remains a difficult challenge without a priori knowledge of the
scale of the input objects, as well as their rotation and translation.
This paper focuses on the problem of partial shape matching among 3D
objects of unknown scale.
We consider the problem of face detection on arbitrary 3D
surfaces and introduce a multiscale surface representation for
feature extraction and matching.
This work is motivated by the scale-space theory for images.
Scale-space based techniques have proven very successful for dealing
with noise and scale changes in matching applications for 2D images.
However, efficient and practical scale-space representations for 3D
surfaces are lacking.
Our proposed scale-space representation is defined in terms
of the evolution of surface curvatures according to the heat equation.
This representation is shown to be insensitive to noise,
computationally efficient, and capable of automatic scale selection.
Examples in face detection and surface registration are given.
[Download 3dimpvt11a.pdf (1764745 bytes)]
Proc. 3D Data Imaging, Modeling, Processing, Visualization, and Transmission (3DIMPVT11),
Hangzhou, China, May 2011, pp. 124-131.
Laser range scanners are widely used to acquire accurate scene
measurements. The massive point clouds they generate, however,
present challenges to efficient modeling and visualization.
State-of-the-art techniques for generating 3D models from voluminous
range data are well known to demand large computational and storage
resources.
In this paper, attention is directed to the modeling of urban
buildings directly from range data.
We present an efficient modeling algorithm that exploits a priori
knowledge that buildings can be modeled from cross-sectional contours
using extrusion and tapering operations.
Inspired by this simple workflow, we identify key cross-sectional
slices among the point cloud.
These slices capture changes across the building facade along the
principal axes.
Standard image processing algorithms are used to remove noise, fill
missing data, and vectorize the projected points into planar contours.
Applying extrusion and tapering operations to these contours permits
us to achieve dramatic geometry compression, making the resulting
models suitable for web-based applications such as Google
Earth or Microsoft Virtual Earth.
This work has applications in architecture, urban design, virtual
city touring, and online gaming.
We present experimental results on synthetic and real urban building
datasets to validate the proposed algorithm.
[Download 3dimpvt11b.pdf (4768294 bytes)]
Intl. Journal of Computer Vision,
vol. 78, no. 2-3, pp. 237-260, July 2008.
The photorealistic modeling of large-scale scenes, such as urban structures,
requires a fusion of range sensing technology and traditional digital photography.
This paper presents a system that integrates automated 3D-to-3D and 2D-to-3D
registration techniques, with multiview geometry for the photorealistic
modeling of urban scenes.
The 3D range scans are registered using our automated 3D-to-3D registration
method that matches 3D features (linear or circular) in the range images.
A subset of the 2D photographs are then aligned with the 3D model using
our automated 2D-to-3D registration algorithm that matches linear features
between the range scans and the photographs.
Finally, the 2D photographs are used to generate a second 3D model of the scene
that consists of a sparse 3D point cloud, produced by applying a multiview
geometry (structure-from-motion) algorithm directly on a sequence of 2D
photographs.
The last part of this paper introduces a novel algorithm for automatically
recovering the rotation, scale, and translation that best aligns the dense
and sparse models.
This alignment is necessary to enable the photographs to be optimally
texture mapped onto the dense model.
The contribution of this work is that it merges the benefits of
multiview geometry with automated registration of 3D range scans
to produce photorealistic models with minimal human interaction.
We present results from experiments in large-scale urban scenes.
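Given known point correspondences, recovering the rotation, scale, and translation that best aligns two point sets has a standard closed-form solution. The sketch below uses the Umeyama/Procrustes formulation as a generic stand-in; correspondences are assumed here, whereas the paper recovers the alignment automatically:

```python
import numpy as np

def similarity_align(src, dst):
    """Closed-form least-squares similarity transform (scale s, rotation R,
    translation t) with dst ~= s * R @ src + t, via the Umeyama/Procrustes
    solution. A generic sketch of dense/sparse model alignment assuming
    known point correspondences (src[i] <-> dst[i])."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A)              # cross-covariance SVD
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])  # guard against reflections
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (A ** 2).sum()    # optimal uniform scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Applying a known similarity transform to a point set and running the solver recovers the parameters exactly, up to floating-point error.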
[Download ijcv08.pdf (4740673 bytes)]
4th Intl. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT08),
Georgia Institute of Technology, Atlanta, GA, June 2008.
Modern range scanners can capture the geometry of large urban scenes on an
unprecedented scale. While the volume of data is overwhelming, urban scenes
can be approximated well by parametric surfaces such as planes. Piecewise
planar representation can reduce the size of the data dramatically.
Furthermore, it is ideal for rendering and other high-level applications.
We present a segmentation algorithm that extracts a piecewise planar function
from a large range image. Many existing algorithms for large datasets apply
planar criteria locally to achieve efficient segmentations. Our novel framework
combines local and global approximants to guarantee truly planar components
in the output. To demonstrate the effectiveness of our approach, we present
an evaluation method for piecewise planar segmentation results based on the
minimum description length principle. We compare our method to region growing
on simulated and actual data. Finally, we present results on large scale range
images acquired at New York's Grand Central Terminal.
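The notion of a truly planar component can be made concrete with a total-least-squares plane fit plus a global deviation test. This is a minimal sketch of the underlying geometry, not the paper's segmentation framework:

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane fit: returns (unit normal, centroid).
    The normal is the right singular vector of the centered points with
    the smallest singular value."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return vt[-1], c

def max_plane_deviation(points):
    """Global planarity test for a candidate segment: the largest distance
    of any point to its best-fit plane. A segment is 'truly planar' only if
    this global measure is small, not merely locally flat."""
    n, c = fit_plane(points)
    return np.abs((points - c) @ n).max()
```

A locally grown region can pass pointwise flatness tests yet drift off a single plane; checking the maximum deviation against the global fit is what guarantees truly planar output components.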
[Download 3dpvt08.pdf (1148341 bytes)]
Chapter 4 in Multimodal Surveillance: Sensors, Algorithms and Systems,
Z. Zhu and T. S. Huang (eds), ISBN-10: 1596931841, Artech House Publisher,
July 2007, pp 59-90.
Proc. IEEE Conf. on Computer Vision and Pattern Recognition,
June 2007.
Recent improvements in laser Doppler vibrometry (LDV), day/night infrared (IR),
and electro-optical (EO) imaging technology have created the opportunity to
create a long-range multimodal surveillance system. This multimodal
capability would greatly improve security force performance through
clandestine listening of targets that are probing or penetrating a
perimeter defense. This system could also provide the feeds for advanced
face and voice recognition systems.
The study of the capabilities of these three types of sensors is critical
to such surveillance tasks. IR and EO cameras have been studied and widely
used in human and vehicle detection in traffic and surveillance applications.
Laser Doppler vibrometers can effectively detect vibration within
200 m with a sensitivity on the order of 1 μm/s. These
instruments have been used to measure the vibrations of civil structures
such as high-rise buildings, bridges, and towers at distances of up to 200 m.
However, literature on remote acoustic detection using the emerging LDVs
is rare.
Therefore, we mainly focus on the experimental study of the LDV-based
voice detection, in the context of a multimodal surveillance system.
This paper presents an overall picture of our technical approach:
the integration of laser Doppler vibrometry and IR/color imaging for
multimodal surveillance.
[Download cvpr07.pdf (95620 bytes)]
Proc. IEEE Conf. on Multimedia and Expo (ICME 2006),
pp. 1649-1652, Toronto, Canada, July 2006.
Multimodal surveillance systems using visible/IR cameras and other sensors
are widely deployed today for security purposes, particularly when subjects
are at a large distance.
However, audio information as an important data source has not been
well explored.
One reason is that audio detection using microphones requires
installation close to the monitored subjects.
In this paper, we investigate a novel "optical" sensor, called the
Laser Doppler Vibrometer (LDV), for capturing voice signals at very
long range to realize a truly remote and multimodal surveillance system.
Speech enhancement approaches are studied based on the characteristics of
LDV audio.
Experimental results show that remote voice detection via an LDV is promising
when choosing appropriate targets close to human subjects in the environment.
[Download icme06.pdf (447656 bytes)]
Proc. IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 2293-2300, June 2006.
The photorealistic modeling of large-scale scenes, such as urban
structures, requires a fusion of range sensing technology and
traditional digital photography.
This paper presents a system that integrates multiview geometry and automated
3D registration techniques for texture mapping 2D images onto 3D range data.
The 3D range scans and the 2D photographs are respectively used to
generate a pair of 3D models of the scene.
The first model consists of a dense 3D point cloud, produced by using a
3D-to-3D registration method that matches 3D lines in the range images.
The second model consists of a sparse 3D point cloud, produced by applying a
multiview geometry (structure-from-motion) algorithm directly on a sequence
of 2D photographs.
This paper introduces a novel algorithm for automatically recovering the
rotation, scale, and translation that best aligns the dense and sparse models.
This alignment is necessary to enable the photographs to be optimally
texture mapped onto the dense model.
The contribution of this work is that it merges the benefits of
multiview geometry with automated registration of 3D range scans
to produce photorealistic models with minimal human interaction.
We present results from experiments in large-scale urban scenes.
[Download cvpr06.pdf (236121 bytes)]
3rd Intl. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT06),
University of North Carolina, Chapel Hill, June 2006.
Range sensing technology allows the photorealistic modeling of large-scale
scenes, such as urban structures.
The generated 3D representations, after automated registration, are useful for
urban planning, historical preservation, or virtual reality applications.
One major issue in 3D modeling of complex large-scale scenes is that the final
result is a dense complicated mesh.
Significant, in some cases manual, postprocessing (mesh simplification, hole
filling) is required to make this representation usable by graphics or CAD
applications.
This paper presents a 3D modeling approach that models large planar areas
of the scene with planar primitives (extracted via a segmentation pre-process),
and non-planar areas with mesh primitives.
In that respect, the final model is significantly compressed.
Also, lines of intersection between neighboring planes are modeled as such.
These steps bring the model closer to graphics/CAD applications.
We present results from experiments with complex range scans from urban
structures and from the interior of a large-scale landmark urban building
(Grand Central Terminal, NYC).
[Download 3dpvt06a.pdf (781883 bytes)]
3rd Intl. Symposium on 3D Data Processing, Visualization, and Transmission,
University of North Carolina, Chapel Hill, June 2006.
In this paper, a unified, segmentation-based approach is proposed to deal with
both stereo reconstruction and moving objects detection problems using multiple
stereo mosaics.
Each set of parallel-perspective (pushbroom) stereo mosaics is generated from
a video sequence captured by a single video camera.
First a color-segmentation approach is used to extract the so-called natural
matching primitives from a reference view of a pair of stereo mosaics to
facilitate both 3D reconstruction of textureless urban scenes and man-made
moving targets (e.g., vehicles).
Multiple pairs of stereo mosaics are used to improve the accuracy and robustness
in 3D recovery and occlusion handling.
Moving targets are detected by inspecting their 3D anomalies, either violating
the epipolar geometry of the pushbroom stereo or exhibiting abnormal 3D
structure. Experimental results on both simulated and real video sequences
are provided to show the effectiveness of our approach.
[Download 3dpvt06b.pdf (3262183 bytes)]
Proc. SPIE: Defense and Security Symposium,
Orlando, Florida, April 2006.
We propose a content-based 3D mosaic (CB3M) representation for long video
sequences of 3D and dynamic scenes captured by a camera on a mobile platform.
The motion of the camera has a dominant direction of motion (as on an
airplane or ground vehicle), but 6 DOF motion is allowed.
In the first step, a set of parallel-perspective (pushbroom) mosaics with
varying viewing directions is generated to capture both the 3D and dynamic
aspects of the scene under the camera coverage.
In the second step, a segmentation-based stereo matching algorithm is applied
to extract parametric representations of the color, structure and motion of the
dynamic and/or 3D objects in urban scenes where a lot of planar surfaces exist.
Multiple pairs of stereo mosaics are used for facilitating reliable stereo
matching, occlusion handling, accurate 3D reconstruction and robust moving
target detection.
We use the fact that all the static objects obey the epipolar geometry of
pushbroom stereo, whereas an independent moving object either violates the
epipolar geometry if the motion is not in the direction of sensor motion or
exhibits unusual 3D structures.
The CB3M is a highly compressed visual representation for a very long video
sequence of a dynamic 3D scene. More importantly, the CB3M representation
has object contents of both 3D and motion.
Experimental results are given for the CB3M construction for both simulated
and real video sequences to show the accuracy and effectiveness of the
representation.
[Download spie06.pdf (2221849 bytes)]
IEEE Trans. Image Processing,
vol. 14, no. 10, pp. 1422-1434, October 2005.
This paper describes a novel technique to recover large similarity
transformations (rotation/scale/translation) and moderate perspective
deformations among image pairs.
We introduce a hybrid algorithm that features log-polar mappings and
nonlinear least squares optimization.
The use of log-polar techniques in the spatial domain is introduced as a
preprocessing module to recover large scale changes (e.g., at least
four-fold) and arbitrary rotations.
Although log-polar techniques are used in the Fourier-Mellin transform
to accommodate rotation and scale in the frequency domain, their use in
registering images subjected to very large scale changes has not yet been
exploited in the spatial domain.
In this paper, we demonstrate the superior performance of the log-polar
transform in featureless image registration in the spatial domain.
We achieve subpixel accuracy through the use of nonlinear least squares
optimization.
The registration process yields the eight parameters of the perspective
transformation that best aligns the two input images.
Extensive testing was performed on uncalibrated real images and an array of
10,000 image pairs with known transformations derived from the Corel
Stock Photo Library of royalty-free photographic images.
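The log-polar preprocessing idea rests on the fact that, about a fixed center, a rotation of the image becomes a shift along the angular axis and a uniform scaling becomes a shift along the log-radius axis. A minimal nearest-neighbour resampling sketch follows; the sampling resolutions are illustrative, not taken from the paper:

```python
import numpy as np

def logpolar(img, n_rho=64, n_theta=64):
    """Nearest-neighbour log-polar resampling of a grayscale image about
    its center. Rows index log-spaced radii, columns index angles, so
    rotation and uniform scaling of the input become translations here."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho_max = np.log(min(cy, cx))
    rho = np.exp(np.linspace(0.0, rho_max, n_rho))          # log-spaced radii
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    ys = cy + rho[:, None] * np.sin(theta[None, :])
    xs = cx + rho[:, None] * np.cos(theta[None, :])
    iy = np.clip(np.round(ys).astype(int), 0, h - 1)
    ix = np.clip(np.round(xs).astype(int), 0, w - 1)
    return img[iy, ix]
```

Registering two log-polar images by pure translation (e.g., via correlation) then yields the rotation angle and log of the scale factor between the originals.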
[Download tip05.pdf (2405299 bytes)]
IEEE/AIPR Workshop on Multi-Modal Imaging,
Washington, DC, October 2005.
We propose a content-based three-dimensional (3D)
mosaic representation for long video sequences of 3D
and dynamic scenes captured by a camera on a mobile
platform. The motion of the camera has a dominant
direction of motion (as on an airplane or ground
vehicle), but 6 degrees-of-freedom (DOF) motion is
allowed. In the first step, a pair of generalized parallel-
perspective (pushbroom) stereo mosaics is generated
that captures both the 3D and dynamic aspects of the
scene under the camera coverage. In the second step, a
segmentation-based stereo matching algorithm is
applied to extract parametric representations of the
color, structure and motion of the dynamic and/or 3D
objects in urban scenes where a lot of planar surfaces
exist. Based on these results, the content-based 3D
mosaic (CB3M) representation is created, which is a
highly compressed visual representation for very long
video sequences of dynamic 3D scenes. Experimental
results will be given.
[Download aipr05.pdf (2571197 bytes)]
IEEE Workshop on Object Tracking and Classification In and Beyond
the Visible Spectrum,
in conjunction with the IEEE Conference on Computer Vision and
Pattern Recognition, San Diego, CA, June 2005.
This paper describes a multimodal surveillance system for human
signature detection.
The system consists of three types of sensors: infrared (IR) cameras,
pan/tilt/zoom (PTZ) color cameras and laser Doppler vibrometers (LDVs).
The LDV is explored as a new non-contact remote voice detector.
We have found that voice energy vibrates most objects and the
vibrations can be detected by an LDV.
Since signals captured by the LDV are very noisy, we have designed
algorithms with Gaussian bandpass filtering and adaptive volume
scaling to enhance the LDV voice signals.
The enhanced voice signals are intelligible from targets without
retro-reflective finishes at short or medium distances (100m).
By using retro-reflective tapes, the distance could be as far as 300
meters.
However, the manual operation to search and focus the laser beam on a
target with both vibration and reflection is very difficult at medium
and large distances.
Therefore, infrared (IR) imaging for target selection and localization
is also discussed.
Future work remains in automatic LDV targeting and intelligent
refocusing for long range LDV listening.
[Download otcbvs05.pdf (726400 bytes)]
IEEE Workshop on Advanced 3D Imaging for Safety and Security,
in conjunction with the IEEE Conference on Computer Vision and
Pattern Recognition, San Diego, CA, June 2005.
In this paper, we propose a dynamic pushbroom stereo mosaic approach
for representing and extracting 3D structures and independent moving
targets from urban 3D scenes.
Our goal is to acquire panoramic mosaic maps with motion tracking
information for 3D (moving) targets using a light aerial vehicle
equipped with a video camera flying over an unknown area for urban
surveillance.
In dynamic pushbroom stereo mosaics, independent moving targets can be
easily identified in the matching process of stereo mosaics by detecting
the "out-of-place" regions that violate epipolar constraints and/or
give 3D anomalies.
We propose a segmentation-based stereo matching approach with natural
matching primitives to estimate the 3D structure of the scene,
particularly the ground structures (e.g., roads) on which humans or
vehicles move, and then to identify moving targets and to measure
their 3D structures and movements.
[Download a3diss05.pdf (1431382 bytes)]
CRC Handbook of Computer Science and Engineering, CRC Press, 1997, 2004.
This chapter reviews the principal ideas of sampling theory,
reconstruction, and antialiasing.
Sampling theory is central to the study of sampled-data systems, e.g., digital
image transformations.
It lays a firm mathematical foundation for the analysis of sampled signals,
offering invaluable insight into the problems and solutions of sampling.
It does so by providing an elegant mathematical formulation describing the
relationship between a continuous signal and its samples.
We use it to resolve the problems of image reconstruction and aliasing.
Reconstruction is an interpolation procedure applied to the sampled data.
It permits us to evaluate the discrete signal at any desired position,
not just the integer lattice upon which the sampled signal is given.
This is useful when implementing geometric transformations, or warps,
on the image.
Aliasing refers to the presence of unreproducibly high frequencies in
the image
and the resulting artifacts that arise upon undersampling.
Together with defining theoretical limits on the continuous reconstruction
of discrete input, sampling theory yields the guidelines for
numerically measuring the quality of various proposed filtering
techniques.
This proves most useful in formally describing reconstruction, aliasing,
and the filtering necessary to combat the artifacts that may appear at the
output.
[Download crc04.pdf (5827433 bytes)]
Visual Computer,
vol. 19, pp. 67-78, 2003.
This paper discusses the principles of traditional mosaics, and describes
a technique for implementing a digital mosaicing system.
The goal of this work is to transform digital images into
traditional mosaic-like renderings.
We achieve this effect by recovering freeform feature curves from the image
and laying rows of tiles along these curves.
Composition rules are applied to merge these tiles into an intricate jigsaw
that conforms to classical mosaic styles.
Mosaic rendering offers the user flexibility over every aspect of
this craft, including tile arrangement, shapes, and colors.
The result is a system that makes this wonderful craft more flexible and
widely accessible than previously possible.
[Download vc03.pdf (1282445 bytes)]
Journal of Computational and Applied Mathematics,
vol. 143, no. 2, pp. 145-188, 2002.
This paper describes the use of cubic splines for interpolating
monotonic data sets.
Interpolating cubic splines are popular for fitting data because they use
low-order polynomials and have C2 continuity, a property that
permits them to satisfy a desirable smoothness constraint.
Unfortunately, that same constraint often violates another
desirable property: monotonicity.
It is possible for a set of monotonically increasing (or decreasing)
data points to yield a curve that is not monotonic, i.e., the
spline may oscillate.
In such cases, it is necessary to sacrifice some smoothness
in order to preserve monotonicity.
The goal of this work is to determine the smoothest possible
curve that passes through its control points while simultaneously
satisfying the monotonicity constraint.
We first describe a set of conditions that form the basis of the
monotonic cubic spline interpolation algorithm presented in this paper.
The conditions are simplified and consolidated to yield a fast method
for determining monotonicity.
This result is applied within an energy minimization framework to yield
linear and nonlinear optimization-based methods.
We consider various energy measures for the optimization objective functions.
Comparisons among the different techniques are given, and superior
monotonic C2 cubic spline interpolation results are presented.
Extensions to shape preserving splines and data smoothing are described.
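A commonly used sufficient test for monotonicity of a single cubic Hermite segment is the Fritsch-Carlson box condition; the sketch below illustrates the kind of per-segment check the paper consolidates, not its exact conditions:

```python
def hermite_segment_is_monotone(d, m0, m1):
    """Sufficient (Fritsch-Carlson) test that one cubic Hermite segment is
    monotone: with secant slope d and endpoint tangents m0, m1, the segment
    is monotone if alpha = m0/d and beta = m1/d both lie in [0, 3].
    (This box is a sufficient subset of the full monotonicity region.)"""
    if d == 0:
        # Flat data: the segment is monotone (constant) only with zero tangents.
        return m0 == 0 and m1 == 0
    a, b = m0 / d, m1 / d
    return 0 <= a <= 3 and 0 <= b <= 3
```

When a segment fails the test, a monotone scheme must shorten its tangents, trading C2 smoothness for monotonicity, which is exactly the tension the paper resolves by optimization.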
[Download jcam02.pdf (370852 bytes)]
Journal of Graphics Tools,
vol. 5, no. 3, pp. 11-33, 2001.
Separable resampling algorithms significantly reduce the complexity of image
warping.
Fant presented a separable algorithm that is well suited for hardware
implementation [IEEE CG&A, 1986].
That method, however, is inherently serial, and applies only when the
inverse mapping is given.
Wolberg presented another algorithm that is less suited for hardware
implementation, and applies only when the forward mapping is given
[Digital Image Warping, 1990].
This paper demonstrates the equivalence of the two algorithms in the
sense that they produce identical output scanlines.
We derive a variation of Fant's algorithm that applies when the forward mapping
is given, and a variation of Wolberg's algorithm that applies when the inverse
mapping is given.
Integrated hardware implementations that perform 1-D resampling
under either forward or inverse mappings are presented for both algorithms
based on their software descriptions.
The Fant algorithm has the advantage of being simple when implemented in
hardware, while the Wolberg algorithm has the advantage of being parallelizable
and facilitates a faster software implementation.
The Wolberg algorithm also has the advantage of decoupling the roundoff errors
made among intervals since it does not accrue errors through the incremental
calculations required by the Fant algorithm.
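The flavor of 1-D separable resampling can be sketched with a simple box-filter forward resampler that distributes each input pixel over the output intervals it covers. This is a simplified stand-in for both algorithms, without their incremental arithmetic:

```python
import numpy as np

def resample_1d(scanline, out_len):
    """Box-filter 1-D forward resampling: each input pixel's value is
    spread over the output interval it maps to, weighted by overlap,
    then normalized by accumulated coverage."""
    in_len = len(scanline)
    out = np.zeros(out_len)
    cov = np.zeros(out_len)
    scale = out_len / in_len
    for i, v in enumerate(scanline):
        # Input pixel i maps to the output interval [i*scale, (i+1)*scale).
        a, b = i * scale, (i + 1) * scale
        for j in range(int(np.floor(a)), min(int(np.ceil(b)), out_len)):
            overlap = min(b, j + 1) - max(a, j)
            if overlap > 0:
                out[j] += v * overlap
                cov[j] += overlap
    return out / np.maximum(cov, 1e-12)
```

Because contributions are accumulated independently per output pixel, a vectorized version parallelizes naturally, echoing the parallelizability advantage noted above for the forward-mapping formulation.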
[Download jgt01.pdf (209030 bytes)]
Proc. IEEE Intl. Conf. on Image Processing,
Vancouver, Canada, September 2000.
This paper describes a hierarchical image registration algorithm for
affine motion recovery.
The algorithm estimates the affine transformation parameters necessary to
register any two digital images misaligned due to rotation, scale, shear,
and translation.
The parameters are computed iteratively in a coarse-to-fine hierarchical
framework using a variation of the Levenberg-Marquardt nonlinear least
squares optimization method.
This approach yields a robust solution that precisely registers images with
subpixel accuracy.
A log-polar registration module is introduced to accommodate
arbitrary rotation angles and a wide range of scale changes.
This serves to furnish a good initial estimate for the optimization-based
affine registration stage.
We demonstrate the hybrid algorithm on pairs of digital images
subjected to large affine motion.
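The coarse-to-fine scaffold itself is straightforward: estimate at the coarsest pyramid level, then propagate the parameters upward, doubling the translations at each finer level. Below is a sketch with a hypothetical per-level refiner `refine`; the actual Levenberg-Marquardt solver is not shown:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 box averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    im = img[:h, :w]
    return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2]
                   + im[0::2, 1::2] + im[1::2, 1::2])

def coarse_to_fine(img1, img2, refine, levels=3):
    """Coarse-to-fine scaffold for affine registration. `refine(a, b, p)`
    is a hypothetical single-level solver (e.g. one Levenberg-Marquardt
    pass) returning updated p = [a11, a12, tx, a21, a22, ty]. Moving to a
    finer level doubles the translations; the linear part carries over."""
    pyramid = [(img1, img2)]
    for _ in range(levels - 1):
        a, b = pyramid[-1]
        pyramid.append((downsample(a), downsample(b)))
    p = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])   # identity initialization
    for level, (a, b) in enumerate(reversed(pyramid)):
        if level > 0:
            p[2] *= 2.0   # rescale translation to the finer grid
            p[5] *= 2.0
        p = refine(a, b, p)
    return p
```

The coarsest level absorbs large motions cheaply, so each finer level only needs a small subpixel correction, which is what makes the nonlinear optimization robust to large affine motion.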
[Download icip00.pdf (169773 bytes)]
SPIE Conf. on Automatic Target Recognition X,
Orlando, Florida, April 2000.
This paper describes a hierarchical image registration algorithm
to infer the perspective transformation that best matches a pair of images.
This work estimates the perspective parameters by approximating the
transformation to be piecewise affine.
We demonstrate the process by subdividing a reference image into tiles
and applying affine registration to match them in the target image.
The affine parameters are computed iteratively in a coarse-to-fine hierarchical
framework using a variation of the Levenberg-Marquardt nonlinear least
squares optimization method.
This approach yields a robust solution that precisely registers image
tiles with subpixel accuracy.
The corresponding image tiles are used to estimate a global perspective
transformation.
We demonstrate this approach on pairs of digital images subjected to large
perspective deformation.
[Download spie00.ps.gz (1595678 bytes)]
Proc. Computer Graphics Intl. '99,
Canmore, Canada, June 1999.
This paper describes the use of cubic splines for interpolating
monotonic data sets.
Interpolating cubic splines are popular for fitting data because they use
low-order polynomials and have $C^2$ continuity, a property that
permits them to satisfy a desirable smoothness constraint.
Unfortunately, that same constraint often violates another
desirable property: monotonicity.
The goal of this work is to determine the smoothest possible
curve that passes through its control points while simultaneously
satisfying the monotonicity constraint.
We first describe a set of conditions that form the basis of the
monotonic cubic spline interpolation algorithm presented in this paper.
The conditions are simplified and consolidated to yield a fast method
for determining monotonicity.
This result is applied within an energy minimization framework to yield
linear and nonlinear optimization-based methods.
We consider various energy measures for the optimization objective functions.
Comparisons among the different techniques are given, and superior
monotonic cubic spline interpolation results are presented.
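For concreteness, the kind of fast monotonicity test alluded to can be illustrated with the well-known Fritsch-Carlson sufficient condition for a cubic Hermite segment (an illustration only; the paper derives its own set of conditions):

```python
def segment_is_monotone(y0, y1, m0, m1, h):
    """Fritsch-Carlson sufficient test: is the cubic Hermite segment from
    (0, y0) to (h, y1), with endpoint slopes m0 and m1, monotone?"""
    d = (y1 - y0) / h                 # secant slope over the interval
    if d == 0:
        return m0 == 0 and m1 == 0    # flat data requires flat slopes
    a, b = m0 / d, m1 / d             # normalized endpoint slopes
    return a >= 0 and b >= 0 and a * a + b * b <= 9
```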
[Download cgi99.ps (282295 bytes)]
Visual Computer,
vol. 14, pp. 360-372, 1998.
Image morphing has been the subject of much attention in recent years.
It has proven to be a powerful visual effects tool in film and television,
depicting the fluid transformation of one digital image into another.
This paper surveys the growth of this field and describes
recent advances in image morphing in terms of three areas:
feature specification, warp generation methods, and transition control.
These areas relate to the ease of use and quality of results.
We describe the role of radial basis functions, thin plate
splines, energy minimization, and multilevel free-form deformations
in advancing the state-of-the-art in image morphing.
Recent work on a generalized framework for morphing among multiple
images is described.
[Download vc98.pdf (640853 bytes)]
IEEE Computer Graphics and Applications,
vol. 18, no. 1, pp. 58-71, January-February 1998.
This paper presents polymorph, a novel algorithm for morphing
among multiple images.
Traditional image morphing generates a sequence of images depicting an
evolution from one image into another.
We extend this approach to permit morphed images to be derived from
more than two images at once.
We formulate each input image to be a vertex of a simplex.
An inbetween, or morphed, image is considered to be a point in the simplex.
It is generated by computing a linear combination of the input images,
with the weights derived from the barycentric coordinates of the point.
To reduce run-time computation and memory overhead, we define a central image
and use it as an intermediate node between the input images and the
inbetween image.
Preprocessing is introduced to resolve conflicting positions of
selected features in input images when they are blended to generate
a nonuniform inbetween image.
We present warp propagation to efficiently derive warp functions
among input images.
Blending functions are effectively obtained by constructing surfaces
that interpolate user-specified blending rates.
The polymorph algorithm furnishes a powerful tool for image composition
which effectively integrates geometric manipulations and color blending.
The algorithm is demonstrated with examples that seamlessly blend
and manipulate facial features derived from various input images.
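The blending step can be sketched as follows; the warps that align the input images toward the inbetween configuration are abstracted away, and images are flattened to lists of pixel values for brevity:

```python
def polymorph_blend(warped, weights):
    """Linear combination of pre-warped input images, weighted by the
    barycentric coordinates of the inbetween point in the simplex."""
    assert abs(sum(weights) - 1.0) < 1e-9   # barycentric weights sum to 1
    n_pixels = len(warped[0])
    return [sum(w * img[p] for w, img in zip(weights, warped))
            for p in range(n_pixels)]
```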
[Download cga98.pdf (1241464 bytes)]
Proc. IEEE Intl. Conf. on Image Processing,
Santa Barbara, California, October 1997.
This paper describes a fast algorithm for nonuniform image reconstruction.
A multiresolution approach is formulated to compute a $C^2$-continuous surface
through a set of irregularly spaced samples.
The algorithm makes use of a coarse-to-fine hierarchy of control lattices
to generate a sequence of surfaces whose sum approaches the desired
interpolating surface.
Experimental results demonstrate that high fidelity reconstruction is
possible from a selected set of sparse and irregular samples.
[Download icip97.ps (873343 bytes)]
IEEE Trans. Visualization and Computer Graphics,
vol. 3, no. 3, pp. 228-244, July-September 1997.
This paper describes a fast algorithm for scattered data interpolation
and approximation.
Multilevel B-splines are introduced to compute a $C^2$-continuous surface
through a set of irregularly spaced points.
The algorithm makes use of a coarse-to-fine hierarchy of control lattices
to generate a sequence of bicubic B-spline functions whose sum approaches
the desired interpolation function.
Large performance gains are realized by using B-spline refinement to reduce
the sum of these functions into one equivalent B-spline function.
Experimental results demonstrate that high fidelity reconstruction is
possible from a selected set of sparse and irregular samples.
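The coarse-to-fine idea can be illustrated in one dimension; here crude bin averages on a refining lattice stand in for the bicubic B-spline fits of the actual algorithm (an illustrative sketch only):

```python
def fit_level(xs, res, n_bins):
    """Approximate scattered residuals over [0, 1) by bin averages on a
    lattice of n_bins cells; returns the per-sample predictions."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, r in zip(xs, res):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    avg = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [avg[min(int(x * n_bins), n_bins - 1)] for x in xs]

def multilevel_approx(xs, ys, levels=4):
    """Coarse-to-fine hierarchy: each level fits the residual left by the
    sum of all previous levels, then the lattice is refined."""
    pred = [0.0] * len(ys)
    n_bins = 1
    for _ in range(levels):
        res = [y - p for y, p in zip(ys, pred)]
        corr = fit_level(xs, res, n_bins)
        pred = [p + c for p, c in zip(pred, corr)]
        n_bins *= 2
    return pred
```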
[Download tvcg97.pdf (1101623 bytes)]
IEEE Trans. Visualization and Computer Graphics,
vol. 2, no. 4, pp. 337-354, December 1996.
This paper describes an image metamorphosis technique to handle
scattered feature constraints specified with points, polylines, and splines.
Solutions to the following three problems are presented:
feature specification, warp generation, and transition control.
We demonstrate the use of snakes to reduce the burden of feature specification.
Next, we propose the use of multilevel free-form deformations (MFFD)
to compute $C^2$-continuous and one-to-one mapping functions among the
specified features.
The resulting technique, based on B-spline approximation, is simpler
and faster than previous warp generation methods.
Furthermore, it produces smooth image transformations without
undesirable ripples and foldovers.
Finally, we simplify the MFFD algorithm to derive transition functions
to control geometry and color blending.
Implementation details are furnished and comparisons among various
metamorphosis techniques are presented.
[Download tvcg96.pdf (2884596 bytes)]
Proc. Computer Graphics Intl. '96, Pohang, Korea, June 1996.
Image morphing has been the subject of much attention in recent years.
It has proven to be a powerful visual effects tool in film and television,
depicting the fluid transformation of one digital image into another.
This paper reviews the growth of this field and describes
recent advances in image morphing in terms of three areas:
feature specification, warp generation methods, and transition control.
These areas relate to the ease of use and quality of results.
We will describe the role of radial basis functions, thin plate
splines, energy minimization, and multilevel free-form deformations
in advancing the state-of-the-art in image morphing.
Recent work on a generalized framework for morphing among multiple
images will be described.
[Download cgi96.pdf (426407 bytes)]
Journal of Electronic Imaging, vol. 5, no. 1, pp. 50-65, January 1996.
Images scanned in the presence of mechanical vibration are subject to
artifacts such as brightness fluctuation and geometric warping.
The goal of the present study is to characterize these distortions and develop
a restoration algorithm to invert them, hence producing an output digital image
consistent with a scanner operating under ideal uniform motion conditions.
The image restoration algorithm described in this paper makes
use of the instantaneous velocity of the linear sensor array to reconstruct
an underlying piecewise constant or piecewise linear model of the image
irradiance profile.
That reconstructed image is then suitable for resampling under ideal
scanning conditions to produce the restored output digital image.
We demonstrate the algorithm on simulated scanned imagery with
typical operating parameters.
[Download jei96.ps (222812 bytes)]
[Download jei96_imgs.tar (1593344 bytes)]
Proc. IEEE Intl. Conf. on Image Processing, Washington, D.C., Oct. 1995.
SPIE Conf. on Document Processing II,
Proc. SPIE 2422, pp. 358-369, San Jose, CA, Feb. 1995.
Images scanned in the presence of mechanical vibrations are subject
to artifacts such as brightness fluctuation and geometric warping.
The goal of this work is to develop an algorithm to invert these
distortions and produce an output digital image consistent with a
scanner operating under ideal uniform motion conditions.
The image restoration algorithm described in this paper applies to typical
office scanners that employ a moving linear sensor array (LSA) or
moving optics. The velocity of the components is generally not
constant in time. Dynamic errors are introduced by gears, timing belts,
motors, and structural vibrations.
In this work, we make use of the instantaneous LSA velocity
to reconstruct an underlying piecewise constant or piecewise linear
model of the image irradiance function. The control points for the
underlying model are obtained by solving a system of equations derived
to relate the observed area samples with the instantaneous LSA velocity
and a spatially-varying sampling kernel. An efficient solution exists
for the narrow band diagonal matrix that results. The control points
computed with this method fully define the underlying irradiance function.
That function is then suitable for resampling under ideal scanning
conditions to produce a restored image.
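The simplest instance of such a narrow-band system is tridiagonal, which the classic Thomas algorithm solves in linear time (a generic sketch, not the paper's exact formulation):

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a tridiagonal system:
    a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i]   (a[0] and c[-1] unused)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```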
[Download icip95.ps (67310 bytes)]
Proc. Siggraph '95, pp. 439-448, Los Angeles, CA, August 1995.
This paper presents new solutions to the following three problems in
image morphing: feature specification, warp generation, and transition control.
To reduce the burden of feature specification,
we first adopt a computer vision technique called snakes.
We next propose the use of multilevel free-form deformations (MFFD)
to achieve $C^2$-continuous and one-to-one warps among feature point pairs.
The resulting technique, based on B-spline approximation,
is simpler and faster than previous warp generation methods.
Finally, we simplify the MFFD method to construct $C^2$-continuous surfaces
for deriving transition functions to control geometry and
color blending.
[Download sig95.ps.gz (294427 bytes)]
SPIE Conf. on Document Processing II,
Proc. SPIE 2422, pp. 350-357, San Jose, CA, Feb. 1995.
Typical office scanners employ a moving linear-sensor array or moving
optics. The velocity of the components is generally not constant in time.
It may be modulated directly (at one or more frequencies) by dynamic errors of
gears, timing belts, and motors, and indirectly by structural vibrations induced
by gears, fans, etc. Nonuniform velocity is known to cause undesirable
brightness fluctuation and warping in the sampled image.
The present paper characterizes the image defects induced by nonuniform
velocity. A companion paper utilizes the degradation information to develop
an algorithm to restore the degraded image.
[Download spie95a.ps (78906 bytes)]
Graphics Gems IV, Ed. by P. Heckbert, Academic Press, 1994.
Convolution plays a central role in many image
processing applications, including image resizing, blurring, and
sharpening. In all such cases, each output sample is computed to be a
weighted sum of several input pixels. This is a computationally
expensive operation that is subject to optimization. In this gem, we
describe a novel algorithm to accelerate convolution for those
applications that require the same set of filter kernel values to be
applied throughout the image. The algorithm exploits some nice
properties of the convolution summation for this special, but common,
case to minimize the number of pixel fetches and multiply/add
operations. Computational savings are realized by precomputing and
packing all necessary products into lookup table fields that are then
subjected to simple integer (fixed-point) shift/add operations.
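A minimal one-dimensional sketch of the idea, assuming 8-bit pixels and 16-bit fixed-point products (the gem itself packs multiple products per table entry; this simpler version conveys the lookup-and-add structure):

```python
SCALE = 1 << 16  # 16-bit fixed-point scale factor

def build_tables(kernel):
    """Precompute kernel[k] * v for every 8-bit pixel value v, as
    fixed-point integers, so convolution needs no multiplies."""
    return [[int(round(k * v * SCALE)) for v in range(256)] for k in kernel]

def convolve_row(row, tables):
    """Convolve one row of 8-bit pixels using only lookups and adds."""
    n, taps = len(row), len(tables)
    half = taps // 2
    out = []
    for x in range(n):
        acc = 0
        for k in range(taps):
            i = min(max(x + k - half, 0), n - 1)  # clamp at the borders
            acc += tables[k][row[i]]              # lookup replaces multiply
        out.append(acc >> 16)                     # back to integer pixels
    return out
```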
[Download ggIV94.pdf (141590 bytes)]
Dr. Dobb's Journal, no. 202, July 1993.
The files in this directory contain the code necessary to implement
a morph (metamorphosis) sequence. The process is based on a mesh warping
algorithm first introduced for a special effect sequence in the movie
"Willow" in 1988 [Smythe 90]. The algorithm, described in [Smythe 90] and
[Wolberg 90], has since been used in several films and commercials.
The self-contained code given here is adapted from a program listing
in [Wolberg 90].
The mesh warping algorithm is used to deform one image into another.
The input includes a source image I1 and two meshes, M1 and M2.
Mesh M1 is used to select landmark positions in I1, and M2 identifies
their corresponding positions in the output image. In this manner,
arbitrary points in I1 can be "pulled" to new positions. Although the
use of a (parametric) mesh might seem to place unnecessary constraints
on the positions of these points, a large class of useful transformations
is possible. It is important, though, that the mesh not self-intersect,
in order to keep the image from folding upon itself.
The benefit of using a mesh derives from the simplicity in interpolating
the new positions of intermediate points (between the mesh points).
A bilinear or bicubic function can be used. We use a Catmull-Rom cubic
spline to implement bicubic interpolation here.
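For reference, one Catmull-Rom segment between control values p1 and p2 (with neighbors p0 and p3) evaluates as:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom spline segment between p1 and p2, for t in [0, 1].
    Interpolates p1 at t = 0 and p2 at t = 1."""
    return 0.5 * (2 * p1 +
                  (-p0 + p2) * t +
                  (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t +
                  (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)
```

Bicubic interpolation applies such a 1-D evaluation separably, once per mesh direction.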
There are two executables that the user can compile: warp and morph.
They are created by typing "make warp" and "make morph", respectively.
In "warp", I1 is simply deformed based on the correspondence points
given in meshes M1 and M2. In "morph", a second image I2 is used to
designate the target image. Not only is I1 deformed, but it simultaneously
undergoes a cross-dissolve with a warped version of I2 to create the
illusion of a metamorphosis.
The user must specify the number of frames to generate in this transformation.
The basic idea is that each frame in the transformation uses an interpolated
mesh M3 as the set of target positions for the input mesh points.
M3 is computed by performing linear interpolation between respective
points in M1 and M2.
The "warp" program actually plays an important role here since both
I1 and I2 are each warped using M3 as the target mesh.
Thus, I1 is warped using meshes M1 and M3. In addition, I2 is warped
using meshes M2 and M3. Now that the landmarks of the source and target
images are aligned, they are cross-dissolved to generate a morph frame.
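The per-frame recipe above can be sketched as follows, with images flattened to lists of pixel values and the warp routine passed in as a parameter (it stands in for the mesh warping code in meshwarp.c):

```python
def lerp_mesh(m1, m2, t):
    """Interpolated mesh M3: each point is (1 - t) * M1 + t * M2."""
    return [[(1 - t) * a + t * b for a, b in zip(p1, p2)]
            for p1, p2 in zip(m1, m2)]

def morph_frame(i1, i2, m1, m2, t, warp):
    """One morph frame at time t: warp both images toward the
    interpolated mesh, then cross-dissolve between them."""
    m3 = lerp_mesh(m1, m2, t)
    w1 = warp(i1, m1, m3)   # source deformed toward M3
    w2 = warp(i2, m2, m3)   # target deformed toward M3
    return [(1 - t) * a + t * b for a, b in zip(w1, w2)]
```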
FILES:
Makefile: dependency rules for creating "warp" and "morph"
meshwarp.h header file
warp.c: main function for "warp"
morph.c: main function for "morph"
meshwarp.c: workhorse mesh warping code
util.c: image I/O and memory allocation functions
catmullrom.c: Catmull-Rom cubic spline interpolation.
face.bw: source image
cat.bw: target image
face.XY: source mesh
cat.XY: target mesh
RUNNING THE PROGRAMS:
WARP:
After you type "make warp", an executable file called "warp" will
be created. You can invoke it by typing:
warp face.bw face.XY cat.XY out.bw
You may notice that the output has a distorted grid-like pattern on it.
This is not an artifact of the algorithm, but rather it is due to the
grid pattern that appears in the input after scanning it from a magazine.
MORPH:
After you type "make morph", an executable file called "morph" will
be created. You can invoke it by typing:
morph face.bw cat.bw face.XY cat.XY 10 out
This will create a 10-frame animation stored in files out_000.bw,
out_001.bw, out_002.bw, ... out_009.bw
COMMENTS:
This code works on grayscale images only. Extending the program to
handle 3 RGB color channels is straightforward.
The code is missing a program to help the user create and edit meshes.
A good mesh editor is a critical component to any mesh warping program.
That code is not given here because it falls outside of the scope of
this presentation. Instead, sample meshes face.XY and cat.XY are provided.
The reader should be aware that such an interface should allow the user
to control the cross-dissolve schedule at each mesh point, as well as its
position. This permits the intensities in different regions of the image
to interpolate at different rates.
REFERENCES:
- [Smythe 90] Smythe, Douglas B., "A Two-Pass Mesh Warping Algorithm for
  Object Transformation and Image Interpolation," ILM Technical Memo #1030,
  Computer Graphics Department, Lucasfilm Ltd., 1990.
- [Wolberg 90] Wolberg, George, Digital Image Warping,
  IEEE Computer Society Press, Los Alamitos, CA, 1990.
[Download dobbs93.tar.gz (105951 bytes)]
Proc. Computer Graphics Intl. '93, Lausanne, Switzerland, June, 1993.
This paper describes a fast algorithm for scaling digital images.
Large performance gains are realized by reducing the number
of convolution operations, and optimizing the evaluation of those that remain.
We achieve this by decomposing the overall scale transformation into
a cascade of smaller scale operations.
As an image is progressively scaled towards the desired resolution,
a multi-stage filter with kernels of varying size is applied.
We show that this results in a significant reduction in the number
of convolution operations.
Furthermore, by constraining the manner in which the transformation
is decomposed, we are able to derive optimal kernels and implement
efficient convolvers.
The convolvers are optimized in the sense that they require no multiplication;
only lookup table and addition operations are necessary.
This accelerates convolution and greatly extends the range of filters
that may be feasibly applied for image scaling.
The algorithm readily lends itself to efficient software and
hardware implementation.
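The cascade idea can be illustrated in one dimension: repeated blur-and-halve stages with a small kernel carry the signal toward the target resolution (a simplified sketch; the paper's convolvers are multiplication-free and its kernels optimal):

```python
def halve(signal):
    """One cascade stage: blur with a small [1, 2, 1] / 4 kernel and
    keep every other sample."""
    n = len(signal)
    out = []
    for x in range(0, n, 2):
        left = signal[max(x - 1, 0)]          # clamp at the borders
        right = signal[min(x + 1, n - 1)]
        out.append((left + 2 * signal[x] + right) / 4)
    return out

def cascade_downscale(signal, target_len):
    """Halve repeatedly while the signal is still at least twice the
    target length; a final resampling stage would finish the job."""
    while len(signal) >= 2 * target_len:
        signal = halve(signal)
    return signal
```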
[Download cgi93.ps (406608 bytes)]
CVGIP: Graphical Models and Image Processing,
vol. 55, no. 1, pp. 63-77, January 1993.
This paper introduces a new class of reconstruction algorithms that are
fundamentally different from traditional approaches. We deviate from the
standard practice that treats images as point samples. In this work, image
values are treated as area samples generated by nonoverlapping integrators.
This is consistent with the image formation process, particularly for CCD and
CID cameras. We show that superior results are obtained by formulating
reconstruction as a two-stage process: image restoration followed by
application of the point spread function (PSF) of the imaging sensor. By
coupling the PSF to the reconstruction process, we satisfy a more intuitive
fidelity measure of accuracy that is based on the physical limitations of the
sensor. Efficient local techniques for image restoration are derived to
invert the effects of the PSF and estimate the underlying image that passed
through the sensor.
The reconstruction algorithms derived herein are local methods that compare
favorably to cubic convolution, a well-known local technique, and they even
rival global algorithms such as interpolating cubic splines. Evaluations are
made by comparing their passband and stopband performances in the frequency
domain, as well as by direct inspection of the resulting images in the
spatial domain. A secondary advantage of the algorithms derived with this
approach is that they satisfy an imaging-consistency property. This means
that they exactly reconstruct the image for some function in the given class
of functions. Their error can be shown to be at most twice that of the
"optimal" algorithm for a wide range of optimality constraints.
[Download cvgip93.ps (139657 bytes; without figures)]
Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 1992.
Chromatic aberration is due to refraction affecting each color
channel differently.
This paper addresses the use of image warping to reduce the impact of
these aberrations in vision applications.
The warp is determined using edge displacements which are fit with
cubic splines.
A new image reconstruction algorithm is used for nonlinear resampling.
The main contribution of this work is to analyze the quality of the
warping approach by comparing it with active lens control.
Two different imaging systems are tested.
Computer Graphics (Proc. Siggraph '89),
vol. 23, no. 3, pp. 369-378, Boston, MA, July 1989.
Image warping refers to the 2-D resampling of a source image onto a
target image.
In the general case, this requires costly 2-D filtering operations.
Simplifications are possible when the warp can be expressed as a
cascade of orthogonal 1-D transformations.
In these cases, separable transformations have been introduced to
realize large performance gains.
The central ideas in this area were formulated in the 2-pass algorithm
by Catmull and Smith.
Although that method applies over an important class of transformations,
there are intrinsic problems which limit its usefulness.
The goal of this work is to extend the 2-pass approach to handle arbitrary
spatial mapping functions.
We address the difficulties intrinsic to 2-pass scanline algorithms:
bottlenecking, foldovers, and the lack of closed-form inverse solutions.
These problems are shown to be resolved in a general, efficient, separable
technique, with graceful degradation for transformations of increasing
complexity.
Proc. Computer Graphics Intl. '88, Geneva, Switzerland, June, 1988.
Image warping refers to the 2D resampling of a source image onto a
target image.
Despite the variety of techniques proposed, a large class of image
warping problems remains inadequately solved: mapping between two
images which are delimited by arbitrary, closed, planar curves, e.g.,
hand-drawn curves.
This paper describes a novel algorithm to perform image warping
among arbitrary planar shapes whose boundary correspondences are known.
A generalized polar coordinate parameterization is introduced to
facilitate an efficient mapping procedure.
Images are treated as collections of interior layers, extracted via
a thinning process.
Mapping these layers between the source and target images generates
the 2D resampling grid that defines the warping.
The thinning operation extends the standard polar coordinate
representation to deal with arbitrary shapes.
Intl. Journal of Pattern Recognition and Artificial Intelligence,
vol. 1, no. 3 & 4, December 1987, pp. 303-322.
Proc. IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 168-173, June 1986.
This paper introduces a syntactic omni-font character recognition system.
The "omni-font" attribute reflects the wide range of fonts that fall
within the class of characters that can be recognized.
This includes hand-printed characters as well.
A structural pattern matching approach is employed.
Essentially, a set of loosely constrained rules specify pattern
components and their interrelationships.
The robustness of the system is derived from the orthogonal
set of pattern descriptors, location functions, and the manner in which
they are combined to exploit the topological structure of characters.
By virtue of the new pattern description language, PDL, developed in
this paper, the user may easily write rules to define new patterns for
the system to recognize.
The system also features scale-invariance and user-definable sensitivity
to tilt orientation.
Proc. IEEE Conf. on Computer Vision and Pattern Recognition,
pp. 570-575, Miami, FL, June 1986.
When bilevel images (such as printed text) are digitized, the result
is a gray scale image due to the averaging function of the scanner.
This paper presents a method for recovering the binary image.
It is based on the observation that the convolution of a step function
h(t) with a bell shaped function (such as the point spread function of
digitizers) is convex where h(t) is high and concave where h(t) is low.
A first pass assigns the value "black" or "white" to pixels that are in
regions where the intensity image is clearly concave or convex.
Subsequent iterations move the remaining pixels to these two values
according to their neighbors.
Examples of implementation are shown and a memory management scheme
is described that makes the algorithm feasible even when the size of
the image exceeds the available memory.
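A one-dimensional sketch of the first pass, classifying samples by the sign of the discrete second difference (the black/white pairing follows the correspondence stated above, and the threshold is an arbitrary illustration parameter):

```python
def classify_first_pass(s, thresh=0.1):
    """Label samples 'white' where the signal is clearly convex and
    'black' where it is clearly concave; leave the rest undecided (None)
    for later iterations to fill in from their neighbors."""
    labels = [None] * len(s)
    for x in range(1, len(s) - 1):
        d2 = s[x - 1] - 2 * s[x] + s[x + 1]   # discrete second derivative
        if d2 > thresh:
            labels[x] = 'white'               # clearly convex
        elif d2 < -thresh:
            labels[x] = 'black'               # clearly concave
    return labels
```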
Pattern Recognition Letters, vol. 3, no. 6, pp. 375-388, December 1985.
This paper investigates the application of variations of Stochastic
Relaxation with Annealing (SRA) as proposed by Geman and Geman [1] to
the Bayesian restoration of binary images corrupted by white noise.
After a general review, we present some prior models and show examples
of their application.
It appears that a proper selection of the prior model is critical
for the success of the method.
We obtained better results on artificial images that fitted the model
closely than on real images for which there was no precise model.