Image stabilization remains a major challenge for video
cameras, from high-end cinema and broadcast units down through consumer camcorders. Although a variety of technologies now exist to stabilize images, they are typically complex and come at a steep price, making them impractical for most applications.
Yet many end users swallow that cost simply because the alternative can be more expensive. For example, an intricate shot on a movie set could cost hundreds of thousands of dollars to recreate if the first take can't be used because it turned out to be too shaky.
Of course, not every end user can justify that expense. So what's needed is a solution that can scale from the low end to the high end, with no trade-offs along the way in terms of price and performance. That's a tall order, but meeting it creates a huge market opportunity. For example, besides applications such as broadcast, cinema and consumer cameras, the technology also could be used in verticals such as government and security.
To understand why stabilization is such a challenge, consider two fundamental variables. First, there are frequently multiple people in close proximity to one another in a single frame, so the system must be able to distinguish among them.
Second, there are often abrupt changes in luminance when the iris suddenly opens or closes while video is being recorded. These abrupt changes create what's known as "luminance imparity" between successive frames, which makes it complex to match frames. In low-light conditions, the limits of the field of view within a frame could change significantly due to a camera's auto-exposure feature. And if the application requires long-range zoom, that adds vibration and jerkiness, which means more work for the stabilizer.
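A common countermeasure for luminance imparity -- shown here as a hypothetical sketch in Python, not a description of any particular product's pipeline -- is to normalize each frame's mean brightness against a reference frame before attempting to match them:

```python
import numpy as np

def normalize_luminance(frame, reference):
    """Scale a frame so its mean brightness matches a reference frame.

    This removes the global luminance offset caused by sudden iris or
    auto-exposure changes, so frame matching compares structure rather
    than brightness. A hypothetical pre-processing step for
    illustration only.
    """
    frame = frame.astype(np.float64)
    ref_mean = reference.astype(np.float64).mean()
    scale = ref_mean / max(frame.mean(), 1e-9)
    return np.clip(frame * scale, 0, 255)

# Example: a frame recorded just after the iris closed is uniformly
# darker than the reference, but shows the same scene.
reference = np.full((4, 4), 120.0)
dark = np.full((4, 4), 60.0)           # same scene, half the brightness
corrected = normalize_luminance(dark, reference)
print(corrected.mean())                # → 120.0
```

With the brightness offset removed, successive frames can be compared on content alone.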
The L1 metric -- the sum-of-absolute-differences norm the industry uses for comparing frames -- also frequently yields matching errors that show up as jerkiness or other anomalies, compounding the problem. And if the camera is handheld, airborne, mounted on a moving platform or mounted atop a tall building -- as many TV station "city cams" are -- there's even more jerkiness.
These environmentally induced anomalies are a major problem in one of today's fast-growing video applications: traffic cameras, deployed by municipalities and transportation departments to assist with traffic management. These cameras often are mounted on poles, so they're exposed to the wind, which causes vibration. Others are mounted on bridges, whose steel framework transfers the vibrations of passing vehicles to the cameras. Traffic cameras typically have zoom lenses, which compound the problem by turning even small vibrations into major jerks in the video. As a result, superior yet affordable image-stabilization features are a powerful market differentiator for camera vendors that target applications such as city cams and traffic cams. Figure 1 shows the improved image clarity that a stabilizer can provide in applications such as traffic cameras.
Figure 1: Traffic camera without a stabilizer and with a stabilizer
Mechanical vs. Electronic
Although there are a variety of techniques for stabilizing images, they all can be grouped into two types: electronic and mechanical, with some systems using a combination of the two.
Mechanical stabilization typically uses hardware such as gyroscopes, gimbals, "squishy" prisms or a combination of these to identify and correct camera jitter. Major downsides include cost and, in some cases, noise. For example, nearly all of the 1981 film Das Boot had to be dubbed because the camera gyroscopes -- necessary because of the constant rocking and vibration inside the submarine -- produced too much noise to use the audio acquired during filming. That also shows how mechanical stabilization can have costs beyond the price of the equipment itself -- extra post-production costs, in the case of Das Boot.
Hardware such as gyroscopes and gimbals also add weight and size, both of which can be a drawback for applications such as airborne surveillance. That's an example of how mechanical stabilization can limit market opportunities.
These are among the reasons why end users typically prefer electronic stabilization. One technique -- common in consumer camcorders -- is to take an image produced by the Charge-Coupled Device (CCD) and store it as a sort of baseline. The system then compares all subsequent frames to that baseline.
For example, if the comparison finds that everything in the subsequent frame is shifted in the same direction relative to the baseline, then the system assumes that the camera was moved. But if the comparison finds that some objects in the frame are each moving in different directions -- such as a handheld broadcast camera trained on a group of people walking -- then the system knows that the objects, not the camera, moved.
In the former case, the system provides stabilization by shifting the subsequent frame opposite the direction in which it appears to have moved. The catch is that this trick doesn't work well in all cases.
For example, if the moving object takes up a major portion of the camera's field of view, the system will lock onto the object simply because it dominates the field of view. That correction creates the sense that the object is fixed and that the background is moving.
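The baseline-comparison step can be sketched as a brute-force search for the shift that minimizes the L1 (sum-of-absolute-differences) error between the baseline and the new frame. This is an illustrative toy in Python/NumPy -- assuming integer-pixel shifts and a small search window -- not any vendor's implementation:

```python
import numpy as np

def estimate_global_shift(baseline, frame, search=4):
    """Find the (dy, dx) shift that best explains `frame` as a moved
    copy of `baseline`, by minimizing the mean L1 error over the
    overlapping region. Brute-force search over a small window."""
    h, w = baseline.shape
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Overlapping region if frame = baseline shifted by (dy, dx)
            b = baseline[max(-dy, 0):h + min(-dy, 0),
                         max(-dx, 0):w + min(-dx, 0)]
            f = frame[max(dy, 0):h + min(dy, 0),
                      max(dx, 0):w + min(dx, 0)]
            err = np.abs(b.astype(float) - f.astype(float)).mean()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# A frame whose content moved 2 px down and 1 px right vs. the baseline:
rng = np.random.default_rng(0)
baseline = rng.integers(0, 256, (32, 32)).astype(float)
frame = np.roll(baseline, (2, 1), axis=(0, 1))
print(estimate_global_shift(baseline, frame))  # → (2, 1)
```

To stabilize, the system would then shift the frame by the opposite vector -- (-2, -1) in this example.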
That example also is why some systems combine mechanical and electronic stabilization, such as by adding gyroscopes to detect whether the camera is moving. That helps, but the extra hardware adds cost and can be a problem if the application requires a compact, lightweight and silent camera.
Digital Dramamine
Multiple electronic stabilization techniques have been available for years, but two factors have limited their use: the cost and the capabilities of the processing used to correct the images. Today, however, processing power has increased dramatically even as its cost has fallen, to the point that more sophisticated electronic stabilization techniques have become practical.
One example is Human Monitoring's Leonardo system. Based on Texas Instruments' DM6467 DSP, Leonardo is designed for use in a wide variety of applications and devices, including broadcast and cinema cameras, video application processors, PCs and camcorders. Figures 2, 3 and 4 show where Leonardo fits into these applications.
Figures 2 and 3: Applications for Leonardo in pre-acquisition and post-acquisition.
Figure 4: Applications for Leonardo in parallel pre-acquisition processing.
Figure 5 illustrates Leonardo's first phase of stabilization in pre-acquisition mode, where oversized frames are aligned to one another by shifting them up or down and left or right. The aligned frames are then cropped to the desired output resolution.
Figure 5: Stabilization Technique One for Pre-Acquisition Mode.
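In rough terms, the pre-acquisition technique amounts to capturing an oversized frame, moving the crop window to counter the estimated motion, and outputting the cropped result. The following is an illustrative sketch -- assuming integer-pixel shifts and a simple centered crop, not Leonardo's actual code:

```python
import numpy as np

def stabilize_crop(oversized, motion, out_h, out_w):
    """Crop an oversized frame to the output size, moving the crop
    window by the estimated global motion (dy, dx) so the cropped
    content stays fixed relative to the scene. The oversize margin is
    the 'budget' available to absorb camera shake."""
    h, w = oversized.shape
    dy, dx = motion
    # Start from a centered window, then track the estimated motion.
    top = (h - out_h) // 2 + dy
    left = (w - out_w) // 2 + dx
    # Clamp so the window stays inside the oversized frame.
    top = int(np.clip(top, 0, h - out_h))
    left = int(np.clip(left, 0, w - out_w))
    return oversized[top:top + out_h, left:left + out_w]

# A 12x12 sensor frame cropped to 8x8, with the camera judged to have
# jerked 2 px down and 1 px right since the previous frame:
frame = np.arange(144).reshape(12, 12)
stable = stabilize_crop(frame, (2, 1), 8, 8)
print(stable.shape)  # → (8, 8)
```

The trade-off is visible in the clamping step: once the shake exceeds the oversize margin, the window hits the frame edge and some motion leaks through.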
The second phase of Leonardo's stabilization technique occurs post-acquisition. As in the first phase, oversized frames are aligned to one another by shifting them up or down and left or right. But in the second phase, the aligned frames supply parts that are missing from neighboring frames, with no change to the resolution. Figure 6 illustrates Leonardo's post-acquisition mode.
Figure 6: Stabilization Technique Two for Post-Acquisition Mode.
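The post-acquisition variant keeps full resolution by borrowing the pixels a shift would otherwise lose from an aligned neighboring frame. The following is a simplified sketch of that idea -- integer shifts, a single neighbor frame, no blending -- not the product's actual algorithm:

```python
import numpy as np

def shift_with_fill(frame, neighbor_aligned, dy, dx):
    """Shift `frame` by (dy, dx) at full resolution. Pixels that the
    shift pushes outside the frame would leave a blank border; fill
    that border from `neighbor_aligned`, a temporally adjacent frame
    that has already been aligned to the same coordinates."""
    h, w = frame.shape
    out = neighbor_aligned.astype(float).copy()   # start with the fill
    ys, xs = np.mgrid[0:h, 0:w]
    src_y, src_x = ys - dy, xs - dx
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out[valid] = frame[src_y[valid], src_x[valid]]
    return out

# Shifting down by 2 px leaves a 2-row gap at the top; the aligned
# neighbor supplies those rows, so resolution is unchanged.
frame = np.ones((6, 6))
neighbor = np.zeros((6, 6))
res = shift_with_fill(frame, neighbor, 2, 0)
print(res[0, 0], res[3, 3])  # → 0.0 1.0
```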
To see a video of these techniques in action, visit human-monitoring.com/paistab.html.
Inside the Algorithm
Unlike other image-stabilization algorithms, Leonardo's is largely resolution-agnostic: As resolution increases, the algorithm requires only marginally more computational power to keep up. This highly efficient design reduces the solution's processing requirements, which is a major reason why Leonardo enables affordable image stabilization -- even for cameras designed for the price-sensitive consumer market.
The Leonardo stabilization algorithm consists of two stages: global motion estimation, followed by frame alignment. The algorithm's key advantage is its ability to carry out the search procedure over a very large search window with very low computational requirements.
To obtain a pixel-accurate match, the final search is conducted at full resolution. The search range can be extended up to +/-20 percent of the frame's vertical and horizontal dimensions.
The frame resolution has little impact on the algorithm's performance. Depending on the desired resolution, the search window can extend to +/-144 pixels for standard-definition (SD) resolution or +/-256 pixels for high-definition (HD) resolution, and it can be extended to even higher resolutions, such as super HD, with minimal effort. Based on tests conducted by Human Monitoring, the computation needed to perform pixel-accurate global motion estimation between a pair of frames in HD resolution is only about 20 percent higher than in SD.
The alignment algorithm controls the motion smoothness and the latency of the overall stabilization process. Using past, present and (possibly) future frame-displacement information, the algorithm builds a predictive alignment curve onto which each frame's "uncorrected" coordinates are projected. The latency of the process is a function of the number of present and future frames stored in memory for building the predictive alignment curve: The more frames that are stored, the more accurate the predictive curve -- and the higher the resulting quality -- at the cost of increased latency.
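A minimal version of such an alignment curve can be sketched as follows. Here a simple moving average over past and future displacements stands in for the predictor -- that choice is an assumption for illustration; the article does not disclose Leonardo's actual predictive model:

```python
import numpy as np

def alignment_corrections(raw_path, past=2, future=2):
    """Build a smoothed 'alignment curve' from the camera's cumulative
    per-frame displacement (raw_path) using a moving average over
    `past` previous and `future` upcoming frames, and return the
    per-frame correction (smoothed minus raw). Buffering `future`
    frames is the source of the latency trade-off: more frames give a
    smoother, more accurate curve at the cost of more delay."""
    raw = np.asarray(raw_path, dtype=float)
    n = len(raw)
    smoothed = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - past), min(n, i + future + 1)
        smoothed[i] = raw[lo:hi].mean()
    return smoothed - raw

# A jittery horizontal pan: a steady drift plus alternating shake.
raw = [0, 3, 2, 5, 4, 7, 6, 9]
corr = alignment_corrections(raw)
# Shifting each frame by its correction removes the high-frequency
# shake while preserving the intentional pan.
```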
Increased latency is acceptable in most applications, however, and does not noticeably impact the user experience. For example, it's acceptable in video-editing projects that require the highest quality, as well as in monitoring systems.
The image stabilizer is configurable to meet each application's requirements, enabling a varying number of future frames to be used. For progressive scan at 30/25 frames per second (fps), the corresponding minimum latency is 38/45 ms (for NTSC/PAL formats), whereas for interlaced scan at 60/50 fields per second, the minimum latency is 19/22 ms.
Feature vs. Non-Feature
To understand how Leonardo achieves its performance, it helps first to understand the two main types of methods for calculating global motion: feature-based and non-feature-based. Feature-based methods usually extract a sparse set of distinct features from each image separately and then determine the motion parameters, typically by estimating the distance between a few corresponding features.
Most real-time global-motion applications for video stabilization resort to feature-based methods. The problem is that a feature-based method relies heavily on the existence of distinct features, which typically are sub-images with high-energy or high-variance content. The absence of distinct features in an image may lead to estimation errors, and moving objects within the frame can easily mislead such algorithms as well.
Non-feature-based methods attempt to estimate the global motion by minimizing an error measure -- typically using L1 as the norm -- based on direct image information collected from all or most of the pixels of the image. A critical implementation issue for non-feature-based global motion estimation is its significant computational complexity, which can make naive implementations impractical for real-time applications.
To reduce the computational burden, non-feature-based algorithms usually employ coarse-to-fine processing, using iterative refinement within a multi-resolution pyramid. Experiments have shown that plain coarse-to-fine algorithms may often converge erroneously, and that robust-estimation techniques are required to improve convergence. (For more information, see Black, M.J. and Anandan, P., 1996, "The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields," Computer Vision and Image Understanding, vol. 63, no. 1, pp. 75-104.)
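The coarse-to-fine idea can be sketched as follows -- an illustrative Python toy assuming integer shifts and simple strided downsampling, and deliberately omitting the robust-estimation safeguards that the Black and Anandan paper shows are needed in practice:

```python
import numpy as np

def l1_shift(a, b, center=(0, 0), radius=2):
    """Search a small (2*radius+1)^2 window around `center` for the
    (dy, dx) shift minimizing the mean L1 difference between a and b."""
    h, w = a.shape
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ys = slice(max(-dy, 0), h + min(-dy, 0))
            xs = slice(max(-dx, 0), w + min(-dx, 0))
            yb = slice(max(dy, 0), h + min(dy, 0))
            xb = slice(max(dx, 0), w + min(dx, 0))
            err = np.abs(a[ys, xs] - b[yb, xb]).mean()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

def coarse_to_fine_shift(a, b, levels=3):
    """Estimate a large global shift cheaply: start on heavily
    downsampled images, then double the estimate and refine it at each
    finer pyramid level. Only a small window is searched per level,
    instead of one huge window at full resolution."""
    dy = dx = 0
    for lvl in reversed(range(levels)):
        s = 2 ** lvl
        dy, dx = l1_shift(a[::s, ::s].astype(float),
                          b[::s, ::s].astype(float),
                          center=(dy, dx), radius=2)
        if lvl > 0:
            dy, dx = dy * 2, dx * 2   # scale up for the finer level
    return dy, dx

# A large shift (8 px down, 4 px left) found with three small searches:
rng = np.random.default_rng(1)
a = rng.integers(0, 256, (64, 64)).astype(float)
b = np.roll(a, (8, -4), axis=(0, 1))
print(coarse_to_fine_shift(a, b))  # → (8, -4)
```

The saving is that each level searches only a 5x5 window; a single full-resolution search covering the same +/-8-pixel range would need a 17x17 window at full image size.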
Human Monitoring chose to use a non-feature-based algorithm, running on the Texas Instruments DM6467 DSP, in order to successfully estimate the global motion even in the presence of multiple objects of significant size moving randomly within the frame. Figure 7 shows a block diagram of the Texas Instruments DM6467 DSP.
Figure 7: The Texas Instruments DM6467 DSP.
In the example at http://human-monitoring.com/paistab.html, the stabilizer application determines the dominant motion -- that is, the global motion -- even in the presence of multiple moving persons in the frame and while the cameraman is walking and turning. Even so, Leonardo is highly efficient in terms of the processing power required to perform these tasks: The application uses only about 40 percent of the Texas Instruments DM6467 chip's 600 MHz processing power, even in a task as demanding as stabilizing 1080p video at 60 fps.
The DSP Advantage
The Texas Instruments DaVinci DM6467 system-on-chip (SoC) includes a digital signal processor (DSP) core: the TMS320C64x+. DSPs are ideal for image stabilization because of their high performance, including support for intensive processing in real time.
DSPs are programmable devices, which means they can be software-upgraded in the field to support the latest and greatest algorithms or to fix bugs. As a result, they give both camera vendors and end users a level of future-proofing, which is a major plus for a device that typically remains in service for several years, if not longer.
When designing DaVinci, Texas Instruments optimized it for demanding, real-time video applications. DaVinci's high performance also can be seen in its support for video transcoding in commercial and consumer applications, where it delivers a 10x performance improvement over previous-generation processors, performing simultaneous, multi-format HD encode, decode and transcode up to H.264 HP@L4 (1080p 30 fps, 1080i 60 fps, 720p 60 fps). (For more information about DaVinci, visit www.ti.com/corp/docs/landing/davinci/index.html. For more information about the DM6467, visit http://focus.ti.com/docs/prod/folders/print/tms320dm6467.html.)
Human Monitoring saw the Texas Instruments DM6467 as the most appropriate choice for developing the Leonardo stabilizer application. One reason is that the DM6467 provided a powerful yet cost-effective way to support HD, which is the way that the video market is headed, as well as super HD. Another reason is that as a DSP, the DM6467 was easier to program than alternatives such as FPGAs.
The DM6467 also provided a variety of features that enabled a quick and accurate development cycle, leading to a very stable application. These features include:
- Large internal memories (L1P, L1D).
- An enhanced direct memory access (EDMA) controller that transfers data between address ranges in the memory map without CPU intervention.
- An instruction set that supports SIMD multimedia operations.
- A video data conversion engine (VDCE) that performs operations such as downscaling and chrominance signal-format conversion.
DSPs are widely used for processing-intensive applications in a variety of industries. For example, in health care, X-ray, magnetic resonance imaging (MRI) and ultrasound systems use DSPs to enable richer images that in turn lead to more accurate diagnoses and improved patient care. DSPs also are common in telecom, where they're used for third- and fourth-generation (3G/4G) wireless network infrastructure and user devices.
These applications are noteworthy for a couple of reasons. First, they show how DSPs have proven their value and performance in some of the world's most demanding applications. Second, this adoption has spurred DSP innovation, so developers in other industries -- such as video -- have a wide selection of DSPs to choose from for applications such as image stabilization.
It's important to note that despite all of the processing power that DSPs bring, Leonardo doesn't use all of that power simply because it's there. Just the opposite: The algorithm is designed to minimize the number of computational cycles, making it lean and efficient.
A single-chip solution also is ideal for enabling advanced image stabilization in camcorders and other consumer video devices. That's another example of how Texas Instruments DSPs and Leonardo are taking features that once were available only to professionals and making them available to a wider range of video content producers.
About the authors
Rajesh Pal is Manager of video infrastructure solutions for Texas Instruments. He has led strategy and business development for TI's broad portfolio of innovative video infrastructure products, which use multiple digital signal processors (DSPs) to create advanced equipment for telecommunications applications. Pal has established strong long-term relationships with the top video infrastructure equipment manufacturers, solution providers, and broadcast industry leaders worldwide. After years of management and engineering experience at Motorola, Tellabs and Mangrove Systems, Pal joined TI in 2005, managing customer support in the Voice over Internet Protocol (VoIP) residential gateway market. Pal published a paper in the Motorola Technical Journal entitled "Routing Optimization Using Mobile IP in Wireless Networks." He also has a patent pending on a mechanism for protecting automatic switching. Pal earned his MBA from Columbia Business School in New York and received his Master's in Computer Applications from the National Institute of Technology in India. He can be reached at email@example.com.
Dr. Nitzan Rabinowitz is co-founder and CTO of Human Monitoring. Dr. Rabinowitz is an acclaimed scientist with more than 30 years of experience in algorithm development for geophysical applications. Rabinowitz received his Ph.D. in seismology from Uppsala University, Sweden. After serving 20 years at the Geophysical Institute of Israel as a senior researcher in seismology, he teamed up with Ira Dvir at Moonlight Cordless to develop advanced video compression technology. At Moonlight he led algorithm development for two years and then served as the company's CTO. Rabinowitz has a wide range of experience with modern human-machine interface technologies (neural networks, fuzzy logic, cellular automata and nonlinear optimization). He can be reached at firstname.lastname@example.org.
Ira Dvir is co-founder and VP of research and development at Human Monitoring. Dvir is a world-renowned expert and pioneer in the field of video compression technologies. As co-founder of Moonlight Cordless Ltd., Dvir developed algorithms with Dr. Nitzan Rabinowitz that made the company a world leader in both MPEG-2 and AVC compression on PC and VLIW platforms. After leading Moonlight's R&D for three years, he served as its CEO for two years, bringing it to world attention by partnering with Philips and licensing its compression technology to Cyberlink, NEC, Panasonic and many other CE manufacturers. Previously, he had a successful career as a playwright and screenwriter, writing TV series for the Israeli channels and having his plays produced by the Israeli National and Hakameri Theatre of Tel Aviv. Dvir received his MA cum laude from Tel Aviv University. He can be reached at email@example.com.