LoCoBoard: A low-cost interactive whiteboard using computer vision algorithms

 

Abstract

In the current digital age, the use of natural interfaces between humans and machines is becoming more imperative than ever. This is particularly important in educational institutions, where interactive tools and applications may bring new advantages (e.g., facilitate the exposition and comprehension of complex concepts, stimulate collaborative work, improve teaching practices, etc.). Interactive Whiteboards (IWs) are becoming useful tools and their use is increasing at various levels of education; however, these solutions are usually expensive, making their adoption and dissemination slow, especially in countries with more fragile economies.

In this context, the LoCoBoard (Low-Cost interactive whiteBoard) project proposed the creation of an open-source IW with low-cost hardware requirements (cf. a webcam-equipped computer, a video projector and an infrared pointing device). Here we discuss the physical and logical structure of the proposed IW prototype. An analysis framework is also presented for comparing the efficiency of the prototype with respect to related systems. We believe that the proposed solution may ease access to IWs and consequently increase their use and dissemination.

Introduction

Over the past decades, technology has evolved remarkably, but Human-Computer interaction is still carried out, in most cases, through the traditional keyboard, monitor and mouse. The future, however, should rely more on natural interfaces such as speech, gesture and hand manipulation [1].

In particular, the use of IWs in education scenarios is becoming a major trend. Such IWs may help expose and manipulate complex concepts, allow collaborative work between teachers and students, and improve pedagogical practices [2]. Although there is a wide range of commercial IW solutions, these are generally too costly, restricting their use to a few educational institutions.

The main goal of this project was the creation of a software-based IW solution, using only a projector to display the computer screen on the wall, a computer equipped with a webcam to capture the projected image, and an Infra-Red (IR) pointing device to generate the interaction points. Another important goal of this project was to provide a cross-platform solution to increase acceptance and dissemination.

The LoCoBoard prototype uses several computer vision algorithms to process captured frames and detect IR interactions. The application adapts itself to the ambient light conditions (cf. background subtraction) and focus, and returns the coordinates of the interaction points. These coordinates are then distributed to other applications through TUIO, or used directly to control the cursor. The developed prototype demonstrates that, with the appropriate software running on common hardware, it is possible to obtain a generic, cheap and useful IW system for any classroom.

One of the most important parts of the system is the blob detection algorithm. Blobs are sets of pixels with common characteristics that can be isolated within a frame. This project also compared the developed detection system with similar platforms such as Touchlib [3],[4] and Tbeta [5]. The prototype uses several detection algorithms, which were compared against each other according to different criteria: performance, precision and CPU consumption.

LoCoBoard – Low-Cost Interactive Whiteboard

This section presents the architecture and functionality of the system. One of its major architectural elements is the blob detection module, which uses several algorithms that are also compared in this section.

Architecture

The system is composed of open-source software installed on an ordinary computer, a webcam (or any other generic camera), a pointing device with an embedded infrared source, and a projection mechanism (video projector). Robustness and simplicity of use are two main factors for user acceptance of the system (i.e., usability).

The main objective of the camera is to “sense” (i.e., see) the environment by capturing a video stream composed of the infrared radiation emitted by the pointing device. A simple way to achieve this is to place, in front of the camera lens, a light filter that only lets infrared light pass, such as photographic film. As a pointing device, we use a pen with an IR LED mounted at the tip and a battery accommodated inside the pen.

The first phase is calibration, which establishes a scale factor between the resolution of the whiteboard (i.e., the projection) and that of the computer; this allows us to map the pointing device movements over the projection. The output of the system either drives mouse interaction on the computer through the displacement of the pointer on the projection, or reports the coordinates using the TUIO protocol [6-8].
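As a concrete illustration, below is a minimal sketch of this coordinate mapping, assuming the simplest case in which the camera image covers exactly the projected area; the structure and function names are ours for illustration, and the actual LoCoBoard calibration may be more elaborate (e.g., correcting perspective).

// Minimal sketch of the camera-to-screen mapping established during calibration.
// Assumes the camera image covers exactly the projected area.
struct Calibration {
    double scaleX, scaleY;   // scale factors between camera and screen resolutions
};

Calibration calibrate(int camWidth, int camHeight, int screenWidth, int screenHeight) {
    return { static_cast<double>(screenWidth)  / camWidth,
             static_cast<double>(screenHeight) / camHeight };
}

// Map a blob detected in camera coordinates to screen (projection) coordinates.
void mapToScreen(const Calibration& c, int camX, int camY, int& screenX, int& screenY) {
    screenX = static_cast<int>(camX * c.scaleX);
    screenY = static_cast<int>(camY * c.scaleY);
}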

 

The block diagram (Figure 1) comprises the four main tasks the system performs. The first is the acquisition of an image from the camera; this is generic across all cameras supported by the framework used, i.e., OpenCV [9-12]. Then we apply filters to improve the image quality for a better recognition of the IR blobs. We use adaptive background subtraction to isolate foreground actions, i.e., the interaction of the IR pointing device in front of the projection. The third phase covers the algorithms we developed for detection and tracking of IR blobs; their implementation is discussed in more detail below. Finally, in the last phase, we report the coordinates of the identified IR blobs; if there is no blob, the system returns (-1, -1).
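The sketch below illustrates this processing loop with the OpenCV C++ API; the running-average background model (cv::accumulateWeighted) and the parameter values are our assumptions, not necessarily the exact filters used in the LoCoBoard source.

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cam(0);                    // 1) image acquisition from any OpenCV-supported camera
    if (!cam.isOpened()) return 1;

    cv::Mat frame, gray, grayF, background, diff, mask;
    const double alpha = 0.05;                  // adaptation rate of the background model
    const double threshold = 100.0;             // minimum difference considered foreground (IR blob)

    while (cam.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);      // 2) pre-processing: grayscale conversion
        gray.convertTo(grayF, CV_32F);

        if (background.empty()) grayF.copyTo(background);
        cv::accumulateWeighted(grayF, background, alpha);   // adaptive background model
        cv::absdiff(grayF, background, diff);               // foreground = |current - background|
        cv::threshold(diff, mask, threshold, 255, cv::THRESH_BINARY);

        // 3) blob detection/tracking (algorithms A1-A5) would run on 'mask' here
        // 4) report the blob coordinates, or (-1, -1) when no blob is present
    }
    return 0;
}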

Algorithms

This section describes how each image is analyzed to detect the interaction; the position is returned in Cartesian coordinate format (x-axis, y-axis). A video is composed of a sequence of consecutive images, and each image is a matrix of pixels. Each pixel has three values, representing the red, green and blue components of the RGB color space. When an image is converted to grayscale, there is only one value per pixel. For real-time processing, this transformation is essential: the analysis of each image can be up to three times faster than with color images (fewer values to analyze decreases the CPU cost). Moreover, color information is not needed to recognize the IR interaction points. Another aspect to consider is the coordinates of the infrared pointer. These are reported using as origin the upper left corner pixel of each image acquired from the camera, (0, 0), with the maximum coordinate defined as (image width, image height). Depending on the resolution of the camera, we can work with different image sizes.
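As a small illustration of the grayscale representation and of this coordinate convention (note that OpenCV matrix access is indexed as (row, column), which corresponds to (y, x)):

#include <opencv2/opencv.hpp>

// Returns true if the pixel at Cartesian coordinates (x, y) is brighter than the threshold.
// The origin (0, 0) is the top-left pixel; x grows to the right and y grows downwards.
bool isBlobPixel(const cv::Mat& gray, int x, int y, unsigned char threshold) {
    // In a grayscale image each pixel holds a single 8-bit intensity value,
    // instead of the three RGB components of a color image.
    return gray.at<unsigned char>(y, x) > threshold;   // at<>(row, col) == at<>(y, x)
}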

Below we give a brief description of the five algorithms for detecting and tracking the blobs generated by the IR pen. Figure 2 shows the common legend used for all of them.

 

 

Simple Algorithm for Detection (A1)

This algorithm scans the image pixel by pixel, as illustrated in Figure 3, looking for values that exceed a threshold set by the user at startup. It collects the coordinates of all pixels whose value is above the threshold and returns the average of the collected x and y values.
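A minimal sketch of A1 on an 8-bit grayscale image follows; the function name and signature are ours for illustration.

#include <opencv2/opencv.hpp>

// A1: scan every pixel and return the average coordinates of all pixels
// brighter than the threshold, or (-1, -1) when no such pixel exists.
cv::Point detectA1(const cv::Mat& gray, unsigned char threshold) {
    long sumX = 0, sumY = 0, count = 0;
    for (int y = 0; y < gray.rows; ++y)
        for (int x = 0; x < gray.cols; ++x)
            if (gray.at<unsigned char>(y, x) > threshold) {
                sumX += x;
                sumY += y;
                ++count;
            }
    if (count == 0) return cv::Point(-1, -1);
    return cv::Point(static_cast<int>(sumX / count), static_cast<int>(sumY / count));
}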

 

 

Simple Algorithm for Detection with Jump (A2)

 

This algorithm is similar to the previous one, but it allows setting a jump value S. Thus, between two pixel readings, the algorithm ignores S-1 pixels. The coordinates of the point of interest (PI) are calculated as in the previous algorithm, i.e., by reporting the average of the collected x and y values. This algorithm produces a more accurate estimate of the blob center when the jump value S is small.
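Under the same assumptions as the A1 sketch, A2 only changes the loop increments:

#include <opencv2/opencv.hpp>

// A2: identical to A1, but only every S-th pixel is read in each direction,
// ignoring S-1 pixels between two consecutive readings.
cv::Point detectA2(const cv::Mat& gray, unsigned char threshold, int S) {
    long sumX = 0, sumY = 0, count = 0;
    for (int y = 0; y < gray.rows; y += S)
        for (int x = 0; x < gray.cols; x += S)
            if (gray.at<unsigned char>(y, x) > threshold) {
                sumX += x;
                sumY += y;
                ++count;
            }
    if (count == 0) return cv::Point(-1, -1);
    return cv::Point(static_cast<int>(sumX / count), static_cast<int>(sumY / count));
}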

Simple Algorithm for Detection with Jump – second version (A3)

 

The differences between this and the previous algorithm aim to reduce the cost of processing each image in a specific case: when a PI persists through consecutive frames and is not in the lower right corner of the frame (image width, image height).

The algorithm starts searching the frame using the jump value defined by the user (cf. algorithm A2); when a pixel with a value above the threshold is found, it stops the scan and applies a sub-algorithm in that area to determine the coordinates of the PI (see Figure 4).

To find the PI’s center, the sub-algorithm searches in the four possible directions from the point found. First it searches horizontally and determines the minimum and maximum along this line. Then it searches vertically, using as x-coordinate the average of the minimum and maximum found earlier and as y-coordinate the same value as the original point. With the same technique it finds a minimum and a maximum on the vertical, or y-axis. These two ranges ([min x, max x]; [min y, max y]) provide the center of the PI (see Figure 5).
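The sketch below illustrates this sub-algorithm, assuming the blob is a compact bright region and that the seed pixel has already been found by the jump-based scan; the names are ours.

#include <opencv2/opencv.hpp>

// A3 sub-algorithm: starting from a seed pixel above the threshold, expand
// horizontally to find [min x, max x], then vertically from the horizontal
// midpoint to find [min y, max y]; the center of these ranges is the PI.
cv::Point refineCenter(const cv::Mat& gray, cv::Point seed, unsigned char threshold) {
    int minX = seed.x, maxX = seed.x;
    while (minX > 0 && gray.at<unsigned char>(seed.y, minX - 1) > threshold) --minX;
    while (maxX < gray.cols - 1 && gray.at<unsigned char>(seed.y, maxX + 1) > threshold) ++maxX;

    int cx = (minX + maxX) / 2;                 // x origin of the vertical search
    int minY = seed.y, maxY = seed.y;
    while (minY > 0 && gray.at<unsigned char>(minY - 1, cx) > threshold) --minY;
    while (maxY < gray.rows - 1 && gray.at<unsigned char>(maxY + 1, cx) > threshold) ++maxY;

    return cv::Point(cx, (minY + maxY) / 2);    // center of the PI
}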

 

Algorithm with Prediction and Spiral Approximation (A4)

This algorithm takes a different approach from the previous ones. It focuses not so much on detecting the PI as on tracking it, and should produce better results when a point moves across consecutive images (a drag operation of the pointing device). The algorithm uses a vector that stores the PI's displacement, i.e., the difference of its position between two consecutive images. Consider the following three consecutive frames, F1, F2 and F3, all containing a moving PI with respective coordinates p1 = (x1, y1), p2 = (x2, y2) and p3 = (x3, y3). The displacement vector of the PI between F1 and F2, Δs, is calculated as follows:

Δs = (Δx, Δy) = (x2-x1, y2-y1)

We can then reuse Δs to predict the position of p3, and based on this prediction we apply a spiral search to find the final coordinates of the PI in F3. We assume the displacement remains approximately constant across consecutive frames (the Δs value); this assumption holds unless the drag operation ends abruptly. The estimate for the new coordinates of the PI in F3 is calculated with the formula:

Prediction = p2 + Δs = (x2 + Δx, y2 + Δy)

 

 

Based on this prediction, we start searching the frame following our spiral model. The major difference is that the search starts from the estimate of p3 we have just calculated, instead of from the origin (0, 0). We use a spiral mechanism to find the solution, i.e., a coordinate with a value above the threshold, as quickly as possible. Computing the spiral path on the fly would be costly in CPU terms, so we opted to pre-build a lookup table with the x and y offsets that simulate the movement of a spiral, as can be seen in Figure 6.

When a PI disappears, expanding the spiral to the whole image can be inefficient compared to linear search methods. The reasons are related to how the image is stored in memory: where a spiral search repeatedly jumps between different areas of memory, a linear search analyzes consecutive storage positions, which allows the process to run faster. Therefore, we limit the spiral expansion with a parameter N that corresponds to the maximum number of levels allowed. When the algorithm does not find anything within the spiral search area and part of the image remains unexamined, it switches to a linear approach to verify the presence or absence of a PI in the remaining pixels of the image.
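A simplified sketch of A4 follows; the construction of the spiral lookup table and the fallback behavior are our reading of the description above, not necessarily the exact LoCoBoard implementation.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstdlib>
#include <vector>

// Pre-build the spiral lookup table: (dx, dy) offsets covering concentric
// square rings (levels 0..N) around the predicted position.
std::vector<cv::Point> buildSpiralTable(int N) {
    std::vector<cv::Point> offsets;
    for (int level = 0; level <= N; ++level)
        for (int dy = -level; dy <= level; ++dy)
            for (int dx = -level; dx <= level; ++dx)
                if (std::max(std::abs(dx), std::abs(dy)) == level)   // keep only the current ring
                    offsets.push_back(cv::Point(dx, dy));
    return offsets;
}

// A4: predict the PI position from the displacement between the two previous
// frames and search around that prediction using the pre-built spiral table.
cv::Point detectA4(const cv::Mat& gray, cv::Point p1, cv::Point p2,
                   const std::vector<cv::Point>& spiral, unsigned char threshold) {
    cv::Point prediction = p2 + (p2 - p1);       // prediction = p2 + Δs
    for (const cv::Point& off : spiral) {
        int x = prediction.x + off.x, y = prediction.y + off.y;
        if (x < 0 || y < 0 || x >= gray.cols || y >= gray.rows) continue;
        if (gray.at<unsigned char>(y, x) > threshold)
            return cv::Point(x, y);              // hit; a center refinement (cf. A3) could follow
    }
    return cv::Point(-1, -1);                    // not found: fall back to a linear scan (e.g., A2)
}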

 

Multipoint Algorithm (A5)

This algorithm performs a single linear scan of the image, collecting information about all the PIs, as we can see in Figure 7. It then analyzes the collected values and groups them in order to determine how many points are present in each image. Finally, it analyzes these groups to compute the coordinates of each centroid, as illustrated in Figure 8. The algorithm was adapted from the one described by Erik van Kempen [13], who emphasizes the time efficiency of this technique for recognizing several points of interest in a single pass over the image.
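The sketch below captures the single-pass grouping idea behind A5; the distance-based grouping is our simplification of the region-growing procedure described in [13].

#include <opencv2/opencv.hpp>
#include <cstdlib>
#include <vector>

// A5: one linear scan collects all bright pixels; the pixels are then grouped
// by proximity to the running centroid of each group, and the centroid of
// every group is reported as one interaction point.
std::vector<cv::Point> detectA5(const cv::Mat& gray, unsigned char threshold, int maxDist) {
    std::vector<cv::Point> bright;
    for (int y = 0; y < gray.rows; ++y)                      // single pass over the image
        for (int x = 0; x < gray.cols; ++x)
            if (gray.at<unsigned char>(y, x) > threshold)
                bright.push_back(cv::Point(x, y));

    std::vector<cv::Point> sums;   // accumulated coordinates per group
    std::vector<int> counts;       // number of pixels per group
    for (const cv::Point& p : bright) {
        int group = -1;
        for (size_t g = 0; g < sums.size(); ++g) {
            cv::Point centroid(sums[g].x / counts[g], sums[g].y / counts[g]);
            if (std::abs(p.x - centroid.x) <= maxDist && std::abs(p.y - centroid.y) <= maxDist) {
                group = static_cast<int>(g);
                break;
            }
        }
        if (group < 0) { sums.push_back(p); counts.push_back(1); }
        else           { sums[group] += p; ++counts[group]; }
    }

    std::vector<cv::Point> centroids;                        // one point per detected blob
    for (size_t g = 0; g < sums.size(); ++g)
        centroids.push_back(cv::Point(sums[g].x / counts[g], sums[g].y / counts[g]));
    return centroids;
}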

 

 

Source Code

You can find all the C++ source at http://code.google.com/p/locoboard/

Full Paper

You can find the full paper at http://dx.doi.org/10.1155/2013/252406 or http://bdigital.ufp.pt/handle/10284/1228.

References

[1]    S. Ballmer, “Microsoft at International Consumer Electronics Show 2009 Virtual Pressroom,” Ballmer on Natural User Interface, 2009.

[2]    Ekbutechnology, “Interactive Whiteboards and My Teaching Goals.”

[3]    D. Wallin, “Wallin’s Webpage – Touchlib,” Touchlib Homepage, 2008.

[4]    NUI Group, “Touchlib,” Touchlib – Home.

[5]    NUI Group, “CCV – Tbeta,” Community Core Vision, 2008.

[6]    reacTIVision, “Protocol TUIO – Tangible User Interface,” TUIO.org, 2009.

[7]    TUIO.org, “TUIO Protocol Specification 1.1,” TUIO Protocol Specification 1.1.

[8]    T. Bovermann, R. Bencina, E. Costanza, and M. Kaltenbrunner, “TUIO: A protocol for table-top tangible user interfaces,” 2005.

[9]    M. Ferreira and A. Moraes, “Tutorial OpenCV,” 2007.

[10]  G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, O’Reilly Media, Inc., 2008.

[11]  V. Pisarevsky, “Introduction to OpenCV,” 2007.

[12]  Intel®, “Intel® IPP – Open Source Computer Vision Library (OpenCV),” 2009.

[13]  E.V. Kempen, “Blob detection V: growing regions algorithm,” 2008.