Real Object Recognition and Augmentation for n-Screen Convergence Service

Converging n-screen media (Web, IPTV, SmartPhone) is a challenging problem, and it is made feasible by AR, motion detection, hand shape recognition and non-marker object tracking technologies. Our services connect the virtual and real worlds, individual and social relationships, and in-door and out-door spaces pragmatically and intimately. We propose a token-sharing mechanism among media to guarantee synchronization and continuity of the content being played. The Web-side service is the main region for educational work: the user develops his/her own and friends' territories by solving educational quests. Our mobile out-door service traces objects by non-marker image tracking rather than by markers, so that users feel a realistic response during collaborative activities. The in-door IPTV service interacts intuitively and simultaneously with users through motion and hand recognition technologies. These approaches are among the early applications to make virtual and real space seamless, and they will ultimately open a new field of learning systems.


Introduction
We are embarking on the exploration of converging spaces, media, screens and persons so that our cognition becomes seamless, continuous and simultaneous. In educational environments especially, this work is useful for consolidating the learning, thinking and acting of students. By blurring the border between online and offline, our experiments motivate students to study naturally and logically.
Our service, an educational adventure game, connects the virtual and real worlds, individual and social relationships, and in-door and out-door spaces pragmatically and intimately. We implemented web-based social network content for an in-door personal desk, group competition content with embedded motion detection and hand tracking for an IPTV in a classroom, and mobile content supported by Augmented Reality for mobile devices such as the iPhone or iPad used in out-door places. The user develops his/her own and friends' territories by solving educational quests. The SmartPhone service traces objects by non-marker image tracking rather than by markers, so that users feel a realistic response during collaborative activities. The IPTV interacts intuitively and simultaneously with users through motion and hand recognition methods.
To guarantee complete synchronization and continuity among all media and the content progressing on each medium, our services use a token-sharing mechanism. Tokens are read from and written to the media, so each medium has up-to-date values at any time and place in which the game is played. The up-to-date token values are stored for each user in a web-side database.
The remainder of this paper is organized as follows. Section 2 compares our work with existing related work. Section 3 introduces our services, including the overall system structure, the flow of development of the services, and the convergence protocol among the various media. Finally, we conclude with a note on the current status of our project and future work.

Related Work
A variety of research efforts have recently explored n-screen convergence and computationally augmented interfaces that emphasize human interaction.

N-screen Convergence
The SMILES (Smartphones for Interacting with Local Embedded Systems) project proposes the use of Smart Phones as universal remote controllers [1]. It defines a service discovery protocol built on top of Bluetooth SDP, an interaction mechanism to operate over the discovered services, and a payment protocol to pay for their use. Personal Server is a small mobile device that stores the user's data on removable Compact Flash and wirelessly uses any I/O interface available in its proximity (e.g., display, keyboard) [2]. Its main goal is to provide the user with a virtual personal computer wherever the user goes. Unlike Personal Server, which cannot connect directly to the Internet, Smart Phones do not have to carry all the data or code that the user may need; they can obtain data and interfaces on demand from the Internet. CoolTown proposes web presence as a basis for bridging the physical world with the World Wide Web. For example, entities in the physical world are fitted with URL-emitting devices (beacons) that advertise the URL of the corresponding entity. Our model proposes web service presence and uses only off-the-shelf hardware [3].

Non-marker based Object Tracking
The approach in [4] uses both edges and optical flow without requiring a known motion model, which is the situation in most AR applications. Texture-based feature extraction and optical flow tracking were also combined in a multithreaded manner in [5]. Another way to speed up tracking is to use only a subset of the template pixels for pose calculation, which can be selected beforehand in an offline phase. The authors of [6] proposed Selective Pixel Integration, in which the pixels to be used are randomly selected from those that contain more texture information. There have also been early attempts to apply object tracking technology to online dice and TCG games [7,8,9].

n-Screen Convergence Services
Our service converges media, spaces and content so that users can work continuously and seamlessly in any place with the dedicated medium. SmartPhone@out-door, IPTV@classroom and Web@home are the key media for our services, as shown in Figure 1. The user develops his/her own and friends' territories by solving educational quests. The SmartPhone service traces objects by non-marker image tracking rather than by markers, so that users feel a realistic response during collaborative activities. The IPTV interacts intuitively and simultaneously with users through motion and hand recognition methods.

Service Overview
Our token-sharing mechanism guarantees synchronization and continuity of the educational content. Each medium gets and puts values defined by a shared Token structure. These values are stored in the database for each user. Whenever a medium needs the values of a token, it can access the database and acquire the up-to-date ones.
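The get/put behaviour described above can be illustrated with a minimal sketch. The field names (`user_id`, `cards`, `version`), the `TokenStore` class and the stale-write check are all assumptions made for illustration; they are not the Token structure of the original system, and an in-memory dictionary stands in for the web-side database.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Token:
    """Hypothetical per-user game state shared by all media."""
    user_id: str
    cards: FrozenSet[str] = frozenset()  # cards earned so far (assumed field)
    version: int = 0                     # incremented on every stored update


class TokenStore:
    """In-memory stand-in for the web-side database: one token per user."""

    def __init__(self) -> None:
        self._tokens: Dict[str, Token] = {}

    def get(self, user_id: str) -> Token:
        # A medium calls this to obtain the up-to-date values before play.
        return self._tokens.get(user_id, Token(user_id))

    def put(self, token: Token) -> Token:
        # Reject writes based on a stale copy, so that one medium cannot
        # silently overwrite progress made on another medium.
        current = self.get(token.user_id)
        if token.version != current.version:
            raise ValueError("stale token: re-read before writing")
        stored = replace(token, version=current.version + 1)
        self._tokens[token.user_id] = stored
        return stored


def award_card(store: TokenStore, user_id: str, card: str) -> Token:
    """Read the latest token, add a card, and write the token back."""
    token = store.get(user_id)
    return store.put(replace(token, cards=token.cards | {card}))
```

In this sketch a card earned on one medium is visible to any other medium on its next `get`, which is the continuity property the mechanism is meant to provide. The version check is one possible way to keep concurrent media consistent; the source does not say how conflicts are handled.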
The flow of development of the services is depicted in Figure 2. A sky-blue circle represents a Web@home task, a yellow circle a SmartPhone@out-door task and a green circle an IPTV@classroom task. The user can earn a card, which is a key for making the magic-book, by playing alone on the Web@home service, by solving a group competition quiz on IPTV@classroom, or by finding predefined objects on SmartPhone@out-door.

SmartPhone@outdoor Service
This service supports out-door field activities in which friends may collaborate. After arriving at the destination, the user looks for predefined animals or historic relics. To detect the specified features, the most important step is selecting the key-points of the image. The detector finds high-contrast regions such as edges. The red and blue circles in Figures 3 and 4 represent the key-points extracted from the source object and the target scene, respectively.
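As a rough illustration of selecting key-points from high-contrast regions, the sketch below marks interior pixels whose gradient magnitude exceeds a threshold. It is a simple gradient-threshold stand-in, not the detector used in the service; the function name, the image representation (a list of grayscale rows) and the default threshold are assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale rows, values assumed to lie in [0, 255]


def high_contrast_points(img: Image, threshold: float = 50.0) -> List[Tuple[int, int]]:
    """Return (row, col) of interior pixels whose gradient magnitude exceeds threshold."""
    points = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0  # horizontal central difference
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0  # vertical central difference
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                points.append((r, c))
    return points
```

On an image with a sharp vertical edge, only the pixels next to the edge are returned, while a uniform image yields no key-points. A practical detector would additionally suppress non-maximal responses and prefer corners over straight edges.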
This non-marker based object tracking has two phases: off-line training and real-time recognition. The process of object recognition and tracking in each phase is described in Figure 4. We drew heavily on previous research [10,11,12,13,14].
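The two-phase structure can be sketched as follows, assuming that each key-point is summarized by a numeric descriptor. The matching rule used here is the generic nearest-neighbour ratio test, chosen only for illustration; the source does not state which descriptors or matching rule the service uses, and all names (`train_offline`, `recognize`, `min_matches`) are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_descriptors(source: List[Descriptor], scene: List[Descriptor],
                      ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Pair each source descriptor with its nearest scene descriptor,
    keeping the pair only if it is clearly better than the second nearest."""
    matches = []
    for i, s in enumerate(source):
        ranked = sorted((squared_distance(s, t), j) for j, t in enumerate(scene))
        # Distances are squared, so the ratio is squared as well.
        if len(ranked) >= 2 and ranked[0][0] < (ratio ** 2) * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


def train_offline(objects: Dict[str, List[Descriptor]]) -> Dict[str, List[Descriptor]]:
    """Off-line phase: store the descriptors of each predefined object.
    A real system would extract them from reference images here."""
    return {name: list(descs) for name, descs in objects.items()}


def recognize(database: Dict[str, List[Descriptor]], scene: List[Descriptor],
              min_matches: int = 2) -> Optional[str]:
    """Real-time phase: return the stored object with the most matches, if any."""
    best_name, best_count = None, 0
    for name, descs in database.items():
        count = len(match_descriptors(descs, scene))
        if count >= min_matches and count > best_count:
            best_name, best_count = name, count
    return best_name
```

The split keeps the expensive work (building the database) out of the per-frame path, so the real-time phase only compares scene descriptors against stored ones.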
The resulting two-phase algorithm, which we call the Image Hot Spot Extraction Algorithm, lets this service detect and trace the object in any orientation, at tilts of up to about 60 degrees above and below, and in cluttered environments (see Figure 5). The 3D model can be augmented cleanly on the source object detected within the target scene. The model is translated, rotated and scaled along the three axes x, y and z. The values in the top-right corner of Figure 6 are the rotation angles about the three axes; each value indicates by how many degrees the target object is rotated relative to the source object. Using these recognition and augmentation technologies, we demonstrated an educational application that encourages outdoor activities by students. A student receives a mission to solve a quest from an iPhone-based game. When the student finds a predefined sign at a predefined location, he/she receives a word card, which is one of the keys to open the magic-book. The flow for getting a card is shown in Figure 7.
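The placement of the 3D model (scaling, rotation about the three axes, then translation) can be sketched for a single vertex as below. The order of operations, the X-then-Y-then-Z rotation order and the function name `place_vertex` are assumptions for illustration; the source does not specify the convention used by the service.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def place_vertex(p: Vec3, scale: float, angles_deg: Vec3, translation: Vec3) -> Vec3:
    """Scale a model vertex, rotate it about X, then Y, then Z, and translate it."""
    x, y, z = (scale * v for v in p)
    ax, ay, az = (math.radians(a) for a in angles_deg)

    # Rotation about the X axis
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    # Rotation about the Y axis
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    # Rotation about the Z axis
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)

    tx, ty, tz = translation
    return (x + tx, y + ty, z + tz)
```

For example, a vertex at (1, 0, 0) scaled by 2, rotated 90 degrees about Z and shifted by (0, 0, 5) ends up at approximately (0, 2, 5). The three angles play the role of the per-axis rotation values displayed in Figure 6.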

IPTV@classroom Service
This service supports a group competition quiz in the classroom supervised by a teacher. As soon as a question is shown on the IPTV, students raise their hands. The hand tracker picks out the fastest one. Our skin color blob matching method can detect various shapes and motions of hands [15]. This hand tracking method detects skin color, moving areas, faces and hands, as shown in Figure 8. Experiments in many live demonstrations have shown very good tracking performance at close to frame rate. Figure 8 shows some tracking results with hand segmentation. One of the main features of the proposed algorithm is robustness to fast and large movements. Figure 8(a-d) shows successful tracking of fast movement that causes motion blur.
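Two steps of this service lend themselves to a small sketch: classifying a pixel as skin-colored and choosing the student whose raised hand was detected first. The skin test below is a widely used explicit RGB threshold heuristic for daylight illumination, substituted here for illustration; it is not the blob matching method of [15]. The function names and the (student, time) detection format are assumptions.

```python
from typing import List, Optional, Tuple


def is_skin_rgb(r: int, g: int, b: int) -> bool:
    """Explicit RGB threshold heuristic for skin color under daylight (illustrative only)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)


def skin_ratio(pixels: List[Tuple[int, int, int]]) -> float:
    """Fraction of pixels in a candidate region that pass the skin test."""
    if not pixels:
        return 0.0
    return sum(is_skin_rgb(*p) for p in pixels) / len(pixels)


def first_raised_hand(detections: List[Tuple[str, float]]) -> Optional[str]:
    """Given (student, detection time in seconds) pairs, return the earliest student."""
    if not detections:
        return None
    return min(detections, key=lambda d: d[1])[0]
```

A fixed color rule of this kind is sensitive to lighting, which is one reason a practical tracker would combine it with the motion and face cues mentioned above.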

Conclusions
We proposed n-screen convergence services that combine the Web, IPTV and SmartPhone media. They are supported by AR, non-marker object tracking, motion detection and hand tracking. These approaches are among the early applications to make virtual and real space seamless, and they will ultimately open a new field of learning systems. To guarantee complete synchronization and continuity among all media and the content progressing on each, our services also use a token-sharing mechanism. Tokens are read from and written to the media, so each medium has up-to-date values at any time and place in which the game is played. We are still working on the following areas.
• New digital interface
• Applying 3D to education
• Virtual advertisement in produced videos
• Interactive mechanism to make real and virtual world seamless