Monday, November 29, 2010

Reading #13: Ink Features for Diagram Recognition

Summary

In this work the objective is to distinguish text and shapes. The authors extract a set of features from various related previous works, their own set of features and features from newly available hardware such as pressure. They employed a set of 9 diagrams and asked 26 people to draw them. For each sketch 46 features where calculated. Next a decision tree was built recursively using Gini index on the data to partition the data into classes.
They finally reached a set of key feature set to efficiently distinguish between the two classes. These were: Time to next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside and total angle.
They tested their classifier with two others, namely Microsoft Divider and InkKit. The average misclassification rate for texts and shapes in their system was reported about %9.8 which is significantly better than the average misclassification rate of the other two systems.

Discussion

As noted by the authors, half of the extracted key features were related to inter-stroke gaps which seems to be a deciding factor here.
It would have also been better to compare the decision tree method applied in their system with other classification methods such as neural networks, nearest neighbor, etc.

Sunday, November 28, 2010

#Reading 8: A Lightweight Multistroke Recognizer for User Interface Prototypes

Summary

This paper introduces $N which is an extension of $1 sketch recognizer. The objective of the paper is to introduce a lightweight and quickly deployable multi stroke recognizer for interface creators.
The method build on top of $1 method. In $1, the candidate stroke and the templates strokes’ points are all normalized equidistantly. Next, they are rotated, scaled and translated to provide invariance. The rotation matching uses Golden Section Search to find the optimum angle in the proximity of an initial estimate. Finally the distance between a candidate and a template is the sum of their point-wise distances.
In $N the scaling preprocessing decides whether to scale uniformly or non-uniformly each gesture based its aspect ratio and a threshold. The rotation invariance is also bounded to a user-defined angle range.
A problem with multi stroke gestures is different possible stroke orders. $N generates all possible variations of drawing a template which is all order permutations and for each permutation, each single stroke can be drawn forward or backward which results in 2nxn! different cases for a multi stroke with n strokes. As a speed optimization, the candidate gesture is not compared with all the generated gestures, but with the ones having almost the same starting angle (the angle between the gesture’s centroid and its starting point). Another optimization was to only compare templates having the same number of components as that of the candidate gesture. The two optimizations resulted in about 90% increase in speed.
A group of middle and high school students were studied and the accuracy of the system reached 96.6% on algebra symbols with 15 training examples. The accuracy using the speed optimizations was 95.7% with 90% increase in speed.

Discussion

The number of possible ways of drawing a multi stroke gesture grows very rapidly with the number of its sub strokes. However the authors argue that it might not always result in combinatorial explosion. First, because gestures having a many strokes are prone to be forgotten and are not used.Second, comparison between a candidate gesture and each instance is relatively fast. Third, employed speed optimizations.
Nevertheless, the relative high number of generated gestures will cause slowdowns when the number of templates and stroke components per template is high. The performance is however acceptable for prototyping where ease of implementation and accuracy is important for smaller scale scenarios.