Monday, November 29, 2010

Reading #13: Ink Features for Diagram Recognition

Summary

In this work the objective is to distinguish text from shapes in hand-drawn diagrams. The authors collect features from various related previous works, add their own features, and include features made possible by newly available hardware, such as pen pressure. They asked 26 people to draw a set of 9 diagrams, and for each stroke 46 features were calculated. A decision tree was then built recursively, using the Gini index to partition the data into classes.
They finally arrived at a key feature set that efficiently distinguishes the two classes: time to next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
They compared their classifier against two others, namely Microsoft Divider and InkKit. The average misclassification rate for text and shapes in their system was reported at about 9.8%, significantly better than the average misclassification rates of the other two systems.
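
The Gini-based splitting that grows such a decision tree can be sketched as follows. This is a minimal illustration, not the paper's implementation; the feature values below are hypothetical stand-ins for the "time to next stroke" feature.

```python
# Sketch of a Gini-index split for the text/shape decision tree.
# The data here is invented for illustration.

def gini(labels):
    """Gini impurity of a list of class labels ('text' or 'shape')."""
    n = len(labels)
    if n == 0:
        return 0.0
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / n
        impurity -= p * p
    return impurity

def best_threshold(values, labels):
    """Find the threshold on one feature that minimizes the weighted Gini."""
    best = (None, float("inf"))
    n = len(labels)
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical "time to next stroke" values (ms) for six strokes
times = [40, 55, 60, 400, 520, 610]
labels = ['text', 'text', 'text', 'shape', 'shape', 'shape']
print(best_threshold(times, labels))  # → (60, 0.0): a perfect split
```

The tree is then grown by applying the same search recursively to the left and right partitions over all 46 features.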

Discussion

As noted by the authors, half of the extracted key features relate to inter-stroke gaps, which seems to be the deciding factor here.
It would also have been informative to compare the decision tree method applied in their system against other classification methods such as neural networks or nearest neighbor.

Sunday, November 28, 2010

Reading #8: A Lightweight Multistroke Recognizer for User Interface Prototypes

Summary

This paper introduces $N, an extension of the $1 sketch recognizer. The objective of the paper is a lightweight, quickly deployable multistroke recognizer for interface creators.
The method builds on top of $1. In $1, the candidate stroke and the template strokes’ points are all resampled equidistantly. Next, they are rotated, scaled, and translated to provide invariance. The rotation matching uses Golden Section Search to find the optimum angle in the proximity of an initial estimate. Finally, the distance between a candidate and a template is the sum of their point-wise distances.
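
The Golden Section Search step can be sketched as below. This is a minimal sketch; `score` is a hypothetical stand-in for the path distance between candidate and template at a given rotation angle.

```python
import math

PHI = (math.sqrt(5) - 1) / 2  # golden ratio factor, ~0.618

def golden_section_search(score, lo, hi, tol=1e-4):
    """Minimize score(angle) over [lo, hi]; the iteration count is bounded
    by the interval width and tol, unlike hill climbing."""
    x1 = hi - PHI * (hi - lo)
    x2 = lo + PHI * (hi - lo)
    f1, f2 = score(x1), score(x2)
    while hi - lo > tol:
        if f1 < f2:            # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - PHI * (hi - lo)
            f1 = score(x1)
        else:                  # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + PHI * (hi - lo)
            f2 = score(x2)
    return (lo + hi) / 2

# Toy objective with its minimum at 0.3 rad, searched within +/-45 degrees
best = golden_section_search(lambda a: (a - 0.3) ** 2,
                             math.radians(-45), math.radians(45))
print(round(best, 2))  # → 0.3
```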
In $N, the scaling preprocessing decides whether to scale each gesture uniformly or non-uniformly, based on its aspect ratio and a threshold. Rotation invariance is also bounded to a user-defined angle range.
A problem with multistroke gestures is the different possible stroke orders. $N generates all possible ways of drawing a template: all order permutations, where in each permutation every single stroke can be drawn forward or backward, which yields 2^n × n! different cases for a multistroke with n strokes. As a speed optimization, the candidate gesture is not compared with all the generated variants, but only with the ones having almost the same starting angle (the angle between the gesture’s centroid and its starting point). Another optimization is to only compare templates having the same number of strokes as the candidate gesture. Together, the two optimizations resulted in about a 90% increase in speed.
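
The template expansion described above can be sketched roughly like this; the function name is my own invention, not $N's API.

```python
from itertools import permutations, product

def expand_multistroke(strokes):
    """strokes: list of strokes, each a list of (x, y) points.
    Returns every unistroke variant of the template: each stroke
    order, with each stroke drawn forward or reversed (2^n * n! total)."""
    variants = []
    for order in permutations(strokes):
        # for each ordering, every stroke may be forward or backward
        for directions in product((False, True), repeat=len(order)):
            points = []
            for stroke, reverse in zip(order, directions):
                points.extend(stroke[::-1] if reverse else stroke)
            variants.append(points)
    return variants

# A two-stroke "X" yields 2^2 * 2! = 8 unistroke variants
x = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
print(len(expand_multistroke(x)))  # → 8
```

Each variant is then matched with the $1 machinery, subject to the starting-angle and stroke-count pruning described above.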
In a study with a group of middle and high school students, the accuracy of the system reached 96.6% on algebra symbols with 15 training examples; with the speed optimizations enabled, accuracy was 95.7% at a 90% increase in speed.

Discussion

The number of possible ways of drawing a multistroke gesture grows very rapidly with the number of its component strokes. However, the authors argue that this does not always result in combinatorial explosion: first, gestures with many strokes are prone to be forgotten and thus are rarely used; second, the comparison between a candidate gesture and each instance is relatively fast; third, the employed speed optimizations prune most comparisons.
Nevertheless, the relatively high number of generated variants will cause slowdowns when the number of templates and strokes per template is high. The performance is, however, acceptable for prototyping, where ease of implementation and accuracy matter most in smaller scale scenarios.

Saturday, October 16, 2010

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Summary:

The $1 algorithm is intended to be an easy-to-implement and, at the same time, effective gesture recognition method. The paper mentions the difficulty of incorporating an HMM- or neural-network-based gesture classifier, due to the large training corpus required and the implementation complexity, as well as the inflexibility of ad hoc methods.
The proposed method is comprised of the following steps:
  1. Resample the input points into a predefined number of equidistantly spaced points.
    1. The equidistant points are either a subset of the original points or linear interpolations between them.
  2. Rotate the points around the stroke's centroid so that the angle between the first point and the centroid becomes zero.
  3. Scale the gesture to a reference square and translate its centroid to the origin.
  4. Compute the average point-wise distance to each template and map it to a 0-1 score based on the maximum possible distance.
  5. After zeroing the indicative angles, the candidate gesture might still not be at the optimal angle with respect to the template, so it needs to be rotated further to optimize the match.
  6. To try various angles and find the optimum, Golden Section Search is utilized. This keeps the number of iterations constant, as opposed to hill climbing, and dramatically increases speed for non-matching gestures.
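
Step 1's resampling can be sketched as follows, mirroring the $1 paper's pseudocode; the helper names are mine.

```python
import math

def path_length(pts):
    """Total length of the polyline through pts."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(points, n=64):
    """Resample a stroke into n equidistantly spaced points, linearly
    interpolating between the original samples where needed ($1, step 1)."""
    interval = path_length(points) / (n - 1)
    D = 0.0
    new_pts = [points[0]]
    pts = list(points)
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if D + d >= interval:
            t = (interval - D) / d
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_pts.append((qx, qy))
            pts.insert(i, (qx, qy))  # q becomes the next segment's start
            D = 0.0
        else:
            D += d
        i += 1
    while len(new_pts) < n:  # rounding can leave the last point off
        new_pts.append(points[-1])
    return new_pts

pts = resample([(0, 0), (10, 0)], n=5)
print(pts)  # five points evenly spaced from (0, 0) to (10, 0)
```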

This method was compared against Dynamic Time Warping and Rubine’s method, where the rotation and scale invariance preprocessing was applied to the input of all three methods, though without incorporating the Golden Section Search. The test set contains 16 gestures drawn at different speeds and by different people.

DTW and the method proposed here produced similar results, both better than Rubine’s, especially when few samples were available for each template. Regarding performance, DTW was significantly slower than Rubine’s and the method proposed here.

Discussion:

The limitations mentioned in the paper are the inability to distinguish between shapes whose difference is based on aspect ratio, location, scale, or speed of drawing. However, except for speed, these issues can be resolved by skipping the corresponding normalization steps.

Another weak point of the $1 recognizer is that it compares the candidate gesture to all templates at runtime, which has performance implications. Nevertheless, with a large template set, templates tend to collide in the feature space, which is where feature-based methods fail as well.

Wednesday, September 22, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Summary
Protractor by Yang Li is a data-driven gesture recognizer that is more accurate and faster than its counterparts. The recognizer first resamples the data points into equidistant points using the procedure from the $1 recognizer. Unlike $1, however, the stroke is not rescaled to a square, which preserves the aspect ratio and discriminates between long and narrow strokes. Next, the stroke is translated so that its centroid lies at the origin. To discriminate between different rotation angles while reducing noise, the gesture is rotated so that the angle of its initial point is snapped to one of 4 main angles. It is also possible to reset all gestures to a zero angle, in which case recognition becomes rotation invariant. To perform classification, a distance measure between the input gesture and a template gesture is defined as the angle between their respective data vectors in n-dimensional space.
The author then shows that an optimal angle can be found in closed form which, when added to the template's rotation angle, minimizes the distance (in the defined sense) between the input gesture and the template. Finding this optimal angle compensates for the noise of determining the gesture orientation from its initial point alone. To classify the input, a nearest neighbor method is utilized.
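
The closed-form angle can be sketched as below, assuming both gestures are already resampled to the same length and centered at the origin; the function names are mine, not Protractor's code.

```python
import math

def norm(v):
    """Euclidean norm of a gesture viewed as a flat vector."""
    return math.sqrt(sum(x * x + y * y for x, y in v))

def optimal_cosine_distance(t, g):
    """t, g: equal-length lists of (x, y) points with centroid at origin.
    Returns the angular distance after rotating g by the closed-form
    optimal angle atan2(b, a)."""
    a = sum(tx * gx + ty * gy for (tx, ty), (gx, gy) in zip(t, g))
    b = sum(tx * gy - ty * gx for (tx, ty), (gx, gy) in zip(t, g))
    angle = math.atan2(b, a)  # optimal rotation, no iterative search needed
    cos_sim = (a * math.cos(angle) + b * math.sin(angle)) / (norm(t) * norm(g))
    return math.acos(min(1.0, cos_sim))

# g is t rotated by 30 degrees; the optimal rotation undoes it exactly
t = [(-1.0, 0.0), (1.0, 0.0)]
th = math.radians(30)
g = [(x * math.cos(th) - y * math.sin(th), x * math.sin(th) + y * math.cos(th))
     for x, y in t]
print(round(optimal_cosine_distance(t, g), 6))  # → 0.0
```

Because the optimum is computed directly rather than searched for, this replaces $1's Golden Section Search iterations with a single pass over the points.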
Finally, the author shows that his method is comparable to, and in some cases superior to, the $1 method, while significantly outperforming it with regard to the computational resources required.

Discussion
This is a relatively recent article. Among the advantages of this method is that no features need to be defined, which makes it more flexible, as it is difficult to design features that can reliably discriminate gestures in all domains. The author also shows that it copes with large gesture sets quite well. I am not able to point out an obvious flaw right now.

Tuesday, September 21, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Summary

Sketchpad by Ivan Sutherland is a system which allows users to draw different shapes and diagrams from primitives such as lines and circles. Compound shapes can be built upon simpler shapes. These can later become templates of which multiple copies can be reproduced, and the templates can be stored on a storage device. Sketchpad is controlled with a light pen and an array of buttons. Buttons are used to issue commands, such as drawing a polygon, after which the user enters the polygon vertices using the light pen. It is also possible to enforce constraints on shapes, and Sketchpad will try to satisfy those constraints if possible. For instance, it is possible to force all the polygon vertices to lie on a circle, or to force two lines to become parallel. The shapes are stored in their hierarchical form, and the screen is refreshed by redrawing each shape one after another. Scan conversion formulas for circles and lines are also introduced in the paper.
The constraint satisfaction mechanism works as follows. Initially it builds a graph where constrained variables are nodes and an edge means two variables are related by a constraint. It seeks a free variable which can be directly assigned a value that satisfies its constraints; that variable is then removed from the graph, and the procedure is repeated until all the constraints are satisfied. If this procedure fails for a set of constraints, relaxation is performed on the constrained values to gradually minimize the error of constraint satisfaction.
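
The fallback relaxation phase might look roughly like this toy sketch: names are invented, and only equality constraints are modeled, whereas Sketchpad handles a much richer constraint vocabulary.

```python
# Toy sketch of the relaxation fallback: when direct assignment fails,
# repeatedly nudge constrained values toward agreement to shrink the error.

def relax(values, constraints, rate=0.5, iters=200):
    """values: dict name -> float. constraints: list of (a, b) pairs
    meaning 'a must equal b'. Each pass moves both toward their mean,
    so the total error decreases gradually."""
    for _ in range(iters):
        for a, b in constraints:
            mid = (values[a] + values[b]) / 2
            values[a] += rate * (mid - values[a])
            values[b] += rate * (mid - values[b])
    return values

# Chained constraints x = y and y = z: all three values converge to the
# common mean 14/3 ~ 4.667
v = relax({"x": 0.0, "y": 10.0, "z": 4.0}, [("x", "y"), ("y", "z")])
print({k: round(val, 3) for k, val in v.items()})
```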
In conclusion, the author introduces various applications of the system. For instance, it is possible to draw a bridge and let Sketchpad calculate the different force components on each part, or to design a circuit using reusable drawings of different elements. The author also brings up simulating and debugging electrical circuits and 3D drawing as future avenues for expanding Sketchpad.

Discussion

The Sketchpad project amazingly anticipates object-oriented ideas, such as creating a template shape and instantiating copies of it, among many other innovations. Moreover, its means of interacting with the user and aiding them in design by satisfying constraints is also impressive. On the whole, Sketchpad accomplished a fine job for its time.

Saturday, September 11, 2010

Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Summary

Here a gesture design tool called quill is introduced. The authors, Chris Long, James Landay, and Lawrence Rowe, have incorporated similarity mechanisms to detect gestures that are similar both in human perception and in computer recognition. The human perception part is based on experimental data from different human subjects; similarity in computer recognition can be measured with the classification probability or other methods.
Their system, similar to Rubine’s, lets the user define new gestures and provide examples for each; the classification method is also similar to Rubine’s. The authors describe that deciding on the timing and comprehensiveness of advice was a challenge. The advice-giving system was implemented as a background process which can be called up by the user or started automatically after some idle period. Since the background analysis takes time, if called up by the user it locks the actions that might invalidate the analysis result; if run automatically, user actions which affect its result cancel the analysis.
Finally, the authors bring up the ideas of automatically morphing similar gestures, or checking user-entered gestures against standard databases.

Discussion

I think the facilities incorporated in this gesture design kit are thoughtful, especially the human-perception similarity measurements. The paper did not elaborate on the classifier's technicalities, but it is implied that the classifier is similar to Rubine’s.

Thursday, September 9, 2010

Reading #2: Specifying Gestures by Example (Rubine)

Summary

This paper introduces GRANDMA, a toolkit for creating gesture-based UIs. The toolkit lets a user add gesture commands to different entities in the UI. For instance, a user can add a delete gesture to the main UI window and define the action(s) which must be executed upon its recognition. Each gesture is defined with a set of examples.
The system learns to distinguish between gestures by training a linear classifier on a set of features extracted from the stroke. First, gestures are assumed to be single-stroke. Second, for performance reasons, the features are constrained to those which can be computed incrementally in constant time per point. A set of 13 features is introduced in the paper, such as total angular movement, maximum pen speed, and size of the bounding box. A linear classifier is then constructed from the provided examples. When a classification is ambiguous, either because of a low probability of belonging to the recognized class or because the gesture is far from the class centroid, the system rejects the gesture and asks the user to retry.
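
The constant-time-per-point constraint can be illustrated with a sketch like this. Only three features in the spirit of the paper's set are shown, with paraphrased names; this is not Rubine's code.

```python
import math

# Sketch of Rubine-style features maintained incrementally, O(1) per point.

class IncrementalFeatures:
    def __init__(self):
        self.pts = []
        self.total_angle = 0.0   # sum of signed turn angles along the stroke
        self.min_x = self.min_y = float("inf")
        self.max_x = self.max_y = float("-inf")
        self.path_len = 0.0

    def add_point(self, x, y):
        # bounding box: constant-time update
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)
        # path length: one new segment per point
        if self.pts:
            px, py = self.pts[-1]
            self.path_len += math.hypot(x - px, y - py)
        # total angular movement: turn angle at the previous point
        if len(self.pts) >= 2:
            (ax, ay), (bx, by) = self.pts[-2], self.pts[-1]
            self.total_angle += math.atan2(
                (bx - ax) * (y - by) - (by - ay) * (x - bx),
                (bx - ax) * (x - bx) + (by - ay) * (y - by))
        self.pts.append((x, y))

    def bbox_diagonal(self):
        return math.hypot(self.max_x - self.min_x, self.max_y - self.min_y)

f = IncrementalFeatures()
for p in [(0, 0), (1, 0), (1, 1)]:   # an L-shaped stroke: one 90-degree turn
    f.add_point(*p)
print(round(f.total_angle, 4), round(f.path_len, 4))  # → 1.5708 2.0
```

The resulting feature vector feeds the linear classifier, one weight vector per gesture class.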
The precision of the system for classifying about 15 gestures, each with at least 15 examples, is around 98%. For 30 gestures, each trained with 40 examples, the precision is about 97%. The author finally proposes some ideas for handling multi-finger gestures.

Discussion

The proposed system has promising performance despite a rather simple recognition method. It is also possible to improve the system by adding new features without impairing performance. However, the author has only reported the precision of the system; false negatives (rejections) are not taken into account. Another area for improvement is that most of the selected features are not rotation and scale invariant, which might cause difficulty in some domains. On the whole, the system is simple and performs well.