Thursday, December 16, 2010

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Summary

This paper proposes a sketch recognition language in which template objects are described by their component primitives and the geometric constraints between them. For instance, a stick figure consists of a circle connected to a line, which is itself connected to four other lines.

A shape description consists of parts such as lines, constraints such as parallelism or intersection, aliases, and editing and display properties.

Once the structural descriptions have been entered in the LADDER language, they can be automatically transformed into shape recognizers.

The paper gives examples of applying the framework to UML diagrams. The framework also includes a library of predefined shapes (line, curve, ellipse) and constraints (collinear, tangent, acute).
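To make the idea concrete, here is a rough, hypothetical rendering of such a structural description as plain Python data; the names and structure are purely illustrative and do not follow LADDER's actual syntax:

```python
from dataclasses import dataclass, field

@dataclass
class ShapeDescription:
    """Toy stand-in for a LADDER-style shape definition (not real LADDER)."""
    name: str
    components: dict                                  # component name -> primitive type
    constraints: list = field(default_factory=list)   # (constraint, *component refs)

# A stick figure: a head connected to a body line, which connects to four limbs.
stick_figure = ShapeDescription(
    name="stickFigure",
    components={"head": "Ellipse", "body": "Line",
                "arm1": "Line", "arm2": "Line",
                "leg1": "Line", "leg2": "Line"},
    constraints=[("coincident", "head.bottom", "body.top"),
                 ("coincident", "body.middle", "arm1.end"),
                 ("coincident", "body.middle", "arm2.end"),
                 ("coincident", "body.bottom", "leg1.top"),
                 ("coincident", "body.bottom", "leg2.top")],
)
```

A generated recognizer would then check whether a grouping of recognized primitives satisfies all the listed constraints.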

Discussion

LADDER is a tool that is valuable to many domains, as long as the time it takes to precisely define all the shapes is justified. As future work, automatic description generation for complex shapes could address this cost.

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Summary

This paper introduces the HUNCH system. Its corner-finding mechanism is similar to Sezgin's: it finds the minima of the speed curve. It also connects line endpoints that fall within a connectivity threshold. Finally, it experiments with some higher-level recognition, such as 3D object inference and floor plan recognition. The system employs the CURVIT and STRAIT modules as its curve recognizer and line-endpoint latching module, respectively.
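The endpoint-latching idea can be sketched minimally as follows; this is my own illustrative code, not HUNCH's STRAIT module, and the 10-unit threshold is an assumption:

```python
import math

def latch(strokes, threshold=10.0):
    """Snap pairs of nearby stroke endpoints to their midpoint.

    strokes: list of strokes, each a mutable list of (x, y) tuples.
    Endpoints of different strokes closer than `threshold` are merged.
    """
    # Collect (stroke index, end flag, point) for every endpoint.
    pts = [(i, e, s[0] if e == 0 else s[-1])
           for i, s in enumerate(strokes) for e in (0, 1)]
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            (ia, ea, pa), (ib, eb, pb) = pts[a], pts[b]
            if ia == ib:
                continue  # never latch a stroke to itself
            if math.dist(pa, pb) <= threshold:
                mid = ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2)
                strokes[ia][0 if ea == 0 else -1] = mid
                strokes[ib][0 if eb == 0 else -1] = mid
    return strokes
```

For example, two horizontal strokes ending at (10, 0) and starting at (12, 0) would both be snapped to (11, 0) with a threshold of 5.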

They conclude that in order to have a successful sketch recognition system, some context information is necessary.

Discussion

This is a relatively old paper; however, it proposes a characteristic of a sketch system that is still the ideal today: identifying sketches without context information. The paper consists mostly of experiments and possible solutions, and the experiments and questions it poses are still relevant today.

Wednesday, December 15, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Summary

This paper introduces a system to recognize a set of primitive shapes: straight lines, polylines, arcs, circles, helices, spirals, etc.

It is a low-level process, usually employed in the initial steps of recognition to provide input to higher levels.

The recognizer feeds each stroke to a set of eight different low-level recognizers. Each classifier then returns whether the stroke belongs to that class, along with the beautified stroke in case it does.
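The test-all-recognizers pattern might look like this in outline; note that PaleoSketch's real ranking is a more elaborate hierarchy, while this sketch simply keeps the lowest-error passing fit, and the fallback name is made up:

```python
def classify(stroke, recognizers):
    """Run a stroke through every primitive recognizer.

    Each recognizer maps a stroke to (passed, error, beautified_shape).
    Among the fits that pass, return the one with the lowest error.
    (Illustrative of the overall architecture only, not PaleoSketch's
    actual hierarchical ranking.)
    """
    fits = []
    for name, recognizer in recognizers.items():
        passed, error, beautified = recognizer(stroke)
        if passed:
            fits.append((error, name, beautified))
    if not fits:
        # hypothetical fallback when no primitive test passes
        return "complex", stroke
    error, best_name, best_shape = min(fits)
    return best_name, best_shape
```

A recognizer here can be as simple as a function testing, say, the maximum deviation of the stroke from its chord for the line class.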

Moreover, PaleoSketch introduces two new features: normalized distance between direction extremes (NDDE) and direction change ratio (DCR), both of which measure spikes in the direction graph and discriminate between curves and polylines.
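A straightforward reading of the two definitions, sketched in Python (my own reconstruction from the descriptions, not the paper's code):

```python
import math

def direction_graph(points):
    """Direction (angle) of each chord between consecutive points."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def ndde(points):
    """Normalized distance between direction extremes: fraction of the
    stroke length lying between the points of highest and lowest direction.
    Smooth curves score near 1; polylines typically score lower."""
    d = direction_graph(points)
    seg = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total = sum(seg)
    a, b = sorted((d.index(max(d)), d.index(min(d))))
    return sum(seg[a:b + 1]) / total if total else 0.0

def dcr(points):
    """Direction change ratio: max direction change over mean direction
    change. Spikes (corners) drive this up; smooth curves stay near 1."""
    d = direction_graph(points)
    changes = [abs(b - a) for a, b in zip(d, d[1:])]
    avg = sum(changes) / len(changes)
    return max(changes) / avg if avg else 0.0
```

On a smoothly sampled arc the direction changes are uniform, so DCR stays near 1, while an "L"-shaped polyline concentrates all its direction change at one corner and scores much higher.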

Discussion

PaleoSketch has a very high classification rate, with overall recognition of about 98.6%, and it paves the way for higher-level recognition. However, I think the only important primitives to recognize are circles, lines, and polylines. Shapes such as spirals or helices are quite domain-dependent, and the choice to add them to the list of primitive shapes seems arbitrary to me.

Reading #7. Sketch Based Interfaces: Early Processing for Sketch Understanding (Sezgin)

Summary

This paper focuses on the initial steps of sketch recognition and proposes a method to find corners in a sketch. Sezgin combines curvature and speed measures to generate a set of candidate corners.
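The speed half of the idea can be sketched as follows; the fraction-of-mean threshold is an assumption, not Sezgin's tuned value, and a full implementation would intersect these candidates with curvature maxima:

```python
import math

def speed_corners(points, times, frac=0.25):
    """Candidate corners at pen-speed minima, in the spirit of Sezgin's
    method (threshold `frac` is assumed, not the paper's value).

    points: list of (x, y); times: matching timestamps.
    Returns indices into the segment/point sequence, including endpoints.
    """
    # speeds[i] is the pen speed over the segment from point i to point i+1
    speeds = [math.dist(points[i], points[i + 1]) /
              ((times[i + 1] - times[i]) or 1e-9)
              for i in range(len(points) - 1)]
    mean = sum(speeds) / len(speeds)
    corners = [0]
    for i in range(1, len(speeds) - 1):
        if (speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]
                and speeds[i] < frac * mean):
            corners.append(i)  # segment index approximates the corner point
    corners.append(len(points) - 1)
    return corners
```

For an "L"-shaped stroke drawn slowly around the bend, the slow segments near the bend produce a speed minimum and hence a corner candidate there.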

The second feature of the method is curve handling: curves are modeled as Bezier curves, with the control points approximated by the least-squares method.
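With the endpoints fixed and a chord-length parameterization, the least-squares fit of the two interior control points reduces to a 2x2 linear solve per coordinate. A rough illustration (my own reconstruction, not the paper's implementation):

```python
import math

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier fit with endpoints fixed to the stroke's
    ends and chord-length parameterization. Returns (P0, P1, P2, P3)."""
    # Chord-length parameters t in [0, 1].
    d = [0.0]
    for p, q in zip(points, points[1:]):
        d.append(d[-1] + math.dist(p, q))
    ts = [x / d[-1] for x in d]
    p0, p3 = points[0], points[-1]
    # Normal equations for the two free control points P1, P2.
    a11 = a12 = a22 = 0.0
    rhs1 = [0.0, 0.0]
    rhs2 = [0.0, 0.0]
    for (x, y), t in zip(points, ts):
        b1 = 3 * (1 - t) ** 2 * t          # Bernstein basis for P1
        b2 = 3 * (1 - t) * t ** 2          # Bernstein basis for P2
        rx = x - (1 - t) ** 3 * p0[0] - t ** 3 * p3[0]
        ry = y - (1 - t) ** 3 * p0[1] - t ** 3 * p3[1]
        a11 += b1 * b1; a12 += b1 * b2; a22 += b2 * b2
        rhs1[0] += b1 * rx; rhs1[1] += b1 * ry
        rhs2[0] += b2 * rx; rhs2[1] += b2 * ry
    det = a11 * a22 - a12 * a12
    p1 = ((a22 * rhs1[0] - a12 * rhs2[0]) / det,
          (a22 * rhs1[1] - a12 * rhs2[1]) / det)
    p2 = ((a11 * rhs2[0] - a12 * rhs1[0]) / det,
          (a11 * rhs2[1] - a12 * rhs1[1]) / det)
    return p0, p1, p2, p3
```

As a sanity check, fitting points sampled evenly along a straight line recovers control points evenly spaced along that line.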

The system also performs stroke beautification, so that small pen jitters made while drawing are eliminated and the drawn sketches look better.

Finally, the system does some basic object recognition on circles, rectangles, etc.

Discussion

This paper is one of the earliest in corner detection to take advantage of both speed and curvature. Regarding segmentation, the use of Bezier curves for curved segments is a good choice for its flexibility. Finally, the results are promising.

Monday, November 29, 2010

Reading #13: Ink Features for Diagram Recognition

Summary

In this work the objective is to distinguish text from shapes. The authors gather a set of features from various related previous works, add their own features, and include features from newly available hardware, such as pressure. They employed a set of 9 diagrams and asked 26 people to draw them. For each sketch, 46 features were calculated. A decision tree was then built recursively, using the Gini index to partition the data into classes.
They finally arrived at a key set of features that efficiently distinguish the two classes: time to next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
They tested their classifier against two others, Microsoft Divider and InkKit. The average misclassification rate for text and shapes in their system was reported at about 9.8%, significantly better than the average misclassification rates of the other two systems.
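The Gini-based splitting used to grow such a tree can be sketched as follows; this is a generic illustration of the criterion for one numeric feature, not the authors' implementation, and the data in the example is made up:

```python
def gini(labels):
    """Gini impurity of a multiset of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Threshold on one feature minimizing weighted Gini impurity
    of the two resulting partitions. Returns (impurity, threshold)."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), pairs[0][0])
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [l for v, l in pairs[:i]]
        right = [l for v, l in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = min(best, (score, thr))
    return best
```

A tree builder would apply `best_split` to every feature at each node, take the feature/threshold pair with the lowest impurity, and recurse on the two partitions.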

Discussion

As noted by the authors, half of the extracted key features relate to inter-stroke gaps, which seem to be a deciding factor here.
It would also have been good to compare the decision tree method applied in their system with other classification methods, such as neural networks or nearest neighbor.

Sunday, November 28, 2010

Reading #8: A Lightweight Multistroke Recognizer for User Interface Prototypes (Anthony)

Summary

This paper introduces $N, an extension of the $1 sketch recognizer. The objective of the paper is a lightweight, quickly deployable multistroke recognizer for interface creators.
The method builds on top of $1. In $1, the points of the candidate stroke and the template strokes are all resampled equidistantly. Next, they are rotated, scaled, and translated to provide invariance. The rotation matching uses Golden Section Search to find the optimal angle in the proximity of an initial estimate. Finally, the distance between a candidate and a template is the sum of their point-wise distances.
In $N, the scaling preprocessing decides whether to scale each gesture uniformly or non-uniformly, based on its aspect ratio and a threshold. Rotation invariance is also bounded to a user-defined angle range.
A problem with multistroke gestures is the different possible stroke orders. $N generates all possible variations of drawing a template, i.e., all order permutations; since each component stroke can also be drawn forward or backward, this yields 2^n × n! cases for a multistroke with n strokes. As a speed optimization, the candidate gesture is not compared with all the generated variants, only with those having approximately the same start angle (the angle between the gesture's centroid and its starting point). Another optimization is to only compare templates having the same number of component strokes as the candidate gesture. Together, the two optimizations resulted in about a 90% increase in speed.
A group of middle and high school students was studied, and the accuracy of the system reached 96.6% on algebra symbols with 15 training examples. With the speed optimizations, accuracy was 95.7% at about 90% higher speed.
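The template-expansion step can be sketched as follows (my own illustrative Python with made-up stroke data; $N's actual implementation differs in representation):

```python
from itertools import permutations, product

def unistroke_variants(strokes):
    """All single-stroke readings of a multistroke template.

    Every stroke-order permutation, with each component drawn either
    forward or reversed, giving 2^n * n! variants for n strokes.
    Each variant is the concatenated point list.
    """
    variants = []
    for order in permutations(strokes):
        for flips in product((False, True), repeat=len(strokes)):
            joined = []
            for stroke, rev in zip(order, flips):
                joined.extend(reversed(stroke) if rev else stroke)
            variants.append(joined)
    return variants
```

A two-stroke template ("X", say) expands to 2^2 × 2! = 8 unistroke variants, which is why the start-angle and component-count filters described above matter as templates grow.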

Discussion

The number of possible ways of drawing a multistroke gesture grows very rapidly with its number of component strokes. However, the authors argue that this does not always result in combinatorial explosion: first, gestures with many strokes are prone to be forgotten and are not used; second, the comparison between a candidate gesture and each instance is relatively fast; third, the speed optimizations described above are employed.
Nevertheless, the relatively high number of generated variants will cause slowdowns when the number of templates and stroke components per template is high. The performance is, however, acceptable for prototyping, where ease of implementation and accuracy matter most and the scale is smaller.

Saturday, October 16, 2010

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Summary:

The $1 algorithm is intended to be an easy-to-implement and at the same time effective gesture recognition method. The paper mentions the difficulty of incorporating an HMM- or neural-network-based gesture classifier, due to the large training corpus required and the implementation complexity, as well as the inflexibility of ad-hoc methods.
The proposed method comprises the following steps:
  1. Resample the input points into a predefined number of equidistantly spaced points.
    1. The equidistant points are either a subset of the original points or linear interpolations of them.
  2. Rotate the points around the stroke's centroid so that the angle between the first point and the centroid becomes zero.
  3. Scale the gesture to a reference square and translate its centroid to the origin.
  4. Compute the average point-wise distance and map it to a 0-1 score based on the maximum possible distance.
  5. After zeroing the indicative angles, the candidate gesture might not be at the optimal angle with regard to the template, so it needs to be rotated further to minimize the distance.
  6. To try various angles and find the optimum, Golden Section Search is utilized. This keeps the number of iterations constant, as opposed to hill climbing, and dramatically increases speed for non-matching gestures.
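The normalization pipeline above (minus the Golden Section Search refinement) can be sketched roughly as follows; the 64-point resampling and 250-unit reference square follow the paper's values, but the code is my own illustrative version:

```python
import math

N, SIZE = 64, 250.0  # resample count and reference-square size from the paper

def path_length(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def resample(pts, n=N):
    """Step 1: resample to n equidistantly spaced points."""
    pts = list(pts)
    interval = path_length(pts) / (n - 1)
    out, D, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if D + d >= interval:
            t = (interval - D) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            D = 0.0
        else:
            D += d
        i += 1
    while len(out) < n:   # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts):
    """Step 2: rotate so the first-point-to-centroid angle is zero."""
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_and_translate(pts, size=SIZE):
    """Step 3: scale the bounding box to a square, centroid to origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w = (max(xs) - min(xs)) or 1.0
    h = (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def normalize(pts):
    return scale_and_translate(rotate_to_zero(resample(pts)))

def path_distance(a, b):
    """Step 4: average point-wise distance between normalized gestures."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

With this pipeline, two "L" strokes at different scales and positions normalize to nearly identical point sets, while an "L" and a straight line remain far apart.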

This method was compared against Dynamic Time Warping (DTW) and Rubine's method, where the rotation and scale invariance mechanisms were applied to the input of all these methods, even without incorporating the Golden Section Search. The test set contains 16 gesture types drawn at different speeds and by different people.

DTW and the proposed method produced similar results, better than Rubine's, especially when few samples were available per template. Regarding performance, DTW was significantly slower than both Rubine's method and the method proposed here.

Discussion:

The limitations mentioned in the paper are the inability to distinguish between gestures that differ only in aspect ratio, location, scale, or speed of drawing. However, except for speed, each issue can be resolved by skipping the corresponding normalization step.

Another weak point of the $1 recognizer is that it compares the candidate gesture to all templates at runtime, which has performance implications. Nevertheless, with a large template set, templates tend to collide in feature space, which is where feature-based methods fail as well.