Saturday, October 16, 2010

# Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

Summary:

The $1 algorithm is intended as an easy-to-implement yet effective gesture recognition method. The paper motivates it by the difficulty of incorporating an HMM- or neural-network-based gesture classifier, which requires a large training corpus and is complex to implement, and by the inflexibility of ad-hoc methods.
The proposed method is comprised of the following steps:
  1. Resample the input points into a predefined number of equidistantly spaced points.
    1. Each resampled point is either one of the original points or a linear interpolation between two of them.
  2. Rotate the points around the stroke's centroid so that the angle between the first point and the centroid (the "indicative angle") becomes zero.
  3. Scale the gesture to a reference square and translate its centroid to the origin.
  4. Compute the average point-wise distance to each template and map it into a 0-1 score based on the maximum possible distance.
  5. Because zeroing the indicative angle does not guarantee the candidate is at the optimal angle with respect to a template, the candidate needs to be rotated further to minimize the distance.
  6. To try various angles and find the optimum efficiently, Golden Section Search is used. This keeps the number of iterations constant, unlike hill climbing, which dramatically speeds up rejection of non-matching gestures.
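The steps above can be sketched in Python. This is a minimal illustration, not the paper's pseudocode: the function names, the choice of N = 64 resampled points, and the 250-unit reference square are my own assumptions (typical values for $1 implementations).

```python
import math

N = 64          # assumed number of equidistant resampled points
SQUARE = 250.0  # assumed side length of the reference square

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=N):
    """Step 1: resample the stroke into n equidistantly spaced points."""
    interval = path_length(pts) / (n - 1)
    pts, out, acc, i = list(pts), [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # the interpolated point becomes the next start
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:       # guard against floating-point rounding
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, theta):
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def rotate_to_zero(pts):
    """Step 2: rotate so the angle from centroid to first point is zero."""
    cx, cy = centroid(pts)
    return rotate_by(pts, -math.atan2(pts[0][1] - cy, pts[0][0] - cx))

def scale_and_translate(pts, size=SQUARE):
    """Step 3: scale to the reference square, move centroid to the origin."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = max(max(xs) - min(xs), 1e-6)  # guard degenerate 1D strokes
    h = max(max(ys) - min(ys), 1e-6)
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Step 4: average point-wise Euclidean distance."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(cand, tmpl, lo=-math.radians(45),
                           hi=math.radians(45), tol=math.radians(2)):
    """Steps 5-6: Golden Section Search over candidate rotation angles."""
    phi = 0.5 * (math.sqrt(5) - 1)
    x1 = phi * lo + (1 - phi) * hi
    x2 = (1 - phi) * lo + phi * hi
    f1 = path_distance(rotate_by(cand, x1), tmpl)
    f2 = path_distance(rotate_by(cand, x2), tmpl)
    while abs(hi - lo) > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = phi * lo + (1 - phi) * hi
            f1 = path_distance(rotate_by(cand, x1), tmpl)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - phi) * lo + phi * hi
            f2 = path_distance(rotate_by(cand, x2), tmpl)
    return min(f1, f2)

def recognize(points, templates):
    """Compare a candidate against every (label, template) pair and
    return the best label with its 0-1 score."""
    cand = scale_and_translate(rotate_to_zero(resample(points)))
    best, best_d = None, float("inf")
    for label, tmpl in templates:
        d = distance_at_best_angle(cand, tmpl)
        if d < best_d:
            best, best_d = label, d
    half_diag = 0.5 * math.sqrt(2 * SQUARE ** 2)
    return best, 1 - best_d / half_diag
```

Templates are stored already preprocessed by the same three normalization steps, so at runtime only the angle search and distance computation run per template.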

This method was compared against Dynamic Time Warping (DTW) and Rubine's recognizer, where the same rotation and scale invariance preprocessing was applied to the input of all three methods, though without the Golden Section Search. The test set contained 16 gesture types drawn at different speeds and by different people.

DTW and the proposed method produced similar results, both better than Rubine's, especially when few samples were available for each template. In terms of performance, however, DTW was significantly slower than both Rubine's recognizer and the proposed method.

Discussion:

The limitations mentioned in the paper include the inability to distinguish gestures that differ only in aspect ratio, location, scale, orientation, or drawing speed. However, except for speed, each of these can be resolved by simply skipping the corresponding normalization step.

Another weak point of the $1 recognizer is that it compares the candidate gesture against every template at runtime, which has performance implications. Nevertheless, with a large template set, templates tend to collide in the feature space, which is where feature-based methods fail as well.