Wednesday, September 22, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)

Summary
Protractor, by Yang Li, is a data-driven gesture recognizer that is faster and more accurate than its counterparts. The recognizer first resamples the stroke into equidistant points using the procedure from the $1 recognizer. Unlike the $1 method, however, the stroke is not rescaled into a square; this preserves the aspect ratio and lets the recognizer discriminate between long and narrow strokes. Next, the stroke is translated so that its centroid lies at the origin. To discriminate between different orientations while reducing noise, the gesture is rotated so that the angle of its initial point is snapped to the nearest of eight base orientations. It is also possible to rotate every gesture to a zero angle, in which case recognition is rotation invariant. For classification, the distance between the input gesture and a template gesture is defined as the angle between their respective data vectors in n-dimensional space.
The author then shows that an optimal angle can be found in closed form; adding it to the template's rotation angle minimizes the distance (in the sense defined above) between the input gesture and the template. Finding this optimal angle compensates for the noise introduced by determining the gesture's orientation from its initial point alone. The input is then classified with a nearest-neighbor method.
Finally, the author shows that his method is comparable and in some cases superior to the $1 method in accuracy, and that it significantly outperforms $1 in the computational resources required.
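To make the preprocessing steps above concrete, here is a minimal Python sketch written for this summary, not taken from the paper. The function names `resample` and `preprocess`, the default of 16 points, and the `n_orientations` parameter are my own assumptions. The number of base orientations is left as a parameter, and passing `None` gives the rotation-invariant variant.

```python
import math

def resample(points, n):
    """Resample a stroke into n points spaced equally along its path."""
    total = sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))
    interval = total / (n - 1)
    pts = list(points)
    out = [pts[0]]
    acc = 0.0  # path length walked since the last emitted point
    i = 1
    while i < len(pts):
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            out.append(q)
            pts.insert(i, q)  # continue walking from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can leave the last point out
        out.append(pts[-1])
    return out[:n]

def preprocess(points, n=16, n_orientations=None):
    """Resample, move the centroid to the origin, then normalize rotation.

    No rescaling is done, so the aspect ratio of the stroke is preserved.
    """
    pts = resample(points, n)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    pts = [(x - cx, y - cy) for x, y in pts]
    # Angle of the first point as seen from the centroid.
    indicative = math.atan2(pts[0][1], pts[0][0])
    if n_orientations is None:
        delta = -indicative  # rotation invariant: rotate to zero angle
    else:
        step = 2 * math.pi / n_orientations
        delta = step * round(indicative / step) - indicative  # snap to nearest
    c, s = math.cos(delta), math.sin(delta)
    return [(x * c - y * s, x * s + y * c) for x, y in pts]
```

For example, `preprocess(stroke, n=16)` returns 16 centered points whose first point lies on the positive x-axis.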
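The closed-form angle can be sketched as follows. This is my own derivation in Python under the assumption that both gestures are flattened into unit-length vectors [x1, y1, x2, y2, ...]; the function names are hypothetical and the paper's exact formulation may differ in detail. Rotating the template by θ turns the dot product of the two vectors into a·cos θ + b·sin θ, which is maximized at θ = atan2(b, a).

```python
import math

def normalize(points):
    """Flatten (x, y) points into a unit-length vector [x1, y1, x2, y2, ...]."""
    flat = [c for p in points for c in p]
    norm = math.sqrt(sum(c * c for c in flat))
    return [c / norm for c in flat]

def optimal_angular_distance(template, gesture):
    """Return (distance, theta): the smallest angle between the two vectors
    over all rotations of the template, and the rotation that achieves it."""
    a = b = 0.0
    for i in range(0, len(template), 2):
        xt, yt = template[i], template[i + 1]
        xg, yg = gesture[i], gesture[i + 1]
        a += xt * xg + yt * yg  # cosine coefficient
        b += xt * yg - yt * xg  # sine coefficient
    theta = math.atan2(b, a)
    similarity = a * math.cos(theta) + b * math.sin(theta)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, similarity))), theta

def classify(gesture, templates):
    """Nearest neighbor: name of the template with the smallest distance."""
    return min(templates,
               key=lambda name: optimal_angular_distance(templates[name], gesture)[0])
```

Here `templates` is assumed to be a dictionary mapping a gesture name to its normalized vector. Because the angle comes from a single pass over the points, no iterative search over rotations is needed.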

Discussion
This is a relatively recent article. One advantage of the method is that no features need to be defined, which makes it more flexible, since it is difficult to design features that discriminate gestures in all domains. The author also shows that it copes well with large gesture sets. I cannot identify an obvious flaw at this point.

Tuesday, September 21, 2010

Reading #4: Sutherland. Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

Summary

Sketchpad, by Ivan Sutherland, is a system that lets users draw shapes and diagrams from primitive shapes such as lines and circles. Compound shapes can be built from simpler ones. A shape can later serve as a template from which multiple copies are produced, and these templates can be saved to a storage device. Sketchpad is controlled with a light pen and an array of buttons. The buttons issue commands such as "draw polygon", after which the user enters the polygon vertices with the light pen. It is also possible to impose constraints on shapes, and Sketchpad will try to satisfy them if possible. For instance, one can force all the vertices of a polygon to lie on a circle, or force two lines to be parallel. The shapes are stored in hierarchical form, and the screen is refreshed by redrawing each shape one after another. The paper also introduces two scan-conversion formulas, one for circles and one for lines.
The constraint satisfaction mechanism works as follows. It first builds a graph in which the constrained variables are nodes and an edge means that two variables are related by a constraint. It then looks for a free variable that can be directly assigned a value satisfying its constraint. This variable is removed from the graph, and the procedure is repeated until all the constraints are satisfied. If this procedure fails for a set of constraints, relaxation is performed on the constrained values to gradually minimize the constraint satisfaction error.
In conclusion, the author presents various applications of the system. For instance, one can draw a bridge and let Sketchpad calculate the force components on each member, or design a circuit from reusable drawings of different elements. The author also raises the possibility of simulating and debugging electrical circuits, and of 3D drawing, as future avenues for extending Sketchpad.
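The relaxation fallback can be illustrated with a small Python sketch. This is a generic numerical relaxation written for this summary and is not Sutherland's actual procedure: each constraint is assumed to be a function returning a residual that is zero when satisfied, and the values are nudged downhill on the total squared error. The names `total_error` and `relax` and all parameters are hypothetical.

```python
def total_error(values, constraints):
    """Sum of squared constraint residuals; zero when all are satisfied."""
    return sum(c(values) ** 2 for c in constraints)

def relax(values, constraints, tol=1e-12, max_iter=1000, h=1e-6):
    """Gradually adjust the values to reduce the total constraint error."""
    values = list(values)
    for _ in range(max_iter):
        err = total_error(values, constraints)
        if err < tol:
            break
        # Estimate the error gradient with central differences.
        grad = []
        for i in range(len(values)):
            up, down = values[:], values[:]
            up[i] += h
            down[i] -= h
            grad.append((total_error(up, constraints)
                         - total_error(down, constraints)) / (2 * h))
        # Halve the step until the error actually decreases.
        step = 1.0
        while step > 1e-12:
            trial = [v - step * g for v, g in zip(values, grad)]
            if total_error(trial, constraints) < err:
                values = trial
                break
            step /= 2
        else:
            break  # no improving step found; stop at the current values
    return values

# Example: force two coordinates to be equal and to sum to 10.
constraints = [lambda v: v[0] - v[1], lambda v: v[0] + v[1] - 10]
solution = relax([0.0, 4.0], constraints)
```

In this example both values move toward 5, the only assignment that satisfies both constraints.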

Discussion

The Sketchpad project makes remarkable use of object-oriented ideas, such as storing a template shape and creating instances of it, along with many other innovations. Its means of interacting with the user and aiding the design process by satisfying constraints is also impressive. On the whole, Sketchpad was a fine accomplishment for its time.

Saturday, September 11, 2010

Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

Summary

This paper introduces a gesture design tool called quill. The authors, Chris Long, James Landay and Lawrence Rowe, have incorporated similarity mechanisms that detect when gestures are similar both in human perception and in computer recognition. The human perception part is based on experimental data collected from human subjects. Similarity in computer recognition can be measured with the classification probability or by other methods.
Their system, like Rubine’s, lets the user define new gestures and provide examples for each. The classification method is also similar to Rubine’s. The authors describe deciding the timing and comprehensiveness of the advice as a challenge. The advice system was implemented as a background process that can be invoked by the user or started automatically after an idle period. Since the background analysis takes some time, when it is invoked by the user it disables the actions that could make the analysis result stale. When it is started automatically, user actions that affect its result cancel the analysis.
Finally, the authors raise the ideas of automatically morphing similar gestures and of checking user-entered gestures against standard databases.

Discussion

I think the facilities incorporated in this gesture design kit are thoughtful, especially the human perception similarity measurements. The paper does not elaborate on the technical details of the classifier, but it is implied that the classifier is similar to Rubine’s.

Thursday, September 9, 2010

Reading #2: Specifying Gestures by Example (Rubine)

Summary

This paper introduces GRANDMA, a toolkit for creating gesture-based UIs. The toolkit lets a user add gesture commands to different entities in the UI. For instance, the user can add a delete gesture to the main UI window and define the action(s) to be executed when it is recognized. Each gesture is defined by a set of examples.
The system learns to distinguish between gestures by training a linear classifier on a set of features extracted from the stroke. First, gestures are assumed to be single-stroke. Second, for performance reasons, the features are restricted to those that can be computed incrementally in constant time per point. The paper introduces a set of 13 features, such as total angular movement, maximum pen speed, and size of the bounding box. A linear classifier is then constructed from the examples provided. When the classification is ambiguous, either because the probability of the recognized class is low or because the gesture is far from the class centroid, the system rejects the gesture and asks the user to retry.
The precision of the system when classifying about 15 gestures, each with at least 15 examples, is around 98%. For 30 gestures, each trained with 40 examples, the precision is about 97%. Finally, the author proposes some ideas for handling multi-finger gestures.
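The constant-time-per-point restriction can be illustrated with a short Python sketch. It tracks only a handful of running quantities in the spirit of the features named above, not the paper's full set of 13, and the class name, the feature order, and the weights passed to `classify` are all hypothetical.

```python
import math

class IncrementalFeatures:
    """Running stroke statistics, each updated in O(1) per new point."""

    def __init__(self):
        self.prev = None        # last (x, y, t) sample
        self.prev_delta = None  # last non-zero segment vector (dx, dy)
        self.min_x = self.min_y = math.inf
        self.max_x = self.max_y = -math.inf
        self.length = 0.0
        self.total_angle = 0.0
        self.total_abs_angle = 0.0
        self.max_speed_sq = 0.0

    def add_point(self, x, y, t):
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)
        if self.prev is not None:
            px, py, pt = self.prev
            dx, dy, dt = x - px, y - py, t - pt
            self.length += math.hypot(dx, dy)
            if dt > 0:  # skip duplicate time stamps
                speed_sq = (dx * dx + dy * dy) / (dt * dt)
                self.max_speed_sq = max(self.max_speed_sq, speed_sq)
            if (dx, dy) != (0, 0):
                if self.prev_delta is not None:
                    pdx, pdy = self.prev_delta
                    # Signed turning angle between consecutive segments.
                    turn = math.atan2(pdx * dy - pdy * dx, pdx * dx + pdy * dy)
                    self.total_angle += turn
                    self.total_abs_angle += abs(turn)
                self.prev_delta = (dx, dy)
        self.prev = (x, y, t)

    def vector(self):
        diagonal = math.hypot(self.max_x - self.min_x, self.max_y - self.min_y)
        return [diagonal, self.length, self.total_angle,
                self.total_abs_angle, self.max_speed_sq]

def classify(features, weights):
    """Linear classifier: pick the class with the highest weighted sum.

    `weights` maps a class name to (bias, per-feature weights)."""
    def score(name):
        bias, w = weights[name]
        return bias + sum(wi * fi for wi, fi in zip(w, features))
    return max(weights, key=score)
```

Because every update touches only a fixed number of running values, the feature vector is ready as soon as the last point arrives. In a trained system the weights would come from the example gestures; the rejection step described above is not shown.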

Discussion

The proposed system shows promising performance despite a rather simple recognition method. It is also possible to improve the system by adding new features without impairing its performance. However, the author reports only the precision of the system, which does not take false negatives into account. Another area for improvement is that most of the selected features are not rotation and scale invariant, which might cause difficulty in some domains. On the whole, the system is simple and performs well.

Wednesday, September 8, 2010

Reading #1: Gesture Recognition (Hammond)

Summary

This article is an introduction to gesture recognition. It emphasizes that, for gesture recognition to be helpful in sketch recognition, the path followed by the stroke plays an important role. It then explains the pros and cons of gesture recognition (e.g. gestures are suitable for commands, not for drawings). The rest of the article elaborates on three papers on gesture recognition.
First, Dean Rubine’s early work in this field is a linear classifier with 13 hand-picked features of a stroke. The features include measures of the shape and size of the bounding box, the angular motion and smoothness of the stroke, etc. Timestamps are not used extensively in the feature set.
Christopher Long’s work was similar to Rubine’s but added features that include some nonlinear combinations of the previous ones. However, the new feature set did not yield a significant improvement.
Finally, Jacob Wobbrock approaches recognition from another perspective. His system introduces a rotation- and scale-invariant method in which the user stroke is compared point by point with the class templates.

Discussion

This reading clarifies the jargon and acts as an introduction to the subsequent readings. It also explains Rubine’s feature set and at the same time poses challenging questions on implementation issues such as duplicate points or timestamps. These questions make the reader better acquainted with the subject. Nonetheless, there are some image placeholders in the document that have to be filled in or corrected.

Sunday, September 5, 2010

Gesture Recognition Questions

  • Question: What are the disadvantages of deleting the first point at a particular location?
    • Loses acceleration (and speed) to that point (removed first time stamp)
  • What are the disadvantages of deleting the second point at a particular location?
    • Loses acceleration (and speed) from that point onwards (removed second time stamp)
  • Question: What are the disadvantages and advantages to removing the first point when there are duplicate time stamps?
    • Assume the duplicate time stamps are the result of a very quick motion, and call four consecutive points a, b, c and d, where timestamp(b) = timestamp(c). Discarding point b then results in less digitization noise in the change of angle between a and c, but more noise between c and d, since the farther apart two points are, the less noisy the angle between them.
  • What are the disadvantages and advantages of removing the second point?
    • Same as above, but here the advantage is less noise between b and d but more noise between a and b.
  • What are the disadvantages and advantages of altering the time stamp values?
    • If we alter the time stamps, we do not lose any of the points, so no data is discarded. However, in order to differentiate between the two points, their time stamps have to be epsilon seconds apart, and epsilon has to be smaller than the maximum timer resolution. The disadvantage of altering the time stamps is that if the two consecutive points have noise in their spatial location, small epsilon values will amplify that noise in the speed measurements.
  • What would be the best way to do this?
    • We can take the mean of the duplicate points and regard it as the sole point with that time stamp, under the assumption that the sampling rate is uniform. This way no data is discarded, although it might need more computation.
  • The Rubine features were calculated for the shapes below. Can you match up the shape letter label from Figure 10 with their Rubine features in Figure 11?
    • Answers were in the document
  • Questions: What was the insight behind the other features that Long used?
    • Total angle traversed / total length: somehow an average curvature. Circles of different sizes will differ in this measure
    • Density metric 1: a lengthy stroke which ends near its starting point probably has all of its points in a small neighborhood around the starting point, which makes it dense. However, there are cases where this does not apply
    • Density metric 2: the rationale is similar to the previous one, but it does not rely only on the distance between the first and last points; it somehow takes the whole stroke into account
    • Openness: closed curves will result in lower values for this measure, while a straight line will result in larger values
    • Area of bounding box: a measure of the size of the stroke. However, it is sensitive to rotation and error prone
    • Log(area): compensates for large area differences between larger shapes, where the difference in perception is not as high
    • Total angle / total absolute angle: somehow measures how much the stroke changes its rotation direction; the fewer times it changes, the higher this measure will be
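The averaging idea from the answer on duplicate time stamps can be sketched in a few lines of Python. This is only an illustration of that answer: samples are assumed to be (x, y, t) tuples already sorted by time, and the function name is hypothetical.

```python
from itertools import groupby

def merge_duplicate_timestamps(samples):
    """Replace each run of points sharing a time stamp with their mean point.

    `samples` is a list of (x, y, t) tuples sorted by t."""
    merged = []
    for t, group in groupby(samples, key=lambda s: s[2]):
        pts = list(group)
        mean_x = sum(p[0] for p in pts) / len(pts)
        mean_y = sum(p[1] for p in pts) / len(pts)
        merged.append((mean_x, mean_y, t))
    return merged

stroke = [(0, 0, 0), (2, 0, 1), (4, 2, 1), (5, 5, 2)]
cleaned = merge_duplicate_timestamps(stroke)
```

Here the two points with time stamp 1 collapse into their mean, (3.0, 1.0, 1), so every remaining point has a distinct time stamp and speed can be computed without dividing by zero.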

First day of class Questionnaire

Photo of yourself.
It's there --> under the About Me section

E-mail address (e.g., yourname at domain.com). 
afayazi [at] tamu [.] edu

Graduate standing (e.g., 3rd Year PhD, 2nd Year Masters, 1st Year PhD w/ Masters).
1st year Masters

Why are you taking this class?
To get to know the research in this field

What experience do you bring to this class?
Well, I am a bit acquainted with topics in the AI realm such as Machine Learning, Vision and robot localization

What do you expect to be doing in 10 years?
I expect to have finished my studies and started working in industry, and to be thinking about the next 10 years again

What do you think will be the next biggest technological advancement in computer science?
Something other than the Von Neumann architecture, maybe? In the sense of decentralized processing

What was your favorite course when you were an undergraduate (computer science or otherwise)?
Theory of Languages and Computation, since I finally managed to read all the covered chapters of a course textbook.

What is your favorite movie and why?
"Taste of Cherry" (1997), for not being yet another preachy and stereotypical movie, and "The Matrix" (1999), which is Hollywoodish but has some dialogue I enjoyed.

If you could travel back in time, who would you like to meet and why?
Kurt Gödel. Because I think he's famous and I would ask him the reason. :)

Give some interesting fact about yourself.
I always try to take advantage of all the added features of devices (phone, camera, etc). Sometimes I wish devices were simpler and had fewer features.