Sketch Practically Anywhere Recognition Kit (SPARK)
When faced with complex design, analysis, or engineering tasks, novices and professionals alike attempt to better understand problems through diagrams, and the natural first step in this process is working on a whiteboard. Through their drawings, people can gain valuable insights into subtleties of design and analysis tasks, but once a diagram gains sufficient complexity, further progress becomes tedious (or even intractable) without the aid of a computer.
The goal of this work is to utilize popular consumer hardware (webcams, smartphones, and projectors when available) to enable sketch recognition where people are already drawing: on whiteboards, chalkboards, and even loose paper. Our system enables a person to interact with her real-world drawings by recognizing meaning in images of hand-drawn diagrams captured via a smartphone or webcam, and by providing an interface for interacting with that meaning through an augmenting projector and/or the phone's display. In service of this goal, we make contributions to the three operational phases of the system's architecture: capturing users' marks, recognizing the drawn diagrams, and enabling interaction with the recognized structures.
MyScript equation recognition is provided courtesy of VisionObjects (link).
Mobile, Vision-based Sketch Recognition
Sketch recognition has made impressive gains over the years, helping many novice users communicate complicated ideas to their machines through free-hand drawings. While sketch-based applications significantly lower the barrier to entry for advanced equation solvers (MathPad^2) or chemical analysis tools (ChemDraw), they assume specialized pen-capture hardware is present, such as a digitizing pad or a tablet PC.
In my current work, I explore the potential for applying general sketch-recognition techniques to photographs of diagrams drawn on chalkboards, whiteboards, and paper. To this end, I have created an Android app and backend server allowing users to photograph a hand-drawn Turing machine diagram with their phone, and simulate the recognized result on their device.
(An old version of the system that recognizes only 1, 0, and - (dash) for head direction.)
High level overview
With this system, images of drawings are converted into strokes, the standard unit of computation for sketch recognition algorithms (in contrast with bitmaps or feature vectors). In order to extract strokes, we must first isolate ink from background, a process complicated first by variations in lighting and surface wear, and second by the need to recognize both dark ink on a light background (whiteboard/paper) and light ink on a dark background (chalkboard).
Once ink is isolated from the background, converting the regions of ink pixels into strokes requires two steps: thinning and tracing. Thinning narrows the regions of pixels into single-pixel-wide lines, while tracing finds paths through those pixels that both cover every point and reasonably approximate how the marks were originally drawn.
Finally, once the system generates a list of strokes, our sketch framework must perform recognition over the data. However, when recognizing strokes extracted from an image, our algorithms cannot leverage consistent timing information (such as drawing speed or stroke ordering), and accuracy must remain high despite the necessarily arbitrary decisions made during stroke extraction.
Multi-application Sketch Framework
Currently, developing sketch applications requires a huge upfront investment on the part of the programmer(s). Since each application is created in a one-off manner, basic recognition code in one system is at best very time-consuming to port to another system, and at worst wholly incompatible. My recent work has focused on relieving developers of the burden of micromanaging every step of recognition.
Sketchy* is a development platform for sketch recognition applications that lowers the barrier to entry for programmers by coordinating distinct recognition tasks (apps) such that they can be composed into more complex sketch-based applications.
The basic distinction between applications created in Sketchy and previous sketch-recognition programs is that recognition in Sketchy is decomposed into smaller, naturally modular tasks (apps). The user's interface to the system is the board where she draws her strokes. At the same time, apps may monitor the board for new strokes and decide to annotate them with recognition results. For example, a circle recognizer might calculate the circularity of incoming strokes and annotate some of them as "Circle."
Further, apps may monitor the board for specific annotations and perform additional, higher-level recognition. This key strength allows apps to build upon the recognition work of lower-level apps, composing together into more complex tasks. For example, a Markov model recognizer could leverage the "edge" and "node" annotations of a graph recognizer, which in turn could work off of the "arrow" and "shape" annotations generated by other apps.
With separate apps simultaneously performing recognition on the same board, it is up to the framework to coordinate their efforts. The key coordination interface for our system comes in the form of annotations. Regardless of the logic (or even language) performing the recognition tasks, the only means of information transfer between apps is that which is "stamped" onto a stroke (or set of strokes) in an annotation. Apps register with the framework in order to listen for strokes annotated a certain way, and conversely may only annotate strokes through the centralized interface.
Being able to compose apps into larger sketch applications has significant implications for the state of pen-based interaction. Experts in one area of sketch recognition can provide real improvements for other developers who may not have the time or expertise to take their advice; they can simply publish an app for public use.
On the other end of the spectrum, developers new to sketch recognition can leverage existing apps to more quickly produce novel applications. Thus, a programmer prototyping an idea she has for a map-drawing utility can focus on being a virtual cartographer instead of getting bogged down deciding between neural nets and hidden Markov models for distinguishing a "g" from a "q" in the street names.
*Sketchy is just a working name for the system.