Below are the patents that I worked on at SGI. All of these patents were created while developing MineSet, a data mining and visualization product.
A lot of business software could benefit from the introduction of machine learning algorithms. There are two main kinds: directed and undirected. Examples of directed learning are classification and regression. Examples of undirected learning are clustering (i.e. segmentation) and determining attribute importance. The primary difference is that directed learning implies that you know what you are searching for, namely a specific target attribute.
When there are hundreds of dimensions and measures, it is very difficult for the user, or for the person configuring the system, to know which ones to select for a specific chart. Classifiers can help a lot here by showing, at a high level, all attributes and how they relate to a specific target. Visualizing classifiers also helps with data integrity issues: it can tell you at a glance which of your dimensions or measures are not behaving the way you expect.
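One common way to score how strongly each attribute relates to a target is information gain: the drop in target entropy after grouping rows by that attribute. The sketch below ranks attributes that way. It is an illustration only, under the assumption that rows are dicts of categorical values; the function names are hypothetical and this is not necessarily the measure MineSet used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in target entropy obtained by grouping rows on one attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, target):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    attrs = [a for a in rows[0] if a != target]
    scores = [(a, information_gain(rows, a, target)) for a in attrs]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

With a toy churn table where `plan` perfectly predicts `churn` and `region` does not, `plan` comes out on top with a gain of 1 bit and `region` scores 0.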
The patents that I worked on are:
* Evidence Visualizer - US patent number 5,930,803 (issued/granted July 1999, after a couple of years of processing). At the same time that I submitted the patent application, I authored a paper with Ronny Kohavi and Dan Sommerfield titled Visualizing the Simple Bayesian Classifier, which shows the visualizer in an early form. The power of the Naive Bayes classifier comes from smart binning of numerical attributes, and smart sorting and grouping of categorical attributes, with respect to some target. This tool was really great for getting an immediate overview of your data, and it also allowed interactive classification.
* Decision Table Visualizer - US patent number 6,301,579. Issued around 2000. Like the Evidence Visualizer, this visualizes a classifier. It looks something like a graphical representation of a giant pivot table, with many dimensions or binned measures for the rows and columns. One advantage of the decision table classifier and its visualization is that, unlike Naive Bayes, it does not assume attribute independence, because it selects pairs of attributes in decreasing order of importance with respect to the target. In the visualization you can interactively drill into your data and do interactive classification, just as with the Evidence Visualizer. The paper for this one is Visualizing Decision Table Classifiers (pictures are separate). This visualizer can answer most of the same sorts of questions as the Evidence Visualizer, but it can also show correlation between attributes.
* Visual Approximation of Scattered Data - US patent number 5,861,891 (issued/granted January 1999, after a couple of years of processing). It allows creating a volumetric rendering of a 3D scatterplot when rendering millions of individual points is prohibitive. The associated paper is Volume Rendering for Relational Data (pictures are separate).
* Interpolation between relational tables for purposes of animating a data visualization - US patent number 6,034,697. Issued about 1999. Closely related to the previous one, it allowed animating the volume rendering of relational data using sliders that represented other measures, like time or age.
* Method for approximating scattered data using color to represent values of a categorical variable - US patent number 6,301,579. Issued around 2000. This is also closely related to the volume rendering for relational data idea. Here we adapt the technique to render scatterplots that have a categorical variable (i.e. a dimension instead of a measure) mapped to color. The corresponding paper is Nominal Splats.
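To make the Evidence Visualizer entry concrete, here is a minimal Naive Bayes sketch that exposes the per-value "evidence" (smoothed log P(value | class)) such a visualizer could draw, plus a posterior over only the attributes the user has picked, which is roughly what interactive classification amounts to. It assumes attributes are already discretized (the smart binning mentioned above is not shown), uses Laplace smoothing as an assumed choice, and all names are hypothetical rather than MineSet's API.

```python
import math
from collections import Counter, defaultdict


def train(rows, target, alpha=1.0):
    """Count class priors and per-class value counts for each attribute."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts of each value
    values = defaultdict(set)            # attribute -> distinct values seen
    for r in rows:
        for attr, v in r.items():
            if attr != target:
                value_counts[(attr, r[target])][v] += 1
                values[attr].add(v)
    return {"classes": class_counts, "counts": value_counts,
            "values": values, "alpha": alpha}


def evidence(model, attr, value):
    """Per-class log P(attr = value | class), Laplace-smoothed."""
    k = len(model["values"][attr])
    a = model["alpha"]
    return {c: math.log((model["counts"][(attr, c)][value] + a) / (n + a * k))
            for c, n in model["classes"].items()}


def posterior(model, instance):
    """P(class | instance) using only the attributes present in instance."""
    total = sum(model["classes"].values())
    scores = {c: math.log(n / total) for c, n in model["classes"].items()}
    for attr, value in instance.items():
        if attr not in model["values"]:
            continue  # ignore attributes the model never saw
        for c, e in evidence(model, attr, value).items():
            scores[c] += e
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # shift by max for stability
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}
```

Leaving an attribute out of `instance` simply drops its evidence, so selecting values one at a time updates the posterior incrementally.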
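The decision table entry describes a pivot-table-like classifier. A minimal sketch of that idea, assuming the important attributes have already been chosen and ordered: each cell of the table stores class counts, prediction takes the cell's majority class, and an empty cell falls back to the overall majority. The `drill_down` helper is a hypothetical illustration of aggregating counts for a coarser view; none of these names come from MineSet.

```python
from collections import Counter, defaultdict


class DecisionTable:
    """Majority class per cell of the chosen attributes, with an overall-majority fallback."""

    def __init__(self, attrs, target):
        self.attrs = list(attrs)  # chosen attributes, most important first (assumed given)
        self.target = target

    def fit(self, rows):
        self.cells = defaultdict(Counter)
        for r in rows:
            key = tuple(r[a] for a in self.attrs)
            self.cells[key][r[self.target]] += 1
        self.default = Counter(r[self.target] for r in rows).most_common(1)[0][0]
        return self

    def predict(self, instance):
        key = tuple(instance[a] for a in self.attrs)
        counts = self.cells.get(key)  # .get does not create empty cells
        if not counts:
            return self.default
        return counts.most_common(1)[0][0]

    def drill_down(self, **fixed):
        """Class counts summed over every cell that matches the fixed attribute values."""
        total = Counter()
        for key, counts in self.cells.items():
            cell = dict(zip(self.attrs, key))
            if all(cell.get(a) == v for a, v in fixed.items()):
                total.update(counts)
        return total
```

Because a cell is keyed on a combination of attribute values, interactions between those attributes are captured directly instead of being assumed away as in Naive Bayes.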
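For the scattered-data volume rendering entries, the first step is collapsing millions of points into a coarse 3D grid that a volume renderer can draw instead of individual points. The sketch below is my own rough illustration of that aggregation, not the patented method: it assumes known data bounds, uses point count per voxel for opacity and the mean of one measure per voxel for color, and leaves the actual rendering out of scope. All function names are hypothetical.

```python
from collections import defaultdict


def voxel_index(p, lo, hi, res):
    """Map a 3D point to integer voxel coordinates, clamping to the grid edges."""
    idx = []
    for axis in range(3):
        span = hi[axis] - lo[axis]
        t = (p[axis] - lo[axis]) / span if span else 0.0
        idx.append(min(res - 1, max(0, int(t * res))))
    return tuple(idx)


def bin_points(points, values, lo, hi, res):
    """Aggregate points into voxels: count per voxel and mean of a measure per voxel."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for p, val in zip(points, values):
        key = voxel_index(p, lo, hi, res)
        counts[key] += 1
        sums[key] += val
    means = {k: sums[k] / counts[k] for k in counts}
    return dict(counts), means


def opacities(counts, max_alpha=1.0):
    """Scale counts linearly so the densest voxel is drawn with max_alpha."""
    peak = max(counts.values())
    return {k: max_alpha * c / peak for k, c in counts.items()}
```

The grid resolution trades detail for speed: the cost of drawing depends on the number of occupied voxels rather than the number of points.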
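The interpolation entry describes animating a visualization with a slider over a measure such as time or age. One simple way to sketch this, assuming a summary grid has been precomputed at a few slider values (keyframes), is to blend linearly between the two neighboring keyframes. This is an assumed linear scheme for illustration, not the patent's claimed method, and `interpolate_frames` is a hypothetical name.

```python
from bisect import bisect_right


def interpolate_frames(frames, t):
    """Linearly interpolate between precomputed grids keyed by a slider value.

    frames: dict mapping slider value (e.g. a year) -> dict of cell -> value.
    Cells missing from one frame are treated as 0. t outside the range is clamped.
    """
    keys = sorted(frames)
    if t <= keys[0]:
        return dict(frames[keys[0]])
    if t >= keys[-1]:
        return dict(frames[keys[-1]])
    i = bisect_right(keys, t)  # keys[i - 1] <= t < keys[i]
    k0, k1 = keys[i - 1], keys[i]
    w = (t - k0) / (k1 - k0)
    a, b = frames[k0], frames[k1]
    return {cell: (1 - w) * a.get(cell, 0.0) + w * b.get(cell, 0.0)
            for cell in set(a) | set(b)}
```

Dragging the slider then only requires recomputing this blend, rather than re-aggregating the underlying relational table for every intermediate value.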
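For the categorical-color entry, each voxel can hold points from several categories at once, so its color has to summarize a mix. A minimal sketch, assuming a fixed RGB palette per category and a count-weighted average as the blend (an assumed choice for illustration; the Nominal Splats paper may blend differently), with the point count kept for opacity:

```python
from collections import Counter, defaultdict


def categorical_colors(cells, palette):
    """Blend category colors per voxel.

    cells: iterable of (voxel, category) pairs, one per data point.
    palette: category -> (r, g, b) with components in [0, 1].
    Returns voxel -> ((r, g, b), count): the count-weighted mean color and the point count.
    """
    per_voxel = defaultdict(Counter)
    for voxel, category in cells:
        per_voxel[voxel][category] += 1
    result = {}
    for voxel, counts in per_voxel.items():
        n = sum(counts.values())
        rgb = tuple(sum(palette[c][i] * k for c, k in counts.items()) / n
                    for i in range(3))
        result[voxel] = (rgb, n)
    return result
```

A voxel with equal numbers of red and blue points comes out purple, which signals a mixed region rather than hiding one category behind the other.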
Other SGI MineSet patents of possible interest.
* Attribute importance - I don't have a reference yet.
* Tree visualizer - Shows data in a tree hierarchy with a bar chart at each node. It was used to show decision tree classifiers as well as for straight data visualization.
* MLC++ - MLC++ was a machine learning library written by Ronny Kohavi and Dan Sommerfield when they were at Stanford. It's a bit dated now; we can probably recreate the algorithms.
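The tree visualizer entry describes a hierarchy with a bar chart at each node. A minimal sketch of the data behind such a view, assuming a fixed order of attributes for the levels (for a decision tree classifier the levels would instead come from the learned splits): each node stores target counts, which are the bar heights, and children keyed by attribute value. `build_tree` and `render` are hypothetical names, and the text output merely stands in for a 3D rendering.

```python
from collections import Counter


def build_tree(rows, levels, target):
    """Nested hierarchy: each node holds target counts (its bar chart) and its children."""
    node = {"counts": Counter(r[target] for r in rows), "children": {}}
    if levels:
        attr, rest = levels[0], levels[1:]
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        node["children"] = {v: build_tree(g, rest, target) for v, g in groups.items()}
    return node


def render(node, label="all", depth=0):
    """Print the hierarchy with one '#' per row as a crude bar chart at each node."""
    bars = "  ".join(f"{cls}:{'#' * n}" for cls, n in sorted(node["counts"].items()))
    print(f"{'  ' * depth}{label} {bars}")
    for value, child in node["children"].items():
        render(child, str(value), depth + 1)
```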
US Patent Office