Applicability of DBsurfer to neural net learning

Neural nets are complex mathematical black boxes: given a set of inputs and the associated outputs, they build a model of how the inputs correspond to those outputs.

In effect, they learn the correspondence of inputs to outputs from the data values they are given. From a technical point of view, the neural net program can be thought of as constructing a set of non-linear equations in n-dimensional data space. From a practical point of view, it can classify and/or predict output values for a new set of inputs.
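To make the "set of non-linear equations" concrete, here is a minimal Python sketch of what a trained net with one hidden layer computes. The layer sizes, the tanh activation and the weight values are illustrative assumptions; they say nothing about how NeuralShell or any other product is built internally.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Evaluate a one-hidden-layer network on input vector x.

    Each hidden unit computes tanh(w . x + b); the output is a weighted
    sum of the hidden activations plus a bias. The tanh is what makes
    the input-to-output mapping non-linear.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for 2 inputs and 2 hidden units. In practice the
# training process chooses these values from the learning set.
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5]]
B_HIDDEN = [0.0, -0.25]
W_OUT = [2.0, -1.0]
B_OUT = 0.1

print(forward([0.0, 0.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
print(forward([1.0, 1.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

"Learning" amounts to adjusting the weight lists until the outputs of `forward` match the known outputs in the learning set as closely as possible.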

In the past, this paradigm was of practical use only to individuals familiar with the abstract processes required to formulate the learning rules embodied in neural nets. This is no longer the case. We recently came across products that automate the learning process to a degree that makes it almost trivial to apply neural nets to everyday business processes. We are referring to NeuralShell Classifier and NeuralShell Predictor from Ward Systems Group Inc.

The Ward Systems Group web site contains examples of how to use the products, demo versions and references. Ward Systems has done for neural nets what IBX has done for SQL queries: hidden the internal complexities of the paradigm to such an extent that operating the product requires only knowledge of the application domain. We believe this to be a true mark of product excellence.

A neural net is only as useful as the model it computes is accurate. That accuracy depends largely on the relevance of the learning set, i.e. the data the net is given to learn from. If you have a sizeable database, certain portions of it probably contain key data that can be used to model business rules for prediction.

For example, if you have a POS system collecting daily sales data by store and UPC, there could be a correlation between the weekday, the sales clerk, the store number and the units sold of articles with a particular property such as weight, size or color. As a seasoned manager you might know what this correlation is and stock accordingly. On the other hand, the relationship might involve more subtle, interrelated data elements, or you may not have enough experience to predict stocking levels accurately.

The distinguishing feature of the problems neural nets are called upon to solve is a non-linear relationship between what you base your predictions on and what you are predicting. You may wish to predict quantities such as employment levels, or the raw material stocks needed to feed a manufacturing plant, based on historical data and future expectations. Obviously, if you expect to sell x widgets and each needs 10 grams of alcohol, the relationship is linear (10x grams in total) and prediction is a snap. But if the manufacturing process leaks alcohol depending on ambient temperature, the operator, production volume or a combination of these, the relationship becomes non-linear and increasingly difficult to predict, particularly as more variables are involved.
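The contrast can be made concrete with a small Python sketch. The 10 g per widget figure comes from the example above; the leakage formula, its coefficients and the `operator_factor` parameter are invented purely for illustration. The point is only that once such terms appear, doubling production no longer simply doubles the alcohol requirement.

```python
GRAMS_PER_WIDGET = 10.0  # from the widget example in the text

def alcohol_linear(widgets):
    """Linear case: the requirement is directly proportional to volume."""
    return GRAMS_PER_WIDGET * widgets

def alcohol_with_leakage(widgets, temp_c, operator_factor):
    """Hypothetical non-linear case (formula invented for illustration).

    The leaked fraction grows with ambient temperature above 15 C and
    with production volume, scaled by a per-operator factor. Extra
    alcohol must be supplied to make up for what leaks away.
    """
    leak_fraction = (operator_factor * 0.002
                     * max(temp_c - 15.0, 0.0)
                     * (1.0 + widgets / 10_000))
    leak_fraction = min(leak_fraction, 0.5)  # keep the toy model in a sane range
    return alcohol_linear(widgets) / (1.0 - leak_fraction)

print(alcohol_linear(1000))                  # 10 kg for 1000 widgets
print(alcohol_with_leakage(1000, 25.0, 1.0))
print(alcohol_with_leakage(2000, 25.0, 1.0)) # more than twice the line above
```

With only two or three interacting factors the formula is already hard to guess from raw data, which is the situation where learning the relationship from history pays off.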

A database that tracks this historical data can yield powerful classifier and predictor neural net models. Your task is to determine which variables most affect the outcome and to design a database search strategy to collect them. In general this calls for a powerful ad-hoc query tool like DBsurfer, because the data you need is probably spread across several areas of the database. DBsurfer lets you use point-and-click synthetic English to bring this data together efficiently.
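As a rough illustration of what such a collection step produces, the Python sketch below uses the standard `sqlite3` module and a hypothetical POS schema (the `sales` and `items` tables and all column names are assumptions) to gather weekday, clerk, store and article properties alongside units sold, one row per combination. In DBsurfer the equivalent query would be composed through its point-and-click interface rather than written by hand as SQL.

```python
import sqlite3

# Hypothetical POS schema with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (upc TEXT PRIMARY KEY, color TEXT, size TEXT);
CREATE TABLE sales (sale_date TEXT, store_id INTEGER, clerk_id INTEGER,
                    upc TEXT REFERENCES items(upc), units INTEGER);
INSERT INTO items VALUES ('0001', 'red', 'M'), ('0002', 'blue', 'L');
INSERT INTO sales VALUES
  ('2024-03-04', 1, 7, '0001', 12),
  ('2024-03-04', 1, 7, '0002', 3),
  ('2024-03-05', 2, 9, '0001', 5);
""")

# Inputs: weekday, clerk, store, color, size. Output: units sold.
LEARNING_SET_QUERY = """
SELECT CAST(strftime('%w', s.sale_date) AS INTEGER) AS weekday,  -- 0 = Sunday
       s.clerk_id, s.store_id, i.color, i.size,
       SUM(s.units) AS units_sold
FROM sales AS s
JOIN items AS i ON i.upc = s.upc
GROUP BY weekday, s.clerk_id, s.store_id, i.color, i.size
ORDER BY weekday, s.store_id, i.color
"""

rows = conn.execute(LEARNING_SET_QUERY).fetchall()
for row in rows:
    print(row)
```

Each row lists the candidate input variables first and the value to predict last, which is the shape a learning set needs; trying a different variable set means changing the selected and grouped columns.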

The efficiency of the ad-hoc tool matters because some of the data sets you assemble may turn out to be unsuitable, and you may have to iterate several times to arrive at the best predictor set. The faster each iteration, the more likely you are to go through enough of them to optimize your neural net predictor.

There is only one painless way to decide whether your learning set is accurate: apply the resulting neural net to a test set and check how well its results fit the test set data. To do this, split the test set into two parts, inputs and outputs; feed the inputs to the neural net and compare its outputs with the actual output values in the test set.
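The comparison step can be sketched in Python as follows. The `predict` and `classify` callables stand in for a trained net (how a product such as NeuralShell exposes its trained model is not covered here), the row layout (inputs first, actual output last) is an assumption, and mean absolute error is just one common choice of fit measure. For a classifier, the share of correct classifications is the natural counterpart.

```python
def evaluate(predict, test_rows, n_inputs):
    """Return the mean absolute error of a predictor on a test set.

    predict: callable mapping a list of inputs to a predicted value.
    test_rows: rows whose first n_inputs columns are inputs and whose
        last column is the actual output.
    """
    if not test_rows:
        raise ValueError("test set is empty")
    total_error = 0.0
    for row in test_rows:
        inputs, actual = list(row[:n_inputs]), row[-1]  # split inputs/output
        total_error += abs(predict(inputs) - actual)
    return total_error / len(test_rows)

def accuracy(classify, test_rows, n_inputs):
    """Return the fraction of test rows a classifier labels correctly."""
    if not test_rows:
        raise ValueError("test set is empty")
    hits = sum(1 for row in test_rows
               if classify(list(row[:n_inputs])) == row[-1])
    return hits / len(test_rows)

def toy_predict(inputs):
    # Hypothetical stand-in for a trained net: predicts the sum of the inputs.
    return sum(inputs)

test_set = [(1, 2, 3.5), (2, 2, 4.0), (0, 5, 4.0)]
print(evaluate(toy_predict, test_set, n_inputs=2))
```

A low error on rows the net never saw during learning is the evidence that the learning set captured the real relationship rather than noise.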

Based on discussions with Steve Ward, I have added a mechanism to DBsurfer that is specialized for constructing neural net learning sets. DBsurfer's Espresso language builder now supports a randomizing feature that works as follows. When the feature is invoked, DBsurfer generates two answers to a query: one is a random set of up to 16000 lines picked from the answer, the other is the balance of the answer. The first is used to train the neural net, the second serves as the test set. In general, DBsurfer attempts to place 90% of the answer in the learning set and 10% in the test set. This feature will be available as of DBsurfer build #2235.
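The randomizing split can be approximated in Python as below. This is not DBsurfer's implementation; the text does not say exactly how the 16000-line cap and the 90/10 target interact, so the sketch assumes the learning set takes 90% of the rows up to the cap and every remaining row goes to the test set. The function name `split_answer` is hypothetical.

```python
import random

MAX_LEARNING_ROWS = 16_000  # cap stated in the text
LEARNING_FRACTION = 0.9     # 90% learning set / 10% test set target

def split_answer(rows, seed=None):
    """Randomly split query answer rows into (learning_set, test_set).

    Assumption: the learning set gets about 90% of the rows but never
    more than 16000; all remaining rows form the test set.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_learn = min(MAX_LEARNING_ROWS, round(LEARNING_FRACTION * len(shuffled)))
    return shuffled[:n_learn], shuffled[n_learn:]

learn, test = split_answer(range(1000), seed=42)
print(len(learn), len(test))  # prints: 900 100
```

Under this assumption a 100,000-line answer yields a 16000-line learning set and an 84,000-line test set. Passing a fixed seed makes the split reproducible, which helps when comparing several candidate variable sets on the same data.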