The paper attempts to explain why DBsurfer as a more efficient tool for the discovery and presentation of information than commonly available -and particularly leading - report generators. Before you read on, it is not an OLAP tool. OLAP is just a presentation layer. DBsurfer is designed to let you ask pointed questions of the data, then display it as information that can be formatted in a report.
DBsurfer has a vital economic purpose - to reduce the real cost of using a database by delivering a RAD environment endowed with tools that provide analysts with quick turnaround times when supporting end-users. Many companies spend countless programmer hours designing views and queries that channel end-users into tightly controlled query paths. These programmers are responsible to analysts that define user requirements. A department within one of our customers - Fedex, USA - short-circuits this process by having senior analysts design the queries directly with DBsurfer. The savings can be considerable.
The fundamentals of DBsurferAlthough the idea of communicating with a program in natural language is alluring, the sad conclusion by those that have tried it for other than aritificially simple queries is that it doesn't work. What we have done with one of the interfaces in DBsurfer is to turn around the paradigm - you are led through the steps required to form an allowable query in artificial English.
DBsurfer assumes you are far better at translating your query into precise terms than it could ever be. It therefore provides you with the a set of guided phrases that lead you to compose a sentence representing your query. The choices it gives you are the only ones it can understand, so there is no ambiguity. This doesn't make its job any easier, because in order to provide these phrases DBsurfer has to ensure they are syntactically and semantically correct in terms of your database. This accuracy has to be enforced while you are composing your sentence, so it looks ahead to all possible choices, compares them to what you have entered so far and rejects those that don't apply. For example, you should not be allowed to take the average of a number of colors, even if it is syntactically legal. Neither should you be allowed to formulate a query that starts with "Display the day of the week of..." if your database does not contain dates.
For more casual users, who need an even quicker way to build queries, or who like a more unstructured approach, DBsurfer offers drag and drop querying. This interface offers fewer query features, but requires less study. This interface uses a tree paradigm representing the database, from which you can draw (drag and drop) the data required to assemble a query; then you can qualify it and transform it so that it becomes information. This is a simpler form for many users, and has almost the same power as the syntactic interface.
A BI tool's main figure of merit should be its ability to examine databases for information. and to give the user as much flexibility in doing this as possible. This flexibility resides much more on the tool's ability to examine the data quickly and filter it for information, than in presenting the data so that it can be visually examined, or manually correlated. For example, if you are trying to find salespeople who underperformed, it can be inefficient to look at everybody's sales and then pointing at those that are low, for details. First, you are dumping a lot of data out for inspection, then you make another set of passes for details. DBsurfer assumes you have filtering criteria for your query and then formulate something like ...names of salespersons that sell in RegionA and that have sales with the average of sale less than 1/2 the average sale in RegionA... (I don't want to mislead you - DBsurfer guides you into paraphrasing this as: "... names for salespersons that have sales with region equal to RegionA and with the average of sales less than 0.5 * the average of sale quantity of {sales with region equal to RegionA}".)
It is far easier to do this in DBsurfer - due to its guided synthetic language visual parsing technology - than would otherwise be the case. What this query has done is place the information extraction work squarely on the shoulders of the database server. In addition, the returned information is probably much less, so it economizes on communications bandwidth. Over plain old telephone lines (POTS) this can be a factor.
One other attribute has to be in place in order for all this to work: the query must be formulated in efficient SQL, otherwise it will take an unnecessarily long time to run. DBsurfer's SQL translator was designed to do precisely that.
Setting the stage for DBsurfer
Although the rise of datamarting is a result of a massive amount of data buildup, it could also be due to a fundamental lack of sophisticated ad-hoc querying technology. I say fundamental because on the surface there seem to be a multitude of products out there. But there is a great dearth of product that can give end-users the same (or at least similar) capabilities in dealing with data as an advanced SQL programmer, without a significant effort in setting things up for the end-user.
In fact, the irony of the situation is that an innocent ad-hoc query can tax SQL expertise far more in a single query than half a day of SQL programming in a production environment. This is not a slight against SQL programmers - it is just the nature of it. In production SQL you can decompose a complex SQL query into smaller, more manageable pieces; then assemble them with procedural code. This code may interact not only with the program user, but with other system components and is in general very complex in itself.
But an ad-hoc query user is going to pose a question in one breath, does not have any (significant) procedural "glue" and will expect a non-procedural SQL program to yield the answer from within a single sentence. What is more, there is absolutely no programming expertise expected of this user, so the SQL program has to be correct the first time. There is no debugging. Nor does the user care whether the program is just a simple join, a series of complex nested subqueries, or a combination of these. There is in fact absolutely no correlation between an English query and its corresponding SQL. Lengthy, convoluted English can lead to simple SQL, and terse, simple English can lead to excruciatingly complex, eye-watering SQL. A major reason for this is that the language mapping the query to the database involves the structure of the relational database itself.
Not only that. The SQL generated by the program has to be of high enough professional caliber to be allowed to run by IS. I have already shown elsewhere that SQL can be written in such a simple-minded manner that it becomes wildly inefficient. The reason for this is that simple-minded SQL suffers from a lack of idioms that enable the server to introduce optimizations. Replacing a series of relational intersects with a great big join - for example - is not the way to get the best out of the database server. In addition, SQL cannot live by joins alone. In order to exploit its vocabulary, forcing the use of joins can constitute a serious impediment to the implementation of aggregation comparisons involving groupings, intersects mixed with differences, set (in)equalities and relational divisions.
What current query products have done to date is introduce a new form of procedural glue for the end user. DBsurfer offers another alternative - one that reports and graphs the final result. In order to do this successfully, DBsurfer SQL does not place tables in a join when they are only required to express query filtering conditions. It can also do more.
What DBsurfer brings to the table
What we have done with DBsurfer is to provide a product that has a large degree of scalability. This term refers not only to the size of organization using it and the size of the database serving it, but to the degrees of freedom in a query. In a smaller company the staff the staff to build a data warehouse may not be available; there may also not be enough - or any - individuals with SQL expertise, nor any formal IT department. Such companies need a way to phase in an ad-hoc query product incrementally. The product should at least be able to query the database right "out of the box", yet have customizing features that can be added in stages.
Most queries posed by casual users can be simple (conceptually and technically) e.g. "Customers that buy expensive items". Others are conceptually simple but technically complex - e.g. "Customers that buy only expensive items" and you need a technically scalable query builder to make a seamless transition. It should be just as easy to ask either question.
Technical scalability goes further. Once you have a query designed you will want to treat it as a building block for others. In that way it becomes trivial to ask "Salespersons that sell to customers that buy only expensive items", and you have syntactic scalability. This can also be thought of as semantic layering. The important issue is that in DBsurfer this capability is available at the end-user level. It does not require programming.
DBsurfer is also a tool designed to improve the process of building a data warehouse and to subsequently exploit it successfully.
DBsurfer can bring more cost-effective choices to IS. It can eliminate the latency, political decisions and enormous costs of datamarting/datawarehousing with relatively inexpensive and straightforward replication technology. This can be an end in itself, or a first step in implementing the former. It can also give smaller companies the choice of avoiding or deferring a datamart until the resources to build it are in place.
Replicating (i.e. partially or completely mirroring) a production database can be done in two ways: time-shifting the data (overnight) to another machine to which ad-hoc users can be attached, or using a replication server. The first solution is more economical than the second, but is not viable for time-critical environments. The second solution delivers quasi-real time data, at higher cost.
The reason replication can now be effective is that DBsurfer vastly increases query efficiency and promotes better use of the database. Denormalization is no longer required, and the data retains its original, live structure and therefore its degrees of freedom, to satisfy the diverse needs for information mentioned above.
The scalability of DBsurfer is evident in another respect. Most users are familiar with the fact that query tools allow them to qualify data by issuing selection criteria on the data values (e.g. customers with city = 'Los Angeles'.) However, these same tools don't allow them to qualify the relationships to other data. DBsurfer does, by enabling queries with cross-reference criteria (e.g. customers with city = 'Los Angeles' that cross-reference orders with item price > 1000.) This feature is very powerful when joins need to be avoided. For example, customer names and order numbers for customers that have salespersons with name = 'Jones' - this query avoids a double join and is therefore faster to execute.
Another feature of DBsurfer is that it enables a larger population to benefit from ad-hoc querying by providing end-users with a much higher degree of independence from IS and by giving them a more expressive query tool. This places information that can be critical to timely decision support at the leaves of the organization - where it can be used more efficiently to answer daily questions, without detracting from its capabilities as a planning tool by analysts.
In addition, the architecture of DBsurfer promotes a different way of thinking about information. It places much higher emphasis on the central repository - the database server - to manage the information extraction process. It replaces the concept of dumping data in multiple forms into grinding mills that will then provide more clearly visualized results, with the technology to express a thought as clearly as possible and get an answer. It obsoletes the concept that data must first be seen to be useful, that it must undergo a multi-phasic process before it can arrive at a final, useful form. The fundamental premise of many if not all EIS software is that raw data must be present in the client to support multidimensional analysis (axis transformation). DBsurfer's speed is such that it does not suffer this burden. Fast response is carried through in all aspects of its implementation, and it is entirely practical to query million-line databases over POTS lines - even if it's a toll call. Basically, DBsurfer's method of analysis is more like management by exception than by vast absorption of data.
On the other hand, DBsurfer is not a natural language tool so it does not unnaturally break ad-hoc query users in their stride with questions designed to disambiguate or re-phrase the query, nor does it suffer from a large mandatory setup overhead that can defeat the very spirit of what ad-hoc querying is. Even if there were any true natural language querying software on the market, for each query that you can pose, there is a very large number of similar queries you cannot make. There are many reasons for this: technical shortcomings in parsing, ambiguity, inadequate or incomplete domain representation, semantic inconsistency in the target (SQL is very limited compared to English), undue complexity. Actually, each one of these problems also comes in various flavours, but that is more than need be said here.
By increasing query efficiency, DBsurfer lifts many constraints about who may run a query, when it can be run and quite possibly, where it can be run. By reducing the amount of client/server data transmission DBsurfer is useable over low-bandwidth hardware. By presenting a very simple, compact, guided interface it's ideal for touch-screen query applications.
We believe that the features of DBsurfer support these contentions, but we are under no illusion that it is a universal panacea, just that for many its use can be an effective way to promote business success.
A consulting component
DBsurfer is not entirely a stand-alone product. It makes an excellent stab at working "out-of-the-box" and allowing optional incremental embellishments to improve the query process. It doesn't require massive initial setups before you can formulate any kind of query. However, almost every database has subtle idiosyncracies that can elude the casual user. Every database that wasn't designed with the end-user in mind needs a rewrite of what the DBsurfer ad-hoc query user sees: names and verbs. Some end-users need support with pre-manufactured higher-level query components or views. To consultants versed in database technology these tasks can leverage their knowledge to provide end-users a very high level of value added with far less effort than they the latter would have to generate by themselves. The support demands on busy IT staff can be kept to a minimum and their assistance drawn upon at convenient times.
An educational component
DBsurfer contains access and full editing facilities for a language called QBE (Query-By-Example). QBE is poorly understood and unsuitable for non-programmers. However, it is a superb training tool for relational access concepts and - for programmers - it is far easier to learn than SQL. Because its use is rather demanding, its querying power has been considerably diluted by currently available implementations - in order to make it more palatable. DBsurfer QBE is an implementation that while less user-friendly than others, promotes its capabilities to the fullest. DBsurfer QBE provides seamless access to all SQL idioms and approaches SQL itself in querying power.
DBsurfer makes it possible - for the first time ever - to express a query either in English or with drag and drop components, obtain its QBE equivalent, then produce its SQL-equivalent, thus providing students and teachers with a fertile ground for demonstrating and learning SQL concepts and relational access. DBsurfer QBE includes elegant solutions to difficult areas such as negation, self-joins, self-intesects, self-differences, set equivalence and set containment.
Other system designers seem to agree with our position that attempting to make QBE easier to use by changing its original form is like de-tuning a racing car to make it street legal - it becomes easier to use but less responsive.
(c) Copyright 1999-2003 IBX datasystems limited