XSet: a Search Engine on Treaps for XML
Installation and Usage
Ben Yanbin Zhao
Last updated: March 3rd, 1999

Introduction:
XSet is an XML search engine built on probabilistic data structures called treaps. Given the broad applicability of XML and its gradual emergence as an Internet standard, applications will soon need a way to query and quickly search XML documents. XSet is designed to provide high-performance XML search functionality at a low level with minimal overhead.

A much more in-depth discussion of the overall XSet motivation, design and functionality can be found here.
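
For readers unfamiliar with treaps, the following is a minimal, generic sketch of the data structure: a binary search tree on keys that also keeps a heap order on randomly chosen priorities, so the tree stays balanced in expectation. It is not XSet's implementation; the class name Treap, its methods, and the use of String keys are assumptions made only for illustration.

```java
import java.util.Random;

// Illustrative treap, NOT the XSet implementation.
// Keys are ordered as in a binary search tree; each node also carries a
// random priority, and parents always have priority >= their children.
class Treap {
    private static final class Node {
        final String key;
        final int priority;
        Node left, right;

        Node(String key, int priority) {
            this.key = key;
            this.priority = priority;
        }
    }

    private final Random random = new Random();
    private Node root;

    // Inserts the key; duplicates are ignored.
    void insert(String key) {
        root = insert(root, key);
    }

    // Ordinary binary-search-tree lookup.
    boolean contains(String key) {
        Node n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) {
                return true;
            }
            n = (c < 0) ? n.left : n.right;
        }
        return false;
    }

    private Node insert(Node n, String key) {
        if (n == null) {
            return new Node(key, random.nextInt());
        }
        int c = key.compareTo(n.key);
        if (c < 0) {
            n.left = insert(n.left, key);
            // Restore heap order if the child now outranks its parent.
            if (n.left.priority > n.priority) {
                n = rotateRight(n);
            }
        } else if (c > 0) {
            n.right = insert(n.right, key);
            if (n.right.priority > n.priority) {
                n = rotateLeft(n);
            }
        }
        return n;
    }

    // Lifts the left child above n while preserving key order.
    private static Node rotateRight(Node n) {
        Node l = n.left;
        n.left = l.right;
        l.right = n;
        return l;
    }

    // Lifts the right child above n while preserving key order.
    private static Node rotateLeft(Node n) {
        Node r = n.right;
        n.right = r.left;
        r.left = n;
        return r;
    }
}
```

Because the priorities are random, the shape of the tree does not depend on the order in which keys arrive, which is what gives treaps their expected logarithmic search depth.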

Distribution Files:
In order to utilize the functionality of the XSet application, you need the following source files:

The following files are independent applications that demonstrate the XSet functionality:

The directory printers includes a set of sample XML files as well as the DTD file that they validate against. Because of a restriction inside MSXML, each DTD reference in the XML files must be an absolute file path to the local DTD location. To make the XML files validate on your file system, replace the string "/usr/home/ravenben/ninja/classpath/ninja/xset/printers/printers.dtd" with the absolute path of the printers.dtd file on your local file system. Alternatively, you can replace the file:/// reference to the DTD with an HTTP reference to the printer.dtd file on my homepage, by replacing your current DOCTYPE line with the following:
    <!DOCTYPE PRINTCAP SYSTEM "http://www.cs.berkeley.edu/~ravenben/printer.dtd">
Another possibility is to remove the line altogether, which means the file will be a well-formed XML file, but cannot be validated.
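
If you prefer not to edit each sample file by hand, the path replacement described above can be scripted. The sketch below is a hypothetical helper and is not part of the XSet distribution: the class name DtdPathFixer and its methods are invented for illustration, it assumes the XML files are stored in UTF-8 (or plain ASCII), and it uses APIs from a more recent JDK than the 1.1.x releases XSet was tested with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, NOT shipped with XSet.
class DtdPathFixer {
    // The DTD path that the sample XML files refer to as distributed.
    static final String OLD_DTD_PATH =
        "/usr/home/ravenben/ninja/classpath/ninja/xset/printers/printers.dtd";

    // Returns the XML text with every occurrence of the distributed
    // DTD path replaced by newDtdPath.
    static String fix(String xml, String newDtdPath) {
        return xml.replace(OLD_DTD_PATH, newDtdPath);
    }

    // Usage: java DtdPathFixer <absolute-dtd-path> <xml-file>...
    // Rewrites each listed file in place (assumes UTF-8 content).
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java DtdPathFixer <absolute-dtd-path> <xml-file>...");
            return;
        }
        for (int i = 1; i < args.length; i++) {
            Path file = Paths.get(args[i]);
            String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.write(file, fix(xml, args[0]).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Since the files are rewritten in place, it is worth keeping a copy of the printers directory before running something like this.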

All of these files are included in the Ninja 1.0 release under the directory ninja/xset. Another key part of the XSet source code is the set of Microsoft XML Parser in Java (MSXML) class files. Those files are also included in the Ninja 1.0 release under the com.ms.xml.* class tree. The MSXML class files can also be downloaded freely from Microsoft, inside Microsoft's Java JDK.

Compiling XSet:
Compiling XSet should be straightforward: type make in the ninja/xset directory. Make sure that your CLASSPATH includes the directory that contains the com/ms/xml package as well as the ninja/xset source tree. XSet has been tested with JDK 1.1.6 and 1.1.7.
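
If make fails with missing-class errors, a quick way to confirm that the CLASSPATH is set up as described is to ask the JVM whether it can locate a given class. The sketch below is a hypothetical helper and is not part of XSet; the class name ClasspathCheck is invented, and the default class name com.ms.xml.om.Document is an assumption about what the MSXML tree contains, so substitute any class you actually see under com/ms/xml in your copy. It uses language features newer than JDK 1.1.

```java
// Hypothetical helper, NOT shipped with XSet.
class ClasspathCheck {
    // Returns true if the named class can be located by the class loader
    // that loaded this helper (normally the one driven by CLASSPATH).
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Usage: java ClasspathCheck [fully.qualified.ClassName]
    public static void main(String[] args) {
        // ASSUMPTION: com.ms.xml.om.Document is used only as an example
        // name; pass a class from your own com/ms/xml directory instead.
        String name = args.length > 0 ? args[0] : "com.ms.xml.om.Document";
        System.out.println(name
            + (isOnClasspath(name) ? " found" : " NOT found on CLASSPATH"));
    }
}
```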

Running the Demo:
The source tree includes a demo that shows a trivial application of the XSet search engine. The demo provides an applet that lets the user easily build queries for searching printer files, and it shows the functionality of the XSet ispace service.

The demo makes a lot more sense if you read the demo documentation here beforehand.

Writing applications to use XSet:
I've included XSetService.java, the XSet server interface wrapped up in an ispace service. To use the XSet functionality, you can either use the XSetService interface and send commands to the server via NinjaRMI, or use the SETserver object directly. Read the source of XSetService.java to see an example of how to use the SETserver API. Javadoc information is also available.

I would be interested in finding out about any applications you may develop using XSet. Please email me at ninja-devel@ninja.cs.berkeley.edu with any problems or suggestions, and to let me know of applications you've written that use XSet.

For general ninja related questions and mail, please send mail to ninja-devel@ninja.cs.berkeley.edu.
 

Ben Zhao
March 3rd, 1999