John A. Kunze
Manager, Advanced Technology Group
[email protected]
Brian N. Warling
Manager, Digital Library Operations
[email protected]
Library and Center for Knowledge Management
University of California, San Francisco
San Francisco, CA 94143-0840
D-Lib Magazine, March 1996
To meet its goal of facilitating the dissemination of biomedical information, GALEN II also provides a publication platform for the UCSF and other relevant communities. In July 1995, the now-infamous Brown and Williamson Company tobacco documents were published through GALEN II. Earlier in 1995, the UCSF Investigators' Handbook, an important campus resource, was published. The latest GALEN II publication is Trials Search: California HIV Clinical Trials. More electronic publications are planned for the future.
Providing access to vital biomedical knowledge resources is another important component of GALEN II. Using collection development guidelines specific to electronic resources, a team of librarians selects important resources in the biomedical fields. Criteria such as authenticity, accuracy, currency, and utility to the UCSF community are used in evaluating resources. Selected resources are categorized and then added to the GALEN II Knowledge Resources section along with evaluative information. Future challenges involve streamlining the content management task and facilitating resource selection and evaluation among our peer institutions.
The current MELVYL character-based search interface, while powerful, is difficult to use, especially for the novice and occasional user. The learning curve is rather steep. Through input from the faculty, we have learned that a simpler interface to MEDLINE would be a highly desired feature. Faculty would also like direct access to electronic versions of journals. One of the most exciting aspects of this project is the direct linking from retrieved MEDLINE citations to the full articles via links to the journals in the Red Sage system.
For over two months at the end of 1995, a team consisting of librarians and programmers met on a weekly basis to develop the functional specifications for the interface. The task was divided into three main components: (1) query formulation, (2) interface design, and (3) display formats. Given the relatively short development time frame and the potential for unforeseen technical roadblocks, the team decided to limit the system to a core set of features. There were a number of important pieces of functionality that did not make it into the final specification, such as automatic stemming, search set manipulation and current awareness. These features will be phased in at later times. User testing should reveal any limitations in the current design and also aid in prioritizing the development and incorporation of new features.
The computer-to-computer information retrieval protocol, Z39.50, is especially useful in this situation. Unlike the telnet interface, the Z39.50 interface has precisely defined, machine-readable outputs (e.g., records, diagnostics) that convey more than enough structural and semantic information to provide foundation support for a variety of user interfaces. The telnet interface, by contrast, is optimized for human readers, and software does not tolerate changes in its output as well as human beings do, so it is not a stable foundation for programs built on top of it.
At the same time, Z39.50 has user-oriented features that make it preferable to a typical remote database query language. One of these features is server-side result sets, which let users postpone the network transfer of search results until they are wanted and, if they never are, avoid it entirely. Another feature is Explain, which returns both human- and machine-readable descriptions of remote system objects such as databases, indexes, and record formats. Other Z39.50 features support special library functions in a standardized way, such as term list scanning and document ordering.
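The idea behind server-side result sets can be illustrated with a toy model. The sketch below is not a Z39.50 implementation; the class ToyServer and its sample records are invented for illustration, and only the names search and present echo the corresponding Z39.50 services. It shows that a search returns nothing but a hit count, while records cross the network only when a later request asks for them.

```python
class ToyServer:
    """Stand-in for a stateful search server (hypothetical, not real Z39.50)."""

    def __init__(self, records):
        self._records = list(records)
        self._result_sets = {}  # result set name -> list of matching records

    def search(self, set_name, term):
        """Run a search, keep the hits on the server, return only the hit count."""
        hits = [r for r in self._records if term.lower() in r.lower()]
        self._result_sets[set_name] = hits
        return len(hits)

    def present(self, set_name, start, count):
        """Return `count` records from a stored set, beginning at 1-based `start`."""
        hits = self._result_sets[set_name]
        return hits[start - 1:start - 1 + count]


server = ToyServer([
    "Nicotine and addiction",
    "Tobacco industry documents",
    "HIV clinical trials in California",
    "Nicotine replacement therapy",
])
hit_count = server.search("set1", "nicotine")  # only a number is returned
first_page = server.present("set1", 1, 1)      # records are fetched on demand
```

If the user abandons the search after seeing the hit count, no records are ever transferred.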
Another benefit of using Z39.50 is that the programming cost for an interface can be amortized by re-applying it to other Z39.50 databases on the network. Once the user interface that interoperates with MEDLINE via Z39.50 is built, only a small fraction of extra effort is needed to re-use it against another database.
All that is required in the simplest case is a knowledge of HTML, the web's standard notation for expressing document text, together with links to other documents (text + links = hypertext). For more complex applications, HTML forms are required. HTML forms can be viewed as a very simple way to specify more advanced user interface components such as dialog boxes, radio buttons, check boxes, and pull-down menus.
While the graphical layout of these components is relatively easy to achieve and test in HTML, connecting user manipulation of them (e.g., "checking" a box or entering some text) to HTTP server actions requires programming. Each manipulation by the user is recorded in the HTML form until a special "submit" button causes the "filled-out" form to be sent to the server. What happens then depends on whether or not there is a server gateway.
For these reasons, GALEN II uses the gateway option. In particular, it uses the Common Gateway Interface (CGI) so that it may interoperate with any conforming server base. This means that the MEDLINE program modules will run on a wide variety of servers, and that upgrading the GALEN II server can be done without disrupting the MEDLINE interface.
Our gateway's job is to perform two kinds of translation. It takes user input from a filled-out HTML form, converts it into Z39.50 queries or retrieval requests, and sends them on to the MELVYL MEDLINE server. It also takes Z39.50 responses from the MELVYL server, converts them into HTML forms and sends them back to the browser for display to the user. Put another way, the GALEN II gateway translates HTTP protocol messages into Z39.50 protocol messages, and vice versa.
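The first of these translations, from a filled-out form to a query, might look roughly like the following sketch. It is not the GALEN II code. The form field names, the function form_to_query, and the choice of a textual prefix notation for the query are all assumptions made for illustration; the numbers are Bib-1 use attributes for title, author, and subject heading.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from form field names to Bib-1 "use" attribute numbers.
FIELD_TO_USE_ATTR = {"title": 4, "author": 1003, "subject": 21}


def form_to_query(form_body):
    """Translate a URL-encoded form submission into a prefix-notation query."""
    fields = parse_qs(form_body)  # blank fields are dropped by default
    clauses = []
    for name, use_attr in FIELD_TO_USE_ATTR.items():
        for value in fields.get(name, []):
            # Drop embedded quotes so the term stays one token in this sketch.
            term = value.replace('"', "").strip()
            if term:
                clauses.append(f'@attr 1={use_attr} "{term}"')
    if not clauses:
        raise ValueError("form contained no search terms")
    # Combine all clauses with boolean AND, in prefix form.
    query = clauses[0]
    for clause in clauses[1:]:
        query = f"@and {query} {clause}"
    return query


example_query = form_to_query("title=asthma&author=Smith+J&subject=")
```

A real gateway would hand the resulting query to a Z39.50 client library, which encodes it for transmission to the server; that step is omitted here.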
One complication of building on the web is the stateless nature of HTTP, which is explained as follows. A server using a stateless protocol (such as HTTP) treats each request as if from a client with which it has never communicated before; in other words, it maintains no memory, or state, regarding the client. In contrast, a stateful protocol (such as Z39.50) is conducted over a session for which the server keeps track of things like user identification and search results as they accumulate over the course of the session.
A particular technical challenge was how to make a series of stateless HTTP fetch requests connect to the same Z39.50 session and have the HTTP responses reflect the continuity of the corresponding series of stateful Z39.50 operations. In GALEN II this is done by making each HTTP request/response contain an HTML form into which a session identifier is inserted using a so-called "hidden field". The semantics of hidden fields in an HTML form dictate that they are not displayed to the user but are simply sent back unchanged when the filled-out form is sent to the server. The HTTP server effectively asks the browser to remember the context in which the form was created, and to remind the server when the next request is sent in.
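The hidden-field technique can be sketched as follows. This is a minimal illustration, not the GALEN II implementation: the SESSIONS table, the field name "session", the form's action URL, and all function names are hypothetical.

```python
import html
import secrets
from urllib.parse import parse_qs

# Hypothetical table: session id -> state kept for an open Z39.50 session.
SESSIONS = {}


def open_session():
    """Create a new session entry and return its identifier."""
    session_id = secrets.token_hex(8)
    SESSIONS[session_id] = {"result_sets": []}
    return session_id


def render_form(session_id):
    """Build a search form that carries the session id in a hidden field."""
    safe_id = html.escape(session_id, quote=True)
    return (
        '<form method="POST" action="/cgi-bin/search">\n'
        f'<input type="hidden" name="session" value="{safe_id}">\n'
        '<input type="text" name="title">\n'
        '<input type="submit" value="Search">\n'
        "</form>"
    )


def lookup_session(form_body):
    """Recover the session named in a submitted form, or None if absent or unknown."""
    ids = parse_qs(form_body).get("session", [])
    if len(ids) != 1 or ids[0] not in SESSIONS:
        return None
    return SESSIONS[ids[0]]


sid = open_session()
page = render_form(sid)                                   # sent to the browser
state = lookup_session(f"session={sid}&title=asthma")     # browser sends it back
```

The browser never shows the identifier to the user; it simply returns it with the next submission, which lets the gateway reattach that request to the right session.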
The session's state is thus passed back and forth instead of being held at the server. While this preserves server simplicity, it merely shifts the burden onto the protocol interaction. The solution is not especially satisfactory and underscores a weakness of HTTP, yet it works well enough to obtain the tremendous leverage of using existing web browser interfaces.
At the heart of form generation is a program module that converts MARC records into HTML suitable for display or into text for downloading. This currently involves a table-driven procedure that steps through each field of the MARC record and consults a rule set that specifies how to display each field and combine it with other fields. The output is a complex stream of text and HTML table codes needed to display the rich set of information carried in a MEDLINE record.
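A table-driven conversion of this kind might be sketched as below. The rule table, the record representation, and the function record_to_html are assumptions for illustration only; the actual GALEN II rule set is not reproduced here. The tags used (245 for title, 100 and 700 for personal names, 650 for subjects) are standard MARC bibliographic tags, chosen as plausible examples.

```python
import html

# Hypothetical rule table: each rule gives a display label, the MARC tags that
# feed it, the subfield codes to show, and how to join repeated fields.
DISPLAY_RULES = [
    {"label": "Title",    "tags": ["245"],        "subfields": "ab", "join": " "},
    {"label": "Authors",  "tags": ["100", "700"], "subfields": "a",  "join": "; "},
    {"label": "Subjects", "tags": ["650"],        "subfields": "a",  "join": "; "},
]


def record_to_html(record):
    """Render a record given as a list of (tag, [(code, value), ...]) fields."""
    rows = []
    for rule in DISPLAY_RULES:
        values = []
        for tag, subfields in record:
            if tag not in rule["tags"]:
                continue
            # Keep only the subfields this rule asks for, in record order.
            text = " ".join(v for code, v in subfields if code in rule["subfields"])
            if text:
                values.append(html.escape(text))
        if values:  # rules with no matching fields produce no row
            cell = rule["join"].join(values)
            rows.append(f'<tr><th>{rule["label"]}</th><td>{cell}</td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


sample_record = [
    ("245", [("a", "Smoking & health :"), ("b", "a review")]),
    ("100", [("a", "Smith, J.")]),
    ("700", [("a", "Jones, K.")]),
]
sample_html = record_to_html(sample_record)
```

Because the display logic lives in the table rather than in the code, changing how a field is shown or combined with others means editing a rule, not the program.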
hdl://cnri.dlib/march96-warling