
D-Lib Magazine
March 1999

Volume 5 Issue 3
ISSN 1082-9873

The Mathematics Archives

Making Mathematics Easy to Find on the Web


Earl D. Fife
Calvin College
[email protected]

Lawrence Husch
University of Tennessee - Knoxville
[email protected]

 

Do a search on AltaVista for "algebra". What do you get? Nearly 700,000 hits, of which AltaVista will allow you to view only what it determines is the top 200. Major search engines such as AltaVista, Excite, HotBot, Lycos, and the like continue to provide a valuable service, but with the recent growth of the Internet, topic-specific sites that provide some organization to the topic are increasingly important. It is the goal of the Mathematics Archives to make it easier for the ordinary user to find useful mathematical information on the Web.

An Overview

The Mathematics Archives (http://archives.math.utk.edu) is a multipurpose site for mathematics on the Internet. The focus is on materials which can be used in mathematics education (primarily at the undergraduate level). Resources available range from shareware and public domain software to electronic proceedings of various conferences, to an extensive collection of annotated links to other mathematical sites.

All materials on the Archives are categorized and cross referenced for the convenience of the user. Several search mechanisms are provided. The Harvest search engine is implemented to provide a full text search of most of the pages on the Archives. The software we house and our list of annotated links to mathematical sites are both categorized by subject matter. Each of these collections has a specialized search engine to assist the user in locating desired material.

Services at the Mathematics Archives are divided up into five broad topics:

All pages present at the Mathematics Archives can be searched using our Harvest search engine. This is useful for material that actually resides on our site, but since Harvest returns pages rather than individual links contained within pages, we have written our own search engine to search our annotated list of links.

Technical Aspects

Three technical aspects of the Mathematics Archives may be of interest to D-Lib readers: designing a site such as ours so that ftp, gopher and http all work together from the same basic structure; constructing the search engine for the topics pages; and implementing automated link checking to maintain large collections of links.

A Unified Site Design

Our collections of Windows/DOS and Macintosh software are the two services that were established first at the Mathematics Archives. When we established the Mathematics Archives at our present site in 1993, we implemented both ftp and gopher. Making both services share the same "root directory" is just a matter of setting configurations of ftpd and gopherd properly. When we introduced http, Mosaic had just been released. Now it was possible to have links to images and text all on one page. Users who had web browsers wanted to view http sites, not return to gopher sites. So, to attract these users and keep their interest in the Mathematics Archives, we wanted to keep calls to gopher to a minimum. Furthermore, not all early web browsers handled the gopher protocol as well as they do today.

We addressed the problem with cgi scripts that would read the information contained in the gopher .links files and the files within the .caps subdirectory to generate html pages. This process is best illustrated within the Macintosh software collection. The main page is a static page "http://archives.math.utk.edu/software/.mac.directory.html" housed on the Archives server. Within this page are links to each topical directory within the collection. Within each topical directory (for example calculus) is a static file named .directory.html which is used to link to each package. (In the calculus directory, it looks like this.)

When a package is selected, say 3D-Filmstrip from within the frame on the right, all files associated with the package are listed within the frame on the left. This is the information from the files within the .caps subdirectory of the 3D-Filmstrip directory and from the .links file within the 3D-Filmstrip directory. The script producing this page makes a list of available files, each embedded in an appropriate html anchor, and identifies the action to be taken on the file. For example, it downloads .hqx files, calls a script to read text files, and creates a hypertext link to remote web pages; each action is indicated by its own icon. If the file is a text file to be read (e.g., Abstract of 3D-Filmstrip), the script that is called generates an html page consisting of the text of the file and ending with a list of the other files and links associated with the package.
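As an illustration of the approach described above, the following Python sketch parses a gopher-style .links file and emits one html list item per entry, choosing among the three actions. It is not the Archives' cgi script. The blank-line-separated Key=value layout, the URL key used to mark remote pages, and the /cgi-bin/readtext path are all assumptions made for the example.

```python
import html
import urllib.parse

def parse_links(text):
    """Parse .links-style text into a list of dicts, one per entry.

    Assumes blank-line-separated blocks of Key=value lines; lines
    starting with '#' are treated as comments.
    """
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:              # a blank line closes the entry
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries

def action_for(entry):
    """Choose the action for an entry: link, download, or read."""
    if "URL" in entry:               # hypothetical key for remote pages
        return "link"
    if entry.get("Path", "").lower().endswith(".hqx"):
        return "download"
    return "read"

def render_item(entry):
    """Return one html list item whose anchor matches the action."""
    action = action_for(entry)
    if action == "link":
        href = entry["URL"]
    elif action == "download":
        href = entry.get("Path", "")
    else:
        # Hypothetical reader script that wraps a text file in html.
        href = "/cgi-bin/readtext?file=" + urllib.parse.quote(entry.get("Path", ""))
    name = html.escape(entry.get("Name", "(unnamed)"))
    return '<li><a href="%s">%s</a> [%s]</li>' % (
        html.escape(href, quote=True), name, action)
```

Keeping the choice of action in one small function means the same .links data can drive both the gopher menu and the generated html page, which is the point of sharing one directory structure between the two services.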

At any time during the process of trying to locate a package for a particular task, the user may search all text files within the Macintosh collection.

A Search Engine for Links Classified by Topic

For over a year we had maintained a list of links to mathematical sites organized by mathematical topic. It had been a popular page, and it even was the model on which other sites began to organize their links. (See AMS, for example.) However, the list was getting so long, and keeping track of the cross-referencing was becoming so tedious, that we restructured it to allow us to perform searches on links and their annotations (keywords).

We now have one page for each major topic, and each link is listed only on the page of its primary classification. Then, each link and annotation conforms to the same pattern:

To perform a search on a word, a Perl script performs the following tasks:
  1. It opens each topics page, one at a time.
  2. For each topics page, it reads the unordered list of links and discards the remainder of the page.
  3. It splits the list, using the <li> tag as the delimiter between items.
  4. For each item, it uses a regular expression to search for the desired word.
  5. If the word is found, it stores the link in memory.
  6. After searching all links on all pages, it generates an html page listing every link whose entry contained the desired word.
This simple structure allows us to update the lists easily and provides the user with a reasonably powerful searching mechanism over the large number of links we have. The searching form appears at the bottom of each topics page and of each dynamically produced html page returning the results of a search.
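
The six steps listed above can be sketched in a few lines. The example below is written in Python rather than Perl and is only an illustration, not the Archives' script. It assumes each topics page holds a single, non-nested unordered list, it takes the pages as strings instead of opening files, and it matches whole words in the visible text of each item; all function names are hypothetical.

```python
import html
import re

def extract_items(page_html):
    """Return the <li> items of the first unordered list on a page."""
    match = re.search(r"<ul\b[^>]*>(.*?)</ul>", page_html,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return []
    # Split on the opening <li> tag; the text before the first tag is dropped.
    parts = re.split(r"<li\b[^>]*>", match.group(1), flags=re.IGNORECASE)
    items = []
    for part in parts[1:]:
        item = re.sub(r"</li>\s*$", "", part.strip(), flags=re.IGNORECASE)
        if item:
            items.append(item)
    return items

def search_pages(pages, word):
    """Collect every list item, across all pages, that contains the word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for page_html in pages:
        for item in extract_items(page_html):
            # Strip tags so that only link text and annotation are searched.
            visible = re.sub(r"<[^>]+>", " ", item)
            if pattern.search(visible):
                hits.append(item)
    return hits

def render_results(word, hits):
    """Build a results page listing the matching links."""
    body = "\n".join("<li>%s</li>" % hit for hit in hits)
    return ("<html><body><h1>Links matching %s</h1>\n<ul>\n%s\n</ul>"
            "</body></html>" % (html.escape(word), body))
```

Because everything outside the list is discarded before matching, words that occur only in page headers or footers never produce hits.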

Automated Link Checking

Shortly after we began organizing links, we realized that keeping our lists current (i.e., removing broken or outdated links) would be an overwhelming task if done by hand. (Our current count is 4,200 web pages with a total of over 33,000 links.) A Perl script was available from David Sibley for checking the links on a single page. It reads the page, parsing it for links, then for each link found, it performs a HEAD request and, if that fails, performs a GET request. We have modified the script to perform some additional specialized checks for us and used it as the basis for another script to do link checking on multiple pages.
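
The check described above (parse a page for links, then try HEAD and fall back to GET) can be sketched as follows. This is a Python illustration using only the standard library, not David Sibley's Perl script or our modified version; the function names, the ten-second timeout, and the choice to treat any status from 200 to 399 as success are assumptions.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def links_in(page_html):
    """Return all anchor targets found in a page."""
    collector = LinkCollector()
    collector.feed(page_html)
    return collector.links

def http_status(url, method, timeout=10):
    """Issue one request and return its status code; raise on failure."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

def check_link(url, fetch=http_status):
    """Try HEAD first, then GET if HEAD fails. Return (ok, detail)."""
    detail = None
    for method in ("HEAD", "GET"):
        try:
            status = fetch(url, method)
        except (urllib.error.URLError, OSError, ValueError) as exc:
            detail = "%s failed: %s" % (method, exc)
            continue
        if 200 <= status < 400:
            return True, "%s %d" % (method, status)
        detail = "%s returned %d" % (method, status)
    return False, detail
```

HEAD is tried first because it transfers no page body; the GET fallback covers servers that reject or mishandle HEAD. The fetch parameter exists only so the logic can be exercised without network access.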

Once a week, a cron job runs a script that searches our disk drive for all .html pages and stores a list of them in a file. Before that file is used, a second script sorts it and removes files that also appear on the "exceptions list". (This is a file we use to exclude pages from checking because they are test files, because they contain only local links, or for various other reasons.) Then each evening, our multiple-file checking program is called by cron and checks approximately 1/7 of the files listed. Errors are written to external files named by date and by owner of the file. Each morning, if the owner of a file had errors reported during the previous evening's run, that person's error file is automatically emailed to him or her. Each person is now responsible for the errors within his or her own files. The reported links are checked by hand and corrected or removed.
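
One simple way to arrange the nightly one-seventh share is sketched below in Python. How the Archives actually divides the list is not stated here, so the stride-by-weekday rule, the function names, and the error file naming pattern are all assumptions made for illustration.

```python
import datetime

def files_to_check(all_files, exceptions, day):
    """Return the share of the file list to check on the given date.

    The list is de-duplicated, filtered against the exceptions list,
    and sorted; every seventh file is then taken, starting at the
    weekday index (0 = Monday ... 6 = Sunday).
    """
    excluded = set(exceptions)
    candidates = sorted(f for f in set(all_files) if f not in excluded)
    return candidates[day.weekday()::7]

def error_file_name(owner, day):
    """Hypothetical naming scheme: one error file per owner per date."""
    return "%s.%s.errors" % (day.isoformat(), owner)
```

With a stable list, each file is visited exactly once in any seven consecutive days, and the nightly workload stays within one file of an even split.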

Concluding Remarks

The classification and identification of mathematical links by people within the profession has been a strength of the Mathematics Archives. Implementing mechanisms to make navigation around the Archives easier for the user has required only a modest amount of scripting. Yet the benefit to the user can be considerable. It can yield richer results in searches than are available from the mega-search engines, and the special highlighting of particularly interesting sites (such as through the Pop Mathematics listing) can even result in more fruitful browsing.

Discipline-oriented web sites can provide users with the means of finding the wealth of information on the Internet for their specific discipline. The technical expertise to create such a site is becoming more readily available within professional disciplines as the Internet grows in popularity within the professional community. Although commercialization seems to be taking over the Internet, there is an ever-increasing demand for free sites such as the Mathematics Archives and the services it provides.

Acknowledgements

The Mathematics Archives was created in 1993 with funding from the National Science Foundation (DUE-9351398 & DUE-9550943), The Tennessee Science Alliance, Calvin College, and the Department of Mathematics of the University of Tennessee - Knoxville. The co-directors of the Archives from its inception have been the authors of this article, Earl D. Fife and Larry Husch. At various times and durations since its inception, the following have volunteered their services: We gratefully acknowledge their support and contributions.

Copyright © 1999 Earl D. Fife and Lawrence Husch



DOI: 10.1045/march99-fife