D-Lib Magazine
November 2002

Volume 8 Number 11

ISSN 1082-9873

Software for Building a Full-Featured Discipline-Based Web Portal

The Scout Portal Toolkit

 

Edward Almasy
David Sleasman
Rachael Bower
Internet Scout Project
University of Wisconsin - Madison

Abstract

The University of Wisconsin-Madison's Internet Scout Project [1] received funding in the fall of 2000 from the Andrew W. Mellon Foundation [2] to build an open source software package that would allow collection developers to share their collection's metadata via the web. The resulting software, the Scout Portal Toolkit (SPT), is virtually turnkey, very inexpensive to maintain and operate, and easy for non-technical staff to download, set up and populate with metadata. Conforming to international standards for metadata, data harvesting, and Web technology makes SPT useful for and usable by a wide variety of projects and organizations, allowing and encouraging collaboration and record sharing among projects. Over the SPT project's two-year period, beta testers and in-house quality assurance testing provided valuable feedback, helping to ensure that the software was robust, easy to use, and well-suited to the needs of the intended audience.

Introduction

In today's Internet, with information overload prevalent even within a single discipline, scholars struggle to find the precise material they need in the tangled web of online information. The major search engines don't offer great precision or any guarantee of authority. The best sites in a given field are spread around the nooks and crannies of the Internet and need to be located and then individually searched for relevant information. Even electronic mailing lists can require substantial effort to monitor and sift to extract useful information. In addition, most scholars and researchers lack the extra time needed to roam the Web, trying to stay abreast of all the new resources and tools that, ironically, could make the task of locating information easier.

In some disciplines, this problem is being addressed by organizations that take a leadership role by building Web sites called subject gateways or discipline-based portals. These Web sites usually focus on a specific topic or scholarly discipline, and they often provide information in a variety of forms and from many sources. For example, a discipline-based portal may feature:

  • A browseable directory of online resources, described and arranged by subject;
  • A search facility that includes only resources related to the field and that allows searching by title, author, subject, etc.;
  • Current news stories related to the field;
  • Forums for discussing specific discipline-related issues; and
  • Facilities for scholars to comment on specific resources.

By bringing together various collections and access points into one integrated Web site, a discipline-based portal can bring coherence to the body of online information available in a given field of study, providing scholars and researchers with a facility that will save them substantial time and increase their awareness of other work in their field.

Given all of the above, a discipline-based portal sounds like a fine thing to put online, but building a high-quality portal with even a portion of these facilities can be a daunting undertaking. Although the benefits of setting up a discipline-based portal are clear, many organizations with a strong focus on a particular discipline don't have ready access to extensive technical resources, and even those organizations that do are likely to have their resources already committed to existing projects or working to support the organization's day-to-day operations.

The Scout Portal Toolkit (SPT), an open source software package developed by the Internet Scout Project under a grant from the Andrew W. Mellon Foundation, was created to address this problem. It allows a group or organization to share a specific knowledge base via a full-featured portal on the Web, with almost no investment in technical resources or infrastructure. Many groups and organizations already have available the minimal resources needed to put a discipline-based portal online, at little or no cost, using SPT. In addition, SPT provides facilities beyond those usually found in discipline-based portals, as discussed below.

Entering Data

Setting up the Scout Portal Toolkit is intended to be a simple process. (Detailed information about hardware and software requirements, as well as where to obtain the software, can be found at the end of this article.) Once SPT has been installed, the first task at hand to make the portal useful is the entry of records from the user's collection.

This section will explain what types of data can be entered into SPT and what tools are built into SPT to allow for metadata addition and manipulation. The Scout Portal Toolkit is distributed with a set of sample data that is loaded during installation, so that administrators and resource editors can see how the portal works with data in place. When it is no longer needed, resource editors can easily delete this sample data and enter new data into the portal.

The Metadata Field Editor

Out of the box, SPT comes with a Dublin Core Element Set [3] with several extensions taken from the qualifier sets defined by the DCMI Education [4] and DCMI Administrative Metadata Working Groups [5]. These may be sufficient for many project and user needs. However, some groups will wish to modify these fields or add additional fields. By using the Metadata Field Editor, a user with administrative privileges can add new fields. The eight basic data types supporting the creation of these new metadata fields are:

  • Text - A free-form field that may contain any textual data.
  • Paragraph - A free-form field that may contain any textual data. It differs from a Text field in that it is expected to hold several lines of text, and it is displayed and edited accordingly.
  • Number - A numeric field that may have limits imposed and can be compared to other values when setting up searching criteria.
  • Date - A field containing a date value or date range. Date fields can contain whatever level of precision is appropriate, and support several additional attributes, such as a prepended "c" to indicate that the value entered represents a copyright date.
  • Flag - A Boolean field containing a true or false value. Labels may be assigned to True and False for clarity.
  • Controlled Name - A field containing an entry from a list of values that is maintained separately to ensure consistent naming. By default, SPT comes with four controlled name fields defined: Publisher, Creator, Contributor, and Subject.
  • Option - A field containing one of a number of attributes. SPT comes with four Option fields defined: Resource Type, Language, Audience, and Format.
  • Classification - A field containing one or more entries taken from a hierarchy of values that is maintained separately.

In addition to basic field attributes like type and name, the Metadata Field Editor allows an administrator to set default values and a number of type-specific field attributes, such as minimum and maximum value for numeric fields and on and off labels for flag fields. The Metadata Field Editor also allows tailoring the performance of the search engine (discussed below) by indicating which fields to consider for keyword searches and how to weight the fields when ranking search results.
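The field attributes described above can be pictured as a small data structure. The following is only an illustrative sketch in Python, not SPT's own (PHP) implementation; the class and attribute names, and the "Grade Level" field, are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class FieldType(Enum):
    """The eight basic data types listed in the article."""
    TEXT = "Text"
    PARAGRAPH = "Paragraph"
    NUMBER = "Number"
    DATE = "Date"
    FLAG = "Flag"
    CONTROLLED_NAME = "Controlled Name"
    OPTION = "Option"
    CLASSIFICATION = "Classification"


@dataclass
class MetadataField:
    # All attribute names are hypothetical, chosen for illustration only.
    name: str
    field_type: FieldType
    default: Any = None
    min_value: Optional[float] = None   # meaningful for Number fields
    max_value: Optional[float] = None   # meaningful for Number fields
    on_label: str = "True"              # meaningful for Flag fields
    off_label: str = "False"            # meaningful for Flag fields
    in_keyword_search: bool = False     # consider this field for keyword searches
    search_weight: int = 1              # weight used when ranking results

    def validate(self, value: Any) -> bool:
        """Check a Number value against the configured limits."""
        if self.field_type is not FieldType.NUMBER:
            return True
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


title = MetadataField("Title", FieldType.TEXT, in_keyword_search=True, search_weight=3)
grade = MetadataField("Grade Level", FieldType.NUMBER, min_value=1, max_value=12)
```

Here `grade.validate(13)` would be rejected because it exceeds the configured maximum, while any value passes for a non-numeric field such as `title`.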

Once metadata fields are defined, the primary method of entering resource metadata into SPT is the Metadata Tool.

The Metadata Tool

The Metadata Tool is designed to speed resource cataloging and support the accurate and consistent assignment and recording of metadata required to build and maintain a useful discipline-based portal. The Metadata Tool allows resource editors to add new records, edit records, duplicate records, and delete records. The Metadata Tool also provides special features that aid in resource management. Drop-down menus are available for many metadata fields, and these menus speed entry of commonly used values and help keep metadata vocabulary consistent.

Figure 1. Screen shot of the "Add New Resource" page.

The Metadata Tool also provides fields that support workflow management and editorial review. For example, by default SPT comes with a flag field labeled Okay for Viewing, which defaults to No. Before resources are displayed, the flag field is checked by the rest of the portal, allowing editorial review of material before it becomes available to the general public. If editorial review is not desired, the default value for this field can be set to Yes, via the Metadata Field Editor, and new records will become available immediately.
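The effect of the Okay for Viewing flag can be sketched as a simple filter applied before records are shown. This is a hypothetical Python illustration of the idea, with records held as dictionaries; it does not reflect how SPT stores or queries its data.

```python
# Hypothetical in-memory records; only the field label comes from the article.
resources = [
    {"Title": "Fossil Atlas", "Okay for Viewing": True},
    {"Title": "Draft Entry"},  # flag never set, so the field default applies
]


def visible_resources(records, default_ok=False):
    """Return only records cleared for public display.

    default_ok stands in for the field's default value: False means new
    records wait for editorial review, True publishes them immediately.
    """
    return [r for r in records if r.get("Okay for Viewing", default_ok)]


public = visible_resources(resources)                        # review required
public_no_review = visible_resources(resources, default_ok=True)
```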

The portal is designed to assist projects not only in entering metadata about their collection, but also in managing workflow. For example, for a given project, there may be several people entering data, and each person may need a different level of privileges. Some may be allowed to edit classifications for records, for example, but not be allowed to flag a record as complete and ready for viewing by the general public. To support editorial review and other workflow issues (and because, for example, allowing all portal users to access the Metadata Field Editor could certainly present some problems), SPT provides access control on a per-user basis.

Access Permissions

To make use of the Metadata Field Editor, the Metadata Tool, or any of the personalized portal services, users must create an account on the portal with a login name and password. Once logged in, access to various features is controlled by six permission flags, which may be assigned from any account with System Administrator access. The permission flags are:

  • Resource Administrator
  • Controlled Name Administrator
  • Classification Administrator
  • News Administrator
  • Forum Administrator
  • System Administrator

Users may be granted any of these flags independent of the others, allowing, for example, the portal administrator to designate certain individuals as responsible for maintaining the controlled name lists or classification hierarchies, while other individuals handle more administrative matters like monitoring activity in the portal discussion forums or posting news to the front page of the portal. This means that resource editors, and others involved with the portal, could potentially be spread out geographically and still effectively work together to contribute or edit records in a coordinated fashion.
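Six independent permission flags map naturally onto a bit-flag type. The sketch below uses Python's standard `enum.Flag` to show how flags can be combined and checked; the member names and the `can` helper are assumptions for illustration, not SPT identifiers.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    RESOURCE_ADMIN = auto()
    CONTROLLED_NAME_ADMIN = auto()
    CLASSIFICATION_ADMIN = auto()
    NEWS_ADMIN = auto()
    FORUM_ADMIN = auto()
    SYSTEM_ADMIN = auto()


def can(user_flags: Permission, needed: Permission) -> bool:
    """True if the user holds every flag in `needed`."""
    return (user_flags & needed) == needed


# Flags are granted independently of one another.
vocabulary_editor = Permission.CONTROLLED_NAME_ADMIN | Permission.CLASSIFICATION_ADMIN
moderator = Permission.FORUM_ADMIN | Permission.NEWS_ADMIN
```

With these definitions, `can(vocabulary_editor, Permission.CLASSIFICATION_ADMIN)` holds, while `can(vocabulary_editor, Permission.SYSTEM_ADMIN)` does not.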

Importing Data

Although the Metadata Tool will likely be the primary method of entering new data into the portal, sometimes—particularly during initial setup—an administrator may want to import records into the portal in bulk. To allow this, SPT supports importing tab-delimited resource records in a flexible format, with the first record in the imported file defining the order and meaning of fields in subsequent records. As with data entered via the Metadata Tool, SPT can adapt to some degree to field content in imported records; for example, dates or date ranges may appear in almost any common format and will be interpreted and stored correctly for use within the portal.
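A tab-delimited import in which the first record names the fields can be sketched with Python's standard `csv` module. The list of accepted date layouts below is an assumption for illustration (the article does not enumerate the formats SPT recognizes), and the month-name layouts rely on the default English locale.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical list of accepted layouts; SPT's real date parser is not described.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y", "%Y")


def parse_date(text: str) -> date:
    """Try each known layout in turn and return the first that fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def import_records(stream, date_fields=("Date",)):
    """Read tab-delimited records; the first row defines field order and names."""
    records = []
    for row in csv.DictReader(stream, delimiter="\t"):
        for name in date_fields:
            if row.get(name):
                row[name] = parse_date(row[name])
        records.append(row)
    return records


sample = (
    "Title\tDate\tURL\n"
    "Fossil Atlas\t11/05/2002\thttp://example.org/a\n"
    "Old Maps\t1890\thttp://example.org/b\n"
)
records = import_records(io.StringIO(sample))
```

Because the header row drives the mapping, the columns of an import file can appear in any order without changing the code.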

Finding Data

Once the proper metadata fields have been configured and populated with initial resource metadata records, collection developers may be ready to share their efforts with portal users. Of course, those users need some way of locating the specific information on the portal that best meets their needs, and SPT provides several routes toward achieving that end.

Browsing

The simplest and most familiar method for locating information on a Web site is browsing. SPT supports a hierarchical browsing interface, based on classifications assigned to the resources.

Figure 2. Screen shot of "Browse Resources" page.

Since classification hierarchies may be either wide (a large number of entries at the top level) or deep (a large number of levels) or both, the SPT browsing interface is dynamically generated, based on the structure of the classification tree. This prevents the browsing interface from becoming unwieldy when a given section of the tree is very broad, while still minimizing the need for users to search through multiple pages to find the classification they seek. To provide users with some idea of the distribution of entries through the classification tree, SPT displays the number of resources present under any branch of the tree. This can prove particularly useful to collection developers in assessing where their collection's strengths lie and where additional effort may be needed to round out the collection.
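The per-branch resource counts can be computed by crediting each resource to every ancestor of the classifications assigned to it. The following Python sketch assumes classifications are stored as tuples of path segments, which is a representation chosen for this example rather than SPT's own.

```python
from collections import Counter

# Hypothetical data: each resource lists its classification paths as tuples.
resources = {
    "Fossil Atlas": [("Science", "Paleontology")],
    "Dinosaur Digs": [("Science", "Paleontology", "Fieldwork")],
    "Star Charts": [("Science", "Astronomy")],
}


def branch_counts(resources):
    """Count distinct resources at or below every node of the tree."""
    counts = Counter()
    for paths in resources.values():
        # Gather each ancestor prefix once per resource so that a resource
        # with several classifications is not counted twice for one branch.
        branches = {path[:depth] for path in paths for depth in range(1, len(path) + 1)}
        counts.update(branches)
    return counts


counts = branch_counts(resources)
```

In this sample, the "Science" branch reports three resources and "Science / Paleontology" reports two, which is the kind of figure a browsing page could display beside each link.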

Searching

Because searching is often faster and more effective than browsing (provided the user has a good idea what they're looking for), many people prefer to search rather than browse to locate data on the Web. SPT provides two separate search mechanisms.

Keyword searching, the method familiar to most people, is very similar to the approach presented by Google™ and other general Web search engines. Users enter terms that are related to, or may appear in, the entries that they're looking for, and those terms are used to determine which resources best fit the search. SPT supports most of the conventions offered by sites like Google™, such as phrase searching (enclosing several words in quotation marks to indicate that the user is looking for the words in that specific sequence) and term exclusion (prepending a minus sign to a word to indicate that the user only wants results that do not include that term). Keyword search results in SPT are ordered by their relevance to the terms entered, as indicated by the search weights set in the Metadata Field Editor by the portal administrator. The administrator also uses the Metadata Field Editor to choose which fields are considered for keyword searches.
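To make the conventions concrete, here is a minimal Python sketch of a query parser that handles quoted phrases and minus-prefixed exclusions, followed by a weighted score. The weights, field names, and the simple substring-count scoring are assumptions for illustration; the article does not describe SPT's actual ranking algorithm.

```python
import re

# Hypothetical weights standing in for those set in the Metadata Field Editor.
FIELD_WEIGHTS = {"Title": 3, "Description": 1}


def parse_query(query):
    """Split a query into included terms/phrases and excluded terms."""
    include, exclude = [], []
    for minus, phrase, word in re.findall(r'(-?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        (exclude if minus else include).append(term)
    return include, exclude


def score(record, include, exclude):
    """Weighted count of term occurrences; 0 if any excluded term appears."""
    text = {f: str(record.get(f, "")).lower() for f in FIELD_WEIGHTS}
    if any(term in value for term in exclude for value in text.values()):
        return 0
    return sum(FIELD_WEIGHTS[f] * value.count(term)
               for term in include for f, value in text.items())


def search(records, query):
    """Return matching records, most relevant first."""
    include, exclude = parse_query(query)
    scored = [(score(r, include, exclude), r) for r in records]
    return [r for s, r in sorted(scored, key=lambda pair: -pair[0]) if s > 0]


records = [
    {"Title": "Rare Books Atlas", "Description": "An atlas of rare books."},
    {"Title": "Maps of Rare Books", "Description": "Rare books and maps."},
    {"Title": "Atlas", "Description": "Plain atlas."},
]
hits = search(records, '"rare books" -maps')
```

Matching here is by plain substring, so a production search engine would also need word boundaries, stemming, and an index; the sketch only shows how phrases, exclusions, and per-field weights interact.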

To better take advantage of the precision offered by the metadata assigned to resources, SPT also supports fielded searching. With a fielded search, users enter terms in a fashion similar to a keyword search, but along with the terms users can specify in which fields to look for the terms. For non-textual fields, fielded searching also allows users to specify constraints that can be used to narrow the search results. For example, when searching through a collection of digitized rare books stored online, a user could specify only entries that were published between 1885 and 1890.
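The rare-books example can be sketched as a fielded search that combines per-field text terms with an inclusive range constraint on a numeric field. The function name, parameters, and sample records below are hypothetical.

```python
def fielded_search(records, terms=None, ranges=None):
    """Filter records by per-field text terms and inclusive numeric ranges.

    terms:  {field: text that must appear in that field}
    ranges: {field: (low, high)} for non-textual fields
    """
    terms, ranges = terms or {}, ranges or {}
    results = []
    for record in records:
        text_ok = all(text.lower() in str(record.get(field, "")).lower()
                      for field, text in terms.items())
        range_ok = all(field in record and low <= record[field] <= high
                       for field, (low, high) in ranges.items())
        if text_ok and range_ok:
            results.append(record)
    return results


books = [
    {"Title": "Birds of Wisconsin", "Publisher": "State Press", "Year": 1887},
    {"Title": "Birds of Ohio", "Publisher": "State Press", "Year": 1902},
    {"Title": "River Surveys", "Publisher": "Survey Office", "Year": 1889},
]
hits = fielded_search(books, terms={"Title": "birds"}, ranges={"Year": (1885, 1890)})
```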

Figure 3. Screen shot of "Advance Search" page.

Extending fielded searching with what has become known in Internet jargon as "push technology," SPT also offers a feature called User Agents. This capability allows users to set up a fielded search that returns items they may find of interest, and then have that search performed automatically by the portal nightly or weekly, with any new results sent to the user via e-mail. This keeps the user abreast, in a timely fashion, of new resources as they become available, and benefits the portal developer by actively maintaining awareness of the portal among the user community. For resource metadata administrators, SPT also offers the option to run these searches on an hourly basis, which may facilitate editorial review or other workflow processes set up among a group of collection developers.
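The essential bookkeeping behind such a saved search is remembering which results have already been reported, so that each scheduled run mentions only new ones. The Python sketch below composes (but does not send) an e-mail message; the `UserAgent` class, its attributes, and the sample address are hypothetical and say nothing about how SPT schedules or stores these searches.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from typing import Callable, Optional, Set


@dataclass
class UserAgent:
    # Hypothetical structure for a saved search.
    email: str
    matches: Callable[[dict], bool]          # the saved search, as a predicate
    seen_ids: Set[int] = field(default_factory=set)

    def run(self, records) -> Optional[EmailMessage]:
        """Return a message listing matches not reported before, or None."""
        new = [r for r in records if self.matches(r) and r["id"] not in self.seen_ids]
        self.seen_ids.update(r["id"] for r in new)
        if not new:
            return None
        msg = EmailMessage()
        msg["To"] = self.email
        msg["Subject"] = f"{len(new)} new resource(s) match your saved search"
        msg.set_content("\n".join(r["Title"] for r in new))
        return msg


agent = UserAgent("scholar@example.org", lambda r: "fossil" in r["Title"].lower())
catalog = [{"id": 1, "Title": "Fossil Atlas"}]
first = agent.run(catalog)                       # reports "Fossil Atlas"
catalog.append({"id": 2, "Title": "Fossil Digs"})
second = agent.run(catalog)                      # reports only the new entry
third = agent.run(catalog)                       # nothing new, so no message
```

A scheduler such as cron could call `run` nightly, weekly, or hourly, and hand any returned message to a mail transport.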

Rating

An active user community and its contributions are key components of most successful Web portals. To leverage user community participation, SPT offers two features: resource rating and resource recommendations.

Resource rating allows users to indicate their opinion on the usefulness of an individual resource and to generate a cumulative rating for the resource based on these opinions. The cumulative ratings are beneficial both to other users, who can use them when determining which resources are most likely to meet their needs, and to the collection developers, who can use the ratings to monitor what portions of the collection users are finding most useful. Cumulative rating values are displayed graphically when browsing, searching, or viewing the full resource record.
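A cumulative rating can be as simple as the mean of the individual opinions, keeping one rating per registered user. The sketch below assumes a 1 to 5 scale and a plain average; the article does not state the scale or the formula SPT uses.

```python
from statistics import mean


class ResourceRatings:
    """Keeps one rating per user; the 1-5 scale is an assumption."""

    def __init__(self):
        self._by_user = {}

    def rate(self, user, value):
        if not 1 <= value <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._by_user[user] = value  # rating again replaces the earlier opinion

    def cumulative(self):
        """Average of all current ratings, or None if nobody has rated yet."""
        return mean(list(self._by_user.values())) if self._by_user else None


ratings = ResourceRatings()
ratings.rate("ann", 4)
ratings.rate("bob", 5)
ratings.rate("ann", 3)   # ann changes her mind
overall = ratings.cumulative()
```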

Figure 4. Screen shot of "Search Results" page.

As an adjunct to the resource ratings, SPT also provides users the ability to post comments on resources, which are then displayed along with the individual resource record. Again, this can benefit both other users and the collection developers by providing more detail about why users may have found a particular resource useful.

To help ensure the integrity of these facilities, only registered portal users may rate resources or post comments.

Recommending

Resource ratings provide information about the perceived usefulness of a resource to other users and the collection developers, but they can also represent a body of information about the needs and preferences of the users who assigned the ratings. To take advantage of this information, SPT includes a recommender system.

Figure 5. Screen shot of "Recommendation Sources" page.

A recommender system with which many people are familiar is that provided by Amazon.Com™, where a user rates a number of books and then, based on those ratings, Amazon recommends other books. The facility provided by SPT operates in a similar fashion, although it is a content-based recommender system rather than a collaborative recommender system such as Amazon's. A content-based system, which bases recommendations on item attributes, was chosen over a collaborative system [6] because collaborative systems, which base recommendations solely on preferences expressed by groups of users, typically require a very large number of ratings before they begin to offer useful recommendations.
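The difference between the two approaches is easiest to see in code: a content-based recommender needs only one user's ratings plus the attributes of the items. The following Python sketch builds a profile from the attributes of rated items and scores unrated items against it. The items, attributes, rating midpoint, and scoring rule are all assumptions for illustration and are not taken from SPT.

```python
from collections import Counter

# Hypothetical item attributes (for example, subjects and resource types).
ITEMS = {
    "Fossil Atlas":  {"paleontology", "atlas"},
    "Dinosaur Digs": {"paleontology", "fieldwork"},
    "Star Charts":   {"astronomy", "atlas"},
    "Moon Phases":   {"astronomy"},
}


def recommend(ratings, items=ITEMS, midpoint=3):
    """Rank unrated items by how well their attributes fit the user's ratings."""
    profile = Counter()
    for title, rating in ratings.items():
        for attribute in items[title]:
            # Ratings above the midpoint add weight; those below subtract it.
            profile[attribute] += rating - midpoint
    scores = {title: sum(profile[a] for a in attrs)
              for title, attrs in items.items() if title not in ratings}
    return sorted((t for t in scores if scores[t] > 0), key=lambda t: -scores[t])


picks = recommend({"Fossil Atlas": 5, "Moon Phases": 1})
```

Note that useful output appears after just two ratings from a single user, which is the property that motivates choosing a content-based design for a portal with a small user community.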

Presenting Data

Entering data into the portal and helping users find the portion of that data that meets their needs have both been discussed, but there still remains the problem of presenting that data to the user in an effective fashion.

Effective presentation can vary widely depending on the subject matter and intended audience. Fortunately, SPT provides several mechanisms that can be used to tailor a portal to meet these specific needs. Using these mechanisms does require some technical expertise, and the mechanisms do not have to be employed to build a useful discipline-based portal. However, if the technical expertise is available, in some situations these mechanisms can dramatically improve the portal experience for the user.

Dynamic Interface

Some portals may need to serve disparate user communities, presenting a different face or offering different features depending on the user. For example, a portal focused on educational resources about Paleontology may want to serve both grade school children and high school students, but a Web site design that can catch and hold the attention of an eight-year-old will likely be judged by a sixteen-year-old as too childish or condescending, and site designs well-suited for either of those groups will likely not be optimal for use by their teachers.

To address this type of situation, the Scout Portal Toolkit supports multiple dynamic user interfaces, assignable on a per-user basis. In practical terms, a portal can have two or more user interfaces that differ significantly from one another in appearance and functionality while still using the exact same underlying SPT installation, configuration, and metadata. Yet, those interfaces can be in use by different users simultaneously.

Figure 6. Screen shot of a custom user interface page.

In addition to the default user interface, SPT comes with an interface called "Clean Orange," which can be used as a starting point for creating your own custom interface. All pages are built by beginning with a common page template and then incorporating any page-specific HTML from a separate file. When page-specific HTML is not found for a given interface, the corresponding HTML from the default interface is used, so it is possible to dramatically alter the appearance of a portal site by adding a new interface containing just a redesigned common page template. This also allows changes or additions to be made on a per-page basis to alter appearance or add functionality, without having to create an entire new set of HTML pages.
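The fallback rule for page-specific HTML can be sketched as a two-step file lookup. The directory layout and file names below ("default", "CleanOrange", "Home.html") are assumptions made for this Python illustration; SPT's actual on-disk layout is not described in this article.

```python
import tempfile
from pathlib import Path


def find_page_html(root, interface, page, default="default"):
    """Return the page-specific HTML for an interface, else the default's copy."""
    for name in (interface, default):
        candidate = Path(root) / name / page
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(page)


def demo():
    """Build a throwaway layout where the custom interface overrides one page."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("default/Home.html", "default/Search.html", "CleanOrange/Home.html"):
            path = Path(root) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("<p>placeholder</p>")
        home = find_page_html(root, "CleanOrange", "Home.html")
        search = find_page_html(root, "CleanOrange", "Search.html")
        return home.parent.name, search.parent.name
```

In the demo, the home page is served from the custom interface while the search page falls back to the default one, so a new interface only needs to supply the files it actually changes.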

Customization Hooks

The dynamic interface support provided by SPT is intended primarily to allow customization via HTML, but there are times when more extensive changes or additions are warranted. To support this, SPT offers programming hooks for customization, where additional code may be linked in a way that will affect the operation of existing SPT functionality. Examples of this might include additional filtering of search results, or on-the-fly processing of resource metadata prior to display.
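One common way to offer such hooks is a registry of callbacks that existing code runs at fixed points. The Python sketch below shows the general pattern with a hypothetical "search_results" hook that filters results by language; the function and hook names are invented here and are not SPT's API.

```python
from collections import defaultdict

_hooks = defaultdict(list)


def register_hook(name, func):
    """Attach a custom function to a named hook point."""
    _hooks[name].append(func)


def run_hooks(name, value):
    """Pass a value through every function registered for the hook, in order."""
    for func in _hooks[name]:
        value = func(value)
    return value


def only_french(results):
    """Example customization: keep only resources in the selected language."""
    return [r for r in results if r.get("Language") == "French"]


register_hook("search_results", only_french)

results = run_hooks("search_results", [
    {"Title": "Bonjour!", "Language": "French"},
    {"Title": "Hola!", "Language": "Spanish"},
])
```

Because a hook with no registered functions returns its input unchanged, core code can call `run_hooks` everywhere without knowing whether a site has customized that point.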

Of course, new versions of the Scout Portal Toolkit may be released with additional functionality or enhanced performance. When an existing SPT installation is upgraded to a new version, interface or programming changes are preserved wherever possible.

Exporting Data

Sometimes when presenting data, the intended recipient is a computer rather than a human being. To address this need, SPT supports exporting data in three formats: RSS, OAI, and tab-delimited text.

RSS [7] is a well-established XML [8] format for syndicating online content, typically article titles or headlines. The first version of RSS was developed and released by Netscape in 1999 [9], and the format has since been adopted as a de facto standard [10] among weblogs and other Web sites where syndicated headlines are desired. SPT supports RSS version 0.92.
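For readers unfamiliar with the format, the following Python sketch builds a small RSS 0.92 document with the standard `xml.etree.ElementTree` module. It includes the channel's title, link, and description plus a title and link for each item; the function name, sample URLs, and record fields are assumptions for illustration and do not show SPT's own export code.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, description, resources):
    """Return an RSS 0.92 document listing the given resources."""
    rss = ET.Element("rss", version="0.92")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for res in resources:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = res["Title"]
        ET.SubElement(item, "link").text = res["URL"]
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example Portal",
    "http://portal.example.org/",
    "Newest resources",
    [
        {"Title": "Fossil Atlas", "URL": "http://example.org/a"},
        {"Title": "Star Charts", "URL": "http://example.org/b"},
    ],
)
```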

The OAI (or Open Archives Initiative [11]) format is an XML-based protocol for harvesting metadata. Developed to be a low-barrier (i.e., easily implemented) method for sharing metadata, the protocol has been adopted very rapidly by the online metadata community. SPT supports version 2.0 [12] of the OAI protocol.
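An OAI harvest is driven by ordinary HTTP requests that carry a `verb` argument. The Python sketch below builds such request URLs for the six verbs defined by version 2.0 of the protocol; the base URL is a placeholder, since the address at which a given SPT portal exposes its OAI interface is not given in this article.

```python
from urllib.parse import urlencode

OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL; arguments set to None are omitted."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    params = {"verb": verb}
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


BASE = "http://portal.example.org/oai"   # placeholder base URL
all_records = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc")
# "from" is a Python keyword, so it is passed through a dictionary.
recent = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc",
                         **{"from": "2002-11-01"})
```

The `oai_dc` metadata prefix requests unqualified Dublin Core, which every OAI repository is required to support.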

The tab-delimited export format matches the data import format described earlier and should be compatible with many common applications.

Each format targets a different audience. RSS will most commonly be used to share resource titles and information to be displayed on other Web sites, such as those implemented with uPortal [13]. OAI will most commonly be used when sharing data with other groups that are working with online metadata, such as NSF's National Science Digital Library (NSDL) initiative [14]. The tab-delimited format should be of use when collection developers want to manipulate data with other, non-Web-based applications.

Community-Building

Two additional features supported by SPT, News & Announcements and Forums, can be useful in building and maintaining a portal user community.

News & Announcements are items that usually appear on the front page of a portal, providing users with information about the portal itself or perhaps events of interest in the subject areas covered by the portal. A separate user permission flag is provided to allow the portal administrator to designate which users are responsible for posting and editing news and announcements.

Forums are the traditional online bulletin boards, integrated into the portal environment. They provide an area on the portal for discussion between portal users and for the portal administrator and collection developers to gather feedback from the user community.

SPT in Use (LearningLanguages.Net)

The Scout Portal Toolkit may be observed in daily use at LearningLanguages.Net [15], a discipline-based portal that focuses on online resources for English-speaking students learning French, Spanish, and Japanese.

Figure 7. Screen shot from LearningLanguages.net.

LearningLanguages.Net is a good example of the use of the customization hooks present in SPT, as the per-user language focus is built using those hooks. In particular, the search result and browsing hooks are employed to ensure that users are presented only with resource metadata entries that match the language selected by the user.

The Details

Requirements

SPT requires a Linux-based Web server that supports PHP [16] 4.0.6 (or later) and a database server running MySQL [17] 3.23 (or later). PHP must have been installed with MySQL support.

As for hardware requirements, SPT will run on almost anything that will support PHP and MySQL. If the portal will include a large number of resources (thousands or tens of thousands), collection developers will likely want to run SPT on faster PC hardware, because the search engine and recommender system can both be CPU-intensive.

Where to Get SPT

The Scout Portal Toolkit is available for download from the Internet Scout Project site at:

<http://scout.wisc.edu/research/SPT/download.html>

Two files are available there: the SPT software package and an installation script. The integrity of the files can be verified by checking their MD5 checksums against the values posted at the bottom of the page.
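Checking a download against a posted MD5 value takes only a few lines with Python's standard `hashlib` module. The sketch below reads the file in chunks so that large packages do not need to fit in memory; the small temporary file stands in for a real download, and the function names are chosen for this example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's checksum with the value posted on the download page."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Demonstration with a throwaway file standing in for a downloaded package.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
ok = verify(tmp.name, "5d41402abc4b2a76b9719d911017c592")
os.remove(tmp.name)
```

On most Linux systems the `md5sum` command-line utility performs the same check.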

With funding from the Andrew W. Mellon Foundation, the Scout Portal Toolkit was developed by Edward Almasy, Barry Wiegan, David Sleasman, Andy Yaco-Mink, and Rachael Bower. SPT is open source software, licensed under the GNU General Public License [18], and is available at no charge.

References

[1] Internet Scout Project, Department of Computer Sciences, University of Wisconsin-Madison. <http://scout.cs.wisc.edu>.

[2] Andrew W. Mellon Foundation, <http://www.mellon.org/>.

[3] Dublin Core Metadata Initiative, Dublin Core Element Set, <http://dublincore.org/documents/dces/>.

[4] Dublin Core Metadata Initiative, Education Working Group, <http://dublincore.org/groups/education>.

[5] Dublin Core Metadata Initiative, Administrative Metadata Working Group, <http://dublincore.org/groups/admin>.

[6] Breese, Jack; Heckerman, David; Kadie, Carl. "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison WI, 1998, Morgan Kaufmann. <http://www.research.microsoft.com/users/breese/cfalgs.html>.

[7] Backend.Userland.com, RSS 0.92, <http://backend.userland.com/rss092>.

[8] World Wide Web Consortium, Extensible Markup Language (XML), <http://www.w3.org/XML/>.

[9] Libby, Dan. RSS 0.91 Spec, Revision 3, <http://my.netscape.com/publish/formats/rss-spec-0.91.html>.

[10] Syndic8.com, <http://www.syndic8.com>.

[11] Open Archives Initiative, <http://www.openarchives.org>.

[12] The Open Archives Initiative Protocol for Metadata Harvesting, <http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm>.

[13] uPortal, <http://www.uportal.org/forum/>.

[14] National Science Digital Library, <http://about.nsdl.org>.

[15] Internet Scout Project, University of Wisconsin-Madison, LearningLanguages.net, <http://LearningLanguages.net>.

[16] PHP Group, <http://www.php.net>.

[17] MySQL AB, <http://www.mysql.com>.

[18] GNU General Public License, Version 2, June 1991, <http://www.gnu.org/licenses/gpl.txt>.

 

Copyright © Edward Almasy, David Sleasman and Rachael Bower

DOI: 10.1045/november2002-almasy