The Warwick Metadata Workshop:

A Framework for the Deployment of Resource Description

Lorcan Dempsey
UKOLN, University of Bath, UK
http://www.ukoln.ac.uk/~lisld/

Stuart L. Weibel
OCLC Office of Research, Dublin, Ohio, USA
http://purl.oclc.org/net/weibel

D-Lib Magazine, July/August 1996

ISSN 1082-9873

Contents

  1. Introduction
  2. Moving the Dublin Core Forward
    1. Scope and Constraints of the Dublin Core
    2. Target Uses for the Dublin Core
    3. Early Dublin Core Pilot Projects
    4. Other Simple Resource Description Models
    5. Impediments to Wider Deployment
  3. The Warwick Framework: an Architecture for Metadata
    1. The Need for the Warwick Framework
    2. Requirements for Deployment
    3. Moving Forward
  4. Conclusions and Directions
  5. References and Bibliography

1. Introduction

The first week of April 1996 found fifty representatives of libraries, Internet standards, text markup, and digital library projects converging at Warwick University for the OCLC/UKOLN Warwick Metadata Workshop. The conferees came from three continents, eleven countries, and numerous perspectives in an effort to apply their collective experience to the clarification of issues surrounding the effective deployment of metadata for networked information resources.

The meeting followed last year's OCLC/NCSA Metadata Workshop, which convened a similarly diverse collection of stakeholders and resulted in consensus on a simple resource description record that has come to be known as the Dublin Core. Indeed, the consensus itself may well have been the first workshop's most important deliverable. The thirteen elements of a Dublin Core record contain few surprises, focusing largely on what might be thought of as network resource bibliography and a little bit more [WEIB95a].

The Dublin Core has received considerable attention as a simple resource description record in the year since the first meeting. While the first workshop helped to focus discussion of the topic in many communities, the implementation of such a description record requires a formal syntax and a deployment strategy, both of which were beyond the scope of that first meeting.

Planning for the second workshop began with informal discussions between the UK Office for Library and Information Networking (UKOLN) and OCLC's Office of Research in the summer of 1995. The agenda for the meeting gradually crystallized around the theme of identifying and resolving impediments to deployment of a Dublin Core style record for resource description. The expectations of the organisers and participants were exceeded over the course of the meeting as conferees worked towards a number of related conclusions about the Dublin Core Metadata element set, about the need for a wider set of metadata types, and about an extensible framework for interchange of metadata of different types. A consensus about these issues emerged from the workshop and a set of concrete proposals for moving forward has been produced. The areas of consensus include:

  1. Dublin Core

  2. Warwick Framework

  3. Guide to Creation and Maintenance of Metadata

This paper provides a high-level overview of the issues discussed at the workshop. It brings together descriptions of the above outcomes and places them in context. Section 2 discusses the Dublin Core and the proposals for taking it forward. Section 3 discusses the rationale for the Warwick Framework.


2. Moving the Dublin Core Forward

2.1 Scope and Constraints of the Dublin Core

The Dublin Core Metadata element set is a set of thirteen metadata elements proposed by the first workshop as a core description record to facilitate discovery of document-like objects in a networked environment. To facilitate progress, a number of constraints were imposed on the discussions of the two-and-a-half-day workshop of April 1995:

Table 1. The Dublin Core Elements

Element       Description
Subject       The topic addressed by the work.
Title         The name of the object.
Author        The person(s) primarily responsible for the intellectual content of the object.
Publisher     The agent or agency responsible for making the object available in its current form.
Other Agent   The person(s), such as editors, transcribers, and illustrators, who have made other significant intellectual contributions to the work.
Date          The date of publication.
Object type   The genre of the object, such as novel, poem or dictionary.
Form          The physical manifestation of the object, such as PostScript file or Windows executable file.
Identifier    String or number used to uniquely identify the object.
Relation      Relationship to other objects.
Source        Objects, either print or electronic, from which this object is derived, if applicable.
Language      Language of the intellectual content.
Coverage      The spatial location and/or temporal duration characteristics of the object.
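As an illustration only, a record built from the elements in Table 1 can be sketched as a simple mapping from element names to values. The element names come from the table; the function name, the sample values, and the treatment of every element as optional and repeatable are assumptions made for this sketch, not part of the workshop's specification.

```python
# The thirteen element names are taken from Table 1. Everything else here
# (function name, sample values, optional/repeatable elements) is hypothetical.
DUBLIN_CORE_ELEMENTS = (
    "Subject", "Title", "Author", "Publisher", "Other Agent", "Date",
    "Object type", "Form", "Identifier", "Relation", "Source",
    "Language", "Coverage",
)


def make_record(**fields):
    """Build a record as a dict mapping element names to lists of values.

    Keyword names use underscores for spaces (other_agent -> "Other Agent").
    Names that are not in Table 1 raise ValueError.
    """
    by_key = {name.lower().replace(" ", "_"): name for name in DUBLIN_CORE_ELEMENTS}
    record = {}
    for key, value in fields.items():
        if key not in by_key:
            raise ValueError(f"not a Dublin Core element: {key}")
        # Accept either a single string or a sequence of strings.
        values = [value] if isinstance(value, str) else list(value)
        record[by_key[key]] = values
    return record


record = make_record(
    title="The Warwick Metadata Workshop",
    author=["Lorcan Dempsey", "Stuart L. Weibel"],
    date="1996",
)
```

Keeping the record this flat reflects the simplicity the element set aims for: nothing beyond the element name and its values is required to create a description.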

The 1995 Dublin Metadata Workshop is described in greater detail in:
[WEIB95a] and [WEIB95b].

The reference description of the element set can be found at:
http://purl.org/metadata/dublin_core_elements

2.2 Target Uses for the Dublin Core

The development of the Dublin Core is motivated by several intended uses:

  1. A simple interchange format for descriptive metadata
  2. Content self-description for networked objects
  3. Semantic interoperability across domains

It is clear from early implementation experience (see 2.3 below) that projects have adopted the semantic flavor of the Dublin Core to develop simple resource description formats. The Dublin Core is intended to fill the niche between the terseness of the unstructured full-text web indexes and the structured description of more complex models such as MARC. It is intended to be sufficiently rich to support useful fielded retrieval but simple enough not to require specialist expertise or extensive manual effort to create.

Simplicity is especially important in the context of author-generated metadata. Conferees at both the 1995 and the 1996 workshops recognized the importance of embedded metadata in Web documents to be harvested by software robots. The key to success is balancing the need for well-structured metadata with the requirement that the creation of the description is manageable by authors.

Future applications will have to work with different types of metadata from different sources. The Dublin Core was positioned to provide a common set of tags that would have recognizable meaning across description models, and in that way provide a unifying semantics among many disciplines. The National Document and Information Service (described below), a joint project of the National Library of Australia and the National Library of New Zealand, is one example of such a use.

2.3 Early Dublin Core Pilot Projects

Even absent a clearly defined syntax, the Dublin Core element set attracted the interest of a number of early adopters who developed projects that built on the consensus that emerged from the Dublin Metadata Workshop. Some of these include:

2.4 Other Simple Resource Description Models

Among the factors that motivated the Warwick Framework, described later in this paper, is the certainty that a variety of resource description models will emerge from different communities. A successful architecture of network resource description must accommodate such diversity.

Examples of other simple resource description models discussed at the workshop include RFC 1807 and IAFA templates:

2.5 Impediments to Wider Deployment

Among the major goals of the Warwick Workshop was the identification of impediments to successful deployment of a simple Internet resource description format such as the Dublin Core. Early workshop discussions identified four areas requiring substantive progress:

Specification of a Transfer Syntax

Discussions of syntax are often difficult, burdened as they are with the biases of familiarity and competing methodologies. The earlier Dublin Workshop made progress partly because such discussions were ruled out of scope. However, consensus concerning semantics cannot be deployed without a concrete syntax (or syntaxes). In pilot implementations, the absence of a common model led to different syntax and structuring choices. Clearly, any widespread deployment of Dublin Core (or any similar description scheme) hinges on reaching consensus about a transfer syntax.

Since the Web is currently the primary medium of the Internet, it was further recognized that deployment of metadata in the Web is the primary strategic application; successful deployment of metadata in HTML is necessary, though almost certainly not sufficient.

A working group on syntax formed around this issue, and this group has elaborated a position paper describing a formal syntax for Dublin Core Metadata. The paper, A Syntax for Dublin Core Metadata [BURN96] (Burnard, Miller, Quin, and Sperberg-McQueen), includes:

  1. A concrete syntax expressed as an SGML DTD
  2. A mapping of this DTD into existing HTML tags using the meta element of HTML 2.0
  3. A proposal for 'keeping the metadata at arm's length' by allowing metadata consumers to recognise references to external metadata using the LINK element.

In related developments, a convention for embedding metadata in HTML was proposed in a break-out group at the subsequent W3C Distributed Indexing and Searching Workshop, May 28-29, 1996. This break-out group included representatives of the Dublin Core/Warwick Framework Metadata meetings, representatives of several major Web search vendors (Lycos, Microsoft, WebCrawler), various other software vendors, and the W3 Consortium.

The problem is to identify a simple means of embedding metadata within HTML documents without requiring additional tags or changes to browser software, and without unnecessarily compromising current practices for robot collection of data.

While metadata is intended for display in some situations, it is judged undesirable for such embedded metadata to display on browser screens as a side effect of displaying a document. Therefore, any solution requires encoding information in the attributes of tags rather than as container element content.

The goal was to agree on a simple convention for encoding structured metadata information of a variety of types (which may or may not be registered with a central registry analogous to the MIME Type registry). It was judged that a registry may become a necessary feature of the metadata infrastructure as alternative schemata are elaborated, but that short-term deployment could go forward without one, especially in light of the proposed use of the LINK tag to link descriptions to a standard schema description, as described below.

The solution agreed upon is to encode schema elements in META tags, one element per META tag, and as many META tags as are necessary. Grouping of schema elements is achieved by a prefix schema identifier associated with each schema element.
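The convention just described can be sketched in a few lines. The sketch emits one META tag per schema element and groups the elements by a schema prefix. The "DC" prefix, the dot used to join prefix and element name, the lower-case element names, and the function name are assumptions made for illustration; the proposal itself [WEIB96] should be consulted for the exact form.

```python
from html import escape


def meta_tags(schema_id, elements):
    """Return one META tag per (element, value) pair, prefixed by schema_id.

    The "schema.element" naming with a dot separator is an assumption
    of this sketch.
    """
    lines = []
    for name, value in elements:
        lines.append(
            '<META NAME="{}.{}" CONTENT="{}">'.format(
                escape(schema_id, quote=True),
                escape(name, quote=True),
                escape(value, quote=True),  # keep quotes from breaking the attribute
            )
        )
    return "\n".join(lines)


tags = meta_tags(
    "DC",
    [
        ("title", "The Warwick Metadata Workshop"),
        ("author", "Lorcan Dempsey"),
        ("author", "Stuart L. Weibel"),
    ],
)
print(tags)
```

Repeated elements, such as the two authors here, simply become repeated META tags, so no new tags or browser changes are needed.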

A convention for linking resource description tags to the reference definition of the metadata schema (or schemata) used in a document was also proposed. Doing so serves as a primitive registration mechanism for metadata schemata, and lays the foundation for a more formal, machine-readable linkage mechanism in the future [WEIB96].
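From the point of view of a harvesting robot, the two conventions above might be consumed as in the following sketch, which uses Python's standard html.parser module. The "schema.element" naming of META tags, the "SCHEMA." prefix on the LINK relation, and the class and attribute names are assumptions made for illustration. The schema URL is the reference description of the Dublin Core element set cited earlier in this paper.

```python
from html.parser import HTMLParser


class MetadataHarvester(HTMLParser):
    """Collect schema-prefixed META elements and SCHEMA.* LINK references."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # schema id -> list of (element, content) pairs
        self.schemas = {}   # schema id -> URL of the schema's reference definition

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        rel = attrs.get("rel") or ""
        if tag == "meta" and "." in name:
            schema, element = name.split(".", 1)
            self.elements.setdefault(schema.lower(), []).append(
                (element, attrs.get("content") or "")
            )
        elif tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel.split(".", 1)[1].lower()] = attrs.get("href")


page = """<HTML><HEAD>
<TITLE>Example</TITLE>
<META NAME="DC.title" CONTENT="The Warwick Metadata Workshop">
<META NAME="DC.author" CONTENT="Lorcan Dempsey">
<META NAME="DC.author" CONTENT="Stuart L. Weibel">
<LINK REL="SCHEMA.dc" HREF="http://purl.org/metadata/dublin_core_elements">
</HEAD><BODY>Document text.</BODY></HTML>"""

harvester = MetadataHarvester()
harvester.feed(page)
```

After the page is fed to the parser, harvester.elements holds the Dublin Core pairs under the key "dc", and harvester.schemas maps the same key to the schema's reference definition, which is the primitive registration mechanism the text describes.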

The proposed conventions are described more fully in http://www.oclc.org:5046/~weibel/html-meta.html

Development of User Guides

Resource descriptions might be created by a number of different agents in the metadata chain: authors, collection administrators, and third-party catalogers. Guidelines for the creation of metadata are needed. A guide for authors themselves would be especially useful in supporting a move to document-embedded descriptions, and at least one producer of HTML authoring tools (SoftQuad, Ltd.) has committed to embedding Dublin Core resource description templates in their products when the syntax and guidelines are sufficiently stable.

A working group on user guides formed at Warwick around the task of providing such guidelines [KUNZ96]. Their efforts are evolving and are linked to the Dublin Core home page.

Extensibility -- Mixing and Matching Metadata

The Dublin Core addresses one particular niche of the metadata ecology. It is a simple resource description format that is intended to be extensible in at least two ways. As its name implies, it is intended to provide a commonly understandable core of elements that will help unify different models of resource description. Its simplicity is among its major virtues, but users may well wish to augment description of their resources with additional data.

Original concepts of extensibility for the Dublin Core assumed a mechanism for local extensions -- additional elements added at the discretion of authors or collection maintainers. Such local information may be critical to the effective use of a particular collection, though the local character of such elements may not be of general interest or usefulness.

Of perhaps greater importance is the need to link Dublin Core records to other, richer description schemes (for example, MARC). The ability to link a simple description record to a richer description model provides a means to promote one record type to a more complete description as warranted, and also affords a more continuous axis of resource description (from simple to complex) to suit a variety of user or system needs.

Additionally, Dublin Core data address only one niche of the metadata ecology (resource description for search and retrieval). Other types of description are necessary, as well: terms and conditions (who must pay what to whom, for example), archival status, administrative metadata, and others.

Finally, there are competing models of resource description that overlap the Dublin Core to one degree or another. RFC 1807 and IAFA templates discussed above are examples of such formats. Workshop discussions on extensibility merged with this recognition of the need to accommodate different description models. No single format for resource description will fill all the needs, nor could such a monolithic model be easily maintained. The consensus of the workshop converged on a need for an architecture that would accommodate the diversity of models and levels of description that characterize the heterogeneous world of electronic resources.

The proposal that emerged from these discussions is known as the Warwick Framework, discussed in detail in a companion article by Carl Lagoze in this issue of D-Lib Magazine. It is an architecture for the aggregation and interchange of discrete metadata packages. Such an architecture will afford the opportunity to mix and match metadata sets, allowing rational deployment of many existing and emergent description models. The following section summarizes the essential features of the Warwick Framework.


3. An Architecture for Metadata: The Warwick Framework

3.1 The Need for the Warwick Framework

No single element set will satisfy all metadata requirements. Different communities of users or different application areas will require data of different elements and levels of complexity. The Workshop took as its starting point the Dublin Core, a simple scheme for what might be thought of as electronic bibliography. However, other application areas might require the fullness and structure provided by a MARC-type record, for example, or might have domain-specific descriptive requirements not addressed in the Dublin Core. At the same time, other types of data exist that are outside the scope of the Dublin Core: terms and conditions and evaluative data, for example.

Satisfying the need for competing, overlapping, and complementary metadata models requires an architecture that will accommodate a wide variety of separately maintained metadata models. It was concluded that an architecture for the interchange of metadata packages was required. A package is conceived as a metadata object specialized for a particular purpose. A Dublin Core-based record might be one package, a MARC record another, terms and conditions another, and so on. Such discrete packages might be numerous and varied in content and even source. Users or software agents would need the ability to aggregate these discrete metadata packages in a conceptual container (a metadata basket of sorts), hence the notion of a container-package architecture.

This architecture should be modular, to allow for differently typed metadata objects; extensible, to allow for new metadata types; distributed, to allow external metadata objects to be referenced; and recursive, to allow metadata objects to be treated as 'information content' and to have metadata objects associated with them.

Packages are typed objects. They may be primitive (a package is one of a number of separately defined, primitive metadata formats); indirect (a package may be a reference to an external object); or a container (a container is a collection of metadata objects, which may in turn be packages or other containers).
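The three kinds of package described above lend themselves to a small data-structure sketch. What follows is one possible reading, not the architecture's definition: the class names, the type strings, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class PrimitivePackage:
    """A package in one separately defined metadata format."""
    metadata_type: str  # hypothetical type name, e.g. "dublin-core"
    content: str


@dataclass
class IndirectPackage:
    """A reference to a metadata object held elsewhere."""
    metadata_type: str
    url: str


@dataclass
class Container:
    """A collection of metadata objects: packages or further containers."""
    members: List["Member"] = field(default_factory=list)


Member = Union[PrimitivePackage, IndirectPackage, Container]


def walk(container: Container) -> Iterator[Union[PrimitivePackage, IndirectPackage]]:
    """Yield every package in the container, descending into nested containers."""
    for member in container.members:
        if isinstance(member, Container):
            yield from walk(member)
        else:
            yield member


basket = Container([
    PrimitivePackage("dublin-core", "Title: The Warwick Metadata Workshop"),
    IndirectPackage("marc", "http://example.org/records/123.marc"),  # placeholder URL
    Container([PrimitivePackage("terms-and-conditions", "Free for non-commercial use")]),
])
types = [package.metadata_type for package in walk(basket)]
```

Because a container may hold other containers, the traversal is recursive. Because an indirect package holds only a reference, the basket can aggregate metadata that is maintained elsewhere.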

Several benefits flow from a container-package approach:

The Warwick Framework is a high-level container architecture: it makes no assumptions about the contents of the packages. Nor can it be assumed that clients (or agents) will be able to interpret all packages. Conferees agreed that packages should be strongly typed and that a registry for metadata types will probably be required, perhaps along the same lines as the IANA registry for Internet Media Types (also known as MIME types).
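A client's handling of typed packages might look like the following sketch: packages whose type appears in a registry are interpreted, and the rest are passed over rather than treated as errors. The registry contents, the type names, and the 'Element: value' line format are assumptions made for illustration.

```python
def parse_dublin_core(body):
    """Parse 'Element: value' lines (a format assumed for this sketch) into pairs."""
    pairs = []
    for line in body.splitlines():
        if ":" in line:
            element, value = line.split(":", 1)
            pairs.append((element.strip(), value.strip()))
    return pairs


# Hypothetical registry: metadata type name -> function that interprets a package body.
REGISTRY = {"dublin-core": parse_dublin_core}


def interpret(packages, registry=REGISTRY):
    """Interpret packages of registered types; list the types that were skipped."""
    understood, skipped = [], []
    for metadata_type, body in packages:
        handler = registry.get(metadata_type)
        if handler is None:
            skipped.append(metadata_type)  # unknown type: leave the package untouched
        else:
            understood.append((metadata_type, handler(body)))
    return understood, skipped


understood, skipped = interpret([
    ("dublin-core", "Title: The Warwick Metadata Workshop\nDate: 1996"),
    ("marc", "(an opaque MARC record)"),
])
```

Strong typing is what makes this safe: a client can decide from the type alone whether to interpret a package, without inspecting its contents.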

3.2 Requirements for Implementation

Concrete Implementations

The architecture needs to be realized in one or more concrete implementations. Proposals for MIME- and SGML-based implementations have been prepared, as well as a discussion of the architecture in a distributed object environment.
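To give a feel for what a MIME-based realization could look like, the sketch below uses Python's standard email package to wrap two packages in a multipart container and read them back. It does not reproduce the proposal in [KNIG96]: the X-Metadata-Type header, the type names, and the plain-text package bodies are all hypothetical.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The container is a multipart message; each body part is one package.
container = MIMEMultipart("mixed")

dublin_core = MIMEText("Title: The Warwick Metadata Workshop\n", "plain", "us-ascii")
dublin_core["X-Metadata-Type"] = "dublin-core"  # hypothetical header naming the type
container.attach(dublin_core)

terms = MIMEText("Free for non-commercial use\n", "plain", "us-ascii")
terms["X-Metadata-Type"] = "terms-and-conditions"
container.attach(terms)

# Serialize for interchange, then parse as a receiving client would.
wire_form = container.as_string()
parsed = message_from_string(wire_form)
types = [part["X-Metadata-Type"] for part in parsed.get_payload()]
```

A multipart message may itself contain multipart parts, so the recursive nesting of containers maps naturally onto MIME.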

Registration

A registry agency for metadata object types needs to be established. Early implementation pilot projects should not be hampered by the lack of such an agency, but as more metadata sets are elaborated by various stakeholders, a formal means for managing changes will be important.

3.3 Moving Forward

The Warwick Framework was enthusiastically welcomed at the workshop as a practical approach to the effective integration of metadata into a global information infrastructure. The realization of such an architecture will require great effort on many fronts, in many communities. The great hope is that the consensus achieved at this meeting will have provided the foundation for coordination, and sufficient freedom in the proposed architecture to allow progress without an undue burden of close coordination.

The following working papers address aspects of the Warwick Framework more fully:


4. Conclusions and Directions

Conferees left Warwick convinced that significant progress had been made in important areas. This conviction is corroborated by the rapid appearance of a number of documents supporting key decisions and recommendations.

The consensus concerning embedding metadata in HTML reached at the W3C workshop on Distributed Indexing and Searching provides an encouraging impetus to rapid deployment of richer resource description techniques on the Web along the lines developed in the Warwick Workshop.

The recent appearance of a Dublin Core implementation based on these developments (http://archaeology.ahds.ac.uk/project/metadata/dublin.html) is a promising indicator of the need and demand for better resource description on the Internet, and of the speed with which such ideas can be promulgated when community consensus emerges.

It is hoped that the Warwick Workshop will prove to have galvanized such a consensus and provided an important signpost for the development of more effective networked resource description.


5. References and Bibliography

  1. [BURN96] A Syntax for Dublin Core Metadata Lou Burnard, Eric Miller, Liam Quin, C.M. Sperberg-McQueen <URL:http://purl/org/metadata/dublin_core/.syntax.html>.
  2. [CAPL96] Metadata for Internet Resources: The Dublin Core Metadata Elements Set and Its Mapping to USMARC Cataloging and Classification Quarterly, Priscilla Caplan and Rebecca Guenther, (In Press).
  3. [GUEN95a] DISCUSSION PAPER NO. 86: Mapping the Dublin Core Metadata Elements to USMARC submitted to The USMARC Advisory Group by Rebecca Guenther of the Library of Congress, June 1995. <URL:gopher://marvel.loc.gov:70/00/.listarch/usmarc/dp86.doc>
  4. [GUEN95b] DISCUSSION PAPER NO. 88: Defining a Generic Author Field in USMARC submitted to The USMARC Advisory Group by Rebecca Guenther of the Library of Congress, May, 1995. <URL:gopher://marvel.loc.gov:70/00/.listarch/usmarc/dp88.doc>
  5. [HAKA96] Warwick Framework and Dublin Core set provide a comprehensive infrastructure for network resource description. Juha Hakala, Ole Husby, and Traugott Koch June 10, 1996. <URL:http://www.bibsys.no/warwick.html>.
  6. [KNIG96] A MIME implementation for the Warwick Framework Jon Knight and Martin Hamilton, 1996 <URL:http://weeble.lut.ac.uk/MIME-WF.html>.
  7. [KUNZ96] Guide to Creating Dublin Core Descriptive Metadata John Kunze, Priscilla Caplan, Bemal Rajapatarana, Frank Roos, and Tom Baker. Work in progress <URL:http://purl.oclc.org/metadata/dublin_core/guide>.
  8. [LAGO96] The Warwick Framework: a container architecture for aggregating metadata objects Carl Lagoze, Clifford Lynch, and Ron Daniel.
    An overview of the Warwick Framework Architecture.
  9. [LASH95] RFC 1807: A Format for Bibliographic Records Rebecca Lasher and D. Cohen, June, 1995. <URL:http://ds.internic.net/rfc/rfc1807.txt>.
  10. [MILL96] Archaeology Data Service Paul Miller, Graphics & GIS Advisor, University Computing Service, University of Newcastle, Claremont Road, Newcastle upon Tyne NE1 7RU <URL:http://www.ncl.ac.uk/~napm1/ads/metadata>.
  11. [WEIB95a] OCLC/NCSA Metadata Workshop Report: The Essential Elements of Network Object Description Stuart Weibel, Jean Godby, Eric Miller and Ron Daniel. June, 1995 <URL:http://purl.oclc.org/oclc/rsch/metadataI>.
  12. [WEIB95b] Metadata: The foundations of Resource Description Stuart Weibel. D-Lib Magazine, July, 1995 <URL:http://www.dlib.org/dlib/July95/07weibel.html>.
  13. [WEIB96] A Proposed Convention for Embedding Metadata in HTML A working group report from the W3C Distributed Indexing Workshop, May 28-29, 1996 (reported by Stuart Weibel) <URL:http://www.oclc.org:5046/~weibel/html-meta.html>.
  14. [LCON95] Proposal No. 96-2: Define a Generic Author Field in the Bibliographic, Authority, Classification, and Community Information Formats Library of Congress, Network Development and MARC Standards Office. Washington, D.C.: Library of Congress, 1996.

Acknowledgements

The authors are indebted to many organizations and individuals that paved the way for this work and contributed substantively to the success achieved.

July 15, 1996

Copyright © 1996 Lorcan Dempsey and Stuart L. Weibel


hdl://cnri.dlib/january96-weibel