D-Lib Magazine
Andy Powell, Michael Heaney and Lorcan Dempsey
Introduction

The description of collections is becoming increasingly important in the context of networked information services and is an important underpinning for developing a collective resource. This view has emerged clearly through the MODELS project [1], where it has influenced the course of the clumps and hybrid library projects [2] that are working with collection and service descriptions, and through UKOLN's recent work on retrospective conversion [3]. In the latter case, a strong view is emerging that libraries need to complement item-based description with description at a higher level. A particular feature of this discussion is that such description would complement current work in the archives community, and that descriptions at this shared level of granularity would facilitate cross-domain working (while acknowledging that collections may mean different things in the library, archival and other content models). This has been corroborated by recent work that looks at research issues shared by libraries, archives and museums, where it was recognized that description at this level would support higher-level navigation of the cultural resource and the selection of particular resources for further searching [4].

The creation of collection descriptions allows the owners or curators of collections to disclose information about their existence and availability to interested parties. Although collection descriptions may take the form of unstructured textual documents, for example a set of Web pages describing a collection, there are significant advantages in describing collections using structured, open, standardized, machine-readable formats. Such descriptions enable:
There are additional advantages where catalogues do not exist for collections, as a collection description may give the remote user some indication of content and coverage.

This article describes work undertaken as part of the RSLP Collection Description Project, a project funded by the UK Research Support Libraries Programme (RSLP) [5] with the aim of enabling all projects funded through the Programme to describe collections in a consistent and machine-readable way. With additional funding from OCLC, the project has developed a model of collections and their catalogues [6]. We have used this work as the basis of a collection description metadata schema [7], implemented using the Resource Description Framework (RDF) [8]. A Web-based tool has been developed that allows RDF descriptions to be constructed by filling in a Web form [9]. Associated with this tool are a detailed set of data entry guidelines [10] and an enumerated list of collection types [11]. Future work will see the development of a Web robot that will harvest collection descriptions from project Web sites and make them available through a central search service.

Although it has its origin in the Research Support Libraries Programme, many of whose results will be digital resources of one kind or another, our work is not restricted to the description of digital collections. The results of the project are intended to be applicable to physical and digital collections of all kinds, including library, art and museum materials, and by no means only to the resources of large research libraries.

In the specific context of the Programme, the intention of the project was to avoid the costs that would otherwise arise from not adopting a consistent, machine-readable description at an early stage. Such costs might have fallen on users and managers of collections alike:
Our work suggests that requirements for collection description fall into three broad informational categories. Firstly, descriptive information about the collection, which may include the subject area, ownership, strengths and weaknesses, and sources of items within the collection. Secondly, information about how to access the collection, including physical access in the case of library, museum or archival collections, for example, or networked access in the case of digital collections. Thirdly, the terms and conditions associated with access to the collection and to individual items within it.

The term collection can be applied to any aggregation of individual items. Collections are exemplified in the following non-exhaustive list: library collections; museum collections; archives; library, museum and archival catalogues; digital archives; Internet directories; Internet subject gateways; robot-generated Web indexes; collections of text, images, sounds, datasets, software, other material or combinations of these (this includes databases, CD-ROMs and collections of Web resources); and other collections of physical items.

This is a broad list of overlapping categories. It nevertheless suggests the need for a planned approach, both so that the techniques adopted fit well with broader resource discovery directions, and so that they are flexible enough to cope with the many collection types that libraries will develop and to indicate the relationships between them.

It is worth noting that the list includes collections of physical items, collections of digital surrogates of physical items and collections of born-digital items. It is also worth noting that some collections are actually catalogues (metadata) for other collections; for example, a library catalogue typically describes the items in one or more collections within a library. Finally, collections are often composed of other collections.
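Because collections nest, a description tool needs a recursive structure. The following Python sketch is illustrative only: the `Collection` class and its `walk` method are hypothetical and are not part of the RSLP schema or tool. The part/whole relations are taken from the Morrison Collection example (dcq:hasPart, dcq:isPartOf) given later in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Collection:
    """Hypothetical, minimal stand-in for a described collection."""
    title: str
    parts: list[Collection] = field(default_factory=list)  # sub-collections (cf. dcq:hasPart)

    def walk(self, depth: int = 0) -> Iterator[tuple[int, str]]:
        """Yield (depth, title) for this collection and every nested sub-collection."""
        yield depth, self.title
        for part in self.parts:
            yield from part.walk(depth + 1)


morrison = Collection(
    "Morrison Collection of Chinese Books",
    [Collection(f"{subject} collection within Morrison Collection of Chinese Books")
     for subject in ("Literature", "Classics", "History", "Philosophy")],
)
soas = Collection("SOAS Library", [morrison])  # cf. dcq:isPartOf SOAS Library

# Print the containment tree, indenting each level.
for depth, title in soas.walk():
    print("  " * depth + title)
```

A depth-first walk is used simply because it prints the containment tree in a readable order; any traversal would do.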
An analytic model of collections and their catalogues

The collection description model is aimed in the first instance at those responsible for developing collection descriptions. It is also a general contribution to the debate about metadata in the digital age. As described above, its initial use is to inform the construction of a demonstrator to which all relevant RSLP projects can feed information. Although the primary purpose of the model is to illuminate the process of resource discovery by users, collection description also serves collection management purposes, particularly in discharging an institution's curatorial responsibilities. The work focuses primarily on the needs of libraries in describing their collections, but also takes into account the requirements of other sectors. Collection description itself may take a variety of forms, and the model makes no presumption about the format of such a description.

The information landscape can be seen as a contour map in which there are mountains, hillocks, valleys, plains and plateaus. A large general collection of information, for example a research library, can be seen as a plateau, raised above the surrounding plain. A specialized collection of particular importance is like a sharp peak. On a plateau there might be undulations representing strengths and weaknesses. The scholar surveying this landscape is looking for the high points. A high point represents an area where the potential for gleaning desired information by visiting that spot (physically or by remote means) is greater than elsewhere. To continue the analogy, in the initial survey the scholar is concerned to identify areas rather than specific features - to identify rainforest rather than to retrieve an analysis of the canopy fauna of the Amazon basin. The model attempts to characterize that initial part of the process of information retrieval. The landscape is, however, multidimensional.
Where one scholar sees a peak, another may see a trough. The task is to devise mapping conventions that enable scholars to read the map of the landscape fruitfully, at the appropriate level of generality or specificity. The IFLA study Functional Requirements for Bibliographic Records [12] identifies (pp. 8-9) four functions of records, progression through which may be seen as a successful traverse of the information landscape and the attainment of one's goal. These are to find, to identify, to select and to obtain.
The first two of these activities are associated with the traditional areas of catalogue codes, access and description. The relations they embody are characteristically static, or at least persistent, and a static model may adequately represent them - they are the map of the landscape. The last two reflect the more active operations involved in retrieving and using information; they are transactional in nature, and a dynamic or event-driven model may be more appropriate for them - they represent attempts to use the map to reach the areas of interest. The model attempts to encompass the first two activities. There are, however, many links to be made between all the elements in the process of obtaining information, and these links may be expressed reciprocally. Determining and describing the part of a link that may be embodied in the model inevitably determines the nature of the complementary half, though the objects at the other end may not be described fully, or at all.

The example collections listed above can be divided into collections of entities (e.g. books) or of derived representations of entities (e.g. photographs of pieces of sculpture) on the one hand, and collections of information about such entities on the other. This article refers to a collection of entities as a 'Collection' and to a collection of information about such entities as a 'Collection-Description'. Some types of Collection-Description can themselves be seen as Collections, in this case of metadata rather than of primary information. The Creators, Producers, etc., of the secondary Collection will not necessarily be those of the Collection it catalogues, however. Moreover, the secondary Collection can have its own recursive Collection-Description. This article also uses Collection-Description to encompass both intellectually created resources and passive assemblages of data such as those gathered by robotic search engines.
The model says nothing explicit about the size of a Collection; it is possible to envisage a Collection consisting of one Item. Where an institution can choose between different degrees of aggregation in determining what its Collections are, nothing in the structure of the model requires or predisposes a particular level of aggregation. The institution should base its choice on its own pragmatic grounds, such as the level of detail required to make explicit those elements of the Collection-Description that it deems useful or necessary for resource discovery or collection management. In other words, institutions should adopt a functional granularity approach.

A highly simplified view of the model is presented here:

Content is an intellectual creation, without reference to any particular instantiation.

Item is the concrete (whether physical or electronic) realization of Content. Note that, in so far as the model is concerned with collections, the entities Content and Item are considered only to the extent that their types and attributes impinge upon Collection Description. In the vast majority of cases, too, the Items will coincide with what FRBR calls Items, not Manifestations. Item has been chosen as the most neutral term, in preference to other terms that have been used such as 'Document' or 'Document-like Object'; it can most easily embrace all of the concepts of physical and electronic, text and non-text, and human and natural creations.

A Collection is an aggregation of physical and/or electronic Items.

A Location is the place (identified physically or electronically) where a Collection is held. Note that it is important to distinguish between the place and the institution responsible for the place; the latter is represented in this model by the term Administrator.

A Creator is responsible in some way for the existence of the intellectual Content of an Item.
A Producer is responsible for the existence of the physical or electronic form in which an Item is realized.

A Collector gathers Items together.

An Owner is the Agent who has legal possession of a Collection.

An Administrator has responsibility for the physical or electronic environment in which a Collection is held.

The model separates Agents (Creator, Producer, Collector, Owner and Administrator), shown on the left-hand side, from Objects (Content, Item, Collection and Location), shown on the right. Agents are people or organizations. Agents initiate actions: for example, they create Content, produce Items, gather Items into Collections and administer Locations. Agents have rights. An Agent can have many roles at the same time; for example, the Collector of a collection may also be its Owner. Agents also control the usage of collections: they determine who has access rights to the Collection and its Location, and who holds copyright and ownership.

Kinds of collection descriptions

The model defines four broad classes of collection description: the Unitary Finding-Aid, the Analytic Finding-Aid, the Hierarchic Finding-Aid and the Indexing Finding-Aid.
In three of the four types of Collection-Description the information conveyed is analytic: that is, it is held in discrete packets (e.g. catalogue records). Although the packets may be brought together and presented as the result of a search, or organized in a particular sequence (e.g. by author's name), they are largely independent of each other.

Two qualifications must be made to this, however. First, a Collection-Description may have some overall structure that reduces the autonomy of its constituent elements; that is, it may be necessary to know the placement of a catalogue record within the structure of the catalogue - its context - in order to interpret the record correctly. This is always true to some extent, and the participants in the Toronto conference on the principles and development of AACR [13] stressed the weaknesses in online catalogues resulting from the loss of such contextual information (for example, the ordering of result sets is often effectively arbitrary). It is particularly true of the established practices for cataloguing archival collections (see the rules for multilevel description in ISAD(G) [14]).

Second, with Internet resources the distinctions may become blurred. Take, for example, a World Wide Web site devoted to the works of Kipling. Viewed as a whole, the site may be said to be a collection of entities or of derived representations of entities. However, if much the same list of links can be retrieved by a search on (say) Yahoo, does this make Yahoo a Collection in our definition instead of a Collection-Description? We take the view that ownership, administration and location are relevant to the definition of a collection.
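One way to make this position concrete is to build it into a data model, so that a Collection cannot be recorded without an Owner and an administered Location. The Python sketch below is hypothetical and is not part of the RSLP schema: class and field names follow the model's entities, Content, Items, Creators and Producers are omitted for brevity, and treating SOAS as both Owner and Administrator of the Morrison Collection is an assumption made for illustration. It also shows a single Agent holding several roles at once.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """The five Agent roles defined in the model."""
    CREATOR = "Creator"
    PRODUCER = "Producer"
    COLLECTOR = "Collector"
    OWNER = "Owner"
    ADMINISTRATOR = "Administrator"


@dataclass
class Agent:
    """A person or organization; a single Agent may hold several roles at once."""
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Location:
    """The place where a Collection is held, kept distinct from its Administrator."""
    place: str
    administrator: Agent


@dataclass
class Collection:
    """An aggregation of Items; in this sketch an Owner and a Location are mandatory."""
    title: str
    owner: Agent
    location: Location
    collectors: list = field(default_factory=list)

    def __post_init__(self):
        # Reject a Collection whose Agents are not recorded in the roles they fill.
        if Role.OWNER not in self.owner.roles:
            raise ValueError(f"{self.owner.name} is not recorded as an Owner")
        if Role.ADMINISTRATOR not in self.location.administrator.roles:
            raise ValueError(f"{self.location.administrator.name} is not recorded as an Administrator")
        for agent in self.collectors:
            if Role.COLLECTOR not in agent.roles:
                raise ValueError(f"{agent.name} is not recorded as a Collector")


# Assumption for illustration: SOAS acts as both Owner and Administrator.
soas = Agent("School of Oriental and African Studies Library", {Role.OWNER, Role.ADMINISTRATOR})
morrison = Agent("Morrison, Robert, 1782-1834.", {Role.COLLECTOR})
collection = Collection(
    "Morrison Collection of Chinese Books",
    owner=soas,
    location=Location("Thornhaugh Street, Russell Square, London WC1H 0XG", administrator=soas),
    collectors=[morrison],
)
print(sorted(role.value for role in soas.roles))  # ['Administrator', 'Owner']
```

Under this deliberately strict sketch, a set of search results with no recorded Owner or Administrator fails the check, which is consistent with treating such a list as a Collection-Description rather than a Collection.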
The fact that a catalogue can now be linked directly to the entities catalogued - that the searcher can move seamlessly from finding and identifying to selecting and obtaining - need not mean that the constituent elements of those processes have changed.

A Unitary Finding-Aid takes as its basis information about the Collection as a whole; it makes no attempt to capture information about individual records except in so far as this is necessary to provide aggregate information (e.g. on limiting dates, or on the number of Items the Collection contains).

An Analytic Finding-Aid lists the individual records comprising information about the intellectual Content and the Items in which it is realized. The individual records may include information about Collections, and the Finding-Aid may be searchable from that aspect, but that is not its focus. A library catalogue is typically an Analytic Finding-Aid.

An archival collection is more often described by a Hierarchic Finding-Aid, in which the individual Items and their Content are described, but firmly grounded within the overall arrangement of the Collection, e.g. grouping together all the letters, account books, etc. in an ordered sequence or sequences. The Items are often not uniquely identifiable when considered in isolation, so the context of the Collection is an essential element in compiling the Collection-Description.

An Indexing Finding-Aid is characterized here as consisting of information derived from Items, by implication regardless of their Content. By this is meant that an Indexing Finding-Aid, such as a robotic search engine, will index the words in a document (or catalogue record) regardless of their context and without trying to identify the discrete elements of Content contained therein.
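The contrast between an Analytic and an Indexing Finding-Aid can be illustrated with two toy lookups over the same records. This is a minimal Python sketch under stated assumptions: the second record ("A study of Kipling" by "Smith, Jane"), the element names and both functions are invented for illustration. The analytic lookup searches one named element (the creator); the indexing lookup uses an inverted word index built without regard to which element each word came from, so it also returns a record that merely mentions Kipling.

```python
import re
from collections import defaultdict

# Two catalogue records with distinct elements; r2 is hypothetical.
records = {
    "r1": {"title": "Kim", "creator": "Kipling, Rudyard"},
    "r2": {"title": "A study of Kipling", "creator": "Smith, Jane"},
}


def analytic_search(recs, element, term):
    """Analytic-style lookup: match the term only within one named element of each record."""
    return sorted(rid for rid, rec in recs.items() if term.lower() in rec[element].lower())


def build_word_index(recs):
    """Indexing-style table: each word maps to the records containing it, whatever element it came from."""
    index = defaultdict(set)
    for rid, rec in recs.items():
        for text in rec.values():
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(rid)
    return index


index = build_word_index(records)
print(analytic_search(records, "creator", "kipling"))  # ['r1']: works by Kipling
print(sorted(index["kipling"]))                         # ['r1', 'r2']: any record mentioning Kipling
```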
The effects of this context-blind indexing may be mitigated by the use of metadata tags in Web documents, but in so far as the engine uses such tags, it is creating an Analytic Finding-Aid (which may or may not be combined with the Indexing Finding-Aid). An online Analytic Finding-Aid may incorporate a keyword index that is, in effect, an Indexing Finding-Aid in this sense of the term. At the other end of the technological scale, a printed Calendar of a Collection may have its own printed Indexing Finding-Aid, which lists out of context the names, places, etc., occurring in the Collection.

External relationships

Because it is a model of a single instance of a Collection, the model of Collection Description does not explicitly map external relationships. Such relationships hold between instances of the model and are not part of the internal structure of the model itself. They may, moreover, operate both at the Collection level and at the Collection-Description level. Relevant external relationships are:
RSLP collection description schema

The schema presented here is intended to facilitate the simple description of Collections, Locations and Agents, i.e. of the emboldened entities in the above diagram. This includes:
In terms of the four collection description types listed above, this schema supports the creation of Unitary Finding-Aids. Collection attributes:
Location attributes:
Agent attributes:
Note that the 'dc:', 'cld:' and 'vcard:' prefixes used in the 'RDF property' column above provide a convenient shorthand for the full RDF property names. See the example RDF description below for full details of the RDF properties used in RSLP collection descriptions. Wherever possible, properties have been taken from existing metadata schemas, notably the Dublin Core Metadata Element Set (DCMES) [16] and the vCard set of attributes [17]. The tables above also indicate where a property is a sub-property of an existing property in the Dublin Core or vCard schema.

Syntax

The Resource Description Framework (RDF) is the W3C-recommended architecture for metadata on the Web. RDF provides a mechanism for making simple metadata statements about resources (both digital and physical) of the form: resource X has property Y with value Z. By grouping sets of these simple statements together, and by using the same mechanism to make statements about sets of statements, it is possible to build up complex RDF descriptions of multiple resources and the relationships between them. Currently, RDF descriptions are exchanged on the Web by encoding them in the Extensible Markup Language (XML) [18].

The RSLP Collection Description project chose to encode collection descriptions using the XML encoding of RDF, based on the attributes listed in the schema above. A full collection description is partitioned into separate RDF descriptions of Collections, Locations, Collectors, Owners and Administrators, and these separate descriptions are linked together to form the full description.
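The statement model and the partitioning into linked descriptions can be sketched in a few lines of Python. This is not an RDF toolkit: statements are plain (resource, property, value) tuples, the distinction RDF draws between literal values and references to resources is ignored, and the URIs and property names are a subset of those in the Morrison example that follows. The function names are hypothetical.

```python
# Each RDF statement is modelled as a (resource, property, value) tuple.
# Where a value is the URI of another described resource, it acts as a link.
COLLECTION = "urn:x-rslpcd:967715792-47835"
OWNER = "urn:x-rslpcd:967715792-62789"
LOCATION = "urn:x-rslpcd:967715792-16277"

statements = [
    (COLLECTION, "dc:title", "Morrison Collection of Chinese Books"),
    (COLLECTION, "cld:owner", OWNER),
    (COLLECTION, "cld:hasLocation", LOCATION),
    (OWNER, "vcard:org", "School of Oriental and African Studies Library"),
    (LOCATION, "dc:title", "School of Oriental and African Studies Library"),
    (LOCATION, "cld:postcode", "WC1H 0XG"),
    (LOCATION, "cld:isLocationOf", COLLECTION),
]


def group_by_resource(stmts):
    """Partition statements into one description (property -> list of values) per resource."""
    descriptions = {}
    for resource, prop, value in stmts:
        descriptions.setdefault(resource, {}).setdefault(prop, []).append(value)
    return descriptions


def follow(descriptions, resource, prop):
    """Return the descriptions of the resources that `resource` links to via `prop`."""
    targets = descriptions.get(resource, {}).get(prop, [])
    return [descriptions.get(target, {}) for target in targets]


descriptions = group_by_resource(statements)
location = follow(descriptions, COLLECTION, "cld:hasLocation")[0]
print(location["cld:postcode"])  # ['WC1H 0XG']
```

Recording the link in both directions (cld:hasLocation and cld:isLocationOf), as the example does, lets either partition be reached from the other without a reverse search.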
An example RDF/XML description of the Morrison Collection of Chinese Books, housed at the School of Oriental and African Studies Library, London, follows:

    <?xml version="1.0"?>
    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcq="http://purl.org/dc/qualifiers/1.0/"
      xmlns:vcard="http://www.imc.org/vcard/3.0/"
      xmlns:cld="http://www.ukoln.ac.uk/metadata/rslp/1.0/">

      <rdf:Description about="urn:x-rslpcd:967715792-47835">
        <!-- Collection -->
        <dc:title>Morrison Collection of Chinese Books</dc:title>
        <dc:description>This collection comprises the Chinese books accumulated by Dr. Robert Morrison (1782 - 1834), the first Protestant missionary to China, during his sixteen years residence in Guangzhou and Macao between 1807 and 1823. Ten thousand Chinese-style thread-bound volumes cover a broad spectrum of subjects from early and mid-Qing China.</dc:description>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>LCSH</dcq:scheme>
            <rdf:value>Missionaries -- China</rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>LCSH</dcq:scheme>
            <rdf:value>Rare books -- China -- Bibliography -- Catalogs.</rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>LCSH</dcq:scheme>
            <rdf:value>Chinese Imprints -- Catalogs</rdf:value>
          </rdf:Description>
        </dc:subject>
        <cld:agentName>Morrison, Robert, 1782-1834.</cld:agentName>
        <cld:agentName>School of Oriental and African Studies</cld:agentName>
        <dcq:place>South China</dcq:place>
        <dcq:place>Macao</dcq:place>
        <dcq:time>Early to mid Qing</dcq:time>
        <cld:strength>classics, history, philosophy, literature</cld:strength>
        <dc:language>chi</dc:language>
        <dc:format>10,000 thread-bound volumes - manuscripts and folios</dc:format>
        <dc:type>Collection.Library.Special</dc:type>
        <cld:accumulationDateRange>1807-1823</cld:accumulationDateRange>
        <cld:contentsDateRange>1650-1825</cld:contentsDateRange>
        <dcq:hasPart>Literature collection within Morrison Collection of Chinese Books</dcq:hasPart>
        <dcq:hasPart>Classics collection within Morrison Collection of Chinese Books</dcq:hasPart>
        <dcq:hasPart>History collection within Morrison Collection of Chinese Books</dcq:hasPart>
        <dcq:hasPart>Philosophy collection within Morrison Collection of Chinese Books</dcq:hasPart>
        <dcq:isPartOf>SOAS Library</dcq:isPartOf>
        <cld:hasDescription>Catalogue of the Morrison Collection of Chinese Books (monograph)</cld:hasDescription>
        <cld:hasDescription>Items recorded in SOAS Library OPAC &lt;http://195.195.181.2/&gt;</cld:hasDescription>
        <cld:accrualStatus>deposit, closed</cld:accrualStatus>
        <cld:accessControl>A Library Guide to Membership is found at: &lt;http://www.soas.ac.uk/Library/Guides/membership.html&gt;</cld:accessControl>
        <cld:note>London Missionary Society held collection from 1825-1834, then UCL until 1922. Six "missing" books held at Bodleian library</cld:note>
        <dc:creator resource="urn:x-rslpcd:967715792-32366"/>
        <cld:owner resource="urn:x-rslpcd:967715792-62789"/>
        <cld:hasLocation resource="urn:x-rslpcd:967715792-16277"/>
      </rdf:Description>

      <rdf:Description about="urn:x-rslpcd:967715792-32366">
        <!-- Collector -->
        <vcard:fn>Morrison, Robert, 1782-1834.</vcard:fn>
      </rdf:Description>

      <rdf:Description about="urn:x-rslpcd:967715792-62789">
        <!-- Owner -->
        <vcard:org>School of Oriental and African Studies Library</vcard:org>
        <vcard:voice>+44 207 898 4163</vcard:voice>
        <vcard:fax>+44 207 898 4159</vcard:fax>
        <vcard:email>[email protected]</vcard:email>
      </rdf:Description>

      <rdf:Description about="urn:x-rslpcd:967715792-16277">
        <!-- Location -->
        <dc:title>School of Oriental and African Studies Library</dc:title>
        <cld:address>Thornhaugh Street, Russell Square, LONDON WC1H 0XG.</cld:address>
        <cld:postcode>WC1H 0XG</cld:postcode>
        <cld:country>uk</cld:country>
        <cld:accessConditions>See opening hours for SOAS at: &lt;http://www.soas.ac.uk/Library/open.html&gt;. Reference only in the library.</cld:accessConditions>
        <cld:seeAlso>http://www.soas.ac.uk/Library/home.html</cld:seeAlso>
        <cld:isLocationOf resource="urn:x-rslpcd:967715792-47835"/>
      </rdf:Description>
    </rdf:RDF>

This encoding syntax follows the draft recommendations for encoding Dublin Core metadata within RDF [19] (current at the time of development), particularly in how the scheme associated with a particular value is encoded.

Notice that the RDF descriptions above are about resources that are explicitly identified using a URI [20]. In the case of most RSLP projects, however, the Collections, Locations and Agents being described do not already have URIs assigned to them. The URIs used in the above description have been generated automatically, specifically for the purpose of creating the description in RDF.

By encoding descriptions in RDF/XML, and by making use of Dublin Core and vCard properties as far as possible, the project hopes to position RSLP collection description very closely alongside other emerging descriptive practice on the Web.

Implementation

The RSLP Collection Description Project has developed a simple Web-based tool that enables the creation and editing of fairly complex RDF collection descriptions.
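Because a full description is partitioned, any tool that reads one back (an editor, or the planned harvester, for example) has to reassemble it by following URIs. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed excerpt of the Morrison example; the function names are hypothetical, and it is not a general RDF/XML parser: it handles only top-level `rdf:Description` elements and the `about`/`resource` attributes used in the example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the excerpt (copied from the example's declarations).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cld": "http://www.ukoln.ac.uk/metadata/rslp/1.0/",
}
RDF_ABOUT = "{" + NS["rdf"] + "}about"
RDF_RESOURCE = "{" + NS["rdf"] + "}resource"

# A trimmed excerpt of the Morrison example: the Collection and Location descriptions only.
EXCERPT = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cld="http://www.ukoln.ac.uk/metadata/rslp/1.0/">
  <rdf:Description about="urn:x-rslpcd:967715792-47835">
    <dc:title>Morrison Collection of Chinese Books</dc:title>
    <cld:hasLocation resource="urn:x-rslpcd:967715792-16277"/>
  </rdf:Description>
  <rdf:Description about="urn:x-rslpcd:967715792-16277">
    <dc:title>School of Oriental and African Studies Library</dc:title>
    <cld:postcode>WC1H 0XG</cld:postcode>
    <cld:isLocationOf resource="urn:x-rslpcd:967715792-47835"/>
  </rdf:Description>
</rdf:RDF>"""


def index_descriptions(root):
    """Map each top-level rdf:Description's URI to its element."""
    by_uri = {}
    for desc in root.findall("rdf:Description", NS):
        # The example uses the 1999 syntax's unqualified 'about' attribute;
        # later RDF/XML uses rdf:about, so accept either form.
        uri = desc.get("about") or desc.get(RDF_ABOUT)
        by_uri[uri] = desc
    return by_uri


def linked_title(xml_text, subject_uri, link_property):
    """Follow one resource-valued property and return the target description's dc:title."""
    by_uri = index_descriptions(ET.fromstring(xml_text))
    link = by_uri[subject_uri].find(link_property, NS)
    target = link.get("resource") or link.get(RDF_RESOURCE)
    return by_uri[target].findtext("dc:title", namespaces=NS)


print(linked_title(EXCERPT, "urn:x-rslpcd:967715792-47835", "cld:hasLocation"))
# School of Oriental and African Studies Library
```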
The tool has a number of example descriptions built into it and also contains detailed embedded help in the form of data entry guidelines for each of the attributes. The tool is freely available for use on the Web. It has been developed in Perl, and the source code will be made available in the near future.

There have also been other, more experimental, implementations of the above schema using a relational database (Microsoft Access) and the ROADS suite of tools [21]. Although these implementations cannot yet produce an RDF/XML encoding of a collection description, there is no reason why they should not do so. This work has already been used successfully as the basis of further implementation by other RSLP projects, and further work in this area is anticipated.

Collection types

Work on the project has also resulted in the development of an enumerated set of collection types: terms that may be used as values for the collection Type (dc:type) attribute in the above schema. The list is made up of the emboldened categories in the left-hand column of the following table. The categories are grouped into those that indicate the type of collection, those that indicate the curatorial environment in which the collection has been made, those that indicate the content of the collection, and those that indicate the collection policy and/or usage.
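Because the groups have a fixed order, a dc:type value can be assembled mechanically from whichever categories are chosen: the selected terms are ordered by group and joined with '.'. The Python sketch below is illustrative only: Collection, Library, Special and Dispersed are the only terms taken from this article, and the group assigned to each of them, like every other term in `GROUPS`, is a hypothetical placeholder rather than the published list [11].

```python
# Hypothetical vocabulary: only "Collection", "Library", "Special" and "Dispersed"
# appear in this article; their group placement and all other terms are assumed.
GROUPS = [
    ("type of collection", ["Collection", "Catalogue", "Index"]),
    ("curatorial environment", ["Library", "Museum", "Archive"]),
    ("content", ["Text", "Image", "Sound"]),
    ("policy and/or usage", ["Special", "Dispersed"]),
]


def type_string(selected):
    """Build a dc:type value: selected categories joined by '.', in the fixed group order."""
    chosen = set(selected)
    order = [term for _, terms in GROUPS for term in terms]
    unknown = chosen - set(order)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return ".".join(term for term in order if term in chosen)


print(type_string({"Dispersed", "Library", "Collection"}))  # Collection.Library.Dispersed
```

Ordering by the list rather than by the caller's input means the same selection always yields the same string, which is the consistency the suggested ordering aims at.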
In this scheme, multiple categories may be selected as appropriate. Typically, zero or one category from each group will be selected (though there may be exceptions). Multiple categories should be concatenated, separated by a '.', to form a string value, for example:

Collection.Library.Dispersed

Although the ordering of categories implies no hierarchy, it is suggested that categories be selected in the order shown here for consistency.

Issues and conclusions

The RSLP collection description schema is not intended to replace richer archival description schemas such as that offered by ISAD(G). Rather, it should be seen as a schema for making relatively simple collection descriptions in a wide variety of contexts - a Dublin Core for collection description.

Several of the current RSLP projects will be contributing ISAD(G)-conformant EAD descriptions to the UK Archives Hub [22] (or will be eligible to contribute descriptions to the Hub). We recognize that it is not sensible to ask those projects to describe the same collections twice. To enable RSLP projects to describe collections once, a mapping from ISAD(G) to the RSLP collection description schema, or vice versa, is necessary, allowing collection descriptions in one format to be transformed into the other. Mappings between ISAD(G) and the RSLP collection description schema, and tools to automate conversion between ISAD(G)-conformant EAD and RSLP collection descriptions encoded in RDF, are likely to be generally useful, particularly given the possibility that the RSLP collection description schema may be used outside the RSLP context.

The choice of RDF/XML as an encoding syntax has not been entirely trouble-free. RDF is a fairly new development and there is not a great deal of significant implementation experience.
The approach of inventing a URI and assigning it to a resource specifically for the purpose of creating an RDF description is relatively untested. Furthermore, recent recommendations made by the Dublin Core Metadata Initiative for element qualifiers [23] (schemes and attribute refinements) have been developed in parallel with our work, and the conventions for encoding them in RDF are not yet fully mature. The qualifiers and syntax adopted in this area by the RSLP Collection Description project may well prove incompatible with conventions developed elsewhere in the future.

It might be argued that the project has not had sufficient resources to fully develop software tools that enable other RSLP projects to describe collections cost-effectively and efficiently. This is largely true; such software development was never envisaged as part of the original project proposal. As a result, projects, particularly those that need to describe large numbers of collections, are left with the burden of developing their own RDF collection description tools, a task made more difficult by the general lack of off-the-shelf RDF-compliant tools. This is not an ideal situation.

However, the RSLP Collection Description Project has been successful in developing a model of collections and collection descriptions, in implementing that model using an RDF encoding, and in providing the basis for deployment of that encoding by other RSLP projects.

References

[1] The library, the catalogue, the broker: brokering access to information in the hybrid library
[2] eLib Phase 3 projects
[3] Full Disclosure: Releasing the value of library and archive collections
[4] Scientific, Industrial, and Cultural Heritage: a shared approach: A research framework for digital libraries, museums and archives
[5] Research Support Libraries Programme
[6] An Analytical Model of Collections and their Catalogues
[7] RSLP Collection Description Schema
[8] Resource Description Framework (RDF) Model and Syntax Specification
[9] RSLP Collection Description Tool
[10] RSLP Collection Description Data Entry Guidelines (draft)
[11] CLDT - an enumerated list of collection types
[12] Functional Requirements for Bibliographic Records
[13] The principles and future of AACR
[14] ISAD(G): General International Standard Archival Description
[15] Reference Model for an Open Archival Information System (OAIS)
[16] Dublin Core Metadata Element Set, Version 1.1: Reference Description
[17] RFC 2426 - vCard MIME Directory Profile
[18] Extensible Markup Language (XML)
[19] Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)
[20] Naming and Addressing: URIs, URLs, ...
[21] Resource Organisation And Discovery in Subject-based services
[22] The UK Higher Education Archives Hub
[23] Dublin Core Qualifiers

Copyright © 2000 Andy Powell, Michael Heaney and Lorcan Dempsey
DOI: 10.1045/september2000-powell