
D-Lib Magazine
September/October 2009

Volume 15 Number 9/10

ISSN 1082-9873

Analysing Selection for Digitisation

Current Practices and Common Incentives

 

Bart Ooghe
Heritage Cell Waasland
<bart.ooghe@interwaas.be>

Dries Moreels
Flemish Theatre Institute (BE)
<dries@vti.be>


Introduction

Over the past few decades, the explosion of digital and digitised documents and the addition of a purely digital facet to the documentary lifecycle have been forcing memory institutions from all sectors to address the same questions that the growth in analogue production necessitated in the early 20th century: Can/should all documents that pass the initial test of appraisal also remain stored indefinitely? What are the requirements for long-term preservation? At what point (if ever) do digital collections become too large to handle? And how can we decide what gets deleted? Individual institutions and collaborative research efforts alike have adopted a wide range of practices in their attempt to tackle these questions. We would expect the current phase of trial-and-error to move slowly towards a set of somewhat more uniformly adopted governing concepts and practices.

The specific challenge for traditional collection management on which we're focussing here is the matter of selecting content from heritage collections for digitisation. The question is: which items from a vast analogue collection should be made available in digital form first? This question has been approached from different angles over the last 20 years, yet at present no detailed frame of reference exists beyond the institutional or sectoral level to provide a more stable context for the decision-making process.

Selection guidelines vary in their respective scopes, and current practices are characterised by disparate approaches, different terminologies and a lack of open communication on the selection decisions that are being made. Some might suggest that selection needn't take place at all. Yet at the very least the mere magnitude of analogue collections and the organisational and financial impact of digitisation would seem to necessitate some form of prioritisation in the short term. That the issue has been dealt with in so many different ways reflects the complex nature of selection, but the resulting diversity in approaches ultimately also works against the development of a more uniformly applicable and accepted framework from which to approach the problem.

This article presents the results of a close reading of current practices and guidelines for digitisation, in an attempt to further the movement towards greater consensus on this issue. From the myriad existing approaches found in the field, the article formulates a set of common criteria for selection in the form of a sector-independent longlist. In this way the article illustrates the complex nature of selection, which may be seen to depend upon a significantly greater number of criteria than have so far been put forward in any single guiding document, but it also proposes a basic terminology that can be used in any institutional setting. Thus, it puts forward a possible common ground for selection practices and argues that the adoption of a more uniform language, and a more open and communicative approach, may not only help structure the decision-making process but is also a vital part of good governance.

Setting, scope and terminology

This article presents results obtained in the course of the Flemish government-funded research programme 'Preservation and Access for Multimedia Data in Flanders' (BOM-vl), which investigated key points in the digitisation, preservation and digital archiving of (primarily) audiovisual documents. The second work package of this programme examined how to select both AV and non-AV analogue originals for digitisation. The full report can be downloaded at <https://projects.ibbt.be/bom-vl/fileadmin/user_upload/frontendfiles/BOM-Vl%20WP2.2%20Rapport%20selectie.pdf>; an English translation is currently being prepared.

The study was based on a close reading of international practices and theoretical literature. The geographical focus was on Europe and North America, with the addition of several cases from Australia and New Zealand. Practices were collected from 42 published surveys, guiding documents, project reports and best practices handbooks on both national and international levels. In addition to the digitisation programmes and institutions mentioned in these works, relevant information was also collected from policy documents and the websites of 98 individual institutions or initiatives, and direct input was received from 13 institutions. (For a list of these we refer the reader to the bibliography and appendix of the final report.)

Given the cross-sectoral scope of the study, the term 'document' is used throughout this article to refer to all types of content-carriers (e.g., paper, museum objects, AV-materials, etc.) and the term 'collection' is used in its broadest sense to refer to any grouping of documents.1 The article also explicitly avoids the issue of selecting born-digital documents to become part of a digital collection, which is inextricably linked to issues of appraisal and long-term preservation that were felt to fall outside the scope of the study.

Selection as part of the collection management process

In theory, selection for digitisation is distinct from traditional collection management decisions. Digital collections make documents more easily accessible; they allow new links to be created or new interactions with the audience to be maintained, but on a conceptual level they have no direct influence on the appraisal processes defining which documents end up in the source-collection in the first place or on subsequent re-appraisal and disposal processes. As long as the viability of long-term digital preservation remains unclear, digital collections will generally not be seen as a valid alternative to the preservation of analogue content.

However, current practices do reveal ample overlap between these two functions. Digitisation may be explicitly used to reduce handling stress on analogue originals, which links it closely to conservation concerns. The public value of keeping a document in long-term storage may change once it becomes accessible in digital form, which influences re-appraisal choices. In addition, the financial and practical efforts connected to digitisation require that we consider the long-term importance of documents offered for selection. Likewise, the implicit aim of many digitisation schemes is to create mirror collections that, to the eye of the end-user, will largely replace the original source-collections. This, too, implies close conceptual ties between the analogue and digital collections, and requires a close consideration of the relationships between the two.

While selection for digitisation as it is understood here differs conceptually from selection and (re-)appraisal decisions made in the course of collections management, a significant degree of overlap exists between the two. It follows that the processes by which these decisions are reached deserve closer attention.

Underlying issues in current selection practices

Once the decision to digitise a collection has been made, the logical subsequent questions would seem to be 'how much' and 'what'. However, not all digitisation schemes appear to address these questions – certainly not in their external communications – and current practices show many different answers to them. The study suggests that this diversity may be related to several combined factors, such as the three described below.

1. All-selective optimism and the remaining need for selection

The digital revolution has rekindled the ideal of the unified collection, which so strongly influenced pre-20th-century archival thought. Against the practical need for selection for reasons of storage and access, there is now the hope that, once documents have been appraised, we may be able to keep everything after all.

This reasoning is mostly driven by the declining costs of digitisation and digital storage. In this light, some argue that it would be preferable to digitise all the documents of a collection at low quality and let time decide what is worth keeping and what needs to be re-digitised at high resolution for preservation purposes.2 Networked storage and access may also help circumvent the individual practical necessities of selection by dividing storage needs over multiple institutions.3 In addition, memory institutions' role as keepers of knowledge in the digital world may itself be called into question. It is argued that the ease with which traditional heritage materials are recreated in digital form and dispersed on the Web, as well as the birth of entirely new forms of networked heritage such as blogs or social software applications, argue for an increasingly bottom-up approach to cultural valuation. This would cause appraisal, selection and preservation to fall largely into the hands of their creator(s) and of society as a whole, altering or even negating traditional institutional responsibilities.4

Yet while these considerations are not to be taken lightly, at least for now the arguments against a 'select everything' approach still appear of greater significance. It remains unclear whether memory institutions will indeed lose ground to social networks with regard to either valuation or preservation. On the contrary, it might be argued that traditional institutions provide a more stable ground for preservation and access.5 In any case, memory institutions and cultural producers still have to face the financial and logistical impact of digitising their existing collections and of keeping these digital collections accessible. Selection prevents us from ending up with a cumbersome mass of data that is practically and financially impossible to maintain or access. Finally, the mere passage of time needed for this task implies that certain documents are likely to become illegible before the process is completed. This results in an unintended bias in the target-collection against which even the proposed 'select everything' ideal cannot arm itself. Giving voice to the decisions being made seems the more acceptable choice. An all-selective approach seems valid in only a restricted number of cases: small and precious collections, collections that would lose coherence unless digitised in their entirety, and institutions with sufficient means both to digitise their entire collection over a short term and to carry the burden of long-term preservation.6

2. Differences in approach and terminology

An important second factor inducing fragmentation is the lack of a more or less uniform terminology for selection. Over the years several guidelines have proposed significantly different approaches to the task. However, while some of these are used across disciplines,7 much of the existing literature on the subject is directed at specific documentary types, sectors or digitisation functions. The suggested priorities, criteria and vocabulary differ accordingly and are not always easily translated beyond their specific setting.8 While libraries are well represented in the selection guidelines, museums and art institutions seem to have a particularly limited basis for comparison. Similarly, while efforts from Australia, Canada, the U.K., and the U.S. have been instrumental in the creation of selection guidelines, continental Europe appears to be lagging behind in the adoption or proposal of somewhat more standardised approaches to selection.

This variation works against cross-institutional comparison and the more general adoption of specific selection methods. When they are not explicitly stated to follow existing guidelines, selection practices most often appear to be based on ad hoc decisions or on available funds. The criteria underlying selection seem as numerous as the digitisation projects themselves,9 and comparing cases or finding suitable example practices can be far from straightforward.

3. Lack of open communication

Finally, the fragmentation in current practices is also amplified by the limited attention paid to external communication of the decisions being made. Regarding selection and digitisation, there is often little distinction between the reasons for digitisation (e.g., increasing access) and the criteria for selection (e.g., prioritising heavily demanded materials). Communication also often reduces selection to a number of key points, such as broad chronological denotations or general remarks on the position taken with regard to copyrighted materials or financial limitations. While such highlighting does illustrate parts of the decision-making process, the study suggests that it also obscures the more complex facets of the issue by amalgamating different decisions under a single heading (see below).

More often than not there is no detailed external communication on selection decisions beyond the thematic or collection level. This may be due in part to a conservative reflex not to put collection management decisions on public display. Indeed, a similar lack of communication is noticeable with regard to other aspects of collection management, which is particularly problematic for Library, Archives and Museum collections housed under the same roof or for collaborative initiatives.10 However, the 2006 survey of digitisation in European national archives by the European Board of National Archivists (EBNA) suggests that this lack of communication may also reflect an actual lack of detailed selection documents within the institution.11

Whatever the case may be, it is clear that many initiatives as yet feel little urgency to communicate on the decisions being made and the processes by which these decisions are reached. They also lack a common vocabulary with which to do so. This makes it difficult for end-users to interpret the relationships between analogue and digital collections, a matter of increasing importance as many users come to know collections only through their digital counterparts.

Selection and the break-down of traditional boundaries

These different elements have jointly given rise to a multitude of attitudes to, and methodologies for, the creation of digital collections from analogue source-collections. For some, digitisation is a matter of following predefined selection matrices; for others it is dealt with purely de facto or is perceived as a non-issue. These differences are found at the documentary, institutional and sectoral levels, with only some initiatives attempting to provide guidelines across sectors (the most significant being the NINCH Guide to Good Practice12).

It is therefore something of a paradox that it is precisely the digital setting within which these activities are taking place that is also altering the traditional roles of, and boundaries between, sectors and institutions. Digital collections and access portals are becoming increasingly sector-independent. End-users are also starting to perceive the digital documents as the primary access-point to collections, regardless of the institutional setting within which their analogue counterparts are housed. These changes would seem to call for a more explicit and unified approach to the selection issue and for greater transparency about the decisions made. As digitised and web-accessible collections become the main public representation of heritage documents, openness is needed to allow complementary digital collections to be formed and to enable memory institutions to fulfil their primary mission to protect, preserve and promote public understanding of cultural heritage. And this requires a better understanding of the selection criteria used to create these digital collections.

Factors defining selection: the study

The study therefore aimed to provide a more stable ground for selection decisions and subsequent communication in the form of an abstracted list of selection criteria. An important part of this abstraction process consisted of the dissection of amalgamated selection methods and the filtering-out of implicit criteria from the available documentation. By closely examining and dissecting these it was often possible to redefine explicitly documented practices as consisting of multiple interacting criteria. For instance, selection on the basis of predefined time-frames may simply be a thematic choice, which we've defined as content-driven selection, but it can also relate indirectly to the physical state of the material (older materials may be of poor quality for digital presentation or, inversely, they may have to be digitised as a preservation measure). It may indicate a cost-reductive effort (the fragility of older materials may result in a more expensive handling and digitisation process), or it could be linked to copyright restrictions on more recent materials. Examples of implicit selection criteria are the prioritisation of catalogued or easily accessible documents or the way in which practical constraints such as document size may require a reduction of the number of selected documents later on in the digitisation process.13

Pooling the various sources, dissecting amalgamated criteria from divergent vocabularies and translating these into setting-independent terms, it was eventually possible to formulate a common set of 25 distinct selection criteria, grouped into 6 general categories. These reveal a selection practice that is far more diverse than individual cases might suggest, while at the same time they also reflect key points for any digitisation effort, regardless of the specific setting or collection type.

Before turning our attention to the longlist itself, two additional points need to be clarified. Firstly, the criteria should be understood as being value-free: depending on the setting in which they are applied, they can be used both as a reason for, or a deterrent against, selection for uptake into a digital collection. Secondly, the order and relevance of the respective criteria will also depend on the concrete digitisation scheme, and not all criteria need to be part of the final selection process. The list is intended to integrate criteria more relevant to a (semi-)commercial setting with those from a more purely publicly funded and/or heritage-centred setting. Hence, some readers may feel that criteria most relevant to their case are receiving too little attention or that others are of no relevance at all.

1. Institutional frameworks

  • Collection policy: the policy for collecting and preserving content may be defined by law or by policy documents created within the collecting institution. These vary in detail and scope, from general legal obligations to detailed treatments of specific documentary types. Any selection for digitisation should firstly fit these policies.
  • Aims and purposes of the existing digital collection: selection for digitisation functions, in a way, as appraisal for a digital collection. The explicit aims, purposes and characteristics of the collection evidently have a direct influence on which documents are selected for it.
  • Selection by collection design: closely related to the previous criterion is the use of digitisation to create a new digital collection within which documents are placed in entirely new contexts and redefined documentary relationships. Again, the nature of this collection defines the criteria most relevant to its success.
  • Copyright and other legal restrictions: Legal restrictions on the reproduction (both in analogue and digital form) and the distribution of documents considerably affect the choice of material to be moved into publicly and often internationally accessible collections. Legal issues therefore represent some of the most important guiding factors in digitisation. Many initiatives deliberately avoid possible legal entanglements by selecting only documents that are rights-free or for which rights can be obtained, with a resulting negative impact on the digital availability of 20th-century heritage.

2. Value of the material

  • Intrinsic value (content, completeness, clarity): Intrinsic documentary value is primarily defined by content and context. These consist of a combination of such aspects as socio-historical, cultural, aesthetic or scientific meaning, production processes, public interest, formal language or technology. These may to some degree be contained in the physicality of the document (e.g., typescript or format as denoting cultural meanings or production processes). Greater value is attributed to documents for which the authenticity and integrity can be ensured and the provenance is clear.
  • Selection and audience - use value: Different audiences have different needs regarding the material they wish to access and the tools, metadata, contextual information etc. that they require to interpret the document. The value of an item may be expressed in terms of its ability to meet these needs, bearing in mind, however, that the potential differences between current, intended and actual use require frequent re-evaluation. It is therefore significant to note that relatively few institutions have so far taken the time to perform detailed use assessments.14
  • Accessibility and availability: Maintaining access to its collections might be deemed the primary function of any memory institution. Inversely, the accessibility and availability of documents (not to be confused with the accessibility of their content, which is a separate criterion) may also influence the selection process. Prioritising highly demanded materials enables simultaneous access by multiple users and reduces physical strain on the originals. Prioritising items that are not easily available due to physical constraints furthers the disclosure of the collection. However, in some cases it may also be relevant to prioritise documents that are publicly unavailable, e.g., for political or legal reasons. Present unavailability may imply significant future historical importance.
  • Contextual value: Contextual value, understood here in a documentary rather than an archival sense (e.g., multiple recordings by the same orchestra, recordings that can be coupled with personal information available in private archives) enhances the legibility and thereby the overall significance of a document as a historical record. Selection on the basis of contextual value requires a balance between the intrinsic value of the document itself and the added value of an extensive documentary context. This may seem a self-evident criterion in archival management, but, for instance, for many AV documents selection is often carried out with little attention to the possibly related non-AV contexts.
  • Selection by affiliation: Selection by affiliation takes the previous criterion one step further. It implies that the selection of a well-contextualised document coincides with the selection of part of its documentary context. This may greatly enhance the end-user's immediate understanding of the document, but it evidently also implies a greater digitisation effort.
  • Representativity: Representative selection implies the division of a collection into pre-defined classifications: formal (e.g., time-periods, geographical regions), structural (e.g., programme units, chapters) and content-based (e.g., genres). The method aims for a final selection that provides a representative view of the original collections. It is usually applied to large collections of documents that share a minimum of characteristics (e.g., newspaper collections) and requires specialised knowledge so as to limit personal bias in the definition of the classifications.
  • Arbitrary/randomised selection - sampling: Sampling also aims to create a representative image of a collection, but it strives more stringently for value-free selection. Samples can take on different forms, e.g., numerical, chronological, geographical, alphabetical or randomised. This technique serves particularly well in collections of which the size is disproportionate to the importance of individual documentary content, such as large series of similar records.
  • Aesthetics and visual appeal: A large part of the cultural heritage is currently being digitised for the purpose of making it more visible to a broad audience. Hence, aesthetic and visual factors play a significant role in its selection. One of the most widespread forms of digitisation entails the creation of images for institution websites, which are usually limited to top pieces from collections or images that capture the eye of the viewer, but we also find aesthetic selection taking place in educational settings where the most visually 'telling' pieces are digitised first.
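Two of the criteria above, representativity and sampling, describe procedures concrete enough to sketch. The following is a minimal Python illustration under stated assumptions: the `Document` record, its single `classification` field and the function names are hypothetical, and the rule of keeping at least one document per class is an illustrative choice rather than part of any cited guideline. Representative selection is modelled as a stratified draw of the same fraction from each pre-defined class; sampling is shown in its numerical (every n-th item) and randomised forms.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical minimal record: an identifier plus one pre-defined
    # classification (e.g. a time-period, region or genre).
    doc_id: str
    classification: str


def systematic_sample(docs, interval, start=0):
    """Numerical sampling: every `interval`-th document in a fixed order."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return docs[start::interval]


def random_sample(docs, size, seed=None):
    """Randomised sampling; passing a seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(docs, min(size, len(docs)))


def representative_selection(docs, fraction, seed=None):
    """Stratified selection: draw the same fraction from each class,
    keeping at least one document per class so that no class vanishes."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc in docs:
        strata[doc.classification].append(doc)
    selected = []
    for cls in sorted(strata):  # sorted for a deterministic class order
        members = strata[cls]
        k = min(len(members), max(1, round(len(members) * fraction)))
        selected.extend(rng.sample(members, k))
    return selected


if __name__ == "__main__":
    # Hypothetical collection: 80 items from one period, 20 from another.
    docs = [Document(f"d{i}", "1900s" if i < 80 else "1950s") for i in range(100)]
    print(len(systematic_sample(docs, 10)))
    print(len(representative_selection(docs, 0.1, seed=1)))
```

A fixed seed is used here so that a sampled selection can be documented and reproduced later, which fits the article's emphasis on communicating the decisions being made.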

3. Physical criteria

  • Accessibility of content: In general, priority should be given to material for which the content is likely to become inaccessible in the short- or mid-term. Such inaccessibility may be caused by physical decay (see also next criterion), by changes in hard- and software needed to access the document or by the disappearance of expertise regarding obsolete technology. In an AV setting the difficulty of assessing the expected lifespan of a carrier greatly complicates this method of selection.
  • Physical state of the material: the physical state of a document forms part of access-driven selection but is also sufficiently important to be listed as a distinct criterion. Documents most likely to physically disappear are usually prioritised, while fragile documents may be selected to reduce the danger of further decay caused by handling. However, when documents cannot be digitised without risk of loss or further damage, an institution might opt to delay selection in hopes of finding a safer method of digitisation before the document disappears.
  • Quality after digitisation: Once the decision to digitise has been made, it is necessary to define the degree to which the digital end-result will resemble the original document. This bears relation to both the functionality of the document (e.g., legibility) and to the ethics of altering its appearance (especially when digitising art). Considering the impact of high-quality digitisation on workflows and finances, priority might be given to material for which low-quality digitisation suffices, for which little deviation from the original is expected, or for which quality standards and accepted practices that speed up the workflow are available. Quality requirements must therefore always be defined beforehand.
  • Added value after digitisation: Digitisation can offer considerable added value both in terms of access and functionality, and through the creation of new contextual relationships. Manipulation of the digital object may, for instance, enhance its use by filtering out different kinds of noise. Priority might therefore be given to material for which digitisation opens the way to intrinsic added value. This choice bears obvious links to the decisions and constraints regarding visual quality after digitisation: increasing the functionality of a document may also imply deviating from the physical appearance of the original.

4. Unicity and digital multiplicity

Original documents always have priority over multiples, as they are irreplaceable and carry the highest level of authenticity and integrity. Copies and multiples, including those created as part of the digitisation process, must however be dealt with separately.

  • Copies and multiples within the collection: The document of the highest quality, in terms of physical state, completeness, legibility or guarantee of authenticity, has priority to become part of a digital collection. In the case of digital copies and born-digital documents, this means that versions with more complex and detailed metadata, including provenance and authority data, may be prioritised or that metadata may be transferred to the formats with the greatest life expectancy.
  • Multiplicity across collections: Considering the financial and logistical impact of digitisation, it might be proposed to give lower priority to material that has been digitised elsewhere or that is available elsewhere in a better copy. This choice should also depend upon institutional differences in accessibility and legal restrictions on the respective multiples. Close cooperation between institutions is necessary in order to map possible overlap between collections and to decide whose multiple will eventually be selected.
  • Digital substitution: The term 'digital substitution' refers to the replacement of an analogue document by a digital copy. Digital substitution carries the considerable burden of the still unclear nature of long-term digital preservation. Only documents that can be digitised at the highest current preservation level and for which the analogue form carries little or no artefact value may more readily be considered for digital substitution. It also more readily applies to documents labelled for disposal in the course of routine archival management but for which sufficient added value is expected from a (possibly short-lived) digital life.

5. Selection through metadata

Digital documents serve no purpose unless metadata are linked to them, but creating these is usually time- and labour-intensive. Both the presence and the absence of metadata can therefore be used to guide selection. The most common metadata-driven selection prioritises documents that have sufficient metadata attributed to them or documents for which the input of metadata can be automated or otherwise streamlined (e.g., documents of a similar format or metadata scheme). However, one might also opt to select documents with little or no metadata as a stimulus for metadata creation and collection disclosure, or because the very uniqueness of the document implies a lack of complex contextualising metadata. Finally, the amount of effort needed to map existing metadata to the metadata architecture necessary to fulfil new functional requirements may also be used to guide selection.
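Because the same metadata criterion can point either way, a small sketch may help show how one measure supports both choices. In this Python illustration, the field list `REQUIRED_FIELDS` (loosely modelled on Dublin Core elements), the completeness score and the `prefer_sparse` switch are all hypothetical assumptions; an institution would substitute its own metadata scheme and threshold.

```python
# Hypothetical required fields, loosely modelled on Dublin Core elements;
# a real metadata scheme would define its own set.
REQUIRED_FIELDS = ("title", "creator", "date", "rights")


def completeness(record):
    """Share of required fields that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)


def rank_by_metadata(records, prefer_sparse=False):
    """Order candidates by metadata completeness.

    By default, well-described records come first (cheaper to ingest);
    with prefer_sparse=True, poorly described records come first
    (selection as a stimulus for metadata creation). Ties keep their
    original order because Python's sort is stable.
    """
    return sorted(records, key=completeness, reverse=not prefer_sparse)


if __name__ == "__main__":
    records = [
        {"id": "a", "title": "Programme leaflet"},
        {"id": "b", "title": "Poster", "creator": "Studio X",
         "date": "1901", "rights": "Public domain"},
        {"id": "c"},
    ]
    print([r["id"] for r in rank_by_metadata(records)])
    print([r["id"] for r in rank_by_metadata(records, prefer_sparse=True)])
```

The value-free character of the criterion shows in the single `prefer_sparse` switch: the measurement stays the same, only the direction in which it is applied changes.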

6. Financial framework

Possibly the single most influential factor for selection is the financial framework of an initiative. This breaks down into several sub-criteria, some of which (namely economic selection) may be more easily overlooked by institutions operating in a less commercially driven framework. In both (semi-)commercial and more purely publicly funded environments, however, all of these factors are of considerable importance, not least in light of long-term business planning and sustainability modelling.

  • Costs of digitisation: The cost of digitisation includes such elements as the creation of the digital document, the restoration of fragile analogue documents or the costs for maintaining the digital collection over the long term and for updating the locally available expertise. Another important factor is the impact of the resulting digital collection on day-to-day collection management and long-term financial means.
  • Cost of selection: Selection itself involves the considerable effort of creating guidelines and defining criteria, which ideally occurs after detailed analysis of the available materials, and carries costs of physical selection, handling, storage, transport, etc. These costs might limit the amount of material being selected, or they might result in cost-reductive selection of entire series or of more easily accessible and handled materials.
  • Opportunity costs - the cost of loss: For each sum used to digitise a part of the collection, another part will have to remain untouched. Similarly, finances used for digitisation can't be used for new acquisitions. The criterion 'cost of loss' implies weighing the benefits of selecting larger parts of the collection against the significance of documents that are more costly to digitise, and weighing the cost-benefits of digitisation against those of collection care.
  • Cost of metadata: As stated earlier, digitisation and digital archiving require the input of metadata. It may be beneficial to limit the costs of this step by selecting only documents for which metadata are already available, preferably in a digital form easily transferable to the new metadata architecture, or by adjusting the digitisation initiative so that it does not require overly complex metadata.
  • Potential income - economic selection: Digitisation may also be used as a source of income or as a way of raising the collection's profile. In this light, priority could be given to materials that are hoped to pique public interest, or which may be distributed commercially. On a small scale this may take the form of digitising content for a website or to create cards for the gift shop. On a larger scale it may lead to the creation of pay-per-use digital collections.

Brief remarks on the possible implementation of the longlist

It was never the intention of the study to propose exhaustive or rigid rules for the decision-making process, but some general remarks may help in implementing the longlist.15

1. Selecting criteria

The question of which criteria to use is of course primarily determined by the setting in which digitisation takes place. After deciding who is responsible for the selection process (curator, creator, external specialist and/or stakeholders), digitisation initiatives must examine such factors as: the position of the collection(s) in the broader field; existing policy documents; possible end-users and their needs; practical, legal and financial factors; opportunities for collaboration; and, of course, the clearly defined goals underlying the digitisation scheme. These can then be translated into case-specific terms. Generally speaking, in a museum context (leaving aside the mere creation of digital images as part of the registration process), digitisation is most often exploitation-driven: exhibitions, public interaction with the collection, the website, etc. In libraries, and to a lesser degree in archives, digitisation is more likely to be used as a way to expand the availability of documents and reduce the handling of originals. Government-funded institutions may be more likely to rely on heritage-centric, value- and research-driven selection criteria, whereas institutions operating in a more commercially driven environment may be more likely to consider matters such as audience, visibility for investors or expected revenues.

This very brief summary may seem self-evident, yet the study has shown that digitisation is at times still being carried out without a broader understanding of the relationships between digitisation, intended use and the mission or setting of the collection.

2. Clustering

While the final sets of criteria will differ from one initiative to the next, the study suggests that it is possible, with due care, to formulate some broad groupings of criteria that are likely to feature in similar digitisation initiatives. These may simplify the actual decision-making process somewhat. For instance, preservation-driven digitisation appears to rest most fundamentally on criteria regarding physical state and vulnerability, physical and technical accessibility, and the physical impact of use. User-driven digitisation seems to centre on accessibility, availability, use value, metadata and context (in relation to functionality). When digitisation is used to support existing workflows and basic institutional functions, selection will depend more on institutional settings, on the value of documents for the performance of institutional functions and on their value as proof of these functions. Research and education will rely heavily on all types of value criteria (including added value after digitisation) and on the metadata needed to understand the documents. Finally, economic or promotional digitisation is more likely to focus on uniqueness, aesthetics, user interests and legal restrictions on exploitation.

Obviously these examples are highly generalised, and many additional points have to be factored into the final decision-making process. Financial and legal matters, for instance, will be a driving force in virtually any initiative, as will any pre-existing policies and content-driven choices. Yet these general pointers, which stand out from a close reading of selection practices, may provide an initial frame of reference for adopting the longlist.

Benefits and importance of working with uniform criteria

On the basis of the information collected through the study, we can, in closing, formulate a series of more general remarks on what we perceive to be the benefits of working with the suggested list of selection criteria on both a practical and a governance level.

1. Making the valuation process more transparent and simplifying the workflow

By breaking criteria that are often lumped together into multiple facets, and by summarising the main potential selection criteria in a single list, the overview is intended to ease and guide the decision-making process, point out easily overlooked factors that may drive selection, and thereby help avoid unwanted bias. Similarly, the overview highlights reasons for selection that are often applied subconsciously (e.g., accessibility of source documents) or that are easily forgotten (e.g., cost of loss). The overview thus makes the selection process itself more transparent and gives a first indication of the main issues to be examined when defining one's priorities.

On a more practical level, the longlist also allows selection workflows to be streamlined more rapidly. This facilitates the production of selection matrices, which may in turn streamline the actual selection process. Finally, working with a uniform terminology allows the resulting decisions to be communicated more readily, both internally and externally, and enables them to be applied to similar situations more rapidly than is the case with most existing selection guidelines.

2. Greater communication through shared terminology

Selection is an issue faced by institutions across all sectors. However, as stated earlier, the question is often still approached from very specific viewpoints. External communication on the choices made is often limited, and differences in terminology give the impression that the reasons for and forms of selection vary significantly. The longlist clearly shows that institutions in fact face many similar problems when moving their collections into the digital realm, a realisation that may in itself stimulate communication. More importantly, it presents a concrete tool for communication in the form of a case-independent set of terms to define the basic selection decisions.

Adopting a more uniform vocabulary opens possibilities for comparing and discussing the choices being made and the validity of specific selection scenarios. Criteria clusters such as those formulated above are, furthermore, generally applicable beyond the boundaries of the individual digitisation effort. In other words, maintaining similar terminologies and communicating more openly stimulates the sharing of expertise and the creation of a reference base for third parties.

A practical example of how abstracting factual selection decisions, in this case also through clustering, increases transparency and allows cross-institutional comparison can be given by translating the selection criteria expressed by the National Library of Medicine (US) in 2005.16 The Library itself proposed two forms of selection from its collections, linked to access-driven and preservation-driven digitisation respectively. For both forms of digitisation, a set of criteria was defined in initiative-specific terms, e.g., selection from microfilm, selection of milestone works, of US-related materials, of documents in the public domain, etc. Using the terminology of the longlist, it is possible to translate these selection decisions into a more uniform set of criteria. The access-driven cluster can then be seen to consist of content-driven selection (regarding language and subject, among others), metadata and accessibility (only catalogued materials are selected), financial selection, duality across collections (what has already been digitised elsewhere is given a lower priority) and rights (public domain only). The preservation-driven cluster consists of content-driven selection, accessibility and physical state (material in danger of physical decay is prioritised), cost of loss (materials most costly to replace are prioritised), duality within and between collections (neither what has already been done elsewhere nor what the Library has microfilmed is selected), archival design (certain backfiles are not selected) and rights (copyrighted material at risk of loss is still selected).

This brief example illustrates how the longlist can be used first to highlight those elements relevant to the digitisation activity and then to translate these to the local collections. More importantly, when the same terminology is used for both types of digitisation within the Library, the resulting lists of criteria more readily reveal how differences in the purpose of digitisation are reflected in differences in the selection process. Finally, the resulting grouping of criteria can be transferred to similar cases elsewhere, potentially creating greater uniformity and a stronger basis for comparison between digitisation schemes and digital collections.
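To make this comparison concrete, the two National Library of Medicine clusters can be encoded as sets of longlist criteria and compared mechanically. The Python sketch below is purely illustrative: the short criterion labels and the `compare_clusters` helper are hypothetical and belong neither to the longlist nor to the NLM document, and "duality across collections" in the access-driven cluster is assumed here to mean the same as "duality between collections".

```python
# Hypothetical encoding of the NLM (2005) criteria in longlist terms.
# Labels are illustrative shorthand, not official longlist identifiers.
ACCESS_DRIVEN = frozenset({
    "content",
    "metadata",
    "accessibility",
    "financial",
    "duality between collections",  # 'across' read as 'between' (assumption)
    "rights",
})

PRESERVATION_DRIVEN = frozenset({
    "content",
    "accessibility",
    "physical state",
    "cost of loss",
    "duality within collections",
    "duality between collections",
    "archival design",
    "rights",
})


def compare_clusters(first, second):
    """Return the criteria shared by both clusters and those unique to each."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


if __name__ == "__main__":
    result = compare_clusters(ACCESS_DRIVEN, PRESERVATION_DRIVEN)
    for key, criteria in result.items():
        print(f"{key}: {', '.join(criteria)}")
```

Run on the two clusters, the sketch reports content, accessibility, duality between collections and rights as shared; metadata and financial selection as specific to access-driven digitisation; and archival design, cost of loss, duality within collections and physical state as specific to preservation-driven digitisation. A plain set view does flatten nuance: rights appears in both clusters but is applied in opposite directions (public domain only versus copyrighted material at risk still being selected).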

3. Good governance and long-term institutional responsibilities

Digital collections represent only a fraction of the analogue record. Not only does the sheer size of analogue collections make it unlikely that all material will ever become fully available online, but digitisation may also alter the appearance of documents to varying degrees, and it is often neither possible nor deemed feasible to provide the highest-quality digital copies of the originals.17 Thus, as increasing numbers of end-users come to understand collections only through their digital representations, the institutional responsibility to differentiate between the digital documentary reality and the source collection becomes ever more important.

In order for users to understand the relationships between the original documents and the digitised versions they are seeing, it is necessary to clarify which alterations and decisions have taken place and why. This should be seen as part of the primary mission of memory institutions. It may also be argued that memory institutions play a vital role in enhancing users' digital literacy by pointing out how digital information should be interpreted and where possible pitfalls may lie. However, current practices show that communication on these issues is often overlooked, and little is done to stress to users that what isn't available online may still exist (and, incidentally, can be traced through means other than the oft-hoped-for 'Google-like interface').

In addition to this responsibility towards present users, open communication also plays a vital part in good governance by maintaining the authenticity and integrity of documents for future use. Any choice involving the collection and its accessibility should be documented so as to clarify which kinds of alterations collections may have undergone and why.18 Yet in reality these choices, and even the alterations made to digital files, are not always documented, and certainly not externally. An important realisation in this regard is that, given the relatively recent nature of digitisation activities and the great variety in practices, open communication that facilitates dialogue should outweigh any hesitation one might have about publicly exposing what may turn out to be 'the wrong decision'.19 Given the rapidly changing nature of digital life, many of the actions we undertake will be pioneering, and at some point we will undoubtedly regret some of the choices we are now making. Only through openness and dialogue can we hope to anticipate the more predictable outcomes, better evaluate the impact of our decisions and adjust our actions accordingly.20

Clearly defined selection criteria serve as a vital means of explaining these relationships between documents and collections and are therefore an important stimulus towards openness and good governance. They help current and future users correctly interpret what they are seeing and further the general understanding of the decision-making process that gave rise to the digital collection. In terms of institutional responsibility, they define the impact of the institution's choices and the responsibilities it assumes with regard to the presentation of its collections.

Conclusion

On the basis of a close reading of current practices, and out of a conviction that, certainly for now, selection for digitisation must remain a vital part of collection management, this article has proposed a list of 25 criteria that are found to represent the most significant factors by which selection may be carried out. This abstraction is believed to serve several purposes. At the most general level, the list is thought to reflect both the complexity and the commonality of selection choices across institutions and sectors. On a practical level, defining these concerns through a setting-independent vocabulary is hoped to ease the decision-making process and stimulate communication on the issue, a matter of particular importance given the increasingly cross-institutional nature of digital heritage. Finally, such communication on selection decisions is also understood as a vital aspect of good governance and as a basic responsibility of any memory institution.

Acknowledgement

The authors would like to thank Saskia Scheltjens (Ghent University) for her help in fine-tuning an earlier draft of the text of this article.

Notes

1. Following Nail, Fernie (2007).

2. Ralf Goebel, Programme Director, Deutsche Forschungsgemeinschaft, in: Maidment-Otlet (2008), p. 25; Kulturarv (2009); Michalko (2007).

3. LOCKSS (2004); Uricchio (2007), p. 23.

4. Uricchio (2007), p. 24; Maness (2006); Bearman (2007), pp. 26-44.

5. Mackenzie Owen (2007), p. 45.

6. Hedstrom (2003), p. 16; TASI (2007), p. 2; Witthaut (2004), p. 87.

7. E.g., Hazen et al. (1998); Breen et al. (2004); Saskia Scheltjens (Ghent University), personal communication.

8. See for instance the significant differences between Hazen et al. (1998), Ayris (1998), Lee (1999) and Digital Preservation Coalition (2006).

9. Russell (1999); European Commission (2002), p. 206.

10. Piet Creve (AMSAB, Ghent, Belgium), personal communication; Croxford (2003), p. 9; Zorich et al. (2008); Crabtree, Donakowski (2006); van Asseldonk (2008), p. 26.

11. Hakala (2006).

12. Ross, Anderson et al. (2002).

13. As in the case of the Biblioteca Nacional de España: Juan José Alvarez Galán and María Belén Llera Cermeño, personal communications (August 2008).

14. As suggested in Tenopir et al. (2003); Wubs, Huysmans (2006); Holden (2007); Huysmans, De Haan (2007); Sundqvist (2007) and De Haan, Adolfsen (2008).

15. For more detail on both elements, please see the final report.

16. National Library of Medicine (2005).

17. E.g., Ling (2002).

18. Note for instance the prime importance placed on open communication and policy documentation in the NESTOR guidelines for trusted repositories: Dobratz et al. (2006).

19. Davis-Perkins et al. (2005), pp. 280, 284.

20. See, among others, Kenney (2005); Allen (2006); European Commission (2002), p. 206; Reilly (2000), pp. 43-45; Hedstrom (2003), p. 16.

Bibliography

Allen, C.E. (2006) 'Foundations for a Successful Digital Preservation Program: Discussions from Digital Preservation in State Government: Best Practices Exchange 2006,' RLG DigiNews 10/3 <http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070511/viewer/file1724.html>.

Ayris, P. (1998) 'Guidance for Selecting Materials for Digitisation,' Joint RLG and NPO Preservation Conference: Guidelines for Digital Imaging, Warwick <http://eprints.ucl.ac.uk/492/1/paul_ayris3.pdf>.

Bearman, D. (2007) 'Addressing Selection and Digital Preservation as Systemic Problems,' in: Y. de Lusenet and V. Wintermans (Eds.), Preserving the Digital Heritage. Principles and Policies, Amsterdam, 26-44 <http://www.yoladelusenet.nl/yola_de_lusenet_publicaties/publicatielijst_assets/preserving_digital_heritage.pdf>.

Breen, M., G. Flam, et al. (2004) Task Force to Establish Selection Criteria of Analogue and Digital Audio Contents for Transfer to Data Formats for Preservation Purposes, International Association of Sound and Audiovisual Archives (IASA) <http://www.iasa-web.org/downloads/publications/taskforce.pdf>.

Crabtree, J., D. Donakowski (2006) 'Building Relationships. "A Foundation for Digital Archives",' JCDL Workshop 2006. Digital Curation & Trusted Repositories: Seeking Success, Chapel Hill: JCDL <http://www.ils.unc.edu/tibbo/JCDL2006/Crabtree-JCDLWorkshop2006.pdf>.

Croxford, I. (2003) 'Getting Collections Information to New Audiences,' International Cultural Heritage and Informatics Meeting 03, Paris: Archives & Museums Informatics Europe <http://www.ichim.org/ichim03/PDF/092C.pdf>.

Davis-Perkins, V., R. Butterworth, et al. (2005) 'A Study into the Effect of Digitisation Projects on the Management and Stability of Historic Photograph Collections,' Lecture Notes in Computer Science 3652, 278-289 <http://www.dcs.qmul.ac.uk/~pc/publications/2005/ECDLpreprint.pdf>.

De Haan, J., A. Adolfsen (2008) De virtuele cultuurbezoeker. Publieke belangstelling voor cultuurwebsites, Den Haag: Sociaal en Cultureel Planbureau <http://www.scp.nl/dsresource?objectid=19697&type=org>.

Digital Preservation Coalition (2006) 'Interactive Assessment: Decision Tree for Selection of Materials for Long-Term Retention,' in: Digital Preservation Coalition (Ed.), Preservation Management of Digital Materials. The Handbook, s.l. <http://www.dpconline.org/graphics/handbook/dec-tree.html>.

Dobratz, S., A. Hänger, et al. (2006) Catalogue of Criteria for Trusted Digital Repositories. Version 1 (Draft for Public Comment) [Nestor - materials 8], Frankfurt am Main: Network of Expertise in long-term STORage (NESTOR) Working Group on Trusted Repositories Certification <http://edoc.hu-berlin.de/series/nestor-materialien/8en/PDF/8en.pdf>.

European Commission (2002) The DigiCULT Report. Technological Landscapes for Tomorrow's Cultural Economy. Unlocking the Value of Cultural Heritage, Luxembourg: DigiCULT <http://digicult.salzburgresearch.at/>.

Hakala, P. (2006) Digital Material in European National Archives, Helsinki: European Board of National Archivists (EBNA) and The National Archives of Finland <http://www.narc.fi/EBNA/docs/EBNA-digireport.pdf>.

Hazen, D., J. Horrell, J. Merrill-Oldham (1998) Selecting Research Collections for Digitization, Council on Library and Information Resources (CLIR) <http://www.clir.org/pubs/reports/hazen/pub74.html>.

Hedstrom, M. (2003) It's About Time: Research Challenges in Digital Archiving and Long-Term Preservation. Final Report. Workshop on Research Challenges in Digital Archiving and Long-Term Preservation, April 12-13, 2002, s.l.: The National Science Foundation and The Library of Congress <http://www.si.umich.edu/digarch/NSF%200915031.pdf>.

Holden, J. (2007) Logging On. Culture, Participation and the Web, London: Demos <http://www.demos.co.uk/files/Logging%20On%20-%20web.pdf>.

Huysmans, F., J. De Haan (2007) Het bereik van het verleden. Ontwikkelingen in de belangstelling voor cultureel erfgoed. Het culturele draagvlak Deel 7, Den Haag: Sociaal en Cultureel Planbureau <http://www.scp.nl/dsresource?objectid=19473&type=org>.

Kenney, A.R. (2005) 'Developing Digital Preservation Programs: The Cornell Survey of Institutional Readiness, 2003-2005,' RLG DigiNews 9/4 <http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070519/viewer/file1088.html>.

Kulturarv (2009) Kulturarv. The Heritage Agency of Denmark <http://www.kulturarv.dk/english/index.jsp>.

Lee, S.D. (1999) 'Decision Matrix for Proposed Digitization Projects,' in: Scoping the Future of the University of Oxford's Digital Library Collections. Final Report, University of Oxford <http://www.bodley.ox.ac.uk/scoping/report.html>.

Ling, T. (2002) 'Why the Archives Introduced Digitisation on Demand,' RLG DigiNews 6/4 <http://worldcat.org/arcviewer/1/OCC/2007/08/08/0000070519/viewer/file2881.html>.

LOCKSS (2004) Libraries in the Digital Age. Collection & Preservation for Generational Access. The LOCKSS Program, Asia: Stanford University Libraries <http://itweb.lib.ru.ac.th/online2004/asia2004talks1.ppt>.

Mackenzie Owen, J. (2007) 'Preserving the Digital Heritage: Roles and Responsibilities for Heritage Repositories,' in: Y. de Lusenet, V. Wintermans (Eds.), Preserving the Digital Heritage. Principles and Policies, Amsterdam, 45-49 <http://www.yoladelusenet.nl/yola_de_lusenet_publicaties/publicatielijst_assets/preserving_digital_heritage.pdf>.

Maidment-Otlet, D. (2008) JISC Digitisation Conference, Cardiff, 20-21 July 2007, Cardiff: JISC <http://www.jisc.ac.uk/media/documents/publications/digi_conference_report-v1-final.pdf>.

Maness, J.M. (2006) 'Library 2.0 Theory: Web 2.0 and Its Implications for Libraries,' Webology 3/2 <http://www.webology.ir/2006/v3n2/a25.html>.

Michalko, J. (2007) 'Mass Digitization and Cultural Heritage: Imperative and Opportunity,' Dutch Digital Heritage Conference, 12 December 2007, Rotterdam: OCLC/RLG <http://www.den.nl/getasset.aspx?id=Conferentie2007/k1michalko.pdf&assettype=attachments>.

Nail, M., K. Fernie (2007) 'Frequently Asked Questions,' in: MICHAEL (Ed.), MICHAEL-UK Collection Description Manual, s.l.: The Museums, Library and Archives Council, 67-70 <http://www.michael-culture.eu/technology/collectiondescriptionmanual/MICHAEL-UK_CDManual_v2.pdf>.

National Library of Medicine (2005) Selection Criteria for Digital Reformatting, National Library of Medicine <http://www.nlm.nih.gov/psd/pcm/digitizationcriteria.pdf>.

Reilly, B. (2000) 'Museum Collections Online,' in: Council on Library and Information Resources (CLIR), Collections, Content, and the Web, Washington D.C.: CLIR, 40-47 <http://www.clir.org/pubs/reports/pub88/pub88.pdf>.

Ross, S., I. Anderson, et al. (2002) Ninch Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. III Selecting Materials: An Iterative Process, Washington DC: HATII, University of Glasgow and National Initiative for a Networked Cultural Heritage (NINCH), pp. 38-60 <http://www.nyu.edu/its/humanities/ninchguide/III/>.

Russell, K. (1999) Why Can't We Preserve Everything? Selection Issues for the Preservation of Digital Materials. Debate and Discussion at the Cedars Project Advisory Board Meeting, St. Pancras: Cedars <http://www.webarchive.org.uk/wayback/archive/20050410120000/http://www.leeds.ac.uk/cedars/colman/ABS01.html>.

Sundqvist, A. (2007) 'The Use of Records - a Literature Review,' Archives & Social Studies: a Journal of Interdisciplinary Research 1/1, 623-653 <http://socialstudies.cartagena.es/images/PDF/no1/sundqvist_use.pdf>.

TASI (2007) Advice Paper. Selection Procedures, Technical Advisory Service for Images (now JISC Digital Media) <http://www.jiscdigitalmedia.ac.uk/advice/creating/pdf/selecpro.pdf>.

Tenopir, C., B. Hitchcock, A. Pillow (2003) Use and Users of Electronic Library Resources: An Overview and Analysis of Recent Research Studies, Washington D.C.: Council on Library and Information Resources (CLIR) <http://www.clir.org/pubs/reports/pub120/pub120.pdf>.

Uricchio, W. (2007) 'Moving Beyond the Artifact: Lessons from Participatory Culture,' in: Y. de Lusenet, V. Wintermans (Eds.), Preserving the Digital Heritage. Principles and Policies. Selected Papers of the International Conference Organized by Netherlands National Commission for Unesco, KB, (Den Haag, 4-5 November 2005), Amsterdam, 15-25 <http://www.yoladelusenet.nl/yola_de_lusenet_publicaties/publicatielijst_assets/preserving_digital_heritage.pdf>.

van Asseldonk, N. (2008) 'Erfgoeddata in Nieuwe Samenhang,' InformatieProfessional 11, 24-27.

Witthaut, D., A. Zierer, et al. (2004) Digitalisierung Und Erhalt Von Digitalisaten in Deutschen Museen [Nestor - materialien 2], s.l.: Nestor <http://www.langzeitarchivierung.de/downloads/mat/nestor_mat_02.pdf>.

Wubs, H., F. Huysmans (2006) Snuffelen en graven. Over doelgroepen van digitaal toegankelijke archieven, Den Haag: Sociaal en Cultureel Planbureau <http://www.scp.nl/dsresource?objectid=20623&type=org>.

Zorich, D.M., G. Waibel, R. Erway (2008) Beyond the Silos of the Lams. Collaboration among Libraries, Archives and Museums, Dublin, Ohio: Online Computer Library Center (OCLC) <http://www.oclc.org/programs/publications/reports/2008-05.pdf>.

Copyright © 2009 Bart Ooghe and Dries Moreels

doi:10.1045/september2009-ooghe