Preamble
This resource outlines the procedures for scanning two-dimensional documents in line with best practice and with national and international standards for the quality reproduction of documents.
Selection
Documents are selected on the basis of the selection criteria defined by the project, paying particular attention to legal issues (laws on copyright, privacy...). Any concerns in this respect must be referred to legal counsel.
The selection criteria generally consider:
- historical and cultural value
- uniqueness and rarity
- high demand
- absence of legal constraints, or digitisation permissions already obtained
- restricted access to the original owing to its condition, value or location
- value added through online access, the creation of virtual collections, increased interest in little known or unknown material
In some cases, it may be useful to carry out an inventory of the documents to identify their quantity, type, size and state of preservation. This information may be used for subsequent conservation, cataloguing and digitisation activities.
Legal aspects
When digitising documents, serious attention must be given to issues concerning copyright, in respect both of original material and of digital resources.
Points to examine are: characteristics of the work to be processed, rights ownership (who owns the rights – is the work protected – what type of protection?), the actions to be performed on the work (what are they – what rights are involved – has authorization been obtained?), likely critical areas and possible solutions.
Works subject to copyright must be excluded, as must works already digitised in other collections and publicly accessible on the web; the latter are excluded to avoid duplication and minimise costs.
Preservation of items
Digitisation is no substitute for commitment to care and preservation of original documents.
It is important to assess the state of preservation of original documents before proceeding with digitisation, and to ensure that any treatment of original specimens is carried out only after they have been inspected by experts.
For more information
Northeast Document Conservation Center (NEDCC) – Preservation leaflets
IFLA Principles for the care and handling of library material
The British Library, Preservation Advisory Centre - Photographic material
The Library of Congress – Collections care
Digitisation
Digitisation is the process of transformation/conversion of an analogue object (text, image, audio, video) into a digital format, interpretable by a computer.
The nature and size of the originals determine the choice of the recording system, the lighting system and methods of treatment (transport, opening of the volumes, handling).
The quality of images defined in the project determines the hardware and the recording software requirements, the acquisition times and image processing, and the memory usage in the storage media to manage and maintain.
In-house or outsourced digitisation
The choice of digitisation within the institution (in-house) or the use of outside services (outsourcing) has to consider the advantages and disadvantages of the two methods.
 | In-house | Outsourcing |
---|---|---|
Advantages | | |
Disadvantages | | |
Recommendations | The in-house service is recommended if: | Outsourcing is recommended if: |
Outsourced digitisation can be performed in the premises of the library or at the selected company’s location.
The flow of outsourced digitisation activities includes:
- definition of the scanning parameters
- preparation of a market study or a tender
- examination of the technical and logistical aspects
- arrangement of the digitisation set
- preparation of documents
- training of staff and operators involved for quality control
- creation of a prototype
- digitisation
- quality control
- relocation of documents
- product delivery
The flow of in-house digitisation activities includes:
- definition of the scanning parameters
- purchase of equipment
- training of staff and operators involved
- examination of the technical and logistical aspects
- arrangement of the digitisation set
- preparation of documents
- creation of a prototype
- digitisation
- quality control
- relocation of documents
Choice of equipment
The data acquisition system (light source, optics, sensor, capture and calibration software) should ensure the image quality required by the project and not damage the original documents.
In particular, the lighting system must be cold-light without emission of UV and IR.
For ancient or valuable documents, suitable supports must be used so as not to damage the document (with the surface to be scanned facing upwards, and using a tilting platform or V support).
There are some general indications on scanning systems:
- Flatbed scanner: for single-sheet documents, or bound documents that can be opened easily, of A3 size or smaller. These documents include: printed materials (e.g. leaflets, posters, brochures), manuscripts (e.g. letters), maps in good condition, printed music, prints (e.g. engravings, etchings, lithographs), pen and ink drawings without added watercolour or gouache (e.g. cartoons), photographic material (e.g. gelatin prints in black and white and in colour, albumen prints).
- Film scanner: for negatives and slides.
- Planetary scanner or digital camera: for bound documents, documents of a particular nature, documents larger than A3 size. These documents include: bound volumes (e.g. books, albums, printed music, atlases), fragile documents, oil paintings, most works of art on paper (e.g. watercolours, drawings), graphic material and artworks made with flaked and friable substances (e.g. crayons, charcoal, soft pencil), watercolours with thick drafting, tempera or with paints, large or fragile maps, manuscripts (e.g. bound diaries, folded documents), parchments, photographic material (e.g. large prints, historical photographic processes, such as daguerreotypes and ambrotypes), three-dimensional material (e.g. textiles, sculptures, objects).
Digital Acquisition
The result of digitisation is the creation of files intended for long-term storage, “master” files, and files resulting from further processing, “derived” files, intended for use by users, typically via the Web.
The master file (“preservation master file” or “archival master file”) is the file that represents the best-copy output from digitisation, where “best” means that it meets the objectives of a particular project. These objectives may vary depending on the type of document. The criteria to be used in creating the master file must ensure faithful reproduction of the document in view of its long-term digital preservation or the need for high-quality printing, ensuring that there be no need to repeat the digitisation in the future.
Derivative files are produced from the master file and optimised for different uses, for example display in a browser, conversion to text via OCR, or viewing on a dedicated workstation. They are normally resized and compressed, even with loss of information (e.g. JPEG images, MP3 audio), so that they are more convenient to use without excessive loss of quality.
Below are guidelines for the digitisation of image files, i.e. the product of the digitisation of text, graphic or three-dimensional documents.
Image files
The following specifications are to be taken as general guidelines, to be tailored in each case to achieve the best compromise between quality and cost. High-quality images, in terms of both resolution and colour depth, also imply higher costs of acquisition (equipment and qualified personnel) and of management (size of the files to be kept). On the other hand, the digital parameters chosen must be sufficient to reproduce the level of detail of the document faithfully.
The sampling density, i.e. the number of pixels per unit of length, must therefore be assessed not only on the basis of the size of the document, but also on the basis of the importance of the original document and the available resources.
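As a rough illustration of how sampling density drives storage cost, the following sketch estimates the pixel dimensions and the uncompressed size of a scan. The function names, the A4 page and the 300 ppi and 24-bit values are illustrative assumptions only, not project recommendations.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, ppi):
    """Return (width, height) in pixels for a document scanned at `ppi` pixels per inch."""
    return (
        round(width_mm / MM_PER_INCH * ppi),
        round(height_mm / MM_PER_INCH * ppi),
    )


def uncompressed_size_bytes(width_px, height_px, bits_per_pixel=24):
    """Return the size of the raw pixel data, ignoring headers and compression."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical example: an A4 page (210 x 297 mm) at 300 ppi, 24-bit colour.
width_px, height_px = pixel_dimensions(210, 297, 300)
print(width_px, height_px)  # 2480 3508
print(uncompressed_size_bytes(width_px, height_px) / 1_000_000, "MB")
```

Doubling the sampling density quadruples the number of pixels, which is why the choice has to be weighed against the available storage and budget.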
Master File
- The image is archived as it has been captured by the scanning instrument.
- The document must be taken in its entirety. Around the document, it is necessary to leave a border of a few millimetres in order to make it possible to read the contours of the document.
- For books, an image file is produced for each side (recto and verso) of every leaf, including flyleaves and blank pages even where they carry no information, and for all parts of the binding: endpapers, spine and textblock (in order to show headbands, clasps, hinges and borders). For maps and archive material, the verso is scanned only if it carries information.
- If the original is mounted on a support which contains information (e.g. a photograph mounted on cardboard with the photographer's trademark), digitisation must also include the support.
- Each document must be scanned alongside a colour scale, a greyscale and a metric scale, placed outside the reproduced document but within the overall frame. In the case of a volume, it is sufficient to include the scales once, on a single leaf or page (which is then scanned twice, once with the scales and once without).
- Where scratches, wormholes or oxidation of the inks have perforated the paper, a sheet of white paper must be placed behind the leaf so that the underlying content is not captured.
Derivative files
Chromatic scales, greyscales and metric scales should be removed from derivative files.
Derivative files must be balanced for brightness, contrast and saturation in order to correct any colour deviations due to the conditions of capture, using samples taken from the colour scales and greyscales. This balancing should aim at faithful reproduction of the original colour characteristics, not at an arbitrary aesthetic improvement.
For the technical specifications of master files and derivative files, see the charts on pages 9, 10 and 11 of Guidelines on Digitisation on Phaidra.
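One simple way to derive a correction from a greyscale sample is per-channel scaling: measure a neutral patch in the scan, compare it with its known reference value, and apply the resulting gains to every pixel. The sketch below is a simplified illustration that assumes 8-bit RGB values and ignores colour profiles and gamma; the function names and patch values are hypothetical, and a production workflow would normally rely on colour-management software.

```python
def channel_gains(measured_patch, reference_patch):
    """Return per-channel gains that map the measured neutral patch onto its reference."""
    if any(value <= 0 for value in measured_patch):
        raise ValueError("measured patch values must be positive")
    return tuple(ref / meas for meas, ref in zip(measured_patch, reference_patch))


def correct_pixel(pixel, gains):
    """Apply the gains to one RGB pixel, clipping the result to the 0-255 range."""
    return tuple(min(255, max(0, round(value * gain))) for value, gain in zip(pixel, gains))


# Hypothetical measurement: the grey patch was captured with a warm cast.
gains = channel_gains(measured_patch=(200, 190, 180), reference_patch=(190, 190, 190))
print(correct_pixel((200, 190, 180), gains))  # (190, 190, 190)
```

Because the gains come from a measured reference rather than from visual preference, the correction stays tied to the original and avoids arbitrary aesthetic changes.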
File Names
In general, the name of each file is a character string composed of several parts, containing the information necessary to uniquely identify the project document to which the image refers. File names are completed with the appropriate extension (tif, jpg, pdf, xml).
In mass storage, image files will be organised in multiple folders, in order to preserve the overall ordering of materials.
The nomenclature of the folders and files is a string of fields (library code, shelf mark...) separated by a hyphen (-). Where the shelf mark contains a hyphen (-), spaces or special characters, they are replaced by a dot (.).
For graphic material and archive material scanned on both sides, the progressive numbering of the files is followed by “-r” for the recto and “-v” for the verso.
For books, front and back covers are named so that they occur in the same order they have in the physical document. The spine or other parts of the original document (textblocks, binding details ...) must be included at the end.
The image that includes the colour scale, the greyscale and the metric scale, must be named so that it is the last file in the folder and a “-c” is added to the progressive numbering of the file.
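The naming rules above can be sketched as a small helper. The field order (library code, shelf mark, progressive number), the four-digit zero padding, the collapsing of consecutive special characters into a single dot, and the sample values are assumptions made for illustration; the project's own guidelines take precedence.

```python
import re


def sanitise_field(value):
    """Replace hyphens, spaces and special characters with a dot (runs are collapsed)."""
    return re.sub(r"[^A-Za-z0-9]+", ".", value.strip()).strip(".")


def build_file_name(library_code, shelf_mark, sequence, side="",
                    has_scales=False, extension="tif", digits=4):
    """Build a file name from hyphen-separated fields.

    side: "" for none, "r" for recto, "v" for verso.
    has_scales: True for the image that includes the colour, grey and metric scales.
    """
    if side not in ("", "r", "v"):
        raise ValueError("side must be '', 'r' or 'v'")
    name = "-".join([
        sanitise_field(library_code),
        sanitise_field(shelf_mark),
        f"{sequence:0{digits}d}",
    ])
    if side:
        name += f"-{side}"
    if has_scales:
        name += "-c"
    return f"{name}.{extension}"


# Hypothetical library code and shelf mark.
print(build_file_name("BUP", "Ms. 12-A", 3, side="r"))  # BUP-Ms.12.A-0003-r.tif
```

Zero-padding the progressive number keeps alphabetical and physical order aligned when files are listed in a folder, and giving the scale image the highest number places it last.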
For more information, see pages 12, 13, 14 and 15 of Guidelines on Digitisation on Phaidra.
Data storage and conservation
The image collection consisting of folders and files will be stored on optical or magnetic storage media, such as CDs, DVDs, and external hard drives.
It is recommended to store the data on two different media, of different brands or different series, to keep the media in two separate locations, to verify the data periodically, and to transfer the data periodically to new media.
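Periodic verification is often done by comparing checksums, although the guidelines do not prescribe a method. The sketch below assumes SHA-256 and two copies with an identical folder layout; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_with_manifest(manifest, copy_root):
    """Return (relative_path, problem) pairs for files missing or changed in the copy."""
    copy_root = Path(copy_root)
    problems = []
    for relative_path, expected in manifest.items():
        candidate = copy_root / relative_path
        if not candidate.is_file():
            problems.append((relative_path, "missing"))
        elif file_checksum(candidate) != expected:
            problems.append((relative_path, "changed"))
    return problems
```

A manifest built when the files are first written can be stored alongside each copy and rechecked at every verification or media migration; an empty result means that every listed file is present and unchanged.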
The lifespan of the storage media is affected by various factors (the ISO standards 18923:2000 and 18925:2013 indicate the parameters for the proper maintenance of the storage media).
It is essential to maintain digital assets created over time in order to avoid repeating the costly work of scanning, so procedures must be put in place to ensure that digital objects remain usable and accessible regardless of future changes in technology.
The usability and accessibility of digital objects over time is guaranteed by file format (format standard, file size, network transmission time, how the images are displayed...), by media storage and by the digital repository. It is essential to use open standards to facilitate interoperability with other systems and thus access to metadata through other service providers (e.g. Europeana).
Quality control
Quality control aims to ensure good on-screen readability of all the information content present in the original; it should be documented and maintained throughout the digitisation process. In addition to on-screen checks, print tests can be useful to verify image quality on paper.
Quality control planning includes:
- proper preparation of the environment (hardware configuration, visualisation software, viewing conditions, etc.)
- a priori definition of “acceptable” and “unacceptable” characteristics
- verification mode (every product or a sample, all files or only the master files, on-screen visual quality and print quality, etc.)
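Where the verification mode is a sample rather than every product, the sample can be drawn reproducibly so that the check can be documented and repeated. This is a minimal sketch; the 25% fraction, the seed and the file names are illustrative assumptions, and the acceptable sample size is a project decision.

```python
import math
import random


def select_qc_sample(file_names, fraction=0.1, seed=0):
    """Return a reproducible random sample of file names for visual inspection."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    names = sorted(file_names)
    if not names:
        return []
    sample_size = max(1, math.ceil(len(names) * fraction))
    # A fixed seed makes the selection repeatable for the quality control record.
    generator = random.Random(seed)
    return sorted(generator.sample(names, sample_size))


# Hypothetical batch of 40 master files, of which a quarter is inspected.
batch = [f"BUP-Ms.12.A-{number:04d}.tif" for number in range(1, 41)]
print(len(select_qc_sample(batch, fraction=0.25, seed=1)))  # 10
```

Recording the seed and the fraction together with the inspection results allows the same files to be identified again later.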
The visual inspection of an image usually involves:
- correctness of framing and exposure, the absence of any deformation and/or optical aberrations
- control of the chromatic tolerance
- bit depth and colour profile
- digital size and format
- the presence of any elements which compromise the fidelity of the reproduction (light reflections, etc.)
- file name
For more information
Besser, Howard - Introduction to Imaging
Cornell University Library - Digital preservation management resource
Cornell University Library - Moving theory into practice: digital imaging tutorial
Digital Library Federation (DLF) - Draft benchmark for digital reproductions of printed books and serial publications
Digital Library Federation (DLF) - Guides to quality in visual resource imaging
Federal Agencies Digital Guidelines Initiative (FADGI), Still Image Working Group - Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files
Metadata
Metadata is structured information relating to any type of resource, used to identify, describe, manage or allow access to the resource in question.
There is no metadata standard that meets all the needs of all types of collections and repositories.
Generally considered, metadata models include the following information:
- Descriptive metadata: data describing the content of a resource and allowing its retrieval
- Administrative metadata: data containing information on the management and administration of a resource (e.g. rights management, preservation metadata, technical metadata)
- Structural metadata: data describing the relations between digital objects (e.g. page order in a digitised book)
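To make the three categories concrete, the sketch below groups them in a single record. The class, the field names and all values are hypothetical; the descriptive keys borrow element names from Dublin Core (title, creator, date) purely as an example and do not represent a required schema.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectRecord:
    descriptive: dict      # what the resource is, used for retrieval
    administrative: dict   # rights, technical and preservation information
    structural: list = field(default_factory=list)  # files in reading order


def page_order(record):
    """Return (page_number, file_name) pairs derived from the structural metadata."""
    return list(enumerate(record.structural, start=1))


# Hypothetical record for a two-page digitised item.
record = DigitalObjectRecord(
    descriptive={"title": "Sample manuscript", "creator": "Unknown", "date": "1750"},
    administrative={"rights": "Public domain", "format": "image/tiff"},
    structural=["BUP-Ms.12.A-0001.tif", "BUP-Ms.12.A-0002.tif"],
)
print(page_order(record))
```

Keeping the three groups separate makes it easier to map each one to whichever standard the project eventually selects.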
(From the Good Practices Handbook, edited by the Minerva Working Group 6)
“Appropriate Meta-data Standards
Issue Definition
Before selecting a meta-data model for a digitisation project, the material to be described with the meta-data should be reviewed. This will help to identify existing meta-data models, as well as to pinpoint any omissions or gaps between what is covered by an existing meta-data model and the important meta-data for your project.
Pragmatic Suggestions
The use of appropriate meta-data is very important for enabling search and retrieval of material from digital collections. This is even more the case when searching across multiple collections, stored in different locations, is the overall objective (logical union catalogues, virtual combined museums, etc.).
There exist already many meta-data models. Therefore, each project has to choose a meta-data model based on its own goals. It is advisable to avoid creating a new one, unless the requirements of your project are badly underserved by all existing standards.
Time spent modelling the important characteristics of the material being digitised, and identifying its key attributes and descriptors will be well invested. Such a model can then be compared with the scope and features of existing meta-data models.
Possible controlled vocabularies (e.g. to describe a location, or an artist) should be identified. Several such vocabularies already exist and can greatly increase the success of searches, etc. See the section on meta-data standards and controlled vocabularies, below, for details.”
For more information
M. Baca (ed.) - Introduction to metadata
Dublin Core Metadata Element Set, Version 1.1
National Information Standards Organization (NISO) - Understanding metadata