Digitization Services and Capabilities
Digitization of Source Material
We maintain equipment and staff to perform several kinds of digital imaging. We can scan photographic prints, graphic material (such as maps, manuscripts, graphic artwork, music scores, and postcards), and books (bound or disbound), up to a size of 34.25 inches high by 49 inches wide. We also have equipment to digitize photographic negatives, transparencies, and 35mm slides.
We have experience working with service vendors to manage high-volume book digitization (including bound volumes), as well as the digitization of large-format graphics and audio-visual material.
We maintain an up-to-date knowledge of metadata schemes and best practices used by the digital library community. We can assist partners with selecting the most appropriate metadata scheme to use for a particular project. We can also assist in creating basic tools for metadata gathering, such as spreadsheet or database templates.
Data Transformation and Automated Markup
Metadata must often be transformed from one form to another in order to create rich digital objects for online display. We have experience with automated transformation of data formats and generating XML markup from pre-collected metadata.
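As a rough illustration of generating XML markup from pre-collected metadata, the sketch below converts spreadsheet rows (exported as CSV) into simple XML records. The column names, element names, and sample values are hypothetical placeholders, not the schema used for any of our collections; a real project would follow the metadata scheme selected for it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export; the columns and values are placeholders.
SAMPLE_CSV = """identifier,title,creator,date
img001,Sample photograph A,Sample Creator,1905
img002,Sample photograph B,,1912
"""

def rows_to_xml(csv_text: str) -> str:
    """Convert each CSV row into a <record> element with one child per column."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The identifier column becomes an attribute on the record element.
        record = ET.SubElement(root, "record", id=row["identifier"])
        for column, value in row.items():
            if column == "identifier" or not value:
                continue  # skip the identifier column and any blank cells
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(rows_to_xml(SAMPLE_CSV))
```

Blank cells are skipped rather than written as empty elements, which is one possible design choice; some schemes may instead require an explicit empty or "unknown" value.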
Open Archives Initiative
Metadata from collections we host can be shared using the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH). The OAI-PMH uses the Dublin Core metadata scheme as its baseline record format; the ULS can assist in the mapping process if a different metadata scheme is in use.
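The mapping step can be pictured with a minimal sketch like the one below, which crosswalks a locally described record to an unqualified Dublin Core (oai_dc) record and builds a standard OAI-PMH ListRecords request URL. The local field names, the crosswalk table, and the repository address are assumptions made for illustration; only the OAI-PMH verb, the oai_dc metadata prefix, and the two namespace URIs come from the published protocol and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical crosswalk from local field names to Dublin Core elements.
FIELD_TO_DC = {
    "caption": "title",
    "photographer": "creator",
    "year_taken": "date",
    "call_number": "identifier",
}

def to_oai_dc(local_record: dict) -> ET.Element:
    """Build an oai_dc:dc element from a record that uses local field names."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for local_field, value in local_record.items():
        dc_name = FIELD_TO_DC.get(local_field)
        if dc_name and value:  # unmapped or empty fields are left out
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(value)
    return root

def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

if __name__ == "__main__":
    record = to_oai_dc({"caption": "Sample caption", "photographer": "Sample Name"})
    print(ET.tostring(record, encoding="unicode"))
    # repository.example.org is a placeholder, not a real harvesting endpoint.
    print(list_records_url("https://repository.example.org/oai", "photos"))
```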
Optical Character Recognition (OCR)
We use text recognition software known as PrimeRecognition. It combines the output of five OCR products to produce highly accurate results when creating searchable electronic text from scanned page images.
We license an XML-aware search engine (XPAT) from the University of Michigan, which supports the indexing and searching of full-text collections.
Online Access to Digitized Research Collections
Text collections generally consist of monographs or serials for which online page-turning and full-text searching are desired. At minimum, a text collection requires a scanned image of every page in an item and a bibliographic record for the item. However, much of the strength of a collection depends on a full-text index, which can be created either by extracting text via optical character recognition (OCR) or by otherwise transcribing or capturing the printed text. Text collections can also accommodate structural metadata, such as chapter or article information.
- Historic Pittsburgh Full-Text Collection (http://digital.library.pitt.edu/fulltext/)
- University of Pittsburgh Press Digital Editions (http://digital.library.pitt.edu/p/pittpress/)
- Documenting Pitt Text Collections (http://digital.library.pitt.edu/d/documentingpitt/)
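The parts of a text collection item described above (page images, captured text, and structural metadata) could be modeled roughly as follows. This is only a sketch with hypothetical class and field names; the simple substring search stands in for a real full-text index and does not represent how XPAT or our delivery system works.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Page:
    sequence: int        # position of the page image within the item
    image_file: str      # hypothetical file name of the scanned image
    ocr_text: str = ""   # text captured by OCR or transcription, if any

@dataclass
class TextItem:
    title: str           # stands in for the full bibliographic record
    pages: List[Page] = field(default_factory=list)
    # Structural metadata: chapter title -> sequence number of its first page
    chapters: Dict[str, int] = field(default_factory=dict)

    def search(self, term: str) -> List[int]:
        """Return sequence numbers of pages whose text contains the term."""
        needle = term.lower()
        return [p.sequence for p in self.pages if needle in p.ocr_text.lower()]

if __name__ == "__main__":
    item = TextItem(
        title="Sample volume",
        pages=[
            Page(1, "0001.tif", "Preface to the volume"),
            Page(2, "0002.tif", "Chapter one begins"),
            Page(3, "0003.tif"),  # page image with no captured text
        ],
        chapters={"Chapter 1": 2},
    )
    print(item.search("chapter"))
```

A page with no captured text can still be displayed for page-turning; it simply never matches a search, which is why the full-text layer adds so much to a collection.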
Image collections consist of photographs or graphic images with accompanying descriptive records. The images are frequently high-resolution grayscale or color images, converted into presentation formats to allow for dynamic viewing online. Descriptive records are searchable and are stored in a simple fielded database format.
- Historic Pittsburgh Image Collections (http://images.library.pitt.edu/pghphotos)
- Audubon’s Birds of America (http://digital.library.pitt.edu/a/audubon/)
- Chartres: Cathedral of Notre-Dame (http://images.library.pitt.edu/c/chartres/)
- Visuals for Foreign Language Instruction (http://images.library.pitt.edu/v/visuals)
If you are interested in partnering with us to digitize a research collection, please contact us with your idea.