Building the Digital Collection
The Bancroft Library received an award in the 1998/99 round of the Library of Congress / Ameritech National Library Competition to support the digitization of materials related to The Chinese in California, 1850-1925. Additional partners in this project were The California Historical Society, San Francisco, and The Ethnic Studies Library, University of California, Berkeley. The materials proposed for digitization reflected a broad range of formats and topics illustrating the complex history of the Chinese/Chinese American experience in California.

Selecting the Source Materials
The Chinese in California, 1850-1925 virtual archive consists of selected materials from a number of diverse collections at The University of California at Berkeley's Bancroft Library and Ethnic Studies Library and collections held by the California Historical Society.


'Dupon Gai' Dupont Street now Grant Ave, S.F., The Bancroft Library

The materials selected for this project document nineteenth- and early twentieth-century Chinese immigration to and settlement in California. Because of the complexity of the social and political history of the time, the presentation of the materials is organized thematically. In selecting the material and constructing this digital archive, it became apparent that much of the material reflected an outsider's view of the Chinese communities.

Information on the Chinese in the 19th and early 20th centuries comes to us via books, periodicals, newspapers, and other published records, as well as through original source documentation such as manuscripts, photographs, drawings, and other pictorial materials. These materials are often filled with caricatures and derogatory designations. Yet these sources are often used even today because of the scarcity of written documentation on certain aspects of Chinese American history.

These documents tell us the history of what immigrants faced coming to the American West and the inter-ethnic tensions that were present. But they can also document the specific contributions of the Chinese to commerce, architecture, and cultural and social life. Moreover, a survey of materials at the three repositories revealed a visual and textual trail that reflected the Chinese point of view.

This was especially true of materials from the Chinese American Archives at the Ethnic Studies Library, which reflect the political, economic, social, and cultural life of the Chinese communities in California. But much rich resource material is also available at both The Bancroft Library and The California Historical Society that sheds light on the Chinese/Chinese American point of view on the many challenges and struggles they encountered in California, as well as their reflections on achievements and accomplishments.

Intellectual Access to the Collection
The UC Berkeley Library’s Project Control Database was designed as a reusable tool that curators could use to build digitized collections of diverse archival materials such as photographs, drawings, letters, manuscripts, and books. It has been used in numerous digitization projects, including the NEH-funded Making of America II, the LSTA-funded Cased Photographs projects, the IMLS Museums in the Online Archive of California (MOAC), and, currently, the Library of Congress-funded California Cultures Project.

The project control database manages the process of creating digital objects. It holds intellectual description and access information (the database is designed to accommodate all major descriptive standards currently in use in digital projects) and correlates that information with text and image files, provides the necessary structural metadata, and records image capture data.

Information recorded includes fields for identification of the item (collection name, call number, series name or sub-collection number, shelving location, item identification by volume, container, folder, and item numbers, and caption) and for the digital file (format, resolution and dimensions, and file location).
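The field groups listed above can be pictured as two small record types. The following is only an illustrative sketch in Python: the class names, attribute names, and sample values are hypothetical and do not reproduce the actual schema of the Project Control Database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemIdentification:
    """Identification fields for one physical item (names are hypothetical)."""
    collection_name: str
    call_number: str
    series: Optional[str] = None             # series name or sub-collection number
    shelving_location: Optional[str] = None
    volume: Optional[str] = None
    container: Optional[str] = None
    folder: Optional[str] = None
    item_number: Optional[str] = None
    caption: str = ""


@dataclass
class DigitalFile:
    """Digital file fields: format, resolution and dimensions, location."""
    file_format: str
    ppi: int
    width_px: int
    height_px: int
    location: str


# Placeholder values only; none of these come from the real database.
item = ItemIdentification(
    collection_name="Example collection",
    call_number="EXAMPLE 0000",
    container="1",
    folder="2",
    item_number="3",
    caption="Example caption",
)
master = DigitalFile("TIFF", 600, 5100, 6600, "masters/example_A.tiff")
```

Optional fields default to None so that a record for an unbound photograph, which has no volume number, can share the same structure as a record for a page in a bound volume.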

There is a one-to-many relationship between an item and its electronic documents and files. At the end of the process, Perl scripts automatically generate the EAD container listings of finding aids from information in the database. The database also automatically binds multi-part digital objects together into XML objects encoded according to the Metadata Encoding and Transmission Standard (METS). It tracks all of the administrative metadata for the images, storing important information about how, when, and with what methods the archival images and their derivatives were processed. The database accommodates different image processing workflows (flatbed scanner, multipage scanner, and digital camera) as well as the workflow for reformatting and marking up electronic texts. This database is currently the standard tool in OAC digital projects.
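As a rough illustration of what binding a multi-part object into METS involves, the sketch below builds a minimal METS document in Python (the project itself used Perl scripts, which are not reproduced here). The function name, the file identifiers, the "master" and "page" labels, and the sample paths are assumptions made for this example, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id, page_files):
    """Bind an ordered list of page image paths into one METS element tree."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # fileSec lists every file that belongs to the object.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})

    # structMap records the order of the parts and points back at the files.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    root_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "item"})

    for order, path in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        page = ET.SubElement(
            root_div, f"{{{METS_NS}}}div", {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets


# Placeholder object with two pages.
example = build_mets("example-object", ["page1_A.tiff", "page2_A.tiff"])
xml_text = ET.tostring(example, encoding="unicode")
```

The one-to-many relationship shows up directly in the structure: a single item division contains one page division per part, and each page division points to its own file entry.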

Project staff keyed a brief descriptive record, together with administrative metadata, for each object into project control databases set up at the partner institutions: one database at The California Historical Society and a second database for material from both the Ethnic Studies Library and The Bancroft Library.

The descriptions in the finding aids follow the rules set forth in the Anglo-American Cataloguing Rules, 2nd edition (AACR2). As necessary, local data conventions and guidelines were developed to keep data entry consistent across institutions.

The database is hierarchical, so that entries could be made for collections, with all related item entries made under their collection record. Records contained identifying call numbers, project batch numbers used for routing and tracking material for digitization, physical descriptions and other cataloging information including contextual notes.

UC Berkeley’s Digital Imaging Lab (DIL) served as a service bureau for digital image capture. As in other CDL collaborative imaging projects, DIL and the project partners met the standards described in the CDL’s Digital Image Collections Standards.

The image production process used by DIL in this project was originally designed for the NEH California Heritage Digital Image Access Project and has been used to produce the images and detailed descriptive information for a number of projects, including the LSTA-funded Japanese American Relocation Digital Archive (JARDA), the IMLS Museums in the Online Archive of California (MOAC), the NEH-funded Making of America II, and the LSTA-funded Cased Photographs projects.

DIL performed digital image capture using an Agfa Arcus II flatbed scanner and a PhaseOne Powerphase digital camera. The PhaseOne Powerphase scanning back was used on a Hasselblad body with a 120mm Makro-Planar CF lens, mounted on a copystand, with Kaiser daylight fluorescent illumination.

The Agfa Arcus II flatbed scanner was used primarily for loose (unmounted) originals, while originals mounted in bound volumes, framed originals, and originals larger than the platen of the scanner (8 _ x 13 inches) were captured with the PhaseOne Powerphase camera setup.

As part of the initial capture, images were balanced for brightness, contrast, and color using the proprietary software supplied by the equipment manufacturers. A compact target including a grayscale, a centimeter scale, and color patches was included for reference with each scan. Typical capture resolution was between 300 and 600 dpi, with the 600 dpi level used whenever practical.

The digital master files were archived onto writeable CD media as 24-bit RGB TIFF files. Derivative (viewing) files were created from the digital masters in batch mode using Photoshop and DeBabelizer software to produce JPEG (JFIF) and GIF format files at the reduced resolution levels appropriate for viewing.

Quality review of work was done at a number of points in each production workflow, first at the point of capture on flatbed scanner and digital camera (since each is a 'one off' operation) and finally just before web presentation when both images and text were viewed with a browser to confirm their accessibility.

Digital image capture equipment used by DIL for this project includes:

  • PhaseOne Powerphase digital camera, which fits on a Hasselblad body and captures images up to 7000 x 7000 pixels (LxW). This camera is used to capture originals in bound volumes and originals too large for flatbed scanning.
  • Epson 836XL and 1640XL flatbed scanners, which can scan originals up to 11x17 inches. These scanners are primarily used with unbound originals, especially document pages and loose photos.

Capture Specifications for Digital Masters
Our philosophy of digital image capture can be summarized as "scan once, re-use for many purposes." We expect that our master files should be useful for disparate, demanding purposes including scholarly study and high-quality publication, so we follow standards for resolution, image composition, and file format to ensure the ongoing value of the images.

CDL Image Standards
We follow the California Digital Library standards that pertain to imaging, including the CDL’s Digital Image Collections Standards.

File format
Digital masters are captured in 24-bit RGB color and stored in uncompressed TIFF format. Scanner- or camera-specific image capture software is used to manage the technical details of image capture, and then Adobe Photoshop is used to save the TIFF file. This public domain file format is widely readable.

Capture resolution and master file size
DIL's preferred capture resolution for reflective originals is 600 pixels per inch, yielding RGB TIFF master files in the 60 to 100 MB size range for originals close to letter size (8.5 x 11 inches). Larger originals are usually captured at a lower resolution to keep the file size in the same range, and smaller originals may be captured at a higher resolution.
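The relationship between original size, resolution, and master file size can be checked with simple arithmetic. The sketch below counts only uncompressed 24-bit pixel data (3 bytes per pixel) and ignores TIFF header overhead and any extra margin captured around the original; the function names are illustrative and not part of any DIL tool.

```python
def master_size_bytes(width_in, height_in, ppi, bytes_per_pixel=3):
    """Uncompressed pixel data size for an original of the given size in inches."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bytes_per_pixel


def ppi_for_target_size(width_in, height_in, target_bytes, bytes_per_pixel=3):
    """Resolution that keeps an original of the given size near a target file size."""
    return (target_bytes / (bytes_per_pixel * width_in * height_in)) ** 0.5


# Letter-size original at 600 ppi: 5100 x 6600 pixels.
letter_bytes = master_size_bytes(8.5, 11, 600)

# An 11 x 17 inch original held to about 100 million bytes.
tabloid_ppi = ppi_for_target_size(11, 17, 100_000_000)
```

A letter-size original at 600 ppi comes to 100,980,000 bytes of pixel data, which sits at the top of the stated 60 to 100 MB range. Holding an 11 x 17 inch original to roughly the same size implies a resolution a little above 420 ppi, which illustrates why larger originals are captured at lower resolution.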

Included targets
For reflective originals, a one-piece target is imaged at the edge of each capture. It combines the grayscale target and the color patches from a Kodak Q-13 Color Separation Guide and Grayscale with a centimeter scale, all in a compact layout created using a hobby knife and two-sided adhesive tape.

The target is intended to give scholars and technicians information about the tonality and scale of the image. The "A," "M," and "B" steps of the grayscale are marked with small dots to make them easy to identify for making tonal measurements during capture set-up and file processing. Several different-sized versions of the combined target are suited to the range of sizes of the originals.

Tonal metric
The RGB data in the image files is captured in the native colorspace of the capture device (camera or scanner); that is, no color management step such as applying an ICC color profile is used on the digital masters prior to saving.

Before capture occurs, the camera (or scanner) operator uses the controls in the scanning software to adjust the color balance, brightness, and contrast of the scan so that the grayscale target in the image has the expected RGB values. These values are as follows: for the white "A" patch, R, G, and B values all at or near 239; for the middle-gray "M" patch, RGB = 98; for the near-black "B" patch, RGB = 31. These expected RGB values are appropriate for a 24-bit RGB image with gamma 1.8.
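One way to see where these aim values come from is to compute them from the optical density of each grayscale step. The sketch below assumes nominal densities of 0.05, 0.75, and 1.65 for the "A," "M," and "B" steps; those figures are an assumption of this example and are not stated in the text above. The tolerance used in the check is likewise hypothetical.

```python
GAMMA = 1.8

# Assumed nominal densities for the marked grayscale steps (not from the source).
NOMINAL_DENSITY = {"A": 0.05, "M": 0.75, "B": 1.65}


def aim_value(density, gamma=GAMMA):
    """8-bit code value for a patch of the given density at the given gamma."""
    reflectance = 10 ** (-density)
    return round(255 * reflectance ** (1 / gamma))


AIMS = {step: aim_value(d) for step, d in NOMINAL_DENSITY.items()}


def patch_ok(step, rgb, tolerance=4):
    """True if every channel of a measured patch is within tolerance of its aim."""
    aim = AIMS[step]
    return all(abs(channel - aim) <= tolerance for channel in rgb)
```

Under those assumed densities the computed aims are 239, 98, and 31, matching the values quoted above. A check such as patch_ok("M", (97, 99, 100)) passes, while a patch with one channel far from the others, for example patch_ok("A", (239, 239, 225)), fails and would suggest a color balance problem.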

Cropping and background
Originals are depicted entirely, including blank margins, against a suitable background paper (usually white or gray) so that the digital image documents the physical artifact, as well as reproducing the imagery that the artifact portrays. A narrow gap between the grayscale target and the original allows for cropping the grayscale out of the composition if desired for some new purpose.

Storing the digital masters
The digital capture files are grouped on a local server for quality review, technical metadata preparation, and viewing file preparation, and are then archived on recordable CD media.

Making the viewing files
Viewing files are made from the masters using scripted actions in Adobe Photoshop. The master files are opened, gamma-adjusted, downsampled to size, sharpened with unsharp mask, and saved in GIF and JPEG formats. The viewing files are intended for viewing on a typical computer monitor, as represented by the sRGB colorspace. In some cases, the master files are explicitly translated into the sRGB colorspace, with an ICC color profile created using Colorsynergy InCamera software to characterize the color capture of the camera or scanner.
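Two of the steps named above, gamma adjustment and downsampling, can be sketched without any imaging library. The example assumes masters encoded at gamma 1.8 and a display gamma of 2.2 as an approximation of sRGB, and it uses a 768-pixel longest side purely as a placeholder; the actual Photoshop action settings are not given in the text.

```python
def gamma_lut(source_gamma=1.8, target_gamma=2.2):
    """256-entry lookup table that re-encodes 8-bit values for a new gamma."""
    exponent = source_gamma / target_gamma
    return [round(255 * (value / 255) ** exponent) for value in range(256)]


def viewing_size(width_px, height_px, longest_side):
    """Pixel dimensions after scaling so the longest side fits; never upsamples."""
    scale = longest_side / max(width_px, height_px)
    if scale >= 1:
        return width_px, height_px
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


lut = gamma_lut()

# A 5100 x 6600 pixel master reduced to a hypothetical 768-pixel viewing file.
small = viewing_size(5100, 6600, 768)
```

Because the exponent is less than 1, the table leaves black and white unchanged and raises mid-tone values, which compensates for the darker rendering a gamma 1.8 file would otherwise receive on a gamma 2.2 display.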

Technical Metadata
Technical metadata captured for each original:

  • Width in pixels
  • Height in pixels
  • Ppi (aka dpi)
  • Filters used in capture
  • Glass used to hold item flat
  • Master file name (eg <SObjID>A.tiff)
  • Network location of a master digital file
  • Frame selected to be the master
  • Date captured in digital format
  • Photographer's batch ID
  • Scan type (eg scanner, digicam)
  • Scanner make
  • Scanner model
  • Scanner serial number
  • Bit depth (probably 24 bit)
  • Light source used during capture
  • Catchall description of color management
  • Name of color profile file for this setup
  • Grayscale, RGB, CMYK
  • Master file format (TIFF)
  • Intensity correction function for capture
  • Digital compression used during capture
  • Code for one of several standard derivative resolution setups
  • Name of this derivative file (eg <MasterfileName>A.jpeg)
  • Location of this derivative file
  • Host where this derivative file lives
  • Directory where this derivative file lives
  • Creation date for derivative file
  • ID for one of several standard derivation setups
  • Software to produce derivative file from master
  • Derivative file format (eg GIF, JPEG, JTIP)
  • Grayscale, RGB, CMYK
  • Resolution class
  • Conversion Gamma
  • Input color profile
  • Output color profile
  • Digital compression used during creation of a derivative

Integrating the Collection into American Memory
Once the data had been entered and digitization of the selected materials was complete, the two databases were exported to a single "virtual" EAD XML finding aid using a program written in Perl. Originally, each database was divided at the highest level into nine broad topical areas, e.g., "Chinese and Westward Expansion," "San Francisco's Chinatown -- Architectural Space," "Agriculture, Fishing and Related Industries," etc.

These nine areas were preserved in the exported finding aid, and material from each repository was merged under each topic. LC proved to be extremely accommodating in allowing us to choose the format in which to submit data to the American Memory Project. EAD was chosen because it most naturally fit the data as it was created.
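The merge described above amounts to grouping records from both databases under a shared, ordered list of topics. The following Python sketch illustrates the idea only; the export program itself was written in Perl and is not reproduced here, and the record layout, identifiers, and sample entries are hypothetical.

```python
def merge_by_topic(topic_order, *databases):
    """Group records from several databases under a shared, ordered topic list."""
    merged = {topic: [] for topic in topic_order}
    for records in databases:
        for record in records:
            merged[record["topic"]].append(record)
    return merged


# Two of the nine topical areas, used here as sample headings.
TOPICS = [
    "Chinese and Westward Expansion",
    "Agriculture, Fishing and Related Industries",
]

# Placeholder records standing in for the two project control databases.
chs_db = [
    {"topic": "Chinese and Westward Expansion", "repository": "CHS", "id": "chs-1"},
]
berkeley_db = [
    {"topic": "Agriculture, Fishing and Related Industries",
     "repository": "Bancroft", "id": "banc-1"},
    {"topic": "Chinese and Westward Expansion",
     "repository": "Ethnic Studies", "id": "esl-1"},
]

finding_aid = merge_by_topic(TOPICS, chs_db, berkeley_db)
```

Each topic keeps its position in the original list, and material from all three repositories ends up side by side under the heading it was assigned in its source database.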

The data represented a mix of item-level cataloging and collection-level cataloging. Collection-level cataloging is useful for presenting users with a contextual view of an archival collection, showing each item in the context of its series, subseries, and neighboring items. An item-level view, such as an interface to browse, search, and sort individual images, necessarily presents items out of context. Accommodating collection-level cataloging for an item-level interface presented its own challenges.
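One common way to reconcile the two views is to flatten the hierarchy into item records that each carry the titles of their enclosing collection and series. The sketch below shows that general technique in Python; it is not a description of how the American Memory interface was actually built, and the node layout and sample titles are invented for illustration.

```python
def item_views(node, ancestors=()):
    """Yield one flat record per item, carrying its enclosing titles as context."""
    if node["level"] == "item":
        yield {"title": node["title"], "context": list(ancestors)}
        return
    for child in node.get("children", []):
        yield from item_views(child, ancestors + (node["title"],))


# Hypothetical collection with one series and one loose item.
collection = {
    "level": "collection",
    "title": "Example photograph collection",
    "children": [
        {
            "level": "series",
            "title": "Series 1",
            "children": [
                {"level": "item", "title": "Street scene"},
                {"level": "item", "title": "Storefront"},
            ],
        },
        {"level": "item", "title": "Loose portrait"},
    ],
}

flat = list(item_views(collection))
```

Each flat record can be browsed, searched, and sorted on its own, while the context list lets an interface display where the item sits within its collection.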

Copyright © 2008 The Regents of the University of California. All rights reserved.
Document maintained by The Bancroft Library.
Last updated 12/03/2008.