“The work of the inspectors during the small-pox epidemic in the sweat-shops of Chicago, consisted in the enforcement of Sections 1 and 2 of the law....”

About This Project


During the collection of supporting material for the Homicide in Chicago 1870-1930 web site, project Director Leigh Buchanan Bienen acquired an intense appreciation for Hull House Maps and Papers, in particular for the prose style of its primary author, Florence Kelley. This work is a stellar example of the progressive ideals of the day: that the social sciences could be harnessed to enlighten and motivate to action. The more she delved into Ms. Kelley's history and her phenomenal body of work, the more fascinated she became, until in 2007 she decided to create a web site as a companion to her first one.

An article, "Digital Books Replace Aging Pages," in the Spring 2007 issue of Northwestern caught our attention, specifically Roxanne Sellberg's quote: "We want to move very quickly into scanning not just brittle books but many other books that we think should be available online." In 2005 the University Library had purchased a Kirtas 2000 scanner, capable of high-speed scanning: approximately 30 minutes for a 300-page book.
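The quoted throughput works out to about 10 pages per minute. A minimal Python sketch of that arithmetic, useful for estimating scanner time for a whole collection; the 10,000-page corpus below is a hypothetical figure, not one taken from the project.

```python
# Quoted throughput for the Kirtas 2000: roughly 300 pages in 30 minutes.
PAGES_PER_BOOK = 300
MINUTES_PER_BOOK = 30


def pages_per_minute(pages: int = PAGES_PER_BOOK,
                     minutes: float = MINUTES_PER_BOOK) -> float:
    """Average scanning rate implied by the quoted figures."""
    return pages / minutes


def estimated_hours(total_pages: int,
                    rate: float = PAGES_PER_BOOK / MINUTES_PER_BOOK) -> float:
    """Hours of scanner time for total_pages at `rate` pages per minute.

    Ignores setup time, fragile pages and rescans, so treat it as a lower bound.
    """
    return total_pages / rate / 60


if __name__ == "__main__":
    print(pages_per_minute())                  # 10.0
    print(round(estimated_hours(10_000), 1))   # 16.7 (hypothetical 10,000-page corpus)
```

At that rate even tens of thousands of pages amount to days, not months, of scanner time, which is what made the larger project described below plausible.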

Google, Amazon and most major universities are rushing to digitize vast collections of both contemporary and historic material. As they do so, they seek a balance between the desire to disseminate knowledge and the need to protect copyright. Two things became clear in our initial meetings with the University Library's Digital Media Services and Archives units, at which they graciously agreed to collaborate with our team:

  1. High-speed scanning would allow a completely different sort of web project, one in which we could dig deep into the vaults of written work on Chicago to resuscitate health department records chronicling the spread of smallpox in Chicago's slums during the 1890s, factory inspection reports, court cases, correspondence, and articles from any of the hundred-plus daily news publications of the time.
  2. The Flash-based page viewer used by Northwestern University, Harvard's Image Delivery Service, the Encyclopedia of Chicago and, of course, Google and Amazon would allow us to protect the copyright of authors and publishers wanting to maintain control of their works: the viewer allows viewing but not saving or downloading of the material. This could, in fact, help publicize a relatively unknown, obscure, or out-of-print work.
Whereas the Homicide web site consisted of a single interactive dataset with a limited number of books and reports as supporting material, we realized the Florence Kelley site would take a very different form: essentially a supporting web site sitting in front of a huge trove of historical material.

Once you have thousands (or tens of thousands) of pages and documents to make available, the question is how to present them in a coherent and accessible way. We quickly realized we wanted a solution that could replicate the interactivity we had crafted for the Homicide site, yet do so without a dataset to work from. The design imperative became something like this: process these documents in such a way that someone can locate a record of a distant relative noted in one of the Factory Inspection reports, just as they could locate a record on the Homicide web site.

The first phase of the project, christened "hunting and collecting," gave way to the second. A workflow made the complex task manageable: (a) acquire source material, (b) acquire permissions, (c) scan, (d) run optical character recognition (OCR) on the scans, (e) tie the scans, OCR files and metadata together in a document management framework, (f) index the components for search engines, and (g) provide an elegant interface through which end users can explore them.
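The workflow above can be sketched in Python as a small data pipeline. This is an illustration under stated assumptions, not the project's code: every class and function name is hypothetical, the `ocr` argument stands in for whatever OCR software was actually used, steps (a), (b) and (g) are human or interface work represented here only by a permission flag, and the per-page entries are simply what might be handed to a search indexer in step (f).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SourceDocument:
    """Steps (a)-(b): an acquired item and whether reproduction is permitted."""
    doc_id: str
    title: str
    year: int
    permission_granted: bool


@dataclass
class Page:
    """Steps (c)-(d): one scanned image and the OCR text produced from it."""
    image_path: str
    ocr_text: str


@dataclass
class ManagedDocument:
    """Step (e): scans, OCR text and metadata tied together in one record."""
    source: SourceDocument
    pages: List[Page] = field(default_factory=list)

    def metadata(self) -> Dict[str, object]:
        return {"id": self.source.doc_id, "title": self.source.title,
                "year": self.source.year, "page_count": len(self.pages)}


def build_record(source: SourceDocument, image_paths: List[str],
                 ocr: Callable[[str], str]) -> ManagedDocument:
    """Run steps (c)-(e) for one document; `ocr` maps an image path to text."""
    if not source.permission_granted:
        raise PermissionError(f"{source.doc_id}: no permission to reproduce")
    record = ManagedDocument(source)
    for path in image_paths:
        record.pages.append(Page(image_path=path, ocr_text=ocr(path)))
    return record


def index_entries(record: ManagedDocument) -> List[Dict[str, object]]:
    """Step (f): one flat, search-engine-ready entry per scanned page."""
    meta = record.metadata()
    return [{**meta, "page": n, "image": p.image_path, "text": p.ocr_text}
            for n, p in enumerate(record.pages, start=1)]
```

Keeping each page's image path, OCR text and document metadata together in one entry is what lets a later search hit point straight back to the scanned page it came from.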

One of the advantages of working in an environment like that of Northwestern University is the potential for collaboration. The Documents Librarian (Pegeen Bassett) and Faculty Services Librarian (Marcia Lehr) from the Northwestern University School of Law worked with Leigh and a cadre of students to gather the material. The University Library's Digital Collections had the scanning equipment, image processing software and server architecture for delivering image files. That left the SESP team with the task of OCR, document management and design.

We deployed our in-house content management system, MySESP, coupled it with a document management application (built for the My World GIS project), and used an open source indexing system, Solr, to process all the OCR files our research assistant tediously produced from the scans over hundreds of hours.
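Solr handles indexing and querying through its own HTTP API and schema configuration, none of which is reproduced here. As a conceptual sketch only, the pure-Python inverted index below shows the basic idea a full-text engine like Solr (which is built on Apache Lucene) applies to OCR output: map each word to the pages it appears on, so that a query for a surname returns document and page locations. The class name, the simple tokenizer and the sample OCR text are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (document id, page number)


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; real analyzers also stem, drop stopwords, etc."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of (document, page) locations containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Location]] = defaultdict(set)

    def add_page(self, doc_id: str, page: int, ocr_text: str) -> None:
        for term in tokenize(ocr_text):
            self._postings[term].add((doc_id, page))

    def search(self, query: str) -> List[Location]:
        """Pages containing every query term (AND semantics), sorted."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so intersecting does not mutate the index.
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add_page("report-1", 12, "Sweat-shop of Jacob Schmidt, 14 employees")
    idx.add_page("report-1", 13, "Schmidt family; smallpox case reported")
    print(idx.search("schmidt"))           # [('report-1', 12), ('report-1', 13)]
    print(idx.search("Schmidt smallpox"))  # [('report-1', 13)]
```

Because OCR text is noisy, a production engine adds analyzers and relevance ranking on top of this structure; the sketch shows only the lookup that lets a reader find a relative's name on a specific page.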