Scanning Project Digitizes 25,000 US Library of Congress Books

The Library of Congress is the world's largest library, with tens of millions of items that attract scholars from all over the world. But soon, those scholars may not have to travel far to do their research. Some of the library's treasures are starting to appear online.

Like many other great research libraries, the Library of Congress has been moving into the digital world.

One way it is doing so is through a scanning project that has so far put 25,000 books online for anyone to read or download.

Doron Weber of the Alfred P. Sloan Foundation, which is funding the $2 million project, stresses the importance of scanning complete books to preserve their cultural context.

"To preserve book knowledge and book culture means preserving every word of every sentence in the right sequence of pages in the right edition, within the appropriate historical, scholarly and bibliographical context. You must respect what you scan and treat it as an organic whole, not just raw bits of slapdash data."

The scanning is being done by the Internet Archive. The San Francisco-based nonprofit group aims to preserve cultural artifacts such as musical recordings and Web pages, as well as books, and make them available online. Brewster Kahle heads the Internet Archive.

"They're going faster and faster and faster here at the Library of Congress to bring the book collection, to digitize those, run them through optical character recognition, offer them for free on the Internet for anyone to download, read, bind, do anything they want with," Kahle said.

The books are being scanned in a large, utilitarian-looking room in the Library of Congress, a block from the U.S. Capitol building in Washington.

Ten scanning units, called scribe stations, have been set up. In each one, a book sits on a V-shaped cradle. Two high-resolution digital cameras overhead point separately at the left and right pages of the open book. An operator sits in front, using a foot pedal to operate a V-shaped glass cover that comes down to flatten the pages being photographed or goes up so the page can be turned. A pair of pages is scanned every six seconds.

Library of Congress staffer Aaron Chaletzky explained the scanning process and said that the online books are being used much more than their physical counterparts at the library.

"You know, if you build it, they will come," he said. "Well, we've now digitized these materials. We've put them out there, and a lot of items that have not literally seen the light of day because they haven't been checked out in God knows how long, have been downloaded and reviewed on Internet Archive's Web site dozens of times, and that's really gratifying."

The books being digitized in this project are all at least 75 years old and thus out of copyright. So Internet users may read them, download them, or really do any creative thing they like with them.

Deanna Marcum, associate librarian of the Library of Congress, says the Sloan Foundation project is focusing on fragile books that need special handling, works of American history and genealogy, and some rare books.

"Most importantly, the result of these collections that are rare and hard to find and sometimes too brittle or too old to serve to the public, we're now able to make openly available to the public, and we see this as a great accomplishment," she said.

And a cost-effective one: the Internet Archive is able to do the mass scanning for just 10 cents a page.

There are other book scanning projects. Google, for example, has agreements with great libraries in Europe and Asia, as well as the United States, to scan books in their collections.

Charles B. Lowry, of the Association of Research Libraries, says it's important in the digital age that the older material remain accessible.

"I believe we're on the cusp of a jump from a world of analog print information to a world of digital networked access to information. Today, almost all information - even that which ultimately appears in print - is born digital. Yet I think there remains a need for large-scale efforts to expose existing print collections so that they do not become invisible."

The scanned books from the Library of Congress are online at the Internet Archive.