US Library of Congress' Digital Collection Among World's Largest


The U.S. Library of Congress is well known as the world's largest library, at least in the traditional paper format. Now the library is on the way to hosting the largest digital collection in the world, with more than 700 terabytes of data.

Converting holdings to a digital format

This is the new look of the U.S. Library of Congress: blinking lights, lots of cables and an ocean of digital information in more than 50 million individual files. This fancy tower is one of several Web servers that bring most of that information to the Internet.

Jane Mandelbaum manages information technology services at the library. "All the data on our website is here," she explains.

So far, the library has a total of 700 terabytes of data. But because of copyright issues, only 200 of those terabytes are available on the Web.

"A terabyte is about 1,600 CDs, or about 330 hours of TV, or about 2,000 books, and we have about 500 terabytes that we keep in our long-term preservation systems," she adds.
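To put those equivalents in perspective, the short Python sketch below scales Mandelbaum's rough per-terabyte figures to the three collection sizes mentioned in this story. The names `PER_TERABYTE`, `HOLDINGS_TB` and `equivalents` are illustrative only, and the results are approximations built on her round numbers, not official library statistics.

```python
# Rough per-terabyte equivalents quoted by Mandelbaum (approximations, not exact).
PER_TERABYTE = {"CDs": 1_600, "hours of TV": 330, "books": 2_000}

# Collection sizes in terabytes as reported in this story.
HOLDINGS_TB = {
    "total": 700,
    "available on the Web": 200,
    "long-term preservation": 500,
}


def equivalents(terabytes):
    """Scale the per-terabyte figures to the given number of terabytes."""
    return {unit: count * terabytes for unit, count in PER_TERABYTE.items()}


for label, tb in HOLDINGS_TB.items():
    parts = ", ".join(f"{n:,} {unit}" for unit, n in equivalents(tb).items())
    print(f"{label} ({tb} TB): about {parts}")
```

By this rough measure, the full 700 terabytes would correspond to something over a million CDs.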

At the Library of Congress, the numbers can be mind-boggling. Experts estimate it holds more than 120 million books, 36,000 feature films, hundreds of thousands of music sheets and recordings, and large collections of manuscripts, Web sites, posters and photography. Yet only one percent of it has been digitized.

Thomas Youkel is the senior systems engineer. "We have a scan lab here that scans anywhere from four to six million items a year," he says. "I don't guarantee that all those are put on the web, but a lot of it is."

Technology used for preservation


Most of the library's digital collection exists for preservation. But it is the one percent of the collection that has been digitized for the web that serves most of its customers: 85 million a year.

Digitizing the Library of Congress is a long and expensive process. This is one of 205 volumes of Abraham Lincoln's documents from the 1800s. The careful scanning that manuscripts require makes this very slow work.

The collection of around 65 million manuscripts holds some of the most treasured documents at the library, from presidential papers to original poems. The chief of the manuscript division, James Hutson, says that in the computer age, the creative process behind manuscripts is getting lost.

"You won't have Shakespeare first draft or Beethoven's original sketch of the ninth symphony in the future probably, is all lost in the digital age," Hutson said.


More than five million maps are being digitized. Some are very large, like this map of Africa painted on cloth at the turn of the last century. Atlas books like this one, hand painted in the early 1600s, require a different technology. Its anthropomorphic map of Belgium is beautiful but geographically incorrect.

Colleen Cahill leads the digitizing team. She says people can freely use those materials on the Web. "You are looking at four acres of cartographic materials. They represent over five million maps, tens of thousands of atlases, hundreds of globes," she said.

Also, nearly one and a half million photos have been posted on the web.  

One of the biggest challenges at the Library of Congress today is the rapid change of technology.

"If you think in terms of changing technologies you go from this to this.  This holds approximately 100 times as much information as this one," Youkel said.

While workers continue scanning and digitizing millions of items, they keep an eye on a migration plan to move from obsolete technology to new technology, a never-ending process.