April 02, 2006
CiL2006 -- Recreating the Civil War / Provenance and Digitization
I believe I mentioned in the ResearchBuzz newsletter that I hadn't written up my favorite session from Computers in Libraries, so here you go. My favorite session was by far Sharon Carlson and Margaret Graham's discussion on digitizing archived materials. Partly it was the subject, partly that the presenters were first-rate, and partly the audience's interest and enthusiasm.
First up was Sharon Carlson, talking about a project to digitize eight Civil War diaries -- about 1100 pages' worth. All the diaries were from men who had Michigan connections, all were from Union regiments, and all were considered to have research value. (There are even a couple of photographs available of the men in the diaries, including Isaac Knapp and Eugene Sly.)
As you might expect, it takes several different people to handle this task, and in this case the team included scanners, digitizers, someone to handle color management (I had no idea color management was such a big deal in a situation like this), and someone to manage metadata. Sharon laid out the procedure like this: a transcriber to transcribe and proofread; an encoder to proof and code documents with structure tags and keyword tags; a reviewer to proof and correct again; and then an authorizer to review the text and give it a metaphorical stamp of approval.
Student encoders and transcribers work about 15-20 hours a week. Some of them are interns, while others are paid student wages. At this point, however, students do not handle the review stage of the project. The work does take time -- transcribing and coding go at about 3-5 pages an hour, while a reviewer can manage eight pages an hour. Many custom categories have been built into the coding, including battles, clothing, desertion, food, money, music, and so on.
Preservation issues have been considered even though the project is not quite complete yet (it's expected to be up by the end of the summer). Materials will be stored both on a server and on CDs offsite.
Next up was Margaret Graham, talking about a Drexel University online archive: Women Physicians 1850s-1950s. This is a much larger project -- 2000 digital objects encompassing almost 30,000 pages of material. (A women's medical college preceded Drexel, so they apparently have a very large offline collection of these kinds of materials. There was some discussion of drawing objects from several collections for the online archive, and the cataloging issues that arise as a result, but I got a little lost.)
Margaret Graham mentioned that the storage infrastructure was built on open source as much as possible. This was not the first time I had heard open source mentioned at CiL, but people were amazingly casual about it. It was as if somewhere a line had been crossed and open source had turned into a viable alternative to proprietary solutions. But man, it was done quietly. Anyway, a terabyte of storage is used for the online archive, which Margaret didn't seem to think was much these days. About 60 collections (out of 600) were drawn from to make the online archive.
She took us through a brief tour of some of what the archive offers. One of the items, I remember, was a diary. This one did not appear to be transcribed like the Civil War diaries; however, it was more recent and easier to read. A product called Zoomifier was used to -- as you might expect -- zoom in and out of images. It made the diary super-easy to read (and I heard quite a few oohs and aaahs from the audience).
I know I'm supposed to be a big search engine nut, and I am. But I thought it was fascinating to look behind the scenes and see how the many great online archives come to be, and the many steps involved in creating them. Kudos for a great presentation.
Posted to Internet-Technology