Federal Depository Library Program
Web Documents Digital Archive Pilot Project (OCLC): Arizona

Janet Fisher
Phoenix, AZ

The State Library in Arizona has recently signed on to the OCLC Web Preservation Project. The responsibility for this falls to another person in my agency, our Director of Electronic Government Information. I am here today to present some of his ideas and to describe our thoughts and efforts to capture Arizona’s electronic government information for the future.

The Arizona State Library, Archives and Public Records is mandated to preserve state agency materials, in all formats. Currently, the agency has three responsibilities for state information. We:
We are in a unique position to have all of these activities under the leadership of one agency. Coordinating projects and resolving overlaps between these programs is easier within one agency than it might be if we were in separate agencies.

In the Law and Research Library Division, we have been working with print publications for many years. With varying success, we have been collecting, cataloging and providing access to these print publications. For the past year, we have been piloting a Government Information Locator System (GILS) program which spiders 105 state Web servers (approximately 190,000 Web pages). But we have been searching for a way to preserve the state agency Web publications. We have looked at the work of the National Archives and at the efforts of other states, and we have looked at the private sector. We have joined this OCLC research project to continue our growth and to start working on a caching project for state information.

We do not want to save everything in Web space.
As an agency, we want to preserve information that is accessible to the public only on the Web, that provides evidence of what an agency published and may be used by the public in making a decision, and that demonstrates an agency’s accountability. Some of the problems that we are encountering are that:
Throughout all of this, we must be able to meet our statutory responsibility to preserve state agency publications. We can’t rely on agencies to preserve their Web publications; that’s our mandate, not theirs. We also have to recognize that agencies are not prepared to provide reference service to their old Web sites, which may be stored offline.

We have joined the OCLC Preservation Project to see how we can capture and provide access to these publications. One thing we need to determine is how much and how often we capture the information. Rather than trying to capture every page, we are looking at a way to scan and capture (or get a snapshot of) state information on state servers at specific periods of time.

How would we describe the cache? What is the scope of the cache? The cache should include only documents that meet the following five criteria:
Ideally, all Web publications that meet the preceding criteria would be cached. We are looking at capturing state Web materials four (4) times a year, at a minimum. When we go into full production, we will investigate monthly captures, and we will ask that agencies leave pages up for at least a month to give us a chance to capture them.

One of the points in a methodology for Web space is to involve Webmasters in the process and to stress the use of metatags. We have already begun training state Webmasters to metatag their sites so that searches using our Government Information Locator System succeed. In addition to helping point to current Web locations, the metatags can be used as identifiers for these documents in the future (and may include additional information to describe the electronic information). We have also considered tags that would facilitate automated caching of those pages for enduring access. Those tags we have considered include:
The sequence number establishes the order in which pages should be printed or output to microfilm.
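As an illustration only, the role of a sequence-number metatag can be sketched in a few lines of Python. The tag name `cache.sequence`, the function names, and the sample pages are all hypothetical; the talk does not specify how the tags were spelled or processed. The sketch reads the metatags from each cached page and orders the pages for printing or microfilm output, placing pages without a usable sequence number last.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            # Tag names are compared case-insensitively.
            self.meta[name.lower()] = content


def read_meta(html_text):
    """Return a dict of metatag names to contents for one page."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


def output_order(pages, sequence_tag="cache.sequence"):
    """Order page URLs by their sequence metatag (hypothetical tag name).

    `pages` maps URL -> HTML text. Pages with a missing or non-numeric
    sequence number are placed last, in their original order.
    """
    def sort_key(item):
        index, (url, html_text) = item
        raw = read_meta(html_text).get(sequence_tag, "")
        try:
            return (0, int(raw), index)
        except ValueError:
            return (1, 0, index)

    ranked = sorted(enumerate(pages.items()), key=sort_key)
    return [url for _, (url, _) in ranked]


if __name__ == "__main__":
    sample = {
        "b.html": '<head><meta name="cache.sequence" content="2"></head>',
        "a.html": '<head><meta name="cache.sequence" content="1"></head>',
        "c.html": "<p>no metatags</p>",
    }
    print(output_order(sample))
```

Placing untagged pages last rather than dropping them is a design choice of this sketch, so that nothing captured is silently lost from the output.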
How will we retrieve materials from the cache? If the cache is stored on a Web server, the Government Information Locator System can index and provide access to the contents of the server in the same way it indexes and provides access to other Web sites. We have also considered using a proxy server combined with a database to ensure that links point to other documents contemporaneous with the document being viewed, rather than the most current version. For example, clicking on the Home button on an archived page would take you to the Home page current when the archived page was served.

It is our hope that such a logical methodology will be viable and assist us in the retention of Web-based state information, for the short term. The puzzle of electronic information, here today and gone tomorrow, is one that needs to be solved. We need to be able to say it is here today, and those materials of enduring value will also be here tomorrow. These are the concepts we are carrying forward as we join in with others to face the challenge of preservation of electronic documents.

An additional effort we are making in Arizona is to come to terms with electronic records. We have convened a group of representatives from state agencies and local governments, called the "Arizona ‘Lectronic Records Task Force" (ALERT), to look at public records, not publications. This group is working on a methodology and standards for handling electronic records. We are starting with the need for a common vocabulary, and building discussions about retention of electronic records. The challenges brought with born-digital information do not have one answer, and so we are trying to address the various angles and the groups involved in the creation of, and later referral to, these information sources.

We are looking at the Web caching possibilities of this OCLC Web Preservation project as one of the pieces in the puzzle of preserving state government information.
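Returning to the proxy-and-database idea for contemporaneous links mentioned earlier, the lookup it implies can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of the Arizona or OCLC design: the class `SnapshotIndex`, its methods, and the sample dates are hypothetical. Given the capture date of the archived page being viewed, the index returns the most recent capture of the link target made on or before that date.

```python
from bisect import bisect_right
from datetime import date


class SnapshotIndex:
    """Hypothetical in-memory stand-in for the database behind the proxy."""

    def __init__(self):
        # Maps URL -> sorted list of dates on which that URL was captured.
        self._captures = {}

    def add(self, url, captured_on):
        """Record that `url` was captured on `captured_on`."""
        dates = self._captures.setdefault(url, [])
        if captured_on not in dates:
            dates.append(captured_on)
            dates.sort()

    def resolve(self, url, viewed_capture):
        """Return the latest capture date of `url` on or before
        `viewed_capture`, or None if no such capture exists."""
        dates = self._captures.get(url, [])
        pos = bisect_right(dates, viewed_capture)
        return dates[pos - 1] if pos else None


if __name__ == "__main__":
    index = SnapshotIndex()
    # Quarterly captures of a Home page (sample dates only).
    for captured in (date(2001, 1, 15), date(2001, 4, 15), date(2001, 7, 15)):
        index.add("/home", captured)

    # A reader viewing a page archived in May follows its Home link.
    print(index.resolve("/home", date(2001, 5, 1)))
```

A proxy built on such an index would rewrite each link on an archived page to the capture that `resolve` returns, falling back to some other behavior (for example, a notice that no contemporaneous copy exists) when it returns `None`.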
We look forward to the experience we will gain and the lessons we will learn in this process.

A service of the Superintendent of Documents, U.S. Government Printing Office. Questions or comments: asklps@gpo.gov.
Last updated: April 12, 2002
Page Name: http://www.access.gpo.gov/su_docs/fdlp/pubs/proceedings/01pro14.html