Federal Depository Library Program
Documents Data Miner 2©: Demonstration of a Pilot Project
http://govdoc.wichita.edu/ddm2

Nan Myers
Wichita, KS

Development and System Overview

What is Documents Data Miner 2?

Documents Data Miner 2 (DDM2) has been under development as a pilot project practically since the announcement of Documents Data Miner itself in April of 1998. In 1997, the Federal Depository Library Program began offering files of its shipping lists on the Federal Bulletin Board, but did not make them available in any searchable format. Then, beginning with a file for December 1998, GPO Cataloging records were made available at the Federal Bulletin Board in a file called SPCMOCAT. Files of GPO’s Cataloging Branch output have been posted monthly ever since, but as one long data stream requiring conversion to the MARC format.

These two additional pieces of the Federal depository workload puzzle made it seem possible to provide, in one online location, a more complete Library Management System for United States Federal documents. Documents Data Miner 2, then, is:
Development of Documents Data Miner (DDM):

Documents Data Miner began in 1995 at the Wichita State University Libraries’ Technical Services Department as an in-house relational database in Paradox, designed to support management of Federal depository library collections. Preliminary design was accomplished through a partnership between Nan Myers, Assistant Professor and Government Documents Cataloger; John Williams, Head of Acquisitions; and graduate students of the University’s departments of Electrical Engineering, Decision Sciences and Computer Science. The initial prototype for the data mining function was written by Dr. Xumin Nie, formerly Professor, WSU Computer Science Department.

(For a more complete discussion of the initial database, see: Myers, Nan. "GPRD - Institutional and Statewide Benefits of an Internet-Accessible Relational Database." Proceedings of the 6th Annual Federal Depository Library Conference, April 14-17, 1997. Also published online at: <

In 1997, we moved the relational database to the Internet, on server space leased from the National Institute for Aviation Research (NIAR) on the WSU campus. SQL server database implementation, query algorithms, and Web database publication were developed by two of NIAR’s staff: John Ellis, Senior Database Analyst, and Dr. John Hutchinson, Professor of Mathematics and Statistics. The new utility was named Documents Data Miner, or DDM.

Documents Data Miner was built on official sources of data from the Government Printing Office (GPO) files at the Federal Bulletin Board. At this point, DDM provided an Internet-accessible relational database for the use of the government documents community (1350 Federal depository libraries and the Government Printing Office). DDM became an official partnership site of the GPO in April 1998 and was announced at the Federal Depository Library Conference.

(For a more complete overview of DDM, see:
Myers, Nan. "Collection Management Using the Documents Data Miner."
Ellis, John. "Architecture and Functionality of Documents Data Miner."
Hartman, Cathy. "Documents Data Miner: A Resource for Collection Development and Management."
Proceedings of the 7th Annual Federal Depository Library Conference, April 20-23, 1998. Also published online at: <

There are five databases in Documents Data Miner:
The DDM Development Goals were as follows:
The last development goal, "open system follow-ons," provided the basis for the prototype version of Documents Data Miner 2. DDM2 Development Goals are:
Documents Data Miner 2 System Overview: In this section, I will be discussing:
The current design parameters for DDM2 are as follows:
GPO data is drawn from the following files at the Federal Bulletin Board at <http://fedbbs.access.gpo.gov/liblist.html>
The current attributes of DDM2 are:
Development software included:
Current Data Statistics for DDM2 as of October 12, 2001:
Value added by the modules of DDM2 (which include those in DDM):
Recent Enhancements:
Projected Enhancements to the DDM2 Pilot Project (would require grant funding):
Issues of Cost Recovery

Partners in Documents Data Miner 2:

The two current partners for DDM2 are Wichita State University Libraries and University Computing and Telecommunications. Partnership with the Government Printing Office is pending.

After the announcement of Documents Data Miner in April 1998, the political scenario at the WSU Libraries changed when the Dean of Libraries retired in the summer of 1998 and a year’s search concluded in the hiring of a new Dean in the summer of 1999. In addition, John Ellis moved from NIAR to a position as Web Applications Manager for University Computing in January 2000. There was then a period of transition and education, negotiation with University Computing, and seeking of direction from the University’s Office of Research Administration and Legal Counsel.

All discussions about completion of DDM2 have led to the need for an Oracle platform in order to provide a sound national-level utility. Since such an expense could not be absorbed by either University Libraries or University Computing, it became apparent that cost recovery would be required to complete the vision of DDM2. Below is a cost summary of both DDM and DDM2 between 1997 and 2001:

What it Cost:
Value 1997-2000 (DDM and DDM2) $200,000
Market Value 2000-2001 $75,000
of current DDM2 design - Fair Market Value $30,000
DDM2 design Cannot speculate

Maintenance requirements even at the most basic level for DDM2 will require daily oversight, as shipping list files are published several times weekly at the Federal Bulletin Board.

Summary of the Documents Data Miner 2 Online Survey:

Cost recovery revenue streams will allow completion of all the modules for DDM2 and provide for the ongoing operation of the site. This revenue could derive from user fees, from a contract with GPO or other vendor, or from grants. In the summer of 2001, it was decided that a survey should be conducted of the over 1300 Federal depository libraries in order to determine whether or not users would be willing to pay modest fees to use DDM2.

The DDM2 survey was designed and administered by John Williams, Head of Acquisitions. The information below on the survey results is from his internal reports of August 3 and August 10, 2001. The survey was announced on the GOVDOC-L discussion list and the DocTech-L discussion list, as well as through a batched direct e-mailing to the depository library addresses in the directory of DDM.

The DDM2 Survey was available on the Web from July 13 to August 6. We had 232 responses, a response rate of about 17%. (Late responses up to August 10 boosted the total to 243, or 18.5%.) There was a nice cross-section of the library community from the University of Michigan to UT Austin and from Wellesley to UCLA. There were many small schools that took the time to respond, as well as responses from law, military, and professional libraries.

Survey Introduction: Basically, we wanted to know three things from this survey:
Profiles — 70% of the responding libraries were medium to large in size, selecting over 45% of available Federal documents. None of the responding libraries had completed the cataloging of their collections. And, finally, all of them had sufficient infrastructure to use all the features that would be available in DDM2. The summary of results is as follows, based on the 232 responses by August 3, 2001:
Greater than Monthly: 57%; Less than Monthly: 43%
Paper and electronic: 57%; Electronic Only: 30%; Neither: 12%
Not online and, perhaps, not cataloged at all: 3%; Partially cataloged but not online: 10%; Partially cataloged in an OPAC: 87%
Responses indicated adequate resources for all responding.
Greater than 45%: 70%; Less than 45%: 30%
Greater than $250.00 per year: 45%; Less than $250.00 per year: 55%
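As a rough check on the response rates reported for the survey (232 responses, "about 17%"; 243 responses, "18.5%"), the arithmetic can be sketched as below. The survey text gives the population only as "over 1300" libraries, so the figure of 1,350 used here is an assumption borrowed from the count of Federal depository libraries mentioned earlier in this paper; the function and variable names are illustrative only.

```python
def response_rate(responses, population):
    """Return the response rate as a percentage."""
    return 100.0 * responses / population

# Assumption: 1,350 depository libraries, the count cited earlier in the paper.
# The survey section itself says only "over 1300".
ASSUMED_POPULATION = 1350

rate_aug3 = response_rate(232, ASSUMED_POPULATION)   # responses by August 3
rate_aug10 = response_rate(243, ASSUMED_POPULATION)  # including late responses

# Population size implied if 243 responses really were 18.5% of those surveyed.
implied_population = 243 / 0.185

print(f"By August 3:  {rate_aug3:.1f}%")
print(f"By August 10: {rate_aug10:.1f}%")
print(f"Population implied by 18.5%: about {implied_population:.0f}")
```

Under that assumption the rates come out near 17.2% and 18.0%. The reported 18.5% would correspond to a surveyed population of roughly 1,314 libraries, which is still consistent with "over 1300".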
Implications of responses: Regional depositories, which select 100% of Federal Documents, have the greatest logistical problems and, probably, the largest un-cataloged collections. Small depositories, selecting less than 30% of Federal Documents, have the greatest need for an organized catalog and processing utility.

Demonstration of Documents Data Miner 2 Prototype
http://govdoc.wichita.edu/ddm2

Even though still under development, the prototype Documents Data Miner 2 offers unique and highly usable services, including one of the utilities most requested by the depository community: searchable shipping lists. DDM2 offers the following features [a demonstration of each feature followed]:
MARC Locator: All MARC records created by GPO Cataloging Division from December 1998 to present (currently over 51,000 records). Records are searchable using:

Accessed records may be viewed in DDM2's public view or MARC view, downloaded to the user’s PC for import into local databases, or accessed via the Web for records containing P/URLs. Records may be downloaded into OPACs individually or batched. Records may be tagged by depository profiles. There is the potential to house all GPO MARC records from 1976 to the present: over 350,000 records. In addition, there is the potential to include retrospective cataloging project records from various institutions. All this would require further development.

The URL Locator is a subset of the MARC Locator described above. The URL Locator is restricted to records with the 856 field for hotlinking to Web resources. At present, there are close to 15,000 URLs or PURLs in the records of DDM2, although not that many records, because many records contain more than one 856 field. The URL Locator is searchable in multiple fields, like the MARC Locator. Records may be tagged by depository profiles; however, many libraries may wish to download all GPO MARC records for online titles.

The depository community is encouraged to use Documents Data Miner 2 and to communicate any problems or ideas for improvement to the developers. For additional information or to provide feedback, please contact: Nan Myers
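The distinction drawn in the URL Locator description, between the number of URLs and the number of records that carry them, can be illustrated with a minimal sketch. This is not DDM2's implementation: the records below are invented, heavily simplified stand-ins for MARC records (a list of tag/value pairs), and the function names and example URLs are hypothetical.

```python
# Hypothetical, simplified records: each has a list of (tag, value) pairs.
# Real MARC records carry indicators, subfields, and many more fields.
records = [
    {"fields": [("245", "Report A"), ("856", "http://purl.example/1")]},
    {"fields": [("245", "Report B")]},
    {"fields": [("245", "Report C"),
                ("856", "http://purl.example/2"),
                ("856", "http://purl.example/3")]},
]

def urls_in(record):
    """Return the values of every 856 field in one record."""
    return [value for tag, value in record["fields"] if tag == "856"]

def url_locator_subset(all_records):
    """Keep only the records that contain at least one 856 field."""
    return [record for record in all_records if urls_in(record)]

subset = url_locator_subset(records)
record_count = len(subset)
url_count = sum(len(urls_in(record)) for record in subset)

print(f"{record_count} records carry {url_count} URLs")
```

Because a single record may hold several 856 fields, the URL count can exceed the record count, which is why close to 15,000 URLs or PURLs corresponds to fewer than 15,000 records.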
A service of the Superintendent of Documents, U.S. Government Printing Office. Questions or comments: asklps@gpo.gov.
Last updated: April 12, 2002
Page Name: http://www.access.gpo.gov/su_docs/fdlp/pubs/proceedings/01pro7.html