
Copyright © 2000 Wallace Koehler - All Rights Reserved
Whether the Web is a library or a source of material for digital libraries, we must also understand the concept of the digital library in order to appreciate the context. Explore the following slide show on digital libraries. For definitions of digital libraries documented by other library school students, see http://www.simmons.edu/~schwartz/530-defs.html
There is an important distinction we need to bear in mind when considering options for the management of digital libraries and the Web environment. As we will see, a number of digital library projects incorporate markup, added by the library, into the text of the document. For example, consider the Model Editions Partnership (MEP), which marks up digitized versions of historical documents. Note the careful phrasing: MEP does not mark up the historical document itself, but rather a representation of that document in digital format. Given the government's propensity for digitizing documents, perhaps MEP or similar technologies will be applied to government documents, either by their authors or later by historians and librarians. Or perhaps GILS is already a form of native markup for those documents.
Web documents are a thing apart from archived historical documents or live, GILS-marked-up government documents. Web documents are created and "published" by millions of people, independent of any formal markup requirement other than producing material in HTML format. No rule requires a Web publisher to supply even the most rudimentary metatag. We might cache or archive Web documents, and there are many proposals and technologies that do just that. Once captured, these documents can be marked up in any way one might wish. But the author or publisher remains under no obligation to do anything. And for the most part, the Web remains outside the direct control of those who may seek to mark it up, for whatever purpose.
As a Web author, I am a proponent of the redundant use of classification techniques. As an information scientist, I recognize the limits of author-supplied cataloging and indexing. However rigorously I may try to observe the intricacies of metatags, Dublin Core, and various XML markup techniques, I am probably not equal to the task, and I am intolerant of the time costs. There are also skill and technology barriers: parts of this document were originally created with software that does not support editing of the document source code.
Whether we mark up Web documents or apply classic cataloging techniques, Web librarians are limited to post-publication methodologies, just as they are with most other material maintained in libraries. We use what we are given in the document itself or what we can discover or infer. Over time, authors and publishers have subscribed to both formal and informal metadata standards. Titles, author names, publication data, structure, and many other elements are organized in more or less standard ways. Publishers, again for the most part, subscribe to standard identification schemes for their monographs and serials, for example the ISBN that books carry. No such code exists on the Web. But remember that even in the "organized" traditional world of print, not everything is done the same way. Books published in English, if they have tables of contents, place them at the front; books published in many other languages place them at the back. Most scholarly works in English contain indexes, which are non-standard in many other publishing traditions.
Post-publication methodologies are held hostage by a dynamic medium. Web documents die or change with great frequency. We will suggest here that we must not only acknowledge those changes but also take advantage of them to categorize documents where we can.
Finally, what does a library do? What is library science? What is information science? Has the advent of the computer, with its ability to rapidly store, process, and retrieve information, revolutionized and redefined information science or library science? What, then, of digital libraries? And can we cope with the WWW within the constraints of the library and information science paradigms? If we deal only with Web documents, we no longer need to be too concerned with where to put them physically. We still have to consider where to put them in intellectual space. Moreover, unless I am very mistaken, "physical information containers" will be with us for quite some time. Whatever becomes of our responsibilities for the physical maintenance of collections, we will always be responsible for retrieving appropriate, quality, authoritative information in the service of our patrons and clients. Therein lies much of the challenge.
The second is to attempt to manage the corpus of the Web, to catalog it as a single collection. This is perhaps a natural conclusion deriving from the encyclopedist tradition, culminating in the arguments H.G. Wells made in World Brain, his interesting collection of essays published in 1938. An ample literature in cataloging and indexing tells us that the same collection needs its cataloging and indexing presented differently for different audiences (e.g. Soergel). In an interesting paper, Colomb argues that the WWW is a "heterogenous and chaotic collection of information." I have no argument with that. Because there are multiple users with multiple needs, Colomb argues that the Web needs multiple indexes. The alternative he sees is an overwhelmingly complex, multi-layered single catalog that is difficult to index, very expensive to maintain, and impossible to use. Bella Haas Weinberg tells us that there is "nothing new under the sun": the Web poses no new challenges of substance, and it in fact pales when contrasted with everything that has come before it.
There is another problem. I have already invoked the philosopher Heraclitus. As we shall see, the WWW is in constant flux. Just as with other publication systems, the pool of materials on the Web continues to increase unceasingly. Estimates in 1996 and 1997 placed the number of Web pages at between 100 and 600 million. Year 2000 estimates put it at between 1 and 1.5 billion public, static pages. Public means accessible without a password and not behind a firewall. Static means not dynamically produced from a database on demand. But new books, journals, magazines, flyers, films, CDs, ad nauseam, are added to our pot of information as well.
But unlike those new books, journals, magazines, flyers, films, CDs, and so on, Web documents (an inclusive term for Web pages, sites, and other structures) undergo constant metamorphosis. In any given year, almost all Web pages and all Web sites will be changed by their creators in some way. Not only are we faced with a "heterogenous and chaotic collection of information," we are faced with something that is constantly being redefined. To use library-speak, no longer can we mark 'em, park 'em, and forget 'em. We must forever be remarkin' 'em and reparkin' 'em. And there's no forgettin' 'em.
Are there solutions to these issues? Weinberg tells us we can do it. She is right, of course; we have to do it. This course does not offer solutions as such. It does explore the problems and the various solutions being offered and explored. I think everyone involved in the process would concede that we are far from an ideal solution, but also that progress is being made.
Is the Web a library? From the perspective of the bibliographic control of Web documents, does it matter? Discuss the nature of the WWW.
Consider the public policy implications of the "digital divide" from both a domestic and an international perspective. Will "good" management of "Web information space" exacerbate or alleviate the digital divide? Check out the US federal initiative at http://www.digitaldivide.gov/
R. Kling, Beyond Outlaws, Hackers and Pirates: Ethical Issues in the Work of Information and Computer Science Professionals, 1995. Available: http://www-slis.lib.indiana.edu/kling/cc/8-ETH1.html
W. Koehler, "The World Wide Web as a Third Information Model: Revolution or Old Wine in New Bottles?" Crimea 98, Libraries and Associations in the Transient World: New Technologies and New Forms of Cooperation Proceedings Available: http://www.gpntb.ru/win/inter-events/crimea98/doc1/doc65.html (note abstracts in Ukrainian and Russian, text in English).
F.W. Lancaster, "Second Thoughts on the Paperless Society," Library Journal, September 1999: 48-50.
S. Lawrence and C. L. Giles, "Accessibility of Information on the Web," Nature 400, 8 July 1999: 107-9.
D. Soergel, Organizing Information: Principles of Database and Retrieval Systems. Orlando, FL: Academic Press, 1985.
B.H. Weinberg, "Improved Internet Access: Guidance from Research on Indexing and Classification," Bulletin of the American Society for Information Science 25, 2 (1999). Available: http://www.asis.org/Bulletin/Jan-99/weinberg.html
What does Lancaster mean when he says, "The typical library catalog is a pathetic tool for subject access"? Given what Lawrence and Giles report, are search engines "pathetic tools" too? Are other approaches pathetic as well?
What ethical considerations does Kling raise that are pertinent to our examination of the management of the WWW? If Koehler is correct, do we need more rigor or less in our attempts to capture the WWW bibliographically? Colomb asks whether we can, and should, have more than one access tool. Given digital economies, is there any reason not to? Would this, as Colomb seems to suggest, somehow strike at the foundations of library science?