MODULE 1-INTRODUCTION


Copyright © 2002 Wallace Koehler - All Rights Reserved


Purpose

This course is concerned with the bibliographic control of the Web, a discipline still in its infancy. A range of proposals and projects has attempted to address the anarchy of the Web. This course is designed to explore that range of options and to begin to evaluate their application and usefulness.


Is the World Wide Web the Wild Wild Web, an anarchic construct governed without rules where anything goes, without concern for purposeful and effective information management? Or is it amenable to management? Without a doubt, the WWW grew in importance in the 1990s and will continue to do so in the third millennium. Many groups, principally in the library and computer science communities, have recognized that importance. That recognition is reflected in initiatives by such groups as UKOLN, OCLC, the EU, and W3C to seek and implement means of bringing bibliographic control to the new medium. They are not alone in their efforts. Many commercial ventures, known to us all as search engines and directories, have made an entrance on the stage of Web management as well.

The management of Web information, like that of its more "traditional" counterparts, has been approached from two directions. The first might be termed "author-based" tools, the other "post publication" tools. Authors select titles that reflect the content of their material (sometimes), they may provide keywords, and they may write abstracts. They provide authority control by signing the material, by identifying their corporate home, and through the venue in which they publish. Web authors can, but often do not, avail themselves of the full range of tools available to them to offer bibliographic control of their work. For example, the index or home page for this course contains metatags, Dublin Core metatags, and RDF metatags, all designed to help categorize and define the content and authority of this Web site. Markup languages, like XML, are designed in part to support bibliographic control.
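
As a rough illustration of author-supplied metadata, the sketch below builds Dublin Core metatags for the head of an HTML page. It assumes the common convention of naming each tag "DC." plus the element name; the function name dublin_core_meta and the sample record are hypothetical and are not taken from this course's own pages.

```python
from html import escape

# Link element commonly used to declare the Dublin Core namespace in HTML.
DC_SCHEMA_LINK = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'


def dublin_core_meta(record):
    """Return HTML head lines for a dict of Dublin Core element -> value(s).

    Hypothetical helper for illustration only.
    """
    lines = [DC_SCHEMA_LINK]
    for element, values in record.items():
        # Accept either a single string or a list of repeated values.
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(
                '<meta name="DC.{}" content="{}">'.format(
                    element, escape(value, quote=True)
                )
            )
    return lines


# Illustrative record; the values are made up.
sample_record = {
    "title": "Sample Course Page",
    "creator": "A. Author",
    "subject": ["metadata", "cataloging"],
}

for line in dublin_core_meta(sample_record):
    print(line)
```

An author would paste the resulting lines into the head of the page, where indexers that understand Dublin Core can read them.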

"Post publication" tools include cataloging initiatives such as those from OCLC (NetFirst, InterCat, CORC), UKOLN, SOSIG, the Virtual Library, and many others, which offer post hoc cataloging and indexing of Web material. These post publication efforts often use the author-supplied information to populate their catalog fields. Other initiatives, like W3C's PICS, can be applied either by authors or by third party indexers.

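To make the idea of populating catalog fields from author-supplied information concrete, here is a minimal sketch using Python's standard html.parser module. The catalog field names (title, author, keywords) and the function catalog_record are assumptions made for illustration; they do not describe how NetFirst, CORC, or any other real service works.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Repeated elements (several subjects, say) are kept as a list.
            self.fields.setdefault(name.lower(), []).append(content)


def catalog_record(html_text):
    """Build a hypothetical catalog record from author-supplied metatags."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    fields = collector.fields
    return {
        "title": fields.get("dc.title", [""])[0],
        # Prefer the Dublin Core creator, fall back to a plain author tag.
        "author": fields.get("dc.creator", fields.get("author", [""]))[0],
        "keywords": fields.get("dc.subject", []) + fields.get("keywords", []),
    }
```

A cataloger would still need to review such a record, since nothing guarantees that the author's tags are accurate or complete.
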
The United States Library of Congress and many other organizations throughout the world are addressing problems and issues of collecting Web and other digitized material within digital, electronic, virtual, and/or hybrid libraries. Each of these "library types" means something a little different. Ultimately, all these initiatives share a common end: the collection and utilization of Web material.

If we assume that Web information is manageable through some or all of the initiatives that have come before us and will continue to come forward, the Web will continue to challenge us on two fronts. The first is quality control. Much work has been done, and many proposals exist, to understand and manage Web quality issues, including authority, accuracy, and timeliness. The problem is not yet solved and, as in the "traditional world," may never be resolved to full satisfaction. This is partly because of the ease of Web publication but also because quality issues are inherently subjective.

The second front is permanence. The Web is unlike the traditional world of publication. Web documents by their very nature are ephemeral. They come, they go. Web documents are also transitory: they can be moved from server to server, from address to address, without changing the inherent quality or the literal nature of their content. There are now efforts to archive the Web and to provide some form of address stability. These include PURLs from OCLC and URx's from W3C and others.
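
The idea behind address stability can be sketched as a lookup table that maps an unchanging name to whatever address currently holds the document. The class below is a toy model under that assumption, not OCLC's PURL software; the class name, the paths, and the example.org addresses are all hypothetical, and the 302 and 404 values simply imitate HTTP redirect and not-found responses.

```python
class PersistentResolver:
    """Toy model of a persistent-address service: stable names, movable targets."""

    def __init__(self):
        self._targets = {}

    def register(self, persistent_path, current_url):
        """Record where the document named by persistent_path lives today."""
        self._targets[persistent_path] = current_url

    def move(self, persistent_path, new_url):
        """Update the target when the document changes server or address."""
        if persistent_path not in self._targets:
            raise KeyError(persistent_path)
        self._targets[persistent_path] = new_url

    def resolve(self, persistent_path):
        """Return an HTTP-style (status, location) pair for a persistent path."""
        target = self._targets.get(persistent_path)
        if target is None:
            return 404, None
        return 302, target


resolver = PersistentResolver()
resolver.register("/course/intro", "http://old.example.org/intro.html")
resolver.move("/course/intro", "http://new.example.org/intro.html")
print(resolver.resolve("/course/intro"))
```

Anyone who cited the persistent path keeps a working reference after the move, because only the table entry changes.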

Web documents also undergo metamorphosis: they change content, sometimes subtly, sometimes significantly, and they do so sometimes slowly, sometimes rapidly. These problems too have been addressed in the literature and must be resolved before the Web will submit itself fully to bibliographic control.
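
One simple way to observe such metamorphosis is to compare two captures of the same document. The sketch below uses a hash to detect any change and a similarity ratio to separate subtle from significant change. The function names and the 0.9 threshold are assumptions chosen for illustration, not a method drawn from the literature mentioned above.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(text):
    """Return a SHA-256 digest that changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def describe_change(old_text, new_text, threshold=0.9):
    """Classify the difference between two captures of a document.

    The threshold is an arbitrary, illustrative cut-off.
    """
    if fingerprint(old_text) == fingerprint(new_text):
        return "unchanged"
    # ratio() is 1.0 for identical sequences and 0.0 for nothing in common.
    similarity = SequenceMatcher(None, old_text, new_text).ratio()
    if similarity >= threshold:
        return "subtle change"
    return "significant change"
```

Running this at intervals over the same address would also give a rough sense of how slowly or rapidly a document changes.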

A new literature has erupted, some of it addressing the phenomenon (like D-Lib Magazine or the Journal of Internet Cataloging), some taking advantage of it through the proliferation of e-journals and h-journals.

This course is conceived as a survey course. By their very nature, survey courses are designed to introduce the student to the scope of the field, to touch upon its breadth and depth. They are not designed to create experts. Rather they are designed to foment interest, to point the way so that the student becomes aware of the richness and complexity of the offering. What we are about to do with this course is to try to gain some understanding of information in the Web environment and the various approaches authors, researchers, and practitioners have undertaken to bring some kind of order to that chaos.

We will explore a wide range of applications. It must be stated at the outset that almost nothing you will find here has risen to become a "set-in-stone" standard. Much is still in flux. Some have fallen from favor, but -- who knows -- may return to favor. My purpose is to introduce the student to the state-of-the-art. We will become conversant with but not necessarily experts on any given approach, philosophy, or technology.


The management of electronic documents predates the Internet and the World Wide Web as we know them. The first initiatives to utilize electronic documents date from the 1960s, when what was to become Dialog began representing print with abstracted electronic surrogates. These databases were to evolve into more complex representations of documents, including full text electronic proxies. In the end, these initiatives have but two purposes: (1) to represent the native document with an effective record and (2) to use that record to retrieve information effectively from the native document.

Librarians, information scientists, computer scientists, and others have evolved multiple ways to represent the record and its content. A recent IFLA study notes that bibliographic requirements and bibliographic methodologies have undergone significant change over the past fifty years. Bibliographic representation -- cataloging and indexing -- can be approached from a number of different perspectives. Fifty years ago, just as today, librarians were concerned with "minimal level cataloging" and "core cataloging." One had to do with how much cataloging was needed, the other with the development of transnational standards. What we must address in the end are the same questions and issues raised then: what is needed to meet end user needs and what is needed to meet intermediary needs. Implicit in all this is also the question of what we can realistically do, particularly with the advent of a new information resource that appears more difficult to manage than its more "traditional" predecessors. We are of course speaking of the World Wide Web.

The IFLA study can be used as a framework, for it addresses the identification of those elements, those factors, that are necessarily part of a bibliographic record. We will not debate whether IFLA captures all necessary elements or the right elements. Ours is not a cataloging course as such.

What we will consider first is whether WWW documents can be described using IFLA or other guidelines. Second, we will consider whether there are other elements particular to the WWW that should be a part of a bibliographic representation of a Web document. Finally, we will examine critically the multiplicity of approaches to capturing WWW documents through bibliographic representation. It must always be remembered that Web documents are not electronic surrogates for print; they are the original native document. As we produce surrogates of Web documents, we need to take advantage of that characteristic.

Let us acknowledge at the outset that no system for bibliographic representation is perfect. We will see that there are a number of initiatives, like Dublin Core, that are designed to be implemented at the front end, by the document author or someone else close to the creation of the document. There are other systems, like NetFirst, that are post hoc cataloging methodologies. Still others, perhaps beginning with the Warwick Framework, are attempts to develop standards that allow one metadata system to "talk" to another. This is and will continue to be an important theme [see Bearman et al.].
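
What it means for one metadata system to "talk" to another can be pictured as a crosswalk, a table that translates the element names of one scheme into those of another. The sketch below is only an illustration of that idea: the Dublin Core element names on the left are real, but the local catalog field names on the right and the function crosswalk are hypothetical, and this is not the Warwick Framework itself.

```python
# Illustrative crosswalk from Dublin Core element names to the field names
# of a hypothetical local catalog; the right-hand names are assumptions.
DC_TO_LOCAL = {
    "title": "main_title",
    "creator": "author",
    "subject": "subject_terms",
    "identifier": "url",
}


def crosswalk(dc_record, mapping=DC_TO_LOCAL):
    """Translate a record and report elements that have no counterpart."""
    translated, unmapped = {}, []
    for element, value in dc_record.items():
        target = mapping.get(element)
        if target is None:
            # Nothing in the target scheme corresponds to this element.
            unmapped.append(element)
        else:
            translated[target] = value
    return translated, unmapped


record, leftovers = crosswalk(
    {"title": "Sample", "creator": "A. Author", "coverage": "1990s"}
)
print(record)
print(leftovers)
```

The list of unmapped elements shows where interoperability is hardest: information that one scheme records and the other has no place for.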

You will see that some of the approaches we explore are as intuitively obvious as Cutter Numbers while others require extensive training in computer programming to grasp. We will try to implement some of the approaches, and many of us will be relieved to find that everything from relatively simple metatags to complex XML can now be generated (more or less) automatically with a (more or less) minimum of stress and difficulty. It is important to us as librarians and as information managers to understand how each of these processes works and how we can employ them in our institutions to manage Web material more effectively.
 

A Vocabulary Note

The terms WWW, Web, and World Wide Web are used to mean two very different things. The first is the digital information conduit. Information -- text, graphics, audio, video, databases, etc. -- flows across it just as microwave signals are transmitted from tower to tower or telephone messages are conveyed across copper wire and now optical fiber from point to point. Each has its own protocols and hardware.

The second definition of WWW is as an information repository. It is, in fact, those text, graphic, audio, or video objects rather than a transmission medium for those objects. We are concerned here with the management of the content of the Web rather than with the whys and the hows of transmission. This will be the first and last time terms like TCP/IP will be seen here. Ours will be a different set of acronyms (HTML, XML, RDF, URI, TEI, and so on).
 

Page References

David Bearman, Eric Miller, Godfrey Rust, Jennifer Trant, and Stuart Weibel, "A Common Model to Support Interoperable Metadata: Progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities," D-Lib Magazine, 5, 1, January 1999. Available: http://www.dlib.org/dlib/january99/bearman/01bearman.html

IFLA Study Group on Functional Requirements for Bibliographic Records, Functional Requirements for Bibliographic Records, Final Report, September 1997. Available: http://www.ifla.org/VII/s13/frbr/frbr.pdf

Lorcan Dempsey and Stuart L. Weibel, "The Warwick Metadata Workshop: A Framework for the Deployment of Resource Description," D-Lib Magazine, July/August 1996. Available: http://www.dlib.org/dlib/july96/07weibel.html

E-journals, also known as electronic journals, are publications published solely in electronic format. H-journals, or hybrid journals, are publications offered in both paper and electronic format. Many well established journals of long standing, e.g. Science and the Journal of the American Society for Information Science, are available in both formats.
 
