
|Innovative Cataloging|concept|relationship|characteristic|
Grounded theory is concerned with finding common traits among diverse concepts through a positive feedback loop of empirical testing, evaluation, and retesting. It is necessarily iterative and elastic. It is therefore less phenomenological (particularistic) than synthetic (a combination of elements).
Faceted classification systems identify the individual
characteristics of various concepts and systems and recombine them in synthetic
fashion to create somewhat fluid and iterative definitions and classifications
of information. Susan Leigh Star argues that grounded theory and faceted
classification are particularly important concepts for information definition
and therefore retrieval. In the electronic environment, objections to
the approach lose much of their force.
Let us take, for example, the countries of the world and work our way down to my office.
We could start with the category: All Countries.
All Countries could be subdivided by Region: [World -- North America]. This can be further reduced to Country: [World -- North America -- United States]. By political subdivision: [World -- North America -- United States -- Oklahoma]. By city: [World -- North America -- United States -- Oklahoma -- Norman]. By street address: [World -- North America -- United States -- Oklahoma -- Norman -- 401 West Brooks]. And finally office number: [World -- North America -- United States -- Oklahoma -- Norman -- 401 West Brooks -- Room 23]. [Notes and explanations are usually somewhere else, but that is not too bad with hypertext.]
At any point we could select all subclasses within the general class.
For example, if we were stumped at "North America,"
we could see a list of all countries that meet the definition: Canada,
Greenland, Mexico, and the United States. Each sub-group could be further
sub-divided. Cities in Oklahoma could be grouped alphabetically, by SE,
SW, NE, NW quadrants, distance from Oklahoma City or proximity to I-35,
I-40, I-44, and so on.
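The chain above can be sketched as a small nested structure. This is only an illustration: the dictionary layout and the function names `subclasses` and `chain` are my own assumptions, not part of any cataloging standard, and only the place names come from the example.

```python
# Hypothetical nested classification built from the example above.
# Each key is a class; its value holds that class's subclasses.
HIERARCHY = {
    "World": {
        "North America": {
            "Canada": {},
            "Greenland": {},
            "Mexico": {},
            "United States": {
                "Oklahoma": {
                    "Norman": {
                        "401 West Brooks": {"Room 23": {}},
                    },
                },
            },
        },
    },
}


def subclasses(tree, path):
    """Return the sorted names of the immediate subclasses found at `path`."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the chain is not in the scheme
    return sorted(node)


def chain(path):
    """Render a path in the bracketed chain notation used in the text."""
    return "[" + " -- ".join(path) + "]"


print(chain(["World", "North America"]))
print(subclasses(HIERARCHY, ["World", "North America"]))
```

Selecting all subclasses at any level is then a matter of walking down the chain and listing what sits beneath the last term. Grouping the cities of Oklahoma by quadrant or by highway would require additional facets stored alongside each entry.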
Herein lies part of the rub. The chain categories used on the Web are non-standard, and each is designed on an ad hoc basis. Some are less "ad hoc" than others.
Studies have shown that both inter- and intra-indexer and cataloger uniformity is poor, even for highly trained indexers and catalogers using well-defined guides, classification schemes, and thesauri. This means that two catalogers cataloging the same thing will probably not do it the same way. Moreover, the same cataloger cataloging the same resource at two different times will also probably do it differently. Some Web-based services are maintained by trained catalogers (again Yahoo!). Others evolve haphazardly.
Librarians are trained to understand and appreciate that one search interface is different from another. For example, the search protocols that underlie Dialog and Lexis-Nexis are very different. Even within Dialog, search protocols differ from one database to another. We know this, and we adjust for it. Moreover, it is well documented by the search services. Dialog's Bluesheets tell us what is there and how to get at it. Lexis provides all kinds of help.
The Web poses two problems. First, the various Web-based search services
are notorious for their lack of documentation. Second, most people who use
the search services assume that there is some sort of inter-service as
well as intra-service similarity. Too often that just isn't so.
There are also a number of other important functions that search systems can offer in support of information retrieval. These include proximity searching, truncation, and temporality. Proximity searching permits the user to specify the distance and sometimes the order of relationship between search terms. If one were searching, for example, for the US President's residence, many search engines would return an inordinate number of totally irrelevant hits if one were to request "white AND house." However, if one could specify "white AND house -- no intervening terms and in this order," the return set would be smaller and more useful. Many engines will do this. Syntax and command vocabulary differ.
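A minimal sketch of the difference between a plain Boolean AND and a proximity test follows. The function names, the `max_gap` and `ordered` parameters, and the two sample sentences are assumptions made for illustration; no particular search engine's syntax is implied.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and(text, a, b):
    """True if both terms occur anywhere in the text."""
    words = tokenize(text)
    return a in words and b in words


def near(text, a, b, max_gap=0, ordered=True):
    """True if a and b occur with at most `max_gap` intervening words.

    With ordered=True, a must come before b.
    """
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == a]
    positions_b = [i for i, w in enumerate(words) if w == b]
    for i in positions_a:
        for j in positions_b:
            distance = j - i if ordered else abs(j - i)
            if 1 <= distance <= max_gap + 1:
                return True
    return False


doc1 = "The President returned to the White House today."
doc2 = "She painted the house white last summer."

print(boolean_and(doc1, "white", "house"), boolean_and(doc2, "white", "house"))
print(near(doc1, "white", "house"), near(doc2, "white", "house"))
```

Both sentences satisfy "white AND house," but only the first satisfies the adjacent, in-order test. Relaxing `ordered` lets the second sentence back in, which is why the order of relationship matters as much as the distance.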
Truncation (we will use the term generically to cover both pre- and post-truncation as well as substitution) allows the searcher to specify variations in tense, spelling, and number without multiple term entry. These include American and English spelling (color and colour), number (house - houses, man - men, mouse - mice), and tense (has - had). Again the abilities, syntax, and command languages vary.
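Truncation and substitution can be sketched with two wildcard symbols. The symbols chosen here are an assumption for this example only ("*" for any run of characters, "?" for at most one character); as noted above, real services differ in both symbols and behavior.

```python
import re


def truncation_pattern(query):
    """Translate a wildcard query into a compiled regular expression.

    Assumed convention: "*" matches any run of word characters,
    "?" matches zero or one word character.
    """
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(query, vocabulary):
    """Return the vocabulary terms that the wildcard query matches in full."""
    pattern = truncation_pattern(query)
    return [word for word in vocabulary if pattern.fullmatch(word)]


vocabulary = ["color", "colour", "house", "houses", "housing",
              "man", "men", "mouse", "mice"]

print(matches("colo?r", vocabulary))  # spelling variants
print(matches("house*", vocabulary))  # post-truncation: singular and plural
print(matches("m?n", vocabulary))     # substitution: man and men
```

Irregular forms such as mouse and mice show the limits of the approach: a pattern loose enough to catch both will usually catch unrelated words in a larger vocabulary.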
Temporality allows the user to specify time ranges for retrieval. For example, one could specify material published between one date and another or from yesterday to today, or just everything. Once again, the abilities, syntax, and command languages vary.
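A date-range limit can be sketched as a simple filter. The records, their dates, and the function name `published_between` are all hypothetical and serve only to illustrate the idea.

```python
from datetime import date

# Hypothetical records: (title, publication date).
RECORDS = [
    ("Report A", date(1997, 3, 1)),
    ("Report B", date(1998, 6, 15)),
    ("Report C", date(1999, 1, 20)),
]


def published_between(records, start=None, end=None):
    """Return titles published within [start, end].

    Leaving a bound as None leaves that side of the range open,
    so calling with no bounds retrieves "just everything."
    """
    hits = []
    for title, published in records:
        if start is not None and published < start:
            continue
        if end is not None and published > end:
            continue
        hits.append(title)
    return hits


print(published_between(RECORDS, date(1998, 1, 1), date(1998, 12, 31)))
print(published_between(RECORDS))
```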
Proximity searching, truncation, and temporality are "bells and
whistles," albeit very nice bells and whistles. The discussion that follows
addresses fundamental theoretical and practical considerations in the building
of indexes and their ability to deliver.
A number of search engines support fielded searches. It is possible to specify that only index pages or title fields be searched. It may also be possible to specify that only certain portions of a document be searched -- first paragraphs, concluding sentences, etc. The assumption is that the most important information and the document "aboutness" are most often found in specific locations. This can be particularly important when doing full text searches, for the likelihood of the inclusion of less relevant material increases as less important portions of the document are included.
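The effect of a fielded search can be sketched as follows. The two documents, their field names, and the function `fielded_search` are assumptions for illustration; matching is by simple substring, which is cruder than what a real engine would do.

```python
# Hypothetical documents, each split into the fields an index might record.
DOCUMENTS = [
    {
        "url": "doc1",
        "title": "Ethics for librarians",
        "first_paragraph": "Codes of ethics published by library associations.",
        "body": "Some associations also publish mission statements.",
    },
    {
        "url": "doc2",
        "title": "Association mission statements",
        "first_paragraph": "Mission statements of professional associations.",
        "body": "A few of these statements mention ethics in passing.",
    },
]


def fielded_search(documents, term, fields):
    """Return the URLs of documents whose listed fields contain the term."""
    term = term.lower()
    return [
        doc["url"]
        for doc in documents
        if any(term in doc[field].lower() for field in fields)
    ]


print(fielded_search(DOCUMENTS, "ethics", ["title"]))
print(fielded_search(DOCUMENTS, "ethics", ["title", "first_paragraph", "body"]))
```

Restricting the search to the title field retrieves only the document that is about ethics. Searching the full text also retrieves the document that merely mentions the term in passing.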
In a precoordinate system, the record or document creator designates the search terms from an already authoritative source. Library of Congress Subject Headings (LCSH) is one such source. An important distinguishing characteristic is that concepts are usually represented by a single term drawn from the authoritative source.
In a postcoordinate system, the information creator may create lists of terms or, in the case of full-text presentation, may allow the text itself to generate search terms. Multiple terms representing concepts may be acceptable (as is the case with the Art & Architecture Thesaurus (AAT)). It is postcoordinate because the end user selects search terms.
What these spiders or robots crawl matters. Some crawl and index "everything" in a Web document. Others limit themselves to header material or just the title field. Author-based indexing tools like Dublin Core and metatags provide data for the search engines to find.
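A robot that limits itself to header material might work roughly as sketched here, using Python's standard `html.parser` module. The sample page, the class name `HeadIndexer`, and the function `index_head` are hypothetical; how any actual search service treats titles and metatags is not documented here.

```python
from html.parser import HTMLParser


class HeadIndexer(HTMLParser):
    """Collect the title and named meta tags; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.fields[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def index_head(html):
    """Return a dict of the title and meta fields found in the page."""
    indexer = HeadIndexer()
    indexer.feed(html)
    return indexer.fields


# Hypothetical page carrying a keywords metatag and a Dublin Core creator.
PAGE = """<html><head><title>Ethics Links</title>
<meta name="keywords" content="ethics, professional standards">
<meta name="DC.creator" content="Wallace Koehler">
</head><body><p>Body text that a header-only robot would ignore.</p></body></html>"""

print(index_head(PAGE))
```

Nothing in the body reaches the index, so what the author chose to put in the title and metatags is all such a robot has to work with.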
CyberStacks(sm), Available: http://www.public.iastate.edu/~CYBERSTACKS/
Gerry McKiernan's (Iowa State University Curator) catalog of Internet
resources.
"CyberStacks(sm) is a centralized, integrated,
and unified collection of significant World Wide Web
(WWW) and other Internet resources categorized
using the Library of Congress classification scheme."
CyberDewey, Available: http://ivory.lm.com/~mundie/DDHC/CyberDewey.html
Internet Public Library (http://www.ipl.org/) uses
a variety of mechanisms to assist information retrieval. These include graphical
user interfaces (GUI), pathfinders,
formal and informal ("teen collection: dating & stuff") subject
classification, and FAQs and frequently asked reference questions (FARQ?).
"However, we cannot quarantine the quality of the resources as we can for the SOSIG Internet Catalogue. Hence we recommend that you only use the Social Science Search Engine if you are not finding a sufficient number of Web resources from your searches on the SOSIG Internet Catalogue itself." Source: http://sosig.esrc.bristol.ac.uk/help/harvester.htmlSOSIG employs a straightforward metadata format. The following is the record for my ethics page:
[Title]Ethics Links to Librarian and Information Manager Associations WWW Pages
Description: Produced by Wallace Koehler, Assistant Professor at the School of Library and Information Studies, University of Oklahoma, this site provides a set of links to Codes of Ethics and Standards of Practice that have been published on the WWW by library and information science related professional associations. The site is divided into four sections: ethics pages, mission statements, homepages of associations which do not publish ethics or mission statements online, and other related links. Each link has a brief description of what can be found at that site. This Web site actually consists of a single, rather long page, which may consequently be slow to download.
Keywords: ethics, professional standards, mission statements, librarians, computer personnel, information professional associations
Classification Scheme: UDC
Classification Number(s): 174
Subject Section(s): Professional and Business Ethics
Resource Type: Resource Guides
Admin Name: Wallace Koehler
Admin Email: wkoehler@ou.edu
Language: en
URL: http://www.ou.edu/cas/slis/ethics/EthicsBibOrg.htm
I have added emphasis to the metadata field titles. Note too
that the title field tag is implicit.
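A record in this "Field: value" layout is easy to read by machine. The sketch below is an assumption about how one might parse it, not SOSIG's own software: the field list is copied from the record above, the sample is abridged and omits the bracketed title marker, and the names `KNOWN_FIELDS` and `parse_record` are hypothetical.

```python
# Field names copied from the record above; the title has no tag of its own.
KNOWN_FIELDS = (
    "Description", "Keywords", "Classification Scheme",
    "Classification Number(s)", "Subject Section(s)", "Resource Type",
    "Admin Name", "Admin Email", "Language", "URL",
)


def parse_record(text):
    """Parse 'Field: value' lines into a dict.

    Lines before the first recognized field are treated as the title, and
    a line without a recognized field name continues the previous field.
    """
    record = {}
    current = "Title"  # the title field tag is implicit
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, separator, value = line.partition(": ")
        if separator and name in KNOWN_FIELDS:
            current = name
            record[current] = value
        else:
            record[current] = (record.get(current, "") + " " + line).strip()
    return record


# Abridged copy of the record, including two wrapped lines.
RECORD = """Ethics Links to Librarian and Information Manager Associations
WWW Pages
Keywords: ethics, professional standards, mission statements,
librarians
Classification Scheme: UDC
Classification Number(s): 174
Language: en
URL: http://www.ou.edu/cas/slis/ethics/EthicsBibOrg.htm"""

parsed = parse_record(RECORD)
print(parsed["Title"])
print([keyword.strip() for keyword in parsed["Keywords"].split(",")])
```

Because each value is tied to a named field, a search service can offer fielded retrieval on keywords or classification number without having to guess at the document's "aboutness."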
Fuzzy searches are often presented through graphical user interfaces (GUIs) that seek to demonstrate the relationships among information vectors.