Introduction to Dublin Core

One of the key forces driving OCLC and NCSA to develop a simple electronic document description, one that is easy to understand and implement, was the chaotic environment users encounter when searching the Internet (Weibel, Godby, Miller, & Daniel, 1995). Another motivation was the lack of descriptive information in documents that would aid in cataloging scholarly electronic materials. The first workshop, held in Dublin, Ohio, had three major goals:

  1. Recognize the clientele, or communities, concerned with document retrieval (e.g., universities).
  2. Realize the potential of metadata descriptions to serve these user populations.
  3. Achieve consensus on the minimal descriptive metadata elements needed to facilitate resource discovery.

The last of these goals has the greatest impact on users who want their Web pages to be found, as well as on those who want greater precision when retrieving documents in their own searches. Even sophisticated search engines (e.g., Alta Vista Advanced Search) still lack precision in document retrieval. The metadata mentioned in the third goal is a set of fields embedded in the document's <HEAD> </HEAD> section that describe the DLO (Document-Like Object), much like the fields in a MARC cataloging record or the information supplied in a bibliography or reference list. Each component of information (e.g., author, title, publisher) is called an Element and occupies a single data field. Future search engines may eventually allow "fielded searching" of these metadata elements (Dublin Core Paper No. 99, 1997). In other words, if you wanted to search for DLOs written by a particular person, the search engine would offer that option directly, without requiring a keyword search.
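
To make "fielded searching" a little more concrete, here is a minimal Python sketch of how a search engine might collect the DC META tags from each page and then search a single element instead of the full text. The names DCMetaCollector and fielded_search are hypothetical, the sample pages are invented, and the sketch assumes META tags appear in the document head as usual; it relies only on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class DCMetaCollector(HTMLParser):
    """Collect CONTENT values of META tags whose NAME starts with "DC."."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # element name (e.g. "creator") -> list of values

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so "META" -> "meta"
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content:
            element = name[3:].lower()
            self.fields.setdefault(element, []).append(content)


def fielded_search(documents, element, term):
    """Return the keys of documents whose DC `element` contains `term`."""
    hits = []
    for key, html in documents.items():
        collector = DCMetaCollector()
        collector.feed(html)
        collector.close()
        values = collector.fields.get(element, [])
        if any(term.lower() in value.lower() for value in values):
            hits.append(key)
    return hits


documents = {
    "a.html": '<HEAD><META NAME="DC.creator" CONTENT="John Smith"></HEAD>',
    "b.html": '<HEAD><META NAME="DC.title" CONTENT="John Smith"></HEAD>',
}
print(fielded_search(documents, "creator", "smith"))  # prints ['a.html']
```

A plain keyword search for "smith" would return both pages; searching only the creator field returns just the page where Smith is the author.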

The fifteen elements, finalized in December 1996, provide minimal but sufficient information to describe documents and to "facilitate the discovery of DLOs in a networked environment such as the Internet" (Weibel, 1995). The fifteen elements, with thumbnail definitions, are:

Title: The name given to the work by the author or publisher.
Subject and Keywords: The topic of the work, or keywords that describe the content of the work.
Description: A textual description of the content of the resource.
Creator or Author: The person(s) primarily responsible for the intellectual content of the object.
Publisher: The agent or agency responsible for making the object available in its present form. Generally a publisher, an institution (university department, for example) or a corporate entity.
Contributors: The person(s) other than the author(s) who have made other significant intellectual contributions to the work (for example, editors, transcribers, illustrators, convenors).
Date: The date the work was made available in its present form.
Resource Type: The genre of the object, such as home page, novel, poem, working paper, technical report, essay, dictionary, etc.
Format: The data representation of the object, such as PostScript file or Windows executable file.
Resource Identifier: String or number used to uniquely identify the object.
Relation: Relationship to other objects.
Source: Objects, either print or electronic, from which this object is derived, if applicable.
Language: Language of the intellectual content.
Coverage: The spatial locations and temporal durations characteristic of the object.
Rights management: Intended to be a link to a copyright notice, a rights-management statement, etc.
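
The list above gives each element a descriptive label, but in a META tag the element appears as a short name after the DC prefix (DC.title, DC.subject, DC.creator, and so on). The Python sketch below assumes the short names used in the Dublin Core element set and shows how a page-authoring tool might refuse anything outside the fifteen; DC_ELEMENTS and meta_tag are hypothetical names.

```python
from html import escape

# Short element names that follow the "DC." prefix in META NAME.
# Assumption: these are the Dublin Core element-set names; the guide above
# lists the same fifteen elements under longer descriptive labels.
DC_ELEMENTS = (
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "relation", "source", "language", "coverage", "rights",
)


def meta_tag(element, content):
    """Return one DC META tag, rejecting names outside the fifteen elements."""
    element = element.lower()
    if element not in DC_ELEMENTS:
        raise ValueError(f"not one of the fifteen DC elements: {element!r}")
    # escape() keeps quotes or angle brackets in the value from breaking the tag
    return f'<META NAME="DC.{element}" CONTENT="{escape(content)}">'


print(meta_tag("Subject", "Dublin Core"))
# prints <META NAME="DC.subject" CONTENT="Dublin Core">
```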


As mentioned above, the META elements resemble the information used in a reference list or bibliography. This is another key design feature that makes the elements easier to apply and use. Some search engines already index certain META tags in addition to the terms used in the documents (Richmond & Richmond, 1997). Document creators must keep a few basic concepts in mind:

  1. Not all elements have to be used.
  2. All elements are repeatable.
  3. No element may be left empty.
  4. The only data required are the META NAME and CONTENT attributes and the values the creator provides for them.
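
Of the four rules, only the third can actually be violated by a tag that is present, so a checker mostly has to look for empty CONTENT values. The Python sketch below is a hypothetical validator (MetaTagLister and check_dc_rules are invented names) that lists every META tag in a page and reports DC elements with a missing or blank CONTENT; the sample tags are made up.

```python
from html.parser import HTMLParser


class MetaTagLister(HTMLParser):
    """Record the (NAME, CONTENT) pair of every META tag in a document."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            self.tags.append((attributes.get("name"), attributes.get("content")))


def check_dc_rules(html):
    """Report DC elements that break the 'no empty elements' rule.

    Rules 1 and 2 need no check: leaving an element out is allowed,
    and so is repeating one.
    """
    lister = MetaTagLister()
    lister.feed(html)
    lister.close()
    problems = []
    for name, content in lister.tags:
        if not name or not name.lower().startswith("dc."):
            continue  # not a Dublin Core element, so not checked here
        if content is None:
            problems.append(f"{name}: missing CONTENT attribute")
        elif not content.strip():
            problems.append(f"{name}: empty CONTENT value")
    return problems


head = (
    '<META NAME="DC.creator" CONTENT="Ann Lee">'
    '<META NAME="DC.creator" CONTENT="Bo Kim">'  # repeated element: allowed
    '<META NAME="DC.date" CONTENT="">'           # empty element: not allowed
)
print(check_dc_rules(head))  # prints ['DC.date: empty CONTENT value']
```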

The Dublin Core Metadata package was deliberately designed to be simple so that creators can easily describe their own documents. They do not have to know complicated cataloging procedures, which standard subject term resources are available to them (e.g., LCSH, ERIC Thesaurus), or other standard code and abbreviation conventions (e.g., ISO31). Creators of DLOs will be encouraged to use standard resources when describing their works, but in reality they only need to know how to enter what they know. This deliberate simplicity is one of the key reasons this approach to electronic document description is workable and desirable.

As mentioned before, every DC element must contain the META NAME and CONTENT attributes and their respective values. An example makes this clearer:

<META NAME="DC.subject" CONTENT="Dublin Core">
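The entire META string (from < to >) is one DC element. Its components break down as follows:

META NAME: the attribute that names the element; its value, DC.subject, identifies the Dublin Core subject element.
CONTENT: the attribute that carries the data; its value, Dublin Core, is the information supplied by the creator.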

Because many resource description formats are available to creators of electronic documents, the DC elements were developed to complement, not replace, other metadata packages. To distinguish Dublin Core elements from others, the acronym DC, known as a Schema Identifier, is attached as a prefix to the META NAME value, separated by a full stop. Mixing, or mapping, different types of document description (e.g., Dublin Core and institution-specific elements) is also allowed, a property known as extensibility (Dempsey & Weibel, 1996). For example:

<META NAME="DC.title" CONTENT="Journal for an Information Age">
<META NAME="FSU.frequency" CONTENT="Annual">
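
Because the Schema Identifier sits in front of the first full stop, software reading a mixed head section can sort each tag into its own schema before doing anything else. The Python sketch below (SchemaGrouper is a hypothetical name, and the sample head is illustrative) groups DC and FSU tags separately; it is only an illustration of extensibility, not part of any published DC tooling.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SchemaGrouper(HTMLParser):
    """Group META tags by the schema identifier in front of the first full stop."""

    def __init__(self):
        super().__init__()
        self.by_schema = defaultdict(list)  # schema -> [(element, content), ...]

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        schema, dot, element = name.partition(".")
        if dot:  # only SCHEMA.element names carry a schema identifier
            self.by_schema[schema].append((element, attributes.get("content")))


head = """
<META NAME="DC.title" CONTENT="Introduction to Dublin Core">
<META NAME="FSU.frequency" CONTENT="Annual">
"""
grouper = SchemaGrouper()
grouper.feed(head)
grouper.close()
print(dict(grouper.by_schema))
# prints {'DC': [('title', 'Introduction to Dublin Core')], 'FSU': [('frequency', 'Annual')]}
```

Splitting only at the first full stop means a qualified name such as DC.creator.email still lands under DC, with creator.email as the element part.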

The elements as a whole (i.e., the entire contents between <HEAD> and </HEAD>) are sometimes referred to as a container package. Different container package types in a document are aggregated by a mechanism known as the Warwick Framework. Proposed at the Second Metadata Workshop, held in Warwick, UK, in April 1996, it is a container architecture used to map different types of metadata sets. The Warwick Framework is not discussed further here; suffice it to say that this mapping, together with extensibility, is what allows metadata to interoperate between diverse computer systems (Lagoze, Lynch, & Daniel, 1996).

The Dublin Core elements can also be clarified with what became known as the Canberra Qualifiers, the name given to the SCHEME, TYPE, and LANG qualifiers at the 4th Dublin Core Workshop, held in April 1997 (Miller & Gill, 1997). These three qualifiers are optional, but there may be times when the creator of a document should clarify a META entry. The prospect of future search engines that let users limit searches by these qualifiers may be incentive enough for creators to learn and use them.

Users of this guide should be aware that the use of qualifiers is still experimental. Although the three types of qualifiers have been determined, the conventions for using them have not. Many documents published on the Web recommend usages for these qualifiers; the examples in this guide are selected from those resources, particularly the ones that embrace and satisfy the original need for database interoperability. MARC tags are included in the narrative in the hope that they will help professional Information Specialists understand the mapping, or crosswalk, between DC and MARC (interoperability). As new developments occur, they will be reflected in this document.

Qualifiers clarify exactly what kind of data the CONTENT value holds. For example, if the CONTENT value is an email address, the TYPE qualifier email can be appended to the META NAME value:

<META NAME="DC.creator.email" CONTENT="gvf3184@mailer.fsu.edu">

Of the three qualifiers, TYPE is the only one appended to the META NAME value. It clarifies the DC element it is attached to: in the example above, .email identifies the kind of DC.creator data provided in the CONTENT value. This extension, separated by a full stop (period), is referred to as the dot mechanism (Heery, Miller, Gill, & Beckett, 1997).

The qualifiers are intended for future search engines. A search engine could look for DLOs whose creator element carries the .email qualifier when a searcher wants that particular information, or when looking for documents written by the John Smith with a specific email address (after all, how many John Smiths do you think there are with Web pages?).
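
The dot mechanism and the John Smith search can be sketched together in Python. This is a hypothetical illustration: DCName, parse_meta_name, and find_by_creator_email are invented names, the records are assumed to be (NAME, CONTENT) pairs already pulled out of each page, and the email addresses are made-up example.edu/example.org values.

```python
from typing import NamedTuple, Optional


class DCName(NamedTuple):
    schema: str                # schema identifier, e.g. "DC"
    element: str               # element, e.g. "creator"
    qualifier: Optional[str]   # TYPE qualifier, e.g. "email", or None


def parse_meta_name(name):
    """Split a META NAME value such as "DC.creator.email" at its full stops."""
    parts = name.split(".")
    if len(parts) == 2:
        return DCName(parts[0], parts[1].lower(), None)
    if len(parts) == 3:
        return DCName(parts[0], parts[1].lower(), parts[2].lower())
    raise ValueError(f"unexpected META NAME value: {name!r}")


def find_by_creator_email(records, email):
    """Return the ids of records with a DC.creator.email equal to `email`."""
    matches = []
    for record_id, tags in records.items():
        for name, content in tags:
            parsed = parse_meta_name(name)
            if (parsed.schema.upper() == "DC" and parsed.element == "creator"
                    and parsed.qualifier == "email"
                    and content.lower() == email.lower()):
                matches.append(record_id)
                break  # one matching tag is enough for this record
    return matches


records = {
    "page1": [("DC.creator", "John Smith"),
              ("DC.creator.email", "jsmith@example.edu")],
    "page2": [("DC.creator", "John Smith"),
              ("DC.creator.email", "smithj@example.org")],
}
print(find_by_creator_email(records, "jsmith@example.edu"))  # prints ['page1']
```

Both pages name the same John Smith as creator; only the .email qualifier lets the search tell them apart.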

The companion qualifiers, SCHEME and LANG, are placed in the value of the CONTENT attribute, each enclosed in its own parentheses. SCHEME identifies the established standard from which the creator took the CONTENT value. For example, the creator may use a subject term established as a standard by the Library of Congress. The identifier for this scheme is LCSH (Library of Congress Subject Headings), so if the document were about Afro-American women, the DC element would resemble the following:

<META NAME="DC.subject" CONTENT="(SCHEME=LCSH) Afro-American Women">

Two standards are recommended for the LANG qualifier: ISO 639 and NISO Z39.53. ISO 639 codes consist of two lowercase letters. NISO Z39.53 codes are three characters long, are compatible with MARC, and are recommended by the author of this guide. The LANG qualifier distinguishes the language of the Dublin Core metadata from the language of the document itself: the document may be a poem in Italian while its DC description is in English. Note also that there is a separate DC element for the language of the document. For example:

<META NAME="DC.title" CONTENT="(LANG=en) Ah, poor heart!">
<META NAME="DC.language" CONTENT="(SCHEME=Z3953) ITA">
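
Since SCHEME and LANG live inside the CONTENT value, a program reading these tags has to peel off the leading parenthesized qualifiers before it reaches the actual data. The Python sketch below does that with a regular expression; parse_content and looks_like_language_code are hypothetical names, and the scheme labels "ISO639" and "Z3953" are assumed spellings modeled on the (SCHEME=Z3953) example above. The code check is only a rough shape test, not a lookup against the real code lists.

```python
import re

# One leading "(NAME=value)" group, e.g. "(SCHEME=LCSH)" or "(LANG=en)".
QUALIFIER = re.compile(r"\s*\((SCHEME|LANG)=([^)]*)\)", re.IGNORECASE)


def parse_content(content):
    """Split a CONTENT value into its SCHEME/LANG qualifiers and the data.

    A qualifier that is absent is simply left out of the returned dict;
    whatever default is eventually assigned would apply in its place.
    """
    qualifiers = {}
    pos = 0
    while (match := QUALIFIER.match(content, pos)) is not None:
        qualifiers[match.group(1).upper()] = match.group(2).strip()
        pos = match.end()
    return qualifiers, content[pos:].strip()


def looks_like_language_code(code, scheme):
    """Rough shape check: ISO 639 = two lowercase letters, Z39.53 = three letters."""
    if scheme == "ISO639":
        return re.fullmatch(r"[a-z]{2}", code) is not None
    if scheme == "Z3953":
        return re.fullmatch(r"[A-Za-z]{3}", code) is not None
    raise ValueError(f"unknown language scheme: {scheme!r}")


print(parse_content("(SCHEME=LCSH) Afro-American Women"))
# prints ({'SCHEME': 'LCSH'}, 'Afro-American Women')
print(parse_content("(LANG=en) Ah, poor heart!"))
# prints ({'LANG': 'en'}, 'Ah, poor heart!')
```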

Each qualifier is expected to have a default value for each element. When the default applies, the creator only needs to supply the generic, unqualified element. In other words:

<META NAME="DC.subject" CONTENT="Dublin Core">

All DC elements are repeatable: if the DLO has two or more authors, each author gets a separate DC.creator element. Every element used must also have a CONTENT value. In other words, don't include META tags unless you need them and are going to fill them in.
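
Both rules in the paragraph above (repeat the element once per value, never write an empty one) fall out naturally if the tags are generated from a list of values per element. The Python sketch below is a hypothetical generator (dc_meta_tags and the sample record are invented) showing two authors becoming two DC.creator elements while a blank publisher produces no tag at all.

```python
from html import escape


def dc_meta_tags(record):
    """Write one META tag per value, repeating the element for each value.

    `record` maps a DC element name to a list of values; blank values are
    skipped so that no element is ever written without CONTENT.
    """
    lines = []
    for element, values in record.items():
        for value in values:
            if value and value.strip():
                lines.append(
                    f'<META NAME="DC.{element}" CONTENT="{escape(value.strip())}">'
                )
    return "\n".join(lines)


record = {
    "title": ["Introduction to Dublin Core"],
    "creator": ["Ann Lee", "Bo Kim"],  # two authors -> two DC.creator elements
    "publisher": [""],                  # nothing to say -> no tag at all
}
print(dc_meta_tags(record))
# prints:
# <META NAME="DC.title" CONTENT="Introduction to Dublin Core">
# <META NAME="DC.creator" CONTENT="Ann Lee">
# <META NAME="DC.creator" CONTENT="Bo Kim">
```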

Note also that the <LINK> tag and its use with Dublin Core have been omitted from this guide because the author considers it unnecessary work for document creators. This tag is also optional. For more information on this tag and its use, see the Bibliography.

Please send your questions, comments, or suggestions to gfrost@valdosta.edu
Copyright ©1997 by Guy Frost
Last Updated July 25, 1997