
SGML


What is the Standard Generalized Markup Language (SGML)? The Library of Congress defines it as follows: "SGML is a set of rules for defining and expressing the logical structure of documents thereby enabling software products to control the searching, retrieval, and structured display of those documents."

SGML is a metalanguage, a parent language from which other markup languages are derived; XML is its simplified derivative. Markup languages such as SGML and XML are used to encode a body of material, giving it specific machine-readable meaning. The distinguishing characteristic of SGML is that it declares a document's type by means of a document type definition (DTD), which specifies the elements a document of that type may contain and how they may be arranged. Documents thus have a form and characteristics that define their type. HTML is an SGML-based language: it is defined by an SGML DTD.
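To make the idea of a document type concrete, the sketch below checks whether a small document has the form its type requires. It is only a toy stand-in for DTD validation: the "memo" type, the MEMO_TYPE table, and the conforms function are hypothetical names invented for this illustration, the content model is transcribed by hand rather than read from a real DTD, and XML syntax is used because the Python standard library has no SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical "memo" document type, transcribed by hand from a DTD such as:
#   <!ELEMENT memo (to, from, body)>
#   <!ELEMENT to   (#PCDATA)>
#   <!ELEMENT from (#PCDATA)>
#   <!ELEMENT body (#PCDATA)>
# Each element name maps to the exact sequence of child elements it must contain.
MEMO_TYPE = {
    "memo": ["to", "from", "body"],
    "to": [],
    "from": [],
    "body": [],
}


def conforms(element, doc_type):
    """Return True if the element and all its descendants match the declared form."""
    expected = doc_type.get(element.tag)
    if expected is None:
        return False  # the element is not declared for this document type
    if [child.tag for child in element] != expected:
        return False  # children are missing, extra, or out of order
    return all(conforms(child, doc_type) for child in element)


good = ET.fromstring(
    "<memo><to>Staff</to><from>Editor</from><body>Meeting at noon.</body></memo>"
)
bad = ET.fromstring("<memo><body>No addressee.</body></memo>")

print(conforms(good, MEMO_TYPE))  # True
print(conforms(bad, MEMO_TYPE))   # False
```

A real DTD can also express optional and repeated elements, attributes, and entities; the fixed sequences used here are the simplest case.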

Because SGML describes documents through DTDs, it can be regarded as an electronic variant of diplomatics. Diplomatics is, according to Duranti, the bibliographic art of comprehending the purpose of a document by examining its form or format. Legal documents, for example, have a characteristic appearance or "feel." The National Enquirer and the New York Times each have their own form. A great deal can be learned about a document's content, and about the quality of that content, at a glance and without extensive perusal. A DTD is an electronic definition of such a form.

An SGML marked-up document can be used in a number of ways, assuming one has the appropriate formatter or editor. SGML can be used to navigate within documents and to modify document templates for specific applications, and its terms and entities can be used to classify document types and content.
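As a rough illustration of navigation and classification, the sketch below builds a table of contents from a marked-up document and reads its type from the root element. The "report" markup and its element and attribute names are invented for this example, and XML syntax with Python's standard xml.etree.ElementTree module stands in for an SGML-aware tool.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical marked-up report.
report = ET.fromstring(
    "<report type='technical'>"
    "<title>Indexing the Web</title>"
    "<section id='s1'><head>Markup</head><para>Tags carry meaning.</para></section>"
    "<section id='s2'><head>Metadata</head><para>Data about data.</para></section>"
    "</report>"
)

# Navigate: collect each section's identifier and heading as a table of contents.
toc = [(section.get("id"), section.findtext("head"))
       for section in report.findall("section")]

# Classify: the root element name and its "type" attribute identify the kind of document.
doc_class = (report.tag, report.get("type"))

print(toc)        # [('s1', 'Markup'), ('s2', 'Metadata')]
print(doc_class)  # ('report', 'technical')
```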

SGML supports entities. Entities are defined in SGML as substitutes or shorthands for terms or concepts. An SGML-sensitive interpreter reads each entity reference and substitutes the defined string for it. Entities are, in a sense, electronic versions of jargon, abbreviations, or acronyms, except that those are interpreted mentally rather than electronically.
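Entity substitution can be seen in a short sketch. XML inherits general entities from SGML, so the example uses XML syntax and Python's standard xml.etree.ElementTree parser in place of an SGML interpreter; the entity names sgml and loc and the note element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two entities are declared in the document's internal DTD subset and then
# referenced in the text as &loc; and &sgml;.
document = """<!DOCTYPE note [
  <!ENTITY sgml "Standard Generalized Markup Language">
  <!ENTITY loc "Library of Congress">
]>
<note>The &loc; publishes guidance on &sgml;.</note>"""

# The parser replaces each entity reference with its declared string.
note = ET.fromstring(document)
print(note.text)
# The Library of Congress publishes guidance on Standard Generalized Markup Language.
```

The author writes the shorthand once in the declaration; every reader of the parsed document sees the full string.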
 

XML

Extensible Markup Language (XML) is a "simplified" subset of SGML (XML Overview). XML was developed for use on the Web alongside HTML. It is not itself part of the HTML 4.0 standard; rather, XHTML 1.0 reformulates HTML 4 as an XML application. XML is seen to have at least seven applications, which include data exchange, query formulation, authoring tools, support of commerce, and the following:

"7. Metadata Interchange
There is growing interest in the interchange of metadata (especially for databases) and in the use of metadata registries to facilitate interoperability of database design, DBMS, query, user interface, data warehousing, and report generation tools. Examples include ISO 11179 and ANSI X3.285 data registry standards, and OMG's proposed XMI standard." (W3Ca)

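The XML Schema working draft defines a rich set of datatypes for XML (W3Cb). Each datatype is defined as a 3-tuple: a value space (the set of values), a lexical space (the set of written representations of those values), and a set of facets that characterize the value space.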
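The three parts of a datatype can be sketched in a few lines. This is an illustration of the idea only, not the W3C definition: the Datatype class, the "percentage" type, and the facet names minInclusive and maxInclusive are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datatype:
    name: str
    lexical: str                       # regular expression describing the lexical space
    to_value: Callable[[str], object]  # maps a lexical representation into the value space
    facets: Dict[str, object] = field(default_factory=dict)  # constraints on the value space

    def parse(self, text):
        """Return the value denoted by text, or raise ValueError if it is not allowed."""
        if not re.fullmatch(self.lexical, text):
            raise ValueError(f"{text!r} is not in the lexical space of {self.name}")
        value = self.to_value(text)
        low = self.facets.get("minInclusive")
        high = self.facets.get("maxInclusive")
        if low is not None and value < low:
            raise ValueError(f"{value} is below minInclusive={low}")
        if high is not None and value > high:
            raise ValueError(f"{value} is above maxInclusive={high}")
        return value


# Hypothetical datatype: whole-number percentages from 0 to 100.
percentage = Datatype(
    name="percentage",
    lexical=r"[+-]?[0-9]+",
    to_value=int,
    facets={"minInclusive": 0, "maxInclusive": 100},
)

# Two different lexical representations denote the same value.
print(percentage.parse("042"))  # 42
print(percentage.parse("+42"))  # 42
```

Here "4.2" is rejected because it lies outside the lexical space, while "150" is well-formed lexically but rejected by the maxInclusive facet.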

For an interesting and somewhat critical review of XML as Web markup, and therefore as an indexing language, see Elliott Pritchard's MSc thesis at the University of Sheffield. He concludes that the standard will likely attract more corporate than personal support and that, while XML has drawbacks, its advantages outweigh them.


Readings

C. M. Sperberg-McQueen and Lou Burnard, eds. "A Gentle Introduction to SGML", ch 2 in Guidelines for Electronic Text Encoding and Interchange (TEI P3). Available: http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/index.html

L. Duranti. "Diplomatics: New uses for an old science." Archivaria 28, 1 (1989) pp 7-17.

Robin Cover, The SGML/XML Web Page: Extensible Markup Language (XML). Available: http://www.oasis-open.org/cover/xml.html#overview. Last modified 31 January 2000.

W3C(a), XML Schema Requirements, W3C Note 15 February 1999, NOTE-xml-schema-req-19990215. Available: http://www.w3.org/TR/NOTE-xml-schema-req

W3C(b), XML Schema Part 2: Datatypes, W3C Working Draft 17 December 1999. Available: http://www.w3.org/TR/1999/WD-xmlschema-2-19991217/datatypes.html

Elliott Pritchard, XML: the future of web markup? MSc Thesis in Information Management, 1998/1999. University of Sheffield. Available: http://panizzi.shef.ac.uk/elecdiss/edl0003/index.html

Resources

To generate a DTD automatically, see Fred at OCLC: Fred: The SGML Grammar Builder Project, Available: http://www.oclc.org/fred/docs/about-fred.html; Fred: Automatic DTD Creation from Sample Text, Available: http://www.oclc.org/fred/docs/create-dtd-text.html; and Automatic DTD Creation from a URL or Sample Text, Available: http://www.oclc.org/fred/docs/help/create-dtd.html