The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems. DCMI is derived from an informal working group hosted by the Online Computer Library Center (OCLC) in Dublin, Ohio, and is now an organization in its own right.

DCMI specifications have become very popular in the library community: Dublin Core metadata is used for the more than 58 million records in WorldCat, a combined catalog from OCLC's consortium of over 9,000 libraries worldwide.

The Dublin Core also provides a way of adding metadata to HTML documents by declaring a "DC" namespace using a <link> element and then declaring name-value pairs in individual <meta> elements within the document header, e.g.:


  <head>
    <title>John Steinbeck's "Cannery Row"</title>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <meta name="DC.title" lang="en" content="Cannery Row" />
    <meta name="DC.creator" content="Steinbeck, John" />
  </head>
 


DCMI Documents

DCMI has produced a number of published specifications in various stages of completion. The most important, and the basis for many others, is probably the Dublin Core Metadata Element Set (DCMES), currently (as of June 2005) at version 1.1. This is a set of metadata fields common to describing information resources, as described therein:

The Dublin Core metadata element set is a standard for cross-domain information resource description. Here an information resource is defined to be "anything that has identity". This is the definition used in Internet RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", by Tim Berners-Lee et al. There are no fundamental restrictions to the types of resources to which Dublin Core metadata can be assigned.

There are fifteen fields, or "elements", in the DCMES, which include the kinds of things you'd expect as metadata (e.g., title, author, publisher) about information resources (e.g., books, videos, etc.). The DCMES list below includes the title of each field as well as its XML "element" (though DCMES is not always used in XML). Over its lifespan, DCMES has been wrapped into a larger metadata schema developed as part of ISO 11179, so there's now an overarching DCMI schema called the DCMI Metadata Terms that includes all of DCMES but also fills in the details needed to comply with ISO 11179.

If this all sounds a bit complicated, it's only meant to convey that a great deal of thought has gone into making DCMI's metadata products both functional and compliant with all other related metadata standards. It is in reality a very simple standard and not meant to compete with existing library metadata schemas, such as MARC, which has hundreds of fields. DCMES is meant to capture the kinds of things that are useful for users to find content. This makes it ideal for most web-related metadata needs.

Using Dublin Core with HTML/XHTML

A DCMI specification, Expressing Dublin Core in HTML/XHTML meta and link elements, describes how to embed DCMES in HTML/XHTML <meta> elements. You first declare the DC namespace using a <link> element; then, in one or more <meta> elements, you put a DC identifier in the 'name' attribute and the metadata value in the 'content' attribute. For example:


  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" lang="en" content="Electric Forest Blog" />
  <meta name="DC.creator" content="Murray Altheim" />
  <meta name="DC.subject" lang="en" content="1. Books. 2. eBooks. 3. Digital Libraries. 4. Topic Maps. 5. Metadata. 6. Information Organization." />
  <meta name="DC.description" lang="en" content="A group blog devoted to discussion of eBooks, digital libraries, and software tools for organizing our thoughts." />
  <meta name="DC.publisher" content="Murray Altheim" />
  <meta name="DC.contributor" content="Murray Altheim, Patrick Durusau, Lee Iverson, Alexander Johannesen, Jack Park, Gary Richmond, Roger Sperberg, Conal Tuohy, Bernard Vatant" />
  <meta name="DC.date" content="2005-05-30" />
  <meta name="DC.type" content="Text" />
  <meta name="DC.format" content="text/html; charset=ISO-8859-1" />
  <meta name="DC.format" content="57486 bytes" />
  <meta name="DC.identifier" content="http://www.altheim.com/ef/" />

Now, this doesn't compare with library MARC records (which have hundreds of fields), but it is certainly suitable for web documents.
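
The <link>/<meta> pattern shown above is regular enough to generate from data. The following is a minimal Python sketch, not part of any DCMI specification: the function name dc_meta_tags and the list-of-pairs input format are assumptions made for illustration. It emits the schema declaration followed by one <meta> element per value, escaping the content so that quotes in a title do not break the attribute.

```python
from html import escape

# Namespace URI used for the "DC" prefix in the examples on this page.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(fields):
    """Return HTML lines: one schema <link>, then one <meta> per pair.

    `fields` is a list of (element, value) pairs rather than a dict,
    so an element such as "format" may appear more than once.
    """
    lines = ['<link rel="schema.DC" href="%s" />' % DC_NAMESPACE]
    for element, value in fields:
        # escape() with quote=True also encodes " and ' for attribute use.
        lines.append('<meta name="DC.%s" content="%s" />'
                     % (element, escape(value, quote=True)))
    return lines


if __name__ == "__main__":
    for line in dc_meta_tags([("title", "Cannery Row"),
                              ("creator", "Steinbeck, John")]):
        print(line)
```

A list of pairs is used instead of a dictionary because the example above repeats DC.format, which a dictionary keyed on element name could not represent.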

Extending DCMES

There are ways of extending DCMES through the use of qualifiers and encoding scheme identifiers. The former extend the various DCMES fields (such as Date to include Date.modified); the latter provide information about the format of the metadata content (e.g., specifying that a DC.date is expressed in ISO 8601/W3C DTF form, yyyy-mm-dd'T'hh:mm:ss, or that a field contains a URI).
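
To make the idea of a declared date format concrete, here is a small Python sketch that checks whether a string is a plausible W3C DTF date. It is an illustration under stated assumptions: the function name is_w3cdtf_date is hypothetical, and only the date-only granularities (YYYY, YYYY-MM, YYYY-MM-DD) are covered, not the forms that carry a time and time zone.

```python
import re
from datetime import date

# Date-only W3C DTF granularities: year, year-month, year-month-day.
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value):
    """Return True if value is YYYY, YYYY-MM or YYYY-MM-DD and names a real date."""
    match = _W3CDTF_DATE.fullmatch(value)
    if match is None:
        return False
    # Missing month or day defaults to 1 so the calendar check still works.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True
```

A check like this could run before writing a value into a DC.date element that declares the W3CDTF scheme, so that a value such as "30-05-2005" is caught early.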

The qualifiers were originally described in their own document, Dublin Core Qualifiers, but that document has been superseded, with its content wrapped into the DCMI Terms described earlier.

Here's an example of some DCMI qualified terms (noting that this declares a second namespace beyond "DC" called "DCTERMS"):


  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2005-05-30" />
  <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
  <meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.altheim.com/ef/" />
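
Reading such metadata back out of a page is the mirror image of writing it. The sketch below uses Python's standard html.parser module; the class name DublinCoreParser and the shape of its results are assumptions for illustration only. It records each schema prefix declared in a <link> element, then collects <meta> elements whose name uses a declared prefix, together with any scheme attribute.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect (name, scheme, content) triples from prefixed <meta> elements."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> namespace URI, from <link rel="schema.X">
        self.records = []  # (name, scheme or None, content)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        name = attrs.get("name") or ""
        if tag == "link" and rel.startswith("schema."):
            # rel="schema.DC" declares the prefix "DC".
            self.schemas[rel.split(".", 1)[1]] = attrs.get("href")
        elif tag == "meta" and "." in name:
            prefix = name.split(".", 1)[0]
            if prefix in self.schemas:
                self.records.append(
                    (name, attrs.get("scheme"), attrs.get("content")))


if __name__ == "__main__":
    parser = DublinCoreParser()
    parser.feed(
        '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />'
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />'
        '<meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2005-05-30" />'
    )
    print(parser.records)
```

Note that this sketch ignores any <meta> element whose prefix has not been declared earlier in the document, so the "DC" declaration must precede the elements that use it.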

The Dublin Core Metadata Element Set (DCMES)

Title (<Title>)
A name given to the resource.
(Comment: Typically, Title will be a name by which the resource is formally known.)
Creator (<Creator>)
An entity primarily responsible for making the content of the resource.
(Comment: Examples of Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.)
Subject and Keywords (<Subject>)
A topic of the content of the resource.
(Comment: Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.)
Description (<Description>)
An account of the content of the resource.
(Comment: Examples of Description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.)
Publisher (<Publisher>)
An entity responsible for making the resource available.
(Comment: Examples of Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.)
Contributor (<Contributor>)
An entity responsible for making contributions to the content of the resource.
(Comment: Examples of Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.)
Date (<Date>)
A date of an event in the lifecycle of the resource.
(Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 (W3CDTF) and includes (among others) dates of the form YYYY-MM-DD.)
Resource Type (<Type>)
The nature or genre of the content of the resource.
(Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary, DCT1). To describe the physical or digital manifestation of the resource, use the Format element.)
Format (<Format>)
The physical or digital manifestation of the resource.
(Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types (MIME) defining computer media formats).)
Resource Identifier (<Identifier>)
An unambiguous reference to the resource within a given context.
(Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).)
Source (<Source>)
A reference to a resource from which the present resource is derived.
(Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.)
Language (<Language>)
A language of the intellectual content of the resource.
(Comment: Recommended best practice is to use RFC 3066, which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional subtags. Examples include "en" or "eng" for English, "akk" for Akkadian, and "en-GB" for English used in the United Kingdom.)
Relation (<Relation>)
A reference to a related resource.
(Comment: Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.)
Coverage (<Coverage>)
The extent or scope of the content of the resource.
(Comment: Typically, Coverage will include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names, TGN) and to use, where appropriate, named places or time periods in preference to numeric identifiers such as sets of coordinates or date ranges.)
Rights Management (<Rights>)
Information about rights held in and over the resource.
(Comment: Typically, Rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions may be made about any rights held in or over the resource.)
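
Because the element set is a closed list of fifteen names, it is easy to check <meta> names against it. The following Python sketch is illustrative only: the function name unknown_dc_names is hypothetical, and it assumes the lowercase element names used in the HTML examples earlier on this page (DC.title, DC.creator, and so on) rather than the capitalized XML forms in the list above.

```python
# The fifteen DCMES 1.1 elements, in the order listed above.
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_dc_names(meta_names):
    """Return the "DC."-prefixed names whose element is not in DCMES.

    Qualified names such as "DC.date.modified" are judged by their
    first component only; names without the "DC" prefix are ignored.
    """
    unknown = []
    for name in meta_names:
        prefix, _, rest = name.partition(".")
        if prefix != "DC":
            continue  # not a Dublin Core name
        element = rest.split(".", 1)[0].lower()
        if element not in DCMES_ELEMENTS:
            unknown.append(name)
    return unknown
```

For instance, this would flag "DC.author", a common slip, since the DCMES element for authorship is Creator.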

« This page (revision-9) was last changed on 10-Sep-2006 20:42 by JanneJalkanen