[This is a work page for a Wiki Metadata API specification and should be considered only as ideas at this stage. -- MurrayAltheim, 20 June 2007]


Dublin Core Metadata Terms for JSPWiki#

This is based on the Dublin Core Metadata Initiative schema, probably the most popular metadata standard on the Web.

A start at breaking this down property by property:


DC.title#

The title of the wiki page.

property http://purl.org/dc/elements/1.1/title
example One Minute Wiki

There should probably be a provision for an ISO language code to permit multiple language expressions of the wiki page title.
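
One way to picture such a provision is a title property keyed by language code. The following is only a minimal sketch: the class name LocalizedTitles, its methods, and the fallback rule (the first title stored acts as the default) are assumptions for illustration and not part of any JSPWiki API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for DC.title values keyed by language tag. */
final class LocalizedTitles {
    // Insertion order is kept so the first title stored can act as the default.
    private final Map<String, String> byLanguage = new LinkedHashMap<>();

    /** Stores a title under a normalized language tag such as "en" or "fi". */
    LocalizedTitles put(String languageTag, String title) {
        byLanguage.put(normalize(languageTag), title);
        return this;
    }

    /** Returns the title for the tag, falling back to the first title stored. */
    Optional<String> get(String languageTag) {
        String exact = byLanguage.get(normalize(languageTag));
        if (exact != null) {
            return Optional.of(exact);
        }
        return byLanguage.values().stream().findFirst();
    }

    // Assumption: tags are compared after normalization, so "EN" and "en" match.
    private static String normalize(String languageTag) {
        return Locale.forLanguageTag(languageTag).toLanguageTag();
    }
}
```

With a page titled "One Minute Wiki" stored under "en", a lookup for a language that has no title of its own would return the English title under this fallback rule.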


DC.format#

property http://purl.org/dc/elements/1.1/format
scheme http://purl.org/dc/terms/IMT
fixed value application/x-wiki+jspwiki

In the DCMI schema, the type property would be expressed as Format, identified by the URI http://purl.org/dc/elements/1.1/format. When expressed in a syntax that permits it, one should add "IMT" (Internet Media Type) as the scheme using the DCMI Term http://purl.org/dc/terms/IMT.

The IMT (or MIME) type for wikis has been a bit of a holy grail for a long time. I'd not recommend putting 'jspwiki' at the immediate token following the 'x-' as it's really a flavor. In reading over RFC 2046 I'm rather torn between text/ and application/ given wiki text is generally human-readable, but what tends to kick it over to the application/ side of things is the fact that text/ requires a CRLF as EOL delimiter, and we plainly don't enforce that (nor do I think that CRLF is a reasonable line delimiter in 2007, but that's neither here nor there).

So what I'd recommend for the IMT for JSPWiki would be application/x-wiki+jspwiki, where the content following the + sign is the syntax identifier. It's not great (I think it'd ideally be a URL since I'm sure IANA doesn't want to register wiki syntax names) but it's okay.
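
To make the "syntax identifier after the + sign" idea concrete, here is a small sketch that pulls the identifier out of such a media type. The class WikiMediaType and its method are hypothetical names, and the decision to compare case-insensitively is an assumption of the sketch.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper that splits the proposed wiki media type. */
final class WikiMediaType {
    /** The fixed value proposed above for JSPWiki pages. */
    static final String JSPWIKI = "application/x-wiki+jspwiki";

    private static final String PREFIX = "application/x-wiki+";

    /** Returns the syntax identifier after the '+', or empty if this is not a wiki type. */
    static Optional<String> syntaxOf(String mediaType) {
        if (mediaType == null) {
            return Optional.empty();
        }
        // Assumption: type names are matched without regard to case.
        String normalized = mediaType.trim().toLowerCase(Locale.ROOT);
        if (!normalized.startsWith(PREFIX) || normalized.length() == PREFIX.length()) {
            return Optional.empty();
        }
        return Optional.of(normalized.substring(PREFIX.length()));
    }
}
```

For WikiMediaType.JSPWIKI this yields "jspwiki"; a type such as text/plain yields an empty result.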

We might want a way to express the difference between a wiki page and an attachment; not sure where I'd do that.


DCTERMS.created#

The initial creation date of the wiki page.

property http://purl.org/dc/terms/created
example 2007-03-22T09:44:52

DCTERMS.modified#

The date of the last modification of the wiki page (i.e., its most recent revision).

property http://purl.org/dc/terms/modified
example 2007-05-14T12:31:02
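
The two example values carry no time zone designator, so in Java they can be read as local date-times. The sketch below assumes that reading and checks that a page was not modified before it was created; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;

/** Hypothetical check on the DCTERMS.created and DCTERMS.modified values of a page. */
final class PageDates {
    /**
     * Parses both values as ISO-8601 local date-times (no zone offset, as in the
     * examples above) and reports whether modified is at or after created.
     */
    static boolean isConsistent(String created, String modified) {
        LocalDateTime createdAt = LocalDateTime.parse(created);
        LocalDateTime modifiedAt = LocalDateTime.parse(modified);
        return !modifiedAt.isBefore(createdAt);
    }
}
```

A value that does not match the expected pattern makes LocalDateTime.parse throw a DateTimeParseException, which a real implementation would need to handle.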

DC.creator#

The initial creator of the wiki page (its first revision).

property http://purl.org/dc/elements/1.1/creator
example JanneJalkanen

DC.contributor#

The editor of the most recent revision of the wiki page.

property http://purl.org/dc/elements/1.1/contributor
example MurrayAltheim

DC.identifier#

The URI identifier of the page (its locator and canonical identifier).

In previous versions of DCMES it was possible to use qualifiers on DCMES elements, so that one could express a subtype. I used to use this to express revision numbers (using the identifier http://purl.org/dc/elements/1.1/identifier.version) but I'm not sure with the current Terms how this is done appropriately. Will have to look into how to express revision numbers since it's clearly a requirement.

property http://purl.org/dc/elements/1.1/identifier
example http://www.jspwiki.org/wiki/OneMinuteWiki
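
Taken together, the properties above form a small record per page. As a sketch of how such a record could be exposed in the HTML meta and link element convention published by DCMI, the following renders a map of property names to values. The class DcMetaWriter is a hypothetical name, and the sketch leaves out scheme and language attributes.

```java
import java.util.Map;

/** Hypothetical renderer for a Dublin Core record as HTML meta elements. */
final class DcMetaWriter {
    /** Escapes the characters that matter inside a double-quoted attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders names such as "DC.title" or "DCTERMS.created" in map iteration order. */
    static String toMeta(Map<String, String> record) {
        StringBuilder out = new StringBuilder();
        // Declare the prefixes used in the meta names.
        out.append("<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" />\n");
        out.append("<link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\" />\n");
        for (Map.Entry<String, String> e : record.entrySet()) {
            out.append("<meta name=\"").append(escape(e.getKey()))
               .append("\" content=\"").append(escape(e.getValue()))
               .append("\" />\n");
        }
        return out.toString();
    }
}
```

Passing a LinkedHashMap keeps the elements in the order the properties were added.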


Discussion#

Quoting Janne Jalkanen <janne.jalkanen@iki.fi>:
[...]
It needs a proper metadata API... I don't particularly want to
introduce anything new that would break anyway soon.

However, there are quite a few people over on this list who *are*
interested in proper metadata stuff. I'd recommend that you kick off
a task force to write up a requirements list (this is the same way as
auth got implemented: Andrew wrote a really good requirements list,
and I gave him direct CVS access to write the code, too ;-)

Murray or someone else, if you're willing to lead this task force,
that'd be great. Or, if someone wants to start to maintain 2.4. and
make it a stable, I can lead the effort, too... ;-)

Hi Janne,

Certainly I'm willing to lead this if I'm not stepping on anyone's toes doing so. As I mentioned to you privately, I'll be adding a metadata functionality to my implementation regardless of whether it ends up back in the JSPWiki codebase, but I'm happy to share it as well.

My requirements are fairly straightforward:

  1. be able to store basic Dublin Core metadata of the type commonly stored in HTML/XHTML <meta> elements as per the DCMI specification ''Expressing Dublin Core in HTML/XHTML meta and link elements''. This includes both DC and qualified DC using standardized identifiers.
  2. The metadata schema should be independent of the WikiPageProvider, in that it should be easy to implement for providers that can "natively" store the metadata along with the page, but there is to be no requirement on how the binding or storage between the page and the metadata is handled on a per-provider basis (i.e., we leave the details up to the developer of each provider)
  3. While the basis of the metadata should be the DC schema (since that is the most common worldwide standard and is suitable for what a wiki page might need), the metadata provider should be extensible so that other schemas or appropriately-namespaced metadata properties can be stored using either simple name-value pairs, or within an XML element containing the metadata, with appropriate namespace labelling.
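
Requirements 2 and 3 can be pictured as a small provider-independent contract keyed by property URI. This is only a sketch under stated assumptions: MetadataStore and InMemoryMetadataStore are hypothetical names, not existing JSPWiki interfaces, and a real provider would bind the metadata to its own storage.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical provider-independent contract; not an existing JSPWiki interface. */
interface MetadataStore {
    /** Stores one value under a property URI (Dublin Core or any other namespace). */
    void put(String pageName, String propertyUri, String value);

    /** Returns a read-only view of all properties of the page, possibly empty. */
    Map<String, String> get(String pageName);
}

/** Simplest possible binding: metadata kept in memory, apart from the page text. */
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, Map<String, String>> byPage = new HashMap<>();

    @Override
    public void put(String pageName, String propertyUri, String value) {
        byPage.computeIfAbsent(pageName, k -> new LinkedHashMap<>())
              .put(propertyUri, value);
    }

    @Override
    public Map<String, String> get(String pageName) {
        Map<String, String> found = byPage.getOrDefault(pageName, Collections.emptyMap());
        return Collections.unmodifiableMap(found);
    }
}
```

Because the key is a full property URI, properties from other schemas can sit beside the Dublin Core ones without any change to the contract.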

Any other requirements people have, please send them into the thread and I'll collect them together onto a wiki page.

I have a basic implementation that does #1 and #2, and my needs for #3 are not very high right now, so I'd likely not spend much time on it, though I'd be very happy to have input on it once I get going on writing this up.

As Janne knows, I've almost finished a new XNodeProvider, which is a WikiPageProvider implementation based on the XNode API I developed for my Ceryle project, basically an XML backend that has per-document metadata. I'll be writing up both the spec for XNode, publishing the javadoc API, and releasing an open source implementation within the next month, sooner if I can get help from someone on posting it to SourceForge.

Co-workers on the metadata API are most welcome.

Thanks,

MurrayAltheim


A few key parameters:

I'm not really worried at this stage that much about implementation, or even API, but what is clearly needed is a design of the repository structure. Things like:

"All page content shall be stored as wiki:text -properties under the respective Node"

"A Node shall represent a wikipage or an attachment"

"The xxx:type property shall define the MIME-type of the object. Wikipages shall be stored as application/x-jspwiki".

"The path to a wikipage consists of WikiFarm name, then the direct path name. E.g. /MyFarm/MyPage. There shall always be one Farm, called "Main"."

You know, that sort of stuff. That's what's critical at this stage...

--JanneJalkanen, 19-Jun-2007
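
The path rule quoted above could be sketched as follows. The class RepositoryPaths is a hypothetical name, and falling back to the "Main" farm when no farm is given is an assumption of the sketch rather than something the rule states.

```java
/** Hypothetical helper for the "/Farm/Page" path rule. */
final class RepositoryPaths {
    /** The farm that shall always exist. */
    static final String DEFAULT_FARM = "Main";

    /** Builds "/farm/pageName"; a null or empty farm falls back to "Main" (assumption). */
    static String pagePath(String farm, String pageName) {
        if (pageName == null || pageName.isEmpty()) {
            throw new IllegalArgumentException("page name must not be empty");
        }
        String effectiveFarm = (farm == null || farm.isEmpty()) ? DEFAULT_FARM : farm;
        return "/" + effectiveFarm + "/" + pageName;
    }
}
```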


A couple of comments: The title of the page is coupled to its Node name, so that's probably not needed. Also, JSR-170 already defines UUIDs (as "jcr:uuid"), so dc.identifier is not needed either (though it can be exposed also as dc.identifier, if necessary). There are some other things that JSR-170 already provides, such as the "jcr:created" property.

Also, a small technicality, JSR-170 uses colons instead of periods, so it's "dc:contributor", "dc:created".

--JanneJalkanen, 19-Jun-2007


Also, JSR-170 defines a versioning API (using nt:versionHistory and nt:version types), so you don't have to worry about expressing version numbers; those come free with the API. They are also available through /jcr:system/jcr:versionHistory, so they already have a place in the Node tree.

--JanneJalkanen, 19-Jun-2007


Yes, understood and agreed. A page title is simply that, a title. There can be many different titles for a given page, in different languages, singular and plural stems, etc. Given this is a wiki, there should probably be some way of eliminating name collisions: with a tight coupling between title and name this is impossible, but it becomes possible once we break that coupling. There will still need to be some tight coupling with a canonical name for a page unless we're going to break a lot of existing wiki paradigms. I think that would also lead to a lot of user confusion.

The properties listed above are going to be properties of any page in a repository, i.e., there will always be at least one title, one identifier, etc., though of course some things are optional. In the "API" we'd specify, for a given page record, which are required and which are optional.

So when you say something like an identifier is "not needed", it's not so much that it's not needed as that there's likely an isomorphism with an existing Dublin Core property. For example, if we have a "jcr:uuid" value, that's likely isomorphic with DC.identifier.

As to colons versus dots, that'll depend on the expression syntax. Dublin Core is pretty explicit about the different ways a metadata record can be expressed, and I'd simply argue that for whatever way we're expressing that metadata we not break any rules. There's to my knowledge no place where a colon is used as the delimiter between the "DC" prefix and the property name, except when we're talking about XML syntax, such as <dc:identifier>. But we don't really need to talk about that level of detail; all that stuff is already in the specs.

While there may be a way of expressing version numbers in JSR-170, if we're expressing version numbers in DC there isn't a standard way, though the UK has an extension that I'm using, which is as close to a standard as we currently have. (Remember, I'm talking explicitly about metadata as expressed in Dublin Core; if within whatever design API we dream up we drift from DC I'd like to hear a good argument, given that DC is the way of marking up metadata in Web pages, with no significant alternative.) If on the inside of the engine something is marked up as something else (say, something from JSR-170), we'll still need to expose it as DC.

This stuff has all been worked out and is in extremely wide use, see Expressing Dublin Core in HTML/XHTML meta and link elements for details.

-- MurrayAltheim, 20 June 2007
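
The two points in this exchange, the colon versus dot notation and exposing internal properties as Dublin Core, can be illustrated with a small name-mapping sketch. The class DcNames is hypothetical, and the two table entries (jcr:uuid to DC.identifier, jcr:created to DCTERMS.created) are assumptions drawn from the discussion above, not an agreed mapping.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical mapping from repository property names to Dublin Core meta names. */
final class DcNames {
    // Assumed correspondences taken from the discussion; not a settled mapping.
    private static final Map<String, String> JCR_TO_DC = new HashMap<>();
    static {
        JCR_TO_DC.put("jcr:uuid", "DC.identifier");
        JCR_TO_DC.put("jcr:created", "DCTERMS.created");
    }

    /**
     * Converts a colon-delimited name such as "dc:contributor" to the dotted
     * form "DC.contributor", applying the table above for known JCR properties.
     */
    static String toMetaName(String qualifiedName) {
        String mapped = JCR_TO_DC.get(qualifiedName);
        if (mapped != null) {
            return mapped;
        }
        int colon = qualifiedName.indexOf(':');
        if (colon <= 0 || colon == qualifiedName.length() - 1) {
            throw new IllegalArgumentException("not a prefixed name: " + qualifiedName);
        }
        String prefix = qualifiedName.substring(0, colon).toUpperCase(Locale.ROOT);
        return prefix + "." + qualifiedName.substring(colon + 1);
    }
}
```

The internal name stays whatever the repository uses; only the exposed name changes with the expression syntax.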


I'm not too hot on doing localization at page level - I can see it resulting in more trouble than what it's worth.

There's one minor error in your thinking, and it's that you're thinking about metadata of pages. This is slightly incorrect. JSR-170 is more generic, as it exposes everything as properties of nodes. While in most cases there is not much difference, granted, this means that technically speaking, the page content itself is metadata of the node, and so is everything else, such as the author, etc.

The JSR-170 notation actually comes from XML, so therefore dc:contributor is correct in our sense :-).

But before we go any deeper into this, and before jumping into the intricacies of Dublin Core, we need to enumerate the properties that we need and decide how the repository (or, if you will, the DOM) should be built. Once the properties are enumerated and analyzed, we can then figure out if we have anything in Dublin Core that we can use. For example, RFC 4287 might also be a useful source of syntax.

Let's not decide on implementation before requirements.

--JanneJalkanen, 20-Jun-2007

Janne, I think you're misinterpreting me on a couple of counts. I'm not talking about doing localization at a page level, I'm talking about designing a metadata API that permits multiple languages for any given metadata field. DC does that abstractly, and concretely in some of its syntaxes. And no, I'm not thinking about the metadata of pages, but of nodes (basically, the metadata needs to be applicable as you suggest at any level; no issue there, things need to permit recursivity in the design as well as syntactically), and I'll stress that the colon at this point is not necessarily what we'd use since we haven't figured out (a) whether we're talking about abstract or concrete syntax or (b) whether the property names will show up in the implementation as XML element type names or as attribute values; in the latter case, no, we'd not see colons. In the former, only if we use XML Namespaces (where 'dc' is the namespace prefix). But as you say, let's not talk about that kind of detail until we've figured out the requirements. I'm only enumerating the Dublin Core properties in the abstract sense at this point. How they get referred to will depend on implementation.

-- MurrayAltheim, 21-Jun-2007

This page (revision-12) was last changed on 21-Jun-2007 00:39 by MurrayAltheim