During the development of some plugins I've noticed that there should be some sort of a way to store persistent metadata with the wikipages, not only attachments. For example, the VotePlugin (2.1.52+) currently stores the votes it has received as attachments on the page, but really, the data should be stored in some other way.

There is already discussion on AttachmentMetaData, but this is really a bigger issue that concerns both WikiPages and WikiAttachments. It also ties in with some of the discussion on Ideas - SubPages, as the metadata could be embedded in subpages.

It must be possible to define arbitrary metadata so that plugins can save metadata as they please. However, certain constant names for JSPWiki internal use need to be defined, such as "author" and "lastModified" and "description". For those that overlap with the existing DublinCoreMetadata, it might be useful to use the existing DC identifiers.

Currently, we have the WikiPage.get/setAttribute() calls, but these unfortunately only handle non-persistent information, as there is no way to store this information on shutdown. I fear that the PageProvider interface needs to be amended with

    void setPageInfo( WikiPage page )

which would then save all of the PERSISTENT attributes. The WikiPage class needs to be amended with

    void setAttribute( String key, String value, int persistence )

where persistence is either TRANSIENT or PERSISTENT.
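
A minimal sketch of how the proposed persistence flag could behave. `PageAttributesSketch` is a hypothetical stand-in, not the real WikiPage class, and `persistentAttributes()` is an assumed helper showing what a provider's setPageInfo( page ) might write out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for WikiPage; only the attribute handling is modelled.
class PageAttributesSketch {
    static final int TRANSIENT = 0;
    static final int PERSISTENT = 1;

    private final Map<String, String> values = new LinkedHashMap<>();
    private final Map<String, Integer> levels = new LinkedHashMap<>();

    // The proposed three-argument setter: the value is kept together with its persistence level.
    void setAttribute(String key, String value, int persistence) {
        values.put(key, value);
        levels.put(key, persistence);
    }

    String getAttribute(String key) {
        return values.get(key);
    }

    // Assumed helper: the subset of attributes a provider would have to store.
    Map<String, String> persistentAttributes() {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (levels.get(entry.getKey()) == PERSISTENT) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PageAttributesSketch page = new PageAttributesSketch();
        page.setAttribute("author", "JanneJalkanen", PERSISTENT);
        page.setAttribute("renderCache", "temporary", TRANSIENT);
        System.out.println(page.persistentAttributes());
    }
}
```

Transient attributes stay readable through getAttribute() but are simply left out of what gets saved.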

Unfortunately, breaking the PageProvider interface means that we don't want to do this now; we want to wait until 3.0.

Question: is it better to store Serializable Objects instead of Strings?

-- JanneJalkanen, 18-Jul-2003

Please don't. Serializable Objects will only work (with less effort) with the same JVM or so. Changing the JVM will make the serialized object unreadable. Maybe there are solutions that avoid this problem, I don't know. Maybe Tomcat has a coded solution for storing session data to make it serializable?

-- Guido, 19-Jan-2004

You might want to take a look at XMLEncoder and XMLDecoder in the java.beans package of the Java API. Or if you're just interested in storing strings as meta-data but still want to be able to use line feeds and such, you might wanna check out the loadFromXML and storeToXML functions of java.util.Properties.

-- ErikAlm, 08-Jun-2005
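
A small sketch of the java.util.Properties route mentioned above, assuming the metadata is plain strings. The class name `XmlPropertiesSketch` and the in-memory byte array are illustrative only; a real provider would write to its own storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

class XmlPropertiesSketch {
    // Serializes string metadata to the XML properties format.
    static byte[] save(Properties metadata) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            metadata.storeToXML(out, "page metadata", "UTF-8");
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the metadata back from the XML produced by save().
    static Properties load(byte[] xml) {
        try {
            Properties metadata = new Properties();
            metadata.loadFromXML(new ByteArrayInputStream(xml));
            return metadata;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties metadata = new Properties();
        metadata.setProperty("description", "First line\nSecond line");
        Properties restored = load(save(metadata));
        System.out.println(restored.getProperty("description"));
    }
}
```

Unlike Java serialization, the stored form is readable text and does not depend on class versions.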

Guido has a point. Let me add something to the need for metadata: I would immediately rewrite my PageProvider to store and read access permissions from the metadata instead of a plugin-like syntax on the page. (I'd also add a small JSP page to edit these permissions, available perhaps in the page footer or the "more information" view.)

-- ebu

This sounds good to me. I'm using access permissions right now from the current alpha implementation and created some classes to get connected to our user db. I find the permission stuff not that simple. In addition I needed information about translated pages, because we use two languages in our intranet. At the moment we have defined two attribute names, and the JSP checks for their presence. Question: how do you think the metadata attribute names should be configured? Via jspwiki.properties perhaps?

-- Guido, 21-Jan-2004


Discussion

Wouter (2003-09-25) - See my rant at http://nukleos.editthispage.com/2003/09/16. You'd need an extended interface, indeed - that may well complicate matters. BTW, Categories are Metadata, but/and they can be the "header" for Subpages as well. So you might get them up and running through this same mechanism as well...

There already is one, has been for a year now... See WikiRPCInterface. I *am* happy with the MetaWeblogAPI, but I don't think it is suitable for Wikis as such.

-- JanneJalkanen

Wouter (2003-10-02) - Hmm, what is missing is something like a getPageMetadata( String pagename), returning an XML-formatted string or an array of key-value pairs - and a setter as well. 't Would be easier if one could adapt the existing methods to emit/accept XML; that way, you could get the page AND the metadata (including author etc) in a single call. But of course, that's wishful thinking: a lot of existing apps would crash right away! BTW, I was thinking of new Java interface classes, not XMLRPC...


I've put something in Ideas : 19.10.2003 -- insert header (javascript and meta) It needs discussing ... just an idea . FrancoisParlant (2003-10-21)


Hey, just in case anyone is interested in this, I thought I'd post it. It's relevant to metadata, but it's also relevant to searching. Maybe you'll find it interesting, maybe not....

Okay, here's what I want to do, and my Google-Fu is too weak to find out how...

I am writing a Wiki, which will form my database. I want a search engine tool which will crawl my wiki, recognise four or five database fields as well as the wiki name of the particular page, and build a searchable index for each page.

For example the page might be

Apology :

LibTitle : Apology
LibAuthor : Plato
LibTopics : Philosophy Greece Ancient

And I'd like to provide a search interface for searching Wikipages based on those optional keywords, or fulltext, and coming back with the results.

I'd also like to (if I could) provide some kind of page clustering based on similar searches - i.e. not just come back with ranking hits, but by some kind of page association mechanism.

Any ideas?


NascifAbousalhNeto (3/21/2005) How about using the Emacs approach to store local variables in page? Granted this would limit the metadata to name-value pairs, but it has the major advantage that it would not break any of the existing APIs to the file providers. A small change in the markup language processor would do.

The idea would go basically like this: as a page is loaded, its header is scanned for lines using a special notation, say:

# name1: value1
# name2: value2

<contents>

Those lines would be parsed and the name-value pairs stored as metadata in the page. When the page is saved, the values would be updated by the wiki engine (just in case they were somehow modified, say by a plugin).
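
A rough sketch of the header scan described above. `HeaderMetadataSketch` and `parseHeader` are hypothetical names, and the sketch assumes "\n" line endings and that the header ends at the first line that is not of the form "# name: value".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderMetadataSketch {
    // Collects leading "# name: value" lines and stops at the first line that does not match.
    static Map<String, String> parseHeader(String pageText) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : pageText.split("\n")) {
            if (!line.startsWith("# ")) {
                break;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                break;
            }
            String name = line.substring(2, colon).trim();
            if (name.isEmpty()) {
                break;
            }
            metadata.put(name, line.substring(colon + 1).trim());
        }
        return metadata;
    }

    public static void main(String[] args) {
        String page = "# name1: value1\n# name2: value2\n\n<contents>";
        System.out.println(parseHeader(page));
    }
}
```

Only the top of the page is read, so the rest of the text never has to be parsed to get at the metadata.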

The Edit page would have to be changed to not show the "metadata" lines, and perhaps to add a small form-oriented editor to manipulate the page metadata attributes.

Perhaps a PageFilter could be used to implement that feature... I am going over the API and it looks promising! :-)

ErikAlm, 09-Jun-2005: I like this idea since it furthers the email metaphor.

However, I fear it may also confuse things. For instance, what happens if a user puts anything but name-value pairs before the first blank line? Even if we only accept header lines that start with a hash mark (#), we can get into problems if a user decides to start their page with a numbered list.

How about using something completely other than the # to mark a name-value pair? Maybe $name: value or @name: value? We could even do $name=value. This way you could do:

<some contents>
$name1: value1
<some more contents>
$name2: value2
All lines starting with "$" or whatever character, would then be removed.
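
A sketch of this variant, where marked lines may appear anywhere and are removed from the text. The class and method names are hypothetical, "\n" line endings are assumed, and both the "$name: value" and "$name=value" spellings are accepted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarLineSketch {
    // Matches a whole line such as "$name: value" or "$name=value".
    private static final Pattern PAIR = Pattern.compile("^\\$(\\w+)\\s*[:=]\\s*(.*)$");

    // Stores matching lines in 'metadata' and returns the page text without them.
    static String extract(String pageText, Map<String, String> metadata) {
        List<String> kept = new ArrayList<>();
        for (String line : pageText.split("\n", -1)) {
            Matcher m = PAIR.matcher(line);
            if (m.matches()) {
                metadata.put(m.group(1), m.group(2).trim());
            } else {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String body = extract("<some contents>\n$name1: value1\n<some more contents>\n$name2: value2", metadata);
        System.out.println(metadata);
        System.out.println(body);
    }
}
```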


Another idea would be to save the page metadata as a separate page by using SubPages: Each page would have a corresponding "WikiPage/_metadata" -page. However, this causes some problems with versioning and migration...

-- JanneJalkanen, 12-Jun-2005


I'm looking at various ways of harvesting metadata from wiki pages, such that it could be used for "meta categorization" and creating navigation layers, likely using XML Topic Maps. There's an existing metadata standard that was developed by the library community and is highly advocated for use in web pages, called Dublin Core (DC). The Dublin Core Metadata Element Set (DCMES) is a set of about a dozen elements for common metadata, like author, title, etc. and is used in WorldCat, an international library project that now encompasses over 58 million metadata records in over 9000 libraries. IOW, you can't really go wrong using Dublin Core metadata.

There's even an existing way of encoding DC metadata in HTML <meta> elements, as described in a document Expressing Dublin Core in HTML/XHTML meta and link elements. This basically means you first declare the DC namespace using a <link> element, then in one or more <meta> elements you can embed a DC identifier in the 'name' attribute and the metadata in its 'content' attribute. For example:


  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" lang="en" content="Electric Forest Blog" />
  <meta name="DC.creator" content="Murray Altheim" />
  <meta name="DC.subject" lang="en" content="1. Books. 2. eBooks. 3. Digital Libraries. 4. Topic Maps. 5. Metadata. 6. Information Organization." />
  <meta name="DC.description" lang="en" content="A group blog devoted to discussion of eBooks, digital libraries, and software tools for organizing our thoughts." />
  <meta name="DC.publisher" content="Murray Altheim" />
  <meta name="DC.contributor" content="Murray Altheim, Patrick Durusau, Lee Iverson, Alexander Johannesen, Jack Park, Gary Richmond, Roger Sperberg, Conal Tuohy, Bernard Vatant" />
  <meta name="DC.date" content="2005-05-30" />
  <meta name="DC.type" content="Text" />
  <meta name="DC.format" content="text/html; charset=ISO-8859-1" />
  <meta name="DC.format" content="57486 bytes" />
  <meta name="DC.identifier" content="http://www.altheim.com/ef/" />

Without getting into too much more detail, it's also possible to further characterize DCMES elements with qualifiers and scheme identifiers (e.g., stating what format some data is in), and for this there are additional sets of DC terms, such that it's a relatively rich way of characterizing resources. As a further example (noting that I'm declaring a second namespace, "DCTERMS"):


  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2005-05-30" />
  <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
  <meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.altheim.com/ef/" />

Now, this doesn't compare with library MARC records (which have hundreds of fields), but it is certainly suitable for web documents. I'm thinking that if JSPWiki's syntax had a way of either marking off the boundaries for a set of metadata terms, or a way of creating name-value pairs using the "DC" and "DCTERMS" namespaces, it might prove relatively easy to use. And any content so declared could be converted into <meta> elements and embedded in the generated HTML/XHTML documents, which would allow it to be harvested by standardized tools.
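
As a sketch of the conversion step only (the wiki syntax itself is left open here), name-value pairs keyed by Dublin Core element name could be turned into <meta> elements like this. `DublinCoreMetaSketch` and its methods are hypothetical, and the map is assumed to hold unqualified DCMES element names such as "title" or "creator".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DublinCoreMetaSketch {
    // Escapes characters that are not allowed inside an HTML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Emits the DC schema declaration followed by one <meta> element per pair.
    static String toMetaElements(Map<String, String> dc) {
        StringBuilder html = new StringBuilder();
        html.append("<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" />\n");
        for (Map.Entry<String, String> entry : dc.entrySet()) {
            html.append("<meta name=\"DC.").append(escape(entry.getKey()))
                .append("\" content=\"").append(escape(entry.getValue()))
                .append("\" />\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dc = new LinkedHashMap<>();
        dc.put("title", "Electric Forest Blog");
        dc.put("creator", "Murray Altheim");
        System.out.print(toMetaElements(dc));
    }
}
```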

I'll be working on something like this for my own purposes but am happy to coordinate with others in the JSPWiki code. I have some ideas for possible syntax, but that can wait until later. I use DC metadata throughout my Ceryle project, so this would all tie in quite nicely.

-- MurrayAltheim, 2005-06-18


also posted on the mailing list:

I think it would be possible to implement a simple form of metadata persistence based on treating the first lines of a wiki page as headers. Let me explore how we could respond to "Janne Jalkanen's Five Metadata Criteria" :-)

> a) allows metadata to be hidden or visible (you don't want to see automatical edits, for example).
We could easily hide the metadata from presentation using CSS tags.

> b) does not have this "must run the entire page through TranslatorReader before getting metadata" -crap. (That's expensive)
I don't have a good answer for that - we definitely would have to rely on the principle of pre-parsing to extract the metadata. But perhaps the performance impact can be minimized:

  1. The metadata would be stored in the beginning of the document, so a simple check in the first line would indicate if the page has metadata or not. If not, the metadata filter would immediately return; if so, it would just read up to the "end-of-metadata" marker.
  2. The whole process could be optimized by being pushed into the Wiki engine, instead of having to go through the "pluggable" filter mechanism.

> c) works well for both wikipages and attachments (i.e. treats them the same way)
Here you got me. But why do we need metadata for attachments? If we can cover metadata for the content, doesn't that satisfy the "80%" criteria?

> d) would allow sub-pages and namespaces (for wikifarms)
Don't know enough about those topics to respond properly.

> e) has a relatively simple API so that people can write their own providers easily
No need for a separate API, nothing gets broken.

Elaborating a little further, a page with metadata would look like this:

%%metadata
|Profile|@BugReportMetadataProfile
|version|2.2.16-beta
|criticality|LightBug
%%

...content

The '%%' tags serve two functions: telling the metadata filter where the metadata info is, and allowing CSS to be used to hide it or render it in a different way than regular wiki tables. You can see the format is very close to the one generated by current WikiForms like SubmitBugRepor - the idea is that you can use current tools to generate tables with metadata.

Using the info would be trivial, just a simple filter (the metadata filter) that would:

  • scan the first line of a page, searching for the "%%metadata" entry;
  • if present, parse each subsequent line into a name-value pair and set them in the page's internal representation using the setAttribute() API;
  • return as soon as the final "%%" marker is found.
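
The three steps above can be sketched roughly as follows. `MetadataBlockSketch` is a hypothetical name and the result is returned as a plain map; a real filter would call setAttribute() on the page instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetadataBlockSketch {
    // Parses a leading "%%metadata" ... "%%" block of "|name|value" rows.
    static Map<String, String> parse(String pageText) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String[] lines = pageText.split("\n");
        // Step 1: only the first line decides whether the page has metadata at all.
        if (!lines[0].trim().equals("%%metadata")) {
            return metadata;
        }
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            // Step 3: stop at the closing marker without reading the rest of the page.
            if (line.equals("%%")) {
                break;
            }
            // Step 2: each table row becomes one name-value pair.
            if (line.startsWith("|")) {
                String[] cells = line.substring(1).split("\\|", 2);
                if (cells.length == 2) {
                    metadata.put(cells[0].trim(), cells[1].trim());
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        String page = "%%metadata\n|Profile|@BugReportMetadataProfile\n"
                + "|version|2.2.16-beta\n|criticality|LightBug\n%%\n\n...content";
        System.out.println(parse(page));
    }
}
```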

With the parsed metadata in memory we could build all kinds of fancy applications like advanced query plugins, etc.

Editing this data could be done in two ways:

  1. The easiest and most obvious one: just like we do today with a BugReport or Idea page. The data is visible as a table, and the user can directly modify it. The metadata would be updated with a postSave() method call.
  2. A more advanced scenario would be to create a metadata-aware edit page. We could do it in two ways:
    2.1: A data-agnostic WikiForm that just presents each metadata value as a String;
    2.2: A data-aware WikiForm that knows the best input control for each metadata entry, and could even later use Javascript to do some simple client-side input validation (like checking for valid dates, etc.)

To implement 2.2, every metadata header would always include a "profile" property, pointing to a WikiPage describing the type information for the metadata. More specifically, it would hold all the information necessary to create a WikiForm to manipulate that data - down to the FormInput and FormSelect attributes. So a specialized editor would just have to use the "profile" property to find the appropriate metadata profile page, and combine the information there with the metadata values to create a WikiForm page on-the-fly. The page contents (minus the metadata header) would go into a text area appended to the form after the metadata-related input elements.

-- NascifAbousalhNeto


In looking over the entire page so far, one thing I'd like to suggest (which may only be coming from my own requirements, I admit) is that whenever a metadata field overlaps with Dublin Core and needs a field identifier, we use the Dublin Core one. This would allow easy translation to HTML <meta> elements, using the methodology recommended by DCMI and described by me briefly above. I'll create a Dublin Core Metadata Element Set (DCMES) page on the wiki to describe these fields, which can be identified either by a short string or via URL (both are available for each, and there is even a way of extending the base set).

-- MurrayAltheim


Metadata can be formulated more or less as key-value pairs directly on the Wiki pages:

HasAuthor MKiesel

- HasAuthor should be a WikiPage on which, in turn, the URI of the corresponding DC RDF property is declared. When parsing Wiki pages, the corresponding RDF statements can be built.
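
A sketch of building such a statement, written out as an N-Triples line without any RDF library. Everything here is illustrative: the class name, the example.org page URI, and the lookup map that stands in for reading the property URI from the HasAuthor page.

```java
import java.util.Map;

class WikiRdfSketch {
    // Formats one statement with a plain literal object in N-Triples syntax.
    static String toNTriple(String pageUri, String propertyUri, String value) {
        String literal = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "<" + pageUri + "> <" + propertyUri + "> \"" + literal + "\" .";
    }

    // Turns a "PropertyPage value" line into a statement, or null if the property is unknown.
    static String parseLine(String pageUri, String line, Map<String, String> propertyUris) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            return null;
        }
        String propertyUri = propertyUris.get(parts[0]);
        return propertyUri == null ? null : toNTriple(pageUri, propertyUri, parts[1]);
    }

    public static void main(String[] args) {
        // Assumed mapping; in the proposal it would be declared on the HasAuthor wiki page.
        Map<String, String> propertyUris =
                Map.of("HasAuthor", "http://purl.org/dc/elements/1.1/creator");
        System.out.println(parseLine("http://example.org/wiki/SomePage", "HasAuthor MKiesel", propertyUris));
    }
}
```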

Of course, you can use arbitrary techniques and/or customized XML here. But after all, RDF has been invented for these things, so why not use it?

There are quite a lot of Java RDF libraries available:

Sound metadata handling probably needs quite a lot of work on JSPWiki's core code, along with changes in the Provider interfaces of course. On the other hand, this would allow interesting things along the way, for example good handling of binary data/file attachments, proper handling of renaming pages, rendering of links based on properties defined on the target page (tooltips for links!)...

-- Kiesel, 2005-07-15


One major enhancement would be to adopt JSR-170 as the main repository metaphor. It would allow pages, attachments and metadata for each to be cleanly stored, would make it nice and easy to do queries (with XPath), and also allow a nice upgrade path to running multiple wikis in one engine. However, it's a bit problematic to make the transition graceful - the JSR-170 Repository metaphor is somewhat more complicated than ours, and at least for now Jackrabbit (the Jakarta implementation of JSR-170) would require quite a lot of tweaking to interface neatly with JSPWiki.

But for the future, I think JSR-170 is really the way to go. XPath is cool.

-- JanneJalkanen, 16-Jul-2005


How about using RDF or OWL and semantic web concepts?

--AnonymousCoward, 21-May-2006


You'd want to think about Semantic Web extensions. See e.g. Semantic MediaWiki.

--Hans Oesterholt, 26-Sep-2006


My installations work with v2.6.x and I solved my metadata requirements with the OpenRDF Sesame server. I use plain Dublin Core metadata expressed in XHTML as well as Dublin Core RDF wiki-page attachments. The RDF attachments are exposed via a definite URL rendered dynamically by the ViewTemplate.jsp template. Metadata are kept in sync with the repository by a Groovy script. This script stores the RDF files via the openrdf API in the triplestore. Sesame comes with a SPARQL web GUI (aka workbench) that is quite easy to use for SPARQL newbies.

Looking forward to future releases of JSPWiki, I would argue for keeping metadata closely attached to the non-metadata (content) page, as there should be no extra effort to keep the metadata attached to its target. Other combinations tend to fail in everyday operations. Second, I would like to see an extra tab for editing metadata within JSPWiki, maybe plaintext or form-based.

Please do not introduce non-standard metadata conventions. Dublin Core works well and further requirements can easily be covered with PRISM or something like it.

My 2 ct. --GregorWillemsen, 22-Sep-2008

« This page (revision-41) was last changed on 23-Sep-2008 00:27 by 77.25.228.163