During the development of some plugins I've noticed that there should be some sort of a way to store persistent metadata with the wikipages, not only attachments.  For example, the VotePlugin (2.1.52+) currently stores the votes it has received as attachments on the page, but really, the data should be stored in some other way.

There is already discussion on [AttachmentMetaData], but this is really a bigger issue that concerns both WikiPages and WikiAttachments.  It also ties in with some of the discussion on [Ideas - SubPages], as the metadata could be embedded in subpages.

It must be possible to define arbitrary metadata so that plugins can save metadata as they please.  However, certain constant names for JSPWiki internal use need to be defined, such as "author" and "lastModified" and "description". For those that overlap with the existing [DublinCoreMetadata], it might be useful to use the existing DC identifiers.

Currently, we have the WikiPage.get/setAttribute() calls, but these unfortunately only handle non-persistent information, as there is no way to store this information on shutdown.  I fear that the PageProvider interface needs to be amended with
    void setPageInfo( WikiPage page )
which would then save all of the PERSISTENT attributes.  The WikiPage class needs to be amended with
    void setAttribute( String key, String value, int persistence )

where {{persistence}} is either TRANSIENT or PERSISTENT.
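
As a rough sketch of how the two amendments could fit together (the class name {{SketchPage}} and the method {{persistentAttributes()}} are made up for illustration and are not part of the JSPWiki API), the page keeps a persistence level per attribute, and a provider's {{setPageInfo()}} would only need to write out the PERSISTENT ones:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for WikiPage; NOT the real JSPWiki class.
class SketchPage {
    static final int TRANSIENT = 0;   // kept in memory only, lost on shutdown
    static final int PERSISTENT = 1;  // handed to the provider for storage

    private final Map<String, String> values = new LinkedHashMap<>();
    private final Map<String, Integer> levels = new LinkedHashMap<>();

    void setAttribute(String key, String value, int persistence) {
        values.put(key, value);
        levels.put(key, persistence);
    }

    String getAttribute(String key) {
        return values.get(key);
    }

    // The subset a provider's setPageInfo( WikiPage page ) would have to save.
    Map<String, String> persistentAttributes() {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (levels.get(e.getKey()) == PERSISTENT) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

How the provider actually writes these values (a side file, a database column, a property file) is left open here.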

Unfortunately, breaking the PageProvider interface means that we don't want to do this now; we want to wait until 3.0.

''Question: is it better to store Serializable Objects instead of Strings?''

-- JanneJalkanen, 18-Jul-2003

Please don't. Serializable Objects will only work (without extra effort) with the same JVM or so. Changing it will make the serialized object unreadable. Maybe there are solutions that avoid this problem, I don't know. Maybe Tomcat has a coded solution for storing session data to make it serializable?

-- Guido, 19-Jan-2004

;:You might want to take a look at [XMLEncoder|http://java.sun.com/j2se/1.5.0/docs/api/java/beans/XMLEncoder.html] and [XMLDecoder|http://java.sun.com/j2se/1.5.0/docs/api/java/beans/XMLDecoder.html] in the java.beans package of the Java API.  Or if you're just interested in storing strings as meta-data but still want to be able to use line feeds and such, you might wanna check out the loadFromXML and storeToXML functions of [java.util.Properties|http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html]-- ErikAlm, 08-Jun-2005
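
A minimal sketch of the {{java.util.Properties}} route, assuming string-only metadata. The class name {{XmlMetadataStore}} is invented for this example; {{storeToXML}} and {{loadFromXML}} are the standard methods mentioned above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper, not part of JSPWiki.
class XmlMetadataStore {
    // Serializes the metadata in the XML format defined by java.util.Properties.
    static byte[] save(Properties metadata) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        metadata.storeToXML(out, "page metadata");
        return out.toByteArray();
    }

    // Reads the metadata back from the bytes produced by save().
    static Properties load(byte[] xml) throws IOException {
        Properties metadata = new Properties();
        metadata.loadFromXML(new ByteArrayInputStream(xml));
        return metadata;
    }
}
```

Because the result is plain XML text rather than a serialized object graph, it should stay readable across JVM versions, which addresses Guido's concern.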

Guido has a point. Let me add something to the need for metadata: I would immediately rewrite my ~PageProvider to store and read access permissions from the metadata instead of a plugin-like syntax on the page. (I'd also add a small JSP page to edit these permissions, available perhaps in the page footer or the "more information" view.)

-- ebu

This sounds good to me. I'm using access permissions right now from the current alpha implementation and created some classes to get connected to our user db. I find the permission stuff not that simple. In addition, I needed information about translated pages, because we use two languages in our intranet. At the moment we have defined two attribute names, and the JSP checks for their presence.
__Question:__ how do you think the metadata attribute names should be configured? Via jspwiki.properties perhaps?

-- Guido, 21-Jan-2004



[Wouter] (2003-09-25) - See my rant at [http://nukleos.editthispage.com/2003/09/16]. You'd need an extended interface, indeed - that may well complicate matters. BTW, Categories are Metadata, but/and they can be the "header" for [Subpages|IdeasSubPages] as well. So you might get them up and running through this same mechanism as well...

There already is one, has been for a year now...  See [WikiRPCInterface].  I *am* happy with the MetaWeblogAPI, but I don't think it is suitable for Wikis as such.

-- JanneJalkanen

[Wouter] (2003-10-02) - Hmm, what is missing is something like a {{getPageMetadata( String pagename)}}, returning an XML-formatted  string or an array of key-value pairs - and a setter as well. 't Would be easier if one could adapt the existing methods to emit/accept XML; that way, you could get the page AND the metadata (including author etc) in a single call. ''But of course, that's wishful thinking: a lot of existing apps would crash right away!'' BTW, I was thinking of new Java interface classes, not XMLRPC...


I've put something in [Ideas] : 19.10.2003 -- __insert header (javascript and meta)__
It needs discussing ... just an idea .
FrancoisParlant (2003-10-21)

Hey, just in case anyone is interested in this, I thought I'd post it. It's relevant to metadata, but it's also relevant to searching. Maybe you'll find it interesting, maybe not....

 Okay, here's what I want to do, and my Google-Fu is too weak to find out how...

I am writing a Wiki, which will form my database. I want a search engine tool which will crawl my wiki, recognise four or five database fields as well as the wiki name of the particular page, and build a searchable index for each page.

For example the page might be

Apology :

LibTitle : Apology
LibAuthor : Plato
LibTopics : Philosophy Greece Ancient

And I'd like to provide a search interface for searching Wikipages based on those optional keywords, or fulltext, and coming back with the results.

I'd also like to (if I could) provide some kind of page clustering based on similar searches - i.e. not just come back with ranking hits, but by some kind of page association mechanism.

Any ideas?

[NascifAbousalhNeto] (3/21/2005)
How about using the [Emacs approach|http://www-2.cs.cmu.edu/cgi-bin/info2www?(emacs)File%20Variables] to store local variables in page? Granted this would limit the metadata to name-value pairs, but it has the major advantage that it would not break any of the existing APIs to the file providers. A small change in the markup language processor would do.

The idea would go basically like this: as a page is loaded, its header is scanned for lines using a special notation, say:
# name1: value1
# name2: value2


Those lines would be parsed and the name-value pairs stored as metadata in the page. When the page is saved, the values would be updated by the wiki engine (just in case they were somehow modified, say by a plugin). 
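
The loading half of this could be sketched roughly as follows. The class {{HeaderMetadata}} is hypothetical, and the rule "stop at the first line that is not a {{# name: value}} line" is an assumption about where the header ends:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
class HeaderMetadata {
    // Reads "# name: value" lines from the top of the page. Scanning stops at
    // the first line that does not match, so the body is never interpreted.
    static Map<String, String> parse(String pageText) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : pageText.split("\r?\n")) {
            if (!line.startsWith("# ")) break;
            int colon = line.indexOf(':');
            if (colon < 0) break;
            String name = line.substring(2, colon).trim();
            if (name.isEmpty()) break;
            metadata.put(name, line.substring(colon + 1).trim());
        }
        return metadata;
    }
}
```

The resulting map could then be copied into the page with the existing setAttribute() call.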

The Edit page would have to be changed to not show the "metadata" lines, and perhaps to add a small form-oriented interface to manipulate the page metadata attributes.

Perhaps a PageFilter could be used to implement that feature... I am going over the API and it looks promising! :-)

ErikAlm, 09-Jun-2005: I like this idea since it furthers the email metaphor.

However, I fear it may also confuse things.  For instance, what happens if a user puts anything but name-value pairs before the first blank line?  Even if we only accept header lines that have hash-pounds (#), we can get into problems if a user decides to start their page with a numbered list.

How about using something completely other than the # to mark a name-value pair?  Maybe {{$name: value}} or {{@name: value}}?  We could even do {{$name=value}}.  This way you could do:
<some contents>
$name1: value1
<some more contents>
$name2: value2
All lines starting with "$" or whatever character, would then be removed.
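
Here is a small sketch of that variant, assuming the {{$name: value}} form with a colon. The class {{InlineMetadata}} is made up for illustration: it collects the pairs from anywhere in the page and keeps the remaining lines as the visible content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
class InlineMetadata {
    final Map<String, String> metadata = new LinkedHashMap<>();
    final String content;   // page text with the "$name: value" lines removed

    InlineMetadata(String pageText) {
        StringBuilder kept = new StringBuilder();
        for (String line : pageText.split("\r?\n")) {
            int colon = line.indexOf(':');
            // A metadata line starts with "$" and has a non-empty name before the colon.
            String name = (line.startsWith("$") && colon > 0)
                    ? line.substring(1, colon).trim() : "";
            if (!name.isEmpty()) {
                metadata.put(name, line.substring(colon + 1).trim());
            } else {
                kept.append(line).append('\n');
            }
        }
        content = kept.toString();
    }
}
```

A line that starts with "$" but has no name before the colon is treated as ordinary content rather than dropped.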


Another idea would be to save the page metadata as a separate page by using SubPages:  Each page would have a corresponding "WikiPage/_metadata" -page.  However, this causes some problems with versioning and migration...

--  JanneJalkanen, 12-Jun-2005


I'm looking at various ways of harvesting metadata from wiki pages, such that it could be used for "meta categorization" and creating navigation layers, likely using [XML Topic Maps|http://www.topicmaps.org/xtm/1.0/]. There's an existing metadata standard that was developed by the library community and is highly advocated for use in web pages, called __Dublin Core__ (DC).
The [Dublin Core Metadata Element Set|http://dublincore.org/documents/dces/] (DCMES) is a set of about a dozen elements for common metadata, like author, title, etc. and is used in [WorldCat|http://www.oclc.org/worldcat/default.htm], an international library project that now encompasses over 58 million metadata records in over 9000 libraries. IOW, you can't really go wrong using Dublin Core metadata.

There's even an existing way of encoding DC metadata in HTML <meta> elements, as described in a document 
[Expressing Dublin Core in HTML/XHTML meta and link elements|http://dublincore.org/documents/dcq-html/]. This basically means
you first declare the DC namespace using a <link> element, then in one or more <meta> elements you can embed a DC identifier in the 'name' attribute and the metadata in its 'content' attribute. For example:


  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" lang="en" content="Electric Forest Blog" />
  <meta name="DC.creator" content="Murray Altheim" />
  <meta name="DC.subject" lang="en" content="1. Books. 2. eBooks. 3. Digital Libraries. 4. Topic Maps. 5. Metadata. 6. Information Organization." />
  <meta name="DC.description" lang="en" content="A group blog devoted to discussion of eBooks, digital libraries, and software tools for organizing our thoughts." />
  <meta name="DC.publisher" content="Murray Altheim" />
  <meta name="DC.contributor" content="Murray Altheim, Patrick Durusau, Lee Iverson, Alexander Johannesen, Jack Park, Gary Richmond, Roger Sperberg, Conal Tuohy, Bernard Vatant" />
  <meta name="DC.date" content="2005-05-30" />
  <meta name="DC.type" content="Text" />
  <meta name="DC.format" content="text/html; charset=ISO-8859-1" />
  <meta name="DC.format" content="57486 bytes" />
  <meta name="DC.identifier" content="http://www.altheim.com/ef/" />


Without getting into too much more detail, it's also possible to further characterize DCMES elements
with qualifiers and scheme identifiers (e.g., stating what format some data is in), and for this there 
are additional sets of DC terms, such that it's a relatively rich way of characterizing resources. As 
a further example (noting that I'm declaring a second namespace, "DCTERMS"):


  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2005-05-30" />
  <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
  <meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.altheim.com/ef/" />


Now, this doesn't compare with library MARC records (which have hundreds of fields),
but it is certainly suitable for web documents. I'm thinking that if JSPWiki's syntax
had a way of either marking off the boundaries for a set of metadata terms, or a way
of creating name-value pairs using the "DC" and "DCTERMS" namespaces, it might prove
relatively easy to use. And any content so declared could be converted into <meta>
elements and embedded in the generated HTML/XHTML documents, which would allow it to
be harvested by standardized tools.
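
To illustrate the conversion step only (how the pairs get declared in wiki syntax is still open), here is a sketch that turns a map of unqualified DC element names into the <link> and <meta> elements shown above. The class {{DublinCoreMeta}} is hypothetical, and the sketch ignores the 'lang' and 'scheme' attributes:

```java
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
class DublinCoreMeta {
    // Escapes the characters that would break an HTML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    // Keys are unqualified DC element names such as "title" or "creator".
    static String render(Map<String, String> dc) {
        StringBuilder html = new StringBuilder();
        html.append("<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" />\n");
        for (Map.Entry<String, String> e : dc.entrySet()) {
            html.append("<meta name=\"DC.").append(escape(e.getKey()))
                .append("\" content=\"").append(escape(e.getValue()))
                .append("\" />\n");
        }
        return html.toString();
    }
}
```

Since a map holds one value per key, repeated elements (such as the two DC.format entries in the example) would need a list-valued structure instead.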

I'll be working on something like this for my own purposes but am happy to coordinate
with others in the JSPWiki code. I have some ideas for possible syntax, but that can
wait until later. I use DC metadata throughout my [Ceryle|http://purl.org/ceryle/] 
project, so this would all tie in quite nicely.

-- MurrayAltheim, 2005-06-18

''also posted on the mailing list:''\\

I think it would be possible to implement a simple form of metadata persistence based on treating the first lines of a wiki page as headers. Let me explore how we could respond to "Janne Jalkanen's Five Metadata Criteria" :-)

{{> a) allows metadata to be hidden or visible (you don't want to see automatical edits, for example).}}\\
We could easily hide the metadata from presentation using CSS tags.

{{> b) does not have this "must run the entire page through TranslatorReader before getting metadata" -crap. (That's expensive)}}\\
I don't have a good answer for that - we definitely would have to rely on the principle of pre-parsing to extract the metadata. But perhaps the performance impact can be minimized:
#. The metadata would be stored in the beginning of the document, so a simple check in the first line would indicate if the page has metadata or not. If not, the metadata filter would immediately return; if so, it would just read up to the "end-of-metadata" marker.
#. The whole process could be optimized by being pushed into the Wiki engine, instead of having to go through the "pluggable" filter mechanism.

{{> c) works well for both wikipages and attachments (i.e. treats them the same way)}}\\
Here you got me. But why do we need metadata for attachments? If we can cover metadata for the content, doesn't that satisfy the "80%" criteria?

{{> d) would allow sub-pages and namespaces (for wikifarms)}}\\
Don't know enough about those topics to respond properly.

{{> e) has a relatively simple API so that people can write their own providers easily}}\\
No need for a separate API, nothing gets broken.

Elaborating a little further, a page with metadata would look like that:


The '%%' tags serve two functions: telling the metadata filter where the metadata info is, and allowing CSS to be used to hide it or render it in a different way than regular wiki tables. You can see the format is very close to the one generated by current WikiForms like SubmitBugRepor - the idea is that you can use current tools to generate tables with metadata.

Using the info would be trivial, just a simple filter (the metadata filter) that would:
* scan the first line of a page, searching for the "%%metadata" entry;
* if present, it would parse each subsequent line into a name-value pair and set them in the page internal representation using the setAttribute() API;
* as soon as the final "%%" marker was found it would return.
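
The three steps above can be sketched as follows. The class {{MetadataBlockFilter}} is hypothetical, and the row format {{| name | value}} is my assumption about what the table lines inside the block look like. A real filter would call setAttribute() on the page instead of returning a map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
class MetadataBlockFilter {
    static Map<String, String> extract(String pageText) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String[] lines = pageText.split("\r?\n");
        // Step 1: only the first line decides whether the page has metadata.
        if (!lines[0].trim().equals("%%metadata")) {
            return metadata;
        }
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            // Step 3: return as soon as the closing marker is found.
            if (line.equals("%%")) break;
            // Step 2: ASSUMED row format "| name | value".
            String[] cells = line.split("\\|");
            if (cells.length >= 3) {
                metadata.put(cells[1].trim(), cells[2].trim());
            }
        }
        return metadata;
    }
}
```

Pages without the marker cost a single string comparison, which is the point of keeping the block at the top.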

With the parsed metadata in memory we could build all kinds of fancy applications like advanced query plugins, etc.

Editing this data could be done in two ways:
* 1. The easiest and obvious one: just like we do today with a BugReport or Idea page. The data is visible as a table, the user can directly modify it. The metadata would be updated with a postSave() method call.
* 2. A more advanced scenario would be to create a metadata-aware edit page. We could do it in two ways:
** 2.1: A data-agnostic WikiForm that just presents each metadata value as a String;
** 2.2: A data-aware WikiForm that knows what the best input control is for each metadata entry, and could even later use Javascript to do some simple client-side input validation (like checking for valid dates, etc.)

To implement 2.2, every metadata header would always include a "profile" property, pointing to a WikiPage describing the type information for the metadata. More specifically, it would hold all the information necessary to create a WikiForm to manipulate that data - down to the FormInput and FormSelect attributes. So a specialized editor would just have to use the "profile" property to find the appropriate metadata profile page, and combine the information there with the metadata values to create a WikiForm page on-the-fly. The page contents (minus the metadata header) would go into a text area appended to the form after the metadata-related input elements.

-- NascifAbousalhNeto


In looking over the entire page so far, one thing I'd like to suggest (which may only be 
coming from my own requirements, I admit) is that for any metadata fields that happen to
overlap with Dublin Core, that if there's a need for a field identifier that we use the
Dublin Core one for that. This would allow easy translation to HTML <meta> elements,
using the methodology recommended by DCMI and described by me briefly above. I'll create 
a Dublin Core Metadata Element Set (DCMES) page on the wiki to describe these fields, 
which can be identified both by a short string or via URL (there's both available for 
each, and even a way of extending the base set). 

-- MurrayAltheim


Metadata can be formulated more or less as key-value pairs directly on the Wiki pages:

HasAuthor MKiesel

- HasAuthor should be a WikiPage on which, in turn, the URI of the corresponding DC RDF property is declared.
When parsing Wiki pages, the corresponding RDF statements can be built.
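
A library-free sketch of that parsing step, writing the statements as N-Triples lines. The class {{WikiRdf}}, the hard-coded lookup table (standing in for reading the URI off the HasAuthor page) and the choice to emit the value as a plain literal are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
class WikiRdf {
    // ASSUMED lookup: property page name -> RDF property URI declared on that page.
    static final Map<String, String> PROPERTY_URIS = new HashMap<>();
    static {
        PROPERTY_URIS.put("HasAuthor", "http://purl.org/dc/elements/1.1/creator");
    }

    // Turns lines such as "HasAuthor MKiesel" into N-Triples statements about the page.
    static List<String> statements(String pageUri, String pageText) {
        List<String> triples = new ArrayList<>();
        for (String line : pageText.split("\r?\n")) {
            String[] parts = line.trim().split("\\s+", 2);
            String property = PROPERTY_URIS.get(parts[0]);
            if (property != null && parts.length == 2) {
                String literal = parts[1].replace("\\", "\\\\").replace("\"", "\\\"");
                triples.add("<" + pageUri + "> <" + property + "> \"" + literal + "\" .");
            }
        }
        return triples;
    }
}
```

The resulting lines could then be handed to any of the RDF libraries for storage and querying.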

Of course, you can use arbitrary techniques and/or customized XML here. But after all, RDF has been invented for these things, so why not use it?

* http://en.wikipedia.org/wiki/Resource_Description_Framework
* http://en.wikipedia.org/wiki/Semantic_web

There are quite a lot of Java RDF libraries available:

* http://www.openrdf.org/ - Rio can be used for in-memory handling and double as a database
* http://jena.sourceforge.net/ - Jena's a common RDF framework but quite large

Sound metadata handling probably needs quite a lot of work on JSPWiki's core code, along with changes in the provider interfaces of course. On the other hand, this would allow interesting things along the way, for example good handling of binary data/file attachments, proper handling of renaming pages, rendering of links based on properties defined on the target page (tooltips for links!)...

-- Kiesel, 2005-07-15


One major enhancement would be to adopt JSR-170 as the main repository metaphor.  It would allow pages, attachments and metadata for each to be cleanly stored, would make it nice and easy to do queries (with XPath), and also allow a nice upgrade path to running multiple wikis in one engine.  However, it's a bit problematic to make the transition graceful - the JSR-170 Repository metaphor is somewhat more complicated than ours, and at least for now Jackrabbit (the Jakarta implementation of JSR-170) would require quite a lot of tweaking to interface neatly with JSPWiki.

But for the future, I think JSR-170 is really the way to go.  XPath is cool.

-- JanneJalkanen, 16-Jul-2005


How about using RDF or OWL and semantic web concepts?

--AnonymousCoward, 21-May-2006


You'd want to think about Semantic Web extensions. See e.g. Semantic MediaWiki.

--Hans Oesterholt, 26-Sep-2006


My installations work with v2.6.x and I solved my metadata requirements with the OpenRDF Sesame server. I use plain Dublin Core metadata expressed in XHTML as well as Dublin Core RDF wiki-page attachments. The RDF attachments are exposed via a definite URL rendered dynamically by the {{ViewTemplate.jsp}} template. Metadata are kept in sync with the repository by a Groovy script. This script stores the RDF files via the openrdf API in the triplestore. Sesame comes with a SPARQL web GUI (aka ''workbench'') that is quite easy to use for SPARQL newbies. 

Looking forward to future releases of JSPWiki, I would argue for keeping metadata closely attached to (or combined with) the content page, as there should be no extra effort to keep the metadata attached to its target. Other combinations tend to fail in everyday operations. Second, I would like to see an extra tab for editing metadata within JSPWiki, maybe plaintext or form-based.

Please do not introduce non-standard metadata conventions. Dublin Core works well and further requirements can be easily adopted with PRISM or something like this.

My 2 ct. --GregorWillemsen, 22-SEPT-208