|Title|Prevent search engines from locking pages
|Date|25-Apr-2005 21:53:48 EEST
|JSPWiki version|2.2
|[Idea Category]|GenericIdea

Sorry, I haven't really found a relevant location for this, so I'm putting it here.  (FYI, the site search fails almost every time on "robots.txt".)

Watching my RSS feed, I found that robots were hitting the edit page for every single page on the site, locking pages at random times (and causing the RSS feed to update when nothing of significance had actually happened).

Might I suggest that the default installation of JSPWiki include a robots.txt with the following contents:

{{{
User-agent: *
Disallow: /wiki/Edit.jsp
}}}

This doesn't really help: robots.txt is read only from the web server's root directory, therefore in a typical JSPWiki installation the file would end up at /JSPWiki/robots.txt or, as in your example, /wiki/robots.txt.

These files will never be read by robots and are useless.

What would help with this issue is (using 2.0 container authentication) to protect Edit.jsp and force a login first (perhaps with a standard account shown on the login page), or to replace the "Edit this page" link with a small form with a submit button, since robots usually don't submit forms during site traversal. This should be fairly simple and is probably a [template|ContributedTemplate] issue.
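
A rough sketch of the form variant, assuming a template that currently emits a plain "Edit this page" link (the <wiki:PageName/> tag is taken from JSPWiki's template taglib, and the page parameter follows the usual Edit.jsp URL convention — adjust to your own template):

{{{
<%-- Hypothetical replacement for the plain edit link: a small POST form,
     which crawlers generally will not submit during traversal. --%>
<form method="post" action="Edit.jsp">
  <input type="hidden" name="page" value="<wiki:PageName/>" />
  <input type="submit" value="Edit this page" />
</form>
}}}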


I agree.  This is something that should be done in the templates.  The other possibility is to use JavaScript in the default template to write the edit link, which browsers would render but bots would ignore.  However, this is not a very accessibility-friendly solution.

-- JanneJalkanen


Maybe just a note in the documentation? ("If you don't want spiders locking your pages all the time, do this...")

-- [MarkBeeson|http://markbeeson.net/]

I agree with Olaf and Janne. Here's the rationale, direct from the [HTTP spec (RFC 2616)|http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1], which says:

''...the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".''

That restriction is in the spec precisely to prevent problems like this. Ideally, if the ''Edit'' action is going to lock a page, it should only be accessible by POST (which in HTML means a form and a submit button, not a regular link). In conjunction with that change, you could make Edit.jsp only lock the page if it was accessed via HTTP POST. That way, any HTML links that remain (in old templates for example) won't cause the problem. I haven't looked at the code to see how easy or hard this might be to do.
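
The POST-only locking could be sketched near the top of Edit.jsp roughly like this (the lock call itself is only indicated by a comment, since it goes through JSPWiki's page manager and I haven't looked at the exact API):

{{{
<%-- Hypothetical guard: only acquire the page lock when the request
     arrived via POST, so plain GET links in old templates stay "safe". --%>
<%  if( "POST".equals( request.getMethod() ) )
    {
        // ...acquire the page lock here...
    }
    // on a GET, show the editor without locking, or redirect to a
    // confirmation form that POSTs back to this page
%>
}}}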

-- JimAncona