Caching improvements#

I spent some time analyzing what kinds of requests are issued from the wiki to the Page and Attachment providers in an effort to figure out why large wikis (1000+ pages) with lots of links and large pages become slow.

We have our own custom Page and Attachment providers which request content from a remote machine. This causes a 10 millisecond delay on every request. Booting up a wiki like this takes ages (10 minutes on a 2.4 GHz Pentium), mainly due to caching inefficiency.

There are multiple factors that cause this:

  • Page and Attachment provider implementations (how these should behave is not strictly controlled)
  • There is no Attachment cache
  • Cache configuration problems
  • There were some bugs in the caching implementation

More or less all of these issues got solved by re-engineering the Page and Attachment providers and fixing the bugs. This sparked some ideas on how the providers should be rewritten to give more flexibility to the provider writer.

Page API#

There are currently multiple requests needed to fill out the provider data on a page. Sometimes a (remote) call to request one property of a page is so expensive that returning more information than requested costs nothing extra, like loading the page content even when only a check for page existence is needed.

I consider the following to be properties of a page (there might be more):

  • ACL
  • versions
  • content
  • creator
  • timestamp
  • subpages/attachments are just pages which inherit their ACL from the master page
    • getAllPages should really load these also

When requesting one property of a page, it should be possible for the provider to fill out more properties than are strictly needed, as it might be practical to do so. The simplest providers would probably fill the whole structure no matter what is requested.

All pages (existing or non-existing) would be objects implementing an interface capable of returning all this information, as far as the page content permits.
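A rough sketch of what such a page interface could look like (the names here are hypothetical, not the actual JSPWiki API):

import java.util.Date;
import java.util.List;

// Hypothetical sketch only.  A page object exposes all of its properties;
// a simple provider may fill every field on the first request, a smarter
// one may load them lazily from the underlying store.
public interface WikiPageData
{
    boolean exists();
    String  getContent();      // page text; may get loaded even if only exists() was asked for
    Object  getAcl();          // subpages/attachments inherit the master page's ACL
    List    getVersions();     // all known versions of the page
    String  getCreator();
    Date    getLastModified();
    List    getSubpages();     // subpages and attachments, themselves pages
}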

At the moment we just fill a WikiPage structure and pass it to various providers that then request data relating to the page. This is very inefficient and causes multiple expensive DB requests when rendering pages.

Cache#

The cache needs to approach the pages as intelligent objects when expiring page content: calling refresh() instead of reloading the page once it may need freshening. This way the page can use whatever intelligence the underlying database or filesystem offers to check whether it needs to refresh or invalidate some data. There are some issues that need to be checked when actually implementing this, as some parts of the data may expire at different times.
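A minimal sketch of the idea, with hypothetical names, where the cache delegates expiration to the page instead of blindly reloading it:

import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch only.  Instead of throwing away and reloading expired
// pages, the cache asks each page to refresh itself; the page can then use
// timestamps or change logs from the underlying store to decide whether
// anything actually needs to be invalidated.
public class PageCache
{
    private final Map  m_pages;   // page name -> RefreshablePage
    private final long m_maxAge;  // milliseconds before a page is asked to refresh

    public PageCache( Map pages, long maxAge )
    {
        m_pages  = pages;
        m_maxAge = maxAge;
    }

    public void expire( long now )
    {
        for( Iterator i = m_pages.values().iterator(); i.hasNext(); )
        {
            RefreshablePage page = (RefreshablePage) i.next();

            if( now - page.getLastChecked() > m_maxAge )
            {
                page.refresh();   // the page decides what, if anything, to invalidate
            }
        }
    }

    // Hypothetical interface the cached pages would implement.
    public interface RefreshablePage
    {
        long getLastChecked();
        void refresh();
    }
}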

Implementation#

The current WikiPage object already implements most of the queries, so WikiPage could actually act as an interface to the actual data. This might cause some problems, as WikiPages can get created dynamically without ever communicating with a provider. Therefore a WikiPageContentProvider interface (I'm having trouble figuring out a good name) is probably needed.

There needs to be a master database connection and a place to request a list of pages (including the subpages, even if they are currently handled by a different provider). This functionality is needed by all the WikiPageContentProviders in order for them to get data and to verify data validity.
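A rough sketch of how this could look (again, hypothetical names, building on the page interface sketched above):

import java.util.Collection;

// Hypothetical sketch only; "WikiPageContentProvider" is a working name.
// A content provider fills in page properties and can verify that the data
// it produced earlier is still valid.
public interface WikiPageContentProvider
{
    /** Fills in the requested property; may fill in more if that is cheap. */
    void fill( WikiPageData page );

    /** Returns true if the data already held for this page is still valid. */
    boolean isCurrent( WikiPageData page );
}

// The master connection every content provider talks to.
interface MasterPageProvider
{
    /** All pages, including subpages and attachments handled by other providers. */
    Collection getAllPages();
}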

Pros & cons#

Placing both Page and Attachment providers under a single API, from the wiki's perspective, should make the actual wiki engine cleaner. The single provider might have to read data from two different kinds of databases if attachments and pages are stored in separate databases (in which case you have chosen the wrong database and should suffer the double implementation anyway).

The functionality of the WikiPageContentProvider can be quite complex. Writing a simple base class that handles most of the complexity is probably wise.
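As a sketch (hypothetical, under the same assumed interface as above), the base class could simply load everything regardless of which property was requested, leaving a single method for the provider writer to implement:

// Hypothetical sketch only.  The simplest providers fill the whole page
// structure no matter which property was asked for; a subclass only needs
// to implement loadEverything().
public abstract class AbstractPageContentProvider implements WikiPageContentProvider
{
    public void fill( WikiPageData page )
    {
        loadEverything( page );
    }

    public boolean isCurrent( WikiPageData page )
    {
        return false;   // pessimistic default: always reload
    }

    /** Loads all properties of the page in one go. */
    protected abstract void loadEverything( WikiPageData page );
}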

This is becoming pretty complex, and I don't see any point in implementing it unless someone other than me feels it is of any value.

There might be more things that need to be stored (user settings, wiki settings, plugin settings). It would make sense to accommodate these somehow also.

NiiloNeuvo


Comments#

After looking at the times the PageProvider is accessed, and is actually doing disk IO, I have implemented the following optimizations on my system:

Effectively my system currently has 2 levels of caching:

  1. the CachingProvider (caching processed pages) and the CachedAttachmentProvider (caching non-existing pages)
  2. the PrevaylerProvider (keeping all current unprocessed versions of all pages)

Attachments are not cached.

At the moment I have very few pages, so all this caching is really just overkill :)

Side note: I've drifted from the JSPWiki tree quite a bit by now, so I decided to set up my own environment for my further developments. It is my intention to keep a close eye on the JSPWiki CVS tree and merge those developments into my tree when applicable. If you'd like to see how it is working out... please visit https://www.aiko.sh/wiki_dev/

2003-09-01 AikoMastboom


Caching non-existent pages is a good source of memory leaks... 2.1.63 handles non-existent pages much more efficiently now, by simply assuming that if it's not on the list of all pages, it must not exist :-).

2.1.63 also fixes a serious bug (corrected with a single, forgotten line) which caused the CachingProvider to sometimes forget it had ever seen a page.

I also put these fixes on 2.0.51.

I will also add a CachingAttachmentProvider, since Niilo convinced me that it is a problem.

-- JanneJalkanen


Well, I have not yet looked deeper into the code, but to avoid memory leaks I work with SoftReference. This allows me to cache everything without worrying about OutOfMemory.
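A minimal sketch of that approach (hypothetical names, not the actual code), using java.lang.ref.SoftReference so the garbage collector may drop cached entries under memory pressure:

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a SoftReference-backed cache.  Entries may be
// cleared by the garbage collector when memory runs low, so a cache miss
// simply means "load it again".
public class SoftCache
{
    private final Map m_entries = new HashMap();   // key -> SoftReference

    public synchronized void put( Object key, Object value )
    {
        m_entries.put( key, new SoftReference( value ) );
    }

    public synchronized Object get( Object key )
    {
        SoftReference ref = (SoftReference) m_entries.get( key );
        if( ref == null ) return null;

        Object value = ref.get();
        if( value == null ) m_entries.remove( key );   // collected; drop the stale reference
        return value;
    }
}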

-- Bebbo
