JSPWiki currently doesn't support Load Balancing (LB) and High Availability (HA) well. Here are some of the issues associated with building a cluster of JSPWiki instances.

In general, it would seem most beneficial to store as much data as possible on a redundant central storage device such as a Database or a SAN/NAS.

User and Group Data#

A JDBCUserDatabase does exist for storing users in a database, and it should enable multiple instances to share the same user store. It probably ought to implement true "commit" support eventually, but in the meantime it should work correctly.
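
For context, "true commit support" would essentially mean wrapping each save in a JDBC transaction, so that a half-finished write on one node never becomes visible to the others. Below is a minimal sketch of that pattern using plain JDBC, nothing JSPWiki-specific; the table and column names (wiki_users, full_name, login_name) are invented for this example.

    // Sketch of transactional ("commit") semantics for a user save: either the
    // whole update becomes visible to every node, or none of it does.
    // The table and column names are invented for this example.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class TransactionalSaveSketch
    {
        public void saveUser( DataSource ds, String loginName, String fullName )
            throws SQLException
        {
            Connection conn = ds.getConnection();
            try
            {
                conn.setAutoCommit( false );   // start a transaction
                PreparedStatement ps = conn.prepareStatement(
                    "UPDATE wiki_users SET full_name = ? WHERE login_name = ?" );
                ps.setString( 1, fullName );
                ps.setString( 2, loginName );
                ps.executeUpdate();
                conn.commit();                 // publish the change atomically
            }
            catch( SQLException e )
            {
                conn.rollback();               // discard the partial change
                throw e;
            }
            finally
            {
                conn.close();
            }
        }
    }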

There is currently no JDBCGroupManager for storing group data in a database; one is planned, but it has not yet been written.

Page and Attachment Data#

There are various flavors of JDBC page provider out there, but they need some changes before they are ready for integration into the JSPWiki core (specifically, they need to support JNDI). Various people have mentioned using them as-is with good results.
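
For reference, "supporting JNDI" here essentially means obtaining connections from a container-managed DataSource rather than creating them from raw JDBC URLs inside the provider. A minimal sketch, assuming the container has a DataSource bound under the example name jdbc/WikiDB:

    // Minimal sketch of a JNDI DataSource lookup, as a JDBC page provider would
    // need in order to run unmodified inside any servlet container.
    // "java:comp/env/jdbc/WikiDB" is an example name only.
    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.sql.DataSource;

    public class JndiConnectionSketch
    {
        public Connection getConnection() throws NamingException, SQLException
        {
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup( "java:comp/env/jdbc/WikiDB" );
            return ds.getConnection();
        }
    }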

The Reference Manager#

The Reference Manager keeps track of various types of references between pages. It gets called when pages are added and updated so that its referral lists can be updated. The difficulty here is that if a page is updated on node1, how does node2 get notified so that it can update itself as well?

Option 1#

AndrewJaquith states "...a key feature of the 2.4 JSPWiki platform is the addition of various event classes. Various "page events" (WikiPageEvent objects) are fired to registered listeners when pages are read, saved, and modified. Thus, from a development perspective, what you would need to do would be to wire up a listener class that detects page events, then blast the event out using your favorite local messaging protocol. Other cluster members could pick up the message, and instruct their PageManagers to reload the page in question. I haven't used it, but it seems to me that JGroups would be perfect for a cluster communications."
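
A rough sketch of how Option 1 might be wired up. The JSPWiki class names (WikiEventListener, WikiPageEvent) come from the 2.4 event package, the JGroups calls follow the older 2.x API, and the registration and page-refresh steps are only placeholders; treat this as an illustration of the wiring rather than a drop-in listener.

    // Sketch of Option 1: listen for local page events and broadcast the page
    // name over JGroups so other nodes can refresh their in-memory state.
    import com.ecyrd.jspwiki.event.WikiEvent;
    import com.ecyrd.jspwiki.event.WikiEventListener;
    import com.ecyrd.jspwiki.event.WikiPageEvent;

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class ClusterPageEventBroadcaster extends ReceiverAdapter
        implements WikiEventListener
    {
        private JChannel m_channel;

        public void start() throws Exception
        {
            m_channel = new JChannel();
            m_channel.setReceiver( this );
            m_channel.connect( "jspwiki-cluster" );
            // Register this listener with the engine's event source, e.g. via
            // WikiEventManager.addWikiEventListener( pageManager, this ).
        }

        // Called by JSPWiki when a page event fires on this node.
        public void actionPerformed( WikiEvent event )
        {
            if( event instanceof WikiPageEvent )
            {
                // Filter on the save/modify event types here (see the
                // WikiPageEvent constants), then broadcast the page name.
                String pageName = ((WikiPageEvent) event).getPageName();
                try
                {
                    m_channel.send( new Message( null, null, pageName.getBytes( "UTF-8" ) ) );
                }
                catch( Exception e )
                {
                    // log and carry on; a lost message only delays a refresh
                }
            }
        }

        // Called by JGroups when another node broadcasts a change.
        public void receive( Message msg )
        {
            try
            {
                refreshLocalPage( new String( msg.getBuffer(), "UTF-8" ) );
            }
            catch( Exception e )
            {
                // ignore malformed messages in this sketch
            }
        }

        private void refreshLocalPage( String pageName )
        {
            // Placeholder: tell the local PageManager / ReferenceManager to
            // re-read this page from shared storage.
        }
    }

Broadcasting only the page name keeps the messages small; each node re-reads the page from the shared page store rather than shipping page content around the cluster.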

Option 2#

Another possibility would be to modify the Reference Manager to keep its data in an external store such as a database (or perhaps even Memcached), always reading the data from that source whenever it is needed rather than keeping it in memory. This would allow a change made by node1 to be seen by node2 without any sort of event notification.
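
To make that concrete, here is a minimal sketch of what "always read from the source" could look like: the referred-by list is queried from a shared table on each request instead of being held in node-local memory. The table and column names (wiki_page_refs, page, refers_to) are invented for this example.

    // Sketch of Option 2: keep the page cross-references in a shared table and
    // query it on demand, so every node always sees the current data.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;
    import javax.sql.DataSource;

    public class SharedReferenceStoreSketch
    {
        private final DataSource m_dataSource;

        public SharedReferenceStoreSketch( DataSource dataSource )
        {
            m_dataSource = dataSource;
        }

        /** Pages that link to the given page, read fresh from the shared table. */
        public List<String> findReferrers( String pageName ) throws SQLException
        {
            List<String> referrers = new ArrayList<String>();
            Connection conn = m_dataSource.getConnection();
            try
            {
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT page FROM wiki_page_refs WHERE refers_to = ?" );
                ps.setString( 1, pageName );
                ResultSet rs = ps.executeQuery();
                while( rs.next() )
                {
                    referrers.add( rs.getString( "page" ) );
                }
            }
            finally
            {
                conn.close();
            }
            return referrers;
        }
    }

As the comment below points out, whether this scales depends on how hard those tables get hit.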

Mark Rawling comments: "I'm tempted to ask, wouldn't it be a whole lot simpler to put the page reference tables into the database? I admit though, it seems like the sort of data that would get thrashed a *lot*, so you'd have to study the application very carefully to make sure that it would scale. But still - the question remains - events, or database? Or maybe this has already been answered - eg, that events are already intended to replace the ref mgr, and HA will just be another aspect of that. Is that the case?"

Option 3#

A third, more brute-force method might be to have the Reference Manager occasionally do a full page scan to update itself. This has consistency problems, because there are windows during which each node's Reference Manager data is out of date, but it might allow for simpler code.
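
A sketch of that brute-force approach: each node simply rebuilds its reference data on a timer and accepts that the data may be up to one interval stale. The rebuildReferences() method is a placeholder for whatever full scan the Reference Manager would actually perform.

    // Sketch of Option 3: periodically rebuild the local reference data with a
    // full page scan. rebuildReferences() is a placeholder for the real rescan.
    import java.util.Timer;
    import java.util.TimerTask;

    public class PeriodicRescanSketch
    {
        private final Timer m_timer = new Timer( true ); // daemon thread

        public void start()
        {
            long interval = 5 * 60 * 1000L; // rescan every five minutes
            m_timer.schedule( new TimerTask()
            {
                public void run()
                {
                    rebuildReferences();
                }
            }, interval, interval );
        }

        private void rebuildReferences()
        {
            // Placeholder: walk every page via the page provider and rebuild
            // the refers-to / referred-by maps from scratch.
        }

        public void stop()
        {
            m_timer.cancel();
        }
    }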

Along those lines, it appears possible to build a master/failover model, where the master handles all requests until it stops or dies, at which point the failover node takes over. The big trick here is to prevent the failover node from initializing until it becomes active; deferring initialization means its Reference Manager would not have started and therefore could not contain stale data. This seems to be possible by changing the deployment descriptor (web.xml) so that the WikiServlet does not load on startup.
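
To illustrate that last point, the change would live in the standby node's web.xml: leave out (or comment out) the load-on-startup element for the wiki servlet so that nothing initializes until the first request arrives after failover. This is a sketch only; the servlet name and class should be taken from the actual JSPWiki deployment descriptor.

    <!-- Standby node only: the <load-on-startup> element is left out (or
         commented out) so the wiki does not initialize until first use. -->
    <servlet>
        <servlet-name>WikiServlet</servlet-name>
        <servlet-class>com.ecyrd.jspwiki.WikiServlet</servlet-class>
        <!-- <load-on-startup>1</load-on-startup> -->
    </servlet>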

SearchManager and PageManager#

Do the same issues that affect the Reference Manager also apply here?


Note that high availability and load balancing are not the same thing. Failover (that is, a warm-standby scenario) means that the standby machine receives some set of journaled changes that enables it to take over where the other machine died. Load balancing, on the other hand, is for scenarios where you need to run more than one copy of something at the same time, to cope with high amounts of traffic.

The two issues (LB and HA) are not mutually exclusive, of course, but they have different needs.

Personally, I think that shared storage for persistent things like pages, users and groups is fine, but for in-memory constructs you really need some sort of cluster communications protocol. That's why I mentioned JGroups. The slightly-less-license-encumbered JCluster is another option.

--Andrew Jaquith, 03-Oct-2006
