CachingProvider seems to freak out on saving as well sometimes, as RecentChanges throws a NullPointerException every now and then.

The user sees this as a HTTP 500 return value, with a NPE wrapped inside a ServletException.

MahlenMorris: What looks to me is going on is that this chunk of code in CachingProvider (line 181):

                    CacheItem item = new CacheItem();
                    item.m_page = (WikiPage);
is creating a WikiPage with no date. This WikiPage will get used the next time getAllPages() is called, and in trying to sort it, the NPE gets thrown. Honestly, I'm not even sure what that loop is doing, though; firstly, I'm surprised the (WikiPage) even compiled, since should be a String, no? Is the WikiPage(String) constructor getting called there implicitly? That's too funky. Plus it's iterating through the cache and setting items in the cache at the same time; that puts the fear in me, and I'm not even sure what the purpose is of that loop, it seems like you're clearing the text out of the cache for no reason. Not to imply that it doesn't make sense, it's just not obvious to me yet. I haven't tested this, but from staring at the code that not-setting-the-date appears to be the problem. I just want RecentChanges to work so my RPC Email Notification stuff can work :)

Yeah. It was actually iterating through a wrong collection - namely itself. Mega-oops. :-) It's fixed now in 1.6.8, and hopefully we should get no more NPEs. Thanks! (And having fear was a proper reaction - it certainly looked scary to me too. Must. Stop. Coding. Late. At. Night.) --JanneJalkanen

Update: Still crashes. Stymied. But have some guesses. Now sleep. --JanneJalkanen

MahlenMorris: Well, in looking at the code last night (before i added the note above) I noticed there were a few places in the code that create a new WikiPage object without setting the date. So fixing all of them (even though not all may be used) should help. Also, it wouldn't be hard to bulletproof the date Comparator method you're using to not crash when it encounters a null date (it should probably do the email-the-admin thing, of course). If i have time tonight when i get home and the code drop has occurred, I'll take another peek. Can you describe the behavior more? Is there some event that triggers it, or something that clears it?

Well, it worked before adding a CachingProvider. Also, it works most of the time, but then it crashes after a while. These suggest to me that there might be an interesting race condition, a concurrency issue, a garbage collector issue, or some other exotic creature lying around. I'd rather know the real reason behind the problem than patch the symptoms... Makes for better code later on. :-) --JanneJalkanen

MahlenMorris: I'm starting to think you're right, that it could be a multi-threading issue. However, one potential problem is the following code in RCSFileProvider,getPageInfo():

    MatchResult result = matcher.getMatch();

    Date d = rcsdatefmt.parse( );

    info.setLastModified( d );
If 'd' ends up being null, then the WikiPage object will get a null date, which will crash the PageTimeComparator. A small thing to check.

In a case like this, where we don't have a reliable way to cause the problem, it might be time to crank up the debug output. Make the constructor of each and every WikiPage print a stack trace (like: (new Exception()).printStackTrace()). Make WikiPage.setLastModified() do the same, and print the value it's being set to. It'll be a boatload of output, but at least you'll nail it.

And i was only earlier suggesting that hardening PageTimeComparator would allow the site to continue operating, even when this bug occurs. No one at my work thinks there's any conditions under which my code should legally throw an NPE :). Of course, that doesn't mean it never does...

It's really annoying to me that working on JSPWiki is much more interesting than the job I'm currently paid for.

JanneJalkanen: The difficulty with this bug is that it manifests only occasionally. I built locally a copy and bombarded it through the XML-RPC API at ~30 requests/second and it didn't even flinch. Well, the currently installed at this site version does a lot of debug messaging. I hope it fails soon, so I can check the logs... :-).

JanneJalkanen: It has failed. The debug code and sanity checks prevent RecentChanges from crashing, but it turns out that the culprit is a null that has somehow gotten into the Recent changes Collection, posing as a WikiPage. Will search for more, though I have little time now. At least the RecentChanges are stable for now.

MahlenMorris: Excellent! Uh, I guess. No, it is good that we now have clue as to what is happening (and null WikiPage's was not something I'd been looking for). I noticed this in FileSystemProvider: public WikiPage getPageInfo( String page, int version ) { File file = findPage( page ); if( !file.exists() ) return null; This is the one place I can see where something that returns a WikiPage might do so.

OK, I think I see how this could happen. If you call WikiEngine.getPage() with the name of a page that doesn't exist, then it'll call CacheProvider.getPageInfo(), which will try to find it in the cache. Since that fails, it then calls CachingProvider.addPage(), which will try to get a WikiPage from RCSFileProvider, which gets a null WikiPage from FileSystemProvider, as noted above. However, CachingProvider.addPage() doesn't check if the returned WikiPage is null or not, and puts it into the cache anyway. Then the next time someone tries to call getAllPages(), wham! you get a list with a null in it, and hilarity results.

This situation would happen anytime a new page is created, I think. I figured it would be something relatively infrequently, so I intentionally created a new page when the new code went up.

I haven't tried to observe this in running code, but I think this is it. I think the fix would be if CachingProvider.addPage() didn't add it to the cache if the WikiPage is null. But it's 1:45 AM here in California as i write this, so don't listen to me :)

JanneJalkanen: I arrived at the same conclusion over the weekend, but my cable modem broke down and it'll take me a while to upload the possible fixes. Anyway, it's good that we agree on the likely bug: it makes it even more likely that that is the culprit :-). I probably should write a JUnit testcase...

I'll try to get the update here soonish (depending on how fast my new cable modem starts working).

There is now a new update, which hopefully fixes the aforementioned problem. 1.6.9 contains plenty more debugging code and checks =). --JanneJalkanen

(07-Feb-02: So far so good, no breaks yet... *crosses fingers*)

(08-Feb-02: Still no sign...)

MahlenMorris: Has anyone created a new page since this fix went up? Isn't that the test?

JanneJalkanen: Yes, multiple pages have been created. So far so good =).

Back to BugReports.

Add new attachment

Only authorized users are allowed to upload new attachments.
« This page (revision-21) was last changed on 12-Oct-2007 06:43 by JanneJalkanen