This is a page where you should put any and all ideas concerning a better search in JSPWiki.

Jun-18-2003 - Multi JSPWiki Search#

Motivation: We use JSPWiki as a key part of our intranet. We run about 8 separate JSPWiki instances, each with a distinct focus: team, customers, products, research, etc. All wikis are tied together via InterWikiLinks and they are all listed on the LeftMenu for easy reference. For things like our UserNames we have used the RedirectPlugin to tie them all together into one "logical" page. Our problem is that we sometimes forget which wiki has a certain piece of information. "Hmm, could be in research? No. Products? No. Customers? Yep, it was captured as customer feedback, found it." This leads us to running the same search again and again on each wiki instance until we either find the information or give up.

Enhancement: Two things could be changed from the user's perspective, either:

  • add another "Search Multiple Wikis" search box to header and the SearchWiki page, or:
  • change the search behaviour as follows:
    • Existing search box, user enters query, JSPWiki runs normal search, if pages are found, display results, otherwise:
    • Automatically perform a multi JSPWiki Search and present the results "clustered" by wiki.
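
The second option above (local search first, multi-wiki search only as a fallback, results clustered by wiki) could be sketched roughly as follows. This is only an illustration: `FallbackSearch` and its parameters are hypothetical names, not part of JSPWiki, and each searcher is assumed to be a function from a query string to a list of page names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper; none of these names exist in JSPWiki. */
final class FallbackSearch {

    /**
     * Runs the local search first. Only if it finds nothing are the remote
     * wikis queried; their hits are returned grouped ("clustered") by wiki name.
     *
     * @param localName name under which local hits are reported
     * @param local     searcher for this wiki (query -> page names)
     * @param remotes   wiki name -> searcher for that wiki, in display order
     */
    static Map<String, List<String>> search(String query,
                                            String localName,
                                            Function<String, List<String>> local,
                                            Map<String, Function<String, List<String>>> remotes) {
        Map<String, List<String>> clustered = new LinkedHashMap<>();

        List<String> localHits = local.apply(query);
        if (!localHits.isEmpty()) {
            clustered.put(localName, localHits);
            return clustered;                      // normal search succeeded, stop here
        }

        for (Map.Entry<String, Function<String, List<String>>> e : remotes.entrySet()) {
            List<String> hits = e.getValue().apply(query);
            if (!hits.isEmpty()) {
                clustered.put(e.getKey(), hits);   // keep only wikis that had hits
            }
        }
        return clustered;
    }
}
```

How each remote searcher talks to its wiki (XML-RPC, HTTP GET, or something else) is deliberately left open here.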

Implementation: Could this be possible with the XML-RPC interface? I've not looked yet. This feature could be made core to JSPWiki or I suppose it could be written as a plugin.

We'd need to have the multi-search wikis listed somewhere with their RPC2 URLs given. This could be in the main property file or as plugin parameters.
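
If the list lived in the main property file, reading it might look like the sketch below. The property prefix `jspwiki.multisearch.wiki.` is purely an assumption made for this example; JSPWiki defines no such setting, and `MultiSearchConfig` is a hypothetical class.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical reader for entries such as
 *   jspwiki.multisearch.wiki.research = http://intranet.example/research/RPC2
 */
final class MultiSearchConfig {

    /** Assumed property prefix; not defined by JSPWiki. */
    static final String PREFIX = "jspwiki.multisearch.wiki.";

    /** Collects wiki name -> RPC2 URL from all properties that start with PREFIX. */
    static Map<String, String> remoteWikis(Properties props) {
        Map<String, String> wikis = new TreeMap<>();   // sorted by wiki name
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                String url = props.getProperty(key).trim();
                if (!url.isEmpty()) {
                    wikis.put(key.substring(PREFIX.length()), url);
                }
            }
        }
        return wikis;
    }
}
```

Plugin parameters would work just as well; the point is only that each wiki needs a name and an endpoint URL.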

Thoughts? Comments? Feedback?

--JohnVolkar

Yes, sounds very useful and I'll have to work on that.

As a workaround I would suggest that you make a MetaWikiSearchPage, which is simply a JSP page that does an HTTP GET against several wikis and then aggregates the results on a single page. Make a very simple ViewTemplate with nothing except the search results on it, like:

      <wiki:CheckRequestContext context="find">
         <wiki:Include page="FindContent.jsp" />
      </wiki:CheckRequestContext>

then call the search function repeatedly using

http://mywiki.com/MyWiki1/Search.jsp?query=emacs+java&skin=searchtemplate
http://mywiki.com/MyWiki2/Search.jsp?query=emacs+java&skin=searchtemplate
http://mywiki.com/MyWiki3/Search.jsp?query=emacs+java&skin=searchtemplate
http://mywiki.com/MyWiki4/Search.jsp?query=emacs+java&skin=searchtemplate

and put all the resulting HTML on one page. This should work nicely on 2.1.x, and for 2.0.x you have to change Search.jsp to contain

    String skin    = wiki.safeGetParameter( request, "skin" );

instead of

    String skin    = wiki.getTemplateDir();

This is completely untested, of course. But I think it should work. Note that the parameter "skin" is likely to be renamed "template" in some near-future version in order to be more consistent with our WikiTemplates terminology.

-- JanneJalkanen
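
For illustration, the fetch-and-aggregate step of the workaround above might be sketched like this. It is as untested against a real wiki as the suggestion itself: `MetaWikiSearch` is a hypothetical class, the base URLs are the example ones from this page, and the `skin` parameter is assumed to behave as described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical aggregator; not part of JSPWiki. */
final class MetaWikiSearch {

    /** Builds one Search.jsp URL per wiki base URL, e.g. http://mywiki.com/MyWiki1. */
    static List<String> searchUrls(List<String> wikiBaseUrls, String query, String skin) {
        List<String> urls = new ArrayList<>();
        for (String base : wikiBaseUrls) {
            urls.add(base + "/Search.jsp?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                     + "&skin=" + URLEncoder.encode(skin, StandardCharsets.UTF_8));
        }
        return urls;
    }

    /** Fetches each URL with a plain HTTP GET and concatenates the returned HTML. */
    static String aggregate(List<String> urls) {
        StringBuilder page = new StringBuilder();
        for (String url : urls) {
            // Heading per wiki so the combined page shows where each result block came from.
            page.append("<h2>").append(url.replace("&", "&amp;")).append("</h2>\n");
            try (InputStream in = new URL(url).openStream()) {
                page.append(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            } catch (IOException e) {
                page.append("<p>Search failed.</p>");   // one unreachable wiki should not break the page
            }
            page.append('\n');
        }
        return page.toString();
    }
}
```

Calling `searchUrls` with the four example wikis and the query "emacs java" produces URLs of the form listed above; `aggregate` then needs network access to those wikis.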

Arent-Jan: Maybe OpenSearch can be used to get the results as XML, which could then be aggregated and processed into HTML using a stylesheet: http://opensearch.a9.com/spec/opensearchrss/1.0/


9.5.2003 - Searching for attachment names#

As far as I can tell, there is no convenient way to find attachments. If the attachment's name is not mentioned in the wiki page itself, you never get hits for that page.
Or did I do anything wrong with my configuration?
If not, are there already any suggestions on how to improve this?

--MichiEmde

No, you are quite correct. Attachments are not searched at all currently. You should definitely mention the attachment on the WikiPage itself, if you want it to be found.

--JanneJalkanen

This is now in 2.2.24. Search will find attachment names.

-- Arent-Jan


14/02/2003 - Find Page#

I would like to see the FindPage enhanced. For one, if a valid WikiName is entered in the Search field and the page is not found, the user should be prompted to create a new page, if desired.

This would be convenient, but it might not be healthy for the wiki because any page you create this way would always be an orphan. Better to create a wikilink off of an existing page. -- KenLiu

How about a FindPage along these lines (this would display nicely if embedded HTML was allowed on this site ;-\):

Title search. You can also use regular expressions, such as <em>Seriali[[sz]e</em>. 

<form method=get><input name=titlesearch size=40 value="">
<input type=submit name="OK" value="Go">

<INPUT type=checkbox checked value="1" name="search_pages"> Search web pages, 
<INPUT type=checkbox checked value="1" name="search_plugins"> Plugins, 
<INPUT type=checkbox checked value="1" name="search_images"> Images, 
<INPUT type=checkbox value="1" name="case_sensitive"> Case sensitive
</form><p>

Full-text search. It will not find page titles.

<form method=get><input name=fullsearch size=40 value="">
<input type=submit name="OK" value="Go">

<INPUT type=checkbox checked value="1" name="search_pages"> Search web pages, 
<INPUT type=checkbox checked value="1" name="search_plugins"> Plugins, 
<INPUT type=checkbox value="1" name="case_sensitive"> Case sensitive
</form><p>

Go directly to a page, or __create__ a new page by entering a valid [WikiName|Wiki Name].


<form method=get><input name=goto size=40>
<input type=submit name="OK" value="Go">
</form><p>

Use '+' to require a word, '-' to forbid a word . . . 
. . . etc.

-- PaulDownes


Regexp search#

Would a simple glob regexp search (?, *) bring real value? We would then probably need boolean operators as well... And most successful internet search engines don't really need them either.

--JanneJalkanen, 03-Jan-2003.
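
To make the question concrete, a glob search over page names needs little more than a translation of "?" and "*" into a regular expression. The sketch below is one way to do it; `GlobSearch` is a hypothetical name and the case-insensitive matching is an assumption, not existing JSPWiki behaviour.

```java
import java.util.regex.Pattern;

/** Hypothetical glob matcher for page names; not part of JSPWiki. */
final class GlobSearch {

    /** Translates a glob ('?' = exactly one character, '*' = any run of characters) into a regex. */
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '?': regex.append('.');  break;
                case '*': regex.append(".*"); break;
                // Every other character is matched literally, including regex metacharacters.
                default:  regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** True if the whole page name matches the glob. */
    static boolean matches(String glob, String pageName) {
        return compile(glob).matcher(pageName).matches();
    }
}
```

Whether this brings real value without boolean operators is exactly the open question above.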

Show description of page contents on search#

The search page should show an excerpt of the page contents, so that it would be easier to figure out whether this is the page you want.

--JanneJalkanen, 03-Jan-2003.

I've started working on something like this. --KenLiu

I actually have it working already and could contribute it this week. However, I want it to be in line with the ongoing search enhancement efforts. Where can I see the latest proposals and contact the people working on it? --AlexPakka, 08-May-2005

Alex, do you have a patch somewhere, or can you attach it to the wiki? I would like to check it out.

-- Arent-Jan
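
For reference, the excerpt idea from the start of this thread can be illustrated with a few lines of plain Java. This is not AlexPakka's or KenLiu's implementation, just a minimal sketch; `SearchExcerpt` and the `radius` parameter are hypothetical.

```java
import java.util.Locale;

/** Hypothetical excerpt helper; not part of JSPWiki. */
final class SearchExcerpt {

    /**
     * Returns the first case-insensitive occurrence of term together with
     * radius characters of context on each side. If the term does not occur,
     * the first 2*radius characters of the text are returned instead.
     * Caveat: assumes lower-casing does not change the string length.
     */
    static String excerpt(String text, String term, int radius) {
        int hit = text.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        int start = hit < 0 ? 0 : Math.max(0, hit - radius);
        int end = hit < 0 ? Math.min(text.length(), 2 * radius)
                          : Math.min(text.length(), hit + term.length() + radius);
        // Collapse line breaks and runs of spaces so the excerpt fits on one line.
        String body = text.substring(start, end).replaceAll("\\s+", " ").trim();
        return (start > 0 ? "..." : "") + body + (end < text.length() ? "..." : "");
    }
}
```

A real implementation would presumably strip wiki markup first and might prefer the highlighting support of the search engine in use.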


Refactoring Search#

I refactored the Wiki search to have separate search providers. This makes it easier to extend the Wiki search functionality. The patch has been mailed to the user list; if needed I can attach it here somewhere.

-- Arent-Jan Banck


Searching attachment content#

For searching attachment content (this might also be used to do diffs on the attachment history) I would like to create some attachment extractors. Any suggestions for the structure? I could add an entry to jspwiki.properties for every attachment type. -- Arent-Jan

I think we should rely on Lucene Documents here, and make a MIME-type -based extractor. Maybe add the stuff to a search.xml config file:

   ...
   <luceneindexer>
       <pattern>*.pdf</pattern>
       <document>org.foo.LucenePDFDocument</document>
   </luceneindexer>

or something...

-- JanneJalkanen
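
Note that search.xml and the luceneindexer element above are only a proposal. As a rough illustration of what such a pattern-to-extractor mapping could do once loaded, here is a sketch that keys extractors on "*.ext" filename patterns as in the example (a MIME-type key would work the same way). `ExtractorRegistry` and `TextExtractor` are hypothetical names and do not exist in JSPWiki or Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry mapping filename patterns to text extractors. */
final class ExtractorRegistry {

    /** Hypothetical extractor contract: raw attachment bytes -> indexable text. */
    interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> bySuffix = new LinkedHashMap<>();

    /** Registers an extractor for a "*.ext" pattern, as in the pattern element above. */
    void register(String pattern, TextExtractor extractor) {
        if (!pattern.startsWith("*.") || pattern.length() < 3) {
            throw new IllegalArgumentException("Only *.ext patterns are supported: " + pattern);
        }
        bySuffix.put(pattern.substring(1).toLowerCase(Locale.ROOT), extractor);
    }

    /** Finds the first registered extractor whose suffix matches the filename. */
    Optional<TextExtractor> lookup(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, TextExtractor> e : bySuffix.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }
}
```

A PDF or Word extractor would then be one more `register` call wrapping whatever library does the actual parsing.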

Q: Does anybody have an example for the luceneindexer, and where do I have to place the search.xml config file?


I created a plugin to search attachment content in MS Word, MS Excel, MS Outlook, MS PowerPoint, html/rtf/text and PDF formats. I am trying to upload it, but there is an upload limit. The code is based on JSPWiki 2.2.27 and could use some refactoring. Using: PDFBox (PDFBox-0.7.1.jar) and Jakarta POI from CVS, as the code is not released.

-- Arent-Jan

Hello - here's an example of how to integrate the MS Word attachment searching yourself. This was tested with Jakarta POI from CVS scratchpad 3.0alpha1 and JSPWiki 2.2.33. In the LuceneSearchProvider.java file, insert the following code into the getAttachmentContent( Attachment att ) method:

...
        if(filename.endsWith(".txt") ||
           filename.endsWith(".xml") ||
           filename.endsWith(".ini") ||
           filename.endsWith(".html"))
        {
...
        }
        else if (filename.endsWith(".doc")) {
            InputStream attStream = null;
            try 
            {
                attStream = mgr.getAttachmentStream( att );
                
                HWPFDocument doc = new HWPFDocument(attStream);
                Iterator it = doc.getTextTable().getTextPieces().iterator();
                StringBuffer sb = new StringBuffer(doc.characterLength());
                while(it.hasNext()) {
                    TextPiece tp = (TextPiece)it.next();
                    sb.append(tp.getStringBuffer());
                }
                String s = sb.toString();
                log.debug("Extracted text: " + s + " from attachment: " + filename);
                return s;
            }
            catch (Exception e) 
            {
                log.error("Attachment cannot be loaded", e);
                return null;
            }
            finally {
                if (attStream != null) {
                    try {
                        attStream.close();
                    }
                    catch (IOException e) {
                        log.warn("Couldn't close attachment stream for " + filename, e);
                    }
                }
            }
        }
...

-- Leonardo Graf

---Q: I put these lines in LuceneSearchProvider.java and added the required hwpf package to jspwiki.lib, but I cannot compile the whole thing with ant. Any hint would be great. Thanks.


I tried the above lines for LuceneSearchProvider, and it seems to compile fine with ant. The only gotcha is the JAR signing.

-- Bill Schneider


Maybe Apache Nutch can be used to enhance the search. Nutch is open source web-search software that builds on Lucene Java, adding web-specifics such as a crawler, a link-graph database, and parsers for HTML and other document formats. Maybe this can be used for search results presentation and attachment search. http://lucene.apache.org/nutch/

-- Arent-Jan


Support search results as XML, so results can be aggregated over multiple wikis. For this OpenSearch might be used.

http://opensearch.a9.com/ Something like this seems useful for the wikis: the search page can be more loosely coupled, and things like multi-wiki search become easier. A disadvantage might be that the search results are more display-oriented.

-- Arent-Jan


Category Ideas

Hi all, I have modified Search.jsp. It finds the pages containing the query. If no page is present with that name, it gives a link to create that page.

Changes made: a new boolean variable named check is added

    boolean check;
    String query = request.getParameter( "query" );
    if( query != null )
    {
        log.info("Searching for string "+query);

        try
        {
            list = wiki.findPages( query );
            check = list.isEmpty();   // true when the search found nothing
            pageContext.setAttribute( "searchresults",
                                      list,
                                      PageContext.REQUEST_SCOPE );
            if( check )
            {
                // NOTE: query is written out unescaped here; it should be
                // HTML-escaped and URL-encoded before being echoed back.
                out.println("<font color=red>No page found with the name "+query+
                            " <a href=\"Edit.jsp?page="+query+"\">create this page</a></font>");
            }
    ...

-- Harish Kumar Panda


Taking Advantage of Lucene's Advanced Search Syntax#

If there's anything approaching an industry standard in the open source search world, it's the full text search engine used in JSPWiki, Apache Lucene. Lucene has a very powerful search syntax of its own that you can type directly in the JSPWiki search box.

Terms are single strings of characters, phrases can include whitespace and are surrounded by double-quotes. Searching on a phrase can sometimes provide more exact matching. For wildcards, you can use the "?" to match a single character or "*" to match any string of characters, e.g.:

   D??tchland
or
   Deutch*
You can use logical operators "AND", "OR", and "NOT" (which must appear in uppercase to be recognized), you can require or prohibit a term or phrase by preceding it with a "+" or "-" (respectively), and group boolean statements using parentheses "(" and ")", e.g.:
   ( "ford theater" NOT ( auto OR car OR motor ) ) AND lincoln -continental
You can do fuzzy searching, fuzzily matching "Jalkanen" as "jalkennen~" (note the trailing tilde '~' character). If you're getting too many results you can even control the amount of fuzziness by adding a decimal number between 0.0 and 1.0 after the tilde, e.g.:
   jalkennen~0.7
where the closer to 1.0 the stricter the required match, with 0.0 effectively returning everything (so 0.1 is a reasonable minimum) — 0.5 is the default used if you don't provide a value.

For more details see the Lucene query parser syntax page.

-- MurrayAltheim


Renaming Page leads to search issue on attachments #

I used PDFBox and Apache POI to feed PDF and Office files to the search engine, and it's working now. The only problem is that when I rename a page, all the attachments of the page are no longer searchable until I upload them again...

--IceCool, 04-Dec-2009 07:16

« This page (revision-48) was last changed on 04-Dec-2009 07:19 by IceCool