I initially sent this as an email to the group, but thought it might serve better on the weblog…
I’ve been playing around with eXist today. Holy crap.
I used Rob’s JUD export script to suck all 3600+ records out of the CAREO JUD (that took almost 2 hours), then ran eXist’s import function on them (that took maybe 5 minutes for the lot).
It looks like it’s going to be able to do some pretty freaky stuff, search-wise. I’ve been trying some fairly loose XPath queries, and it returns excellent hits, pretty darned fast. It can be slow if I request, say, all documents with the letter “a” in them somewhere, but for normal queries, it’s stinky fast.
Even fairly complex compound queries come back fast, too.
Here’s an example:
document(*)//text() &= '*image* *biology* *water*'
This basically says: return any XML document that contains, somewhere in its various elements, the strings “image”, “biology”, and “water”.
It might match “image” in /lom/technical/format, “biology” in /lom/classification/keywords/langstring, and “water” in /lom/general/description/langstring.
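To make that matching rule concrete, here’s a rough Python sketch of the semantics as I read them: a document is a hit if every wildcard term matches at least one text node, and each term can match a different element. This is only an illustration with a made-up, heavily trimmed LOM-style record; it is not how eXist implements `&=`.

```python
import fnmatch
import xml.etree.ElementTree as ET


def text_nodes(root):
    """Yield each non-blank text chunk in the tree, roughly what //text() selects."""
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            if chunk and chunk.strip():
                yield chunk.strip()


def matches_all(doc_xml, query):
    """True if every wildcard term in `query` matches at least one text node.

    Different terms may match different elements. Matching is
    case-insensitive, and `*` is a wildcard (handled by fnmatch).
    """
    texts = [t.lower() for t in text_nodes(ET.fromstring(doc_xml))]
    terms = query.lower().split()
    return all(any(fnmatch.fnmatchcase(t, term) for t in texts) for term in terms)


# Hypothetical record: each search term lives in a different element.
record = """<lom>
  <general><description><langstring>Water cycle diagram</langstring></description></general>
  <technical><format>image/jpeg</format></technical>
  <classification><keywords><langstring>Biology</langstring></keywords></classification>
</lom>"""

print(matches_all(record, "*image* *biology* *water*"))  # True
print(matches_all(record, "*image* *chemistry*"))        # False
```

This sketch scans every text node of every document; a real database would presumably answer the same question from a full-text index instead, which is a big part of why the queries come back so quickly.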
This particular search returned 60 hits in 638 ms of total processing, and that’s without my having added any indexing.
I did another search for:
document(*)//text() &= "*biology* *video*"
and it found records that would otherwise have been hard to identify as videos (their technical/location value contained “/VIDEOS/”, so they matched).
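One plausible explanation for that /VIDEOS/ hit, sketched in Python: assume the full-text index splits each value on non-alphanumeric characters and lowercases the pieces, so a location path yields a token “videos” that the wildcard `*video*` matches. The tokenizing rule and the example URL below are my own assumptions for illustration; eXist’s real tokenizer may differ in the details.

```python
import fnmatch
import re


def tokenize(text):
    """Split on runs of non-alphanumeric characters and lowercase (assumed indexer rule)."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def term_matches(text, term):
    """True if the wildcard `term` matches any token produced from `text`."""
    pattern = term.lower()
    return any(fnmatch.fnmatchcase(tok, pattern) for tok in tokenize(text))


# Hypothetical technical/location value.
location = "http://example.org/media/VIDEOS/cell-division.mov"

print(tokenize(location))
# ['http', 'example', 'org', 'media', 'videos', 'cell', 'division', 'mov']
print(term_matches(location, "*video*"))    # True: token "videos" matches
print(term_matches(location, "*biology*"))  # False
```

Under this assumed token-level rule, a bare `video` would not match “videos”; the surrounding `*` wildcards are what let it through.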
Also, it seems to cache search results on the fly, so subsequent searches for the same thing return instantly. Very nice.
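I haven’t confirmed how eXist actually does its caching, but the general idea of caching results per query string looks something like this minimal Python sketch, using the standard library’s `functools.lru_cache` over a hypothetical in-memory corpus:

```python
import functools

# Hypothetical corpus standing in for the database: record id -> text content.
CORPUS = {
    "rec1": "water cycle image biology",
    "rec2": "cell division video biology",
}


@functools.lru_cache(maxsize=128)
def search(query):
    """Return ids of records containing every term; results are cached per query string."""
    terms = query.lower().split()
    # Return a tuple so the cached result can't be mutated by a caller.
    return tuple(rid for rid, text in CORPUS.items() if all(t in text for t in terms))


print(search("biology video"))  # ('rec2',) computed on the first call
print(search("biology video"))  # ('rec2',) served from the cache
print(search.cache_info())      # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```

A naive cache like this goes stale as soon as documents change, so a real database has to invalidate cached results on updates; repeat searches for the same thing are where it pays off.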