I’ve been playing around with interim builds of XStreamDB 3.1 Beta, and it’s coming along REALLY nicely. It’s pretty cool when the president of the company is the guy running the beta program. Jim’s been awesome, feeding me tips and pointers about the new stuff.
They’ve just added (or enhanced) scoring of full-text queries, so results can be sorted by descending relevancy to a query. It’s freakin’ fast, too. I’ve loaded 3,734 XML records from CAREO into XStreamDB. To keep the playing field level, XStreamDB is running on the CAREO server itself, a G4/500 desktop box that isn’t fast by any stretch of the imagination. Scored and sorted queries are returning results in well under a second. Great stuff.
I’ve even got it doing the processing to pull out just a few elements (title, description, etc.) rather than the whole LOM. Pulling out the whole LOM doesn’t seem to take any more time, either, which is cool. I prefer these mini proxy results for now because they’re easier to read in the results listing, but if the EOAdaptor needs the full record, switching is trivial.
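For comparison, here’s roughly what building a mini proxy looks like outside the database: a minimal Python sketch using the standard library’s ElementTree, following the same element paths the query below uses. The sample record and its namespace URI (`urn:example:lom`) are made up for illustration, and `mini_proxy` / `PROXY_PATHS` are hypothetical names, not part of XStreamDB.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Same element paths as the XStreamDB query. "{*}" matches a tag in any
# namespace (Python 3.8+), much like the "*:" wildcard in the query.
PROXY_PATHS = {
    "title": ".//{*}general/{*}title/{*}langstring",
    "description": ".//{*}general/{*}description/{*}langstring",
    "location": ".//{*}technical/{*}location",
    "format": ".//{*}technical/{*}format",
}


def mini_proxy(lom_xml: str) -> Dict[str, Optional[str]]:
    """Pull a few display fields out of a full LOM record (first match only)."""
    root = ET.fromstring(lom_xml)
    proxy: Dict[str, Optional[str]] = {}
    for field, path in PROXY_PATHS.items():
        element = root.find(path)  # first matching element, or None
        text = element.text if element is not None else None
        proxy[field] = text.strip() if text else None
    return proxy


# Hypothetical, heavily trimmed LOM record; the namespace URI is made up.
SAMPLE_LOM = """<lom xmlns="urn:example:lom">
  <general>
    <title><langstring>The Earth and Moon Viewer</langstring></title>
    <description><langstring>Earth and moon imagery from a variety of viewpoints.</langstring></description>
  </general>
  <technical>
    <format>text/html</format>
    <location>http://www.fourmilab.ch/earthview/vplanet.html</location>
  </technical>
</lom>"""

print(mini_proxy(SAMPLE_LOM))
```

One difference worth noting: `find` returns only the first match, while the XQuery paths return every matching text node (for example, several `langstring` translations).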
The cool thing about the relevancy ranking is that it generates a float value from 0.000000 (completely irrelevant) to 1.000000 (completely relevant). Lots of room in there for subtle variation in relevancy.
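I don’t know how XStreamDB computes `SCORE` internally, so the sketch below swaps in a standard technique, cosine similarity over term counts, purely to show how a 0-to-1 relevancy value can arise and how results get sorted by descending relevancy. The tokenizer is deliberately naive, and the document IDs and texts are hypothetical. (In standard XQuery, an `order by` clause sorts ascending unless `descending` is given, so the direction is worth double-checking in whatever dialect you’re using.)

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def term_counts(text: str) -> Counter:
    # Naive tokenizer: lowercase alphanumeric runs, no stemming or stop words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_score(query: str, document: str) -> float:
    """Cosine similarity of term-count vectors; always in [0.0, 1.0]."""
    q, d = term_counts(query), term_counts(document)
    dot = sum(count * d[term] for term, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values()) * sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Score (docid, text) pairs and sort them by descending relevancy."""
    scored = [(docid, cosine_score(query, text)) for docid, text in docs]
    # reverse=True gives descending order; Python's sort stays stable,
    # so documents with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, for illustration only.
docs = [
    ("doc-a", "Lunar surface photographs"),
    ("doc-b", "The Earth and Moon Viewer: earth image generator"),
    ("doc-c", "Earth science lesson plans"),
]
for docid, score in rank("earth image", docs):
    print(f"{docid} {score:.6f}")
```

Because term counts are never negative, the score can’t drop below 0.0, and it only reaches 1.0 when the document’s term distribution is proportional to the query’s.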
Here’s the query I’m running right now:
FOR $record IN (Root("Apollo:Metadata"))
LET $score := SCORE $record USING [//* CONTAINS "earth image"]
RETURN
LET $title := $record//*:general/*:title/*:langstring/text()
LET $location := $record//*:technical/*:location/text()
LET $format := $record//*:technical/*:format/text()
LET $description := $record//*:general/*:description/*:langstring/text()
LET $docid := GetDocId( $record )
ORDER BY $score
RETURN
<result>
<docid>
{$docid}
</docid>
<score>
{$score}
</score>
<title>
{$title}
</title>
<description>
{$description}
</description>
<location>
{$location}
</location>
<format>
{$format}
</format>
</result>
Which returns stuff like this:
<result>
<docid>
1$Apollo:Metadata$1-3438-0
</docid>
<score>
0.95458674
</score>
<title>
The Earth and Moon Viewer
</title>
<description>
This website allows you access to earth and moon imagery from a
variety of viewpoints. You can view either a map of the Earth
showing the day and night regions at this moment, or view the
Earth from the Sun, the Moon, the night side of the Earth, above
any location on the planet specified by latitude, longitude and
altitude, from a satellite in Earth orbit, or above various
cities around the globe. Images can be generated based on a
full-colour image of the Earth by day and night, a topographical
map of the Earth, up-to-date weather satellite imagery, or a
composite image of cloud cover superimposed on a map of the
Earth, a colour composite which shows clouds, land and sea
temperatures, and ice, or the global distribution of water
vapour. Expert mode allows you additional control over the
generation of the image. You can compose a custom request with
frequently-used parameters and save it as a hotlist or bookmark
item in your browser.
</description>
<location>
http://www.fourmilab.ch/earthview/vplanet.html
</location>
<format>
text/html
</format>
</result>
Compare that to the shorter query that just pulls the whole LOM:
FOR $record IN (Root("Apollo:Metadata"))
LET $score := SCORE $record USING [//* CONTAINS "earth image"]
ORDER BY $score
RETURN $record
Which returns the whole freakin’ LOM document.
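The mini proxy `<result>` elements from the first query are also easy to consume on the client side. Here’s a minimal Python sketch, assuming each `<result>` arrives as a standalone XML string (how XStreamDB actually hands results to a client isn’t covered here). `parse_result` is a hypothetical helper name, and the sample is the result shown above with the description left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def parse_result(result_xml: str) -> Dict[str, Union[str, float]]:
    """Turn one <result> element from the proxy query into a dict."""
    root = ET.fromstring(result_xml)
    # Each value sits on its own line inside its element, so strip the padding.
    fields: Dict[str, Union[str, float]] = {
        child.tag: (child.text or "").strip() for child in root
    }
    # Scores come back as text; convert so results can be compared or re-sorted.
    fields["score"] = float(fields["score"])
    return fields


# The result shown above, with <description> omitted for brevity.
SAMPLE_RESULT = """<result>
<docid>
1$Apollo:Metadata$1-3438-0
</docid>
<score>
0.95458674
</score>
<title>
The Earth and Moon Viewer
</title>
<location>
http://www.fourmilab.ch/earthview/vplanet.html
</location>
<format>
text/html
</format>
</result>"""

print(parse_result(SAMPLE_RESULT))
```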