Searching (with reference to schema)

vilm0001@infoeng.flinders.edu.au (by way of yuri )

2002-08-30 22:15:14 UTC

To Ryan, regarding fulltext indexes:
The problem with SQL fulltext index searching (mine, Postgres, MS etc.) is
that it is very, very slow. My proposal wasn't just for the idea of having a
fulltext index, but rather an outline for a fast implementation optimised for
this database (or any database where records have a fixed-width primary key);
read the spec I wrote ;). I've gotten my copy of the existing Perl fulltext
search (in the CVS tree) working, and it's well done, but still a ways off
optimal speedwise (and it generates huge index files, as they are essentially
plain text).

I'll be putting a prototype of my indexer and searcher together over the
weekend (based on the spec I outlined on the dev board), but I don't have the
net after tomorrow, so I may have to upload it from a friend's place midweek,
or next weekend, if the friend in question isn't available :) ...

Anyway, I'll treat characters as 32-bit, to make it UTF-8 friendly. Words
will be delimited by whitespace or punctuation, except for words with an
apostrophe, which will be keyed twice: once as the word with the apostrophe
removed, and once as the longest piece between an apostrophe and whitespace.
The module will consist of C functions which do the actual dirty work, and a
command-line front end to test them. This will make "gluing" the functions
into the database software easier, especially if freedb moves to another
database format.
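
To make the keying rule above concrete, here is a rough sketch in C. It is
only an illustration under my own assumptions, not the prototype itself: the
names (cp_t, key_fn, index_text, collect, MAX_WORD) are made up for the
example, the input is assumed to be already decoded into 32-bit code points,
only ASCII whitespace and punctuation count as delimiters, "the longest piece"
is read as the longest apostrophe-free fragment of the word, and no case
folding is done.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORD 64            /* hypothetical cap on key length */

typedef uint32_t cp_t;         /* one 32-bit code point per character */
typedef void (*key_fn)(const cp_t *key, size_t len, void *ctx);

/* ASCII whitespace and punctuation delimit words; the apostrophe does not.
   Non-ASCII code points are treated as word characters (an assumption). */
static int is_delim(cp_t c)
{
    return c < 0x80 && c != '\'' && !isalnum((int)c);
}

/* Emit the key(s) for one delimiter-free word. */
static void emit_word_keys(const cp_t *w, size_t n, key_fn emit, void *ctx)
{
    cp_t stripped[MAX_WORD];
    size_t slen = 0, best = 0, best_len = 0, frag = 0, i;

    for (i = 0; i <= n; i++) {
        if (i == n || w[i] == '\'') {
            /* A fragment w[frag..i) just ended; remember the longest. */
            if (i - frag > best_len) { best = frag; best_len = i - frag; }
            frag = i + 1;
        } else if (slen < MAX_WORD) {
            stripped[slen++] = w[i];   /* copy without apostrophes */
        }
    }
    if (slen == 0)
        return;                        /* word was only apostrophes */
    emit(stripped, slen, ctx);         /* key 1: apostrophes removed */
    /* Key 2 only when an apostrophe actually split the word. */
    if (best_len < slen)
        emit(w + best, best_len, ctx);
}

/* Split text into words and pass every key to emit(). */
void index_text(const cp_t *text, size_t n, key_fn emit, void *ctx)
{
    size_t i = 0;
    assert(text != NULL && emit != NULL);
    while (i < n) {
        size_t start;
        while (i < n && is_delim(text[i])) i++;
        start = i;
        while (i < n && !is_delim(text[i])) i++;
        if (i > start)
            emit_word_keys(text + start, i - start, emit, ctx);
    }
}

/* Demo collector: stores up to 8 keys for inspection. */
struct keylist { cp_t key[8][MAX_WORD]; size_t len[8]; size_t count; };

static void collect(const cp_t *key, size_t len, void *ctx)
{
    struct keylist *kl = ctx;
    if (kl->count < 8) {
        memcpy(kl->key[kl->count], key, len * sizeof *key);
        kl->len[kl->count++] = len;
    }
}
```

A command-line front end would decode its input to code points, call
index_text() with a callback like collect(), and print or store the keys.
Under these assumptions the text "Don't panic!" yields three keys: "Dont",
"Don" and "panic". Keeping the callback separate from the tokenizer is one way
to keep the "dirty work" functions independent of whatever storage sits behind
them.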

My next post will be either an announcement or a rather abashed apology ;)

cya -Yuri