Searching PDF with ht://Dig


I've just enabled indexing and searching of .pdf documents on the Learning Commons website.

We're using ht://Dig as our search engine, and it's quite flexible. It accepts external parsers that teach it to read file formats other than plain text. Parsers are available for .rtf, .pdf, .ps, .doc, .swf, .xls, and even .ppt files.

For now, I've only added the .pdf parser, using the Xpdf library. There was no binary available for Mac OS X, so I had to compile it from source. Here's a link to the compiled binaries for Mac OS X (compiled without support for the X11 windowing system, so these are just the command-line utilities). Just drop them in /usr/local/bin and enjoy!
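
If you want to wire this up yourself, here's a rough sketch of how the pieces might fit together. It isn't necessarily the exact setup running on this site. As I understand the ht://Dig documentation, the external_parsers attribute in htdig.conf maps a content type to a program, and a "type->type" entry marks that program as a converter whose output goes to stdout. ht://Dig calls the program with the file path, content type, URL, and config file as arguments, so pdftotext needs a small wrapper. The script name pdf2text.py and the config line in the docstring are my own placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper (pdf2text.py) letting ht://Dig use pdftotext as an
external converter.

Assumed htdig.conf entry (script name and path are illustrative):

    external_parsers: application/pdf->text/plain /usr/local/bin/pdf2text.py
"""
import subprocess
import sys

# Where the Xpdf command-line utilities were dropped.
PDFTOTEXT = "/usr/local/bin/pdftotext"


def build_command(pdf_path, pdftotext=PDFTOTEXT):
    """Return the pdftotext command line; '-' sends the extracted text to stdout."""
    return [pdftotext, pdf_path, "-"]


def main(argv):
    # ht://Dig is expected to pass: <file> <content-type> <url> <config-file>.
    # Only the file path is needed here; the rest is ignored.
    return subprocess.call(build_command(argv[1]))


# Only run when invoked with arguments, as ht://Dig would do.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

PDFs tend to be much larger than HTML pages, so it's probably also worth checking that max_doc_size in htdig.conf is high enough that documents aren't cut off before they reach the converter.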


