I've created some bindings around Strigi (a desktop search engine for KDE). I've wrapped the libstreams part of the engine, which exposes metadata extraction in a streaming fashion (i.e. you can extract metadata from files deep inside an archive without extracting each file). The project is an extensive set of bindings around Strigi's Streams and StreamAnalyzer libraries (http://strigi.sourceforge.net/).
At this stage there is only one language implementation, but the intention is that more language bindings can easily be created.
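To give a feel for what "streaming" means here, below is a minimal sketch that uses only Python's standard `tarfile` module. It is not the python-streamanalyzer API (which isn't shown in this post) and it doesn't touch Strigi at all; the names `walk_metadata` and `build_tar` are made up for illustration. It just shows the general idea of reading an archive front to back and descending into a nested archive without unpacking anything to disk.

```python
import io
import tarfile


def walk_metadata(fileobj, prefix=""):
    """Yield (path, size) for every regular file, descending into nested tars.

    The archive is opened in "r|" stream mode, so it is read strictly front
    to back and nothing is unpacked to disk.
    """
    with tarfile.open(fileobj=fileobj, mode="r|") as archive:
        for member in archive:
            if not member.isfile():
                continue
            path = prefix + member.name
            if member.name.endswith(".tar"):
                # Hand the still-packed member straight to a nested reader.
                yield from walk_metadata(archive.extractfile(member), path + "/")
            else:
                yield path, member.size


def build_tar(entries):
    """Illustrative helper: pack {name: bytes} into an in-memory tar."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

For example, wrapping `build_tar({"inner.tar": build_tar({"note.txt": b"hello"})})` in an `io.BytesIO` and passing it to `walk_metadata` would report the file inside the inner archive without ever writing `inner.tar` out. Real analyzers extract far richer metadata than a name and a size, of course; the point is only the access pattern.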
I'm hoping that between these two bindings (python-lucene++ and python-streamanalyzer) a really innovative Python-based desktop search engine can be created. I believe that 90% of the hard work and
performance of a desktop search engine is tied up in the areas that these libraries handle best. I really believe desktop search has become stale and we still REALLY lack something good (I'm still using grep
and locate!). Perhaps these libraries will allow someone who isn't caught up in the nitty-gritty of implementing metadata extraction and indexing to come in and blow us all away...
- see http://www.github.com/ustramooner/python-streamanalyzer