Friday, January 6, 2012

Python wrappers for Strigi Stream Analyzers

I've created some bindings around Strigi (a desktop search engine for KDE). I've wrapped the libstreams part of the engine, which exposes metadata extraction in a streaming way (i.e. you can extract metadata from files deep in an archive without extracting each file). The project is an extensive set of bindings around Strigi's Streams and StreamAnalyzer libraries (http://strigi.sourceforge.net/).

At this stage there is only one language implementation (Python), but the intention is that more language bindings can easily be created.

I'm hoping that between these two bindings (python-lucene++ and python-streamanalyzer) a really innovative Python-based desktop search engine can be created. I believe that 90% of the hard work and performance of a desktop search engine is tied up in the areas that these libraries do best. I really believe desktop search has become stale and we REALLY still lack something good (I'm still using grep and locate!!!!). Perhaps these libraries will allow someone who isn't caught up in the nitty-gritty of implementing metadata extraction and indexing to come in and blow us all away...

- see http://www.github.com/ustramooner/python-streamanalyzer

Multiple Git Staging Areas

I often get stuck when I've done a whole bunch of work on a branch and want to split it up neatly into separate commits based on functionality.

So I wrote a bash script which allows you to have multiple staging areas. WARNING: This is beta software. I use it for my own use, but please be careful. I cannot take any responsibility!!!

To continue or create a stage:
# gitstage [staging area name]
This puts an existing stage into the index (if the stage exists) and then executes git gui.

To list the current staging areas:
# gitstage

The workflow scenario I use is generally something like this:
1. Run git gui
2. Realise that I need to split the commit into 2... duhhhh
3. Close git gui without committing
4. Run gitstage commit1, which creates a staging area called 'commit1' and then opens git gui again
5. Make a note of what I've staged so far in the git gui Commit Message textbox
6. Close git gui.
- gitstage now creates a commit of the currently staged index
- gitstage also puts any message you were writing in git gui in .git/gitstage/commit1.msg
7. Run gitstage commit2, which creates another staging area and opens git gui.
8. Add other files/hunks/lines to the new staging area in the gui
9. Repeat steps 4-8 multiple times. Each time you swap, the messages you have written are kept
10. Committing: click commit in the gui and close the gui
- this then removes the staging area
- commit any other remaining staging areas
- end of workflow

The approach I originally took was to store the staged commits as patches, but that means the working directory gets modified, which I didn't like. The new approach actually commits the staged changes and tags that commit with stage-[name], so every stage is simply a tag with the stage- prefix.
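To make the commit-and-tag idea a bit more concrete, here is a minimal sketch of how a stage could be parked and listed. It is not taken from the gitstage script: the function names (stage_tag, stage_msg_file, park_stage, list_stages) and the fallback commit message are my own assumptions, and it assumes you run it from the top of the repository. Only the stage- prefix and the .git/gitstage/[name].msg path come from the description above.

```shell
# Hypothetical helpers, not the real gitstage code.

# Tag name for a stage: stages are tagged stage-[name].
stage_tag() {
    printf 'stage-%s\n' "$1"
}

# Where the saved commit message for a stage lives
# (path relative to the repository root).
stage_msg_file() {
    printf '.git/gitstage/%s.msg\n' "$1"
}

# Commit whatever is currently in the index and tag that commit as a stage.
# Only the index is committed, so the working directory is left alone.
park_stage() {
    local name=$1
    local msg_file
    msg_file=$(stage_msg_file "$name")
    if [ -s "$msg_file" ]; then
        # Reuse the message saved from the gui, if there is one.
        git commit --quiet -F "$msg_file" || return 1
    else
        # Assumed placeholder message when nothing was saved.
        git commit --quiet -m "stage: $name" || return 1
    fi
    git tag -f "$(stage_tag "$name")" HEAD
}

# List stage names by stripping the prefix from the matching tags.
list_stages() {
    git tag -l 'stage-*' | sed 's/^stage-//'
}
```

With something like this, park_stage commit1 would turn the current index into a commit tagged stage-commit1, and list_stages would print commit1.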

To switch to a previous stage, the script resets back to before that stage's commit and then cherry-picks all the revisions after it, re-tagging the stages as it goes. Finally, the stage's own commit is cherry-picked into the index only.
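The effect of a switch on the order of the stages is easier to see with a tiny model. The sketch below does not run any git commands; switch_order is a made-up helper that only prints the order the stages end up in, assuming the stage you switch to is one of the names you pass in.

```shell
# Hypothetical model of a stage switch, not the real gitstage code.
# Usage: switch_order TARGET STAGE...
# Every other stage keeps its relative order as a commit; the chosen
# stage is printed last because it ends up in the index rather than
# as a commit.
switch_order() {
    local target=$1
    local stage
    shift
    for stage in "$@"; do
        if [ "$stage" != "$target" ]; then
            printf '%s\n' "$stage"
        fi
    done
    printf '%s\n' "$target"
}

# Stages were created in the order commit1, commit2, commit3;
# now switch back to commit1.
switch_order commit1 commit1 commit2 commit3
```

This prints commit2, commit3, commit1: the two later stages stay as tagged commits and commit1 is the one sitting in the index, ready to be edited in git gui.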

A few 'nice to haves':
  • It'd be nice to streamline this in an actual gui, but I like git gui and I don't have time so this works for me! :-).
  • A bunch of command line tools which do something similar would be good too.
You can download the script here: http://github.com/ustramooner/gitstage
Put it in ~/bin or in /usr/local/bin and make it executable.