Long road to DSearch, Part 1: out of nowhere and into the Jungle

My main day and night time job is developing DSearch, which is an attempt to create a competitive Web-scale search engine based on distributed computing principles, with the help of a community of people all around the world. This post is about the long road that led me to this project.

The first search engine that I wrote was in Perl, for my MBA database in late 1998, when I entertained a rather stupid idea of trying to get into the MBA programs at Harvard and Stanford. Naturally they rejected my application, which was fair enough, and I am glad I did not pursue that path anyway. As part of researching what it takes to get in, I wrote a small Perl database/search engine that let people submit their GMAT grades, in order to gather some statistical information on who gets into those Unis and who does not. The search engine itself was a pathetic attempt that used the “table scanning” approach: every query checks every record, which works okay at small data scales but becomes very wasteful as the number of documents increases.
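To make the cost of table scanning concrete, here is a minimal sketch in Python (not the original Perl, which is not shown in this post). The `Row` type, the `table_scan_search` function and the sample rows are all hypothetical and made up for illustration. The point is only that the number of rows examined always equals the size of the table, no matter how few rows match.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One record in the table (hypothetical layout, for illustration only)."""
    doc_id: int
    text: str


def table_scan_search(rows, term):
    """Return (ids of matching rows, number of rows examined).

    A table scan has no index to consult, so it must look at every row.
    """
    needle = term.lower()
    examined = 0
    hits = []
    for row in rows:
        examined += 1  # every row is touched, match or not
        if needle in row.text.lower():
            hits.append(row.doc_id)
    return hits, examined


# Made-up sample data, not taken from any real database.
rows = [
    Row(1, "red laptop bag"),
    Row(2, "blue desk lamp"),
    Row(3, "laptop cooling stand"),
]

hits, examined = table_scan_search(rows, "laptop")
print(hits, examined)
```

With three rows this is instant, but the work per query grows in step with the table, which is why the approach stops being acceptable once the row count gets large.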

The next chance to have a better shot at search engines came while I was working for Jungle.com, once a high-flying e-tailer in the UK in early 2000, which ultimately failed and was bought by an old-style company called Argos, which (along with blatant incompetence in business management) led to the complete destruction of the company, with scores of good people fired. The degree of technical incompetence is apparent even now: the address jungle.com fails, but www.jungle.com redirects to a shadow of what Jungle.com used to be. It is now a mere word in the URL path of Argos, a pathetic and bitter ending that really needs a dedicated post, which I will write at a later date.

Whilst at Jungle.com in late 2001/early 2002, we had to cope with the consequences of a failed project “implemented” by some clowns whose fishy name really puts otherwise yummy salmon in a bad light… The search engine they implemented to search over 500k products was as bad as my own first search engine, but this time it used J2EE, something that must have made it look more professional, and certainly a lot more expensive. Anyway, the search was still scanning the table, only this time the cost of doing so was huge because we had a lot of products. This pushed CPU usage on our 12-CPU Sun box pretty high, so a solution that does not suck was needed: in other words, the kind of solution you would want for yourself, not the kind of botched job that often gets done on fixed-fee IT contracts.

To Be Continued!


One Response to “Long road to DSearch, Part 1: out of nowhere and into the Jungle”

  1. DimPrawn Says:

    Interesting read Alex.

    Keep up the good work and spend your “downtime” blogging rather than pointless posts on CUK.

    :-)
