First post and a taste of things to come

For some time I have thought of starting a blog as a means of recording interesting finds, as well as venting some of the frustrations experienced in the process of building DSearch.

The problem I am currently working on is automatically determining the best recrawl rate for pages that generate dynamic content which is technically different every time they are requested, whether due to personalisation, hidden internals such as client-side web analytics, or simply because they are designed to look updated so that search engines recrawl them more often than is really needed. The solution requires an algorithm that is resistant to small changes on the page and can determine whether a substantial part of the page has changed. It should also be very fast, as we can’t spend too much time analysing a page, and it should take very little space… if that’s your cup of tea then stay tuned for updates!
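
To make the requirements above concrete, here is a minimal sketch of one well-known technique that fits them: a SimHash-style fingerprint compared by Hamming distance. This is an illustration only, not necessarily the approach DSearch uses. The function names, the 64-bit fingerprint size, the word-level tokenisation and the threshold of 8 differing bits are all assumptions chosen for the example.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash-style fingerprint of `text` (hypothetical helper).

    Each word votes +1 or -1 on every bit position according to its hash,
    so a handful of changed words rarely flips many bits of the result.
    """
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1

    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def substantially_changed(old_fp: int, new_fp: int, threshold: int = 8) -> bool:
    """Assumed rule: treat the page as changed if more than `threshold` bits differ."""
    return hamming_distance(old_fp, new_fp) > threshold
```

Under these assumptions a crawler would store only one 64-bit integer per page and compare it with the fingerprint of the freshly fetched copy, which keeps both the time and the space cost small. The threshold would need tuning against real pages.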

2 Responses to “First post and a taste of things to come”

  1. Jon Says:

    Maybe you should talk a little about the background of this project… it looks pretty interesting, but it’s tough to get a sense of its history.

  2. alexc Says:

    Good point, Jon. See the new post for the first bit of history!

