[Systers-dev] [Gsoc10] Improve Mailman Archive Access/Searching

kartik rustagi kashes911 at gmail.com
Mon Mar 22 04:23:13 PDT 2010


Hi,
I am a senior undergraduate pursuing my BE in Information Technology. I came
to know about Systers through a friend of mine, and I like the work you are
doing. I am interested in applying for a spot in this year's GSoC as a
student under Systers. I am interested in improving the Archive
Access/Searching project, specifically the back end. Regarding improved
indexing/searching:

   - Writing code for this from scratch would mean reinventing the
   wheel; very practical solutions are already available.
   - I looked at Lucene <http://lucene.apache.org/>, a Java library for
   indexing and searching, on top of which one can build applications such as
   recommendation engines and, of course, search. There is a Python port
   called PyLucene <http://lucene.apache.org/pylucene/>, but I am not sure
   whether it is as good as the original library.
   - An even better approach would be to skip Lucene and use Solr
   <http://lucene.apache.org/solr/>, which is built on top of Lucene. It is a
   full-fledged application server that accepts data over HTTP, which makes it
   language independent. It is already being used by reddit, Netflix, CNET,
   SlideShare, etc., which speaks for its scalability and reliability.
   - There are other alternatives such as Whoosh, but the above two are
   very well supported (Apache Software Foundation) and in my view are the
   best options.
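
To make the Solr point a bit more concrete, here is a minimal sketch in
Python of what talking to Solr over HTTP could look like. It is only an
illustration under assumptions: a Solr instance running locally on port 8983,
a core named "archive", and the field names "id", "subject" and "body" are
all hypothetical, and it uses the JSON update and select handlers found in
current Solr releases. Only the Python standard library is needed, so no
Java code has to be written on the Mailman side.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: local Solr instance with a core named "archive".
SOLR_BASE = "http://localhost:8983/solr/archive"


def build_update_request(messages, base_url=SOLR_BASE):
    """Build an HTTP POST that sends a list of message dicts to Solr as JSON."""
    body = json.dumps(messages).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_url(query, rows=10, base_url=SOLR_BASE):
    """Build the URL for a query against Solr's select handler."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url + "/select?" + params


def index_messages(messages, base_url=SOLR_BASE):
    """Send the messages to Solr; needs a running server."""
    with urllib.request.urlopen(build_update_request(messages, base_url)) as resp:
        return resp.status


def search(query, rows=10, base_url=SOLR_BASE):
    """Run a query and return the matching documents; needs a running server."""
    with urllib.request.urlopen(build_search_url(query, rows, base_url)) as resp:
        return json.load(resp)["response"]["docs"]
```

With a server running, indexing one archived post and searching for it
would be roughly index_messages([{"id": "msg-1", "subject": "Archive search",
"body": "..."}]) followed by search("subject:archive"). The same two HTTP
calls could be made from any other language, which is the main attraction
over binding to Lucene directly.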

Views/recommendations regarding these and other options are requested and
will be highly appreciated. As of now I am trying to set up my development
environment on Ubuntu 9.10 and am facing some issues due to a mix-up of
Python 2.6 and 2.5. I will also set up Solr on my system and try to
learn its semantics and maybe discover its underlying limitations (if any).

-- 
Regards
Kartik Rustagi

To contribute to this conversation, send mail to <systers-dev+accesssearch at systers.org>

