[Systers-dev] [Gsoc10] Improve Mailman Archive Access/Searching
kartik rustagi
kashes911 at gmail.com
Mon Mar 22 04:23:13 PDT 2010
Hi
I am a senior undergraduate pursuing my BE in Information Technology. I came
to know about Systers through a friend of mine and I liked the work you are
doing. I am interested in applying for a spot in this year's GSoC as a
student under Systers. I am interested in improving the Archive
Access/Searching project, specifically the back end. Regarding improved
indexing/searching:
- Writing code for this from scratch would mean reinventing the wheel;
very practical solutions are already available for this.
- I looked at Lucene <http://lucene.apache.org/>, which is a Java library
for indexing and searching. Applications such as recommendation engines
and, of course, search can be built on top of it. There is a Python port
called PyLucene <http://lucene.apache.org/pylucene/>, but I am not sure
whether it is as good as the original library.
- An even better approach would be to skip using Lucene directly and use
Solr <http://lucene.apache.org/solr/>, which is built on top of Lucene. It
is a full-fledged application server that accepts data over HTTP, which
makes it language independent. It is already being used by reddit,
Netflix, CNET, SlideShare, etc., which speaks for its scalability and
reliability.
- There are some other alternatives like swoosh, but the above two are
very well supported (Apache Software Foundation) and in my view are the
best options.
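
To illustrate the point about Solr accepting data over HTTP, here is a
minimal Python sketch of how an archived message could be added and searched
using only the standard library. The server URL, the field names (id,
subject, body) and the helper function names are assumptions made for
illustration; the real URL and fields depend on how Solr and its schema are
configured. The functions only build the requests, so nothing is sent until
the request is passed to urllib.request.urlopen against a running server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of a local Solr instance; adjust to the real deployment.
SOLR_URL = "http://localhost:8983/solr"


def build_add_request(fields):
    """Build (but do not send) an HTTP POST that adds one document.

    Uses Solr's XML update format: <add><doc><field name="...">...</field></doc></add>.
    The field names must match whatever the Solr schema defines.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    body = ET.tostring(add, encoding="utf-8")  # bytes, XML-escaped
    return Request(
        SOLR_URL + "/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def build_search_url(query, rows=10):
    """Build the GET URL for a search against Solr's select handler."""
    return SOLR_URL + "/select?" + urlencode({"q": query, "rows": rows})


# Hypothetical archived message, for illustration only.
request = build_add_request({
    "id": "msg-0001",
    "subject": "Improve Mailman Archive Access/Searching",
    "body": "Looking at Lucene and Solr for indexing.",
})
search_url = build_search_url("archive search")
```

Because both directions are plain HTTP, the same requests could be issued
from any language Mailman or its tooling is written in. Added documents
normally become searchable only after a commit is sent to the update handler.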
Views and recommendations regarding these and other options are requested and
will be highly appreciated. As of now I am trying to set up my development
environment on Ubuntu 9.10 and am facing some issues due to a mix-up of
Python 2.6 and 2.5. I will also set up Solr on my system and will try to
learn its semantics and maybe discover its underlying limitations (if any).
--
Regards
Kartik Rustagi
To contribute to this conversation, send mail to <systers-dev+accesssearch at systers.org>