[Systers-dev] GSOC Introduction

Ann-Marie Horcher horcheram at gmail.com
Wed Apr 7 06:09:54 PDT 2010


 Here is my proposal, which I have also submitted through Melange.  The link
to comment on it there is
http://socghop.appspot.com/gsoc/student_proposal/edit/google/gsoc2010/amhorcher/t127058400604

You need to have a GSOC profile to see it there, so I added it here.  I would
appreciate any comments or suggestions.

 A: Read everything in:
getting_started<http://systers.org/systers-dev/doku.php/getting_started>
Completed

B. Attempt to set up the required development/test/run environment.
Completed

   At least, download code from Launchpad, make a modification such
as adding a new empty file, or fixing one of the trial bugs, and
upload the branch. This requires a launchpad account and ssh keys.
If you are able to complete the entire installation, and end up with a
running installation please invite members of SYSTERS-DEV to your
mailing list - use your launchpad name for your mailing list.

 Note: we are happy to give you help with your bug fixing and with setting
up the development environment. Please subscribe to systers-dev at systers.org
(www.systers.org/mailman/listinfo/systers-dev) and ask questions there.
Questions asked as part of this application may not be seen and responded to
until the application deadline is passed, which is too late for you to make
your application better. Learning to ask questions of other developers is
part of the experience of being in an open source community. There are only
about a dozen developers on this list.

Development environment up and running.  Have downloaded the latest bug list
branch from launchpad to see how it works.   Used launchpad to create a test
bug.
To Apply answer the following:

 1. How can we reach you (email, IRC, etc.) if we have questions about your
application?

 email:  horcheram at gmail.com   Google chat:  horcheram at gmail.com

IRC: amhorcher

2. Which systers-mailman project are you applying for (please submit a
separate application for each project):

 User interface for searching the Mailman archives

Security model for Systers single sign-on and permission management

There are references on the web of attempts to integrate a search interface
with mailman archives.  The one that "works" is the use of a Googlebot that
crawls the posts.  This is not acceptable for the Systers archives, because
the content is confidential.  In addition, there can be weeks of delay
before the postings are indexed.

There are suggestions to integrate various search engines such as Swish-e,
Whoosh, and Python Search Engines in the literature.   The search engine
provides the backend, but the user interface is equally critical.   A
well-designed interface unlocks content that would stay hidden.   If an
interface is too hard to manipulate, a user will give up and make do without
helpful content.

2a. What do you plan to accomplish over this summer for this project?
(please tell us what part of a larger project you want to work on – e.g., “I
want to work on the search portion of the archives project” –, how you will
approach that project portion, and what your milestones are. You may want to
ask for help from the systers-dev at systers.org list if you have not created
project milestones before, or if you are unsure what is realistic to
accomplish. GSOC divides the summer in half, and at the midpoint, you are
paid if you are reasonably close to the milestones you proposed to reach by
then, and again for meeting your milestones at the end of the summer. If
your project proposal is accepted, you will have a chance to work with your
mentors to revise these milestones, and we always take into account
“unforeseen circumstances”, such as discovering that the code you were going
to build on top of is incompatible with mailman. However, being able to
realistically estimate how much you will be able to accomplish is an
important part of this proposal.)

Phase 1: Preparation (May thru Mid-June)

In this phase I would be learning and refining the skills needed to
accomplish the project.  Requirements for the project would be
established with the project mentor, and more detailed milestones created.

1. Learn Python - General syntax, examples from the Systers code, commonly
used routines within the systers code, testing and debugging, APIs. This
goal is common for most of the developers, and it might be good to have a
webinar to build community, and share the learning.  PyGreSQL is a
Python-based interface to PostgreSQL, and provides interfaces to query from a
Python script.  This would be a good place to start on creating a backend.

2. Gather requirements for the UI - screen content, functionality,
performance.  Most commonly searched fields are at the top of the screen.
Where there is a limited set of values for search field, a drop-down is
appropriate.  Where the value is a date, an appropriate calendar style
prompt should be created.

3. Create a requirements document, using either a template mandated by an
admin or a generalized format:
http://www.jiludwig.com/templates/FRDTemplate.doc

The requirements document will be the guide for creating the design
document, and setting a reasonable scope for the project.   Due to the time
constraints, the design would build on already present modules or
capabilities.

4. Learn the change control procedure - Launchpad, how to move to the
development environment, where to retrieve/store use cases, how to track
completion of testing

** with multiple students change control may be critical, or multiple dev
environments may be set up.

5. Determine expectations for Documentation -  Documentation is easiest to
write when the knowledge is fresh and expectations are clear.  I would use
previous documentation as a model for what is expected.

6. Design document - includes evaluation of query engines, choice of query
engine

Deliverables for Phase 1: Requirements document, working Python calls,
queries to the PostgreSQL database
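
As an illustration of the "working Python calls" deliverable, here is a
minimal sketch of two backend queries a search screen could use: one that
fills a drop-down with the limited set of values for a field, and one that
serves a date-range prompt. The table name, column names, and sample rows
are assumptions made up for this example. It uses Python's built-in sqlite3
module only so that it runs without a database server; with PyGreSQL the
same connect/execute/fetch pattern would be pointed at PostgreSQL, and the
placeholder syntax may differ.

```python
import sqlite3

# Stand-in for the real PostgreSQL connection (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, sent TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        ("alice", "2010-03-28", "Archive search ideas"),
        ("bob", "2010-04-05", "GSOC Introduction"),
        ("alice", "2010-04-07", "GSOC Introduction"),
    ],
)


def dropdown_values(conn):
    """Return the distinct authors, suitable for filling a drop-down."""
    cur = conn.execute("SELECT DISTINCT author FROM posts ORDER BY author")
    return [row[0] for row in cur]


def posts_between(conn, start, end):
    """Return (author, subject) for posts dated within [start, end].

    Dates are ISO strings (YYYY-MM-DD), so text comparison orders them.
    """
    cur = conn.execute(
        "SELECT author, subject FROM posts "
        "WHERE sent BETWEEN ? AND ? ORDER BY sent",
        (start, end),
    )
    return cur.fetchall()


print(dropdown_values(conn))
print(posts_between(conn, "2010-04-01", "2010-04-30"))
```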

Phase 2: Development and First testing (mid-June to July 12)

1. Create the prototype of the UI search - layout, working dropdowns, with
stubs to index calls

2. Create Index call module - should accept a query string, connect to an
index, and bring back results in a usable form

3. Create the Display UI -  single record retrieval, multiple record
retrieval - accepts the results and formats them to a screen for user
interaction

4. Create a Content retrieval UI - When user picks what they want to see
from a multiple record set, or when there is only one hit, display the
result

5. Query creator - takes input from search ui, or content retrieval UI and
formulates a valid query to the indexes.

I would normally write these as separate components so that they could be
distributed amongst multiple developers, and for ease in debugging.  UI can
involve a lot of visual tweaking, so separating it from the backend calls is
usually helpful to allow parallel development.  It also allows dev to
interact with stable versions of the components.
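
A minimal sketch of how the Phase 2 components could be kept separate,
assuming a small in-memory list of records in place of a real index. The
function names, record fields, and sample data are hypothetical and only
illustrate the boundaries between the query creator, the index call, and
the display step.

```python
# Hypothetical records standing in for what a real index would return.
RECORDS = [
    {"author": "Alice", "subject": "GSOC Introduction",
     "body": "Here is my proposal."},
    {"author": "Bob", "subject": "GSOC timeline",
     "body": "Milestones for the summer."},
    {"author": "Carol", "subject": "Welcome message bug",
     "body": "Details of the bug."},
]


def build_query(form):
    """Query creator: keep only the filled-in form fields, normalized."""
    return {
        field: value.strip().lower()
        for field, value in form.items()
        if value and value.strip()
    }


def call_index(query, records):
    """Index call stub: return records that match every query field."""
    return [
        rec for rec in records
        if all(value in rec.get(field, "").lower()
               for field, value in query.items())
    ]


def format_results(results):
    """Display step: show one hit in full, several hits as a pick list."""
    if not results:
        return "No messages found."
    if len(results) == 1:
        rec = results[0]
        return "%s\n\n%s" % (rec["subject"], rec["body"])
    return "\n".join(
        "%d. %s (%s)" % (number, rec["subject"], rec["author"])
        for number, rec in enumerate(results, 1)
    )


query = build_query({"subject": "GSOC", "author": ""})
print(format_results(call_index(query, RECORDS)))
```

Because each step only passes plain data to the next, the stub in
call_index could later be swapped for a real search engine without touching
the form handling or the display code.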

Deliverables:  Prototypes of the modules, move to development

Phase 3: Move towards delivery (July 12-Aug 2)

1. Log current features against requirements and design document

2. Fix bugs

3. Adjust interfaces based on user comments

4. Performance tuning

Deliverables:  First draft of documentation, working code moved to
production

Phase 4: Completion (Aug 2-9)

Evaluations, final version of documentation, final critical bugs

2b. Project specific questions:

 If you are applying to work on the front end portion of the mailman
archives project, please answer this question:

  Select a “feature” or aspect of the current mailman archives that annoys
or frustrates you, describe it, and propose 3 different ways that problem
could be solved (if you have to get really “out of the box” to think up 3
solutions, go for it). Then discuss the pros and cons of your solutions
(hint: describing use cases for the feature and how each solution works for
the different use cases would be a good way to organize your answer). Are
there any use cases where the existing solution is better?

 Each Systers system has a separate sign-on and registration.  (dev vs
maillist, etc.) I would like to see at least a design specification to work
toward for the various sections of the systers systems.  It will make adding
features easier, because an authorization schema will not need to be
designed and maintained for each component.  It also will mean personal
information is not stored and maintained in multiple places.

When searching the archives I first tried to guess the month.  Then I tried
to guess the author.  Then I tried downloading portions of the archives and
using Google Desktop for searching.  One box searching is good for phrases,
but not for the type of precision I wanted.  After I was done, I cleaned up
my disk until the next time I searched.

The mailman archive needs at least an attribute search that interfaces to
the backend datafile.  The attributes for Mailman can be in a flat file,
even a csv file that becomes an input to a database program.

Looking at a sample of the archives, each message consists of a header and
then content.  Loading the tag information into an SQL-based open source
database is a quick shortcut to searchability.   For my purposes I wrote a
quick strip of the headers into a csv file, and used Access for a SQL based
query.
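
The header-stripping approach can be sketched roughly as follows. This is
not the script I actually used; it is a small reconstruction of the idea
using Python's standard email, csv, and sqlite3 modules, with sqlite3
standing in for Access. The sample messages, table name, and column names
are made up for illustration.

```python
import csv
import io
import sqlite3
from email import message_from_string

# Two made-up messages standing in for archive content.
RAW_MESSAGES = [
    "From: alice at example.org\n"
    "Date: Mon, 5 Apr 2010 10:00:00 -0700\n"
    "Subject: GSOC Introduction\n\nHello, here is my proposal.\n",
    "From: bob at example.org\n"
    "Date: Tue, 6 Apr 2010 09:30:00 -0700\n"
    "Subject: Welcome message bug\n\nI am looking at this bug.\n",
]

FIELDS = ["From", "Date", "Subject"]


def headers_to_csv(raw_messages):
    """Strip the chosen headers from each message into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for raw in raw_messages:
        msg = message_from_string(raw)
        writer.writerow([msg.get(name, "") for name in FIELDS])
    return out.getvalue()


def load_csv(conn, csv_text):
    """Load the CSV rows into a table that SQL can query."""
    conn.execute("CREATE TABLE headers (sender TEXT, sent TEXT, subject TEXT)")
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    conn.executemany("INSERT INTO headers VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load_csv(conn, headers_to_csv(RAW_MESSAGES))
matches = conn.execute(
    "SELECT sender, subject FROM headers WHERE subject LIKE ?", ("%GSOC%",)
).fetchall()
print(matches)
```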

Search capabilities are normally based on indexes.  The indexes are formed
by crawling the actual content.   The search utility interacts with the
indexes to identify the desired content, and normally delivers a pointer to
the content.  Sometimes the desired content is a small chunk of information
about multiple content items.  For example, a list of all content that
mentions GSOC in the subject.  A search interface may also provide the
ability to summarize/tabulate/collate content.
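
To make the index idea concrete, here is a minimal sketch of an inverted
index over one field, assuming a handful of made-up posts keyed by a
hypothetical message id. A real engine such as Swish-e or Whoosh does far
more; this only shows the crawl, lookup, and pointer steps described above.

```python
import re
from collections import defaultdict

# Hypothetical posts keyed by an archive pointer (a made-up message id).
POSTS = {
    "msg-001": {"subject": "GSOC Introduction",
                "body": "Here is my proposal."},
    "msg-002": {"subject": "Welcome message bug",
                "body": "GSOC students, please help."},
    "msg-003": {"subject": "GSOC timeline",
                "body": "Milestones for the summer."},
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts, field):
    """Crawl one field of every post and map each word to message ids."""
    index = defaultdict(set)
    for msg_id, post in posts.items():
        for word in tokenize(post[field]):
            index[word].add(msg_id)
    return index


def search(index, query):
    """Return sorted ids of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


subject_index = build_index(POSTS, "subject")
# All content that mentions GSOC in the subject, returned as pointers.
print(search(subject_index, "GSOC"))
```

Indexing the subject separately from the body is what lets the search
distinguish "GSOC in the subject" from "GSOC anywhere in the message".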

 3. If you have your own project to propose, please describe it here:

 Create a user information database with schemas for personal information,
description of roles, and an interface to authenticate.  The user would get
the choice to use OpenID as the authentication option, and the Systers
organization gets out of the authentication space.   Systers already uses a
PostgreSQL database, and this would be another series of tables in the
schema.
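
A rough sketch of what those tables could look like, assuming three
hypothetical tables (users, roles, user_roles) and an OpenID identifier
that has already been verified by an OpenID provider. The table names,
columns, and sample rows are illustrative only, and sqlite3 is used here
purely so the example runs without a PostgreSQL server.

```python
import sqlite3

# Hypothetical schema: personal information, role descriptions, and the
# link between them.  Verifying the OpenID itself is out of scope here.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    openid_url TEXT UNIQUE NOT NULL,
    display_name TEXT NOT NULL
);
CREATE TABLE roles (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT
);
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
"""


def has_role(conn, openid_url, role_name):
    """Return True if the verified OpenID identity holds the named role."""
    row = conn.execute(
        "SELECT 1 FROM users u "
        "JOIN user_roles ur ON ur.user_id = u.id "
        "JOIN roles r ON r.id = ur.role_id "
        "WHERE u.openid_url = ? AND r.name = ?",
        (openid_url, role_name),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO users VALUES (1, 'https://openid.example.org/alice', 'Alice')"
)
conn.execute(
    "INSERT INTO roles VALUES (1, 'list-admin', 'Can manage list settings')"
)
conn.execute(
    "INSERT INTO roles VALUES (2, 'archive-reader', 'Can search the archives')"
)
conn.execute("INSERT INTO user_roles VALUES (1, 2)")

print(has_role(conn, "https://openid.example.org/alice", "archive-reader"))
```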

 Create searchable test cases for validating code changes, with tracking of
when the test cases are used, success/failure, and compatibility with
mailman levels.

 4. What is your launchPad username:  A-M Horcher

5. Describe how much of the systers-mailman environment you were able to
install. Describe what problems you had. And what prevented you from getting
the entire system up and running?

 Sun Virtual Box installed, vdi of current Systers environment installed,
test messages sent.

6. If you tried to fix one of the bugs listed above,

   - which one did you try?  I am working on + welcome message bug

   - describe your approach.
   I am currently studying the structure of the directories to make sure I
   understand how to make the change.
   Since this is my first change to the system, I want to make sure my change
   does not impact the system anywhere else.

   - if you got stuck, where did you get stuck?
   I am currently stuck at how to move my internal development to the
   outside environment.
   - what kind of help did you ask for/get?
   I have corresponded with other students and will post a request for help
   in the dev forum.
   - how would you test whether your solution works? What were the specific
   test cases you would run? (if you solved it, tell us what you did; if you
   didn't solve it, describe a set of test cases that would tell you whether
   your solution works)

   The testing I am doing internally is writing out local HTML I display in
   my browser to see if the message is coming through from my change.

   The bigger challenge is still studying the code to make sure what I think
   is a local change is only local.

(We aren't that interested in whether you were successful, but more in how
you went about learning about the code and solving a problem. Please
describe your experience from that perspective)

 In my experience, fixing the first bug on a system is the most
time-consuming.  It isn't so much whether you can come up with a solution.
There is the overhead of understanding how the pieces work together.  The
actual change, if done well, can be tiny.   You can possibly skip that in
the short term, but it is not a best practice.

7. Why do you think you are a good candidate for this project? Describe the
skills you confidently bring to the project, and what you hope to learn from
working on this project, and your interest in the systers mission.

 My research interests are in human-computer interfaces and mobile security.
I have been working in digital content management and am very skilled at
text-searching applications. My experience in design allows me to set up
text searching that anticipates change over time, is flexible to add new
features, responds quickly, and takes minimal maintenance.

Since trying to sift through the Systers archives for messages I know are
there, I have been interested in improving the searchability.  My previous
experience in project management makes me an asset to a larger project that
will require coordination between several students for success.
Furthermore, my experience in designing search interfaces on other platforms
in other languages, and HCI background make me an asset in creating
the search interface.

8. Are you a syster (www.systers.org)? Would you join if you are accepted?
(Note: systers is only open to women in computing; if you are male, you may
not join, though you are welcome to join systers-dev, our development list.)
- How would you make sure that you understand how dlists (the systers
mailman customizations) work, if you are unable or choose not to join?

 I am currently a member of Systers.

9. All our projects are in python. Describe the largest project you have
completed in python. If you haven't used python, describe the programming
experience you have that will allow you to learn python quickly and be
successful on this project.

 I have programmed several user interfaces to content of millions of
documents.  (www.dowcorning.com)  I have programming experience in many
programming languages, though not specifically python.  I have successfully
created programs in other scripting languages, and have no concerns about
python.   It is similar to other Unix scripting languages I have used.

My code is currently running at my former employer's installation.  The code
is stable, well-constructed and efficient, and continues to allow them to
ship product.  I am used to working on teams in various roles, including
project leader.

10. Describe any plans you have over the time period of GSOC (including the
community bonding period) in addition to GSoC, such as classes, a summer
job, vacation plans, master's thesis, etc. Here is the GSOC timeline
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/timeline
.

 I will wait for vacation until after GSOC.   This would be my summer (and
only) job.

 11. Schooling:

 University of Illinois, Urbana-Champaign - B.S., M.S. in Information
Science

Nova Southeastern University - Current Ph.D. student in Information Systems
Security

What year are you in school?  Ph.D. - 2 years remaining

   - What programming courses have you taken? Fortran, Visual Basic, Basic,
   RPG, Unix Korn shell, Bash, Java, J2EE, and self taught on open source and
   other languages.  I teach myself most languages.

   Extensive knowledge of databases and SQL
   - What did you like about them? What did you not like?
   Formal programming classes are best when I get to use the knowledge in a
   lab that mimics a real-life problem.  I like getting the computer to make my
   life easier.

 What is your major?  Information Systems Security

   - Why have you chosen that?
   It is a growing need, and my work experience dovetails well with that.
   - Have you done group projects (programming or otherwise)?
   I have done many group projects over my career, as individual and group.
   - What was your primary contribution to/role in the group?
   I have been technical lead, technical advisor, project leader, and
   developer, depending on the project.

   - What made working in a group better than alone? What made it harder?
   Working in a group means you have someone to tell when you succeed, who
   actually understands what you are talking about.  You also have someone to
   bounce ideas off of, and someone to offer help.

   Working in a group also requires good communication, and procedures to
   make sure the resulting code works together, and does not conflict.  You
   don't get to make all the decisions, but that can be a good thing.

 12. Do you have work experience in programming? Tell us about it.

I have considerable work experience in programming.   My code, as I
mentioned above, is running in production.  I am focused on easy-to-use
code, and delivering in phases, instead of waiting until the end.  My work
can be seen at www.dowcorning.com and www.xiameter.com.

13. Do you have previous open source experience? Tell us what you have done.

My open source experience is with Unix and Linux on my own servers.




On Mon, Apr 5, 2010 at 11:39 PM, Robin Jeffries <robin at jeffries.org> wrote:

> Have you looked at the archive project?  We are definitely looking for
> people to help design/implement the UI for this, and we are at the
> requirements gathering stage.  My expectation is that we would build a few
> prototypes, based on what the students working on the backend can deliver,
> and find ways to try them out with some friendly lists that have a heavy
> need for archive search/browsing.  Given the need to iterate, I don't think
> we will end up with a final UI this summer, but it might be an interesting
> thing to put in your portfolio.
>
> Robin
>
>
> On Sun, Apr 4, 2010 at 5:02 PM, A-M Horcher <horcheram at gmail.com> wrote:
>
>> I am a Ph.D student in Information Systems.  While attending GHC 2009, I
>> attended the code sprint there.  I have been interested in further
>> pursuing improvements for the Systers code since.  I have also been a
>> member of Systers since then, and have contributed to the wiki.
>>
>> My research interests are in human-computer interfaces and mobile
>> security.  I have been working in digital content management and am
>> very skilled at
>> text-searching applications. My experience in design allows me to set up
>> text searching that anticipates change over time, is flexible to add new
>> features, responds quickly, and takes minimal maintenance.
>>
>> Since trying to sift through the Systers archives for messages I know are
>> there, I have been interested in improving the searchability.  My previous
>> experience in project management makes me an asset to a larger project that
>> will require coordination between several students for success.
>> Furthermore, my experience in designing search interfaces on other
>> platforms in other languages, and HCI background make me an asset in
>> creating the search interface.
>>
>> The dean of my graduate school encouraged his students to apply.  The open
>> source community is more prevalent in the academic arena, and the
>> experience can help in my future research projects and academic career.
>> I also want the experience with open source and GSOC to contribute to my
>> future goal of mentoring and advising computer science students.
>>
>> I am a non-traditional student, quitting an industry job to go back to
>> grad school.  I have taken a non-traditional career path, by being a
>> woman in technology when there were few.  I have had a non-traditional
>> homelife.  Systers has shown me I am not alone out there, and that there
>> is a whole new tradition to form.
>>
>> I have successfully installed the virtual box and vdi to create the
>> current Systers environment and am looking over the bugs.  I have also
>> done some research on some other open source text searching engines, in
>> addition to my own experience with FAST, Xindex, and other text searching
>> interfaces.
>>
>
>


-- 
Ann-Marie Horcher
