[Systers-dev] Time for weekly updates

Robin Jeffries robin at jeffries.org
Mon Jul 19 21:45:01 PDT 2010


On Mon, Jul 19, 2010 at 1:17 PM, Yian Shang <yian.shang at gmail.com> wrote:

> Thanks, I'll start using that field--seems much easier.
>
> Actually the use case I wrote earlier was for filtering:
> http://systers.org/systers-dev/doku.php/timeline-yian:situations_that_involve_filtering_messages_by_date,
> but I started considering the weekly/monthly ones because that's what's
> currently there, and I wasn't sure if users had some other uses for having
> the archives split up other than for viewing messages from a specific time.
>

The archives are currently split up purely because the technology used for
displaying them is very slow, and this was a reasonable size (the admin can
actually set the volume size, I think to week/month/year).  But technology
for filtering/displaying etc. has gotten a lot better (as well as computers
being a lot faster).


>
> Also, I think dynamically generating the messages that fit under specified
> dates would be slightly inefficient (especially compared to the current
> static scheme). How would the special indexes mentioned work?
>

I can imagine several things, depending on how stuff is stored in the
database.  One is making the date a key, so that it's an easy database
operation to get the messages between x and y (for arbitrary dates).
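
To make that first idea a bit more concrete, here is a minimal sketch using
Python's built-in sqlite3 module.  The table and column names (messages,
sent_date, subject) are made up for illustration and are not Mailman's
actual storage; the only point is that an index on the date column turns
"messages between x and y" into a simple range query.

```python
import sqlite3

def make_db():
    # Hypothetical schema: one row per archived message.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " sent_date TEXT NOT NULL,"  # ISO 8601 (YYYY-MM-DD): text order == date order
        " subject TEXT NOT NULL)"
    )
    # The date index is what makes arbitrary date ranges cheap to look up.
    conn.execute("CREATE INDEX idx_messages_sent_date ON messages (sent_date)")
    return conn

def add_message(conn, sent_date, subject):
    conn.execute(
        "INSERT INTO messages (sent_date, subject) VALUES (?, ?)",
        (sent_date, subject),
    )

def messages_between(conn, start, end):
    """Return (sent_date, subject) rows with start <= sent_date <= end, oldest first."""
    cur = conn.execute(
        "SELECT sent_date, subject FROM messages"
        " WHERE sent_date BETWEEN ? AND ?"
        " ORDER BY sent_date, id",
        (start, end),
    )
    return cur.fetchall()
```

Calling messages_between(conn, "2010-07-01", "2010-07-31") would return only
the July messages, whatever week or month boundaries the volumes use.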

Another is to have multiple archives that are invisible to the user, but are
used to solve for different date ranges.
   - one might be specifically for the case of recent messages, and might
be of all the messages sent in the last 3-4 months (new messages are added
as they come in, but it's regenerated once a month, perhaps)
   - one might be for messages that people are likely to search for, and
might include all the messages in the last 2 years, say (this would need to
be an admin-specified value, but we'd also want to do some analytics once the
system is in production, and see what the 80/20 rule is for messages people
are looking for -- you'd want this to contain maybe 80% of all the messages
people search for).  Except for very unusual mailing lists (e.g., things that
are meant to be archival, not conversational), my guess is that 2 years will
cover that 80%.
   - all messages

[maybe we need 4 or 5 different indexes, but this would be my first guess]

It would be an interesting challenge to see how easy it would be to figure
out which archive to use for a given search.

Starting dumb heuristic: if the search isn't by date, use the "all
messages" archive.  If it is by date, use the newest one that covers the
beginning and ending dates.
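
A rough sketch of that dumb heuristic in Python follows.  The archive names
and the day counts (120 days for "recent", 730 for "two_years") are
placeholders I am assuming for illustration; in practice they would be
admin-specified values.  Since every hidden archive runs up to the present,
only the starting date decides whether an archive covers the range.

```python
from datetime import date, timedelta

# Hypothetical hidden archives, smallest/newest first.
# Each entry is (name, days covered back from today); None means everything.
ARCHIVES = [
    ("recent", 120),
    ("two_years", 730),
    ("all", None),
]

def choose_archive(start=None, end=None, today=None):
    """Pick the smallest archive that covers the requested date range.

    start/end are datetime.date objects or None.  A search with no
    starting date (including a search that is not by date at all)
    falls back to the "all" archive.
    """
    today = today or date.today()
    if start is None:
        return "all"
    for name, days in ARCHIVES:
        # Every archive extends to today, so the end date is always covered;
        # it is enough to check that the archive reaches back to `start`.
        if days is None or start >= today - timedelta(days=days):
            return name
    return "all"
```

For example, with today taken as 2010-07-19, a search starting 2010-06-01
would go to "recent", one starting 2009-01-01 to "two_years", and anything
older to "all".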

You could imagine making that smarter in various ways (maybe start with the
2 years archive and assume that if someone is looking for something older
than that, they would be willing to wait longer to get it retrieved, so you
could make a second pass over that index, but there might be lots of ways to
make it smarter).  But that's a project for next summer or maybe even the
summer after that.  We need a working system before we worry about getting a
FAST working system.

If all else fails, you provide users with a drop down of "search recent
messages" "search messages from the last 2 years" "search entire mailing
list history" with a reasonable default, and expect them to change it if
they are looking for something different.  And at the end of the set of
retrieved messages, you have a note that says something like "didn't find
what you wanted?  *Search older messages.*"
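
The drop-down fallback could be sketched like this in Python.  The scope
names and the helper functions are hypothetical; the labels are just the
three choices listed above.  The footer note points at the next wider scope
and disappears once the whole history has been searched.

```python
# Hypothetical search scopes, narrowest first, with their drop-down labels.
SCOPES = ["recent", "two_years", "all"]
SCOPE_LABELS = {
    "recent": "search recent messages",
    "two_years": "search messages from the last 2 years",
    "all": "search entire mailing list history",
}

def next_wider_scope(current):
    """Return the next wider scope, or None if `current` is already the widest."""
    i = SCOPES.index(current)
    return SCOPES[i + 1] if i + 1 < len(SCOPES) else None

def footer_note(current):
    """Text shown after the retrieved messages; empty when nothing older is left."""
    wider = next_wider_scope(current)
    if wider is None:
        return ""
    return "Didn't find what you wanted?  Search older messages (%s)." % SCOPE_LABELS[wider]
```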

Does that make sense?  I haven't thought that deeply about this, so just
take this as a set of ideas to run with, not a fully formed design document.

Robin



> Yian
>
> On Sun, Jul 18, 2010 at 10:07 PM, Robin Jeffries <robin at jeffries.org> wrote:
>
>> In the maillist data structure (which you ought to be reading), there is a
>> field dlist-enabled (I believe), which is set to yes for dlists (and
>> probably doesn't exist for non-dlists).  That ought to be easier and more
>> accurate than looking for the plus addressing.
>>
>> I'm not sure about the weekly/monthly archives (especially if users have
>> to know which ones to use).  What is your use case for this?  If people are
>> likely to want to look by date, the dates are not going to be on week/month
>> boundaries, so it seems like it's much better to have the full archives with
>> some sort of date filter (yes, this is going to cause some issues about
>> performance; but I think that's something that has to be addressed -- maybe
>> some special indexes will help.  But we don't have to solve the
>> performance problems this summer)
>>
>> Robin
>>
>> On Fri, Jul 16, 2010 at 5:07 PM, Yian Shang <yian.shang at gmail.com> wrote:
>>
>>> Hi again,
>>>
>>> Weekly update here (or below):
>>> http://movicont.nfshost.com/blog/archives-ui-support-for-non-dlist-lists
>>>
>>> Last week, I got my hands on a number of different mbox files, the
>>> majority of which were non-dlist mbox files. Thus, I realized that I
>>> should probably make an attempt to support non-dlist files, that is,
>>> that they should give the user an experience similar to the
>>> dlist-enabled lists.
>>>
>>> Here's a simple use case: a user could post a message with subject
>>> “Question about Scaling”. Other users could type replies with subject
>>> “Re: Question about Scaling”. When users are browsing the archives, they
>>> would likely prefer seeing the first message (the question) and all of
>>> the "Re: Question about Scaling" messages (the answers) on the same page.
>>> This would save them from having to search for the answer(s). In the
>>> original archives, the user would click through to a message, and just
>>> see that message. In the message, there were often links titled "next
>>> message" or "prev message"; however, these links often did not lead to
>>> the appropriate reply to the message at hand, which would lead to a lot
>>> of confusion.
>>>
>>> As I started coding this, I realized I needed to integrate the non-dlist
>>> and dlist generation together. Because I didn't want to add any new
>>> settings, I tried to automatically detect dlist-style thread addresses (I
>>> defined this as an email address that had '+' in it). When it wasn't a
>>> dlist address, I would generate a map of "subject" to "list of messages
>>> with that subject" instead of "thread address" to "list of messages in
>>> conversation". I also had to use a subject stripped of everything but
>>> alphanumeric characters (this was mainly due to problems with '/' in file
>>> generation).
>>>
>>> Here's a screenshot of what I was able to generate:
>>>
>>> <http://systers.org/systers-dev/lib/exe/fetch.php/timeline-yian:screenshot-3.png?cache=cache>
>>>
>>> Since the messages are grouped by subject, I thought that when users
>>> clicked on the “mailto” link to contribute, the mailto link ought to have
>>> the subject written in by default. There are several advantages: 1) users
>>> don't have to type in (or copy/paste) the subject again and 2) users
>>> aren't as likely to accidentally make a typo in the subject, which could
>>> throw the sorting off. To see what I meant, see the highlighted red part
>>> of the screenshot.
>>>
>>> There is also a problem with grouping by subject: when a user decides to
>>> purposefully change the subject but the message's content is still the
>>> same “topic”. The OP might have subject “A question” and another user
>>> could answer with “An Answer”. The two are clearly related and are better
>>> off grouped together, but the subject-grouping won't handle this. In
>>> general though, it is difficult to determine (without humans) whether
>>> these two messages are related, so I'm not sure if there really is a
>>> solution to this.
>>>
>>> --
>>>
>>> Another thing I did was to generate both the single archive (to solve the
>>> conversations problem mentioned last week) and multiple volumes split by
>>> month or week or ... I felt like this would cover all possible cases of
>>> what users needed--if they wanted to group messages by conversation, they
>>> could do so through the 'all' volume, and if they wanted to browse from a
>>> certain period of time, they could do so from each of the {monthly,
>>> weekly, ... } volumes.
>>>
>>> From a coding standpoint, this actually took longer than I expected,
>>> because it took a while to decide where to add in the generation of two
>>> archives. Eventually I modified 'get_archive' to return two archive names
>>> and 'volNameToDesc' to handle the new names returned. The sorting also
>>> needed to be fixed, as I wanted the 'all' volume to be the first one
>>> rather than the last one. Here's a screenshot:
>>>
>>> <http://systers.org/systers-dev/lib/exe/fetch.php/timeline-yian:screen-all-arch.png?cache=cache>
>>>
>>>
>>>
>>> Yian
>>>
>>> On Fri, Jul 9, 2010 at 1:55 PM, Yian Shang <yian.shang at gmail.com> wrote:
>>>
>>> > Hi everyone!
>>> >
>>> > Update this week is on my blog again (or see below):
>>> > http://movicont.nfshost.com/blog/mailman-archives-ui
>>> >
>>> > --
>>> >
>>> > I recently was able to get some mbox files from Terri (the systers-dev
>>> > mbox file with dlists was really useful), which helped a lot in the
>>> > testing process. As I started testing on the mbox files, I found some
>>> > problems:
>>> >
>>> > - The thread address was sometimes wrong. The code had set it to the
>>> > message's "To" header, but it turns out that sometimes people had Cced
>>> > the list and thus the right thread address wasn't in the "To" header. I
>>> > ended up creating a list of possible headers to check and ran a loop
>>> > through the list.
>>> > - Some of the email addresses were showing up as botched, due to
>>> > different formatting styles (e.g., some had "First Last"
>>> > <emailaddress at blah.com>), so I had to take those into account.
>>> >
>>> > I also realized that including all messages within a conversation on
>>> > one page meant that I had to change the archival style. The current
>>> > method does it by a period of time (e.g., monthly, quarterly, weekly),
>>> > which means that if the messages in a conversation were spread across
>>> > two months (or quarters, weeks, etc.), they would not be put in the same
>>> > conversation page. Instead, there would be two pages separating the
>>> > messages of the conversation for each period of time.
>>> >
>>> > I figured that this wasn't ideal, because conversations that were split
>>> > up would lead to confusion. If a person posts a question on the last day
>>> > of November and the answer pops up in early December, those messages
>>> > would not be placed together, which would lead some people (those
>>> > looking at the November archives) to believe that the question never got
>>> > answered.
>>> >
>>> > Thus, I changed the code to return a single archive for all posts. Then
>>> > I realized I had another problem--there were too many posts on one page,
>>> > and so as more and more messages were sent to the list, the archives
>>> > page would take longer and longer to load. I decided that splitting up
>>> > the list into several pages would be a good idea, and so I generated
>>> > pages based on a set number of conversations to list per page (default
>>> > is 20). I then added links to all pages at the top and bottom of each
>>> > page.
>>> >
>>> > Here's a screenshot:
>>> > http://img12.imageshack.us/img12/3706/screenshotthetryallarch.png
>>> >
>>> > Have a great weekend everyone! :)
>>> >
>>> > Yian
>>> >
>>>
>>>
>>>
>>
>>
>
