[Systers-dev] Database Schema for Message Storing
Robin Jeffries
robin at jeffries.org
Thu Jul 22 20:06:28 PDT 2010
Since any non-dlist list is not going to be databased at all, you could
experiment with MongoDB for this project, using either the plain pickle
files or the database for the other data (I think that the ORM we use lets
you abstract that away, but I don't know if you can plug that into another
mailman installation). That would give us a chance to experiment with
MongoDB. And I think that the ORM would allow you to convert at least dlist
installations pretty easily, if we decide to go with it.
Robin
On Thu, Jul 22, 2010 at 7:29 PM, Gloria W <strangest at comcast.net> wrote:
> Based on what I've seen others doing, and based on the potential size
> issues you may have with an active mailing list, I would recommend using
> MongoDB. It is quite fast for both read and write operations, it is
> schema-less, and yet you can still index, limit, and order "records" in a
> SQL-like fashion. It can span collections, so it can grow exponentially and
> still be fast. It can also be configured like a cache, where records expire
> and free space in a collection. You can also add/remove fields as you see
> fit, across records. MongoDB is one of the best, easiest to use tools I've
> ever seen for data storage and access. I would recommend taking a look at it
> before making a decision on db schemas.
>
> I don't mean to dismiss the hard work that has gone into the current
> selection. I have great respect for the work you have done. I think you can
> extend it into MongoDB as a simple Python dictionary, and make things quite
> a bit easier for yourself, and more extensible for the future development of
> the project.
>
> Thank you,
> Gloria
>
>> This is great (and very quick of you). A few comments. I am not a
>> database
>> expert, so someone with real database knowledge should chime in.
>>
>> 1. I think you should not have messageid and message_id for two different
>> things. I can imagine a number of hard to track down bugs that this is
>> likely to lead to. message_id is the one that we use already and is
>> mailman
>> specific, so it probably needs to stay as is. Not sure what to call the
>> other, maybe email_message_id?
>>
>> 2. It's my belief/superstition that it's better to have separate tables
>> for
>> different actions (if nothing else, it leads to less confusion when there
>> are multiple accesses of the database). So I would make this a new table,
>> with all these fields. (Now would be a good time for the database expert
>> to
>> speak up)
>>
>> 3. A few more fields from the message I would consider. You may not need
>> them now, but I can imagine a future where you need them
>> - a field for the sender email, as the sender could have used one of
>> her
>> alternative addresses, (and for a reason -- e.g., for anonymity). In
>> fact,
>> I'm a little worried about a leak here. If someone posts under a
>> pseudonym,
>> there shouldn't be a way for a regular user to find out the real
>> name/email
>> behind the pseudonym (an admin should be able to, but should have to work
>> at
>> it), and having the sender field yoked to the users mailman record makes
>> it
>> easy for someone to decide to expose that.
>> - a field for the cc field (for both to and cc, keep in mind that it
>> can
>> have several entries, and those entries consist of a name and an email
>> address, the name part being optional. You might think about how to
>> structure them
>> - a field for ALL the message headers (including the ones you have
>> called
>> out into separate fields). Could be useful for many things -- sending
>> mail
>> to different mail clients in different formats, figuring out which
>> messages
>> went through which intermediate points, searching for meta strings that
>> you
>> aren't sure which field they are in -- I really don't know, but I wouldn't
>> want to lose that information
>>
>> What time is the date in? UTC time? How will you display it for a user?
>> Especially if it is recent, they will want to see it relative to their
>> timezone. Can you do that?
>>
>> Robin
>>
>> On Thu, Jul 22, 2010 at 5:19 PM, Yian Shang<yian.shang at gmail.com> wrote:
>>
>>
>>
>>> I've written a bit about the database schema for message storing here:
>>> http://systers.org/systers-dev/doku.php/database-schema
>>>
>>> Since the current db already had a few fields for message identification,
>>> I
>>> added a few more ones that I thought would be useful for search/UI. I've
>>> talked to Priya about the fields and I think I integrated the ones that
>>> she
>>> was using with Whoosh. Please let me know if more fields/other
>>> modifications
>>> are needed.
>>>
>>> --
>>> Yian || http://movicont.nfshost.com/
>>>
>>>
>>> To unsubscribe from this conversation, send email to<
>>> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
>>> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
>>> >>
>>>
>>> or visit<
>>> http://systers.org/mailman/options/systers-dev?override=147&preference=0
>>> >
>>> To contribute to this conversation, use your mailer's reply-all or
>>> reply-group command or send your message to
>>> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org><
>>> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
>>> >
>>> To start a new conversation, send email to<systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
>>> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
>>>
>>>
>>>
>>>>
>>>>
>>> To unsubscribe entirely from systers-dev, send email to<
>>> systers-dev-request at systers.org> with subject unsubscribe.
>>>
>>>
>>>
>>
>> To unsubscribe from this conversation, send email to<
>> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
>> or visit<
>> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
>> To contribute to this conversation, use your mailer's reply-all or
>> reply-group command or send your message to
>> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
>> To start a new conversation, send email to<systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
>> >
>> To unsubscribe entirely from systers-dev, send email to<
>> systers-dev-request at systers.org> with subject unsubscribe.
>>
>>
>>
>
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>
To contribute to this conversation, send mail to <Gloria W >
More information about the Systers-dev
mailing list