[Systers-dev] Database Schema for Message Storing
Gloria W
strangest at comcast.net
Thu Jul 22 19:29:45 PDT 2010
Based on what I've seen others doing, and based on the potential size
issues you may have with an active mailing list, I would recommend using
MongoDB. It is quite fast for both read and write operations, it is
schema-less, and yet you can still index, limit, and order "records" in
a SQL-like fashion. It can span collections, so it can grow
exponentially and still be fast. It can also be configured like a cache,
where records expire and free space in a collection. You can also
add/remove fields as you see fit, across records. MongoDB is one of the
best, easiest to use tools I've ever seen for data storage and access. I
would recommend taking a look at it before making a decision on db schemas.
I don't mean to dismiss the hard work that has gone into the current
selection. I have great respect for the work you have done. I think you
can extend it into MongoDB as a simple Python dictionary, and make
things quite a bit easier for yourself, and more extensible for the
future development of the project.
Thank you,
Gloria
> This is great (and very quick of you). A few comments. I am not a database
> expert, so someone with real database knowledge should chime in.
>
> 1. I think you should not have messageid and message_id for two different
> things. I can imagine a number of hard to track down bugs that this is
> likely to lead to. message_id is the one that we use already and is mailman
> specific, so it probably needs to stay as is. Not sure what to call the
> other, maybe email_message_id?
>
> 2. It's my belief/superstition that it's better to have separate tables for
> different actions (if nothing else, it leads to less confusion when there
> are multiple accesses of the database). So I would make this a new table,
> with all these fields. (Now would be a good time for the database expert to
> speak up)
>
> 3. A few more fields from the message I would consider. You may not need
> them now, but I can imagine a future where you need them
> - a field for the sender email, as the sender could have used one of her
> alternative addresses, (and for a reason -- e.g., for anonymity). In fact,
> I'm a little worried about a leak here. If someone posts under a pseudonym,
> there shouldn't be a way for a regular user to find out the real name/email
> behind the pseudonym (an admin should be able to, but should have to work at
> it), and having the sender field yoked to the users mailman record makes it
> easy for someone to decide to expose that.
> - a field for the cc field (for both to and cc, keep in mind that it can
> have several entries, and those entries consist of a name and an email
> address, the name part being optional. You might think about how to
> structure them
> - a field for ALL the message headers (including the ones you have called
> out into separate fields). Could be useful for many things -- sending mail
> to different mail clients in different formats, figuring out which messages
> went through which intermediate points, searching for meta strings that you
> aren't sure which field they are in -- I really don't know, but I wouldn't
> want to lose that information
>
> What time is the date in? UTC time? How will you display it for a user?
> Especially if it is recent, they will want to see it relative to their
> timezone. Can you do that?
>
> Robin
>
> On Thu, Jul 22, 2010 at 5:19 PM, Yian Shang<yian.shang at gmail.com> wrote:
>
>
>> I've written a bit about the database schema for message storing here:
>> http://systers.org/systers-dev/doku.php/database-schema
>>
>> Since the current db already had a few fields for message identification, I
>> added a few more ones that I thought would be useful for search/UI. I've
>> talked to Priya about the fields and I think I integrated the ones that she
>> was using with Whoosh. Please let me know if more fields/other
>> modifications
>> are needed.
>>
>> --
>> Yian || http://movicont.nfshost.com/
>>
>>
>> To unsubscribe from this conversation, send email to<
>> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
>> or visit<
>> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
>> To contribute to this conversation, use your mailer's reply-all or
>> reply-group command or send your message to
>> systers-dev+database3 at systers.org<systers-dev%2Bdatabase3 at systers.org>
>> To start a new conversation, send email to<systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
>>
>>>
>> To unsubscribe entirely from systers-dev, send email to<
>> systers-dev-request at systers.org> with subject unsubscribe.
>>
>>
>
> To unsubscribe from this conversation, send email to<systers-dev+database3+unsubscribe at systers.org> or visit<http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or reply-group command or send your message to systers-dev+database3 at systers.org
> To start a new conversation, send email to<systers-dev+new at systers.org>
> To unsubscribe entirely from systers-dev, send email to<systers-dev-request at systers.org> with subject unsubscribe.
>
>
To contribute to this conversation, send mail to <Robin Jeffries >
More information about the Systers-dev
mailing list