[Systers-dev] Database Schema for Message Storing

Yian Shang yian.shang at gmail.com
Thu Jul 22 21:40:15 PDT 2010


I just set up and played with MongoDB for a bit and I added the indexes we'd
probably need on the wiki:
http://systers.org/systers-dev/doku.php/database-schema.

What time is the date in?  UTC time?  How will you display it for a user?
>  Especially if it is recent, they will want to see it relative to their
> timezone.  Can you do that?


I think the date/time should probably be UTC time (in unix timestamp
format). Then it can be converted to the user's local time, which could
either be based on IP address geo-approximation or could be a user-set
time-zone.

Yian

On Thu, Jul 22, 2010 at 8:06 PM, Robin Jeffries <robin at jeffries.org> wrote:

> Since any non-dlist list is not going to be databased at all, you could
> experiment with MongoDB for this project, using either the plain pickle
> files or the database for the other data (I think that the ORM we use lets
> you abstract that away, but I don't know if you can plug that into another
> mailman installation).  That would give us a chance to experiment with
> MongoDB.  And I think that the ORM would allow you to convert at least
> dlist
> installations pretty easily, if we decide to go with it.
>
> Robin
>
> On Thu, Jul 22, 2010 at 7:29 PM, Gloria W <strangest at comcast.net> wrote:
>
> > Based on what I've seen others doing, and based on the potential size
> > issues you may have with an active mailing list, I would recommend using
> > MongoDB. It is quite fast for both read and write operations, it is
> > schema-less, and yet you can still index, limit, and order "records" in a
> > SQL-like fashion. It can span collections, so it can grow exponentially
> and
> > still be fast. It can also be configured like a cache, where records
> expire
> > and free space in a collection. You can also add/remove fields as you see
> > fit, across records. MongoDB is one of the best, easiest to use tools
> I've
> > ever seen for data storage and access. I would recommend taking a look at
> it
> > before making a decision on db schemas.
> >
> > I don't mean to dismiss the hard work that has gone into the current
> > selection. I have great respect for the work you have done. I think you
> can
> > extend it into MongoDB as a simple Python dictionary, and make things
> quite
> > a bit easier for yourself, and more extensible for the future development
> of
> > the project.
> >
> > Thank you,
> > Gloria
> >
> >> This is great (and very quick of you).  A few comments.  I am not a
> >> database
> >> expert, so someone with real database knowledge should chime in.
> >>
> >> 1.  I think you should not have messageid and message_id for two
> different
> >> things.  I can imagine a number of hard to track down bugs that this is
> >> likely to lead to.  message_id is the one that we use already and is
> >> mailman
> >> specific, so it probably needs to stay as is.  Not sure what to call the
> >> other, maybe email_message_id?
> >>
> >> 2.  It's my belief/superstition that it's better to have separate tables
> >> for
> >> different actions (if nothing else, it leads to less confusion when
> there
> >> are multiple accesses of the database).  So I would make this a new
> table,
> >> with all these fields.  (Now would be a good time for the database
> expert
> >> to
> >> speak up)
> >>
> >> 3.  A few more fields from the message I would consider.  You may not
> need
> >> them now, but I can imagine a future where you need them
> >>    - a field for the sender email, as the sender could have used one of
> >> her
> >> alternative addresses, (and for a reason -- e.g., for anonymity).  In
> >> fact,
> >> I'm a little worried about a leak here.  If someone posts under a
> >> pseudonym,
> >> there shouldn't be a way for a regular user to find out the real
> >> name/email
> >> behind the pseudonym (an admin should be able to, but should have to
> work
> >> at
> >> it), and having the sender field yoked to the users mailman record makes
> >> it
> >> easy for someone to decide to expose that.
> >>    - a field for the cc field (for both to and cc, keep in mind that it
> >> can
> >> have several entries, and those entries consist of a name and an email
> >> address, the name part being optional.  You might think about how to
> >> structure them
> >>    - a field for ALL the message headers (including the ones you have
> >> called
> >> out into separate fields).  Could be useful for many things -- sending
> >> mail
> >> to different mail clients in different formats, figuring out which
> >> messages
> >> went through which intermediate points, searching for meta strings that
> >> you
> >> aren't sure which field they are in -- I really don't know, but I
> wouldn't
> >> want to lose that information
> >>
> >> What time is the date in?  UTC time?  How will you display it for a
> user?
> >>  Especially if it is recent, they will want to see it relative to their
> >> timezone.  Can you do that?
> >>
> >> Robin
> >>
> >> On Thu, Jul 22, 2010 at 5:19 PM, Yian Shang<yian.shang at gmail.com>
>  wrote:
> >>
> >>
> >>
> >>> I've written a bit about the database schema for message storing here:
> >>> http://systers.org/systers-dev/doku.php/database-schema
> >>>
> >>> Since the current db already had a few fields for message
> identification,
> >>> I
> >>> added a few more ones that I thought would be useful for search/UI.
> I've
> >>> talked to Priya about the fields and I think I integrated the ones that
> >>> she
> >>> was using with Whoosh. Please let me know if more fields/other
> >>> modifications
> >>> are needed.
> >>>
> >>> --
> >>> Yian || http://movicont.nfshost.com/
> >>>
> >>>
> >>> To unsubscribe from this conversation, send email to<
> >>> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >
> >>> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> <systers-dev%252Bdatabase3%252Bunsubscribe at systers.org<systers-dev%25252Bdatabase3%25252Bunsubscribe at systers.org>
> >
> >>> >>
> >>>
> >>> or visit<
> >>>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >>> >
> >>> To contribute to this conversation, use your mailer's reply-all or
> >>> reply-group command or send your message to
> >>> systers-dev+database3 at systers.org<systers-dev%2Bdatabase3 at systers.org><
> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> ><
> >>> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> <systers-dev%252Bdatabase3 at systers.org<systers-dev%25252Bdatabase3 at systers.org>
> >
> >>> >
> >>> To start a new conversation, send email to<systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
> >>> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org> <
> systers-dev%252Bnew at systers.org <systers-dev%25252Bnew at systers.org>>>
> >>>
> >>>
> >>>
> >>>>
> >>>>
> >>> To unsubscribe entirely from systers-dev, send email to<
> >>> systers-dev-request at systers.org>  with subject unsubscribe.
> >>>
> >>>
> >>>
> >>
> >> To unsubscribe from this conversation, send email to<
> >> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >>
> >>  or visit<
> >>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> >> To contribute to this conversation, use your mailer's reply-all or
> >> reply-group command or send your message to
> >> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org><
> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> >> To start a new conversation, send email to<systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
> >> >
> >> To unsubscribe entirely from systers-dev, send email to<
> >> systers-dev-request at systers.org>  with subject unsubscribe.
> >>
> >>
> >>
> >
> >
> >
> > To unsubscribe from this conversation, send email to <
> > systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >>
> > or visit <
> > http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >
> > To contribute to this conversation, use your mailer's reply-all or
> > reply-group command or send your message to
> > systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org><
> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> > To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
> > >
> > To unsubscribe entirely from systers-dev, send email to <
> > systers-dev-request at systers.org> with subject unsubscribe.
> >
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>

To contribute to this conversation, send mail to <Robin Jeffries >


More information about the Systers-dev mailing list