[Systers-dev] Database Schema for Message Storing

Yian Shang yian.shang at gmail.com
Sun Jul 25 01:02:10 PDT 2010


I think the abstraction layer is a good idea for the future, makes things a
lot more open-ended in case we discover that one type of db doesn't work.

I went ahead and implemented a crude version of this abstraction today.
Basically each new database class just needs a method 'add_message(msg)',
where msg is a map of fields to values. That class can then have whatever
other methods it needs to handle adding a message. I implemented the
Postgres version, and hopefully other databases can be added in the future.

I also wrote a script to convert the current mbox archives to entries in the
database using the Postgres db class mentioned earlier.

Here's the code:
http://bazaar.launchpad.net/~systers-dev/systers/archive-ui/revision/102 and
the db structure: http://systers.org/systers-dev/doku.php/database-schema

Yian

On Sun, Jul 25, 2010 at 12:15 AM, Ann-Marie Horcher <horcheram at gmail.com>wrote:

> The abstraction layer that Robin proposes as the additional advantage of
> allowing a comparison between various database back ends.
>
> Over the life of an organization, its data may move from one repository to
> another, as technology and best practices change.  An interface that is
> database-agnostic is valuable for transition, comparison, and, as Robin
> describes, for matching the database characteristics to the needs of
> certain
> sections of an application.
>
> On Fri, Jul 23, 2010 at 12:31 PM, Gloria W <strangest at comcast.net> wrote:
>
> > No problem, I respect your opinion. But irrespective of this, please do
> > pass me whatever evidence you have found to lead you to believe the
> MongoDB
> > is unfit for production. I have never heard of such a thing, I would
> really
> > dislike for such an unsubstantiated rumor to spread through a large
> > development community. Also the source of this information you have is
> > extremely important to me.
> >
> > Thank you,
> > Gloria
> >
> >> I definitely don't want to get into database wars, but some things to
> >> think about for throughput decisions
> >>   - for the standard use of mailman, I don't think we need a high
> >> throughput database.  Systers has 3000 members, which I suspect is on
> the
> >> high side for mailman lists, and even on days when we are going fast and
> >> furious, mailman can keep up (and I don't think we are particularly well
> >> tuned for throughput).  In fact, I consider slowness a feature, as it
> keeps
> >> people from posting even more often (they don't see the responses as
> >> quickly), and it's a rare topic that increased posting makes better :-).
> >>  Maybe Terri can comment on whether there are situations where mailman
> lists
> >> need high/higher throughput
> >>   - now, for archives, a different approach may be important -- you want
> >> to pull out the messages that match the request and show them (or at
> least
> >> show the first 10, and I don't know if they come out of the database in
> the
> >> order we will want to show them).  That does need to be "fast enough",
> which
> >> is probably "pretty fast".
> >>
> >> We definitely need an abstraction layer that allows us to (at some point
> >> -- maybe not this summer) experiment with different databases, so that
> we
> >> can find the right balance between the two needs I describe above.
> >>
> >> Robin
> >>
> >> On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net<mailto:
> >> strangest at comcast.net>> wrote:
> >>
> >>    On 07/23/2010 01:20 AM, Jennifer Redman wrote:
> >>
> >>        On Thu, Jul 22, 2010 at 9:40 PM, Yian
> >>        Shang<yian.shang at gmail.com <mailto:yian.shang at gmail.com>>
>  wrote:
> >>
> >>
> >>
> >>            I just set up and played with MongoDB for a bit and I
> >>            added the indexes
> >>            we'd
> >>            probably need on the wiki:
> >>            http://systers.org/systers-dev/doku.php/database-schema.
> >>
> >>
> >>
> >>        MongoDB is INCREDIBLY unstable and in no way ready for
> production.
> >>
> >>    I VERY STRONGLY DISAGREE. Some people simply aren't using it
> >>    correctly. This article states the contrary, in fine detail:
> >>
> >>
> >>
> http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
> >>
> >>
> >>         There is
> >>        is this slight problem of complete and totally unexpected
> >>        data-loss.
> >>
> >>    Again, I don't believe this. In this one particular case, it is
> >>    most likely corrupted and needs to be repaired, which can happen
> >>    to any database, irrespective of type.
> >>
> >>    Also, 10gen makes it very obvious in their presentations that if
> >>    you want immediate, verified writes (ACID compliance), you set a
> >>    flag, and it is accomplished. They state clearly why this is
> >>    necessary, the trade-offs for each choice, and that data integrity
> >>    is not an issue. The person who wrote the fear-monging article
> >>    also fails to state that even couchdb works this way, but has the
> >>    opposite default value. Read this for an accurate assessment of
> >>    durability of MongoDB:
> >>
> >>    http://blog.mongodb.org/post/381927266/what-about-durability
> >>
> >>    As an aside, these trade-offs have been made by MySql users for
> >>    over twenty years now, when choosing between MyISAM and InnoDB, so
> >>    this is not a new philosophy by any means.
> >>
> >>
> >>
> >> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
> >>
> >>        If you want to play with a NoSQL db look at CouchDB -- but I
> >>        highly advise
> >>        against using MongoDB for anything.
> >>
> >>    I strongly disagree with this statement, and know many who use it
> >>    in production for high throughput, have not lost data, and are
> >>    happy with the results. I am also going to ask you for your data
> >>    substantiating this statement, since I cannot find any. If you're
> >>    going to point to this one article, this will not suffice.
> >>
> >>        If you want to switch to a non-relational db server then you
> >>        need to do some
> >>        analysis first on why that is a correct decision and then
> >>        evaluate all the
> >>        tools.  Stability and extensibility is extremely important.
> >>         How many people
> >>        running Mailman are going to be willing to run MongoDB?  Or
> >>        CouchDB for that
> >>        matter?
> >>
> >>    I think people will be willing than you realize, given the
> >>    throughput benefits. Like I said before, db architects have been
> >>    making the ACID vs. non-ACID compliance decision for many years,
> >>    and this is not a new concept.
> >>
> >>    Also, I am more fearful of projects which use relational, heavily
> >>    indexed, slow-insert databases on potentially high traffic
> >>    projects, without doing the analysis first on where this model
> >>    will fail. Your view of having to do analysis to prove the need
> >>    for a non-relational db strikes me as odd. We need to do analysis
> >>    to prove that a relational db is the correct solution to this
> >>    problem, and will not so badly affect performance that it brings
> >>    Mailman to it's knees.
> >>
> >>    How many people are going to be willing to adjust their SHMMAX
> >>    setting sot make it run faster, and potentially bring their
> >>    machine to a grinding halt? How many people will be willing to
> >>    auto-vacuum their db hourly? How many others will be willing to
> >>    change the MySQL indexing scheme mid-stream, from InnoDB to the
> >>    less reliable, non ACID compliant yet faster-insert model
> >>    (MyISAM)?  How many others will feel as if they have to switch to
> >>    a much more expensive architecture, just to allow the relational
> >>    db to keep up with massive amounts of single-record inserts? Most
> >>    certainly, there's trouble ahead using this model, and analysis
> >>    needs to be done for this model.
> >>
> >>    Maybe the solution is to write a flexible DB API, and retrofit one
> >>    relational and one non-relational db solution to it. No single
> >>    group of developers should be making a predetermined decision
> >>    about what database soultions would be appropriate for anyone
> >>    else's mail traffic, IMHO. There is no one-size-fits-all solution
> >>    to this issue. So I propose that we make this solution flexible
> >>    via a DB API, and provide two preconfigured, out-of-the-box
> >>    solutions while allowing the user to come up with their own.
> >>
> >>    Gloria
> >>
> >>
> >>
> >>    Gloria
> >>
> >>
> >>
> >>
> >>    To unsubscribe from this conversation, send email to
> >>    <systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >
> >>    <mailto:systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> <systers-dev%252Bdatabase3%252Bunsubscribe at systers.org<systers-dev%25252Bdatabase3%25252Bunsubscribe at systers.org>
> >>>
> >> or
> >>
> >>    visit
> >>    <
> >>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >>    <
> >>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >> >>
> >>    To contribute to this conversation, use your mailer's reply-all or
> >>    reply-group command or send your message to
> >>    systers-dev+database3 at systers.org<systers-dev%2Bdatabase3 at systers.org>
> <systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> >>    <mailto:systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> <systers-dev%252Bdatabase3 at systers.org<systers-dev%25252Bdatabase3 at systers.org>
> >
> >> >
> >>
> >>    To start a new conversation, send email to
> >>    <systers-dev+new at systers.org <systers-dev%2Bnew at systers.org> <
> systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>> <mailto:
> >> systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org> <
> systers-dev%252Bnew at systers.org <systers-dev%25252Bnew at systers.org>>>>
> >>
> >>    To unsubscribe entirely from systers-dev, send email to
> >>    <systers-dev-request at systers.org
> >>    <mailto:systers-dev-request at systers.org>> with subject unsubscribe.
> >>
> >>
> >>
> >
> >
> > To unsubscribe from this conversation, send email to <
> > systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >>
> > or visit <
> > http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >
> > To contribute to this conversation, use your mailer's reply-all or
> > reply-group command or send your message to
> > systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org><
> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> > To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
> > >
> > To unsubscribe entirely from systers-dev, send email to <
> > systers-dev-request at systers.org> with subject unsubscribe.
> >
>
>
>
> --
> Ann-Marie Horcher
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>

To contribute to this conversation, send mail to <Ann-Marie Horcher >


More information about the Systers-dev mailing list