[Systers-dev] Database Schema for Message Storing

Ann-Marie Horcher horcheram at gmail.com
Sun Jul 25 00:15:52 PDT 2010


The abstraction layer that Robin proposes as the additional advantage of
allowing a comparison between various database back ends.

Over the life of an organization, its data may move from one repository to
another, as technology and best practices change.  An interface that is
database-agnostic is valuable for transition, comparison, and, as Robin
describes, for matching the database characteristics to the needs of certain
sections of an application.

On Fri, Jul 23, 2010 at 12:31 PM, Gloria W <strangest at comcast.net> wrote:

> No problem, I respect your opinion. But irrespective of this, please do
> pass me whatever evidence you have found to lead you to believe the MongoDB
> is unfit for production. I have never heard of such a thing, I would really
> dislike for such an unsubstantiated rumor to spread through a large
> development community. Also the source of this information you have is
> extremely important to me.
>
> Thank you,
> Gloria
>
>> I definitely don't want to get into database wars, but some things to
>> think about for throughput decisions
>>   - for the standard use of mailman, I don't think we need a high
>> throughput database.  Systers has 3000 members, which I suspect is on the
>> high side for mailman lists, and even on days when we are going fast and
>> furious, mailman can keep up (and I don't think we are particularly well
>> tuned for throughput).  In fact, I consider slowness a feature, as it keeps
>> people from posting even more often (they don't see the responses as
>> quickly), and it's a rare topic that increased posting makes better :-).
>>  Maybe Terri can comment on whether there are situations where mailman lists
>> need high/higher throughput
>>   - now, for archives, a different approach may be important -- you want
>> to pull out the messages that match the request and show them (or at least
>> show the first 10, and I don't know if they come out of the database in the
>> order we will want to show them).  That does need to be "fast enough", which
>> is probably "pretty fast".
>>
>> We definitely need an abstraction layer that allows us to (at some point
>> -- maybe not this summer) experiment with different databases, so that we
>> can find the right balance between the two needs I describe above.
>>
>> Robin
>>
>> On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net <mailto:
>> strangest at comcast.net>> wrote:
>>
>>    On 07/23/2010 01:20 AM, Jennifer Redman wrote:
>>
>>        On Thu, Jul 22, 2010 at 9:40 PM, Yian
>>        Shang<yian.shang at gmail.com <mailto:yian.shang at gmail.com>>  wrote:
>>
>>
>>
>>            I just set up and played with MongoDB for a bit and I
>>            added the indexes
>>            we'd
>>            probably need on the wiki:
>>            http://systers.org/systers-dev/doku.php/database-schema.
>>
>>
>>
>>        MongoDB is INCREDIBLY unstable and in no way ready for production.
>>
>>    I VERY STRONGLY DISAGREE. Some people simply aren't using it
>>    correctly. This article states the contrary, in fine detail:
>>
>>
>> http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
>>
>>
>>         There is
>>        is this slight problem of complete and totally unexpected
>>        data-loss.
>>
>>    Again, I don't believe this. In this one particular case, it is
>>    most likely corrupted and needs to be repaired, which can happen
>>    to any database, irrespective of type.
>>
>>    Also, 10gen makes it very obvious in their presentations that if
>>    you want immediate, verified writes (ACID compliance), you set a
>>    flag, and it is accomplished. They state clearly why this is
>>    necessary, the trade-offs for each choice, and that data integrity
>>    is not an issue. The person who wrote the fear-monging article
>>    also fails to state that even couchdb works this way, but has the
>>    opposite default value. Read this for an accurate assessment of
>>    durability of MongoDB:
>>
>>    http://blog.mongodb.org/post/381927266/what-about-durability
>>
>>    As an aside, these trade-offs have been made by MySql users for
>>    over twenty years now, when choosing between MyISAM and InnoDB, so
>>    this is not a new philosophy by any means.
>>
>>
>>
>> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
>>
>>        If you want to play with a NoSQL db look at CouchDB -- but I
>>        highly advise
>>        against using MongoDB for anything.
>>
>>    I strongly disagree with this statement, and know many who use it
>>    in production for high throughput, have not lost data, and are
>>    happy with the results. I am also going to ask you for your data
>>    substantiating this statement, since I cannot find any. If you're
>>    going to point to this one article, this will not suffice.
>>
>>        If you want to switch to a non-relational db server then you
>>        need to do some
>>        analysis first on why that is a correct decision and then
>>        evaluate all the
>>        tools.  Stability and extensibility is extremely important.
>>         How many people
>>        running Mailman are going to be willing to run MongoDB?  Or
>>        CouchDB for that
>>        matter?
>>
>>    I think people will be willing than you realize, given the
>>    throughput benefits. Like I said before, db architects have been
>>    making the ACID vs. non-ACID compliance decision for many years,
>>    and this is not a new concept.
>>
>>    Also, I am more fearful of projects which use relational, heavily
>>    indexed, slow-insert databases on potentially high traffic
>>    projects, without doing the analysis first on where this model
>>    will fail. Your view of having to do analysis to prove the need
>>    for a non-relational db strikes me as odd. We need to do analysis
>>    to prove that a relational db is the correct solution to this
>>    problem, and will not so badly affect performance that it brings
>>    Mailman to it's knees.
>>
>>    How many people are going to be willing to adjust their SHMMAX
>>    setting sot make it run faster, and potentially bring their
>>    machine to a grinding halt? How many people will be willing to
>>    auto-vacuum their db hourly? How many others will be willing to
>>    change the MySQL indexing scheme mid-stream, from InnoDB to the
>>    less reliable, non ACID compliant yet faster-insert model
>>    (MyISAM)?  How many others will feel as if they have to switch to
>>    a much more expensive architecture, just to allow the relational
>>    db to keep up with massive amounts of single-record inserts? Most
>>    certainly, there's trouble ahead using this model, and analysis
>>    needs to be done for this model.
>>
>>    Maybe the solution is to write a flexible DB API, and retrofit one
>>    relational and one non-relational db solution to it. No single
>>    group of developers should be making a predetermined decision
>>    about what database soultions would be appropriate for anyone
>>    else's mail traffic, IMHO. There is no one-size-fits-all solution
>>    to this issue. So I propose that we make this solution flexible
>>    via a DB API, and provide two preconfigured, out-of-the-box
>>    solutions while allowing the user to come up with their own.
>>
>>    Gloria
>>
>>
>>
>>    Gloria
>>
>>
>>
>>
>>    To unsubscribe from this conversation, send email to
>>    <systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
>>    <mailto:systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>>>
>> or
>>
>>    visit
>>    <
>> http://systers.org/mailman/options/systers-dev?override=147&preference=0
>>    <
>> http://systers.org/mailman/options/systers-dev?override=147&preference=0
>> >>
>>    To contribute to this conversation, use your mailer's reply-all or
>>    reply-group command or send your message to
>>    systers-dev+database3 at systers.org<systers-dev%2Bdatabase3 at systers.org>
>>    <mailto:systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
>> >
>>
>>    To start a new conversation, send email to
>>    <systers-dev+new at systers.org <systers-dev%2Bnew at systers.org> <mailto:
>> systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>>
>>
>>    To unsubscribe entirely from systers-dev, send email to
>>    <systers-dev-request at systers.org
>>    <mailto:systers-dev-request at systers.org>> with subject unsubscribe.
>>
>>
>>
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>



-- 
Ann-Marie Horcher

To contribute to this conversation, send mail to <Gloria W >


More information about the Systers-dev mailing list