[Systers-dev] Database Schema for Message Storing

Gloria W strangest at comcast.net
Fri Jul 23 09:31:34 PDT 2010


No problem, I respect your opinion. But irrespective of this, please 
pass me whatever evidence led you to believe that MongoDB is unfit 
for production. I have never heard of such a thing, and I would really 
dislike seeing such an unsubstantiated rumor spread through a large 
development community. Also, the source of your information is 
extremely important to me.

Thank you,
Gloria
> I definitely don't want to get into database wars, but here are some 
> things to think about for throughput decisions:
>    - for the standard use of mailman, I don't think we need a high 
> throughput database.  Systers has 3000 members, which I suspect is on 
> the high side for mailman lists, and even on days when we are going 
> fast and furious, mailman can keep up (and I don't think we are 
> particularly well tuned for throughput).  In fact, I consider slowness 
> a feature, as it keeps people from posting even more often (they don't 
> see the responses as quickly), and it's a rare topic that increased 
> posting makes better :-).  Maybe Terri can comment on whether there 
> are situations where mailman lists need high/higher throughput.
>    - now, for archives, a different approach may be important -- you 
> want to pull out the messages that match the request and show them (or 
> at least show the first 10, and I don't know if they come out of the 
> database in the order we will want to show them).  That does need to 
> be "fast enough", which is probably "pretty fast".
>
> We definitely need an abstraction layer that allows us to (at some 
> point -- maybe not this summer) experiment with different databases, 
> so that we can find the right balance between the two needs I describe 
> above.
>
> Robin
>
> On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net> wrote:
>
>     On 07/23/2010 01:20 AM, Jennifer Redman wrote:
>
>         On Thu, Jul 22, 2010 at 9:40 PM, Yian Shang
>         <yian.shang at gmail.com> wrote:
>
>
>             I just set up and played with MongoDB for a bit, and I added
>             the indexes we'd probably need on the wiki:
>             http://systers.org/systers-dev/doku.php/database-schema.
>
>
>
>         MongoDB is INCREDIBLY unstable and in no way ready for production.
>
>     I VERY STRONGLY DISAGREE. Some people simply aren't using it
>     correctly. This article states the contrary, in fine detail:
>
>     http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
>
>
>         There is this slight problem of complete and totally
>         unexpected data-loss.
>
>     Again, I don't believe this. In this one particular case, the data
>     was most likely corrupted and needed to be repaired, which can
>     happen to any database, irrespective of type.
>
>     Also, 10gen makes it very obvious in their presentations that if
>     you want immediate, verified writes (ACID compliance), you set a
>     flag, and it is accomplished. They state clearly why this is
>     necessary, the trade-offs for each choice, and that data integrity
>     is not an issue. The person who wrote the fear-mongering article
>     also fails to mention that even CouchDB works this way, but has the
>     opposite default value. Read this for an accurate assessment of
>     MongoDB's durability:
>
>     http://blog.mongodb.org/post/381927266/what-about-durability
>
>     As an aside, MySQL users have been making these trade-offs for
>     about a decade now, when choosing between MyISAM and InnoDB, so
>     this is not a new philosophy by any means.
>
>
>         http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
>
>         If you want to play with a NoSQL db, look at CouchDB -- but I
>         highly advise against using MongoDB for anything.
>
>     I strongly disagree with this statement, and know many who use it
>     in production for high throughput, have not lost data, and are
>     happy with the results. I am also going to ask you for your data
>     substantiating this statement, since I cannot find any. If you're
>     going to point to this one article, this will not suffice.
>
>         If you want to switch to a non-relational db server, then you
>         need to do some analysis first on why that is a correct
>         decision, and then evaluate all the tools. Stability and
>         extensibility are extremely important. How many people running
>         Mailman are going to be willing to run MongoDB? Or CouchDB, for
>         that matter?
>
>     I think people will be more willing than you realize, given the
>     throughput benefits. Like I said before, db architects have been
>     making the ACID vs. non-ACID compliance decision for many years,
>     and this is not a new concept.
>
>     Also, I am more fearful of projects which use relational, heavily
>     indexed, slow-insert databases on potentially high traffic
>     projects, without doing the analysis first on where this model
>     will fail. Your view of having to do analysis to prove the need
>     for a non-relational db strikes me as odd. We need to do analysis
>     to prove that a relational db is the correct solution to this
>     problem, and will not so badly affect performance that it brings
>     Mailman to its knees.
>
>     How many people are going to be willing to adjust their SHMMAX
>     setting to make it run faster, and potentially bring their
>     machine to a grinding halt? How many people will be willing to
>     auto-vacuum their db hourly? How many others will be willing to
>     change the MySQL storage engine mid-stream, from InnoDB to the
>     less reliable, non-ACID-compliant yet faster-insert MyISAM?
>     How many others will feel as if they have to switch to a much
>     more expensive architecture, just to allow the relational db to
>     keep up with massive numbers of single-record inserts? There is
>     most certainly trouble ahead with this model, and analysis needs
>     to be done on it as well.
>
>     Maybe the solution is to write a flexible DB API, and retrofit one
>     relational and one non-relational db solution to it. No single
>     group of developers should be making a predetermined decision
>     about what database solutions would be appropriate for anyone
>     else's mail traffic, IMHO. There is no one-size-fits-all solution
>     to this issue. So I propose that we make this solution flexible
>     via a DB API, and provide two preconfigured, out-of-the-box
>     solutions while allowing the user to come up with their own.
>
>     Gloria
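Robin and Gloria both propose a storage abstraction with interchangeable backends. Below is a minimal Python sketch of that idea, under stated assumptions: every name here (`MessageStore`, `SQLiteStore`, `InMemoryDocumentStore`, the four-field `Message` record) is hypothetical and not part of Mailman. The stdlib `sqlite3` module stands in for the relational backend, and a plain in-memory list of dicts stands in for a document database such as MongoDB or CouchDB. The `recent()` call mirrors Robin's archive case: fetch the newest 10 messages for one list, already in display order.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical minimal message record; a real archive stores far more.
    list_name: str
    message_id: str
    date: str  # ISO 8601 string, so lexical order matches time order
    subject: str


class MessageStore(ABC):
    """Hypothetical storage API that list/archive code would call."""

    @abstractmethod
    def add(self, msg: Message) -> None: ...

    @abstractmethod
    def recent(self, list_name: str, limit: int = 10) -> list[Message]:
        """Return up to `limit` messages for one list, newest first."""


class SQLiteStore(MessageStore):
    """Relational backend built on the stdlib sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " list_name TEXT, message_id TEXT PRIMARY KEY,"
            " date TEXT, subject TEXT)"
        )
        # Index matching the archive query: filter by list, sort by date.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_list_date"
            " ON messages (list_name, date)"
        )

    def add(self, msg: Message) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (msg.list_name, msg.message_id, msg.date, msg.subject),
            )

    def recent(self, list_name: str, limit: int = 10) -> list[Message]:
        rows = self.conn.execute(
            "SELECT list_name, message_id, date, subject FROM messages"
            " WHERE list_name = ? ORDER BY date DESC LIMIT ?",
            (list_name, limit),
        )
        return [Message(*row) for row in rows]


class InMemoryDocumentStore(MessageStore):
    """Stand-in for a document (NoSQL) backend: one dict per message."""

    def __init__(self) -> None:
        self.docs: list[dict] = []

    def add(self, msg: Message) -> None:
        self.docs.append(asdict(msg))

    def recent(self, list_name: str, limit: int = 10) -> list[Message]:
        matches = [d for d in self.docs if d["list_name"] == list_name]
        matches.sort(key=lambda d: d["date"], reverse=True)
        return [Message(**d) for d in matches[:limit]]


def demo(store: MessageStore) -> list[str]:
    # The caller only uses the abstract API, never a specific backend.
    store.add(Message("systers-dev", "<1@example.org>", "2010-07-22T21:40:00", "Schema"))
    store.add(Message("systers-dev", "<2@example.org>", "2010-07-23T01:20:00", "Re: Schema"))
    store.add(Message("other-list", "<3@example.org>", "2010-07-23T02:00:00", "Unrelated"))
    return [m.subject for m in store.recent("systers-dev")]


for backend in (SQLiteStore(), InMemoryDocumentStore()):
    print(type(backend).__name__, demo(backend))
```

Callers see only `add()` and `recent()`, so a MongoDB or CouchDB backend could be added later behind the same interface without touching the calling code, which is the kind of experimentation Robin describes.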


To contribute to this conversation, send mail to <Robin Jeffries >

