[Systers-dev] Database Schema for Message Storing

Gloria W strangest at comcast.net
Fri Jul 23 01:23:43 PDT 2010


On 07/23/2010 01:20 AM, Jennifer Redman wrote:
> On Thu, Jul 22, 2010 at 9:40 PM, Yian Shang<yian.shang at gmail.com>  wrote:
>
>    
>> I just set up and played with MongoDB for a bit and I added the indexes
>> we'd
>> probably need on the wiki:
>> http://systers.org/systers-dev/doku.php/database-schema.
>>
>>
>>      
> MongoDB is INCREDIBLY unstable and in no way ready for production.
I VERY STRONGLY DISAGREE. Some people simply aren't using it correctly. 
This article states the contrary, in fine detail:

http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
>   There is
> this slight problem of complete and totally unexpected data-loss.
>    
Again, I don't believe this. In this one particular case, it is most 
likely corrupted and needs to be repaired, which can happen to any 
database, irrespective of type.

Also, 10gen makes it very obvious in their presentations that if you 
want immediate, verified writes (ACID compliance), you set a flag, and 
it is accomplished. They state clearly why this is necessary, the 
trade-offs for each choice, and that data integrity is not an issue. The 
person who wrote the fear-mongering article also fails to state that even 
CouchDB works this way, but has the opposite default value. Read this 
for an accurate assessment of the durability of MongoDB:

http://blog.mongodb.org/post/381927266/what-about-durability

As an aside, MySQL users have been making these trade-offs for years 
now, when choosing between MyISAM and InnoDB, so this is 
not a new philosophy by any means.
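
To make the trade-off concrete, here is a minimal sketch in plain Python. 
It is not MongoDB's API or anyone's real storage layer: append_record, 
read_records, demo and the durable flag are all hypothetical names, and 
a flat file stands in for the database. The point is only that "fast" 
and "verified" are two settings of the same write path.

```python
import os
import tempfile


def append_record(path, record, durable=False):
    """Append one record (bytes) to the file at `path`.

    durable=False: return once the data has been handed to the OS. Fast,
                   but a power loss or OS crash may still lose it.
    durable=True:  flush and fsync before returning. Slower, but the OS
                   has acknowledged the write as committed to disk.
    """
    with open(path, "ab") as f:
        f.write(record + b"\n")
        if durable:
            f.flush()              # push Python's buffer down to the OS
            os.fsync(f.fileno())   # ask the OS to commit it to storage


def read_records(path):
    """Return every stored record as a list of bytes objects."""
    with open(path, "rb") as f:
        return f.read().splitlines()


def demo():
    """Write one fast and one verified record, then read both back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "messages.log")
        append_record(path, b"fast write")
        append_record(path, b"verified write", durable=True)
        return read_records(path)
```

Both calls store the record; the flag only decides whether the caller 
waits for confirmation, which is the choice each deployment has to make.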

> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
>
> If you want to play with a NoSQL db look at CouchDB -- but I highly advise
> against using MongoDB for anything.
>    
I strongly disagree with this statement, and know many who use it in 
production for high throughput, have not lost data, and are happy with 
the results. I am also going to ask you for your data substantiating 
this statement, since I cannot find any. If you're going to point to 
this one article, this will not suffice.
> If you want to switch to a non-relational db server then you need to do some
> analysis first on why that is a correct decision and then evaluate all the
> tools.  Stability and extensibility is extremely important.  How many people
> running Mailman are going to be willing to run MongoDB?  Or CouchDB for that
> matter?
>    
I think more people will be willing than you realize, given the throughput 
benefits. Like I said before, db architects have been making the ACID 
vs. non-ACID compliance decision for many years, and this is not a new 
concept.

Also, I am more fearful of projects which use relational, heavily 
indexed, slow-insert databases on potentially high traffic projects, 
without doing the analysis first on where this model will fail. Your 
view of having to do analysis to prove the need for a non-relational db 
strikes me as odd. We need to do analysis to prove that a relational db 
is the correct solution to this problem, and will not so badly affect 
performance that it brings Mailman to its knees.

How many people are going to be willing to adjust their SHMMAX setting 
to make it run faster, and potentially bring their machine to a 
grinding halt? How many people will be willing to auto-vacuum their db 
hourly? How many others will be willing to change the MySQL storage 
engine mid-stream, from InnoDB to the less reliable, non-ACID-compliant 
yet faster-insert model (MyISAM)? How many others will feel as if they 
have to switch to a much more expensive architecture, just to allow the 
relational db to keep up with massive amounts of single-record inserts? 
Most certainly, there's trouble ahead using this model, and analysis 
needs to be done for this model.

Maybe the solution is to write a flexible DB API, and retrofit one 
relational and one non-relational db solution to it. No single group of 
developers should be making a predetermined decision about what database 
solutions would be appropriate for anyone else's mail traffic, IMHO. 
There is no one-size-fits-all solution to this issue. So I propose that 
we make this solution flexible via a DB API, and provide two 
preconfigured, out-of-the-box solutions while allowing the user to come 
up with their own.
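
A rough sketch of what such a DB API could look like, in Python since 
Mailman is written in Python. Everything here is an assumption made up 
for illustration: the MessageStore interface, the SqliteStore and 
DocumentStore classes, the archive helper, and the three-field message 
layout are not taken from Mailman or from the wiki schema. SQLite stands 
in for the relational backend, and a plain dict stands in for a document 
database so the example runs without a server.

```python
import sqlite3
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical storage interface; callers depend only on this."""

    @abstractmethod
    def save(self, message_id, list_name, body):
        """Store (or overwrite) one message."""

    @abstractmethod
    def get(self, message_id):
        """Return the message as a dict, or None if it is unknown."""


class SqliteStore(MessageStore):
    """Relational backend, using the sqlite3 module from the stdlib."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "message_id TEXT PRIMARY KEY, "
            "list_name TEXT NOT NULL, "
            "body TEXT NOT NULL)"
        )

    def save(self, message_id, list_name, body):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?, ?)",
                (message_id, list_name, body),
            )

    def get(self, message_id):
        row = self.conn.execute(
            "SELECT message_id, list_name, body FROM messages "
            "WHERE message_id = ?",
            (message_id,),
        ).fetchone()
        if row is None:
            return None
        return {"message_id": row[0], "list_name": row[1], "body": row[2]}


class DocumentStore(MessageStore):
    """Non-relational stand-in: one document (dict) per message ID."""

    def __init__(self):
        self.docs = {}

    def save(self, message_id, list_name, body):
        self.docs[message_id] = {
            "message_id": message_id,
            "list_name": list_name,
            "body": body,
        }

    def get(self, message_id):
        doc = self.docs.get(message_id)
        return dict(doc) if doc is not None else None


def archive(store, message_id, list_name, body):
    """Application code: works with any MessageStore implementation."""
    store.save(message_id, list_name, body)
    return store.get(message_id)
```

The archive function never needs to know which backend it was given, so 
a site could swap SqliteStore for its own MessageStore subclass without 
touching the calling code.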

Gloria




