[Systers-dev] Database Schema for Message Storing
Gloria W
strangest at comcast.net
Fri Jul 23 01:23:43 PDT 2010
On 07/23/2010 01:20 AM, Jennifer Redman wrote:
> On Thu, Jul 22, 2010 at 9:40 PM, Yian Shang<yian.shang at gmail.com> wrote:
>
>
>> I just set up and played with MongoDB for a bit and I added the indexes
>> we'd
>> probably need on the wiki:
>> http://systers.org/systers-dev/doku.php/database-schema.
>>
>>
>>
> MongoDB is INCREDIBLY unstable and in no way ready for production.
I VERY STRONGLY DISAGREE. Some people simply aren't using it correctly.
This article states the contrary, in fine detail:
http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
> There is
> this slight problem of complete and totally unexpected data-loss.
>
Again, I don't believe this. In this one particular case, it is most
likely corrupted and needs to be repaired, which can happen to any
database, irrespective of type.
Also, 10gen makes it very obvious in their presentations that if you
want immediate, verified writes (ACID compliance), you set a flag, and
it is accomplished. They state clearly why this is necessary, the
trade-offs for each choice, and that data integrity is not an issue. The
person who wrote the fear-mongering article also fails to state that even
CouchDB works this way, but has the opposite default value. Read this
for an accurate assessment of durability of MongoDB:
http://blog.mongodb.org/post/381927266/what-about-durability
As an aside, MySQL users have been making these trade-offs for years
now, when choosing between MyISAM and InnoDB, so this is
not a new philosophy by any means.
> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
>
> If you want to play with a NoSQL db look at CouchDB -- but I highly advise
> against using MongoDB for anything.
>
I strongly disagree with this statement, and know many who use it in
production for high throughput, have not lost data, and are happy with
the results. I am also going to ask you for your data substantiating
this statement, since I cannot find any. If you're going to point to
this one article, this will not suffice.
> If you want to switch to a non-relational db server then you need to do some
> analysis first on why that is a correct decision and then evaluate all the
> tools. Stability and extensibility is extremely important. How many people
> running Mailman are going to be willing to run MongoDB? Or CouchDB for that
> matter?
>
I think more people will be willing than you realize, given the throughput
benefits. Like I said before, db architects have been making the ACID
vs. non-ACID compliance decision for many years, and this is not a new
concept.
Also, I am more fearful of projects which use relational, heavily
indexed, slow-insert databases on potentially high traffic projects,
without doing the analysis first on where this model will fail. Your
view of having to do analysis to prove the need for a non-relational db
strikes me as odd. We need to do analysis to prove that a relational db
is the correct solution to this problem, and will not so badly affect
performance that it brings Mailman to its knees.
How many people are going to be willing to adjust their SHMMAX setting
to make it run faster, and potentially bring their machine to a
grinding halt? How many people will be willing to auto-vacuum their db
hourly? How many others will be willing to change the MySQL storage
engine mid-stream, from InnoDB to the less reliable, non-ACID-compliant
yet faster-insert model (MyISAM)? How many others will feel as if they
have to switch to a much more expensive architecture, just to allow the
relational db to keep up with massive amounts of single-record inserts?
Most certainly, there's trouble ahead using this model, and analysis
needs to be done for this model.
Maybe the solution is to write a flexible DB API, and retrofit one
relational and one non-relational db solution to it. No single group of
developers should be making a predetermined decision about what database
solutions would be appropriate for anyone else's mail traffic, IMHO.
There is no one-size-fits-all solution to this issue. So I propose that
we make this solution flexible via a DB API, and provide two
preconfigured, out-of-the-box solutions while allowing the user to come
up with their own.
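To make the proposal concrete, here is a minimal sketch of what such a
DB API might look like. Everything in it is an assumption for
illustration: the names MessageStore, SQLiteStore, DocumentStore and
open_store are hypothetical and not part of Mailman, SQLite stands in
for the relational backend, and a plain in-memory dict stands in for a
document store such as MongoDB or CouchDB.

```python
import sqlite3
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical storage interface; the names are illustrative only."""

    @abstractmethod
    def save(self, message_id, list_name, body):
        """Store one message, replacing any message with the same id."""

    @abstractmethod
    def get(self, message_id):
        """Return the stored message as a dict, or None if it is unknown."""


class SQLiteStore(MessageStore):
    """Relational backend; SQLite stands in for any SQL database."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "message_id TEXT PRIMARY KEY, list_name TEXT, body TEXT)"
        )

    def save(self, message_id, list_name, body):
        # The connection as a context manager commits on success
        # and rolls back if the insert raises.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?, ?)",
                (message_id, list_name, body),
            )

    def get(self, message_id):
        row = self._conn.execute(
            "SELECT list_name, body FROM messages WHERE message_id = ?",
            (message_id,),
        ).fetchone()
        if row is None:
            return None
        return {"list_name": row[0], "body": row[1]}


class DocumentStore(MessageStore):
    """Non-relational backend; a dict stands in for a document database."""

    def __init__(self):
        self._docs = {}

    def save(self, message_id, list_name, body):
        self._docs[message_id] = {"list_name": list_name, "body": body}

    def get(self, message_id):
        doc = self._docs.get(message_id)
        return None if doc is None else dict(doc)


# The two preconfigured choices; a site could register its own class here.
BACKENDS = {"relational": SQLiteStore, "document": DocumentStore}


def open_store(kind):
    """Build the backend named in a (hypothetical) configuration setting."""
    return BACKENDS[kind]()
```

The calling code would only ever see save() and get(), so a site could
pick a backend, or plug in its own class, from configuration without
touching the rest of the code.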
Gloria