[Systers-dev] Database Schema for Message Storing

Jennifer Redman jenred at gmail.com
Tue Jul 27 13:01:31 PDT 2010


On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net> wrote:

>
> I strongly disagree with this statement, and know many who use it in
> production for high throughput, have not lost data, and are happy with the
> results. I am also going to ask you for your data substantiating this
> statement, since I cannot find any. If you're going to point to this one
> article, this will not suffice.


1) I'm coming off of several conferences (Open Source Bridge and OSCON)
where I spent a fair amount of time talking to DB developers, et al. The
consensus is that MongoDB is not ready for wide-scale adoption in the
majority of production environments.

2) MongoDB requires 2 servers and replication for durability:
http://blog.mongodb.org/post/381927266/what-about-durability

3) There is a 2.5 GB data limit on 32-bit systems:
http://blog.mongodb.org/post/137788967/32-bit-limitations

While 2 and 3 may be acceptable for high-end users with the resources to
play with new (and, yes, innovative) technologies, the average person or
organization running a Mailman listserv does not have those resources.

All of the Systers servers are currently 32-bit systems.


>> If you want to switch to a non-relational DB server then you need to do
>> some analysis first on why that is a correct decision and then evaluate all
>> the tools. Stability and extensibility are extremely important. How many
>> people running Mailman are going to be willing to run MongoDB? Or CouchDB
>> for that matter?
>>
>>
> I think people will be more willing than you realize, given the throughput
> benefits. Like I said before, DB architects have been making the ACID vs.
> non-ACID compliance decision for many years, and this is not a new concept.
>
> Also, I am more fearful of projects which use relational, heavily indexed,
> slow-insert databases on potentially high traffic projects, without doing
> the analysis first on where this model will fail. Your view of having to do
> analysis to prove the need for a non-relational db strikes me as odd. We
> need to do analysis to prove that a relational db is the correct solution to
> this problem, and will not so badly affect performance that it brings
> Mailman to its knees.
>
>
Generally speaking, it is best practice to use what is widely available and
stable in production environments. If you want to make an argument for a
change from the status quo, you need to provide:

1) Performance measures that indicate there is indeed a problem
2) An analysis, covering both the good and the bad, of why the change from
the stable technology is necessary and why the new software is better.

This is especially necessary given how volatile development processes in
open source projects can be. This is also why Red Hat has RHEL: rapid
adoption of innovative technologies is often very bad for the stability of a
production environment.

In the case of Mailman (MM) development, and specifically the archives
project, I would expect to see the following questions addressed:

1) How does MongoDB fit in with upstream development, particularly since
the focus there is on SQL-based DBs?
2) How does MongoDB improve the performance of X functionality?
3) Is the durability/performance trade-off worth it in terms of system
function and end-user experience?
4) Is requiring widespread adoption of 64-bit systems worth the
performance bump? Remember this involves lots of time and upgrades across a
varied user base.
5) Is the investment in resources to implement replication worth the
performance bump?
6) Is the investment in training the thousands of MM listserv admins worth
the performance bump?

Right now we are using PostgreSQL. We have not experienced any performance
issues. PostgreSQL is a mature project with a vibrant development community
and has a proven history of stability: about 15 years, as opposed to ~2 for
MongoDB.

Additionally, because MongoDB is a fairly new piece of technology, frequent
updates and upgrades for bug fixes are to be expected, which means more
patching in the production environment: always a bad thing.

So, in my opinion, MongoDB is not ready for wide-scale adoption in
production environments and is likely to result in stability and scalability
problems for the average Mailman listserv operator and probably most
sysadmins running a basic LAMP/LAPP setup.

I don't want to dissuade anyone from playing with and exploring NoSQL
options and thinking about alternatives to RDBMSs; I'm doing so myself. But
MongoDB is not the right solution for this project at this time. And you
need to think differently about production environments: stability and
scalability are key.

Jen
