[Systers-dev] Database Schema for Message Storing
Robin Jeffries
robin at jeffries.org
Fri Jul 23 09:23:59 PDT 2010
I definitely don't want to get into database wars, but some things to think
about for throughput decisions
- for the standard use of mailman, I don't think we need a high
throughput database. Systers has 3000 members, which I suspect is on the
high side for mailman lists, and even on days when we are going fast and
furious, mailman can keep up (and I don't think we are particularly well
tuned for throughput). In fact, I consider slowness a feature, as it keeps
people from posting even more often (they don't see the responses as
quickly), and it's a rare topic that increased posting makes better :-).
Maybe Terri can comment on whether there are situations where mailman lists
need high/higher throughput
- now, for archives, a different approach may be important -- you want to
pull out the messages that match the request and show them (or at least show
the first 10, and I don't know if they come out of the database in the order
we will want to show them). That does need to be "fast enough", which is
probably "pretty fast".
We definitely need an abstraction layer that allows us to (at some point --
maybe not this summer) experiment with different databases, so that we can
find the right balance between the two needs I describe above.
Robin
On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net> wrote:
> On 07/23/2010 01:20 AM, Jennifer Redman wrote:
>
>> On Thu, Jul 22, 2010 at 9:40 PM, Yian Shang<yian.shang at gmail.com> wrote:
>>
>>
>>
>>> I just set up and played with MongoDB for a bit and I added the indexes
>>> we'd
>>> probably need on the wiki:
>>> http://systers.org/systers-dev/doku.php/database-schema.
>>>
>>>
>>>
>>>
>> MongoDB is INCREDIBLY unstable and in no way ready for production.
>>
> I VERY STRONGLY DISAGREE. Some people simply aren't using it correctly.
> This article states the contrary, in fine detail:
>
>
> http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
>
> There is
>> is this slight problem of complete and totally unexpected data-loss.
>>
>>
> Again, I don't believe this. In this one particular case, it is most likely
> corrupted and needs to be repaired, which can happen to any database,
> irrespective of type.
>
> Also, 10gen makes it very obvious in their presentations that if you want
> immediate, verified writes (ACID compliance), you set a flag, and it is
> accomplished. They state clearly why this is necessary, the trade-offs for
> each choice, and that data integrity is not an issue. The person who wrote
> the fear-monging article also fails to state that even couchdb works this
> way, but has the opposite default value. Read this for an accurate
> assessment of durability of MongoDB:
>
> http://blog.mongodb.org/post/381927266/what-about-durability
>
> As an aside, these trade-offs have been made by MySql users for over twenty
> years now, when choosing between MyISAM and InnoDB, so this is not a new
> philosophy by any means.
>
>
> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
>>
>> If you want to play with a NoSQL db look at CouchDB -- but I highly advise
>> against using MongoDB for anything.
>>
>>
> I strongly disagree with this statement, and know many who use it in
> production for high throughput, have not lost data, and are happy with the
> results. I am also going to ask you for your data substantiating this
> statement, since I cannot find any. If you're going to point to this one
> article, this will not suffice.
>
> If you want to switch to a non-relational db server then you need to do
>> some
>> analysis first on why that is a correct decision and then evaluate all the
>> tools. Stability and extensibility is extremely important. How many
>> people
>> running Mailman are going to be willing to run MongoDB? Or CouchDB for
>> that
>> matter?
>>
>>
> I think people will be willing than you realize, given the throughput
> benefits. Like I said before, db architects have been making the ACID vs.
> non-ACID compliance decision for many years, and this is not a new concept.
>
> Also, I am more fearful of projects which use relational, heavily indexed,
> slow-insert databases on potentially high traffic projects, without doing
> the analysis first on where this model will fail. Your view of having to do
> analysis to prove the need for a non-relational db strikes me as odd. We
> need to do analysis to prove that a relational db is the correct solution to
> this problem, and will not so badly affect performance that it brings
> Mailman to it's knees.
>
> How many people are going to be willing to adjust their SHMMAX setting sot
> make it run faster, and potentially bring their machine to a grinding halt?
> How many people will be willing to auto-vacuum their db hourly? How many
> others will be willing to change the MySQL indexing scheme mid-stream, from
> InnoDB to the less reliable, non ACID compliant yet faster-insert model
> (MyISAM)? How many others will feel as if they have to switch to a much
> more expensive architecture, just to allow the relational db to keep up with
> massive amounts of single-record inserts? Most certainly, there's trouble
> ahead using this model, and analysis needs to be done for this model.
>
> Maybe the solution is to write a flexible DB API, and retrofit one
> relational and one non-relational db solution to it. No single group of
> developers should be making a predetermined decision about what database
> soultions would be appropriate for anyone else's mail traffic, IMHO. There
> is no one-size-fits-all solution to this issue. So I propose that we make
> this solution flexible via a DB API, and provide two preconfigured,
> out-of-the-box solutions while allowing the user to come up with their own.
>
> Gloria
>
>
>
> Gloria
>
>
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>
To contribute to this conversation, send mail to <Gloria W >
More information about the Systers-dev
mailing list