[Systers-dev] Database Schema for Message Storing
Yian Shang
yian.shang at gmail.com
Sun Jul 25 01:02:10 PDT 2010
I think the abstraction layer is a good idea for the future, makes things a
lot more open-ended in case we discover that one type of db doesn't work.
I went ahead and implemented a crude version of this abstraction today.
Basically each new database class just needs a method 'add_message(msg)',
where msg is a map of fields to values. That class can then have whatever
other methods it needs to handle adding a message. I implemented the
Postgres version, and hopefully other databases can be added in the future.
I also wrote a script to convert the current mbox archives to entries in the
database using the Postgres db class mentioned earlier.
Here's the code:
http://bazaar.launchpad.net/~systers-dev/systers/archive-ui/revision/102 and
the db structure: http://systers.org/systers-dev/doku.php/database-schema
Yian
On Sun, Jul 25, 2010 at 12:15 AM, Ann-Marie Horcher <horcheram at gmail.com>wrote:
> The abstraction layer that Robin proposes as the additional advantage of
> allowing a comparison between various database back ends.
>
> Over the life of an organization, its data may move from one repository to
> another, as technology and best practices change. An interface that is
> database-agnostic is valuable for transition, comparison, and, as Robin
> describes, for matching the database characteristics to the needs of
> certain
> sections of an application.
>
> On Fri, Jul 23, 2010 at 12:31 PM, Gloria W <strangest at comcast.net> wrote:
>
> > No problem, I respect your opinion. But irrespective of this, please do
> > pass me whatever evidence you have found to lead you to believe the
> MongoDB
> > is unfit for production. I have never heard of such a thing, I would
> really
> > dislike for such an unsubstantiated rumor to spread through a large
> > development community. Also the source of this information you have is
> > extremely important to me.
> >
> > Thank you,
> > Gloria
> >
> >> I definitely don't want to get into database wars, but some things to
> >> think about for throughput decisions
> >> - for the standard use of mailman, I don't think we need a high
> >> throughput database. Systers has 3000 members, which I suspect is on
> the
> >> high side for mailman lists, and even on days when we are going fast and
> >> furious, mailman can keep up (and I don't think we are particularly well
> >> tuned for throughput). In fact, I consider slowness a feature, as it
> keeps
> >> people from posting even more often (they don't see the responses as
> >> quickly), and it's a rare topic that increased posting makes better :-).
> >> Maybe Terri can comment on whether there are situations where mailman
> lists
> >> need high/higher throughput
> >> - now, for archives, a different approach may be important -- you want
> >> to pull out the messages that match the request and show them (or at
> least
> >> show the first 10, and I don't know if they come out of the database in
> the
> >> order we will want to show them). That does need to be "fast enough",
> which
> >> is probably "pretty fast".
> >>
> >> We definitely need an abstraction layer that allows us to (at some point
> >> -- maybe not this summer) experiment with different databases, so that
> we
> >> can find the right balance between the two needs I describe above.
> >>
> >> Robin
> >>
> >> On Fri, Jul 23, 2010 at 1:23 AM, Gloria W <strangest at comcast.net<mailto:
> >> strangest at comcast.net>> wrote:
> >>
> >> On 07/23/2010 01:20 AM, Jennifer Redman wrote:
> >>
> >> On Thu, Jul 22, 2010 at 9:40 PM, Yian
> >> Shang<yian.shang at gmail.com <mailto:yian.shang at gmail.com>>
> wrote:
> >>
> >>
> >>
> >> I just set up and played with MongoDB for a bit and I
> >> added the indexes
> >> we'd
> >> probably need on the wiki:
> >> http://systers.org/systers-dev/doku.php/database-schema.
> >>
> >>
> >>
> >> MongoDB is INCREDIBLY unstable and in no way ready for
> production.
> >>
> >> I VERY STRONGLY DISAGREE. Some people simply aren't using it
> >> correctly. This article states the contrary, in fine detail:
> >>
> >>
> >>
> http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
> >>
> >>
> >> There is
> >> is this slight problem of complete and totally unexpected
> >> data-loss.
> >>
> >> Again, I don't believe this. In this one particular case, it is
> >> most likely corrupted and needs to be repaired, which can happen
> >> to any database, irrespective of type.
> >>
> >> Also, 10gen makes it very obvious in their presentations that if
> >> you want immediate, verified writes (ACID compliance), you set a
> >> flag, and it is accomplished. They state clearly why this is
> >> necessary, the trade-offs for each choice, and that data integrity
> >> is not an issue. The person who wrote the fear-monging article
> >> also fails to state that even couchdb works this way, but has the
> >> opposite default value. Read this for an accurate assessment of
> >> durability of MongoDB:
> >>
> >> http://blog.mongodb.org/post/381927266/what-about-durability
> >>
> >> As an aside, these trade-offs have been made by MySql users for
> >> over twenty years now, when choosing between MyISAM and InnoDB, so
> >> this is not a new philosophy by any means.
> >>
> >>
> >>
> >> http://www.mikealrogers.com/2010/07/mongodb-performance-durability/
> >>
> >> If you want to play with a NoSQL db look at CouchDB -- but I
> >> highly advise
> >> against using MongoDB for anything.
> >>
> >> I strongly disagree with this statement, and know many who use it
> >> in production for high throughput, have not lost data, and are
> >> happy with the results. I am also going to ask you for your data
> >> substantiating this statement, since I cannot find any. If you're
> >> going to point to this one article, this will not suffice.
> >>
> >> If you want to switch to a non-relational db server then you
> >> need to do some
> >> analysis first on why that is a correct decision and then
> >> evaluate all the
> >> tools. Stability and extensibility is extremely important.
> >> How many people
> >> running Mailman are going to be willing to run MongoDB? Or
> >> CouchDB for that
> >> matter?
> >>
> >> I think people will be willing than you realize, given the
> >> throughput benefits. Like I said before, db architects have been
> >> making the ACID vs. non-ACID compliance decision for many years,
> >> and this is not a new concept.
> >>
> >> Also, I am more fearful of projects which use relational, heavily
> >> indexed, slow-insert databases on potentially high traffic
> >> projects, without doing the analysis first on where this model
> >> will fail. Your view of having to do analysis to prove the need
> >> for a non-relational db strikes me as odd. We need to do analysis
> >> to prove that a relational db is the correct solution to this
> >> problem, and will not so badly affect performance that it brings
> >> Mailman to it's knees.
> >>
> >> How many people are going to be willing to adjust their SHMMAX
> >> setting sot make it run faster, and potentially bring their
> >> machine to a grinding halt? How many people will be willing to
> >> auto-vacuum their db hourly? How many others will be willing to
> >> change the MySQL indexing scheme mid-stream, from InnoDB to the
> >> less reliable, non ACID compliant yet faster-insert model
> >> (MyISAM)? How many others will feel as if they have to switch to
> >> a much more expensive architecture, just to allow the relational
> >> db to keep up with massive amounts of single-record inserts? Most
> >> certainly, there's trouble ahead using this model, and analysis
> >> needs to be done for this model.
> >>
> >> Maybe the solution is to write a flexible DB API, and retrofit one
> >> relational and one non-relational db solution to it. No single
> >> group of developers should be making a predetermined decision
> >> about what database soultions would be appropriate for anyone
> >> else's mail traffic, IMHO. There is no one-size-fits-all solution
> >> to this issue. So I propose that we make this solution flexible
> >> via a DB API, and provide two preconfigured, out-of-the-box
> >> solutions while allowing the user to come up with their own.
> >>
> >> Gloria
> >>
> >>
> >>
> >> Gloria
> >>
> >>
> >>
> >>
> >> To unsubscribe from this conversation, send email to
> >> <systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >
> >> <mailto:systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> <systers-dev%252Bdatabase3%252Bunsubscribe at systers.org<systers-dev%25252Bdatabase3%25252Bunsubscribe at systers.org>
> >>>
> >> or
> >>
> >> visit
> >> <
> >>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >> <
> >>
> http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >> >>
> >> To contribute to this conversation, use your mailer's reply-all or
> >> reply-group command or send your message to
> >> systers-dev+database3 at systers.org<systers-dev%2Bdatabase3 at systers.org>
> <systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> >> <mailto:systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> <systers-dev%252Bdatabase3 at systers.org<systers-dev%25252Bdatabase3 at systers.org>
> >
> >> >
> >>
> >> To start a new conversation, send email to
> >> <systers-dev+new at systers.org <systers-dev%2Bnew at systers.org> <
> systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>> <mailto:
> >> systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org> <
> systers-dev%252Bnew at systers.org <systers-dev%25252Bnew at systers.org>>>>
> >>
> >> To unsubscribe entirely from systers-dev, send email to
> >> <systers-dev-request at systers.org
> >> <mailto:systers-dev-request at systers.org>> with subject unsubscribe.
> >>
> >>
> >>
> >
> >
> > To unsubscribe from this conversation, send email to <
> > systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>
> <systers-dev%2Bdatabase3%2Bunsubscribe at systers.org<systers-dev%252Bdatabase3%252Bunsubscribe at systers.org>
> >>
> > or visit <
> > http://systers.org/mailman/options/systers-dev?override=147&preference=0
> >
> > To contribute to this conversation, use your mailer's reply-all or
> > reply-group command or send your message to
> > systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org><
> systers-dev%2Bdatabase3 at systers.org<systers-dev%252Bdatabase3 at systers.org>
> >
> > To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> <systers-dev%2Bnew at systers.org <systers-dev%252Bnew at systers.org>>
> > >
> > To unsubscribe entirely from systers-dev, send email to <
> > systers-dev-request at systers.org> with subject unsubscribe.
> >
>
>
>
> --
> Ann-Marie Horcher
>
>
> To unsubscribe from this conversation, send email to <
> systers-dev+database3+unsubscribe at systers.org<systers-dev%2Bdatabase3%2Bunsubscribe at systers.org>>
> or visit <
> http://systers.org/mailman/options/systers-dev?override=147&preference=0>
> To contribute to this conversation, use your mailer's reply-all or
> reply-group command or send your message to
> systers-dev+database3 at systers.org <systers-dev%2Bdatabase3 at systers.org>
> To start a new conversation, send email to <systers-dev+new at systers.org<systers-dev%2Bnew at systers.org>
> >
> To unsubscribe entirely from systers-dev, send email to <
> systers-dev-request at systers.org> with subject unsubscribe.
>
To contribute to this conversation, send mail to <Ann-Marie Horcher >
More information about the Systers-dev
mailing list