Complaining about relational databases is a staple theme of programmer blogs. Why are so many programmers irritated and frustrated with relational databases? Why do the perceived intricacies of SQL and the “object-relational impedance mismatch” launch so many rants? Why are DBAs more hated than managers? I have some ideas.
Programmers Conflate RDBMSs with SQL SQL is the standard language for describing, querying, and manipulating data residing in a Relational Database Management System (RDBMS). The deficiencies of SQL, real or imagined, have little to do with relational theory, data integrity, or database management. The SQL language has been criticized by the most distinguished proponents of relational databases. Once you learn relational theory and how (and why) to normalize, SQL’s problems will matter a lot less.
RDBMSs Are Old, Like COBOL and Fortran
Most programmers don’t find databases interesting or fun, so they dismiss databases as “legacy” technology that doesn’t know when it’s time to die. RDBMSs are approximately as old as the C language, and younger than Lisp and Forth, which are trendy again. RDBMSs are still around for two big reasons: There is no serious competing data management technology, and a huge amount of data is committed to RDBMSs. If there was an alternative technology that solved the problems RDBMSs solve so well and offered additional advantages database-based applications would be slowly migrating. Instead we have MySQL and SQLite, low-end open source implementations of 35-year-old technology.
There Are Too Many Incompatible Versions Of SQL
Although SQL is an ANSI standard every vendor has extended the language and implemented parts of the language differently. Oracle’s dialect has some serious deviations from the standard, but then again Oracle and a lot of their installed base pre-date the ANSI standard. The differences between SQL dialects are frequently exaggerated by programmers. In real life converting from one RDBMS to another is such a major undertaking that resolving differences in SQL dialects is only a small part of the problem. Writing a database library that can support multiple RDBMSs is obviously complicated by multiple implementations, but this is a problem with a lot of languages and tools, not specifically an RDBMS or SQL problem.
Who Needs Theory — Just Give Me The Data
Because most of what programmers do has no theoretical basis or system of proof, programmers back away from anything that looks like hard math. Failure to understand and appreciate relational theory has launched countless bad databases (and thousands of witless articles condemning SQL). The idea that there is a right way to design a normalized database–and that the wrong ways are provably wrong–is alien to most programmers. Database management doesn’t come down to a matter of opinion like OOP class hierarchies or coding style.
Most professional programmers know how it feels to see an amateur, unfamiliar with Knuth or any programming book containing equations, implement their own sort routine. That’s how people who understand relational theory feel when they see a badly-designed database. Relational theory and RDBMSs are old and well-established now, so it’s hard not to think a lot of programmers are willfully ignorant.
Relational Databases Don’t Play Nice With My Objects
When OOP took over as the dominant programming paradigm all of the older techniques and technologies were dismissed as bad and unenlightened. In the case of RDBMSs this scorn has a technical name: Object-Relational Impedance Mismatch. RDBMSs can be used as backing store for objects, but mapping object relationships, hierarchies, and references to and from a database is tedious and ugly. A generic code layer to do this (called an Object-Relational Mapping or ORM) is complicated enough, but the incompatibilities of RDBMS vendors and versions (see above) make ORMs really hard to get right. Relational databases are based on sets, not on objects, and relations in a database are based on matching values, not on pointers. There’s definitely an apples vs. oranges problem, but neither technology is necessarily wrong or bad. Object databases may eventually offer a workable solution to the object-relational mismatch, but without a theoretical basis comparable to relational theory it will be hard to make a case for moving valuable data out of RDBMSs.
Some of the most entertaining complaints about the “impedance mismatch” are written by Ruby on Rails advocates–entertaining because by design Rails can’t play nice with legacy databases. The right opinion to have in that community is that legacy databases should be replaced, DBAs should be bypassed by programmers installing MySQL on their own servers, and data integrity should be enforced in every application that uses the data.
Relational Databases Don’t Play Nice With Anything
SQL is a declarative language. Most programmers work with imperative or functional languages, where the code describes how to do something. The SQL SELECT statement may look like a command telling the database which rows (records) to retrieve, but really it’s a description of the relations in the database that satisfy the conditions in the statement. In addition to the imperative/declarative disconnect the data types commonly supported in RDBMSs don’t always map cleanly to the host or calling language. This problem has been around for a long time–OOP just exposed it to a new generation of programmers. One partial solution is embedded SQL, where SQL statements are part of the host language, the types are mapped behind the scenes, and imperative control structures (like loops) can hide RDBMS features like cursors. When tempted to kick RDBMSs out of the playground remember they were probably there first.
Database Administrators (DBAs) Cramp My Style
Companies commit a lot of important and valuable data to RDBMSs, and DBAs are charged with making sure that data is secure, backed up, and protected from programmers deploying broken and untested code. Whatever programmers think of themselves, losing data is just as disastrous as–and frequently more permanent than–a crashing program. Maintaining data integrity is what RDBMSs do, and DBAs are supposed to make sure the RDBMS is allowed to do that. Every DBA I’ve worked with has a lot of stories about programmers losing or corrupting big chunks of valuable data, but programmers mainly complain that DBAs won’t let them do whatever they want. DBAs are called “high priests” and derided as petty tyrants. Some DBAs are incompetent or difficult, but that’s true of programmers, too. A good DBA can design correct, robust databases and help programmers access databases efficiently and safely.
Before flying off the handle about DBAs, consider that the database probably has to support multiple applications, simultaneously, preventing each application from corrupting data or stomping on the other applications. This is a pretty big problem and RDBMSs have offered a workable, scalable solution for a long time.
Recommended Database and SQL Books
- Database in Depth: Relational Theory for Practitioners, C. J. Date
- An Introduction to Database Systems, Eighth Edition, C. J. Date
- Practical Issues in Database Management: A Reference for the Thinking Practitioner, Fabian Pascal
- The Database Relational Model: A Retrospective Review and Analysis : A Historical Account and Assessment of E. F. Codd’s Contribution to the Field of Database Technology, C. J. Date
- Joe Celko’s SQL for Smarties: Advanced SQL Programming Third Edition, Joe Celko
- The Art of SQL, Stephane Faroult and Peter Robson
Ryan, 26 September 2007 at 12:36 am
This is a great article. I’m a programmer who loves relational databases. I see many other programmers that have a knee-jerk reaction to relational databases, and it infuriates me. It’s good to see some of the specific complaints laid out and addressed in a non-judgmental manner. I’ve experienced many of these issues first-hand, but never really thought of them as the same issues that cause “database animosity.” Your article helps me understand where those developers are coming from, and helps me to think about what an organization can do to address these problems.
tim, 26 September 2007 at 1:15 am
This is news to me! I like technologies older than SQL, I like declarative languages, I like mathematics, I haven’t had any problems with my ORM (amazingly), and I haven’t actually written raw SQL in ages.
I thought we didn’t like RDBMSs because they don’t really support features that people expect from modern systems, like complex data structures, object versioning, or (worse) schema versioning.
admin, 26 September 2007 at 1:24 am
Tim, there are a few of us but we are outnumbered by the “SQL sucks!” crowd.
The relational model has no problem with complex/compound data structures, but none of the vendors have done much to support non-scalar column types unless you count XML, which is another can of worms altogether.
Versioning is something I’ve struggled with. In lieu of a workable alternative I’ve just accepted this as an inherent shortcoming of RDBMS implementations. I keep my schema in subversion but it’s close to impossible to coordinate that with the data and with applications that use the database.
Chris Date is always careful to draw a distinction between the relational model and the commercial (or open source) implementations, which all fall short. As a programmer the distinction is small consolation, but it also offers hope that the current RDBMS vendors, or an upstart newcomer, could address some of the identified problems. There’s nothing in Codd’s rules that precludes complex types or versioning.
Miklos Hollender, 26 September 2007 at 1:53 am
I agree that relational databases and SQL are not the same, and it’s SQL that often sucks. SQL is wonderful for complex queries, but it sucks at the very basic CRUD and also it sucks for querying data the most typical way: by 1:N relationships. C’mon, all my tables are at least BCNF but I think they are 4NF too, so why can’t I just write SELECT SUM(lines.price * lines.quantity) from customers.orders.lines AS lines WHERE customers.id = @cust_id, why do I need all those superfluous joins? Can someone justify that?
I have a feeling that the designers of SQL mainly thought of static databases where you write complex queries to report stuff but you don’t have to care about the daily stuff, about people entering data and stuff like that. I think they had something else in mind, like, a database where you upload a year’s sales data at once and then you just analyse it.
Having said that, I think if I would not be working in a system where the client application is tightly integrated to the IDE, I’d write 90% of the system in stored procedures. If it’d be web application I think I’d just have something like one very simple web service where you can call any stored procedure with parameters and it would return the result set in JSON or something like that. I don’t see what’s so cool about ORM’s.
Franck, 26 September 2007 at 3:51 am
I remember attending conferences in the early 90’s where people selling “new” OO databases (O2, Versant) were telling us that RDBMS was a 80s technology and that the world will migrate to OODBMS… The fact is this never happened (at least on a large scale), because OO technology never delivered the performance needed for large production applications.
Kun Xi, 26 September 2007 at 6:59 am
Tim, what is your favorite ORM?
I dislike SQL since it is such a big headache for development and deployment. Until recently, I found django, very nice ORM for RAD web application. But still frustrated working on client/server or desktop applications.
Rob Goretsky, 26 September 2007 at 7:11 am
Rob Goretsky says: Working with SQL every day, I’ve learned to love the expressiveness and power of its (admittedly somewhat cumbersome) syntax. I have yet to find an instance where a problem could not be solved with SQL (again, not always NEATLY, but the job gets done). Having worked with Rails a bit, I think it’s a bit dangerous to have programmers looking at the DB as just some abstract persistence layer without having any more understanding of how it works — these are the same programmers who will complain that RDBMS technology is inefficient when they should have just realized they needed to create a few b-tree indexes and use those indexed columns as their query criteria.
TopKak, 26 September 2007 at 7:13 am
SQL is a third generation language (3GL), get with the times and use a fourth generation language like Progress Software and cut your coding effort by 70% . If ‘Time to Market’ is critical, there is no quicker solution – granted there are licensing fees, which is why most Fortune 100 companies use it to drive their businesses.
Keith Sader, 26 September 2007 at 7:30 am
My one gripe with sql is it’s lack of modularity. SQL is a bad functional language in as much as it makes me repeat and repeat and repeat the same statements over and over and over again.
Attempts to patch over SQL’s lack of modularity have resulted in bad languages like PL-SQL and T-SQL, both of which try to cram imperative syntax on to a functional domain. D/Tutorial D still remains a pipe dream for any in-use database at this point, so we’re stuck with an awful language.
RBDMS theory is great, but like Date et al. the execution is poor at best, but businesses find even this poor execution useful.
Matt Revelle, 26 September 2007 at 7:32 am
I’ve never heard another programmer say “SQL sucks”. What I have heard is, “why are we using a RDBMS as storage for objects?”.
To the web app developer, a RDBMS is a storage container for structured data that is easily backed-up. It’s used instead of files for performance reasons (i.e., indexes) and instead of an OODBMS because there is not one readily-available (i.e., open source and free).
The modern position of DBA seems unnecessary; however, the responsibilities are real. Backups and availability are the responsibility of the sysadmin while design is the responsibility of the programmer. The database should be designed by those using it, and those designing it should grok normalization, data integrity, etc.
“Data warehouse” environments are the possible exception where a DBA is arguably the appropriate choice since there’s data being collected but not used.
Drew Vogel, 26 September 2007 at 8:56 am
Using RDBMs is annoying to me only because the APIs for determining what went wrong with my query are not very fun to use. e.g. did my insert fail because of a syntax error vs. a duplicate key vs. a failed check vs. a locked table — all of which effects the error message that I give my user.
Rich, 26 September 2007 at 9:53 am
Before flying off the handle about DBAs, consider that the database probably has to support multiple applications, simultaneously, preventing each application from corrupting data or stomping on the other applications.
Wow. That’s the big problem. Multiple applications accessing the same database. That’s just prone to lead to problems even with the best, most diligent of DBAs. For the love of all that have to rely on such data, don’t do it!!! Wrap the database in a web service or some other remote abstract interface and have the apps access that!
admin, 26 September 2007 at 10:33 am
Miklos: “Why do I need all those superfluous joins? Can someone justify that?”
Without the explicit joins the relationships would have to be defined in the database schema. The problem is SELECT statements would no longer completely and explicitly describe the relations — the meaning of a SELECT would depend on information not visible (and maybe not available) to the user of the database. This kind of problem is usually solved with views.
In your example it’s easy to understand the shorthand for joining two 1:n relationships, but the shorthand would break down with n:n or other more complex joins, which would lead to some queries looking simpler (because they can depend on implicit behavior), but the hardest queries being just as hard as they are now.
tim, 26 September 2007 at 10:42 am
Frank: I’ve used Versant. Nobody else has, and not because it can’t deliver on performance, but because it can’t deliver on features. Its VQL is a subset of OQL, and one of the things they omitted (in 2007, still!) was being able to SELECT on a string field of an object. You’d think SELECT FROM X WHERE X.name = ‘Y’ is pretty basic stuff that would be supported everywhere, but no. It also doesn’t help that they only support a couple languages (C++ and VB?).
Kun: I’m using ActiveRecord right now, and have no complaints. I’ve not used Django, but I’ve heard it’s similar. What’s frustrating about using it on a desktop app?
Matt: There’s a free and open-source OODBMS called db4o. I don’t know if it’s any good, as it was Java-only (at the time — it’s since added C#) and didn’t have the bindings I was looking for.
admin, 26 September 2007 at 10:47 am
Matt: “The database should be designed by those using it, and those designing it should grok normalization, data integrity, etc.”
You’re assuming a database is used by one application. A more common scenario is a legacy database that has lots of corporate information, and that database was originally designed for a green-screen or client-server application. The legacy applications still use the database, but now you need to access the database from a brand new web application. Turning the database over to programmers is not a great idea unless they understand and take into account the legacy applications that are using the database. In that context you really need a DBA who concentrates on data integrity, regardless of what code or group of programmers need to use it.
Rich: “Wow. That’s the big problem. Multiple applications accessing the same database. … Wrap the database in a web service or some other remote abstract interface and have the apps access that!”
A web services interface doesn’t help those legacy apps (see above). Wrapping another API around the RDBMS is hard to do, whether it’s an ORM or a web services layer. When you’re done you’ll find that making HTTP requests to retrieve and manipulate data has exactly the same problems as using SQL: lack of expressiveness, data type mapping problems, flow of control issues. You’ll have to re-invent a good chunk of what SQL does, and the result will almost certainly be slower and buggier, without the data integrity guarantees the RDBMS provides.
That said, I have implemented web service APIs to legacy databases, abstracting operations on the data instead of the database itself. This is how programmers should be using databases anyway, not embedding SQL statements all over the place. In shops with DBAs programmers don’t like having to get permission to write and deploy stored procedures (if they know how to write them to begin with), and DBAs often don’t like procedural code in the database. Writing the abstraction layer in procedural code is the approach I usually take, so most of the application does not interact directly with the database.
In my experience ORMs are abstractions that leak right off, because the application has to “know” way too much about the structure and contents of the database — the ORM doesn’t hide the database enough.
jojo, 26 September 2007 at 11:39 am
You missed a super important one: RDBMS’ don’t scale well. Ever. If you build an architecture around an RDBMS and the service becomes enormously popular, you’ll end up doing what YouTube and everyone else does: sharding the DB, because an RDBMS == SPOF. This sucks on so many levels its hard to know where to start.
If you are smart, you instead eschew the RDBMs model from the very beginning, as google did with BigTable, and you implement something that is infinitely scalable, such as a DHT or, at the very least, a DB model like that of CouchDB.
RDBMS’ are incredibly weak when it comes to scalability, despite all of the work that has gone into trying to make them scalable over the past several decades. And its not because that scalability work was bad — its because the RDBMS model is inherently unscalable from the getgo.
Admitting that is the first step.
admin, 26 September 2007 at 11:56 am
jojo: You raise several issues because scalability has multiple dimensions. Just dismissing RDBMSs as weak and inherently unscalable denies the vast number of huge relational databases that have scaled just fine in size and number of clients.
Scaling to handle huge amounts of data efficiently while maintaining integrity is what RDBMSs do best. Other models for data storage and retrieval (file system, hashes, OODBMSs, XML databases) have a hard time keeping up with RDBMSs when the database gets big. Most databases are small and at that scale almost anything works OK, but with enough data RDBMSs pull ahead.
Distributing the data across servers to reduce the SPOF risk and to avoid overloading individual servers with too much work is a different problem. High-end RDBMSs offer replication, publish/subscribe, and distributed databases to address these problems. Managing distributed data while maintaining integrity is a hard problem for any data management system, RDBMSs included.
BigTable and CouchDB are solutions that offer some advantages in scalability and allowing distribution of data, but they don’t solve the same problem as RDBMSs. There is some overlap in functionality, but BigTable and CouchDB have different design goals. If the need is to store a large amount of mostly static unrelated data and find a few large blobs of it quickly, with no SPOF, that’s not a great application for an RDBMS. If the need is to manage and report on millions of related transactions in a homogenous way — like banking transactions — an RDBMS is going to work a lot better than CouchDB.
tracker1, 26 September 2007 at 12:21 pm
jojo, I have to disagree to some extent… I think that RDBMS systems can be very scalable. Both Oracle and MS-SQL can scale very well. If you have mostly-read data, then mySQL does a fair job of scaling. Now, these aren’t necessarily the best implimentations, but are often easier than migrating an entire data system to a new platform. Personally, I like MS-SQL’s system for replication of data. And Oracle can run on some very big hardware platforms, with replication… mySQL has a pretty efficient system for replicating read-only data, which is useful in web applications, I think slashdot should prove this (aside from some inappropriate index choices in the past).
I would love to see some replication options put into PostgreSQL myself, that work well and are F/OSS solutions. I haven’t worked with IBM’s DB/2 enough to be familiar with its’ replication/scalability options.
Personally, I both love and hate RDBMS, because on one hand, it is a firm system for managing data… SQL itself is mostly portable among different implementations with usually minor differences. What I hate is manually manipulating information, or creating an ORM layer, or any number of other issues. Reporting is usually done better closer to the dbms than data entry.. I think some variations to the “rules” need to apply. The problem is people(managers) want everything to be done the same way… I think that reports are best done with views, or stored procedures without much of a layer between the request, and the display… Data entry/manipulation should have more abstraction.. db>sproc>orm>bll>model>view And there are instances where the latter is excessive.
In any case, I am a programmer, and I like RDBMS.
David Bond, 26 September 2007 at 12:43 pm
Well, I’ve just downloaded db4o and am about to give it a go. The tutorial does, indeed seem to demonstrate a simple way of storing and re-accessing all my objects, completely avoiding the SQL step. I am suspicious though – can I get the power I need? How can I pull two different related objects out at the same time? What about relationships between objects? How can I visualise my data outside my application etc. I’m willing to give it a try!
Eric Elinow, 26 September 2007 at 12:46 pm
In many of the companies in which I have found myself working, we lacked a DBA either due to our small company size or lack of need. Either I or suitably capable team member took on this responsibility and that has worked fine in most cases.
The situations usually drive the need and as several others have stated, larger multi-app accessed DB’s require a controller to ensure all remains competent with the DB. While I have met quite a few DBA’s with massive attitude problems and self-deified world views, I can clearly say that great DBA’s, truly great DBAs, are worth every penny. I have only met a few of those, and only a small subset of that group didn’t suffer the holier-than-thou attitude.
All that being said, the inter-disciplinary problems which have traditionally existed between development and dba departments can usually be rectified by simply keeping individuals from both groups involved in every step of the initial DB design phase(s). As long as everyone keeps an effective line of communication open, problems should be kept minimal, few and far between.
Zak, 26 September 2007 at 1:08 pm
I think one of the biggest problems is people using a relational database for an application built from scratch when it’s the wrong model. A lot of applications have data that’s organized hierarchically, with leaf nodes being complex data structures. Filesystems and web services are a better model for this sort of information.
When your only tool is a hammer, every problem looks like a nail. Think about what sort of data store you should be using before you begin work on a new application. You might just save yourself from smashing your thumb.
engtech, 26 September 2007 at 1:28 pm
Does anyone have a list of best practises for a programmer creating a MySQL database app using Rails?
The only time I paid any attention to RDBMS was when I was doing some entry level stuff with Oracle/VAX.
jon, 26 September 2007 at 1:40 pm
It’s a mistake to say that “by design Rails can’t play nice with legacy databases”. It’s more of a case of “by default rails doesn’t place nice with legacy databases”.
It’s entirely possible to make Rails (more specifically ActiveRecord) to work with legacy databases. It’s more work to setup, sure, but it’s not really that big of a deal as long as you understand your legacy schema.
(See http://the.rubydevelopers.net/node/25 for an intro. It’s old, but still valid.)
admin, 26 September 2007 at 2:04 pm
jon: I was commenting on the Rails gestalt, especially the opinions of David Heinemeier Hansson. Rails can of course be made to work with “legacy” databases with varying degrees of hassle.
For my source of DHH’s opinion on legacy databases and DBAs see:
http://www.loudthinking.com/arc/2005_09.html (“Choose a single layer of cleverness”).
http://www.loudthinking.com/arc/000352.html (“The DBA distortion filter”).
There are plenty of other examples. I haven’t seen his resume but I don’t think DHH has ever worked in a shop with a big legacy database supporting multiple applications that were developed at different times with different tools. And I don’t think he has worked with many DBAs. His opinions are interesting (and emphatic) but may not be based on much real experience, and since Rails is “opinionated software” the opinions of the author matter.
jon, 26 September 2007 at 2:16 pm
That’s what I hoped you were trying to say, but it wasn’t totally clear.
I absolutely agree that the typical “Rails mindset” is that legacy databases are evil and should be destroyed, and just wanted to clarify the point for anybody not familiar with Rails.
Mike Moore, 26 September 2007 at 4:16 pm
Uggh, the “Rails mindset” is not that legacy databases should be destroyed. It is that your web application should use an application database as opposed to an integration database. If you want to replicate data to and from your application and integration databases then go ahead.
The Rails default ORM library is ActiveRecord, which is a thin wrapper to the database. Therefore, the “Rails mindset” considers the rails-specific database as a persistence layer for the domain. The point it that the domain design drives the persistence layer design, not the other way around.
Shanti Braford, 26 September 2007 at 4:40 pm
Tim, okay, you got me. I’ve become lazy at writing raw SQL after using Rails for so long. =)
But for web devs, while we may not love with all our heart’s desire our particular RDBMS, it’s not as if we are jumping out of our seats to find a different paradigm.
I also used to be a DBA for the 2nd largest private company owned data warehouse in the world (SBC). Let me tell you, that job sucked!
My favorite DBA story from SBC:
“Select * From …” was not allowed under any circumstance, and they were even clever enough to write some script that checked for that.
So you might have 20 columns in a table, actually be using all of them (for admin display, or whatever) and have to rewrite your SQL to manually include each and every column in the table.
undees, 26 September 2007 at 5:10 pm
Either I’m an atypical Rails guy, or (more likely) the blog rants you’ve seen aren’t really representative. I suspect a lot of Rails folks are just quietly teaching their ActiveRecord objects how to talk to whatever whiz-bang schema the expert DBAs have cooked up. They’re not angrily posting “RDBMS is teh suxorz!” because it’s working for them.
RDBMS are just a tool to use — they’re neither the spawn of the devil, nor the only sane way to store/query your data. And that brings me to another topic.
I’ve seen at least one or two Websites (not this one!) that have expressed the sentiment that OO is for idiots and that anything shy of the (angelic music here) relational model is a waste of time.
One of the many defenses offered for this view is the accusation that OO has no theory at all, whereas relational theory has a model that’s (angelic music again) backed by science and proven to be right and you’d have to be an idiot not to believe it.
Such statements are problematic on two counts. First, they ignore the decades of research on the mathematical underpinnings of OO, or at least after-the-fact reasoning about OO (start with Luca Cardelli – http://tinyurl.com/2lmb45 (PDF).
Second, they offer no information about this wealth of relational theory that supposedly proves them right. Wikipedia would have you believe that the relational model is just something Codd and Date cooked up in their respective garages; surely there must be more to it than that.
So I’m asking an honest, non-troll question: where’s the theory? Where should a programmer interested in the relational model start reading?
admin, 26 September 2007 at 7:00 pm
undees: Thanks for the link to the Cardelli slides, even though WordPress munged the link (tinyurl is your friend!).
Here’s the difference between OOP theory (or rather type theory as it applies to OOP, which is as far as Cardelli got): You can’t use the theory to design a system of classes given a problem domain, nor can you use the theory to show how any given system of classes is incorrect. The best you can do is find inconsistencies in the types.
Contrast the current state of OOP theory with the well-defined rules of data normalization and relational algebra. A DBA or programmer who understands normalization can look at any database schema and determine the level of normalization, find the redundancies and duplications, and describe how to transform the schema to a higher normal form.
For any non-trivial collection of data there is usually more than one way to normalize the data, but each normalized schema can be tested and defended. More important every non-normalized schema can be identified and weeded out. With a normalized database relational theory can guarantee that each legal operation on the database will leave the data in a predictable, consistent state. That’s the advantage of having a mathematical foundation.
The same can’t be said of OOP design. Programmers rely on experience, judgment, received wisdom, fads, and trial-and-error that have little or no theoretical basis. OOP is not necessarily bad, and there are definable best and worst practices, but it’s not possible today to apply the same rigor to OOP as it is to relational databases.
I’m sure you can dig a little more to find that relational theory, relational algebra and calculus, and the principles of normalization were not just cooked up by Codd and Date. Date’s books are a good place to start, especially the Introduction to Database Systems (linked above). Wikipedia has quite a few relevant articles:
etc. — just follow the links and references around.
Doug @ Straw Dogs, 26 September 2007 at 11:21 pm
I’m learning to love them slowly. As I realise so much is possible in the humble SQL statement that I wasn’t once aware of.
Some of the wizardry I’ve pulled off from a well constructed MySQL statement has saved on a LOT of code.
Mike Brauwerman, 26 September 2007 at 11:39 pm
SQL is a declarative language. Most programmers work with imperative or functional languages, where the code describes how to do something. You meant: SQL is a declarative (aka functional language). Most programmers work with imperative (aka procedural) languages, where the code describes how to do something.
“Declarative” and “Functional” are (approximately) synonynms.
Emilian Bold, 27 September 2007 at 1:32 am
I don’t think programmers dislike databases per-se. They dislike the hassle of getting the data into their app. And this means writing SQL and more important, using an ORM.
Now, ORM is a way to babysit the programmers and hide them from SQL. Sadly, I think it has failed (or when it fails you are most likely lost) so I’m starting to think I’d rather write my own SQL then rely on another ORMer.
Mike Moore, 27 September 2007 at 8:38 am
The same can’t be said of OOP design. Programmers rely on experience, judgment, received wisdom, fads, and trial-and-error that have little or no theoretical basis.
I think its a mistake to dismiss OOP design as having no theoretical basis. It does, but software design is fundamentally more complex than relational theory. For as many options we have to normalize our data we have even more options to how we design our software, and folks who are learning are going to be on the domain model curve longer than the relational model curve. Languages and platforms have a huge impact on software design. You will design a domain model differently for Java or Ruby or Erlang, where a (non-application) relational model would be largely the same whether using Sybase or Oracle or MySQL.
I think this post is mis-labeled. You really haven’t explained the dislike of RDBMS by programmers who matter. But congratulations on all the traffic! 😉
Tim, 27 September 2007 at 10:58 am
I feel dirty for saying this, but:
The problem with trumpeting “Relational = Good because we’ve got math(tm)!” is that nobody uses the math. When’s the last time you saw a programmer or DBA or anybody sit down and write a proof that their data is normalized?
Similarly, I learned program correctness proofs in CS 211, and have never used them again, nor have I seen anybody ever use them again. Which is not to say they’re not cool — they are. But they’re simply not a useful tool for building programs, in part because it’s harder to write a correct proof than it is to write a program!
And I’ve seen nothing to lead me to believe that it’s any different for relational theory. Ask your DBAs which they use more in their jobs: experience and intuition, or mathematical proofs. Or ask somebody to submit a written proof that their data is normalized.
Proofs are cool, proofs are fun, but until somebody comes up with a system for writing proofs that isn’t harder than writing code or schemas, it’s not going to get used. It’s simply not the deciding factor for anybody.
undees, 27 September 2007 at 11:13 am
Thanks for your apt reply.
WordPress munged the link (tinyurl is your friend!).
Yeah, I thought of tinyurl, too, but only after the fact. L’esprit de l’escalier, I suppose….
You can’t use the theory to design a system of classes given a problem domain … The best you can do is find inconsistencies in the types.
I think the OO experts (I don’t claim to be one) would argue that some problem domains can be tackled in such a way that most potential errors are type inconsistencies. Sort of an Orwellian “revolution is impossible if you can’t express the idea of revolution” concept.
A DBA or programmer who understands normalization can look at any database schema and determine the level of normalization, find the redundancies and duplications, and describe how to transform the schema to a higher normal form.
I think you’re right. But is data normalization the same as problem solving? I mean, knowing how to make sure my representation of reindeer is in fifth normal form isn’t the same as solving the Santa Claus problem.
With a normalized database relational theory can guarantee that each legal operation on the database will leave the data in a predictable, consistent state.
That is indeed an important result. A decade or so ago, I would’ve chalked up normalization as “just good common sense” (I mean, having a string column with a comma-delimited list of product names or something just smells bad) without looking enough into the issue of why it’s so crucial.
it’s not possible today to apply the same rigor to OOP as it is to relational databases.
That may be, but there’s a big difference between “less rigor” and “no rigor.” And again, OOP is targeted (perhaps too ambitiously) at a much broader domain than just data representation.
Wikipedia has quite a few relevant articles
Yup. I read those before posting. But where to next? Would you recommend Codd 1970, or the Third Manifesto? Do I need a grad degree in mathematics to understand them?
Stephen Jones, 28 September 2007 at 2:49 am
I’m told the designers of operating tables don’t understand brain surgery either; it’s so last century.
Tom Barta, 30 September 2007 at 9:13 am
Wow. That’s the big problem. Multiple applications accessing the same database. That’s just prone to lead to problems even with the best, most diligent of DBAs.
With a good RDBMS, this is easy to deal with. Every application can be granted a set of views (either read-only or updateable) and functions necessary to get their work done. If your RDBMS allows clients to corrupt it by inserting things in the wrong order, or not connecting the dots correctly, you already have a problem.
If you can’t rely on a diligent DBA (whose job depends on the database) to protect the integrity, how can you rely on a diligent programmer to do the same? Turning something into a web service isn’t any safer than adding a layer of views/functions, and it’s a good deal more work. In fact, no amount of web services will help when you have to deal with a concurrent insert/update issue, unless you want to re-implement transactions and ACID in your web service.
If you’re using a database that doesn’t support the protection you need to segregate application interfaces (heck, even the latests Postgres still has issues with function permissions), you’re probably better off finding a better RDBMS provider.
Jack White, 3 October 2007 at 9:53 am
I’m a vaguely programmey guy and I hate RDBs too, but not too much. I find them somewhat incomprehensible. I have a basic understanding of their theory and I can appreciate it. However, I have grown up programming in BASIC, C and assembly. Things like redundant data are absolutely second nature, because you’re moving this value there so you can work on it and then you’re moving it back. If you’re using some kind of non-essential device, like a modem, you’ve then got a third copy of your data in the modem’s send memory.
I wanted to be modern (which relational databases are on a massive scale if not on a small scale – everyone’s got them now), but all this abstraction is just too much for my limited brain.
kauroff. outsourcing company, 8 October 2007 at 11:40 pm
I think OOP or relational databases are just experience of previos generations. And like any other’s experience, even good and useful, it can be accepted or no with next generations. It depends more on obedience.
Doug, 13 November 2007 at 9:17 am
I’m a programmer and I freaking love relational databases.
Lee, 13 November 2007 at 11:12 am
Sorry, disagree. I dislike them because they ARE old and DON’T solve OOP problems. SQL is NOT staying around because it’s the only solution out there; it’s rapidly being supplanted by ORMs. The problem is that the application coders who need ORMs are not the same people who develop DBMS’s for a living, so provision hasn’t caught up with requirements.
Inane User, 13 November 2007 at 11:12 am
If Relational databases cannot handle the requirements, then Hierarchical databases work fine. One need only convert the Relational output to XML.
Tjerk, 13 November 2007 at 11:45 am
At my university they have created a Native XML Databases, that builds upon the relational model. NOT on SQL. (monetdb it is called)
The main query language is XQuery and is a very extensive and expressive query language. Much better than SQL. It’s a superset of XPath.
I think that the XML Model (Infoset) is the new datamodel that is far more expressive than the relationalmodel.
I hear you say: “But XML is Bloated”
NOPE: XML human readable syntax is bloated, not the datamodel. The data model is just an hierarchical data model.
I have many more advantages, but off course: it all depends on the datamodel that you require.
admin, 14 November 2007 at 1:19 pm
Tjerk: XML databases are a reincarnation of hierarchical databases, which were made obsolete by the relational model. See:
The limitations of hierarchical databases have been described in several books; Chris Date’s “The Relational Database Model” (linked above) gives a good historical perspective.
Penguin Pete, 11 December 2007 at 5:35 am
Wow, thank you! All my life, I have hated database work, but never could put my finger on why. Even though I’m relatively comfortable with *SQL, I still treat it like something vile that I only touch when I have to use it quick and fling it away until next time.
Donald, 11 December 2007 at 7:08 pm
This article is a good read. I would also suggest reading http://database-programmer.blogspot.com its a blog that was started by a friend of mine for I consider a database expert as well as a great programmer, and its all about database fundamentals for programmers. Also check out http://andro.sf.net