Database Thaw Reheated

In his November 24 article Database Thaw Martin Fowler wrote about the lock-in of relational/SQL databases and some possible alternatives. I was with him until he wrote that “you can’t get [a] bigger breach of encapsulation than that,” referring to a central database shared by multiple applications. (The indented quotes below are verbatim from Fowler’s article).

For many organizations today, the primary pattern for integration is Shared Database Integration – where multiple applications are integrated by all using a common database. When you have these Integration Databases, it’s important that all these applications can easily get at this shared data – hence the all important role of SQL. The role of SQL as mostly-standard query language has been central to the dominance of databases.

That’s a fair argument, though I don’t agree that SQL has been central to the dominance of databases, or central to making shared database integration the “primary pattern” for integration. Before SQL-based databases took over in the mid-1980s, integration through shared non-relational databases was already the norm. SQL standardizes how users and application get data in and out of the database, and how database schemas are described, but it’s just a language that more or less replaced many proprietary and unstandardized languages that did the same thing.

The heating of the database space comes from the presence of alternatives to integration – in particular the rise of web services. Under various banners there’s a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing, I’ve never liked the approach of multiple applications tightly coupled through a common database – you can’t get bigger breach of encapsulation than that.

I’ll go along with the first part, about applications talking with XML over HTTP. I hesitate to call that talking “integration,” though; it’s not comparable to the kind of integration trusted to databases. I’ll talk more about that point later. What jumped out at me was the statement “you can’t get a bigger breach of encapsulation” than a shared database. That struck me as a misunderstanding of what a shared database is for, and particularly a misunderstanding of how a relational/SQL database should be used.

All modern relational databases engines, with the arguable exception of MySQL, are intended to be more than dumb backing store. Database engines are designed to encapsulate data and the valid operations on the data, and to guarantee data integrity to multiple instances of multiple applications accessing the data simultaneously. Large enterprise databases are usually implemented like that. Using those databases, programmers don’t send raw SQL SELECT and UPDATE statements to the database server, and they don’t serialize objects into dumb tables for convenience. The API offered by the database to client applications can and should be the same kind of API you’d expect in a good object-oriented library: well-defined operations on abstract objects, with the underlying data and its representation invisible. If your code is doing this:

INSERT INTO invoices (customer, date, amount) ...;
UPDATE customers SET balance = balance + amount;

and not something like this:

NEW_INVOICE(customer, date, amount);

you are doing it wrong. A shared database should not expose the invoices and customers tables. Instead there should be an API implemented with stored procedures in the database, or a common data access layer written in a non-SQL procedural language to hide the underlying data representation.

I’ll restate Fowler’s concern like this: You can’t get a bigger breach of encapsulation than spreading knowledge of the data and operations on it across multiple applications.

It isn’t the database engine, or the concept of relational databases, or SQL that is the breach of encapsulation. It’s that far too many programmers are either unaware of how to use a database to promote encapsulation, or they don’t see the problem. My own experience is that a lot of programmers learned relational databases from MySQL, which until recently didn’t support things like transactions, stored procedures, views, and triggers, which are required to make a database into something more than a convenient backing store.

The view of databases as something that promotes bad software implementation is common. Ignorance of how relational databases work and how they should be used to make application integration easier and more robust is widespread and growing, evidenced by the ongoing “alternative database” movement that trumpets every reinvention of old ideas (key/value pairs, network databases, hierarchical databases, etc.) as something new and revolutionary. Fowler continues:

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being Integration Databases to Application Databases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping – such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.

I can’t imagine a worse breach of encapsulation than integrating multiple “application databases” over HTTP. And I can’t imagine a worse threat to data integrity. If each application can have its own copy of data, and its own rules for operating on the data, either the applications are going to have a lot of duplicated code and data (a violation of the DRY principle to say the least), or each application’s copy of the data will be possibly wrong and untrustworthy.

SQL is not comparable to XML over HTTP, for reasons that should be obvious, but apparently aren’t. SQL is a declarative language for describing legal operations on a database. XML over HTTP is a data transport mechanism. XML+HTTP can’t address the problems already solved by relational database engines: integrity, consistency, reliability, transactions, locking, and so on. All of those things that are important to real applications will either be ignored or implemented over and over again, many times incorrectly.

Fowler concludes:

I don’t think this means that relational databases will disappear – after all they are the right choice for many situations. But it does mean that now application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options.

I’ll agree that relational databases are not the right choice for everything, and that there are applications that don’t need to care about data integrity or encapsulation or sharing across applications. I hope that when developers look at those “other options” they carefully consider what they might be giving up going down the non-relational path. It will help if developers making those decisions have a serious understanding of relational databases and the powerful features of modern RDBMSs, and don’t just dismiss a successful and mature set of tools as an obsolete holdover from the computing stone age.

I’m always interested in what might be happening in the world of databases, which is why I read Martin Fowler’s article in the first place, but if this is what is happening — a giant step backward in data integrity and encapsulation — I am going to keep my relational database and SQL skills sharp for the coming Rails/MySQL, XML+HTTP, “application database” apocalypse.

5 comments ↓

#1 Mike on 12.05.08 at 3:16 pm

Hi Greg,

I agree with most of what you have to say here. Databases *should* encapsulate your business logic across multiple application languages, as that is what they are there for: to define and protect the data model. Unfortunately, as Rob said, the languages (SQL and its extensions) used by the various database vendors is too limited to accurately reflect this business logic in any meaningful, discoverable way.

One possible solution to this is CodaServer, an open-source tool I am developing. You can get it at http://codaserver.com.

It uses a SQL-like language that provides much richer application metadata and includes the ability to define workflow at the schema level, so application languages can discover it easily and use it to build smart user interfaces. (Be warned: I am uploading a new release tonight that fixes a pretty serious bug, so it may be best to try downloading tomorrow. :)

I do support Martin’s idea of an application orientation for data, but think that this integration needs to happen over (as you said) well-defined interfaces.

-Mike

#2 Greg Jorgensen on 12.05.08 at 4:13 pm

Mike: I agree that PL/SQL and T-SQL are not the greatest languages for implementing business logic. But Oracle and SQL Server support other languages for writing stored procedures, and I believe DB/2 and PostgreSQL do as well.

In my article I wrote “… there should be an API implemented with stored procedures in the database, or a common data access layer written in a non-SQL procedural language to hide the underlying data representation.” I don’t think it matters if the API to the shared data is implemented inside the database engine or in a layer between the database engine and the database clients; what matters is that there is a single API to enforce integrity and consistency, not different implementations in every client application.

#3 Mike on 12.05.08 at 4:49 pm

Hi Greg,

I think we agree here for the most part.

Respectfully, the fact that database engines allow different languages to be used to code procedures inside their environment is sort of beside the point; familiar syntax is great, but the issue with SQL databases is that they don’t have a rich enough set of metadata available for their most common usage scenarios.

For instance, literally every application I have worked on implements its own user management and permissions system outside of that provided by SQL, because SQL permissions are inadequate to describe how users actually interact with various tables. Everything is an UPDATE, basically. User management and permissions are a critical part of business logic, forced to live in application code somewhere.

I do generally agree with the idea of having stored procedures (or something like them) surrounding complicated operations, but I think they only address half of the problem. What does our workflow look like? What are the relationships between our tables? How can we make our user interfaces more responsive to the database schema? Stored procedures certainly help make sure things can’t be updated incorrectly, but SQL’s lack of easily discoverable metadata still makes it difficult to know the *meaning* of the schema across multiple application languages.

-Mike

#4 Dan on 12.07.08 at 6:17 pm

Greg,

I’m not sure that you understand what Fowler means when he says ApplicationDatabases (http://martinfowler.com/bliki/ApplicationDatabase.html). He does not mean that each “application can have its own copy of data”, there is no copying of data; each application encapsulates some of a global schema, the only way for any other application to get the data is by talking the encapsulating application (via xml over http for example, but it could be via any interface).
Instead of one large databases, there are lots of small database. None of the databases duplicate data from another, there is no violation of the DRY principles.

The technology stack for each application can be upgraded and migrated independently. Apps can be deployed and scaled independently. The critical part is how the global schema is broken up into the small databases. If it is done well you will have a high performing maintainable system. If this is done poorly then you will have a slow mess.

#5 Marcel Popescu on 07.03.09 at 1:47 am

In http://typicalprogrammer.com/?p=16 you took the RDB experts to task for warning about this exact problem: people who would do anything to actually learn how to use RDBs and replace them with whatever. “I’ve never seen a claim that MapReduce is a replacement for RDBMS.” says a comment… and yet, I have seen hundreds of alleged “better” technologies than RDBs, including MapReduce. Weird to compare these two articles.