Typical Programmer

A glittering performance of rare perception


Popular articles
All articles

About me
Hire me

Abject-Oriented Databases

24 Jul 2007

Most serious software applications store and retrieve data from databases. Relational databases are widely used to reliably manage large amounts of related data. Here are some database design and programming techniques that work well with Abject-Oriented Programming.

abject adj.: sunk to or existing in a low state or condition.

ACID: Atomicity, Consistency, Isolation, Durability

ACID describes the core properties of a database that guarantee reliable data storage and retrieval. Atomic means do all of the operations or none of them — don’t forget to insert a customer record when inserting an order. Consistency means using the same data types throughout your database. Isolate your database operations from the rest of your code, under multiple layers of database APIs or behind an ORM (more on those later). Durable means the database should still work after a server crash.

SQL: Structured Query Language

SQL is the language all relational databases understand. SQL is an ANSI standard so your SQL code should work on any of the popular flavors: Oracle, Microsoft SQL Server, Sybase, PostgreSQL, MySQL, SQLite, and MS Access. SQL is called a declarative language because you declare what the database server should do, as opposed to an imperative language. SQL is designed to be English-like and easy to use, so executives, managers and other non-technical people can use relational databases.

Codd’s Rules

Back in the 60s or 70s IBM researcher Ted Codd described how relational databases should work in mathematical terms. Without getting into lots of details, if you didn’t understand algebra and calculus in college you will probably not understand how relational databases work under the hood. You can use SQL without understanding the math, though. Codd’s laws are a little dated and sometimes SQL shows its age: like Fortran, SQL has more functions for doing trigonometry than it has for handling character strings.

VARCHAR: the basic SQL data type

SQL offers a few different types for storing data, but experienced database programmers tend to use VARCHAR whenever possible. VARCHAR means VARying CHARacter — a character string that can hold various data, similar to the Variant type in Visual Basic. If you use SQL’s more obscure types — DECIMAL, BIGINT, DATETIME, etc. — you will run into problems converting to and from the types your programming language uses. Pretty much every language understands character strings.

NULLS: the other useful SQL data type

In relational databases NULL means “no value” or “I don’t know the value.” NULL is not a type per se — any column can allow nulls. If you’ve ever had to invent “magic number” values like -1 and 99999 to indicate “none of the above” in your code you will find SQL’s NULLs very handy.

Keys and indexes

Relational database tables should have a key — a value identifying each record (or “row”). Keys should be unique in the database table, and if you can make them unique they are called primary keys. An index is a flag that tells the database that you will frequently be searching for records using the indexed column. Not having indexes on your columns will make the database seem slow, so remember to create indexes for each column. A database that can’t find records using indexes is not very useful.

Transactions

In e-commerce applications a transaction is the process of collecting payment from the customer before shipping the product. This logic should be reflected in the database: associate a payment with each order. In the context of databases the term transaction has come to mean any sequence of operations that are related in such a way that they need to happen in a certain order. A sequence of operations that look like a single operation (like a SQL stored procedure) is called an atomic transaction.

Joins and Normalization

A join pulls data from multiple tables into a single result. It’s very hard to get joins right so it’s best to design your database to avoid needing them. Joins are also very slow for the database engine, especially outer joins. You may be unlucky enough to use a database that was created by a Database Administrator (DBA) or someone showing off, though, so here’s a simple example of using a join. The DBA created two tables, one for managers and another for employees:

MANAGERS
id    dept    name
----------------------------------
 1    IT      Tom Tamale
 2    Accoun  Bob Bunny
 3    Shippi  Vince Vega

EMPLOYEES
id    dept    mgrid  name
----------------------------------------
 1    IT      1      Frank Fox
 2    Accoun  2      Dora Explorer
 3    Accoun  2      Ed Elephant
 4    Shippi  3      Gimli Dwarf
 5    IT      1      Dwight, Reginald
 6    NULL    NULL   Tim The Enchanter

Note: The dept column is declared VARCHAR(6) so some department names are truncated. To list all employees and their manager’s name you need to join the two tables:

SELECT EMPLOYEES.ID, EMPLOYEES.NAME, MANAGERS.NAME
FROM EMPLOYEES
JOIN MANAGERS ON MANAGERS.ID = EMPLOYEES.MANAGERID

Obviously the database and the code using it would be a lot simpler if the EMPLOYEES table just included the manager’s name. The process of simplifying a bunch of fragmented tables that need lots of joins into fewer tables with more data is called normalization. You’ll probably hear about “first normal form” and “fourth normal form” from other programmers or your DBA, if your company has one. First normal form just means all of the data fields you need to work with are in one table.

Replication

If you back up your database by copying the structure and data to another database, that’s replication. Sometimes it’s useful to have an older copy (a replica) of the database for testing, or for looking at records that have since changed. If a customer record is accidentally deleted from the main database you can sometimes recover it from a replicated copy.

ORM: Object-Relational Mapping

When object-oriented programming became popular a few years ago, programmers started to run into problems using relational databases from their OO code. This problem is caused by the object-relational impudence mismatch. One of the more popular fixes for this mismatch is the ORM. In OOP, an ORM is a software layer that maps properties of objects to records and fields in a database. The ORM’s job is to couple the application code to the database for efficiency, and to let programmers move data integrity enforcement and business rules out of the database and into application code. If you’ve ever tried to get a SQL constraint or trigger to work you’ll appreciate how much easier it is to have that kind of logic in a more modern language. Most programmers don’t understand databases or SQL, so it’s a good practice to move data management and business logic into the application.

Object Databases: The Future

One way around the object-relational impudence mismatch is storing objects directly in the database, without the tedious process of serializing to VARCHAR fields and back to objects again. This technology is still in its infancy but shows promise. Many programmers have had success storing large objects like XML documents and JPEG/GIF graphics in their native binary format in database records, as objects. As long as you’re careful not to abuse OOP techniques such as inheritance this approach may be the best of both worlds.

Conclusion

Relational databases are pervasive. Understanding the concepts and learning to use SQL is a requirement for many programming jobs. Although some of the ideas are old and SQL is not as powerful as modern languages such as PHP or Visual Basic, a working knowledge of databases should be part of every professional programmer’s toolbox.

Comments

John, 25 July 2007 at 11:49 am

Middle Tiers and Business Logic

Often you hear of “middleware” or “middle tiers”, more recently called “business logic”. The middle tier is a component of the outmoded design model which attempts to separate application computation from the data storage mechanism. However, as modern application design has proven, the tighter the data storage mechanism is integrated into the application, the more robust, maintainable, and faster the application will be. To achieve this, all computations on the data should be performed using Stored Procedures and Trigger. Stored Procedures are written in the most modern and powerful programming language there is — much more powerful that application programming languages — and offer the flexibility, type-safety, flow-control constructs, exception handling, multitudes of libraries, and other language features that you need to manipulate the complex data structures of your application. The wide availability and standardization of Stored Procedure Language will soon lead to the elimination of the old “stand alone” languages like C++ or Java.

The database can not be reasonably expected to maintain Consistency if you are constantly changing the data out from under it. That’s where Trigger comes in. Trigger, named after The Lone Ranger’s companion, is the work horse of the database. You must take care to maintain Consistency by putting Trigger to work. Each time data is changed — either updated, inserted, or even inspected — Trigger should run wild. Sometimes, to maintain Consitency, Trigger needs to modify some more data. That’s ok, since Trigger will run some more to make sure the modified data is also Consistent.

Ben Finney, 25 July 2007 at 7:48 pm

You’re an evil man, Greg; more please 🙂 I need to know all about Abject-Oriented methodologies if I’m to survive in today’s IT world.

maetl, 25 July 2007 at 8:37 pm

hilarious!

and sadly true, in terms of common maladaptions and misunderstandings

ted stockwell, 27 July 2007 at 8:39 am

It’s ‘object-relational impedance mismatch’, not impudence

Ben Finney, 29 July 2007 at 6:01 am

ted: You may be mistaking this for an article on Object-Oriented Databases. In working with Abject-Oriented Databases, there most certainly is an impudence mismatch.

Jeff, 30 July 2007 at 11:12 am

The process of simplifying a bunch of fragmented tables that need lots of joins into fewer tables with more data is called normalization. You’ll probably hear about “first normal form” and “fourth normal form” from other programmers or your DBA, if your company has one. First normal form just means all of the data fields you need to work with are in one table.

That is completely, 100% wrong. Are you just joking around or being dead serious? If I missed the joke and I’m taking your words too literally, I apologize.

Jeff, 30 July 2007 at 11:17 am

AH! I re-read more closely ! Thank goodness you are joking, I got a little scared there!! 🙂

Thanks for a good read.

– Jeff

El, 10 August 2007 at 8:05 pm

Nice introduction to ORDBMS.

I disagree on VARCHAR part, though. Modern database systems allow a great amount of different data types and the ability to define your own to different extents (from simple DOMAINS to full fledged data types with custom input/output/indexing methods and overloaded operators and functions).

This actually enables the database to manage data in a consistent and efficient manner. You simply can’t get that out of pure VARCHAR.

The problem of data conversion between the database and middleware is also not very big. Almost every language out there has a binding/driver interfacing with the database and transparently performing safe type conversion. Moreover, almost every datatype can be CAST AS TEXT when needed.

Other than that, great post!

Bill, 20 August 2007 at 11:30 am

This site is great!

I’m going to go back through all of my documentation for FISMA compliance, and see what I can adapt to fit the definitions presented here.

Keep the site running though, as I’m going to include lots of references to this site, and I have a lot of documentation to go through.

Another government office admitting the prevalence of Abject-Oriented Programming.