Most serious software applications store and retrieve data from databases. Relational databases are widely used to reliably manage large amounts of related data. Here are some database design and programming techniques that work well with Abject-Oriented Programming.
abject adj.: sunk to or existing in a low state or condition.
ACID: Atomicity, Consistency, Isolation, Durability
ACID describes the core properties of a database that guarantee reliable data storage and retrieval. Atomic means do all of the operations or none of them — don’t forget to insert a customer record when inserting an order. Consistency means using the same data types throughout your database. Isolate your database operations from the rest of your code, under multiple layers of database APIs or behind an ORM (more on those later). Durable means the database should still work after a server crash.
SQL: Structured Query Language
SQL is the language all relational databases understand. SQL is an ANSI standard so your SQL code should work on any of the popular flavors: Oracle, Microsoft SQL Server, Sybase, PostgreSQL, MySQL, SQLite, and MS Access. SQL is called a declarative language because you declare what the database server should do, as opposed to an imperative language. SQL is designed to be English-like and easy to use, so executives, managers and other non-technical people can use relational databases.
Back in the 60s or 70s IBM researcher Ted Codd described how relational databases should work in mathematical terms. Without getting into lots of details, if you didn’t understand algebra and calculus in college you will probably not understand how relational databases work under the hood. You can use SQL without understanding the math, though. Codd’s laws are a little dated and sometimes SQL shows its age: like Fortran, SQL has more functions for doing trigonometry than it has for handling character strings.
VARCHAR: the basic SQL data type
SQL offers a few different types for storing data, but experienced database programmers tend to use
VARCHAR whenever possible.
VARCHAR means VARying CHARacter — a character string that can hold various data, similar to the
Variant type in Visual Basic. If you use SQL’s more obscure types —
DATETIME, etc. — you will run into problems converting to and from the types your programming language uses. Pretty much every language understands character strings.
NULLS: the other useful SQL data type
In relational databases
NULL means “no value” or “I don’t know the value.”
NULL is not a type per se — any column can allow nulls. If you’ve ever had to invent “magic number” values like -1 and 99999 to indicate “none of the above” in your code you will find SQL’s NULLs very handy.
Keys and indexes
Relational database tables should have a key — a value identifying each record (or “row”). Keys should be unique in the database table, and if you can make them unique they are called primary keys. An index is a flag that tells the database that you will frequently be searching for records using the indexed column. Not having indexes on your columns will make the database seem slow, so remember to create indexes for each column. A database that can’t find records using indexes is not very useful.
In e-commerce applications a transaction is the process of collecting payment from the customer before shipping the product. This logic should be reflected in the database: associate a payment with each order. In the context of databases the term transaction has come to mean any sequence of operations that are related in such a way that they need to happen in a certain order. A sequence of operations that look like a single operation (like a SQL stored procedure) is called an atomic transaction.
Joins and Normalization
A join pulls data from multiple tables into a single result. It’s very hard to get joins right so it’s best to design your database to avoid needing them. Joins are also very slow for the database engine, especially outer joins. You may be unlucky enough to use a database that was created by a Database Administrator (DBA) or someone showing off, though, so here’s a simple example of using a join. The DBA created two tables, one for managers and another for employees:
MANAGERS id dept name ---------------------------------- 1 IT Tom Tamale 2 Accoun Bob Bunny 3 Shippi Vince Vega EMPLOYEES id dept mgrid name ---------------------------------------- 1 IT 1 Frank Fox 2 Accoun 2 Dora Explorer 3 Accoun 2 Ed Elephant 4 Shippi 3 Gimli Dwarf 5 IT 1 Dwight, Reginald 6 NULL NULL Tim The Enchanter
note: the dept column is declared
VARCHAR(6) so some department names are truncated.To list all employees and their manager’s name you need to join the two tables:
SELECT EMPLOYEES.ID, EMPLOYEES.NAME, MANAGERS.NAME FROM EMPLOYEES JOIN MANAGERS ON MANAGERS.ID = EMPLOYEES.MANAGERID
Obviously the database and the code using it would be a lot simpler if the
EMPLOYEES table just included the manager’s name. The process of simplifying a bunch of fragmented tables that need lots of joins into fewer tables with more data is called normalization. You’ll probably hear about “first normal form” and “fourth normal form” from other programmers or your DBA, if your company has one. First normal form just means all of the data fields you need to work with are in one table.
If you back up your database by copying the structure and data to another database, that’s replication. Sometimes it’s useful to have an older copy (a replica) of the database for testing, or for looking at records that have since changed. If a customer record is accidentally deleted from the main database you can sometimes recover it from a replicated copy.
ORM: Object-Relational Mapping
When object-oriented programming became popular a few years ago, programmers started to run into problems using relational databases from their OO code. This problem is caused by the object-relational impudence mismatch. One of the more popular fixes for this mismatch is the ORM. In OOP, an ORM is a software layer that maps properties of objects to records and fields in a database. The ORM’s job is to couple the application code to the database for efficiency, and to let programmers move data integrity enforcement and business rules out of the database and into application code. If you’ve ever tried to get a SQL constraint or trigger to work you’ll appreciate how much easier it is to have that kind of logic in a more modern language. Most programmers don’t understand databases or SQL, so it’s a good practice to move data management and business logic into the application.
Object Databases: The Future
One way around the object-relational impudence mismatch is storing objects directly in the database, without the tedious process of serializing to
VARCHAR fields and back to objects again. This technology is still in its infancy but shows promise. Many programmers have had success storing large objects like XML documents and JPEG/GIF graphics in their native binary format in database records, as objects. As long as you’re careful not to abuse OOP techniques such as inheritance this approach may be the best of both worlds.
Relational databases are pervasive. Understanding the concepts and learning to use SQL is a requirement for many programming jobs. Although some of the ideas are old and SQL is not as powerful as modern languages such as PHP or Visual Basic, a working knowledge of databases should be part of every professional programmer’s toolbox.