Unless you are lucky, or haven’t been programming for very long, you’ve probably had to write code to work with a “legacy” system. The recruiter promises you’ll be doing all new development using the latest cool language. It may take a while for you to find out that all of the enterprise data and business logic lives in a poorly-understood, unmaintained system written in some language you’ve never heard of, running on the big beige box in the corner of the server room that has tape over the power switch. If the software vendor is still in business they can’t support the system, because it has been modified so much it can’t be upgraded to the latest release. Management has a project underway to find a replacement, but until then all new code will have to work with the zombie system.
You need to make all new code as independent of the legacy system as possible, just in case the legacy system is replaced. Even if you don’t care about planning that far ahead, you probably don’t want to just tack your new code onto the legacy system. The best approach is to write a software layer to isolate the legacy system, a library of functions that can get data in and out and invoke business logic in the old system. The legacy system probably won’t have an API you can use directly, and if it supported something like web services it wouldn’t be legacy. Code that sits at the boundary between incompatible systems can be some of the trickiest (and ugliest) but it’s usually better to do it once rather than building one bolt-on after another. Some of the techniques I’ve had to use to interface with legacy systems:
- Writing a file to a shared directory, then waiting for the legacy system to wake up and read that file, process it, and put the result in another file;
- Same as above, but with FTP;
- Writing a file somewhere, then triggering the legacy system with a COM object or something like it;
- Writing to a database, not necessarily a relational database, using some proprietary language or a clunky standard like ODBC;
- Over a telnet connection;
- Over a serial connection, maybe even over a modem because that’s the only outside communication the legacy system supports;
- Simulating a “green screen” terminal in software, essentially pretending to be a human user.
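The first technique on that list — write a file, wait, read the result — is worth sketching, because it shows how much ceremony even the "simple" approaches need. This is a minimal illustration, not code from any real system; the file-naming convention and the write-then-rename trick are my own assumptions about how such an exchange directory might work:

```python
import time
from pathlib import Path

def submit_request(exchange_dir: Path, request_id: str, payload: str,
                   timeout: float = 60.0) -> str:
    """Drop a request file in a shared directory, then wait for the
    legacy system to wake up, process it, and write a response file."""
    request_file = exchange_dir / f"{request_id}.req"
    response_file = exchange_dir / f"{request_id}.rsp"

    # Write to a temporary name, then rename, so the legacy poller
    # never picks up a half-written file.
    tmp = exchange_dir / f"{request_id}.tmp"
    tmp.write_text(payload)
    tmp.rename(request_file)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response_file.exists():
            result = response_file.read_text()
            response_file.unlink()  # clear the directory ourselves
            return result
        time.sleep(0.5)  # the legacy job may only run on a schedule
    raise TimeoutError(f"no response for request {request_id}")
```

Note everything that can go wrong here: the legacy job may be down, the directory may fill up with orphaned files, and a timeout doesn't tell you whether the request was processed or lost. That's why this code belongs inside an isolation library, not scattered through the application.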
The software vendor may have a COM or Java API that does the dirty work for you, wrapping the file write/wait/read steps so you don’t see how kludgy it is. If not, or if the vendor API is a useless subset of the functionality you need, you will have to write a lot of tricky code in a language that will make Java and Python look like advanced alien technology. Getting this much to work is hard enough, but it’s just the beginning. The interface library can hide the ugly details of getting things in and out of the legacy system, but the shortcomings of the legacy system will limit and taint the new code that has to work with it.
One common problem is batch-oriented systems. For example, a legacy order processing system might do credit card authorizations on an hourly schedule, or it might authorize in batches when it fills orders from inventory. A web site that sends orders into such a system can’t authorize credit cards in real-time, so it can’t tell the customer that their card was declined, or detect certain kinds of fraud. Batch-oriented systems may not be able to give real-time inventory status, which makes it hard for a web site to let customers know when to expect their order.
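To make the batch problem concrete, here is a toy sketch of why the web site can't tell the customer anything at checkout time. The names and statuses are hypothetical; the point is that an order sits in a pending state until the legacy system's scheduled pass runs, and only then does a decline become visible:

```python
from dataclasses import dataclass
from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"        # accepted by the web site, not yet authorized
    AUTHORIZED = "authorized"
    DECLINED = "declined"

@dataclass
class Order:
    order_id: str
    card_number: str
    amount: float
    status: OrderStatus = OrderStatus.PENDING

def run_hourly_batch(orders, authorize):
    """Simulate the legacy system's hourly pass: authorize every
    pending order in one batch and record the outcome."""
    for order in orders:
        if order.status is OrderStatus.PENDING:
            ok = authorize(order.card_number, order.amount)
            order.status = (OrderStatus.AUTHORIZED if ok
                            else OrderStatus.DECLINED)
```

Between batch runs, every order reports PENDING — the web site has nothing better to show the customer, and a declined card won't surface until the next pass.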
Another problem is latency. A web site that needs real-time customer or order information from a legacy system originally designed to support 31 terminal users will bog down even under moderate load. Preparing reports that need to make thousands of individual queries can take hours.
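One common mitigation — not a cure — is to cache answers from the legacy system so a report asking for the same record thousands of times only pays the query cost once. A minimal sketch, assuming a slow `fetch` callable standing in for the real legacy query and a time-to-live chosen to match how stale the data is allowed to get:

```python
import time

class LegacyCache:
    """Cache results of a slow legacy query, with a time-to-live,
    so repeated lookups don't each hit the legacy system."""

    def __init__(self, fetch, ttl: float = 300.0):
        self._fetch = fetch      # the slow call into the legacy system
        self._ttl = ttl
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit and hit[0] > now:
            return hit[1]        # fresh cached value, no legacy round-trip
        value = self._fetch(key) # slow path: one real legacy query
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is staleness: the cache answers from data as old as its TTL, which is usually acceptable for reports and usually unacceptable for things like inventory counts.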
The programming and IT staff become responsible for an ever-growing web of special files, overlapping cron jobs, directories that must be cleared out, and arcane recovery processes documented on printed emails pinned to cubicle walls. New software does not make any of the legacy problems go away; in fact it can easily make the system more complicated and fragile.
My point is that when faced with building on a legacy system you should make a clear boundary between the old code and the new. You don’t want to spread the icky details of the old system all around your shiny new classes. The legacy system’s functions probably won’t map to an object-oriented design (or any modern software architecture). You’ll have a mismatch that the new system will have to deal with, because the legacy system isn’t adaptable.
Legacy Relational Databases
Relational databases are like legacy systems. Relational databases can usually talk over TCP/IP or sockets, and they understand a more-or-less standard language (SQL), but that just gets you past the first set of hurdles. Relational databases don’t mesh with the object-oriented model; this is commonly called the object-relational impedance mismatch. Object-oriented systems model the application domain with objects; relational databases organize data in sets of rows (or tuples) with relationships expressed by common keys. Object-oriented programming tries to optimize for software lifetime cost; relational databases optimize for data integrity and performance.

It’s possible to use a relational database as a simple backing store for objects, but that doesn’t work if the database schema is already defined and the database is already in use. The data elements and relationships are defined in a schema that is not easily changed. Business logic is implemented in the database, either implicitly in the schema, or explicitly as stored procedures that work on the database.

Look at one popular, modern technology — Ruby on Rails — and you see that the successful developers were able to design a new database schema to suit Rails. Dig around for stories about hooking Rails up to a legacy database and you get a different view: shimmed-in code to make the database work with the ActiveRecord pattern, compromises to Rails’ functionality, and fragile code that “knows too much” about the database schema. I’m not saying there’s anything wrong with Rails in this regard.

Building new software on top of a legacy relational database is the same as building on a legacy software system. It follows that you need to isolate the new code from the database in the same way you would wall it off from the twenty-year-old COBOL system that runs the business.
The “object-relational impedance mismatch” is caused by trying to make a too-tight connection between OO and relational systems; it’s not a side-effect of SQL’s old age or uptight database administrators. When I’ve had to hook up a new system (usually a web site) to a legacy relational database I’ve avoided the “impedance mismatch” by keeping the systems apart. Instead of embedding knowledge of the database inside my classes, and trying to graft CRUD operations onto a complicated schema, I’ve made a library to do operations on the database. For example, instead of making a customer object that has to model the application domain’s concept of customer and map that to a database that represents a customer differently, I write some functions to do things to customers: retrieve data my application needs, make updates to the customer, attach an order to a customer account, etc. The database is treated like a collection of abstract objects, black boxes that do things to the database without exposing my application to the structure of the database.
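The function-library idea can be sketched in a few lines. This is an illustration, not anyone's real schema: the cryptic table and column names (`CUST_MAST`, `CM_CRLIM`, and so on) are invented to stand in for the kind of legacy naming you'd actually find, and SQLite stands in for whatever engine the legacy database runs on. The key property is that callers get plain data back and never see the schema:

```python
import sqlite3

# A legacy-flavored schema with cryptic column names; illustrative only.
LEGACY_SCHEMA = """
CREATE TABLE CUST_MAST (CM_ID INTEGER PRIMARY KEY, CM_NM TEXT, CM_CRLIM REAL);
CREATE TABLE ORD_HDR  (OH_ID INTEGER PRIMARY KEY, OH_CMID INTEGER, OH_TOT REAL);
"""

def get_customer_summary(conn, customer_id):
    """Return only what the application needs, as a plain dict;
    the caller never sees CUST_MAST or its column names."""
    row = conn.execute(
        "SELECT CM_NM, CM_CRLIM FROM CUST_MAST WHERE CM_ID = ?",
        (customer_id,)).fetchone()
    if row is None:
        return None
    return {"name": row[0], "credit_limit": row[1]}

def attach_order(conn, customer_id, total):
    """Attach an order to a customer account; the schema details
    stay inside this function."""
    cur = conn.execute(
        "INSERT INTO ORD_HDR (OH_CMID, OH_TOT) VALUES (?, ?)",
        (customer_id, total))
    conn.commit()
    return cur.lastrowid
```

If the schema changes — or the database is finally replaced — only the bodies of these functions have to change; the application code calling `get_customer_summary` doesn't know or care.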
Separating the new application from the legacy database avoids the “impedance mismatch” because the new OO code doesn’t try to query or manipulate the database directly. Keeping new OO code isolated from a legacy system or database lets you abstract the behavior of the legacy system or database, though obviously limited by the abilities of the legacy system. I think this is a more useful approach, and your new code has a better chance to be useful if that legacy system or database ever does go away.