Encapsulation is a design issue
that deals with how functionality is compartmentalized within a system: you
should not have to know how something is implemented to be able to use it.
The implication of encapsulation is that you can build anything anyway
you want, and then you can later change the implementation and it will not
affect other components within the system as long as the interface to that
component did not change. People
often say that encapsulation is the act of painting the box black – you are
defining how something is going to be done, but you are not telling the rest of
the world how you’re going to do it. For
example, consider your bank. How do
they keep track of your account information, on a mainframe, a mini, or a PC?
What database do they use?
What operating system? It
doesn’t matter to you because the bank has encapsulated the way in which they
perform account services. You just
walk up to a teller and perform whatever
transactions you wish.
In this article I describe the concept of
a database encapsulation layer, various encapsulation
architectures, and implementation
strategies for database encapsulation.
This article
is written following the assumption that you are working
with relational
databases, although if this isn’t the case much of the advice in this
article is still appropriate with slight modifications.
Table of Contents
-
Database
Encapsulation Layers
-
Encapsulation Layer Architecture
-
Implementation
Strategies
-
Marshalling
-
Error Handling
-
References and Recommended
Reading
Figure 1 depicts a typical
architectural layering strategy for building business applications which
includes a database encapsulation layer (also known as a
persistence layer
or simply a data layer). A database encapsulation layer hides the
implementation details of your database(s), including their physical schemas,
from your business code. In effect
this layer provides your business objects with persistence services – the ability to
read data from, write data to, and delete data from – data sources.
Ideally your business objects should know nothing about how they are
persisted, it just happens. Database
encapsulation layers aren’t magic and they aren’t academic theories;
database encapsulation layers are commonly used practice by both large and small
applications as well as in both simple and complex applications.
Database encapsulation layers are an important technique that every agile
software developer should be aware of and be prepared to use.
Figure 1. A
typical architectural layering strategy for an application.

An effective database encapsulation layer will provide
several benefits:
-
It reduces the coupling between your object schema and
your data schema, increasing your ability to evolve either one.
-
It implements all data-related code in one place.
-
It simplifies the job of application programmers.
-
It allows application programmers to focus on the business problem and Agile DBA(s)
can focus on the database.
-
It gives you a common place to implement data-oriented business rules.
-
It takes advantage of specific database features,
increasing application performance.
There are potentially several disadvantages of database
encapsulation layers:
Figure 2
depicts the simplest architecture for encapsulating access to a relational
database – a single application working with a single database.
In this situation there is the greatest potential for flexibility as your
team should be able to choose the implementation strategy, such as data access
objects or a persistence framework, that best fits your situation. Furthermore you should be in a position to evolve both your
object schema and your database schema as you implement new requirements.
Figure 2.
Single application, single database architecture.

A far more realistic situation to be in is
Figure
3 that depicts a multiple application, single database architecture.
This architecture is common in organizations that have a centralized legacy
database with which all applications work.
Another realistic situation is shown in
Figure
4 where there are multiple applications working with multiple databases.
In this case you are likely accessing your "preferred" database(s) as well as one or more legacy data sources.
Figure 3.
Multi-application, single database architecture.

 |
One interesting observation about both of these diagrams is
that some applications may not take advantage of the encapsulation layer(s) and
instead directly access data. There
are several reasons for this:
-
Your data encapsulation layer is written in a language
that some legacy applications can’t easily access (e.g. COBOL cannot easily
access Java).
-
You’ve chosen not to rework some of your legacy
applications to use the database encapsulation layers.
-
You want to use technologies, such as a bulk load
facility or a reporting framework, that require direct access to the
database schema. Note that this
may motivate your team to sometimes go around the encapsulation layer.
|
The point is that some applications will be able to take
advantage of your encapsulation layer(s) and some won’t.
There are still benefits because you are reducing coupling and therefore
reducing your development costs and maintenance burden.
Figure 4.
Multi-application, multi-database architecture.

Figure 4 makes
it clear that some applications already have an encapsulation layer in place. If this is the case you
might want to consider reusing the existing approach instead of developing your
own. By having a single
encapsulation layer that all applications use to access all data sources (where
appropriate) you potentially reduce the effort it takes to evolve your database
schemas via database
refactoring because there is only one encapsulation layer to update.
If you’ve purchased the encapsulation layer there is an opportunity to
reduce overall licensing fees because you only need to deal with one vendor.
The potential disadvantage is that the team responsible for maintaining
the encapsulation layer could become a bottleneck if they are unable or
unwilling to work in an agile manner.
Regardless of whether you intend to purchase, build, or download
an a database encapsulation layer it is critical for both
Agile DBAs and
application developers to understand the various implementation strategies.
There are four basic strategies – brute force,
data
access objects, persistence
frameworks, and services – that you should
consider using.
The basic strategy with the brute force approach is that business
objects access data sources directly, typically submitting Structure Query
Language (SQL) or Object Query Language (OQL) code to the database.
In Java applications this will be done via the Java Database Connectivity
(JDBC) class library and via Open Database Connectivity (ODBC) application
programming interface (API) in Microsoft-based applications.
Note that Microsoft has newer approaches, such as the ActiveX Data Object
(ADO) and Microsoft Data Access Component (MDAC) libraries, which encapsulate
and extend ODBC. Other
environments, such as Ruby, have their own native APIs which often take
advantage of existing ODBC or JDBC database drivers.
The brute force approach isn’t really a database encapsulation
strategy, it is what you do when you don’t have a database encapsulation
layer. However, it is a valid
option for database access. Furthermore
it is likely the most common approach because it is simple and provides
programmers with complete control over how their business objects interact with
the database. Because of its
simplicity this is a very good approach to take at the beginning of a project
when your database access requirements are fairly straightforward.
As your database access needs become more complex
data access objects or
persistence frameworks are likely better options.
Data access objects (DAOs) encapsulate the database access logic
required of business objects. The
typical approach is for there to be one data access object for each business
object, for example the Customer class would have a Customer_Data class.
The Customer_Data class implements the SQL/OQL/… code required to
access the database, similar to the brute force approach.
The main advantage of data access objects over the brute force approach
is that your business classes are no longer directly coupled to the database,
instead the data access classes are. It
is quite common to simply develop your own data access objects, although you may
also choose to follow industry-standard approaches such as Java
Data Object (JDO)s and DataObjects.Net. Note: This "industry standards" change every couple of years.
A persistence framework, often referred to as a persistence layer,
fully encapsulates database access from your business objects.
Instead of writing code to implement the logic required to access the
database you instead define meta data that represents the mappings.
So, if the Customer class maps to the T_Customer table part
of the meta data would represent this mapping.
Meta data representing the mappings of all business objects, as well as
the associations between them would also need to exist.
Based on this meta data the persistence framework would generate the
database access code it requires to persist the business objects.
Depending on the framework, see
this web search,
this code is either generated dynamically at run time or it can be generated
statically in the form of data access objects which are then compiled into the
application. The first approach
provides greater flexibility whereas the second provides greater performance.
Persistence frameworks will have a variety of features.
Simple ones will support basic create, read, update, delete (CRUD)
functionality for objects as well as basic
transaction
and concurrency control. Advanced
features include robust error handling, database connection pooling,
caching, XML support, schema and
mapping generation capabilities, and support
for industry standard technology such as
EJB.
Taking the persistence framework approach the job of an
Agile DBA becomes a little more complex but a lot less onerous.
You will be expected to install, if necessary, the persistence framework.
You will also need to work with the administration facility to define and
maintain the mapping meta data. In
the case of explicitly controlled persistence frameworks application programmers
will need mentoring in the use of the framework, often a very simple task.
For the sake of discussion a service is an operation
offered by a computing entity that can be invoked by other computing entities.
At the time of this writing the most popular
architectural strategy is
web services, however, as you see in Table
2, it is only one of several common strategies available to you:
-
Common Object Request Broker Architecture (CORBA)
-
Customer Information Control System (CICS)
Transaction
-
Electronic data interchange (EDI)
-
Stored procedures
-
Web Services
Services are typically used to encapsulate access to legacy functionality
and data, and there is a clear preference within the industry to build new
applications following a web services-based architecture to facilitate reuse via
system integration.
One way to identify services is overviewed
here. As you
can see, it indicates the potential need for business services such as check to
see if student exists, verify that a person is eligible to enroll, add applicant
to database, and calculate enrollment fees. To support these services you would
need to access the database. Whether you want to do that directly through DAOs,
or through a layer of "pure data services", whatever that implies, is up to
you. Arguably the first and third business service that I listed are mostly
data services, at least on the surface.
Let’s start with some terminology.
Marshalling is the conversion of an object into a data
structure such as an XML document or a data set.
Unmarshalling is the conversion of data to objects.
However, it is common to refer to both types of conversion simply as
marshalling. Don’t worry about
it.
Because (un)marshalling occurs at boundary points within
your system you need to consider validating the data.
You want to ensure that your data is in a valid state, that
individual values conform to business rules (e.g. someone’s age is less than
150) and to referential
integrity rules.
The
Joy of Legacy Data described many common data quality problems that you may
want to try to detect. There are
several data validation issues that you need to consider:
-
Do you validate the data at all?
-
Where is validation performed?
-
Do you validate automatically?
-
What do you do when you find a problem?
An important feature of a database encapsulation layer is
its ability to handle database errors accordingly.
Whenever the encapsulation layer interacts with a data source
there is a potential that an error can occur.
Common types of database-oriented errors include:
-
The database is not available
-
The network is not available
-
The request you made to the database is not correct
(e.g. improperly formulated SQL code, you’re trying to invoke a stored
procedure that doesn’t exist)
-
You are trying to work with data that doesn’t exist
(e.g. trying to update a deleted record)
-
You are trying to insert existing data
-
The data you want to access is locked (e.g. you want to
update a record that another user has write locked)
The list described above is nowhere near complete, you only
have to look at the list of error codes in the manuals for your database to see
this, but it is a good start. The
point is that errors happen and you need to be prepared to act on them.
A good encapsulation layer should be able to:
-
Detect database-oriented errors and continue
processing.
-
Optionally log the details pertaining to the error.
-
Report the error to the invoking application in an
intelligible manner.
|
|