The primary goals of Master Data Management (MDM) are to
promote a shared foundation of common data definitions within your
organization, to reduce data inconsistency, and
to improve the overall return on your IT investment. When done
effectively, MDM is an important supporting activity for service-oriented
architecture (SOA) at the enterprise level, for enterprise architecture
in general, for data warehouse (DW)/business intelligence (BI) efforts, and for software
development projects. Traditional approaches to
data management (DM), particularly those based on extensive modeling and
a serial approach to performing the work, have a poor track record in
practice, and MDM is likely to struggle if you do not move away from
traditional DM strategies. In this article I show that agile software
development strategies, which are based on evolutionary development, collaborative
ways of working, and a focus on providing concrete value to the business, offer
significant value for MDM efforts.
Agile software development (ASD) is an evolutionary approach which is
collaborative and self-organizing in nature, producing high-quality systems that
meet the changing needs of stakeholders in a cost-effective and timely manner.
MDM and ASD are clearly different things, although they are compatible.
An agile approach to MDM:
- Addresses basic MDM activities
- Is collaborative
- Is embedded within the development process
- Is an enterprise activity
- Is evolutionary
- Is usage-driven
- Produces measurable results
- Delivers quality through testing
- Adopts a lean governance approach
- Requires a cultural shift
The main differences between "Agile MDM" and "traditional MDM"
are centered on the approach to doing the work, not the fundamental work itself.
In other words, when you do the work, how you do it, and who you do it with are
the critical issues. An agile approach to MDM achieves the goals of MDM
(promoting common data definitions, reducing data inconsistency, and improving
IT ROI) by embedding MDM activities into the overall software process in a
manner which reflects the environment of modern IT departments. The
following basic MDM activities are still performed (if and when they make sense)
with an agile approach, but as you'll see they're accomplished in a more
effective and efficient manner:
- Classify data elements (data classification)
- Consider data access (data security)
- Identify pertinent master data elements (MDEs) such as entity types,
data elements, associations, and so on.
- Define and manage metadata pertaining to MDEs, including:
  - Primary source(s) of record for MDEs
  - How systems access MDEs (identifying producers and consumers)
  - Volatility of MDEs
  - Lifecycles of MDEs
  - Value to your organization of individual MDEs
  - Owners and/or data stewards of MDEs
- Adopt tools, including modeling tools and repositories, to manage MDM
metadata
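To make the metadata list above more concrete, here is a minimal Python sketch of the record an MDM repository might keep for each MDE. The class and field names (`MasterDataElement`, `Volatility`, `business_value`, and so on) are hypothetical illustrations, not a standard schema or the API of any particular modeling tool or repository; a real repository would more likely keep this in a database than in memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class Volatility(Enum):
    """Hypothetical coarse scale for how often an MDE's values change."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class MasterDataElement:
    """One master data element (MDE) plus the metadata tracked about it.

    Field names are illustrative assumptions, not a standard schema.
    """
    name: str
    kind: str                     # e.g. "entity type", "data element", "association"
    classification: str           # data classification, e.g. "confidential"
    sources_of_record: list[str]  # primary source(s) of record
    producers: list[str] = field(default_factory=list)  # systems that write the MDE
    consumers: list[str] = field(default_factory=list)  # systems that read the MDE
    volatility: Volatility = Volatility.LOW
    lifecycle_states: list[str] = field(default_factory=list)
    business_value: int = 0       # relative value to the organization (higher = more valuable)
    stewards: list[str] = field(default_factory=list)  # owners and/or data stewards

    def systems_touching(self) -> set[str]:
        """All systems that either produce or consume this MDE."""
        return set(self.producers) | set(self.consumers)


customer = MasterDataElement(
    name="Customer",
    kind="entity type",
    classification="confidential",
    sources_of_record=["CRM"],
    producers=["CRM", "Web Store"],
    consumers=["Billing", "DW"],
    volatility=Volatility.MEDIUM,
    lifecycle_states=["prospect", "active", "closed"],
    business_value=9,
    stewards=["Customer data steward"],
)
print(sorted(customer.systems_touching()))  # ['Billing', 'CRM', 'DW', 'Web Store']
```

Keeping producers and consumers as explicit lists makes it easy to answer a common MDM question ("which systems are affected if this definition changes?") without a separate lookup.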
Any agilist reading the above list is likely reeling from the potential for
out-of-control bureaucracy surrounding MDM. Considering the past track
record of most data management efforts (more on
this in a bit), this is a significant concern. As you'll learn in
the rest of this article, it is in fact possible to streamline MDM efforts so
that the value is achieved without the pain of needless bureaucracy, although, as
you would imagine, this will require significant cultural
shifts in some organizations.
The best way to deliver value is to work closely with development teams and their
stakeholders to ensure that the MDM effort is focused on supporting the creation
of business functionality that stakeholders actually need now, not at some
undefined point in the future. Traditional, documentation-heavy,
command-and-control approaches to MDM are often doomed to failure because the
MDM program is too tedious for teams to follow. With a collaborative
approach to MDM:
- The enterprise administrators and enterprise architects are actively involved in
working with the teams to support and enhance the MDM efforts.
- They make it as easy as possible for the development teams to do the
right thing by collaborating with them to do so.
- They do a lot of the "MDM grunt work" which the teams would have
otherwise avoided.
- You work together in face-to-face collaborative working sessions.
These prove to be far more effective than traditional approaches such as
formalized meetings, reviews, or functionally distributed teams (where the
data specialists work on their own in parallel to the development teams).
It is very easy to claim that you intend to take a collaborative approach to
MDM, but a lot harder to actually do so. Traditional data management has a
poor track record of working together closely and effectively with development
teams, as you see in Figure 1. This chart summarizes the results of two
questions asked in Dr. Dobb's Journal (DDJ)'s 2006 State of Data Management Survey: the first
question asked whether development teams find the need to go around their
organization's data group (the majority did) and the second asked why they did
so. Interestingly, 25% of the problem was around simple education issues
with developers (they need to know who to work with and when to do so) and 75%
of the problem rested on the shoulders of the data group (people either found
them too difficult to work with, too slow, or simply didn't offer sufficient
value). The point is that if your development teams are currently
frustrated with the level of service provided by your organization's data group,
then it will be all the more difficult for the data group to make inroads into the
teams to support any sort of MDM effort.
Figure 1. Reasons why development teams go around data groups.

If the MDM activities, particularly the
ones involving work to identify and capture metadata, are separate from
day-to-day development activities then there is very little chance of your MDM
program succeeding. The easiest way to embed MDM activities into your
development process is to educate team members on the importance of MDM and to
ensure that one or more people have the appropriate skills to collaborate with the
enterprise administrator(s) and enterprise architect(s) responsible for MDM
efforts. If your team has one or more agile DBAs then MDM
activities should be part of their daily jobs, and ideally they will have tools
which automate as much of this work as possible.
The challenge is that development teams in general, and in particular agile
teams with their focus on high-value activities, will be reluctant to do this
sort of data-oriented work if they perceive it as extraneous. Worse yet,
few development methods explicitly include these sorts of activities, in part
because the people behind the methods often lack experience in such activities
but mostly because the data community struggles to make their techniques
relevant to modern-day development.
MDM by definition must have an organization/enterprise-level view, and an
agile approach to MDM is no exception. However, that doesn't mean that MDM
has to be an onerous, command-and-control activity which does little more than
justify the existence of your data management group for the year or two that
they're able to milk MDM before it fails for lack of
measurable value. Instead, with a
collaborative and lean approach your
enterprise administrator(s) and
enterprise
architect(s) can achieve the stated goals of MDM in a sustainable way.
Agile MDM is both a project-level and an
enterprise-level activity, and the needs of these two levels will need to be
balanced in a manner which reflects your
unique
situation.
The evidence that
evolutionary (iterative and incremental) approaches to software development are
superior to serial approaches has been mounting for years. This is true of data-oriented
activities too, as this site clearly shows. Technically it is
quite easy to take an evolutionary approach to IT activities, including data
activities, but the true challenges often prove to be cultural ones.
Not only is it possible to analyze legacy data sources, collect metadata,
and support development teams in an evolutionary manner, you really have no
choice in the matter. This is obvious for several reasons:
- For all but the smallest organizations you simply
can’t do all of the requisite legacy analysis and metadata collection up
front without it changing underneath you before you can make it available.
- The business environment is going to change anyway so
you’re going to have to evolve your data definitions over time, like it or
not.
- You’re only human and as a result you’re going to make
mistakes. You have to assume that your understanding of various data
elements will change over time regardless of how much time you actually put
into the initial definition efforts.
- The needs and priorities of development teams will
change throughout the lifetime of a release of a system, let alone the
lifetime of the system itself. This will affect how you prioritize your MDM
activities.
- If your organization chooses to grow through
acquisition or partnership then the new firms that you acquire and/or work
with will likely have different viewpoints which will motivate you to evolve
your existing perceptions.
With an evolutionary approach to MDM you want to work in priority order.
This order should be set by the business, not by the IT department.
A common Agile strategy, exemplified in development methods such as
Open Unified Process (OpenUP), Extreme Programming (XP), and
Microsoft Solutions Framework (MSF) for Agile, is to have the stakeholders
prioritize the work to be done, not the
IT professionals. This strategy is depicted in
Figure 2
and described in detail in
Agile
Requirements Change Management. This enables you to maximize return on
investment (ROI) because you're always working on the most important
functionality required by your stakeholders. Yes, your
enterprise architecture and
enterprise business modeling efforts will still guide your work, but this
guidance will be reflected in the overall prioritization of the work.
Figure 2. Agile requirements change management process.
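The prioritization strategy described above can be sketched as a simple work item stack. This is a minimal Python illustration under assumed conventions (a numeric stakeholder priority where higher means more important, and effort estimates in abstract points); the names `WorkItem` and `WorkItemStack` are hypothetical and are not taken from OpenUP, XP, or MSF for Agile.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    priority: int  # set by stakeholders; higher means more important (assumed convention)
    estimate: int  # effort in hypothetical "points"


class WorkItemStack:
    """Minimal sketch of a stakeholder-prioritized work item stack."""

    def __init__(self) -> None:
        self._items: dict[str, WorkItem] = {}

    def add(self, item: WorkItem) -> None:
        # Stakeholders may add new work at any time.
        self._items[item.name] = item

    def reprioritize(self, name: str, priority: int) -> None:
        # Stakeholders may change an item's priority at any time.
        self._items[name].priority = priority

    def remove(self, name: str) -> None:
        # Work that is no longer needed is simply dropped.
        self._items.pop(name, None)

    def take_for_iteration(self, capacity: int) -> list[WorkItem]:
        """Pull the highest-priority items that fit within the team's capacity."""
        ordered = sorted(self._items.values(), key=lambda w: w.priority, reverse=True)
        taken: list[WorkItem] = []
        used = 0
        for item in ordered:
            if used + item.estimate > capacity:
                break  # stop at the first item that doesn't fit: strict priority order
            taken.append(item)
            used += item.estimate
        for item in taken:
            del self._items[item.name]
        return taken


stack = WorkItemStack()
stack.add(WorkItem("Customer address report", priority=8, estimate=3))
stack.add(WorkItem("Product catalog service", priority=5, estimate=5))
stack.add(WorkItem("Supplier master cleanup", priority=2, estimate=8))
stack.reprioritize("Supplier master cleanup", 9)  # the business changes its mind
taken = stack.take_for_iteration(capacity=11)
print([w.name for w in taken])  # ['Supplier master cleanup', 'Customer address report']
```

Stopping at the first item that doesn't fit keeps the team in strict priority order; a team could instead skip a large item and pull a smaller, lower-priority one, but that trades away some of the ROI argument made above.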

This is probably the most radical advice which I present in
this article: data is a secondary concern for MDM, not a primary one. An
IBM study into CRM showed that the primary
success factors for CRM were business-oriented and cultural in nature, not
technical. Considering that MDM is arguably CRM applied to all major business
concepts and not just customers, we should really take heed of these findings.
In other words, you must focus on usage, not on data.
With a usage-driven approach your major requirements
artifacts explain how people will work with, or interact with, the system.
Examples of such artifacts include
use cases,
user stories, and
usage scenarios
which are primary artifacts of OpenUP, XP, and MSF for Agile, respectively.
Business process models could also arguably be used here, but none of the major
agile development methodologies use them as a primary artifact, although
Agile Modeling includes them as potential models which you should apply where
appropriate. When these artifacts are created rigorously they often refer to
other types of requirements, such as
business rules and report specifications.
However, these sorts of details are often explored on a
just-in-time (JIT) model storming basis during the project, so many agile teams
won’t invest in rigorously documenting them because the useful lifetime of such
documentation is very short.
The value in usage models, in particular use cases and
usage scenarios, is that they focus on the business objectives which end users
are trying to accomplish by using your system(s). If your stakeholders are able
to prioritize the various usages, then suddenly development teams find
themselves able not only to deliver something of concrete value (the
implementation of the various usages) but also, if they implement them in priority
order, to maximize stakeholders’ return on investment (ROI) in IT.
A common mistake which often leads to failure is to let
technology decisions drive your prioritization strategies. For example, a
favorite IT strategy is to work on one legacy system at a time, analyzing and
then cataloging the metadata for the entire system. This sort of initial,
detailed cataloging effort can take years to accomplish and will more than
likely run out of steam long before any concrete results are produced. Another
ill-fated strategy is to focus on specific data entities one at a time.
Although this approach has more merit than the previous one, you may find that
you need to do this for a large number of entities before you can start
providing real business value from your efforts. The fundamental problem is
that technical prioritization strategies do not reflect the priorities of the
business which you are trying to support, putting any IT effort, including MDM
efforts, at risk because your stakeholders aren’t receiving concrete value in a
timely manner. When stakeholders don’t perceive the value that they’re getting
for their IT investment they quickly start to rethink such investment.
Worse yet, some MDM efforts run aground on the “one truth” shoals – they
strive to develop one definition for each data entity within an organization.
In theory this is a laudable goal but in practice it’s virtually impossible
because few organizations can actually come to an agreement on the definitions
of major concepts. Furthermore, it’s often a competitive advantage for your
organization to treat various concepts differently at times based on the given
context. A wonderful example of this is
HSBC’s series of billboard and airport
advertisements around the world showing two different pictures with captions,
then showing the same two pictures with the captions swapped.
Figure 3 is a picture that I took in a hallway in
London's Heathrow airport. In short,
efforts to try to identify the “one truth” are likely misguided and unlikely to
actually produce value. My advice is to worry less about gathering perfect
metadata and instead focus on delivering valuable business functionality.
Figure 3. Questioning the "One Truth" philosophy.

Many traditional IT efforts find themselves in trouble when they take a
document-based approach to reporting progress. For example, in
earned value
management (EVM) you claim progress against your plan when you achieve various
milestones called out in those plans. On traditional software development
projects these milestones are typically based on delivery of key documentation
such as requirements specifications, design specifications, test plans, and
eventually the working system. Traditional MDM efforts may choose to measure
earned value in terms of the metadata collected, such as the number of entity
types or entity attributes defined. The challenge with a document-based approach
to measuring earned value is that there is only a tenuous relationship between
documentation and the delivery of working functionality which provides real
value to business stakeholders. When you think about it,
you’re doing little more than justifying bureaucracy with document-based EVM.
Agile teams earn value in the form of a working solution, which for a
software development project is the delivery of working software and for a DW/BI
project is the delivery of analytic data and supporting reports. Therefore, with an agile approach to MDM your focus shouldn’t be on collecting
metadata (although you will still do that) but instead should be on:
- Supporting project teams to deliver high-quality
working software which meets the
changing needs of their stakeholders
- Supporting business stakeholders to access and manipulate data,
typically via a DW/BI solution
In other words, don’t do MDM for the sake of doing MDM; instead, do it to streamline
stakeholder-facing data-oriented activities. The only valid way to
measure your MDM efforts is not by the number of data elements collected but
by the number of “data conformant” reports, data conformant web services, or
data conformant components delivered by project teams.
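Measuring MDM by delivered, data conformant artifacts rather than by metadata collected could look something like the following Python sketch. It assumes, purely for illustration, that a deliverable counts as "data conformant" when every data element it uses has a master definition; your organization's notion of conformance may well be richer (naming, formats, sources of record). `Deliverable`, `is_data_conformant`, and `conformance_by_kind` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deliverable:
    name: str
    kind: str                      # "report", "web service", or "component"
    data_elements: frozenset[str]  # data elements the deliverable reads or writes


def is_data_conformant(d: Deliverable, master_definitions: set[str]) -> bool:
    # Assumed rule: conformant if every element it uses has a master definition.
    return d.data_elements <= master_definitions


def conformance_by_kind(deliverables: list[Deliverable],
                        master_definitions: set[str]) -> dict[str, int]:
    """Count conformant deliverables per kind: the metric, not the metadata."""
    counts: dict[str, int] = {}
    for d in deliverables:
        if is_data_conformant(d, master_definitions):
            counts[d.kind] = counts.get(d.kind, 0) + 1
    return counts


master = {"Customer.id", "Customer.name", "Product.sku"}
delivered = [
    Deliverable("Sales by customer", "report", frozenset({"Customer.id", "Customer.name"})),
    Deliverable("Price lookup", "web service", frozenset({"Product.sku", "Product.price"})),
    Deliverable("Customer API", "web service", frozenset({"Customer.id"})),
]
print(conformance_by_kind(delivered, master))  # {'report': 1, 'web service': 1}
```

Here "Price lookup" is not counted because `Product.price` has no master definition, which is exactly the kind of gap a collaborating data professional would help the team close.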
Agile software development teams work in
priority order,
as you saw in Figure 2, and thereby they maximize stakeholder return
on investment (ROI) by focusing on delivering the highest value functionality at
any given time. If all of your development teams work in this manner then,
because agile MDM work is embedded in the development process, you will similarly
maximize the ROI on your MDM efforts.
This differs from traditional MDM efforts which try to capture the required
metadata in a "big
modeling up front (BMUF)" style effort. This is often in the form of a
multi-month if not multi-year effort run by a DM project team in parallel to
actual software development projects. There are several problems with the
traditional approach to MDM:
- It can be months, if not years, before tangible results are produced.
Although many organizations believe that they can succeed at long-term
efforts such as this, few actually can in practice. Larissa T. Moss
points out in Critical Success Factors for
MDM that in the past the data community had a very poor track record
with similar metadata schemes which had long-term paybacks.
- Immediate efficiencies are forgone. Although the MDM effort
may eventually produce a comprehensive repository of metadata, it misses
immediate opportunities to provide actual value to the business. If
the MDM effort does eventually achieve a positive ROI, it will be lower as a
result.
- Needless work will occur. People are not good at judging up
front what they want; we've found that when you define
detailed requirements specifications early in the development lifecycle nearly
half of the identified functionality is never used by end users.
Therefore a traditional approach to MDM, where you try to
comprehensively define the required metadata up front, is likely to result in
significant waste as well.
Agile software developers typically take a
test-first approach to development, also called test-driven development (TDD)
or behavior-driven development (BDD), and this is not only
possible for data professionals, it is highly desirable. With a test-driven approach
you write a single test before doing the work to fulfill that test, in effect
creating a detailed specification for that functionality before implementing
it. Better still, you can run the tests on a regular basis and thereby validate
your work in progress. A test-first approach, in combination with other
agile testing activities, greatly increases the quality of the work delivered. This
shouldn’t come as a surprise: testing as early as you possibly can, fixing
the defects that you find, and doing
so more often leads to improved quality.
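As a sketch of what test-first can look like for data work, the example below uses Python's standard-library `unittest` and `sqlite3` modules with an in-memory database. The two tests for a hypothetical `customer` table (an email address is required and must be unique) would be written first and would fail; `apply_schema` is then written to make them pass. The table, columns, and function name are assumptions for illustration only.

```python
import sqlite3
import unittest


def apply_schema(conn: sqlite3.Connection) -> None:
    """Schema change written *after* the tests below, just enough to make them pass."""
    conn.execute(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            email       TEXT NOT NULL UNIQUE
        )
        """
    )


class CustomerTableTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh in-memory database for every test keeps the tests independent.
        self.conn = sqlite3.connect(":memory:")
        apply_schema(self.conn)

    def tearDown(self) -> None:
        self.conn.close()

    def test_email_is_required(self) -> None:
        # The NOT NULL constraint should reject a customer with no email.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO customer (name) VALUES ('Ada')")

    def test_email_is_unique(self) -> None:
        # The UNIQUE constraint should reject a second customer with the same email.
        self.conn.execute(
            "INSERT INTO customer (name, email) VALUES ('Ada', 'ada@example.com')"
        )
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(
                "INSERT INTO customer (name, email) VALUES ('Bob', 'ada@example.com')"
            )
```

You could run this with `python -m unittest <module>`. In a real project the schema change would more likely live in a versioned migration script, with tests like these run against it on every build.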
Traditional teams often take a review-based approach to development,
particularly early in the lifecycle when they have no software to work with.
Although better than doing nothing at all,
reviews prove less effective in practice than regression testing when it comes
to quality. Reviews have a
very long feedback cycle, often weeks if not months, and as a result the costs
of addressing defects are much higher than with techniques (such as TDD) that have
shorter feedback cycles. If someone can offer actual value in a review, why not
have them involved with the actual work to begin with? In short, reviews often
seem to be a stop-gap measure which compensates for poor collaboration or lack of
quality focus earlier in the lifecycle. It is far better to address the real
problem, hopefully with Agile strategies, than to simply put a band-aid over it
and hope for the best. And the numbers clearly show that traditional approaches
to data quality are failing in practice: The Data Warehousing Institute (TDWI)
reports that data quality problems result in a loss of over $600 billion
annually in the United States.
Traditional governance often focuses on command-and-control strategies which
strive to manage and direct development project teams in an explicit manner.
This approach is akin to herding cats because you'll put a lot of work into the
governance effort but achieve very little in practice.
Agile/lean data governance
focuses on collaborative strategies that strive to enable and motivate team
members implicitly. This is akin to leading cats: if you grab a piece of raw
fish, cats will follow you wherever you want to go.
An important component of data management is governance of the MDM metadata
and of the source data which it represents. My experience is that a
traditional, command-and-control approach where the DM group “owns” the data
assets within your organization and has a “death-lock” on your databases proves
dysfunctional in practice. At best it results in the DM group becoming a
bottleneck within your IT department and at worst it results in the development
teams going around the DM group in order to get their work done, effectively
negating your data governance efforts (some alarming statistics on this in a
minute). A better approach is to:
- Include data professionals as active participants on development teams.
When your DM group is external to project teams it can foster a “them vs.
us” mentality within your IT organization if you’re not very careful. You
don’t need to have an external group to run your data governance activities,
instead individual data professionals can do so as part of their
responsibilities on development teams in a collaborative and timely manner.
This is one of the fundamental concepts of the
Agile Data method.
- Streamline data standards and supporting activities. When data
standards, including master data definitions, are sensible, easy to
understand, and easy to access then there is a significantly greater chance
that people will actually follow the standards in practice. When you force
people to conform to standards, or make it onerous for them to do so,
you reduce the chance that they will actually do so. Your
data
administration efforts need to be based on collaboration and enablement,
not command-and-control.
- Educate developers. Developers need to understand why your MDM efforts
are important, what the benefits are, and how to work together with your DM
team. When they know why something needs to be done, and how to do it
effectively, chances are much better that they’ll actually do it.
The real challenges with MDM have
nothing to do with technology; they have to do with people. In many organizations
there is a significant
cultural impedance mismatch between the data management group and the
development teams that you need to overcome, and this will take
time. This mismatch was revealed in the results of the
IBM survey into CRM as well as in a
data management survey performed by Dr. Dobb’s Journal in the fall
of 2006. The DDJ survey found that 66% of respondents indicated the need to
go around their data groups at times, and that of those people 75% indicated that
they did so because the data groups were too slow to respond to their requests,
provided too little real value to the development teams, or were simply too
difficult to work with.
The data community must recognize that we can do better than the traditional
strategy for MDM, and for data management in general. Although many data
professionals prefer traditional, documentation-heavy approaches they must
recognize that the rest of the IT community has moved on and has adopted more
effective ways of working. An Agile approach to MDM is more effective than a traditional approach, for
several reasons:
- The traditional data management (DM) track record is poor. If
you apply traditional DM strategies to MDM, it is fair to assume that
you will experience the same levels of success achieved with Customer
Relationship Management (CRM) and metadata repositories in the past. Sadly,
an IBM Global CRM Survey of over 370 companies worldwide found that in
America, Europe and Asia,
85 percent of companies did not feel that their
CRM efforts were fully successful. To be fair, perhaps organizations’
expectations weren’t realistic, but if your DM group is making promises about
MDM similar to those that were made about CRM a few years ago then you have
cause for concern. Furthermore, as Larissa T. Moss points out in
Critical Success Factors for
MDM, the data community has clearly struggled in the past with similar
metadata schemes. We need to get off the traditional treadmill and
start adopting strategies which have a chance of succeeding in practice.
- The Agile track record is better. Dr. Dobb's Journal (DDJ)'s
2007 Project Success Survey
showed that agile project teams have a 71.5% success rate compared with
62.8% for traditional teams. Agile enjoys a higher success rate due to
its greater focus on return on investment (ROI), its increased ability to
meet the actual needs of business stakeholders, and its greater focus on
quality.
- The Agile community leads in DM thinking. The agile data
community represents the leading edge of data-oriented techniques.
This community has led the way in
evolutionary/agile data modeling,
database
refactoring,
database
testing, database integration, and even
agile administration techniques. We’ve addressed many of the
issues which have thwarted the traditional community for years, particularly
when it comes to data quality.
Master Data Management (MDM), when implemented correctly, can provide
significant value to your organization. Unfortunately, our track record with
similar efforts in the past, in particular Customer Relationship Management
(CRM) and metadata repositories before that, was less than ideal. I believe
that you will greatly increase your chance of success by applying agile techniques
such as working in an evolutionary manner, taking a usage-driven approach,
focusing on measurable results, working collaboratively, delivering quality
through testing, and adopting a lean approach to data governance.