Category Archives: Common Dimensions and MDM/EAI

Common dimensions are the physical EDW repositories for enterprise master data such as customers, properties, products, employees, suppliers and facilities. This category explains how common dimensions (and common facts) support Master Data Management (MDM) and enterprise application integration (EAI).

The EDW and Customer Matching

The Customer Match Challenge 

Matching and linking customer records is at the heart of Customer Data Integration (CDI).  The goal is to achieve a “360 degree view” of your customer, including past and present relationships.  There are so many challenges to this task that it can seem overwhelming.  The matching algorithms themselves are computationally intensive, consuming valuable clock time, not to mention computing resources.  Customers exist in a web of relationships with companies, groups, and other individuals.  Even if a company holds data on the various entities of the customer relationship web, that data is often scattered across systems.  Connecting the dots is a singular challenge.

Ultimately, customer matching is a learning process.  The more you know about your customer, the better able you are (or should be) to match his or her transactions.  In the real world, we usually don’t have the luxury of “retroactively” applying lessons learned.  With CDI, we are given a chance to correct prior matching mistakes.  But this opportunity is a supreme challenge architecturally.  Even if the technical challenges can be met, the business value of accurate customer matching is realized only when the information is delivered to the right place at the right time.  The CDI matching solution demands the best integration of technologies and business processes that we can possibly muster.  

While there are no silver bullets to the customer matching challenge, the EDW can fulfill a pivotal role.  This post is intended to generate discussion and ultimately lead to some principles for positioning the EDW as the hub of the CDI solution.

The Architectural Problem

At the risk of oversimplification, let’s reduce the customer matching architectural problem to three main players: the source systems, the EDW, and the matching tool.  The source systems capture customer business transactions.  The EDW holds integrated customer profile information and granular business transactions.  The matching tool identifies distinct customers, and matches business transactions to customers based on complex matching rules.  The linkages between these three players are many and varied.  Depending on requirements, the linkages may need to be traversed in batch, real time, or near real time windows.

Fig. 1: Customer Matching Players and Linkages

Source to EDW – This is the normal ETL flow in which matched customer transactions and business profile updates are submitted to the data warehouse.  The question is—where in the data flow are the transactions matched? 

EDW to Source – This data flow direction does not exist in the traditional data warehouse model.  But with CDI, it may be necessary to synchronize source systems with the latest customer identities in the EDW.

Source to Matching Tool – Transactions are submitted to the matching tool for assignment of a customer match identifier.  The match identifier may represent a group of one or more candidate customer identifiers (i.e., “suspects”).  If the match score is sufficiently high, the transactions are assigned a distinct “golden” customer identifier.
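
To make the suspect versus golden distinction concrete, here is a minimal sketch of how a match result might be derived from candidate scores.  Everything in it is an assumption for illustration: the threshold values, the MatchResult shape, and the assign_match function are hypothetical and do not come from any particular matching tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed cutoffs for illustration only; real tools tune these per data set.
GOLDEN_THRESHOLD = 0.90   # at or above this, accept the match outright
SUSPECT_THRESHOLD = 0.60  # at or above this, keep the candidate as a "suspect"


@dataclass(frozen=True)
class MatchResult:
    golden_id: Optional[str]    # set only when the best score is high enough
    suspects: Tuple[str, ...]   # candidate customer ids, best score first


def assign_match(scores: Dict[str, float]) -> MatchResult:
    """Turn candidate scores (customer id -> score in [0, 1]) into a result."""
    # Keep every candidate that clears the suspect cutoff, ordered by
    # descending score, with the id as a tie-breaker for stable output.
    suspects = tuple(
        sorted(
            (cid for cid, score in scores.items() if score >= SUSPECT_THRESHOLD),
            key=lambda cid: (-scores[cid], cid),
        )
    )
    # Promote the best suspect to "golden" only if its score is high enough.
    if suspects and scores[suspects[0]] >= GOLDEN_THRESHOLD:
        return MatchResult(golden_id=suspects[0], suspects=suspects)
    return MatchResult(golden_id=None, suspects=suspects)


result = assign_match({"C1": 0.95, "C2": 0.70, "C3": 0.20})
print(result.golden_id, result.suspects)
```

A result with suspects but no golden identifier is the natural hand-off point for human review.  This sketch does not address the case where two candidates both clear the golden cutoff, which a real implementation would need to resolve.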

Matching Tool to Source – The matching tool actively participates in assigning customer identifiers to the source transactions.  This could be a batch file of matched transactions, or a message of match candidates for a single customer interaction.

Matching Tool to EDW – In addition to assigning customer identifiers to source transactions, the matching tool is responsible for coming up with a distinct “golden” identity of each customer.   Customer identity information must then be stored in the EDW as a conformed and universally accessible customer dimension.

EDW to Matching Tool – The EDW is the keeper of intelligence about customers.  It is also the place where vast amounts of detail data are integrated, grouped, linked, and enriched.  Thus, the EDW is a logical source of data for the matching tool.  The EDW can relieve the matching tool of the data management responsibility so that it can concentrate fully on the matching tasks.

Note that successful customer matching requires a dose of human intervention at key points.  The need for human participation contributes to decisions about which processes should take place in “interaction windows” versus “batch windows”.  An interaction window is the time frame during which a business transaction involving people is consummated.  A batch window is an arbitrary time frame during which data is processed after the fact, without influencing the originating business transactions.

Designing customer matching linkages that balance business requirements with technical feasibility is a key architectural challenge for a successful CDI implementation.    Pragmatism is your good counsel in this endeavor. 

How Can the EDW Help?

The Enterprise Data Warehouse can and should play a central role in the CDI architecture.  The main idea is that it is much easier to synchronize data around a common customer definition when the data to be synchronized exists in one place.  Setting aside for the moment the question of whether the EDW should exist as a source of record for customer data (i.e., a CDI “transactional hub”), let’s just say that maintaining a 360 degree view of the customer in an EDW is more natural and conceptually appealing than having to link up data across disparate systems and repositories.

Without getting into implementation details, here are 10 ways in which a high-performance, purpose-built EDW can benefit customer matching.  We will explore these EDW capabilities, and how they benefit the business, through the follow-on posts and blog comments.  For now, I just want to get some ideas on the table for discussion.

  1. The EDW can organize transactional data in a way that will allow the customer matching tool to do its chores faster and more efficiently.
  2. By centralizing customer data in the EDW, the source application data flows do not have to be dramatically changed to accommodate customer matching.
  3. Linking the web of customer relationships inside the EDW can improve customer match accuracy by supporting both horizontal and vertical “chaining” of customer transactions.
  4. The EDW is well equipped to maintain active traceability of historical changes to customer profiles.
  5. The EDW can easily generate and maintain globally unique customer identifiers across the enterprise.
  6. The historical dimension of the EDW allows it to recreate the customer identities and relationships at any previous point in time.
  7. The EDW can efficiently maintain and traverse cross-reference tables that associate source transactions with current customer identities.  This capability can be leveraged to synchronize legacy systems with current customer profiles.
  8. The EDW can consolidate all customer profile data, not just those attributes required for matching.  This “one stop shop” model can simplify application integration.
  9. The EDW normalizes the inherent timing differences experienced when customer data is accessed from multiple disparate data sources.  This capability helps to fulfill the “single version of the truth” objective.
  10. Security and privacy of customer data can be better managed when that data is controlled at an enterprise level in the EDW.
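
As a rough illustration of points 6 and 7, the sketch below models an effective-dated cross-reference table that maps a source key to a customer identifier.  It can answer both "who is this customer now?" and "who did we think this was on a given date?".  The XrefRow and CustomerXref names, the column layout, and the sample keys are all hypothetical; a real EDW would hold this in database tables rather than an in-memory Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class XrefRow:
    source_system: str
    source_key: str
    customer_id: str
    valid_from: date            # inclusive
    valid_to: Optional[date]    # exclusive; None marks the current row


class CustomerXref:
    """Hypothetical in-memory stand-in for an EDW cross-reference table."""

    def __init__(self) -> None:
        self._rows: List[XrefRow] = []

    def assign(self, source_system: str, source_key: str,
               customer_id: str, effective: date) -> None:
        """Close the current row for this source key and open a new one."""
        for i, row in enumerate(self._rows):
            same_key = (row.source_system, row.source_key) == (source_system, source_key)
            if same_key and row.valid_to is None:
                if row.customer_id == customer_id:
                    return  # identity unchanged, nothing to record
                # End-date the old identity instead of overwriting it.
                self._rows[i] = XrefRow(row.source_system, row.source_key,
                                        row.customer_id, row.valid_from, effective)
        self._rows.append(XrefRow(source_system, source_key,
                                  customer_id, effective, None))

    def lookup(self, source_system: str, source_key: str,
               as_of: Optional[date] = None) -> Optional[str]:
        """Return the customer id that is current, or that applied on as_of."""
        for row in self._rows:
            if (row.source_system, row.source_key) != (source_system, source_key):
                continue
            if as_of is None:
                if row.valid_to is None:
                    return row.customer_id
            elif row.valid_from <= as_of and (row.valid_to is None or as_of < row.valid_to):
                return row.customer_id
        return None


xref = CustomerXref()
xref.assign("billing", "B-17", "CUST-001", date(2020, 1, 1))
xref.assign("billing", "B-17", "CUST-042", date(2021, 6, 1))  # later re-match
print(xref.lookup("billing", "B-17"))                     # current identity
print(xref.lookup("billing", "B-17", date(2020, 12, 31))) # identity at that date
```

Because old rows are end-dated rather than overwritten, a corrected match does not erase the earlier assignment.  The same table could also drive synchronization of legacy systems with the current identifier.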

Certainly, the EDW is not a panacea for customer data integration.  As stated earlier, CDI demands a convergence of the best technologies and business processes, and it will take time.  However, the EDW makes a lot of sense as a platform of data integration that can lessen the brunt of many customer matching challenges.  In my view, the EDW helps make the end goal achievable and technologically pragmatic.


Let’s hear what you have to say.  Submit your comments related to CDI and customer matching.  What are the pros and cons of positioning the EDW as the hub of a CDI solution?  What specific matching challenges are you facing?  Where in the matching process is human intervention necessary or desired at your company?  How does your business use (or how would it like to use) customer information during those precious “interaction windows” when you have the full and undivided attention of your customers?  What kinds of advanced analytics would you use to promote customer objectives, and how could an EDW/CDI solution help?