Approaches to Reference Data Management

The Reference Data Accelerator in Collibra is a powerful and flexible tool for organizing and relating reference data.

The Problem

Many times, depending on the system, there are multiple Code Values (see Asset Types) that represent the same thing.

As an example let us consider the use case of country reference data. The Code Values AR, ARG, and 032 all represent the Country “Argentina”, depending on the System context you are in. For example, an ERP System may use numeric country Code Values , while a system that interacts with the ERP may use the alphabetic ISO Code Values. The goal is to assign this relationship in Data Governance Center. This way, users can not only understand that “AR” means Argentina in the context of the ERP system, but that it also could be represented in other places by “ARG” or “032”. 

Approaches

We distinguish three general approaches to Reference Data Management:

  1. Make the Code Values Attributes of the Asset;
  2. Relate Multiple Code Values to the Same Asset;
  3. Create a Complex Relationship Between the Code Values.

We illustrate each of them using or country reference data use case.

Approach 1: Make the Code Values Attributes of the Asset

This approach requires creating custom Asset Types and Attribute Types. An Asset Type “Country” must be created with a specific Attribute Type for each ISO Code Set, i.e., ISO-2-digit, ISO-3-digit, and ISO Numeric. The result for the country asset named “Andorra” is depicted as follows.

Advantages of Approach 1:

  • Simplicity, i.e. all information is one one page.

When to use Approach 1:

  • The codes do not need to be traced to any other asset.
  • The codes will never be reused.
  • There will not be new types of codes in the future.

When Approach 1 is insufficient:

This approach fails to take advantage of the traceability and reuseability that Data Governance Center offers. For example, there is no way to express the fact that the 2-digit country code relates to a certain Table Column in a Database. Another example would be if there was a new type of code in the future that is a 4-digit code. A new Attribute Type will have to be created each time there is a new type of code. It is recommended to evaluate the next two approaches.

Approach 2:  Relate Multiple Code Values to the Same Asset

A tutorial on how to recreate Approach 2 can be found here.

This is accomplished by creating a Relation between any number of Code Values to a Business Asset.  In the running country code example, the three instances of Code Value are each related to the same Country. AR, ARG and 032 are all created as individual Code Value Assets. They are each then related to the Argentina Business Asset

[bok-callout]In below screenshot, Country is a custom Business Asset type. However, creating a custom Asset Type is not necessary. A Business Term could be used instead if desired.[/bok-callout]

See the example for the country named “Argentina”:


 

Advantages of Approach 2:

  • Each Code Value is an Asset in itself and therefore has its own life cycle. Aspects of the Code Value like relations, Responsibilities and Status can be managed per individual Code Value, with workflows.
  • Additional Code Values for an asset can be added in the future without having to add a new custom attribute type to the operating model.
  • By using Relations instead of custom Attribute Types, Traceability is created between an asset and its code values (See the Traceability Diagram above).

When Approach 2 is Insufficient:

  • The Business Asset that the Code Values relate to is unknown or doesn’t exist.
  • The Code Values need to be directly related to many other assets.
  • There is Transformation Logic between the pairs of Code Values that needs to described. (See CRM System example below)
  • Multiple Code Values map to a single Code Value, or there is a many to many relationship between Code Values.
  • The mapping between two Code Sets needs a description or Transformation Logic.

Approach 3: Create a Complex Relation

When two assets are related, but there are exceptions within the relationship that require a description or some kind of Transformation Logic, a Complex Relation may be required. To continue the example of country code reference data, consider a situation where there is a CRM system that uses a modified version of the ISO 3-digit Code Set. In the CRM system, the developer added the prefix “CC-” to all the 3-digit ISO codes to show that they are country codes. In the CRM System the code for Andorra is CC-AND. The mapping is always based on this Transformation Logic. This logic is important for users and should be described in Data Governance Center.

 

Advantages of Approach 3:

  • Able to document exceptions or Transformation Logic between specific Code Values.
  • Able to document the Crosswalk between multiple Code Sets.
  • Code Values do not have to relate to a Business Asset.

 

For an illustration on importing reference data using approach number 2 we refer to Importing Reference Data from Excel: ISO Country Codes and Subdivisions