OMDP Step 1 – Transform Familiar Examples into Asset Types, Attribute Types and Relation Types

[bok-callout]Objective & Outcome
In OMDP Step 1, we identify instances of assets, relations, attributes or domains in metadata use cases, and determine their type. The outcome is a first version of the Data Governance Operating Model formed by asset types, attribute types, relation types and domain types.[/bok-callout]

This step of the OMDP is described and structured with example scenarios.

In process modeling, the requirements analysis usually starts from collecting examples of processes to be carried out by the system. Such process examples are called “use cases”. The description of use cases was standardized in the Unified Modeling Language (UML). 

For devising the Data Governance Operating Model, we follow a similar procedure. We start by collecting key metadata use cases. A metadata use case contains familiar examples of assets, attributes that capture their meaning, and relations between them. Use cases are usually stored in a spreadsheet or CSV file and have typically been managed in Excel or a similar tool.

The operating model design’s overall goal is to find a suitable structure for this metadata. 

The goal of this first step is to discover Structural Concepts such as assets, attributes and relations in these examples, and transform them into types in the Operating Model.

[bok-callout type=”green”]Scope and Focus by Asking the Right Questions

If you are not sure what examples to start from, take a step back and go back to your main objective for your project. From your data governance project objectives you can identify a number of challenging questions. Many of these questions motivate the data governance case. The starting point you choose to tackle the problem prescribes the order in which to define your assets. E.g.:

  • If you start with Business Glossary, then you first capture Business Terms and Acronyms, and then relate these to either Rules or Data Assets.
  • If you start with Reference Data Management, then the order is Code Value, Code Set, Business Term, and/or Data Assets.

Example 1 Consider the following example for the asset ‘Customer’ in a typical combined business glossary and data dictionary case.

  1. What does the term ‘customer’ precisely mean? 
    • When trying to answer question (1), we may stumble upon a disagreement between business units on the exact definition of the term “customer”. The result is the appreciation that there may be different definitions of “customer” that co-exist.
  2. Where can we find information on “customer”?  
    • When assessing question (2), one may conclude that a multinational company has more than one system and/or database, geographically distributed, that store information about these different types of subscribing customers. 
  3. How is “customer” information structured? 
    • This entails also that the answer to question (3) is not straightforward: the data dictionaries of these databases and systems were set up in different ways, at different times, for different requirements. It will be very unlikely that they match.

These key data governance questions then lead to the identification of key traceability paths, which we formulate in terms of asset types, attribute types and relation types. The next example goes further into this.

Example 2 Consider the following example for ‘Address’.

[/bok-callout]

Exercise

When you have finished reading this page, try to analyze the examples in the attached spreadsheet and design your first asset types, relation types and attribute types.

Scenario 1 – One Asset + One Attribute + One Domain

Consider the following table extract that captures a representative part of a business glossary use case.

word | meaning | context
Customer | a party who buys at least one product per year | Sales
Account | a record or statement of financial expenditure or receipts relating to a particular period or purpose | Finance
Segment Customer | a party representing a segment that buys at least one product per year | Sales
Project | an individual or collaborative enterprise that is carefully planned and designed to achieve a particular aim | Operations

Every row represents metadata about the attributes of one specific asset, or about multiple assets when the row captures a relation. First, we identify the Structural Concepts behind the columns of the spreadsheet. These concepts then translate into asset types of the operating model (see Asset Types). Next, you identify the attribute types and how the different concepts are related to each other, which translates into relation types in the operating model.

Summarizing:

  1. For every column, identify whether it concerns one of the following Structural Concepts:
    1. an asset to be managed on its own;
    2. an attribute value that captures part of the meaning of an asset;
    3. (for every two columns) a relation between assets;
    4. a domain or community name.
  2. For every identified concept, determine its type:
    1. for every identified asset, determine what asset type it instantiates: e.g., for row 1: is the asset with name “Customer” a “Business Term”, “Data Element” or “Code Value”?;
    2. for every identified attribute, determine the attribute type (e.g., “Definition”, “Example”, etc.);
    3. for every relation between assets, determine the relation type;
    4. for every domain, determine the domain type.

[bok-callout type=”orange”]If the type does not exist yet, make a note. We discuss how to extend the operating model in the admin guide’s section on Metamodel Configuration.[/bok-callout]

Identify structural concepts

[bok-callout type=”green”]Telephone Method
The best way to achieve this is to imagine you are conveying the information in the spreadsheet over the phone to a colleague. Try reading out loud the first rows in the spreadsheet. The telephone method is a well-known technique in Object-Role Modeling.[/bok-callout]

For illustrating the read-out using the telephone method, we color-coded the first two rows.

Customer | a party who buys at least one product per year | Sales
Account | a record or statement of financial expenditure or receipts relating to a particular period or purpose | Finance

We read out as follows:

  • Row 1: The Asset with name ‘Customer’ in the Domain named ‘Sales’ has Definition attribute value “a party who buys at least one product per year”.
  • Row 2: The Asset with name ‘Account’ in the Domain named ‘Finance’ has Definition attribute value “a record or statement of financial expenditure or receipts relating to a particular period or purpose”.

Indeed the word ‘Customer’ is a proper name for an Asset. Moreover, by reading out using the Telephone Method we have identified for every column whether it concerns an asset name, attribute value, domain name or community name.

Determine Key (Asset, Attribute, Relation) Types

Now we determine the type of every structural concept. For our scenario, this translates into the following requirements for the operating model:

  • Column 1: asset type ‘Business Term’.
  • Column 2: text attribute type ‘Definition’.
  • Column 3: domain type ‘Glossary’.
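The findings above can be written down as a small column mapping. The following is a minimal sketch in Python, not part of any product API: the `ColumnMapping` class and the `required_types` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Links one spreadsheet column to a structural concept and its type."""
    column: str     # header as it appears in the source spreadsheet
    concept: str    # "asset", "attribute", "relation" or "domain"
    type_name: str  # name of the type in the operating model


# Findings for the Scenario 1 extract (word / meaning / context).
SCENARIO_1 = [
    ColumnMapping("word", "asset", "Business Term"),
    ColumnMapping("meaning", "attribute", "Definition"),
    ColumnMapping("context", "domain", "Glossary"),
]


def required_types(mappings):
    """Group the type names the operating model must offer, per concept."""
    required = {}
    for mapping in mappings:
        required.setdefault(mapping.concept, []).append(mapping.type_name)
    return required


requirements = required_types(SCENARIO_1)
print(requirements)
```

Keeping the mapping as data makes it easy to spot types that do not exist yet in the operating model and need to be noted for later configuration.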

We add a column for the asset type and domain type and relabel the headers for all columns as follows based on the previous findings.

asset name | asset type | definition | domain name | domain type
Customer | Business Term | a party who buys at least one product per year | Sales | Glossary
Account | Business Term | a record or statement of financial expenditure or receipts relating to a particular period or purpose | Finance | Glossary
Segment Customer | Business Term | a party representing a segment that buys at least one product per year | Sales | Glossary
Project | Business Term | an individual or collaborative enterprise that is carefully planned and designed to achieve a particular aim | Operations | Glossary

This allows us finally to read out every row in terms of the Operating Model:

The asset with name ‘Customer’

    • is of (asset) type ‘Business Term’;
    • has Definition attribute value “a party who buys at least one product per year”;
    • and is owned by the Domain with name ‘Sales’;
      • which is of Domain Type ‘Glossary’.
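As an illustration, the relabeling and the read-out for Scenario 1 could be sketched as follows. This is a hedged example: the `relabel` and `read_out` functions and the underscore-style keys are assumptions made for this sketch, and the default types are simply the ones determined for this scenario.

```python
# Sentence pattern for the read-out in terms of the operating model.
READ_OUT = (
    "The asset with name '{asset_name}' is of (asset) type '{asset_type}', "
    'has Definition attribute value "{definition}", '
    "and is owned by the Domain with name '{domain_name}', "
    "which is of Domain Type '{domain_type}'."
)


def relabel(row, asset_type="Business Term", domain_type="Glossary"):
    """Turn a raw (word, meaning, context) row into the relabeled form."""
    word, meaning, context = row
    return {
        "asset_name": word,
        "asset_type": asset_type,
        "definition": meaning,
        "domain_name": context,
        "domain_type": domain_type,
    }


def read_out(record):
    """Produce the read-out sentence for one relabeled row."""
    return READ_OUT.format(**record)


record = relabel(
    ("Customer", "a party who buys at least one product per year", "Sales")
)
print(read_out(record))
```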

This can be visualized as follows:

[bok-callout]Further Reading
We refer to the admin guide’s section on Assets, Attributes and (Complex) Relations for the functionality.

Once the operating model is set, we can import the sample spreadsheet. For more help on importing we refer to:

[/bok-callout]

Scenario 2 – Distinguishing Between Asset and Attribute 

The following table shows a spreadsheet extract from a combined reference data / multilingual business glossary use case at the Flemish Department of Economy, Science and Innovation, before they started using Collibra DGC. It defines a controlled listing of all official scientific disciplines in Dutch and English, and their respective reference code values. The code values are used to classify research staff, publications, funding programs and funded projects in the European research landscape.

L0 | NL_NAME | EN_NAME
B000 | BIOMEDISCHE WETENSCHAPPEN | BIOMEDICAL SCIENCES
H000 | MENSWETENSCHAPPEN | HUMANITIES
P000 | EXACTE WETENSCHAPPEN | PHYSICAL SCIENCES
S000 | SOCIALE WETENSCHAPPEN | SOCIAL SCIENCES
T000 | TOEGEPASTE WETENSCHAPPEN | TECHNOLOGICAL SCIENCES

This scenario has two solution approaches which we will discuss separately.

Identify structural concepts for the First Approach

We apply again the Telephone Method. For illustrating the read-out, we color-coded the first two rows.

B000 | BIOMEDISCHE WETENSCHAPPEN | BIOMEDICAL SCIENCES
H000 | MENSWETENSCHAPPEN | HUMANITIES

First approach read-out:

  • The Asset with name ‘B000’ has Dutch Description attribute value ‘biomedische wetenschappen’ and has English Description attribute value ‘biomedical sciences’.
  • The Asset with name ‘H000’ has Dutch Description attribute value ‘menswetenschappen’ and has English Description attribute value ‘humanities’.

Indeed the word ‘B000’ is a proper name for an Asset. Moreover, by reading out using the Telephone Method we have identified for every column whether it concerns an asset name, attribute value, domain name or community name.

Determine Key (Asset, Attribute, Relation) Types for the First Approach

Now we determine the type of every structural concept. For our example scenario, this translates into the following requirements for the operating model:

  1. Column 1: asset type ‘Code Value’;
  2. Column 2: a text attribute ‘Dutch Description’ for a Code Value;
  3. Column 3: a text attribute ‘English Description’ for a Code Value.

We add a column for the asset type and relabel the headers for all columns as follows based on the previous findings. Moreover, for completeness we also add a column for a domain name which we name (for example) ‘Codelist Domain’.

code value name | asset type | dutch description attribute value | english description attribute value | domain name
B000 | Code Value | BIOMEDISCHE WETENSCHAPPEN | BIOMEDICAL SCIENCES | Codelist Domain
H000 | Code Value | MENSWETENSCHAPPEN | HUMANITIES | Codelist Domain
P000 | Code Value | EXACTE WETENSCHAPPEN | PHYSICAL SCIENCES | Codelist Domain
S000 | Code Value | SOCIALE WETENSCHAPPEN | SOCIAL SCIENCES | Codelist Domain
T000 | Code Value | TOEGEPASTE WETENSCHAPPEN | TECHNOLOGICAL SCIENCES | Codelist Domain

This allows us finally to read out every row in terms of the Operating Model:

 The asset with name ‘B000’

    • is of (asset) type ‘Code Value’;
    • has Dutch Description attribute value “Biomedische Wetenschappen”;
    • has English Description attribute value “Biomedical Sciences”;
    • and is owned by the Domain with name ‘Codelist Domain’.
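The first approach can be sketched as a single asset that carries both descriptions as attribute values. This is a minimal illustration in Python: the `Asset` class and the `code_value_from_row` helper are hypothetical and only mirror the read-out above.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    asset_type: str
    domain: str
    # Attribute type name -> attribute value, owned by this asset only.
    attributes: dict = field(default_factory=dict)


def code_value_from_row(row, domain="Codelist Domain"):
    """First approach: one Code Value asset with two text attributes."""
    code, nl_name, en_name = row
    return Asset(
        name=code,
        asset_type="Code Value",
        domain=domain,
        attributes={
            "Dutch Description": nl_name.strip(),
            "English Description": en_name.strip(),
        },
    )


b000 = code_value_from_row(
    ("B000", "BIOMEDISCHE WETENSCHAPPEN", "BIOMEDICAL SCIENCES")
)
print(b000)
```

In this shape the descriptions have no name, status or domain of their own. They exist only as values attached to the Code Value.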

This can be visualized as follows.

In this first approach, we have chosen to model the English and Dutch descriptions as mere attribute values, assuming we do not need to capture further details on them. Let’s now repeat the steps for the second approach.

Identify structural concepts for the Second Approach

Second approach read-out:
  • The Asset with name ‘B000’ is related to an Asset with name ‘biomedische wetenschappen’ and is related to an Asset with name ‘biomedical sciences’.
  • The Asset with name ‘H000’ is related to an Asset with name ‘menswetenschappen’ and is related to an Asset with name ‘humanities’.

Indeed, ‘B000’, as well as ‘BIOMEDISCHE WETENSCHAPPEN’ and ‘BIOMEDICAL SCIENCES’, could be interpreted as names for independent assets.

Determine Key (Asset, Attribute, Relation) Types for the Second Approach

This would translate in the following requirements for the operating model:
    1. Column 1: (as in the first approach) asset type ‘Code Value’;
    2. Column 2: asset type ‘Business Term’;
    3. Column 3: asset type ‘Business Term’;
    4. Moreover, a relation type “Code Value has business term / is encoded by Business Term”.

We add columns for the asset types and relabel the headers for all columns as follows, based on these findings. Moreover, for completeness we also add three columns for the names of the domains that own the respective assets.

code value name | code value asset type | code value domain name | dutch business term name | dutch business term asset type | dutch business term domain name | english business term name | english business term asset type | english business term domain name
B000 | Code Value | Codelist Domain | BIOMEDISCHE WETENSCHAPPEN | Business Term | Disciplines Glossary Dutch | BIOMEDICAL SCIENCES | Business Term | Disciplines Glossary English
H000 | Code Value | Codelist Domain | MENSWETENSCHAPPEN | Business Term | Disciplines Glossary Dutch | HUMANITIES | Business Term | Disciplines Glossary English
P000 | Code Value | Codelist Domain | EXACTE WETENSCHAPPEN | Business Term | Disciplines Glossary Dutch | PHYSICAL SCIENCES | Business Term | Disciplines Glossary English
S000 | Code Value | Codelist Domain | SOCIALE WETENSCHAPPEN | Business Term | Disciplines Glossary Dutch | SOCIAL SCIENCES | Business Term | Disciplines Glossary English
T000 | Code Value | Codelist Domain | TOEGEPASTE WETENSCHAPPEN | Business Term | Disciplines Glossary Dutch | TECHNOLOGICAL SCIENCES | Business Term | Disciplines Glossary English

This allows us finally to read out every row in terms of the Operating Model:

 The asset with name ‘B000’

    • is of (asset) type ‘Code Value’;
    • is owned by the Domain with name ‘Codelist Domain’;
    • has relation “has business term” with an asset with name ‘Biomedische Wetenschappen’ that
      • is of asset type ‘Business Term’;
      • is owned by the domain named ‘Disciplines Glossary Dutch’;
    • has relation “has business term” with an asset with name ‘Biomedical Sciences’ that
      • is of asset type ‘Business Term’;
      • is owned by the domain named ‘Disciplines Glossary English’.
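The second approach can be sketched as three independent assets connected by relations. Again this is only an illustrative Python sketch: the `Asset` and `Relation` classes and the `assets_and_relations` helper are hypothetical names, and the role name follows the relation type identified above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    asset_type: str
    domain: str


@dataclass(frozen=True)
class Relation:
    head: Asset
    role: str  # e.g. "has business term"
    tail: Asset


def assets_and_relations(row):
    """Second approach: every column value becomes an asset of its own."""
    code, nl_name, en_name = row
    code_value = Asset(code, "Code Value", "Codelist Domain")
    dutch = Asset(nl_name.strip(), "Business Term", "Disciplines Glossary Dutch")
    english = Asset(en_name.strip(), "Business Term", "Disciplines Glossary English")
    relations = [
        Relation(code_value, "has business term", dutch),
        Relation(code_value, "has business term", english),
    ]
    return [code_value, dutch, english], relations


assets, relations = assets_and_relations(
    ("B000", "BIOMEDISCHE WETENSCHAPPEN", "BIOMEDICAL SCIENCES")
)
for relation in relations:
    print(relation.head.name, relation.role, relation.tail.name)
```

Because each description is now an asset, it can live in its own domain and be governed separately from the code value.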

This can be visualized as follows.

 

In the second approach, we have interpreted the descriptions as names for business terms to be managed as assets on their own. This means they have their own status, role assignments and attributes. This may not be clear from the given table extract. Therefore, let’s append two additional columns.

L0 | NL_NAME | EN_NAME | NL_STATUS | EN_STATUS
B000 | BIOMEDISCHE WETENSCHAPPEN | BIOMEDICAL SCIENCES | Candidate | Accepted
H000 | MENSWETENSCHAPPEN | HUMANITIES | Candidate | Under Review
P000 | EXACTE WETENSCHAPPEN | PHYSICAL SCIENCES | Accepted | Candidate
S000 | SOCIALE WETENSCHAPPEN | SOCIAL SCIENCES | Under Review | Accepted
T000 | TOEGEPASTE WETENSCHAPPEN | TECHNOLOGICAL SCIENCES | Candidate | Under Review

 

The columns ‘NL_STATUS’ and ‘EN_STATUS’ represent the status of the respective Dutch and English descriptions. Considering these additional columns representing status (or any other attribute value), we have now a clearer case for the second approach where we model the descriptions as assets. This would be visualized as follows.

Scenario 3 – Distinguishing Attributes from Relations

Consider the following table extract from a business glossary use case at a financial services company. 

Name | Definition | Abbreviation | Synonym | Generalization
Customer | a party who buys at least one product per year | | Client | Party
Accumulated Adjustments Account | an accumulated record or statement of financial expenditure or receipts relating to a particular period or purpose | AAA | | Account
Third Party Logistics | a firm that provides service to its customers of outsourced (or “third party”) logistics services for part, or all of their supply chain management functions | 3PL | |
Zero Quantity | the symbol 0, indicating an absence of quantity or magnitude | ZQ | |

 

In Scenario 1 we already learnt how to identify the first two columns. We read out as follows (identifying structural concepts and determining their type at the same time): 

  • The Business Term with name ‘Customer’ is defined as “a party who buys at least one product per year”.

  • The Business Term with name ‘Accumulated Adjustments Account’ is defined as “an accumulated record or statement of financial expenditure or receipts relating to a particular period or purpose”.

Columns 3-5 can be interpreted as attribute values for the business terms. E.g., for rows 1 and 4 we can state either:

  • The Business Term with name ‘Customer’ has Synonym attribute value ‘Client’.
  • The Business Term with name ‘Zero Quantity’ has Abbreviation attribute value ‘ZQ’

or we can interpret them as proper names for assets related to the business terms. E.g.:

  • The Business Term with name ‘Customer’ has relation “has synonym” with Business Term with name ‘Client’.
  • The Business Term with name ‘Zero Quantity’ has relation named “has shortened form” with Abbreviation with name ‘ZQ’.

In conclusion, we can model abbreviations and synonyms either as attributes or as assets. The following diagram shows the first approach, using attributes. An attribute is owned by one single asset.

 

When you model them using the second approach, i.e., as assets, you can reuse and refer to them from multiple assets. This is illustrated below. E.g., the Abbreviation ‘ZQ’ could also serve as abbreviation for another business term ‘Zone Qualifier’ (not in the table), allowing for cross-term mappings. Note also that by distinguishing abbreviations from terms with a designated asset type, we can filter views more accurately on asset type and restrict the head and tail of the “has shortened form” relation type to business term and abbreviation assets only.

The choice between both approaches becomes easier when considering column 5. In Scenario 1, we already considered the business term ‘Account’. Assuming this term already exists in our glossary, it is more natural to define a “generalization” relation with that term than to redundantly redefine it as an attribute value.
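The difference between the attribute approach and the asset approach can be sketched in a few lines of Python. The data structures and the helper functions `terms_for_abbreviation` and `is_valid` are assumptions made for this illustration. ‘Zone Qualifier’ is the extra term mentioned in the text and is not part of the table.

```python
# Attribute approach: each term carries its own copy of the abbreviation value.
terms_with_attributes = {
    "Zero Quantity": {"Abbreviation": "ZQ"},
    "Zone Qualifier": {"Abbreviation": "ZQ"},
}

# Asset approach: one Abbreviation asset, referenced through relations.
assets = {
    "Zero Quantity": "Business Term",
    "Zone Qualifier": "Business Term",
    "ZQ": "Abbreviation",
}
relations = [
    ("Zero Quantity", "has shortened form", "ZQ"),
    ("Zone Qualifier", "has shortened form", "ZQ"),
]


def terms_for_abbreviation(abbreviation):
    """Find every business term that shares the given Abbreviation asset."""
    return sorted(
        head
        for head, role, tail in relations
        if role == "has shortened form" and tail == abbreviation
    )


def is_valid(relation):
    """Head must be a Business Term and tail an Abbreviation."""
    head, _role, tail = relation
    return assets.get(head) == "Business Term" and assets.get(tail) == "Abbreviation"


print(terms_for_abbreviation("ZQ"))
```

With the asset approach the shared abbreviation exists once and the cross-term lookup is a simple traversal of relations. With the attribute approach the same lookup requires comparing free-text values.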

Scenario 4 – Identifying Taxonomies

Let us look further at the last column of Scenario 3, which is about the identification of taxonomies. The following extract shows the generalizations and specializations for ‘Customer’.

Name | Generalisation
Customer | Party
Enterprise Customer | Customer
Segment Customer | Customer
Special Enterprise Customer | Enterprise Customer

We read out the concept types and asset types as follows:

  • The Asset with name ‘Customer’ 
    • is of asset type ‘Business Term’;
    • specializes asset with name ‘Party’ of asset type ‘Business Term’.

This results in the following extension of the spreadsheet and relabeling of its columns.

business term name | business term asset type | general business term name | general business term asset type | domain
Customer | Business Term | Party | Business Term | Taxonomy Domain
Enterprise Customer | Business Term | Customer | Business Term | Taxonomy Domain
Segment Customer | Business Term | Customer | Business Term | Taxonomy Domain
Special Enterprise Customer | Business Term | Enterprise Customer | Business Term | Taxonomy Domain
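To illustrate what the taxonomy buys us, the generalization column can be read as a parent lookup and walked upwards. This is a minimal sketch in Python. The `ancestors` and `specializations` functions are hypothetical helpers, and the sketch assumes every term has at most one generalization, as in the extract.

```python
# Term -> its direct generalization, taken from the extract above.
GENERALIZATION = {
    "Customer": "Party",
    "Enterprise Customer": "Customer",
    "Segment Customer": "Customer",
    "Special Enterprise Customer": "Enterprise Customer",
}


def ancestors(term, generalization=GENERALIZATION):
    """Return all generalizations of a term, nearest first."""
    chain = []
    seen = {term}
    while term in generalization:
        term = generalization[term]
        if term in seen:
            # A taxonomy must not loop back onto itself.
            raise ValueError(f"cycle in taxonomy at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def specializations(term, generalization=GENERALIZATION):
    """Return the direct specializations of a term."""
    return sorted(child for child, parent in generalization.items() if parent == term)


print(ancestors("Special Enterprise Customer"))
print(specializations("Customer"))
```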

 

Scenario 5 – Distinguishing Taxonomy from Qualifiers

The following table is an extract from a glossary case at a global commercial insurer where every term’s context is defined or ‘qualified’ by three dimensions.

term | standard | business line | geo-region | business function | information classification
Home Address | Address | Life Insurances | United States | Human Resources | Firm Confidential
Subscription Date | Date | Commercial | Japan | Chief Administrative Office | Restricted

We read out as follows (identifying structural concepts and determining their type at the same time, see scenarios 1 and 2): 

  • Business Term ‘Home Address’
    • specializes the Standard Business Term ‘Address’;
    • is classified by Business Line ‘Life Insurances’;
    • is classified by Country ‘United States’;
    • is classified by Business Function ‘Human Resources’;
    • is classified by Information Classification ‘Firm Confidential’.


The extension of the operating model is visualized as follows.

Note the difference between taxonomical classification (e.g., Home Address specializes Address) and qualification (e.g., Home Address is classified by Business Line ‘Life Insurances’).
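The distinction can be made explicit by translating one row into relations. The following Python sketch is illustrative only: the `QUALIFIER_TYPES` mapping and the `to_relations` function are hypothetical, and the tail asset types follow the read-out above.

```python
ROW = {
    "term": "Home Address",
    "standard": "Address",
    "business line": "Life Insurances",
    "geo-region": "United States",
    "business function": "Human Resources",
    "information classification": "Firm Confidential",
}

# Qualifier column -> asset type of the classifying asset.
QUALIFIER_TYPES = {
    "business line": "Business Line",
    "geo-region": "Country",
    "business function": "Business Function",
    "information classification": "Information Classification",
}


def to_relations(row):
    """Translate one row into (head, relation, tail type, tail name) tuples."""
    # Taxonomy: the term specializes a more general standard term.
    relations = [(row["term"], "specializes", "Business Term", row["standard"])]
    # Qualification: the term is classified along independent dimensions.
    for column, tail_type in QUALIFIER_TYPES.items():
        relations.append((row["term"], "is classified by", tail_type, row[column]))
    return relations


for relation in to_relations(ROW):
    print(relation)
```

Only the “specializes” relation places the term in the taxonomy. The “is classified by” relations attach context without changing where the term sits in the hierarchy.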

[bok-callout type=”orange”]Classification Schemes
Classification schemes define the discrete qualities by which an asset can be classified. It is important to manage their completeness and unambiguity according to the data governance principles. For example, the geo-region classifier reuses the list of country business terms from the ISO 3166 standard. These terms may in turn be linked to various code lists such as ISO 3166-2. See Importing Reference Data from Excel: ISO Country Codes and Subdivisions for more.[/bok-callout]