Setting up an Information Governance (a.k.a. Data Governance) program requires some effort. Thankfully, there are many frameworks to help you, including Prodago's. But before we get into the "how," it is essential to appreciate "what" it is that we are trying to govern. With a clear understanding of the various aspects we are trying to manage, we can make informed decisions about priorities and sequencing.
In subsequent posts, we will delve into our framework, best practices and organizational considerations, among other subjects related to data governance. In this post, we will cover nine (9) aspects of data that, if not managed, will expose your organization to data risks (more on those in a later blog post). These aspects apply to any industry.
Nine (9) aspects of data that, if not managed, will expose your organization to data risks
The aspects we manage fall into three categories:
Data protection - this one is self-explanatory;
Utility - this is about making sure the data itself is useful in a given context or for a particular use case;
Value - this one is about applying the data to the right context or use case, and managing potential negative implications.
Why is Information Governance Important?
We use the term Information Governance to emphasize how governance itself can raise the business value of data. We like part of Wikipedia's definition, which says that it "balances the risk that information presents with the value that information provides." That is quite a mission, and a fundamental one. As we will see below, some aspects are defensive in nature, which can be very important (think of risk management or regulatory compliance). Others are clearly about playing offence and driving more business value. Indeed, organizations that succeed at monetizing their data recognize it as a critical asset and therefore govern it well.
Privacy or PII (protection)
Personally identifiable information (PII) is data that can be used to identify, contact or locate an individual. Governing it requires a set of processes to determine which data is privacy-sensitive and who can see it. For instance, one client's data science group systematically removed the first and last minute of every car trip in its telematics data. Even though the data had no PII fields, they rightly recognized that people typically start and finish trips at home! Without that precaution, their data scientists or data engineers could have inferred whom a trip belonged to by cross-referencing public datasets such as Google Maps.
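The client's trimming rule is simple enough to sketch. The Python below is a minimal illustration, not the client's actual code: it assumes each trip is a list of timestamped GPS points, and `TelematicsPoint` and `trim_trip_endpoints` are hypothetical names. It keeps only the points recorded at least one minute after the trip starts and at least one minute before it ends.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass(frozen=True)
class TelematicsPoint:
    """One GPS reading from a car trip (hypothetical schema)."""
    timestamp: datetime
    lat: float
    lon: float

def trim_trip_endpoints(points: List[TelematicsPoint],
                        margin: timedelta = timedelta(minutes=1)) -> List[TelematicsPoint]:
    """Drop every point recorded within `margin` of the trip's start or end."""
    if not points:
        return []
    start = min(p.timestamp for p in points)
    end = max(p.timestamp for p in points)
    return [p for p in points if start + margin <= p.timestamp <= end - margin]

# Example: a 5-minute trip sampled every 30 seconds (placeholder coordinates).
t0 = datetime(2024, 1, 1, 8, 0)
trip = [TelematicsPoint(t0 + timedelta(seconds=30 * i), 0.0, 0.0) for i in range(11)]
trimmed = trim_trip_endpoints(trip)   # keeps 7 points, from 08:01:00 to 08:04:00

# A 90-second trip has nothing left once both ends are removed.
short_trip = trip[:4]
```

A trip shorter than two minutes comes back empty, which errs on the side of privacy. Trimming by time is only one possible rule; a real pipeline might also trim by distance, since a minute of driving can cover very different ground.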
Security (protection)
While governing PII is about protecting individuals, data security is about protecting company information in general. The two ideas are related but differ in scope and focus. Security generally refers to preventing unauthorized access. In contrast, PII obligations can go further: to comply with GDPR, for example, we may have to delete an individual's data upon request.
Compliance (protection)
Compliance is a wide-ranging topic. It is about creating the capability to act according to regulations, a set of rules or even a request. Many compliance regulations concern PII, but they often add requirements to "prove" or "demonstrate" compliance, or to report on it regularly.
Integration (utility)
One of the toughest areas of Information Governance, in our opinion, is data integration, or how datasets get connected. Integration is complex and multi-layered. Done right, it can make the difference between usable and unusable data; done poorly, the time it takes can become a real project show-stopper. Integration is all about data architecture, reference data and master data, yet it is an area we rarely see implemented in data governance programs.
Data Quality (utility)
Unlike Integration, Data Quality is probably the most popular driver of data governance as a discipline. However, remember that data quality should not be viewed as an absolute but as "fit for purpose." If a dataset has a single purpose, managing its quality is generally straightforward, but that is rarely the case. Because of its importance, data quality and its remedies are well documented.
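To make "fit for purpose" concrete, here is a minimal sketch in which the same records are scored against rules that depend on the intended use. The purposes, rules and field names (`RULES_BY_PURPOSE`, `email`, `postal_code`) are hypothetical; in practice, each data consumer would define its own.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical rules: which checks matter depends on how the data will be used.
RULES_BY_PURPOSE: Dict[str, Dict[str, Rule]] = {
    "email_campaign": {
        "has_email": lambda r: bool(r.get("email")),
    },
    "mailed_invoice": {
        "has_postal_code": lambda r: bool(r.get("postal_code")),
    },
}

def quality_report(records: List[Record], purpose: str) -> Dict[str, float]:
    """Share of records passing each rule defined for `purpose` (0.0 to 1.0)."""
    rules = RULES_BY_PURPOSE[purpose]
    if not records:
        # An empty dataset is treated as not fit for any purpose.
        return {name: 0.0 for name in rules}
    return {name: sum(rule(r) for r in records) / len(records)
            for name, rule in rules.items()}

customers = [
    {"email": "a@example.com", "postal_code": None},
    {"email": "b@example.com", "postal_code": "A1A 1A1"},
]
campaign = quality_report(customers, "email_campaign")   # {'has_email': 1.0}
invoicing = quality_report(customers, "mailed_invoice")  # {'has_postal_code': 0.5}
```

The same dataset is fully fit for the email campaign and only half fit for mailing invoices, which is why a single, absolute quality score can mislead.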
Metadata (utility)
Metadata, or data about data, is instrumental, especially at consumption time. What is the content? When was it refreshed? Were there errors? What is the level of quality? What is the lineage of a specific piece of information? Data catalogues fall into this area. They make data more useful by making it easier to find and understand.
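The questions above map naturally onto a catalogue entry. The sketch below is a hypothetical structure, not the schema of any particular data catalogue product: `DatasetMetadata` records content, freshness, errors, quality and direct upstream sources, and `lineage` walks those links to answer the lineage question across a chain of datasets. The dataset names and scores are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

@dataclass
class DatasetMetadata:
    """Hypothetical catalogue entry answering the questions above."""
    name: str
    content: str               # What is the content?
    last_refreshed: datetime   # When was it refreshed?
    load_errors: int           # Were there errors?
    quality_score: float       # What is the level of quality? (0.0 to 1.0)
    upstream: List[str] = field(default_factory=list)  # Direct sources, for lineage

    def is_stale(self, now: datetime, max_age: timedelta) -> bool:
        """True if the dataset has not been refreshed within `max_age`."""
        return now - self.last_refreshed > max_age

def lineage(catalogue: Dict[str, DatasetMetadata], name: str) -> List[str]:
    """Follow upstream links to list every source a dataset depends on."""
    seen: List[str] = []
    stack = list(catalogue[name].upstream)
    while stack:
        source = stack.pop()
        if source in seen:
            continue
        seen.append(source)
        if source in catalogue:  # sources outside the catalogue end the walk
            stack.extend(catalogue[source].upstream)
    return seen

t = datetime(2024, 1, 1)
catalogue = {
    "orders_raw": DatasetMetadata("orders_raw", "Raw order events", t, 0, 0.90),
    "orders_clean": DatasetMetadata("orders_clean", "Deduplicated orders", t, 2, 0.97, ["orders_raw"]),
    "sales_report": DatasetMetadata("sales_report", "Monthly sales", t, 0, 0.99, ["orders_clean"]),
}
sources = lineage(catalogue, "sales_report")   # ['orders_clean', 'orders_raw']
```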
Retention (utility)
How long will we need to keep data? Will there be different ways to access it depending on its age? The advent of data lakes, in all their various incarnations, may mean we keep data indefinitely. Large datasets on more sophisticated data platforms, like Snowflake, keep costing money even when nobody accesses them. We still need a retention strategy.
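A retention strategy often boils down to age-based rules. Here is a minimal sketch assuming a hypothetical three-tier policy (keep recent data in fast storage, archive older data, delete it after two years); the thresholds and tier names are placeholders that each organization would set based on its own regulatory and business needs.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical policy: (maximum age, action), checked from youngest to oldest.
RETENTION_POLICY: List[Tuple[timedelta, str]] = [
    (timedelta(days=90), "keep_hot"),    # frequently queried, fast storage
    (timedelta(days=730), "archive"),    # cheaper, slower storage
]
EXPIRED_ACTION = "delete"                # older than every tier above

def retention_action(created: date, today: date) -> str:
    """Return what the policy says to do with data created on `created`."""
    age = today - created
    for max_age, action in RETENTION_POLICY:
        if age <= max_age:
            return action
    return EXPIRED_ACTION

today = date(2024, 6, 30)
recent = retention_action(date(2024, 6, 1), today)    # 29 days old -> "keep_hot"
older = retention_action(date(2023, 6, 1), today)     # 395 days old -> "archive"
expired = retention_action(date(2020, 1, 1), today)   # over 4 years old -> "delete"
```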
Data risks (value)
Merely using data creates risks for the organization, primarily around protection: think of data breaches in the cloud. But risks go beyond unauthorized access. A whole discipline is emerging around ethics, such as Responsible AI. It is about ensuring that the organization's reputation is not tarnished by automating specific processes with machine learning models that are biased. You may have heard about the wife of a wealthy individual who was refused a loan on a bank website, yet when her husband filled out the form, the bank approved the loan. The couple, of course, had the same assets, income, etc. The bank had not managed that risk, and when it materialized, it wiped out any benefits of automation.
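One way to test for the kind of bias in this story is a counterfactual check: submit each application twice, changing only a protected attribute, and flag any case where the decision changes. The sketch below assumes the model can be called as a Python function on a dictionary of applicant fields; `counterfactual_flips` and the deliberately biased `toy_model` are hypothetical, for illustration only.

```python
from typing import Callable, Dict, List, Sequence

Applicant = Dict[str, object]

def counterfactual_flips(model: Callable[[Applicant], bool],
                         applicants: List[Applicant],
                         attribute: str,
                         values: Sequence[object]) -> List[Applicant]:
    """Return applicants whose decision changes when only `attribute` is swapped."""
    flagged = []
    for applicant in applicants:
        # Score identical copies of the application, varying only the protected attribute.
        decisions = {model({**applicant, attribute: v}) for v in values}
        if len(decisions) > 1:
            flagged.append(applicant)
    return flagged

def toy_model(a: Applicant) -> bool:
    """Deliberately biased toy model: demands more income from women."""
    threshold = 50_000 if a["gender"] == "M" else 80_000
    return a["income"] >= threshold

applicants = [
    {"gender": "F", "income": 60_000},   # approved as "M", refused as "F"
    {"gender": "F", "income": 100_000},  # approved either way
    {"gender": "M", "income": 40_000},   # refused either way
]
flagged = counterfactual_flips(toy_model, applicants, "gender", ("F", "M"))
```

This check only catches direct use of the attribute. A model can still discriminate through correlated fields (proxies), so it complements, rather than replaces, broader fairness reviews.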
Impact or ROI (value)
Systematically looking for value in data is the whole idea behind "data labs." Data has inherent value, but is it being explored? Can collaboration or sharing of data increase its value? Sometimes, it is about making a conscious choice to ask the question. This is a fascinating area.
Dataiku, a leading Data Science and Machine Learning platform, proposes five ways of calculating ROI on data analytics initiatives. It is clearly worth thinking about how to calculate ROI, if only to prioritize initiatives, which are likely to outnumber the available budget and resources. Exact figures matter less than the concepts; comparative t-shirt-size estimates are enough to get the job done.
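Comparative t-shirt sizing can be turned into a simple ranking. The sketch below is not one of Dataiku's five methods; it is a generic value-per-effort heuristic that assumes hypothetical point values for each size and illustrative initiative names. It ranks initiatives, then greedily funds them until a points budget runs out.

```python
from typing import Dict, List

# Hypothetical relative points per t-shirt size (used for both value and effort).
SIZE_POINTS: Dict[str, int] = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8}

def prioritize(initiatives: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort initiatives by estimated value per unit of effort, highest first."""
    return sorted(initiatives,
                  key=lambda i: SIZE_POINTS[i["value"]] / SIZE_POINTS[i["effort"]],
                  reverse=True)

def fund_within_budget(initiatives: List[Dict[str, str]], budget: int) -> List[str]:
    """Greedily fund the best-ranked initiatives whose effort still fits the budget."""
    funded, remaining = [], budget
    for item in prioritize(initiatives):
        cost = SIZE_POINTS[item["effort"]]
        if cost <= remaining:
            funded.append(item["name"])
            remaining -= cost
    return funded

backlog = [
    {"name": "real-time dashboard", "value": "M", "effort": "XL"},  # 3/8 = 0.375
    {"name": "churn model", "value": "L", "effort": "M"},           # 5/3 = 1.67
    {"name": "data catalogue", "value": "M", "effort": "S"},        # 3/2 = 1.5
]
ranking = [i["name"] for i in prioritize(backlog)]
funded = fund_within_budget(backlog, budget=6)
```

Greedy selection is not guaranteed to find the best possible combination under a budget, but with rough t-shirt estimates, a simple and transparent rule is usually preferable to false precision.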
As you can see, Information Governance does not have to be a defensive, cost-center imperative. If you are part of a Data Council or planning a Data Strategy, it is useful to think about the scope (the nine aspects above) and the potential value of each aspect to the organization. Any governance program should tackle all nine, at least from a mission perspective.