Knowing your data, its attributes, and its physical location is crucial to GDPR compliance. Implementing a 'Data Classification Levels' scheme supports those efforts by helping you identify different data types based on sensitivity.

In this article, we will explain:

  1. What is data classification;

  2. What are different types of data classification levels;

  3. How data classification can aid compliance with GDPR.

Data Classification Explained

What is data classification?

Data classification categorizes data into relevant groups based on pre-selected criteria such as sensitivity levels and file type. Classifying data in such a way makes it easier to locate, trace, and search data controlled by the organization, leading to more efficient data management, heightened data security, and more robust legal compliance.

For example, an organization may need to find all data related to credit card details because it has to apply more robust security measures, such as encryption, to comply with regulations. Data classification allows organizations to 'know their data,' label it based on pre-defined criteria, and take action accordingly, such as restricting access to a limited number of employees or de-identifying data.

Data classification is not the final step in your compliance or data security efforts; it is an enabler that lets you group similar data based on sensitivity levels. This categorization is the basis for taking appropriate action for compliance and enhanced security. For example, data classification will help you detect and label sensitive health records so that you can implement methods to prevent leakage of such data to malicious third parties.

What are the different methods for classifying data?

There are three methods you can use to classify data.

Content-based classification

This method scans each file's content, such as e-mails, spreadsheets, and Word documents, and detects where sensitive information lies. Based on this, the relevant document or data is categorized as sensitive or confidential.

Regular expressions are a classic tool for content-based classification. A regular expression is not a programming language in itself but a pattern syntax, supported by most programming languages, for describing the shape of a string. You can use it to define 'string rules' that detect and label sensitive data such as social security numbers, card numbers, and e-mail addresses, regardless of where the data is located or who created it.
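
As a rough illustration of such string rules, here is a minimal Python sketch. The three patterns and the label names are simplified assumptions made for this example, not production-grade detectors; a real scanner would add validation (for instance, a checksum on card numbers) to reduce false positives.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real detectors
# need extra validation to keep false positives down.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def classify_content(text: str) -> set[str]:
    """Return the label of every pattern that appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(sorted(classify_content(sample)))  # ['email', 'us_ssn']
```

Because the rules look only at the text itself, the same function works on an e-mail body, a spreadsheet cell, or a document paragraph.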

Context-based classification

This method organizes data into groups based on where it is stored, who created it, and which application handles it. You can classify and label sensitive data based on these indirect variables.

Suppose, for instance, that the HR team creates files related to job applicant evaluations. The system can identify such files as sensitive because they may contain information about race, ethnic origin, or previous addresses.
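
The idea can be sketched in a few lines of Python. Everything named here (the FileContext fields, the team, folder, and application names) is a hypothetical example chosen for illustration; the point is only that the label comes from metadata, without opening the file.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileContext:
    path: str          # where the file is stored
    owner_team: str    # who created it
    application: str   # which application handles it


# Hypothetical rules (assumptions for this sketch).
SENSITIVE_TEAMS = {"hr", "legal"}
SENSITIVE_FOLDERS = {"applicants", "payroll"}
SENSITIVE_APPS = {"recruiting-portal"}


def classify_by_context(ctx: FileContext) -> str:
    """Label a file from its metadata alone, without reading its content."""
    folders = {part.lower() for part in PurePosixPath(ctx.path).parts}
    if (
        ctx.owner_team.lower() in SENSITIVE_TEAMS
        or folders & SENSITIVE_FOLDERS
        or ctx.application.lower() in SENSITIVE_APPS
    ):
        return "sensitive"
    return "unclassified"


evaluation = FileContext("/shares/hr/applicants/eval.docx", "HR", "word")
print(classify_by_context(evaluation))  # sensitive
```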

User-based classification

While the methods mentioned above deploy automated means, you can also classify data manually. Your employees or customers can have the discretion to decide on the sensitivity of the data involved.

This method may provide a more accurate classification level because human intuition and expertise can eliminate false positives and negatives. However, the manual process is likely to consume valuable time that employees can spend more productively. It can even be utterly impractical because your data inventory may be years old and too big to handle all by manual labor.

Is data classification different from data discovery?

While data classification and discovery are highly related, they are not the same; they complement each other in implementing an effective data governance and legal compliance process.

Effective classification of data starts with locating where the data resides. In other words, data discovery is the foundation of a successful data classification policy because you cannot categorize data of which you are not aware: discovery precedes classification.

Is there a one-size-fits-all?

Unlike for government organizations, no standard set of data classification levels is specified for commercial organizations.

Organizations can customize their classification levels as appropriate. The key question is why we create different levels; the answer has a lot to do with how we will handle the data at each level.

What Are 'Data Classification Levels'?

How to determine data classification levels?

Effective implementation of data classification levels starts with assessing the sensitivity of each data domain and its attributes. The more sensitive the data is, the higher the classification level that should generally apply.

How do you assess the level of sensitivity and define classification levels? One obvious criterion is to estimate the potential impact of unauthorized data erasure, access, and transfer on your organization and individuals. If the potential impact is expected to be highly severe, then the sensitivity level is high, so a more restrictive classification level should apply to the relevant data.

You can use the framework in Federal Information Processing Standard (FIPS) 199 when making the sensitivity assessment.

This framework contains three criteria that can help you determine the sensitivity level:

Confidentiality: If unauthorized disclosure of data to third parties, or unauthorized access by them, occurs, what degree of impact will this have on individuals and your organization? It can be limited, serious, or catastrophic. Depending on the level of this impact, you can assign the data to one of the classification levels.

Integrity: If data is modified or destroyed without authorization, how will this impact the individual or the organization, and to what degree?

Availability: If the data cannot be accessed when it is needed, what are the potential effects on individuals and your organization, and to what degree?

Based on your assessment, you can create sensitivity levels such as 'low level,' 'medium level,' and 'high level' and assign the data to classification levels based on the degree of sensitivity.
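
One simple way to combine the three ratings, shown here as a sketch, is to let the worst rating decide the overall level. This 'highest rating wins' rule is an assumption made for the example; your organization may prefer to weigh the criteria differently.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def sensitivity_level(confidentiality: Impact, integrity: Impact,
                      availability: Impact) -> Impact:
    """Overall sensitivity is the worst of the three impact ratings."""
    return max(confidentiality, integrity, availability)


# Example: disclosure of health records would be severe (HIGH), tampering
# serious (MEDIUM), and a short outage limited (LOW).
level = sensitivity_level(Impact.HIGH, Impact.MEDIUM, Impact.LOW)
print(level.name)  # HIGH
```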

Common data classification levels adopted by organizations

While there is no upper or lower limit to the number of levels you can define, a simple, high-level classification scheme is likely to be most effective. An excessive number of levels adds complexity to the compliance process, can confuse your employees, and invites inaccurate evaluation.

The most basic data classification scheme often used in practice is a three-level classification scheme based on sensitivity levels:

  • Public data: Any data that employees can disclose to the public falls under this category because it is unlikely to impact individuals severely. Your privacy policy, company mission statement, or job ads are examples that you can freely disclose, share, and transfer with the public.

  • Internal data: This category refers to data that can have medium-level adverse effects on your organization. It should not be available to the public and should only be accessible by your internal teams. Internal communications between departments, third-party contracts, and intellectual property-related documents are typical examples.

  • Restricted/Confidential data: If unauthorized access to, sharing, modification, or loss of data can result in a severe impact on individuals, it falls within the restricted data category. Personal data under GDPR and CCPA, and protected health information under HIPAA, fall within this category.
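
Since the reason for having levels is that each one is handled differently, it can help to write the handling rules down next to the levels. The following Python sketch does that for the three levels above; the specific rules and role names are hypothetical and should follow your own risk assessment.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class Handling:
    encrypt_at_rest: bool
    external_sharing: bool
    allowed_roles: frozenset[str]


# Hypothetical handling policy (an assumption for this sketch).
POLICY = {
    Level.PUBLIC: Handling(False, True, frozenset({"anyone"})),
    Level.INTERNAL: Handling(False, False, frozenset({"employee"})),
    Level.RESTRICTED: Handling(True, False, frozenset({"dpo", "hr"})),
}


def may_share_externally(level: Level) -> bool:
    return POLICY[level].external_sharing


def may_access(level: Level, role: str) -> bool:
    allowed = POLICY[level].allowed_roles
    return "anyone" in allowed or role in allowed


print(may_share_externally(Level.RESTRICTED))  # False
print(may_access(Level.RESTRICTED, "hr"))      # True
```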

Guidance using standards

For more information and ideas on creating data classification levels, you can look at data management standards, like DAMA's Data Management Body of Knowledge (DAMA-DMBOK, 2nd edition), or the EDM Council's Data Management Capability Assessment Model (DCAM). Prodago has integrated both as part of its registry of best practices in data management.

How Data Classification Can Aid Compliance With GDPR

Hefty fines due to data breaches are only one aspect of GDPR compliance. GDPR adopts the accountability principle, so you need to be proactive and demonstrate your compliance with GDPR even before a breach occurs.

Furthermore, sensitive data such as racial and ethnic origin falls under the 'special categories of personal data' defined in Article 9 of GDPR, so you need to be extra diligent with such data.

All these challenges require complete knowledge of what type of data you have, where you keep it, and its attributes.

Here are three significant ways data classification can help GDPR compliance efforts:

GDPR Security Requirements

If you are processing personal data subject to GDPR, you are responsible for providing an appropriate level of security, including preventing unauthorized disclosure and destruction, under Article 5 of GDPR. Under Article 32, you must also apply appropriate technical and organizational measures, such as encryption and pseudonymization, taking the state of the art into account.

By implementing data classification, you can discover what data you hold, inspect its security risks, and implement technological solutions. These may include encrypting personal data, as well as blocking the sharing of sensitive personal data via insecure networks (such as public Wi-Fi), on vulnerable apps, or with unauthorized third parties.

Data classification can also help you group data that you can anonymize. Properly anonymized data is no longer personal data, so GDPR may not apply to it at all.

Suppose, for instance, that personal data relates to sensitive health records or extreme political views. You can classify this data as highly sensitive and add a 'protective marking' to it so that people interacting with it realize it is restricted and handle it with extra care. Doing so increases employee awareness and reduces the irresponsible handling of personal data.

Data subject requests

GDPR gives data subjects the right to access their data, the right to erasure, and the right to rectification. Fulfilling these requests depends on locating the personal data, retrieving it from the system, and taking the action requested by the individual.

A well-implemented data classification scheme facilitates this process because you can swiftly extract personal data and satisfy requests.

Data classification helps especially when requests are granular. Suppose a request is limited to a specific type of sensitive data, such as genetic information. The individual may ask for deletion or modification of that data only, and classification enables you to fulfill the request precisely.
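
To make the granular case concrete, here is a small Python sketch of a lookup over a classified data inventory. The Record fields, the identifiers, and the system names are all hypothetical; a real inventory would live in a data catalog or database rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Record:
    subject_id: str
    category: str   # label assigned during classification, e.g. "genetic"
    location: str   # system where the data is stored


def find_records(inventory, subject_id, category=None):
    """Return the subject's records, optionally narrowed to one category."""
    return [
        r for r in inventory
        if r.subject_id == subject_id
        and (category is None or r.category == category)
    ]


# Hypothetical inventory entries (assumptions for this sketch).
inventory = [
    Record("u-17", "contact", "crm"),
    Record("u-17", "genetic", "lab-db"),
    Record("u-42", "genetic", "lab-db"),
]

# An erasure request limited to one person's genetic data:
for record in find_records(inventory, "u-17", "genetic"):
    print(record.location)  # lab-db
```

Without the category label, the same request would require searching every system for anything that might count as genetic data.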

Data retention

After classifying personal data subject to GDPR, you can assess how long you have been storing such data and whether it is still necessary to keep it. The storage limitation principle in Article 5 of GDPR already requires you to dispose of personal data that is no longer necessary, and data classification may reveal data you no longer need.
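
A retention check of this kind can be sketched as follows. The labels and retention periods are invented for the example; GDPR does not prescribe specific periods, so the values must come from your own legal and business requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods per classification label
# (assumptions for this sketch, not legal guidance).
RETENTION = {
    "job_application": timedelta(days=180),
    "invoice": timedelta(days=365 * 7),
}


def is_overdue(label: str, collected_on: date, today: date) -> bool:
    """True when a record has been kept longer than its retention period."""
    period = RETENTION.get(label)
    if period is None:
        return False  # no rule defined; flag for manual review instead
    return today - collected_on > period


print(is_overdue("job_application", date(2023, 1, 1), date(2024, 1, 1)))  # True
print(is_overdue("invoice", date(2023, 1, 1), date(2024, 1, 1)))          # False
```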


Data classification may only be a part of a big puzzle, yet it plays a vital role in GDPR compliance. Putting in place an effective data classification policy and determining data classification levels can aid your compliance efforts.