As businesses generate more data and move it into the cloud, the dynamics of data management change in a number of fundamental ways. Data owners worry about unauthorized access by individuals or applications that could expose sensitive information. Leadership is concerned about data breaches and insider threats, where authorized personnel misuse their access for malicious purposes. Finally, data users themselves may be uncertain about the processing methods applied to the data they rely on and about the possibility of tampering, raising concerns about the data’s reliability. These overlapping concerns highlight the importance of robust data governance practices.
Data protection has to be carried out at multiple levels to provide defense in depth. It starts with physical security for data centers and robust network protection for data in transit. Next comes access control, where we plan how authorized personnel and applications will be authenticated and granted specific permissions. Encryption adds another vital layer, ensuring that even if a breach occurs, the stolen data remains unreadable. Data security requirements and procedures fall into four groups, known as the 4 A’s: Authentication, Authorization, Access and Audit.
Authentication
An identity specifies who has access. This could be an end user identified by a username or an application identified by a service account. User accounts represent humans such as data scientists, business analysts or administrators, and are intended for scenarios in which an application accesses resources interactively on behalf of a person.
Authentication verifies that identity, typically through a combination of passwords, second-factor authentication and sometimes biometrics. Authorization follows authentication, determining access rights based on predefined policies. These policies dictate which actions are allowed, such as reading data, editing metadata, updating content or performing ETL operations. Service accounts, managed through Cloud Identity and Access Management (IAM), represent nonhuman users and are intended for scenarios in which an application accesses resources automatically. Like robots, these accounts receive a specific set of permissions from their creators and are designed for applications rather than people.
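As a concrete illustration, the sketch below shows how an application might authenticate as a service account rather than as a human user. The key file path, project ID and query are placeholders, and the google-auth and BigQuery client libraries are assumed to be installed.

```python
from google.oauth2 import service_account
from google.cloud import bigquery

# Hypothetical service-account key file and project ID.
credentials = service_account.Credentials.from_service_account_file(
    "/secrets/etl-pipeline-sa.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

# The client authenticates as the service account, not as a person,
# so it only holds the permissions granted to that account.
client = bigquery.Client(credentials=credentials, project="my-project")

rows = client.query(
    "SELECT COUNT(*) AS n FROM `my-project.analytics.orders`"
).result()
for row in rows:
    print(row.n)
```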
Authorization
Authorization involves defining roles and permissions to control what users can do. A role is a set of permissions that specifies which actions an identity (user or group) can perform. Custom roles can be created for specific needs, such as allowing access only to the metadata of a BigQuery dataset.

To manage permissions efficiently, create groups based on job functions, assign roles to those groups and add users to the groups. This approach simplifies updates when job roles change. Custom roles can also be used to narrow predefined roles; for example, data suppliers can be allowed to create tables but not modify or delete them. Follow the principle of least privilege, granting only the permissions that are actually needed.

While permissions can be managed individually for each resource, this quickly becomes complex. Setting roles at the project level helps, because those permissions automatically apply to all resources within the project. Another method is Identity-Aware Proxy (IAP), which provides centralized access control for applications accessed via HTTPS. IAP allows organization-wide policies that ensure consistent and secure access control, such as restricting resource access to employees only.
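To make the group-based approach concrete, the sketch below grants a hypothetical analysts group read access to a BigQuery dataset with the Python client library. The project, dataset and group names are placeholders; in practice you might instead bind a predefined or custom IAM role at the project level.

```python
from google.cloud import bigquery

# Hypothetical project, dataset and group names.
client = bigquery.Client(project="my-project")
dataset = client.get_dataset("my-project.analytics")

# Add a read-only access entry for the analysts group, then persist
# the updated access list back to the dataset.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="data-analysts@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```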
Access
The role determines what access the identity in question is allowed. Identity-Aware Proxy (IAP) policies can be applied across your entire organization, allowing you to centrally define and enforce access rules for all applications and resources. Use IAP to control who can reach your applications and resources, for example allowing employees access to certain resources while restricting contractors. Policies are rules that help developers work quickly while maintaining security and compliance. They include authentication and security policies, like two-factor authentication, as well as authorization policies that determine what actions users can perform on specific resources.
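When IAP fronts an application, the backend can verify the signed header IAP attaches to each request before trusting the caller’s identity. The sketch below follows the commonly documented pattern using the google-auth library; the audience string is a placeholder you would replace with your own backend service or App Engine identifier.

```python
from google.auth.transport import requests
from google.oauth2 import id_token

def verify_iap_request(iap_jwt: str, expected_audience: str) -> str:
    """Verify the JWT IAP places in the x-goog-iap-jwt-assertion header
    and return the authenticated user's email address."""
    claims = id_token.verify_token(
        iap_jwt,
        requests.Request(),
        audience=expected_audience,
        certs_url="https://www.gstatic.com/iap/verify/public_key",
    )
    return claims["email"]

# Hypothetical usage inside a request handler:
# user = verify_iap_request(headers["x-goog-iap-jwt-assertion"],
#                           "/projects/123456789/apps/my-project")
```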
Define policies hierarchically whenever possible. This means creating policies that apply to the entire organization or to specific business units, projects or teams. Higher-level policies take precedence, ensuring consistent enforcement across the organization. Some systems allow lower levels to evaluate additional, more specific rules, helping to maintain compliance. Monitoring policy use helps you understand which rules apply to particular resources.
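The precedence order can be pictured with a small, purely illustrative model: a rule is looked up at the organization level first, then at the business unit, then at the project, and the highest level that defines it wins. Real policy engines differ, so the names and rules below are hypothetical.

```python
# Illustrative model of hierarchical policy resolution.
POLICIES = {
    "organization": {"require_2fa": True},
    "business_unit/finance": {"allow_external_sharing": False},
    "project/reporting": {"allow_external_sharing": True},
}

def resolve(rule: str, project: str, business_unit: str):
    """Return the effective value of a rule, checking the highest level first."""
    for scope in ("organization",
                  f"business_unit/{business_unit}",
                  f"project/{project}"):
        if rule in POLICIES.get(scope, {}):
            return POLICIES[scope][rule]
    return None  # no policy defined at any level

# The organization defines nothing for external sharing, so the
# business-unit rule wins and sharing stays disabled for finance.
print(resolve("allow_external_sharing", "reporting", "finance"))  # False
```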
Context-Aware Access enhances security by enforcing detailed access controls based on user identity and request context, such as device type and IP address. This approach aligns with the zero-trust security model. For example, you can configure policies to allow an employee to edit data from a secure corporate device but only view data from an unpatched device. This improves security by ensuring that access is granted based on the specific conditions of each request.
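As a rough illustration of the idea (not the actual Access Context Manager API), the sketch below chooses between edit and view-only access from request context such as device patch status and network origin. The attribute names and network range are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

CORP_NETWORK = ipaddress.ip_network("203.0.113.0/24")  # example corporate range

@dataclass
class RequestContext:
    user: str
    device_is_patched: bool
    source_ip: str

def allowed_actions(ctx: RequestContext) -> set:
    """Grant broader access only when the request context looks trustworthy."""
    on_corp_network = ipaddress.ip_address(ctx.source_ip) in CORP_NETWORK
    if ctx.device_is_patched and on_corp_network:
        return {"view", "edit"}
    return {"view"}  # unpatched or off-network devices get read-only access

print(allowed_actions(RequestContext("analyst@example.com", False, "198.51.100.7")))
# {'view'}
```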
Data Loss Prevention (DLP) involves protecting sensitive information like home addresses or credit card numbers from being accessed by unauthorized parties. To achieve this, you can use tools that scan your data stores for specific patterns, such as credit card numbers or medical information. These scans help ensure sensitive data is securely managed, reducing the risk of exposure. Regular scans are necessary to keep up with data growth and usage changes.
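For example, Google Cloud’s Sensitive Data Protection (Cloud DLP) API can inspect text for built-in infoTypes such as credit card numbers. The sketch below follows the documented inspect_content pattern; the project ID and sample text are placeholders.

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # hypothetical project ID

inspect_config = {
    "info_types": [{"name": "CREDIT_CARD_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
}
item = {"value": "Customer paid with 4111 1111 1111 1111, contact jane@example.com"}

# Ask the DLP service to scan the text for the configured infoTypes.
response = client.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```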
Encryption plays a crucial role in data security by encoding information so that only authorized users can access it with the right keys. Even if attackers gain access to storage devices, encrypted data remains unreadable without these keys. Encryption simplifies data access control and auditing, preserving customer privacy during operations like backups. Public cloud providers offer encryption for data at rest and in transit, safeguarding information on hard drives and during network transfers.
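Cloud providers also expose key-management services so applications can encrypt data with centrally managed keys rather than keys they store themselves. The sketch below uses Google Cloud KMS from Python; the project, location, key ring and key names are placeholders.

```python
from google.cloud import kms

client = kms.KeyManagementServiceClient()

# Hypothetical key resource: project, location, key ring and key name.
key_name = client.crypto_key_path("my-project", "us-east1", "data-keys", "orders-key")

plaintext = b"4111 1111 1111 1111"

# Encrypt with the managed key; only identities granted decrypt
# permission on this key can recover the plaintext.
encrypt_response = client.encrypt(request={"name": key_name, "plaintext": plaintext})
ciphertext = encrypt_response.ciphertext

decrypt_response = client.decrypt(request={"name": key_name, "ciphertext": ciphertext})
assert decrypt_response.plaintext == plaintext
```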
Differential privacy is another method to protect sensitive datasets by anonymizing individual identities while sharing aggregate data. Techniques like k-anonymity and adding statistical noise help ensure data privacy by making it difficult to link specific individuals to the shared data. These methods generalize and reduce the granularity of data to prevent re-identification through statistical analysis.
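A minimal sketch of the noise idea, assuming a simple counting query: Laplace noise scaled to the query’s sensitivity and a privacy budget epsilon is added before the aggregate is released. Production systems use carefully audited differential-privacy libraries rather than hand-rolled code like this.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to sensitivity / epsilon.

    A single person can change a count by at most 1, so sensitivity is 1 for
    counting queries; smaller epsilon means more noise and stronger privacy.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: publish how many records match a condition without exposing any one record.
print(noisy_count(true_count=1284, epsilon=0.5))
```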
Audit
It’s crucial that data access is transparent in order to safeguard data effectively. In any system, and especially in the cloud, audit processes play a key role. Access Transparency in Google Cloud, for example, gives you detailed logs of actions taken by Google’s personnel when they access your data. Cloud Audit Logs, on the other hand, focus on actions taken within your own organization’s cloud projects, answering questions like who did what, where and when. These logs are essential for maintaining accountability and understanding how your data is being managed.
Access Transparency logs specifically capture actions by cloud provider personnel, ensuring they access your data only for valid reasons such as fixing an issue or responding to a support request. These logs are crucial for verifying compliance with legal requirements and can be analysed with security tools for enhanced monitoring. It’s important to note that, while Access Transparency logs are comprehensive, they don’t cover access initiated through standard methods allowed by Cloud IAM policies. Therefore, using Access Transparency logs and Cloud Audit Logs together provides a complete picture of data access activity across your cloud environment.
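As an illustration of answering the "who did what, and when" question, the sketch below pulls recent Admin Activity audit log entries with the Cloud Logging Python client. The project ID is a placeholder, the filter can be narrowed to specific services or principals, and the exact payload fields may vary by log type.

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # hypothetical project ID

# Admin Activity audit logs record configuration and permission changes.
log_filter = (
    'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity" '
    'AND timestamp>="2024-01-01T00:00:00Z"'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    payload = entry.payload or {}
    print(
        entry.timestamp,
        payload.get("authenticationInfo", {}).get("principalEmail"),
        payload.get("methodName"),
        payload.get("resourceName"),
    )
```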
Many businesses face heightened risks of data leaks due to vulnerabilities in their technology infrastructure. As data grows and moves to the cloud, managing it becomes more complex, and concerns about unauthorized access, data breaches, insider threats and data integrity all point to the need for stronger security. To address these multifaceted concerns, robust data governance is crucial. The SCIKIQ data platform is designed to enhance data governance and management by providing actionable insights. It strengthens data governance practices, establishes stringent security controls, and effectively mitigates the risk of breaches. Moreover, it ensures compliance with regulatory standards, offering comprehensive protection at every stage of data handling.