The Promise of Safe, Usable Data in the Cloud
Data is rapidly becoming a strategic asset for every organization. Businesses are looking for more value from their data, and that data is coming from newer sources, is increasingly diverse, and is growing exponentially. Creating business value relies on converting data into actionable insights, supported by the analytics and artificial intelligence/machine learning (AI/ML) capabilities used within an organization.
Increasingly, organisations are moving to cloud data lake architectures to ingest, store, and catalogue data at scale, and then to let data scientists and business analysts access it with their choice of tools and frameworks, including cloud analytics and ML services. The benefits of these architectures assume that all valuable data within an organisation is freely available and accessible on demand to data users. Unfortunately, for organisations that handle personal or otherwise sensitive information in large quantities, this isn’t the case.
Under the shared responsibility models of cloud service providers (CSPs), the CSP ensures the security of the overall platform, but the customer remains responsible for the data, including the privacy of information relating to individuals within that data. Heavily regulated industries have approached risk and compliance around personal information by tightly governing and constraining the use of this data.
While this problem most commonly affects organisations looking to leverage the benefits of a streamlined cloud data architecture for analytics and ML, it’s also a problem for organisations looking to ingest sensitive data into the cloud or to share sensitive data with external partners or business units in different geographies. In the end, much value goes unrealised, because data assets that are personal or otherwise sensitive in nature cannot be accessed easily, broadly, and efficiently from a data lake and used by business analysts and data scientists.
To realise the promise of safe, usable data leveraging cloud technologies, businesses must implement and automate a “safe data pipeline” into and around the cloud. Such a pipeline gives the data scientists and business lines that need the data safe, quick access to it.
For many organisations with large, complex, or siloed data estates, an important first step is knowing what data they have and where their sensitive data is, which can be achieved with data discovery and cataloguing systems. Raw data must be treated as high risk until the scope of any personal information it contains is understood. It is critical to maintain tight access controls on this data.
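As a rough illustration of the discovery step, the sketch below scans records for values that look like email addresses or phone numbers and reports which columns contain them. The patterns, the `scan_columns` function, and the sample rows are hypothetical and deliberately simplistic; real discovery and cataloguing systems use far richer detection than two regular expressions.

```python
import re

# Illustrative patterns only (assumptions, not a complete definition of personal data).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def scan_columns(rows):
    """Return a mapping of column name -> set of sensitive data types detected in it."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return findings


# Hypothetical sample records.
rows = [
    {"id": "1", "contact": "ada@example.com", "note": "call +44 20 7946 0000"},
    {"id": "2", "contact": "n/a", "note": "none"},
]
findings = scan_columns(rows)
```

Columns flagged this way would be recorded in the catalogue and kept under tight access controls until they have been reviewed.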
Once you know where your sensitive data is and it’s properly catalogued, the next step is to apply privacy transformations to the data itself. Only a contextual combination of pseudonymisation, minimisation, and generalisation techniques allows you to strike the right balance between privacy and the continued utility of the data. These transformations can be applied to the data before migrating it to the cloud, on its way into the cloud, or once it is in the cloud. The data can then be made widely available for use from a “de-identified data lake” or any other cloud data repository.
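To make the three techniques concrete, here is a minimal sketch that applies each of them to a single record. The field names, the `SECRET_KEY` value, and the helper functions are all assumptions made for illustration. Keyed hashing with HMAC-SHA256 stands in here as one possible way to pseudonymise an identifier; it is not a description of any particular product’s method.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real pipeline would fetch it from a key management service.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymise(value, key=SECRET_KEY):
    """Pseudonymisation: replace an identifier with a consistent keyed token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def generalise_age(age, band=10):
    """Generalisation: replace an exact age with a band such as '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def transform(record):
    """Apply the three techniques to one record (field names are assumptions)."""
    return {
        "customer_token": pseudonymise(record["email"]),  # pseudonymisation
        "age_band": generalise_age(record["age"]),        # generalisation
        "spend": record["spend"],                         # kept for analytics
        # Minimisation: "name" and the raw "email" are simply not carried over.
    }


raw = {"name": "Ada Example", "email": "ada@example.com", "age": 37, "spend": 120.5}
safe = transform(raw)
```

Because the token is derived deterministically from the key and the input, the same customer maps to the same token across datasets, so analysts can still join and count records without seeing the underlying identifier.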
In implementing this, here are a few key principles businesses should adhere to:
- The safe data pipeline should flow through all data sources and environments, from on-premises systems to the cloud and across cloud platforms to enable hybrid and multi-cloud approaches.
- Privacy controls must be applied consistently across the entire architecture to enable safe data use at scale by multiple users simultaneously.
- The technology enabling the safe data pipeline should integrate seamlessly with cloud security tools.
- The whole process should be automated to enable immediate access to safe data.
These high-level requirements serve as the key building blocks for ensuring that organisations can safely and quickly shift their sensitive data into the cloud and utilise advanced cloud services — while complying with data protection laws and protecting the privacy of their own consumers, patients, or citizens.
In today’s environment there is no excuse for organisations to leave value on the table by severely limiting the use of their sensitive data. By putting privacy at the heart of their approach to cloud adoption, businesses can unleash the power of their data to innovate and generate revenue safely and efficiently while remaining compliant.
Tom Kennedy, Director - Cloud & Technology Partnerships at Privitar
Twitter: @privitarglobal
LinkedIn: https://www.linkedin.com/in/tom-kennedy-b15a5b98/
LinkedIn: https://www.linkedin.com/company/privitar/
