15 May 2025
by Fawad Qureshi

A vision for UK-wide data sharing: Designing the National Data Library

See how Snowflake’s blueprint for a collaborative data-sharing resource could change the UK’s research and public services landscape.

In December 2024, the Wellcome Trust and the Economic and Social Research Council (ESRC) put out a call for technical visions for a National Data Library (NDL). The goal was to create the blueprint for a data-sharing architecture that would connect thousands of disconnected databases to improve vital research and public services across the UK.  

In today's digital landscape, data supply chains have emerged as critical infrastructure for effective decision-making. Just as physical supply chains connect raw materials to finished products, data supply chains transform raw data into actionable insights that power services and research. Building end-to-end visibility across these supply chains is crucial to eliminating information asymmetry, where some stakeholders hold more complete information than others. This asymmetry creates inefficiencies, erodes trust and limits the potential for collaboration.

By establishing transparent data flows with clear provenance, governance and accessibility, organisations can ensure that all participants operate with consistent information, leading to more equitable outcomes and greater innovation. An effective NDL would address these challenges by creating standardised pathways for data to move securely between trusted parties, while maintaining visibility throughout the journey.

Snowflake co-authored one of five successful white papers setting out a vision for the future of data sharing across the UK’s public services. In this article, we look at some of the challenges in creating a distributed architecture for research, explore our recommended approach to an NDL, and share examples of other projects where we’ve helped organisations share data at scale.

Big ambitions require overcoming big challenges  

The creation of an NDL would have a profound impact on the UK’s ability to deliver public services, but there is a paradox at the heart of the problem: any proposed solution needs to enable maximal data sharing while also ensuring complete control over who can access that data.

To address this problem, the architecture behind an NDL must be based on an open standard for how data is stored, ensuring both data security and accessibility. This is vital for increasing public trust and reducing complexity. There must also be robust mechanisms for how data is shared, supported by strong governance to connect trusted parties to this data.  
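Open table formats make this principle concrete: they keep data files in a vendor-neutral layout on object storage, so multiple engines can read the same data. The sketch below assumes Snowflake’s Apache Iceberg table support and the snowflake-connector-python driver; the account, volume, database and table names are hypothetical, not taken from the NDL paper.

```python
# A minimal sketch, assuming Snowflake's Apache Iceberg table support and the
# snowflake-connector-python driver. All names (account, volume, database,
# table) are hypothetical, not from the NDL paper.
import snowflake.connector

conn = snowflake.connector.connect(
    account="ndl_example",            # hypothetical account identifier
    user="data_steward",
    authenticator="externalbrowser",  # SSO login; avoids embedded credentials
)

# An Iceberg table keeps the data files in an open, vendor-neutral format on
# object storage, so other query engines can read the same files.
conn.cursor().execute("""
    CREATE ICEBERG TABLE IF NOT EXISTS research_db.public.hospital_admissions (
        admission_id STRING,
        region       STRING,
        admitted_on  DATE
    )
    CATALOG = 'SNOWFLAKE'                 -- Snowflake manages the metadata
    EXTERNAL_VOLUME = 'ndl_open_storage'  -- hypothetical external volume
    BASE_LOCATION = 'hospital_admissions/'
""")
```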

The most important consideration is that the data library should prioritise the ecosystem itself above any individual user or contributor. In doing so, it can be built in a way that benefits every party.  

The right approach will move public services away from siloed data that offers only a partial view of populations, and towards unified, on-demand data that is accessible to multiple providers, with standardised workflows and approaches to citizen engagement.

The principles behind our architecture 

By applying secure cloud collaboration, template-driven infrastructure architectures, and proven, scalable technologies, we proposed a blueprint for an economically viable ecosystem that shares valuable data applications and insights. 

At the heart of this ecosystem are nine key principles. Think of them as a manifesto designed to increase public trust, reproducibility and sustainability, while simultaneously reducing complexity, cost and risk.

  1. Services are customer-driven, not centrally determined. 

  2. We prefer analytical code to be open source rather than reinvented. 

  3. We should minimise data duplication wherever possible. 

  4. Personal data should remain as close to the source as possible, for security and ease of privacy management. 

  5. Code moves to the data, rather than data moving to the code. 

  6. Personal identifiers should be the last resort when linking datasets. 

  7. Simple access to data is essential, as friction and complexity reduce usage and value. 

  8. We trust domain experts to develop their own data assets. 

  9. Design must be iterative and responsive to changing needs. 

 

Three approaches at the heart of our NDL 

These nine principles are supported by three core approaches to data use that underpin our proposal.  

  1. Secure data sharing 

Snowflake enables secure data sharing by ensuring that no data is ever copied or transferred. Instead, the organisation you wish to share data with can query and process the original data as if it were within its own environment.

We enable this through a virtual data warehouse that separates where data is stored from who queries it. Through this approach, you can share data directly with multiple organisations while ensuring sensitive data never leaves your control. This is key to both ensuring compliance and building trust.  
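To make that concrete, here is a minimal provider-side sketch using the snowflake-connector-python driver. The share, database and consumer account names are illustrative; the pattern is that a share is a named bundle of grants, and consumers query the original tables in place.

```python
# A minimal sketch of provider-side secure data sharing, assuming the
# snowflake-connector-python driver. Share, database and account names are
# illustrative, not from the NDL paper.
import snowflake.connector

conn = snowflake.connector.connect(
    account="ndl_example", user="data_steward",
    authenticator="externalbrowser",  # SSO login
)
cur = conn.cursor()

for stmt in (
    # A share is a container of grants; no rows are copied into it.
    "CREATE SHARE IF NOT EXISTS population_health_share",
    "GRANT USAGE ON DATABASE research_db TO SHARE population_health_share",
    "GRANT USAGE ON SCHEMA research_db.public TO SHARE population_health_share",
    "GRANT SELECT ON TABLE research_db.public.hospital_admissions"
    " TO SHARE population_health_share",
    # Entitle a specific consumer account; access can be revoked just as fast.
    "ALTER SHARE population_health_share ADD ACCOUNTS = PARTNERORG.RESEARCH",
):
    cur.execute(stmt)
```

The consumer then creates a database from the share and queries it like any other, while the provider retains the only physical copy of the data.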

  2. Listings  

Part of the problem with large-scale data sharing is that not everyone knows what data exists to support research. Snowflake allows users to create a shared area that lists all available data. It provides full control of who has access to view these listings, and additional external sources can easily be discovered on the Snowflake Marketplace. As with secure data sharing, these listings allow you to bring the organisations you partner with to your data, rather than the other way round, so you can maintain complete integrity and control.  
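As a rough illustration of the provider side, a private listing can advertise the share created above to named accounts only. The listing DDL and YAML manifest fields below are assumptions sketched from Snowflake’s listings feature, not verbatim syntax.

```python
# A hypothetical sketch of publishing a private listing over an existing share.
# CREATE LISTING takes a YAML manifest; the manifest fields and target account
# shown here are illustrative assumptions, not verbatim schema.
import snowflake.connector

conn = snowflake.connector.connect(
    account="ndl_example", user="data_steward",
    authenticator="externalbrowser",
)
conn.cursor().execute("""
    CREATE LISTING IF NOT EXISTS ndl_population_health
    FOR SHARE population_health_share AS
    $$
    title: "Population health indicators"
    description: "De-identified admissions indicators for approved research."
    # Private listing: discoverable only by the named consumer accounts.
    targets:
      accounts: ["PARTNERORG.RESEARCH"]
    $$
    PUBLISH = TRUE
""")
```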

  3. Data clean rooms 

For the most sensitive data, which categorically cannot be viewed outside an organisation, data clean rooms provide an environment for connecting and linking datasets to produce aggregated results, without allowing anyone to view the underlying records. To protect privacy, clean rooms can apply differential privacy techniques, which add statistical “noise” to those aggregate results so that no individual’s data can be inferred from them.
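The noise-adding idea is easy to illustrate in isolation. The sketch below is a textbook Laplace mechanism applied to a single count, not Snowflake’s clean-room implementation: a count changes by at most one when any individual is added or removed, so Laplace noise with scale 1/ε yields ε-differential privacy.

```python
# A standalone, textbook illustration of differential privacy's noise-adding
# idea (the Laplace mechanism). Not Snowflake's clean-room implementation.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace noise of scale 1/epsilon suffices.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# e.g. the number of linked records matching a cohort definition
true_matches = 1_204
print(f"noisy count (eps=1.0): {dp_count(true_matches):.1f}")
print(f"noisy count (eps=0.1): {dp_count(true_matches, epsilon=0.1):.1f}")
```

Smaller ε means more noise and stronger privacy; the analyst sees a useful aggregate, never the underlying rows.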

Our experience with data mesh architectures 

Our proposal for the NDL incorporates a ‘data mesh’ or ‘data fabric’ architecture: an approach to enhancing self-service data use that many leading compute, storage and security platforms now support.

Our vision is based on the LATTICE architecture developed at the Francis Crick Institute. ‘The Crick’, as it’s colloquially known, is Europe’s largest discovery life sciences research institute and uses Snowflake’s AI Data Cloud to power its exploratory research.  

A data mesh architecture strikes the right balance between open sharing and consistent security, and our experience shows that it works in practice as well as in theory.

Most recently, working with Roche Diagnostics, the Swiss healthcare multinational, we helped to create a dynamic, decentralised data mesh architecture that enables Roche to make more efficient use of data, reduce costs, and ensure governance.  

Using a combination of Snowflake and Immuta, Roche eliminated a significant role-based access burden and reduced required access groups by 94%. It has also increased the production of data models by up to 20% using Snowpark, and accelerated time to action by up to 300%.  
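Roche achieved this with Immuta layered on Snowflake, so the sketch below is a loose native-Snowflake analogue rather than Roche’s actual setup: one row access policy, driven by an entitlements table, stands in for what would otherwise be many near-duplicate access roles. All table, column and role names are hypothetical.

```python
# A hypothetical sketch of policy-based access in native Snowflake. This is an
# analogue of the pattern described above, not Roche's Immuta-based setup.
import snowflake.connector

conn = snowflake.connector.connect(
    account="ndl_example", user="data_steward",
    authenticator="externalbrowser",
)
cur = conn.cursor()

# One policy consulted at query time replaces many per-region access roles.
cur.execute("""
    CREATE ROW ACCESS POLICY IF NOT EXISTS region_entitlement
    AS (region STRING) RETURNS BOOLEAN ->
        CURRENT_ROLE() = 'NDL_ADMIN'
        OR EXISTS (
            SELECT 1
            FROM governance.public.entitlements e
            WHERE e.role_name = CURRENT_ROLE()
              AND e.region    = region
        )
""")
cur.execute("""
    ALTER TABLE research_db.public.hospital_admissions
    ADD ROW ACCESS POLICY region_entitlement ON (region)
""")
```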

A dedication to collaboration and data sharing 

Roche isn’t the only example of our recent work in this field. We have vast experience bringing data sharing ambitions to life, and regularly collaborate with other organisations to simplify data sharing and uncover new opportunities. 

In 2024, we collaborated with the California Department of Technology and the California State Water Resources Control Board to implement the data connectors that support the California Open Data Portal, which makes data publicly available in a safe and secure way.

We’ve also helped the Met Office enhance its weather data delivery, improving decision-making across industries like healthcare and aviation. The Met Office uses Snowflake Marketplace to make its data more accessible, supporting critical services such as emergency response and public transport. That accessibility means a larger user base can reach the Met Office’s data, helping it serve new sectors and provide more detail to its data users.

To learn more about our vision for an NDL, read our full paper, along with those from the other successful organisations that took part.



Authors

Fawad Qureshi

Global Field CTO, Snowflake

Fawad is a strategic technology leader with more than two decades of international project experience, consulting on enterprise data warehouses, big data analytics and cloud solution architectures. He has worked across all stages of the technology life cycle, from engineering and professional services to pre-sales and business development. In his current role as Global Industry Field CTO, he helps customers in multiple industries across the globe. He is passionate about sustainability and helps his clients achieve their sustainability goals through the use of data and analytics.