05 Sep 2025

Making Sustainable Choices: dark data and abstractions (Guest blog from Kainos)


This blog was written by Marc Nevin, Innovation Solution Architect at Kainos and author of 'Digital Sustainability: the need for greener software'

On a digital sustainability journey, you quickly learn the importance of good data. Capturing the right data, in the right places and in the right formats, is essential for understanding, managing and reducing environmental impacts.

Unfortunately, emissions data is often what’s holding us back from more sustainable approaches to cloud use: it’s often incomplete, it’s not granular, and it’s primarily retrospective, which makes it difficult for us, as users of those services, to know what to do. We will discuss two key sources of this problem: Dark Data and Poor Abstractions.

So what is ‘Dark Data’?

This is the data we all collect but don’t draw insights from; it’s largely forgotten about. It can be data like unstructured logs, deprecated backups, or poorly selected logs and measures. It might be stored in a storage bucket in an old account, in a forgotten database, or even on idle compute instances with no clear purpose. It can also be data we know we’re collecting but simply don’t use: Seagate’s ‘Rethink Data’ study found that 68% of the enterprise data that is collected is effectively unused.

As organisations adopt new services and tools at speed, often accelerated by the boom in AI tooling, inconsistent tagging, tracking and standards make it easier to end up with shadow adoption, zombie resources and poorly thought-out data capture. Mitigating this dark data often falls to FinOps teams, who are left to manage costs and waste after the fact.
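As a rough illustration of how dark data candidates might be surfaced, the sketch below scans a resource inventory for missing tags and long idle periods. It is a minimal sketch under stated assumptions: the `Resource` record, the `REQUIRED_TAGS` standard, the 90-day idle threshold and the sample inventory are all hypothetical, and a real inventory would come from your cloud provider's own export or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical tagging standard: every resource is expected to carry these tags.
REQUIRED_TAGS = {"owner", "service", "environment"}

@dataclass
class Resource:
    name: str
    tags: dict = field(default_factory=dict)
    last_accessed: Optional[date] = None  # None means no access was ever recorded

def flag_dark_candidates(resources, today, max_idle_days=90):
    """Return (name, reasons) pairs for resources that look untracked or idle."""
    flagged = []
    for r in resources:
        reasons = []
        missing = REQUIRED_TAGS - set(r.tags)
        if missing:
            reasons.append("missing tags: " + ", ".join(sorted(missing)))
        if r.last_accessed is None:
            reasons.append("no recorded access")
        elif (today - r.last_accessed).days > max_idle_days:
            reasons.append(f"idle for {(today - r.last_accessed).days} days")
        if reasons:
            flagged.append((r.name, reasons))
    return flagged

# Made-up inventory for illustration only.
inventory = [
    Resource("orders-db",
             {"owner": "payments", "service": "orders", "environment": "prod"},
             date(2025, 9, 1)),
    Resource("old-logs-bucket", {"owner": "platform"}, date(2024, 11, 3)),
    Resource("test-vm-17", {}, None),
]

flagged = flag_dark_candidates(inventory, today=date(2025, 9, 5))
for name, reasons in flagged:
    print(name, "->", "; ".join(reasons))
```

A check like this only finds candidates; whether a flagged resource is genuinely unused still needs a person, or an owner tag, to confirm.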

Using Poor Abstractions?

In practice, most organisations rely on abstractions when it comes to reporting on the emissions of their cloud environments. Often the task falls on development and IT teams that aren’t experts in the space, who are faced with a new world of terminologies, proxies and estimation approaches. It’s complex, which is why abstractions can help.

However, within these abstractions, the way that SaaS and Cloud Service Providers (CSPs) present the data can hide data gaps. Emissions are often gauged using averaged provider-published carbon-intensity factors, ‘per-user’ numbers, rough Power Usage Effectiveness (PUE) ratios, or static watt-per-vCPU estimates. Not only is this detail not granular enough for end users, it can be presented in a way that obfuscates consumption, e.g. using Renewable Energy Credits to reduce, or sometimes even zero-out, GHG Scope 2 emissions data.

While these abstractions simplify reporting and give a rough idea of your emissions, they obscure the depth of the problem and the variance in modern workloads: burst compute, GPU-heavy training jobs, ETL pipelines and underutilised instances all consume power in different ways, and that is before accounting for variance across regions and zones. These abstractions fall apart one or two layers of investigation down. Applying such broad estimates flattens the nuances of different applications into a single number, leading to inaccurate estimates of actual impact. This makes it hard to stand behind the stats, prioritise optimisation or detect when “green” improvements actually move the needle. Accurate impact measurement in the cloud demands finer granularity than current abstractions allow.
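To make the flattening effect concrete, the sketch below compares a static watt-per-vCPU estimate with one that scales power draw by utilisation. It is an illustration under stated assumptions, not a provider's method: the linear idle-to-max power model is a common simplification, and every number here (watts, PUE, grid intensity, utilisation) is made up for the example.

```python
def flat_estimate_kg(vcpus, hours, watts_per_vcpu, pue, grid_g_per_kwh):
    """Static estimate: every vCPU draws the same power for every hour."""
    energy_kwh = vcpus * watts_per_vcpu * hours / 1000 * pue
    return energy_kwh * grid_g_per_kwh / 1000  # grams -> kilograms CO2e

def utilisation_estimate_kg(vcpus, hours, idle_w, max_w, utilisation,
                            pue, grid_g_per_kwh):
    """Assumed linear model: draw scales from idle to max with CPU utilisation."""
    watts = idle_w + (max_w - idle_w) * utilisation
    energy_kwh = vcpus * watts * hours / 1000 * pue
    return energy_kwh * grid_g_per_kwh / 1000

# Illustrative values only: 8 vCPUs for a 720-hour month.
VCPUS, HOURS, PUE, GRID = 8, 720, 1.2, 200  # GRID in gCO2e per kWh

flat = flat_estimate_kg(VCPUS, HOURS, watts_per_vcpu=2.0,
                        pue=PUE, grid_g_per_kwh=GRID)
underused = utilisation_estimate_kg(VCPUS, HOURS, idle_w=1.0, max_w=4.0,
                                    utilisation=0.1, pue=PUE, grid_g_per_kwh=GRID)
etl_heavy = utilisation_estimate_kg(VCPUS, HOURS, idle_w=1.0, max_w=4.0,
                                    utilisation=0.9, pue=PUE, grid_g_per_kwh=GRID)

print(f"flat estimate:       {flat:.2f} kg CO2e")
print(f"underused instance:  {underused:.2f} kg CO2e")
print(f"busy ETL instance:   {etl_heavy:.2f} kg CO2e")
```

With these assumed inputs, the flat figure is the same for both workloads, while the utilisation-aware figures sit well below and well above it. Changing the grid intensity per region would widen the gap further.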

Managing Impact

Whilst it is easy to lean on proxies such as cost to make sense of complex systems and understand environmental impact, they hide the scale of the problem. This makes taking accountability challenging, as the data we have isn’t pointing at the real impact. But we can take responsibility and look at the actions that break this into a manageable problem.

There are lots of reasons that make this difficult, but we would advise taking the 5 actions below:

1. Pressure - Start increasing pressure on CSPs to give us better, more granular, real-time power, usage and emissions information, ideally across all GHG scopes.

2. Measure - Despite the challenges, we can’t abandon abstractions. We need to measure what we can, as accurately as we can, while understanding and mitigating key limitations. We must build with the understanding that existing measures are useful and seek to reduce inaccuracy through enrichment, for example by augmenting them with workload-level carbon signals and making dark data visible, measurable and actionable, e.g. by adopting approaches like Software Carbon Intensity (SCI).

3. Manage - Start embedding data governance and cloud governance frameworks to enable emissions measurement and shine a light on dark data. Frameworks that feature sustainability can better capture climate signals in logs and reporting, creating a more granular, realistic measure. Better cloud hygiene and standards for tags and lifecycles can reduce and manage the dark data and zombie resources impacting our emissions goals. Set KPIs aligned with SCI around your functional measures and create the opportunity to work on delivering your service within a set emissions impact.

4. Culture - It’s easy to overlook, but it’ll kill any sustainability initiative if it’s an afterthought. Teams need to know that they can make sustainability-focused decisions, balanced against other aspects. Building a culture of measurement and improvement using the data they have shows that emissions are their responsibility and not just a nice thing to do. This is key to letting your teams proactively use improved data as it becomes available; it creates a cycle of pressure and accountability that enables them to strive to do better and build lower-impact solutions.

5. Start Today

The best time to start was yesterday; the next best time is today. Start looking at your culture and data. It’s easy to focus on cost, but taking accountability for emissions can enable teams to start addressing your impact and better manage the effect of dark data and poor abstractions.
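The SCI approach mentioned under Measure and Manage can be sketched in a few lines. The Green Software Foundation's SCI specification defines the score as SCI = ((E × I) + M) per R, where E is energy consumed, I is the carbon intensity of that energy, M is embodied emissions and R is the functional unit. The code below is a minimal sketch of that formula; the input values and the KPI budget are hypothetical and chosen only to show the arithmetic.

```python
def sci(energy_kwh, intensity_g_per_kwh, embodied_g, functional_units):
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units

# Hypothetical figures for one service over one day.
score = sci(
    energy_kwh=12.0,            # E: energy used by the service
    intensity_g_per_kwh=250.0,  # I: grid carbon intensity where it runs
    embodied_g=500.0,           # M: share of hardware embodied emissions
    functional_units=100_000,   # R: e.g. API requests served
)

# Hypothetical KPI: a budget in gCO2e per request that the team agrees to stay within.
BUDGET_G_PER_REQUEST = 0.05
within_budget = score <= BUDGET_G_PER_REQUEST

print(f"SCI: {score:.3f} gCO2e per request, within budget: {within_budget}")
```

Because the score is a rate per functional unit, it can be tracked release by release, which is what makes it usable as a KPI alongside cost.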



For more information please contact:

Chris Hazell
Programme Manager - Cloud, Tech and Innovation, techUK

Sue Daley OBE
Director, Technology and Innovation

Laura Foster
Associate Director - Technology and Innovation, techUK