Building sustainability by design into cloud-based AI tools

Guest blog by Andrew Grigg, Head of Sustainability Consulting, Sopra Steria Next UK, and Sarah Oury, Sustainable AI Consultant, Sopra Steria.

Artificial Intelligence has become a central engine of innovation, but it also comes with a growing environmental footprint. As organisations scale their use of AI, the question is no longer whether to adopt it, but how to do so responsibly.

Building a more sustainable approach to AI requires stepping back and considering the entire AI lifecycle and the stakeholders involved. For organisations that use AI internally and also develop AI powered solutions, this translates into two main actions. First, they can reduce environmental impact by promoting responsible usage habits. These include raising awareness of AI's environmental implications, sharing digital sobriety guidelines, and training staff to use tools efficiently through prompt engineering best practices. These small adjustments help curb avoidable computation and limit infrastructure growth. Second, at the design stage, teams should embed sustainability by design by selecting use cases that genuinely require AI, choosing appropriately sized models, building efficient architectures, prioritising suppliers with strong environmental commitments, and favouring low carbon or local infrastructure. These decisions have a lasting impact throughout a solution's lifecycle, shaping cost, performance, maintenance complexity, and overall environmental footprint.

Taking this broader perspective, one of the most impactful levers for reducing the impact of generative AI comes into focus: Small Language Models (SLMs). By prioritising efficiency, control and purpose-driven design, SLMs enable more secure and environmentally responsible AI deployments, marking a meaningful shift in how enterprises approach language technologies.

Why the industry started with LLMs and why it's shifting

Large Language Models (LLMs) dominated the early wave of enterprise generative AI adoption. They enabled organisations to experiment rapidly with use cases such as content generation, conversational agents, and knowledge retrieval, without requiring extensive time or specialised expertise. This versatility, combined with strong early hype, made LLMs a natural starting point for many digital transformation initiatives. However, as enterprises moved from experimentation to production deployment, the limitations of LLMs became increasingly visible.

The practical limitations of LLMs in enterprise environments

LLMs are powerful, but their scale creates significant operational challenges:

High computational overhead: Using LLMs through external APIs can become extremely costly because every request is billed based on the number of input tokens in your prompt and the output tokens generated. In complex systems, where prompts may be long, reasoning chains deep, or where multiple model calls are chained together, these costs can escalate very quickly. Moreover, relying on third party AI providers means your operational budget is exposed to sudden price changes, creating financial and strategic dependency that organisations cannot easily control.
Latency constraints: Due to their size, LLMs struggle to consistently meet the response time requirements of time sensitive enterprise workflows. When milliseconds matter, model weight becomes a bottleneck.
Data privacy and sovereignty risks: Most LLMs are accessed through third party cloud APIs. For industries like healthcare, finance, or government, sending sensitive data to external systems creates compliance risks around confidentiality and sovereignty. For many organisations, this risk is simply unacceptable.
Limited task specialisation: General purpose LLMs are not inherently optimised for domain specific contexts. Fine-tuning them is resource intensive and may still produce inconsistent results, factual inaccuracies, or hallucinations. In regulated or high stakes environments, such errors cannot be tolerated.
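
To make the token billing point concrete, here is a minimal sketch of how the cost of a request grows when several model calls are chained together. The prices and token counts are purely illustrative assumptions, not any provider's actual rates, and the names `TokenPricing`, `call_cost` and `chain_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-token prices; real provider rates differ and change."""
    input_per_1k: float   # currency units per 1,000 input tokens
    output_per_1k: float  # currency units per 1,000 output tokens


def call_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call billed on both input and output tokens."""
    input_cost = (input_tokens / 1000) * pricing.input_per_1k
    output_cost = (output_tokens / 1000) * pricing.output_per_1k
    return input_cost + output_cost


def chain_cost(pricing: TokenPricing, calls: list[tuple[int, int]]) -> float:
    """Total cost of one user request that chains several model calls.

    Each entry in `calls` is an (input_tokens, output_tokens) pair.
    """
    return sum(call_cost(pricing, i, o) for i, o in calls)


# Illustrative numbers only (assumptions, not real rates or workloads).
pricing = TokenPricing(input_per_1k=0.01, output_per_1k=0.03)

# A single short prompt versus a three-step chain with growing context.
single = call_cost(pricing, input_tokens=500, output_tokens=200)
chained = chain_cost(pricing, [(500, 200), (3000, 800), (4000, 1000)])

print(f"single call: {single:.3f}")
print(f"chained request: {chained:.3f}")
print(f"ratio: {chained / single:.1f}x")
```

Multiplying the chained figure by expected monthly request volume gives a rough budget exposure, and re-running the sketch with a different `TokenPricing` shows how sensitive that budget is to a provider price change.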

These constraints have accelerated the shift toward a more focused and sustainable approach: the adoption of SLMs.

Why SLMs are gaining momentum

An SLM (Small Language Model) is a compact, purpose driven language model designed to perform specific tasks efficiently, rather than trying to handle every possible language task as LLMs do. SLMs can be categorised into three main families, each optimised for different organisational needs:

Distilled models: Created by compressing knowledge from a larger "teacher" LLM, these models retain much of the original capability but in a much smaller footprint. They balance performance and efficiency.
Task specific models: Extensively fine-tuned for narrow domains, such as medical summarisation, legal document classification, or industry specific customer support, these models deliver high accuracy in targeted functions.
Lightweight architectures: Designed from scratch with efficiency in mind, these models incorporate architectural optimisations to maximise performance while minimising resources.
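
For readers curious about what "compressing knowledge from a teacher" can look like, the sketch below shows one common formulation of a distillation objective: the student is trained to match the teacher's temperature-softened output distribution, measured with a KL divergence. This is an illustrative assumption about the technique in general, not a description of how any particular SLM was built; the function names and logit values are hypothetical, and real training pipelines usually combine this term with a standard task loss.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence KL(teacher || student) between softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three candidate tokens.
teacher = [2.0, 1.0, 0.1]
close_student = [1.9, 1.1, 0.2]
far_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, close_student))  # small: student agrees with teacher
print(distillation_loss(teacher, far_student))    # larger: student disagrees
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a much smaller model inherit behaviour from a larger one.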

SLMs flip the script: instead of trying to do everything, they excel at doing the right things, faster, cheaper, and with more control.

Customisable and domain specific by design

Because of their compact and purpose-built architecture, SLMs are naturally easier to customise for specific domains, enabling enterprises to integrate industry specific terminology, improve accuracy on specialised workflows, and reduce hallucinations through narrower, higher quality training data.

Flexible deployment architectures

With an increasing share of enterprise data being processed locally, SLMs are an ideal fit for edge workloads. SLMs unlock new architectural patterns aligned with enterprise priorities:

On premise: for maximum control over data and compliance.
Edge: for real time and offline first workloads.
Hybrid: combining local processing for routine tasks with cloud for large scale training or fallback to LLMs when needed.

This flexibility allows organisations to design AI systems that balance performance, privacy, scalability, and cost.
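
The hybrid pattern can be sketched as a simple routing rule. The example below is a minimal illustration under assumed policy choices: the task categories, the `route` function and the backend labels are all hypothetical, and a real system would also need monitoring, quality checks and a defined escalation path.

```python
# Hypothetical set of routine task types the local SLM is trusted to handle.
ROUTINE_TASKS = {"classification", "extraction", "faq"}


def route(task_type: str, contains_sensitive_data: bool) -> str:
    """Choose a backend for a request: "local_slm" or "cloud_llm".

    Assumed policy for this sketch:
    1. Sensitive data never leaves local infrastructure.
    2. Routine tasks are processed locally by the SLM.
    3. Everything else falls back to a cloud-hosted LLM.
    """
    if contains_sensitive_data:
        return "local_slm"
    if task_type in ROUTINE_TASKS:
        return "local_slm"
    return "cloud_llm"


# Example requests as (task type, contains sensitive data) pairs.
requests = [
    ("faq", False),
    ("extraction", True),
    ("open_ended_research", False),
    ("open_ended_research", True),
]

for task_type, sensitive in requests:
    print(f"{task_type!r} (sensitive={sensitive}) -> {route(task_type, sensitive)}")
```

Putting the sensitivity check first is a deliberate design choice in this sketch: it means a data protection rule can never be overridden by a capability rule, at the price of some non-routine requests being handled by the smaller model.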

Efficiency and performance at lower cost

SLMs require far less computational power and memory, enabling deployment on standard CPUs, modest GPUs, or even edge devices. Their smaller size brings three benefits: faster inference, which is crucial for real time applications; significantly lower energy consumption, which supports sustainability goals; and easier environmental assessment, because the infrastructure is known and close at hand, so energy consumption can be measured and impact managed through tangible actions.

Public research consistently shows that smaller, task specific models can be significantly more energy efficient than LLMs. For example, a UNESCO report states that small models tailored to specific tasks can cut energy use by up to 90%, representing a smarter, more cost and resource efficient approach: matching the right model to the right job, instead of turning to one large, all-purpose system for everything.
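
As a back-of-the-envelope illustration of what a reduction of "up to 90%" could mean, the sketch below applies a reduction factor to a baseline energy figure and converts the result into emissions. The baseline consumption and grid carbon intensity are hypothetical placeholders, not measured values, and 90% is the upper bound quoted above rather than a guaranteed saving.

```python
def energy_after_reduction(baseline_kwh: float, reduction: float) -> float:
    """Energy use after applying a fractional reduction (0.9 means 90% less)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_kwh * (1.0 - reduction)


def emissions_kg(energy_kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Emissions in kg CO2e for a given energy use and grid carbon intensity."""
    return energy_kwh * grid_intensity_kg_per_kwh


# Hypothetical inputs: replace with measured values for a real assessment.
baseline_kwh = 1000.0   # assumed monthly energy use of an LLM-based workload
grid_intensity = 0.2    # assumed kg CO2e per kWh for the hosting region

for reduction in (0.5, 0.9):  # a moderate case and the quoted upper bound
    remaining = energy_after_reduction(baseline_kwh, reduction)
    saved = emissions_kg(baseline_kwh - remaining, grid_intensity)
    print(f"{reduction:.0%} reduction: {remaining:.0f} kWh used, {saved:.0f} kg CO2e avoided")
```

Because the result scales directly with the baseline and the grid intensity, measuring actual consumption on known infrastructure matters more than the headline percentage.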

Privacy and sovereignty first

SLMs enable organisations to process sensitive data securely within their own infrastructure. This is especially important in regulated industries and for use cases involving personal or confidential information. SLMs can also operate offline, a critical feature for environments where connectivity is limited, or where data cannot transit through external networks.

A growing industry shift but not without challenges

SLMs represent a significant shift in enterprise AI strategy, offering a more efficient, scalable, reliable, and sustainable alternative to large general purpose models. Some organisations remain hesitant, because adopting SLMs demands more thoughtful design upfront, slightly longer proof of concept phases, and higher initial expertise. However, the long-term benefits in cost, performance, sustainability, and operational control far outweigh these early challenges.

Ready to explore what SLMs could do for you?

Our team has already demonstrated these benefits with a successful SLM deployment in a payment systems validation chat application, proving that SLM first approaches are not only feasible but beneficial. We combine deep expertise, hands on experience, and a strong track record of real world SLM deployments. Whether you want to assess the environmental impact of current models, explore SLM replacements, or build new SLM powered solutions from scratch, we can support you at every step. Get in touch!

Authors

Andrew Grigg

Head of Sustainability Consulting, Sopra Steria

Sarah Oury

Sustainable AI Consultant, Sopra Steria

Cloud Computing Programme activities

The techUK Cloud Programme ensures the UK stays at the forefront of cloud adoption and is a single point of contact for UK government and key stakeholders on issues impacting the development of the UK cloud market and industry.