Synthetic data’s pivotal role in the future of compute
The vast problem-solving potential of today’s HPC means that science and industry can develop technologies that have previously been impossible. HPC systems can perform quadrillions of calculations per second, and in concert with developments like quantum computing this performance will grow many times over. Data sits at the heart of these calculations. But there are many instances where it’s impossible, impractical or illegal to use real data, stopping innovation in its tracks.
Synthetic data is a solution. Rather than being the data equivalent of ‘lorem ipsum’ placeholder text, synthetic data accurately reflects the real data, so significant and valuable conclusions can be drawn from it. Using AI and modelling, you can synthesise whole swathes of data in a repeatable and realistic fashion. By synthesising input data from scratch or by creating new data from old, you can build, test and demonstrate a proof of concept at maximum velocity, with total control and with your risks mitigated.
Synthetic data keeps the build moving
The future of compute looks exciting, and you want to make decisions fast. Maybe quantum optimisation is ideal for your goals, or your next killer app will depend on a graph neural net. But you need data to go deep in the build process – and different compute means different data from what you might already have. Synthetic data can bridge the gap.
Perhaps you’re leaning on the latest GPUs, using HPC to power ultra-high-resolution forecasts. Or perhaps you’re considering quantum computing, with its potential to solve optimisation problems that are intractable today. What if your input data isn’t high enough resolution, or you simply don’t have enough? How do you test your nascent solution? Or demonstrate it to the board?
Starting from scratch
If you only have old data – or none at all! – that shouldn’t stop you building new analytics solutions. If anything, it’s an opportunity. For many data sets, you can build a mathematical model that represents the fundamental behaviours you anticipate, like surges in power use at teatime or drivers diverting from a closed road. How rich a data set you create depends on the complexity of your use case: the metaverse is being used as synthetic data for training autonomous vehicles, for example. But even if your synthetic data is only a rough approximation of reality, it can still be hugely valuable for testing high-impact scenarios.
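As a minimal sketch of building synthetic data from a behavioural model, the snippet below generates a household electricity-demand series with a baseline load, a morning bump and a larger ‘teatime’ surge. Every shape and magnitude here is an illustrative assumption, not real consumption data:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthetic_demand(days=7, samples_per_hour=4):
    """Generate a synthetic household demand series (kW).

    Assumed model: constant baseline + Gaussian-shaped morning and
    teatime surges + measurement noise. Purely illustrative numbers.
    """
    hours = np.arange(0, 24 * days, 1 / samples_per_hour)
    tod = hours % 24  # time of day in hours
    baseline = 0.4
    morning = 0.6 * np.exp(-((tod - 7.5) ** 2) / (2 * 1.0 ** 2))
    teatime = 1.5 * np.exp(-((tod - 18.0) ** 2) / (2 * 1.5 ** 2))
    noise = rng.normal(0.0, 0.05, size=hours.shape)
    # Demand can't be negative, so clip the noisy total at zero
    return hours, np.clip(baseline + morning + teatime + noise, 0.0, None)

hours, demand = synthetic_demand()
peak_time_of_day = hours[np.argmax(demand)] % 24  # should land near 18:00
```

A richer version might layer in weekday/weekend differences or weather effects, but even this rough approximation is enough to exercise a forecasting pipeline end to end.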
You can also use data synthesis to fill in the blanks. Sometimes you can’t measure everything, or can’t record things often, like in the core of a nuclear reactor. Informed by your best understanding of the whole, a model of the system underneath can help you extrapolate from what you know to what you don’t, using techniques like Bayesian inference powered by the latest HPC.
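To make the ‘fill in the blanks’ idea concrete, here is a toy Bayesian update under a wholly hypothetical scenario: a core temperature cannot be measured directly, but an assumed physics model says an outer sensor reads the core value minus a fixed 50 °C offset, plus noise. All numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed prior belief about the unobservable core temperature (°C)
prior_mean, prior_var = 600.0, 40.0 ** 2
sensor_noise_var = 5.0 ** 2
offset = 50.0  # assumed model: sensor reading = core temp - 50

# Simulate indirect readings (in reality these come from the sensor)
true_core = 630.0
readings = true_core - offset + rng.normal(0.0, 5.0, size=20)

# Conjugate normal-normal Bayesian update: combine prior and evidence
n = len(readings)
post_var = 1.0 / (1.0 / prior_var + n / sensor_noise_var)
post_mean = post_var * (prior_mean / prior_var
                        + (readings + offset).sum() / sensor_noise_var)
# post_mean is our best estimate of the quantity we never measured,
# and post_var quantifies how uncertain we remain about it.
```

Real systems need far richer models and HPC-scale inference, but the principle is the same: a model of the system underneath turns what you can measure into an estimate of what you can’t.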
Mimicking data to mitigate risk
On the other hand, when you’re data-rich, synthetic data can be used to mitigate risk. What if you’re uneasy about using real data in a development sandbox, or when demonstrating to potential partners? To avoid using real data until it’s necessary, you could train a generative adversarial network to make new data from old.
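A trained GAN is too large for a short snippet, so as a deliberately simpler stand-in, the sketch below fits a multivariate Gaussian to (fabricated) ‘real’ records and samples fresh synthetic ones from it. A GAN plays the same make-new-data-from-old role for far more complex distributions; the column names and parameters here are pure assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for sensitive real data: 1,000 records with two correlated
# columns, e.g. (transaction amount, account balance). Fabricated here.
real = rng.multivariate_normal(
    mean=[100.0, 5000.0],
    cov=[[400.0, 3000.0],
         [3000.0, 250000.0]],
    size=1000,
)

# Fit a simple generative model to the real data...
fitted_mean = real.mean(axis=0)
fitted_cov = np.cov(real, rowvar=False)

# ...then sample brand-new synthetic records that mimic its statistics,
# so demos and sandboxes never need to touch the originals.
synthetic = rng.multivariate_normal(fitted_mean, fitted_cov, size=1000)
```

The synthetic table preserves the means and correlation structure of the original without reproducing any individual record, which is exactly the property you want in a development sandbox or a partner demo.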
It’s vital to consider privacy and regulation, even with synthetic data. If you use real data to train a synthesiser, signatures of that data could still be present in what you generate, especially if overfitting is a risk – so GDPR may still apply, for example. Beware too of issues like copyright, now coming to the fore with diffusion-based image generators, as regulation catches up with innovation. Your least risky approach in regulated domains could be to synthesise realistic data from scratch, with the added benefit that it’s then totally in your control.
As compute moves forward, we’re seeing a shift. Tomorrow isn’t about just making faster versions of the hardware we already use; it’s about harnessing different technologies like quantum computers, digital annealers and FPGAs. But building new high-performance solutions on new hardware needs new data. Synthetic data can help you better incorporate AI and other types of modelling into your business today while also allowing you to experiment with that technology of tomorrow. The synthetic world is your oyster.
Written by Dr. Francis Woodhouse, Technical Director at the Smith Institute
Future of Compute Week 2022
During this week we will deep-dive into a number of themes that, if addressed, could develop our large-scale compute infrastructure to support the UK’s ambitions as a science and technology superpower.