Is there a convergence of AI with HPC? What are the new workloads?
Author: Vasilis Kapsalis, Northern EMEA Sales Leader at SambaNova Systems
The convergence of AI and HPC refers to the exchange of techniques and infrastructure that has taken place between the two fields over the last decade. To date, HPC and AI have shared best practices, but there remains a separation: HPC continues to focus on fundamental models in science, while AI is being applied to more creative activities.
High-end HPC workloads are run in data centres, using large specialist systems and clustered servers. Over time, AI developers have found they needed access to more powerful hardware, particularly with the trend towards training large and more complex models. They turned to HPC practitioners and systems to bolster performance.
In parallel, HPC teams have begun to gain exposure to approaches pioneered in AI, for example, mixed-precision floating point and the use of 'surrogate' models, which perform lookups based on past results so that the more predictable parts of a computationally expensive simulation do not need to be rerun.
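The surrogate idea can be sketched in a few lines. The snippet below is a minimal illustration, not any particular HPC code: a nearest-neighbour cache serves repeated queries from past results instead of rerunning a stand-in "simulation" (the function, tolerance and query values are all invented for the example).

```python
import math

def expensive_simulation(x):
    # Stand-in for a costly HPC kernel; here just an analytic function.
    return math.sin(x) * math.exp(-0.1 * x)

class SurrogateCache:
    """Nearest-neighbour lookup over past simulation results.

    If a previous input lies within `tol` of the query, reuse its
    stored result instead of rerunning the simulation.
    """
    def __init__(self, tol=0.05):
        self.tol = tol
        self.history = []  # list of (input, result) pairs
        self.hits = 0
        self.misses = 0

    def evaluate(self, x):
        for xi, yi in self.history:
            if abs(x - xi) <= self.tol:
                self.hits += 1
                return yi
        y = expensive_simulation(x)
        self.history.append((x, y))
        self.misses += 1
        return y

surrogate = SurrogateCache(tol=0.05)
queries = [0.10, 0.50, 0.12, 0.51, 0.90]
results = [surrogate.evaluate(q) for q in queries]
print(surrogate.hits, surrogate.misses)  # 2 lookups served from history, 3 fresh runs
```

In practice the lookup would be a trained model (or an interpolator over many dimensions) rather than a linear scan, but the trade-off is the same: accept a small approximation error in exchange for skipping a full simulation run.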
In HPC, there is new interest in Physics Informed Neural Networks (PINNs). These are models trained on data from physical systems, often with the governing physical laws built into the training objective, so that they learn the behaviour of a system well enough to provide predictions. They take advantage of the relationship between the continuous and discrete (quantised) domains of mathematics. Continuous functions, such as partial differential equations, occur frequently in physics; their analogue in the discrete domain is the neural network, whose weights converge on values that approximate the continuous-domain results.
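The distinctive ingredient of a PINN is a loss term measuring how badly a candidate solution violates the governing equation at a set of collocation points. The sketch below illustrates just that loss for a toy ODE, du/dt + u = 0; the choice of equation and collocation points is illustrative, and a central finite difference stands in for the automatic differentiation a real PINN framework would use on the network itself.

```python
import math

def physics_residual_loss(u, ts, h=1e-4):
    """Mean squared residual of du/dt + u = 0 at collocation points ts.

    In a real PINN the derivative of the network output comes from
    automatic differentiation; a central finite difference stands in here.
    """
    total = 0.0
    for t in ts:
        du_dt = (u(t + h) - u(t - h)) / (2 * h)
        residual = du_dt + u(t)   # how badly the candidate violates the ODE
        total += residual ** 2
    return total / len(ts)

collocation = [0.1 * i for i in range(1, 11)]

exact = lambda t: math.exp(-t)   # true solution: residual is essentially zero
wrong = lambda t: math.cos(t)    # does not satisfy the ODE

print(physics_residual_loss(exact, collocation))  # close to 0
print(physics_residual_loss(wrong, collocation))  # clearly nonzero
```

Training a PINN amounts to adjusting the network's weights to drive this residual (plus boundary-condition terms) towards zero, so the learned function approximates the continuous solution.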
Over time, HPC and AI will likely continue to share best practices; however, some separation will remain. While HPC continues to focus on fundamental models in science, AI is stepping into many more creative activities.
What's changing in silicon, and is this driving change?
The trend of technology interchange will likely continue, though we also need to set this in the context of our understanding of Moore's Law (which was only ever an observation). In semiconductor design, the miniaturisation gains that for many decades permitted the creation of ever-smaller transistors are fading, with many people talking about the end of Moore's Law. That may not quite be the case, but the free lunch for software developers is over.
While chip manufacturers present process roadmaps down to 1.8nm (nanometres), these size references only refer to the smallest feature size, for example, gate length. In practice, transistors and other features are much larger, so the named process sizes in nanometres are very much marketing representations. Two nanometres is the equivalent of just 18 silicon atoms wide; much smaller and a feature risks disappearing altogether. At this size, there are also challenges with the insulating oxide layers, which become too thin to work effectively. As a result, different strategies, including novel approaches to architecture and software, are essential to continue scaling future performance. Moving forward, this will restore a focus on heterogeneous systems, which leverage alternative forms of computer architecture, such as specialist approaches for processing tensors and the application of dataflow techniques.
Changes in approach to compute
Today, the main strands of technology driving innovation in compute models and architectures are those offering specialist capabilities for processing AI workloads and tensors. Tensors are a mathematical concept that generalises matrices to more than two dimensions and provides the basic data structures and rules used within deep learning models. Training such models places a significant burden on the memory architecture of traditional CPUs and GPUs, which continuously need to load and reload data and instructions to process the different nodes or layers within a deep learning model. As a result, these architectures rely heavily on expensive high-bandwidth memory to sustain performance. When a model exceeds the size of local memory, developers are forced to adopt memory-management strategies, making it harder and less efficient to train AI models.
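To make the tensor idea concrete, here is a small illustrative sketch (the numbers are arbitrary): a rank-3 tensor, represented as nested lists, contracted against a weight matrix along its last index. This batched matrix multiply is the core operation inside a dense layer, and it is exactly the kind of contraction that specialist tensor hardware accelerates.

```python
# A rank-3 tensor: a batch of 2 matrices, each 2x3 (indices: batch, row, col).
T = [[[1, 2, 3],
      [4, 5, 6]],
     [[7, 8, 9],
      [1, 0, 1]]]

# A 3x2 weight matrix, as would appear in a dense layer.
W = [[1, 0],
     [0, 1],
     [1, 1]]

def contract_last(tensor, matrix):
    """Computes sum_k T[b][i][k] * W[k][j] for every batch b:
    a rank-3 x rank-2 contraction (a batched matrix multiply)."""
    return [[[sum(t_row[k] * matrix[k][j] for k in range(len(matrix)))
              for j in range(len(matrix[0]))]
             for t_row in mat]
            for mat in tensor]

out = contract_last(T, W)
print(out[0][0])  # [1*1 + 2*0 + 3*1, 1*0 + 2*1 + 3*1] = [4, 5]
```

Production systems express this with optimised kernels rather than Python loops, but the index structure is the same, and a deep model chains thousands of such contractions, which is what drives the memory traffic described above.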
In contrast, architectures such as dataflow allow an entire deep learning neural network to be instantiated, laid out across processor tiles. As a result, training data can flow efficiently from layer to layer without burdening the memory. Dataflow architectures can support larger AI models using standard DRAM memory, making scaling much easier and more efficient.
AI Model types, use cases and Foundation Models
High-end AI models are already finding applications in research and business. Others may remain in the emerging category but show significant promise in test environments. Still, a key trend that is particularly promising with respect to broader application and re-usability is the concept of Foundation Models. In general, Foundation Models are larger models that can support multiple tasks. With this flexibility, organisations can avoid managing and retraining multiple separate models. Right now, the use cases seeing traction with Foundation Models are in the areas of Large Language Models (LLMs) and Vision Transformers. These and other use cases are listed below.
Large Language Models
There is significant interest in the capabilities of large language models (i.e. advanced NLP). These offer better accuracy in understanding text-based content and in generating human-like text in response to an initial input or question. These models also exhibit high flexibility in supporting domain-specific tasks in the financial services, legal, public administration, or research/science sectors. Fine-tuning can deliver higher precision for these domains, enabling activities such as the summarisation of documents and question-and-answer services over a collection of documents. Other tasks include identifying legal contracts that need review in response to a change in legislation, or determining sentiment from business reports.
Computer Vision and Vision Transformers
Computer vision models have existed for some time, though more recently a newer class of models, such as vision transformers, has significantly increased the capabilities available. There has been significant progress in classification (identifying objects/features in an image) and segmentation (e.g. marking out the boundaries of a tumour). Today, leading providers, typically offering specialist dataflow or tensor solutions, can process much larger images (up to 50k x 50k resolution) without the need to downsize or tile the image; this affords greater accuracy and enhances reproducibility. These capabilities extend into 3D image data, with applications across fields such as medical imaging, 3D mapping and seismic data analysis.
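A quick back-of-the-envelope calculation shows why full-resolution processing is hard (the 512 x 512 tile size below is an illustrative assumption, not a figure from any particular system):

```python
# Memory to hold a single uncompressed 50k x 50k RGB image at 8 bits per
# channel, versus the number of 512x512 tiles it would otherwise be cut into.
side = 50_000
bytes_full = side * side * 3        # height x width x 3 colour channels
tiles_per_side = -(-side // 512)    # ceiling division: 98 tiles per side
n_tiles = tiles_per_side ** 2

print(bytes_full / 1e9)  # ~7.5 GB for one image, before any activations
print(n_tiles)           # 9604 tiles, each losing global context at its edges
```

Holding roughly 7.5 GB of pixels plus the model's intermediate activations is what overwhelms conventional accelerator memory, while tiling into thousands of patches discards the cross-tile context that makes whole-image processing attractive.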
Another class of vision transformers increasingly demonstrates an ability to generate visual content in response to text-based descriptions. While these cases are in many ways still emerging, there is potential within the creative industries, though outstanding issues on copyright need resolving. For example, can an AI model generate copyrightable material, and what if that output contains some element of previously copyrighted material used in training the model? These same risks exist with text-based transformers trained on internet data, though the visual content cases have accelerated the debate.
Vision transformers have also been able to make predictions for specific physical systems, for instance, changes in precipitation in a short-term weather forecast, through an approach called nowcasting, which links to the next topic.
As mentioned in the first section, there is also significant interest in Physics Informed Neural Networks (PINNs). These can replace more computationally expensive HPC models, providing the ability to simulate physical environments without necessarily having to set up first-principles equations. PINNs are an area of active research; they could boost extensive multi-component simulations such as weather forecasting, or even enhance the physics in computer games or metaverse simulations.
Graph Neural Networks
Graph Neural Networks (GNNs) are a specialist type of model that can process data whose information is stored and linked in a graph structure. GNNs have potential applications in fields such as medical research, allowing, for example, a better understanding of the relationship between molecule structures and metabolic pathways to help optimise drug metabolisation, or to find synergistic drug combinations. However, such capabilities may also cause concern if used by bad actors, for example, by determining toxic molecules that might serve as nerve agents (Urbina, F., Lentzos, F., Invernizzi, C. and Ekins, S. (2022) Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence 4(3), 189–191). However, as with disinformation, AI can also help provide solutions.
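The core GNN operation, message passing over a graph, can be sketched in miniature. Below, a toy "molecule" (the atoms, bonds and feature values are all invented for illustration) is stored as nodes and edges, and one update round mixes each node's feature with the mean of its neighbours' features, which is how structural information propagates through the graph.

```python
# A toy molecule as a graph: nodes are atoms with a scalar feature,
# edges are bonds. Values are illustrative only.
features = {"C1": 1.0, "C2": 1.0, "O": 2.0, "H": 0.5}
bonds = [("C1", "C2"), ("C2", "O"), ("C1", "H")]

def neighbours(node):
    """All atoms bonded to `node` (edges are undirected)."""
    return [b for a, b in bonds if a == node] + [a for a, b in bonds if b == node]

def message_pass(feats):
    """One round of mean-aggregation message passing: each node's new
    feature blends its own value with the mean of its neighbours'."""
    new = {}
    for node, h in feats.items():
        msgs = [feats[n] for n in neighbours(node)]
        new[node] = 0.5 * h + 0.5 * (sum(msgs) / len(msgs))
    return new

updated = message_pass(features)
print(updated["O"])  # 0.5*2.0 + 0.5*1.0 = 1.5 (O's only neighbour is C2)
```

Real GNNs use learned weight matrices and vector features rather than fixed scalar averaging, and stack several such rounds so information travels multiple bonds across the molecule.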
A call to action
The UK has a strong tradition of research and innovation, a capability that will help drive advances in the application of high-end computing and AI. While the US, China, and EU have larger budgets for supercomputing, the UK can differentiate by leveraging its strong academic and research base to address specific challenges in science, engineering and business by applying novel heterogeneous computing techniques. Achieving this will require refocusing and an increase in HPC/AI funding.
Future of Compute Week 2022
During this week we will deep-dive into a number of themes that, if addressed, could develop our large-scale compute infrastructure to support the UK's ambitions as a science and technology superpower.