Real-time CPU and GPU processing optimisation


Fujitsu has announced the development of what it says is the world’s first technology to optimise the use of CPUs and GPUs by allocating resources in real time, giving priority to processes with high execution efficiency, even when running programs that use GPUs.

Fujitsu designed the new technology to address the global shortage of GPUs due to the explosive demand for generative AI, deep learning, and other applications, by optimising users’ existing computing resources.

Fujitsu has also developed a new technology for parallel processing that switches processing of multiple programs in real time without waiting for the completion of a running program in an HPC system that performs large-scale computations by linking multiple computers.

This technology makes it possible to immediately execute the processing of applications that require large-scale computational resources and real-time performance like digital twin and generative AI programs.

Fujitsu will provide the newly developed technology as part of a future computer workload broker, a software initiative currently under development that enables AI to automatically calculate and select the most appropriate resource for the problem a customer wants to solve, according to requirements such as computation time, computation accuracy, and cost. Fujitsu will continue to validate the technology with customers, with the aim of realising a platform that can solve societal problems and create innovation for a sustainable future.

Features of the new technology

1. World’s first technology to reallocate CPUs and GPUs even during program processing
Fujitsu says it has developed the world’s first technology to distinguish between programs that require a GPU and those that can be processed by a CPU, even when multiple programs are being processed, by predicting the rate of acceleration, and by allocating GPUs in real time for high-priority program processing.

For example, as shown in Figure 1, suppose the user wants to efficiently process three programs using one CPU and two GPUs. GPUs are first assigned to programs 1 and 2 according to GPU availability. When program 3 then makes a request, the GPU allocation is temporarily switched from program 1 to program 3 so that the degree of acceleration program 3 achieves on the GPU can be measured. The measurement shows that overall processing time is reduced by allocating the GPU to program 3 rather than to program 1, so the GPU is assigned to program 3 and the CPU to program 1 during that period. Once program 2 finishes and its GPU becomes free, the GPU is allocated to program 1 again. In this way, computational resources are allocated so that all program processing completes in the shortest time.
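
To make the allocation logic above concrete, the following is a minimal Python sketch of a greedy re-allocation rule of the kind described: GPUs go to the programs whose measured acceleration saves the most overall processing time, and a program without a measurement is given a GPU temporarily so that its speedup can be measured. The data structure, the speedup figures, and the cost model are illustrative assumptions, not Fujitsu’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Program:
    name: str
    remaining_work: float       # abstract units of work still to be processed
    gpu_speedup: float | None   # measured acceleration on a GPU; None = not yet measured

def choose_gpu_assignment(programs, num_gpus):
    """Greedy re-allocation: assign the available GPUs to the programs whose
    measured acceleration saves the most overall processing time."""
    def time_saved(p):
        if p.gpu_speedup is None:
            # Unmeasured programs get a GPU first so their speedup can be measured,
            # mirroring the temporary reallocation to program 3 in Figure 1.
            return float("inf")
        return p.remaining_work - p.remaining_work / p.gpu_speedup
    ranked = sorted(programs, key=time_saved, reverse=True)
    on_gpu = {p.name for p in ranked[:num_gpus]}
    return {p.name: ("GPU" if p.name in on_gpu else "CPU") for p in programs}

# Illustrative numbers: program 3 accelerates far more on a GPU than program 1,
# so with two GPUs the scheduler keeps program 2 on a GPU, gives the other GPU
# to program 3, and moves program 1 back to the CPU until a GPU frees up.
programs = [
    Program("program 1", remaining_work=100, gpu_speedup=2.0),
    Program("program 2", remaining_work=80,  gpu_speedup=6.0),
    Program("program 3", remaining_work=100, gpu_speedup=8.0),
]
print(choose_gpu_assignment(programs, num_gpus=2))
# -> {'program 1': 'CPU', 'program 2': 'GPU', 'program 3': 'GPU'}
```

A real scheduler would re-run this decision whenever a program arrives or completes, which is how the GPU returns to program 1 once program 2 finishes in the Figure 1 scenario.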

This technology makes it possible to quickly train models that process graph AI data in the development of applications such as GPU-based AI and advanced image recognition.

Figure 1. Image of CPU and GPU allocation switching

2. World’s first technology for real-time switching of execution of multiple programs on an HPC system
Fujitsu has developed the world’s first technology for switching between multiple programs in real time, without waiting for the currently running program to conclude, in an HPC system that links multiple computers, enabling HPC systems to be used to execute programs that require real-time performance.

Because the conventional control method uses unicast communication, which instructs each server to switch program execution one by one, variations in switching timing occur, making it difficult to perform batch switching of program execution in real time.

By adopting broadcast communication, which sends the switching command to all nodes simultaneously, Fujitsu has enabled real-time batch switching of program execution, reducing the interval between program processing switches that affects program performance from a few seconds to 100 milliseconds in a 256-node HPC environment.
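
As a rough illustration of the difference, the sketch below uses standard UDP sockets in Python to contrast the two approaches: unicast sends one switching command per node, while broadcast delivers a single datagram to every node on the segment at once. The port number, message format, and broadcast address are assumptions made for the example, not details of Fujitsu’s implementation.

```python
import socket

SWITCH_PORT = 50000                 # assumed control port listened on by every node
SWITCH_MSG = b"SWITCH program_3"    # illustrative switching command

def switch_unicast(node_addresses):
    """Conventional approach: one datagram per node, so switching times drift
    apart as the node count grows."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for addr in node_addresses:
            sock.sendto(SWITCH_MSG, (addr, SWITCH_PORT))

def switch_broadcast(broadcast_addr):
    """Broadcast approach: a single datagram reaches all nodes on the segment
    almost simultaneously, enabling batch switching in real time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(SWITCH_MSG, (broadcast_addr, SWITCH_PORT))
```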

Because the appropriate communication method depends on application requirements and network quality, the optimal method can be selected by weighing the performance improvement gained from broadcast communication against the performance degradation caused by packet loss.
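
A selection rule along those lines could look like the hedged sketch below, which compares a rough switching-latency estimate for each method; the per-node unicast delay and the loss penalty are assumed values chosen only to mirror the figures quoted above (a few seconds versus roughly 100 milliseconds at 256 nodes).

```python
def choose_switch_method(node_count, expected_packet_loss,
                         unicast_delay_per_node=0.01, loss_penalty=1.0):
    """Pick unicast or broadcast by comparing rough switching-latency estimates.
    The cost model is an illustrative assumption: broadcast wins when the
    per-node fan-out delay outweighs the expected cost of recovering lost
    broadcast packets."""
    unicast_cost = unicast_delay_per_node * node_count          # grows with node count
    broadcast_cost = 0.1 + expected_packet_loss * loss_penalty  # ~100 ms batch switch
    return "broadcast" if broadcast_cost < unicast_cost else "unicast"

print(choose_switch_method(node_count=256, expected_packet_loss=0.01))  # -> broadcast
```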

This technology enables applications requiring real-time performance for digital twins, generative AI, and materials and drug discovery to be executed more rapidly using HPC-like computational resources.

Figure 2. Differences in communication methods used to switch program execution

Future Plans

In the future, Fujitsu plans to apply the CPU/GPU resource optimisation technology to GPU-intensive processing on its Fujitsu Kozuchi (code name) – AI Platform, which allows users to quickly test advanced AI technologies.

The HPC optimisation technology will also be applied to Fujitsu’s 40-qubit quantum computer simulator for collaborative computing using a large number of nodes.

Fujitsu will additionally consider applications for Fujitsu Computing as a Service HPC, which offers users the ability to develop and execute applications for simulation, AI, and combinatorial optimisation problems, as well as for its Composable Disaggregated Infrastructure (CDI) architecture, a technology that enables hardware configurations to be changed among servers. Through these efforts, Fujitsu aims to create a society in which everyone can easily access cost-effective, high-performance computing resources.
