These are the secrets of the RDNA 3 instruction set

The RDNA 3 instruction set is used by the Radeon RX 7000 graphics cards and the Radeon 700M integrated graphics. What are its secrets?

No chip can be programmed without an associated instruction set architecture (ISA), which is what allows programs to be compiled for it. This applies to GPUs and graphics chips just as much as to CPUs. This time we look at the RDNA 3 instruction set and its novelties, to give you a general idea of what AMD's latest generation of graphics cards can do.

Each new generation of graphics cards brings innovations that go beyond the architecture itself: additional capabilities that were not possible before and that need their own instructions in order to be used. In the case of RDNA 3, we have seen improvements in the rate of operations per cycle, new dedicated instructions for artificial intelligence and improvements in ray tracing.

What’s new in the RDNA 3 instruction set?

The RDNA 3 instruction set is the set of instructions in which low-level shader programs, that is, those closest to machine code, run on GPUs of the architecture of the same name. These are the RX 7000 series graphics cards and the Radeon 700M integrated graphics found in Ryzen 7000 APUs, currently only available for laptops.

In this article we will not discuss internal organization or architecture. Instead, we review the hardware-level capabilities that the new instruction set exposes, focusing on those that are new and leaving aside those already present in the previous generation.

Wave32 versus Wave64 mode

The RDNA 3 instruction set, like its predecessors, supports two operating modes. The difference between them is the maximum number of work items grouped together in each wave. Waves of 32 elements have been RDNA's natural mode since the first generation, and one of the most persistent rumors long before launch was that the new ISA would only offer a 32-wide mode and drop the 64-wide one. In the end, at least for the moment, that has not happened.

- The shader supports both waves of 32 work items (“Wave32”) and waves of 64 work items (“Wave64”).
- Both wave sizes are supported for all instructions, except Dual Issue (VLIW2) instructions, which only work with Wave32.
- Shader programs must be compiled and run for a specific wave size, regardless of how many work items are active in a given wave.
- It follows that if a game has its shaders precompiled for Wave64 and we want to take advantage of Wave32, the shaders must be recompiled. This point is important for potential improved versions of video game consoles.
- Wave32 waves issue each instruction at most once.
- Wave64 waves typically issue each instruction twice: once for the low half (work items 31-0) and then again for the high half (work items 63-32).

This point is important: although at the programming level there is a single RDNA 3 instruction set, a compiled shader is tied to one of the two wave sizes, hence the need for two different compilations, produced either by the driver or shipped as precompiled shaders.
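The issue behaviour listed above can be pictured with a small toy model. This is only a sketch, not AMD's hardware logic: the function name `issue_passes`, the 32-lane width constant and the rule that a half with no active work items is skipped are assumptions made for illustration.

```python
# Toy model (hypothetical, not AMD's implementation) of how many times
# one vector instruction is issued for a wave of a given size.
SIMD_WIDTH = 32  # lanes handled per issue; an assumption of this sketch


def issue_passes(wave_size, exec_mask):
    """Return the (first_lane, last_lane) ranges issued for one instruction.

    exec_mask holds one boolean per work item in the wave. In this model a
    half with no active work items is skipped, which is one way to read
    "at most once" for Wave32 and "typically twice" for Wave64.
    """
    assert wave_size in (32, 64) and len(exec_mask) == wave_size
    passes = []
    for start in range(0, wave_size, SIMD_WIDTH):
        half = exec_mask[start:start + SIMD_WIDTH]
        if any(half):
            passes.append((start, start + SIMD_WIDTH - 1))
    return passes


if __name__ == "__main__":
    print(issue_passes(32, [True] * 32))
    print(issue_passes(64, [True] * 64))
```

The wave size is a parameter fixed before the loop runs, which mirrors the point that a shader is compiled for one size regardless of how many work items are active.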

Dual Issue in RDNA 3

The concept of Dual Issue refers to encoding two instructions, identical or different, so that they are issued in parallel. Only certain combinations are allowed, depending on the availability of execution units. Let’s not forget that there is not a single ALU but several units, each specialized in one or several instructions, which is normal in any type of processor.

The RDNA 3 instruction set has two types of combinations. In the first, two different instructions are executed at the same time as if they were one, depending on the availability of the ALUs at that moment; it is a way of using resources that would otherwise sit idle. The second relies on the ability of the ALUs to treat a register as SIMD data and applies to pairs of identical instructions.

Rapid Packed Math, but 32-bit

Two FMADD instructions, each worth 2 operations per cycle, can be combined. This doubles the peak TFLOPS rate in that mode compared to RDNA 2. Combinations of the same instruction, however, are possible thanks to the SIMD-within-a-register capability: the same register for an N-bit ALU can be treated as two registers for two N/2-bit ALUs working in unison.
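The doubling can be checked with simple arithmetic. The sketch below assumes the usual peak-rate formula (lanes times operations per lane times clock); the function name and the example figures of 96 compute units, 64 lanes per unit and 2.5 GHz are illustrative placeholders, not values taken from this article.

```python
def peak_fp32_tflops(compute_units, lanes_per_cu, clock_ghz, dual_issue):
    """Theoretical peak FP32 rate under the assumptions of this sketch."""
    ops_per_fma = 2                         # a fused multiply-add counts as 2 operations
    issue_factor = 2 if dual_issue else 1   # two FMAs per lane per cycle when paired
    flops_per_cycle = compute_units * lanes_per_cu * ops_per_fma * issue_factor
    return flops_per_cycle * clock_ghz / 1000.0  # GFLOPS -> TFLOPS


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    single = peak_fp32_tflops(96, 64, 2.5, dual_issue=False)
    paired = peak_fp32_tflops(96, 64, 2.5, dual_issue=True)
    print(single, paired, paired / single)
```

The doubled figure is a ceiling: it is only reached when the compiler actually finds two instructions that can be paired.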

A good part of the combinations that Dual Issue allows in the RDNA 3 instruction set were already available in the Rapid Packed Math mode that we first saw in AMD Vega. The difference is that here it is not two 16-bit instructions that are combined, but two 32-bit ones.

This could be a hint of a future iteration of RDNA with 64-bit ALUs that replaces CDNA as the GPU for high-performance and scientific computing. However, this is a long shot and is not related to the RDNA 3 instruction set, but rather to a future superset.
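As a reminder of what the older packed mode does, here is a rough sketch of two 16-bit floats sharing one 32-bit register and being added lane by lane. The helper names are hypothetical, and Python's `struct` module is used only to emulate the bit layout; this is not how the hardware is programmed.

```python
import struct


def pack_half2(lo, hi):
    """Store two FP16 values in one 32-bit register image (low half first)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(reg):
    """Split a 32-bit register image back into its two FP16 values."""
    return struct.unpack("<2e", struct.pack("<I", reg))


def packed_add(reg_a, reg_b):
    """Add both 16-bit lanes with what would be one instruction on the GPU.

    The sums are computed in Python floats and rounded to FP16 when packed.
    """
    a_lo, a_hi = unpack_half2(reg_a)
    b_lo, b_hi = unpack_half2(reg_b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


if __name__ == "__main__":
    result = packed_add(pack_half2(1.5, 2.0), pack_half2(0.5, 4.0))
    print(unpack_half2(result))
```

Dual Issue in RDNA 3 keeps the idea of doing two operations per lane per cycle but pairs full 32-bit instructions instead of splitting one register in half.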

WMMA, operations with matrices and units for AI

AMD has chosen not to bring the CDNA Matrix Core units to RDNA 3, possibly because they are designed to work only with waves of 64 elements. However, the lack of systolic matrix units has been mitigated by adapting the SIMD units for the job. The WMMA mode adds a series of instructions in RDNA 3 that speed up Deep Learning and Machine Learning algorithms, but without reaching the speed of a specialized unit.

In particular, Matrix Core units are systolic arrays designed to speed up matrix calculations. GPUs with the RDNA 3 instruction set lack them, but in turn include a series of instructions that allow these operations to be carried out on the SIMD units: WMMA, that is, wave matrix multiply-accumulate. What is achieved? Double the speed when executing these operations, but not the x8 obtained with a systolic unit.

The way RDNA 3 speeds up WMMA instructions is simple. Matrix multiplication consists of multiplying a row by a column many times, so the system stores rows and columns separately and treats them as vectors. Thus, in a 3 x 3 matrix we would have, for example, A1, B1 and C1 as a row and A1, A2 and A3 as a column, as two different vectors, each handled by a SIMD unit within the Compute Unit.
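The row-by-column idea can be sketched in a few lines. This is a plain reference version of a multiply-accumulate, D = A x B + C, written to show rows and columns being handled as separate vectors; the function names are hypothetical and nothing here reflects the actual WMMA operand layout.

```python
def dot(row, col):
    """Multiply one row vector by one column vector and sum the products."""
    return sum(r * c for r, c in zip(row, col))


def matmul_accumulate(a, b, c):
    """Compute D = A x B + C using row and column vectors.

    Rows of A are used as they are; columns of B are gathered into their own
    vectors first, so every output element is a single dot product.
    """
    cols = [list(col) for col in zip(*b)]
    return [
        [dot(row, col) + c[i][j] for j, col in enumerate(cols)]
        for i, row in enumerate(a)
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(matmul_accumulate(a, b, c))
```

Each dot product is independent of the others, which is what makes the work easy to spread across SIMD lanes.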

Ray Tracing

AMD has improved its Ray Accelerator unit compared to the very poor version in RDNA 2: the unit can now finally traverse the BVH data structure itself, so traversal no longer depends on a shader program, which was one of the existing bottlenecks. However, this capability is not reflected in the RDNA 3 instruction documentation.

In the RDNA 3 instruction set only the intersection instruction is documented. We therefore assume that traversal of the BVH tree, which stores the organization of the 3D scene and is used during ray tracing, is carried out by the driver or invoked from it.

This is curious: it tells us that the old mode can still be used, but also that shader code will have to be rethought to make use of the new features. Since these are left out of the RDNA 3 instruction set, AMD is free to use the unit on future architectures even if they are not binary compatible.
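For reference, the shader-driven approach, in which the program walks the BVH and only the box test is a dedicated operation, can be sketched roughly as follows. Everything here is hypothetical: the `Node` layout, the `ray_box_intersect` function standing in for the intersection instruction and the stack-based loop are illustrative choices, and the sketch assumes every component of the ray direction is nonzero.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    box_min: Vec3
    box_max: Vec3
    children: List["Node"] = field(default_factory=list)
    primitive: Optional[str] = None  # leaf payload (hypothetical)


def ray_box_intersect(origin, inv_dir, node):
    """Slab test; stands in for the hardware intersection instruction."""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, node.box_min, node.box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def traverse(root, origin, direction):
    """Walk the BVH in 'shader' code, calling the box test at every node."""
    inv_dir = tuple(1.0 / d for d in direction)  # assumes no zero components
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not ray_box_intersect(origin, inv_dir, node):
            continue  # the whole subtree is skipped
        if node.primitive is not None:
            hits.append(node.primitive)
        stack.extend(node.children)
    return hits
```

In this arrangement the loop, the stack and the choice of which child to visit next are all ordinary shader work, which is the part a hardware traversal unit would take over.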

RDNA 3 on PS5 Pro?

No, no section has been misplaced. If we listen to the rumor mill, everything indicates that a possible PS5 Pro, if it exists, could be based on a custom version of RDNA 3. In any case, we must bear in mind that AMD goes so far as to create patchwork versions for consoles, especially to maintain backwards compatibility. So it will most likely take from the RDNA 3 instruction set whatever Sony, and specifically Mark Cerny, is interested in putting into a potential improved iteration of the architecture.

What is not very clear to us is whether RDNA 3 is a superset of the RDNA 2 instruction set or a different ISA. In any case, internally the Wave64 mode has been suppressed, so from the point of view of the chip it has disappeared. This, which may seem trivial, could cause problems for shader programs that depend on the timing of instructions. In any case, an eventual PS5 Pro should keep GCN’s 64-item per wave mode for backwards compatibility with the huge PS4 and PS4 Pro catalog available to PlayStation users.


Bradley Gervais

