NVIDIA RTX 3000 Series GPUs: Here’s What’s New

On September 1, 2020, NVIDIA revealed its new line of gaming GPUs: the RTX 3000 series, based on its Ampere architecture. We’ll talk about what’s new, the AI-powered software that comes with it, and all the details that make this generation truly amazing.

Meet the RTX 3000 Series GPUs

NVIDIA’s main announcement was its shiny new GPUs, all built on a custom 8nm manufacturing process, and all bringing significant speedups to both raster and ray tracing performance.

At the lower end of the lineup is the RTX 3070, which costs $499. That’s a bit pricey for the cheapest card NVIDIA unveiled in the initial announcement, but it’s an absolute steal once you learn it outperforms the existing RTX 2080 Ti, a top-of-the-line card that regularly sold for well over $1,400. After NVIDIA’s announcement, though, third-party sale prices dropped, with many owners panic-selling their cards on eBay for under $600.

There are no solid benchmarks from the announcement, so it’s unclear whether the card is objectively “better” than a 2080 Ti, or whether NVIDIA is massaging the marketing a bit. The benchmarks it did show were run at 4K and likely had RTX turned on, which can make the gap look larger than it will be in purely rasterized games, since the Ampere-based 3000 series runs ray tracing more than twice as fast as Turing. But with ray tracing now carrying a much smaller performance penalty and being supported on the latest generation of consoles, it’s a major selling point that the card runs as fast as the last-gen flagship for roughly a third of the price.

It’s also not clear whether that price will hold. Third-party designs regularly add at least $50 to the price tag, and with demand likely to be high, it wouldn’t be surprising to see it selling for $600 come October 2020.

Sitting above that is the RTX 3080 at $699, which should be twice as fast as the RTX 2080 and come in around 25–30% faster than the 3070.

Then, at the high end, the new flagship is the RTX 3090, which is comically huge. NVIDIA is well aware of this and referred to it as the “BFGPU,” which the company says stands for “Big Ferocious GPU.”

NVIDIA didn’t show any direct performance metrics, but the company did show it running 8K games at 60 FPS, which is really impressive. Of course, NVIDIA is almost certainly using DLSS to hit that mark, but 8K gaming is 8K gaming.

Of course, there will eventually be a 3060 and other more budget-oriented card variations, but those usually come later.

To keep things cool, NVIDIA needed a fresh cooler design. The 3080 is rated at 320 watts, which is quite high, so NVIDIA went with a dual-fan design, but instead of placing both fans on the bottom, it put one fan at the top end, where the backplate usually sits. That fan directs air upward, toward the CPU cooler and the top of the case.

Judging by how much poor case airflow can hurt performance, this makes a lot of sense. It does leave the circuit board very cramped, though, which will likely affect third-party board designs.

DLSS: A Software Advantage

Ray tracing isn’t the only benefit of these new cards. In truth, it’s a bit of a gimmick: the RTX 2000 and 3000 series aren’t dramatically better at raw ray tracing than previous generations of cards. Ray tracing a full scene in 3D software like Blender usually takes seconds or even minutes per frame, so there’s no brute-forcing it in under 10 milliseconds.

There is, of course, dedicated hardware for running ray calculations, called RT cores, but NVIDIA largely took a different approach: it improved its denoising algorithms, which let the GPU render a very cheap single pass that looks terrible and then, through the magic of AI, turn it into something a gamer actually wants to look at. Combined with traditional raster-based techniques, this delivers an enjoyable experience enhanced by ray-tracing effects.

To make this fast, however, NVIDIA added AI-specific processing cores called Tensor cores. These handle all the math needed to run machine-learning models, and they do it very quickly. They’re also a major draw for data centers, since many companies rely heavily on AI.

Beyond denoising, the main use of Tensor cores for gamers is DLSS, or deep learning super sampling. It takes a low-resolution frame and upscales it to full native quality. In practice, this means you can game at 1080p-level frame rates while looking at a 4K image.

It also helps quite a bit with ray-tracing performance. NVIDIA showed an RTX 2080 Super running Control at ultra quality, with every ray-tracing setting maxed out. At 4K it struggles along at just 19 FPS, but with DLSS on it hits a much better 54 FPS. DLSS is essentially free performance for NVIDIA, made possible by the Tensor cores in Turing and Ampere: any GPU-limited game that supports it can see a serious speedup from software alone.
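The numbers quoted above make the appeal easy to quantify. This is a rough back-of-the-envelope calculation, not an official benchmark; the internal render resolution is an illustrative assumption, since DLSS modes vary by game and quality setting.

```python
# Illustrative arithmetic from the Control figures above:
# 19 FPS at native 4K vs. 54 FPS with DLSS enabled.
native_fps = 19
dlss_fps = 54
print(f"DLSS speedup: {dlss_fps / native_fps:.2f}x")

# If the GPU renders internally at 1080p and upscales to 4K,
# it shades a quarter of the pixels per frame:
pixels_4k = 3840 * 2160
pixels_1080p = 1920 * 1080
print(f"Pixels shaded: {pixels_1080p:,} vs {pixels_4k:,} "
      f"({pixels_4k // pixels_1080p}x fewer)")
```

That 4x reduction in shaded pixels is where most of the 2.8x frame-rate gain comes from, with the Tensor cores paying the upscaling cost.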

DLSS isn’t new and was announced as a feature when the RTX 2000 series launched two years ago. At the time, it was supported by very few games, as it required NVIDIA to train and tune a machine learning model for each individual game.

However, in that time NVIDIA has completely rewritten it, calling the new version DLSS 2.0. It’s a general-purpose API, which means any developer can implement it, and most major game engines already support it. Instead of working on a single frame, it also takes in motion-vector data from the previous frame, similar to TAA. The result is much sharper than DLSS 1.0 and in some cases looks better and sharper than even native resolution, so there’s little reason not to turn it on.

There is one catch: when the scene changes entirely, as with cutscenes, DLSS 2.0 must render the first frame at 50% quality while it waits for motion-vector data. This can cause a brief dip in quality for a few milliseconds. But 99% of what you look at will be rendered correctly, and most people don’t notice it in practice.

Ampere Architecture: Built for AI

Ampere is fast. Really fast, especially at AI calculations. The RT cores are 1.7 times faster than Turing’s, and the new Tensor cores are 2.7 times faster. Together, that’s a true generational leap in ray-tracing performance.

In early May, NVIDIA revealed the A100, a data center GPU designed to run AI. With that announcement, the company detailed much of what makes Ampere so much faster. For data center and high-performance computing workloads, Ampere is generally around 1.7 times faster than Turing. For AI training, it’s up to 6 times faster.

With Ampere, NVIDIA is using a new number format designed to replace the industry-standard “floating point 32,” or FP32, in some workloads. Under the hood, every number your computer processes occupies a predefined number of bits in memory, whether 8, 16, 32, 64, or even more. Bigger formats are harder to process, so if you can get away with a smaller size, you have less to crunch.

FP32 stores a number in 32 bits: 8 bits define the number’s range (how big or small it can be) and 23 bits hold its precision, with 1 bit left for the sign. NVIDIA’s claim is that those 23 precision bits aren’t entirely necessary for many AI workloads, and that you can get comparable results and much better performance from just 10 of them. Cutting the format down to 19 bits instead of 32 makes a big difference in many calculations.
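The bit layout above can be inspected directly. This is a minimal sketch using Python’s standard `struct` module; the TF32 emulation simply truncates the low mantissa bits, a simplification of what the hardware actually does (which involves rounding), so treat it as a didactic model rather than a spec-accurate converter.

```python
import struct

def fp32_fields(x: float):
    """Split an FP32 value into its 1-bit sign, 8-bit exponent, 23-bit mantissa."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

def to_tf32(x: float) -> float:
    """Emulate TF32: keep the 8-bit exponent, cut the mantissa from 23 bits
    to 10 by zeroing the low 13 bits (real hardware rounds; we truncate)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

sign, exp, man = fp32_fields(0.1)
print(f"0.1 -> sign={sign} exponent={exp:08b} mantissa={man:023b}")
print(f"TF32(0.1) ~= {to_tf32(0.1):.10f}")  # close to 0.1, slightly truncated
```

Note that the range (exponent) is untouched, so TF32 can represent numbers as large or small as FP32 can; only the fine-grained precision shrinks.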

This new format is called Tensor Float 32, and the Tensor cores in the A100 are optimized to handle its oddball size. On top of die shrinks and core-count increases, this is how NVIDIA is achieving its massive 6x speedup in AI training.

Beyond the new number format, Ampere also sees significant speedups in specific calculations, such as FP32 and FP64. These don’t translate directly into more FPS for the average gamer, but they’re part of what makes Ampere almost three times faster overall at Tensor operations.

Then, to speed up calculations even further, NVIDIA introduced the concept of sparsity, which is a very fancy word for a fairly simple idea. Neural networks work with large lists of numbers, called weights, that determine the final output. The more numbers to crunch, the slower it runs.

However, not all of those numbers are actually useful. Some of them are literally zero and can essentially be thrown out, which leads to massive speedups because the GPU can process more meaningful numbers at once. Sparsity essentially compresses the weights, which takes less effort to compute. The new “Sparse Tensor Core” is built to operate on this compressed data.
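The idea can be sketched in a few lines. Ampere’s hardware uses a 2:4 structured-sparsity scheme (two nonzero values kept per group of four weights, stored with their positions); the toy code below models that bookkeeping in plain Python and is purely didactic, not NVIDIA’s actual implementation.

```python
# Toy model of 2:4 structured sparsity: in every group of four weights,
# keep only the two largest-magnitude values plus their positions.
def compress_2_of_4(weights):
    """Return (values, indices): two kept weights per group of four."""
    assert len(weights) % 4 == 0
    values, indices = [], []
    for g in range(0, len(weights), 4):
        group = weights[g:g + 4]
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        for i in sorted(keep):
            values.append(group[i])
            indices.append(g + i)
    return values, indices

def sparse_dot(values, indices, activations):
    """Dot product touching only the stored weights -- half the multiplies."""
    return sum(v * activations[i] for v, i in zip(values, indices))

w = [0.9, 0.0, 0.0, -0.4,   0.0, 0.7, 0.1, 0.0]  # already ~50% zero
a = [1.0, 2.0, 3.0, 4.0,    5.0, 6.0, 7.0, 8.0]
vals, idx = compress_2_of_4(w)
print(sparse_dot(vals, idx, a))         # matches the dense result below,
print(sum(x * y for x, y in zip(w, a))) # since only zeros were dropped
```

When the dropped weights really are zero, as here, the compressed result is exact while doing half the multiply-accumulates, which is where the claimed speedup comes from.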

Despite the changes, NVIDIA says this shouldn’t noticeably affect the accuracy of trained models.

For sparse INT8 calculations, one of the smallest number formats, the peak performance of a single A100 is over 1.25 peta-operations per second, a staggeringly high number. Granted, that’s only when crunching one specific kind of number, but it’s impressive nonetheless.
