
Discussion of exciting new features, research & advancements in gaming (graphics & adjacent software)

marees

Platinum Member
thread to discuss all new features of gaming GPU architectures & graphics APIs

the idea behind this thread is to discuss "real-time graphics, GPGPU, ML and adjacent (physics, audio, etc.) software research"

(suggestion: if you are posting videos, please also post text links)

separate thread below for hardware features

reference 1 — GPU blogs
  1. nvidia GeForce — https://blogs.nvidia.com/blog/category/gaming/
  2. amd GPUOpen — https://gpuopen.com/
  3. intel Arc — https://game.intel.com/in/stories/?category=intel-arc

reference 2 — Gaming APIs
  • OpenGL — Cross-platform GAPI. Use if uncertain.
  • Vulkan — OpenGL but low-level. High-performance, cross-platform.
  • MoltenVK — Enables Vulkan on macOS/iOS.
  • DirectX 11 — Microsoft's GAPI for Windows. Use if Windows-only.
  • DirectX 12 — Improved, lower-level for Windows. Use for more features.
  • WebGL — Web standard for 3D graphics. Use for web rendering.
  • WebGPU — Successor to WebGL. Better, broader support. Use for modern web graphics (some functions are not yet supported across browsers).
  • Metal — Apple's low-level GAPI. Use for macOS/iOS development.


reference 3 — history of GPUs


 
"Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment."
AMD blog on this — for those who want text version


& follow up demo paper/pdf here


extracts from blog & paper (as screenshots)

That's the boring paper for mesh nodes launch.

At HPG 2025 they went full on crackpot with GPU tree gen: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

AMD's solution to the VRAM problem & memory price issues?

 
The most impressive Work Graphs paper to date from HPG 2025:

GPUOpen download: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

HPG talk:

Bastian Kuth's YT demonstration:

Features:
  • 100% GPU driven SOTA procedural tree generation using GPU Work Graphs API and Mesh Nodes extension
  • Per frame changes based on 150 parameters: "Our model supports procedural displacement, seasonal changes, complex pruning, animation, culling, continuous LOD, and intuitive artistic control with real-time edits."
  • Based on 55kB of generation code vs 34.8GB for pregenerated assets (static)
  • Allocates a 1.5GB scratchpad buffer for work graph vs Execute Indirect's combinatorial explosion that makes this impossible on consumer grade HW
  • Rendered in real-time with 3.13ms total overhead (G-buffer + PCG) on RX 7900 XTX
  • Auto LOD adjusts geometry overhead to hit a certain FPS or ms target.
  • Nanite-like continuous LOD
  • Responds to wind in a very convincing way (the tree is rebuilt every single frame)
 
this is very old (from 6 years ago)

but I am curious: whatever happened to CPU-based ray tracing for DirectX 11 games?

World of Tanks Implements Ray Traced Shadows for ANY DX11 GPU​

Meant to post this yesterday, but I forgot. The video link below goes through how they implemented it, and it's an interesting watch. For anybody wondering why it's only shadows, it's because each tank has in excess of 50k polygons, and you can have up to 50 tanks on screen at any given time. Currently no hardware exists that can handle that kind of load with RT reflections/GI. Wargaming has always tried to make their engine run on even low-end hardware, and not require special hardware.

However, with how they implemented the RT shadows, with help from Intel, they can run on any GPU. I play the game a lot, so I downloaded their test client. With the game settings at Ultra and RT shadows at High, my system with a 4.5GHz 4690K and an RX 480 4GB ran it quite well at 1080p, though it did drop below 60fps a few times. As a note, this does not use any specialized RT hardware, even if the GPU has it. It uses a combination of CPU and GPU to handle the calculations, with only 1 ray per pixel and a custom dithering filter that they came up with, which is shown in the video.

Details: https://worldoftanks.com/en/news/general-news/ray-tracing/

Developer Video:
 
just came across a very good blog by Nvidia on work graphs from 2 years ago



Work graphs new functionality​

D3D12 already exposes functionality to aid in GPU-driven rendering, as mentioned previously. This section highlights the new functionality introduced by work graphs, compared to existing functionality.

Dynamic shader selection​

Each node in the work graph can choose which of its children to run. The decision is driven by the producer’s shader code itself. This enables decisions to be determined by information generated by the GPU in a previous node or workload.

On the other hand, ExecuteIndirect is confined to work under the state it was launched with, most notably the shader specified by the pipeline state object. An application that needs to launch different shaders depending on GPU-side data has no choice but to issue a series of SetPipelineState and ExecuteIndirect calls, or rely on inefficient uber shaders to cover only some of the potential possibilities.
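To make the contrast concrete, here is a toy Python sketch (not D3D12 code; all names are invented) of a producer node choosing among child shaders based on GPU-generated data, which ExecuteIndirect cannot do without re-recording state:

```python
# Toy model of GPU-driven shader selection (illustrative only, not the D3D12 API).
# With ExecuteIndirect, the shader is fixed when the command is recorded;
# with work graphs, a producer node inspects GPU-generated data and picks a child.

def shade_opaque(item):
    return f"opaque:{item}"

def shade_alpha(item):
    return f"alpha:{item}"

def shade_emissive(item):
    return f"emissive:{item}"

# The "graph edges": the producer chooses a child node per record.
CHILDREN = {0: shade_opaque, 1: shade_alpha, 2: shade_emissive}

def producer_node(gpu_generated_items):
    # Each item carries a material class computed by an earlier node;
    # the producer routes each record to the matching child shader.
    return [CHILDREN[cls](item) for cls, item in gpu_generated_items]

results = producer_node([(0, "wall"), (2, "lamp"), (1, "leaf")])
# → ['opaque:wall', 'emissive:lamp', 'alpha:leaf']
```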

Implicit micro-dependency model​

Rendering a frame involves executing several major passes, such as depth, geometry, or lighting passes. Within each pass, data is processed in parallel, where each unit of data goes through several sequential operations. Resource barriers are usually placed between the operations to ensure data processing is completed by the previous operation before moving to the next.

A work graph expresses this dependency implicitly by producer nodes passing records to children nodes. Children node shaders will only run when the producer has completed writing the record, implying that the data is fully ready for consumption by the child. Note that the scope of work graph producer-consumer dependencies are on the data record scope, whereas a resource barrier operates on all accesses to a resource.

The work graph dependency model is fine-grained compared to barriers. This can translate to better occupancy on the GPU, as dependent work can launch earlier instead of waiting for a barrier to finish. Records can immediately pass from the producer to the consumer node and need not be fully flushed across algorithm steps as is the case for Dispatch-ResourceBarrier sequences.

Figure 2 illustrates how the workloads are executed in each case. On the left, two Dispatch calls separated by a ResourceBarrier. Each row represents a producer thread-group (green) and its consumer thread-group (blue). On the right, the same workloads run with a work graph.

Figure 2. A comparison of workload execution: on the left, two Dispatch calls separated by a ResourceBarrier (a vertical line between two columns of thread-group blocks); on the right, the same workloads run as a work graph, with the blocks packed tightly next to each other.

Work graphs overview​

Shader Model 6.8 for D3D12, among many other features, marks the official release of work graphs. The term ‘graph’ in the name holds up well to its definition: a collection of nodes connected by edges. In work graphs, nodes perform tasks (“work”) and pass data to other nodes across the graph edges.

But what is this work that a node executes? Is it a command such as a Dispatch call? A single thread running a certain shader? Or perhaps a group of threads running the same shader?

The answer is, all of the above. Each node has a shader that is launched in a certain configuration of the programmer’s choice. This configuration, or launch mode, can be a full dispatch grid (broadcast launch) or compute threads run either independently of each other (thread launch) or potentially collectively (coalescing launch). Note that Thread Launch work can be gathered to run in a wave where possible, but each thread will still have its inputs independent of other threads.

A connection to another node is realized by choosing the target node and passing data to it. This resembles what is typically known as continuation in graph terminology. The target node receives the data and runs outside its caller’s context. There is no call stack in this system, just data cascading from the top to the bottom of the graph.

Units of data, called records, drive the entire execution of the work graph. To launch a node, a record must be written for it. The node’s shader is then launched in the chosen launch mode, and consumes that record as input. The record is a packed structure of data filled by the producer. The producer could be the CPU’s command DispatchGraph, or any node in the work graph. A node consuming the record could be thought of as a child of the producer node.
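The launch modes described above can be sketched as a toy model (pure Python, with illustrative numbers; this is not the D3D12 API):

```python
# Sketch of how one input record launches different thread counts depending on
# the node's launch mode (terminology from the blog; numbers are made up).

def launch(mode, record, group_size=4):
    if mode == "broadcast":
        # One record launches a full dispatch grid of thread groups.
        return record["grid"] * group_size   # total threads
    if mode == "thread":
        # One record is consumed by exactly one thread.
        return 1
    if mode == "coalescing":
        # Up to group_size records may share one thread group;
        # a single record still launches one group.
        return group_size
    raise ValueError(mode)

print(launch("broadcast", {"grid": 8}))   # 32: a grid of 8 groups * 4 threads
print(launch("thread", {}))               # 1
print(launch("coalescing", {}))           # 4
```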

Figure 1. A work graph with a root node producing records to three children. Each child specifies a different launch mode (broadcast, thread, or coalescing), resulting in a different total number of threads launched per single input record.
 
Work graphs enable the GPU to assign work to itself, significantly improving rendering efficiency in areas that previously relied on the CPU to do the heavy lifting. Work graphs are so flexible that 3D engines can run almost entirely on the GPU when used appropriately.

Mesh nodes are an extension of work graphs that allow the GPU to issue draw calls to itself. Again, this shifts work from the CPU to the GPU.

 
Speaking on the NeoGAF forums, known AMD leaker Kepler L2 commented on the possibility of next-generation consoles having access to the DirectX 12 Work Graphs feature, which delivers better performance and improved scalability across complex workloads including procedural generation, particle systems and AI logic. While the hardware and API will be ready, he said, widespread adoption is unlikely, as the engines and tools currently in use don't support this style of graphics programming.

As such, it will be a while before we see whether the feature brings significant performance improvements on consoles that go well beyond what we have already seen on PC shortly after its release.

 
It's up to developers to build Work Graphs into games, so this DirectX update isn't going to boost performance across all your games instantly. You'll also need a relatively modern GPU to take advantage of Work Graphs. Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.

 
Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.
this is the minimum hardware requirement for work graphs

on top of this, the game API/engine needs to be designed for it

if both of the above are met, then on hardware such as RDNA 5 the user should see significant benefit, I guess
 
Speaking on the NeoGAF forums, known AMD leaker Kepler L2 commented on the possibility of next-generation consoles having access to the DirectX 12 Work Graphs feature, which delivers better performance and improved scalability across complex workloads including procedural generation, particle systems and AI logic. While the hardware and API will be ready, he said, widespread adoption is unlikely, as the engines and tools currently in use don't support this style of graphics programming.

As such, it will be a while before we see whether the feature brings significant performance improvements on consoles that go well beyond what we have already seen on PC shortly after its release.

Can we please keep this limited to confirmed official information and SW only? Please no nextgen stuff or console leaks.

It's up to developers to build Work Graphs into games, so this DirectX update isn't going to boost performance across all your games instantly. You'll also need a relatively modern GPU to take advantage of Work Graphs. Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.

MS has a better official blog from GDC 2024. So does AMD. Gaming tech media provides very little and superficial insight.
 
found this article on DirectX Raytracing 1.2

DirectX Raytracing 1.2 (DXR) & Neural Rendering Announced at GDC, Support Across AMD, Intel & NVIDIA Hardware​

At GDC, Microsoft is unveiling some big updates to its software suite for Windows, which come in the form of DirectX Raytracing 1.2, PIX, Neural Rendering, and more


Starting with DXR 1.2 (DirectX Raytracing), Microsoft is introducing two new technologies, OMM (opacity micromaps) and SER (shader execution reordering), which will help deliver large performance boosts.
  • Opacity micromaps significantly optimize alpha-tested geometry, delivering up to 2.3x performance improvement in path-traced games. By efficiently managing opacity data, OMM reduces shader invocations and greatly enhances rendering efficiency without compromising visual quality.
  • Shader execution reordering offers a major leap forward in rendering performance — up to 2x faster in some scenarios — by intelligently grouping shader execution to enhance GPU efficiency, reduce divergence, and boost frame rates, making raytraced titles smoother and more immersive than ever. This feature paves the way for more path-traced games in the future.
DXR 1.2 aims to deliver up to 40% performance improvements in complex scenes, as demoed in Remedy's Alan Wake 2 at GDC, while delivering substantially detailed raytraced visuals.
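The idea behind shader execution reordering can be sketched on the CPU (this is an illustrative toy, not the DXR API): before shading, hits are grouped by the shader they will invoke so that neighbouring lanes run the same code and divergence drops.

```python
# Toy CPU model of shader execution reordering: sort ray hits by a coherence
# key (here, the shader id they will invoke) so execution groups are uniform.
# Keys and structures are illustrative, not DXR types.

def reorder_hits(hits):
    # hits: list of (shader_id, payload); a real GPU reorders within a scope
    # of threads, here we simply sort the whole batch by the coherence key.
    return sorted(hits, key=lambda h: h[0])

hits = [(2, "ray0"), (0, "ray1"), (2, "ray2"), (1, "ray3"), (0, "ray4")]
print(reorder_hits(hits))
# shader 0 hits come first, then 1, then 2: coherent groups for execution
```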


Another major aspect that is being introduced by Microsoft is support for cooperative vectors, which is a brand-new programming feature coming to Shader Model 6.9, very soon. With these cooperative vectors, developers can leverage new hardware acceleration engines for vector and matrix operations, allowing them to integrate neural rendering techniques directly within the graphics pipeline.
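A cooperative vector is essentially a hardware-accelerated vector-matrix multiply available inside a shader; a pure-Python stand-in for the small MLP layer it enables (weights, sizes, and activation are made up for illustration) looks like this:

```python
# Pure-Python stand-in for the vector * matrix primitive that cooperative
# vectors accelerate in hardware, used to build one tiny MLP layer.
# All weights and sizes here are illustrative.

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def relu(v):
    return [max(0.0, x) for x in v]

def mlp_layer(weights, bias, v):
    # One neural-rendering building block: matvec, add bias, nonlinearity.
    return relu([a + b for a, b in zip(matvec(weights, v), bias)])

W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -0.25]
print(mlp_layer(W, b, [2.0, 1.0]))   # [1.0, 1.25]
```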

All major hardware vendors, including NVIDIA, AMD, & Intel, are on board the Cooperative Vectors and Neural Rendering train:

 

Neural Two-Level Monte Carlo Real-Time Rendering​



RTRT neural rendering paper. A significant improvement over NVIDIA's 2021 Neural Radiance Caching paper.

As the comparison image (NIRC.png) shows, it is a massive improvement.

Edit: this video illustrates it better:

@basix and @soresu, how long do you think until we'll see games using this? Stupid question; this is not something games will use without big modifications. I finished the paper and there are still multiple issues with fine-detail geometry, and ReSTIR integration wasn't even explored.
 

Path Guiding​

If traditional Path Tracing is like a blind person navigating a massive, pitch-black maze by touch (random sampling), then Path Guiding equips that person with a light-sensing radar.
Here are the three implementation methods of Path Guiding that will be utilized in future game development:

Voxel Path Guiding (Voxel PG)​

Voxel PG leverages a discretized spatial structure (Voxels) to store light information. It treats the 3D scene as a grid where each "cell" acts as a local cache for the light field.

Core Mechanism​

  • Voxelization: The scene is subdivided into a 3D grid or a Sparse Voxel Octree (SVO).
  • Radiance Injection: During the rendering process (or a pre-pass), whenever a path finds light, that energy is "injected" into the voxel containing that point.
  • Directional Encoding: To remember where the light came from within a voxel, developers typically use Spherical Harmonics (SH) or Hemispherical Voxel Faces.
  • Guided Sampling: When a new ray hits a surface inside a voxel, the renderer queries the voxel’s stored distribution. Instead of picking a random direction, the ray is "guided" toward the neighboring voxels with the highest radiance.
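The guided-sampling step can be sketched in a few lines of Python; the bin layout and energy values are illustrative, not from any engine:

```python
import random

# Minimal voxel path-guiding sketch: each voxel stores radiance per coarse
# direction bin; sampling draws a bin proportionally to stored energy.

def guided_sample(voxel_bins, rng):
    total = sum(voxel_bins.values())
    if total == 0.0:
        return rng.choice(list(voxel_bins))   # no info yet: uniform fallback
    r = rng.random() * total
    for direction, energy in voxel_bins.items():
        r -= energy
        if r <= 0.0:
            return direction
    return direction

# A voxel that has mostly seen light arriving from +Y:
bins = {"+X": 0.1, "-X": 0.1, "+Y": 5.0, "-Y": 0.1, "+Z": 0.2, "-Z": 0.1}
rng = random.Random(0)
samples = [guided_sample(bins, rng) for _ in range(1000)]
print(samples.count("+Y") / 1000)   # most rays are guided toward +Y
```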

Key Characteristics​

  • Performance: Extremely fast. It relies on GPU texture lookups and hardware-accelerated grid traversal.
  • Artifacts: Can suffer from blockiness (aliasing) because the light field is stepped. It often requires heavy interpolation or blurring to look smooth.
  • Best For: Real-time applications like UE5 Lumen or game engines where speed is prioritized over mathematical perfection.

Neural Path Guiding (Neural PG)​

Neural PG replaces physical data structures (like trees or grids) with a Continuous Function represented by a Neural Network (usually a lightweight MLP).

Core Mechanism​

  • The Implicit Light Field: A neural network learns a mapping: $f(\text{position, direction}) \to \text{Radiance}$.
  • Normalizing Flows: Many state-of-the-art Neural PG methods use Normalizing Flows. This technique starts with a simple probability distribution (like a uniform sphere) and "warps" it through neural layers into a complex shape that matches the actual light distribution in the scene.
  • Online Reinforcement Learning: As the renderer runs, the network is constantly trained on the fly. The "successful" paths (those that hit light) act as training data to update the network weights.
  • Product Importance Sampling: Unlike Voxels, Neural PG can effectively compute the product of the BSDF (material) and the Incident Radiance. This ensures rays are sent in directions that are both "bright" and "reflective."
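The product-sampling bullet can be illustrated with discrete toy distributions standing in for the learned continuous ones:

```python
# Product importance sampling, as in the last bullet: sample proportionally
# to BSDF * incident radiance rather than either alone. Discrete toy
# distributions stand in for the learned continuous functions.

bsdf     = {"up": 0.8, "side": 0.2, "down": 0.0}   # glossy lobe
radiance = {"up": 0.1, "side": 5.0, "down": 0.1}   # bright light to the side

def product_pdf(bsdf, radiance):
    unnorm = {d: bsdf[d] * radiance[d] for d in bsdf}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}

pdf = product_pdf(bsdf, radiance)
# "side" wins: it is both reflective enough and bright, while "down" gets
# zero probability because the material reflects nothing that way.
best = max(pdf, key=pdf.get)
print(best, round(pdf[best], 3))
```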

Key Characteristics​

  • Accuracy: Produces extremely high-quality, continuous sampling without grid artifacts.
  • Memory Efficiency: A neural network with a few thousand parameters can represent a complex light field that would otherwise require gigabytes of voxel data.
  • Overhead: Requires Tensor Core computation. The "inference" (asking the AI where to go) is slower than a voxel lookup but significantly more accurate.

ReSTIR PG (Reservoir-based Spatiotemporal Importance Resampling PG)

ReSTIR PG represents a significant leap in how we handle global illumination. While traditional Path Guiding attempts to learn the "map" of light in a scene, ReSTIR PG turns that map into a living, breathing conversation between pixels.

Core Mechanism

The magic of ReSTIR PG lies in treating path guiding as a resampling problem rather than a density estimation problem. Instead of building massive, static data structures to record where light comes from, it uses "Reservoirs" to pass around the best light directions it finds.

The Reservoir (The "Briefcase")

At the heart of the algorithm is the Reservoir. Think of it as a small briefcase that each pixel carries. Inside this briefcase, the pixel stores:

  • $y$: A promising incoming light direction ($\omega_i$).
  • $w_{sum}$: The sum of weights of all candidate directions seen so far.
  • $M$: The total number of candidates processed.
  • $W$: The final weight used for unbiased contribution.
The Three-Stage Pipeline

ReSTIR PG typically follows these steps to find the "best" direction to shoot a ray:
  1. Initial Sampling: Each pixel samples a few candidate directions using a basic distribution (like a simple BRDF or a low-resolution guiding structure).
  2. Temporal Reuse: The pixel looks at its own briefcase from the previous frame. If that direction was high-quality (high radiance), it merges that old information into its current reservoir.
  3. Spatial Reuse: The pixel "talks" to its neighbors. It looks at the reservoirs of nearby pixels and merges their best directions into its own. Since light hits neighboring pixels similarly, this "gossip" allows a single lucky find (hitting a tiny, bright light source) to spread to the whole screen instantly.
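The reservoir and the reuse steps above boil down to weighted reservoir sampling. Here is a minimal Python sketch of the textbook RIS reservoir (not any specific paper's code; names and weights are illustrative):

```python
import random

# Weighted reservoir sampling as used by ReSTIR-style resampling:
# update() streams candidates; merge() is the temporal/spatial reuse step.

class Reservoir:
    def __init__(self, rng):
        self.y = None        # chosen sample (e.g. a light direction)
        self.w_sum = 0.0     # sum of candidate weights
        self.M = 0           # number of candidates seen
        self.rng = rng

    def update(self, candidate, weight):
        self.w_sum += weight
        self.M += 1
        if weight > 0 and self.rng.random() < weight / self.w_sum:
            self.y = candidate

    def merge(self, other):
        # Treat the other reservoir's pick as one candidate carrying its
        # whole accumulated weight, then account for its candidate count.
        M_before = self.M
        self.update(other.y, other.w_sum)
        self.M = M_before + other.M

rng = random.Random(1)
pixel = Reservoir(rng)
for d, w in [("dir_a", 0.1), ("dir_b", 2.0), ("dir_c", 0.3)]:
    pixel.update(d, w)

neighbor = Reservoir(rng)
neighbor.update("dir_lucky", 50.0)   # neighbour found a bright light
pixel.merge(neighbor)                # spatial reuse: very likely adopts it
print(pixel.y, pixel.M)
```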
Shift Mapping (The Translation)

When a pixel borrows a direction from a neighbor, the geometry isn't exactly the same. ReSTIR PG uses Shift Mapping to mathematically adjust the direction so it "makes sense" for the new starting point. This ensures that the borrowed light doesn't create artifacts or "leak" through walls.

Key Characteristics

1. Extraordinary Convergence Speed

Because of the spatiotemporal reuse, ReSTIR PG "learns" the lighting environment almost instantly. In scenes with complex indirect lighting (e.g., a room lit only by a door slightly ajar), standard path tracers might take thousands of samples to find the light. ReSTIR PG finds it in a handful of samples because it shares the "discovery" across the entire image.​

2. Memory Efficiency

Traditional path guiding often requires SD-Trees or high-resolution V-forests—complex data structures that eat up VRAM. ReSTIR PG is "lightweight." It primarily stores information per-pixel in a flat buffer. This makes it much more friendly for modern GPU architectures where memory bandwidth and capacity are precious.​
3. Robustness to Dynamic Scenes
Many guiding methods require a "training" phase or long-term accumulation that breaks when a light moves or an object is destroyed. ReSTIR PG is inherently reactive. Because it re-evaluates reservoirs every frame, it can adapt to a light turning on or a mirror moving in near real-time.​

4. Mathematical Unbiasedness (mostly)

By using Weighted Importance Sampling (WIS), ReSTIR PG can remain unbiased or "consistently biased" (converging to the correct answer as samples increase). The weight calculation follows:​
$$W = \frac{1}{\hat{p}(y)} \cdot \frac{\sum w_i}{M}$$
This formula ensures that even though we are "cherry-picking" the best directions, the final image remains physically accurate.​
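A quick numeric check of that weight (a toy discrete setup, not renderer code): even though samples are cherry-picked toward large $f$, the estimate converges to the true total.

```python
import random

# Numeric check of W = (1/p_hat(y)) * (w_sum / M): estimating a known sum
# with resampled ("cherry-picked") candidates still gives the right answer.

f = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}      # integrand; true total is 10
true_total = sum(f.values())

def ris_estimate(rng, M=8):
    # Candidates from a uniform proposal p = 1/4; target p_hat proportional to f.
    xs = [rng.randrange(4) for _ in range(M)]
    w = [f[x] / 0.25 for x in xs]          # w_i = p_hat(x_i) / p(x_i)
    w_sum = sum(w)
    # Pick y proportionally to w_i (what the reservoir does in streaming form).
    y = rng.choices(xs, weights=w)[0]
    W = (1.0 / f[y]) * (w_sum / M)
    return f[y] * W

rng = random.Random(7)
avg = sum(ris_estimate(rng) for _ in range(20000)) / 20000
print(round(avg, 2))   # close to the true total of 10
```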
 

DirectX Raytracing 1.2 (DXR) & Neural Rendering Announced at GDC, Support Across AMD, Intel & NVIDIA Hardware​

Another major aspect that is being introduced by Microsoft is support for cooperative vectors, which is a brand-new programming feature coming to Shader Model 6.9, very soon. With these cooperative vectors, developers can leverage new hardware acceleration engines for vector and matrix operations, allowing them to integrate neural rendering techniques directly within the graphics pipeline.
Cooperative vectors have been postponed to a later release. Probably SM 6.10 instead of SM 6.9 (Q1/2026): https://devblogs.microsoft.com/directx/shader-model-6-9-and-the-future-of-cooperative-vector/

Reason:
Cooperative vectors were initially designed to execute "vector - matrix" operations. The updated cooperative vectors API will also support "matrix - matrix" operations to achieve much better peak throughput.
 

 

RTX Mega Geometry (RTXMG)​

RTX Mega Geometry (RTXMG) is a revolutionary geometry processing technology officially launched by NVIDIA in 2025. Its core objective is to shatter the geometry complexity bottleneck in real-time ray tracing, enabling games and rendering applications to trace hundreds of millions or even billions of triangles in real-time without causing VRAM overflows or massive frame rate drops.

Why Do We Need RTX Mega Geometry?

In traditional ray tracing, for a light ray to know what it has hit, the GPU must build an Acceleration Structure (BVH).

  • The Pain Point: When a scene contains tens of millions of triangles (such as virtualized Nanite models) or complex animations (like a flowing cape), building and updating the BVH consumes massive CPU power, occupies huge amounts of VRAM, and leads to poor rendering efficiency.
  • The Reality: Many games, even with ray tracing enabled, actually use "Proxy Meshes" for RT calculations. This often results in shadows and reflections that don't perfectly align with the high-detail visuals.
Core Technical Principles

RTX Mega Geometry redefines the rendering pipeline through three primary pillars:

Clustered Acceleration Structure (CLAS)

RTXMG stops processing every tiny triangle individually and instead divides the geometry into Clusters.

  • Intelligent Updates: Only clusters that undergo changes are rebuilt on the GPU, drastically reducing the CPU's scheduling overhead.
  • Hardware Acceleration: It leverages dedicated hardware units in the RTX 40/50 series to speed up BVH construction by 10x to 100x.
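The "only rebuild what changed" idea can be sketched as a dirty-flag scheme over clusters (all names are illustrative, not NVIDIA's API):

```python
# Sketch of clustered acceleration-structure updates: geometry is split into
# clusters, and per-frame BVH rebuild work is proportional to the number of
# dirty clusters rather than to scene size. Names are illustrative.

class Cluster:
    def __init__(self, name):
        self.name = name
        self.dirty = True      # needs an initial build

def frame_update(clusters, moved):
    for c in clusters:
        if c.name in moved:
            c.dirty = True     # animated/deformed this frame
    rebuilt = [c.name for c in clusters if c.dirty]
    for c in clusters:
        c.dirty = False
    return rebuilt             # the only clusters whose BVH we rebuild

scene = [Cluster(n) for n in ("terrain", "cape", "tree", "wall")]
print(frame_update(scene, moved=set()))      # first frame: build everything
print(frame_update(scene, moved={"cape"}))   # later: only the animated cape
```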
Subdivision & Micro-Mesh

It adopts a "tessellation" approach similar to the film industry. Developers only need to provide a lightweight "control mesh," and RTXMG generates micron-level detail in real-time based on viewing distance.

  • Infinite Detail: Theoretically achieves "one triangle per pixel" precision.
  • Memory Compression: Compared to storing massive high-poly meshes directly, RTXMG uses displacement maps and base meshes, saving up to 10x the VRAM space.
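A back-of-the-envelope version of the memory claim, with all numbers being illustrative assumptions rather than NVIDIA's figures:

```python
# Rough memory comparison: coarse base mesh + per-micro-vertex displacement
# vs storing the fully tessellated mesh. All constants are assumptions.

BYTES_PER_VERTEX = 32          # position + normal + UV, say
BYTES_PER_DISPLACEMENT = 1     # quantized scalar offset per micro-vertex

def full_mesh_bytes(micro_vertices):
    return micro_vertices * BYTES_PER_VERTEX

def micromesh_bytes(base_vertices, micro_vertices):
    return base_vertices * BYTES_PER_VERTEX + micro_vertices * BYTES_PER_DISPLACEMENT

micro = 10_000_000             # 10M micro-vertices after subdivision
base = 100_000                 # coarse control mesh
ratio = full_mesh_bytes(micro) / micromesh_bytes(base, micro)
print(round(ratio, 1))         # roughly 24x with these made-up numbers
```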
Integration with Virtualized Geometry (e.g., Nanite)

RTXMG solves the long-standing incompatibility between Unreal Engine 5's Nanite and ray tracing: rays can trace Nanite's original ultra-high-precision data directly, eliminating artifacts such as light leaking and broken shadows.

RTXMG was announced by NVIDIA last year alongside the launch of the RTX 50-series GPUs. However, due to hardware and software limitations, Alan Wake 2 is currently the only title to have implemented the technology. That said, as Unreal Engine 5 continues to evolve, "Mega Geometry for Nanite" is expected to be integrated into future UE5 iterations, meaning Intel Arc and AMD RDNA GPUs should eventually benefit from this technology as well. Furthermore, given the deep partnership between CDPR and NVIDIA, there is a high probability that The Witcher 4 will use this tech.
 

There's nothing preventing AMD or Intel from coming up with similar tech; all the 50 series adds is RTX MG compression (~30% smaller VRAM footprint, IIRC). NVIDIA keeping it an RTX exclusive is the usual GameWorks BS, because they support it on the 20-40 series despite no native HW support.

NVRTX branch of UE5 already has RTX MG support.

@Momoka_ Your second post contains a lot of info already found in the first one. Also please no LLM prompt dumps.
 
Your second post contains a lot of info already found in the first one. Also please no LLM prompt dumps.
Sorry, I sent it twice by mistake.
 
Real-Time Markov Chain Path Guiding for Global Illumination and Single Scattering!
Abstract:

We present a lightweight and unbiased path guiding algorithm tailored for real-time applications with highly dynamic content. The algorithm demonstrates effectiveness in guiding both direct and indirect illumination. Moreover, it can be extended to guide single scattering events in participating media. Building upon the screen-space approach by Dittebrandt et al. [2023], the incident light distribution is represented as a von Mises-Fisher mixture model, which is controlled by a Markov chain process. To extend the procedure to world space, our algorithm uses a unique Markov chain architecture, which resamples Markov chain states from an ensemble of hash grids. We combine multi-resolution adaptive grids with a static grid, ensuring rapid state exchange without compromising guiding quality. The algorithm imposes minimal prerequisites on scene representation and seamlessly integrates into existing path tracing frameworks. Through continuous multiple importance sampling, it remains independent of the equilibrium distribution of Markov chain and hash grid resampling. We perform an evaluation of the proposed methods across diverse scenarios. Additionally, we explore the algorithm’s viability in offline scenarios, showcasing its effectiveness in rendering volumetric caustics. We demonstrate the application of the proposed methods in a path tracing engine for the original Quake game. The demo project features path traced global illumination and single scattering effects at frame rates over 30 FPS on NVIDIA’s GeForce 20 series or AMD’s Radeon RX 6000 series without upscaling.
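For readers unfamiliar with the distribution the abstract leans on: a von Mises-Fisher lobe is the spherical analogue of a Gaussian, and the guiding distribution is a weighted mixture of such lobes. This is the standard textbook form, not a formula taken from the paper itself:

```latex
% Density of one von Mises-Fisher lobe on the unit sphere S^2,
% with mean direction \mu (|\mu| = 1) and concentration \kappa > 0:
f(\omega; \mu, \kappa) = \frac{\kappa}{4\pi \sinh\kappa}\, e^{\kappa\, \mu \cdot \omega}

% A K-lobe mixture with weights \pi_k, \sum_k \pi_k = 1:
p(\omega) = \sum_{k=1}^{K} \pi_k \, f(\omega; \mu_k, \kappa_k)
```

Larger \kappa concentrates the lobe around \mu, which is how the mixture can sharply favor directions toward strong incident light.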

It's actually referenced in the ReSTIR PG paper mentioned above.

It can do single scatter volume rendering insanely fast, but it's not so hot with handling lots of light sources apparently.
 