
Discussion of exciting new features, research & advancements in gaming (graphics & adjacent software)

marees

Platinum Member
thread to discuss all new features of gaming GPU architectures & graphics APIs

the idea behind this thread is to discuss "real-time graphics, GPGPU, ML and adjacent (physics, audio, etc.) software research"

(suggestion: if you are posting videos, please also post text links)

separate thread below for hardware features

reference 1 — GPU blogs
  1. nvidia GeForce — https://blogs.nvidia.com/blog/category/gaming/
  2. amd GPUOpen — https://gpuopen.com/
  3. intel Arc — https://game.intel.com/in/stories/?category=intel-arc

reference 2 — Gaming APIs
  • OpenGL — Cross-platform GAPI. Use if uncertain.
  • Vulkan — OpenGL but low-level. High-performance, cross-platform.
  • MoltenVK — Enables Vulkan on macOS/iOS.
  • DirectX 11 — Microsoft's GAPI for Windows. Use if Windows-only.
  • DirectX 12 — Improved, lower-level for Windows. Use for more features.
  • WebGL — Web standard for 3D graphics. Use for web rendering.
  • WebGPU — Successor to WebGL. Better, broader support. Use for modern web graphics (some functions are not yet supported across browsers).
  • Metal — Apple's low-level GAPI. Use for macOS/iOS development.


reference 3 — history of GPUs


 
"Work graphs completely changes how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or Programmable shader moment."
AMD blog on this — for those who want text version


& follow up demo paper/pdf here


extracts from blog & paper (as screenshots)

That's the boring paper for mesh nodes launch.

At HPG 2025 they went full on crackpot with GPU tree gen: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

AMD's solution to the VRAM problem & memory price issues?

 
The most impressive Work Graphs paper to date from HPG 2025:

GPUOpen download: https://gpuopen.com/download/Real-Time_GPU_Tree_Generation.pdf

HPG talk:

Bastian Kuth's YT demonstration:

Features:
  • 100% GPU driven SOTA procedural tree generation using GPU Work Graphs API and Mesh Nodes extension
  • Per frame changes based on 150 parameters: "Our model supports procedural displacement, seasonal changes, complex pruning, animation, culling, continuous LOD, and intuitive artistic control with real-time edits."
  • Based on 55kB of generation code vs 34.8GB for pregenerated assets (static)
  • Allocates a 1.5GB scratchpad buffer for work graph vs Execute Indirect's combinatorial explosion that makes this impossible on consumer grade HW
  • Rendered in real-time with 3.13ms total overhead (G-buffer + PCG) on RX 7900 XTX
  • Auto LOD adjusts geometry overhead to hit a certain FPS or ms target.
  • Nanite-like continuous LOD
  • Responds to wind in a very convincing way (the tree is rebuilt every single frame)
 
this is very old (from 6 years ago)

but I am curious: whatever happened to CPU-based ray tracing for DirectX 11 games?

World of Tanks Implements Ray Traced Shadows for ANY DX11 GPU​

Meant to post this yesterday, but I forgot. The video link below goes through how they implemented it, and it's an interesting watch. For anybody wondering why it's only shadows, it's because each tank has in excess of 50k polygons, and you can have up to 50 tanks on screen at any given time. Currently no hardware exists that can handle that kind of load with RT reflections/GI. Wargaming has always tried to make their engine run on even low-end hardware, and not require special hardware.

However, with how they implemented the RT shadows, with help from Intel, they can run on any GPU. I play the game a lot, so I downloaded their test client. With the game settings at Ultra and RT shadows at High, my system with a 4.5GHz 4690K and an RX 480 4GB ran it quite well at 1080p, though it did drop below 60fps a few times. As a note, this does not use any specialized RT hardware, even if the GPU has it. It uses a combination of CPU and GPU to handle the calculations, with only 1 ray per pixel and a custom dithering filter that they came up with, which is shown in the video.

Details: https://worldoftanks.com/en/news/general-news/ray-tracing/

Developer Video:
 
just came across a very good blog by Nvidia on work graphs from 2 years ago



Work graphs new functionality​

D3D12 already exposes functionality to aid in GPU-driven rendering, as mentioned previously. This section highlights the new functionality introduced by work graphs, compared to existing functionality.

Dynamic shader selection​

Each node in the work graph can choose which of its children to run. The decision is driven by the producer’s shader code itself. This enables decisions to be determined by information generated by the GPU in a previous node or workload.

On the other hand, ExecuteIndirect is confined to work under the state it was launched with, most notably the shader specified by the pipeline state object. An application that needs to launch different shaders depending on GPU-side data has no choice but to issue a series of SetPipelineState and ExecuteIndirect calls, or rely on inefficient uber shaders to cover only some of the potential possibilities.
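To make the contrast concrete, here is a toy Python sketch (not D3D12 code; all names are invented) of a producer node choosing among child shaders based on GPU-generated data, which ExecuteIndirect cannot do without re-recording state:

```python
# Toy model of GPU-driven shader selection (illustrative only, not the D3D12 API).
# With ExecuteIndirect, the shader is fixed when the command is recorded;
# with work graphs, a producer node inspects GPU-generated data and picks a child.

def shade_opaque(item):
    return f"opaque:{item}"

def shade_alpha(item):
    return f"alpha:{item}"

def shade_emissive(item):
    return f"emissive:{item}"

# The "graph edges": the producer chooses a child node per record.
CHILDREN = {0: shade_opaque, 1: shade_alpha, 2: shade_emissive}

def producer_node(gpu_generated_items):
    # Each item carries a material class computed by an earlier node;
    # the producer routes each record to the matching child shader.
    return [CHILDREN[cls](item) for cls, item in gpu_generated_items]

results = producer_node([(0, "wall"), (2, "lamp"), (1, "leaf")])
# → ['opaque:wall', 'emissive:lamp', 'alpha:leaf']
```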

Implicit micro-dependency model​

Rendering a frame involves executing several major passes, such as depth, geometry, or lighting passes. Within each pass, data is processed in parallel, where each unit of data goes through several sequential operations. Resource barriers are usually placed between the operations to ensure data processing is completed by the previous operation before moving to the next.

A work graph expresses this dependency implicitly by producer nodes passing records to children nodes. Children node shaders will only run when the producer has completed writing the record, implying that the data is fully ready for consumption by the child. Note that the scope of work graph producer-consumer dependencies are on the data record scope, whereas a resource barrier operates on all accesses to a resource.

The work graph dependency model is fine-grained compared to barriers. This can translate to better occupancy on the GPU, as dependent work can launch earlier instead of waiting for a barrier to finish. Records can immediately pass from the producer to the consumer node and need not be fully flushed across algorithm steps as is the case for Dispatch-ResourceBarrier sequences.

Figure 2 illustrates how the workloads are executed in each case. On the left, two Dispatch calls separated by a ResourceBarrier. Each row represents a producer thread-group (green) and its consumer thread-group (blue). On the right, the same workloads run with a work graph.

Figure 2. A comparison of workload execution: on the left, two Dispatch calls separated by a ResourceBarrier (a vertical line between two columns of thread-group blocks); on the right, the same workloads run as a work graph, with the blocks packed tightly next to each other.

Work graphs overview​

Shader Model 6.8 for D3D12, among many other features, marks the official release of work graphs. The term ‘graph’ in the name holds up well to its definition: a collection of nodes connected by edges. In work graphs, nodes perform tasks (“work”) and pass data to other nodes across the graph edges.

But what is this work that a node executes? Is it a command such as a Dispatch call? A single thread running a certain shader? Or perhaps a group of threads running the same shader?

The answer is, all of the above. Each node has a shader that is launched in a certain configuration of the programmer’s choice. This configuration, or launch mode, can be a full dispatch grid (broadcast launch) or compute threads run either independently of each other (thread launch) or potentially collectively (coalescing launch). Note that Thread Launch work can be gathered to run in a wave where possible, but each thread will still have its inputs independent of other threads.

A connection to another node is realized by choosing the target node and passing data to it. This resembles what is typically known as continuation in graph terminology. The target node receives the data and runs outside its caller’s context. There is no call stack in this system, just data cascading from the top to the bottom of the graph.

Units of data, called records, drive the entire execution of the work graph. To launch a node, a record must be written for it. The node’s shader is then launched in the chosen launch mode, and consumes that record as input. The record is a packed structure of data filled by the producer. The producer could be the CPU’s command DispatchGraph, or any node in the work graph. A node consuming the record could be thought of as a child of the producer node.
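The launch modes described above can be sketched as a toy model (pure Python, with illustrative numbers; this is not the D3D12 API):

```python
# Sketch of how one input record launches different thread counts depending on
# the node's launch mode (terminology from the blog; numbers are made up).

def launch(mode, record, group_size=4):
    if mode == "broadcast":
        # One record launches a full dispatch grid of thread groups.
        return record["grid"] * group_size   # total threads
    if mode == "thread":
        # One record is consumed by exactly one thread.
        return 1
    if mode == "coalescing":
        # Up to group_size records may share one thread group;
        # a single record still launches one group.
        return group_size
    raise ValueError(mode)

print(launch("broadcast", {"grid": 8}))   # 32: a grid of 8 groups * 4 threads
print(launch("thread", {}))               # 1
print(launch("coalescing", {}))           # 4
```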

Figure 1. A work graph with a root node producing records to three children. Each child specifies a different launch mode (broadcast, thread, or coalescing), resulting in a different total number of threads launched per single input record.
 
Work graphs enable the GPU to assign work to itself, significantly improving rendering efficiency in areas that previously relied on the CPU to do the heavy lifting. Work graphs are so flexible that 3D engines can run almost entirely on the GPU when used appropriately.

Mesh nodes are an extension of work graphs that allow the GPU to issue draw calls to itself. Again, this shifts work from the CPU to the GPU.

 
Speaking on the NeoGAF forums, known AMD leaker Kepler L2 commented on the possibility of next-generation consoles having access to the DirectX 12 Work Graphs feature, which delivers better performance and improved scalability across complex workloads including procedural generation, particle systems and AI logic. While the hardware and API will be ready, he said, widespread adoption is unlikely, as the engines and tools currently in use don't support this style of graphics programming.

As such, it will be a while before we see whether the feature brings significant performance improvements on consoles that go well beyond what we have already seen on PC shortly after its release.

 
It's up to developers to build Work Graphs into games, so this DirectX update isn't going to boost performance across all your games instantly. You'll also need a relatively modern GPU to take advantage of Work Graphs. Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.

 
Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.
this is the minimum hardware requirement for work graphs

on top of this, the game API/engine needs to be designed for it

if both of the above are met, then on hardware such as RDNA 5 the user should see significant benefit, I guess
 
Speaking on the NeoGAF forums, known AMD leaker Kepler L2 commented on the possibility of next-generation consoles having access to the DirectX 12 Work Graphs feature, which delivers better performance and improved scalability across complex workloads including procedural generation, particle systems and AI logic. While the hardware and API will be ready, he said, widespread adoption is unlikely, as the engines and tools currently in use don't support this style of graphics programming.

As such, it will be a while before we see whether the feature brings significant performance improvements on consoles that go well beyond what we have already seen on PC shortly after its release.

Can we please keep this limited to confirmed official information and SW only? Please no nextgen stuff or console leaks.

It's up to developers to build Work Graphs into games, so this DirectX update isn't going to boost performance across all your games instantly. You'll also need a relatively modern GPU to take advantage of Work Graphs. Nvidia says this feature works on Ampere (RTX 30) and Ada Lovelace-based (RTX 40) video cards. On the AMD side, support is limited to the Radeon RX 7000 series.

MS has a better official blog from GDC 2024. So does AMD. Gaming tech media provides very little and superficial insight.
 
found this article on DirectX Raytracing 1.2

DirectX Raytracing 1.2 (DXR) & Neural Rendering Announced at GDC, Support Across AMD, Intel & NVIDIA Hardware​

At GDC, Microsoft is unveiling some big updates to its software suite for Windows, which come in the form of DirectX Raytracing 1.2, PIX, Neural Rendering, and more


Starting with DXR 1.2 (DirectX Raytracing), Microsoft is introducing two new technologies, OMM (opacity micromaps) and SER (shader execution reordering), which will help deliver large performance boosts.
  • Opacity micromaps significantly optimize alpha-tested geometry, delivering up to 2.3x performance improvement in path-traced games. By efficiently managing opacity data, OMM reduces shader invocations and greatly enhances rendering efficiency without compromising visual quality.
  • Shader execution reordering offers a major leap forward in rendering performance — up to 2x faster in some scenarios — by intelligently grouping shader execution to enhance GPU efficiency, reduce divergence, and boost frame rates, making raytraced titles smoother and more immersive than ever. This feature paves the way for more path-traced games in the future.
DXR 1.2 aims to deliver up to 40% performance improvements in complex scenes, as demoed in Remedy's Alan Wake 2 at GDC, while delivering substantially detailed raytraced visuals.
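The idea behind shader execution reordering can be sketched on the CPU (this is an illustrative toy, not the DXR API): before shading, hits are grouped by the shader they will invoke so that neighbouring lanes run the same code and divergence drops.

```python
# Toy CPU model of shader execution reordering: sort ray hits by a coherence
# key (here, the shader id they will invoke) so execution groups are uniform.
# Keys and structures are illustrative, not DXR types.

def reorder_hits(hits):
    # hits: list of (shader_id, payload); a real GPU reorders within a scope
    # of threads, here we simply sort the whole batch by the coherence key.
    return sorted(hits, key=lambda h: h[0])

hits = [(2, "ray0"), (0, "ray1"), (2, "ray2"), (1, "ray3"), (0, "ray4")]
print(reorder_hits(hits))
# shader 0 hits come first, then 1, then 2: coherent groups for execution
```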


Another major aspect that is being introduced by Microsoft is support for cooperative vectors, which is a brand-new programming feature coming to Shader Model 6.9, very soon. With these cooperative vectors, developers can leverage new hardware acceleration engines for vector and matrix operations, allowing them to integrate neural rendering techniques directly within the graphics pipeline.
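A cooperative vector is essentially a hardware-accelerated vector-matrix multiply available inside a shader; a pure-Python stand-in for the small MLP layer it enables (weights, sizes, and activation are made up for illustration) looks like this:

```python
# Pure-Python stand-in for the vector * matrix primitive that cooperative
# vectors accelerate in hardware, used to build one tiny MLP layer.
# All weights and sizes here are illustrative.

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def relu(v):
    return [max(0.0, x) for x in v]

def mlp_layer(weights, bias, v):
    # One neural-rendering building block: matvec, add bias, nonlinearity.
    return relu([a + b for a, b in zip(matvec(weights, v), bias)])

W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -0.25]
print(mlp_layer(W, b, [2.0, 1.0]))   # [1.0, 1.25]
```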

All major hardware vendors, including NVIDIA, AMD, & Intel, are on board the Cooperative Vectors and Neural Rendering train:

 

Neural Two-Level Monte Carlo Real-Time Rendering​



RTRT neural rendering paper. A significant improvement over NVIDIA's 2021 Neural Radiance Caching paper.

As the comparison image (NIRC.png) shows, it is a massive improvement.

Edit: this video illustrates it better:

@basix and @soresu, how long do you think until we'll see games using this? Stupid question; this is not something games will use without big modifications. I finished the paper and there are still multiple issues with fine-detail geometry, and ReSTIR integration wasn't even explored.
 

Path Guiding​

If traditional Path Tracing is like a blind person navigating a massive, pitch-black maze by touch (random sampling), then Path Guiding equips that person with a light-sensing radar.
Here are the three implementation methods of Path Guiding that will be utilized in future game development:

Voxel Path Guiding (Voxel PG)​

Voxel PG leverages a discretized spatial structure (Voxels) to store light information. It treats the 3D scene as a grid where each "cell" acts as a local cache for the light field.

Core Mechanism​

  • Voxelization: The scene is subdivided into a 3D grid or a Sparse Voxel Octree (SVO).
  • Radiance Injection: During the rendering process (or a pre-pass), whenever a path finds light, that energy is "injected" into the voxel containing that point.
  • Directional Encoding: To remember where the light came from within a voxel, developers typically use Spherical Harmonics (SH) or Hemispherical Voxel Faces.
  • Guided Sampling: When a new ray hits a surface inside a voxel, the renderer queries the voxel’s stored distribution. Instead of picking a random direction, the ray is "guided" toward the neighboring voxels with the highest radiance.
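The guided-sampling step can be sketched in a few lines of Python; the bin layout and energy values are illustrative, not from any engine:

```python
import random

# Minimal voxel path-guiding sketch: each voxel stores radiance per coarse
# direction bin; sampling draws a bin proportionally to stored energy.

def guided_sample(voxel_bins, rng):
    total = sum(voxel_bins.values())
    if total == 0.0:
        return rng.choice(list(voxel_bins))   # no info yet: uniform fallback
    r = rng.random() * total
    for direction, energy in voxel_bins.items():
        r -= energy
        if r <= 0.0:
            return direction
    return direction

# A voxel that has mostly seen light arriving from +Y:
bins = {"+X": 0.1, "-X": 0.1, "+Y": 5.0, "-Y": 0.1, "+Z": 0.2, "-Z": 0.1}
rng = random.Random(0)
samples = [guided_sample(bins, rng) for _ in range(1000)]
print(samples.count("+Y") / 1000)   # most rays are guided toward +Y
```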

Key Characteristics​

  • Performance: Extremely fast. It relies on GPU texture lookups and hardware-accelerated grid traversal.
  • Artifacts: Can suffer from blockiness (aliasing) because the light field is stepped. It often requires heavy interpolation or blurring to look smooth.
  • Best For: Real-time applications like UE5 Lumen or game engines where speed is prioritized over mathematical perfection.

Neural Path Guiding (Neural PG)​

Neural PG replaces physical data structures (like trees or grids) with a Continuous Function represented by a Neural Network (usually a lightweight MLP).

Core Mechanism​

  • The Implicit Light Field: A neural network learns a mapping: $f(\text{position, direction}) \to \text{Radiance}$.
  • Normalizing Flows: Many state-of-the-art Neural PG methods use Normalizing Flows. This technique starts with a simple probability distribution (like a uniform sphere) and "warps" it through neural layers into a complex shape that matches the actual light distribution in the scene.
  • Online Reinforcement Learning: As the renderer runs, the network is constantly trained on the fly. The "successful" paths (those that hit light) act as training data to update the network weights.
  • Product Importance Sampling: Unlike Voxels, Neural PG can effectively compute the product of the BSDF (material) and the Incident Radiance. This ensures rays are sent in directions that are both "bright" and "reflective."
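The product-sampling bullet can be illustrated with discrete toy distributions standing in for the learned continuous ones:

```python
# Product importance sampling, as in the last bullet: sample proportionally
# to BSDF * incident radiance rather than either alone. Discrete toy
# distributions stand in for the learned continuous functions.

bsdf     = {"up": 0.8, "side": 0.2, "down": 0.0}   # glossy lobe
radiance = {"up": 0.1, "side": 5.0, "down": 0.1}   # bright light to the side

def product_pdf(bsdf, radiance):
    unnorm = {d: bsdf[d] * radiance[d] for d in bsdf}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}

pdf = product_pdf(bsdf, radiance)
# "side" wins: it is both reflective enough and bright, while "down" gets
# zero probability because the material reflects nothing that way.
best = max(pdf, key=pdf.get)
print(best, round(pdf[best], 3))
```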

Key Characteristics​

  • Accuracy: Produces extremely high-quality, continuous sampling without grid artifacts.
  • Memory Efficiency: A neural network with a few thousand parameters can represent a complex light field that would otherwise require gigabytes of voxel data.
  • Overhead: Requires Tensor Core computation. The "inference" (asking the AI where to go) is slower than a voxel lookup but significantly more accurate.

ReSTIR PG (Reservoir-based Spatiotemporal Importance Resampling PG)

ReSTIR PG represents a significant leap in how we handle global illumination. While traditional Path Guiding attempts to learn the "map" of light in a scene, ReSTIR PG turns that map into a living, breathing conversation between pixels.

Core Mechanism

The magic of ReSTIR PG lies in treating path guiding as a resampling problem rather than a density estimation problem. Instead of building massive, static data structures to record where light comes from, it uses "Reservoirs" to pass around the best light directions it finds.

The Reservoir (The "Briefcase")

At the heart of the algorithm is the Reservoir. Think of it as a small briefcase that each pixel carries. Inside this briefcase, the pixel stores:

  • $y$: A promising incoming light direction ($\omega_i$).
  • $w_{sum}$: The sum of weights of all candidate directions seen so far.
  • $M$: The total number of candidates processed.
  • $W$: The final weight used for unbiased contribution.
The Three-Stage Pipeline

ReSTIR PG typically follows these steps to find the "best" direction to shoot a ray:
  1. Initial Sampling: Each pixel samples a few candidate directions using a basic distribution (like a simple BRDF or a low-resolution guiding structure).
  2. Temporal Reuse: The pixel looks at its own briefcase from the previous frame. If that direction was high-quality (high radiance), it merges that old information into its current reservoir.
  3. Spatial Reuse: The pixel "talks" to its neighbors. It looks at the reservoirs of nearby pixels and merges their best directions into its own. Since light hits neighboring pixels similarly, this "gossip" allows a single lucky find (hitting a tiny, bright light source) to spread to the whole screen instantly.
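The reservoir and the reuse steps above boil down to weighted reservoir sampling. Here is a minimal Python sketch of the textbook RIS reservoir (not any specific paper's code; names and weights are illustrative):

```python
import random

# Weighted reservoir sampling as used by ReSTIR-style resampling:
# update() streams candidates; merge() is the temporal/spatial reuse step.

class Reservoir:
    def __init__(self, rng):
        self.y = None        # chosen sample (e.g. a light direction)
        self.w_sum = 0.0     # sum of candidate weights
        self.M = 0           # number of candidates seen
        self.rng = rng

    def update(self, candidate, weight):
        self.w_sum += weight
        self.M += 1
        if weight > 0 and self.rng.random() < weight / self.w_sum:
            self.y = candidate

    def merge(self, other):
        # Treat the other reservoir's pick as one candidate carrying its
        # whole accumulated weight, then account for its candidate count.
        M_before = self.M
        self.update(other.y, other.w_sum)
        self.M = M_before + other.M

rng = random.Random(1)
pixel = Reservoir(rng)
for d, w in [("dir_a", 0.1), ("dir_b", 2.0), ("dir_c", 0.3)]:
    pixel.update(d, w)

neighbor = Reservoir(rng)
neighbor.update("dir_lucky", 50.0)   # neighbour found a bright light
pixel.merge(neighbor)                # spatial reuse: very likely adopts it
print(pixel.y, pixel.M)
```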
Shift Mapping (The Translation)

When a pixel borrows a direction from a neighbor, the geometry isn't exactly the same. ReSTIR PG uses Shift Mapping to mathematically adjust the direction so it "makes sense" for the new starting point. This ensures that the borrowed light doesn't create artifacts or "leak" through walls.

Key Characteristics

1. Extraordinary Convergence Speed

Because of the spatiotemporal reuse, ReSTIR PG "learns" the lighting environment almost instantly. In scenes with complex indirect lighting (e.g., a room lit only by a door slightly ajar), standard path tracers might take thousands of samples to find the light. ReSTIR PG finds it in a handful of samples because it shares the "discovery" across the entire image.​

2. Memory Efficiency

Traditional path guiding often requires SD-Trees or high-resolution V-forests—complex data structures that eat up VRAM. ReSTIR PG is "lightweight." It primarily stores information per-pixel in a flat buffer. This makes it much more friendly for modern GPU architectures where memory bandwidth and capacity are precious.​
3. Robustness to Dynamic Scenes
Many guiding methods require a "training" phase or long-term accumulation that breaks when a light moves or an object is destroyed. ReSTIR PG is inherently reactive. Because it re-evaluates reservoirs every frame, it can adapt to a light turning on or a mirror moving in near real-time.​

4. Mathematical Unbiasedness (mostly)

By using Weighted Importance Sampling (WIS), ReSTIR PG can remain unbiased or "consistently biased" (converging to the correct answer as samples increase). The weight calculation follows:​
$$W = \frac{1}{\hat{p}(y)} \cdot \frac{\sum w_i}{M}$$
This formula ensures that even though we are "cherry-picking" the best directions, the final image remains physically accurate.​
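A quick numeric check of that weight (a toy discrete setup, not renderer code): even though samples are cherry-picked toward large $f$, the estimate converges to the true total.

```python
import random

# Numeric check of W = (1/p_hat(y)) * (w_sum / M): estimating a known sum
# with resampled ("cherry-picked") candidates still gives the right answer.

f = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}      # integrand; true total is 10
true_total = sum(f.values())

def ris_estimate(rng, M=8):
    # Candidates from a uniform proposal p = 1/4; target p_hat proportional to f.
    xs = [rng.randrange(4) for _ in range(M)]
    w = [f[x] / 0.25 for x in xs]          # w_i = p_hat(x_i) / p(x_i)
    w_sum = sum(w)
    # Pick y proportionally to w_i (what the reservoir does in streaming form).
    y = rng.choices(xs, weights=w)[0]
    W = (1.0 / f[y]) * (w_sum / M)
    return f[y] * W

rng = random.Random(7)
avg = sum(ris_estimate(rng) for _ in range(20000)) / 20000
print(round(avg, 2))   # close to the true total of 10
```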
 

DirectX Raytracing 1.2 (DXR) & Neural Rendering Announced at GDC, Support Across AMD, Intel & NVIDIA Hardware​

Another major aspect that is being introduced by Microsoft is support for cooperative vectors, which is a brand-new programming feature coming to Shader Model 6.9, very soon. With these cooperative vectors, developers can leverage new hardware acceleration engines for vector and matrix operations, allowing them to integrate neural rendering techniques directly within the graphics pipeline.
Cooperative vectors have been postponed to a later release. Probably SM 6.10 instead of SM 6.9 (Q1/2026): https://devblogs.microsoft.com/directx/shader-model-6-9-and-the-future-of-cooperative-vector/

Reason:
Cooperative vectors were initially designed to execute "vector - matrix" operations. The updated cooperative vectors API will also support "matrix - matrix" operations to achieve much better peak throughput.
 

 

RTX Mega Geometry (RTXMG)​

RTX Mega Geometry (RTXMG) is a revolutionary geometry processing technology officially launched by NVIDIA in 2025. Its core objective is to shatter the geometry complexity bottleneck in real-time ray tracing, enabling games and rendering applications to trace hundreds of millions or even billions of triangles in real-time without causing VRAM overflows or massive frame rate drops.

Why Do We Need RTX Mega Geometry?

In traditional ray tracing, for a light ray to know what it has hit, the GPU must build an Acceleration Structure (BVH).

  • The Pain Point: When a scene contains tens of millions of triangles (such as virtualized Nanite models) or complex animations (like a flowing cape), building and updating the BVH consumes massive CPU power, occupies huge amounts of VRAM, and leads to poor rendering efficiency.
  • The Reality: Many games, even with ray tracing enabled, actually use "Proxy Meshes" for RT calculations. This often results in shadows and reflections that don't perfectly align with the high-detail visuals.
Core Technical Principles

RTX Mega Geometry redefines the rendering pipeline through three primary pillars:

Clustered Acceleration Structure (CLAS)

RTXMG stops processing every tiny triangle individually and instead divides the geometry into Clusters.

  • Intelligent Updates: Only clusters that undergo changes are rebuilt on the GPU, drastically reducing the CPU's scheduling overhead.
  • Hardware Acceleration: It leverages dedicated hardware units in the RTX 40/50 series to speed up BVH construction by 10x to 100x.
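The "only rebuild what changed" idea can be sketched as a dirty-flag scheme over clusters (all names are illustrative, not NVIDIA's API):

```python
# Sketch of clustered acceleration-structure updates: geometry is split into
# clusters, and per-frame BVH rebuild work is proportional to the number of
# dirty clusters rather than to scene size. Names are illustrative.

class Cluster:
    def __init__(self, name):
        self.name = name
        self.dirty = True      # needs an initial build

def frame_update(clusters, moved):
    for c in clusters:
        if c.name in moved:
            c.dirty = True     # animated/deformed this frame
    rebuilt = [c.name for c in clusters if c.dirty]
    for c in clusters:
        c.dirty = False
    return rebuilt             # the only clusters whose BVH we rebuild

scene = [Cluster(n) for n in ("terrain", "cape", "tree", "wall")]
print(frame_update(scene, moved=set()))      # first frame: build everything
print(frame_update(scene, moved={"cape"}))   # later: only the animated cape
```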
Subdivision & Micro-Mesh

It adopts a "tessellation" approach similar to the film industry. Developers only need to provide a lightweight "control mesh," and RTXMG generates micron-level detail in real-time based on viewing distance.

  • Infinite Detail: Theoretically achieves "one triangle per pixel" precision.
  • Memory Compression: Compared to storing massive high-poly meshes directly, RTXMG uses displacement maps and base meshes, saving up to 10x the VRAM space.
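A back-of-the-envelope version of the memory claim, with all numbers being illustrative assumptions rather than NVIDIA's figures:

```python
# Rough memory comparison: coarse base mesh + per-micro-vertex displacement
# vs storing the fully tessellated mesh. All constants are assumptions.

BYTES_PER_VERTEX = 32          # position + normal + UV, say
BYTES_PER_DISPLACEMENT = 1     # quantized scalar offset per micro-vertex

def full_mesh_bytes(micro_vertices):
    return micro_vertices * BYTES_PER_VERTEX

def micromesh_bytes(base_vertices, micro_vertices):
    return base_vertices * BYTES_PER_VERTEX + micro_vertices * BYTES_PER_DISPLACEMENT

micro = 10_000_000             # 10M micro-vertices after subdivision
base = 100_000                 # coarse control mesh
ratio = full_mesh_bytes(micro) / micromesh_bytes(base, micro)
print(round(ratio, 1))         # roughly 24x with these made-up numbers
```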
Integration with Virtualized Geometry (e.g., Nanite)

RTXMG solves the long-standing incompatibility between Unreal Engine 5's Nanite and ray tracing: rays can trace Nanite's original ultra-high-precision data directly, eliminating artifacts such as light leaking and broken shadows.

RTXMG was announced by NVIDIA last year alongside the launch of the RTX 50-series GPUs. However, due to hardware and software limitations, Alan Wake 2 is currently the only title to have implemented the technology. That said, as Unreal Engine 5 continues to evolve, "Mega Geometry for Nanite" is expected to be integrated into future UE5 iterations, meaning Intel Arc and AMD RDNA GPUs should eventually benefit from this technology as well. Furthermore, given the deep partnership between CDPR and NVIDIA, there is a high probability that The Witcher 4 will use this tech.
 

There's nothing preventing AMD or Intel from coming up with similar tech; all the 50 series adds is RTX MG compression (~30% smaller VRAM footprint, IIRC). NVIDIA keeping it an RTX exclusive is the usual GameWorks BS, because they support it on the 20-40 series despite no native HW support.

NVRTX branch of UE5 already has RTX MG support.

@Momoka_ Your second post contains a lot of info already found in the first one. Also please no LLM prompt dumps.
 
Your second post contains a lot of info already found in the first one. Also please no LLM prompt dumps.
Sorry, I sent it twice by mistake.
 
Real-Time Markov Chain Path Guiding for Global Illumination and Single Scattering!
Abstract:

We present a lightweight and unbiased path guiding algorithm tailored for real-time applications with highly dynamic content. The algorithm demonstrates effectiveness in guiding both direct and indirect illumination. Moreover, it can be extended to guide single scattering events in participating media. Building upon the screen-space approach by Dittebrandt et al. [2023], the incident light distribution is represented as a von Mises-Fisher mixture model, which is controlled by a Markov chain process. To extend the procedure to world space, our algorithm uses a unique Markov chain architecture, which resamples Markov chain states from an ensemble of hash grids. We combine multi-resolution adaptive grids with a static grid, ensuring rapid state exchange without compromising guiding quality. The algorithm imposes minimal prerequisites on scene representation and seamlessly integrates into existing path tracing frameworks. Through continuous multiple importance sampling, it remains independent of the equilibrium distribution of Markov chain and hash grid resampling. We perform an evaluation of the proposed methods across diverse scenarios. Additionally, we explore the algorithm’s viability in offline scenarios, showcasing its effectiveness in rendering volumetric caustics. We demonstrate the application of the proposed methods in a path tracing engine for the original Quake game. The demo project features path traced global illumination and single scattering effects at frame rates over 30 FPS on NVIDIA’s GeForce 20 series or AMD’s Radeon RX 6000 series without upscaling.
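For readers unfamiliar with the distribution the abstract leans on: a von Mises-Fisher lobe is the spherical analogue of a Gaussian, and the guiding distribution is a weighted mixture of such lobes. This is the standard textbook form, not a formula taken from the paper itself:

```latex
% Density of one von Mises-Fisher lobe on the unit sphere S^2,
% with mean direction \mu (|\mu| = 1) and concentration \kappa > 0:
f(\omega; \mu, \kappa) = \frac{\kappa}{4\pi \sinh\kappa}\, e^{\kappa\, \mu \cdot \omega}

% A K-lobe mixture with weights \pi_k, \sum_k \pi_k = 1:
p(\omega) = \sum_{k=1}^{K} \pi_k \, f(\omega; \mu_k, \kappa_k)
```

Larger \kappa concentrates the lobe around \mu, which is how the mixture can sharply favor directions toward strong incident light.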

It's actually referenced in the ReSTIR PG paper mentioned above.

It can do single scatter volume rendering insanely fast, but it's not so hot with handling lots of light sources apparently.
 