
Discussing exciting new features, research & advancements in gaming (graphics & adjacent software)

Real-Time Markov Chain Path Guiding for Global Illumination and Single Scattering!


It's actually referenced in the ReSTIR PG paper mentioned above.

It can do single scatter volume rendering insanely fast, but it's not so hot with handling lots of light sources apparently.
I remember seeing this about 8-9 months ago. More info on Reddit, where I summarized it.

Side-by-side comparisons on the channel of one of the authors of the paper: https://www.youtube.com/@lucas-alber

Same author also has this cool denoising implementation: https://www.lalber.org/2025/06/percentile-based-adaptive-svgf/
 
It can do single scatter volume rendering insanely fast, but it's not so hot with handling lots of light sources apparently.
Seems like this is a tradeoff resulting from the limited size of the SMIS set, and it can be combined with ReSTIR:

Our method is less effective in scenarios with many light sources, as illustrated in Alkaline (right). In these situations, it adapts its sampling distribution to prioritize dominant light sources and maintains a stable output over time. For a detailed examination of this behavior, see section 7. ReSTIR remains orthogonal to our method and MCPG could be used to guide paths in ReSTIR PT [Lin et al. 2022] for even lower error output.
 
Forget Superresolution, Sample Adaptively (when Path Tracing)

Abstract


Real-time path tracing increasingly operates under extremely low sampling budgets, often below one sample per pixel, as rendering complexity, resolution, and frame-rate requirements continue to rise. While super-resolution is widely used in production, it uniformly sacrifices spatial detail and cannot exploit variations in noise, reconstruction difficulty, and perceptual importance across the image. Adaptive sampling offers a compelling alternative, but existing end-to-end approaches rely on approximations that break down in sparse regimes.
We introduce an end-to-end adaptive sampling and denoising pipeline explicitly designed for the sub-1-spp regime. Our method uses a stochastic formulation of sample placement that enables gradient estimation despite discrete sampling decisions, allowing stable training of a neural sampler at low sampling budgets. To better align optimization with human perception, we propose a tonemapping-aware training pipeline that integrates differentiable filmic operators and a state-of-the-art perceptual loss, preventing oversampling of regions with low visual impact.
In addition, we introduce a gather-based pyramidal denoising filter and a learnable generalization of albedo demodulation tailored to sparse sampling. Our results show consistent improvements over uniform sparse sampling, with notably better reconstruction of perceptually critical details such as specular highlights and shadow boundaries, and demonstrate that adaptive sampling remains effective even at minimal budgets.

Instead of doing joint denoising and super-resolution, it reduces the sampling density in line with the resolution scale and then performs adaptive sampling and denoising in a joint pipeline powered by a 15M-parameter CNN model.
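To get a feel for what a sub-1-spp budget means in practice, here's a minimal sketch of adaptive sample placement: an importance map (here just random noise standing in for a hypothetical predicted-error map, not the paper's network output) is converted into per-pixel sample counts whose average lands on the budget. The stochastic rounding step is my own simplification, not the paper's formulation.

```python
import numpy as np

def place_samples(importance, budget_spp, rng):
    """Stochastically place samples according to an importance map.

    importance: (H, W) non-negative map (hypothetical predicted error).
    budget_spp: average samples per pixel, may be < 1.
    Returns an integer (H, W) sample-count map whose mean is ~budget_spp.
    """
    p = importance / importance.sum()            # normalize to a distribution
    expected = p * importance.size * budget_spp  # expected samples per pixel
    # Stochastic rounding: floor plus a Bernoulli draw on the fractional part,
    # so the discrete counts are unbiased estimates of the continuous target.
    base = np.floor(expected)
    frac = expected - base
    return (base + (rng.random(expected.shape) < frac)).astype(int)

rng = np.random.default_rng(0)
imp = rng.random((64, 64)) ** 2   # toy importance map
counts = place_samples(imp, budget_spp=0.5, rng=rng)
print(counts.mean())              # close to 0.5: half a sample per pixel on average
```

The point is that a 0.5 spp budget doesn't mean "half a sample everywhere"; it means whole samples concentrated where the importance map says they matter, with most pixels getting none.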

It's "...the first end-to-end adaptive sampling and denoising pipeline designed for the sub-1-spp regime." Here it destroys DLSS Ray Reconstruction Transformer and beats all other methods, and this still holds for the scaled-down 2.6M-parameter model.
 

Conclusion
We presented the first end-to-end adaptive sampling and denoising pipeline designed for the sub-1-spp regime. Our stochastic formulation for sample placement enables stable gradient estimation at extreme sparsity, where prior deterministic approaches fail. Combined with tonemapping-aware training using perceptual losses and a gather-based pyramidal filter suited to sparse inputs, our method consistently outperforms both superresolution and previous adaptive sampling approaches across all tested budgets.

Our results demonstrate that adaptive sampling remains effective even when the average budget falls below one sample per pixel.
The learned sampler reliably concentrates samples on perceptually critical regions while safely reducing allocation in smooth areas where errors are difficult to perceive.
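The "stable gradient estimation despite discrete sampling decisions" part presumably works along the lines of a score-function (REINFORCE-style) estimator, where you differentiate the sampling *probability* instead of the non-differentiable sample itself. A generic toy illustration of that trick, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def score_function_grad(p, f, n=200_000):
    """Estimate d/dp E[f(b)] for b ~ Bernoulli(p) via the score function:
    grad = E[f(b) * d log P(b; p) / dp].
    This sidesteps the fact that b itself is discrete and has no gradient."""
    b = (rng.random(n) < p).astype(float)
    score = b / p - (1 - b) / (1 - p)   # d log P / dp for the Bernoulli
    return np.mean(f(b) * score)

f = lambda b: 3.0 * b   # E[f(b)] = 3p, so the true gradient w.r.t. p is 3
g = score_function_grad(0.3, f)
print(g)                # close to 3.0
```

With this kind of estimator the network that outputs per-pixel sampling probabilities can be trained end-to-end even though the actual "fire a ray here or not" decision is a coin flip.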
 
Real-time Rendering with a Neural Irradiance Volume

Abstract


Rendering diffuse global illumination in real-time is often approximated by pre-computing and storing irradiance in a 3D grid of probes. As long as most of the scene remains static, probes approximate irradiance for all surfaces immersed in the irradiance volume, including novel dynamic objects. This approach, however, suffers from aliasing artifacts and high memory consumption. We propose Neural Irradiance Volume (NIV), a neural-based technique that allows accurate real-time rendering of diffuse global illumination via a compact pre-computed model, overcoming the limitations of traditional probe-based methods, such as the expensive memory footprint, aliasing artifacts, and scene-specific heuristics. The key insight is that neural compression creates an adaptive and amortized representation of irradiance, circumventing the cubic scaling of grid-based methods. Our superior memory-scaling improves quality by at least 10x at the same memory budget, and enables a straightforward representation of higher-dimensional irradiance fields, allowing rendering of time-varying or dynamic effects without requiring additional computation at runtime. Unlike other neural rendering techniques, our method works within strict real-time constraints, providing fast inference (around 1 ms per frame on consumer GPUs at full HD resolution), reduced memory usage (1-5 MB for medium-sized scenes), and only requires a G-buffer as input, without expensive ray tracing or denoising.

A purely ML-based, high-quality diffuse global illumination implementation. The implementation has many limitations, but it runs much faster and consumes far less memory than probe-based GI, and looks far better in some instances as well.
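To put the "cubic scaling" claim in perspective, here's back-of-the-envelope math for a dense probe grid. The probe format is my assumption (order-2 spherical harmonics per color channel in fp16, a fairly typical choice), not anything from the paper:

```python
def probe_grid_mb(res, coeffs=9, channels=3, bytes_per=2):
    """Memory for a dense res^3 irradiance probe grid storing order-2
    spherical harmonics (9 coefficients) per color channel in fp16.
    The format is a hypothetical but typical choice."""
    return res**3 * coeffs * channels * bytes_per / 2**20

for res in (32, 64, 128):
    print(res, f"{probe_grid_mb(res):.1f} MB")
```

Doubling the grid resolution multiplies memory by 8, so a 128³ grid at this format already needs on the order of 100 MB, versus the 1-5 MB the paper quotes for medium-sized scenes.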
 

I think this is the first mention of shipping Neural Texture Compression in a real game. This is not NVIDIA's SDK but Ubisoft's own proprietary take, which combines selective NTC (for assets with high memory pressure and instance counts) and automatic asset upscaling. The upscaling part was unexpected, but Mirage was a cross-gen title with low-quality assets, and apparently NTC also benefits from higher-resolution training targets.

No talks scheduled about this at GDC, so it's possibly something to be discussed later at Eurographics, HPG, or SIGGRAPH. There's also no confirmed date for when this ships in the game, so it could be a while.
 
just came across a very good blog by Nvidia on work graphs from 2 years ago
That's indeed a good blog + I noticed the stuff about occupancy. Isn't "no barriers" supposed to be a major win, especially for divergent workloads? As soon as the producer is finished, no more BS waiting for the last SIMD32 thread to complete?
I've prob spent too much time looking at this stuff xD This article introduces:

  • … enable more direct methods of solving complex problems.
  • … reduce memory constraints and improve cache utilization.
  • … simplify inter-pass dependencies and barrier-induced complexity.
  • … improve GPU thread saturation.

So
#1 API flexibility
#2 Cachemem efficiency boost
#3 No more fixed limitations where one weak link can stall entire execution
#4 Very much the same as #3, but due to additional factors.

The SIGGRAPH 2025 PDF notes are a gold mine:

Page 253 illustrates why running multiple material shaders at once within a SIMD unit is a terrible idea. But by having a material node for each unique material, this can be avoided altogether, ensuring only coherent material shaders are run.

From a hardware angle, to properly benefit from this we would need to defer the any-hit shader evaluations, since a SE/GPC-wide global payload sort is required to ensure there are plenty of any-hit requests to choose from. This in turn ensures only one material node at a time is being executed for the material shaders. Producer results can accumulate via the global payload sorter until they can be sent to the consumers, with no expensive writes to global memory. Because only one (or a few) material nodes at a time execute on the compute units, the GPU should achieve very high occupancy; if there's more than one node, a simple partial sort should be enough to ensure coherent shader execution. Extremely high coherence could thus be expected, likely eclipsing what is currently afforded by SER in DXR 1.2, so no more thread divergence and no more low occupancy. Since PT is largely shading, not traversal, the potential improvement here is significant. By moving the entire RT pipeline into a work graph, not just the compute shader passes, additional efficiencies can be exploited, yielding a pipeline that's even more coherent and has even higher occupancy, eliminating most if not all bubbles, barriers, and empty launches.
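The payload-sort idea, reduced to a toy CPU-side sketch: measure how many SIMD32 waves end up with all 32 lanes shading the same material, before and after sorting hits by material ID. The wave size and the material-ID sort are my assumptions about hypothetical hardware behavior, not anything from the SIGGRAPH notes.

```python
import numpy as np

WAVE = 32  # SIMD32 wave size (assumed)

def wave_coherence(material_ids):
    """Fraction of waves whose 32 lanes all shade the same material."""
    n = len(material_ids) // WAVE * WAVE
    waves = material_ids[:n].reshape(-1, WAVE)
    return np.mean([(w == w[0]).all() for w in waves])

rng = np.random.default_rng(2)
hits = rng.integers(0, 8, size=4096)   # 8 materials, hits arrive in random order
print(wave_coherence(hits))            # near 0: almost every wave is divergent
print(wave_coherence(np.sort(hits)))   # near 1: sorting by material restores coherence
```

With 8 materials and random arrival, a uniform wave is astronomically unlikely; after sorting, only the handful of waves straddling a material boundary stay mixed. That's the whole argument for a global payload sort feeding one material node at a time.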

Regular gaming workloads can benefit too, but this will also unlock new possibilities for GPU-driven procedural content generation, complex systems on GPUs (AI and physics), neural shading, and, as already mentioned, ray and path tracing.


For a HW architecture that is hard-coded (you know which one) to match this capability across the entire stack, which includes building a robust cachemem foundation, I suspect we could see massive benefits, in the best case mirroring or even exceeding the occupancy/coherence of the pixel shader pass in Chips and Cheese's coverage. Going from 30-45% to ~90% occupancy/coherence is 2-3X higher, which is a big deal. I know the Chips and Cheese SER article's math includes the traversal step as well, but even if these benefits only applied to shading it would still be a complete gamechanger.

All this sounds too good to be true so can anyone please provide a sanity check or shoot it down if it's misleading?
 

Think this is the first mention of shipping Neural Texture Compression in a real game. This is not NVIDIA's SDK but Ubisofts own proprietary take that combines selective NTC (high memory pressure + instance counts) and automatic asset upscaling. Upscaling part was unexpected but Mirage was crossgen title with low quality assets and also apparently NTC benefits from higher-resolution training targets.

No talks scheduled about this at GDC, so possibly something to be discussed at a later point at Eurographics, HPG or SIGGRAPH. Also no confirmed date for when this ships in the game so it could be a while.
They already had a talk about NTC 2 years ago: https://gdcvault.com/play/1034892/Machine-Learning-Summit-Real-time

Still odd there hasn't been any update from Ubisoft. You don't do a silent drop for the world's first NTC game implementation, so it's prob still WIP.
 
GDDR7 is great, says Micron. Oh, and gamers need 96GB of GPU RAM.
Yass, gamers need smart update tactics, deets yo!


Imagine how disconnected these company execs are, advertising the exact thing that's going to destroy their retail sales. In the case of Micron they can go full tone-deaf mode, but Asus should know better. Good luck with your PC sales in 2026, Asus, stay tuned for more deets!
 

I've rewritten this previous message. Sorry for any inaccuracies.

Work graphs look even more impressive than I thought. Definitely potential for a major reset with DXR 1.3. Fingers crossed.
 

Wednesday, March 11

DirectX State of the Union 2026: DirectStorage and Beyond
Speakers: Shawn Hargreaves (Microsoft), Danny Chen (Microsoft)
Date: Wednesday, March 11
Time: 11:30am - 12:30pm
Location: Room 3001/3003, West Hall

Thursday, March 12

DirectX: Bringing Console-Level GPU Tools to Windows
Speakers:
Austin Kinross (Microsoft), Budi Purnomo (AMD), Steven Tovey (Intel), Kevin Hawkins (Qualcomm)
Date: Thursday, March 12
Time: 11:30am – 12:30pm
Location: Room 2020/2022, West Hall

Evolving DirectX for the ML Era on Windows
Speakers:
Max McMullen (Microsoft), Hisham Chowdhury (AMD), Steven Tovey (Intel), Don Brittain (NVIDIA)
Date: Thursday, March 12
Time: 12:45pm – 1:45pm
Location: Room 2024, West Hall


Xbox at GDC 2026: Build for What’s Next

The future of Xbox starts now.
February 12, 2026
Bryce Baer
Senior Director of Xbox Ecosystem Marketing

This year, GDC takes place from March 9-13 at the Moscone Convention Center in San Francisco, California.

This year, for the first time, we are hosting the Xbox Dev Summit, where we’ll be presenting six sponsored sessions to help prepare attendees to build for what's next. The Xbox Dev Summit is in West Hall Room 3001/3003 and will be kicking off with Jason Ronald, VP of Next Gen, at 10:10am on Wednesday, March 11.

Wednesday, March 11

Xbox Developer Summit Keynote: Building for the Future with Xbox
Speakers: Jason Ronald (Xbox)
Date: Wednesday, March 11
Time: 10:10am – 11:10am
Location: Room 3001/3003, West Hall

 
DirectX Compute Graph Compiler will be available for private preview this summer; please reach out to your Windows representative if you’re interested in joining.

DX Linear Algebra will enter public preview in April, giving developers an early opportunity to experiment with these capabilities and help shape the future of ML‑assisted graphics on Windows. See the Linear Algebra spec for more detail about the feature.

Introducing DX Linear Algebra

Last year, Cooperative Vector demonstrated that ML can be effectively integrated directly into the graphics pipeline, particularly for scenarios where developers want fine-grained, shader-level control over how ML is applied alongside traditional rendering logic.

For the first time, developers could access hardware-accelerated vector–matrix operations directly from HLSL, enabling a class of neural rendering techniques that execute inline with traditional shading.

However, not all workloads fit this execution model. Many common and emerging scenarios—such as denoising, temporal upscaling, and more—require matrix–matrix operations, shared data across threads, and batch-oriented execution that go beyond what vector–matrix primitives alone can efficiently express.

DirectX Linear Algebra, an expansion of DirectX’s math capabilities, is designed to support both vector- and matrix-based ML workloads under a single programming model. DX Linear Algebra adds first-class matrix–matrix operations while preserving the ability to author ML directly in HLSL, giving developers explicit control over math, data flow, and execution for shader-level ML scenarios.


Expanding to Model Level ML with DirectX Compute Graph Compiler

DirectX Compute Graph Compiler is a new DirectX ML compiler API designed to execute full model graphs with native-class GPU performance.

While shader-level ML (DX Linear Algebra above) is powerful, many modern ML-driven graphics workloads are best expressed and optimized as full computation graphs, not as isolated operators or hand-authored kernels. These graphs capture end-to-end structure—dataflow, dependencies, and deep fusion—that are difficult or impossible to exploit at the shader level, especially when targeting the full PC ecosystem.
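What graph-level fusion buys, sketched with a hypothetical two-op graph (the specific ops and the register-vs-memory framing are my illustration, not anything from the announcement):

```python
import numpy as np

x = np.linspace(-1, 1, 8).astype(np.float32)
scale, bias = np.float32(2.0), np.float32(0.5)

# Unfused: each op materializes its result (on a GPU, a round-trip to memory)
# before the next op reads it back.
def unfused(x):
    t = x * scale + bias        # op 1: affine transform
    return np.maximum(t, 0)     # op 2: ReLU

# Fused: a graph compiler that sees the whole dataflow can emit one kernel
# that keeps the intermediate in registers.
def fused(x):
    return np.maximum(x * scale + bias, 0)

print(np.array_equal(unfused(x), fused(x)))  # True: same result, fewer round-trips
```

A compiler that only sees one operator at a time can't make this transformation; seeing the whole graph is what makes deep fusion (and the memory-traffic savings that come with it) possible.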



Shader-level ML and model-level ML now live side by side in DirectX:
  1. HLSL Linear Algebra for small, inline workloads and
  2. DirectX Compute Graph Compiler for larger models.

 
DX13 needs a 'killer game' at launch to show off its full potential and convince players and developers that the upgrade is worth it.
I hope so, but we've barely had any games use mesh shaders aside from Alan Wake II and Doom: The Dark Ages (Vulkan, but it supports them). The extended cross-gen console period plus the popularity of the GTX 10 series makes it very hard. And considering how expensive games are to make these days, a game fully taking advantage of DX13 will probably come years after RDNA 5/Helix releases.
 