This is part of a series on Differentiable Slang. For more information about practical examples of Slang with various machine learning rendering applications, see Differentiable Slang: Example Applications.
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language can serve as a unified platform for real-time, inverse, and differentiable rendering. The work is a collaboration between MIT, UCSD, UW, and NVIDIA researchers.
Slang is an open-source language for real-time graphics programming that brings new capabilities for writing and maintaining large-scale, high-performance, cross-platform graphics codebases. Slang adapts modern language constructs to the high-performance demands of real-time graphics, and generates code for D3D12, Vulkan, OptiX, CUDA, and the CPU. While Slang began as a research project, it has grown into a practical solution used in NVIDIA’s Omniverse and RTX Remix renderers, and NVIDIA’s Falcor research infrastructure.
The new research takes a co-design approach, showing that the complexities of automatic differentiation can be handled elegantly when differentiation is built in as a first-class citizen of the entire system: the language, the type system, the intermediate representation (IR), the optimization passes, and the auto-completion engine.
Slang’s automatic differentiation integrates seamlessly with Slang’s modular programming model, GPU graphics pipelines, Python, and PyTorch. Slang supports differentiating arbitrary control flow, user-defined types, dynamic dispatch, generics, and global memory accesses. With Slang, existing real-time renderers can be made differentiable and learnable without major source code changes.
Bridging computer graphics and machine learning
Data-driven rendering algorithms are changing computer graphics, enabling powerful new representations for shape, textures, volumetrics, materials, and post-processing algorithms that increase performance and image quality. In parallel, computer vision and machine learning researchers are increasingly leveraging computer graphics, e.g., to improve 3D reconstruction through inverse rendering.
Bridging real-time graphics, machine learning, and computer vision development environments is challenging because of different tools, libraries, programming languages, and programming models. With our latest research, Slang enables developers to easily:
- Bring learning to rendering. Slang enables graphics developers to use gradient-based optimization to solve traditional graphics problems in a data-driven manner. For example, learning mipmap hierarchies using appearance-based optimization.
- Build differentiable renderers from existing graphics code. With Slang, we transformed a pre-existing real-time path tracer into a differentiable path tracer, reusing 90% of the Slang code.
- Bring graphics to ML training frameworks. Slang generates custom PyTorch plugins from graphics shader code. We demonstrate using Slang in nvdiffrec to generate auto-differentiated CUDA kernels.
- Bring ML training inside the renderer. Slang facilitates training small neural networks inside a real-time renderer, such as the model used in neural radiance caching.
Differentiable programming and machine learning need gradients
Figure 1. Stanford Bunny placed inside the Cornell box. Left: Rendered image; Middle: Reference derivative with respect to the bunny’s translation in the y-axis; Right: Scene derivative generated by autodiff in Slang matches the reference.
A key pillar of machine learning methods is gradient-based optimization. Specifically, most ML algorithms are powered by reverse-mode automatic differentiation, an efficient way to propagate derivatives through a series of computations. This applies not only to large neural networks but also to many simpler data-driven algorithms that require the use of gradients and gradient descent.
Frameworks like PyTorch expose high-level operations on tensors (multi-dimensional matrices) that come with hand-coded reverse-mode kernels. As the user composes tensor operations to create their neural network, PyTorch composes their derivative computations automatically by chaining those kernels. The result is an easy-to-use system where the user does not have to write gradient flow manually, which is one of the reasons behind ML research’s accelerated pace.
Unfortunately, some computations aren’t easily captured by those high-level operations on arrays, making them difficult for users to express efficiently. This is the case with graphics components such as a rasterizer or ray tracer, where diverging control flow and complex access patterns force a lot of inefficient active-mask tracking and other workarounds. Those workarounds are not only difficult to write and read but also carry significant performance and memory overhead.
As a result, most high-performance differentiable graphics pipelines, such as nvdiffrec, Instant NGP, and Gaussian splatting, are not written in pure Python. Instead, researchers write high-performance kernels in languages operating closer to the underlying hardware, such as CUDA, HLSL, or GLSL. Because these languages do not provide automatic differentiation, these applications use hand-derived gradients. Hand differentiation is tedious and error-prone, making it difficult for others to use or modify those algorithms. This is where Slang comes in, as it can automatically generate differentiated shader code for multiple backends.
Designing Slang’s Automatic Differentiation
Figure 2. Propagated derivatives on the Zero Day scene computed by a differentiable path tracer written in the Falcor framework. The differentiable path tracer was built by reusing over 5,000 lines of pre-existing shader code.
Slang’s roots can be traced to the Spark programming language presented at SIGGRAPH 2011 and, in its current form, to SIGGRAPH 2018. Adding automatic differentiation to Slang required years of research and many iterations of language design. Every part of the language and the compiler, including the parser, type system, standard library, IR, optimization passes, and the IntelliSense engine, needed to be revised to support autodiff as a first-class member of the language.
Slang’s type system has been extended to treat differentiability as a first-class property of functions and types. The type system enables compile-time checks that guard against common mistakes in differentiable programming frameworks, such as unintentionally dropping derivatives through calls to non-differentiable functions. We describe these and many more challenges and solutions in our technical paper.
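As a rough illustration of that check, the sketch below marks a function as differentiable and shows how a call to a non-differentiable function must be made explicit with no_diff before the compiler accepts it. The buffer and function names are illustrative; see the Slang user guide for the exact rules and syntax.

```
// A minimal sketch of the differentiability check described above.
StructuredBuffer<float> gDensity;

// An ordinary, non-differentiable function.
float fetchDensity(int cell)
{
    return gDensity[cell];
}

[Differentiable]
float loss(float x, int cell)
{
    // float d = fetchDensity(cell);        // compile error: a derivative would be silently dropped
    float d = no_diff fetchDensity(cell);   // OK: the drop is now explicit and intentional
    return d * x * x;
}
```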
In Slang, automatic differentiation is a composable operator on functions: applying automatic differentiation to a function yields another function that can be used just like any other function. This functional design enables higher-order differentiation, which is absent from many other frameworks. The ability to differentiate a function multiple times, in any combination of forward and reverse modes, significantly eases the implementation of advanced rendering algorithms such as warped-area sampling and Hessian-Hamiltonian MLT.
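The snippet below is a small sketch of these operators applied to a toy function; exact signatures and pair helpers are described in the Slang user guide.

```
// Differentiating a simple function in forward and reverse mode.
[Differentiable]
float square(float x)
{
    return x * x;
}

void example()
{
    // Forward mode: a DifferentialPair carries the primal value and its derivative.
    DifferentialPair<float> x = diffPair(3.0, 1.0);
    DifferentialPair<float> y = fwd_diff(square)(x);   // y.p == 9.0, y.d == 6.0

    // Reverse mode: seed the output derivative, then read the input derivative back.
    var xp = diffPair(3.0, 0.0);
    bwd_diff(square)(xp, 1.0);                          // xp.d == 6.0

    // Because the result of differentiation is again a function, the operators compose,
    // e.g. fwd_diff(fwd_diff(square)) for second-order derivatives.
}
```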
Slang’s standard library has also been extended to support differentiable computations, and most existing HLSL intrinsic functions are treated as differentiable functions, allowing existing code that uses these intrinsics to be automatically differentiated without modifications.
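For example, a toy shading function built from common intrinsics can be marked differentiable as-is; the function below and the bwd_diff call in the trailing comment are purely illustrative.

```
// Existing shader code using intrinsics (dot, max, exp, normalize) differentiated without changes.
[Differentiable]
float3 shade(float3 albedo, float3 n, float3 l)
{
    float ndotl = max(dot(normalize(n), normalize(l)), 0.0);
    return albedo * ndotl + 0.1 * albedo * exp(-3.0 * ndotl);
}

// Reverse-mode derivatives with respect to all three inputs are then available via
// bwd_diff(shade)(dpAlbedo, dpN, dpL, dLossDColor);
```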
Figure 3. Screenshot of Slang’s Visual Studio Code extension providing interactive hinting on the signature of an automatically differentiated function.
Slang offers a complete developer toolset, including a Visual Studio Code extension with comprehensive hinting and auto-completion support for differentiable entities, which improves productivity in our internal projects.
Slang Adds Real-Time Graphics to the Differentiable Programming Ecosystem
The Slang compiler can emit derivative function code in HLSL (for use with Direct3D pipelines), GLSL/SPIR-V (for use with OpenGL and Vulkan), CUDA/OptiX (for use in standalone applications, in Python, or with tensor frameworks like PyTorch), and scalar C++ (for debugging). You can emit the same code to multiple targets: for instance, train efficient models with PyTorch’s optimizers and then deploy them in a video game or other interactive experience running on Vulkan or Direct3D without writing new or different code. A single representation written in one language greatly simplifies long-term code maintenance and avoids the bugs that arise when two versions drift subtly apart.
Like NVIDIA’s Warp framework for differentiable simulation, Slang contributes to the growing ecosystem of differentiable programming. Slang generates derivatives automatically and lets you use them with both lower- and higher-level programming environments, including alongside hand-written, heavily optimized CUDA kernels and libraries.
If you prefer a higher-level approach and use interactive Python notebooks for research and experimentation, you can use Slang via the slangpy package (pip install slangpy) from environments like Jupyter. Slang then becomes part of the rich notebook ecosystem of Python, PyTorch, and numpy: you can interface with data stored in various formats, interact with it through widgets, and visualize it with plotting and data-analysis libraries, while Slang offers an additional programming model better suited for certain applications.
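As a rough sketch of what that looks like, the kernel below follows the hello-world pattern from the slangpy documentation: a Slang kernel annotated for Python binding, which would be loaded from Python with something like m = slangpy.loadModule("square.slang") and launched via m.square(input=..., output=...).launchRaw(...). Treat the exact attribute names, helper functions, and launch call as assumptions based on the slangpy documentation rather than a definitive listing.

```
// square.slang -- a sketch following the slangpy hello-world pattern.
[AutoPyBindCUDA]
[CUDAKernel]
void square(TensorView<float> input, TensorView<float> output)
{
    // Compute the global thread index for this CUDA launch.
    uint3 idx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();

    // Guard against threads outside the tensor bounds.
    if (idx.x >= input.size(0))
        return;

    output[idx.x] = input[idx.x] * input[idx.x];
}
```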
A Tale of Two Programming Models: Tensors vs Shading Languages
PyTorch and other tensor-based libraries, such as numpy, TensorFlow, and JAX, offer a fundamentally different programming model from Slang and shading languages in general. PyTorch is designed primarily for “feed-forward” neural networks where operations on each element are relatively uniform, without diverging control flow. The numpy and PyTorch n-dimensional array (NDArray) model operates on whole tensors, making it trivial to specify horizontal reductions, such as summing over axes, and large matrix multiplications.
By contrast, shading languages occupy the other end of the spectrum and expose the single-instruction-multiple-threads (SIMT) model to allow programmers to specify programs operating on a single element or a small block of elements. This makes it easy to express intricate control flow where each set of elements executes a vastly different series of operations, such as when the rays of a path tracer strike different surfaces and execute different logic for their next bounce.
Both models co-exist and should be treated as complementary, as they fulfill different goals: A reduce-sum operation on a tensor would take one line of NDArray code, but hundreds of lines of code and multiple kernel launches to express efficiently in the SIMT style. Conversely, a variable-step ray marcher can be written elegantly in the SIMT style using dynamic loops and stopping conditions, but the same ray marcher would devolve into complex and unmaintainable active-mask-tracking NDArray code. Such code is not only difficult to write and read but can perform worse since every branch gets executed for each element instead of only one or the other, depending on the active state.
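To make the contrast concrete, here is a sketch of a variable-step ray marcher in the SIMT style: each thread follows its own ray with a data-dependent loop and an early exit, something that maps poorly onto whole-tensor operations. The density lookup and step heuristic are purely illustrative.

```
// Hypothetical density lookup; stands in for a texture fetch or grid sample.
float densityAt(float3 p)
{
    return max(0.0, 1.0 - length(p));
}

// Each SIMT thread marches one ray with its own step count and stopping condition.
float marchRay(float3 origin, float3 dir)
{
    float t = 0.0;
    float transmittance = 1.0;
    for (int i = 0; i < 256; i++)
    {
        float density = densityAt(origin + t * dir);
        float stepSize = (density > 0.0) ? 0.01 : 0.05;  // take smaller steps inside the medium
        transmittance *= exp(-density * stepSize);
        t += stepSize;

        if (transmittance < 0.01)   // early exit once the ray is effectively opaque
            break;
    }
    return 1.0 - transmittance;     // opacity accumulated along the ray
}
```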
Figure 4. Left: NDArray frameworks’ “wavefront” model with operations on full batches and intermediate results stored in global memory. Right: SIMT frameworks’ fused model compiles multiple passes into one optimized kernel, using local intermediate results to save memory and bandwidth.
Performance benefits
PyTorch and other machine learning frameworks are built for training and inference of large neural networks. They use heavily optimized platform libraries to perform large matrix-multiply and convolution operations. While each individual operation is extremely efficient, the intermediate data between operations is serialized to main memory and checkpointed. During training, the forward and backpropagation passes are computed serially and separately. This makes the overhead of PyTorch significant for tiny neural networks and other differentiable programming uses in real-time graphics.
Slang’s automatic differentiation gives programmers control over how gradient values are stored, accumulated, and computed, allowing significant performance and memory optimizations. By fusing the forward and backward passes and avoiding multiple kernel launches, excessive global memory accesses, and unnecessary synchronization, Slang achieves up to 10x training speedups over the same small-network and graphics workloads written with standard PyTorch operations. This speedup not only accelerates the training of machine learning models but also enables many novel applications that use smaller inline neural networks inside graphics workloads. Inline neural networks open up a whole new area of computer graphics research, such as neural radiance caching, neural texture compression, and neural appearance models.
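As one illustration of that control, Slang lets you register a hand-written backward derivative for a function with the [BackwardDerivative(...)] attribute, for example to decide exactly where and how a gradient is accumulated. The buffer names and the plain (non-atomic) accumulation below are illustrative only; real multi-threaded code would use an atomic add or a reduction, and the derivative-function signature follows the conventions in the Slang user guide.

```
// Parameters and their gradients live in buffers the programmer controls.
RWStructuredBuffer<float> gParams;
RWStructuredBuffer<float> gParamGrads;

// The primal function reads a parameter; its custom backward derivative decides
// how the incoming gradient is stored.
[BackwardDerivative(loadParam_bwd)]
float loadParam(int i)
{
    return gParams[i];
}

void loadParam_bwd(int i, float dOut)
{
    // Scatter the gradient for parameter i. A real kernel would use an atomic add;
    // a plain read-modify-write keeps the sketch short.
    gParamGrads[i] = gParamGrads[i] + dOut;
}
```

Any [Differentiable] code that calls loadParam will then route its reverse-mode gradients through loadParam_bwd, so the programmer decides where gradients land and how they are accumulated.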
Show me the code!
Head to GitHub for Slang’s open-source repository and the slangpy Python package (see the package documentation). The automatic differentiation language feature is documented in Slang’s user guide. We include several differentiable Slang tutorials that walk through the code for common graphics components while introducing Slang’s object-oriented differentiable programming model.
Slang + PyTorch tutorials using slangpy:
- 1-Triangle Rasterizer (Non-Differentiable)
- 1-Triangle Differentiable ‘Soft’ Rasterizer
- 1-Triangle Differentiable Rasterizer using Monte Carlo Edge Sampling
- Image Fitting using Tiny Inline MLPs (using CUDA’s WMMA API)
For additional examples, view the blog post Differentiable Slang: Example Applications.
Conclusion
Differentiable rendering is a powerful tool for computer graphics, computer vision, and image synthesis. While researchers have advanced its capabilities, built systems, and explored applications for years, the resulting systems were difficult to combine with existing large codebases. Now, with Slang, existing real-time renderers can be made differentiable.
Slang greatly simplifies adding shader code to machine learning pipelines and, vice versa, adding learned components to rendering pipelines.
Real-time rendering experts can now explore building machine learning rendering components without rewriting their rendering code in ML frameworks. Slang facilitates data-driven asset optimization and improvement and aids research on novel neural components in traditional rendering.
On the other end of the spectrum, machine learning researchers can now leverage existing renderers and assets with complex shaders and incorporate expressive state-of-the-art shading models in new architectures.
We are looking forward to seeing how bridging real-time graphics and machine learning contributes to new photorealistic neural and data-driven techniques.
We encourage readers to check out our SIGGRAPH Asia 2023 paper for more insights and technical details on Slang’s automatic differentiation feature.