
239 Commits

Author SHA1 Message Date
Brecht Van Lommel
55b8fc718a Cycles: improve detection of HIP compiler for buildbot
And fix various broken things in the HIP kernel compilation.
2021-10-05 13:47:50 +02:00
Brecht Van Lommel
86ec9d79ec Fix build without Cycles HIP device 2021-09-28 20:00:55 +02:00
Brian Savery
044a77352f Cycles: add HIP device support for AMD GPUs
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.

HIP is a heterogeneous compute interface allowing C++ code to be executed on
GPUs, similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.

https://github.com/ROCm-Developer-Tools/HIP.

As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.

See task T91571 for more details on the current status and work remaining
to be done.

Credits:

Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)

Differential Revision: https://developer.blender.org/D12578
2021-09-28 19:18:55 +02:00
Brecht Van Lommel
0803119725 Cycles: merge of cycles-x branch, a major update to the renderer
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.

Some features have also been removed or changed, breaking backwards compatibility,
including the removal of the OpenCL backend, for which alternatives are under
development.

Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles

Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)

For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.

Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T77185, T69800
2021-09-21 14:55:54 +02:00
Brecht Van Lommel
073bf8bf52 Cycles: remove WITH_CYCLES_DEBUG, add WITH_CYCLES_DEBUG_NAN
WITH_CYCLES_DEBUG was used for rendering BVH debugging passes. But since we
mainly use Embree and OptiX now, this information is no longer important.

WITH_CYCLES_DEBUG_NAN will enable additional checks for NaNs and invalid values
in the kernel, for Cycles developers. Previously these asserts were enabled in
all debug builds, but this is too likely to crash Blender in scenes that render
fine regardless of the NaNs. So this is behind a CMake option now.

Fixes T90240
2021-07-28 19:27:57 +02:00
Brecht Van Lommel
cf74cd9367 Cycles: upgrade CUDA to 11.4
This fixes a performance regression on Ampere cards, on specific scenes like
classroom. For cycles-x there is little difference, but this is still helpful
for LTS releases, and we need to upgrade at some point anyway.
2021-07-26 19:46:51 +02:00
Brecht Van Lommel
b42454be8b Cleanup: move BVH utility functions into own file 2021-04-19 21:07:34 +02:00
Patrick Mours
c10546f5e9 Cycles: Add support for shader raytracing in OptiX
Support for the AO and bevel shader nodes requires calling "optixTrace" from within the shading
VM, which is only allowed from inlined functions to the raygen program or callables. This patch
therefore converts the shading VM to use direct callables to make it work. To prevent performance
regressions a separate kernel module is compiled and used for this purpose.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D9733
2020-12-04 13:04:11 +01:00
Campbell Barton
2bd8f7e059 Cleanup: use string APPEND/PREPEND
Replace 'set' with 'string(APPEND/PREPEND ...)'.
This avoids duplicating the variable name.
2020-11-06 12:32:54 +11:00
Patrick Mours
3bb3b26c8f Cycles: Add CUDA 11 build support
With this patch the build system checks whether the "CUDA10_NVCC_EXECUTABLE" CMake
variable is set and if so will use that to build sm_30 kernels. Similarly, for sm_8x kernels it
checks "CUDA11_NVCC_EXECUTABLE". All other kernels are built using the default CUDA
toolkit. This makes it possible to use either the CUDA 10 or CUDA 11 toolkit by default and
only selectively use the other for the kernels where it's a hard requirement.
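
As a minimal sketch of the selection rule described above (modelled in Python rather than CMake, and assuming that an unset override simply falls back to the default toolkit), the choice of compiler per target architecture could look like this. The function name select_nvcc and the example paths are hypothetical.

```python
from typing import Optional


def select_nvcc(arch: str, default_nvcc: str,
                cuda10_nvcc: Optional[str] = None,
                cuda11_nvcc: Optional[str] = None) -> str:
    """Pick the nvcc executable used to build the kernel for one architecture.

    Hypothetical model of the rule in the commit message: sm_30 uses the
    CUDA10_NVCC_EXECUTABLE override and sm_8x uses CUDA11_NVCC_EXECUTABLE,
    when those are set; every other architecture uses the default toolkit.
    """
    if arch == "sm_30" and cuda10_nvcc:
        return cuda10_nvcc
    if arch.startswith("sm_8") and cuda11_nvcc:
        return cuda11_nvcc
    # Assumption: without an override, fall back to the default toolkit.
    return default_nvcc


for arch in ("sm_30", "sm_75", "sm_86"):
    print(arch, select_nvcc(arch, "/usr/local/cuda/bin/nvcc",
                            cuda10_nvcc="/opt/cuda-10.1/bin/nvcc",
                            cuda11_nvcc="/opt/cuda-11.1/bin/nvcc"))
```

Keeping the override per architecture means only the few kernels with a hard toolkit requirement pay for a second CUDA installation.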

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D9179
2020-10-13 15:15:44 +02:00
Patrick Mours
3df90de6c2 Cycles: Add NanoVDB support for rendering volumes
NanoVDB is a platform-independent sparse volume data structure that makes it possible to
use OpenVDB volumes on the GPU. This patch uses it for volume rendering in Cycles,
replacing the previous usage of dense 3D textures.

Since it has a big impact on memory usage and performance and changes the OpenVDB
branch used for the rest of Blender as well, this is not enabled by default yet; that will
only happen after 2.82 has been branched off. To enable it, build both dependencies and Blender
itself with the "WITH_NANOVDB" CMake option.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8794
2020-10-05 15:03:30 +02:00
Brecht Van Lommel
f04260d8c6 CMake: refresh building and external library handling of Cycles standalone
* Support precompiled libraries on Linux
* Add license headers
* Refactoring to deduplicate code

Includes work by Ray Molenkamp and Grische for precompiled libraries.

Ref D8769
2020-09-04 17:10:50 +02:00
Patrick Mours
d64e171c4b Cycles: Enable OptiX on first generation Maxwell GPUs again 2020-07-27 16:11:00 +02:00
Patrick Mours
a9644c812f Cycles: Use pre-compiled PTX kernel for older generation when no matching one is found
This patch changes the discovery of pre-compiled kernels, to look for any PTX, even if
it does not match the current architecture version exactly. It works because the driver can
JIT-compile PTX generated for architectures less than or equal to the current one.
This makes it possible, for example, to render on a new GPU architecture even if no pre-compiled
binary kernel was distributed for it as part of the Blender installation.
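
The discovery rule can be sketched as follows: prefer an exactly matching binary, otherwise take the newest PTX whose architecture is less than or equal to the device's. The file naming scheme and the "newest compatible PTX wins" tie-break are assumptions for illustration, not taken from the Cycles source.

```python
import re
from typing import Iterable, Optional, Tuple


def pick_kernel(device_cc: Tuple[int, int], available: Iterable[str]) -> Optional[str]:
    """Return the kernel file to load for a GPU with compute capability device_cc.

    Hypothetical naming scheme: 'kernel_sm_XY.cubin' for binaries and
    'kernel_compute_XY.ptx' for PTX.
    """
    major, minor = device_cc
    device_version = major * 10 + minor
    files = list(available)

    # 1. An exactly matching pre-compiled binary is preferred.
    exact = f"kernel_sm_{major}{minor}.cubin"
    if exact in files:
        return exact

    # 2. Otherwise take the newest PTX that is not newer than the device,
    #    since the driver can JIT-compile PTX for older or equal architectures.
    best_name, best_version = None, -1
    for name in files:
        match = re.fullmatch(r"kernel_compute_(\d+)(\d)\.ptx", name)
        if match is None:
            continue
        version = int(match.group(1)) * 10 + int(match.group(2))
        if best_version < version <= device_version:
            best_name, best_version = name, version
    return best_name


print(pick_kernel((8, 6), ["kernel_compute_50.ptx", "kernel_compute_75.ptx"]))
```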

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D8332
2020-07-20 19:25:27 +02:00
Brecht Van Lommel
d1ef5146d7 Cycles: remove SIMD BVH optimizations, to be replaced by Embree
Ref T73778

Depends on D8011

Maniphest Tasks: T73778

Differential Revision: https://developer.blender.org/D8012
2020-06-22 13:28:01 +02:00
Lukas Stockner
eacdcb2dd8 Cycles: Add new Sky Texture method including direct sunlight
This commit adds a new model to the Sky Texture node, which is based on a
method by Nishita et al. and works by basically simulating volumetric
scattering in the atmosphere.

By making some approximations (such as only considering single scattering),
we get a fairly simple and fast simulation code that takes into account
Rayleigh and Mie scattering as well as Ozone absorption.

This code is used to precompute a 512x128 texture which is then looked up
during render time, and is fast enough to allow real-time tweaking in the
viewport.

Due to the nature of the simulation, it exposes several parameters that
allow for lots of flexibility in choosing the look and matching real-world
conditions (such as Air/Dust/Ozone density and altitude).

Additionally, the same volumetric approach can be used to compute absorption
of the direct sunlight, so the model also supports adding direct sunlight.
This makes it significantly easier to set up Sun+Sky illumination where
the direction, intensity and color of the sun actually matches the sky.

In order to support properly sampling the direct sun component, the commit
also adds logic for sampling a specific area to the kernel light sampling
code. This is combined with portal and background map sampling using MIS.
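
The commit does not say which MIS heuristic is used. As an illustration only, the common power heuristic (beta = 2, one sample per strategy) computes the weight of each strategy for a given direction from the pdfs that all strategies assign to it; a sample drawn by strategy i then contributes f(x) * w_i / p_i(x). The function name and pdf values below are hypothetical.

```python
from typing import List, Sequence


def power_heuristic_weights(pdfs: Sequence[float], beta: float = 2.0) -> List[float]:
    """MIS weights for one direction, given each strategy's pdf for that direction.

    Assumes one sample per strategy; weights are p_i^beta / sum_j p_j^beta.
    """
    powered = [p ** beta for p in pdfs]
    total = sum(powered)
    if total == 0.0:
        return [0.0] * len(powered)
    return [p / total for p in powered]


# Hypothetical pdfs of the sun, portal and background-map strategies for one direction.
print(power_heuristic_weights([2.0, 0.0, 1.0]))
```

Whatever heuristic is chosen, the weights for a direction sum to one whenever at least one strategy can generate it, which keeps the combined estimator unbiased.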

This sampling logic works for the common case of having one Sky texture
going into the Background shader, but if a custom input to the Vector
node is used or if there are multiple Sky textures, it falls back to using
only background map sampling (while automatically setting the resolution to
4096x2048 if auto resolution is used).

More information and previews can be found here:
https://docs.google.com/document/d/1gQta0ygFWXTrl5Pmvl_nZRgUw0mWg0FJeRuNKS36m08/view

Underlying model, implementation and documentation by Marco (@nacioss).
Improvements, cleanup and sun sampling by @lukasstockner.

Differential Revision: https://developer.blender.org/D7896
2020-06-17 21:06:41 +02:00
Brecht Van Lommel
d97c83712c Cycles: mark CUDA 10.2 as officially supported
It appears to work fine after a recent bugfix and testing for the past few
weeks.
2020-05-05 15:06:49 +02:00
Ray Molenkamp
aeb42cf8ab Cycles/Optix: Support building the optix kernels on demand.
CMake: `WITH_CYCLES_DEVICE_OPTIX` did not respect `WITH_CYCLES_CUDA_BINARIES`, causing the OptiX kernel to always be built at build time.

Code: `device_optix.cpp` did not account for the OptiX kernel not existing in the default location.

For this to work, the following must be in place before starting Blender:

1) a working nvcc environment
2) the OptiX SDK installed, with the OPTIX_ROOT_DIR environment variable (not set by default) pointing to it

Differential Revision: https://developer.blender.org/D7400

Reviewed By: Brecht
2020-04-11 12:59:21 -06:00
Ray Molenkamp
86c61ce64f Cycles: Restore cycles_cubin_cc to working order
Reviewed by: brecht pmoursnv
Differential Revision: https://developer.blender.org/D7136
2020-03-26 11:41:44 -06:00
Stefan Werner
51e898324d Adaptive Sampling for Cycles.
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"

The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.
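
A minimal sketch of the two-estimate idea for a single pixel, assuming the separate buffer receives every second sample and that the error is the difference of the two means relative to the full estimate. The exact error metric used by Cycles is not given here, so the formula and the name pixel_converged are illustrative assumptions.

```python
from typing import Sequence


def pixel_converged(samples: Sequence[float], threshold: float) -> bool:
    """Estimate convergence by comparing two estimates of the same pixel.

    Assumption: the 'separate buffer' holds every second sample, and the error
    is the difference of the two means relative to the full estimate.
    """
    if len(samples) < 2:
        return False
    full_estimate = sum(samples) / len(samples)
    half = samples[::2]  # the half of the samples written to the second buffer
    half_estimate = sum(half) / len(half)
    error = abs(full_estimate - half_estimate) / max(abs(full_estimate), 1e-4)
    return error < threshold


print(pixel_converged([0.5] * 8, 0.01), pixel_converged([0.0, 1.0] * 4, 0.01))
```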

When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.
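
The neighbor propagation can be sketched as a 3x3 box filter over a per-pixel "not converged" mask: a pixel keeps sampling if any pixel in its clamped 3x3 neighborhood is unconverged. The function name and the boolean-mask representation are assumptions for illustration.

```python
from typing import List


def dilate_unconverged(unconverged: List[List[bool]]) -> List[List[bool]]:
    """Mark a pixel as needing samples if any pixel in its 3x3 neighborhood does."""
    h = len(unconverged)
    w = len(unconverged[0]) if h else 0
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Box filter over the 3x3 window, clamped at the tile border.
            out[y][x] = any(
                unconverged[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


for row in dilate_unconverged([[True, False, False, False], [False] * 4, [False] * 4]):
    print(row)
```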

After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.
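
Assuming the passes store accumulated (summed) values, the rescaling amounts to multiplying each pixel by total_samples / pixel_samples, so that dividing by the full sample count later yields the same mean. The function name is hypothetical.

```python
from typing import List


def scale_to_full_samples(pass_values: List[float], pixel_samples: List[int],
                          total_samples: int) -> List[float]:
    """Rescale summed pass values as if every pixel had received total_samples."""
    return [v * total_samples / n if n > 0 else 0.0
            for v, n in zip(pass_values, pixel_samples)]


# A pixel that stopped at 16 samples with a sum of 8.0 (mean 0.5) is scaled
# to 32.0, which is again a mean of 0.5 when divided by 64 samples.
print(scale_to_full_samples([8.0, 64.0], [16, 64], 64))
```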

Reviewed By: brecht, #cycles

Differential Revision: https://developer.blender.org/D4686
2020-03-05 12:21:38 +01:00
Charlie Jolly
20a4cdfd70 Cycles: Vector Rotate Node using Axis and Angle method
This node provides the ability to rotate a vector around a `center` point using either the `Axis Angle`, `Single Axis` or `Euler` method.
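
For the `Axis Angle` method, rotating around a center point amounts to translating by -center, applying Rodrigues' rotation formula, and translating back. This is a sketch of the standard math only; the function name is hypothetical, and the node's handling of a zero-length axis is assumed here, not taken from the source.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_axis_angle(v: Vec3, center: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate v around center by angle (radians) about axis, using Rodrigues' formula."""
    p = tuple(a - c for a, c in zip(v, center))  # work relative to the center point
    length = math.sqrt(sum(a * a for a in axis))
    if length == 0.0:
        return v  # assumption: a degenerate axis leaves the vector unchanged
    k = tuple(a / length for a in axis)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_p = sum(a * b for a, b in zip(k, p))
    k_cross_p = (k[1] * p[2] - k[2] * p[1],
                 k[2] * p[0] - k[0] * p[2],
                 k[0] * p[1] - k[1] * p[0])
    # v' = p cos(a) + (k x p) sin(a) + k (k . p) (1 - cos(a))
    rotated = tuple(p[i] * cos_a + k_cross_p[i] * sin_a + k[i] * k_dot_p * (1.0 - cos_a)
                    for i in range(3))
    return tuple(r + c for r, c in zip(rotated, center))


print(rotate_axis_angle((2.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))
```

The `Single Axis` method corresponds to fixing the axis to X, Y or Z; `Euler` rotation would compose three such rotations.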

Reviewed By: #cycles, brecht

Differential Revision: https://developer.blender.org/D3789
2020-02-17 15:43:18 +00:00
Lukas Stockner
e760972221 Cycles: support for custom shader AOVs
Custom render passes are added in the Shader AOVs panel in the view layer
settings, with a name and data type. In shader nodes, an AOV Output node
is then used to output either a value or color to the pass.

Arbitrary names can be used for these passes, as long as they don't conflict
with built-in passes that are enabled. The AOV Output node can be used in both
material and world shader nodes.

Implemented by Lukas, with tweaks by Brecht.

Differential Revision: https://developer.blender.org/D4837
2019-12-10 20:44:46 +01:00
Campbell Barton
d310cbfa0f Merge branch 'blender-v2.81-release' 2019-10-29 01:38:34 +11:00
Campbell Barton
312075e688 CMake: add missing headers, use space before comments 2019-10-29 01:33:44 +11:00
Stefan Werner
35a545b752 Cycles: Allow PTX targets for CUDA kernel build.
This is intended for developers on Windows primarily:
Now, CUDA architectures of type compute_xx are supported. This allows for quicker builds,
at the expense of the CUDA driver running ptxas the first time a kernel is loaded.
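
To illustrate the distinction, a build helper might map each architecture entry to either a PTX or a cubin compile. The `--ptx`, `--cubin` and `-arch` options are standard nvcc flags, but the function name, output file names and exact command line are assumptions; the real Cycles CMake rules pass additional flags.

```python
from typing import List


def nvcc_args_for_arch(arch: str, source: str) -> List[str]:
    """Return a hypothetical nvcc command line for one architecture entry."""
    if arch.startswith("compute_"):
        # Virtual architecture: emit PTX; the driver runs ptxas when the kernel is first loaded.
        return ["nvcc", "--ptx", f"-arch={arch}", source, "-o", f"kernel_{arch}.ptx"]
    if arch.startswith("sm_"):
        # Real architecture: emit a cubin ahead of time (slower build, no JIT at load).
        return ["nvcc", "--cubin", f"-arch={arch}", source, "-o", f"kernel_{arch}.cubin"]
    raise ValueError(f"unrecognized CUDA architecture: {arch}")


print(" ".join(nvcc_args_for_arch("compute_75", "kernel.cu")))
```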

Differential Revision: https://developer.blender.org/D5953
2019-10-16 10:29:04 +02:00
Patrick Mours
a2b52dc571 Cycles: add Optix device backend
This uses hardware-accelerated raytracing on NVIDIA RTX graphics cards.

It is still currently experimental. Most features are supported, but a few
are still missing like baking, branched path tracing and using CPU memory.
https://wiki.blender.org/wiki/Reference/Release_Notes/2.81/Cycles#NVIDIA_RTX

For building with Optix support, the Optix SDK must be installed. See here for
build instructions:
https://wiki.blender.org/wiki/Building_Blender/CUDA

Differential Revision: https://developer.blender.org/D5363
2019-09-13 11:50:11 +02:00
OmarSquircleArt
2ea82e86ca Shading: Add Vertex Color node.
This patch adds a new Vertex Color node. The node also returns the alpha
of the vertex color layer as an output.

Reviewers: brecht

Differential Revision: https://developer.blender.org/D5767
2019-09-12 17:42:13 +02:00
OmarSquircleArt
baaa89a0bc Shading: Rewrite Mapping node with dynamic inputs.
This patch rewrites the Mapping node to support dynamic inputs. The
Max and Min options have been removed. They can be added as Min and
Max Vector Math nodes manually.

Texture nodes still use the old matrix-based mapping. A new SVM node
`NODE_TEXTURE_MAPPING` has been added to preserve this functionality.
Similarly, in GLSL, a `mapping_mat4` function has been added.

Reviewers: brecht, JacquesLucke
2019-09-04 23:17:13 +02:00
OmarSquircleArt
23564583a4 Shading: Extend Noise node to other dimensions.
This patch extends Perlin noise to operate in 1D, 2D, 3D, and 4D
space. The noise code has also been refactored to be more readable.

The Color output and distortion patterns changed, so this patch
breaks backward compatibility. This is due to the fact that we
now use random offsets as noise seeds, as opposed to swizzling
and constant offsets.

Reviewers: brecht, JacquesLucke

Differential Revision: https://developer.blender.org/D5560
2019-09-04 17:54:32 +02:00
OmarSquircleArt
133dfdd704 Shading: Add White Noise node.
The White Noise node hashes the input and returns a random number in the
range [0, 1]. The input can be a 1D, 2D, 3D, or a 4D vector.
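
The idea of hashing the input into a random number can be sketched as below. This uses BLAKE2b from the Python standard library purely for illustration; it is not the hash function Blender uses, and the function name is hypothetical.

```python
import hashlib
import struct
from typing import Sequence


def white_noise(coords: Sequence[float]) -> float:
    """Deterministic pseudo-random value in [0, 1) for a 1D to 4D input.

    Uses BLAKE2b purely for illustration; it is not the hash Blender uses.
    """
    if not 1 <= len(coords) <= 4:
        raise ValueError("white noise input must have 1 to 4 components")
    # Hash the 32-bit float bit patterns plus the dimension count,
    # so that (x,) and (x, 0.0) produce different values.
    data = struct.pack(f"<B{len(coords)}f", len(coords), *coords)
    bits = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")
    # Keep the top 53 bits so the division is exact and the result stays below 1.
    return (bits >> 11) / float(1 << 53)


print(white_noise((0.1, 0.2, 0.3)))
```

The same input always produces the same value, while neighboring inputs produce uncorrelated values, which is what distinguishes white noise from the smooth Noise Texture.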

Reviewers: brecht, JacquesLucke

Differential Revision: https://developer.blender.org/D5550
2019-08-21 20:04:09 +02:00
OmarSquircleArt
313b789289 Shading: Add Clamp node to Cycles and EEVEE.
This patch adds a new node that clamps a value between a minimum and
a maximum value.

Reviewers: brecht

Differential Revision: https://developer.blender.org/D5476
2019-08-13 22:22:15 +02:00
OmarSquircleArt
71641ab56d Shading: Add Map Range node to Cycles and EEVEE.
This patch adds a new Map Range node that linearly remaps an input
value from one range to another. This node is similar to the compositor's
Map Range node.
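
The linear remapping is to_min + (value - from_min) / (from_max - from_min) * (to_max - to_min). A minimal sketch follows; the function name is hypothetical and the behavior for a zero-width source range is an assumption, not taken from the node's implementation.

```python
def map_range(value: float, from_min: float, from_max: float,
              to_min: float, to_max: float) -> float:
    """Linearly remap value from [from_min, from_max] to [to_min, to_max].

    Values outside the source range are extrapolated (no clamping).
    """
    if from_max == from_min:
        # Assumption: a degenerate source range maps to to_min.
        return to_min
    t = (value - from_min) / (from_max - from_min)
    return to_min + t * (to_max - to_min)


print(map_range(0.25, 0.0, 1.0, -1.0, 1.0))
```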

Reviewers: brecht, JacquesLucke

Differential Revision: https://developer.blender.org/D5471
2019-08-13 16:38:56 +02:00
Brecht Van Lommel
b84db342a5 Fix build errors with older GCC versions like 4.9
We can add more fine-grained checks for when these flags are supported, so
that adding asan flags manually still has all the workarounds, but for now
compiling successfully is more important.
2019-08-13 06:04:17 +02:00
Brecht Van Lommel
47bf754de4 Build: disable address sanitizer for Cycles optimized kernels with GCC
It's extremely slow to compile and run, so just disable it unless
WITH_CYCLES_KERNEL_ASAN is manually enabled. For Clang it's always
enabled since that appears to work ok.

This also limits the -fno-sanitize=vptr flag to the Cycles kernel, as it
was added specifically to work around an issue there.

Differential Revision: https://developer.blender.org/D5404
2019-08-05 15:23:57 +02:00
Campbell Barton
e12c08e8d1 ClangFormat: apply to source, most of intern
Apply clang format as proposed in T53211.

For details on usage and instructions for migrating branches
without conflicts, see:

https://wiki.blender.org/wiki/Tools/ClangFormat
2019-04-17 06:21:24 +02:00
Campbell Barton
5498e7f193 CMake: add library deps to CMakeLists.txt
Tested to work on Linux and macOS.

This will be enabled once all platforms are verified.

See D4684
2019-04-16 06:20:52 +02:00
Campbell Barton
813e470eac CMake: cleanup, arg rename, add definitions last 2019-04-16 06:15:18 +02:00
Brecht Van Lommel
65d95879f7 Cycles: upgrade to CUDA 10.1 as the one officially supported version.
This version fixes various bugs, and there is no need anymore to use both
9.1 and 10.0 for different cards.

There is a bug related to WITH_CYCLES_CUBIN_COMPILER and bump mapping in the
regression tests, so that remains disabled same as it was for CUDA 10.0.

Fix T59286: CUDA bake failing on some cards.
Fix T56858: CUDA 9.2 and 10 issues.
2019-03-15 16:52:28 +01:00
Jeroen Bakker
02a7e875d7 Cycles OpenCL: Remove single program
Part of the cleanup of the OpenCL codebase.
Single program is not effective when using OpenCL: it is slower
to compile and slower during rendering (for example when used in
`barbershop` or `victor`).

Reviewers: brecht, #cycles

Maniphest Tasks: T62267

Differential Revision: https://developer.blender.org/D4481
2019-03-08 16:31:35 +01:00
Jeroen Bakker
949ab753bb Cycles OpenCL: Remove OpenCL MegaKernel
Using the OpenCL MegaKernel has been slow and therefore not useful.
This patch removes the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.

T61736: removal of mega kernel
T61703: baking does not work with mega kernel

Tags: #cycles

Differential Revision: https://developer.blender.org/D4383
2019-02-20 15:17:22 +01:00
Jeroen Bakker
667033e89e T61463: Separate Baking kernels
Cycles OpenCL: Split baking kernels in own program

Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels and all 3 use shader evaluation. Only one of these
kernels had its functionality wrapped in the __NO_BAKING__ compile
directive.

When you start baking this leads to long compile times. Separating the
baking kernels into individual programs reduces the compile times.

Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.

Impact on compilation time

    job   |   scene_name    | previous |  new  | percentage
  --------+-----------------+----------+-------+------------
   T61463 | empty           |    10.63 |  7.27 |         32%
   T61463 | bmw             |    17.91 | 14.24 |         20%
   T61463 | fishycat        |    19.57 | 15.08 |         23%
   T61463 | barbershop      |    54.10 | 48.18 |         11%
   T61463 | classroom       |    17.55 | 14.42 |         18%
   T61463 | koro            |    18.92 | 17.15 |          9%
   T61463 | pavillion       |    17.43 | 14.23 |         18%
   T61463 | splash279       |    16.48 | 15.33 |          7%
   T61463 | volume_emission |    36.22 | 34.19 |          6%
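
The percentage column matches the relative reduction (previous - new) / previous, rounded to a whole percent. This short check, using rows copied from the tables, reproduces the listed values; the helper name is only for illustration.

```python
def percent_reduction(previous: float, new: float) -> int:
    """Relative reduction from previous to new, rounded to a whole percent."""
    return round((previous - new) / previous * 100)


compile_times = {
    "empty": (10.63, 7.27),
    "bmw": (17.91, 14.24),
    "barbershop": (54.10, 48.18),
    "volume_emission": (36.22, 34.19),
}
for scene, (previous, new) in compile_times.items():
    print(f"{scene:>16}: {percent_reduction(previous, new)}%")
```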

Impact on render time

    job   |   scene_name    | previous |   new   | percentage
  --------+-----------------+----------+---------+------------
   T61463 | empty           |    21.06 |   20.54 |          2%
   T61463 | bmw             |   198.44 |  189.59 |          4%
   T61463 | fishycat        |   394.20 |  388.50 |          1%
   T61463 | barbershop      |  1188.16 | 1185.49 |          0%
   T61463 | classroom       |   341.08 |  339.27 |          1%
   T61463 | koro            |   472.43 |  360.70 |         24%
   T61463 | pavillion       |   905.77 |  902.14 |          0%
   T61463 | splash279       |    55.26 |   54.92 |          1%
   T61463 | volume_emission |    62.59 |   39.09 |         38%

I don't have a grounded explanation for why koro and volume_emission are this much
faster; I have run several tests though...

Maniphest Tasks: T61463

Differential Revision: https://developer.blender.org/D4376
2019-02-19 16:34:55 +01:00
Brecht Van Lommel
9800837b98 Cycles: Support multithreaded compilation of kernels
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.

Patch by lukasstockner97, jbakker, brecht

      job   |   scene_name    | compilation_time
  ----------+-----------------+-----------------
   Baseline | empty           |            22.73
   D2264    | empty           |            13.94
   Baseline | bmw             |            56.44
   D2264    | bmw             |            41.32
   Baseline | fishycat        |            59.50
   D2264    | fishycat        |            45.19
   Baseline | barbershop      |           212.28
   D2264    | barbershop      |           169.81
   Baseline | victor          |            67.51
   D2264    | victor          |            53.60
   Baseline | classroom       |            51.46
   D2264    | classroom       |            39.02
   Baseline | koro            |            62.48
   D2264    | koro            |            49.03
   Baseline | pavillion       |            54.37
   D2264    | pavillion       |            38.82
   Baseline | splash279       |            47.43
   D2264    | splash279       |            37.94
   Baseline | volume_emission |           145.22
   D2264    | volume_emission |           121.10

This patch reduced compilation time as the split kernels and base
kernels are compiled in parallel. In Cycles debug mode (256) you can
unmark the OpenCL single program setting, which reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).

Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97

Reviewed By: brecht

Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli

Differential Revision: https://developer.blender.org/D2264
2019-02-15 08:56:20 +01:00
Brecht Van Lommel
765795aed7 Fix macOS buildbot build, wrong CUDA version check. 2018-12-11 14:16:48 +01:00
Brecht Van Lommel
f5b46daf52 Fix build with old CMake versions. 2018-12-05 12:53:19 +01:00
Brecht Van Lommel
f63da3dcf5 Buildbot: enable support for NVIDIA Turing cards in Cycles (like GTX 20xx).
We currently only build the sm_7x kernels with CUDA 10.0, older cards still
use 9.1 until rendering errors are solved for them.
2018-12-04 16:03:18 +01:00
Brecht Van Lommel
b14ec18601 Cycles: add initial CUDA 10.0 support, but only recommend use for Turing cards.
There may still be rendering errors when used for older graphics cards.
2018-12-04 16:03:18 +01:00
Lukas Stockner
7fa6f72084 Cycles: Add sample-based runtime profiler that measures time spent in various parts of the CPU kernel
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.

The results are not yet exposed in the user interface or via Python. To see the stats on the console, pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").

Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.
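
The sample-based approach can be modelled as follows: the render thread publishes which part of the kernel it is currently in, and a sampler periodically records that state, so the fraction of samples per event approximates the fraction of time spent there. This toy is single-threaded, with sample() called manually; the class and method names are hypothetical, and the real profiler runs a separate sampling thread alongside several render threads.

```python
from collections import Counter
from typing import Dict, Optional


class SamplingProfiler:
    """Toy model: a worker publishes what it is doing; a sampler counts it periodically."""

    def __init__(self) -> None:
        self.current_event: Optional[str] = None
        self.counts: Counter = Counter()
        self.total_samples = 0

    def set_event(self, event: Optional[str]) -> None:
        # Called by the render thread when it enters a new part of the kernel.
        self.current_event = event

    def sample(self) -> None:
        # Called at a fixed interval (in a real profiler, by a separate thread).
        self.total_samples += 1
        if self.current_event is not None:
            self.counts[self.current_event] += 1

    def fractions(self) -> Dict[str, float]:
        # Share of samples per event, approximating the share of time spent in it.
        return {event: n / self.total_samples for event, n in self.counts.items()}


profiler = SamplingProfiler()
profiler.set_event("intersect")
for _ in range(3):
    profiler.sample()
profiler.set_event("shader")
profiler.sample()
print(profiler.fractions())
```

Because the render thread only writes a single value, this style of profiler adds very little overhead compared to timing every call.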

Reviewers: brecht, sergey, swerner

Reviewed By: brecht, swerner

Differential Revision: https://developer.blender.org/D3892
2018-11-29 02:45:24 +01:00
Stefan Werner
2c5531c0a5 Cycles: Added Embree as BVH option for CPU renders.
Note that this is turned off by default and must be enabled at build time with the CMake WITH_CYCLES_EMBREE flag.
Embree must be built as a static library with ray masking turned on, the `make deps` scripts have been updated accordingly.
There, Embree is off by default too and must be enabled with the WITH_EMBREE flag.

Using Embree allows for much faster rendering of deformation motion blur while reducing the memory footprint.

TODO: GPU implementation, deduplication of data, leveraging more of Embree's features (e.g. tessellation cache).

Differential Revision: https://developer.blender.org/D3682
2018-11-07 12:58:12 +01:00
Stefan Werner
e58c6cf0c6 Cycles: Added Cryptomatte output.
This allows for extra output passes that encode automatic object and material masks
for the entire scene. It is an implementation of the Cryptomatte standard as
introduced by Psyop. A good future extension would be to add a manifest to the
export and to do plenty of testing to ensure that it is fully compatible with other
renderers and compositing programs that use Cryptomatte.

Internally, it adds the ability for Cycles to have several passes of the same type
that are distinguished by their name.

Differential Revision: https://developer.blender.org/D3538
2018-10-28 05:37:41 -04:00
Brecht Van Lommel
a0402074ed Fix wrong CUDA version warning in cmake.
Fix suggested by Dalai.
2018-09-19 16:24:45 +02:00