2638 Commits

Author SHA1 Message Date
Brecht Van Lommel
d7d40745fa Cycles: changes to source code folders structure
* Split render/ into scene/ and session/. The scene/ folder now contains the
  scene and its nodes. The session/ folder contains the render session and
  associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
  kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/

For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
2021-10-26 15:36:39 +02:00
Brecht Van Lommel
75704091fc Cycles: add additive AO support through Fast GI settings
Add a Fast GI Method, either Replace for the existing behavior, or Add
to add ambient occlusion like the old world settings.

This replaces the old Ambient Occlusion settings in the world properties.
2021-10-26 14:56:43 +02:00
Brecht Van Lommel
eb1fed9d60 Cycles: restore Denoising Depth pass, when enabling Denoising Data passes
This is still useful in some cases even if not used by OpenImageDenoise. In
the future this may be replaced with a more generic system to control render
passes and filtering, but for now this just does what it did before.
2021-10-26 14:48:44 +02:00
Brecht Van Lommel
16a8d0fab0 Cycles: change Position render pass to be not antialiased
Similar to the Depth pass: for compositing, the interpolated values between a
far and a near object can be nonsensical.
2021-10-26 14:48:44 +02:00
Brecht Van Lommel
c4b02bb6bc Fix Cycles HIP binaries always recompiling 2021-10-22 14:32:24 +02:00
Brecht Van Lommel
282516e53e Cleanup: refactor float/half conversions for clarity 2021-10-22 13:03:03 +02:00
Sayak Biswas
d092933abb Cycles: various fixes for HIP and compilation of HIP binaries
* Additional structs added to the hipew loader for device props
* Adds hipRTC functions to the loader for future usage
* Enables CPU+GPU usage for HIP
* Cleanup to the adaptive kernel compilation process
* Fix for kernel compilation failures with HIP with latest master

Ref T92393, D12958
2021-10-22 12:15:29 +02:00
Brecht Van Lommel
be558d2d97 Fix T92363: OptiX fails with ambient occlusion node, after recent changes
This triggered a compiler bug where it does not handle the sub.s16 PTX
instruction. Instead refactor the code so we don't need to do uint16_t
subtraction at all.

Also update OptiX device to remove the AO pass direct callable.

Thanks Patrick Mours for figuring this out.
2021-10-21 21:25:34 +02:00
Brecht Van Lommel
df00463764 Cycles: add shadow path compaction for GPU rendering
Similar to main path compaction that happens before adding work tiles, this
compacts shadow paths before launching kernels that may add shadow paths.

Only do it when more than 50% of space is wasted.

It's not a clear win in all scenes, some are up to 1.5% slower. Likely caused
by different order of scheduling kernels having an unpredictable performance
impact. Still feels like compaction is just the right thing to avoid cases
where a few shadow paths can hold up a lot of main paths.

Differential Revision: https://developer.blender.org/D12944
2021-10-21 15:38:03 +02:00
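The 50% rule described in df00463764 can be illustrated with a minimal sketch. This is not the Cycles implementation: `should_compact_shadow_paths`, `compact_active_states`, and the use of a negative value to mark a free slot are hypothetical choices made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: decide whether compaction is worthwhile.
// `num_active` is the number of live shadow paths, `max_used_index` is one
// past the highest occupied slot in the state array.
bool should_compact_shadow_paths(std::size_t num_active, std::size_t max_used_index)
{
  assert(num_active <= max_used_index);
  const std::size_t wasted = max_used_index - num_active;
  // Compact only when more than 50% of the occupied range is wasted.
  return wasted * 2 > max_used_index;
}

// Hypothetical compaction: move live entries to the front, preserving order.
// A negative value marks an unused slot. Returns the number of live entries.
std::size_t compact_active_states(std::vector<int> &states)
{
  std::size_t write = 0;
  for (std::size_t read = 0; read < states.size(); ++read) {
    if (states[read] >= 0) {
      states[write++] = states[read];
    }
  }
  // Mark everything after the live entries as free.
  for (std::size_t i = write; i < states.size(); ++i) {
    states[i] = -1;
  }
  return write;
}
```

In this sketch a caller would check `should_compact_shadow_paths` before launching a kernel that may add shadow paths, and call `compact_active_states` only when it returns true.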
Brecht Van Lommel
7d111f4ac2 Cleanup: remove unused code 2021-10-20 18:15:21 +02:00
Brecht Van Lommel
52c5300214 Cleanup: some renaming to better distinguish main and shadow paths 2021-10-20 17:50:31 +02:00
Brecht Van Lommel
cccfa597ba Cycles: make ambient occlusion pass take into account transparency again
Taking advantage of the new decoupled main and shadow paths. For CPU we
just store two nested structs in the integrator state, one for direct light
shadows and one for AO. For the GPU we restrict the number of shade surface
states to be executed based on available space in the shadow paths queue.

This also helps improve performance in benchmark scenes with an AO pass,
since it is no longer needed to use the shader raytracing kernel there,
which has worse performance.

Differential Revision: https://developer.blender.org/D12900
2021-10-20 17:50:31 +02:00
Sayak Biswas
ba4e227def HIP device code cleanup and fix for high VRAM usage
This patch cleans up the code for the HIP device and makes it more consistent with the CUDA code.
It also fixes the issue with high VRAM usage on AMD cards using HIP, allowing better performance and usage on cards like the 6600XT.
Added a check in intern/cycles/kernel/bvh/bvh_util.h to prevent a compiler error with hipcc.

Reviewed By: brecht, leesonw

Maniphest Tasks: T92124

Differential Revision: https://developer.blender.org/D12834
2021-10-20 14:04:28 +02:00
Brecht Van Lommel
fd77a28031 Cycles: bake transparent shadows for hair
These transparent shadows can be expensive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.

Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated, for most shaders this will look practically the same as actual
shader evaluation.

Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.

Differential Revision: https://developer.blender.org/D12880
2021-10-19 15:11:09 +02:00
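The baked transparency idea in fd77a28031 (values stored at curve vertices, interpolated at the hit, and multiplied into the throughput without evaluating a shader) might look roughly like this. The struct and function names are hypothetical, and a scalar transparency stands in for whatever Cycles actually stores.

```cpp
#include <cassert>

// Hypothetical per-segment data: transparency baked at the two curve vertices.
struct BakedSegment {
  float transparency_v0;
  float transparency_v1;
};

// Linearly interpolate the baked transparency at parameter u in [0, 1].
float baked_transparency(const BakedSegment &seg, float u)
{
  assert(u >= 0.0f && u <= 1.0f);
  return (1.0f - u) * seg.transparency_v0 + u * seg.transparency_v1;
}

// Accumulate throughput for one hit directly during intersection.
// Returns false once the ray is fully blocked so the caller can stop tracing.
bool accumulate_shadow_hit(float &throughput, const BakedSegment &seg, float u)
{
  throughput *= baked_transparency(seg, u);
  return throughput > 0.0f;
}
```

Because no hit is recorded and no shader runs, a shadow ray through many hair layers can be resolved in one trace in this model.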
Brecht Van Lommel
d06828f0b8 Cycles: avoid intermediate stack array for writing shadow intersections
Helps save one OptiX payload and is a bit more efficient.

Differential Revision: https://developer.blender.org/D12909
2021-10-19 15:10:55 +02:00
Brecht Van Lommel
943e73b07e Cycles: decouple shadow paths from main path on GPU
The motivation for this is twofold. It improves performance (5-10% on most
benchmark scenes), and will help to bring back transparency support for the
ambient occlusion pass.

* Duplicate some members from the main path state in the shadow path state.
* Add shadow paths incrementally to the array similar to what we do for
  the shadow catchers.
* For the scheduling, allow running shade surface and shade volume kernels
  as long as there is enough space in the shadow paths array. If not, execute
  shadow kernels until it is empty.

* Add IntegratorShadowState and ConstIntegratorShadowState typedefs that
  can be different between CPU and GPU. For GPU both main and shadow paths
  just have an integer for SoA access. But with CPU it's a different pointer
  type so we get type safety checks in code shared between CPU and GPU.
* For CPU, add a separate IntegratorShadowStateCPU struct embedded in
  IntegratorShadowState.
* Update various functions to take the shadow state, and make SVM take either
  type of state using templates.

Differential Revision: https://developer.blender.org/D12889
2021-10-19 15:09:29 +02:00
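The scheduling rule from 943e73b07e (run shading kernels while the shadow path array has room, otherwise drain shadow kernels) can be sketched as below. `QueueCounts`, `NextKernel`, and `pick_next_kernel` are hypothetical, and the sketch assumes each shaded main path creates at most one shadow path and that the shadow array is at least as large as the number of queued main paths.

```cpp
#include <cassert>

enum class NextKernel { Shade, Shadow, None };

// Hypothetical, simplified view of the scheduler's counters.
struct QueueCounts {
  int queued_shade;      // main paths waiting for a shade surface/volume kernel
  int num_shadow_paths;  // shadow paths currently occupying the shadow array
  int shadow_capacity;   // total slots in the shadow path array
};

NextKernel pick_next_kernel(const QueueCounts &q)
{
  assert(q.num_shadow_paths <= q.shadow_capacity);
  assert(q.queued_shade <= q.shadow_capacity);
  const int free_slots = q.shadow_capacity - q.num_shadow_paths;
  // Shading may add one shadow path per main path, so require enough room.
  if (q.queued_shade > 0 && free_slots >= q.queued_shade) {
    return NextKernel::Shade;
  }
  // Otherwise execute shadow kernels until the array is empty again.
  if (q.num_shadow_paths > 0) {
    return NextKernel::Shadow;
  }
  return NextKernel::None;
}
```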
Brecht Van Lommel
a395a1b36b Cleanup: fix compiler warnings 2021-10-19 12:59:05 +02:00
Sergey Sharybin
c107a3c4d9 Fix invalid principled diffuse in Cycles OSL
Need to initialize components for the full Diffuse BSDF.

Steps to reproduce:
- Default cube scene
- Switch to Cycles renderer
- Enable OSL backend
- Start viewport render
- Observe that the cube renders almost black

Differential Revision: https://developer.blender.org/D12921
2021-10-19 12:10:29 +02:00
Sergey Sharybin
765eba5a6e Cleanup: More readable Cycles OSL BSDF definition
Add a Clang-Format configuration so that the closure definition block is
properly recognized as such.

Also small wrapper macro to avoid comma in the actual definition code
which was causing unwanted indentation of parameters definition.

Requires Clang-Format 7 or newer. The version we ship in the libs is
12, so for recommended development setup it should all be good.

Differential Revision: https://developer.blender.org/D12920
2021-10-19 11:59:26 +02:00
Campbell Barton
695dc07cb1 Cleanup: clang-format 2021-10-19 18:31:15 +11:00
Brecht Van Lommel
41eba47a87 Revert "Cycles: optimize volume stack copying for shadow catcher/compaction"
This reverts commit 3065d26097. Causing crashes
in the spring scene.
2021-10-18 22:38:33 +02:00
Brecht Van Lommel
a9cb330815 Cleanup: minor refactoring in preparation of main and shadow path decoupling
Ref D12889
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
2430f75279 Cycles: reduce GPU state memory a little
* isect Ng is no longer needed for shadows, for main path needed for SSS only
* Reduce rng_offset and queued_kernel to 16 bits

Ref D12889
2021-10-18 19:02:10 +02:00
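Reducing `rng_offset` and `queued_kernel` to 16 bits, as 2430f75279 describes, lets the two fields share 32 bits of state. A minimal sketch of that saving follows. The struct layout and the pack/unpack helpers are hypothetical and are not taken from the Cycles sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduced state: two 16-bit members instead of two 32-bit ones.
struct PackedPathFields {
  uint16_t rng_offset;
  uint16_t queued_kernel;
};
static_assert(sizeof(PackedPathFields) == 4, "two 16-bit fields fit in 4 bytes");

// Hypothetical helpers for storing both values in a single 32-bit word.
inline uint32_t pack_u16_pair(uint16_t lo, uint16_t hi)
{
  return uint32_t(lo) | (uint32_t(hi) << 16);
}

inline uint16_t unpack_lo(uint32_t packed)
{
  return uint16_t(packed & 0xFFFFu);
}

inline uint16_t unpack_hi(uint32_t packed)
{
  return uint16_t(packed >> 16);
}
```

The trade-off is the value range: anything stored this way must stay below 65536.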
Brecht Van Lommel
3065d26097 Cycles: optimize volume stack copying for shadow catcher/compaction
Only copy the number of items used instead of the max items.

Ref D12889
2021-10-18 19:02:10 +02:00
Brecht Van Lommel
fc4b1fede3 Cleanup: consistently use uint32_t for path flag 2021-10-18 19:02:10 +02:00
Brecht Van Lommel
1df3b51988 Cycles: replace integrator state argument macros
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
  that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
  replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros

In preparation for decoupling main and shadow paths.

Differential Revision: https://developer.blender.org/D12888
2021-10-18 19:02:10 +02:00
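The typedef approach in 1df3b51988 can be illustrated with a reduced sketch: the CPU uses a pointer to a struct, the GPU an integer index into structure-of-arrays storage, and shared code receives the state as an explicit argument. Everything prefixed `SKETCH_` or `sketch_`, and the `PathData` layout, is hypothetical. Only the typedef name `IntegratorState` comes from the commit message.

```cpp
#include <cassert>

// Hypothetical, heavily reduced state layout.
struct PathData {
  int bounce;
};
struct IntegratorStateCPU {
  PathData path;
};

#ifdef SKETCH_GPU_DEVICE
// GPU flavour: structure-of-arrays storage indexed by an integer handle.
struct PathDataSoA {
  int *bounce;
};
struct IntegratorStateGPU {
  PathDataSoA path;
};
extern IntegratorStateGPU sketch_gpu_state;
typedef int IntegratorState;
#  define SKETCH_STATE(state, nested, member) (sketch_gpu_state.nested.member[state])
#else
// CPU flavour: the state is a typed pointer.
typedef IntegratorStateCPU *IntegratorState;
#  define SKETCH_STATE(state, nested, member) ((state)->nested.member)
#endif

// Shared kernel code takes the state as an explicit argument instead of
// relying on argument-list macros.
inline void sketch_next_bounce(IntegratorState state)
{
  assert(SKETCH_STATE(state, path, bounce) >= 0);
  SKETCH_STATE(state, path, bounce) += 1;
}
```

With a distinct pointer type per kind of state on the CPU, passing the wrong state to a function becomes a compile error, which is the type safety benefit that the later decoupling commit mentions.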
Charlie Jolly
78b5050ff4 Cycles: Voronoi noise, fix uninitialised variable
Caused a debug crash in Windows MSVS.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D12873
2021-10-15 15:01:10 +01:00
Brecht Van Lommel
2f36762def Cleanup: refactor BVH2 shadow intersection for upcoming changes 2021-10-15 15:42:44 +02:00
Brecht Van Lommel
5d565062ed Cleanup: refactor OptiX shadow intersection for upcoming changes 2021-10-15 15:42:44 +02:00
Brecht Van Lommel
eb71157e2a Cleanup: add utility functions for packing integers 2021-10-15 15:42:44 +02:00
Brecht Van Lommel
2ba7c3aa65 Cleanup: refactor to make number of channels for shader evaluation variable 2021-10-15 15:42:44 +02:00
Brecht Van Lommel
53f25df5bc Fix T92128: Cycles CUDA wrong hair attributes, after recent changes 2021-10-15 15:42:44 +02:00
Michael Jones
a0f269f682 Cycles: Kernel address space changes for MSL
This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation.

MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness.

The vast majority of deltas in this patch fall into one of two cases:

- Ensuring ccl_private is specified for thread-local pointer types
- Ensuring ccl_global is specified for device-wide pointer types

Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant.

In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture.

The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation.

Ref T92212

Reviewed By: brecht

Maniphest Tasks: T92212

Differential Revision: https://developer.blender.org/D12864
2021-10-14 16:14:43 +01:00
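A small sketch of the address space convention described in a0f269f682. The empty macro definitions are an assumption for a plain CPU build. On a Metal build they would expand to MSL address space keywords such as `device` and `thread`. The function `accumulate_values` is hypothetical.

```cpp
#include <cassert>

// Assumed CPU-side definitions: address spaces do not exist in standard C++,
// so the qualifiers expand to nothing.
#ifndef ccl_global
#  define ccl_global
#endif
#ifndef ccl_private
#  define ccl_private
#endif

// Hypothetical kernel helper. The address space qualifier is written first,
// before "const", as the commit message describes.
inline void accumulate_values(ccl_global const float *values,
                              int num_values,
                              ccl_private float *result)
{
  assert(num_values >= 0);
  for (int i = 0; i < num_values; ++i) {
    *result += values[i];
  }
}
```

Since the qualifiers vanish on the CPU, the same source compiles unchanged there while still carrying the information an MSL compiler requires.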
Sergey Sharybin
aa46459543 Fix shadow catcher behind transparent object on GPU
The assumption about absent shadow path was wrong.

The rest of the changes are to ensure shadow paths are finished prior
to the split, so that they write to the proper passes.

The issue was caught by running regression tests on OptiX.

Differential Revision: https://developer.blender.org/D12857
2021-10-14 09:39:38 +02:00
Campbell Barton
c1c6c11ca6 Cleanup: spelling in comments 2021-10-12 17:55:02 +11:00
Brecht Van Lommel
a94343a8af Cycles: improve SSS Fresnel and retro-reflection in Principled BSDF
For details see the "Extending the Disney BRDF to a BSDF with Integrated
Subsurface Scattering" paper.

We split the diffuse BSDF into a lambertian and retro-reflection component.
The retro-reflection component is always handled as a BSDF, while the
lambertian component can be replaced by a BSSRDF.

For the BSSRDF case, we compute Fresnel separately at the entry and exit
points, which may have different normals. As the scattering radius decreases
this converges to the BSDF case.

A downside is that this increases noise for subsurface scattering in the
Principled BSDF, due to some samples going to the retro-reflection component.
However, the previous logic (also in 2.93) was simply wrong, using a
nonsensical view direction vector at the exit point. We use an importance
sampling weight estimate for the retro-reflection to try to better balance
samples between the BSDF and BSSRDF.

Differential Revision: https://developer.blender.org/D12801
2021-10-11 18:22:54 +02:00
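The split of the diffuse lobe into a Lambertian and a retro-reflection component can be sketched from the formulation in the paper cited in a94343a8af. The code below follows that published formula as an assumption. It is a scalar illustration with hypothetical names, and the actual Cycles closure code may differ.

```cpp
#include <cassert>
#include <cmath>

struct DiffuseSplit {
  float lambert;  // component that a BSSRDF can replace
  float retro;    // component that always stays a BSDF
};

// Schlick-style weight (1 - cos)^5, clamped to [0, 1].
inline float schlick_weight(float cos_theta)
{
  const float m = std::fmin(std::fmax(1.0f - cos_theta, 0.0f), 1.0f);
  const float m2 = m * m;
  return m2 * m2 * m;
}

// cos_l, cos_v: cosines of light and view directions with the normal.
// cos_d: cosine between the light direction and the half vector.
DiffuseSplit disney_diffuse_split(
    float base_color, float roughness, float cos_l, float cos_v, float cos_d)
{
  const float inv_pi = 0.318309886f;
  const float FL = schlick_weight(cos_l);
  const float FV = schlick_weight(cos_v);
  const float RR = 2.0f * roughness * cos_d * cos_d;

  DiffuseSplit out;
  out.lambert = base_color * inv_pi * (1.0f - 0.5f * FL) * (1.0f - 0.5f * FV);
  out.retro = base_color * inv_pi * RR * (FL + FV + FL * FV * (RR - 1.0f));
  return out;
}
```

At normal incidence the retro-reflection term vanishes and only the Lambertian part remains. It grows toward grazing angles and with roughness.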
Brecht Van Lommel
73a05ff9e8 Cycles: restore Christensen-Burley SSS
There is not enough time before the release to improve Random Walk to handle
all cases this was used for, so restore it for now.

Since there is no more path splitting in cycles-x, this can increase noise in
non-flat areas for the same number of samples, though fewer rays will be traced
also. This is fundamentally a trade-off we made in the new design and why Random
Walk is a better fit. However the importance resampling we do now does help to
reduce noise.

Differential Revision: https://developer.blender.org/D12800
2021-10-11 18:22:54 +02:00
Brecht Van Lommel
736be7cf58 Fix T91997: Cycles glass + SSS not rendering correctly 2021-10-08 16:11:02 +02:00
Sergey Sharybin
f01c4f27f9 Fix Cycles speed regression after dynamic volume stack change
Only copy required part of volume stack instead of entire stack.

Solves the time regression introduced by D12759 and avoids the need to
implement a volume stack calculation that exactly matches what the
path tracing will do (and potentially makes scenes with
a lot of volumes and a tiny bit of deeply nested ones render
faster).

The memory aspect of the regression still needs to be looked into, but
that is for a separate patch.

Ref T92014

Maniphest Tasks: T92014

Differential Revision: https://developer.blender.org/D12790
2021-10-08 15:44:03 +02:00
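Copying only the required part of a volume stack, as f01c4f27f9 describes, can be sketched as follows. The sketch assumes the stack is terminated by a sentinel entry. The constants, struct, and function name are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr int SKETCH_SHADER_NONE = -1;                  // hypothetical sentinel
constexpr std::size_t SKETCH_MAX_VOLUME_STACK_SIZE = 32;  // hypothetical capacity

struct VolumeStackEntry {
  int object;
  int shader;
};

// Copy entries up to and including the terminating sentinel, instead of
// always copying SKETCH_MAX_VOLUME_STACK_SIZE entries.
// Returns the number of entries copied.
std::size_t copy_used_volume_stack(const VolumeStackEntry *src, VolumeStackEntry *dst)
{
  assert(src != nullptr && dst != nullptr);
  std::size_t i = 0;
  for (; i < SKETCH_MAX_VOLUME_STACK_SIZE; ++i) {
    dst[i] = src[i];
    if (src[i].shader == SKETCH_SHADER_NONE) {
      return i + 1;
    }
  }
  return i;
}
```

For a scene where most paths sit in one or two volumes, this touches a handful of entries per copy even when the allocated stack is deep.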
Campbell Barton
de07bf2b13 Cleanup: spelling 2021-10-08 13:23:19 +11:00
Brecht Van Lommel
4ee97f129a Cleanup: remove unnecessary data from LocalIntersection 2021-10-07 21:35:24 +02:00
Brecht Van Lommel
04857cc8ef Cycles: fully decouple triangle and curve primitive storage from BVH2
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.

Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object

Fixes T91968, T91770, T91902

Differential Revision: https://developer.blender.org/D12766
2021-10-06 17:52:04 +02:00
Sergey Sharybin
0194e54fd3 Fix compilation error with MSVC
MSVC does not support variable size array definition.
Use maximum possible stack, similar to the GPU case.

Not expected to have user-measurable difference.
2021-10-06 16:51:07 +02:00
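The MSVC fix in 0194e54fd3 replaces a runtime-sized stack array with one sized by the compile-time maximum. A minimal sketch of that pattern, with a hypothetical constant and function:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t SKETCH_MAX_VOLUME_STACK_SIZE = 32;  // hypothetical maximum

struct StackEntry {
  int object;
  int shader;
};

// `StackEntry stack[used_size];` would be a variable-length array, which is
// not standard C++ and is rejected by MSVC. Declare the maximum instead and
// use only the first `used_size` entries.
int sum_objects_on_stack(std::size_t used_size)
{
  assert(used_size <= SKETCH_MAX_VOLUME_STACK_SIZE);
  StackEntry stack[SKETCH_MAX_VOLUME_STACK_SIZE];
  for (std::size_t i = 0; i < used_size; ++i) {
    stack[i] = {static_cast<int>(i), 0};
  }
  int sum = 0;
  for (std::size_t i = 0; i < used_size; ++i) {
    sum += stack[i].object;
  }
  return sum;
}
```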
Sergey Sharybin
c6275da852 Fix T91922: Cycles artifacts with high volume nested level
Make volume stack allocated conditionally, potentially based on the
actual nested level of objects in the scene.

Currently the nested level is estimated by the number of volume objects.
This is an inexpensive check which is probably enough in practice
to get almost perfect memory usage and performance.

The conditional allocation is a bit tricky.

For the CPU we declare and define maximum possible volume stack,
because there are only that many integrator states on the CPU.

On the GPU we declare the outer SoA to have all volume stack elements,
but only allocate the ones actually needed. The actually used volume
stack size is passed as a preprocessor definition, which seems to be the
easiest and fastest option for the GPU state copy.

There seems to be no speed regression in the demo files on RTX6000.

Note that scenes with high nested level of volume will now be slower
but correct.

Differential Revision: https://developer.blender.org/D12759
2021-10-06 15:46:32 +02:00
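The estimate in c6275da852 (nesting level approximated by the number of volume objects) could be sketched like this. The cap value, the extra slot for a terminating entry, and the function name are assumptions of this sketch, not details confirmed by the commit message.

```cpp
#include <algorithm>
#include <cassert>

constexpr int SKETCH_MAX_VOLUME_STACK_SIZE = 32;  // hypothetical upper bound

// Estimate the needed volume stack size from the number of volume objects.
// Nesting can never be deeper than the number of volumes in the scene, so
// this is a cheap upper bound rather than an exact nesting analysis.
int estimate_volume_stack_size(int num_volume_objects)
{
  assert(num_volume_objects >= 0);
  // One extra slot for a terminating entry (assumption of this sketch).
  return std::min(num_volume_objects + 1, SKETCH_MAX_VOLUME_STACK_SIZE);
}
```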
Brecht Van Lommel
03f8c1abd0 Build: add ccache support for CUDA kernels on Linux 2021-10-06 14:21:26 +02:00
Mikhail Matrosov
ca0450feef Fix T91064: Cycles low poly meshes having black edges when shade smoothed
Fixes:{T91064}

Caused by {rBcd118c5581f482afc8554ff88b5b6f3b552b1682}

- Applies `ensure_valid_reflection()` to the normal input on all BSDFs for CPU and GPU.
- This doesn't affect hair.
- Removes `ensure_valid_reflection()` from the output of Bump Map and Normal Map nodes for CPU/GPU as it is not needed.
- The fix doesn't touch OSL.

Reviewed By: brecht, leesonw

Maniphest Tasks: T91064

Differential Revision: https://developer.blender.org/D12403
2021-10-06 10:25:09 +02:00
Campbell Barton
df8f507f41 Cleanup: spelling in comments 2021-10-06 14:54:05 +11:00
Jesse Yurkovich
76de3ac4ce Cleanup: Remove data duplication from various lookup tables in Cycles
This effectively undoes some of the following commit:
rB4537e8558468c71a03bf53f59c60f888b3412de2

The tables in question were duplicated 5-6 times into the blender
executable due to the headers being used in multiple translation units.
This contributes ~6.3kb worth of duplicate data into the binary.

Some further details are in the below revision.

Differential Revision: https://developer.blender.org/D12724
2021-10-05 19:09:01 -07:00
Sergey Sharybin
6e268a749f Fix adaptive sampling artifacts on tile boundaries
Implement overscan support for tiles, so that adaptive sampling can
rely on the pixel neighbourhood.

Differential Revision: https://developer.blender.org/D12599
2021-10-05 16:19:14 +02:00
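Tile overscan as described in 6e268a749f means rendering a border of extra pixels around each tile so that neighbourhood-based filters see valid data at tile edges. A minimal sketch, assuming the overscan region is clamped to the image bounds. `Rect` and `tile_with_overscan` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct Rect {
  int x, y, width, height;
};

// Grow a tile by `overscan` pixels on every side, clamped to the image.
Rect tile_with_overscan(const Rect &tile, int overscan, int image_width, int image_height)
{
  assert(overscan >= 0);
  const int x0 = std::max(tile.x - overscan, 0);
  const int y0 = std::max(tile.y - overscan, 0);
  const int x1 = std::min(tile.x + tile.width + overscan, image_width);
  const int y1 = std::min(tile.y + tile.height + overscan, image_height);
  return Rect{x0, y0, x1 - x0, y1 - y0};
}
```

Only the original tile area would be written to the final image. The overscan pixels exist so the adaptive sampling filter has neighbours to read.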
Brecht Van Lommel
55b8fc718a Cycles: improve detection of HIP compiler for buildbot
And fix various broken things in the HIP kernel compilation.
2021-10-05 13:47:50 +02:00