blender

Author	SHA1	Message	Date
Brecht Van Lommel	df00463764	Cycles: add shadow path compaction for GPU rendering Similar to the main path compaction that happens before adding work tiles, this compacts shadow paths before launching kernels that may add shadow paths. Only do it when more than 50% of the space is wasted. It's not a clear win in all scenes; some are up to 1.5% slower, likely because the different order of scheduling kernels has an unpredictable performance impact. Still, compaction feels like the right thing to do, to avoid cases where a few shadow paths can hold up a lot of main paths. Differential Revision: https://developer.blender.org/D12944	2021-10-21 15:38:03 +02:00
Brecht Van Lommel	0c52eed863	Cycles: more accurately count main paths for adding work tiles Easy now thanks to the main and shadow path decoupling. Doesn't help in any benchmark scene except Spring, where it reduces render time by maybe 2-3%. Ref T87836	2021-10-20 17:50:31 +02:00
Brecht Van Lommel	52c5300214	Cleanup: some renaming to better distinguish main and shadow paths	2021-10-20 17:50:31 +02:00
Brecht Van Lommel	cccfa597ba	Cycles: make ambient occlusion pass take into account transparency again Taking advantage of the new decoupled main and shadow paths. For the CPU we just store two nested structs in the integrator state, one for direct light shadows and one for AO. For the GPU we restrict the number of shade surface states to be executed based on the available space in the shadow paths queue. This also improves performance in benchmark scenes with an AO pass, since the shader raytracing kernel, which has worse performance, is no longer needed there. Differential Revision: https://developer.blender.org/D12900	2021-10-20 17:50:31 +02:00
Brecht Van Lommel	943e73b07e	Cycles: decouple shadow paths from main path on GPU The motivation for this is twofold. It improves performance (5-10% on most benchmark scenes), and it will help to bring back transparency support for the ambient occlusion pass. * Duplicate some members from the main path state in the shadow path state. * Add shadow paths incrementally to the array, similar to what we do for the shadow catchers. * For the scheduling, allow running shade surface and shade volume kernels as long as there is enough space in the shadow paths array. If not, execute shadow kernels until it is empty. * Add IntegratorShadowState and ConstIntegratorShadowState typedefs that can be different between CPU and GPU. For GPU, both main and shadow paths just have an integer for SoA access. But with CPU it's a different pointer type, so we get type safety checks in code shared between CPU and GPU. * For CPU, add a separate IntegratorShadowStateCPU struct embedded in IntegratorShadowState. * Update various functions to take the shadow state, and make SVM take either type of state using templates. Differential Revision: https://developer.blender.org/D12889	2021-10-19 15:09:29 +02:00
Campbell Barton	c5a13ffcb4	Cleanup: spelling in comments	2021-10-18 12:13:10 +11:00
Sergey Sharybin	aa46459543	Fix shadow catcher behind transparent object on GPU The assumption about an absent shadow path was wrong. The rest of the changes ensure shadow paths are finished prior to the split, so that they write to the proper passes. The issue was caught by running regression tests on OptiX. Differential Revision: https://developer.blender.org/D12857	2021-10-14 09:39:38 +02:00
Sergey Sharybin	cc04399937	Fix missing Cycles volume stack re-allocation Need to check the allocation size, since the features do not change with volume stack depth detection.	2021-10-12 11:55:23 +02:00
Sergey Sharybin	719c319055	Fix Cycles long start on scene without volumes The state template iteration had a difficult time dealing with 0-sized arrays, causing iteration to continue until the integer overflowed.	2021-10-07 15:54:56 +02:00
Sergey Sharybin	c6275da852	Fix T91922: Cycles artifacts with high volume nested level Make the volume stack allocated conditionally, potentially based on the actual nesting level of objects in the scene. Currently the nesting level is estimated by the number of volume objects. This is an inexpensive check which is probably enough in practice to get almost perfect memory usage and performance. The conditional allocation is a bit tricky. For the CPU we declare and define the maximum possible volume stack, because there are only that many integrator states on the CPU. On the GPU we declare the outer SoA to have all volume stack elements, but only allocate the ones actually needed. The actually used volume stack size is passed as a pre-processor define, which seems to be the easiest and fastest option for the GPU state copy. There seems to be no speed regression in the demo files on RTX6000. Note that scenes with a high volume nesting level will now be slower but correct. Differential Revision: https://developer.blender.org/D12759	2021-10-06 15:46:32 +02:00
Sergey Sharybin	6e268a749f	Fix adaptive sampling artifacts on tile boundaries Implement overscan support for tiles, so that adaptive sampling can rely on the pixel neighbourhood. Differential Revision: https://developer.blender.org/D12599	2021-10-05 16:19:14 +02:00
Sergey Sharybin	9a0850c8c2	Cycles: Fix wrong GPU state calculation It was only used for logging so far, but it is better to fix the size so that it matches reality. The issue was caused by decoupling the number of shadow intersections and using a much higher number for the CPU. This caused the total state on the GPU to be logged as tens of gigabytes instead of hundreds of megabytes. Differential Revision: https://developer.blender.org/D12755	2021-10-05 16:09:31 +02:00
Brecht Van Lommel	a754e35198	Cycles: refactor API for GPU display * Split GPUDisplay into two classes: PathTraceDisplay to implement the Cycles side, and DisplayDriver to implement the host application side. The DisplayDriver is now a fully abstract base class, embedded in the PathTraceDisplay. * Move the copy_pixels_to_texture implementation out of the host side into the Cycles side, since it can be implemented in terms of the texture buffer mapping. * Move the definition of DeviceGraphicsInteropDestination into the display driver header, so that we do not need to expose private device headers in the public API. * Add more detailed comments about how the DisplayDriver should be implemented. The "driver" terminology might not be obvious, but it is also used in other renderers. Differential Revision: https://developer.blender.org/D12626	2021-09-30 20:48:08 +02:00
Brecht Van Lommel	4d4113adc2	Cycles: record large number of transparent shadow intersections on CPU This lets us do fewer intersection calls; only on the GPU do we need to save memory and do this in small steps. Ref T87836	2021-09-29 16:37:32 +02:00
Sergey Sharybin	731325a022	Cycles: Make sure GPU transfer is finished prior display update Noticed while looking into flickering issues in the viewport. Doesn't seem to solve the flicker issue for me, but it is something that is supposed to be happening anyway. Differential Revision: https://developer.blender.org/D12673	2021-09-29 14:05:51 +02:00
Campbell Barton	4d66cbd140	Cleanup: spelling in comments	2021-09-22 14:54:01 +10:00
Brecht Van Lommel	0803119725	Cycles: merge of cycles-x branch, a major update to the renderer This includes much improved GPU rendering performance, viewport interactivity, new shadow catcher, revamped sampling settings, subsurface scattering anisotropy, new GPU volume sampling, improved PMJ sampling pattern, and more. Some features have also been removed or changed, breaking backwards compatibility, including the removal of the OpenCL backend, for which alternatives are under development. Release notes and code docs: https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles https://wiki.blender.org/wiki/Source/Render/Cycles Credits: * Sergey Sharybin * Brecht Van Lommel * Patrick Mours (OptiX backend) * Christophe Hery (subsurface scattering anisotropy) * William Leeson (PMJ sampling pattern) * Alaska (various fixes and tweaks) * Thomas Dinges (various fixes) For the full commit history, see the cycles-x branch. This squashes together all the changes, since intermediate changes would often fail building or tests. Ref T87839, T87837, T87836 Fixes T90734, T89353, T80267, T77185, T69800	2021-09-21 14:55:54 +02:00

17 Commits
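The shadow path compaction rule in df00463764 (compact only when more than 50% of the space is wasted) can be illustrated with a small host-side C++ sketch. This is a hypothetical model, not Cycles' actual GPU kernel code: `ShadowSlot`, `should_compact` and `compact` are illustrative names, and "wasted space" is assumed here to mean inactive slots below the end of the used range of the shadow path array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one entry in the shadow path array: either an
// active (pending) shadow path, or a hole left behind by a finished path.
struct ShadowSlot {
  bool active = false;
  int path_id = -1;  // illustrative payload standing in for the path state
};

// Assumed heuristic: compact only when more than half of the used range
// [0, end) consists of holes, i.e. more than 50% of the space is wasted.
bool should_compact(std::size_t num_active, std::size_t end)
{
  if (end == 0) {
    return false;
  }
  const std::size_t wasted = end - num_active;
  return wasted * 2 > end;
}

// Moves all active slots to the front while preserving their relative
// order, clears the vacated slots, and returns the new end of the used range.
std::size_t compact(std::vector<ShadowSlot> &slots, std::size_t end)
{
  assert(end <= slots.size());
  std::size_t write = 0;
  for (std::size_t read = 0; read < end; ++read) {
    if (!slots[read].active) {
      continue;
    }
    if (write != read) {
      slots[write] = slots[read];
      slots[read] = ShadowSlot{};
    }
    ++write;
  }
  return write;
}
```

With six used slots of which only two are active, `should_compact(2, 6)` is true and `compact` moves those two paths to indices 0 and 1, so new shadow paths can again be appended contiguously. On a GPU the same result would typically come from a parallel prefix sum over the active flags rather than this serial loop.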