blender

Author	SHA1	Message	Date
Brecht Van Lommel	cfa8b762e2	Code cleanup: move rng into path state. Also pass by value and don't write back now that it is just a hash for seeding and no longer an LCG state. Together this makes CUDA a tiny bit faster in my tests, but mainly simplifies code.	2017-08-19 18:14:16 +02:00
Brecht Van Lommel	dc7fcebb33	Code cleanup: make L_transparent part of PathRadiance.	2017-08-13 01:19:07 +02:00
Brecht Van Lommel	7542282c06	Code cleanup: make DebugData part of PathRadiance.	2017-08-13 01:19:07 +02:00
Brecht Van Lommel	8f97108353	Cycles: optimize CPU split kernel data init.	2017-08-12 20:43:34 +02:00
Brecht Van Lommel	601f94a3c2	Code cleanup: remove unused Cycles random number code.	2017-08-12 20:40:38 +02:00
Brecht Van Lommel	85ad248c36	Code cleanup: fix warning and improve terminology.	2017-08-12 13:18:05 +02:00
Sergey Sharybin	bd069a89aa	Fix T52229: Shadow Catcher artifacts when under transparency Added some extra trickery to avoid background being tinted dark with transparent surface. Maybe a bit hacky, but seems to work fine.	2017-08-11 13:49:50 +02:00
Mai Lavelle	ec8ae4d5e9	Cycles: Pack kernel textures into buffers for OpenCL Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4gb on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745	2017-08-08 07:12:04 -04:00
Brecht Van Lommel	fc38276d74	Fix Cycles shadow catcher objects influencing each other. Since all the shadow catchers are already assumed to be in the footage, the shadows they cast on each other are already in the footage too. So don't just let shadow catchers skip self, but all shadow catchers. Another justification is that it should not matter if the shadow catcher is modeled as one object or multiple separate objects, the resulting render should be the same. Differential Revision: https://developer.blender.org/D2763	2017-08-07 17:54:26 +02:00
Sergey Sharybin	580741b317	Cycles: Cleanup, space after keyword	2017-08-07 14:47:51 +02:00
Sergey Sharybin	5f35682f3a	Fix T52021: Shadow catcher renders wrong when catcher object is behind transparent object Tweaked the path radiance summing and alpha to account for possible contribution of light by transparent surface bounces happening prior to shadow catcher intersection. This commit will change how shadow catcher results look when the catcher is behind a semi-transparent object, but the old result seemed to be fully wrong: there were big artifacts when alpha-overing the result on some actual footage.	2017-07-18 09:46:21 +02:00
Lukas Stockner	15fd758bd6	Fix T51950: Abnormally long Cycles OpenCL GPU render times with certain panoramic camera settings The problem here was that when an "invalid" path is generated by the panoramic camera, it was tagged as RAY_TO_REGENERATE with the intention of generating a new path in kernel_buffer_update. However, since that state was not handled in kernel_queue_enqueue, kernel_buffer_update did not process the path which resulted in an infinite loop.	2017-07-03 18:26:19 +02:00
Sergey Sharybin	40c04dd649	Cycles: Cleanup, indentation	2017-06-13 10:28:38 +02:00
Mai Lavelle	6238214159	Cycles: Faster split branched path tracing by sharing samples with inactive threads Unlike regular path tracing, branched path tracing is usually used with lower sample counts, at least for primary rays. This means that there are fewer samples for the GPU to work on in parallel and rendering is slower. As there is less work overall there are also more inactive threads during rendering with BPT. This patch makes use of those inactive threads to render branched samples in parallel with other samples. Each thread that is preparing for a branched sample will attempt to find an inactive thread and if one is found the state for the sample is copied to that thread. Potentially, if there are enough inactive threads, 100s of branched samples could be generated from the same originating thread and run in parallel giving large speedups. Gives 70% faster render for pavilion midday scene. 20-60% faster on BMW with car paint replaced with SSS/volumes.	2017-06-10 04:08:49 -04:00
Mai Lavelle	ea846a4dfc	Cycles: Add kernel to enqueue inactive rays The queue will be used to reuse inactive threads and keep the GPU busier.	2017-06-10 03:51:18 -04:00
Sergey Sharybin	8e655446d1	Fix T51537: Light passes are summed twice for split kernel since denoise commit Denoise commit introduced kernel_write_result() which saves light passes, so no need to call both kernel_write_result() and kernel_write_light_passes() from the split kernel. Weirdly enough, kernel_write_result() does not take care of debug passes.	2017-05-19 12:14:03 +02:00
Mai Lavelle	966a2681f9	Cycles: Fix building with native only option Approach suggested by Lukas S.	2017-05-16 16:05:04 -04:00
Hristo Gueorguiev	90b9467861	Cycles: fix AO approximation for split kernel	2017-05-11 11:58:25 +02:00
Lukas Stockner	43b374e8c5	Cycles: Implement denoising option for reducing noise in the rendered image This commit contains the first part of the new Cycles denoising option, which filters the resulting image using information gathered during rendering to get rid of noise while preserving visual features as well as possible. To use the option, enable it in the render layer options. The default settings fit a wide range of scenes, but the user can tweak individual settings to control the tradeoff between a noise-free image, image details, and calculation time. Note that the denoiser may still change in the future and that some features are not implemented yet. The most important missing feature is animation denoising, which uses information from multiple frames at once to produce a flicker-free and smoother result. These features will be added in the future. Finally, thanks to all the people who supported this project: - Google (through the GSoC) and Theory Studios for sponsoring the development - The authors of the papers I used for implementing the denoiser (more details on them will be included in the technical docs) - The other Cycles devs for feedback on the code, especially Sergey for mentoring the GSoC project and Brecht for the code review! - And of course the users who helped with testing, reported bugs and things that could and/or should work better!	2017-05-07 14:40:58 +02:00
Sergey Sharybin	2eb906e1b4	Cycles: Fix access array index of -1 in SSS and volume split kernels	2017-05-05 17:54:03 +02:00
Sergey Sharybin	850bb7a50b	Cycles: Cleanup, indentation	2017-05-05 16:54:37 +02:00
Hristo Gueorguiev	8b97e42eca	Cycles: Split kernel SSS & Volume data definitions cleanup	2017-05-05 13:42:26 +02:00
Hristo Gueorguiev	6bf4115c13	Cycles: Split kernel - sort shaders Reduce thread divergence in kernel_shader_eval. Rays are sorted in blocks of 2048 according to shader->id. On R9 290 Classroom is ~30% faster, and Pabellon Barcelona is ~8% faster. No sorting for CUDA split kernel. Reviewers: sergey, maiself Reviewed By: maiself Differential Revision: https://developer.blender.org/D2598	2017-05-03 15:30:45 +02:00
Mai Lavelle	915766f42d	Cycles: Branched path tracing for the split kernel This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. It's kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611	2017-05-02 14:26:46 -04:00
Hristo Gueorguiev	9d26e32ea2	Workaround for AMD GPU OpenCL compiler.	2017-04-25 20:08:14 +02:00
Sergey Sharybin	f970e859cf	Cycles: Cleanup, style	2017-04-18 11:39:21 +02:00
Lukas Stockner	ef816f9cff	Cycles: Fix the AO replacement option in the split kernel Currently the code for it was inside the hair-specific part, so it wouldn't be enabled in hairless renders.	2017-04-11 01:07:49 +02:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was referring to builder or traversal, whether node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related to GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Hristo Gueorguiev	e07ffcbd1c	Cycles: Add OpenCL support for shadow catcher feature The title says it all actually.	2017-03-27 10:46:59 +02:00
Hristo Gueorguiev	8ada7f7397	Cycles: Remove ccl_addr_space from RNG passed to functions Simplifies code quite a bit, making it shorter and easier to extend. Currently no functional changes for users, but is required for the upcoming work of shadow catcher support with OpenCL.	2017-03-27 10:46:28 +02:00
Sergey Sharybin	d14e39622a	Cycles: First implementation of shadow catcher It uses an idea of accumulating all possible light reachable across the light path (without taking shadow blocked into account) and accumulating total shaded light across the path. Dividing the second figure by the first one seems to give a good estimate of the shadow. In fact, to my knowledge, it's something really similar to what is happening in the denoising branch, so we are aligned here which is good. The workflow is as follows: - Create an object which matches the real-life object on which the shadow is to be caught. - Create an approximately similar material on that object. This is needed to make indirect light properly affect CG objects in the scene. - Mark object as Shadow Catcher in the Object properties. Ideally, after doing that it will be possible to render the image and simply alpha-over it on top of real footage.	2017-03-27 10:46:03 +02:00
Sergey Sharybin	d6b4fb6429	Cycles: Fix mistake in previous split kernel commits Own stupid mistake. Reported by nirved in IRC, thanks!	2017-03-17 11:55:59 +01:00
Mai Lavelle	60a344b43d	Cycles: Fix handling of barriers	2017-03-17 01:54:04 -04:00
Sergey Sharybin	1cad64900e	Cycles: Define ccl_local variables in kernel functions Declaring ccl_local in a device function is not supported by certain compilers.	2017-03-16 11:27:17 +01:00
Sergey Sharybin	26620f3f87	Cycles: Avoid some ccl_local in various kernels	2017-03-16 11:27:17 +01:00
Sergey Sharybin	76acaefdd7	Cycles: Cleanup, wipe obviously outdated parts of split kernel comments	2017-03-13 17:16:16 +01:00
Sergey Sharybin	aa36c73c33	Cycles: Add missing header in the file	2017-03-13 16:59:09 +01:00
Hristo Gueorguiev	f169ff8b88	Fix T50925: Add AO approximation to split kernel	2017-03-13 11:15:58 +01:00
Mai Lavelle	96868a3941	Fix T50888: Numeric overflow in split kernel state buffer size calculation Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.	2017-03-11 05:39:28 -05:00
Hristo Gueorguiev	06c051363b	Cycles: split kernel_shadow_blocked to AO & DL parts Reduces memory allocation for split kernel. This allows for faster rendering due to bigger global size, especially when GPU memory is limited. Performance results, R9 290 total render time (before, after, change): BMW 4:37, 4:34, -1.1 %; Classroom 14:43, 14:30, -1.5 %; Fishy Cat 11:20, 11:04, -2.4 %; Koro 12:11, 12:04, -1.0 %; Pabellon Barcelona 22:01, 20:44, -5.8 %; Pabellon Barcelona() 15:32, 15:09, -2.5 %. () without glossy connected to volume	2017-03-09 17:09:37 +01:00
Hristo Gueorguiev	57e26627c4	Cycles: SSS and Volume rendering in split kernel Decoupled ray marching is not supported yet. Transparent shadows are always enabled for volume rendering. Changes in kernel/bvh and kernel/geom are from Sergey. This simplifies code significantly, and prepares it for record-all transparent shadow function in split kernel.	2017-03-09 17:09:37 +01:00
Sergey Sharybin	712f7c3640	Cycles: Make it possible to access KernelGlobals from split data initialization function	2017-03-08 11:02:54 +01:00
Mai Lavelle	64751552f7	Cycles: Fix indentation	2017-03-08 01:31:32 -05:00
Mai Lavelle	fe7cc94dfa	Cycles: Fix strict warning about unused variable	2017-03-08 01:31:32 -05:00
Mai Lavelle	306034790f	Cycles: Calculate size of split state buffer kernel side By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.	2017-03-08 01:31:30 -05:00
Mai Lavelle	223f45818e	Cycles: Initialize rng_state for split kernel Because the split kernel can render multiple samples in parallel it is necessary to have everything initialized before rendering of any samples begins. The code that normally handles initialization of `rng_state` (`kernel_path_trace_setup()`) only does so for the first sample, which was causing artifacts in the split kernel due to uninitialized `rng_state` for some samples. Note that because the split kernel can render samples in parallel this means that the split kernel is incompatible with the LCG.	2017-03-08 01:31:09 -05:00
Mai Lavelle	cd7d5669d1	Cycles: Remove sum_all_radiance kernel This was only needed for the previous implementation of parallel samples. As we don't have that any more it can be removed. Real reason for removal tho is this: `per_sample_output_buffers` was being calculated too small and artifacts resulted. The tile buffer is already the correct size and calculating the size for `per_sample_output_buffers` is a bit difficult with the current layout of the code. As `per_sample_output_buffers` was only needed for `sum_all_radiance`, removing that kernel and writing output to the tile buffer directly fixes the artifacts.	2017-03-08 01:31:07 -05:00
Mai Lavelle	4cf501b835	Cycles: Split path initialization into own kernel This makes it easier to initialize things correctly in the data_init kernel before they are needed by path tracing.	2017-03-08 01:30:43 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00
Mai Lavelle	0892352bfe	Cycles: CPU implementation of split kernel	2017-03-08 00:52:41 -05:00

83 Commits