blender

Author	SHA1	Message	Date
Mai Lavelle	0892352bfe	Cycles: CPU implementation of split kernel	2017-03-08 00:52:41 -05:00
Mai Lavelle	230c00d872	Cycles: OpenCL split kernel refactor This does a few things at once: - Refactors host side split kernel logic into a new device agnostic class `DeviceSplitKernel`. - Removes tile splitting, a new work pool implementation takes its place and allows as many threads as will fit in memory regardless of tile size, which can give performance gains. - Refactors split state buffers into one buffer, as well as reduces the number of arguments passed to kernels. Means there's less code to deal with overall. - Moves kernel logic out of OpenCL kernel files so they can later be used by other device types. - Replaced OpenCL specific APIs with new generic versions - Tiles can now be seen updating during rendering	2017-03-08 00:52:41 -05:00
Sergey Sharybin	26cdc64a7f	Cycles: Split motion triangle file once again, avoids annoying forward declarations	2017-01-20 12:46:17 +01:00
Sergey Sharybin	14d343a8f9	Cycles: Move motion triangle intersection functions to own file Mimics how regular triangles are working and makes it more clear where the stuff is located in the kernel. Needed to have some forward declarations because of the current placement of things in the kernel.	2017-01-20 12:46:17 +01:00
Lukas Stockner	4e68f48227	Cycles: Initialize the RNG state from the kernel instead of the host This allows to save a memory copy, which will be particularly useful for network rendering. Reviewers: sergey, brecht, dingto, juicyfruit, maiself Differential Revision: https://developer.blender.org/D2323	2016-10-30 11:51:20 +01:00
Hristo Gueorguiev	8905c5c874	Cycles: OpenCL 3d textures support. Note that volume rendering is not supported yet, this is a step towards that. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2299	2016-10-22 23:49:29 +02:00
Brecht Van Lommel	b4f9766ed1	Cycles CUDA: make CUDA 8.0 the officially supported version for all platforms.	2016-10-03 22:15:26 +02:00
Sergey Sharybin	2980c6ebae	Cycles: Move BVH constants to an own files, so they are easily re-usable	2016-09-19 13:00:41 +02:00
Mai Lavelle	e7ea1ae78c	Cycles microdisplacement: Improved automatic bump mapping Object coordinates can now be used in the displacement shader and will give correct results, where as before bump mapping was calculated from the displace positions and resulted in incorrect shading. This works by evaluating the shader in two parts, first bump then surface, and setting the shader state to match what it would be if the surface was undisplaced for the bump shader evaluation. Currently only `P` is set as if undisplaced, but other shader variables could be set as well, such as `I` or `time`. Since these aren't set to anything meaningful for displacement I left them out of this patch, we can decide what to do with them separately. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2156	2016-09-01 22:45:49 -04:00
Sergey Sharybin	fdc43f993d	Cycles: Use static assert to control structures alignment	2016-08-11 10:12:06 +02:00
Mai Lavelle	0b68c68006	Cycles microdisplacement: Support for Catmull-Clark subdivision via OpenSubdiv Enables Catmull-Clark subdivision meshes with support for creases and attribute subdivision. Still waiting on OpenSubdiv to fully support face varying interpolation for subdividing uv coordinates tho. Also there may be some inconsistencies with Blender's subdivision which will be resolved at a later time. Code for reading patch tables and creating patch maps is borrowed from OpenSubdiv. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2111	2016-08-07 11:13:11 -04:00
Sergey Sharybin	08ebd72851	Buildbot: Use annoying hybrid setup of two CUDA toolkits This is for until we'll solve issues with toolkit 8.0.	2016-08-02 15:32:03 +02:00
Brecht Van Lommel	9b6ed3a42b	Cycles: refactor kernel closure storage to use structs per closure type. Reviewed By: dingto, sergey Differential Revision: https://developer.blender.org/D2127	2016-07-31 02:34:43 +02:00
Alexander Gavrilov	ea2ebf7a00	Cycles: constant folding for RGB/Vector Curves and Color Ramp. These are complex nodes, and it's conceivable they may end up constant in some circumstances within node groups, so folding support is useful. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2084	2016-07-31 02:18:23 +02:00
Mai Lavelle	c96ae81160	Cycles microdisplacement: ngons and attributes for subdivision meshes This adds support for ngons and attributes on subdivision meshes. Ngons are needed for proper attribute interpolation as well as correct Catmull-Clark subdivision. Several changes are made to achieve this: - new primitive `SubdFace` added to `Mesh` - 3 more textures are used to store info on patches from subd meshes - Blender export uses loop interface instead of tessface for subd meshes - `Attribute` class is updated with a simplified way to pass primitive counts around and to support ngons. - extra points for ngons are generated for O(1) attribute interpolation - curves are temporarily disabled on subd meshes to avoid various bugs with implementation - old unneeded code is removed from `subd/` - various fixes and improvements Reviewed By: brecht Differential Revision: https://developer.blender.org/D2108	2016-07-29 03:36:30 -04:00
Sergey Sharybin	2ecbc3b777	Cycles: Add _all suffix to shadow traversal file Matches better naming of volume traversal files, where we've got optimized versions of a single step of volume intersection and traversal which will gather all volume intersections.	2016-07-11 13:58:47 +02:00
Sergey Sharybin	4355603790	Cycles: Move BVH kernel files to own directory BVH traversal is not really that much a geometry and we've got quite some traversals now. Makes sense to keep them separate in the name of source structure clarity.	2016-07-11 13:58:47 +02:00
Sergey Sharybin	a08e2179f1	Cycles: Implement unaligned nodes BVH traversal This commit implements traversal of unaligned BVH nodes. QBVH traversal is fully SIMD optimized and calculates orientation for all 4 children at a time, regular BVH might probably be optimized a bit more.	2016-07-07 17:25:48 +02:00
Lukas Stockner	23c276832b	Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model". Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until the ray leaves it again, which ensures perfect energy conservation. In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing roughness - is solved in a physically correct and efficient way. The downside of this model is that it has no (known) analytic expression for evaluation. However, it can be evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the balance heuristic guarantee an unbiased result at the cost of slightly higher noise. Reviewers: dingto, #cycles, brecht Reviewed By: dingto, #cycles, brecht Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel Differential Revision: https://developer.blender.org/D2002	2016-06-23 22:57:26 +02:00
Lukas Stockner	dfa7ddd4a8	Cycles: Add svm_util_color.h file to CMake The file wasn't included in CMake and therefore not installed into the addon folder.	2016-06-20 19:07:22 +02:00
Martijn Berger	50f432b1e0	CMake, minor changes to make Visual studio 2015 use a compatible numpy and the standard cmake CUDA/NVCC arguments flag allowing 2015 build to use msvc 2013 for cuda	2016-06-04 11:42:48 +02:00
Thomas Dinges	a5a05fc291	Cycles: Fix long compile time with MSVC. Compile time per kernel increased a lot after recent image commits, re-shuffle some code to fix this. Patch by "LazyDodo". Differential Revision: https://developer.blender.org/D2012	2016-05-20 16:50:29 +02:00
Sergey Sharybin	f616caa315	CMake: Fix compilation error when toolkit gives empty result Should we also check whether toolkit exist perhaps?	2016-05-09 16:05:02 +02:00
Thomas Beck	64c7306cdb	Cycles: Insert util_texture.h in CMakeLists to make Cycles compile again after recent refactory.	2016-04-16 11:58:38 +02:00
Sergey Sharybin	8cab327316	Cycles: Make CUDA 7.5 officially recommended This was a hard decision, because going newer CUDA toolkit makes rendering up to 5% slower. But on another hand, it solves major speed regressions (up to 30%) with branched path tracing on a top level cards. Neither of those regressions have a meaningful and sane workaround from the code itself. Toolkit 6.5 could still be used, but it's no longer recommended one.	2016-02-17 15:18:56 +01:00
Sergey Sharybin	1336e97b12	Cycles: Use CUDA_64_BIT_DEVICE_CODE to detect which CUDA architecture to use It is initialized based on size of pointer, which matches our previous behavior, but using it in Cycles side allows to cross-compile CUDA binaries.	2016-02-15 19:08:36 +01:00
Sergey Sharybin	9815f8a623	Cycles: Cleanup of OpenCL split kernel routines The idea is to switch from allocating separate buffers for shader data's structure of arrays to allocating one huge memory block and do some index trickery to make it accessed as SOA. This saves quite reasonable amount of lines of code in device_opencl and also makes it possible to get rid of special declaration of ShaderData structure. As a side effect it also makes it easier to experiment with SOA vs. AOS for split kernel. Works fine here on NVidia GTX580, Intel CPU and AMD Fiji cards. Reviewers: #cycles, brecht, juicyfruit, dingto Differential Revision: https://developer.blender.org/D1593	2016-01-30 00:23:06 +01:00
Thomas Dinges	3ba9742be2	Cycles: Remove the experimental CUDA kernel. This commit removes the experimental CUDA kernel, making SSS and CMJ regular features. Several improvements have been made in the past few weeks (thanks Sergey!) which make SSS render several times faster (2-3x compared to 2.76b) on the GPU, and the increased VRAM usage has also been fixed. Therefore the experimental kernel is no longer needed. Differential Revision: https://developer.blender.org/D1726 Manual has been updated too: https://www.blender.org/manual/render/cycles/features.html	2016-01-14 12:56:08 +01:00
Sergey Sharybin	c8a551bf13	Cycles: Don't install CPU-related kernel files	2015-12-30 18:51:35 +05:00
Sergey Sharybin	2b5d60eb2d	Cycles: Deduplicate CPU kernel declaration and definition code Main goal is to make kernel signatures editing easier and less prone to the errors caused by missing function signature update or so. This will also make it easier to add new CPU architectures. Reviewers: juicyfruit, dingto, lukasstockner97, brecht Reviewed By: dingto, lukasstockner97, brecht Differential Revision: https://developer.blender.org/D1703	2015-12-30 17:54:02 +05:00
Sergey Sharybin	a43e087fb8	Cycles: Add split kernel headers to project generation	2015-10-31 04:08:19 +05:00
Sergey Sharybin	7d10798af2	Cycles: Add voxel texture sampler shader node The idea of this node is to sample 3D voxels at a given coordinate, supporting different mapping strategies (world space mapping, object local space etc). Currently not in use, it's a preparation step for supporting point density textures.	2015-07-18 22:09:20 +02:00
Campbell Barton	3bb698646a	CMake: minor edits	2015-06-30 22:44:27 +10:00
Sergey Sharybin	099aaea447	Cycles: Move branched path tracing into own file Code there started becoming a bit too big, by splitting it up it'll make it easier to do improvements or extending the features in there. The layout is not totally final yet, would need to try de-duplicating parts of code from split kernel with non-split integrators.	2015-06-15 23:02:42 +02:00
Sergey Sharybin	84ad20acef	Fix T44833: Can't use ccl_local space in non-kernel functions This commit re-shuffles code in split kernel once again and makes it so common parts which are in the headers are only responsible for doing all the work needed for the specified ray index. Getting ray index, checking for its validity and enqueuing tasks are now happening in the device specified part of the kernel. This actually makes sense because enqueuing is indeed device-specified and i.e. with CUDA we'll want to enqueue kernels from kernel and avoid CPU roundtrip. TODO: - Kernel comments are still placed in the common header files, but since queue related stuff is not passed to those functions those comments might need to be split as well. Just currently read them considering that they're also covering the way how all devices are invoking the common code path. - Arguments might need to be wrapped into KernelGlobals, so we don't need to pass all them around as function arguments.	2015-05-26 22:54:02 +05:00
Sergey Sharybin	2c503d8303	Cycles: Restructure kernel files organization Since the kernel split work we're now having quite a few of new files, majority of which are related on the kernel entry points. Keeping those files in the root kernel folder will eventually make it really hard to follow which files are actual implementation of Cycles kernel. Those files are now moved to kernel/kernels/<device_type>. This way adding extra entry points will be less noisy. It is also nice to have all device-specific files grouped together. Another change is in the way how split kernel invokes logic. Previously all the logic was implemented directly in the .cl files, which makes it a bit tricky to re-use the logic across other devices. Since we'll likely be looking into doing same split work for CUDA devices eventually it makes sense to move logic from .cl files to header files. Those files are stored in kernel/split. This does not mean the header files will not give error messages when tried to be included from other devices and their arguments will likely be changed, but having such separation is a good start anyway. There should be no functional changes. Reviewers: juicyfruit, dingto Differential Revision: https://developer.blender.org/D1314	2015-05-22 16:31:34 +05:00
Sergey Sharybin	329f704601	Cycles: Move utility atomics function to util_atomic.h No functional changes, just better to keep all atomic function in a single place, they might become handy later.	2015-05-21 16:12:50 +05:00
Sergey Sharybin	2ab909a88c	Cycles: Make experimental kernel build option more generic Previously it was explicitly mentioning it's NVidia kernel related option, but in fact it's also handy for the OpenCL kernel.	2015-05-15 13:22:47 +05:00
George Kyriazis	7f4479da42	Cycles: OpenCL kernel split This commit contains all the work related on the AMD megakernel split work which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely someone else which we're forgetting to mention. Currently only AMD cards are enabled for the new split kernel, but it is possible to force split opencl kernel to be used by setting the following environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1. Not all the features are supported yet, and that being said no motion blur, camera blur, SSS and volumetrics for now. Also transparent shadows are disabled on AMD device because of some compiler bug. This kernel is also only implements regular path tracing and supporting branched one will take a bit. Branched path tracing is exposed to the interface still, which is a bit misleading and will be hidden there soon. More feature will be enabled once they're ported to the split kernel and tested. Neither regular CPU nor CUDA has any difference, they're generating the same exact code, which means no regressions/improvements there. Based on the research paper: https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf Here's the documentation: https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit Design discussion of the patch: https://developer.blender.org/T44197 Differential Revision: https://developer.blender.org/D1200	2015-05-09 19:52:40 +05:00
Thomas Dinges	b3def11f5b	Cycles: Record all possible volume intersections for SSS and camera checks This replaces sequential ray moving followed with scene intersection with single BVH traversal, which gives us all possible intersections. Only implemented for CPU, due to qsort and a bigger memory usage on GPU which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code. This improves render performance for scenes with: a) Camera inside volume mesh b) SSS mesh intersecting a volume mesh/domain In simple volume files (not much geometry) performance is roughly the same (slightly faster). In files with a lot of geometry, the performance increase is larger. bmps.blend with a volume shader and camera inside the mesh, it renders ~10% faster here. Patch by Sergey and myself. Differential Revision: https://developer.blender.org/D1264	2015-04-29 23:31:06 +02:00
Sergey Sharybin	6cd82dbf57	CMake: Enable strict flags for C++	2015-03-27 18:23:31 +05:00
Sergey Sharybin	585dd26120	Cycles: Code cleanup, prepare for strict C++ flags	2015-03-27 18:23:31 +05:00
Sergey Sharybin	dc1043dda0	Cycles: Add fast math function module It is based on fmath.h from OIIO and could be used to give some speedup in areas where absolute accuracy is not so critical.	2015-01-31 01:49:41 +05:00
Sergey Sharybin	2382c8decd	Cycles: Fix compilation error with compilers which don't support AVX For SSE checks still could be decoupled to be able to compile SSE2 kernel and not SSE4 depending on the CPU or so.	2015-01-01 01:31:08 +05:00
Sergey Sharybin	03f28553ff	Cycles: Implement QBVH tree traversal This commit implements traversal for QBVH tree, which is based on the old loop code for traversal itself and Embree for node intersection. This commit also does some changes to the loop inspired by Embree: - Visibility flags are only checked for primitives. Doing visibility check for every node cost quite reasonable amount of time and in most cases those checks are true-positive. Other idea here would be to do visibility checks for leaf nodes only, but this would need to be investigated further. - For minimum hair width we extend all the nodes' bounding boxes. Again doing curve visibility check is quite costly for each of the nodes and those checks return true for most of the hierarchy anyway. There are number of possible optimization still, but current state is good enough in terms it makes rendering faster a little bit after recent watertight commit. Currently QBVH is only implemented for CPU with SSE2 support at least. All other devices would need to be supported later (if that'd make sense from performance point of view). The code is enabled for compilation in kernel, but Blender wouldn't use it still.	2014-12-25 02:50:49 +05:00
Sergey Sharybin	f4df3ec05a	Cycles: Move triangle intersection functions into own file This way extending intersection routines with some pre-calculation step wouldn't explode the single file size, hopefully keeping them all in a nice maintainable state.	2014-12-25 02:50:48 +05:00
Sergey Sharybin	6a4a911fc3	Cycles: Optimize math node without links to a single value node Pretty straightforward implementation. Just needed to move some functions around to make them available at shader compile time.	2014-10-29 16:31:13 +05:00
Sergey Sharybin	e4b910a0aa	Cycles: __KERNEL_DEBUG__ wasn't set for compile-time kernels	2014-10-05 21:42:53 +06:00
Sergey Sharybin	27d660ad20	Cycles: Add support for debug passes Currently only summed number of traversal steps and intersections used by the camera ray intersection pass is implemented, but in the future we will support more debug passes which would help checking what things makes the scene slow. Example of such extra passes could be number of bounces, time spent on the shader tree evaluation and so. Implementation from the Cycles side is pretty much straightforward, could only mention here that it's a build-time option disabled by default. From the blender side it's implemented as a PASS_DEBUG with several subtypes possible. This way we don't need to create an extra DNA pass type for each of the debug passes, saving us a bits. Reviewers: campbellbarton Reviewed By: campbellbarton Differential Revision: https://developer.blender.org/D813	2014-10-04 19:00:26 +06:00
Thomas Dinges	5e10392e9f	Cycles: Missing volume traversal header in cmake for GPU compilation.	2014-10-03 17:11:00 +02:00

145 Commits