blender

Author	SHA1	Message	Date
Sergey Sharybin	ae41a08288	Cycles: Attempt to work around compilation of sm_20 and sm_21 Disabled forceinline for those architectures, which seems to be compiling successfully more often. There might be ~3% slowdown based on quick tests, but better be rendering something rather than failing to compile kernels again and again. Those architectures will be doomed for abandon once we'll switch to toolkit 9.	2017-09-08 18:37:54 +02:00
Sergey Sharybin	fd397a7d28	Cycles: Add utility macro ccl_ref It is defined to & for CPU side compilation, and defined to an empty for any GPU platform. The idea here is to use this macro instead of #ifdef block with bunch of duplicated lines just to make it so CPU code is efficient. Eventually we might switch to references on CUDA as well, but that would require some intensive testing.	2017-08-08 15:27:25 +02:00
Sergey Sharybin	55c15ad9de	Cycles: Use falltrhough attribute to help catching missing break statements	2017-05-24 17:23:54 +02:00
Sergey Sharybin	803337f3f6	\0;115;0cCycles: Cleanup, use ccl_restrict instead of ccl_restrict_ptr There were following issues with ccl_restrict_ptr: - We already had ccl_restrict for all platforms. - It was secretly adding `const` qualifier to the declaration, which is quite weird since non-const pointer can also be declared as restricted. - We never in Blender are using foo_ptr or FooPtr type definitions, so not sure why we should introduce such a thing here. - It is absolutely wrong from semantic point of view to put pointer into the restrict macro -- const is a part of type, not part of hint for compiler that some pointer is never aliased.	2017-05-19 12:41:03 +02:00
Lukas Stockner	43b374e8c5	Cycles: Implement denoising option for reducing noise in the rendered image This commit contains the first part of the new Cycles denoising option, which filters the resulting image using information gathered during rendering to get rid of noise while preserving visual features as well as possible. To use the option, enable it in the render layer options. The default settings fit a wide range of scenes, but the user can tweak individual settings to control the tradeoff between a noise-free image, image details, and calculation time. Note that the denoiser may still change in the future and that some features are not implemented yet. The most important missing feature is animation denoising, which uses information from multiple frames at once to produce a flicker-free and smoother result. These features will be added in the future. Finally, thanks to all the people who supported this project: - Google (through the GSoC) and Theory Studios for sponsoring the development - The authors of the papers I used for implementing the denoiser (more details on them will be included in the technical docs) - The other Cycles devs for feedback on the code, especially Sergey for mentoring the GSoC project and Brecht for the code review! - And of course the users who helped with testing, reported bugs and things that could and/or should work better!	2017-05-07 14:40:58 +02:00
Mai Lavelle	b60d4800c6	Cycles: Fix building of CUDA kernels with compilers where C++11 is disabled	2017-04-08 07:12:04 -04:00
Sergey Sharybin	7d77b3e813	Cycles: Fix compilation error with certain CUDA and host compiler configuration This seems to happen on Windows only, happened to Thomas and Nathan already. Similar patch Thomas was showing, but i do not see it committted. So comitting now in order to get more developers and users happy.	2017-04-07 18:28:38 +02:00
Sergey Sharybin	cc7386ec6b	Cycles: Remove toolkit-specific workaround from kernel	2017-03-29 15:07:53 +02:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was refferring to builder or traversal, whenter node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related on GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Mai Lavelle	c837bd5ea5	Cycles: Fix CUDA build error for some compilers Needed to include `util_types.h` before using `uint`.	2017-03-08 16:44:43 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00
Brecht Van Lommel	a3abb020e3	Fix Cycles CUDA performance on CUDA 8.0. Mostly this is making inlining match CUDA 7.5 in a few performance critical places. The end result is that performance is now better than before, possibly due to less register spilling or other CUDA 8.0 compiler improvements. On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory usage is reduced a little too. Reviewed By: sergey Differential Revision: https://developer.blender.org/D2269	2016-10-03 22:15:25 +02:00
Brecht Van Lommel	33c83a289d	Fix Cycles OpenCL textures after recent CUDA fix. kernel_textures.h is included in device_opencl.cpp, so we can't check __KERNEL_OPENCL__ there.	2016-08-15 16:28:48 +02:00
Thomas Dinges	9d236ac06c	Cycles: Enable half float support (4 channels and 1 channel) on CUDA. Atm OpenEXR half files benefit from this and will use only 1/2 of the memory now. More space for HDRs! Part of my GSoC 2016.	2016-08-11 22:47:53 +02:00
Brecht Van Lommel	040fa75d7b	Fix Cycles CUDA adaptive kernel not working correctly after recent closure changes.	2016-08-09 01:31:07 +02:00
Sergey Sharybin	960db4c961	Cycles: Revert recent inline changes for CUDA 8 and sm_50+ This changes actually lead to 2x slowdown. It's getting a bit annoying because those are the changes to make pre-maxwell cards render with the same speed.	2016-08-03 11:41:58 +02:00
Sergey Sharybin	6353ecb996	Cycles: Tweaks to support CUDA 8 toolkit All the changes are mainly giving explicit tips on inlining functions, so they match how inlining worked with previous toolkit. This make kernel compiled by CUDA 8 render in average with same speed as previous kernels. Some scenes are somewhat faster, some of them are somewhat slower. But slowdown is within 1% so far. On a positive side it allows us to enable newer generation cards on buildbots (so GTX 10x0 will be officially supported soon).	2016-08-01 15:54:29 +02:00
Sergey Sharybin	cb3b19730c	Cycles: Use utility define for restrict pointers This way restrict can be used for CUDA and OpenCL as well. From quick tests in areas i've been testing this it might give some barely measurable %% of speedup, but it increases registers pressure. So use of this qualifier is still really limited.	2016-07-11 13:58:47 +02:00
Thomas Dinges	c9f1ed1e4c	Cycles: Add support for bindless textures. This adds support for CUDA Texture objects (also known as Bindless textures) for Kepler GPUs (Geforce 6xx and above). This is used for all 2D/3D textures, data still uses arrays as before. User benefits: * No more limits of image textures on Kepler. We had 5 float4 and 145 byte4 slots there before, now we have 1024 float4 and 1024 byte4. This can be extended further if we need to (just change the define). * Single channel textures slots (byte and float) are now supported on Kepler as well (1024 slots for each type). ToDo / Issues: * 3D textures don't work yet, at least don't show up during render. I have no idea whats wrong yet. * Dynamically allocate bindless_mapping array? I hope Fermi still works fine, but that should be tested on a Fermi card before pushing to master. Part of my GSoC 2016. Reviewers: sergey, #cycles, brecht Subscribers: swerner, jtheninja, brecht, sergey Differential Revision: https://developer.blender.org/D1999	2016-05-19 13:14:37 +02:00
Sergey Sharybin	700722f686	Cycles: Cleanup, indent nested preprocessor directives Quite straightforward, main trick is happening in path_source_replace_includes(). Reviewers: brecht, dingto, lukasstockner97, juicyfruit Differential Revision: https://developer.blender.org/D1794	2016-03-25 13:55:42 +01:00
Sergey Sharybin	1c4f21f85e	Cycles: Initial support of 3D textures for CUDA rendering Supports both smoke/fire and point density textures now. Reduces number of textures available for sm_20 and sm_21, but you have to compromise somewhere on such a limited hardware. Currently limited to linear interpolation only, and decoupled ray marching is not supported yet. Think those could be considered just a further improvement. Some quick example: https://developer.blender.org/F282934 Code is minimal and we can fully consider it a fix for missing support of 3D textures with CUDA. Reviewers: lukasstockner97, brecht, juicyfruit, dingto Reviewed By: brecht, juicyfruit, dingto Subscribers: mib2berlin Differential Revision: https://developer.blender.org/D1806	2016-02-15 21:26:29 +01:00
George Kyriazis	7f4479da42	Cycles: OpenCL kernel split This commit contains all the work related on the AMD megakernel split work which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely someone else which we're forgetting to mention. Currently only AMD cards are enabled for the new split kernel, but it is possible to force split opencl kernel to be used by setting the following environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1. Not all the features are supported yet, and that being said no motion blur, camera blur, SSS and volumetrics for now. Also transparent shadows are disabled on AMD device because of some compiler bug. This kernel is also only implements regular path tracing and supporting branched one will take a bit. Branched path tracing is exposed to the interface still, which is a bit misleading and will be hidden there soon. More feature will be enabled once they're ported to the split kernel and tested. Neither regular CPU nor CUDA has any difference, they're generating the same exact code, which means no regressions/improvements there. Based on the research paper: https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf Here's the documentation: https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit Design discussion of the patch: https://developer.blender.org/T44197 Differential Revision: https://developer.blender.org/D1200	2015-05-09 19:52:40 +05:00
Sergey Sharybin	6fc1669679	Cycles: Initial work towards selective nodes support compilation The goal is to be able to compile kernel with nodes which are actually needed to render current scene, hence improving performance of the kernel, The idea is: - Have few node groups, starting with a group which contains nodes are used really often, and then couple of groups which will be extension of this one. - Have feature-based nodes disabling, so it's possible to disable nodes related to features which are not used with the currently used nodes group. This commit only lays down needed routines for this approach, actual split will happen later after gathering statistics from bunch of production scenes.	2015-05-09 19:22:16 +05:00
Thomas Dinges	ee36e75b85	Cleanup: Fix Cycles Apache header. This was already mixed a bit, but the dot belongs there.	2014-12-25 02:50:24 +01:00
Campbell Barton	e2522b4a29	Cycles: correct math wrappers include the parens around value before cast, in some cases was causing double/float promotion by only casting the left value.	2014-10-08 00:13:26 +02:00
Brecht Van Lommel	741f17f05b	Cycles CUDA: make CUDA toolkit 6.0 the official supported version. This also updates the configurations to build kernels for compute capability 5.0 cards, when using and older CUDA toolkit version this will be skipped. Also includes tweaks to improve performance with this version: * Increase max registers on sm_30, sm_35 and sm_50 * No longer use texture storage on sm_30	2014-04-30 16:07:27 +02:00
Brecht Van Lommel	d9e52ac98b	Code cleanup: move half float functions to separate header file.	2014-01-15 15:29:22 +01:00
Brecht Van Lommel	c18712e868	Cycles: change __device and similar qualifiers to ccl_device in kernel code. This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values are reserved for compilers. I apologize to anyone who has patches or branches and has to go through the pain of merging this change, it may be easiest to do these same replacements in your code and then apply/merge the patch. Ref T37477.	2013-11-18 08:48:15 +01:00
Brecht Van Lommel	fa352bb749	Fix #35684 : cycles unable to use full 6GB of memory on NVidia Titan GPU. We now use arrays instead of textures for general storage on this card (image textures are still stored as texture). Textures were found to be faster on older cards, but the limits on 1D texture size have not increased along with the memory size, which meant that the full 6 GB could not be used. The performance actually seems to be slightly better with arrays in some tests on Titan. For older cards there seems to be a bit of a mix, some are better and others not. We may change those to use arrays too, but more testing is needed, only Titan and Tesla K20 (sm_35) is changed for now. The fact that arrays are faster is a bit surprising, as others found textures to be faster on Kepler. However even if they were, the memory limitation is more important to solve anyway. https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum	2013-09-27 19:09:31 +00:00
Brecht Van Lommel	29f6616d60	Cycles: viewport render now takes scene color management settings into account, except for curves, that's still missing from the OpenColorIO GLSL shader. The pixels are stored in a half float texture, converterd from full float with native GPU instructions and SIMD on the CPU, so it should be pretty quick. Using a GLSL shader is useful for GPU render because it avoids a copy through CPU memory.	2013-08-30 23:49:38 +00:00
Brecht Van Lommel	b9ce231060	Cycles: relicense GNU GPL source code to Apache version 2.0. More information in this post: http://code.blender.org/ Thanks to all contributes for giving their permission!	2013-08-18 14:16:15 +00:00
Brecht Van Lommel	de9dffc61e	Cycles: initial subsurface multiple scattering support. It's not working as well as I would like, but it works, just add a subsurface scattering node and you can use it like any other BSDF. It is using fully raytraced sampling compatible with progressive rendering and other more advanced rendering algorithms we might used in the future, and it uses no extra memory so it's suitable for complex scenes. Disadvantage is that it can be quite noisy and slow. Two limitations that will be solved are that it does not work with bump mapping yet, and that the falloff function used is a simple cubic function, it's not using the real BSSRDF falloff function yet. The node has a color input, along with a scattering radius for each RGB color channel along with an overall scale factor for the radii. There is also no GPU support yet, will test if I can get that working later. Node Documentation: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#BSSRDF Implementation notes: http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/Subsurface_Scattering	2013-04-01 20:26:52 +00:00
Brecht Van Lommel	40b05d364e	Cycles: code refactoring to add generic lookup table memory.	2013-04-01 20:26:43 +00:00
Brecht Van Lommel	12117a8187	Fix cycles aliasing warnings caused by motion blur transforms.	2012-12-21 10:26:48 +00:00
Thomas Dinges	a7c68afc72	Cycles / CUDA: * Remove double declaration of cosf.	2012-05-28 23:51:06 +00:00
Brecht Van Lommel	131de4352b	Cycles: fixes to make CUDA 4.2 work, compiling gave errors in shadows and other places, was mainly due to instancing not working, but also found issues in procedural textures. The problem was with --use_fast_math, this seems to now have way lower precision for some operations. Disabled this flag and selectively use fast math functions. Did not find performance regression on GTX 460 after doing this.	2012-05-28 19:21:13 +00:00
Brecht Van Lommel	dd9c1b7fbf	Cycles: OpenCL image texture support, fix an attribute node issue and refactor feature enabling #defines a bit.	2012-05-13 12:32:44 +00:00
Brecht Van Lommel	5873301257	Sample as Lamp option for world shaders, to enable multiple importance sampling. By default lighting from the world is computed solely with indirect light sampling. However for more complex environment maps this can be too noisy, as sampling the BSDF may not easily find the highlights in the environment map image. By enabling this option, the world background will be sampled as a lamp, with lighter parts automatically given more samples. Map Resolution specifies the size of the importance map (res x res). Before rendering starts, an importance map is generated by "baking" a grayscale image from the world shader. This will then be used to determine which parts of the background are light and so should receive more samples than darker parts. Higher resolutions will result in more accurate sampling but take more setup time and memory. Patch by Mike Farnsworth, thanks!	2012-01-20 17:49:17 +00:00
Brecht Van Lommel	47853bf6f6	Cycles: OpenCL tweaks * Reduce kernel arguments size, helps compile for apple nvidia. * Fix use of unitialized variable in displace kernel. * Use build flags in opencl kernel md5 hash. * Reorganize code for kernel feature #defines a bit.	2011-11-22 13:15:19 +00:00
Brecht Van Lommel	cfbd6cf154	Cycles: * OpenCL now only uses GPU/Accelerator devices, it's only confusing if CPU device is used, easy to enable in the code for debugging. * OpenCL kernel binaries are now cached for faster startup after the first time compiling. * CUDA kernels can now be compiled and cached at runtime if the CUDA toolkit is installed. This means that even if the build does not have CUDA enabled, it's still possible to use it as long as you install the toolkit.	2011-09-09 12:04:39 +00:00
Brecht Van Lommel	bae896691a	Cycles: * Add alpha pass output, to use set Transparent option in Film panel. * Add Holdout closure (OSL terminology), this is like the Sky option in the internal renderer, objects with this closure show the background / zero alpha. * Add option to use Gaussian instead of Box pixel filter in the UI. * Remove camera response curves for now, they don't really belong here in the pipeline, should be moved to compositor. * Output full float values for rendering now, previously was only byte precision. * Add a patch from Thomas to get a preview passes option, but still disabled because it isn't quite working right yet. * CUDA: don't compile shader graph evaluation inline. * Convert tabs to spaces in python files.	2011-08-28 13:55:59 +00:00
Brecht Van Lommel	63d4bafff5	Cycles: some steps to getting OpenCL backend to compile.	2011-05-20 12:26:01 +00:00
Ton Roosendaal	da376e0237	Cycles render engine, initial commit. This is the engine itself, blender modifications and build instructions will follow later. Cycles uses code from some great open source projects, many thanks them: * BVH building and traversal code from NVidia's "Understanding the Efficiency of Ray Traversal on GPUs": http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/ * Open Shading Language for a large part of the shading system: http://code.google.com/p/openshadinglanguage/ * Blender for procedural textures and a few other nodes. * Approximate Catmull Clark subdivision from NVidia Mesh tools: http://code.google.com/p/nvidia-mesh-tools/ * Sobol direction vectors from: http://web.maths.unsw.edu.au/~fkuo/sobol/ * Film response functions from: http://www.cs.columbia.edu/CAVE/software/softlib/dorf.php	2011-04-27 11:58:34 +00:00

43 Commits