blender

Author	SHA1	Message	Date
Sergey Sharybin	2c503d8303	Cycles: Restructure kernel files organization Since the kernel split work we're now having quite a few of new files, majority of which are related on the kernel entry points. Keeping those files in the root kernel folder will eventually make it really hard to follow which files are actual implementation of Cycles kernel. Those files are now moved to kernel/kernels/<device_type>. This way adding extra entry points will be less noisy. It is also nice to have all device-specific files grouped together. Another change is in the way how split kernel invokes logic. Previously all the logic was implemented directly in the .cl files, which makes it a bit tricky to re-use the logic across other devices. Since we'll likely be looking into doing same split work for CUDA devices eventually it makes sense to move logic from .cl files to header files. Those files are stored in kernel/split. This does not mean the header files will not give error messages when tried to be included from other devices and their arguments will likely be changed, but having such separation is a good start anyway. There should be no functional changes. Reviewers: juicyfruit, dingto Differential Revision: https://developer.blender.org/D1314	2015-05-22 16:31:34 +05:00
Thomas Dinges	53eab562b4	Cleanup: Remove some outdated comments related to split kernel.	2015-05-21 20:32:20 +02:00
Sergey Sharybin	7938bd1877	Cycles: Remove OSL from split headers Split kernel is mainly useful for GPUs which can not support OSL in visible future anyway.	2015-05-21 16:12:50 +05:00
Sergey Sharybin	329f704601	Cycles: Move utility atomics function to util_atomic.h No functional changes, just better to keep all atomic function in a single place, they might become handy later.	2015-05-21 16:12:50 +05:00
Sergey Sharybin	148ed4e05e	Cycles: Cleanup, synchronize name across file name, program and kernel names	2015-05-20 23:10:07 +05:00
Thomas Dinges	dae566894a	Cycles / OpenCL: Enable Camera Motion and Hair for AMD. Only enabled for the Experimental kernel though, so the feature set must be changed in the UI to use the features.	2015-05-17 18:46:25 +02:00
Campbell Barton	daeb3069cf	Cleanup: typos	2015-05-17 16:09:32 +10:00
Campbell Barton	31e96cbf96	Cleanup: style, spelling	2015-05-15 23:38:53 +10:00
Sergey Sharybin	c86a6f3efb	Cycles: Enable CMJ for Intel/NVidia experimental split kernels It is still disabled for AMD devices since can't test if it works fine on this hardware.	2015-05-15 13:22:47 +05:00
Sergey Sharybin	2ab909a88c	Cycles: Make experimental kernel build option more generic Previously it was explicitly mentioning it's NVidia kernel related option, but in fact it's also handy for the OpenCL kernel.	2015-05-15 13:22:47 +05:00
Sergey Sharybin	c9e8888f87	Cycles: Disable bake OpenCL kernel for NVidia devices prior to sm_30 Driver fails to compile kernel in reasonable time for those devices here, so for easier testing of the OpenCL split kernel work disabling bake kernel for now.	2015-05-15 13:22:47 +05:00
Sergey Sharybin	3c10ec96b5	Cycles: Enable object motion blur on Intel OpenCL platform This required allocating some memory related to object transform needed by ShaderData and currently it is done for all the platforms. Since we're targeting full feature-complete platforms this is rather acceptable at this point and in the future we'll do selective NO_HAIR/NO_SSS/NO_BLUR kernels. This is experimental still and in fact there're some major issues on NVidia platform and it's not really clear if it's a bug in compiler, some uninitialized variable or other kind of issue.	2015-05-15 00:48:12 +05:00
Sergey Sharybin	f6c6dd44de	Cycles: Remove meaningless ifdef checks for features in device_opencl This file was actually checking for features enabled on CPU and surely all of them were enabled, so removing them does not cause any difference. ideally we'll need to do runtime feature detection and just pass some stuff as NULL to the kernel, or maybe also have variadic kernel entry points which is also possible quite easily.	2015-05-14 23:44:19 +05:00
Sergey Sharybin	5c34266383	Cycles: Enable camera motion blur in split kernel for Intel/NVidia It's good for testing and seems to work quite reliably here. This is probably not totally cheap in terms of performance, but this we could solve quite easily by selective kernel compilation once other things are tested/proved to be reliable.	2015-05-14 23:35:19 +05:00
Sergey Sharybin	3d3d805b64	Cycles: Prepare code for OpenCL camera/motion blur The kernels are now compiling just fine, but there're some issues during rendering. This is still to be investigated.	2015-05-14 18:48:56 +05:00
Sergey Sharybin	5a63edb929	Cycles: Use special _auto versions of transform function in motion blur code Doing this as a separate commit so it's easier to revert in the future, once OpenCL 2.0 is becoming our requirement.	2015-05-14 18:48:56 +05:00
Sergey Sharybin	79aa50dc53	Cycles: Enable hair for split kernels when using Intel or NVidia drivers Apart from simply enabling this feature, the needed changes to the code were done. Technical change, replacing SD access from "simple" structure to SOA.	2015-05-14 18:48:56 +05:00
Thomas Dinges	fc31bae66f	Cleanup: Avoid temp variable in portal sampling code.	2015-05-13 19:54:52 +02:00
Thomas Dinges	0a6e32173e	Cleanup / Cycles: De-Duplicate Portal data fetch and side check.	2015-05-13 16:05:30 +02:00
Sergey Sharybin	583fd3af65	Cycles: Fix typo in global space version of normal transform It was using direction transform, which is obviously wrong.	2015-05-10 00:53:32 +05:00
Sergey Sharybin	2840a5de8f	Cycles: Workaround for AMD compiler crashing building the split kernel It's a bug in the compiler, but it's nice to have a working kernel until that bug is fixed.	2015-05-09 19:56:38 +05:00
George Kyriazis	7f4479da42	Cycles: OpenCL kernel split This commit contains all the work related to the AMD megakernel split work which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely someone else which we're forgetting to mention. Currently only AMD cards are enabled for the new split kernel, but it is possible to force split opencl kernel to be used by setting the following environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1. Not all the features are supported yet, and that being said no motion blur, camera blur, SSS and volumetrics for now. Also transparent shadows are disabled on AMD device because of some compiler bug. This kernel also only implements regular path tracing and supporting branched one will take a bit. Branched path tracing is exposed to the interface still, which is a bit misleading and will be hidden there soon. More features will be enabled once they're ported to the split kernel and tested. Neither regular CPU nor CUDA has any difference, they're generating the same exact code, which means no regressions/improvements there. Based on the research paper: https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf Here's the documentation: https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit Design discussion of the patch: https://developer.blender.org/T44197 Differential Revision: https://developer.blender.org/D1200	2015-05-09 19:52:40 +05:00
Sergey Sharybin	6fc1669679	Cycles: Initial work towards selective nodes support compilation The goal is to be able to compile kernel with nodes which are actually needed to render current scene, hence improving performance of the kernel, The idea is: - Have few node groups, starting with a group which contains nodes are used really often, and then couple of groups which will be extension of this one. - Have feature-based nodes disabling, so it's possible to disable nodes related to features which are not used with the currently used nodes group. This commit only lays down needed routines for this approach, actual split will happen later after gathering statistics from bunch of production scenes.	2015-05-09 19:22:16 +05:00
Sergey Sharybin	5068f7dc01	Cycles: Add utility function to graph to query number of closures used in it Currently unused but will be needed soon for the split kernel work.	2015-05-09 19:13:32 +05:00
Sergey Sharybin	d69c80f717	Cycles: Presumably correct workaround for addrspace in camera motion blur	2015-05-09 19:04:19 +05:00
Sergey Sharybin	c9133778cf	Cycles: Add CPU compat headers to some of the OSL implementation files This header was already included into some of the implementation files already, and this change is needed for some upcoming changes in the way how kernel_types.h works.	2015-05-09 19:04:16 +05:00
Thomas Dinges	900fc43bb4	Cleanup: Remove unused ray type flags. They were added for completeness, but it seems we don't need them.	2015-05-08 12:10:26 +02:00
Sergey Sharybin	9ca2b76a9f	Cycles: Cleanup, make it more clear what endif closes what ifdef	2015-05-07 15:02:43 +05:00
Campbell Barton	165598e49e	Correct typo: ifdef'd now, but obviously wrong	2015-05-07 10:12:12 +10:00
Sv. Lockal	7201f6d14c	Cycles: Use curve approximation for blackbody instead of lookup table Now we calculate color in range 800..12000 using an approximation a/x+bx+c for R and G and ((at + b)t + c)t + d for B. Max absolute error for RGB for non-lut function is less than 0.0001, which is enough to get the same 8 bit/channel color as for OSL with a noticeable performance difference. However there is a slight visible difference between previous non-OSL implementation because of lookup table interpolation and offset-by-one mistake. The previous implementation gave black color outside of soft range (t > 12000), now it gives the same color as for 12000. Also blackbody node without input connected is being converted to value input at shader compile time. Reviewers: dingto, sergey Reviewed By: dingto Subscribers: nutel, brecht, juicyfruit Differential Revision: https://developer.blender.org/D1280	2015-05-05 06:11:54 +00:00
Thomas Dinges	4eab0e72b3	Cleanup: Update some comments and add ToDo.	2015-04-29 23:56:46 +02:00
Thomas Dinges	b3def11f5b	Cycles: Record all possible volume intersections for SSS and camera checks This replaces sequential ray moving followed with scene intersection with single BVH traversal, which gives us all possible intersections. Only implemented for CPU, due to qsort and a bigger memory usage on GPU which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code. This improves render performance for scenes with: a) Camera inside volume mesh b) SSS mesh intersecting a volume mesh/domain In simple volume files (not much geometry) performance is roughly the same (slightly faster). In files with a lot of geometry, the performance increase is larger. bmps.blend with a volume shader and camera inside the mesh, it renders ~10% faster here. Patch by Sergey and myself. Differential Revision: https://developer.blender.org/D1264	2015-04-29 23:31:06 +02:00
Sergey Sharybin	7aab5c6ca9	Cycles: Fix wrong termination criteria in SSS volume stack update Another issue spotted with Thomas.	2015-04-30 01:20:17 +05:00
Thomas Dinges	5e423775da	Cleanup: Move Cycles volume stack update for subsurface into kernel_volume.h.	2015-04-28 11:20:27 +02:00
Thomas Dinges	58a2b10a65	Cycles: Initialize portal variable directly, so we can avoid the one NULL check.	2015-04-27 23:12:53 +02:00
Lukas Stockner	f478c2cfbd	Cycles: Added support for light portals This patch adds support for light portals: objects that help sampling the environment light, therefore improving convergence. Using them for other lights in a unidirectional pathtracer is virtually useless. The sampling is done with the area-preserving code already used for area lamps. MIS is used both for combination of different portals and for combining portal- and envmap-sampling. The direction of portals is considered, they aren't used if the sampling point is behind them. Reviewers: sergey, dingto, #cycles Reviewed By: dingto, #cycles Subscribers: Lapineige, nutel, jtheninja, dsisco11, januz, vitorbalbio, candreacchio, TARDISMaker, lichtwerk, ace_dragon, marcog, mib2berlin, Tunge, lopataasdf, lordodin, sergey, dingto Differential Revision: https://developer.blender.org/D1133	2015-04-28 01:30:16 +05:00
Sergey Sharybin	ae7d84dbc1	Cycles: Use native saturate function for CUDA This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1) into a single instruction and uses 4 instructions instead. Original patch by @lockal with own modification: Don't make changes outside of the kernel. They don't make any difference anyway and term saturate() has a bit different meaning outside of kernel. This gives around 2% of speedup in Barcelona file, but in more complex shader setups with lots of math nodes with clamping speedup could be much nicer. Subscribers: dingto Projects: #cycles Differential Revision: https://developer.blender.org/D1224	2015-04-28 00:38:32 +05:00
Thomas Dinges	bc160d8a85	Cleanup: Code style.	2015-04-26 00:42:26 +02:00
Lukas Stockner	60c5a2f2d2	Cycles: Add Mirror ball mapping to camera panorama options The projection code was already in place, so this just exposes the option. Differential Revision: https://developer.blender.org/D1079	2015-04-25 23:51:56 +02:00
Campbell Barton	b82d571c85	Cleanup: style	2015-04-21 15:53:32 +10:00
Sergey Sharybin	828abaf11c	Cycles: Split BVH nodes storage into inner and leaf nodes This way we can get rid of inefficient memory usage caused by BVH boundbox part being unused by leaf nodes but still being allocated for them. Doing such split allows to save 6 of float4 values for QBVH per leaf node and 3 of float4 values for regular BVH per leaf node. This translates into following memory save using 01.01.01.G rendered without hair: Device memory size Device memory peak Global memory peak Before the patch: 4957 5051 7668 With the patch: 4467 4562 7332 The measurements are done against current master. Still need to run speed tests and it's hard to predict if it's faster or not: on the one hand leaf nodes are now much more coherent in cache, on the other hand they're not so much coherent with regular nodes anymore. Reviewers: brecht, juicyfruit Subscribers: venomgfx, eyecandy Differential Revision: https://developer.blender.org/D1236	2015-04-20 17:29:51 +05:00
Sergey Sharybin	bf11e362c5	Fix T44046: Cycles speed regression in 2.74 (CPU only) Issue was caused by MSVC not being able to optimize some code out in the same way as GCC/Clang does, so now those parts of code are explicitly unfolded in order to help compilers out. This makes speed loss much less drastic on my laptop. That's probably as good as we can do with MSVC without investing infinite amount of time trying to work around the optimizer.	2015-04-08 18:47:25 +05:00
Sergey Sharybin	09a746b857	Cycles: Cleanup, typos	2015-04-08 01:15:38 +05:00
Sergey Sharybin	858f54f16e	Cycles: Cleanup, indentation	2015-04-07 22:41:08 +05:00
Sergey Sharybin	e2354e64d2	Cycles: Cleanup, spaces around assignment operator Did some bad spacing in recent commits, better to get rid of those so they do not confuse those who're working on sources.	2015-04-07 00:25:54 +05:00
Sergey Sharybin	c1d8ddacaf	Cycles: Avoid doing paranoid checks in filepath of builtin images Originally we thought it's needed in order to distinguish builtin file from filename which starts with '@', but the filepath is actually full path there and it's unlikely to have file system where '@' is a proper root character. Surprisingly this does not give visible speed differences, but it's still nice to get rid of redundant check.	2015-04-07 00:11:47 +05:00
Sergey Sharybin	7c19239bf9	Cycles: Support builtin 3d textures with OSL backend	2015-04-06 23:29:29 +05:00
Sergey Sharybin	a9bb8d8a73	Cycles: de-duplicate fast/approximate erf function calculation Our own implementation is in fact the same performance as in fast_math from OpenShadingLanguage, but implementation from fast_math is using explicit madd function, which increases chance of compiler deciding to use intrinsics.	2015-04-06 12:49:44 +05:00
Sergey Sharybin	ab2d05d958	Fix T44269: Typo in volume_attribute_float:geom_volume.h Was rather harmless typo since we either pass both dx,dy or pass both NULL.	2015-04-05 19:07:45 +05:00
Sergey Sharybin	b06962fcfe	Cycles: Avoid using lookup table for Beckmann slopes on GPU This patch is based on some work done in D788 and re-formulation from Beckmann implementation in OpenShadingLanguage. Skipping texture lookup helps a lot on GPUs where it's more expensive to access texture memory than to do some extra calculation in threads. CPU code still uses lookup-table based approach since this seems to be still faster (at least on computers i've got access to). This change gives about 2% speedup on BMW scene with GTX560TI.	2015-04-05 19:07:45 +05:00

1167 Commits