The idea is to switch from allocating separate buffers for shader data's
structure of arrays to allocating one huge memory block and do some index
trickery to make it accessed as SOA.
This saves quite reasonable amount of lines of code in device_opencl and
also makes it possible to get rid of special declaration of ShaderData
structure.
As a side effect it also makes it easier to experiment with SOA vs. AOS
for split kernel.
Works fine here on NVidia GTX580, Intel CPU amd AMD Fiji cards.
Reviewers: #cycles, brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1593
It was mainly unfinished code for volume in a split kernel which
should be done differently anyway to avoid such a code copy-paste.
The code didn't really work, so likely nobody will cry.
Use KernelGlobals to access all the global arrays for the intermediate
storage instead of passing all this storage things explicitly.
Tested here with Intel OpenCL, NVIDIA GTX580 and AMD Fiji, didn't see
any artifacts, so guess it's all good.
Reviewers: juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1736
There is no function pointers in OpenCL specification. For as long
as we want to support this platform we should follow the specifications.
While the code is not totally optimal now, it should not be that huge
of performance issue on CPU since it does jump tables just nicely, so
it's not that much extra computation here.
The combined pass is built with the contributions the user finds fit.
It is useful for lightmap baking, as well as non-view dependent effects
baking.
The manual will be updated once we get closer to the 2.77 release.
Meanwhile the new page can be found here:
http://dalaifelinto.com/blender-manual/render/cycles/baking.html
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D1674
This commit removes the experimental CUDA kernel, making SSS and CMJ
regular features.
Several improvements have been made in the past few
weeks (thanks Sergey!) which make SSS render several times faster (2-3x
compared to 2.76b) on the GPU, and the increased VRAM usage has also been
fixed. Therefore the experimental kernel is no longer needed.
Differential Revision: https://developer.blender.org/D1726
Manual has been updated: too:
https://www.blender.org/manual/render/cycles/features.html
The goal is to make Experimental kernel closer in performance to the
official kernel, avoiding spills and such.
There should not be big impact on official kernel, own tests showed
few percent performance drop on laptop's GPU. CPU was always the
same speed on AVX, AVX2 and SSE4.1 CPUs i've been testing here.
This seems to be the last essential step before we can get rid of
Experimental kernel and enable SSS officially on GPU without causing
some major performance issues.
Surely some more tweaks are possibly required, but that we can do
for until cows go home anyway.
While previous code was already compiling with OSL 1.6 it was using some symbols
which were considered deprecated in upstream.
This commit adds some ifdefs, but soon we'll get rid of all them rather soon
with the upcoming OIIO/OSL update.
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.
This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.
Reviewed by Sergey and Brecht, thanks for some hlp with this.
I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
While SCons building system was serving us really good for ages it's no longer
having much attention by the developers and started to become quite a difficult
task to maintain.
What's even worse -- there started to be quite serious divergence between SCons
and CMake which was only accumulating over the releases now. The fact that none
of the active developers are really using SCons and that our main studio is also
using CMake spotting bugs in the SCons builds became quite a difficult task and
we aren't always spotting them in time.
Meanwhile CMake became really mature building system which is available on every
platform we support and arguably it's also easier and more robust to use.
This commit includes:
- Removal of actual SCons building system
- Removal of SCons git submodule
- Removal of documentation which is stored in the sources and covers SCons
- Tweaks to the buildbot master to stop using SCons submodule
(this change requires deploying to the server)
- Tweaks to the install dependencies script to skip installing or mentioning
SCons building system
- Tweaks to various helper scripts to avoid mention of SCons folders/files
as well
Reviewers: mont29, dingto, dfelinto, lukastoenne, lukasstockner97, brecht, Severin, merwin, aligorith, psy-fi, campbellbarton, juicyfruit
Reviewed By: campbellbarton, juicyfruit
Differential Revision: https://developer.blender.org/D1680
This commit adds "Bands Saw" and "Rings Saw" to the options for the Wave texture node in Cycles, behaving similar to the Saw option in BI textures.
Requested by @cekuhnen on BA.
Reviewers: dingto, sergey
Subscribers: cekuhnen
Differential Revision: https://developer.blender.org/D1699
This is an attempt to emulate real CMOS cameras which reads sensor by scanlines
and hence different scanlines are sampled at a different moment in time, which
causes so called rolling shutter effect. This effect will, for example, make
vertical straight lines being curved when doing horizontal camera pan.
This is controlled by the Shutter Type option in the Motion Blur panel.
Additionally, since scanline sampling is not instantaneous it's possible to have
motion blur on top of rolling shutter.
This is controlled by the Rolling Shutter Time slider which controls balance
between pure rolling shutter effect and pure motion blur effect.
Reviewers: brecht, juicyfruit, dingto, keir
Differential Revision: https://developer.blender.org/D1624
Vector mapping node was doing some weird mapping of both original and mapped
coordinates. Mapping of original coordinates was caused by the clamping nature
of the LUT generated from the node. Mapping of the mapped value again was quite
totally obscure -- one needed to constantly keep in mind that actual value will
be scaled up and moved down.
This commit makes it so values in the vector curve mapping are always absolute.
In fact, it is now behaving quite the same as RGB curve mapping node and the
code could be de-duplicated. Keeping the code duplicated for a bit so it's more
clear what exact parts of the node changed.
Reviewers: brecht
Subscribers: bassamk
Differential Revision: https://developer.blender.org/D1672
This makes it possible to move some parts of evaluation from host to the device
and hopefully reduce memory usage by avoid having full RGBA buffer on the host.
Reviewers: juicyfruit, lukasstockner97, brecht
Reviewed By: lukasstockner97, brecht
Differential Revision: https://developer.blender.org/D1702
Main goal is to make kernel signatures editing easier and less prone to the
errors caused by missing function signature update or so.
This will also make it easier to add new CPU architectures.
Reviewers: juicyfruit, dingto, lukasstockner97, brecht
Reviewed By: dingto, lukasstockner97, brecht
Differential Revision: https://developer.blender.org/D1703
Previously RGB Curves node will clamp input to 0..1 which is rather useless
when one wants to use HDR image textures and do bit of correction on them.
Now kernel code supports extrapolation of baked LUT based on first/last two
table points and performs linear extrapolation.
The only tricky part is to guess the range to bake the LUT for. Currently
it's using simple approach -- minmax of the input curves. While this behaves
ok for the simple cases it's easy to trick the system up causing incorrect
results.
Not sure we can solve those issues in a general case and since the new code
is giving more expected results it's not that bad actually. In the worst
case artist migh always create explicit point to make sure LUT is created
for the needed HDR range.
Reviewers: brecht, juicyfruit
Subscribers: sebastian_k
Differential Revision: https://developer.blender.org/D1658
This must have happened months ago, but as I did not `make clean` any build folder since then,
so only noted that today.
Issue is same as dirty patch we have to apply to ODL sources before building it in install_deps.sh - for
some mysterious reason, it has become impossible to compoile .osl files into .oso ones without
giving explicit output file name (otherwise it just produces `.oso` file - utterly stupid and useless).
We could probably fix that in own OSL source, but think being explicit here does not hurt anyway, so...
Let's go the easy way.
There were multiple issues which are solved now:
- It was possible that ray wouldn't be bounced off the BSSRDF, for example
when PDF or shader eval is zero. In this case PathState might have been
left in pre-bounced state which would have been gave incorrect shading
results.
This is solved by having separate PathState for each of the hits.
- Path radiance summing wasn't happening correct as well, indirect rays
were using wrong path radiance in the case when there were more than
one hit recorded.
This is now using a bit trickier state machine which calculates path
radiance for just SSS (both direct and indirect) and then sums it back
to the final radiance.
- Previous commit wasn't totally correct either and was an induced bug
due to wrong path state left from the "un-happened" ray bounce.
There should be no special case happening here, BSSRDFs will be replaced
with diffuse ones due to PATH_RAY_DIFFUSE_ANCESTOR flag.
- Merged back codebases for "delayed" and "immediate" indirect SSS ray
tracing, hopefully making it easier to maintain the codebase.
Sure this changes brings memory usage back by about 4-5%, but overall
it's still about 2x memory reduction for the experimental kernel here.
Thanks Brecht for the review!
This is actually how it was intended to work, just didn't notice it wasn't
really happening in the main ray loop.
Solves some memory issues reported in T46880.