Commit Graph

1698 Commits

Author SHA1 Message Date
Lukas Stockner
ef816f9cff Cycles: Fix the AO replacement option in the split kernel
Currently the code for it was inside the hair-specific part, so it wouldn't be enabled in hairless renders.
2017-04-11 01:07:49 +02:00
Sergey Sharybin
3b4cc5dfed Cycles: Workaround cubic volume filtering crashing on Linux
The issue was caused by recent change in inline policy.

There is some sort of memory corruption happening here, ASAN suggests
it's stack overflow issue. Not quite sure why it is happening tho and
was not able to solve anything here yet in the past hours.

Committing fix which works with a big TODO note.

The issue is visible on AVX2 machine when rendering cycles_reports_test.
2017-04-10 14:44:07 +02:00
Sergey Sharybin
90d85c7975 Cycles: Fix compilation error of AVX2 kernels with SSE optimization disabled 2017-04-10 14:44:04 +02:00
Sergey Sharybin
c3d393c1df Cycles: Cleanup, indentation and trailing whitespace 2017-04-10 14:44:04 +02:00
Mai Lavelle
b60d4800c6 Cycles: Fix building of CUDA kernels with compilers where C++11 is disabled 2017-04-08 07:12:04 -04:00
Sergey Sharybin
7d77b3e813 Cycles: Fix compilation error with certain CUDA and host compiler configuration
This seems to happen on Windows only, happened to Thomas and Nathan already.

Similar patch Thomas was showing, but i do not see it committted. So comitting
now in order to get more developers and users happy.
2017-04-07 18:28:38 +02:00
lazydodo
b332fc8f23 [Cycles/msvc] Get cycles_kernel compile time under control.
Ever since we merged the extra texture types (half etc) and spit kernel the compile time for cycles_kernel has been going out of control.

It's currently sitting at a cool 1295.762 seconds with our standard compiler (2013/x64/release)

I'm not entirely sure why msvc gets upset with it, but the inlining of matrix near the bottom of the tri-cubic 3d interpolator is the source of the issue, this patch excludes it from being inlined.

This patch bring it back down to a manageable 186 seconds. (7x faster!!)

with the attached bzzt.blend that @sergey  kindly provided i got the following results with builds with identical hashes

58:51.73 buildbot
58:04.23 Patched

it's really close, the slight speedup could be explained by the switch instead of having multiple if's (switches do generate more optimal code than a chain of if/else/if/else statements) but in all honesty it might just have been pure luck (dev box,very polluted, bad for benchmarks) regardless, this patch doesn't seem to slow down anything with my limited testing.

{F532336}

{F532337}

Reviewers: brecht, lukasstockner97, juicyfruit, dingto, sergey

Reviewed By: brecht, dingto, sergey

Subscribers: InsigMathK, sergey

Tags: #cycles

Differential Revision: https://developer.blender.org/D2595
2017-04-07 10:26:55 -06:00
Mai Lavelle
91b9db0724 Cycles: Change work pool and global size of split CPU for easier debugging 2017-04-07 06:06:08 -04:00
Mai Lavelle
8f85ee2fc9 Cycles: Fix indentation 2017-04-07 06:06:08 -04:00
Sergey Sharybin
ced8fff5de Fix T51051: Incorrect render on 32bit Linux
The issue was apparently caused by -fno-finite-math-only added to kernel.cpp
CFLAGS. For now just removed this flag from the kernel (we don't really want
it there at this point, and we don't have it for SSE/AVX optimized kernels).

But surely more investigation is needed here.
2017-03-30 11:37:31 +02:00
Sergey Sharybin
48fa2c83eb Cycles: Attempt to work around compilation errors of CUDA on sm_2x 2017-03-29 16:22:51 +02:00
Sergey Sharybin
be17445714 Cycles: Cleanup, indentation 2017-03-29 15:41:56 +02:00
Sergey Sharybin
cc7386ec6b Cycles: Remove toolkit-specific workaround from kernel 2017-03-29 15:07:53 +02:00
Sergey Sharybin
30bed91b78 Cycles: Fix compilation error with visibility flag disabled 2017-03-29 14:28:45 +02:00
Sergey Sharybin
0579eaae1f Cycles: Make all #include statements relative to cycles source directory
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.

For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.

Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.

This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.

Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.

Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner

Reviewed By: lukasstockner97, maiself, nirved, dingto

Subscribers: brecht

Differential Revision: https://developer.blender.org/D2586
2017-03-29 13:41:11 +02:00
Sergey Sharybin
6ea54fe9ff Cycles: Switch to reformulated Pluecker ray/triangle intersection
The intention of this commit it to address issues mentioned in the
reports T43865,T50164 and T50452.

The code is based on Embree code with some extra vectorization
to speed up single ray to single triangle intersection.

Unfortunately, such a fix is not coming for free. There is some
slowdown for AVX2 processors, mainly due to different vectorization
code, which caused different number of instructions to be executed
and different instructions-per-cycle counters. But on another hand
this commit makes pre-AVX2 platforms such as AVX and SSE4.1 a bit
faster. The prerformance goes as following:

              2.78c AVX2   2.78c AVX   Patch AVX2         Patch AVX
BMW            05:21.09     06:05.34    05:32.97 (+3.5%)   05:34.97 (-8.5%)
Classroom      16:55.36     18:24.51    17:10.41 (+1.4%)   17:15.87 (-6.3%)
Fishy Cat      08:08.49     08:36.26    08:09.19 (+0.2%)   08:12.25 (-4.7%
Koro           11:22.54     11:45.24    11:13.25 (-1.5%)   11:43.81 (-0.3%)
Barcelone      14:18.32     16:09.46    14:15.20 (-0.4%)   14:25.15 (-10.8%)

On GPU the performance is about 1.5-2% slower in my tests on GTX1080
but afraid we can't do much as a part of this chaneg here and
consider it a price to pay for more proper intersection check.

Made in collaboration with Maxym Dmytrychenko, big thanks to him!

Reviewers: brecht, juicyfruit, lukasstockner97, dingto

Differential Revision: https://developer.blender.org/D1574
2017-03-28 17:26:47 +02:00
Thomas Dinges
7a65f9b171 Cleanup: Resolve todo in CUDA voxel image code. 2017-03-27 22:36:26 +02:00
Sergey Sharybin
8d48ea0233 Cycles: Make shadow catcher an optional feature for OpenCL
Solves majority of speed regression on AMD OpenCL.
2017-03-27 10:47:14 +02:00
Hristo Gueorguiev
e07ffcbd1c Cycles: Add OpenCL support for shadow catcher feature
The title says it all actually.
2017-03-27 10:46:59 +02:00
Hristo Gueorguiev
8ada7f7397 Cycles: Remove ccl_addr_space from RNG passed to functions
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
2017-03-27 10:46:28 +02:00
Sergey Sharybin
d14e39622a Cycles: First implementation of shadow catcher
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.

In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.

The workflow is following:

- Create an object which matches real-life object on which shadow is
  to be catched.

- Create approximate similar material on that object.

  This is needed to make indirect light properly affecting CG objects
  in the scene.

- Mark object as Shadow Catcher in the Object properties.

Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
2017-03-27 10:46:03 +02:00
Sergey Sharybin
5b45715f8a Cycles: Correct isfinite check used in integrator
Use fast-math friendly version of this function.

We should probably avoid unsafe fast math, but this is to be done with
real care with all the benchmarks properly done.

For now comitting much safer fix.
2017-03-24 15:39:33 +01:00
Sergey Sharybin
85a5fbf2ce Cycles: Workaround incorrect SSS with CUDA toolkit 8.0.61 2017-03-24 10:08:18 +01:00
Sergey Sharybin
27248c8636 Cycles: Remove unused macro 2017-03-23 17:59:02 +01:00
Sergey Sharybin
ba8c7d2ba1 Cycles: Use SSE-optimized version of triangle intersection for motion triangles
The title says it all actually. Gives up to 10% speedup on test scenes here
on i7-6800K.

Render times on GPU are unreliable here, but there might be some slowdown
caused by watertight nature of intersections.
2017-03-23 17:58:03 +01:00
Sergey Sharybin
a1348dde2e Cycles: Fix speed regression on GPU
Avoid construction of temporary array and make utility function force-inlined.
Additionally avoid calling float4_to_float3 twice.

This brings render times to the same values as before current patch series.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
2a5d7b5b1e Cycles: Use utility function for SSS triangle intersection
This effectively de-duplicates triangle intersection logic implemented
for both regular triangle and SSS triangle.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
a5b6742ed2 Cycles: Move watertight triangle intersection to an utility file
This way the code can be reused more easily.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
f8a999c965 Cycles: Move triangle intersection precalc to an util file
This is a preparation work for the followup commit which wil l move
remaining parts of Woop intersection logic to an utility file.

Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
b797a5ff78 Cycles: Cleanup, move utility function to utility file
Was an old TODO, this function is handy for some math utilities as well.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
1c5cceb7af Cycles: Move intersection math to own header file
There are following benefits:

- Modifying intersection algorithm will not cause so much re-compilation.
- It works around header dependency hell and allows us to use vectorization
  types much easier in there.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
e8ff06186e Cycles: Cleanup, inline AVX register construction from kernel global data
Currently should be no functional changes, preparing for some upcoming refactor.
2017-03-23 17:45:19 +01:00
Sergey Sharybin
2b44db4cfc Fix/workaround T50533: Transparency shader doesn't cast shadows with curve segments
There seems to be a compiler bug of MSVC2013. The issue does not happen on Linux and
does not happen on Windows when building with MSVC2015.

Since it's reallly a pain to debug release builds with MSVC2013 the AVX2 optimization
is disabled for curve sergemnts for this compiler.
2017-03-22 11:37:23 +01:00
Mai Lavelle
8fff6cc2f5 Cycles: Fix building of OpenCL kernels
Theres no overloading of functions in OpenCL so we can't make use of
`safe_normalize` with `float2`.
2017-03-20 22:55:52 -04:00
Sergey Sharybin
a201b99c5a Fix T50975: Cycles: Light sampling threshold inadvertently clamps negative lamps 2017-03-20 14:48:55 +01:00
Sergey Sharybin
18bf900b31 Fix T50990: Random black pixels in Cycles when rendering material with Multiscatter GGX 2017-03-20 12:07:41 +01:00
Sergey Sharybin
d6b4fb6429 Cycles: Fix mistake in previous split kernel commits
Own stupid mistake. Reported by nirved in IRC, thanks!
2017-03-17 11:55:59 +01:00
Sergey Sharybin
a58350b07f Cycles: Cleanup, indentation 2017-03-17 10:25:37 +01:00
Sergey Sharybin
e361adbca2 Cycles: Fix compilation error of LCG RNG 2017-03-17 09:58:08 +01:00
Mai Lavelle
60a344b43d Cycles: Fix handling of barriers 2017-03-17 01:54:04 -04:00
Sergey Sharybin
1cad64900e Cycles: Define ccl_local variables in kernel functions
Declaring ccl_local in a device function is not supported
by certain compilers.
2017-03-16 11:27:17 +01:00
Sergey Sharybin
1ff753baa4 Cycles: Workaround for compilation error caused by passing KernelGlobals
Pass globals as a bare pointer, same as it sued to be prior to split kernel rework.

AMD CPU platform and Intel OpenCL were complaining about this.

Perhaps we shouldn't pass globals as pointer at all, this isn't something what is
really portable and can cause issues on 32 bit perhaps.
2017-03-16 11:27:17 +01:00
Sergey Sharybin
26620f3f87 Cycles: Avoid some ccl_local in various kernels 2017-03-16 11:27:17 +01:00
Mai Lavelle
8dd0355c21 Cycles: Try to avoid infinite loops by catching invalid ray states 2017-03-14 06:22:57 -04:00
Sergey Sharybin
76acaefdd7 Cycles: Cleanup, wipe obviously outdated parts of split kernel comments 2017-03-13 17:16:16 +01:00
lazydodo
0c72008592 fix msvc warnings about unknown opencl pragmas 2017-03-13 10:08:14 -06:00
Sergey Sharybin
aa36c73c33 Cycles: Add missing header in the file 2017-03-13 16:59:09 +01:00
Hristo Gueorguiev
f169ff8b88 Fix T50925: Add AO approximation to split kernel 2017-03-13 11:15:58 +01:00
Sergey Sharybin
8794a43b68 Cycles: Make MESA compiler more happy
While this compiler is not officially supported yet, getting it to work is
a nice thing because more and more AMD cards will fall under MESA driver.

It's also nice to use explicit comparison with NULL, which makes it more
clear whether variable is a boolean or pointer. Even Rust enforces this!

Patch by Ian Bruce with own modifications.
2017-03-13 09:57:25 +01:00
Mai Lavelle
96868a3941 Fix T50888: Numeric overflow in split kernel state buffer size calculation
Overflow led to the state buffer being too small and the split kernel to
get stuck doing nothing forever.
2017-03-11 05:39:28 -05:00