Commit Graph

50 Commits

Author SHA1 Message Date
Brecht Van Lommel
f23fecf306 Fix use of uninitialized variable in recent SSS fix. 2016-07-24 16:40:30 +02:00
Sergey Sharybin
9946cca146 Fix T48860: Cycles SSS artifacts with spatially split BVH
The issue was caused by SSS intersection code gathering all
intersections without check for duplicated ones. This caused
situations when same intersection will be recorded twice in
the case if triangle is shared by several BVH nodes.

Usually this is handled by checking intersection distance
after sorting intersections (in shadow_blocked for example)
but for SSS we don't do such sorting and using number of
intersections to calculate various things.

Didn't find anything smarter than to check intersection
distance in triangle_intersect_subsurface().

This solves render artifacts in the cost of 1.5% slowdown
of extreme case rendering (SSS object filling in whole
FullHD screen).

Reviewers: brecht

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2105
2016-07-18 10:04:20 +02:00
Sergey Sharybin
17e7454263 Cycles: Reduce memory usage by de-duplicating triangle storage
There are several internal changes for this:

First idea is to make __tri_verts to behave similar to __tri_storage,
meaning, __tri_verts array now contains all vertices of all triangles
instead of just mesh vertices. This saves some lookup when reading
triangle coordinates in functions like triangle_normal().

In order to make it efficient needed to store global triangle offset
somewhere. So no __tri_vindex.w contains a global triangle index which
can be used to read triangle vertices.

Additionally, the order of vertices in that array is aligned with
primitives from BVH. This is needed to keep cache as much coherent as
possible for BVH traversal. This causes some extra tricks needed to
fill the array in and deal with True Displacement but those trickery
is fully required to prevent noticeable slowdown.

Next idea was to use this __tri_verts instead of __tri_storage in
intersection code. Unfortunately, this is quite tricky to do without
noticeable speed loss. Mainly this loss is caused by extra lookup
happening to access vertex coordinate.

Fortunately, tricks here and there (i,e, some types changes to avoid
casts which are not really coming for free) reduces those losses to
an acceptable level. So now they are within couple of percent only,

On a positive site we've achieved:

- Few percent of memory save with triangle-only scenes. Actual save
  in this case is close to size of all vertices.

  On a more fine-subdivided scenes this benefit might become more
  obvious.

- Huge memory save of hairy scenes. For example, on koro.blend
  there is about 20% memory save. Similar figure for bunny.blend.

This memory save was the main goal of this commit to move forward
with Hair BVH which required more memory per BVH node. So while
this sounds exciting, this memory optimization will become invisible
by upcoming Hair BVH work.

But again on a positive side, we can add an option to NOT use Hair
BVH and then we'll have same-ish render times as we've got currently
but will have this 20% memory benefit on hairy scenes.
2016-07-07 17:25:48 +02:00
Sergey Sharybin
b595a692c8 Cycles: Limit degenerated triangle check got CUDA only
OpenCL seems to work fine here, and for some reason that comparison was
giving compilation error on OpenCL here.

Better to compile OpenCL kernel than to be fully robust to weird corner
cases.
2016-06-07 15:48:56 +02:00
Sergey Sharybin
f54a98a1c5 Cycles: Simplify check for degenerated faces on GPU
Still not sure how to properly solve the issue, needs some trickery to get
actual optimized values from intersection function (using printf() avoids
some optimization and makes stuff render correct).

For the time being let's just simplify check.
2016-06-03 10:36:04 +02:00
Sergey Sharybin
6cd13a221f Cycles: Rename tri_woop to tri_storage
It's no longer a pre-computed data and just a storage of triangle
coordinates which are faster to access to.
2016-04-11 17:18:14 +02:00
Sergey Sharybin
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
Sergey Sharybin
c93069083e Cycles: Tweaks for 32bit CUDA binaries
Tweak some inline policies. Not totally crazy yet, and in fact we now
have one less ifdef statement now.
2016-02-15 19:11:02 +01:00
Sergey Sharybin
72e31d6a72 Cycles: Always inline triangle precalc for CUDA devices
Since the SSS changes compiling Experimental sm_52 kernel seems
to work just fine.
2016-01-11 21:41:00 +05:00
Sergey Sharybin
a6bbf05ba6 Cycles: Fix wrong SSS intersection refinement when this option is disabled
The code is disabled by default, but we'd better keep it all correct.
2015-12-02 03:14:54 +05:00
Sergey Sharybin
8bca34fe32 Cysles: Avoid having ShaderData on the stack
This commit introduces a SSS-oriented intersection structure which is replacing
old logic of having separate arrays for just intersections and shader data and
encapsulates all the data needed for SSS evaluation.

This giver a huge stack memory saving on GPU. In own experiments it gave 25%
memory usage reduction on GTX560Ti (722MB vs. 946MB).

Unfortunately, this gave some performance loss of 20% which only happens on GPU.
This is perhaps due to different memory access pattern. Will be solved in the
future, hopefully.

Famous saying: won in memory - lost in time (which is also valid in other way
around).
2015-11-25 13:01:22 +05:00
Sergey Sharybin
47b1279762 Cycles: Watertight fix for SSS intersection
Same as previous commit, just was missing in there.
2015-10-22 22:10:40 +05:00
Sergey Sharybin
f84cbae43e Cycles: Fix for watertight intersection
It was possible to miss some intersection caused by wrong barycentric
coordinates sign.

Cases when one of the coordinate is zero and other are negative was not
handled correct.
2015-10-22 22:07:28 +05:00
Sergey Sharybin
3cee28ebf3 Fix T46143: Faces missing with GPU render
Epsilon was quite arbitrary for GPU, replaced with checking for zero-sized faces.

It should solve both original report and the new one. After the release we can check
why GPU doesn't produce accurate math here and go to the root of the issue.
2015-09-17 17:21:17 +05:00
Sergey Sharybin
1a04179802 Cycles: Cleanup, typo
Spotted by Campbell, thanks!
2015-09-09 14:25:43 +05:00
Sergey Sharybin
d13a0e8f4a Cycles: Limit triangle magnitude check for only GPU
Found a way to make AVX2 CPUs happy by reshuffling instructions a bit,
so now there's no weird precision errors happening in there.

This solves some render speed regressions on CPU, but unfortunately
this doesn't help for GPU rendering.
2015-09-09 13:39:36 +05:00
Sergey Sharybin
46d2abf78f Cycles: Only use ascii in comments 2015-09-09 13:39:36 +05:00
Sergey Sharybin
1a7eca3c54 Fix T46034: OpenCL kernel compilation error in latest buildbot
Simply expanded expression, so no float4->float3 conversion happens.
2015-09-07 15:02:44 +05:00
Sergey Sharybin
713ce037ab Cycles: Fix wrong check for zero-sized triangles
Initial idea was to optimize calculation a bit by skipping calculation of actual
triangle edges and use vector from ray origin to triangles. In practice this
optimization didn't quite work in cases when origin point is too close to the
triangle.

Let's do 2.76 with a bit more complicated calculation, still looking into exact
reasons why watertight intersections fails in certain cases, but actual fix might
bit be ready so soon.

This fixes wrong eyes on the lady from T46013.
2015-09-04 20:06:31 +05:00
Sergey Sharybin
7dc75ea8f4 Fix T45904: Cycles bug after recent triangle intersect changes
Calculated cross product from wrong vectors by accident.
2015-08-25 18:32:11 +02:00
Sergey Sharybin
2fb639deed Fix T45778: Objects scaled to 0 cause black artifacts with Static BVH
The issue was caused by some numeric instability in triangle intersection which
was visible on avx2 CPUs and GPUs (at least sm_20 here) but maybe some others
too.

Committing rather a workaround for now to be safe for the release, still need
some investigation.

From tests with grass field from Gooseberry project didn't see measurable
slowdown.
2015-08-24 21:23:49 +02:00
Sergey Sharybin
a95b0e0e9d Cycles: Another fix for OSX, sm_50 experimental actually also fails to compile
Didn't notice it originally because compilation was threaded.
2015-06-20 19:40:23 +02:00
Sergey Sharybin
5e2835037a Cycles: Tweak to previous commit, experimental sm_52 works on Linux but not OSX 2015-06-20 19:01:24 +02:00
Sergey Sharybin
34d665a4a4 Cycles: Un-inline triangle_intersect_precalc() on Apple OpenCL
This gives quite the same problems as experimental CUDA kernels
and for until it's found a root cause of the problem we'd just
explicitly uninline the function.
2015-06-20 18:00:30 +02:00
Sergey Sharybin
845854959f Cycles: Cleanup, make it more obvious which platform requires workaround for triangle intersection
Should be no functional changes.
2015-06-20 17:01:21 +02:00
Sergey Sharybin
2ab909a88c Cycles: Make experimental kernel build option more generic
Previously it was explicitly mentioning it's NVidia kernel related option,
but in fact it's also handy for the OpenCL kernel.
2015-05-15 13:22:47 +05:00
Sergey Sharybin
3d3d805b64 Cycles: Prepare code for OpenCL camera/motion blur
The kernels are now compiling just fine, but there're some issues
during rendering. This is still to be investigated.
2015-05-14 18:48:56 +05:00
Sergey Sharybin
bf11e362c5 Fix T44046: Cycles speed regression in 2.74 (CPU only)
Issue was caused by MSVC not being able to optimize some code out in the same
way as GCC/Clang does, so now that parts of code are explicitly unfolded in
order to help compilers out.

This makes speed loss much less drastic on my laptop. That's probably as good
as we can do with MSVC without investing infinite amount of time looking trying
to workaround the optimizer.
2015-04-08 18:47:25 +05:00
Sergey Sharybin
09a746b857 Cycles: Cleanup, typos 2015-04-08 01:15:38 +05:00
Sergey Sharybin
858f54f16e Cycles: Cleanup, indentation 2015-04-07 22:41:08 +05:00
Sergey Sharybin
f1494edf78 Cycles: Make SSS intersection closer to regular triangle intersection 2015-04-01 21:20:04 +05:00
Sergey Sharybin
394b947a50 Cycles: Remove unused direction from triangle intersection functions
This argument was unused and got nicely optimized out. But once it
starts to be using registers are getting stressed really crazy,
causing slow down of render.
2015-04-01 21:08:12 +05:00
Sergey Sharybin
5ff132182d Cycles: Code cleanup, spaces around keywords
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.

Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
2015-03-28 00:15:15 +05:00
Sergey Sharybin
dce16d57dc Revert "Fix T43865: Cycles: Watertight rendering produces artifacts on a huge plane"
The fix was really flacky, in terms during speed benchmarks i had
abort() in the fallback block to be sure it never runs in production
scenes, but that affected on the optimization as well. Without this
abort there's quite bad slowdown of 5-7% on the renders even tho
the Pleucker fallback was never run.

This is all weird and for now reverting the change which affects on
all the production scenes and will look into alternative fixes for
the original issue with precision loss on huge planes.

This reverts commit 9489205c5c.
2015-03-12 18:24:53 +05:00
Sergey Sharybin
9489205c5c Fix T43865: Cycles: Watertight rendering produces artifacts on a huge plane
The issue was caused by numerical instability whrn having ray origin close to a huge
triangle, which could have aused bad ray distance check.

Watertight Woop intersection isn't really addressing such cases, it's dealing with
small triangles far away from the ray origin instead, so it's a bit tricky yo make
it working reliably.

While we're quite close to the release it's safer to do check in Pleaucker coordinates
if ray close to a huge triangle. Likely this additional check combined with some other
tweaks to the code doesn't cause measurable slowdown in the scenes tested here.

After the release we can play a bit more with this code in order to make it more
stable without Pleucker fallback.
2015-03-05 18:55:30 +05:00
Sergey Sharybin
d544bc5cd5 Cycles: Fix embarrassing type remained after getting rid of utility SWAP() 2015-03-04 00:16:21 +05:00
Sergey Sharybin
edb7195f27 Cycles: Bring back distance check in re-intersection
From more investigation of the numeric failures in the kernel it appears
the check was rather correct. But in theory it;s also needed for the motion
triangles.
2015-02-10 19:07:55 +05:00
Sergey Sharybin
298d8681a0 Fix T43596: Refraction BSDF crashes blender on pre-sse4 CPU
This is the same issue T43475: SSE4 code is more robust to non-finite values
in the ray origin/direction. So for now added a check before doing BVH traversal
for pre-SSE4 CPUs.

For sure actual root of the issue is a bit different and much more tricky to
solve, especially without disturbing render results too much. Still looking
into this.

In any case, it's kinda fine to have such a check, we might later make it to be
a kernel_assert() instead of just a return.
2015-02-10 17:36:05 +05:00
Sergey Sharybin
b83d851901 Cycles: Another attempt to solve 32bit CUDA kernel
Previous fix didn't quite work well. For some reason everything worked fine when
using native nvcc in 32bit environment, but cross-compiling from 64bit platform
it was still running out of memory.

For now just made it so all the kernels are slower on 32bit CUDA as a temporary
solution. Either it'll be solved in next CUDA releases (by dropped 32bit? =\) or
we'll find better workaround.
2015-02-09 16:14:44 +05:00
Sergey Sharybin
da06dab4e5 Cycles: Use pre-aligned triangle vertex coordinates for subsurface intersection
This gives small speedup (around 2% in quick tests) for ray scattering.
2015-02-04 14:49:19 +05:00
Sergey Sharybin
432e478f43 Cycles: Further tweaks to T43511 to solve compilation error on 32bit platforms 2015-02-02 22:09:02 +05:00
Sergey Sharybin
31263192bb Fix T43511: Major slow down with many instanced objects in cycles GPU
Slowdown was caused by watertight intersection commit and follow-up workaorund
for compiler crash which uninlined utility function which rotates the ray.

Now it's only uninlined for sm_50 and sm_52 experimental kernels which are the
only ones which failed to compile.

Rendering still might be a bit slower but at least shouldn't be that dramatic.
2015-02-02 17:35:57 +05:00
Sergey Sharybin
3f5771475d Cycles: Don't perform re-intersection if ray distance is zero
It is possible that ray distance will be zero which would make intersection
refinement return NaN as the refined position which would later lead to all
sort of mathematical issues.

Don't think there are ways to improve intersection accuracy for such rays
so just return original intersection coordinate.

This should fix T43475.

TODO: Need to look into possible issues in Ashikhmin BSDF which might return
zero-length reflected/transmitted ray?
2015-01-31 01:49:48 +05:00
Sergey Sharybin
2a8a56929b Cycles: Fix unneeded int/float conversion happened in previous commit 2015-01-02 17:21:24 +05:00
Sergey Sharybin
4f2583ee13 Fix T43027: OpenCL kernel compilation broken after QBVH
OpenCL apparently does not support templates, so the idea of generic
function for swapping is a bit of a failure. Now it is either inlined
into the code (in triangle intersection) or has specific implementation
for QBVH.

This is probably even better, because we can't create QBVH-specific
function in util_math anyway.
2015-01-02 14:58:01 +05:00
Sergey Sharybin
fe06ec82a9 Cycles: Workaround CUDA 6.5.16 error after watertight commit
This issue doesn't happen with 6.5.12 and there's slight piece of hope it'll be
fixed in next toolkit releases..

For now we're forcing CUDA to not inline ray precalculation. This could lead to
some speed regression, but wouldn't expect it to be huge -- this code does not
run that often comparing to actual triangle intersection.
2014-12-25 14:15:37 +05:00
Thomas Dinges
4ab821c675 Cleanup: Typo fixes for comments. 2014-12-25 02:42:06 +01:00
Sergey Sharybin
ab8d9c4b88 Cycles: Add some utility functions and structures
Most of them are not currently used but are essential for the further work.

- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i

- Added templatedversions of min4, max4 which are handy to use with register
  variables.

- Added util_swap function which gets arguments by pointers.
  So hopefully it'll be a portable version of std::swap.
2014-12-25 02:50:49 +05:00
Sergey Sharybin
f770bc4757 Cycles: Implement watertight ray/triangle intersection
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection

  http://jcgt.org/published/0002/01/05/paper.pdf

This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.

Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.

But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.

Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.

Reviewers: brecht, juicyfruit

Subscribers: dingto, ton

Differential Revision: https://developer.blender.org/D819
2014-12-25 02:50:49 +05:00
Sergey Sharybin
f4df3ec05a Cycles: Move triangle intersection functions into own file
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.
2014-12-25 02:50:48 +05:00