Epsilon was quite arbitrary for GPU, replaced with checking for zero-sized faces.
It should solve both original report and the new one. After the release we can check
why GPU doesn't produce accurate math here and go to the root of the issue.
Found a way to make AVX2 CPUs happy by reshuffling instructions a bit,
so now there's no weird precision errors happening in there.
This solves some render speed regressions on CPU, but unfortunately
this doesn't help for GPU rendering.
Initial idea was to optimize calculation a bit by skipping calculation of actual
triangle edges and use vector from ray origin to triangles. In practice this
optimization didn't quite work in cases when origin point is too close to the
triangle.
Let's do 2.76 with a bit more complicated calculation, still looking into exact
reasons why watertight intersections fails in certain cases, but actual fix might
bit be ready so soon.
This fixes wrong eyes on the lady from T46013.
The issue was caused by some numeric instability in triangle intersection which
was visible on avx2 CPUs and GPUs (at least sm_20 here) but maybe some others
too.
Committing rather a workaround for now to be safe for the release, still need
some investigation.
From tests with grass field from Gooseberry project didn't see measurable
slowdown.
This gives quite the same problems as experimental CUDA kernels
and for until it's found a root cause of the problem we'd just
explicitly uninline the function.
Issue was caused by MSVC not being able to optimize some code out in the same
way as GCC/Clang does, so now that parts of code are explicitly unfolded in
order to help compilers out.
This makes speed loss much less drastic on my laptop. That's probably as good
as we can do with MSVC without investing infinite amount of time looking trying
to workaround the optimizer.
This argument was unused and got nicely optimized out. But once it
starts to be using registers are getting stressed really crazy,
causing slow down of render.
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
The fix was really flacky, in terms during speed benchmarks i had
abort() in the fallback block to be sure it never runs in production
scenes, but that affected on the optimization as well. Without this
abort there's quite bad slowdown of 5-7% on the renders even tho
the Pleucker fallback was never run.
This is all weird and for now reverting the change which affects on
all the production scenes and will look into alternative fixes for
the original issue with precision loss on huge planes.
This reverts commit 9489205c5c.
The issue was caused by numerical instability whrn having ray origin close to a huge
triangle, which could have aused bad ray distance check.
Watertight Woop intersection isn't really addressing such cases, it's dealing with
small triangles far away from the ray origin instead, so it's a bit tricky yo make
it working reliably.
While we're quite close to the release it's safer to do check in Pleaucker coordinates
if ray close to a huge triangle. Likely this additional check combined with some other
tweaks to the code doesn't cause measurable slowdown in the scenes tested here.
After the release we can play a bit more with this code in order to make it more
stable without Pleucker fallback.
From more investigation of the numeric failures in the kernel it appears
the check was rather correct. But in theory it;s also needed for the motion
triangles.
This is the same issue T43475: SSE4 code is more robust to non-finite values
in the ray origin/direction. So for now added a check before doing BVH traversal
for pre-SSE4 CPUs.
For sure actual root of the issue is a bit different and much more tricky to
solve, especially without disturbing render results too much. Still looking
into this.
In any case, it's kinda fine to have such a check, we might later make it to be
a kernel_assert() instead of just a return.
Previous fix didn't quite work well. For some reason everything worked fine when
using native nvcc in 32bit environment, but cross-compiling from 64bit platform
it was still running out of memory.
For now just made it so all the kernels are slower on 32bit CUDA as a temporary
solution. Either it'll be solved in next CUDA releases (by dropped 32bit? =\) or
we'll find better workaround.
Slowdown was caused by watertight intersection commit and follow-up workaorund
for compiler crash which uninlined utility function which rotates the ray.
Now it's only uninlined for sm_50 and sm_52 experimental kernels which are the
only ones which failed to compile.
Rendering still might be a bit slower but at least shouldn't be that dramatic.
It is possible that ray distance will be zero which would make intersection
refinement return NaN as the refined position which would later lead to all
sort of mathematical issues.
Don't think there are ways to improve intersection accuracy for such rays
so just return original intersection coordinate.
This should fix T43475.
TODO: Need to look into possible issues in Ashikhmin BSDF which might return
zero-length reflected/transmitted ray?
OpenCL apparently does not support templates, so the idea of generic
function for swapping is a bit of a failure. Now it is either inlined
into the code (in triangle intersection) or has specific implementation
for QBVH.
This is probably even better, because we can't create QBVH-specific
function in util_math anyway.
This issue doesn't happen with 6.5.12 and there's slight piece of hope it'll be
fixed in next toolkit releases..
For now we're forcing CUDA to not inline ray precalculation. This could lead to
some speed regression, but wouldn't expect it to be huge -- this code does not
run that often comparing to actual triangle intersection.
Most of them are not currently used but are essential for the further work.
- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i
- Added templatedversions of min4, max4 which are handy to use with register
variables.
- Added util_swap function which gets arguments by pointers.
So hopefully it'll be a portable version of std::swap.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.