Previously we picked one of the RGB channels with equal probability, but this
works poorly in a dense volume after many bounces. Now we take into account
the throughput and single scattering albedo.
This makes it a little more practical to do brute force SSS with volumes, but
is still very inefficient because we do direct light sampling at every volume
bounce even when inside an opaque mesh. In theory there could be a light inside
the mesh so we can't automatically disable direct lighting.
Goal is to reduce OpenCL kernel recompilations.
Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2775
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.
Original patch was implemented by Sergey.
Differential Revision: https://developer.blender.org/D2249
Similar to what we did for area lights previously, this should help
preserve stratification when using multiple BSDFs in theory. Improvements
are not easily noticeable in practice though, because the number of BSDFs
is usually low. Still nice to eliminate one sampling dimension.
Previously the Sobol pattern suffered from some correlation issues that
made the outline of objects like a smoke domain visible. This helps
simplify the code and also makes some other optimizations possible.
This was needed when we accessed OSL closure memory after shader evaluation,
which could get overwritten by another shader evaluation. But all closures
are immediatley converted to ShaderClosure now, so no longer needed.
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
Was some mismatch in address space. Seems to be caused by recent additions.
Additionally, moved decoupled ray marching functions under ifdef, so they
don't try to use malloc() functions.
Thanks Mai for testing the patch!
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
Discard the whole volume stack on the last bounce (but keep
world volume if present).
Volumes are expected to be closed manifol meshes, meaning if
ray entered the volume there should be an intersection event
of ray exisintg the volume. Case when ray hit nothing and
there are still non-world volumes in the stack can happen in
either of cases.
1. Mesh is not closed manifold.
Such configurations are not really supported anyway and should
not be used.
Previous code would have consider the infinite length of the
ray to sample across, so render result wasn't really correct
anyway.
2. Exit intersection is more far away than the camera far
clip distance.
This case also will behave differently now, but previously it
wasn't really correct either, so it's not like we're breaking
something which was working as expected.
3. We missed exit event due to intersection precision issues.
This is exact the case which this patch fixes and avoid
fireflies.
4. Volume has Camera only visibility (all the rest visibility
is set to off)
This is what could be considered a regression but could be
solved quite easily by checking volume stack's objects flags
and keep entries which doesn't have Volume Scatter visibility
(or even better: ensure Volume Scatter visibility for objects
with volume closure),
Fixes T46108: Cycles - Overlapping emissive volumes generates unexpected bright hotspots around the intersection
Also fixes fireflies appearing on the edges of cube with
emissive volue.
Reviewers: juicyfruit, brecht
Reviewed By: brecht
Maniphest Tasks: T46108
Differential Revision: https://developer.blender.org/D2212
There were two cases where correlation issues were obvious:
- File from T38710 was giving issues in 2.78a again
- File from T50116 was having totally different shadow between
sample 1 and sample 32.
Use some more simplified version of CMJ hash which seems to give
nice randomized value which solves the correlation.
This commit will break all unit test files, but it's a bug fix
so perhaps OK to commit this.
This also fixes T41143: Sobol gives nonuniform noise
Proper science paper about hash function is coming.
Reviewers: brecht
Reviewed By: brecht
Subscribers: lukasstockner97
Differential Revision: https://developer.blender.org/D2385
Was giving huge artifacts in the barber shop file here in the studio,
Maybe not fully optimal solution, but committing it for now to have
closer look later.
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
This hopefully fixes T48383 by avoiding two numerical problems that I found in the volume code.
Reviewers: sergey, dingto, brecht
Reviewed By: sergey, dingto, brecht
Maniphest Tasks: T48383
Differential Revision: https://developer.blender.org/D2051
At some point the idea was that we could have an optimization where we could
render multiple render layers without re-exporting the scene, by just updating
the layer bits. We are not doing this now and in practice with the available
render layer control like exclude layers it's not always possible anyway.
This makes it easier to support an arbitrary number of layers in the future
(hopefully this summer), and frees up some useful bits in the kernel.
Reviewed By: sergey, dingto
Differential Revision: https://developer.blender.org/D2020
This commit makes it so malloc() is only happening once per volume and
once per transparent shadow query (per thread), improving scalability of
the code to multiple CPU cores.
Hard to measure this with a low-bottom i7 here currently, but from quick
tests seems volume sampling gave about 3-5% speedup.
The idea is to store allocated memory in kernel globals, which are per
thread on CPU already.
Reviewers: dingto, juicyfruit, lukasstockner97, maiself, brecht
Reviewed By: brecht
Subscribers: Blendify, nutel
Differential Revision: https://developer.blender.org/D1996
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.
This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.
Reviewed by Sergey and Brecht, thanks for some hlp with this.
I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.
Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.
This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain
In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.
Patch by Sergey and myself.
Differential Revision: https://developer.blender.org/D1264
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
This merges consecutive empty steps in the decoupled record function,
which can lead to fewer iterations in the scatter functions.
Only helps slightly though (1%), but doesn't hurt to have this.
Differential Revision: https://developer.blender.org/D873
It could have happened with really long rays and small steps.
Step size will be adjusted to the clamped number of steps in order
to preserve render result compatibility as much as possible.
We should probably reformulate this a bit, so it will give the
same looking results without step tweaks. But this new behavior
should already be much better that it was before.
The issue was caused by the way how we shoot the ray to see which rays we're
inside which might start bouncing back-n-forth between two close to parallel
intersecting faces.
Real solution would be to record all the intersections when shooting the ray,
but it's kinda tricky on GPU because of needed sorting and uncertainty of
how huge intersection array should be.
For now we'll just limit number of steps in the check so in worst case we'll
have some samples not being correct which will be compensated with further
sampling. Shouldn't be an issue since probability of such a lock is quite
small actually.