If there was any specularity in the Principled BSDF, it would get a sampling
weight of one regardless of its actual impact.
This commit makes Cycles estimate the contribution of the component and adjust
the weighting accordingly, which greatly improves the noise characteristics of
the Principled BSDF in many cases.
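As an illustrative sketch (names are hypothetical, not the actual Cycles API),
the idea is to derive the sampling weight from an estimate of the component's
contribution instead of a constant one:

    #include <algorithm>

    /* Hypothetical sketch: weight a specular component by its estimated
     * contribution instead of a constant 1.0. */
    float closure_sample_weight(float estimated_contribution)
    {
      /* Keep a small minimum so weak components can still be sampled. */
      return std::max(estimated_contribution, 0.001f);
    }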
Note that this commit might slightly change the brightness of areas when using
MultiGGX and high roughnesses, but the new brightness is more accurate and
closer to the result of Branched Path Tracing. See T51836 for details.
Differential Revision: https://developer.blender.org/D2677
The PDF of the MultiGGX sampling is approximated by the single-scattering GGX
term as well as a scaled diffuse term that makes up for the energy in the
multi-scattering component that's missed by GGX.
However, there were two problems with the glossy terms: the diffuse term missed
a normalization factor, and the single-scattering term was not properly scaled
down based on the albedo estimate.
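As a hedged sketch of the corrected approximation (simplified scalar form,
not the actual kernel code), the single-scattering term is scaled down by the
single-scattering albedo estimate and the remainder goes to a properly
normalized cosine-weighted diffuse term:

    /* Sketch: combined PDF for MultiGGX sampling. pdf_ggx_single is the
     * single-scattering GGX PDF, ss_albedo the estimated single-scattering
     * albedo, cos_NO the cosine of the outgoing angle. */
    float multiggx_pdf(float pdf_ggx_single, float cos_NO, float ss_albedo)
    {
      const float one_over_pi = 0.3183098861837907f;
      /* Single-scattering part, scaled by the energy GGX accounts for. */
      float pdf = ss_albedo * pdf_ggx_single;
      /* Normalized diffuse term covers the multi-scattering energy. */
      pdf += (1.0f - ss_albedo) * cos_NO * one_over_pi;
      return pdf;
    }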
The glass term was completely wrong and has been rewritten. It uses the Fresnel
factor to weight reflection vs. refraction and uses the glossy MultiGGX model
for reflection.
For refraction, the correct single-scattering term is now used, and a new
albedo approximation is used that was derived by evaluating GGX albedo for
roughnesses from 0 to 1 and IORs from 1 to 3 and fitting numerical
approximations to it. The resulting model has a mean relative error of 9e-5,
but could probably be simplified without losing noticeable accuracy in the
final render.
The improved PDFs help with glossy highlights (due to better light sampling vs.
closure sampling MIS) and fix the situation described in T51836 where mixing
MultiGGX with other closures (as happens e.g. in the Principled BSDF)
causes incorrect darkening.
Mutex-guarding only the code that modifies integer values is not enough,
since the modification is NOT atomic and unguarded readers can observe
inconsistent values. This is not even safe for single-byte data types.
For now, the getter functions are guarded as well, similar to other functions
in this module.
Ideally we want to switch the modification to atomic operations, so we
wouldn't need any locks in the getters.
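For illustration, a minimal C++ sketch of that atomic approach (not the
actual module code):

    #include <atomic>

    /* With std::atomic, the modification itself is atomic and the getter
     * needs no lock at all. */
    static std::atomic<int> g_value{0};

    void value_increase() { g_value.fetch_add(1, std::memory_order_relaxed); }
    int value_get() { return g_value.load(std::memory_order_relaxed); }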
There was a mismatch in address space, seemingly caused by recent additions.
Additionally, moved the decoupled ray marching functions under an ifdef, so
they don't try to use malloc() functions.
Thanks Mai for testing the patch!
Now, when there is no usable neighboring pixel for denoising, the noisy value
is preserved instead of producing a NaN.
Also, negative results are clamped to zero.
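A sketch of the two workarounds (illustrative names, not the exact kernel
code):

    #include <algorithm>

    /* Keep the noisy input when no neighbor contributed any weight, and
     * clamp negative reconstruction results to zero. */
    float finalize_pixel(float filtered, float noisy, float weight_sum)
    {
      if (weight_sum == 0.0f) {
        return noisy; /* No usable neighbors: avoid dividing to NaN. */
      }
      return std::max(filtered / weight_sum, 0.0f);
    }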
Note that these are just workarounds that don't fix the underlying problems,
but these issues are very rare and I'm not sure if it's even possible to fix
the underlying problems without introducing a significant slowdown or quality
decrease in other situations.
Because of that and since 2.79 is happening very soon, I just went for these
workarounds for now.
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there is a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to a 50% slowdown. For the latter case I think adjusting the
tile updating logic can help, but I'm not sure what that would look
like yet (it would be just a few lines of change, however).
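For context, a hedged host-side sketch of what passing all buffers means in
OpenCL (real API, illustrative helper):

    #include <CL/cl.h>

    /* Bind every buffer the kernel declares, even if this particular pass
     * does not touch it, to stay within defined behavior. */
    void bind_all_buffers(cl_kernel kernel, cl_mem *buffers, cl_uint count)
    {
      for (cl_uint i = 0; i < count; i++) {
        clSetKernelArg(kernel, i, sizeof(cl_mem), &buffers[i]);
      }
    }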
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means there are fewer samples
for the GPU to work on in parallel and rendering is slower. As there is less
work overall, there are also more inactive threads during rendering with BPT.
This patch makes use of those inactive rays to render branched samples in
parallel with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, hundreds of branched
samples could be generated from the same originating thread and run in
parallel, giving large speed-ups.
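A simplified sketch of the stealing mechanism (plain C++ with illustrative
data layout, not the actual split-kernel code):

    #include <atomic>

    struct PathState { int sample; /* ...per-path state, simplified... */ };

    /* Claim an inactive thread slot and copy the originating sample's state
     * into it, so the branched sample runs in parallel. */
    int steal_inactive_thread(std::atomic<int> *active_flags, int num_threads,
                              PathState *states, const PathState &src)
    {
      for (int i = 0; i < num_threads; i++) {
        int expected = 0;
        /* Atomically mark slot i as active; only one thread can win it. */
        if (active_flags[i].compare_exchange_strong(expected, 1)) {
          states[i] = src;
          return i;
        }
      }
      return -1; /* No inactive thread found: run the sample locally. */
    }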
Gives a 70% faster render for the pavilion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
Due to various driver issues with AMD GCN 1 cards we can no longer support
these GPUs. This patch makes them unavailable to select for Cycles rendering.
GCN generation 2 and higher cards are still supported. Please use the most
recent drivers available to ensure proper functionality.
See here for a list to check which GPUs are supported:
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.
This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.
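As a sketch (scalar brightness, illustrative names), the combined test looks
like:

    #include <cmath>

    /* A pixel is a firefly only if it is brighter than the reference
     * (twice the 75% quantile of the 5x5 neighborhood) AND its 3-sigma
     * confidence interval allows a much lower true value. */
    bool is_firefly(float value, float variance, float reference)
    {
      float lower_bound = value - 3.0f * std::sqrt(variance);
      return value > reference && lower_bound < reference;
    }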
Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
Denoising was setting session parameters for every frame, which was detected
as a change and therefore caused a resync.
Since detecting parameter modifications is only needed for viewport rendering
(which doesn't support denoising anyway) and for resyncing after a frame change
(which isn't affected by denoising settings), an easy fix is to just ignore
the denoising parameters, like it's currently done with the samples.
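A sketch of the comparison fix (field and function names are illustrative):

    /* Denoising settings and the sample count are deliberately excluded
     * from the modified() check, so changing them per frame no longer
     * forces a resync. */
    struct SessionParams {
      int samples;        /* already ignored */
      bool run_denoising; /* now also ignored */
      int threads;        /* example of a field that does trigger a resync */

      bool modified(const SessionParams &other) const
      {
        return threads != other.threads;
      }
    };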
Re-loading the module seems to invalidate memory pointers, which gives an
error on the next kernel call.
Not sure how to move memory pointers from one CUDA module to another one,
so for now kernel re-load is simply disabled for CUDA devices. Not ideal,
but better than a failing render.
Feature-selective option for CUDA is not an official feature anyway.
Volume shaders without anything connected to the surface output are treated
as if they had a transparent BSDF as the surface shader in Cycles, so the
denoiser should skip feature pass writing for them just as it does with an
actual transparent BSDF.
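A sketch of the condition (illustrative flags, not the actual shader data):

    /* A volume-only shader behaves like a transparent surface, so feature
     * pass writing is skipped for it as well. */
    bool skip_feature_passes(bool has_surface_output, bool has_volume_output)
    {
      return !has_surface_output && has_volume_output;
    }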
If the central pixel is an outlier, the denoiser is supposed to predict its
value from the surrounding pixels. However, in some cases the confidence
interval test would reject every single surrounding pixel, which leaves the
model fitting with no data to work with.
- Some arguments were inappropriately tagged as unused
  using the (void)foo semantics.
  Only use such semantics in tricky cases, when something
  needs to be ignored in release builds or something is
  dependent on a tricky ifndef policy.
  For the rest of the cases just use the void foo(int /*bar*/)
  semantics, which ensures the variable is not used. This solves
  confusion and code running out of sync with later
  development (see the sketch after this list).
- Used proper unused semantics for some arguments.
- Added braces to make the code easier to follow; tricky
  indentation with ifdefs, uh.
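A short sketch of the two idioms (illustrative functions):

    /* Preferred: omit the parameter name; the compiler then enforces that
     * it is not used. */
    void draw(int /*flags*/) {}

    /* Only for tricky cases, e.g. a value used solely in debug builds. */
    void check(bool ok)
    {
      (void)ok; /* Referenced only by debug-build assertions. */
    }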
For example, when using a radius of 1, only 9 pixels (due to weighting maybe
even less) will be used, but the transform code may still decide to use a
5-dimensional (or even higher) fit.
This causes severe overfitting and therefore weird pixel values.
To avoid this, this commit limits the number of dimensions to a third of the
pixel number. For a radius of 3 or more, this doesn't change anything, but
for 1 and 2 it can prevent fireflies and/or negative values being produced.
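A sketch of the cap (illustrative function):

    #include <algorithm>

    /* Limit the number of fit dimensions to a third of the number of
     * contributing pixels to avoid overfitting at small radii. */
    int clamp_rank(int rank, int num_pixels)
    {
      return std::min(rank, num_pixels / 3);
    }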
Once again, numerical instabilities were causing the Cholesky decomposition to fail.
However, further increasing the diagonal correction just because of a few
pixels in very specific scenes and settings seems unjustified.
Therefore, this commit simply falls back to the basic NLM-filtered pixel
if the more advanced model fails.
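A sketch of the fallback (illustrative names):

    /* If the Cholesky-based fit fails numerically, fall back to the plain
     * NLM-filtered value instead of propagating garbage. */
    float reconstruct_pixel(bool cholesky_ok, float fitted, float nlm_filtered)
    {
      return cholesky_ok ? fitted : nlm_filtered;
    }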
I wouldn't mind switching fully to Google style, but I am against mixing
two different styles in the same project. So just stick to the brace on a
new line after a function definition.
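For reference, the agreed convention looks like:

    /* Brace on a new line after a function definition. */
    int clamp_positive(int value)
    {
      return value < 0 ? 0 : value;
    }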
There were the following issues with ccl_restrict_ptr:
- We already had ccl_restrict for all platforms.
- It was secretly adding a `const` qualifier to the declaration,
  which is quite weird since a non-const pointer can also be
  declared as restricted.
- We never use foo_ptr or FooPtr type definitions in Blender,
  so it's not clear why we should introduce such a thing here.
- It is absolutely wrong from a semantic point of view to fold the
  pointer into the restrict macro -- const is a part of the type, not
  a part of the hint telling the compiler that some pointer is never
  aliased.
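A sketch of the corrected usage (assuming the GCC/Clang spelling of the
restrict hint; the actual ccl_restrict definition is per-platform):

    #define ccl_restrict __restrict__

    /* The restrict hint stays separate from constness, so both const and
     * non-const pointers can be declared as non-aliased. */
    void add_arrays(const float *ccl_restrict a, float *ccl_restrict out, int n)
    {
      for (int i = 0; i < n; i++) {
        out[i] += a[i];
      }
    }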
The denoise commit introduced kernel_write_result(), which saves light passes,
so there is no need to call both kernel_write_result() and
kernel_write_light_passes() from the split kernel.
Weirdly enough, kernel_write_result() does not take care of debug passes.