If there was any specularity in the Principled BSDF, it would get a sampling
weight of one regardless of its actual impact.
This commit makes Cycles estimate the contribution of the component and adjust
the weighting accordingly, which greatly improves the noise characteristics of
the Principled BSDF in many cases.
Note that this commit might slightly change the brightness of areas when using
MultiGGX and high roughnesses, but the new brightness is more accurate and
closer to the result of Branched Path Tracing. See T51836 for details.
Differential Revision: https://developer.blender.org/D2677
The PDF of the MultiGGX sampling is approximated by the singlescattering GGX
term as well as a scaled diffuse term that makes up for the energy in the
multiscattering component that's missed by GGX.
However, there were two problems with the glossy terms: The diffuse term missed
a normalization factor, and the singlescattering term was not properly scaled
down based on the albedo estimate.
The glass term was completely wrong and has been rewritten. It uses the fresnel
factor to weight reflection vs. refraction and uses the glossy MultiGGX model
for reflection.
For refraction, the correct singlescattering term is now used, and a new
albedo approximation is used that was derived by evaluating GGX albedo for
roughnesses from 0 to 1 and IORs from 1 to 3 and fitting numerical
approximations to it. The resulting model has a mean relative error of 9e-5,
but could probably be simplified without losing noticable accuracy in the
final render.
The improved PDFs help with glossy highlights (due to better light sampling vs.
closure sampling MIS) and fix the situation described in T51836 where mixing
MultiGGX with other closures (as it happens in e.g. the Principled
BSDF) causes incorrect darkening.
The crash did not happen yet because we always had proper vmemh defined in
the parent scope.
Patch by Ivan Ivanov (aka obiwanus), thanks!
Differential Revision: https://developer.blender.org/D2715
This is not enough to mutex-guard modification code of integer values,
since this operation is NOT atomic. This is not even safe for a single
byte data types.
For now guarded the getter functions, similar to other functions in
this module.
Ideally we want to switch modification to an atomic operations, so we
wouldn't need any locks in the getters.
Was some mismatch in address space. Seems to be caused by recent additions.
Additionally, moved decoupled ray marching functions under ifdef, so they
don't try to use malloc() functions.
Thanks Mai for testing the patch!
Now, when there is no usable neighboring pixel for denoising, the noisy value
is preserved instead of producing a NaN.
Also, negative results are clamped to zero.
Note that there are just workarounds that don't fix the underlying problems,
but these issues are very rare and I'm not sure if it's even possible to fix
the underlying problems without introducing a significant slowdown or quality
decrease in other situations.
Because of that and since 2.79 is happening very soon, I just went for these
workarounds for now.
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to 50% slowdown. For the latter case I think adjusting
tile updating logic can help, but not sure what that would look
like yet (it would be just a few lines change however).
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.
Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
Due to various driver issues with AMD GCN 1 cards we can no longer support
these GPUs. This patch makes them unavailable to select for Cycles rendering.
GCN cards 2 and higher are still supported. Please use the most recent
drivers available to ensure proper functionality.
See here for a list to check which GPUs are supported:
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
The previous outlier heuristic only checked whether the pixel is more than
twice as bright compared to the 75% quantile of the 5x5 neighborhood.
While this detected fireflies robustly, it also incorrectly marked a lot of
legitimate small highlights as outliers and filtered them away.
This commit adds an additional condition for marking a pixel as a firefly:
In addition to being above the reference brightness, the lower end of the
3-sigma confidence interval has to be below it.
Since the lower end approximates how low the true value of the pixel might be,
this test separates pixels that are supposed to be very bright from pixels that
are very bright due to random fireflies.
Also, since there is now a reliable outlier filter as a preprocessing step,
the additional confidence interval test in the reconstruction kernel is no
longer needed.
Denoising was setting session parameters for every frame, which was detected as
a change and therefore caused a resync.
Since the parameter modification change is only needed for viewport rendering
(which doesn't support denoising anyways) and resyncing after a frame change
(which isn't affected by denoising settings), an easy fix is to just ignore
the denoising parameters like it's currently done with the samples.
Follow up to 9f044cb422
These comments described the difference between Microsoft & MinGW's struct definition. Now that we dropped MinGW we don't need to go into these details.
The Issue
=======
For a long time now MinGW has been unsupported and unmaintained and at this point,
it looks like something that we should just leave behind and move on.
Why Remove
==========
One of the big motivations for MinGW back in the day is that it was free compared to MSVC which was licensed based.
However, now that this is no longer true we have basically stopped updating the need CMake files.
Along with the CMake files, there are several patches to the extern libs needed to make this work. For example, see:
https://developer.blender.org/diffusion/B/browse/master/extern/carve/patches/mingw_w64.patch
If we wanted to keep MinGW then we would need to make more custom patches to the external libs and
this is not something our platform maintainers are willing to do.
For example, here is the patches needed to build python: https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-python3
Fixes T51301
Differential Revision: https://developer.blender.org/D2648
This feature got lost with new auto-track API,
Added it back by extending frame accessor class. This isn't really
a frame thing, but we don't have other type of accessor here.
Surely, we can use old-style API here and pass mask via region
tracker options for this particular case, but then it becomes much
less obvious how real auto-tracker will access this mask with old
style API.
So seems we do need an accessor for such data, just matter of
finding better place than frame accessor.
Seems re-loading module invalidates memory pointers by the looks of it,
which gives an error on the next kernel call.
Not sure how to move memory pointer from one CUDA module to another one,
so for now simply disabling kernel re-load for CUDA devices. Not ideal,
but better than failing render.
Feature-selective option for CUDA is not an official feature anyway.
Volume shaders without anything connected to the surface output are treated
as if they had a transparent BSDF as the surface shader in Cycles, so the
denoiser should skip feature pass writing for them just as it does with an
actual transparent BSDF.
If the central pixel is an outlier, the denoiser is supposed to predict its
value from the surrounding pixels. However, in some cases the confidence
interval test would reject every single surrounding pixel, which leaves the
model fitting with no data to work with.
- Some arguments were inapproriatry tagged as unused
using (void)foo semantic.
Only use such semantic in tricky casses, when something
needs to be ignored in release builds or something is
dependent on tricky ifndef policy.
For rest of the cases just use void foo(int /bar*/)
semantic, which ensures variable is not used. Solves
confusion and code running out of sync with later
development.
- Used proper unused semantic to some arguments.
- Added braces to make code easier to follow, tricky
indentation with ifdef, uh.
For example, when using a radius of 1, only 9 pixels (due to weighting maybe
even less) will be used, but the transform code may still decide to use a
5-dimensional (or even higher) fit.
This causes severe overfitting and therefore weird pixel values.
To avoid this, this commit limits the amount of dimensions to a third of the
pixel number. For a radius of 3 or more, this doesn't change anything, but
for 1 and 2 it can prevent fireflies and/or negative values being produced.