Commit Graph

125 Commits

Author SHA1 Message Date
Sergey Sharybin
525673b37b Cycles: Fix uninitialized variable issue after recent changes 2016-12-14 17:31:11 +01:00
Sergey Sharybin
c4d6fd3ec0 Cycles: Consider GGX/Beckmann/Ashikhmin of 0 roughness a singular ray
This matches behavior of Multiscatter GGX and could become handy later on
when/if we decide it would be beneficial to replace on closure with another.

Reviewers: lukasstockner97, brecht

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2413
2016-12-14 11:04:02 +01:00
Sergey Sharybin
87cd56b012 Fix T50075: Assert during debug render of hair_geom_transmission.blend 2016-12-01 12:11:11 +01:00
Brecht Van Lommel
a3abb020e3 Fix Cycles CUDA performance on CUDA 8.0.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.

On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2269
2016-10-03 22:15:25 +02:00
Sergey Sharybin
0ec87f1227 Cycles: Cleanup, indentation 2016-09-28 17:05:33 +02:00
Sergey Sharybin
e1bfb89da2 Cycles: Fix compilation error with minimal feature set 2016-09-28 17:03:59 +02:00
Lukas Stockner
0b89b31a18 Cycles: Fix T49411: Multiscatter GGX with zero roughness when Filter Glossy is enabled 2016-09-25 22:09:38 +02:00
Sergey Sharybin
c5eb400b7c Cycles: Fix embarrassing typo
Spotted by Mai Lavelle, thanks!
2016-08-05 14:45:54 +02:00
Sergey Sharybin
500e0e9a3d Cycles: Some more inline policy tweaks for CUDA 8
Makes it so toolkit does exactly the same decision about what to inline,
but unfortunately it has really barely visible difference on GTX-980.
2016-08-02 15:13:34 +02:00
Sergey Sharybin
6353ecb996 Cycles: Tweaks to support CUDA 8 toolkit
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.

This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.

On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
2016-08-01 15:54:29 +02:00
Brecht Van Lommel
9b6ed3a42b Cycles: refactor kernel closure storage to use structs per closure type.
Reviewed By: dingto, sergey

Differential Revision: https://developer.blender.org/D2127
2016-07-31 02:34:43 +02:00
Sergey Sharybin
d834759423 Cycles: Fix difference in Ashikhmin Shirley shader between CPU and GPU
The issue was caused by some NaN appearing in calculations.

Visible with scifi_armor_concept.blend from the cloud.
2016-07-28 18:46:29 +02:00
Lukas Stockner
d9cc3ea2c6 Cycles: Fix rays parallel to the surface in the triangle refine and MultiGGX code
In the triangle intersection refinement code, rays that are parallel to the triangle caused a divide by zero.
These rays might initially hit the triangle due to the watertight intersection test, but are very rare - therefore, just skipping the refinement for them works fine.

Also, a few remaining issues in the MultiGGX code are fixed that were caused by rays parallel to the surface (which happened more often there due to smooth shading).
2016-07-25 16:14:25 +02:00
Lukas Stockner
83ae0a0e06 Cycles: Calculate differentials in the Multiscattering GGX closures
The Multiscattering GGX closures didn't set the omega_i differentials, which could cause undefined behaviour.
2016-07-25 16:14:25 +02:00
Lukas Stockner
a2c82f5e5d Cycles: Fix OpenCL compilation after the recent numerical fixes 2016-07-17 19:24:53 +02:00
Lukas Stockner
d9281a6332 Cycles: Fix three numerical issues in the fresnel, normal map and Beckmann code
- In fresnel_dielectric, the differentials calculation sometimes divided by zero.
- When the normal map was (0.5, 0.5, 0.5), the code would try to normalize a zero vector. Now, it just uses the regular normal as a fallback.
- The approximate error function used in Beckmann sampling sometimes overflowed to inf while calculating r^16. The final value is 1 - 1/r^16, however,
  so now it just returns 1 if the computation would overflow otherwise.
2016-07-16 20:54:14 +02:00
Lukas Stockner
5ba78d76d4 Cycles: Deduplicate geometric factor calculation in the Beckmann distribution
Also, this fixes a numerical issue where A would be inf.
Since later G is set to 1 if A is larger than 1.6, the code now checks the reciprocal of A for being smaller than 1/1.6 - same effect, but no inf involved.
2016-07-16 20:54:14 +02:00
Sergey Sharybin
23cc453975 Fix T48732: New GGX breaks OpenCL kernel
Make sure we don't perform any implicit address space conversion.

A bit annoying, but less intrusive approaches (like using temp private
variable in .cl kernel) do not work correct here.

Using generic address space will help from code side here, but will
be somewhat slower due to extra things happening as far as i know.
2016-06-28 17:15:35 +05:00
Lukas Stockner
2a69b09b62 Fix T48732 v2: New GGX breaks OpenCL kernel
As far as I can see, the second issue there was that the functions receive a pointer to a member variable of the
ShaderData, which is stored in global memory. However, this means that the pointer points to global memory as well,
therefore OpenCL requires the ccl_addr_space "keyword" in front of the pointer.
With this commit, the OpenCL kernels build on Linux with the Intel CPU OpenCL runtime - however, they already did
without the change and I don't have an AMD card, so I can't really test whether the AMD runtime is happy as well now.
2016-06-26 00:51:16 +02:00
Thomas Dinges
99088f8b55 Fix T48732, OpenCL compile failure after Multiscatter GGX commit.
Use OpenCL "all" builtin type for conversion, according to OpenCL 1.1 spec 6.3e.
2016-06-25 11:14:06 +02:00
Lukas Stockner
23c276832b Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".

Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.

In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.

The downside of this model is that it has no (known) analytic expression for evalation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.

Reviewers: dingto, #cycles, brecht

Reviewed By: dingto, #cycles, brecht

Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel

Differential Revision: https://developer.blender.org/D2002
2016-06-23 22:57:26 +02:00
Sergey Sharybin
7ac126e728 Fix T46492: GGX distribution produces black pixels
The issue was caused by some numerical instability.
2016-06-17 16:30:29 +02:00
Sergey Sharybin
84c68dcb3f Cycles: Minor cleanup, whitespace around keyword and preprocessor indent 2016-04-13 08:58:52 +02:00
Sergey Sharybin
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
Sergey Sharybin
69dc0c3192 Cycles: Fixes for Burley BSSRDF
There are several fixes in here, which hopefully will make the shader
working correct without too much magic in there.

First of all, this commit brings BURLEY_TRUNCATE down from 30 to 16
which reduces noise a lot. It's still higher than original truncate
from Brecht, but this reduces PDF value at a cutoff distance by an
order of magnitude (now it's 0.008387, previously it was 0.063521
for the albedo of 0.8 and radius 1.0). This should converge to a
proper result faster and don't have artifacts.

This kind of reverts fix for T47356, but after additional thinking
came to conclusion Burley is not being totally smooth, it is about
giving less waxy results which it's kind of doing in the file.

Second of all, this commit fixes burley_eval() to use normalized
diffusion reflectance. This matches the way we calculate CDF and
solves numeric instability close to 0, making PDF profile looking
closer to other SSS profiles:

  https://developer.blender.org/F282355
  https://developer.blender.org/F282356
  https://developer.blender.org/F282357

Reviewers: brecht

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D1792
2016-02-13 13:29:13 +01:00
Sergey Sharybin
2ac88328b0 Cycles: Fix Burley's CDF truncation after recent radius fix
This is all not really ideal, but good enough for tonight.
More thoughts and investigation tomorrow!
2016-02-08 21:50:38 +01:00
Sergey Sharybin
dae8326d1e Fix T47356: Too sharp falloww with Burley BSSRDF
After the clamping commit we need to bump BURLEY_TRUNCATE
constant a bit, otherwise mean free path does not really
match the disk radius needed for importance sampling.
2016-02-08 14:54:11 +01:00
Brecht Van Lommel
4443bb8922 Fix Burley BSSRDF NaNs and fireflies.
Explicitly truncate to Rm same way as the Gaussian BSSRDF, and use safe_sqrtf()
to be sure in case of float precision issues.
2016-02-06 11:52:43 +01:00
Sergey Sharybin
e688a62712 Cycles: Fix for initial guess of the radius for Burley BSSRDF
The value was too high, causing bad Newton iteration step.
Now the value is not so good, but it's still within 9 iterations
and those high number of iterations are only happening in
approx 1% of input values.
2016-02-05 10:06:08 +01:00
Sergey Sharybin
3e7389eaf2 Cycles: Speedup of Christensen-Burley SSS falloff function
The idea is simply to pre-compute fitting and parameterization
in the bssrdf_setup() function and re-use the values in both
sample() and eval().

The only trick is where to store the pre-calculated values and
the answer is inside of ShaderClosure->custom{1,2,3}. There's
no memory bump here because we now simply re-use padding fields
for the pre-calculated values. Similar trick we can do for other
BSDFs.

Seems to give nice speedup up to 7% here on my desktop with
Core i7 CPU, SSE4.1 kernel.
2016-02-04 15:29:58 +01:00
Sergey Sharybin
ad26407b52 Cycles: Implement approximate reflectance profiles
Using this paper:

  http://graphics.pixar.com/library/ApproxBSSRDF/paper.pdf

This model gives less blurry results than the Cubic and Gaussian
we had implemented:

- Cubic: https://developer.blender.org/F279670
- Burley: https://developer.blender.org/F279671

The model is called "Christensen-Burley" in the interface, which
actually should be read as "Physically based" or "Realistic".

Reviewers: juicyfruit, dingto, lukasstockner97, brecht

Reviewed By: brecht, dingto

Subscribers: robocyte

Differential Revision: https://developer.blender.org/D1759
2016-02-04 13:27:23 +05:00
Sergey Sharybin
e42852a339 Cycles: Cleanup and reference actual paper used for BSSRDF sampling 2016-02-02 18:06:29 +01:00
Sergey Sharybin
12b7850d4f Cycles: Fix for transmissive microfacet sampling
This is an alternate fix for T40964 which resolves bad handling of
caustics reported in T45609.

There were too much transmission rays being discarded by the original
fix, which caused by caustic light being totally disabled. There is
still some room for investigation why exactly original paper didn't
work that well, could be caused by the way how the pdf is calculated.

In any case current results seems rather correct now.
2015-07-31 13:46:58 +02:00
Thomas Dinges
6a0a205cb4 Cycles: Simplify volume_phase_eval().
This simplification is safe, as the call to volume_phase_eval() is guarded behind a CLOSURE_IS_PHASE check, which is equal to
CLOSURE_VOLUME_HENYEY_GREENSTEIN_ID. I don't think we will add more phase functions anytime soon, if at all.
2015-06-11 15:18:33 +02:00
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Sergey Sharybin
ae7d84dbc1 Cycles: Use native saturate function for CUDA
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.

Original patch by @lockal with own modification:

  Don't make changes outside of the kernel. They don't make any difference
  anyway and term saturate() has a bit different meaning outside of kernel.

This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.

Subscribers: dingto

Projects: #cycles

Differential Revision: https://developer.blender.org/D1224
2015-04-28 00:38:32 +05:00
Thomas Dinges
bc160d8a85 Cleanup: Code style. 2015-04-26 00:42:26 +02:00
Sergey Sharybin
e2354e64d2 Cycles: Cleanup, spaces around assignment operator
Did some bad spacing in recent commits, better to get rid of those so
they does not confuse those who're working on sources.
2015-04-07 00:25:54 +05:00
Sergey Sharybin
a9bb8d8a73 Cycles: de-duplicate fast/approximate erf function calculation
Our own implementation is in fact the same performance as in fast_math from
OpenShadingLanguage, but implementation from fast_math is using explicit madd
function, which increases chance of compiler deciding to use intrinsics.
2015-04-06 12:49:44 +05:00
Sergey Sharybin
b06962fcfe Cycles: Avoid using lookup table for Beckmann slopes on GPU
This patch is based on some work done in D788 and re-formulation from Beckmann
implementation in OpenShadingLanguage.

Skipping texture lookup helps a lot on GPUs where it's more expensive to access
texture memory than to do some extra calculation in threads.

CPU code still uses lookup-table based approach since this seems to be still
faster (at least on computers i've got access to).

This change gives about 2% speedup on BMW scene with GTX560TI.
2015-04-05 19:07:45 +05:00
Sergey Sharybin
252b36ce77 Cycles: Remove unused Beckmann slope sampling code
It did not preserve stratification too well and lookup-table approach was
working much better. There are now also some more interesting forumlation
from Wenzel and OpenShadingLanguage which should work better than old code.
2015-04-05 19:07:44 +05:00
Sergey Sharybin
af399884e1 Fix T44113: Ashikhmin-Shirley distribution of glossy shader at 0 roughness causes artifacts when background uses MIS
Was a division by zero error, solved in the same way as beckmann/ggx
deals with small roughness values.
2015-04-01 14:21:21 +05:00
Sergey Sharybin
5ff132182d Cycles: Code cleanup, spaces around keywords
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.

Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
2015-03-28 00:15:15 +05:00
Sergey Sharybin
87cff57207 Fix T44123: Cycles SSS renders black in recent builds
Issue was introduced in 01ee21f where i didn't notice *_setup()
function only doing partial initialization, and some of parameters
are expected to be initialized by callee function.

This was hitting only some setups, so tests with benchmark scenes
didn't unleash issues. Now it should all be fine.

This is to go to the 2.74 branch and we actually might re-AHOY.
2015-03-25 02:33:49 +05:00
Sergey Sharybin
ed7e593a4b Fix T43926: Volume scatter: intersecting objects GPU rendering artifacts
Fix T44007: Cycles Volumetrics: block artifacts with overlapping volumes

The issue was caused by uninitialized parameters of some closures, which
lead to unpredictable behavior of shader_merge_closures().
2015-03-23 12:48:33 +05:00
Thomas Dinges
6f3500db05 Cleanup: Remove unused SD_PHASE_HAS_EVAL flag.
We only have a non-singular volume closure and therefore no need to distinguish it.
2015-02-18 16:33:31 +01:00
Thomas Dinges
a2366a3a2e Cleanup for Cycles hair shader ifdefs.
sc->T and sc->data2 were behind __HAIR__ ifdef, now they are not anymore, so we can always assign the correct value.
2015-02-18 15:57:39 +01:00
Sergey Sharybin
f3e831f02d Cycles: Fix for hair transmission BSDF not returning proper label 2015-02-17 22:40:00 +05:00
Thomas Dinges
a0d7db503d Cycles: Small tweaks for Henyey Greenstein closure code.
* Avoid duplicative fabs(g) check in sample code.
* Avoid dot product in eval code.

Helps like ~1% when Scatter Anisotropy is 0.
2015-02-17 17:48:18 +01:00
Thomas Dinges
bf878d3c3d Cycles: Remove empty closure blur code and the corresponding entries in the switch.
Most compilers will probably optimize that out, but I still don't see a reason to keep it.
2015-02-17 13:44:25 +01:00