Commit Graph

205 Commits

Author SHA1 Message Date
Lukas Stockner
43b374e8c5 Cycles: Implement denoising option for reducing noise in the rendered image
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.

To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.

Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.

Finally, thanks to all the people who supported this project:

- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
  on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
  mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
  that could and/or should work better!
2017-05-07 14:40:58 +02:00
Mai Lavelle
915766f42d Cycles: Branched path tracing for the split kernel
This implements branched path tracing for the split kernel.

General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.

Its kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.

Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.

Reviewers: sergey, nirved

Reviewed By: nirved

Subscribers: Blendify, bliblubli

Differential Revision: https://developer.blender.org/D2611
2017-05-02 14:26:46 -04:00
Sergey Sharybin
0579eaae1f Cycles: Make all #include statements relative to cycles source directory
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.

For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.

Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.

This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.

Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.

Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner

Reviewed By: lukasstockner97, maiself, nirved, dingto

Subscribers: brecht

Differential Revision: https://developer.blender.org/D2586
2017-03-29 13:41:11 +02:00
Hristo Gueorguiev
8ada7f7397 Cycles: Remove ccl_addr_space from RNG passed to functions
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
2017-03-27 10:46:28 +02:00
Sergey Sharybin
d14e39622a Cycles: First implementation of shadow catcher
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.

In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.

The workflow is following:

- Create an object which matches real-life object on which shadow is
  to be catched.

- Create approximate similar material on that object.

  This is needed to make indirect light properly affecting CG objects
  in the scene.

- Mark object as Shadow Catcher in the Object properties.

Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
2017-03-27 10:46:03 +02:00
Hristo Gueorguiev
57e26627c4 Cycles: SSS and Volume rendering in split kernel
Decoupled ray marching is not supported yet.

Transparent shadows are always enabled for volume rendering.

Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
2017-03-09 17:09:37 +01:00
Mai Lavelle
352ee7c3ef Cycles: Remove ccl_fetch and SOA 2017-03-08 00:52:41 -05:00
Sergey Sharybin
0330741548 Cycles: Add option to replace GI with AO approximation after certain amount of bounces
This is a speed up option which is mainly useful for viewport. Gives nice speedup in
the barbershop scene of 2x when replacing GI with AO after 2nd bounce without loosing
too much details.

Reviewers: brecht

Subscribers: eyecandy, venomgfx

Differential Revision: https://developer.blender.org/D2383
2017-01-27 14:21:49 +01:00
Sergey Sharybin
bc096e1eb8 Cycles: Split ShaderData object and shader flags
We started to run out of bits there, so now we separate flags
which came from __object_flags and which are either runtime or
coming from __shader_flags.

Rule now is: SD_OBJECT_* flags are to be tested against new
object_flags field of ShaderData, all the rest flags are to
be tested against flags field of ShaderData.

There should be no user-visible changes, and time difference
should be minimal. In fact, from tests here can only see hardly
measurable difference and sometimes the new code is somewhat
faster (all within a noise floor, so hard to tell for sure).

Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself

Differential Revision: https://developer.blender.org/D2428
2017-01-23 12:56:55 +01:00
Sergey Sharybin
b9311b5e5a Cycles: Make object flag names more obvious that hey are object and not shader 2017-01-23 12:14:17 +01:00
Sergey Sharybin
53fa389802 Cycles: Use dedicated debug passes for traversed nodes and intersection tests
This way it's more clear whether some issue is caused by lots of geometry in
the node or by lots of "transparent" BVH nodes.
2017-01-12 13:44:35 +01:00
Sergey Sharybin
dd58390d71 Fix emissive volumes generates unexpected fireflies around intersections
Discard the whole volume stack on the last bounce (but keep
world volume if present).

Volumes are expected to be closed manifol meshes, meaning if
ray entered the volume there should be an intersection event
of ray exisintg the volume. Case when ray hit nothing and
there are still non-world volumes in the stack can happen in
either of cases.

1. Mesh is not closed manifold.

Such configurations are not really supported anyway and should
not be used.

Previous code would have consider the infinite length of the
ray to sample across, so render result wasn't really correct
anyway.

2. Exit intersection is more far away than the camera far
   clip distance.

This case also will behave differently now, but previously it
wasn't really correct either, so it's not like we're breaking
something which was working as expected.

3. We missed exit event due to intersection precision issues.

This is exact the case which this patch fixes and avoid
fireflies.

4. Volume has Camera only visibility (all the rest visibility
is set to off)

This is what could be considered a regression but could be
solved quite easily by checking volume stack's objects flags
and keep entries which doesn't have Volume Scatter visibility
(or even better: ensure Volume Scatter visibility for objects
with volume closure),

Fixes T46108: Cycles - Overlapping emissive volumes generates unexpected bright hotspots around the intersection
Also fixes fireflies appearing on the edges of cube with
emissive volue.

Reviewers: juicyfruit, brecht

Reviewed By: brecht

Maniphest Tasks: T46108

Differential Revision: https://developer.blender.org/D2212
2016-12-08 17:35:43 +01:00
Mai Lavelle
a1aa3a8b75 Cycles: Add comments to endif directives
`kernel_path.h` and `kernel_path_branched.h` have a lot of conditional code and
it was kind of hard to tell what code belonged to which directive. Should be
easier to read now.
2016-11-10 19:50:23 -05:00
Lukas Stockner
04aa454075 Cycles: Deduplicate AO calculation
No functional changes.
2016-10-31 00:40:59 +01:00
Lukas Stockner
26bf230920 Cycles: Add optional probabilistic termination of light samples based on their expected contribution
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.

Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.

Reviewers: #cycles

Subscribers: sergey, brecht

Differential Revision: https://developer.blender.org/D2217
2016-10-30 11:31:28 +01:00
Brecht Van Lommel
a3abb020e3 Fix Cycles CUDA performance on CUDA 8.0.
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.

On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2269
2016-10-03 22:15:25 +02:00
Sergey Sharybin
99b1c1018a Cycles: Recent SSS inline changes broke CPU tests
Very weird, but let's just fall back a bit for now.
2016-08-03 15:27:48 +02:00
Sergey Sharybin
6353ecb996 Cycles: Tweaks to support CUDA 8 toolkit
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.

This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.

On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
2016-08-01 15:54:29 +02:00
Sergey Sharybin
4355603790 Cycles: Move BVK kernel files to own directory
BVH traversal is not really that much a geometry and we've got
quite some traversals now. Makes sense to keep them separate in
the name of source structure clarity.
2016-07-11 13:58:47 +02:00
Lukas Stockner
23c276832b Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".

Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.

In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.

The downside of this model is that it has no (known) analytic expression for evalation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.

Reviewers: dingto, #cycles, brecht

Reviewed By: dingto, #cycles, brecht

Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel

Differential Revision: https://developer.blender.org/D2002
2016-06-23 22:57:26 +02:00
Brecht Van Lommel
b49185df99 Cycles CUDA: reduce branched path stack memory by sharing indirect ShaderData.
Saves about 15% for the branched path kernel.
2016-05-25 21:13:24 +02:00
Brecht Van Lommel
999d5a6785 Cycles CUDA: reduce stack memory by reusing ShaderData.
57% less for path and 48% less for branched path.
2016-05-23 22:29:24 +02:00
Sergey Sharybin
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
Brecht Van Lommel
3c4f971392 Workaround for T47213: branched path sampling issues with CUDA 7.5. 2016-02-19 00:49:24 +01:00
Sergey Sharybin
1f273cec00 Cycles: Tweak inline policy for some functions
The goal is to make Experimental kernel closer in performance to the
official kernel, avoiding spills and such.

There should not be big impact on official kernel, own tests showed
few percent performance drop on laptop's GPU. CPU was always the
same speed on AVX, AVX2 and SSE4.1 CPUs i've been testing here.

This seems to be the last essential step before we can get rid of
Experimental kernel and enable SSS officially on GPU without causing
some major performance issues.

Surely some more tweaks are possibly required, but that we can do
for until cows go home anyway.
2016-01-14 14:53:05 +05:00
Thomas Dinges
83e73a2100 Cycles: Refactor how we pass bounce info to light path node.
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.

This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.

Reviewed by Sergey and Brecht, thanks for some hlp with this.

I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
2016-01-06 23:43:29 +01:00
Sergey Sharybin
d0a9ec5efc Cycles: Fix SSS object not properly reflected in glossy object with indirect clamping
This fixes remained issues reported in T46908.
2015-12-02 16:00:01 +05:00
Sergey Sharybin
6147c4037d Cycles: Fix wrong volume stack after SSS bounce
Was introduced by a recent fixes, now it should be all correct and additionally
it solves the TODO mentioned in the code.
2015-11-28 20:07:34 +05:00
Sergey Sharybin
f5d1551b6e Cycles: Fix wrong original ray used for SSS baking
Also de-duplicated some code by moving to an utility function.
2015-11-28 20:07:34 +05:00
Sergey Sharybin
1e43f0d742 Cycles: Set of fixes for delayed SSS ray tracing
There were multiple issues which are solved now:

- It was possible that ray wouldn't be bounced off the BSSRDF, for example
  when PDF or shader eval is zero. In this case PathState might have been
  left in pre-bounced state which would have been gave incorrect shading
  results.

  This is solved by having separate PathState for each of the hits.

- Path radiance summing wasn't happening correct as well, indirect rays
  were using wrong path radiance in the case when there were more than
  one hit recorded.

  This is now using a bit trickier state machine which calculates path
  radiance for just SSS (both direct and indirect) and then sums it back
  to the final radiance.

- Previous commit wasn't totally correct either and was an induced bug
  due to wrong path state left from the "un-happened" ray bounce.

  There should be no special case happening here, BSSRDFs will be replaced
  with diffuse ones due to PATH_RAY_DIFFUSE_ANCESTOR flag.

- Merged back codebases for "delayed" and "immediate" indirect SSS ray
  tracing, hopefully making it easier to maintain the codebase.

Sure this changes brings memory usage back by about 4-5%, but overall
it's still about 2x memory reduction for the experimental kernel here.

Thanks Brecht for the review!
2015-11-28 20:07:34 +05:00
Sergey Sharybin
8919ed3a62 Cycles: Fallback to diffuse BSDF for the indirect SSS rays when BSSRDF is hit
This is actually how it was intended to work, just didn't notice it wasn't
really happening in the main ray loop.

Solves some memory issues reported in T46880.
2015-11-28 20:07:34 +05:00
Sergey Sharybin
20fc9c00fd Cycles: Fully roll-back to non-delayed SSS indirect rays for CPU
There are some issues to be solved with the recent optimization we did for
the indirect rays for the SSS. Those issues will take a bit of a time to
be fully solved still and we need to unlock Caminandes team now, so let's
revert some changes back.

CUDA will still use delayed indirect rays since it's an experimental
feature.

For the details about what's to be done still please refer to T46880.
2015-11-27 17:15:02 +05:00
Sergey Sharybin
175f00c89a Revert "Cycles: Fix wrong SSS with regular path tracing and clamping enabled"
This wasn't really a complete fix and only worked if there was a single scatter
event recorded only. Proper fix requires some more thoughts to make it correct
without memory use increase.

This reverts commit bf9e88bfbe.
2015-11-27 17:15:02 +05:00
Sergey Sharybin
bf9e88bfbe Cycles: Fix wrong SSS with regular path tracing and clamping enabled
Radiance sum and reset was happening in different order after 26f1c51.

This is a quick fix to unlock Caminandes team, perhaps we can avoid having
separate variable to detect when radiance is to be sum.
2015-11-26 16:11:41 +05:00
Sergey Sharybin
26f1c51ca6 Cycles: Trace indirect subsurface rays by restarting the integrator loop
This gives much lower stack usage on GPU and reduces kernel memory size to
around 448MB on GTX560Ti (comparing to 652MB with previous commit and 946MB
with official release). There's also a barely measurable speedup of around
5%, but this is to be confirmed still.

At this stage we're using only ~3% for the experimental kernel and SSS
rendering seems to be faster by 40% and after some further testing we might
consider making SSS and CMJ official features and remove experimental
precompiled kernels.
2015-11-25 13:01:22 +05:00
Sergey Sharybin
2a5c1fc9cc Cycles: Delay shooting SSS indirect rays
The idea is to delay shooting indirect rays for the SSS sampling and
trace them after the main integration loop was finished.

This reduces GPU stack usage even further and brings it down to around
652MB (comparing to 722MB before the change and 946MB with previous
stable release).

This also solves the speed regression happened in the previous commit
and now simple SSS scene (SSS suzanne on the floor) renders in 0:50
(comparing to 1:16 with previous commit and 1:03 with official release).
2015-11-25 13:01:22 +05:00
Sergey Sharybin
8bca34fe32 Cysles: Avoid having ShaderData on the stack
This commit introduces a SSS-oriented intersection structure which is replacing
old logic of having separate arrays for just intersections and shader data and
encapsulates all the data needed for SSS evaluation.

This giver a huge stack memory saving on GPU. In own experiments it gave 25%
memory usage reduction on GTX560Ti (722MB vs. 946MB).

Unfortunately, this gave some performance loss of 20% which only happens on GPU.
This is perhaps due to different memory access pattern. Will be solved in the
future, hopefully.

Famous saying: won in memory - lost in time (which is also valid in other way
around).
2015-11-25 13:01:22 +05:00
Sergey Sharybin
099aaea447 Cycles: Move branched path tracking into own file
Code there started becoming a bit too big, by splitting it up it'll make it
easier to do improvements or extending the features in there.

The layout is not totally final yet, would need to try de-duplicating parts
of code from split kernel with non-split integrators,
2015-06-15 23:02:42 +02:00
Sergey Sharybin
596eadf0e1 Cycles: Add debug pass which shows number of instance pushes during camera ray intersection
TODO: We might want to refactor debug passes into PASS_DEBUG and some
debug_type (similar to Blender's side passes) to avoid issue of running
out of bits.
2015-06-12 00:12:03 +02:00
Sergey Sharybin
2bd6de5bbb Cycles: Add debug pass showing average number of ray bounces per pixel
Quite straightforward implementation, but still needs some work for the split
kernel. Includes both regular and split kernel implementation for that.

The pass is not exposed to the interface yet because it's currently not really
easy to have same pass listed in the menu multiple times.
2015-06-11 14:53:15 +02:00
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Thomas Dinges
900fc43bb4 Cleanup: Remove unused ray type flags.
They were added for completeness, but it seems we don't need them.
2015-05-08 12:10:26 +02:00
Thomas Dinges
5e423775da Cleanup: Move Cycles volume stack update for subsurface into kernel_volume.h. 2015-04-28 11:20:27 +02:00
Thomas Dinges
3db0e1ef6a Cycles: Simplify volume light connect code. 2015-03-13 00:09:13 +01:00
Thomas Dinges
60679a171d Revert "Cleanup: Simplify camera sample motion blur code."
This reverts commit 8197f0bb64.
2015-02-26 13:27:02 +01:00
Thomas Dinges
8197f0bb64 Cleanup: Simplify camera sample motion blur code. 2015-02-26 10:30:01 +01:00
Thomas Dinges
d979f39cf1 Cycles: Small improvement for volume render (decoupled)
Simplify branching here a bit, helps ~3% in volume_light_sampling.blend (Branched MIS scene).
2015-02-14 20:44:30 +01:00
Sergey Sharybin
25f33e058a Fix T43562: Cycles gets stuck with camera in volume in certain setup
The issue was caused by the way how we shoot the ray to see which rays we're
inside which might start bouncing back-n-forth between two close to parallel
intersecting faces.

Real solution would be to record all the intersections when shooting the ray,
but it's kinda tricky on GPU because of needed sorting and uncertainty of
how huge intersection array should be.

For now we'll just limit number of steps in the check so in worst case we'll
have some samples not being correct which will be compensated with further
sampling. Shouldn't be an issue since probability of such a lock is quite
small actually.
2015-02-05 16:10:50 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Campbell Barton
4c60aae66c Cleanup: warnings 2014-10-06 23:19:07 +02:00