This commit contains all the work related to the AMD megakernel split, which
was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus some
help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely someone
else whom we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force the split OpenCL kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
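A minimal sketch of how such an override might be read, assuming a plain
getenv() check at device initialization time (the function name and the
surrounding logic here are hypothetical, not the actual Cycles code):

    #include <cstdlib>
    #include <cstring>

    /* Return true when the user forces the split OpenCL kernel via the
     * CYCLES_OPENCL_SPLIT_KERNEL_TEST environment variable. */
    static bool opencl_split_kernel_forced()
    {
        const char *value = getenv("CYCLES_OPENCL_SPLIT_KERNEL_TEST");
        return value && strcmp(value, "1") == 0;
    }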
Not all features are supported yet: there is no motion blur, camera blur,
SSS or volumetrics for now. Transparent shadows are also disabled on AMD
devices because of a compiler bug.
This kernel also only implements regular path tracing; supporting the
branched one will take a bit longer. Branched path tracing is still exposed
in the interface, which is a bit misleading and will be hidden there soon.
More features will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA is affected; they generate exactly the same
code, which means no regressions or improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
The issue was introduced in 01ee21f, where I didn't notice that the *_setup()
functions only do partial initialization, and that some of the parameters are
expected to be initialized by the caller.
This was hitting only some setups, so tests with benchmark scenes didn't
reveal the issue. Now it should all be fine.
This is to go to the 2.74 branch and we actually might re-AHOY.
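For illustration only, a generic sketch of the pattern that caused the bug,
with entirely hypothetical names: the setup call performs only partial
initialization, so the caller must fill in the remaining fields itself.

    /* Hypothetical closure type: 'roughness' is the caller's responsibility,
     * only 'type' is filled in by the setup call. */
    struct Closure {
        float roughness; /* must be set by the caller before setup */
        int type;        /* filled in by closure_setup() */
    };

    static int closure_setup(Closure *sc)
    {
        sc->type = 1; /* partial initialization: roughness is left untouched */
        return 1;
    }

    static void make_closure()
    {
        Closure sc;
        sc.roughness = 0.5f; /* forgetting this leaves the field uninitialized */
        closure_setup(&sc);
    }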
This was caused by an internal optimization which evaluated SSS with a size
of zero as a BSDF but used a different ID, so the evaluation result didn't
appear in the regular diffuse pass.
This led to a situation where the SSS data was not stored anywhere if the
size was zero.
Now SSS with zero size and close-to-zero sizes will be handled in the
same way from the passes point of view.
tri_shader no longer needs to be a float.
Reviewers: dingto, sergey
Reviewed By: dingto, sergey
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D789
The root of the issue goes back to the on-the-fly normals commit, and the
latest fix for it wasn't actually correct: I had mixed two fixes in there.
So the idea here goes back to storing a negative-scale object flag and
flipping the runtime-calculated normal if this flag is set, which is pretty
much the same as my original fix for the issue.
The issue with motion blur wasn't caused by the runtime normals patch; it
had issues before, because it already did runtime normals calculation. Now
made it so motion triangles take the negative scale flag into account.
This actually makes the code cleaner imo and avoids rather confusing flipping
code in mesh.cpp.
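A rough sketch of the flipping logic, assuming a float3 type with the usual
operators as in the Cycles kernel; the flag name and bit value here are
illustrative, not the actual ones:

    /* Object flag marking that a negative scale was applied to the mesh
     * (illustrative bit value). */
    #define OBJECT_NEGATIVE_SCALE_APPLIED (1 << 0)

    /* Flip the runtime-calculated geometric normal when the negative scale
     * flag is set, since the applied scale flips the triangle winding. */
    static float3 triangle_normal_for_object(float3 Ng, int object_flag)
    {
        if(object_flag & OBJECT_NEGATIVE_SCALE_APPLIED)
            return -Ng;
        return Ng;
    }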
Fix T41079: Solid black render of object with negative scale and smooth shading
In both cases the issue was caused by negative scaled objects with single
mesh users, for which the scale gets applied when using static BVH.
Since the on-the-fly normals calculation landed, normals for such cases
weren't flipped, leading them to point in the wrong direction.
Added a special object flag for this, which is a bit of a bummer because now
we've got fewer bits for really useful things, but this is the only way to
get proper normals without adding more complexity to the on-the-fly
calculations.
* Volume multiple importance sampling support to combine equiangular and
distance sampling, for both homogeneous and heterogeneous volumes (see the
MIS weighting sketch below).
* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
apply to volumes as well as surfaces.
Implementation note:
For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
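For reference, combining two sampling strategies like this is typically done
with the standard power heuristic; a generic sketch, not the exact Cycles
code:

    /* Power heuristic (beta = 2) for weighting a sample drawn with pdf 'a'
     * against an alternative strategy with pdf 'b', e.g. equiangular vs.
     * distance sampling. */
    static float power_heuristic(float a, float b)
    {
        float t = a * a;
        return t / (t + b * b);
    }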
Instead of pre-calculation and storage, we now calculate the face normal during render.
This gives a small slowdown (~1%) but decreases memory usage, which is especially important for GPUs,
where you have limited VRAM.
Part of my GSoC 2014.
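The face normal can be recomputed from the triangle vertices with a single
cross product; a minimal sketch, assuming a float3 type with cross() and
normalize() helpers as in the Cycles kernel:

    /* Face normal calculated on the fly from the three triangle vertices,
     * instead of being precomputed and stored per triangle. */
    static float3 triangle_face_normal(float3 v0, float3 v1, float3 v2)
    {
        return normalize(cross(v1 - v0, v2 - v0));
    }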
This was the original code to get things working on old GPUs, but it is no
longer in use, and various features in fact depend on this to work correctly,
to the point that enabling this code is too buggy to be useful.
This can for example be useful if you want to manually terminate the path at
some point and use a color other than black.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D454
This is done by adding a Volume Scatter node. In many cases you will want to
add together a Volume Absorption and Volume Scatter node with the same color
and density to get the expected results.
This should work with branched path tracing, mixing closures, overlapping
volumes, etc. However, various optimizations are still needed for sampling.
The main missing thing from the volume branch is the equiangular sampling for
homogeneous volumes.
The heterogeneous scattering code was arranged such that we can use a single
stratified random number for distance sampling, which gives less noise than
pseudo-random numbers for each step. For volumes where the color is textured
something still seems to be off; this needs to be investigated.
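For background, in a homogeneous volume the absorption and scatter densities
simply add up to one extinction coefficient, and scatter distances can be
sampled proportionally to the resulting transmittance. A generic sketch of
the standard formulas, not the Cycles implementation:

    #include <cmath>

    /* Beer-Lambert transmittance over a segment of length t, where the
     * extinction coefficient is absorption plus scattering density. */
    static float transmittance(float sigma_a, float sigma_s, float t)
    {
        return expf(-(sigma_a + sigma_s) * t);
    }

    /* Sample a scatter distance proportional to transmittance, using one
     * (possibly stratified) random number xi in [0, 1). */
    static float sample_distance(float sigma_t, float xi)
    {
        return -logf(1.0f - xi) / sigma_t;
    }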
This is done using the existing Emission node and closure (we may add a volume
emission node, not clear yet if it will be needed).
Volume emission only supports indirect light sampling, which means it's not
very efficient for small or far away bright light sources. Using direct light
sampling and MIS would be tricky and probably won't be added anytime soon.
Other renderers don't support this either as far as I know; lamps and ray
visibility tricks may be used instead.
This works pretty much as you would expect: overlapping volume objects give
a denser volume. What did change is that world volume shaders are now active
everywhere; they are no longer excluded inside objects.
This may not be desirable and we need to think of better control over this.
In some cases you clearly want it to happen, for example if you are rendering
a fire in a foggy environment. In other cases like the inside of a house you
may not want any fog, but it doesn't seem possible in general for the renderer
to automatically determine what is inside or outside of the house.
This is implemented using a simple fixed-size array of shader/object ID
pairs, limited to a maximum of 15 overlapping objects. The closures from all
shaders are put
into a single closure array, exactly the same as if an add shader was used to
combine them.
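A minimal sketch of such a fixed-size stack of shader/object pairs; the names
and the exact size are illustrative, the real structure lives in the Cycles
kernel headers:

    /* 15 overlapping volumes plus room for a terminator entry (illustrative). */
    #define VOLUME_STACK_SIZE 16

    typedef struct VolumeStackEntry {
        int object; /* object the ray is currently inside */
        int shader; /* its volume shader */
    } VolumeStackEntry;

    typedef struct VolumeStack {
        VolumeStackEntry entries[VOLUME_STACK_SIZE];
        int num_entries;
    } VolumeStack;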
This is the simplest possible volume rendering case, constant density inside
the volume and no scattering or emission. My plan is to tweak, verify and
commit more volume rendering effects one by one; doing it all at once makes
it difficult to verify correctness and track down bugs.
Documentation is here:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Materials/Volume
Currently this hooks into path tracing in 3 ways, which should get us pretty
far until we add more advanced light sampling. These 3 hooks are repeated in
the path tracing, branched path tracing and transparent shadow code:
* Determine active volume shader at start of the path
* Change active volume shader on transmission through a surface
* Light attenuation over line segments between camera, surfaces and background
This is work by "storm", Stuart Broadfoot, Thomas Dinges and myself.
* Henyey-Greenstein scattering closure implementation (a sketch of the phase
function follows below).
* Rename transparent to absorption node and isotropic to scatter node.
* Volume density is folded into the closure weights.
* OSL support for volume closures and nodes.
* This commit has no user visible changes, there is no volume render code yet.
This is work by "storm", Stuart Broadfoot, Thomas Dinges and myself.
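For reference, the Henyey-Greenstein phase function itself is the standard
textbook formula; a generic sketch, not necessarily how the closure is
written in Cycles:

    #include <cmath>

    /* Henyey-Greenstein phase function for the cosine of the scattering angle
     * and anisotropy parameter g in (-1, 1); integrates to 1 over the sphere. */
    static float phase_henyey_greenstein(float cos_theta, float g)
    {
        const float pi = 3.14159265358979f;
        float denom = 1.0f + g * g - 2.0f * g * cos_theta;
        return (1.0f - g * g) / (4.0f * pi * denom * sqrtf(denom));
    }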
This avoids build conflicts with libc++ on FreeBSD; these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change; it may be easiest to
do these same replacements in your code and then apply/merge the patch.
Ref T37477.
* Remove support for CUDA Toolkit 4.x, only Toolkit 5.0 and above are supported now.
* Remove support for sm_1x cards (< Fermi) for good. We didn't officially support those cards for a few releases already, now remove some special code that was still there.
A new hair BSDF node, with two closure options, is added. These closures allow the generation of the reflective and transmission components of hair. The node allows control of the highlight colour, roughness and angular shift.
Limitations include:
- No glint or fresnel adjustments.
- The 'offset' is unused when triangle primitives are used.
New features:
* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering
Work in progress for feedback:
Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.
This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.
Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661
http://www.pasteall.org/pic/show.php?id=57662
http://www.pasteall.org/blend/23501
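For reference, the Gaussian falloff corresponds to a normalized 2D Gaussian
over the scatter radius; a generic sketch with a variance parameterization,
which is an assumption rather than the exact Cycles code:

    #include <cmath>

    /* Normalized 2D Gaussian falloff over the scatter radius r with variance v;
     * integrates to 1 over the plane. */
    static float gaussian_falloff(float r, float v)
    {
        const float pi = 3.14159265358979f;
        return expf(-r * r / (2.0f * v)) / (2.0f * pi * v);
    }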
* Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above).
Implementation details:
* kernel_path_trace() has been split up into two functions:
kernel_path_trace_non_progressive() and kernel_path_trace_progressive().
* We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case).
This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.
* Render Passes are now available for Subsurface Scattering (Direct, Indirect and Color pass).
This is part of my GSoC project, SVN merge of r58587, r58828 and r58835.
* Added a Ray Depth output to the Light Path node, which gives the user access to the current bounce.
This can be used to limit the maximum ray bounce on a per shader basis. Another use case is to restrict light influence with this, to have a lamp only contribute to the direct lighting.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/More#Light_Path
This is part of my GSoC 2013 project. SVN merge of r58091 and r58772 from soc-2013-dingto.
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and re-enable --use_fast_math if CUDA
5.x is used; these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download,
performance is as good as or better than the 2.67b release with the scenes
and graphics cards I tested.
for Apple OpenCL on OS X 10.8 and simple AO render.
Also, the environment variable CYCLES_OPENCL_TEST can now be set to CPU,
GPU, ACCELERATOR, DEFAULT or ALL to test particular devices.
Code is added to restrict the pixel size of strands in Cycles. It works best with ribbon primitives, and a preset for these is included. It uses distance-dependent expansion of the strands and then stochastic strand removal to give a fading. To prevent a slowdown for triangle mesh objects in the BVH, an extra visibility flag has been added. It is also only applied for camera rays.
The strand width settings are also changed, so that the particle size is not included in the width calculation. Instead there is a separate particle system parameter for width scaling.
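A hedged sketch of the underlying idea, with illustrative names rather than
the actual patch: strands thinner than a minimum on-screen width are expanded
to that width, and the over-coverage is compensated by stochastically
discarding strands with the inverse ratio, which produces the fading.

    /* Clamp a strand to a minimum screen-space width and decide whether to keep
     * it; 'xi' is a random number in [0, 1). Returns false when the strand is
     * stochastically removed. */
    static bool strand_clamp_width(float *width, float min_width, float xi)
    {
        if(*width >= min_width)
            return true; /* wide enough, keep as is */

        float keep_probability = *width / min_width;
        *width = min_width;           /* distance dependent expansion */
        return xi < keep_probability; /* stochastic removal gives the fading */
    }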