Commit Graph

139 Commits

Author SHA1 Message Date
Thomas Dinges
3807bcb3a8 Cleanup: Rename texture slots to float4 and byte, to distinguish from future float (single channel) and half_float slots.
Should be no functional changes, tested CPU and CUDA.
2016-05-06 14:37:35 +02:00
Sergey Sharybin
980f3c3693 Fix T48346: Transparent shadows do not work for instanced objects 2016-05-04 14:46:30 +02:00
Sergey Sharybin
ac00c17900 Cycles: Remove hair support from volume BVH traversal
There are couple of reasons:

- Volume shader on hair does behave really weird anyway and it's
  not something considered a bug really.

- Volume BVH traversal were only used by camera-in-volume check,
  which doesn't really make sense to take hair into account since
  it'll be rendered wrong anyway.

Such a removal makes both code easier to extend further (as in,
no need to worry about those traversal for hair bvh) and also
reduces stress on GPU compilers.
2016-04-11 17:18:14 +02:00
Sergey Sharybin
6cd13a221f Cycles: Rename tri_woop to tri_storage
It's no longer a pre-computed data and just a storage of triangle
coordinates which are faster to access to.
2016-04-11 17:18:14 +02:00
Sergey Sharybin
65f279b770 Cycles: Fix wrong camera in volume check when domain is only visible to camera rays 2016-04-04 19:30:38 +02:00
Sergey Sharybin
700722f686 Cycles: Cleanup, indent nested preprocessor directives
Quite straightforward, main trick is happening in path_source_replace_includes().

Reviewers: brecht, dingto, lukasstockner97, juicyfruit

Differential Revision: https://developer.blender.org/D1794
2016-03-25 13:55:42 +01:00
Sergey Sharybin
0e47e0cc9e Cycles: Use dedicated BVH for subsurface ray casting
This commit makes it so casting subsurface rays will totally ignore all
the BVH nodes and primitives which do not belong to a current object,
making it much simpler traversal code and reduces number of intersection
tests.

Reviewers: brecht, juicyfruit, dingto, lukasstockner97

Differential Revision: https://developer.blender.org/D1823
2016-03-25 13:42:13 +01:00
Sergey Sharybin
1c4f21f85e Cycles: Initial support of 3D textures for CUDA rendering
Supports both smoke/fire and point density textures now.

Reduces number of textures available for sm_20 and sm_21, but you have
to compromise somewhere on such a limited hardware.

Currently limited to linear interpolation only, and decoupled ray
marching is not supported yet. Think those could be considered just a
further improvement.

Some quick example:

  https://developer.blender.org/F282934

Code is minimal and we can fully consider it a fix for missing
support of 3D textures with CUDA.

Reviewers: lukasstockner97, brecht, juicyfruit, dingto

Reviewed By: brecht, juicyfruit, dingto

Subscribers: mib2berlin

Differential Revision: https://developer.blender.org/D1806
2016-02-15 21:26:29 +01:00
Sergey Sharybin
c93069083e Cycles: Tweaks for 32bit CUDA binaries
Tweak some inline policies. Not totally crazy yet, and in fact we now
have one less ifdef statement now.
2016-02-15 19:11:02 +01:00
Sergey Sharybin
3aa74828ab Cycles: Cleanup, indentation and braces 2016-02-03 15:00:55 +01:00
Sergey Sharybin
72e31d6a72 Cycles: Always inline triangle precalc for CUDA devices
Since the SSS changes compiling Experimental sm_52 kernel seems
to work just fine.
2016-01-11 21:41:00 +05:00
Campbell Barton
fc9505c9c5 Cleanup: warnings & spelling 2015-12-02 13:15:52 +11:00
Sergey Sharybin
a6bbf05ba6 Cycles: Fix wrong SSS intersection refinement when this option is disabled
The code is disabled by default, but we'd better keep it all correct.
2015-12-02 03:14:54 +05:00
Sergey Sharybin
8bca34fe32 Cysles: Avoid having ShaderData on the stack
This commit introduces a SSS-oriented intersection structure which is replacing
old logic of having separate arrays for just intersections and shader data and
encapsulates all the data needed for SSS evaluation.

This giver a huge stack memory saving on GPU. In own experiments it gave 25%
memory usage reduction on GTX560Ti (722MB vs. 946MB).

Unfortunately, this gave some performance loss of 20% which only happens on GPU.
This is perhaps due to different memory access pattern. Will be solved in the
future, hopefully.

Famous saying: won in memory - lost in time (which is also valid in other way
around).
2015-11-25 13:01:22 +05:00
Sergey Sharybin
47b1279762 Cycles: Watertight fix for SSS intersection
Same as previous commit, just was missing in there.
2015-10-22 22:10:40 +05:00
Sergey Sharybin
f84cbae43e Cycles: Fix for watertight intersection
It was possible to miss some intersection caused by wrong barycentric
coordinates sign.

Cases when one of the coordinate is zero and other are negative was not
handled correct.
2015-10-22 22:07:28 +05:00
Sergey Sharybin
29247a7a05 Cycles: Fix wrong intersection with motion blur and degenerate object transform 2015-10-09 15:58:03 +05:00
Sergey Sharybin
8fa4fccec4 Cycles: Fix intersection issues caused by degenerate instance matrix
Issue was caused by wrong intersection distance scaling on instance pop,
which could cause intersection distance to become zero, confusing following
intersection checks.
2015-10-09 15:58:03 +05:00
Sergey Sharybin
3cee28ebf3 Fix T46143: Faces missing with GPU render
Epsilon was quite arbitrary for GPU, replaced with checking for zero-sized faces.

It should solve both original report and the new one. After the release we can check
why GPU doesn't produce accurate math here and go to the root of the issue.
2015-09-17 17:21:17 +05:00
Sergey Sharybin
1a04179802 Cycles: Cleanup, typo
Spotted by Campbell, thanks!
2015-09-09 14:25:43 +05:00
Sergey Sharybin
d13a0e8f4a Cycles: Limit triangle magnitude check for only GPU
Found a way to make AVX2 CPUs happy by reshuffling instructions a bit,
so now there's no weird precision errors happening in there.

This solves some render speed regressions on CPU, but unfortunately
this doesn't help for GPU rendering.
2015-09-09 13:39:36 +05:00
Sergey Sharybin
46d2abf78f Cycles: Only use ascii in comments 2015-09-09 13:39:36 +05:00
Sergey Sharybin
1a7eca3c54 Fix T46034: OpenCL kernel compilation error in latest buildbot
Simply expanded expression, so no float4->float3 conversion happens.
2015-09-07 15:02:44 +05:00
Sergey Sharybin
713ce037ab Cycles: Fix wrong check for zero-sized triangles
Initial idea was to optimize calculation a bit by skipping calculation of actual
triangle edges and use vector from ray origin to triangles. In practice this
optimization didn't quite work in cases when origin point is too close to the
triangle.

Let's do 2.76 with a bit more complicated calculation, still looking into exact
reasons why watertight intersections fails in certain cases, but actual fix might
bit be ready so soon.

This fixes wrong eyes on the lady from T46013.
2015-09-04 20:06:31 +05:00
Sergey Sharybin
7dc75ea8f4 Fix T45904: Cycles bug after recent triangle intersect changes
Calculated cross product from wrong vectors by accident.
2015-08-25 18:32:11 +02:00
Sergey Sharybin
2fb639deed Fix T45778: Objects scaled to 0 cause black artifacts with Static BVH
The issue was caused by some numeric instability in triangle intersection which
was visible on avx2 CPUs and GPUs (at least sm_20 here) but maybe some others
too.

Committing rather a workaround for now to be safe for the release, still need
some investigation.

From tests with grass field from Gooseberry project didn't see measurable
slowdown.
2015-08-24 21:23:49 +02:00
Sergey Sharybin
9f63cbf4a7 Fix T45333: Volume Scatter crash blender 2015-07-13 18:54:26 +05:00
Sergey Sharybin
a95b0e0e9d Cycles: Another fix for OSX, sm_50 experimental actually also fails to compile
Didn't notice it originally because compilation was threaded.
2015-06-20 19:40:23 +02:00
Sergey Sharybin
5e2835037a Cycles: Tweak to previous commit, experimental sm_52 works on Linux but not OSX 2015-06-20 19:01:24 +02:00
Sergey Sharybin
34d665a4a4 Cycles: Un-inline triangle_intersect_precalc() on Apple OpenCL
This gives quite the same problems as experimental CUDA kernels
and for until it's found a root cause of the problem we'd just
explicitly uninline the function.
2015-06-20 18:00:30 +02:00
Sergey Sharybin
845854959f Cycles: Cleanup, make it more obvious which platform requires workaround for triangle intersection
Should be no functional changes.
2015-06-20 17:01:21 +02:00
Sergey Sharybin
34c3beb339 Cycles: Fix missing node distance update when only two child intersected in QBVH 2015-06-12 10:06:46 +02:00
Sergey Sharybin
596eadf0e1 Cycles: Add debug pass which shows number of instance pushes during camera ray intersection
TODO: We might want to refactor debug passes into PASS_DEBUG and some
debug_type (similar to Blender's side passes) to avoid issue of running
out of bits.
2015-06-12 00:12:03 +02:00
Sergey Sharybin
b3cc602adc Cycles: Remove meaningless debug traversal steps increment from QBVH volume code 2015-06-11 23:54:57 +02:00
Sergey Sharybin
2ab909a88c Cycles: Make experimental kernel build option more generic
Previously it was explicitly mentioning it's NVidia kernel related option,
but in fact it's also handy for the OpenCL kernel.
2015-05-15 13:22:47 +05:00
Sergey Sharybin
3d3d805b64 Cycles: Prepare code for OpenCL camera/motion blur
The kernels are now compiling just fine, but there're some issues
during rendering. This is still to be investigated.
2015-05-14 18:48:56 +05:00
Sergey Sharybin
5a63edb929 Cycles: Use special _auto versions of transform function in motion blur code
Doing this as a separate commit so it's easier to revert in the future, once
OpenCL 2.0 is becoming our requirement.
2015-05-14 18:48:56 +05:00
Sergey Sharybin
79aa50dc53 Cycles: Enable hair for split kernels when using Intel or NVidia drivers
Apart from simply enabling this features needed changes to the code were done.
Technical change, replacing SD access from "simple" structure to SOA.
2015-05-14 18:48:56 +05:00
Sergey Sharybin
583fd3af65 Cycles: Fix typo in global space version of normal transform
It was using direction transform, which is obviously wrong.
2015-05-10 00:53:32 +05:00
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Thomas Dinges
4eab0e72b3 Cleanup: Update some comments and add ToDo. 2015-04-29 23:56:46 +02:00
Thomas Dinges
b3def11f5b Cycles: Record all possible volume intersections for SSS and camera checks
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.

Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.

This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain

In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.

Patch by Sergey and myself.

Differential Revision: https://developer.blender.org/D1264
2015-04-29 23:31:06 +02:00
Sergey Sharybin
ae7d84dbc1 Cycles: Use native saturate function for CUDA
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.

Original patch by @lockal with own modification:

  Don't make changes outside of the kernel. They don't make any difference
  anyway and term saturate() has a bit different meaning outside of kernel.

This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.

Subscribers: dingto

Projects: #cycles

Differential Revision: https://developer.blender.org/D1224
2015-04-28 00:38:32 +05:00
Sergey Sharybin
828abaf11c Cycles: Split BVH nodes storage into inner and leaf nodes
This way we can get rid of inefficient memory usage caused by BVH boundbox
part being unused by leaf nodes but still being allocated for them. Doing
such split allows to save 6 of float4 values for QBVH per leaf node and 3
of float4 values for regular BVH per leaf node.

This translates into following memory save using 01.01.01.G rendered
without hair:

                   Device memory size   Device memory peak   Global memory peak
Before the patch:  4957                 5051                 7668
With the patch:    4467                 4562                 7332

The measurements are done against current master. Still need to run speed tests
and it's hard to predict if it's faster or not: on the one hand leaf nodes are
now much more coherent in cache, on the other hand they're not so much coherent
with regular nodes anymore.

Reviewers: brecht, juicyfruit

Subscribers: venomgfx, eyecandy

Differential Revision: https://developer.blender.org/D1236
2015-04-20 17:29:51 +05:00
Sergey Sharybin
bf11e362c5 Fix T44046: Cycles speed regression in 2.74 (CPU only)
Issue was caused by MSVC not being able to optimize some code out in the same
way as GCC/Clang does, so now that parts of code are explicitly unfolded in
order to help compilers out.

This makes speed loss much less drastic on my laptop. That's probably as good
as we can do with MSVC without investing infinite amount of time looking trying
to workaround the optimizer.
2015-04-08 18:47:25 +05:00
Sergey Sharybin
09a746b857 Cycles: Cleanup, typos 2015-04-08 01:15:38 +05:00
Sergey Sharybin
858f54f16e Cycles: Cleanup, indentation 2015-04-07 22:41:08 +05:00
Sergey Sharybin
e2354e64d2 Cycles: Cleanup, spaces around assignment operator
Did some bad spacing in recent commits, better to get rid of those so
they does not confuse those who're working on sources.
2015-04-07 00:25:54 +05:00
Sergey Sharybin
ab2d05d958 Fix T44269: Typo in volume_attribute_float:geom_volume.h
Was rather harmless typo since we either pass both dx,dy or pass both NULL.
2015-04-05 19:07:45 +05:00
Sergey Sharybin
f1494edf78 Cycles: Make SSS intersection closer to regular triangle intersection 2015-04-01 21:20:04 +05:00