Commit Graph

79 Commits

Author SHA1 Message Date
Thomas Dinges
c08c931fb6 Cycles / CUDA: Increase maximum image textures on GPU.
Instead of 95, we can use 145 images now. This only affects Kepler and above (sm30, sm_35 and sm_50).

This can be increased further if needed, but let's first test if this does not come with a performance impact.

Originally developed during my GSoC 2013.
2014-05-11 03:38:39 +02:00
Thomas Dinges
fd26a32aa5 Fix T40119, CUDA Toolkit version mismatch 2014-05-10 01:26:04 +02:00
Campbell Barton
dc13969e48 Style cleanup: indentation, braces 2014-05-05 02:19:08 +10:00
Campbell Barton
1618329b00 Code cleanup: style, require ; for cuda_assert, opencl_assert 2014-05-04 03:57:50 +10:00
Brecht Van Lommel
198f5e506a Cycles: CUDA changes for kernel evaluation cancel 2014-05-02 21:19:10 -03:00
Campbell Barton
8d16869d83 Code cleanup: Add -Werror=float-conversion to Cycles 2014-05-03 07:31:46 +10:00
Brecht Van Lommel
741f17f05b Cycles CUDA: make CUDA toolkit 6.0 the official supported version.
This also updates the configurations to build kernels for compute capability
5.0 cards, when using and older CUDA toolkit version this will be skipped.

Also includes tweaks to improve performance with this version:
* Increase max registers on sm_30, sm_35 and sm_50
* No longer use texture storage on sm_30
2014-04-30 16:07:27 +02:00
Brecht Van Lommel
39bfde674c Cycles CUDA: don't use cuLaunchGridAsync at all for display devices.
As suggested by Martijn, this is slower than cuLaunchGrid.
2014-04-17 12:18:49 +02:00
Brecht Van Lommel
18da79f471 Cycles CUDA: only do async execution for GPUs not used for display.
Otherwise devices used for display will lock up the UI too much. This means
you might still get 100% CPU for the display device, but for others CPU usage
should be low still.

The check to see if a device is used for display may not be entirely reliable,
it checks if there is a watchdog timeout on the device, but I'm not entirely
sure that always exists for display devices or is disabled for non-display
devices, though some tools like cuda-gdb seem to make the same assumption.

Ref T39559
2014-04-17 12:08:18 +02:00
Brecht Van Lommel
415e10a0ef Fix another compile error with recent commit on visual studio. 2014-04-16 21:36:19 +02:00
Brecht Van Lommel
6f1afdbbfc Cycles CUDA: enabled branched path kernel again, with more registers. 2014-04-16 21:05:04 +02:00
Brecht Van Lommel
2851ed4a55 Cycles code refactor: use __launch_bounds__ instead of -maxrregcount for CUDA.
This makes it easier to have per kernel number of registers. Also, all the
tunable parameters for this are now in kernel.cu, rather than spread over cmake,
scons and device_cuda.cpp.
2014-04-16 21:05:04 +02:00
Thomas Dinges
297a2223b5 Cycles / CUDA: Increase sm_2x registers to 40.
This fixes the ptaxs "ACCESS_VIOLATION" error and should allow our Linux and Windows build bots to compile again.
Unfortunately this comes with a performance penalty on sm_2x cards, so this is only a workaround for now. Branched Path is still globally disabled on GPU.
2014-04-08 23:25:54 +02:00
Thomas Dinges
d923720312 Cycles: Disable Branched Path on all GPUs for now, until we separate the cubins.
SM_20 fails now as well, reported by Zanqdo in IRC.
2014-04-03 22:18:40 +02:00
Brecht Van Lommel
a2e4ebd36a Cycles code internals: add CPU kernel support for 3D image textures. 2014-03-29 13:03:48 +01:00
Thomas Dinges
859039f732 Cycles: Raise a proper error message when using Branched Path on sm_30, this is currently still disabled. 2014-03-27 10:29:22 +01:00
Sergey Sharybin
74518b2826 Fix T39420: Cycles viewport/preview flickers, when moving mouse across editors
Issue was caused by the wrong usage of OCIO GLSL binding API. To make it
work properly on pre-GLSL-1.3 drivers shader is to be enabled after the
texture is binded to the opengl context. Otherwise it wouldn't know the
proper texture size.

This is actually a regression in 2.70 and to be ported to 'a'.
2014-03-26 15:58:53 +06:00
Martijn Berger
28c1a860e2 Fix T39247
Changes to interpolation break texture allocation on sm35 and greater.
2014-03-19 07:37:18 +01:00
Martijn Berger
dd2dca2f7e Add support for multiple interpolation modes on cycles image textures
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.

This patch adds user control to this setting.

Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)

Reviewers: brecht, dingto

Reviewed By: brecht

CC: dingto, venomgfx

Differential Revision: https://developer.blender.org/D317
2014-03-07 23:16:33 +01:00
Martijn Berger
1d01675833 Cuda use streams and async to avoid busywaiting
This switches api usage for cuda towards using more of the Async calls.

Updating only once every second is sufficiently cheap that I don't think it is worth doing it less often.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D262
2014-03-06 20:51:46 +01:00
Brecht Van Lommel
6b1a4fc66e Cycle CUDA: revert the f1aeb2ccf4 and 84f958754 busywait fixes for now.
It's unclear what kind of impact they have on performance at the moment, so I
rather play it safe and postpone this for 2.71.

Ref T38679, Ref T38712
2014-02-19 16:08:08 +01:00
Martijn Berger
f1aeb2ccf4 this is an attempted Fix: T38679
Cycles GPU Performance Regression

From my testing this (what i should have done in the first place) reduces the regression a lot.
Lets hope it is enough or we have to go back to busy waiting.
2014-02-17 20:11:45 +01:00
Martijn Berger
84f9587540 Cuda use streams and async to avoid busywaiting
This is my first stab at this and is based on this IRC converstation:

<mib2berlin> brecht: this is meaning as reminder only, I know you have other things to do > http://openvidia.sourceforge.net/index.php/Optimization_Notes#avoiding_busy_waits
<brecht> mib2berlin: thanks, bookmarked

only tested on Ubuntu 14.04 / cuda 5.0 but ill do some more testing tomorrow.

Also unsure about the placement and the lifetime of the stream and the event. But creating / deleting these seems to incur a non trivial cost.

Reviewers: brecht

Reviewed By: brecht

CC: mib2berlin, dingto

Differential Revision: https://developer.blender.org/D262
2014-01-28 18:40:08 +01:00
Brecht Van Lommel
241fccaf6a Fix T37817: cycles CUDA detection problem on Windows with non-ascii paths. 2014-01-11 00:47:58 +01:00
Martijn Berger
85a0c5d4e1 Cycles: network render code updated for latest changes and improved
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.

* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code

This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.

Reviewers: brecht

Differential Revision: http://developer.blender.org/D36
2013-12-07 12:26:58 +01:00
Thomas Dinges
b5a5773fa9 Cycles / CUDA:
* Remove support for  CUDA Toolkit 4.x, only Toolkit 5.0 and above are supported now.
* Remove support for sm_1x cards (< Fermi) for good. We didn't officially support those cards for a few releases already, now remove some special code that was still there.
2013-10-08 15:29:28 +00:00
Brecht Van Lommel
fa352bb749 Fix #35684: cycles unable to use full 6GB of memory on NVidia Titan GPU. We now
use arrays instead of textures for general storage on this card (image textures
are still stored as texture). Textures were found to be faster on older cards,
but the limits on 1D texture size have not increased along with the memory size,
which meant that the full 6 GB could not be used.

The performance actually seems to be slightly better with arrays in some tests
on Titan. For older cards there seems to be a bit of a mix, some are better and
others not. We may change those to use arrays too, but more testing is needed,
only Titan and Tesla K20 (sm_35) is changed for now.

The fact that arrays are faster is a bit surprising, as others found textures
to be faster on Kepler. However even if they were, the memory limitation is
more important to solve anyway.
https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum
2013-09-27 19:09:31 +00:00
Brecht Van Lommel
29f6616d60 Cycles: viewport render now takes scene color management settings into account,
except for curves, that's still missing from the OpenColorIO GLSL shader.

The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
2013-08-30 23:49:38 +00:00
Brecht Van Lommel
6785874e7a Fix #36137: cycles render not using all GPU's when the number of GPU's is larger
than the number of CPU threads
2013-08-30 23:09:22 +00:00
Brecht Van Lommel
01e22d1b9f Cycles: more code refactoring to rename things internally as well. Also change
property name back so we keep compatibility.
2013-08-23 14:34:34 +00:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Thomas Dinges
743a7a4a4b Cycles:
* GPU kernel can now be compiled without __NON_PROGRESSIVE__ again, was broken after my last commit. Also add a check for have_error(), in case the GPU kernel comes without Non-Progressive, to avoid a crash.

* Don't compile progressive kernel twice on CPU, if __NON_PROGRESSIVE__ would be disabled there.
2013-08-09 20:03:49 +00:00
Thomas Dinges
a18112249d Cycles / Non-Progressive integrator:
* Non-Progressive integrator is now available on the GPU (CUDA, sm_20 and above). 

Implementation details:
* kernel_path_trace() has been split up into two functions:
kernel_path_trace_non_progressive() and kernel_path_trace_progressive().

* We compile two CUDA kernel entry functions (in kernel.cu) for the two integrators, they are still inside one .cubin file but due to the kernel separation there should be no performance problem. I tested with the BMW file on my Geforce 540M and the render times were the same for 100 samples (1.57 min in my case).

This is part of my GSoC project, SVN merge of r59032 + manual merge of UI changes for this from my branch.
2013-08-09 18:47:25 +00:00
Brecht Van Lommel
7902fa57b6 Code cleanup: cycles
* Reshuffle SSE #ifdefs to try to avoid compilation errors enabling SSE on 32 bit.
* Remove CUDA kernel launch size exception on Mac, is not needed.
* Make OSL file compilation quiet like c/cpp files.
2013-06-26 23:29:33 +00:00
Brecht Van Lommel
2e3035dd80 Cycles OpenCL: make displacement and world importance sampling work. 2013-06-21 13:05:08 +00:00
Brecht Van Lommel
8d6e5e2fee Cycles: update build configurations to include CUDA sm_35 architecture. When using
a compiler older than CUDA 5.0 it will give a warning and skip this architecture.
2013-06-20 13:10:47 +00:00
Brecht Van Lommel
16204bd647 Cycles: prepare to make CUDA 5.0 the official version we use
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
  is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
  from Martijn Berger and confirmed here), and also for NVidia OpenCL.

Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
2013-06-19 17:54:23 +00:00
Thomas Dinges
11707119de Cycles:
* Code cleanup, remove unused "resolution" variable from the DeviceTask class, was never used.
2013-05-14 21:18:20 +00:00
Brecht Van Lommel
cd3283f573 Cycles CUDA: in case of cryptic error messages in the console, refer to wiki
documentation for possible solutions.
2013-05-13 21:36:48 +00:00
Thomas Dinges
50c28740d4 Cycles / CUDA:
* Simplify Computing Capability Check, only check for major.
2013-03-17 14:32:50 +00:00
Thomas Dinges
dc90ce5b6d Cycles GPU rendering:
* Deprecate computing capability 1.3 (sm_13)

This commit disables auto build of sm_13 CUDA platform, which means that starting with Blender 2.67, we don't support sm_13 devices anymore. It has become difficult to support that and it was already feature incomplete (no render-passes, AO, Multi Closure etc).

It's still possible to manually enable sm_13 for own tests, but building might break in the future.
2013-02-21 17:14:07 +00:00
Brecht Van Lommel
313dfbe35d Add some more detailed CUDA error prints to try to debug #34166. 2013-02-15 14:54:11 +00:00
Brecht Van Lommel
909d64079a Fix #34226: cycles shadow pass got incorrectly influenced by world multiple
importance sampleing.
2013-02-13 16:46:18 +00:00
Thomas Dinges
f146317b09 Cycles:
* CUDA: Make it more clear that sm_12 and below is not supported.
* OpenCL: __KERNEL_SHADING__ was declared twice for nvidia opencl device.
* Some reshuffle of defines in kernel_types.h. No functional changes.
2013-01-15 19:02:17 +00:00
Sergey Sharybin
e5179bfefc Remove usage WITH_CYCLES_CUDA_BINARIES in code, use check for
precompiled cubins instead,

Logic here is following now:
- If there're precompiled cubins, assume CUDA compute is available,
  otherwise
- If cuda toolkit found, assume CUDA compute is available
- In all other cases CUDA compute is not available

For windows there're still check for only precompiled binaries,
no runtime compilation is allowed.

Ended up with such decision after discussion with Brecht. The thing
is, if we'll support runtime compilation on windows we'll end up
having lots of reports about different aspects of something doesn't
work (you need particular toolkit version, msvc installed, environment
variables set properly and so) and giving feedback on such reports
will waste time.
2013-01-14 17:30:33 +00:00
Brecht Van Lommel
35c0b821a5 Cycles: deal a bit better with errors when CUDA runs out of memory, try to avoid crashes. 2012-12-23 12:53:58 +00:00
Brecht Van Lommel
fdadfde5c5 Fix #33158: motion vector pass wrong in cycles in some scenes, wrong vectors
due to float precision problem in matrix inverse.
2012-11-21 01:00:03 +00:00
Sergey Sharybin
6eec49ed20 Cycles: memory usage report
This commit adds memory usage information while rendering.

It reports memory used by device, meaning:

- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, but it'll
  also mean the same memory is used from host side.

This information displays information about memory requested by Cycles,
not memory really allocated on a device. Real memory usage might be
higher because of memory fragmentation or optimistic memory allocator.

There's really nothing we can do against this.

Also in contrast with blender internal's render cycles memory usage
does not include memory used by scene, only memory needed by cycles
itself will be displayed. So don't freak out if memory usage reported
by cycles would be much lower than blender internal's.

This commit also adds RenderEngine.update_memory_stats callback which
is used to tell memory consumption from external engine to blender.
This information is used to generate information line after rendering
is finished.
2012-11-05 08:04:57 +00:00
Sergey Sharybin
3b88a29abf Cycles: progressive refine option
Just makes progressive refine :)

This means the whole image would be refined gradually using as much
threads as it's set in performance settings. Having enough tiles is
required to have this option working as it's expected.

Technically it's implemented by repeatedly computing next sample for
all the tiles before switching to next sample.

This works around 7-12% slower than regular tile-based rendering, so
use this option only if you really need it.

This commit also fixes progressive update of image when Save Buffers
option is enabled.

And one more thing this commit fixes is handling display buffer with
Save Buffers option enabled. If this option is enabled image buffer
wouldn't have neither byte nor float buffer until image is fully
rendered which could backfire in missing image while rendering in
cases color management cache became full.

This issue solved by allocating byte buffer for image buffer from
tile update callback.

Patch was reviewed by Brecht. He also made some minor edits to
original version to patch. Thanks, man!
2012-10-13 12:38:32 +00:00
Lukas Toenne
efaf512406 Revert r50528: "Performance fix for Cycles: Don't wait in the main UI thread when resetting devices."
This commit leads to random freezes in Cycles rendering:
https://projects.blender.org/tracker/index.php?func=detail&aid=32545&group_id=9&atid=498

The goal of this commit was to remove UI lag for OSL, but since that is not officially supported yet, better revert it until a proper fix can be implemented in 2.65.
2012-09-17 12:07:06 +00:00