They were lost during simplification of kernel loading but might be rather
crucial for the performance.
Also made it so cflags are shared across kernels. Surely it might lead to
some unwanted kernel re-compilation but at the same time they might easily
run out of sync with the changes in kernel and so.
Experimental feature set id currently unavailable for megakernel, it'll
require some changes to the cache system to distinguish cached regular
kernels from cached experimental kernels.
Currently unused, but some features will be enabled soon.
This required allocating some memory related on object transform needed
by ShaderData and currently it is done for all the platforms. Since we're
targeting full feature-complete platforms this is rather acceptable at
this point and in the future we'll do selective NO_HAIR/NO_SSS/NO_BLUR
kernels.
This is experimental still and in fact there're some major issues on
NVidia platform and it's not really clear if it's a bug in compiler,
some uninitizlied variable or other kind of issue.
Some stupid fixes like spaces around operator and missing semicolon,
plus fix for wrong detecting of ShaderData SOA size. Thar was harmless
since there's only one closure array, but still better to fix this.
This file was actually checking for features enabled on CPU and surely all
of them were enabled, so removing them does not cause any difference.
ideally we'll need to do runtime feature detection and just pass some stuff
as NULL to the kernel, or maybe also have variadic kernel entry points which
is also possible quite easily.
No need to store them in the class, they're unlikely to be changed
and if they do change we're in big trouble anyway.
More appropriate approach would be then to typedef this things in
kernel_types.h, but still use inlined sizeof(),
Only those ones are priority for now, all the rest are still testable
if CYCLES_OPENCL_TEST or CYCLES_OPENCL_SPLIT_KERNEL_TEST environment
variables are set.
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
For CPU it gives available instructions set (SSE, AVX and so).
For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because driver can't affect on them).
Quite straightforward change, the only annoying thing is that we can't use
indentation for include directive just because of the way headers inlineing
works for OpenCL.
Might do smarter job in path_source_replace_includes() but don't want to
spend time on this yet.
This is rather legit case which happens i.e. when having persistent images enabled
and session is updating the lookup tables.
Now device_memory keeps track of amount of memory being allocated on the device,
which makes freeing using the proper allocated size, not the CPU side buffer
size.
This problem was introduced in 983cbafd18
Basically the issue is that we were not getting a unique index in the
baking routine for the RNG (random number generator).
Reviewers: sergey
Differential Revision: https://developer.blender.org/D749
For now it was mainly about OpenCL wrangler being duplicated
between Cycles and Compositor, but with OpenSubdiv work those
wranglers were gonna to be duplicated just once again.
This commit makes it so Cycles and Compositor uses wranglers
from this repositories:
- https://github.com/CudaWrangler/cuew
- https://github.com/OpenCLWrangler/clew
This repositories are based on the wranglers we used before
and they'll be likely continued maintaining by us plus some
more players in the market.
Pretty much straightforward change with some tricks in the
CMake/SCons to make this libs being passed to the linker
after all other libraries in order to make OpenSubdiv linked
against those wranglers in the future.
For those who're worrying about Cycles being less standalone,
it's not truth, it's rather more flexible now and in the future
different wranglers might be used in Cycles. For now it'll
just mean those libs would need to be put into Cycles repository
together with some other libs from Blender such as mikkspace.
This is mainly platform maintenance commit, should not be any
changes to the user space.
Reviewers: juicyfruit, dingto, campbellbarton
Reviewed By: juicyfruit, dingto, campbellbarton
Differential Revision: https://developer.blender.org/D707
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D656
Now baking does one AA sample at a time, just like final render. There is
also some code for shader antialiasing that solves T40369 but it is disabled
for now because there may be unpredictable side effects.
The kernel for baking the world texture was the same as the one used for
baking. Now that's separate which allows the kernel to reserve much less
memory.
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.
This patch adds user control to this setting.
Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)
Reviewers: brecht, dingto
Reviewed By: brecht
CC: dingto, venomgfx
Differential Revision: https://developer.blender.org/D317
After update to Mac OS X 10.9.1, OpenCL works now on my Intel CPU in the 2013 Macbook Pro (even the entire kernel).
The Intel Iris Pro GPU still segfaults here though, even when all flags are disabled (building "clay like" kernel only).
Maybe we need the -no-missing-prototypes for AMD hardware still, but I couldn't find a way to distuinguish here.
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.
* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code
This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.
Reviewers: brecht
Differential Revision: http://developer.blender.org/D36
except for curves, that's still missing from the OpenColorIO GLSL shader.
The pixels are stored in a half float texture, converterd from full float with
native GPU instructions and SIMD on the CPU, so it should be pretty quick.
Using a GLSL shader is useful for GPU render because it avoids a copy through
CPU memory.
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.