Commit Graph

56 Commits

Author SHA1 Message Date
Thomas Dinges
a5a05fc291 Cycles: Fix long compile time with MSVC.
Compile time per kernel increased alot after recent image commits, re-shuffle some code to fix this.

Patch by "LazyDodo".

Differential Revision: https://developer.blender.org/D2012
2016-05-20 16:50:29 +02:00
Thomas Dinges
3c85e1ca1a Cycles: Add support for single channel byte textures.
This way, we also save 3/4th of memory for single channel byte textures (e.g. Bump Maps).

Note: In order for this to work, the texture *must* have 1 channel only.
In Gimp you can e.g. do that via the menu: Image -> Mode -> Grayscale
2016-05-12 14:51:42 +02:00
Thomas Dinges
4a4f043bc4 Cycles: Add support for single channel float textures on CPU.
Until now, single channel textures were packed into a float4, wasting 3 floats per pixel. Memory usage of such textures is now reduced by 3/4.
Voxel Attributes such as density, flame and heat benefit from this, but also Bumpmaps with one channel.
This commit also includes some cleanup and code deduplication for image loading.

Example Smoke render from Cosmos Laundromat: http://www.pasteall.org/pic/show.php?id=102972
Memory here went down from ~600MB to ~300MB.

Reviewers: #cycles, brecht

Differential Revision: https://developer.blender.org/D1981
2016-05-11 21:58:34 +02:00
Thomas Dinges
d6555d936c Cleanup: Avoid duplicative defines for CPU textures, use the ones from util_texture.h
Also includes some further byte -> byte4 renaming, missed that in last commit.
2016-05-09 09:16:41 +02:00
Thomas Dinges
3807bcb3a8 Cleanup: Rename texture slots to float4 and byte, to distinguish from future float (single channel) and half_float slots.
Should be no functional changes, tested CPU and CUDA.
2016-05-06 14:37:35 +02:00
Brecht Van Lommel
1dfbcd88d5 Fix a few compiler warnings with OS X / clang. 2016-04-17 01:05:50 +02:00
Sergey Sharybin
28604c46a1 Cycles: Make Blender importer more forward compatible
Basically the idea is to make code robust against extending
enum options in the future by falling back to a known safe
default setting when RNA is set to something unknown.

While this approach solves the issues similar to T47377,
but it wouldn't really help when/if any of the RNA values
gets ever deprecated and removed. There'll be no simple
solution to that apart from defining explicit mapping from
RNA value to Cycles one.

Another part which isn't so great actually is that we now
have to have some enum guards and give some explicit values
to the enum items, but we can live with that perhaps.

Reviewers: dingto, juicyfruit, lukasstockner97, brecht

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D1785
2016-02-12 15:27:33 +01:00
Thomas Dinges
aa49c16bd9 Cleanup: Avoid some warnings on OS X with clang and update comment. 2015-10-26 11:52:24 +01:00
Sergey Sharybin
350cf8ea7f Cycles: Cleanup, whitespace around keywords 2015-10-08 19:08:28 +05:00
Sergey Sharybin
d784568805 Cycles: Fix missing z-coordinate check in volume sampling 2015-10-05 12:40:50 +05:00
Sergey Sharybin
0bc4bc2e61 Fix T45946: Cycles texture interpolation bug
Coordinate clamping was done in the wrong order.
2015-09-03 18:16:30 +05:00
Sergey Sharybin
aac6ee6b87 Fix T45885: Cycles coordinate extension modes not working as expected
Fix T45769: Image Texture Node clipping bug

Simple mistakes in the normalized/pixel-space coordinates handling.

Render tests for this feature are coming.
2015-08-24 10:40:37 +02:00
Sergey Sharybin
a6b2650c7d Cycles: Correction to image extension type commits
Clipping wasn't working totally correct, need to check original coordinates,
not the integer ones,

Now CPU gives the same exact results for both SVM and OSL, CUDA is still doing
something crazy with edges.
2015-07-28 16:31:27 +02:00
Sergey Sharybin
4690281b17 Cycles: Add implementation of clip extension mode
For now there's no OpenCL support, it'll come later.
2015-07-28 14:36:08 +02:00
Sergey Sharybin
3fba620858 Cycles: Prepare for more image extension types support
Basically just replace boolean periodic flag with extension type enum in the
device API.
2015-07-28 14:14:24 +02:00
Sergey Sharybin
f2c54df625 Cycles: Expose image image extension mapping to the image manager
Currently only two mappings are supported by API, which is Repeat (old behavior)
and new Clip behavior. Internally this extension is being converted to periodic
flag which was already supported but wasn't exposed.

There's no support for OpenCL yet because of the way how we pack images into a
single texture.

Those settings are not exposed to UI or anywhere else and there should be no
functional changes so far.
2015-07-21 21:58:19 +02:00
Sergey Sharybin
097aa852cf Cycles: Silent paranoid uninitialized GCC warnings in release kernels 2015-06-13 16:29:54 +02:00
George Kyriazis
7f4479da42 Cycles: OpenCL kernel split
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.

Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.

Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.

This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.

More feature will be enabled once they're ported to the split kernel and
tested.

Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.

Based on the research paper:

  https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf

Here's the documentation:

  https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit

Design discussion of the patch:

  https://developer.blender.org/T44197

Differential Revision: https://developer.blender.org/D1200
2015-05-09 19:52:40 +05:00
Sergey Sharybin
6fc1669679 Cycles: Initial work towards selective nodes support compilation
The goal is to be able to compile kernel with nodes which are actually needed
to render current scene, hence improving performance of the kernel,

The idea is:

- Have few node groups, starting with a group which contains nodes are used
  really often, and then couple of groups which will be extension of this one.

- Have feature-based nodes disabling, so it's possible to disable nodes related
  to features which are not used with the currently used nodes group.

This commit only lays down needed routines for this approach, actual split will
happen later after gathering statistics from bunch of production scenes.
2015-05-09 19:22:16 +05:00
Campbell Barton
7221fbe9dd cleanup 2015-02-12 23:51:02 +11:00
Sergey Sharybin
13ad69c68e Cycles: Add print functions for sse3f, sse3i and sse3b 2015-02-11 00:20:34 +05:00
Sergey Sharybin
ddba5c27a7 Cycles: Ignore -Wmaybe-uninitialized from the kernel in release builds
This warning provided too much false-positive issues in release version of the
kernel, making it really easy to miss actual warnings.
2015-02-02 22:09:01 +05:00
Sergey Sharybin
5030daf2a8 Cycles: Remove redundant calculation of w in recent cubic commit
Was rather harmless since compiler will optimize it out, but nice to get
rid of this anyway.
2015-02-02 17:35:57 +05:00
Sergey Sharybin
b757f04a15 Cycles: Indentation fix for the previous commit 2015-02-02 02:04:47 +05:00
Sergey Sharybin
3b9d455a90 Cycles: Implement cubit image interpolation on CPU
Basically title says it all. Could be not totally optimized but the code is there now.
2015-02-02 02:02:10 +05:00
Sergey Sharybin
010f3ee438 Cycles: Fix compilation error on non-SSE2 architectures 2014-12-25 14:11:37 +05:00
Thomas Dinges
ee36e75b85 Cleanup: Fix Cycles Apache header.
This was already mixed a bit, but the dot belongs there.
2014-12-25 02:50:24 +01:00
Sergey Sharybin
03f28553ff Cycles: Implement QBVH tree traversal
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.

This commit also does some changes to the loop inspired by Embree:

- Visibility flags are only checked for primitives.

  Doing visibility check for every node cost quite reasonable amount of time
  and in most cases those checks are true-positive.

  Other idea here would be to do visibility checks for leaf nodes only, but
  this would need to be investigated further.

- For minimum hair width we extend all the nodes' bounding boxes.

  Again doing curve visibility check is quite costly for each of the nodes and
  those checks returns truth for most of the hierarchy anyway.

There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.

Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).

The code is enabled for compilation in kernel. but blender wouldn't use it
still.
2014-12-25 02:50:49 +05:00
Sergey Sharybin
ab8d9c4b88 Cycles: Add some utility functions and structures
Most of them are not currently used but are essential for the further work.

- CPU kernels with SSE2 support will now have sse3b, sse3f and sse3i

- Added templatedversions of min4, max4 which are handy to use with register
  variables.

- Added util_swap function which gets arguments by pointers.
  So hopefully it'll be a portable version of std::swap.
2014-12-25 02:50:49 +05:00
Sergey Sharybin
a8b9402c8f Cycles: Tweak to the expf() speed workaround
Add compile-time check for particular glibc version which fixed the issue.
This makes it so own-compiled blender is the fastest in the world, and the
only issue remains what should we do for release builds.

After some discussion with Campbell we decided to keep it as is for now
because slowdown is not that much noticeable. We'll disable this workaround
for release builds when all the majority of the distros will switch to the
new version of glibc.
2014-11-07 13:35:45 +05:00
Sergey Sharybin
d2d1b19170 Cycles: Expose volume voxel data interpolation to the interface
It is per-material setting which could be found under the Volume settings
in the material and world context buttons.

There could still be some code-wise improvements, like using variable-size
macro for interp3d instead of having interp3d_ex to which you can pass the
interpolation method.
2014-10-22 19:53:06 +06:00
Sergey Sharybin
c24698a37e Cycles: Implement tricubic b-spline interpolation for CPU texture_image
This is the first step towards supporting cubic interpolation for voxel
data (such as smoke and fire). It is not epxosed to the interface at all
yet, this is to be done soon after this change.
2014-10-22 19:41:58 +06:00
Campbell Barton
e2522b4a29 Cycles: correct math wrappers
include the parens around value before cast,
in some cases was causing double/float promotion by only casting the left value.
2014-10-08 00:13:26 +02:00
Sergey Sharybin
cd6129d1ff Cycles: Workaround dead-slow expf() on 64bit linux
Single precision exponent on 64bit linux tends to be order of magnitude slower
than double precision version even with single<->double precision conversion.

Some feedback in the mailing lists also suggests that logf() is also slow, but
this i didn't confirm here in the studio yet.

Depending on the shader setup it gives ~3% with the secret agent shot and up to
around 15% with the bmw scene here.
2014-10-06 12:36:46 +02:00
Thomas Dinges
cd5e1ff74e Cycles Refactor: Add SSE Utility code from Embree for cleaner SSE code.
This makes the code a bit easier to understand, and might come in handy
if we want to reuse more Embree code.

Differential Revision: https://developer.blender.org/D482

Code by Brecht, with fixes by Lockal, Sergey and myself.
2014-06-13 21:59:12 +02:00
Campbell Barton
fd7f5c4230 Cycles: revert part of the optimization from ff34c2d
This was faster for my AMD system but slower for Intel.

However with gcc4.9,-O3 I was able to get roughly the same speed before/after.

Revert since this isnt giving such clear benefits on most systems.
2014-05-06 14:07:04 +10:00
Campbell Barton
ff34c2de64 Cycles: avoid int->float conversions for pixel lookups
Gives ~3% speedup for image.blend test, and 6% for image heavy file.

Overall speedup in real-world use is likely much less.
2014-05-05 06:58:39 +10:00
Brecht Van Lommel
a2e4ebd36a Cycles code internals: add CPU kernel support for 3D image textures. 2014-03-29 13:03:48 +01:00
Martijn Berger
a8039d99f8 Fix cycles texture interpolation mode closest constant offset on some devices 2014-03-13 20:08:33 +01:00
Martijn Berger
dd2dca2f7e Add support for multiple interpolation modes on cycles image textures
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.

This patch adds user control to this setting.

Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)

Reviewers: brecht, dingto

Reviewed By: brecht

CC: dingto, venomgfx

Differential Revision: https://developer.blender.org/D317
2014-03-07 23:16:33 +01:00
Sv. Lockal
7808360c5f Cycles: fix crash in SSE hair and half-floats on x86+vc2008
MSVC 2008 ignores alignement attribute when assigning from unaligned
float4 vector, returned from other function. Now Cycles uses unaligned
loads instead of casts for win32 in x86 mode.
2014-02-27 15:01:20 +04:00
Brecht Van Lommel
28e6d05e09 Fix cycles crash with float image textures on CPU without AVX support.
The AVX kernel functions for reading image textures could be get used from non-AVX
kernels. These are C++ class methods and need to be marked for inlining, all other
functions are static so they don't leak into other kernels.
2014-02-04 16:07:50 +01:00
Brecht Van Lommel
d9e52ac98b Code cleanup: move half float functions to separate header file. 2014-01-15 15:29:22 +01:00
Thomas Dinges
1578b55c27 Cycles: Move SIMD utility functions into its own file.
Recently added SSE macros for noise texture can be moved here as well, but I leave this for later.
2013-12-27 21:30:21 +01:00
Brecht Van Lommel
b9ce231060 Cycles: relicense GNU GPL source code to Apache version 2.0.
More information in this post:
http://code.blender.org/

Thanks to all contributes for giving their permission!
2013-08-18 14:16:15 +00:00
Brecht Van Lommel
d835d2f4e6 Code cleanup: avoid some warnings due to implicit uint/int/float/double conversion. 2013-06-07 16:06:17 +00:00
Brecht Van Lommel
de9dffc61e Cycles: initial subsurface multiple scattering support. It's not working as
well as I would like, but it works, just add a subsurface scattering node and
you can use it like any other BSDF.

It is using fully raytraced sampling compatible with progressive rendering
and other more advanced rendering algorithms we might used in the future, and
it uses no extra memory so it's suitable for complex scenes.

Disadvantage is that it can be quite noisy and slow. Two limitations that will
be solved are that it does not work with bump mapping yet, and that the falloff
function used is a simple cubic function, it's not using the real BSSRDF
falloff function yet.

The node has a color input, along with a scattering radius for each RGB color
channel along with an overall scale factor for the radii.

There is also no GPU support yet, will test if I can get that working later.

Node Documentation:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#BSSRDF

Implementation notes:
http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/Subsurface_Scattering
2013-04-01 20:26:52 +00:00
Brecht Van Lommel
40b05d364e Cycles: code refactoring to add generic lookup table memory. 2013-04-01 20:26:43 +00:00
Brecht Van Lommel
3945979f2b Fix #33486: cycles CPU image textures were offset wrong by half a pixel compared
to OpenGL/CUDA/OSL rendering.
2012-12-12 09:17:21 +00:00
Brecht Van Lommel
adea12cb01 Cycles: merge of changes from tomato branch.
Regular rendering now works tiled, and supports save buffers to save memory
during render and cache render results.

Brick texture node by Thomas.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Brick_Texture

Image texture Blended Box Mapping.
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Textures#Image_Texture
http://mango.blender.org/production/blended_box/

Various bug fixes by Sergey and Campbell.
* Fix for reading freed memory in some node setups.
* Fix incorrect memory read when synchronizing mesh motion.
* Fix crash appearing when direct light usage is different on different layers.
* Fix for vector pass gives wrong result in some circumstances.
* Fix for wrong resolution used for rendering Render Layer node.
* Option to cancel rendering when doing initial synchronization.
* No more texture limit when using CPU render.
* Many fixes for new tiled rendering.
2012-09-04 13:29:07 +00:00