This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing visibility check for every node cost quite reasonable amount of time
and in most cases those checks are true-positive.
Other idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again doing curve visibility check is quite costly for each of the nodes and
those checks returns truth for most of the hierarchy anyway.
There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.
Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).
The code is enabled for compilation in kernel. but blender wouldn't use it
still.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
The idea is to store visibility flags for leaf nodes only since visibility check
for inner nodes costs too much for QBVH hence it is not optimal to perform.
Leaf QBVH nodes have plenty of space to store all sort of flags, so we can make
nodes one element smaller, saving noticeable amount of memory.
Previously offsets were calculated based on the BVH node size,
which is wrong and real PITA in cases when some extra data is
to be added into (or removed from) the node.
Now use offsets which are not calculated form the node size.
This solves quite an over-allocation in BVH instances packing code,
unfortunately, it's not a magic bullet to solve memory bump caused
by the recent QBVH changes.
For that we'll likely need to decouple storage for leaf and inner
nodes. However, it's not really clear for now if it's something
important since that'd still be just a fraction of memory comparing
to all the hi-res textures.
Title says it all, quite straightforward implementation.
Would only mention that there's a bit of code duplication around packing node
into pack.nodes. Trying to de-duplicate it ends up in quite hairy code (like
functions with loads of arguments some of which could be NULL in certain
circumstances etc..). Leaving solving this duplication for later.
Before all the nodes were counted and allocated, leading to situations when
bunch of allocated memory is not used because reasonable amount of nodes are
simply ignored.
Before this Cycles used to try using the cache even so it knew for the
fact that reading it from the disk failed. This change doesn't make it
more stable if someone will try to trick Cycles and give malformed data
but it solves general cases when Blender crashed during the cache write
and will preserve rendering from crashing when trying to use that partial
cache.
- Removed the cycles subdivision and interpolation of hairkeys.
- Removed the parent settings.
- Removed all of the advanced settings and presets.
- This simplifies the UI to a few settings for the primitive type and a shape mode.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
* Revert r57203 (len() renaming)
There seems to be a problem with nVidia OpenCL after this and I haven't figured out the real cause yet.
Better to selectively enable native length() later, after figuring out what's wrong.
This fixes [#35612].
* Rename some math functions:
len -> length
len_squared -> length_squared
normalize_len -> normalize_length
* This way OpenCL uses its inbuilt length() function, rather than our own. The other two functions have been renamed for consistency.
* Tested CPU, CUDA and OpenCL compile, should be no functional changes.
Code is added to restrict the pixel size of strands in cycles. It works best with ribbon primitives and a preset for these is included. It uses distance dependent expansion of the strands and then stochastic strand removal to give a fading. To prevent a slowdown for triangle mesh objects in the BVH an extra visibility flag has been added. It is also only applied for camera rays.
The strand width settings are also changed, so that the particle size is not included in the width calculation. Instead there is a separate particle system parameter for width scaling.
The curve segment primitive has been added. This includes an intersection function and changes to the BVH.
A few small errors in the line segment intersection routine are also fixed.
should be no functional changes yet. UV, tangent and intercept are now stored
as attributes, with the intention to add more like multiple uv's, vertex
colors, generated coordinates and motion vectors later.
Things got a bit messy due to having both triangle and curve data in the same
mesh data structure, which also gives us two sets of attributes. This will get
cleaned up when we split the mesh class.
Patch [#33445] - Experimental Cycles Hair Rendering (CPU only)
This patch allows hair data to be exported to cycles and introduces a new line segment primitive to render with.
The UI appears under the particle tab and there is a new hair info node available.
It is only available under the experimental feature set and for cpu rendering.
=== BVH build time optimizations ===
* BVH building was multithreaded. Not all building is multithreaded, packing
and the initial bounding/splitting is still single threaded, but recursive
splitting is, which was the main bottleneck.
* Object splitting now uses binning rather than sorting of all elements, using
code from the Embree raytracer from Intel.
http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/
* Other small changes to avoid allocations, pack memory more tightly, avoid
some unnecessary operations, ...
These optimizations do not work yet when Spatial Splits are enabled, for that
more work is needed. There's also other optimizations still needed, in
particular for the case of many low poly objects, the packing step and node
memory allocation.
BVH raytracing time should remain about the same, but BVH build time should be
significantly reduced, test here show speedup of about 5x to 10x on a dual core
and 5x to 25x on an 8-core machine, depending on the scene.
=== Threads ===
Centralized task scheduler for multithreading, which is basically the
CPU device threading code wrapped into something reusable.
Basic idea is that there is a single TaskScheduler that keeps a pool of threads,
one for each core. Other places in the code can then create a TaskPool that they
can drop Tasks in to be executed by the scheduler, and wait for them to complete
or cancel them early.
=== Normal ====
Added a Normal output to the texture coordinate node. This currently
gives the object space normal, which is the same under object animation.
In the future this might become a "generated" normal so it's also stable for
deforming objects, but for now it's already useful for non-deforming objects.
=== Render Layers ===
Per render layer Samples control, leaving it to 0 will use the common scene
setting.
Environment pass will now render environment even if film is set to transparent.
Exclude Layers" added. Scene layers (all object that influence the render,
directly or indirectly) are shared between all render layers. However sometimes
it's useful to leave out some object influence for a particular render layer.
That's what this option allows you to do.
=== Filter Glossy ===
When using a value higher than 0.0, this will blur glossy reflections after
blurry bounces, to reduce noise at the cost of accuracy. 1.0 is a good
starting value to tweak.
Some light paths have a low probability of being found while contributing much
light to the pixel. As a result these light paths will be found in some pixels
and not in others, causing fireflies. An example of such a difficult path might
be a small light that is causing a small specular highlight on a sharp glossy
material, which we are seeing through a rough glossy material. With path tracing
it is difficult to find the specular highlight, but if we increase the roughness
on the material the highlight gets bigger and softer, and so easier to find.
Often this blurring will be hardly noticeable, because we are seeing it through
a blurry material anyway, but there are also cases where this will lead to a
loss of detail in lighting.
disk to be reused by the next render.
This is useful for rendering animations where only the camera or materials change.
Note that saving the BVH to disk only to be removed for the next frame is slower
if this is not the case and the meshes do actually change.
For a render, it will save bvh files to the cache user directory, and remove all
cache files from other renders. The files are named using a MD5 hash based on the
mesh, to verify if the meshes are still the same.
* Fix crash in light path node
* Fix struct alignment issue for cuda
* Fix issue with instances taking up too much memory
* Fix issue with ray visibility working incorrect on some objects
* Enable OpenCL always and remove option, it has no dependencies so may as well
* Refuse to load kernel if OpenCL version < 1.1, recent drivers are needed
* Better error handling for OpenCL device
* 3D views with rendered draw mode will now revert to wireframe on file load
* Add max diffuse/glossy/transmission bounces
* Add separate min/max for transparent depth
* Updated/added some presets that use these options
* Add ray visibility options for objects, to hide them from
camera/diffuse/glossy/transmission/shadow rays
* Is singular ray output for light path node
Details here:
http://wiki.blender.org/index.php/Dev:2.5/Source/Render/Cycles/LightPaths