The title says it all actually. Gives up to 10% speedup on test scenes here
on i7-6800K.
Render times on GPU are unreliable here, but there might be some slowdown
caused by watertight nature of intersections.
Avoid construction of temporary array and make utility function force-inlined.
Additionally avoid calling float4_to_float3 twice.
This brings render times to the same values as before current patch series.
This is a preparation work for the followup commit which wil l move
remaining parts of Woop intersection logic to an utility file.
Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
There are following benefits:
- Modifying intersection algorithm will not cause so much re-compilation.
- It works around header dependency hell and allows us to use vectorization
types much easier in there.
There seems to be a compiler bug of MSVC2013. The issue does not happen on Linux and
does not happen on Windows when building with MSVC2015.
Since it's reallly a pain to debug release builds with MSVC2013 the AVX2 optimization
is disabled for curve sergemnts for this compiler.
For example, for RX480 you'll no longer see "Ellesmere" but will see
"AMD Radeon RX 480 Graphics" which makes more sense and allows to easily
distinguish which exact card it is when having multiple different cards
of Ellesmere codenames (i.e. RX480 and WX7100) in the same machine.
Previously, the code would only update the status string if the main status changed.
However, the main status did not include the remaining time, and therefore it wasn't updated until the amount of rendered tiles (which is part of the main status) changed.
This commit therefore makes the BlenderSession remember the time of the last status update and forces a status update if the last one was more than a second ago.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D2465
This attempt at TLS was leftover from an earlier prototype. It never worked with Blender's build system, and was defined to just exist, not to actually do anything.
Before now it lived in source/blender/gpu for convenience. Only a few files in the gpu module use Gawain directly.
Tested on Mac, time to push and test on Windows.
Todo: some CMake magic to make it easy to
#include "gawain/some_header.h"
from any C or H file. Main problem here is the many editors that include GPU_immediate.h which includes Gawain's immediate.h -- is there a way to avoid changing every editor's CMakeLists?
Pass globals as a bare pointer, same as it sued to be prior to split kernel rework.
AMD CPU platform and Intel OpenCL were complaining about this.
Perhaps we shouldn't pass globals as pointer at all, this isn't something what is
really portable and can cause issues on 32 bit perhaps.
The range is controlled using the following command line arguments:
--cycles-resumable-start-chunk
--cycles-resumable-end-chunk
Those are 1-based index of range for rendering.