Cycles: Tweak inline policy for some functions

The goal is to bring the Experimental kernel closer in performance to
the official kernel by avoiding register spills and the like.
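
As a rough illustration of what an inline policy tweak means, the sketch
below marks a small helper as always-inline and a larger function as
never-inline. The `example_*` names and macros are hypothetical stand-ins,
not the kernel's actual definitions, and the attribute syntax assumes a
GCC or Clang compatible compiler. The general idea is that forcing a large
function inline at every call site can raise register pressure in the
callers, while keeping it out of line may reduce spills at the cost of a
call.

```c
/* Hypothetical stand-ins for inline-policy qualifier macros.
 * Assumption: GCC/Clang attribute syntax; real definitions differ
 * per compute backend. */
#if defined(__GNUC__)
#  define example_inline   static inline __attribute__((always_inline))
#  define example_noinline static __attribute__((noinline))
#else
#  define example_inline   static inline
#  define example_noinline static
#endif

/* Small helper: cheap enough to inline at every call site. */
example_inline float example_scale(float x, float s)
{
    return x * s;
}

/* Larger function with several callers: kept out of line so its
 * locals are not folded into each caller's register allocation. */
example_noinline float example_accumulate(const float *v, int n, float s)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += example_scale(v[i], s);
    }
    return sum;
}
```

Whether a given function is faster inlined or not depends on the
compiler and the device, so a change like this has to be measured.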

There should be no big impact on the official kernel: my own tests
showed a performance drop of a few percent on a laptop GPU. CPU speed
was unchanged on the AVX, AVX2 and SSE4.1 CPUs I've been testing here.

This seems to be the last essential step before we can get rid of the
Experimental kernel and officially enable SSS on GPU without causing
major performance issues.

Surely some more tweaks may be required, but we can keep doing those
until the cows come home anyway.
Sergey Sharybin
2016-01-14 14:53:05 +05:00
parent 5af103fe00
commit 1f273cec00
6 changed files with 42 additions and 21 deletions

@@ -183,11 +183,14 @@ ccl_device_inline bool shadow_blocked(KernelGlobals *kg, PathState *state, Ray *
  * potentially transparent, and only in that case start marching. this gives
  * one extra ray cast for the cases were we do want transparency. */
-ccl_device_inline bool shadow_blocked(KernelGlobals *kg, ccl_addr_space PathState *state, ccl_addr_space Ray *ray_input, float3 *shadow
+ccl_device_noinline bool shadow_blocked(KernelGlobals *kg,
+                                        ccl_addr_space PathState *state,
+                                        ccl_addr_space Ray *ray_input,
+                                        float3 *shadow
 #ifdef __SPLIT_KERNEL__
-, ShaderData *sd_mem, Intersection *isect_mem
+                                        , ShaderData *sd_mem, Intersection *isect_mem
 #endif
-)
+                                        )
 {
     *shadow = make_float3(1.0f, 1.0f, 1.0f);