Cycles: Enable inlining on Apple Silicon for 1.1x speedup

This is a stripped-down version of D14645 without the scene specialisation optimisations.

The two major changes in this patch are:

- Enables more aggressive inlining on Apple Silicon, resulting in a 1.1x speedup and a 10% reduction in register spill, at the cost of longer pipeline build times
- Revives shader binary archives through a new ShaderCache, which is shared between MetalDevice instances that use the same physical MTLDevice. This mitigates the longer compile times through explicit caching, rather than relying as before on the implicit system shader cache, which can be purged without notice
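
The sharing described in the second point can be sketched roughly as follows. This is a minimal illustration and not the code from this patch: `get_shader_cache`, the `physical_device_id` key and the contents of the `ShaderCache` struct are all hypothetical, and plain C++ stands in for the Metal API. It shows only the idea that every device instance wrapping the same physical GPU receives the same cache object.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-GPU cache of compiled pipelines.
// The real ShaderCache in the patch holds Metal objects; here an int
// serves as a placeholder pipeline handle.
struct ShaderCache {
  std::map<std::string, int> pipelines;  // key: kernel name plus build options
};

// Returns the cache shared by every device instance that wraps the same
// physical GPU. `physical_device_id` is an assumed identifier for the
// underlying MTLDevice, not a name taken from the patch.
std::shared_ptr<ShaderCache> get_shader_cache(uint64_t physical_device_id)
{
  static std::mutex mutex;
  static std::map<uint64_t, std::shared_ptr<ShaderCache>> caches;

  // Guard the registry so device instances created on different threads
  // cannot race while looking up or creating a cache.
  std::lock_guard<std::mutex> lock(mutex);

  std::shared_ptr<ShaderCache> &cache = caches[physical_device_id];
  if (!cache) {
    // First instance for this GPU: create the cache lazily.
    cache = std::make_shared<ShaderCache>();
  }
  return cache;
}
```

With a registry like this, a pipeline compiled by one device instance is visible to any later instance created for the same GPU, so the longer build only has to be paid once per process. Persisting the result across runs is the role of the shader binary archives and is not modelled here.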

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D14763
Michael Jones
2022-04-26 19:00:35 +01:00
parent 994da7077d
commit b82de02e7c
6 changed files with 594 additions and 577 deletions


@@ -29,10 +29,26 @@ using namespace metal::raytracing;
 /* Qualifiers */
-#define ccl_device
-#define ccl_device_inline ccl_device
-#define ccl_device_forceinline ccl_device
-#define ccl_device_noinline ccl_device __attribute__((noinline))
+#if defined(__KERNEL_METAL_APPLE__)
+/* Inline everything for Apple GPUs.
+ * This gives ~1.1x speedup and 10% spill reduction for integrator_shade_surface
+ * at the cost of longer compile times (~4.5 minutes on M1 Max). */
+# define ccl_device __attribute__((always_inline))
+# define ccl_device_inline __attribute__((always_inline))
+# define ccl_device_forceinline __attribute__((always_inline))
+# define ccl_device_noinline __attribute__((always_inline))
+#else
+# define ccl_device
+# define ccl_device_inline ccl_device
+# define ccl_device_forceinline ccl_device
+# define ccl_device_noinline ccl_device __attribute__((noinline))
+#endif
 #define ccl_device_noinline_cpu ccl_device
 #define ccl_device_inline_method ccl_device
 #define ccl_global device