Seems that previous fix didn't work in all cases: Debian's build
environment didn't fully detect endianess, possibly due to typo,
possibly due to difference in various environments.
Using define magic from a more battle-tested project seems a safe
way to go.
There are more changes than just PPC since the upstream commit contains
full re-synchronization of all defines.
This commit updates numaapi to a latest library version from upstream.
The issue was caused by a limitation of GetNumaNodeProcessorMask():
on systems with more than 64 processors, this parameter is set to the
processor mask for the node only if the node is in the same processor
group as the calling thread. Otherwise, the parameter is set to zero.
Patch from Max Dmitrichenko, thanks!
The issue was introduced by a Threadripper2 commit back in
ce927e15e0. This boils down to threads inheriting affinity
from the parent thread. It is a question how this slipped
through the review (we definitely run benchmark round).
Quick fix could have been to always set CPU group affinity
in Cycles, and it would work for Windows. On other platforms
we did not have CPU groups API finished.
Ended up making Cycles aware of NUMA topology, so now we
bound threads to a specific NUMA node. This required adding
an external dependency to Cycles, but made some code there
shorter.