
The issue was caused by a limitation of GetNumaNodeProcessorMask(): on systems with more than 64 processors, its ProcessorMask output parameter is set to the node's processor mask only if the node is in the same processor group as the calling thread; otherwise it is set to zero. Patch from Max Dmitrichenko, thanks!
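
For illustration only, here is a minimal sketch of a group-aware query using GetNumaNodeProcessorMaskEx(), which reports the node's mask together with its processor group rather than depending on the calling thread's group. It is not necessarily how the patch itself is written. The helper names count_mask_bits() and numa_node_cpu_count() are hypothetical, and the non-Windows branch is only a stub so the snippet compiles elsewhere.

```c
#include <stdint.h>

#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601   /* Windows 7 or later, where the Ex call is available */
#endif
#include <windows.h>
#endif

/* Count the set bits in a 64-bit affinity mask. */
static int count_mask_bits(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: number of processors reported for a NUMA node,
 * or -1 if the query fails or the platform is not Windows.
 */
static int numa_node_cpu_count(unsigned node)
{
#ifdef _WIN32
    GROUP_AFFINITY affinity;

    /* Fills in both the mask and the processor group that the mask refers to. */
    if (!GetNumaNodeProcessorMaskEx((USHORT)node, &affinity))
        return -1;

    return count_mask_bits((uint64_t)affinity.Mask);
#else
    (void)node;               /* stub: no NUMA query on this platform */
    return -1;
#endif
}
```

A caller would pass a node number no greater than the value returned by GetNumaHighestNodeNumber() and treat -1 as "unknown" rather than as an empty node.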