With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot. Allocating the buffer just once reduces rendertime for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.