The idea of this flag was to prevent snapping onto an object which depends on
the ones currently being modified. Using a single flag makes more sense here,
and also makes it possible to replace some ob->recalc based magic with a
depsgraph query to set those flags.
It is silly to first force some flag to be set and then have a workaround
that ignores that flag in the snapping code. Let's just not set the flag in
the first place.
The only situation where such snapping was actually useful is moving the roots
of disconnected hair, which still works just fine. However, there might be some
other hidden corner case where this workaround was needed.
Previously, the NLM kernels would be launched once per offset, with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs, which resulted in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: memory accesses to the accumulation buffers now have to be atomic, and, more importantly, the temporary memory now has to be allocated for every shift at once, increasing the memory requirements.
On the other hand, the smaller tiles of course significantly reduce the size of that memory.
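To illustrate the new launch shape, here is a minimal sketch in plain C (not the actual Cycles kernel; the names and the placeholder weight are made up): a single dispatch covers all `num_offsets * width * height` work items, instead of one dispatch of `width * height` threads per offset.

```c
static void nlm_accumulate_all_offsets(const float *image,
                                       float *accum,      /* weighted pixel sums */
                                       float *weight_sum, /* total weight per pixel */
                                       const int (*offsets)[2],
                                       int num_offsets,
                                       int width, int height)
{
  const long total = (long)num_offsets * width * height;
  for (long item = 0; item < total; item++) { /* one GPU thread per work item */
    const int o = (int)(item / ((long)width * height));
    const int p = (int)(item % ((long)width * height));
    const int x = p % width, y = p / width;
    const int nx = x + offsets[o][0], ny = y + offsets[o][1];
    if (nx < 0 || ny < 0 || nx >= width || ny >= height)
      continue;
    const float w = 1.0f; /* placeholder; real NLM weights come from patch distances */
    /* Work items from different offsets may target the same destination pixel
     * concurrently, hence these two accumulations must be atomic on the GPU. */
    accum[ny * width + nx] += w * image[y * width + x];
    weight_sum[ny * width + nx] += w;
  }
}
```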
The main bottleneck right now is the construction of the transformation: there is nothing left to parallelize there, so one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented; it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
For some blend modes there would be no effect with factor 1.0, even though
factor 0.999 would give a very different image. Now the result should have no
discontinuity.
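One way to guarantee such continuity is to always interpolate linearly between the input and the fully blended result, so that factor 1.0 yields exactly the full blend. A minimal sketch (the function name is made up, and this is not necessarily the exact fix applied here):

```c
/* Continuous in `factor`; returns `base` at 0.0 and exactly
 * `fully_blended` at 1.0, with no jump anywhere in between. */
static float blend_with_factor(float base, float fully_blended, float factor)
{
  return base + factor * (fully_blended - base);
}
```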
Differential Revision: https://developer.blender.org/D2925
Do a direct update of the object transform instead, without involving
manual trickery with the recalc flag.
There shouldn't be any functional changes as far as artists are concerned,
but this will allow us to get rid of recalc flags in 2.8.
Thanks Bastien for review!
Adds support for defining a number of tags as part of the RNA-struct
definition, which its properties can set, similar to property flags.
BPY supports setting these tags when defining custom properties too.
* To define tags for a struct (which its properties can then use), define the tags in an `EnumPropertyItem` array and assign them to the struct using `RNA_def_struct_property_tags(...)` (see the sketch after this list).
* To set tags for an RNA property in C, use the new `RNA_def_property_tags(...)`.
* To set tags for an RNA property in Python, use the newly added tags parameter. E.g. `bpy.props.FloatProperty(name="Some Float", tags={'SOME_TAG', 'ANOTHER_TAG'})`.
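A minimal sketch of the C side, assuming Blender's RNA headers (everything except the two `RNA_def_*_tags` functions named above, including the tag values and the struct/property names, is made up for illustration):

```c
/* Tags available to properties of this struct; values are bitflags. */
static const EnumPropertyItem my_tag_items[] = {
    {1 << 0, "SOME_TAG", 0, "Some Tag", ""},
    {1 << 1, "ANOTHER_TAG", 0, "Another Tag", ""},
    {0, NULL, 0, NULL, NULL},
};

static void rna_def_my_struct(BlenderRNA *brna)
{
  StructRNA *srna = RNA_def_struct(brna, "MyStruct", NULL);
  /* Register the tags that properties of this struct may use. */
  RNA_def_struct_property_tags(srna, my_tag_items);

  PropertyRNA *prop = RNA_def_property(srna, "some_float", PROP_FLOAT, PROP_NONE);
  /* Mark the property with one of the struct's registered tags. */
  RNA_def_property_tags(prop, 1 << 0); /* SOME_TAG */
}
```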
No need to print status for basic & reliable operations;
build systems can output the operations they run if needed,
or debug output can be enabled in the source when developers are debugging.
This is nice for ninja, since any printed text then hints at a problem to fix.
Having lots of boolean arguments scattered across the whole argument list is
fully unreadable. What does `false, true, true` mean in terms of behavior?
Replace those with a bitfield, which has the advantage of a more human-readable
meaning.
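A minimal, self-contained sketch of the pattern (the flag names and function are made up, not the actual API touched by this change):

```c
#include <stdio.h>

/* Named bits replace an opaque `false, true, true` argument triple. */
typedef enum eUpdateFlag {
  UPDATE_LOCATION = (1 << 0),
  UPDATE_ROTATION = (1 << 1),
  UPDATE_SCALE    = (1 << 2),
} eUpdateFlag;

static void object_update(int flag)
{
  if (flag & UPDATE_LOCATION) printf("update location\n");
  if (flag & UPDATE_ROTATION) printf("update rotation\n");
  if (flag & UPDATE_SCALE)    printf("update scale\n");
}

int main(void)
{
  /* The call site now documents its own behavior. */
  object_update(UPDATE_ROTATION | UPDATE_SCALE);
  return 0;
}
```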
Pretty straightforward this time: we already have a single struct
pointer containing all the needed data (or nearly all of it).
And we gain about 10-15% speed on tracking! :)
Two more 'not really useful' cases (OMP only shows some noticeable
speedup above 1M elements, and since this is a quick operation anyway
compared to even other basic operators, the gain is in the 1% range of total
processing time in the best case).
So it's not worth parallelizing here; we'll gain much more by tackling heavy
operations. ;)
And BMesh is free from OMP now!
Performance tests on this one are quite surprising, actually...
The parallelized loop itself is at least 10 times quicker with the new
BLI_task code than it was with OMP. And subdividing e.g. a heavy mesh with 3
levels of multires (the whole process) takes 8 seconds with the new code
versus 10 seconds with the OMP one. As a cherry on top, the BLI_task code
only uses about 50% of CPU load, while the OMP one was at nearly 100%!
In fact, I suspect the OMP code was not properly declaring outside variables,
generating a lot of unneeded locks.
Also, raised the minimum level of subdiv required to enable parallelization;
tests here showed that we only start to get significant gains at subdiv
level 4 and above, and below that the single-threaded version is quicker.
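A rough sketch of the pattern, including that threshold (the callback and data are hypothetical; `BLI_task_parallel_range` is real, but its signature has changed across Blender versions, so the 2.7x-era form is assumed here):

```c
/* Process one element of the subdivision work in a worker thread. */
static void subdiv_task_cb(void *userdata, const int index)
{
  /* ... work on element `index` of `userdata` ... */
  (void)userdata;
  (void)index;
}

static void do_subdiv_parallel(void *data, int totelem, int subdiv_level)
{
  /* Only go wide when there is enough work per element: below subdiv
   * level 4 the single-threaded loop measured quicker. */
  const bool use_threading = (subdiv_level >= 4);
  BLI_task_parallel_range(0, totelem, data, subdiv_task_cb, use_threading);
}
```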
Those three were actually giving no significant benefits; in fact, in one
case (`BM_mesh_elem_table_ensure()`) they were even slowing things down
compared to no parallelization at all.
The point being, once more, that parallelizing *very* small tasks (like index
or flag setting, etc.) is nearly never worth it.
Also note that we could not easily use per-item parallel looping in
those three cases, since they rely heavily on valid
loop-generated indices (or do non-threadable things like allocating
from a mempool)...