Implement an overscan support for tiles, so that adaptive sampling can
rely on the pixels neighbourhood.
Differential Revision: https://developer.blender.org/D12599
For example, crash when attempting to use OptiX denoiser on systems
without OptiX-capable device.
Perform check that scene update happened without errors.
Note that `et_error` makes progress to cancel, so the code was
simplified a bit.
* Add OutputDriver, replacing function callbacks in Session.
* Add PathTraceTile, replacing tile access methods in Session.
* Add more detailed comments about how this driver should be implemented.
* Add OIIOOutputDriver for Cycles standalone to output an image.
Differential Revision: https://developer.blender.org/D12627
* Split GPUDisplay into two classes. PathTraceDisplay to implement the Cycles side,
and DisplayDriver to implement the host application side. The DisplayDriver is now
a fully abstract base class, embedded in the PathTraceDisplay.
* Move copy_pixels_to_texture implementation out of the host side into the Cycles side,
since it can be implemented in terms of the texture buffer mapping.
* Move definition of DeviceGraphicsInteropDestination into display driver header, so
that we do not need to expose private device headers in the public API.
* Add more detailed comments about how the DisplayDriver should be implemented.
The "driver" terminology might not be obvious, but is also used in other renderers.
Differential Revision: https://developer.blender.org/D12626
We were writing large 2048x2048 tiles into EXR files, which appears to cause
integer overflow inside the OpenEXR library when there are multiple passes. Now
use smaller tiles in the image file, while still rendering large tiles.
This adds the requirement that the render tile size must be a multiple of 128
or be smaller than 128, this is adjusted automatically.
Samples count pass is normalized to the overall number of samples.
This means that we need to store actual value of the samples in the
tile buffer file.
A bit annoying to pull all those settings to BufferParams and need
to find a more generic solution, but for now this is easiest and a
quickest solution.
Differential Revision: https://developer.blender.org/D12597
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycleshttps://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
Caused by 4f64fa4f86.
Was a bad backport from the Cycles X branch: the fact that CPU and GPU
has different reset code paths was not taken into account.
This is a backport of recent development in the Cycles X branch.
Fixes possible dead-lock in viewport rendering when exiting at an
exact bad moment (couldn't reproduce in master branch, but in the
cycles-x branch it was happening every now and then).
Differential Revision: https://developer.blender.org/D12154
Makes it consistent with the guidelines and the Cycles X branch, and
allows to backport fix for the viewport update from the branch. Will
cause a merge conflict, which should be simple accept-ours in the
branch.
It was possible that render buffers and scene kernel data will be out
of sync because reset and scene update happens in different locks.
This is similar issue we've fixed in the Cycles X branch, so backported
relevant changes from there.
This change removes what seems to be unused feature kernel.
Differential Revision: https://developer.blender.org/D11828
These seem to be causing some stability issues, and really are just not that
useful in practice. Compiling them is slow already, so it does not improve
the user experience much to show an AO preview if it's not nearly instant.
This optimizes device updates (during user edits or frame changes in
the viewport) by avoiding unnecessary computations. To achieve this,
we use a combination of the sockets' update flags as well as some new
flags passed to the various managers when tagging for an update to tell
exactly what the tagging is for (e.g. shader was modified, object was
removed, etc.).
Besides avoiding recomputations, we also avoid resending to the devices
unmodified data arrays, thus reducing bandwidth usage. For OptiX and
Embree, BVH packing was also multithreaded.
The performance improvements may vary depending on the used device (CPU
or GPU), and the content of the scene. Simple scenes (e.g. with no adaptive
subdivision or volumes) rendered using OptiX will benefit from this work
the most.
On average, for a variety of animated scenes, this gives a 3x speedup.
Reviewed By: #cycles, brecht
Maniphest Tasks: T79174
Differential Revision: https://developer.blender.org/D9555
Tile stealing may steal a CPU tile buffer and move it to the GPU, but next time around that
tile may be re-used on the CPU again (in progressive refinement mode). The buffer would
still be on the GPU then though, so is inaccessible to the CPU. As a result Blender crashed
when the CPU tried to write results to that tile buffer.
This fixes that by ensuring a stolen tile buffer is moved back to the device it is used on before
rendering.
The OptiX denoiser is part of the OptiX device, so to the tile manager looks like a GPU device. As a
result the tile stealing implementation erroneously stole CPU tiles and moved them to that OptiX
device, even though in this configuration the OptiX device was only set up for denoising and not
rendering. Launching the render kernel therefore caused a crash because of a missing AS etc.
This fixes that by ensuring tiles can only be stolen by devices that support render tiles.
Now that the Blender sync mechanism deletes nodes from the scene, we need to
ensure scene update is stopped before we do this.
Also add some more early out in scene geometry update to ensure we do not
continue working on incomplete geometry data, though that was not the cause of
this crash.
This encapsulates Node socket members behind a set of specific methods;
as such it is no longer possible to directly access Node class members
from exporters and parts of Cycles.
The methods are defined via the NODE_SOCKET_API macros in `graph/
node.h`, and are for getting or setting a specific socket's value, as
well as querying or modifying the state of its update flag.
The setters will check whether the value has changed and tag the socket
as modified appropriately. This will let us know how a Node has changed
and what to update, which is the first concrete step toward a more
granular scene update system.
Since the setters will tag the Node sockets as modified when passed
different data, this patch also removes the various modified methods
on Nodes in favor of Node::is_modified which checks the sockets'
update flags status.
Reviewed By: brecht
Maniphest Tasks: T79174
Differential Revision: https://developer.blender.org/D8544
While Cycles already supports using both CPU and GPU at the same time, there
currently is a large problem with it: Since the CPU grabs one tile per thread,
at the end of the render the GPU runs out of new work but the CPU still needs
quite some time to finish its current times.
Having smaller tiles helps somewhat, but especially OpenCL rendering tends to
lose performance with smaller tiles.
Therefore, this commit adds support for tile stealing: When a GPU device runs
out of new tiles, it can signal the CPU to release one of its tiles.
This way, at the end of the render, the GPU quickly finishes the remaining
tiles instead of having to wait for the CPU.
Thanks to AMD for sponsoring this work!
Differential Revision: https://developer.blender.org/D9324
Rather than just printing a message and falling back to the CPU. For render
farms it's better to avoid a potentially slow render on the CPU if the intent
was to render on the GPU.
Ref T82193, D9086
This encapsulates Node socket members behind a set of specific methods;
as such it is no longer possible to directly access Node class members
from exporters and parts of Cycles.
The methods are defined via the NODE_SOCKET_API macros in `graph/
node.h`, and are for getting or setting a specific socket's value, as
well as querying or modifying the state of its update flag.
The setters will check whether the value has changed and tag the socket
as modified appropriately. This will let us know how a Node has changed
and what to update, which is the first concrete step toward a more
granular scene update system.
Since the setters will tag the Node sockets as modified when passed
different data, this patch also removes the various `modified` methods
on Nodes in favor of `Node::is_modified` which checks the sockets'
update flags status.
Reviewed By: brecht
Maniphest Tasks: T79174
Differential Revision: https://developer.blender.org/D8544
This moves `Session::get_requested_device_features`,
`Session::load_kernels`, and `Session::update_scene` out of `Session`
and into `Scene`, as mentioned in D8544.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8590
Compared to Optix denoise, this is usually slower since there is no GPU
acceleration. Some optimizations may still be possible, in avoid copies
to the GPU and/or denoising less often.
The main thing is that this adds viewport denoising support for computers
without an NVIDIA GPU (as long as the CPU supports SSE 4.1, which is nearly
all of them).
Ref T76259
Enabling render and viewport denoising is now both done from the render
properties. View layers still can individually be enabled/disabled for
denoising and have their own denoising parameters.
Note that the denoising engine also affects how denoising data passes are
output even if no denoising happens on the render itself, to make the passes
compatible with the engine.
This includes internal refactoring for how denoising parameters are passed
along, trying to avoid code duplication and unclear naming.
Ref T76259
Also removing the curve system manager which only stored a few curve intersection
settings. These are all changes towards making shape and subdivision settings
per-object instead of per-scene, but there is more work to do here.
Ref T73778
Depends on D8013
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8014
This keeps render results compatible for combined CPU + GPU rendering.
Peformance and quality primitives is quite different than before. There
are now two options:
* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
fast rendering. Hair curves are subdivided with a fixed number of user
specified subdivisions.
This gives relatively good results, especially when used with the Principled
Hair BSDF and hair viewed from a typical distance. There are artifacts when
viewed closed up, though this was also the case with all previous primitives
(but different ones).
* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
close up. This automatically subdivides the curve until it is smooth.
This gives higher quality than any of the previous primitives, but does come
at a performance cost and is somewhat slower than our previous Thick curves.
The main problem here is performance. For CPU and OpenCL rendering performance
seems usually quite close or better for similar quality results.
However for CUDA and Optix, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.
Ref T73778
Depends on D8012
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8013
This patch makes the infamous "Cancel" error in the viewport a thing of the past. Instead it
now shows a more useful error message and streamlines the error handling process in CUDA.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8008
With this patch Cycles recognizing when a logical OptiX and CUDA device represent the same
physical GPU and attempts to eliminate unnecessary tile copies for viewport rendering if that
is the case for all active devices. In addition, denoising is now no longer performed on the first
available OptiX device only, but instead it will try to match CUDA and OptiX
rendering/denoising devices exactly to maximize utilization.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D7975
There should be no user visible change from this, except that tile size
now affects performance. The goal here is to simplify bake denoising in
D3099, letting it reuse more denoising tiles and pass code.
A lot of code is now shared with regular rendering, with the two main
differences being that we read some render result passes from the bake API
when starting to render a tile, and call the bake kernel instead of the
path trace kernel.
With this kind of design where Cycles asks for tiles from the bake API,
it should eventually be easier to reduce memory usage, show tiles as
they are baked, or bake multiple passes at once, though there's still
quite some work needed for that.
Reviewers: #cycles
Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben
Differential Revision: https://developer.blender.org/D3108