blender

Author	SHA1	Message	Date
Brecht Van Lommel	fb99ea79f8	Code refactor: split displace/background into separate kernels, remove luma.	2017-10-05 17:57:58 +02:00
Brecht Van Lommel	49199963bf	Fix incorrect CUDA remaining time estimate after previous commit.	2017-10-04 23:25:51 +02:00
Brecht Van Lommel	6da6f8d33f	Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL. The work size is still very conservative, and this doesn't help for progressive refine. For that we will need to render multiple tiles at the same time. But this should already help for denoising renders that require too much memory with big tiles, and just generally soften the performance dropoff with small tiles. Differential Revision: https://developer.blender.org/D2856	2017-10-04 21:58:47 +02:00
Brecht Van Lommel	12f4538205	Code refactor: use split variance calculation for mega kernels too. There is no significant difference in denoised benchmark scenes and denoising ctests, so might as well make it all consistent.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	e3e16cecc4	Code refactor: remove rng_state buffer and compute hash on the fly. A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	5b7d6ea54b	Code refactor: add WorkTile struct for passing work to kernel. This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.	2017-10-04 21:11:14 +02:00
Brecht Van Lommel	88520dd5b6	Code refactor: simplify CUDA context push/pop. Makes it possible to call a function like mem_alloc() when the context is already active. Also fixes some missing pops in case of errors.	2017-09-27 13:43:21 +02:00
Brecht Van Lommel	43a6cf1504	Cycles: attempt to recover from crashing CUDA/OpenCL drivers on Windows. I don't know if this will actually work, needs testing. Ref T52064.	2017-08-20 23:18:25 +02:00
Mai Lavelle	ec8ae4d5e9	Cycles: Pack kernel textures into buffers for OpenCL Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4gb on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745	2017-08-08 07:12:04 -04:00
Brecht Van Lommel	45dcd20ca9	Cycles: CUDA split performance tweaks, still far from megakernel. On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.	2017-08-05 14:32:59 +02:00
Sergey Sharybin	d37dd97e45	Cycles: Pass string by const reference rather than by value Some of the functions might have been inlined, but others i don't see how that was possible (don't think virtual functions can be inlined here). In any case, better be explicitly optimal in the code.	2017-07-05 12:27:41 +02:00
Lukas Stockner	705c43be0b	Cycles Denoising: Merge outlier heuristic and confidence interval test The previous outlier heuristic only checked whether the pixel is more than twice as bright compared to the 75% quantile of the 5x5 neighborhood. While this detected fireflies robustly, it also incorrectly marked a lot of legitimate small highlights as outliers and filtered them away. This commit adds an additional condition for marking a pixel as a firefly: In addition to being above the reference brightness, the lower end of the 3-sigma confidence interval has to be below it. Since the lower end approximates how low the true value of the pixel might be, this test separates pixels that are supposed to be very bright from pixels that are very bright due to random fireflies. Also, since there is now a reliable outlier filter as a preprocessing step, the additional confidence interval test in the reconstruction kernel is no longer needed.	2017-06-09 03:46:11 +02:00
Sergey Sharybin	34b689892b	Fix T51568: CUDA error in viewport render after fix for OpenCL Seems re-loading module invalidates memory pointers by the looks of it, which gives an error on the next kernel call. Not sure how to move memory pointer from one CUDA module to another one, so for now simply disabling kernel re-load for CUDA devices. Not ideal, but better than failing render. Feature-selective option for CUDA is not an official feature anyway.	2017-05-22 12:28:21 +02:00
Sergey Sharybin	38a2bf665b	Cycles: Cleanup, style and unused arguments - Some arguments were inappropriately tagged as unused using (void)foo semantic. Only use such semantic in tricky cases, when something needs to be ignored in release builds or something is dependent on tricky ifndef policy. For rest of the cases just use void foo(int /*bar*/) semantic, which ensures variable is not used. Solves confusion and code running out of sync with later development. - Used proper unused semantic to some arguments. - Added braces to make code easier to follow, tricky indentation with ifdef, uh.	2017-05-20 05:21:27 -07:00
Lukas Stockner	ffd83a34ab	Fix T51502: Cycles denoising not using correctly aligned width for NLM on CUDA	2017-05-19 02:06:54 +02:00
Lukas Stockner	740cd28748	Cycles Denoising: Add more robust outlier heuristic to avoid artifacts Extremely bright pixels in the rendered image cause the denoising algorithm to produce extremely noticeable artifacts. Therefore, a heuristic is needed to exclude these pixels from the filtering process. The new approach calculates the 75% percentile of the 5x5 neighborhood of each pixel and flags the pixel if it is more than twice as bright. During the reconstruction process, flagged pixels are skipped. Therefore, they don't cause any problems for neighboring pixels, and the outlier pixels themselves are replaced by a prediction of their actual value based on their feature pass values and the neighboring pixels. Therefore, the denoiser now also works as a smarter despeckling filter that uses a more accurate prediction of the pixel instead of a simple average. This can be used even if denoising isn't wanted by setting the denoising radius to 1.	2017-05-18 21:55:56 +02:00
Lukas Stockner	43b374e8c5	Cycles: Implement denoising option for reducing noise in the rendered image This commit contains the first part of the new Cycles denoising option, which filters the resulting image using information gathered during rendering to get rid of noise while preserving visual features as well as possible. To use the option, enable it in the render layer options. The default settings fit a wide range of scenes, but the user can tweak individual settings to control the tradeoff between a noise-free image, image details, and calculation time. Note that the denoiser may still change in the future and that some features are not implemented yet. The most important missing feature is animation denoising, which uses information from multiple frames at once to produce a flicker-free and smoother result. These features will be added in the future. Finally, thanks to all the people who supported this project: - Google (through the GSoC) and Theory Studios for sponsoring the development - The authors of the papers I used for implementing the denoiser (more details on them will be included in the technical docs) - The other Cycles devs for feedback on the code, especially Sergey for mentoring the GSoC project and Brecht for the code review! - And of course the users who helped with testing, reported bugs and things that could and/or should work better!	2017-05-07 14:40:58 +02:00
Sergey Sharybin	4384a7cf46	Cycles: Fix CUDA split kernel Global size y needs to be a multiple of 16.	2017-05-02 15:03:51 +02:00
Sergey Sharybin	4174e533c0	Cycles: Cache split kernels in CUDA device This way we don't re-load kernels for every sample in the viewport. Additionally, we don't risk global size changed in between samples.	2017-05-02 15:03:12 +02:00
Mai Lavelle	1e6038a426	Cycles: Implement automatic global size for CUDA split kernel Not sure this is the best way to do things for CUDA but it's much better than being unimplemented.	2017-04-11 03:11:18 -04:00
Sergey Sharybin	867d311307	Cycles: Fix warning with MSVC	2017-04-07 18:28:38 +02:00
Mai Lavelle	4b7d95290f	Cycles: More fixes after include changes	2017-03-31 10:12:13 +02:00
Sergey Sharybin	5af4e1ca15	Cycles: Only use CUDA 8.0 as officially supported one This deprecates CUDA 7.5.	2017-03-29 15:06:47 +02:00
Sergey Sharybin	0579eaae1f	Cycles: Make all #include statements relative to cycles source directory The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was referring to builder or traversal, whether node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related to GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586	2017-03-29 13:41:11 +02:00
Mai Lavelle	4d82d525f8	Cycles: Fix building for some compilers	2017-03-23 00:14:48 -04:00
Mai Lavelle	96868a3941	Fix T50888: Numeric overflow in split kernel state buffer size calculation Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.	2017-03-11 05:39:28 -05:00
Sergey Sharybin	712f7c3640	Cycles: Make it possible to access KernelGlobals from split data initialization function	2017-03-08 11:02:54 +01:00
Mai Lavelle	64751552f7	Cycles: Fix indentation	2017-03-08 01:31:32 -05:00
Mai Lavelle	306034790f	Cycles: Calculate size of split state buffer kernel side By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.	2017-03-08 01:31:30 -05:00
Mai Lavelle	b78e543af9	Cycles: Add names to buffer allocations This is to help debug and track memory usage for generic buffers. We have similar for textures already since those require a name, but for buffers the name is only for debugging purposes.	2017-03-08 01:24:55 -05:00
Mai Lavelle	817873cc83	Cycles: CUDA implementation of split kernel	2017-03-08 01:24:53 -05:00
Mai Lavelle	0f56f7a811	Cycles: Allow device_memory to be used directly This is useful for when there's no host side memory attached to the buffer	2017-03-08 00:52:41 -05:00
Sergey Sharybin	5acac13eb4	Cycles: Fix compilation error on vanilla Ubuntu 16.10 Patch by @swerner, thanks!	2017-02-27 15:22:51 +01:00
Aaron Carlisle	e5d8c2a67f	Use new manual URL	2017-01-23 19:10:37 -05:00
Lukas Stockner	a2ebc5268f	Cycles: Refactor Progress system to provide better estimates The Progress system in Cycles had two limitations so far: - It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image. - Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased. This patch fixes both problems: First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time. The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels. Along with that, some unused variables were removed from the Progress and Session classes. Reviewers: brecht, sergey, #cycles Subscribers: brecht, candreacchio, sergey Differential Revision: https://developer.blender.org/D2214	2016-12-03 05:02:21 +01:00
Sergey Sharybin	9aa8d1bc45	Cycles: Fix strict compilation warnings Should be no functional changes.	2016-11-22 16:39:03 +01:00
Lukas Stockner	dd921238d9	Cycles: Refactor Device selection to allow individual GPU compute device selection Previously, it was only possible to choose a single GPU or all of that type (CUDA or OpenCL). Now, a toggle button is displayed for every device. These settings are tied to the PCI Bus ID of the devices, so they're consistent across hardware addition and removal (but not when swapping/moving cards). From the code perspective, the more important change is that now, the compute device properties are stored in the Addon preferences of the Cycles addon, instead of directly in the User Preferences. This allows for a cleaner implementation, removing the Cycles C API functions that were called by the RNA code to specify the enum items. Note that this change is neither backwards- nor forwards-compatible, but since it's only a User Preference no existing files are broken. Reviewers: #cycles, brecht Reviewed By: #cycles, brecht Subscribers: brecht, juicyfruit, mib2berlin, Blendify Differential Revision: https://developer.blender.org/D2338	2016-11-07 03:19:29 +01:00
Martijn Berger	c02cce7b75	cycles, cuDeviceComputeCapability is deprecated as of cuda 5.0	2016-11-04 14:49:54 +01:00
Martijn Berger	4fdf68271c	Cycles standalone, compile fix UINT_MAX is not defined in device_cuda.cpp	2016-11-02 10:56:16 +01:00
Sergey Sharybin	333366dbcf	Cycles: Fix typo in shader cancel routines	2016-09-29 15:48:10 +02:00
Sergey Sharybin	91e0a16f2f	Cycles: Use XDG's .cache folder for cached kernels Basically just moves cached kernels from ~/.config/blender/BLENDER_VERSION to ~/.cache/cycles/kernels. This has following benefits: - Follows XDG specification more closely, not as if it's totally crucial or measurable by users, but still nice. - Prevents unexpected sizes of config folder, makes disk space usage more predictable for users. - Allows to share kernels across multiple Blender versions, which makes it easier debugging at the times close to release. - "Copy Previous Settings" operator will no longer be copying possibly gigabytes of cached kernels, which used to lead to really nasty disk usage and annoying delays of copying settings. - In the future we can have some smart logic to clear old unused cached kernels. Currently only done for Linux and OSX. Windows still follows old "cache" folder logic, but it's not really important for now because we don't support kernel compilation on this platform yet. Reviewers: dingto, juicyfruit, brecht Reviewed By: brecht Differential Revision: https://developer.blender.org/D2197	2016-09-12 09:39:05 +02:00
Thomas Dinges	9d236ac06c	Cycles: Enable half float support (4 channels and 1 channel) on CUDA. Atm OpenEXR half files benefit from this and will use only 1/2 of the memory now. More space for HDRs! Part of my GSoC 2016.	2016-08-11 22:47:53 +02:00
Thomas Dinges	c2a7317d1f	CUDA: We don't support Toolkits < 7.5, update error message.	2016-08-09 11:41:25 +02:00
Sergey Sharybin	b416168d85	Cycles: Cleanup, trailing whitespace	2016-08-02 14:09:34 +02:00
Sergey Sharybin	7b8b16a18c	Cycles: Some cleanup in CUDA device file	2016-08-02 14:09:34 +02:00
Sergey Sharybin	ad48f13099	Cycles: Include NVCC compiler flags into md5 hash This way we can easily switch between toolkits without worrying whether some kernel was compiled with old or new CUDA toolkit. It's also now possible to switch machine architecture and have proper cached kernel detected. Not as if it happens every day, but i did such a bitness switch back in the days :)	2016-08-02 14:09:34 +02:00
Sergey Sharybin	6353ecb996	Cycles: Tweaks to support CUDA 8 toolkit All the changes are mainly giving explicit tips on inlining functions, so they match how inlining worked with previous toolkit. This make kernel compiled by CUDA 8 render in average with same speed as previous kernels. Some scenes are somewhat faster, some of them are somewhat slower. But slowdown is within 1% so far. On a positive side it allows us to enable newer generation cards on buildbots (so GTX 10x0 will be officially supported soon).	2016-08-01 15:54:29 +02:00
Sergey Sharybin	b406b7be00	Cycles: Mark which CUDA device is used for display It is really handy to know which one is display when having two cards of same type in the machine.	2016-06-03 11:52:08 +02:00
Mai Lavelle	4388b29e98	Cycles: Add human readable sizes to debug output Some of these values can get quite large and are hard to read, adding this makes it easy to read them at a glance. Reviewed By: sergey Differential Revision: https://developer.blender.org/D2039	2016-05-31 06:13:54 -04:00
Brecht Van Lommel	f7c28a66e2	Fix Cycles compile errors with GCC due to double promotion as errors.	2016-05-22 19:17:22 +02:00

194 Commits