Commit graph

4482 commits

Author SHA1 Message Date
bunnei
028f6fdbf6
Merge pull request #3884 from ReinUsesLisp/border-colors
vk_sampler_cache: Use VK_EXT_custom_border_color when available
2020-05-07 12:18:53 -04:00
bunnei
41682e0888
Merge pull request #3815 from FernandoS27/command-list-2
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
2020-05-05 17:12:42 -04:00
bunnei
eb2c50c5e6
Update src/video_core/gpu.cpp
Co-authored-by: David <25727384+ogniK5377@users.noreply.github.com>
2020-05-05 15:39:44 -04:00
bunnei
ea09930196
Update src/video_core/gpu.cpp
Co-authored-by: David <25727384+ogniK5377@users.noreply.github.com>
2020-05-05 15:39:37 -04:00
ReinUsesLisp
227278098a vk_sampler_cache: Use VK_EXT_custom_border_color when available
This should fix grass interactions on Breath of the Wild on Vulkan.
It is currently untested against validation layers.

Nvidia's Windows 443.09 beta driver or Linux 440.66.12 is required for
now.
2020-05-04 20:49:23 -03:00
ReinUsesLisp
2dbf5290f2 vk_graphics_pipeline: Implement viewport swizzles with NV_viewport_swizzle 2020-05-04 18:31:17 -03:00
ReinUsesLisp
f813cd3ff7 gl_rasterizer: Implement viewport swizzles with NV_viewport_swizzle 2020-05-04 17:51:30 -03:00
ReinUsesLisp
9b8e962368 maxwell_3d: Add viewport swizzles 2020-05-04 17:50:59 -03:00
bunnei
2aff0b4733
Merge pull request #3808 from ReinUsesLisp/wait-for-idle
{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
2020-05-03 02:43:18 -04:00
bunnei
f4ca8e0d3e
Merge pull request #3732 from lioncash/header
vulkan: Remove unnecessary includes
2020-05-02 01:36:57 -04:00
bunnei
0128901102
Merge pull request #3809 from ReinUsesLisp/empty-index
vk_rasterizer: Skip index buffer setup when vertices are zero
2020-05-02 01:21:57 -04:00
ReinUsesLisp
3b668e1210 vk_graphics_pipeline: Implement rasterizer_enable on Vulkan
We can simply enable rasterizer discard matching the current pipeline
key.
2020-05-02 01:47:25 -03:00
bunnei
e6b4311178
Merge pull request #3693 from ReinUsesLisp/clean-samplers
shader/texture: Support multiple unknown sampler properties
2020-05-02 00:45:41 -04:00
Jan Beich
b4d0724a63 fixed_pipeline_state: explicitly use template keyword after 1f345ebe3a
In file included from src/video_core/renderer_opengl/renderer_opengl.cpp:25:
In file included from src/./video_core/renderer_opengl/gl_rasterizer.h:26:
In file included from src/./video_core/renderer_opengl/gl_fence_manager.h:11:
src/./video_core/fence_manager.h:91:32: error: use 'template' keyword
      to treat 'Write' as a dependent template name
                memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
                               ^
                               template
src/./video_core/fence_manager.h:137:32: error: use 'template'
      keyword to treat 'Write' as a dependent template name
                memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
                               ^
                               template
2020-05-01 23:38:23 +00:00
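The diagnostic above comes from calling a member template on an object whose type depends on a template parameter. A minimal sketch of the fix, assuming a hypothetical `FakeMemoryManager` and `WriteFence` that only mimic the shape of the failing call (they are not yuzu's actual classes):

```cpp
#include <cstdint>
#include <map>

using u32 = std::uint32_t;
using u64 = std::uint64_t;

// Hypothetical stand-in for a memory manager that exposes a member template.
struct FakeMemoryManager {
    std::map<u64, u64> storage;

    template <typename T>
    void Write(u64 address, T value) {
        storage[address] = static_cast<u64>(value);
    }
};

// The type of memory_manager depends on TManager, so Write is a dependent
// name. Without the 'template' keyword a conforming compiler may parse the
// following '<' as a less-than operator.
template <typename TManager>
void WriteFence(TManager& memory_manager, u64 address, u32 payload) {
    memory_manager.template Write<u32>(address, payload);
}
```

Some compilers accept the call without the keyword, which is probably why the error only showed up on Clang.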
Dan
96ee1b42bc maxwell_to_vk: implement missing signed int formats 2020-04-30 23:39:16 +02:00
Morph
7909860d16 texture: Implement R8G8UI
- Used by The Walking Dead: The Final Season
2020-04-30 13:19:36 -04:00
bunnei
bf3f030a0d
Merge pull request #3807 from ReinUsesLisp/fix-depth-clamp
maxwell_3d: Fix depth clamping register
2020-04-30 13:07:31 -04:00
bunnei
c7b5a87c90
Merge pull request #3799 from ReinUsesLisp/iadd-cc
shader: Implement P2R CC, IADD Rd.CC and IADD.X
2020-04-30 12:56:36 -04:00
bunnei
da2b8295e1
Merge pull request #3805 from ReinUsesLisp/preserve-contents
texture_cache: Reintroduce preserve_contents accurately
2020-04-30 12:56:19 -04:00
bunnei
6572660fde
Merge pull request #3788 from FernandoS27/revert
Revert: shader_decode: Fix LD, LDG when track constant buffer.
2020-04-30 12:55:39 -04:00
Lioncash
6c53edd4d3 vulkan: Remove unnecessary includes
Reduces some header churn and reduces rebuilds when some header
internals change.

While we're at it we can also resolve a missing include in buffer_cache.
2020-04-28 21:54:46 -04:00
ReinUsesLisp
871aadbe36 shader/arithmetic_integer: Fix tracking issue in temporary
This temporary is not needed as we mark Rd.CC + IADD.X as unimplemented.
It caused issues when tracking global buffers.
2020-04-28 17:14:53 -03:00
Fernando Sahmkow
9df67b2095 Clang Format and Documentation. 2020-04-28 14:02:51 -04:00
Fernando Sahmkow
37c690576f MaxwellDMA: Optimize micro copies. 2020-04-28 13:44:14 -04:00
bunnei
72b73d22ab
Merge pull request #3784 from ReinUsesLisp/shader-memory-util
shader/memory_util: Deduplicate code
2020-04-28 12:05:50 -04:00
ReinUsesLisp
d6a24b4a5b vk_rasterizer: Skip index buffer setup when vertices are zero
Xenoblade 2 invokes a draw call with zero vertices.
This is likely due to indirect drawing (glDrawArraysIndirect).

This causes a crash in the staging buffer pool when trying to create a
buffer with a size of zero. To work around this, skip index buffer setup
entirely when the number of indices is zero.
2020-04-28 02:24:33 -03:00
ReinUsesLisp
fe931ac976 {maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.

To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.

Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
2020-04-28 02:18:12 -03:00
Fernando Sahmkow
b87422a86f VideoCore/GPU: Delegate subchannel engines to the dma pusher. 2020-04-27 22:07:21 -04:00
Fernando Sahmkow
90e5694230 VideoCore/Engines: Refactor Engines CallMethod. 2020-04-27 21:47:58 -04:00
ReinUsesLisp
bb1ed66d99 maxwell_3d: Fix depth clamping register
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)

We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:

state->depthClampEnable ? 0x101A : 0x181D

The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):

(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c

There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.

- Fixes depth issues on Super Mario Odyssey's intro.
2020-04-27 20:50:14 -03:00
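The register values quoted above can be checked with a few lines of bit arithmetic. This is only a sketch: the helper names are hypothetical, and the rule "clamping is enabled when bit 11 is clear" is read off the four sample values rather than taken from yuzu's code.

```cpp
#include <cstdint>

// Assumption for illustration: depth clamping counts as enabled when
// bit 11 of the register value is clear, as the sample values suggest.
constexpr bool IsDepthClampEnabled(std::uint32_t reg) {
    constexpr std::uint32_t bit11 = 1u << 11;
    return (reg & bit11) == 0;
}

// Extracts bits 3 and 4. Both are set in every sample value, so they
// cannot distinguish the enabled state from the disabled one.
constexpr std::uint32_t Bits3And4(std::uint32_t reg) {
    return (reg >> 3) & 0x3u;
}

static_assert(IsDepthClampEnabled(0x101A) && !IsDepthClampEnabled(0x181D));
static_assert(IsDepthClampEnabled(0x201A) && !IsDepthClampEnabled(0x281C));
static_assert(Bits3And4(0x101A) == Bits3And4(0x181D));
```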
Fernando Sahmkow
1517cba8ca
Merge pull request #3766 from ReinUsesLisp/renderpass-cache-key
vk_renderpass_cache: Pack renderpass cache key and unify keys
2020-04-27 16:05:14 -04:00
Fernando Sahmkow
a65e9ad552
Merge pull request #3756 from ReinUsesLisp/integrated-devices
vk_memory_manager: Remove unified memory model flag
2020-04-27 16:04:22 -04:00
bunnei
6c7d8073be
Merge pull request #3742 from FernandoS27/command-list
Optimize GPU Command Lists and Introduce Fast GPU Time Option
2020-04-27 00:18:46 -04:00
ReinUsesLisp
8da16cf9fb texture_cache: Reintroduce preserve_contents accurately
This reverts commit 94b0e2e5da.

preserve_contents proved to be a meaningful optimization. This commit
reintroduces it but properly implemented on OpenGL.

We have to make sure the clear removes all the previous contents of the
image.

It's not currently implemented on Vulkan because we can do smarter things
there, and those are better introduced in a separate commit.
2020-04-26 19:53:02 -03:00
Rodrigo Locatti
7e38dd580f
Merge pull request #3753 from ReinUsesLisp/ac-vulkan
{gl,vk}_rasterizer: Add lazy default buffer maker and use it for empty buffers
2020-04-26 01:55:43 -03:00
ReinUsesLisp
ddd82ef42b shader/memory_util: Deduplicate code
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.

While we are at it, fix a bug in gl_shader_cache where compute shaders
had a start offset of a stage shader.
2020-04-26 01:38:51 -03:00
ReinUsesLisp
e895a4e2d7 shader/arithmetic_integer: Fix edge case and mark IADD.X Rd.CC as unimplemented
IADD.X Rd.CC requires some extra logic that is not currently
implemented. Abort when this is hit.
2020-04-25 22:58:33 -03:00
ReinUsesLisp
2a96bea6a7 shader/arithmetic_integer: Change IAdd to UAdd to avoid signed overflow
Signed integer addition overflow might be undefined behavior. It's free
to change operations to UAdd and use unsigned integers to avoid
potential bugs.
2020-04-25 22:57:54 -03:00
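The reasoning above carries over directly to host C++ code: signed overflow is undefined, unsigned overflow wraps. A minimal sketch with a hypothetical helper `AddWrapped` (not part of the shader IR):

```cpp
#include <cstdint>

// Adds two signed 32-bit values through unsigned arithmetic and returns the
// resulting bit pattern. Unsigned addition wraps modulo 2^32, so no input
// pair triggers undefined behavior.
inline std::uint32_t AddWrapped(std::int32_t a, std::int32_t b) {
    return static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
}
```

The result has the same bits a two's complement signed addition would produce, which is why the change can be described as free.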
ReinUsesLisp
c788f9c0bd shader/arithmetic_integer: Implement IADD.X
IADD.X takes the carry flag and adds it to the result. This is generally
used to emulate 64-bit operations with 32-bit registers.
2020-04-25 22:56:11 -03:00
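The 64-bit emulation mentioned above can be sketched on the CPU as follows. The function names are hypothetical; `AddWithCarryOut` plays the role of an IADD that writes the carry flag, and `AddWithCarryIn` plays the role of IADD.X.

```cpp
#include <cstdint>

struct AddResult {
    std::uint32_t value;
    bool carry;
};

// 32-bit add that also reports the carry out of bit 31.
inline AddResult AddWithCarryOut(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sum = a + b;
    return {sum, sum < a};
}

// 32-bit add that consumes a carry produced by a previous addition.
inline std::uint32_t AddWithCarryIn(std::uint32_t a, std::uint32_t b, bool carry) {
    return a + b + (carry ? 1u : 0u);
}

// 64-bit addition built from the two 32-bit steps above.
inline std::uint64_t Add64(std::uint64_t lhs, std::uint64_t rhs) {
    const AddResult low = AddWithCarryOut(static_cast<std::uint32_t>(lhs),
                                          static_cast<std::uint32_t>(rhs));
    const std::uint32_t high = AddWithCarryIn(static_cast<std::uint32_t>(lhs >> 32),
                                              static_cast<std::uint32_t>(rhs >> 32),
                                              low.carry);
    return (static_cast<std::uint64_t>(high) << 32) | low.value;
}
```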
ReinUsesLisp
255197e643 shader/arithmetic_integer: Implement CC for IADD 2020-04-25 22:55:26 -03:00
ReinUsesLisp
ffc5ec6fa8 decode/register_set_predicate: Implement CC
P2R CC takes the state of condition codes and puts them into a register.
We already have this implemented for PR (predicates). This commit
implements CC over that.
2020-04-25 22:54:42 -03:00
ReinUsesLisp
d523734266 decode/register_set_predicate: Use move for shared pointers
Avoid atomic counters used by shared pointers.
2020-04-25 22:54:14 -03:00
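The effect of moving instead of copying a shared pointer is visible in the reference count. A small sketch, with a hypothetical `Holder` type standing in for whatever stores the pointer:

```cpp
#include <memory>
#include <utility>

struct Holder {
    std::shared_ptr<int> node;
};

// Copying increments the (atomic) reference count.
inline Holder StoreByCopy(const std::shared_ptr<int>& node) {
    return Holder{node};
}

// Moving transfers ownership and leaves the count untouched.
inline Holder StoreByMove(std::shared_ptr<int> node) {
    return Holder{std::move(node)};
}
```

Calling `StoreByMove(std::move(ptr))` leaves `ptr` empty and the holder as the sole owner, whereas `StoreByCopy(ptr)` leaves two owners.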
bunnei
c5bf693882
Merge pull request #3721 from ReinUsesLisp/sort-devices
vulkan/wrapper: Sort physical devices
2020-04-25 03:27:40 -04:00
bunnei
4e37825dab
Merge pull request #3734 from ReinUsesLisp/half-float-mods
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
2020-04-25 00:41:43 -04:00
ReinUsesLisp
527a1574c3 vk_rasterizer: Pack texceptions and color formats on invalid formats
Sometimes for unknown reasons NVN games can bind a render target format
of 0. This may be a yuzu bug.

With the commits before this the formats were specified without being
"packed", assuming all formats and texceptions will be written like in
the color_attachments vector.

To address this issue, iterate all render targets and pack them as they
are valid. This way they will match color_attachments.

- Fixes validation errors and graphical issues on Breath of the Wild.
2020-04-24 22:21:29 -03:00
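The packing step described above amounts to filtering out invalid entries while preserving order. A minimal sketch, assuming eight render target slots and a hypothetical `PackFormats` helper (the real code also packs texceptions, which are omitted here):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for illustration: eight slots, where a format of 0 is invalid.
constexpr std::size_t NumRenderTargets = 8;

// Returns only the valid formats, in slot order, so that entry i lines up
// with the i-th entry of a color attachment list built the same way.
inline std::vector<std::uint32_t> PackFormats(
    const std::array<std::uint32_t, NumRenderTargets>& formats) {
    std::vector<std::uint32_t> packed;
    for (const std::uint32_t format : formats) {
        if (format != 0) {
            packed.push_back(format);
        }
    }
    return packed;
}
```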
bunnei
7c8acb0025
Merge pull request #3749 from ReinUsesLisp/lea-imm
shader/arithmetic_integer: Fix LEA_IMM encoding
2020-04-24 14:30:13 -04:00
Fernando Sahmkow
d8a961cd6c Revert: shader_decode: Fix LD, LDG when track constant buffer. 2020-04-24 11:00:54 -04:00
Markus Wick
e717a1df20 Fix -Wdeprecated-copy warning. 2020-04-24 09:33:04 +02:00
Markus Wick
c499c22cf7 Fix -Werror=conversion error. 2020-04-24 09:33:04 +02:00
ReinUsesLisp
dbaebd8582 decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
The encoding for negation and absolute value was wrong.
Extracting is now done manually. Similar instructions having different
encodings is the rule, not the exception. To keep sanity and readability
I preferred to extract the desired bit manually.

This is implemented against nxas:
8dbc389957/table.h (L68)

That is itself tested against nvdisasm (Nvidia's official disassembler).
2020-04-23 18:29:38 -03:00
ReinUsesLisp
4fb921ff6b shader/texture: Support multiple unknown sampler properties
This allows deducing some properties from the texture instruction before
asking the runtime. By doing this we can handle type mismatches in some
instructions from the renderer instead of the shader decoder.

Fixes texelFetch issues with games using 2D texture instructions on a 1D
sampler.
2020-04-23 18:04:13 -03:00
ReinUsesLisp
72deb773fd shader_ir: Turn classes into data structures 2020-04-23 18:00:06 -03:00
ReinUsesLisp
3e35101895 vk_rasterizer: Fix framebuffer creation validation errors
Framebuffer creation was ignoring the number of color attachments.
2020-04-23 17:34:16 -03:00
ReinUsesLisp
8c37cd1af6 vk_pipeline_cache: Unify pipeline cache keys into a single operation
This allows us to call Common::CityHash and std::memcmp only once for
GraphicsPipelineCacheKey. While we are at it, do the same for compute.
2020-04-23 17:34:16 -03:00
ReinUsesLisp
f665c92114 vk_renderpass_cache: Pack renderpass cache key to 12 bytes 2020-04-23 17:34:16 -03:00
bunnei
ff0c49e1ce
kernel: memory: Improve implementation of device shared memory. (#3707)
* kernel: memory: Improve implementation of device shared memory.

* fixup! kernel: memory: Improve implementation of device shared memory.

* fixup! kernel: memory: Improve implementation of device shared memory.
2020-04-23 11:37:12 -04:00
Fernando Sahmkow
5c9feaebb6 Clang Format. 2020-04-23 08:52:58 -04:00
Fernando Sahmkow
b8aef40c56 GPU: Add Fast GPU Time Option. 2020-04-23 08:52:57 -04:00
Fernando Sahmkow
18a88d19dc Maxwell3D: Process Macros on MultiMethod. 2020-04-23 08:52:56 -04:00
Fernando Sahmkow
3fedcc2f6e DMAPusher: Propagate multimethod writes into the engines. 2020-04-23 08:52:55 -04:00
bunnei
2409fedacf
Merge pull request #3697 from lioncash/declarations
CMakeLists: Enable -Wmissing-declarations on Linux builds
2020-04-23 02:18:52 -04:00
bunnei
bf2ddb8fd5
Merge pull request #3677 from FernandoS27/better-sync
Introduce Predictive Flushing and Improve ASYNC GPU
2020-04-22 22:09:38 -04:00
ReinUsesLisp
d9463f4562 vk_pipeline_cache: Fix unintentional memcpy into optional
The intention behind this was to assign a float from a uint32_t, but
it was unintentionally being copied directly into the std::optional.

Copy to a temporary and assign that temporary to std::optional. This can
be replaced with std::bit_cast<float> once we are in C++20.
2020-04-22 21:36:05 -03:00
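The temporary-then-assign pattern described above looks roughly like this. `BitsToOptionalFloat` is a hypothetical helper written for illustration, not the function touched by the commit:

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Reinterprets the bits of a 32-bit integer as a float and stores the result
// in an optional. The memcpy targets a plain float temporary, never the
// std::optional object itself.
inline std::optional<float> BitsToOptionalFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t));
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    std::optional<float> result;
    result = value;
    return result;
}
```

With C++20 the temporary and the memcpy can be replaced by `std::bit_cast<float>(bits)`, as the message notes.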
Fernando Sahmkow
c043ac4f13 GL_Fence_Manager: use GL_TIMEOUT_IGNORED instead of a loop 2020-04-22 20:34:32 -04:00
Fernando Sahmkow
afae40a99e
Merge pull request #3653 from ReinUsesLisp/nsight-aftermath
renderer_vulkan: Integrate Nvidia Nsight Aftermath on Windows
2020-04-22 11:39:01 -04:00
Fernando Sahmkow
4e37f1b113 Address Feedback. 2020-04-22 11:36:27 -04:00
Fernando Sahmkow
39e5b72948 Async GPU: Correct flushing behavior to be similar to old async GPU behavior. 2020-04-22 11:36:26 -04:00
Fernando Sahmkow
1b3be8a8f8 MaxwellDMA: Correct copying on accuracy level. 2020-04-22 11:36:25 -04:00
Fernando Sahmkow
644588fd88 ShaderCache/PipelineCache: Cache null shaders. 2020-04-22 11:36:25 -04:00
Fernando Sahmkow
f616dc0b59 Address Feedback. 2020-04-22 11:36:24 -04:00
Fernando Sahmkow
ec2f3e48e1 Fix GCC error. 2020-04-22 11:36:23 -04:00
Fernando Sahmkow
b3e5f177ba QueryCache: Only do async flushes on async gpu. 2020-04-22 11:36:21 -04:00
Fernando Sahmkow
f4ab223ef0 Async GPU: Only do reactive flushing on Extreme Level. 2020-04-22 11:36:20 -04:00
ReinUsesLisp
b752faf2d3 vk_fence_manager: Initial implementation 2020-04-22 11:36:19 -04:00
Fernando Sahmkow
0649f05900 QueryCache: Implement Async Flushes. 2020-04-22 11:36:18 -04:00
Fernando Sahmkow
131b342130 OpenGL: Guarantee writes to Buffers. 2020-04-22 11:36:18 -04:00
Fernando Sahmkow
1fb516cd97 GPU: Implement Flush Requests for Async mode. 2020-04-22 11:36:17 -04:00
Fernando Sahmkow
b7bc3c2549 FenceManager: Manage syncpoints and rename fences to semaphores. 2020-04-22 11:36:16 -04:00
Fernando Sahmkow
96bb961a64 BufferCache: Refactor async managing. 2020-04-22 11:36:15 -04:00
Fernando Sahmkow
b10db7e4a5 FenceManager: Implement async buffer cache flushes on High settings 2020-04-22 11:36:15 -04:00
Fernando Sahmkow
4adfc9bb08 Rasterizer: Document SignalFence & ReleaseFences and setup skeletons on Vulkan. 2020-04-22 11:36:14 -04:00
Fernando Sahmkow
a081a7c855 GPU: Fix rebase errors. 2020-04-22 11:36:13 -04:00
Fernando Sahmkow
e84eb64e51 Rasterizer: Disable fence managing in synchronous gpu. 2020-04-22 11:36:12 -04:00
Fernando Sahmkow
165ae823f5 ThreadManager: Sync async reads on accurate gpu. 2020-04-22 11:36:12 -04:00
Fernando Sahmkow
57fdbd9b89 FenceManager: Implement should wait. 2020-04-22 11:36:11 -04:00
Fernando Sahmkow
1f345ebe3a GPU: Implement a Fence Manager. 2020-04-22 11:36:10 -04:00
Fernando Sahmkow
487379c593 OpenGL: Implement Fencing backend. 2020-04-22 11:36:10 -04:00
Fernando Sahmkow
ed7e965712 TextureCache: Flush linear textures after finishing rendering. 2020-04-22 11:36:09 -04:00
Fernando Sahmkow
339d0d9d6c GPU: Delay Fences. 2020-04-22 11:36:08 -04:00
Fernando Sahmkow
8b1eb44b3e BufferCache: Implement OnCPUWrite and SyncGuestHost 2020-04-22 11:36:07 -04:00
Fernando Sahmkow
da8f17715d GPU: Refactor synchronization on Async GPU 2020-04-22 11:36:06 -04:00
Fernando Sahmkow
a60a22d9c2 Texture Cache: Implement OnCPUWrite and SyncGuestHost 2020-04-22 11:36:05 -04:00
Fernando Sahmkow
084ceb925a UI: Replace accurate GPU option with GPU Accuracy Level 2020-04-22 11:36:04 -04:00
ReinUsesLisp
6f47bd9641 vk_memory_manager: Remove unified memory model flag
All drivers (even Intel) seem to have a device local memory type that is
not host visible. Remove this flag so all devices follow the same path.

This fixes a crash when trying to map to host device local memory on
integrated devices.
2020-04-21 22:06:38 -03:00
bunnei
d64290884a
Merge pull request #3714 from lioncash/copies
gl_shader_decompiler: Avoid copies where applicable
2020-04-21 20:16:02 -04:00
ReinUsesLisp
488ed8bd02 vk_rasterizer: Add lazy default buffer maker and use it for empty buffers
Introduce a default buffer getter that lazily constructs an empty
buffer. This is intended to match OpenGL's buffer 0.

Use this for disabled vertex and uniform buffers.

While we are at it, include vertex buffer usages for staging buffers to
silence validation errors.
2020-04-21 19:55:52 -03:00
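A lazily constructed default buffer can be sketched as below. `Buffer`, `LazyDefaultBuffer` and the 16-byte size are all hypothetical; a real implementation would create a GPU buffer rather than a byte vector.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU buffer.
struct Buffer {
    std::vector<unsigned char> bytes;
};

class LazyDefaultBuffer {
public:
    // Creates the zero-filled buffer on first use and reuses it afterwards.
    const Buffer& Get() {
        if (!buffer) {
            buffer = std::make_unique<Buffer>();
            buffer->bytes.assign(DefaultSize, 0);
            ++creations;
        }
        return *buffer;
    }

    int Creations() const {
        return creations;
    }

private:
    static constexpr std::size_t DefaultSize = 16; // illustrative size
    std::unique_ptr<Buffer> buffer;
    int creations = 0;
};
```

Nothing is allocated for applications that never bind a disabled vertex or uniform buffer.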
ReinUsesLisp
0bbae63300 gl_rasterizer: Fix buffers without size
On NVN buffers can be enabled but have no size. According to deko3d and
the behavior we see in Animal Crossing: New Horizons these buffers get
the special address of 0x1000 and limit themselves to 0xfff.

Implement buffers without a size by binding a null buffer to OpenGL
without a size.

1d1930beea/source/maxwell/gpu_3d_vbo.cpp (L62-L63)
2020-04-21 19:55:44 -03:00
Rodrigo Locatti
f293b15611
Merge pull request #3718 from ReinUsesLisp/better-pipeline-state
fixed_pipeline_state: Pack structure, use memcmp and CityHash on it
2020-04-21 18:17:58 -03:00
bunnei
9bf3abcb63
Merge pull request #3698 from lioncash/warning
General: Resolve minor assorted warnings
2020-04-21 14:11:18 -04:00
bunnei
d3e0cefa60
Merge pull request #3695 from ReinUsesLisp/default-attributes
maxwell_3d: Initialize format attributes constant as one
2020-04-20 21:40:18 -04:00
ReinUsesLisp
8734ccb0cb shader/arithmetic_integer: Fix LEA_IMM encoding
The operand order in LEA_IMM was flipped compared to nvdisasm. Fix that
using nxas as reference:

8dbc389957/table.h (L122)
2020-04-20 21:54:59 -03:00
Mat M
cb5b8ca886
Merge pull request #3733 from ambasta/patch-2
Initialize quad_indexed_pass before uint8_pass
2020-04-20 20:36:46 -04:00
Fernando Sahmkow
ec2f8f4272
Merge pull request #3700 from ReinUsesLisp/stream-buffer-sizes
vk_stream_buffer: Fix out of memory on boot on recent Nvidia drivers
2020-04-20 09:37:42 -04:00
Amit Prakash Ambasta
5324b1d01e
Initialize quad_indexed_pass before uint8_pass
Fixes Werror=reorder in gcc
2020-04-20 04:53:52 +05:30
Rodrigo Locatti
4932010c6f
Merge pull request #3729 from lioncash/globals
dma_pusher: Remove reliance on the global system instance
2020-04-19 19:12:40 -03:00
bunnei
85c17a2c35
Merge pull request #3694 from ReinUsesLisp/indexed-quads
vk_compute_pass: Implement indexed quads
2020-04-19 16:52:40 -04:00
Lioncash
44e959157b dma_pusher: Remove reliance on the global system instance
With this, the video core now has no calls to the global system
instance at all.
2020-04-19 16:12:08 -04:00
bunnei
2ea7a70da0
Merge pull request #3686 from lioncash/table
texture_cache/format_lookup_table: Fix incorrect green, blue, and alpha indices
2020-04-19 15:33:33 -04:00
bunnei
73db83c0ab
Merge pull request #3679 from lioncash/track
track: Eliminate redundant copies
2020-04-19 01:22:47 -04:00
Jan Beich
afcc84a172 renderer_vulkan: assume X11 if not Windows/macOS after bf1d66b7c0
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateInstance:131: Presentation not supported on this platform
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateSurface:378: Presentation not supported on this platform
Core <Critical> core/core.cpp:Load:199: Failed to initialize system (Error 5)!
2020-04-19 00:32:23 +00:00
ReinUsesLisp
c81bf06d03 vulkan/wrapper: Sort physical devices
Sort discrete GPUs over the rest, Nvidia over AMD, AMD over Intel, Intel
over the rest. This gives us a somewhat consistent order when Optimus
is removed (renderdoc does this when it's attached).

This can break the configuration of users with an Intel GPU that
manually remove Optimus on yuzu. That said, it's very unlikely to
happen.
2020-04-18 21:31:15 -03:00
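One way to read the ordering above is as a single comparison: device type first, vendor second. The sketch below uses a hypothetical `DeviceInfo` struct instead of Vulkan's device properties; the numeric values are the PCI vendor IDs of the three vendors.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of a physical device.
struct DeviceInfo {
    bool is_discrete;
    std::uint32_t vendor_id;
};

constexpr std::uint32_t VendorNvidia = 0x10DE;
constexpr std::uint32_t VendorAmd = 0x1002;
constexpr std::uint32_t VendorIntel = 0x8086;

// Lower rank sorts first: Nvidia, then AMD, then Intel, then anything else.
inline int VendorRank(std::uint32_t vendor_id) {
    switch (vendor_id) {
    case VendorNvidia:
        return 0;
    case VendorAmd:
        return 1;
    case VendorIntel:
        return 2;
    default:
        return 3;
    }
}

// A stable sort keeps the enumeration order of otherwise equal devices.
inline void SortDevices(std::vector<DeviceInfo>& devices) {
    std::stable_sort(devices.begin(), devices.end(),
                     [](const DeviceInfo& a, const DeviceInfo& b) {
                         if (a.is_discrete != b.is_discrete) {
                             return a.is_discrete; // discrete GPUs come first
                         }
                         return VendorRank(a.vendor_id) < VendorRank(b.vendor_id);
                     });
}
```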
ReinUsesLisp
d62f57cf5a fixed_pipeline_state: Hash and compare the whole structure
Pad FixedPipelineState's size to 384 bytes to be a multiple of 16.

Compare the whole struct with std::memcmp and hash with CityHash. Using
CityHash instead of a naive hash should reduce the number of collisions.
Improve used type traits to ensure this operation is safe.

With these changes the improvements to the hashable pipeline state are:

Optimized structure
Hash:            89 ns
Comparison:     103 ns
Construction*:  164 ns
Struct size:    384 bytes

Original structure
Hash:           148 ns
Equal:          174 ns
Construction*:  281 ns
Size:          1384 bytes

* Attribute state initialization is not measured

These measurements are averages taken with std::chrono::high_resolution_clock
on MSVC shipped on Visual Studio 16.6.0 Preview 2.1.
2020-04-18 19:57:26 -03:00
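Comparing and hashing a whole structure byte by byte is only safe when the type has no padding and is trivially copyable. A minimal sketch under those conditions, using a hypothetical `PackedState` (far smaller than FixedPipelineState) and FNV-1a as a simple stand-in for CityHash:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical packed state with no padding between or after its members.
struct PackedState {
    std::uint32_t topology;
    std::uint32_t blend_bits;
    std::uint64_t attribute_bits;
};

// These traits guard the byte-wise operations below.
static_assert(std::has_unique_object_representations_v<PackedState>);
static_assert(std::is_trivially_copyable_v<PackedState>);

inline bool Equal(const PackedState& a, const PackedState& b) {
    return std::memcmp(&a, &b, sizeof(PackedState)) == 0;
}

// FNV-1a over the raw bytes of the structure.
inline std::uint64_t Hash(const PackedState& state) {
    unsigned char bytes[sizeof(PackedState)];
    std::memcpy(bytes, &state, sizeof(bytes));
    std::uint64_t hash = 14695981039346656037ull;
    for (const unsigned char byte : bytes) {
        hash ^= byte;
        hash *= 1099511628211ull;
    }
    return hash;
}
```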
ReinUsesLisp
b571c92dfd fixed_pipeline_state: Pack blending state
Reduce FixedPipelineState's size to 364 bytes.
2020-04-18 19:23:35 -03:00
ReinUsesLisp
548dd27f45 fixed_pipeline_state: Pack rasterizer state
Reduce FixedPipelineState's size to 600 bytes.
2020-04-18 19:22:57 -03:00
ReinUsesLisp
7790144a55 fixed_pipeline_state: Pack depth stencil state
Reduce FixedPipelineState's size to 632 bytes.
2020-04-18 19:22:11 -03:00
ReinUsesLisp
ab6704f20c fixed_pipeline_state: Pack attribute state
Reduce FixedPipelineState's size from 1384 to 664 bytes
2020-04-18 19:21:19 -03:00
Mat M
5305806071
Merge pull request #3716 from bunnei/fix-another-impl-fallthrough
video_core: gl_shader_decompiler: Fix implicit fallthrough errors.
2020-04-18 15:17:52 -04:00
bunnei
03726fb7f5 video_core: gl_shader_decompiler: Fix implicit fallthrough errors. 2020-04-18 15:15:21 -04:00
Lioncash
bf328ed35a gl_shader_decompiler: Avoid copies where applicable
Avoids unnecessary reference count increments where applicable and also
avoids reallocating a vector.

Unlikely to make a huge difference, but given how trivial of an
amendment it is, why not?
2020-04-17 20:48:52 -04:00
Markus Wick
07fbef1776 video_code: Fix implicit switch fallthrough.
Since yesterday, this breaks the build on linux.
So let's fix it.
2020-04-17 23:43:35 +02:00
ReinUsesLisp
a7b6bd56d7 vk_stream_buffer: Fix out of memory on boot on recent Nvidia drivers
Nvidia recently introduced a new memory type for data streaming
(awesome!), but yuzu was assuming that all heaps had enough memory
for the assumed stream buffer size (256 MiB).

This worked fine on AMD but Nvidia's new memory heap was smaller than
256 MiB. This commit changes this assumption and allocates a bit less
than the size of the preferred heap, with a maximum of 256 MiB (to avoid
allocating all system memory on integrated devices).

- Fixes a crash on NVIDIA 450.82.0.0
2020-04-17 18:12:48 -03:00
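The sizing rule above can be sketched as a clamp. The 3/4 fraction is an assumption made for illustration; the message only says "a bit less than the size of the preferred heap". `StreamBufferSize` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t MaxStreamBufferSize = 256ull * 1024 * 1024; // 256 MiB

// Uses a fraction of the preferred heap, capped at 256 MiB so that
// integrated devices do not hand over all of system memory.
inline std::uint64_t StreamBufferSize(std::uint64_t heap_size) {
    const std::uint64_t below_heap = heap_size / 4 * 3; // illustrative fraction
    return std::min(below_heap, MaxStreamBufferSize);
}
```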
Rodrigo Locatti
990c0b184f
Revert "gl_shader_cache: Use CompileDepth::FullDecompile on GLSL" 2020-04-17 17:41:48 -03:00
bunnei
b8f5c71f2d
Merge pull request #3666 from bunnei/new-vmm
Implement a new virtual memory manager
2020-04-17 16:33:08 -04:00
bunnei
ca3af2961c
Merge pull request #3682 from lioncash/uam
gl_query_cache: Resolve use-after-move in CachedQuery move assignment operator
2020-04-17 01:24:08 -04:00
bunnei
32fc2aae3c video_core: memory_manager: Updates for Common::PageTable changes. 2020-04-17 00:59:34 -04:00
bunnei
4caff51710 core: memory: Move to Core::Memory namespace.
- helpful to disambiguate Kernel::Memory namespace.
2020-04-17 00:59:28 -04:00
Lioncash
e2d8be1ca2 General: Resolve warnings related to missing declarations 2020-04-16 23:43:34 -04:00
Lioncash
678ac54749 decode/memory: Resolve unused variable warning
Only the first element of the returned pair is ever used.
2020-04-16 22:45:44 -04:00
Lioncash
d159643fd7 decode/texture: Resolve unused variable warnings.
Some variables aren't used, so we can remove these.

Unfortunately, diagnostics are still reported on structured bindings
even when annotated with [[maybe_unused]], so we need to unpack the
elements that we want to use manually.
2020-04-16 22:45:41 -04:00
Lioncash
f522abd8ab decode/texture: Collapse loop down into std::generate
Same behavior, less code.
2020-04-16 22:29:07 -04:00
Lioncash
7e2d60de26 decode/texture: Eliminate trivial missing field initializer warnings
We can just specify the initializers.
2020-04-16 22:27:21 -04:00
bunnei
79c1269f0f
Merge pull request #3673 from lioncash/extra
CMakeLists: Specify -Wextra on linux builds
2020-04-16 21:12:33 -04:00
ReinUsesLisp
238c6016f9 maxwell_3d: Initialize format attributes constant as one
nouveau expects this to be true but it doesn't set it.
2020-04-16 21:15:07 -03:00
ReinUsesLisp
c961770900 vk_compute_pass: Implement indexed quads
Implement indexed quads (GL_QUADS used with glDrawElements*) with a
compute pass conversion.

The compute shader converts from uint8/uint16/uint32 indices to uint32.
The format is passed through push constants to avoid having different
variants of the same shader.

- Used by Fast RMX
- Used by Xenoblade Chronicles 2 (it still has graphical issues due to
synchronization problems on Vulkan)
2020-04-16 21:12:32 -03:00
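The index conversion performed by the compute pass can be sketched on the CPU. This is an illustration only: `QuadsToTriangles` is a hypothetical function, and splitting each quad (a, b, c, d) into (a, b, c) and (a, c, d) is an assumed winding, not one confirmed by the commit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts quad indices of any unsigned width (uint8/uint16/uint32) into
// 32-bit triangle-list indices, six per quad.
template <typename Index>
std::vector<std::uint32_t> QuadsToTriangles(const std::vector<Index>& quads) {
    static constexpr std::array<std::size_t, 6> corners{0, 1, 2, 0, 2, 3};
    std::vector<std::uint32_t> triangles;
    triangles.reserve(quads.size() / 4 * 6);
    // Incomplete trailing quads are ignored.
    for (std::size_t i = 0; i + 3 < quads.size(); i += 4) {
        for (const std::size_t corner : corners) {
            triangles.push_back(static_cast<std::uint32_t>(quads[i + corner]));
        }
    }
    return triangles;
}
```

On the GPU the input width would be selected at run time (the message mentions push constants) rather than through a template parameter.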
Fernando Sahmkow
c81f256111
Merge pull request #3600 from ReinUsesLisp/no-pointer-buf-cache
buffer_cache: Return handles instead of pointer to handles
2020-04-16 19:58:13 -04:00
ReinUsesLisp
090fd3fefa buffer_cache: Return handles instead of pointer to handles
The original idea of returning pointers is that handles can be moved.
The problem is that the implementation didn't keep that in mind and made
everything harder to work with. This commit drops pointer to handles and
returns the handles themselves. While it is still true that handles can
be invalidated, this way we get an old handle instead of a dangling
pointer.

This problem can be solved in the future with sparse buffers.
2020-04-16 02:33:34 -03:00
Rodrigo Locatti
a5a2ee8766
Merge pull request #3689 from lioncash/unused-var
decode/shift: Remove unused variable within Shift()
2020-04-16 02:05:54 -03:00
Rodrigo Locatti
d196ce0f71
Merge pull request #3688 from lioncash/nequal
surface_view: Add missing operator!= to ViewParams
2020-04-16 01:39:51 -03:00
Rodrigo Locatti
4209dba1f6
Merge pull request #3680 from lioncash/static
gl_device: Mark stage_swizzle as constexpr
2020-04-16 01:26:23 -03:00
Rodrigo Locatti
60e8de7c95
Merge pull request #3687 from lioncash/constness
surface_base: Make IsInside() a const member function
2020-04-16 01:22:50 -03:00
Rodrigo Locatti
612966399b
Merge pull request #3685 from lioncash/copies
control_flow: Make use of std::move in TryInspectAddress()
2020-04-16 01:22:40 -03:00
Lioncash
cd2a12e78f decode/shift: Remove unused variable within Shift()
Removes a redundant variable that is already satisfied by the IsFull()
utility function.
2020-04-16 00:16:06 -04:00
Lioncash
5fbe8785d2 surface_view: Add missing operator!= to ViewParams
Provides logical symmetry to the interface.
2020-04-16 00:03:12 -04:00
Lioncash
d551c910bb surface_base: Make IsInside() a const member function
This doesn't modify internal state, so this can be made const.
2020-04-15 23:59:35 -04:00
bunnei
319df1db77
Merge pull request #3683 from lioncash/docs
video_core: Amend doxygen comment references
2020-04-15 23:54:58 -04:00
Lioncash
636c8ab85b texture_cache/format_lookup_table: Fix incorrect green, blue, and alpha indices
Previously these were all using the red component to derive the indices,
which is definitely not intentional.
2020-04-15 23:50:46 -04:00
Lioncash
72a224d3fc control_flow: Make use of std::move in TryInspectAddress()
Eliminates redundant atomic reference count increments and decrements.
2020-04-15 23:31:22 -04:00
Lioncash
11837e8f13 video_core: Amend doxygen comment references
Fixes broken documentation references.
2020-04-15 22:33:29 -04:00
Lioncash
3a60f19eaf gl_query_cache: Resolve use-after-move in CachedQuery move assignment operator
Prevents potentially invalid junk data from being read.
2020-04-15 22:20:06 -04:00
Lioncash
71fb156611 gl_device: Mark stage_swizzle as constexpr
Previously this was mutable even though it shouldn't be.
2020-04-15 21:59:13 -04:00
Lioncash
e15ec2705c track: Eliminate redundant copies
Two variables can be references, while two others can be std::moved.
Makes for 4 fewer atomic reference count increments and decrements.
2020-04-15 21:50:09 -04:00
Lioncash
1c340c6efa CMakeLists: Specify -Wextra on linux builds
Allows reporting more cases where logic errors may exist, such as
implicit fallthrough cases, etc.

We currently ignore unused parameters, since we currently have many
cases where this is intentional (virtual interfaces).

While we're at it, we can also tidy up any existing code that causes
warnings. This also uncovered a few bugs.
2020-04-15 21:33:46 -04:00
Rodrigo Locatti
65cbb122ea
Merge pull request #3649 from FernandoS27/3d-fix
Texture Cache: Read current data when flushing a 3D segment.
2020-04-15 17:06:55 -03:00
Fernando Sahmkow
e33196d4e7
Merge pull request #3612 from ReinUsesLisp/red
shader/memory: Implement RED.E.ADD and minor changes to ATOM
2020-04-15 15:03:49 -04:00
Lioncash
213fff67bc CMakeLists: Make -Wreorder a compile-time error
This can result in silent logic bugs within code, and given the number
of times these kinds of warnings are caused, they should be flagged at
compile-time so no new code is submitted with them.
2020-04-15 14:14:41 -04:00
Mat M
64b5985f0a
Merge pull request #3662 from ReinUsesLisp/constant-attrs
gl_rasterizer: Implement constant vertex attributes
2020-04-15 11:54:50 -04:00
Fernando Sahmkow
6789d88a9c Texture Cache: Read current data when flushing a 3D segment.
This PR corrects flushing of 3D segments when data of other segments is
mixed, this aims to preserve the data in place.
2020-04-15 11:46:17 -04:00
Mat M
9208d555b7
Merge pull request #3668 from ReinUsesLisp/vtx-format-16ui
maxwell_to_vk: Add uint16 vertex formats
2020-04-15 11:43:52 -04:00
Mat M
ab72696beb
Merge pull request #3656 from ReinUsesLisp/glsl-full-decompile
gl_shader_cache: Use CompileDepth::FullDecompile on GLSL
2020-04-15 03:17:46 -04:00
Mat M
4878d6bb49
Merge pull request #3654 from ReinUsesLisp/fix-fb-attach
gl_texture_cache: Fix layered texture attachment base level
2020-04-15 03:17:18 -04:00
Mat M
50c0a92db8
Merge pull request #3663 from ReinUsesLisp/fcmp-rc
shader/arithmetic: Add FCMP_CR variant
2020-04-15 03:16:56 -04:00
Mat M
13331a3a32
Merge pull request #3664 from ReinUsesLisp/fe3h-black-squares
Revert "gl_shader_decompiler: Implement merges with bitfieldInsert"
2020-04-15 03:14:28 -04:00
ReinUsesLisp
3036067047 maxwell_to_vk: Add uint16 vertex formats 2020-04-15 04:06:30 -03:00
ReinUsesLisp
b4e43c64c8 maxwell_to_vk: Add missing breaks
Avoid invalid fallbacks.
2020-04-15 04:05:33 -03:00
ReinUsesLisp
0ca456830f vk_blit_screen: Initialize all members in VkPipelineViewportStateCreateInfo
When the dynamic state is specified, pViewports and pScissors are
ignored, quoting the specification:

  pViewports is a pointer to an array of VkViewport structures, defining
  the viewport transforms. If the viewport state is dynamic, this member
  is ignored.

That said, AMD's proprietary driver itself seems to read it regardless of
what the specification says.
2020-04-15 03:30:08 -03:00
Rodrigo Locatti
0b132e8cc1
Merge pull request #3657 from ReinUsesLisp/viewport-zero
vk_rasterizer: Default to 1 viewports with a size of 0
2020-04-15 01:51:17 -03:00
Fernando Sahmkow
daddbeffd1
Texture Cache: Only do buffer copies on accurate GPU. (#3634)
This is a simple optimization as Buffer Copies are mostly used for texture recycling. They are, however, useful when games abuse undefined behavior but most 3D APIs forbid it.
2020-04-14 23:21:00 -04:00
ReinUsesLisp
fd6371eba7 Revert "gl_shader_decompiler: Implement merges with bitfieldInsert"
This reverts commit 05cf270836.

Apparently the first approach using floats instead of bitfieldInsert
worked better for Fire Emblem: Three Houses. Reverting to get that
behavior back.
2020-04-14 21:24:33 -03:00
ReinUsesLisp
fefe7f18f9 shader/arithmetic: Add FCMP_CR variant
Adds another variant of FCMP.
2020-04-14 19:11:04 -03:00
ReinUsesLisp
6dfcabc800 gl_rasterizer: Implement constant vertex attributes
Credits go to gdkchan from Ryujinx for finding that constant attributes
are used in retail games.
2020-04-14 17:58:53 -03:00
ReinUsesLisp
37e5c4fa7c vk_rasterizer: Default to 1 viewports with a size of 0
Silence validation layer errors.
2020-04-14 04:44:34 -03:00
ReinUsesLisp
453d7419d9 gl_shader_cache: Use CompileDepth::FullDecompile on GLSL
From my testing on a Splatoon 2 shader that takes 3800ms on average to
compile, changing to FullDecompile reduces it to 900ms on average.

The shader decoder will automatically fall back to a more naive method if
it can't use full decompile.
2020-04-14 01:34:20 -03:00
ReinUsesLisp
0e232cfdc1 renderer_vulkan: Integrate Nvidia Nsight Aftermath on Windows
Adds optional support for Nsight Aftermath. It is enabled through
ENABLE_NSIGHT_AFTERMATH in cmake. A path to the SDK has to be provided
by the environment variable NSIGHT_AFTERMATH_SDK.

Nsight Aftermath allows an application to generate "minidumps" of the
GPU state when a device loss happens. By analysing these on Nsight we
can know what a game was doing and why it triggered a device loss.

The dump is generated inside %APPDATA%\yuzu\log\gpucrash and this
directory is deleted every time a new instance is initialized with
Nsight enabled.

To enable it on yuzu there has to be a driver and device capable of
running Nsight Aftermath on Vulkan. That means only Turing based GPUs
on the latest stable driver; beta drivers won't work for now.

It is manually enabled in Configuration>Debug>Enable Graphics Debugging
because when using all debugging capabilities there is a runtime cost.
2020-04-14 00:39:21 -03:00
ReinUsesLisp
21dc842171 gl_texture_cache: Fix layered texture attachment base level
The base level is already included in the texture view. If we specify
the base level in the texture again, this will end up in the incorrect
level and potentially out of bounds.
2020-04-13 18:24:56 -03:00
ReinUsesLisp
6cfe2a7246 renderer_vulkan: Remove Nvidia checkpoints 2020-04-13 17:33:59 -03:00
ReinUsesLisp
16105c6a66 renderer_vulkan: Catch device losses in more places 2020-04-13 17:33:59 -03:00
Rodrigo Locatti
7e4a132a77
Merge pull request #3636 from ReinUsesLisp/drop-vk-hpp
renderer_vulkan: Drop Vulkan-Hpp
2020-04-13 17:08:04 -03:00
Mat M
fbf13d3f48
Merge pull request #3651 from ReinUsesLisp/line-widths
gl_rasterizer: Implement line widths and smooth lines
2020-04-13 10:19:59 -04:00
Mat M
08266d70ba
Merge pull request #3638 from ReinUsesLisp/remove-preserve-contents
texture_cache: Remove preserve_contents
2020-04-13 10:19:01 -04:00
Mat M
c4001225f6
Merge pull request #3631 from ReinUsesLisp/more-astc
texture/astc: More small ASTC optimizations
2020-04-13 10:17:32 -04:00
Mat M
7b62212461
Merge pull request #3619 from ReinUsesLisp/i2i
shader/conversion: Implement I2I sign extension, saturation and selection
2020-04-13 10:17:07 -04:00
Mat M
3351e1e94f
Merge pull request #3627 from ReinUsesLisp/layered-view
gl_texture_cache: Attach view instead of base texture for layered attchments
2020-04-13 10:16:18 -04:00
Mat M
d37d899431
Merge pull request #3646 from ReinUsesLisp/fix-glsl-turing
gl_shader_decompiler: Improve generated code in HMergeH*
2020-04-13 10:15:12 -04:00
Mat M
47036859eb
Merge pull request #3633 from ReinUsesLisp/clean-texdec
shader/texture: Remove type mismatches management from shader decoder
2020-04-13 10:13:05 -04:00
ReinUsesLisp
76615b9f34 gl_rasterizer: Implement line widths and smooth lines
Implements "legacy" features from OpenGL present on hardware such as
smooth lines and line width.
2020-04-13 01:30:34 -03:00
ReinUsesLisp
05cf270836 gl_shader_decompiler: Implement merges with bitfieldInsert
This also fixes Turing issues but it avoids doing more bitcasts. This
should improve the generated code while also avoiding more points where
compilers can flush floats.
2020-04-12 22:39:59 -03:00
Fernando Sahmkow
3d91dbb21d
Merge pull request #3578 from ReinUsesLisp/vmnmx
shader/video: Partially implement VMNMX
2020-04-12 10:44:03 -04:00
ReinUsesLisp
75eb953575 gl_shader_decompiler: Improve generated code in HMergeH*
Avoiding bitwise expressions, this fixes Turing issues in shaders using
half float merges that affected several games.
2020-04-12 05:06:55 -03:00
ReinUsesLisp
76f178ba6e shader/video: Partially implement VMNMX
Implements the common usages for VMNMX. Inputs with a different size
than 32 bits are not supported and sign mismatches aren't supported
either.

VMNMX works as follows:
It grabs Ra and Rb and applies a maximum/minimum on them (this is
defined by .MX), taking the input sign into account. This result can
then be saturated. After the intermediate result is calculated, it
applies another operation on it using Rc. These operations are merges,
accumulations or another min/max pass.

This instruction allows implementing, for instance, GCN's min3 and max3
instructions with a more flexible approach.
2020-04-12 00:34:42 -03:00
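The two-stage behavior described in the VMNMX commit above can be modeled roughly as follows. This is a simplified sketch, not the decoder code from the commit: it assumes signed 32-bit inputs only, skips saturation and the merge operations, and treats "accumulation" as a wrapping add of Rc, which is an assumption. All names (VmnmxModel, SecondOp, Min3, Max3) are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Second-stage operation applied with Rc. Merges are left out of this sketch.
enum class SecondOp { Min, Max, Acc };

// Simplified model of VMNMX for signed 32-bit operands.
// is_max selects the first-stage operation (what .MX defines in the commit text).
constexpr std::int32_t VmnmxModel(std::int32_t ra, std::int32_t rb, std::int32_t rc,
                                  bool is_max, SecondOp op) {
    // Stage 1: min or max of Ra and Rb.
    const std::int32_t first = is_max ? std::max(ra, rb) : std::min(ra, rb);

    // Stage 2: combine the intermediate result with Rc.
    switch (op) {
    case SecondOp::Min:
        return std::min(first, rc);
    case SecondOp::Max:
        return std::max(first, rc);
    case SecondOp::Acc:
        // Assumed meaning of "accumulation": add Rc, wrapping on overflow.
        return static_cast<std::int32_t>(static_cast<std::uint32_t>(first) +
                                         static_cast<std::uint32_t>(rc));
    }
    return first;
}

// min3/max3 in the style of GCN, expressed through the model above.
constexpr std::int32_t Min3(std::int32_t a, std::int32_t b, std::int32_t c) {
    return VmnmxModel(a, b, c, false, SecondOp::Min);
}

constexpr std::int32_t Max3(std::int32_t a, std::int32_t b, std::int32_t c) {
    return VmnmxModel(a, b, c, true, SecondOp::Max);
}
```

Under these assumptions, min3 and max3 are just the case where both stages use the same min/max operation.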
ReinUsesLisp
a7baf6fee4 video_core: Add MSAA registers in 3D engine and TIC
This adds the registers used for multisampling. It doesn't implement
anything for now.
2020-04-12 00:21:27 -03:00
ReinUsesLisp
94b0e2e5da texture_cache: Remove preserve_contents
preserve_contents was always true. We can't assume we don't have to
preserve clears because scissored and color masked clears exist.

This removes preserve_contents and assumes it as true at all times.
2020-04-11 01:51:02 -03:00
ReinUsesLisp
2905142f47 renderer_vulkan: Drop Vulkan-Hpp 2020-04-10 22:49:02 -03:00
bunnei
51c6688e21
Merge pull request #3594 from ReinUsesLisp/vk-instance
yuzu: Drop SDL2 and Qt frontend Vulkan requirements
2020-04-10 20:06:55 -04:00
ReinUsesLisp
a87b16da9a shader/texture: Remove type mismatches management from shader decoder
Since commit e22816a5bb we handle type mismatches from the CPU.
We don't need to hack our shader decoder due to game bugs anymore.

Removed in this commit.
2020-04-10 00:57:32 -03:00
Fernando Sahmkow
7182ef31c9
Merge pull request #3622 from ReinUsesLisp/srgb-texture-border
video_core/texture: Use a LUT to convert sRGB texture borders
2020-04-09 18:01:48 -04:00
ReinUsesLisp
6bf5d2b011 astc: Hard code bit depth changes to 8 and use fast replicate 2020-04-09 18:37:12 -03:00
Rodrigo Locatti
36f607217f
Merge pull request #3610 from FernandoS27/gpu-caches
Refactor all the GPU Caches to use VAddr for cache addressing
2020-04-09 17:59:21 -03:00
ReinUsesLisp
bd2c1ab8a0 astc: Use boost's static_vector to avoid heap allocations 2020-04-09 05:27:57 -03:00
ReinUsesLisp
5de130beea astc: Implement a fast precompiled alternative for Replicate 2020-04-09 03:58:25 -03:00
ReinUsesLisp
6b4d4473be astc: Move Replicate to a constexpr LUT when possible 2020-04-09 03:35:07 -03:00
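The general idea behind the Replicate commits above (bit replication computed once into a constexpr lookup table) can be sketched like this. It is an illustration under assumptions, not the code from those commits: the function signature, the table type, and the 5-bit to 8-bit specialization are all hypothetical, and the input is assumed to fit in from_bits with widths well below 32.

```cpp
#include <cstdint>

// Expands a from_bits-wide value to to_bits by repeating its bit pattern
// from the most significant bit down, truncating the last repetition.
constexpr std::uint32_t Replicate(std::uint32_t value, unsigned from_bits, unsigned to_bits) {
    if (from_bits == 0 || to_bits == 0) {
        return 0;
    }
    std::uint32_t result = 0;
    unsigned filled = 0;
    while (filled < to_bits) {
        const unsigned remaining = to_bits - filled;
        const unsigned take = remaining < from_bits ? remaining : from_bits;
        // Append the top `take` bits of the source pattern.
        result = (result << take) | (value >> (from_bits - take));
        filled += take;
    }
    return result;
}

// Lookup table filled at compile time for one fixed pair of widths.
template <unsigned FromBits, unsigned ToBits>
struct ReplicateTable {
    std::uint32_t values[1u << FromBits];

    constexpr ReplicateTable() : values{} {
        for (std::uint32_t i = 0; i < (1u << FromBits); ++i) {
            values[i] = Replicate(i, FromBits, ToBits);
        }
    }
};

// Hypothetical specialization: 5-bit input expanded to 8 bits.
constexpr ReplicateTable<5, 8> kReplicate5To8{};

// Table lookup replaces the loop; the caller must pass value < 32.
constexpr std::uint32_t FastReplicate5To8(std::uint32_t value) {
    return kReplicate5To8.values[value];
}
```

The loop version stays available for widths that have no table, which matches the "when possible" wording of the commit title.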