2 Vulkan Case Study.pdf

  • Vulkan Case Study, 2016 Khronos Seoul DevU, SAMSUNG Electronics

    Soowan Park Graphics Engineer (soft.park@samsung.com)

    Joonyong Park Senior Graphics Engineer (jrdn.park@samsung.com)

  • Samsung Electronics

    Before the start

    All case-study information and content is based on our development experience with the Galaxy S7 across its two chipset variants, which use the ARM Mali and Qualcomm Adreno GPUs.

  • Who we are

    GPU, Graphics R&D, MCD, SAMSUNG Electronics. gamedev@samsung.com

  • What we did

    ProtoStar, HIT, NFS, Vainglory

    MWC, GDC, SDC, E3, Gamescom, CEDEC

  • History

  • Agenda

    1. Swapchain

    2. Uniform Buffer

    3. GPU Driver

    4. Rendering

    5. GLES Fall-back

    6. Development Tip

  • Who is this for?

    Android Vulkan developers.

    These are very simple cases, but important!

  • 1. Swapchain

  • Swapchain - Android

    Triple buffering: Google Project Butter (applied since the Android 4.1 Jelly Bean release). Android OpenGL ES runs with triple buffering by default.

    Buffer order: #0 #1 #2 #0 #1. The user cannot control the number of back buffers in OpenGL ES.

    adb shell dumpsys SurfaceFlinger

    Image count of the swapchain: for this reason, the Android platform needs at least 3 swapchain images to perform well.

    With Java SurfaceView: currently Android Vulkan only supports native activity. However, you can use SurfaceView and a Java activity by passing the surface handle to native code through JNI to obtain the NativeWindow handle.

    q.v.: https://developer.android.com/ndk/reference/group___native_activity.html

    We recommend running the main render loop on a separate Java-side render thread, as GLSurfaceView does.
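    The image-count advice above can be turned into a small helper. This is a minimal sketch, not the presenters' code: chooseSwapchainImageCount is a hypothetical name, and its two parameters stand in for the minImageCount and maxImageCount fields of VkSurfaceCapabilitiesKHR (where a maxImageCount of 0 means there is no upper limit). The result would go into VkSwapchainCreateInfoKHR::minImageCount.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: ask for triple buffering, but stay inside the range
// the surface reports. Parameters mirror VkSurfaceCapabilitiesKHR fields.
uint32_t chooseSwapchainImageCount(uint32_t surfaceMinImageCount,
                                   uint32_t surfaceMaxImageCount)
{
    const uint32_t kPreferredImageCount = 3; // triple buffering, as on the slide

    // Never request fewer images than the surface requires.
    uint32_t count = std::max(kPreferredImageCount, surfaceMinImageCount);

    // A maximum of 0 means "no upper limit" in the Vulkan WSI specification.
    if (surfaceMaxImageCount != 0)
        count = std::min(count, surfaceMaxImageCount);

    return count;
}
```

    A surface that reports a minimum of 2 and no maximum yields 3; a surface capped at 2 images yields 2, since the cap takes priority over the preference.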

  • Swapchain - Presentation Mode

    VK_PRESENT_MODE_MAILBOX_KHR

    [Figure: swapchain images #0, #1, #2 and an internal queue (implementation dependent) with a single entry X. Each vkAcquireNextImage / vkQueuePresent pair overwrites that entry: X=#0, then X=#1, then X=#2. At VBLANK the display controller reads from #1. The figure also marks the latency.]

  • Swapchain - Presentation Mode

    VK_PRESENT_MODE_FIFO_KHR

    [Figure: swapchain images #0, #1, #2 and an internal queue with entries X, Y, Z. Each vkAcquireNextImage / vkQueuePresent pair fills the next entry: X=#0, then Y=#1, then Z=#2. At VBLANK, #0 stored in X is swapped with the back buffer. The figure also marks the latency.]
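    To make the difference between the two queue behaviours concrete, here is a small simulation in plain C++. It is a sketch under stated assumptions, not Vulkan code: PresentQueue, Mode, present and vblank are hypothetical names, images are just integer indices, and the model ignores image acquisition and blocking. In FIFO mode every presented image waits its turn; in MAILBOX mode a newer image replaces the one still waiting.

```cpp
#include <deque>

enum class Mode { Fifo, Mailbox };  // stand-ins for VK_PRESENT_MODE_*_KHR

// Toy model of the presentation engine's internal queue.
class PresentQueue {
public:
    explicit PresentQueue(Mode mode) : mode_(mode) {}

    // Models vkQueuePresentKHR: queue image `index` for display.
    void present(int index) {
        if (mode_ == Mode::Mailbox)
            pending_.clear();       // single slot: the newest image wins
        pending_.push_back(index);  // FIFO: append behind older images
    }

    // Models one vertical blank: returns the image scanned out next,
    // or -1 if nothing is pending (the previous image stays on screen).
    int vblank() {
        if (pending_.empty())
            return -1;
        int index = pending_.front();
        pending_.pop_front();
        return index;
    }

private:
    Mode mode_;
    std::deque<int> pending_;
};
```

    Presenting images 0, 1 and 2 before a single vertical blank shows image 0 first in the FIFO model, with images 1 and 2 following on later blanks. In the MAILBOX model the same sequence shows image 2, and images 0 and 1 are never displayed.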

  • Swapchain - Presentation Mode

    VK_PRESENT_MODE_MAILBOX_KHR

    VK_PRESENT_MODE_FIFO_KHR

    Do NOT use MAILBOX mode in games unless latency is critical and you know what you're doing.

    [Figure: one chart per mode, each marked with a 60 FPS line.]

  • Swapchain - Presentation Mode

    Code level (q.v.: https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/xhtml/vkspec.html, 29.5. Surface Queries)

    // Query the present modes the surface supports (count first, then data).
    uint32_t presentModeCount = 0;
    vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, nullptr);
    std::vector<VkPresentModeKHR> pPresentModes(presentModeCount);
    vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, pPresentModes.data());

    // Default to FIFO, which every implementation is required to support.
    VkPresentModeKHR presentMode = VK_PRESENT_MODE_FIFO_KHR;

    // Candidate modes in order of preference.
    const VkPresentModeKHR desiredPresentMode[] =
    {
        VK_PRESENT_MODE_FIFO_KHR,
        VK_PRESENT_MODE_MAILBOX_KHR
    };

    // Take the first desired mode that the surface supports.
    bool found = false;
    for (VkPresentModeKHR desired : desiredPresentMode)
    {
        for (VkPresentModeKHR supported : pPresentModes)
        {
            if (supported == desired)
            {
                presentMode = desired;
                found = true;
                break;
            }
        }
        if (found) break;
    }

  • Swapchain - SwapBuffer Comparison (Android)

    WSI (Window System Integration)

    [Figure: two timelines for rendering frame N, one for OpenGL ES and one for Vulkan. Each has APPLICATION, SURFACE FLINGER and DISPLAY lanes and an associated native window whose window buffers are dequeued and queued.]

    RENDER FRAME N (OpenGL ES)

    Buffers: EGLSurface GfxBuffer #0, #1, #2.

    glClear / glDrawXXX #0: render into the back buffer (framebuffer 0) #0.

    eglSwapBuffers #0, glFlush() #0: command flushing and rendering.

    There is no way to get GPU rendering completion.

    RENDER FRAME N (Vulkan)

    Buffers: VkImage (buffer) #0, #1, #2.

    vkAcquireNextImageKHR #0.

    vkQueueSubmit #0: commands recorded into command buffer #0 are submitted to the associated graphics queue (command flushing and rendering).

    vkQueuePresentKHR #0.

    Labels in the figure: rendering-complete semaphore, COMPLETE!, internal wait, and two points marked "WILL BLOCK HERE".

    The application does the blocking wait to sync with the GPU (VK_PRESENT_MODE_FIFO_KHR).

    The GPU rendering completion signal can be obtained explicitly by using the fence from the submit.

  • Swapchain - Failed synchronization case

    Tearing

  • Swapchain - Synchronization

    Fence logic

    VkCommandPool (single thread)

    Swapchain: VkImage #0, VkImage #1, VkImage #2

    Each swapchain image has its own fence and command buffer: VkFence #0 and VkCommandBuffer #0, VkFence #1 and VkCommandBuffer #1, VkFence #2 and VkCommandBuffer #2.

    The same sequence runs for each image in turn (i = 0, 1, 2):

    vkWaitForFences(fence #i)

    vkResetFences(fence #i)

    vkResetCommandBuffer(buf #i)

    vkBeginCommandBuffer(buf #i)

    Render ~

    vkQueueSubmit(fence #i)

    vkQueuePresentKHR
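    The ordering of the fence logic above can be sketched without a GPU. The following is a plain C++ model under loud assumptions: FrameSlot, renderFrame and the boolean fenceSignaled are hypothetical stand-ins for a VkFence / VkCommandBuffer pair and the calls listed on the slide, and the modelled "GPU" finishes instantly. One point it does illustrate: each fence has to start out signaled (in real code, created with VK_FENCE_CREATE_SIGNALED_BIT), otherwise the first wait on a slot would never return.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Stand-in for one VkFence + VkCommandBuffer pair tied to a swapchain image.
struct FrameSlot {
    bool fenceSignaled = true;   // models VK_FENCE_CREATE_SIGNALED_BIT
    uint64_t recordedFrame = 0;  // which frame the command buffer holds
};

constexpr std::size_t kImageCount = 3;

// Runs the slide's per-image sequence for `frameNumber` on slot `imageIndex`.
void renderFrame(std::array<FrameSlot, kImageCount>& slots,
                 std::size_t imageIndex, uint64_t frameNumber)
{
    FrameSlot& slot = slots.at(imageIndex);

    // vkWaitForFences: the real call blocks; this model only checks the state.
    if (!slot.fenceSignaled)
        throw std::logic_error("command buffer still in use by the GPU");

    slot.fenceSignaled = false;        // vkResetFences
    slot.recordedFrame = frameNumber;  // vkResetCommandBuffer, vkBeginCommandBuffer, render

    // vkQueueSubmit(fence): the modelled GPU completes immediately and
    // signals the fence, so the slot can be reused on its next turn.
    slot.fenceSignaled = true;

    // vkQueuePresentKHR would follow here.
}
```

    Calling renderFrame with imageIndex cycling through 0, 1, 2 reuses each command buffer only after its own fence has been waited on, which is the guarantee the per-image fences exist to provide.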

  • Image Layout - Swapchain

    Transition to the correct image layout for rendering and for presenting.

    At the very beginning of drawing, after the first acquire:

    vkGetSwapchainImagesKHR (VK_IMAGE_LAYOUT_UNDEFINED) -> VK_IMAGE_LAYOUT_GENERAL -> clear presentable image

    Draw routine:

    Acquire -> VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL -> Render -> VK_IMAGE_LAYOUT_PRESENT_SRC_KHR -> Present

    // Create swapchain
    vkGetSwapchainImagesKHR(device, swapchain, &swapchainImageCount, pSwapchainImages); // images start in VK_IMAGE_LAYOUT_UNDEFINED

    // Frame loop
    swapchainIndex = acquire();
    if (firstAcquire)
    {
        // One-time setup: move every image out of UNDEFINED and clear it.
        setImagesLayout(pSwapchainImages, swapchainImageCount, VK_IMAGE_LAYOUT_GENERAL);
        clearImages(pSwapchainImages, swapchainImageCount);
        firstAcquire = false; // so the setup does not run again on later frames
    }
    setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
    /* Rendering */
    setImageLayout(pSwapchainImages[swapchainIndex], VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
    present(swapchainIndex);

  • Image Layout - Texture

    VK_IMAGE_TILING_LINEAR: create with VK_IMAGE_LAYOUT_PREINITIALIZED -> set image data using vkMapMemory / vkUnmapMemory -> VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

    VK_IMAGE_TILING_OPTIMAL: create with VK_IMAGE_LAYOUT_UNDEFINED -> VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL -> set image data using a staging buffer -> VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

    You can check the format properties like this:

    // Get image format properties
    VkFormatProperties formatProperty;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperty);
    if (formatProperty.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
        /* sampling is supported with optimal tiling */;
    else if (formatProperty.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
        /* sampling is supported with linear tiling */;

    [Figure: Texturing in Vulkan, relating VkDescriptorSet, VkDescriptorImageInfo, VkSampler, VkImageView, VkImage and VkDeviceMemory.]
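    The format-property check can be wrapped in a helper that picks a tiling mode. The sketch below avoids the Vulkan headers so that it stands alone: Tiling, chooseTiling and kSampledImageBit are hypothetical stand-ins for VkImageTiling, the slide's if/else, and VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT (whose value is 0x1 in the Vulkan headers). Preferring optimal tiling follows the order of the checks on the slide.

```cpp
#include <cstdint>

// Stand-in for VkImageTiling, plus a value for "format cannot be sampled".
enum class Tiling { Optimal, Linear, Unsupported };

// Stand-in for VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT.
constexpr uint32_t kSampledImageBit = 0x00000001;

// The two arguments mirror VkFormatProperties::optimalTilingFeatures and
// VkFormatProperties::linearTilingFeatures.
Tiling chooseTiling(uint32_t optimalTilingFeatures, uint32_t linearTilingFeatures)
{
    if (optimalTilingFeatures & kSampledImageBit)
        return Tiling::Optimal;      // upload through a staging buffer
    if (linearTilingFeatures & kSampledImageBit)
        return Tiling::Linear;       // upload by mapping the image memory
    return Tiling::Unsupported;      // pick another format
}
```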

  • Image Layout - Texture

    Why should we use a staging buffer?

    VK_IMAGE_TILING_LINEAR

    Texels are laid out in memory in row-major order, possibly with some padding on each row, so you can compute a texel's address with the equations below.

    Common formats:

    // (x,y,z,layer) are in texel coordinates
    address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset;

    Compressed formats:

    // (x,y,z,layer) are in compressed texel block coordinates
    address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*compressedTexelBlockByteSize + offset;

    VK_IMAGE_TILING_OPTIMAL

    Texels are laid out in an implementation-dependent arrangement, for more optimal memory access.

    [Figure: two VkImage (VkDeviceMemory) diagrams, the second marked "?".]
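    For linear tiling, the address equation for uncompressed formats translates directly into code. This is a minimal sketch: LinearLayout is a hypothetical struct whose fields mirror the ones in VkSubresourceLayout (which vkGetImageSubresourceLayout fills in), and texelOffset is an illustrative name. It returns a byte offset from the start of the image's bound memory.

```cpp
#include <cstdint>

// Hypothetical mirror of the VkSubresourceLayout fields used by the equation.
struct LinearLayout {
    uint64_t offset;      // byte offset of the subresource
    uint64_t rowPitch;    // bytes between rows (may include padding)
    uint64_t arrayPitch;  // bytes between array layers
    uint64_t depthPitch;  // bytes between depth slices
};

// Byte offset of texel (x, y, z, layer) in an uncompressed, linearly tiled image.
uint64_t texelOffset(const LinearLayout& layout, uint64_t texelSize,
                     uint64_t x, uint64_t y, uint64_t z, uint64_t layer)
{
    return layer * layout.arrayPitch
         + z * layout.depthPitch
         + y * layout.rowPitch
         + x * texelSize
         + layout.offset;
}
```

    For example, with a 4-byte texel, a row pitch of 256 bytes and a zero offset, texel (3, 2) of the first slice and layer sits at 2*256 + 3*4 = 524 bytes. No such formula is available to the application for optimal tiling, which is the motivation for copying the data in through a staging buffer.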

  • Image Layout - Texture

    How do we use a staging buffer?

    vkCmdCopyBufferToImage

    VkImage with VK_IMAGE_TILING_OPTIMAL

    // Copy the pixel data into a staging buffer, then record a buffer-to-image copy.
    VkBuffer& stagingBuffer = getStagingBuffer(imageBufferSize);
    VkBufferImageCopy region = getRegionFromImage(image);
    fillBuffer(stagingBuffer, pImageData);
    vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);

    DO NOT use VK_MEMORY_PROPERTY_HOST_VISIBLE