Journey of Pixels in Adobe Photoshop on Intel HD Graphics
Murali Madhanagopal (Intel), Jerry Harris (Adobe), Yuyan Song (Adobe), Joseph Hsieh (Adobe)
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance
Intel, Intel Inside, the Intel logo, Centrino, Intel Core, Intel Atom, Pentium, and Ultrabook are trademarks of Intel Corporation in the United States and other countries
Legal
Agenda
Photoshop performance on Intel HD Graphics – Murali Madhanagopal
Photoshop GPU Usage – Jerry Harris
Photoshop Blur Gallery – Yuyan Song
OpenCL in Photoshop Creative Cloud – Joseph Hsieh
Conclusion
Demos
Processor Graphics Brief History
Integrated graphics since the late 90’s
Focused on enterprise and entry level consumer graphics
Focus on performance has increased significantly over the past few years
Market segment share (MSS) leader for the past several years
54.8% MSS for desktop and 64.6% MSS for notebooks as of Q3’12
Annual volume of >280Mu, ramp of >150Mu in year of launch
CPU + Gfx on same package since 2010, same die since 2011
Source: Jon Peddie Research, Q3’12
Graphics Strategy - the Change (Past vs. Present)
Past: 1 die (e.g., G45, Core i5/i3). Present: Multi-die strategy; 4th Gen Core i7/5/3 products will have 3 dies (GT1/GT2/GT3).
Past: (n-1) fab technology; CPU on 45nm, GPU on 65nm in 2009; CPU on 32nm, GPU on 45nm in 2010. Present: n generation fab technology; CPU and GPU on 32nm in 2011, 22nm in 2012; CPU and GPU on the same fab technology going forward.
Past: No power sharing. Present: Turbo power sharing; dynamically shift voltage and frequency between CPU and GPU based on demand to maximize perf/watt for CPU and GPU based work.
Past: Less die area for Gfx. Present: Greater die area for graphics; nearly doubling transistor count every year.
Past: (n-1) API support. Present: n generation API support.
Past: Media capabilities. Present: Game changing media capabilities (decode, image processing, encode).
Past: Chipset integration. Present: CPU integration; LLC cache sharing (8MB) for low-latency access to Gfx data.
Higher performance, better features and TTM execution
Workstation pGFX Gen-to-Gen Comparison
E3 (SNB), 2011: 32nm GPU & CPU; 32 GB sys mem, up to 1.5 GB allocated as video RAM; DX 10.1, OGL 3.2; GT2: 12 EUs
E3 v2 (IVB), 2012: 22nm GPU & CPU; 32 GB sys mem, up to 1.5 GB allocated as video RAM; DX 11.0, OCL 1.1, OGL 4.0; GT2: 16 EUs
E3 v3 (HSW), 2013: 22nm GPU & CPU; 32 GB sys mem, up to 1.5 GB allocated as video RAM; DX 11.1, OCL 1.2, OGL 4.1; GT2: 20 EUs, GT3: 40 EUs
Broadwell, 2014: 14nm GPU & CPU; TBD
Broadwell adds significant improvements for GFX applications in 2014
Photoshop Gen to Gen GPU Features
CS4: GPU Canvas/3D interactions
OpenGL: Smooth Zoom, Panning, Canvas Rotate, Pixel Grid, 3D Axis/Lights
Modes: Edit->Preferences->Enable OpenGL Drawing
CS5: GPU enabled UI, Pixel Bender plugin
OpenGL: Scrubby Zoom, HUD Color Picker, Color Sampling Ring, Repousse, 3D Overlays
OpenGL Modes: Basic, Normal, Advanced
CS6: Content Editing
OpenGL: Adaptive Wide Angle, Liquify, Oil Paint, Puppet Warp, Lighting Effects, 3D Enhancements
OpenCL: Field/Iris/Tilt Shift Blur
Creative Cloud (CC)
OpenCL: Smart Sharpen
Modes: Use Graphics Processor, Use OpenCL
Ironlake (G45, 2010) was enabled for Basic mode. SNB, IVB, HSW all support Advanced GL mode.
IVB and HSW support OpenCL acceleration in Photoshop CS6/CC.
Photoshop OGL performance
Photoshop performance scales with EUs and memory bandwidth!
GPU Utility test | Intel® HD Graphics P3000 (SNB GT2) | Intel® HD Graphics P4000 (IVB GT2) | Intel® HD Graphics P4600 (HSW GT2) | Intel® Iris™ Pro Graphics 5200
Birds Eye View test | 60.44 | 88.29 | 89.98 | 187.54
Hand Toss Test | 58.65 | 66.75 | 66.91 | 79.08
Paint Brush Size 300 | 41.07 | 48.88 | 52.29 | 49.41
Paint Brush Size 500 | 37.90 | 47.35 | 43.93 | 46.20
Rotate Test | 67.20 | 98.94 | 79.85 | 92.61
Scrubby Zoom Test | 65.09 | 51.26 | 102.12 | 202.78
Smooth Zoom Test | 60.85 | 94.35 | 85.83 | 204.41
Averages (fps) | 52.69 | 65.51 | 77.69 | 135.69
Gain over previous column | - | 24% | 19% | 74%
* Photoshop CC-64 on Win7-64, 16 GB 1600 MHz DDR3
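The three percentages listed under the averages appear to be the gain of each GPU over the one in the previous column. A minimal sketch of that arithmetic, using only the values from the "Averages (fps)" row (the function name is illustrative, not from the slides):

```python
# Average frame rates (fps) copied from the "Averages (fps)" row.
averages_fps = {
    "HD Graphics P3000 (SNB GT2)": 52.69,
    "HD Graphics P4000 (IVB GT2)": 65.51,
    "HD Graphics P4600 (HSW GT2)": 77.69,
    "Iris Pro Graphics 5200": 135.69,
}


def gen_to_gen_gains(fps_by_gpu):
    """Return the percentage gain of each GPU over the one listed before it."""
    names = list(fps_by_gpu)
    gains = {}
    for prev, cur in zip(names, names[1:]):
        gains[cur] = (fps_by_gpu[cur] / fps_by_gpu[prev] - 1.0) * 100.0
    return gains


gains = gen_to_gen_gains(averages_fps)
for name, pct in gains.items():
    print(f"{name}: +{pct:.1f}%")
```

The results come out at roughly 24.3%, 18.6%, and 74.7%, which is consistent with the 24%, 19%, and 74% shown on the slide.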
Performance (contd)
* Liquify and Blur processing times in seconds, normalized so that the run with GPU acceleration off on HSW GT2 equals 1
The Liquify filter shows a 2.5X improvement and Field Blur a 6X improvement with GPU acceleration
[Charts: Liquify Filter in Photoshop CC, OpenGL On vs. OpenGL Off (axis 0 to 3); Field Blur in Photoshop CC, OpenCL On vs. OpenCL Off (axis 0 to 8)]
What is happening under the hood?
GPUView is a Microsoft tool that shows CPU-GPU interaction
Both CPU and GPU are efficiently utilized
Photoshop is multithreaded and many CPU cores submit work to the GPU
GPU is mostly busy at 70%-98% utilization. EU utilization is 85%.
Memory utilization is at 36% for GT2 and goes down to 10% with GT3e
No stalls in the GPU pipeline.
* Liquify filter applied to 60 MB image
© 2012 Adobe Systems Incorporated. All Rights Reserved.
Photoshop GPU Usage Jerry Harris | Principal Scientist - Photoshop
User Expectations – Mobility World – aka Post WIMP
NUI Evolution - iEnvy moves to the desktop
NUI Evolution - iEnvy moves to mobile x86
Simple Photoshop Layer structure
Virtual Tiled array of planar components
8-bit, 15-bit, and 32-bit float (32f) values
0..64 possible channels
Unassociated Alpha (not preweighted)
Sheet Mask == Alpha
Other masks include User mask, and clipping path
Other layers include placed content
Smart Docs …Other PS docs
3D, Type, Shapes
Movies
Photoshop Layer Stack View Updating (Simplified)
Array of layers
Composite of layers below target layer cached
Layers above the current target layer recomposited
Closest Pyramid level to the view scale is composited
Update occurs one tile at a time
Some edits occur at the max pyramid level
Some previews apply edits at viewing level
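The caching idea in this list can be sketched for a single pixel as follows. This is an illustrative sketch, not Photoshop's implementation: the `LayerStack` class, the `over` helper, and the single-channel `(color, alpha)` pixel representation are all assumptions made for the example. It uses unassociated (straight) alpha, as mentioned on the layer structure slide, and recomposites the target layer together with the layers above it on top of a cached composite of the layers below.

```python
def over(src, dst):
    """Composite src over dst. Both are (color, alpha) with straight alpha."""
    src_c, src_a = src
    dst_c, dst_a = dst
    out_a = src_a + dst_a * (1.0 - src_a)
    if out_a == 0.0:
        return (0.0, 0.0)
    out_c = (src_c * src_a + dst_c * dst_a * (1.0 - src_a)) / out_a
    return (out_c, out_a)


class LayerStack:
    """One pixel of a layer stack, ordered bottom to top (hypothetical)."""

    def __init__(self, layers, target):
        self.layers = list(layers)
        self.target = target
        self._below = None  # cached composite of layers below the target

    def _below_composite(self):
        # Computed once; edits to the target layer never invalidate it.
        if self._below is None:
            result = (0.0, 0.0)
            for layer in self.layers[:self.target]:
                result = over(layer, result)
            self._below = result
        return self._below

    def edit_target(self, value):
        self.layers[self.target] = value

    def composite(self):
        # Only the target layer and the layers above it are recomposited.
        result = self._below_composite()
        for layer in self.layers[self.target:]:
            result = over(layer, result)
        return result


stack = LayerStack([(0.2, 1.0), (0.8, 0.5), (1.0, 0.25)], target=1)
before = stack.composite()
stack.edit_target((0.0, 1.0))
after = stack.composite()
print(before, after)
```

In a tiled document the same bookkeeping would apply per tile, so an edit touches only the tiles it covers.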
Photoshop GL Texture Usage- Data Structure
Sparse Tiled Pyramid
Tiles are 360x360
Texture2D
Advanced Mode – internal format GL_RGBA16F_ARB
Shader for Checker board compositing
Shader for Tone mapping
Shader for Color Matching
Non-Advanced – internal format GL_RGBA8
Pixels are Screen Ready
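A sparse tiled pyramid needs two lookups before any texture is touched: which level to use for the current view scale, and which 360x360 tiles the view covers at that level. The sketch below illustrates both under stated assumptions: each level halves the resolution of the previous one, "closest" is measured on a log2 axis, and the function names are hypothetical. None of these details are given on the slide.

```python
import math

TILE = 360  # tile edge in pixels, as stated on the slide


def closest_level(view_scale, num_levels):
    """Pick the level whose scale (1 / 2**level) is nearest to view_scale
    on a log2 axis. Level 0 is full resolution."""
    level = round(-math.log2(view_scale))
    return max(0, min(num_levels - 1, level))


def visible_tiles(view_rect, level):
    """Return (col, row) indices of the tiles touched by view_rect.

    view_rect is (x0, y0, x1, y1) in full-resolution document pixels,
    half-open, with integer coordinates.
    """
    x0, y0, x1, y1 = view_rect
    step = 2 ** level                           # document pixels per level pixel
    lx0, ly0 = x0 // step, y0 // step           # floor to level pixels
    lx1, ly1 = -(-x1 // step), -(-y1 // step)   # ceil to level pixels
    cols = range(lx0 // TILE, (lx1 - 1) // TILE + 1)
    rows = range(ly0 // TILE, (ly1 - 1) // TILE + 1)
    return [(c, r) for r in rows for c in cols]


level = closest_level(0.667, num_levels=5)
print(level, visible_tiles((0, 0, 720, 360), level))
```

Because the pyramid is sparse, a real implementation would presumably keep only the tiles returned by a lookup like this resident as textures.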
GL Canvas - User visible features
Smooth zoom animations
Smooth canvas toss animations
Canvas Rotation
Temporal refinement when panning large images (blurry to sharp)
Pixel Grid
Improved filtering at non-power-of-two zoom levels
Antialiasing of Paths, Shapes, and Overlays
HDR Tone Mapping
ACE Color Matching
Brush Resizing feedback
GPU Tiled Pyramid Update Management
Just prior to redraw – load as many tiles as possible at the level of the pyramid that best matches the viewing conditions, followed by a rendering + swapbuffers.
Idle time – Back fill pyramid with no updates
When a navigation modality is engaged start prefetching
Pan Tool – Around the periphery of the current view frustum
Zoom Tool – Above and/or below the current view frustum
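The two prefetch patterns can be expressed as small set computations over tile indices. This is a sketch under assumptions: the visible region is given as an inclusive tile range, the pan margin is one tile, and the zoom case simply returns the neighbouring pyramid levels. The function names and the margin are hypothetical.

```python
def pan_prefetch_ring(visible, grid_cols, grid_rows, margin=1):
    """Tiles bordering the visible range, clamped to the tile grid.

    visible is (col0, row0, col1, row1), inclusive tile indices.
    """
    col0, row0, col1, row1 = visible
    ring = []
    for r in range(max(0, row0 - margin), min(grid_rows - 1, row1 + margin) + 1):
        for c in range(max(0, col0 - margin), min(grid_cols - 1, col1 + margin) + 1):
            inside = col0 <= c <= col1 and row0 <= r <= row1
            if not inside:
                ring.append((c, r))
    return ring


def zoom_prefetch_levels(level, num_levels):
    """Pyramid levels directly above and below the current one."""
    return [n for n in (level - 1, level + 1) if 0 <= n < num_levels]


print(pan_prefetch_ring((0, 0, 1, 1), grid_cols=4, grid_rows=4))
print(zoom_prefetch_levels(2, num_levels=5))
```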
The toll for Full screen immersion
2650x1600 monitor in full screen mode at 66.7%: 4016 x 2424
8 bit – 200 megs per frame = 12 gigs per second
16 bit – 400 megs per frame = 24 gigs per second
32 bit – 800 megs per frame = 48 gigs per second
3840 x 2160 monitor in full screen mode at 66.7%: 5818 x 3272
8 bit – 400 megs per frame = 24 gigs per second
16 bit – 800 megs per frame = 48 gigs per second
32 bit – 1600 megs per frame = 96 gigs per second
5 passes over data - COW – Modify – Composite – Interleave - Upload
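The per-frame figures above can be reproduced approximately with a short calculation. The slide gives the pixel dimensions, the bit depths, and the five passes; the four channels per pixel and the 60 frames per second used below are assumptions chosen because they make the numbers line up, and are not stated on the slide.

```python
def frame_traffic(width, height, bytes_per_channel,
                  channels=4, passes=5, fps=60):
    """Return (bytes moved per frame, bytes moved per second).

    channels and fps are assumed values; passes comes from the slide.
    """
    per_frame = width * height * channels * bytes_per_channel * passes
    return per_frame, per_frame * fps


for label, (w, h) in {"4016 x 2424": (4016, 2424),
                      "5818 x 3272": (5818, 3272)}.items():
    for bits in (8, 16, 32):
        per_frame, per_second = frame_traffic(w, h, bits // 8)
        print(f"{label}, {bits} bit: {per_frame / 1e6:.0f} MB per frame, "
              f"{per_second / 1e9:.1f} GB per second")
```

Under these assumptions the 8-bit case at 4016 x 2424 comes to about 195 MB per frame and 11.7 GB per second, close to the rounded 200 megs and 12 gigs on the slide.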
Future Directions
Avoid reincarnating the pixel data.
Adopt the INTEL_map_texture extension
Exploit OpenGL/OpenCL interop
Photoshop Blur Gallery Yuyan Song, Computer Scientist, Adobe Systems Inc.
Why OpenCL
Only cross-platform GPU computing solution
Advantages over OpenGL: learning curve, data format, debugging
Increasing maturity and ubiquity
Blur Gallery Demo
Field Blur, Iris Blur, Tilt Shift
How did we do it?
OpenCL kernels were ported from optimized CPU code
Image is broken into 2K x 2K blocks for the GPU
A 1K x 1K scaled-down image is used for mouse-down interaction
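The blocking and preview sizing described here can be sketched in a few lines. The sketch assumes that "2K" and "1K" mean 2048 and 1024 pixels and that the preview is scaled to fit within 1K on its longer side while keeping the aspect ratio. Both function names are hypothetical, and the 5616 x 3744 size is borrowed from the 21 MP benchmark image mentioned later in the talk.

```python
def blocks(width, height, block=2048):
    """Yield (x, y, w, h) for each block, row by row. Edge blocks are smaller."""
    for y in range(0, height, block):
        for x in range(0, width, block):
            yield (x, y, min(block, width - x), min(block, height - y))


def preview_size(width, height, limit=1024):
    """Size of a scaled-down preview whose longer side fits within limit."""
    scale = min(1.0, limit / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))


tiles = list(blocks(5616, 3744))
print(len(tiles), tiles[-1], preview_size(5616, 3744))
```

A blur kernel reads neighbouring pixels, so a real implementation would likely pad each block by the blur radius; the slide does not say how block borders are handled.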
Challenges
Need good candidate algorithms
Bandwidth
Compute
Parallel
Need debugged C algorithm first
Challenges
Issue in using multiple command queues in multiple threads
Resource limits
Win/Mac timeout issues on low end cards
Out of memory issues on low end cards
Platform variation
Driver issues
Various compiler issues
Performance Comparison
Systems of standard configuration show 4-8x gain for typical use-cases
Gains improve with blur radii (results from Intel® Iris™ Pro Graphics 5200 running Windows 7 listed below)
General application processing accounts for majority of time in smaller workloads
Radius in pixels (21 MP image) | OpenCL on | OpenCL off
100 | 5.2 s | 19.6 s
250 | 7.8 s | 53.8 s
500 | 17.3 s | 120.8 s
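The speedup at each radius follows directly from the timings listed above. A small sketch of the division, with the times copied from the table and the variable names chosen for illustration:

```python
# Blur radius in pixels -> (seconds with OpenCL on, seconds with OpenCL off)
timings = {
    100: (5.2, 19.6),
    250: (7.8, 53.8),
    500: (17.3, 120.8),
}

speedups = {radius: off / on for radius, (on, off) in timings.items()}
for radius, factor in speedups.items():
    print(f"radius {radius}: {factor:.1f}x")
```

The ratios work out to roughly 3.8x, 6.9x, and 7.0x, which matches the statement that gains improve with blur radius.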
OpenCL in Photoshop CC Joseph Hsieh, Computer Scientist II, Adobe Systems Inc.
Renovated Smart Sharpen Feature
Adobe renovated the legacy Smart Sharpen by introducing a patch-based denoise and sharpen algorithm. The new patch-based algorithm produces a sharpened image without halo effects. Furthermore, the denoise step suppresses the noise boost that sharpening otherwise causes.
[Figure: Original Picture]
[Figure: After Applying the Legacy Smart Sharpen]
[Figure: After Applying the Patch Based Smart Sharpen]
Challenges
Our patch-based denoise algorithm is heavily memory bound. We cannot cache a portion of the input image in local memory.
Attempted Solutions
1. For each pixel, is there redundant work in the comparison of up to 65 patches?
2. Maybe use local memory in some way to relieve the pressure on global memory.
3. The intuitive approach of using read_imagef() is slower on lower-end GPUs than on high-end CPUs (even with a 90%+ cache hit rate).
Benchmark
7x faster than the CPU alone, depending on the video adapter. But…
[Chart: processing time with OCL On vs. OCL Off at denoise amounts of 1%, 10%, and 100%; bar labels as extracted: 3.1 s, 3.8 s, 3.4 s, 5.4 s, 3.7 s, 8.4 s]
Intel HD P4600 (GPU) vs. Xeon E3-1285 3.6 GHz (CPU)
Win 8 with Blur Radius 0 using 5616x3744 (21M pixels) RGB image
Benchmark (contd)
The OCL pipeline is not yet optimized in this version. The chart shows that running the denoise step on OpenCL already all but cancels the performance impact of the denoise step (~1 s).
Ongoing Work
Full Smart Sharpen feature pipeline on OpenCL.
Apply OpenCL to other features.
Conclusion
Photoshop performance on Xeon E3 v3/HSW GT2 (Intel® HD Graphics 4600) is on par with an entry-level professional card.
Intel® Iris™ Pro Graphics performance at 48 W compares to a mid-level discrete card.
HSW Ultrabooks at 15 W can handle mid-sized images (21 MP).
OpenCL makes it possible for Adobe to introduce more advanced features to our customers without compromising the user experience and responsiveness of our tools.
Intel and Adobe are working together to enable Photoshop to travel with you in the new mobile form factors.