Real-Time CPU
Based H.265/HEVC Encoding
Solution with Intel® Platform
Technology
Yang Lu
Intel Corporation
Shanghai, PRC
2013.12
White Paper: Real-Time CPU Based H.265/HEVC Encoding Solution with Intel® Platform Technology
Contents
1. Abstract
2. Video Codec Introduction
3. H.265/HEVC Performance Issues
4. Real-time HEVC Encoder Solution Based on Intel® Xeon™ Platform
5. Summary
Reference
1. Abstract
The International Telecommunication Union (ITU) has announced a new video codec standard,
High Efficiency Video Coding (HEVC)/H.265, which is claimed to be about 50 percent
more efficient than the current H.264/MPEG-4 AVC standard. However, the algorithms and
data structures of H.265 are more than 4 times as complex as those of H.264, which means
that an H.265 codec requires far more computing resources and power than its predecessor.
In this paper we investigate the characteristics of the HEVC codec and focus on CPU-based
software video transcoding, which provides the best video quality and the most flexible
programming model. We then maximize the capabilities of IA platforms for a single HEVC
codec in order to achieve real-time HEVC encoding on the IA platform.
2. Video Codec Introduction
Video coding standards have evolved primarily through the development of the
well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263,
ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly
produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC)
standards [1].
Figure 1. Video Standard/Codec Evolution
H.265/HEVC (High-Efficiency Video Coding), introduced last year, is the latest video
codec standard developed by ISO/IEC and ITU-T. It aims to maximize compression
capability and reduce data loss. H.265/HEVC doubles the compression ratio compared
to the previous H.264/AVC standard at the same subjective quality. HEVC
technology helps online video providers deliver high-quality video with less
bandwidth, making it the next video codec revolution.
HEVC introduces several new coding syntax structures and algorithms to achieve this
coding efficiency [1][2]:
a) Random Access and Bitstream Splicing Features
The new design supports special features to enable random access and bitstream splicing.
In H.264/MPEG-4 AVC, a bitstream must always start with an IDR access unit, whereas
HEVC supports random access at additional picture types.
b) Coding Tree Units Structure
A picture is partitioned into coding tree units (CTUs), each of which consists of a luma
CTB and the corresponding chroma CTBs. A luma CTB covers L×L samples, where L may be
equal to 16, 32, or 64 as determined by an encoded syntax element specified in the SPS.
The CTU contains a quadtree syntax that allows the CTBs to be split into CBs of an
appropriate size based on the signal characteristics of the region that is covered by
the CTB. All previous video coding standards use a fixed array size of 16×16 luma
samples, whereas HEVC supports variable-size CTBs selected according to the needs of
encoders in terms of memory and computational requirements.
c) Tree-Structured Partitioning Into Transform Blocks and Units
A CB can be recursively partitioned into transform blocks (TBs). The partitioning is
signaled by a residual quadtree. In contrast to previous standards, the HEVC design
allows a TB to span across multiple PBs for interpicture-predicted CUs to maximize the
potential coding efficiency benefits of the quadtree-structured TB partitioning.
d) Intrapicture Prediction
Directional prediction with 33 different directional orientations is defined for (square)
transform block (TB) sizes from 4×4 up to 32×32. The prediction directions span a wide
range of angles. HEVC supports various intrapicture predictive coding methods referred to
as Intra_Angular, Intra_Planar, and Intra_DC.
This advanced coding standard demands extremely high processing capabilities from
both client devices and back-end transcoding servers.
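The quadtree partitioning described in item b) can be illustrated with a minimal sketch. It assumes a hypothetical split criterion (the spread between the brightest and darkest sample in a block) and hypothetical names `sample_range` and `count_coding_blocks`; a real encoder would typically decide splits from rate-distortion cost, and nothing here is taken from the HM or Strongene code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split criterion: spread between the brightest and the
 * darkest sample inside the square block at (x0, y0). */
static int sample_range(const uint8_t *pic, int stride,
                        int x0, int y0, int size)
{
    int lo = 255, hi = 0;
    for (int y = y0; y < y0 + size; y++) {
        for (int x = x0; x < x0 + size; x++) {
            int v = pic[y * stride + x];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
    }
    return hi - lo;
}

/* Recursively splits a block into four quadrants while it is "busy"
 * (range above threshold) and larger than min_size.
 * Returns the number of leaf coding blocks produced. */
static int count_coding_blocks(const uint8_t *pic, int stride,
                               int x0, int y0, int size,
                               int min_size, int threshold)
{
    assert(min_size > 0 && size >= min_size);
    if (size > min_size &&
        sample_range(pic, stride, x0, y0, size) > threshold) {
        int half = size / 2;
        return count_coding_blocks(pic, stride, x0, y0, half, min_size, threshold)
             + count_coding_blocks(pic, stride, x0 + half, y0, half, min_size, threshold)
             + count_coding_blocks(pic, stride, x0, y0 + half, half, min_size, threshold)
             + count_coding_blocks(pic, stride, x0 + half, y0 + half, half, min_size, threshold);
    }
    return 1; /* leaf: this block is coded as a single CB */
}
```

For a flat 64×64 block the function returns a single coding block; detail in one corner causes only that corner to be refined down to the minimum size, which is the adaptivity the CTU structure is meant to provide.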
3. H.265/HEVC Performance Issues
The current HEVC HM project implements only the major functionality of the standard, and
its real performance is still far from what production deployment requires:
− no parallel scheme
− poor vectorization tuning
Figure 2. HM project profiling – thread concurrency
Figure 3. HM project profiling – hot code
This HEVC encoder consumes over 100 times the CPU resources of H.264 on the server side,
and more than 10 times the CPU power on the client side.
The H.265/HEVC codec has attracted many groups and agencies worldwide to optimize its
performance and push it toward real deployment. Several open-source projects exist:
a) OpenHEVC (currently HM10.0 compatible, with some optimization of the decoder)
https://github.com/OpenHEVC/openHEVC
b) x265 (compatible with HM, with optimization of parallelism and SIMD)
http://code.google.com/p/x265/
https://bitbucket.org/multicoreware/x265/wiki/Home
We take a 720p 24 fps video to evaluate the x265 encoder performance on an Intel(R)
Xeon(R) Sandy Bridge platform (E5-2680 @ 2.70GHz, 8*2 physical cores). This codec does a
lot of work to optimize the original implementation through both task and data
parallelism. However, in our benchmarking it uses the capacity of only about 6 cores on a
system with 32 logical cores in total (SMT ON), so it cannot fully exploit the computing
resources of current multi-core platforms.
Figure 4. CPU usage of the x265 project
Figure 5. SIMD tuning of the x265 project
In the x265 project, SIMD instructions are used for vectorization tuning, which
contributes a 70+% performance speedup here. With further icc compiler optimization, we
obtain a 2x speedup on the IA platform in total. However, the encoder performance is
still far from what real-time encoder deployment requires, especially for HD 1080p video.
In the PRC, more than 20 multimedia ISVs are looking for a usable HEVC solution and
platform to reduce the cost of online video services while maintaining quality.
Figure 6: Online video market in the PRC
4. Real-time HEVC Encoder Solution Based on Intel® Xeon™ Platform
Video encoding is a typical CPU- and memory-intensive workload that demands a lot from
the server platform in terms of core computing efficiency, reliability, and stability.
The computing complexity of the H.265/HEVC codec is more than 4 times that of the
previous H.264/MPEG-4 AVC, which raises unprecedented processing requirements for the
back-end server platform. In this section, we introduce the major IA technologies that
help the Strongene [3] HEVC codec reach real-time 1080p encoding.
4.1 SIMD Vectorization Tuning for HEVC Encoding Functions
Most of the time-consuming functions in video and image processing are block-based,
data-intensive computations, which can be optimized with the IA SIMD (single instruction,
multiple data) vector instructions. A SIMD instruction processes multiple data elements
at once, which greatly improves data throughput and execution efficiency. SIMD is widely
supported on x86 processors and has evolved from MMX through SSE and AVX to AVX2 across
x86 platform generations.
We take a common 64*64 block computation in video/image processing (a sum of absolute
differences, SAD) as an example to demonstrate how SSE and AVX2 intrinsics can be used to
optimize the original code:
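Code example for 64*64 block computing
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "smmintrin.h"
#include "immintrin.h"
/* Assumptions, not defined in the original listing: 8-bit samples and an
   encode-buffer stride of 64 bytes (wide enough for a 64x64 block). */
typedef uint8_t pixel;
#define FENC_STRIDE 64
/********* original block computing serial scalar computing ************/
/* Fast SAD: only every second row is summed and the result is doubled. */
#define PIXEL_SAD_C( func_type, name, lx, ly ) \
func_type int name( pixel *pix1, int i_stride_pix1, pixel *pix2, int i_stride_pix2 ) \
{ \
    int sum = 0; \
    int x, y; \
    for( y = 0; y < ly; y+=2 ) \
    { \
        for( x = 0; x < lx; x++ ) \
        { \
            sum += abs( pix1[x] - pix2[x] ); \
        } \
        pix1 += i_stride_pix1<<1; \
        pix2 += i_stride_pix2<<1; \
    } \
    return sum << 1; \
}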
PIXEL_SAD_C( static, LENT_sad_64x64_c, 64, 64 )
PIXEL_SAD_C( static, LENT_sad_32x32_c, 32, 32 )
/* Four-candidate SAD: compares fenc against four reference blocks at once. */
#define SAD4( w, h ) \
static void LENT_sad4_##w##x##h##_c( pixel *fenc, pixel *p0, pixel *p1, pixel *p2, \
                                     pixel *p3, int i_stride, int cost[4] ) \
{ \
    cost[0] = LENT_sad_##w##x##h##_c( fenc, FENC_STRIDE, p0, i_stride ); \
    cost[1] = LENT_sad_##w##x##h##_c( fenc, FENC_STRIDE, p1, i_stride ); \
    cost[2] = LENT_sad_##w##x##h##_c( fenc, FENC_STRIDE, p2, i_stride ); \
    cost[3] = LENT_sad_##w##x##h##_c( fenc, FENC_STRIDE, p3, i_stride ); \
}
SAD4( 64, 64 )
SAD4( 32, 32 )
/************** SSE instruction implementation ************************/
/* fenc must be 16-byte aligned (aligned loads); ly is the number of rows. */
static inline void sad4_32_fast_sse( pixel *fenc, pixel *p0, pixel *p1, pixel *p2,
pixel *p3, int i_stride, int cost[4], int ly )
{
__m128i sum = _mm_setzero_si128();
int i;
i_stride <<= 1;
for( i = 0; i < ly; i += 2 )
{
__m128i se = _mm_load_si128( (__m128i *)(fenc) );
__m128i s0 = _mm_loadu_si128( (__m128i *)(p0) );
__m128i s1 = _mm_loadu_si128( (__m128i *)(p1) );
__m128i s2 = _mm_loadu_si128( (__m128i *)(p2) );
__m128i s3 = _mm_loadu_si128( (__m128i *)(p3) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
se = _mm_load_si128( (__m128i *)(fenc + 16) );
s0 = _mm_loadu_si128( (__m128i *)(p0 + 16) );
s1 = _mm_loadu_si128( (__m128i *)(p1 + 16) );
s2 = _mm_loadu_si128( (__m128i *)(p2 + 16) );
s3 = _mm_loadu_si128( (__m128i *)(p3 + 16) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
fenc += (2*FENC_STRIDE);
p0 += i_stride;
p1 += i_stride;
p2 += i_stride;
p3 += i_stride;
}
_mm_storeu_si128( (__m128i *)cost, _mm_slli_epi32( sum, 1) );
}
/* fenc must be 16-byte aligned (aligned loads); ly is the number of rows. */
static inline void sad4_64_fast_sse( pixel *fenc, pixel *p0, pixel *p1, pixel *p2,
pixel *p3, int i_stride, int cost[4], int ly )
{
__m128i sum = _mm_setzero_si128();
int i;
i_stride <<= 1;
for( i = 0; i < ly; i += 2 )
{
__m128i se = _mm_load_si128( (__m128i *)(fenc) );
__m128i s0 = _mm_loadu_si128( (__m128i *)(p0) );
__m128i s1 = _mm_loadu_si128( (__m128i *)(p1) );
__m128i s2 = _mm_loadu_si128( (__m128i *)(p2) );
__m128i s3 = _mm_loadu_si128( (__m128i *)(p3) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
se = _mm_load_si128( (__m128i *)(fenc + 16) );
s0 = _mm_loadu_si128( (__m128i *)(p0 + 16) );
s1 = _mm_loadu_si128( (__m128i *)(p1 + 16) );
s2 = _mm_loadu_si128( (__m128i *)(p2 + 16) );
s3 = _mm_loadu_si128( (__m128i *)(p3 + 16) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
se = _mm_load_si128( (__m128i *)(fenc + 32) );
s0 = _mm_loadu_si128( (__m128i *)(p0 + 32) );
s1 = _mm_loadu_si128( (__m128i *)(p1 + 32) );
s2 = _mm_loadu_si128( (__m128i *)(p2 + 32) );
s3 = _mm_loadu_si128( (__m128i *)(p3 + 32) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
se = _mm_load_si128( (__m128i *)(fenc + 48) );
s0 = _mm_loadu_si128( (__m128i *)(p0 + 48) );
s1 = _mm_loadu_si128( (__m128i *)(p1 + 48) );
s2 = _mm_loadu_si128( (__m128i *)(p2 + 48) );
s3 = _mm_loadu_si128( (__m128i *)(p3 + 48) );
s0 = _mm_sad_epu8( se, s0 );
s1 = _mm_sad_epu8( se, s1 );
s2 = _mm_sad_epu8( se, s2 );
s3 = _mm_sad_epu8( se, s3 );
s0 = _mm_hadd_epi32( s0, s1 );
s1 = _mm_hadd_epi32( s2, s3 );
sum = _mm_add_epi32( sum, _mm_hadd_epi32( s0, s1 ) );
fenc += (2*FENC_STRIDE);
p0 += i_stride;
p1 += i_stride;
p2 += i_stride;
p3 += i_stride;
}
_mm_storeu_si128( (__m128i *)cost, _mm_slli_epi32( sum, 1) );
}
/************** AVX2 instruction implementation ************************/
/* fenc must be 32-byte aligned (aligned loads); ly is the number of rows. */
static inline void sad4_32_fast_avx2( pixel *fenc, pixel *p0, pixel *p1, pixel *p2,
pixel *p3, int i_stride, int cost[4], int ly )
{
__m256i sum = _mm256_setzero_si256();
int i;
i_stride <<= 1;
for( i = 0; i < ly; i += 2 )
{
__m256i se = _mm256_load_si256( (__m256i *)(fenc) );
__m256i s0 = _mm256_loadu_si256( (__m256i *)(p0) );
__m256i s1 = _mm256_loadu_si256( (__m256i *)(p1) );
__m256i s2 = _mm256_loadu_si256( (__m256i *)(p2) );
__m256i s3 = _mm256_loadu_si256( (__m256i *)(p3) );
s0 = _mm256_sad_epu8( se, s0 );
s1 = _mm256_sad_epu8( se, s1 );
s2 = _mm256_sad_epu8( se, s2 );
s3 = _mm256_sad_epu8( se, s3 );
s0 = _mm256_hadd_epi32( s0, s1 );
s1 = _mm256_hadd_epi32( s2, s3 );
sum = _mm256_add_epi32( sum, _mm256_hadd_epi32( s0, s1 ) );
fenc += (2*FENC_STRIDE);
p0 += i_stride;
p1 += i_stride;
p2 += i_stride;
p3 += i_stride;
}
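/* _mm256_hadd_epi32 works per 128-bit lane, so each lane holds partial sums for
   the same four candidates. Fold the upper lane onto the lower one and store
   only 4 ints: a 256-bit store would write past the end of cost[4]. */
__m128i lo = _mm256_castsi256_si128( sum );
__m128i hi = _mm256_extracti128_si256( sum, 1 );
_mm_storeu_si128( (__m128i *)cost, _mm_slli_epi32( _mm_add_epi32( lo, hi ), 1 ) );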
}
/* fenc must be 32-byte aligned (aligned loads); ly is the number of rows. */
static inline void sad4_64_fast_avx2( pixel *fenc, pixel *p0, pixel *p1, pixel *p2,
pixel *p3, int i_stride, int cost[4], int ly )
{
__m256i sum = _mm256_setzero_si256();
int i;
i_stride <<= 1;
for( i = 0; i < ly; i += 2 )
{
__m256i se = _mm256_load_si256( (__m256i *)(fenc) );
__m256i s0 = _mm256_loadu_si256( (__m256i *)(p0) );
__m256i s1 = _mm256_loadu_si256( (__m256i *)(p1) );
__m256i s2 = _mm256_loadu_si256( (__m256i *)(p2) );
__m256i s3 = _mm256_loadu_si256( (__m256i *)(p3) );
s0 = _mm256_sad_epu8( se, s0 );
s1 = _mm256_sad_epu8( se, s1 );
s2 = _mm256_sad_epu8( se, s2 );
s3 = _mm256_sad_epu8( se, s3 );
s0 = _mm256_hadd_epi32( s0, s1 );
s1 = _mm256_hadd_epi32( s2, s3 );
sum = _mm256_add_epi32( sum, _mm256_hadd_epi32( s0, s1 ) );
se = _mm256_load_si256( (__m256i *)(fenc + 32) );
s0 = _mm256_loadu_si256( (__m256i *)(p0 + 32) );
s1 = _mm256_loadu_si256( (__m256i *)(p1 + 32) );
s2 = _mm256_loadu_si256( (__m256i *)(p2 + 32) );
s3 = _mm256_loadu_si256( (__m256i *)(p3 + 32) );
s0 = _mm256_sad_epu8( se, s0 );
s1 = _mm256_sad_epu8( se, s1 );
s2 = _mm256_sad_epu8( se, s2 );
s3 = _mm256_sad_epu8( se, s3 );
s0 = _mm256_hadd_epi32( s0, s1 );
s1 = _mm256_hadd_epi32( s2, s3 );
sum = _mm256_add_epi32( sum, _mm256_hadd_epi32( s0, s1 ) );
fenc += (2*FENC_STRIDE);
p0 += i_stride;
p1 += i_stride;
p2 += i_stride;
p3 += i_stride;
}
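/* _mm256_hadd_epi32 works per 128-bit lane, so each lane holds partial sums for
   the same four candidates. Fold the upper lane onto the lower one and store
   only 4 ints: a 256-bit store would write past the end of cost[4]. */
__m128i lo = _mm256_castsi256_si128( sum );
__m128i hi = _mm256_extracti128_si256( sum, 1 );
_mm_storeu_si128( (__m128i *)cost, _mm_slli_epi32( _mm_add_epi32( lo, hi ), 1 ) );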
}
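Hand-written SIMD kernels like the ones above are easy to get subtly wrong, so it is worth cross-checking them against the scalar reference. The following is a simplified, self-contained sketch of such a check. It is not the paper's code: it sums every row rather than every second row, uses a single reference block, and relies only on SSE2 intrinsics, with a scalar fallback on builds without SSE2. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: SAD over a 64-pixel-wide block of `rows` rows. */
static int sad64_scalar(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b, int rows)
{
    int sum = 0;
    assert(rows > 0);
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < 64; x++)
            sum += abs(a[x] - b[x]);
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* SIMD version of the same computation (scalar fallback without SSE2). */
static int sad64_simd(const uint8_t *a, int stride_a,
                      const uint8_t *b, int stride_b, int rows)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < 64; x += 16) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + x));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + x));
            /* PSADBW yields two partial sums, one per 64-bit half. */
            acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
        }
        a += stride_a;
        b += stride_b;
    }
    /* Fold the upper 64-bit half onto the lower one and extract it. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
#else
    return sad64_scalar(a, stride_a, b, stride_b, rows);
#endif
}

/* Fills two 64x64 blocks with a deterministic pseudo-random pattern and
 * returns 1 when both implementations agree, 0 otherwise. */
static int sad64_selfcheck(void)
{
    static uint8_t a[64 * 64], b[64 * 64];
    uint32_t s = 12345u;
    for (int i = 0; i < 64 * 64; i++) {
        s = s * 1664525u + 1013904223u;
        a[i] = (uint8_t)(s >> 24);
        s = s * 1664525u + 1013904223u;
        b[i] = (uint8_t)(s >> 24);
    }
    return sad64_scalar(a, 64, b, 64, 64) == sad64_simd(a, 64, b, 64, 64);
}
```

Running `sad64_selfcheck()` once at start-up or in a unit test is a cheap way to catch lane-handling mistakes before any cycle counts are compared.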
Result (CPU cycles):
           original    SSE       AVX2
run 1      98877       977       679
run 2      98463       1092      690
run 3      98152       978       679
run 4      98003       943       679
run 5      98118       954       678
avg.       98322.6     988.8     681
speedup    1.00        99.44     144.38
Table 1. SSE and AVX2 implementation result
As Table 1 shows, for this function the SSE and AVX2 instructions improve performance
by about a hundred times over the scalar code (99.44x and 144.38x respectively), and the
AVX2 code is more than 40% faster than the SSE code (988.8 versus 681 cycles on average).
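The averages and speedups in Table 1 follow directly from the five measured runs. The short sketch below reproduces that arithmetic; the helper names `mean_cycles` and `speedup` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n cycle counts. */
static double mean_cycles(const double *runs, size_t n)
{
    double total = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        total += runs[i];
    return total / (double)n;
}

/* Speedup of an optimized kernel relative to a baseline (both in cycles). */
static double speedup(double baseline_cycles, double optimized_cycles)
{
    assert(optimized_cycles > 0.0);
    return baseline_cycles / optimized_cycles;
}
```

Feeding in the Table 1 columns gives means of 98322.6, 988.8 and 681 cycles, so the SSE speedup is 98322.6 / 988.8 ≈ 99.44, the AVX2 speedup is 98322.6 / 681 ≈ 144.38, and AVX2 is 988.8 / 681 ≈ 1.45 times as fast as SSE.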
In the Strongene encoder, the profiling data show that all the major hot
functions can be vectorized with SIMD instructions, such as low-complexity motion
compensation interpolation, transpose-free integer transform, butterfly Hadamard
transform, and the least-memory-redundancy SAD/SSD calculation. Based on the above
SIMD programming model and paradigms, Strongene rewrote the hot functions in the
encoder to pursue the maximum performance increase. Figure 7 shows our profiling
data for a standard 1080p HEVC encoding scenario: 60% of the hot functions run with
SSE SIMD instructions, and the AVX2 coding has also started.
Figure 7. Profiling Results of Strongene encoding functions
AVX2 instructions extend integer computing from the 128-bit SSE registers to 256 bits
and can therefore theoretically double the performance of the previous SSE code. AVX2
will be supported on the Xeon Haswell platform launching in 2014, so we can expect a
further significant performance improvement when the SSE code is upgraded to AVX2 on the
Haswell platform.
4.2 Thread Concurrency and Cores Scalability Tuning
As we have seen in Section 3, most current implementations cannot utilize all the
cores of a multi-core platform. Based on the latest IA Xeon multi-core architectures,
and after clarifying the parallelism dependencies between the CTB-based HEVC algorithms,
Strongene proposes the inter-frame wave-front (IFW) parallel framework to replace the
original OWF (overlapped wave-front) and WPP (wave-front parallel processing) methods,
and then develops a three-level thread management scheme to guarantee that IFW can fully
utilize all the CPU cores to accelerate the HEVC encoding process. With this new
parallelism framework, on the Intel(R) Xeon(R) Ivy Bridge platform (E5-2697 @ 2.70GHz,
12*2 physical cores, SMT OFF), the Strongene codec utilizes the computing resources of
18-24 physical cores, achieving good thread concurrency.
Figure 8. Thread Concurrency and CPU Utilization in Strongene Encoding Codec
With the new WHP parallelism framework at the task level and fully implemented SIMD
instructions at the data level, the Strongene encoder achieves a 70.3x performance
speedup on x86 processors for 1080p video sequences.
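To make the wave-front dependency discussed in this section concrete, the sketch below simulates the standard intra-frame WPP rule, in which a CTU may start once its left neighbour and its upper-right neighbour are finished. This is a plain WPP model under the assumptions of unit cost per CTU and unlimited threads; it is not Strongene's IFW scheme, whose details the paper does not give, and the names `wpp_stats` and `wpp_simulate` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int steps;        /* time steps needed for the whole picture */
    int max_parallel; /* largest number of CTUs running in one step */
} wpp_stats;

/* Simulates WPP over a rows x cols grid of CTUs, one time step per CTU.
 * Returns {0, 0} if memory allocation fails. */
static wpp_stats wpp_simulate(int rows, int cols)
{
    wpp_stats st = {0, 0};
    assert(rows > 0 && cols > 0);
    int *prev = calloc((size_t)cols, sizeof *prev);
    int *cur = calloc((size_t)cols, sizeof *cur);
    int *busy = calloc((size_t)(cols + 2 * rows), sizeof *busy);
    if (prev && cur && busy) {
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int start = 0; /* earliest step at which this CTU can run */
                if (c > 0)
                    start = cur[c - 1] + 1;           /* left neighbour */
                if (r > 0) {
                    int ur = (c + 1 < cols) ? c + 1 : c; /* upper-right, or */
                    int up = prev[ur] + 1;               /* upper at the edge */
                    if (up > start)
                        start = up;
                }
                cur[c] = start;
                if (++busy[start] > st.max_parallel)
                    st.max_parallel = busy[start];
                if (start + 1 > st.steps)
                    st.steps = start + 1;
            }
            int *tmp = prev; /* the finished row becomes the row above */
            prev = cur;
            cur = tmp;
        }
    }
    free(prev);
    free(cur);
    free(busy);
    return st;
}
```

A 1080p picture with 64×64 CTUs has 17 rows of 30 CTUs. In this model `wpp_simulate(17, 30)` reports at most 15 CTUs in flight at once, which suggests why a wave-front confined to a single frame may struggle to keep 24 cores busy and why parallelism across frames is attractive.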
4.3 Further Tuning with SMT/HT
Simultaneous Multithreading (SMT), also called Hyper-Threading (HT) technology, is
widely supported on IA platforms. It makes the operating system address two
virtual (logical) cores for each physical core and shares the resources between them
when possible. The main function of Hyper-Threading is to decrease the number of
dependent instructions in the pipeline. It offers performance benefits when the CPU
cores are fully loaded, but not in every application: when some cores stay idle, SMT
only introduces task/thread switching overhead. We therefore turn SMT off on the
Strongene encoding platform and reach real-time HEVC 1080p encoding on the IA Xeon
Ivy Bridge E5-2697 v2 platform, as shown by the 1080p, 25 fps, SMT OFF row of the
following table (highlighted in yellow in the original).
Platform                         Resolution  Bitrate (kbps)  fps    CPU usage  Encoding mode  SMT
WSM E7-8837 @2.67GHz (8*8c)      720p        800             8.2    15c        ultrafast      OFF
                                 720p        1600            2.6    18c        ultraslow      OFF
                                 1080p       1500            3.6    27c        ultrafast      OFF
                                 1080p       3000            1.4    23c        ultraslow      OFF
                                 4k          5000            1.2    19c        ultrafast      OFF
                                 4k          10000           0.5    21c        ultraslow      OFF
IVY E5-2697 v2 @2.70GHz (2*12c)  720p        1000            11     40%, 14c   ultraslow      ON
                                 720p        1000            46     60%, 16c   ultrafast      ON
                                 1080p       1500            21     70%, 16c   ultrafast      ON
                                 1080p       1500            25     80%, 18c   ultrafast      OFF
IVB E7-4890 @2.80GHz (4*15c)     1080p       2000            22     19c        ultrafast      ON
                                 1080p       8000            6.11   15c        ultraslow      ON
                                 4k          8000            7.02   29c        ultrafast      ON
                                 4k          8000            3.28   23c        ultraslow      ON
Table 2. Strongene HEVC encoding performance on Xeon Platform (see footnote 1)
After achieving these performance improvements, we further evaluate the capability of
the Strongene HEVC encoder on the IA Xeon platform, focusing on bandwidth and quality.
File: BQTerrance_1920x1080_60.yuv
Resolution: 1920x1080, Size: 1869 MByte, 622080 kbps
Platform: E5-2697 v2 @2.70GHz, RAM 64GB DDR3-1867, QPI 8.0 GT/s
OS/SW: Red Hat 6.4, kernel 2.6.32, gcc v4.4.7, ffmpeg v2.0.1, Lentoid HEVC Encoder r2096 linux

Codec    Size (byte)    Bitrate (kbps)    PSNR_Y/U/V (dB)
H.264    12254696       4078.1            32.311/39.369/42.043
H.265    6215615        2064.28           34.016/39.822/42.141
1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such
as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance
Table 3. H.264 and H.265 codec performance comparison
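From Table 3 and Figure 9, we can see that the H.265/HEVC codec saves about 50% of the
bandwidth (2064.28 kbps versus 4078.1 kbps) while maintaining the video quality: the
measured PSNR is no lower than that of H.264 on any of the Y, U, and V planes.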
Figure 9. Bandwidth and PSNR Comparison of H.264 and H.265 codec
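The bandwidth figures in Table 3 can be checked with a few lines of arithmetic. The sketch below uses two illustrative helpers, `raw_bitrate_kbps` and `saving_percent`. The raw-bitrate formula assumes 8-bit 4:2:0 sampling (1.5 bytes per pixel), and the 25 fps used in the example is an inference from the quoted 622080 kbps rather than something the paper states.

```c
#include <assert.h>

/* Raw bitrate in kbps of uncompressed 8-bit 4:2:0 video
 * (assumption: 1.5 bytes per pixel, i.e. full-size luma plus
 * two quarter-size chroma planes). */
static double raw_bitrate_kbps(int width, int height, double fps)
{
    assert(width > 0 && height > 0 && fps > 0.0);
    double bits_per_frame = (double)width * (double)height * 1.5 * 8.0;
    return bits_per_frame * fps / 1000.0;
}

/* Percentage saved by `candidate` relative to `reference`
 * (works for bitrates or file sizes alike). */
static double saving_percent(double reference, double candidate)
{
    assert(reference > 0.0);
    return 100.0 * (1.0 - candidate / reference);
}
```

With these helpers, `raw_bitrate_kbps(1920, 1080, 25.0)` gives 622080 kbps, matching the source description, and `saving_percent(4078.1, 2064.28)` gives roughly 49.4%, which is where the "about 50%" figure comes from. The file sizes give nearly the same saving, about 49.3%.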
5. Summary
H.265/HEVC is expected to be the most popular video standard of the coming decade, and
media applications and products are currently pursuing HEVC support. In this paper, we
presented a CPU-based real-time HEVC encoding solution on the Intel(R) Xeon(R)
platform that uses the new IA platform technologies. This IA-based solution
has been deployed in the Xunlei [4] online video service and products, and should
accelerate the productization and adoption of H.265/HEVC technology.
Reference
[1] Overview of the High Efficiency Video Coding (HEVC) Standard, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, December 2012.
[2] High Efficiency Video Coding (HEVC) text specification draft 10, JCTVC-L1003_v34.
[3] http://www.strongene.com/en/homepage.jsp
[4] http://www.xunlei.com/