X265-Android

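I had long heard about H.265's compression efficiency and wanted to use it in our project, so I spent a few hours evaluating whether x265 is suitable for use on Android. The conclusion is that it is not: CPU usage is too high and the frame rate is very low. I then looked at several CPUs on Qualcomm's site, and for now they all support H.265 through hardware decoding but only software encoding. Since our need for encoding outweighs our need for decoding, we have to give up on H.265 for the time being.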

Linux test

  • Test CPU
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
Stepping:              3
CPU MHz:               3593.250
CPU max MHz:           3600.0000
CPU min MHz:           800.0000
BogoMIPS:              6400.21
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
  • Installing x265 on Ubuntu
apt-get install x265
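  • Test parameters
    This run uses the ultrafast preset, no B-frames, a target bitrate of 4 Mbps (--bitrate is given in kbps), 1080p resolution, and a frame rate of 50 (the frame rate setting does not affect encoding speed).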
x265 -p ultrafast --bframes 0 --bitrate 4000 --input-res 1920x1080 --fps 50 i420_1920x1080_50.yuv -o out.h265
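  • Test analysis
    1. x265 uses x86 assembly acceleration (using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2).
    2. On the i5-4570 the average encoding speed is only 16.64 fps, whereas hardware H.264 encoders on phones can easily reach 60 fps.
    3. CPU usage is extremely high (346.3% in top on a four-core machine); nothing else can realistically run alongside it.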
yuv  [info]: 1920x1080 fps 50000/1000 i420p8 frames 0 - 500 of 501
x265 [info]: HEVC encoder version 1.5
x265 [info]: build info [Linux][GCC 4.9.2][64 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main profile, Level-4.1 (Main tier)
x265 [info]: WPP streams / frame threads / pool  : 34 / 2 / 4
x265 [info]: CTU size / RQT depth inter / intra  : 32 / 1 / 1
x265 [info]: ME / range / subpel / merge         : dia / 25 / 0 / 2
x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 0
x265 [info]: Lookahead / bframes / badapt        : 10 / 0 / 0
x265 [info]: b-pyramid / weightp / weightb / refs: 0 / 0 / 0 / 1
x265 [info]: Rate Control / AQ-Strength / CUTree : ABR-4000 kbps / 0.0 / 0
x265 [info]: tools: rd=2 psy-rd=0.30 early-skip deblock fast-intra tmvp 
x265 [info]: frame I:      3, Avg QP:34.18  kb/s: 16426.80
x265 [info]: frame P:    498, Avg QP:34.74  kb/s: 3945.84 
x265 [info]: global :    501, Avg QP:34.74  kb/s: 4020.58 
x265 [info]: consecutive B-frames: 100.0% 

encoded 501 frames in 30.10s (16.64 fps), 4020.58 kb/s
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12164 liu       20   0  764752 282116   4840 S 346.3  3.5   1:28.37 x265
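
To put the numbers above in perspective, the following Python sketch turns the x265 summary line and the top sample into a real-time factor and an overall CPU share. The helper names are hypothetical, and the regular expression assumes the summary format shown in the log above.

```python
import re

def parse_summary(line):
    """Extract (frames, seconds, fps) from an x265 summary line."""
    m = re.search(r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\)", line)
    if m is None:
        raise ValueError("not an x265 summary line")
    return int(m.group(1)), float(m.group(2)), float(m.group(3))

def realtime_factor(encode_fps, source_fps):
    """Values below 1.0 mean the encoder cannot keep up with the source."""
    return encode_fps / source_fps

def cpu_share(top_percent, cores):
    """Convert top's per-process %CPU (100% per core) into a 0..1 share."""
    return top_percent / (100.0 * cores)

summary = "encoded 501 frames in 30.10s (16.64 fps), 4020.58 kb/s"
frames, seconds, fps = parse_summary(summary)
print(f"real-time factor at 50 fps: {realtime_factor(fps, 50):.2f}")
print(f"share of 4 cores in use:    {cpu_share(346.3, 4):.2f}")
```

A real-time factor of about one third means that, under these settings, the desktop CPU would need roughly three times as long as the clip's duration to encode it.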

Android test

I did not give up and went on to test on Android. The test phone is an SM-G9009W, whose CPU is Qualcomm's 8974.

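  • Codec capabilities of the 8974
    There is a reason H.265 has not caught on yet: the encoder table lists no hevc entry at all, and hevc decoding is limited to 1920x1088 at 30 fps.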
8974 Encoder capabilities
 ______________________________________________________
 | Codec    | W       H       fps     Mbps    MB/s    |
 |__________|_________________________________________|
 | h264     | 3840    2160    30      100     972000  |
 |          | 4096    2160    24      100     829440  |
 | mpeg4    | 1920    1088    30      40      244800  |
 | vp8      | 1920    1088    30      20      244800  |
 | h263     | 864     480     30      2       48600   |
 |__________|_________________________________________|

 8974 Decoder capabilities
 ______________________________________________________
 | Codec    | W       H       fps     Mbps    MB/s    |
 |__________|_________________________________________|
 | h264     | 3840    2160    30      100     972000  |
 |          | 4096    2160    24      100     829440  |
 | hevc     | 1920    1088    30      6       244800  |
 | mpeg4    | 1920    1088    60      60      489600  |
 | vc1      | 1920    1088    60      60      489600  |
 | vp8      | 3840    2160    30      20      972000  |
 | divx3    | 720     480     30      2       40500   |
 | div4/5/6 | 1920    1088    30      10      244800  |
 | h263     | 864     480     30      2       48600   |
 | mpeg2    | 1920    1088    30      40      244800  |
 |__________|_________________________________________|
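
The MB/s column in these tables is consistent with a macroblock rate rather than megabytes per second. As a sketch, assuming 16x16 macroblocks and rounding each dimension up to a whole macroblock, the listed values can be reproduced from width, height and fps. The function name is hypothetical.

```python
MB_SIZE = 16  # assumption: 16x16-pixel macroblocks

def macroblocks_per_second(width, height, fps):
    """Macroblocks per frame (dimensions rounded up) times frames per second."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high * fps

# (label, width, height, fps, value listed in the table)
rows = [
    ("h264 3840x2160@30", 3840, 2160, 30, 972000),
    ("h264 4096x2160@24", 4096, 2160, 24, 829440),
    ("hevc 1920x1088@30", 1920, 1088, 30, 244800),
    ("mpeg4 1920x1088@60", 1920, 1088, 60, 489600),
    ("h263 864x480@30", 864, 480, 30, 48600),
    ("divx3 720x480@30", 720, 480, 30, 40500),
]

for label, w, h, fps, listed in rows:
    computed = macroblocks_per_second(w, h, fps)
    print(f"{label:20s} computed={computed:7d} listed={listed:7d}")
```

Read this way, the hevc decoder row offers a quarter of the macroblock throughput of the 3840x2160 h264 row.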
  • Test CPU
Processor    : ARMv7 Processor rev 1 (v7l)
processor    : 0
BogoMIPS    : 38.40

processor    : 1
BogoMIPS    : 38.40

processor    : 2
BogoMIPS    : 38.40

processor    : 3
BogoMIPS    : 38.40

Features    : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
CPU implementer    : 0x51
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0x06f
CPU revision    : 1

Hardware    : Qualcomm MSM8974PRO-AC
Revision    : 000a
Serial        : 0000083000009994
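  • Test parameters
    The phone already had an 832x480 YUV file on it, so I used that directly. The bitrate is lowered to 500 kbps and the frame rate is set to 30; the other parameters are the same as on Linux.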
./x265 -p ultrafast --bframes 0 --bitrate 500 --input-res 832x480 --fps 30 832_480.i420 -o out.h265
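  • Test analysis

    1. Although the log prints "using cpu capabilities: none!", NEON acceleration is in fact being used. NEON is not shown because the report only covers x86 capabilities; see the x265_report_simd function. Note, however, that the build info line also reports [noasm], so this is worth double-checking.
    2. Even at only 832x480, the frame rate is surprisingly just 9 fps (9.01 fps in the log).
    3. CPU usage is again extremely high; nothing else can realistically run alongside it.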
yuv  [info]: 832x480 fps 30000/1000 i420p8 frames 0 - 500 of 501
raw  [info]: output file: out.h265
x265 [info]: HEVC encoder version X265_VERSION
x265 [info]: build info [Linux][GCC 4.9.0][32 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-3 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: frame threads / pool features       : 2 / wpp(15 rows)
x265 [warning]: Source height < 720p; disabling lookahead-slices
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 0
x265 [info]: Lookahead / bframes / badapt        : 5 / 0 / 0
x265 [info]: b-pyramid / weightp / weightb       : 0 / 0 / 0
x265 [info]: References / ref-limit  cu / depth  : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : ABR-500 kbps / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing deblock
x265 [info]: frame I:      3, Avg QP:34.72  kb/s: 6350.88
x265 [info]: frame P:    498, Avg QP:37.60  kb/s: 491.82
x265 [info]: consecutive B-frames: 100.0% 

encoded 501 frames in 55.63s (9.01 fps), 526.91 kb/s, Avg QP:37.58
31580  2  96% S     8  67568K  40544K     shell    ./x265
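
Because the two runs used different resolutions, comparing raw fps understates the gap. A rough sketch in Python normalizes both results to pixels encoded per second. This is only an approximation: the runs also differ in bitrate and x265 version, and the 1080p estimate for the phone assumes speed scales linearly with pixel count, which is not guaranteed.

```python
def pixel_rate(width, height, fps):
    """Pixels encoded per second (luma samples only)."""
    return width * height * fps

desktop = pixel_rate(1920, 1080, 16.64)  # i5-4570 run
phone = pixel_rate(832, 480, 9.01)       # MSM8974 run

print(f"desktop: {desktop / 1e6:.1f} Mpixel/s")
print(f"phone:   {phone / 1e6:.1f} Mpixel/s")
print(f"ratio:   {desktop / phone:.1f}x")

# Hypothetical extrapolation: phone speed at 1080p under linear scaling.
phone_fps_at_1080p = phone / (1920 * 1080)
print(f"phone at 1080p (linear estimate): {phone_fps_at_1080p:.1f} fps")
```

Under these assumptions the phone's throughput is roughly an order of magnitude below the desktop's, which supports the conclusion that software HEVC encoding is impractical on this device.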

Building x265 for Android

  • Download
hg clone https://bitbucket.org/multicoreware/x265
  • Android.mk
LOCAL_PATH := $(call my-dir)

#---------- static module ----------#

COMMON_CPP_SRCS := \
    common/cpu.cpp \
    common/ipfilter.cpp \
    common/threadpool.cpp \
    common/param.cpp \
    common/picyuv.cpp \
    common/framedata.cpp \
    common/bitstream.cpp \
    common/pixel.cpp \
    common/predict.cpp \
    common/quant.cpp \
    common/constants.cpp \
    common/md5.cpp \
    common/dct.cpp \
    common/loopfilter.cpp \
    common/primitives.cpp \
    common/scalinglist.cpp \
    common/piclist.cpp \
    common/frame.cpp \
    common/slice.cpp \
    common/common.cpp \
    common/threading.cpp \
    common/lowres.cpp \
    common/intrapred.cpp \
    common/wavefront.cpp \
    common/winxp.cpp \
    common/shortyuv.cpp \
    common/yuv.cpp \
    common/deblock.cpp \
    common/cudata.cpp \
    common/version.cpp

COMMON_ARM_SRCS := \
    common/arm/asm-primitives.cpp \
    common/arm/asm.S \
    common/arm/blockcopy8.S \
    common/arm/cpu-a.S \
    common/arm/dct-a.S \
    common/arm/ipfilter8.S \
    common/arm/mc-a.S \
    common/arm/pixel-util.S \
    common/arm/sad-a.S \
    common/arm/ssd-a.S

COMMON_X86_SRCS := \
    common/x86/blockcopy8.asm \
    common/x86/const-a.asm \
    common/x86/cpu-a.asm \
    common/x86/dct8.asm \
    common/x86/intrapred16.asm \
    common/x86/intrapred8_allangs.asm \
    common/x86/intrapred8.asm \
    common/x86/ipfilter16.asm \
    common/x86/ipfilter8.asm \
    common/x86/loopfilter.asm \
    common/x86/mc-a2.asm \
    common/x86/mc-a.asm \
    common/x86/pixel-32.asm \
    common/x86/pixel-a.asm \
    common/x86/pixeladd8.asm \
    common/x86/pixel-util8.asm \
    common/x86/sad16-a.asm \
    common/x86/sad-a.asm \
    common/x86/ssd-a.asm \
    common/x86/x86inc.asm \
    common/x86/x86util.asm

ENCODER_CPP_SRCS := \
    encoder/analysis.cpp \
    encoder/api.cpp \
    encoder/bitcost.cpp \
    encoder/dpb.cpp \
    encoder/encoder.cpp \
    encoder/entropy.cpp \
    encoder/frameencoder.cpp \
    encoder/framefilter.cpp \
    encoder/level.cpp \
    encoder/motion.cpp \
    encoder/nal.cpp \
    encoder/ratecontrol.cpp \
    encoder/reference.cpp \
    encoder/sao.cpp \
    encoder/search.cpp \
    encoder/sei.cpp \
    encoder/slicetype.cpp \
    encoder/weightPrediction.cpp


include $(CLEAR_VARS)
LOCAL_MODULE     := common
LOCAL_ARM_MODE   := arm

LOCAL_CFLAGS     := -Wall -Wextra -Wshadow -std=gnu++98 -fPIC -Wno-array-bounds -ffast-math -fno-exceptions -fpermissive -frtti -Wno-maybe-uninitialized
LOCAL_CFLAGS     += -DEXPORT_C_API=1 -DHAVE_INT_TYPES_H=1 -DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8 -DX265_NS=x265 -D__STDC_LIMIT_MACROS=1 -DHAVE_STRTOK_R
LOCAL_EXPORT_CFLAGS := $(LOCAL_CFLAGS)

LOCAL_SRC_FILES := $(COMMON_CPP_SRCS)

$(info arm = $(TARGET_ARCH_ABI))
ifneq (, $(findstring $(TARGET_ARCH_ABI),armeabi armeabi-v7a))
    LOCAL_CFLAGS    += -DHAVE_NEON -DX265_ARCH_ARM
    LOCAL_SRC_FILES += $(COMMON_ARM_SRCS)
endif

ifeq ($(TARGET_ARCH_ABI),x86)
    LOCAL_CFLAGS    += -UX86_64 -DX265_ARCH_X86
    LOCAL_SRC_FILES += $(COMMON_X86_SRCS)
endif

LOCAL_C_INCLUDES := $(LOCAL_PATH) $(LOCAL_PATH)/common $(LOCAL_PATH)/encoder
LOCAL_EXPORT_C_INCLUDES := $(LOCAL_C_INCLUDES)
include $(BUILD_STATIC_LIBRARY)


#---------- static module ----------#

include $(CLEAR_VARS)
LOCAL_MODULE     := encoder
LOCAL_ARM_MODE   := arm
LOCAL_SRC_FILES  := $(ENCODER_CPP_SRCS)
LOCAL_STATIC_LIBRARIES := common
include $(BUILD_STATIC_LIBRARY)


#---------- static module ----------#

include $(CLEAR_VARS)
LOCAL_MODULE     := input
LOCAL_ARM_MODE   := arm
LOCAL_SRC_FILES := \
    input/input.cpp \
    input/y4m.cpp \
    input/yuv.cpp

LOCAL_C_INCLUDES := $(LOCAL_PATH)
LOCAL_STATIC_LIBRARIES := common
include $(BUILD_STATIC_LIBRARY)


#---------- static module ----------#

include $(CLEAR_VARS)
LOCAL_MODULE     := output
LOCAL_ARM_MODE   := arm

LOCAL_SRC_FILES := \
    output/reconplay.cpp \
    output/raw.cpp \
    output/y4m.cpp \
    output/yuv.cpp \
    output/output.cpp

LOCAL_C_INCLUDES := $(LOCAL_PATH)
LOCAL_STATIC_LIBRARIES := common
include $(BUILD_STATIC_LIBRARY)


include $(CLEAR_VARS)
LOCAL_MODULE     := x265
LOCAL_ARM_MODE   := arm
LOCAL_WHOLE_STATIC_LIBRARIES := encoder input output
include $(BUILD_SHARED_LIBRARY)
#---------- binary module ----------#

include $(CLEAR_VARS)
LOCAL_MODULE     := x265_test
LOCAL_ARM_MODE   := arm
LOCAL_SRC_FILES  := x265-extras.cpp x265.cpp
LOCAL_C_INCLUDES := $(LOCAL_PATH)
LOCAL_STATIC_LIBRARIES := encoder input output
include $(BUILD_EXECUTABLE)
  • Build results
    1. libx265.so: the shared library usable on Android
    2. x265_test: the test program I used above
[armeabi-v7a] SharedLibrary  : libx265.so
[armeabi-v7a] Install        : libx265.so => out/libs/armeabi-v7a/libx265.so
[armeabi-v7a] Compile++ thumb: x265_test <= x265-extras.cpp
[armeabi-v7a] Compile++ thumb: x265_test <= x265.cpp
[armeabi-v7a] Executable     : x265_test
[armeabi-v7a] Install        : x265_test => out/libs/armeabi-v7a/x265_test