Projects
home:dsterba
x265
We truncated the diff of some files because they were too big.
Difference Between Revision 2 and Essentials / x265
x265.changes
Changed
@@ -1,4 +1,9 @@
 -------------------------------------------------------------------
+Tue Nov 22 00:00:00 UTC 2022 - David Sterba <dsterba@suse.cz>
+
+- update to git 40e37bce9a357fb18f, so 206
+
+-------------------------------------------------------------------
 Wed May 19 13:21:09 UTC 2021 - Luigi Baldoni <aloisio@gmx.com>
 - Build libx265_main10 and libx265_main12 unconditionally and
x265.spec
Changed
@@ -17,7 +17,7 @@
 #
-%define sover 199
+%define sover 206
 %define libname lib%{name}
 %define libsoname %{libname}-%{sover}
 %define uver 3_5
@@ -29,9 +29,9 @@
 Group: Productivity/Multimedia/Video/Editors and Convertors
 URL: https://bitbucket.org/multicoreware/x265_git
 Source0: https://bitbucket.org/multicoreware/x265_git/downloads/%{name}_%{version}.tar.gz
-Patch0: arm.patch
+#Patch0: arm.patch
 Patch1: x265.pkgconfig.patch
-Patch2: x265-fix_enable512.patch
+#Patch2: x265-fix_enable512.patch
 BuildRequires: cmake >= 2.8.8
 BuildRequires: gcc-c++
 BuildRequires: nasm >= 2.13
@@ -41,6 +41,7 @@
 %endif
 %description
+(git version)
 x265 is a free library for encoding next-generation H265/HEVC video streams.
baselibs.conf
Changed
@@ -1,1 +1,1 @@
-libx265-199
+libx265-206
x265_3.5.tar.gz/build/aarch64-linux/crosscompile.cmake
Changed
@@ -3,7 +3,7 @@
 # Please report bugs on bitbucket
 # Run cmake with: cmake -DCMAKE_TOOLCHAIN_FILE=crosscompile.cmake -G "Unix Makefiles" ../../source && ccmake ../../source
-set(CROSS_COMPILE_ARM 1)
+set(CROSS_COMPILE_ARM64 1)
 set(CMAKE_SYSTEM_NAME Linux)
 set(CMAKE_SYSTEM_PROCESSOR aarch64)
x265_3.5.tar.gz/build/arm-linux/make-Makefiles.bash
Changed
@@ -1,4 +1,4 @@
 #!/bin/bash
 # Run this from within a bash shell
-cmake -G "Unix Makefiles" ../../source && ccmake ../../source
+cmake -DCMAKE_TOOLCHAIN_FILE="crosscompile.cmake" -G "Unix Makefiles" ../../source && ccmake ../../source
x265_3.5.tar.gz/doc/reST/cli.rst
Changed
@@ -632,9 +632,8 @@ auto-detection by the encoder. If specified, the encoder will attempt to bring the encode specifications within that specified level. If the encoder is unable to reach the level it issues a - warning and aborts the encode. If the requested requirement level is - higher than the actual level, the actual requirement level is - signaled. + warning and aborts the encode. The requested level will be signaled + in the bitstream even if it is higher than the actual level. Beware, specifying a decoder level will force the encoder to enable VBV for constant rate factor encodes, which may introduce @@ -714,11 +713,8 @@ (main, main10, etc). Second, an encoder is created from this x265_param instance and the :option:`--level-idc` and :option:`--high-tier` parameters are used to reduce bitrate or other - features in order to enforce the target level. Finally, the encoder - re-examines the final set of parameters and detects the actual - minimum decoder requirement level and this is what is signaled in - the bitstream headers. The detected decoder level will only use High - tier if the user specified a High tier level. + features in order to enforce the target level. The detected decoder level + will only use High tier if the user specified a High tier level. The signaled profile will be determined by the encoder's internal bitdepth and input color space. If :option:`--keyint` is 0 or 1, @@ -961,21 +957,21 @@ Note that :option:`--analysis-save-reuse-level` and :option:`--analysis-load-reuse-level` must be paired with :option:`--analysis-save` and :option:`--analysis-load` respectively. 
- +--------------+------------------------------------------+ - | Level | Description | - +==============+==========================================+ - | 1 | Lookahead information | - +--------------+------------------------------------------+ - | 2 to 4 | Level 1 + intra/inter modes, ref's | - +--------------+------------------------------------------+ - | 5 and 6 | Level 2 + rect-amp | - +--------------+------------------------------------------+ - | 7 | Level 5 + AVC size CU refinement | - +--------------+------------------------------------------+ - | 8 and 9 | Level 5 + AVC size Full CU analysis-info | - +--------------+------------------------------------------+ - | 10 | Level 5 + Full CU analysis-info | - +--------------+------------------------------------------+ + +--------------+---------------------------------------------------+ + | Level | Description | + +==============+===================================================+ + | 1 | Lookahead information | + +--------------+---------------------------------------------------+ + | 2 to 4 | Level 1 + intra/inter modes, depth, ref's, cutree | + +--------------+---------------------------------------------------+ + | 5 and 6 | Level 2 + rect-amp | + +--------------+---------------------------------------------------+ + | 7 | Level 5 + AVC size CU refinement | + +--------------+---------------------------------------------------+ + | 8 and 9 | Level 5 + AVC size Full CU analysis-info | + +--------------+---------------------------------------------------+ + | 10 | Level 5 + Full CU analysis-info | + +--------------+---------------------------------------------------+ .. option:: --refine-mv-type <string> @@ -1332,6 +1328,11 @@ Search range for HME level 0, 1 and 2. The Search Range for each HME level must be between 0 and 32768(excluding). Default search range is 16,32,48 for level 0,1,2 respectively. + +.. option:: --mcstf, --no-mcstf + + Enable Motion Compensated Temporal filtering. 
+ Default: disabled Spatial/intra options ===================== @@ -1473,17 +1474,9 @@ .. option:: --hist-scenecut, --no-hist-scenecut - Indicates that scenecuts need to be detected using luma edge and chroma histograms. - :option:`--hist-scenecut` enables scenecut detection using the histograms and disables the default scene cut algorithm. - :option:`--no-hist-scenecut` disables histogram based scenecut algorithm. - -.. option:: --hist-threshold <0.0..1.0> - - This value represents the threshold for normalized SAD of edge histograms used in scenecut detection. - This requires :option:`--hist-scenecut` to be enabled. For example, a value of 0.2 indicates that a frame with normalized SAD value - greater than 0.2 against the previous frame as scenecut. - Increasing the threshold reduces the number of scenecuts detected. - Default 0.03. + Scenecuts detected based on histogram, intensity and variance of the picture. + :option:`--hist-scenecut` enables or :option:`--no-hist-scenecut` disables scenecut detection based on + histogram. .. option:: --radl <integer> @@ -1766,6 +1759,11 @@ Default 1.0. **Range of values:** 0.0 to 3.0 +.. option:: --sbrc --no-sbrc + + To enable and disable segment based rate control. + Default: disabled. + .. option:: --hevc-aq Enable adaptive quantization @@ -2057,6 +2055,14 @@ rate control mode. Default disabled. **Experimental feature** + + +.. option:: bEncFocusedFramesOnly + + Used to trigger encoding of selective GOPs; Disabled by default. + + **API ONLY** + Quantization Options ==================== @@ -2427,6 +2433,81 @@ Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation. Required for HLG (Hybrid Log Gamma) signaling. Not signaled by default. +.. option:: --video-signal-type-preset <string> + + Specify combinations of color primaries, transfer characteristics, color matrix, + range of luma and chroma signals, and chroma sample location. 
+ String format: <system-id>:<color-volume> + + This has higher precedence than individual VUI parameters. If any individual VUI option + is specified together with this, which changes the values set corresponding to the system-id + or color-volume, it will be discarded. + + system-id options and their corresponding values: + +----------------+---------------------------------------------------------------+ + | system-id | Value | + +================+===============================================================+ + | BT601_525 | --colorprim smpte170m --transfer smpte170m | + | | --colormatrix smpte170m --range limited --chromaloc 0 | + +----------------+---------------------------------------------------------------+ + | BT601_626 | --colorprim bt470bg --transfer smpte170m --colormatrix bt470bg| + | | --range limited --chromaloc 0 | + +----------------+---------------------------------------------------------------+ + | BT709_YCC | --colorprim bt709 --transfer bt709 --colormatrix bt709 | + | | --range limited --chromaloc 0 | + +----------------+---------------------------------------------------------------+ + | BT709_RGB | --colorprim bt709 --transfer bt709 --colormatrix gbr | + | | --range limited | + +----------------+---------------------------------------------------------------+ + | BT2020_YCC_NCL | --colorprim bt2020 --transfer bt2020-10 --colormatrix bt709 | + | | --range limited --chromaloc 2 | + +----------------+---------------------------------------------------------------+ + | BT2020_RGB | --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc| + | | --range limited | + +----------------+---------------------------------------------------------------+ + | BT2100_PQ_YCC | --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc| + | | --range limited --chromaloc 2 | + +----------------+---------------------------------------------------------------+ + | BT2100_PQ_ICTCP| --colorprim bt2020 --transfer smpte2084 --colormatrix ictcp | + | | 
--range limited --chromaloc 2 | + +----------------+---------------------------------------------------------------+ + | BT2100_PQ_RGB | --colorprim bt2020 --transfer smpte2084 --colormatrix gbr | + | | --range limited | + +----------------+---------------------------------------------------------------+ + | BT2100_HLG_YCC | --colorprim bt2020 --transfer arib-std-b67 | + | | --colormatrix bt2020nc --range limited --chromaloc 2 | + +----------------+---------------------------------------------------------------+ + | BT2100_HLG_RGB | --colorprim bt2020 --transfer arib-std-b67 --colormatrix gbr | + | | --range limited | + +----------------+---------------------------------------------------------------+ + | FR709_RGB | --colorprim bt709 --transfer bt709 --colormatrix gbr | + | | --range full | + +----------------+---------------------------------------------------------------+ + | FR2020_RGB | --colorprim bt2020 --transfer bt2020-10 --colormatrix gbr | + | | --range full | + +----------------+---------------------------------------------------------------+ + | FRP3D65_YCC | --colorprim smpte432 --transfer bt709 --colormatrix smpte170m | + | | --range full --chromaloc 1 | + +----------------+---------------------------------------------------------------+ + + color-volume options and their corresponding values: + +----------------+---------------------------------------------------------------+ + | color-volume | Value | + +================+===============================================================+ + | P3D65x1000n0005| --master-display G(13250,34500)B(7500,3000)R(34000,16000) | + | | WP(15635,16450)L(10000000,5) | + +----------------+---------------------------------------------------------------+ + | P3D65x4000n005 | --master-display G(13250,34500)B(7500,3000)R(34000,16000) | + | | WP(15635,16450)L(40000000,50) | + +----------------+---------------------------------------------------------------+ + | BT2100x108n0005| --master-display 
G(8500,39850)B(6550,2300)R(34000,146000) | + | | WP(15635,16450)L(10000000,1) | + +----------------+---------------------------------------------------------------+ + + Note: The color-volume options can be used only with the system-id options BT2100_PQ_YCC, + BT2100_PQ_ICTCP, and BT2100_PQ_RGB. It is incompatible with other options.
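Note on the `--video-signal-type-preset` hunk above: the documented string format `<system-id>:<color-volume>` and the closing note about which system-ids accept a color-volume can be sketched as follows. This is not x265 code. `PresetParts` and `parsePreset` are hypothetical names, and treating the color-volume part as optional is an assumption, since the documentation text does not say whether a bare system-id is accepted.

```cpp
#include <string>

// Hypothetical helper type, not part of the x265 API.
struct PresetParts
{
    std::string systemId;     // e.g. "BT2100_PQ_YCC"
    std::string colorVolume;  // e.g. "P3D65x1000n0005"; empty when absent
};

// Splits "<system-id>:<color-volume>" at the first ':' and applies the
// documented rule that a color-volume may accompany only BT2100_PQ_YCC,
// BT2100_PQ_ICTCP and BT2100_PQ_RGB. Returns false for an empty system-id
// or a disallowed combination. Assumption: a missing color-volume is fine.
bool parsePreset(const std::string& preset, PresetParts& out)
{
    const std::string::size_type colon = preset.find(':');
    out.systemId = preset.substr(0, colon);
    out.colorVolume = (colon == std::string::npos) ? std::string()
                                                   : preset.substr(colon + 1);
    if (out.systemId.empty())
        return false;

    const bool pqSystem = out.systemId == "BT2100_PQ_YCC" ||
                          out.systemId == "BT2100_PQ_ICTCP" ||
                          out.systemId == "BT2100_PQ_RGB";
    return out.colorVolume.empty() || pqSystem;
}
```

The sketch only validates the pairing. Expanding a system-id into the individual `--colorprim`, `--transfer`, `--colormatrix`, `--range` and `--chromaloc` values would follow the tables in the hunk.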
x265_3.5.tar.gz/doc/reST/introduction.rst
Changed
@@ -77,6 +77,6 @@
 to start is with the `Motion Picture Experts Group - Licensing Authority - HEVC Licensing Program <http://www.mpegla.com/main/PID/HEVC/default.aspx>`_.
-x265 is a registered trademark of MulticoreWare, Inc. The x265 logo is
+x265 is a registered trademark of MulticoreWare, Inc. The X265 logo is
 a trademark of MulticoreWare, and may only be used with explicit written permission. All rights reserved.
x265_3.5.tar.gz/readme.rst
Changed
@@ -2,7 +2,7 @@
 x265 HEVC Encoder
 =================
-| **Read:** | Online `documentation <http://x265.readthedocs.org/en/default/>`_ | Developer `wiki <http://bitbucket.org/multicoreware/x265/wiki/>`_
+| **Read:** | Online `documentation <http://x265.readthedocs.org/en/master/>`_ | Developer `wiki <http://bitbucket.org/multicoreware/x265_git/wiki/>`_
 | **Download:** | `releases <http://ftp.videolan.org/pub/videolan/x265/>`_
 | **Interact:** | #x265 on freenode.irc.net | `x265-devel@videolan.org <http://mailman.videolan.org/listinfo/x265-devel>`_ | `Report an issue <https://bitbucket.org/multicoreware/x265/issues?status=new&status=open>`_
x265_3.5.tar.gz/source/CMakeLists.txt
Changed
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 199)
+set(X265_BUILD 206)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -38,14 +38,20 @@
 SET(CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake" "${CMAKE_MODULE_PATH}")
 # System architecture detection
-string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" SYSPROC)
+if (APPLE AND CMAKE_OSX_ARCHITECTURES)
+    string(TOLOWER "${CMAKE_OSX_ARCHITECTURES}" SYSPROC)
+else()
+    string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" SYSPROC)
+endif()
 set(X86_ALIASES x86 i386 i686 x86_64 amd64)
-set(ARM_ALIASES armv6l armv7l aarch64)
+set(ARM_ALIASES armv6l armv7l)
+set(ARM64_ALIASES arm64 arm64e aarch64)
 list(FIND X86_ALIASES "${SYSPROC}" X86MATCH)
 list(FIND ARM_ALIASES "${SYSPROC}" ARMMATCH)
+list(FIND ARM64_ALIASES "${SYSPROC}" ARM64MATCH)
 set(POWER_ALIASES ppc64 ppc64le)
 list(FIND POWER_ALIASES "${SYSPROC}" POWERMATCH)
-if("${SYSPROC}" STREQUAL "" OR X86MATCH GREATER "-1")
+if(X86MATCH GREATER "-1")
     set(X86 1)
     add_definitions(-DX265_ARCH_X86=1)
     if(CMAKE_CXX_FLAGS STREQUAL "-m32")
@@ -70,15 +76,18 @@
     else()
         set(CROSS_COMPILE_ARM 0)
     endif()
+    message(STATUS "Detected ARM target processor")
     set(ARM 1)
-    if("${CMAKE_SIZEOF_VOID_P}" MATCHES 8)
-        message(STATUS "Detected ARM64 target processor")
-        set(ARM64 1)
-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=1 -DHAVE_ARMV6=0)
-    else()
-        message(STATUS "Detected ARM target processor")
-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1)
-    endif()
+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1)
+elseif(ARM64MATCH GREATER "-1")
+    #if(CROSS_COMPILE_ARM64)
+        #message(STATUS "Cross compiling for ARM64 arch")
+    #else()
+        #set(CROSS_COMPILE_ARM64 0)
+    #endif()
+    message(STATUS "Detected ARM64 target processor")
+    set(ARM64 1)
+    add_definitions(-DX265_ARCH_ARM64=1 -DHAVE_NEON)
 else()
     message(STATUS "CMAKE_SYSTEM_PROCESSOR value `${CMAKE_SYSTEM_PROCESSOR}` is unknown")
     message(STATUS "Please add this value near ${CMAKE_CURRENT_LIST_FILE}:${CMAKE_CURRENT_LIST_LINE}")
@@ -239,24 +248,22 @@
         endif()
     endif()
     if(ARM AND CROSS_COMPILE_ARM)
-        if(ARM64)
-            set(ARM_ARGS -fPIC)
-        else()
-            set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
-        endif()
         message(STATUS "cross compile arm")
+        set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
     elseif(ARM)
-        if(ARM64)
-            set(ARM_ARGS -fPIC)
+        find_package(Neon)
+        if(CPU_HAS_NEON)
+            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
             add_definitions(-DHAVE_NEON)
         else()
-            find_package(Neon)
-            if(CPU_HAS_NEON)
-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
-                add_definitions(-DHAVE_NEON)
-            else()
-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
-            endif()
+            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+        endif()
+    endif()
+    if(ARM64 OR CROSS_COMPILE_ARM64)
+        set(ARM_ARGS -fPIC -flax-vector-conversions)
+        find_package(Neon)
+        if(CPU_HAS_NEON)
+            add_definitions(-DHAVE_NEON)
         endif()
     endif()
     add_definitions(${ARM_ARGS})
@@ -350,7 +357,7 @@
 endif(GCC)
 find_package(Nasm)
-if(ARM OR CROSS_COMPILE_ARM)
+if(ARM OR CROSS_COMPILE_ARM OR ARM64 OR CROSS_COMPILE_ARM64)
     option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" ON)
 elseif(NASM_FOUND AND X86)
     if (NASM_VERSION_STRING VERSION_LESS "2.13.0")
@@ -440,6 +447,18 @@
 endif()
 add_definitions(-DX265_NS=${X265_NS})
+if(ARM64)
+    if(HIGH_BIT_DEPTH)
+        if(MAIN12)
+            list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=1 -DBIT_DEPTH=12 -DX265_NS=${X265_NS})
+        else()
+            list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=1 -DBIT_DEPTH=10 -DX265_NS=${X265_NS})
+        endif()
+    else()
+        list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=0 -DBIT_DEPTH=8 -DX265_NS=${X265_NS})
+    endif()
+endif(ARM64)
+
 option(WARNINGS_AS_ERRORS "Stop compiles on first warning" OFF)
 if(WARNINGS_AS_ERRORS)
     if(GCC)
@@ -536,11 +555,7 @@
         # compile ARM arch asm files here
         enable_language(ASM)
         foreach(ASM ${ARM_ASMS})
-            if(ARM64)
-                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/aarch64/${ASM})
-            else()
-                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
-            endif()
+            set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
             list(APPEND ASM_SRCS ${ASM_SRC})
             list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
             add_custom_command(
@@ -549,6 +564,19 @@
                 ARGS ${ARM_ARGS} -c ${ASM_SRC} -o ${ASM}.${SUFFIX}
                 DEPENDS ${ASM_SRC})
         endforeach()
+    elseif(ARM64 OR CROSS_COMPILE_ARM64)
+        # compile ARM64 arch asm files here
+        enable_language(ASM)
+        foreach(ASM ${ARM_ASMS})
+            set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/aarch64/${ASM})
+            list(APPEND ASM_SRCS ${ASM_SRC})
+            list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
+            add_custom_command(
+                OUTPUT ${ASM}.${SUFFIX}
+                COMMAND ${CMAKE_CXX_COMPILER}
+                ARGS ${ARM_ARGS} ${ASM_FLAGS} -c ${ASM_SRC} -o ${ASM}.${SUFFIX}
+                DEPENDS ${ASM_SRC})
+        endforeach()
     elseif(X86)
         # compile X86 arch asm files here
         foreach(ASM ${MSVC_ASMS})
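Note on the architecture-detection hunk in source/CMakeLists.txt above: the alias lists now separate 32-bit ARM from ARM64, and an empty processor string no longer selects x86. A rough C++ model of that decision may make the new behaviour easier to follow. It is only an illustration. `Arch`, `inList` and `classifyProcessor` are hypothetical names, and the position of the POWER check among the branches is an assumption, since that branch is outside the hunks shown (the alias lists are disjoint, so the order does not change the result).

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical names for illustration; the real logic lives in CMake.
enum class Arch { X86, Arm, Arm64, Power, Unknown };

// True when `name` matches an entry exactly, like CMake's list(FIND).
static bool inList(const std::vector<std::string>& aliases, const std::string& name)
{
    return std::find(aliases.begin(), aliases.end(), name) != aliases.end();
}

Arch classifyProcessor(std::string sysproc)
{
    // Mirrors string(TOLOWER ...) in the hunk.
    std::transform(sysproc.begin(), sysproc.end(), sysproc.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Alias lists copied from the "+" side of the diff.
    static const std::vector<std::string> x86   = {"x86", "i386", "i686", "x86_64", "amd64"};
    static const std::vector<std::string> arm   = {"armv6l", "armv7l"};
    static const std::vector<std::string> arm64 = {"arm64", "arm64e", "aarch64"};
    static const std::vector<std::string> power = {"ppc64", "ppc64le"};

    if (inList(x86, sysproc))   return Arch::X86;
    if (inList(power, sysproc)) return Arch::Power;
    if (inList(arm, sysproc))   return Arch::Arm;
    if (inList(arm64, sysproc)) return Arch::Arm64;
    return Arch::Unknown;  // an empty string no longer falls back to x86
}
```

With the old lists, `aarch64` matched the ARM aliases and the 64-bit case was then chosen from the pointer size. In the new version the processor name alone decides.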
x265_3.5.tar.gz/source/abrEncApp.cpp
Changed
@@ -1,1111 +1,1111 @@ -/***************************************************************************** -* Copyright (C) 2013-2020 MulticoreWare, Inc -* -* Authors: Pooja Venkatesan <pooja@multicorewareinc.com> -* Aruna Matheswaran <aruna@multicorewareinc.com> -* -* This program is free software; you can redistribute it and/or modify -* it under the terms of the GNU General Public License as published by -* the Free Software Foundation; either version 2 of the License, or -* (at your option) any later version. -* -* This program is distributed in the hope that it will be useful, -* but WITHOUT ANY WARRANTY; without even the implied warranty of -* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -* GNU General Public License for more details. -* -* You should have received a copy of the GNU General Public License -* along with this program; if not, write to the Free Software -* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. -* -* This program is also available under a commercial proprietary license. -* For more information, contact us at license @ x265.com. -*****************************************************************************/ - -#include "abrEncApp.h" -#include "mv.h" -#include "slice.h" -#include "param.h" - -#include <signal.h> -#include <errno.h> - -#include <queue> - -using namespace X265_NS; - -/* Ctrl-C handler */ -static volatile sig_atomic_t b_ctrl_c /* = 0 */; -static void sigint_handler(int) -{ - b_ctrl_c = 1; -} - -namespace X265_NS { - // private namespace -#define X265_INPUT_QUEUE_SIZE 250 - - AbrEncoder::AbrEncoder(CLIOptions cliopt, uint8_t numEncodes, int &ret) - { - m_numEncodes = numEncodes; - m_numActiveEncodes.set(numEncodes); - m_queueSize = (numEncodes > 1) ? 
X265_INPUT_QUEUE_SIZE : 1; - m_passEnc = X265_MALLOC(PassEncoder*, m_numEncodes); - - for (uint8_t i = 0; i < m_numEncodes; i++) - { - m_passEnci = new PassEncoder(i, cliopti, this); - if (!m_passEnci) - { - x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for passEncoder\n"); - ret = 4; - } - m_passEnci->init(ret); - } - - if (!allocBuffers()) - { - x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for buffers\n"); - ret = 4; - } - - /* start passEncoder worker threads */ - for (uint8_t pass = 0; pass < m_numEncodes; pass++) - m_passEncpass->startThreads(); - } - - bool AbrEncoder::allocBuffers() - { - m_inputPicBuffer = X265_MALLOC(x265_picture**, m_numEncodes); - m_analysisBuffer = X265_MALLOC(x265_analysis_data*, m_numEncodes); - - m_picWriteCnt = new ThreadSafeIntegerm_numEncodes; - m_picReadCnt = new ThreadSafeIntegerm_numEncodes; - m_analysisWriteCnt = new ThreadSafeIntegerm_numEncodes; - m_analysisReadCnt = new ThreadSafeIntegerm_numEncodes; - - m_picIdxReadCnt = X265_MALLOC(ThreadSafeInteger*, m_numEncodes); - m_analysisWrite = X265_MALLOC(ThreadSafeInteger*, m_numEncodes); - m_analysisRead = X265_MALLOC(ThreadSafeInteger*, m_numEncodes); - m_readFlag = X265_MALLOC(int*, m_numEncodes); - - for (uint8_t pass = 0; pass < m_numEncodes; pass++) - { - m_inputPicBufferpass = X265_MALLOC(x265_picture*, m_queueSize); - for (uint32_t idx = 0; idx < m_queueSize; idx++) - { - m_inputPicBufferpassidx = x265_picture_alloc(); - x265_picture_init(m_passEncpass->m_param, m_inputPicBufferpassidx); - } - - CHECKED_MALLOC_ZERO(m_analysisBufferpass, x265_analysis_data, m_queueSize); - m_picIdxReadCntpass = new ThreadSafeIntegerm_queueSize; - m_analysisWritepass = new ThreadSafeIntegerm_queueSize; - m_analysisReadpass = new ThreadSafeIntegerm_queueSize; - m_readFlagpass = X265_MALLOC(int, m_queueSize); - } - return true; - fail: - return false; - } - - void AbrEncoder::destroy() - { - x265_cleanup(); /* Free library singletons */ - for (uint8_t pass = 0; pass 
< m_numEncodes; pass++) - { - for (uint32_t index = 0; index < m_queueSize; index++) - { - X265_FREE(m_inputPicBufferpassindex->planes0); - x265_picture_free(m_inputPicBufferpassindex); - } - - X265_FREE(m_inputPicBufferpass); - X265_FREE(m_analysisBufferpass); - X265_FREE(m_readFlagpass); - delete m_picIdxReadCntpass; - delete m_analysisWritepass; - delete m_analysisReadpass; - m_passEncpass->destroy(); - delete m_passEncpass; - } - X265_FREE(m_inputPicBuffer); - X265_FREE(m_analysisBuffer); - X265_FREE(m_readFlag); - - delete m_picWriteCnt; - delete m_picReadCnt; - delete m_analysisWriteCnt; - delete m_analysisReadCnt; - - X265_FREE(m_picIdxReadCnt); - X265_FREE(m_analysisWrite); - X265_FREE(m_analysisRead); - - X265_FREE(m_passEnc); - } - - PassEncoder::PassEncoder(uint32_t id, CLIOptions cliopt, AbrEncoder *parent) - { - m_id = id; - m_cliopt = cliopt; - m_parent = parent; - if(!(m_cliopt.enableScaler && m_id)) - m_input = m_cliopt.input; - m_param = cliopt.param; - m_inputOver = false; - m_lastIdx = -1; - m_encoder = NULL; - m_scaler = NULL; - m_reader = NULL; - m_ret = 0; - } - - int PassEncoder::init(int &result) - { - if (m_parent->m_numEncodes > 1) - setReuseLevel(); - - if (!(m_cliopt.enableScaler && m_id)) - m_reader = new Reader(m_id, this); - else - { - VideoDesc *src = NULL, *dst = NULL; - dst = new VideoDesc(m_param->sourceWidth, m_param->sourceHeight, m_param->internalCsp, m_param->internalBitDepth); - int dstW = m_parent->m_passEncm_id - 1->m_param->sourceWidth; - int dstH = m_parent->m_passEncm_id - 1->m_param->sourceHeight; - src = new VideoDesc(dstW, dstH, m_param->internalCsp, m_param->internalBitDepth); - if (src != NULL && dst != NULL) - { - m_scaler = new Scaler(0, 1, m_id, src, dst, this); - if (!m_scaler) - { - x265_log(m_param, X265_LOG_ERROR, "\n MALLOC failure in Scaler"); - result = 4; - } - } - } - - /* note: we could try to acquire a different libx265 API here based on - * the profile found during option parsing, but it must be done 
before - * opening an encoder */ - - if (m_param) - m_encoder = m_cliopt.api->encoder_open(m_param); - if (!m_encoder) - { - x265_log(NULL, X265_LOG_ERROR, "x265_encoder_open() failed for Enc, \n"); - m_ret = 2; - return -1;
x265_3.5.tar.gz/source/cmake/FindNeon.cmake
Changed
@@ -1,10 +1,19 @@
 include(FindPackageHandleStandardArgs)
 # Check the version of neon supported by the ARM CPU
-execute_process(COMMAND cat /proc/cpuinfo | grep Features | grep neon
-                OUTPUT_VARIABLE neon_version
-                ERROR_QUIET
-                OUTPUT_STRIP_TRAILING_WHITESPACE)
+if(APPLE)
+    execute_process(COMMAND sysctl -a
+                    COMMAND grep "hw.optional.neon: 1"
+                    OUTPUT_VARIABLE neon_version
+                    ERROR_QUIET
+                    OUTPUT_STRIP_TRAILING_WHITESPACE)
+else()
+    execute_process(COMMAND cat /proc/cpuinfo | grep Features | grep neon
+                    OUTPUT_VARIABLE neon_version
+                    ERROR_QUIET
+                    OUTPUT_STRIP_TRAILING_WHITESPACE)
+endif()
+
 if(neon_version)
     set(CPU_HAS_NEON 1)
 endif()
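Note on the FindNeon.cmake hunk above: the non-Apple branch is written as a filter for a `Features` line that mentions `neon` in /proc/cpuinfo. The check it is aiming for can be sketched in C++ on an in-memory copy of that text. `cpuinfoReportsNeon` is a hypothetical name, and the sample cpuinfo lines in the usage below are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Returns true when any line of the given cpuinfo text contains both
// "Features" and "neon", i.e. the same substring filtering as
// `grep Features | grep neon`. Hypothetical helper, not x265 code.
bool cpuinfoReportsNeon(const std::string& cpuinfoText)
{
    std::istringstream lines(cpuinfoText);
    std::string line;
    while (std::getline(lines, line))
    {
        if (line.find("Features") != std::string::npos &&
            line.find("neon") != std::string::npos)
            return true;
    }
    return false;
}
```

For example, `cpuinfoReportsNeon("Features\t: half thumb vfp edsp neon vfpv3\n")` is true, while a text whose `Features` line lacks `neon` yields false. A non-empty result corresponds to `neon_version` being set, which in turn sets `CPU_HAS_NEON`.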
x265_3.5.tar.gz/source/common/CMakeLists.txt
Changed
@@ -84,35 +84,38 @@
 endif(ENABLE_ASSEMBLY AND X86)
 if(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
-    if(ARM64)
-        if(GCC AND (CMAKE_CXX_FLAGS_RELEASE MATCHES "-O3"))
-            message(STATUS "Detected CXX compiler using -O3 optimization level")
-            add_definitions(-DAUTO_VECTORIZE=1)
-        endif()
-        set(C_SRCS asm-primitives.cpp pixel.h ipfilter8.h)
-
-        # add ARM assembly/intrinsic files here
-        set(A_SRCS asm.S mc-a.S sad-a.S pixel-util.S ipfilter8.S)
-        set(VEC_PRIMITIVES)
+    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
-        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
-        foreach(SRC ${C_SRCS})
-            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} aarch64/${SRC})
-        endforeach()
-    else()
-        set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+    # add ARM assembly/intrinsic files here
+    set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
+    set(VEC_PRIMITIVES)
-        # add ARM assembly/intrinsic files here
-        set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
-        set(VEC_PRIMITIVES)
+    set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+    foreach(SRC ${C_SRCS})
+        set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
+    endforeach()
+    source_group(Assembly FILES ${ASM_PRIMITIVES})
+endif(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
-        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
-        foreach(SRC ${C_SRCS})
-            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
-        endforeach()
+if(ENABLE_ASSEMBLY AND (ARM64 OR CROSS_COMPILE_ARM64))
+    if(GCC AND (CMAKE_CXX_FLAGS_RELEASE MATCHES "-O3"))
+        message(STATUS "Detected CXX compiler using -O3 optimization level")
+        add_definitions(-DAUTO_VECTORIZE=1)
     endif()
+
+    set(C_SRCS asm-primitives.cpp pixel-prim.h pixel-prim.cpp filter-prim.h filter-prim.cpp dct-prim.h dct-prim.cpp loopfilter-prim.cpp loopfilter-prim.h intrapred-prim.cpp arm64-utils.cpp arm64-utils.h fun-decls.h)
+    enable_language(ASM)
+
+    # add ARM assembly/intrinsic files here
+    set(A_SRCS asm.S mc-a.S sad-a.S pixel-util.S p2s.S ipfilter.S blockcopy8.S ssd-a.S)
+    set(VEC_PRIMITIVES)
+
+    set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+    foreach(SRC ${C_SRCS})
+        set(ASM_PRIMITIVES ${ASM_PRIMITIVES} aarch64/${SRC})
+    endforeach()
     source_group(Assembly FILES ${ASM_PRIMITIVES})
-endif(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
+endif(ENABLE_ASSEMBLY AND (ARM64 OR CROSS_COMPILE_ARM64))
 if(POWER)
     set_source_files_properties(version.cpp PROPERTIES COMPILE_FLAGS -DX265_VERSION=${X265_VERSION})
@@ -169,4 +172,6 @@
     scalinglist.cpp scalinglist.h
     quant.cpp quant.h contexts.h
     deblock.cpp deblock.h
-    scaler.cpp scaler.h)
+    scaler.cpp scaler.h
+    ringmem.cpp ringmem.h
+    temporalfilter.cpp temporalfilter.h)
x265_3.5.tar.gz/source/common/aarch64/arm64-utils.cpp
Added
@@ -0,0 +1,300 @@
+#include "common.h"
+#include "x265.h"
+#include "arm64-utils.h"
+#include <arm_neon.h>
+
+#define COPY_16(d,s) *(uint8x16_t *)(d) = *(uint8x16_t *)(s)
+namespace X265_NS
+{
+
+
+
+void transpose8x8(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint8x8_t a0, a1, a2, a3, a4, a5, a6, a7;
+    uint8x8_t b0, b1, b2, b3, b4, b5, b6, b7;
+
+    a0 = *(uint8x8_t *)(src + 0 * sstride);
+    a1 = *(uint8x8_t *)(src + 1 * sstride);
+    a2 = *(uint8x8_t *)(src + 2 * sstride);
+    a3 = *(uint8x8_t *)(src + 3 * sstride);
+    a4 = *(uint8x8_t *)(src + 4 * sstride);
+    a5 = *(uint8x8_t *)(src + 5 * sstride);
+    a6 = *(uint8x8_t *)(src + 6 * sstride);
+    a7 = *(uint8x8_t *)(src + 7 * sstride);
+
+    b0 = vtrn1_u32(a0, a4);
+    b1 = vtrn1_u32(a1, a5);
+    b2 = vtrn1_u32(a2, a6);
+    b3 = vtrn1_u32(a3, a7);
+    b4 = vtrn2_u32(a0, a4);
+    b5 = vtrn2_u32(a1, a5);
+    b6 = vtrn2_u32(a2, a6);
+    b7 = vtrn2_u32(a3, a7);
+
+    a0 = vtrn1_u16(b0, b2);
+    a1 = vtrn1_u16(b1, b3);
+    a2 = vtrn2_u16(b0, b2);
+    a3 = vtrn2_u16(b1, b3);
+    a4 = vtrn1_u16(b4, b6);
+    a5 = vtrn1_u16(b5, b7);
+    a6 = vtrn2_u16(b4, b6);
+    a7 = vtrn2_u16(b5, b7);
+
+    b0 = vtrn1_u8(a0, a1);
+    b1 = vtrn2_u8(a0, a1);
+    b2 = vtrn1_u8(a2, a3);
+    b3 = vtrn2_u8(a2, a3);
+    b4 = vtrn1_u8(a4, a5);
+    b5 = vtrn2_u8(a4, a5);
+    b6 = vtrn1_u8(a6, a7);
+    b7 = vtrn2_u8(a6, a7);
+
+    *(uint8x8_t *)(dst + 0 * dstride) = b0;
+    *(uint8x8_t *)(dst + 1 * dstride) = b1;
+    *(uint8x8_t *)(dst + 2 * dstride) = b2;
+    *(uint8x8_t *)(dst + 3 * dstride) = b3;
+    *(uint8x8_t *)(dst + 4 * dstride) = b4;
+    *(uint8x8_t *)(dst + 5 * dstride) = b5;
+    *(uint8x8_t *)(dst + 6 * dstride) = b6;
+    *(uint8x8_t *)(dst + 7 * dstride) = b7;
+}
+
+
+
+
+
+
+void transpose16x16(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint16x8_t a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aA, aB, aC, aD, aE, aF;
+    uint16x8_t b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, bA, bB, bC, bD, bE, bF;
+    uint16x8_t c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, cA, cB, cC, cD, cE, cF;
+    uint16x8_t d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, dA, dB, dC, dD, dE, dF;
+
+    a0 = *(uint16x8_t *)(src + 0 * sstride);
+    a1 = *(uint16x8_t *)(src + 1 * sstride);
+    a2 = *(uint16x8_t *)(src + 2 * sstride);
+    a3 = *(uint16x8_t *)(src + 3 * sstride);
+    a4 = *(uint16x8_t *)(src + 4 * sstride);
+    a5 = *(uint16x8_t *)(src + 5 * sstride);
+    a6 = *(uint16x8_t *)(src + 6 * sstride);
+    a7 = *(uint16x8_t *)(src + 7 * sstride);
+    a8 = *(uint16x8_t *)(src + 8 * sstride);
+    a9 = *(uint16x8_t *)(src + 9 * sstride);
+    aA = *(uint16x8_t *)(src + 10 * sstride);
+    aB = *(uint16x8_t *)(src + 11 * sstride);
+    aC = *(uint16x8_t *)(src + 12 * sstride);
+    aD = *(uint16x8_t *)(src + 13 * sstride);
+    aE = *(uint16x8_t *)(src + 14 * sstride);
+    aF = *(uint16x8_t *)(src + 15 * sstride);
+
+    b0 = vtrn1q_u64(a0, a8);
+    b1 = vtrn1q_u64(a1, a9);
+    b2 = vtrn1q_u64(a2, aA);
+    b3 = vtrn1q_u64(a3, aB);
+    b4 = vtrn1q_u64(a4, aC);
+    b5 = vtrn1q_u64(a5, aD);
+    b6 = vtrn1q_u64(a6, aE);
+    b7 = vtrn1q_u64(a7, aF);
+    b8 = vtrn2q_u64(a0, a8);
+    b9 = vtrn2q_u64(a1, a9);
+    bA = vtrn2q_u64(a2, aA);
+    bB = vtrn2q_u64(a3, aB);
+    bC = vtrn2q_u64(a4, aC);
+    bD = vtrn2q_u64(a5, aD);
+    bE = vtrn2q_u64(a6, aE);
+    bF = vtrn2q_u64(a7, aF);
+
+    c0 = vtrn1q_u32(b0, b4);
+    c1 = vtrn1q_u32(b1, b5);
+    c2 = vtrn1q_u32(b2, b6);
+    c3 = vtrn1q_u32(b3, b7);
+    c4 = vtrn2q_u32(b0, b4);
+    c5 = vtrn2q_u32(b1, b5);
+    c6 = vtrn2q_u32(b2, b6);
+    c7 = vtrn2q_u32(b3, b7);
+    c8 = vtrn1q_u32(b8, bC);
+    c9 = vtrn1q_u32(b9, bD);
+    cA = vtrn1q_u32(bA, bE);
+    cB = vtrn1q_u32(bB, bF);
+    cC = vtrn2q_u32(b8, bC);
+    cD = vtrn2q_u32(b9, bD);
+    cE = vtrn2q_u32(bA, bE);
+    cF = vtrn2q_u32(bB, bF);
+
+    d0 = vtrn1q_u16(c0, c2);
+    d1 = vtrn1q_u16(c1, c3);
+    d2 = vtrn2q_u16(c0, c2);
+    d3 = vtrn2q_u16(c1, c3);
+    d4 = vtrn1q_u16(c4, c6);
+    d5 = vtrn1q_u16(c5, c7);
+    d6 = vtrn2q_u16(c4, c6);
+    d7 = vtrn2q_u16(c5, c7);
+    d8 = vtrn1q_u16(c8, cA);
+    d9 = vtrn1q_u16(c9, cB);
+    dA = vtrn2q_u16(c8, cA);
+    dB = vtrn2q_u16(c9, cB);
+    dC = vtrn1q_u16(cC, cE);
+    dD = vtrn1q_u16(cD, cF);
+    dE = vtrn2q_u16(cC, cE);
+    dF = vtrn2q_u16(cD, cF);
+
+    *(uint16x8_t *)(dst + 0 * dstride) = vtrn1q_u8(d0, d1);
+    *(uint16x8_t *)(dst + 1 * dstride) = vtrn2q_u8(d0, d1);
+    *(uint16x8_t *)(dst + 2 * dstride) = vtrn1q_u8(d2, d3);
+    *(uint16x8_t *)(dst + 3 * dstride) = vtrn2q_u8(d2, d3);
+    *(uint16x8_t *)(dst + 4 * dstride) = vtrn1q_u8(d4, d5);
+    *(uint16x8_t *)(dst + 5 * dstride) = vtrn2q_u8(d4, d5);
+    *(uint16x8_t *)(dst + 6 * dstride) = vtrn1q_u8(d6, d7);
+    *(uint16x8_t *)(dst + 7 * dstride) = vtrn2q_u8(d6, d7);
+    *(uint16x8_t *)(dst + 8 * dstride) = vtrn1q_u8(d8, d9);
+    *(uint16x8_t *)(dst + 9 * dstride) = vtrn2q_u8(d8, d9);
+    *(uint16x8_t *)(dst + 10 * dstride) = vtrn1q_u8(dA, dB);
+    *(uint16x8_t *)(dst + 11 * dstride) = vtrn2q_u8(dA, dB);
+    *(uint16x8_t *)(dst + 12 * dstride) = vtrn1q_u8(dC, dD);
+    *(uint16x8_t *)(dst + 13 * dstride) = vtrn2q_u8(dC, dD);
+    *(uint16x8_t *)(dst + 14 * dstride) = vtrn1q_u8(dE, dF);
+    *(uint16x8_t *)(dst + 15 * dstride) = vtrn2q_u8(dE, dF);
+
+
+}
+
+
+void transpose32x32(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    //assumption: there is no partial overlap
+    transpose16x16(dst, src, dstride, sstride);
+    transpose16x16(dst + 16 * dstride + 16, src + 16 * sstride + 16, dstride, sstride);
+    if (dst == src)
+    {
+        uint8_t tmp[16 * 16] __attribute__((aligned(64)));
+        transpose16x16(tmp, src + 16, 16, sstride);
+        transpose16x16(dst + 16, src + 16 * sstride, dstride, sstride);
+        for (int i = 0; i < 16; i++)
+        {
+            COPY_16(dst + (16 + i)*dstride, tmp + 16 * i);
+        }
+    }
+    else
+    {
+        transpose16x16(dst + 16 * dstride, src + 16, dstride, sstride);
+        transpose16x16(dst + 16, src + 16 * sstride, dstride, sstride);
+    }
+
+}
+
+
+
+void transpose8x8(uint16_t *dst, const uint16_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint16x8_t a0, a1, a2, a3, a4, a5, a6, a7;
+    uint16x8_t b0, b1, b2, b3, b4, b5, b6, b7;
+
+    a0 = *(uint16x8_t *)(src + 0 * sstride);
+    a1 = *(uint16x8_t *)(src + 1 * sstride);
+    a2 = *(uint16x8_t *)(src + 2 * sstride);
+    a3 = *(uint16x8_t *)(src + 3 * sstride);
+    a4 = *(uint16x8_t *)(src + 4 * sstride);
+    a5 = *(uint16x8_t *)(src + 5 * sstride);
x265_3.5.tar.gz/source/common/aarch64/arm64-utils.h
Added
@@ -0,0 +1,15 @@
+#ifndef __ARM64_UTILS_H__
+#define __ARM64_UTILS_H__
+
+
+namespace X265_NS
+{
+void transpose8x8(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride);
+void transpose16x16(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride);
+void transpose32x32(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride);
+void transpose8x8(uint16_t *dst, const uint16_t *src, intptr_t dstride, intptr_t sstride);
+void transpose16x16(uint16_t *dst, const uint16_t *src, intptr_t dstride, intptr_t sstride);
+void transpose32x32(uint16_t *dst, const uint16_t *src, intptr_t dstride, intptr_t sstride);
+}
+
+#endif
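Review note: the transpose helpers declared in arm64-utils.h take element strides and, judging by the dst == src branch in transpose32x32, are expected to work in place. A plain scalar version can be handy for checking the NEON variants on small inputs. The sketch below is only an illustration; transposeRef and its block-size parameter n are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical scalar reference (not part of the patch): transposes an
// n x n block. Strides are counted in elements, as in the NEON helpers.
template<typename T>
void transposeRef(T *dst, const T *src, int n, intptr_t dstride, intptr_t sstride)
{
    if (dst == src)
    {
        // In-place case: swap across the diagonal. This only makes sense
        // when both views of the buffer use the same stride.
        assert(dstride == sstride);
        for (int y = 0; y < n; y++)
            for (int x = y + 1; x < n; x++)
                std::swap(dst[y * dstride + x], dst[x * dstride + y]);
        return;
    }
    // Out-of-place case: element (y, x) of src lands at (x, y) of dst.
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[x * dstride + y] = src[y * sstride + x];
}
```

Comparing the output of this routine against transpose8x8, transpose16x16 and transpose32x32 for both the aliased and non-aliased cases would exercise the same paths the patch distinguishes.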
x265_3.5.tar.gz/source/common/aarch64/asm-primitives.cpp
Changed
@@ -3,6 +3,7 @@
  *
  * Authors: Hongbin Liu <liuhongbin1@huawei.com>
  *          Yimeng Su <yimeng.su@huawei.com>
+ *          Sebastian Pop <spop@amazon.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -22,11 +23,267 @@
  * For more information, contact us at license @ x265.com.
  *****************************************************************************/
+
 #include "common.h"
 #include "primitives.h"
 #include "x265.h"
 #include "cpu.h"
+extern "C" {
+#include "fun-decls.h"
+}
+
+#define ALL_LUMA_TU_TYPED(prim, fncdef, fname, cpu) \
+    p.cu[BLOCK_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.cu[BLOCK_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.cu[BLOCK_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.cu[BLOCK_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.cu[BLOCK_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu)
+
+#define ALL_LUMA_TU(prim, fname, cpu) ALL_LUMA_TU_TYPED(prim, , fname, cpu)
+
+#define ALL_LUMA_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.pu[LUMA_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.pu[LUMA_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.pu[LUMA_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.pu[LUMA_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.pu[LUMA_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu); \
+    p.pu[LUMA_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.pu[LUMA_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.pu[LUMA_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.pu[LUMA_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.pu[LUMA_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.pu[LUMA_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.pu[LUMA_64x32].prim = fncdef PFX(fname ## _64x32_ ## cpu); \
+    p.pu[LUMA_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.pu[LUMA_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.pu[LUMA_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.pu[LUMA_16x4].prim = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.pu[LUMA_4x16].prim = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.pu[LUMA_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.pu[LUMA_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.pu[LUMA_32x8].prim = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.pu[LUMA_8x32].prim = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.pu[LUMA_64x48].prim = fncdef PFX(fname ## _64x48_ ## cpu); \
+    p.pu[LUMA_48x64].prim = fncdef PFX(fname ## _48x64_ ## cpu); \
+    p.pu[LUMA_64x16].prim = fncdef PFX(fname ## _64x16_ ## cpu); \
+    p.pu[LUMA_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu)
+#define ALL_LUMA_PU(prim, fname, cpu) ALL_LUMA_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_LUMA_PU_T(prim, fname) \
+    p.pu[LUMA_4x4].prim = fname<LUMA_4x4>; \
+    p.pu[LUMA_8x8].prim = fname<LUMA_8x8>; \
+    p.pu[LUMA_16x16].prim = fname<LUMA_16x16>; \
+    p.pu[LUMA_32x32].prim = fname<LUMA_32x32>; \
+    p.pu[LUMA_64x64].prim = fname<LUMA_64x64>; \
+    p.pu[LUMA_8x4].prim = fname<LUMA_8x4>; \
+    p.pu[LUMA_4x8].prim = fname<LUMA_4x8>; \
+    p.pu[LUMA_16x8].prim = fname<LUMA_16x8>; \
+    p.pu[LUMA_8x16].prim = fname<LUMA_8x16>; \
+    p.pu[LUMA_16x32].prim = fname<LUMA_16x32>; \
+    p.pu[LUMA_32x16].prim = fname<LUMA_32x16>; \
+    p.pu[LUMA_64x32].prim = fname<LUMA_64x32>; \
+    p.pu[LUMA_32x64].prim = fname<LUMA_32x64>; \
+    p.pu[LUMA_16x12].prim = fname<LUMA_16x12>; \
+    p.pu[LUMA_12x16].prim = fname<LUMA_12x16>; \
+    p.pu[LUMA_16x4].prim = fname<LUMA_16x4>; \
+    p.pu[LUMA_4x16].prim = fname<LUMA_4x16>; \
+    p.pu[LUMA_32x24].prim = fname<LUMA_32x24>; \
+    p.pu[LUMA_24x32].prim = fname<LUMA_24x32>; \
+    p.pu[LUMA_32x8].prim = fname<LUMA_32x8>; \
+    p.pu[LUMA_8x32].prim = fname<LUMA_8x32>; \
+    p.pu[LUMA_64x48].prim = fname<LUMA_64x48>; \
+    p.pu[LUMA_48x64].prim = fname<LUMA_48x64>; \
+    p.pu[LUMA_64x16].prim = fname<LUMA_64x16>; \
+    p.pu[LUMA_16x64].prim = fname<LUMA_16x64>
+
+#define ALL_CHROMA_420_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].prim = fncdef PFX(fname ## _4x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].prim = fncdef PFX(fname ## _2x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].prim = fncdef PFX(fname ## _8x6_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].prim = fncdef PFX(fname ## _6x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].prim = fncdef PFX(fname ## _8x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].prim = fncdef PFX(fname ## _2x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].prim = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].prim = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].prim = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].prim = fncdef PFX(fname ## _8x32_ ## cpu)
+#define ALL_CHROMA_420_PU(prim, fname, cpu) ALL_CHROMA_420_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_420_4x4_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].prim = fncdef PFX(fname ## _8x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].prim = fncdef PFX(fname ## _8x6_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].prim = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].prim = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].prim = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].prim = fncdef PFX(fname ## _8x32_ ## cpu)
+#define ALL_CHROMA_420_4x4_PU(prim, fname, cpu) ALL_CHROMA_420_4x4_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_422_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].prim = fncdef PFX(fname ## _2x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].prim = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].prim = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].prim = fncdef PFX(fname ## _8x12_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].prim = fncdef PFX(fname ## _6x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].prim = fncdef PFX(fname ## _2x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].prim = fncdef PFX(fname ## _16x24_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].prim = fncdef PFX(fname ## _12x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].prim = fncdef PFX(fname ## _4x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].prim = fncdef PFX(fname ## _32x48_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].prim = fncdef PFX(fname ## _24x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].prim = fncdef PFX(fname ## _8x64_ ## cpu)
+#define ALL_CHROMA_422_PU(prim, fname, cpu) ALL_CHROMA_422_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_422_PU_TYPED_1(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].prim = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].prim = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].prim = fncdef PFX(fname ## _8x12_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].prim = fncdef PFX(fname ## _16x24_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].prim = fncdef PFX(fname ## _12x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].prim = fncdef PFX(fname ## _4x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].prim = fncdef PFX(fname ## _32x48_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].prim = fncdef PFX(fname ## _24x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].prim = fncdef PFX(fname ## _8x64_ ## cpu)
+#define ALL_CHROMA_422_PU_1(prim, fname, cpu) ALL_CHROMA_422_PU_TYPED_1(prim, , fname, cpu)
+
+#define ALL_CHROMA_444_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I444].pu[LUMA_4x4].prim = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x8].prim = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x4].prim = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_4x8].prim = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x8].prim = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x16].prim = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_64x32].prim = fncdef PFX(fname ## _64x32_ ## cpu); \
x265_3.5.tar.gz/source/common/aarch64/asm.S
Changed
@@ -1,7 +1,8 @@ /***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc + * Copyright (C) 2020-2021 MulticoreWare, Inc * * Authors: Hongbin Liu <liuhongbin1@huawei.com> + * Sebastian Pop <spop@amazon.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -23,32 +24,64 @@ .arch armv8-a +#define PFX3(prefix, name) prefix ## _ ## name +#define PFX2(prefix, name) PFX3(prefix, name) +#define PFX(name) PFX2(X265_NS, name) + +#ifdef __APPLE__ +#define PREFIX 1 +#endif + #ifdef PREFIX #define EXTERN_ASM _ +#define HAVE_AS_FUNC 0 +#elif defined __clang__ +#define EXTERN_ASM +#define HAVE_AS_FUNC 0 +#define PREFIX 1 #else #define EXTERN_ASM +#define HAVE_AS_FUNC 1 #endif #ifdef __ELF__ #define ELF #else +#ifdef PREFIX +#define ELF # +#else #define ELF @ #endif - -#define HAVE_AS_FUNC 1 +#endif #if HAVE_AS_FUNC #define FUNC #else +#ifdef PREFIX +#define FUNC # +#else #define FUNC @ #endif +#endif + +#ifdef __APPLE__ +.macro endfunc +ELF .size \name, . - \name +FUNC .endfunc +.endm +#endif .macro function name, export=1 +#ifdef __APPLE__ + .global \name + endfunc +#else .macro endfunc ELF .size \name, . - \name FUNC .endfunc .purgem endfunc .endm +#endif .align 2 .if \export == 1 .global EXTERN_ASM\name @@ -64,6 +97,83 @@ .endif .endm +.macro const name, align=2 + .macro endconst +ELF .size \name, . 
- \name + .purgem endconst + .endm +#ifdef __MACH__ + .const_data +#else + .section .rodata +#endif + .align \align +\name: +.endm + +.macro movrel rd, val, offset=0 +#if defined(__APPLE__) + .if \offset < 0 + adrp \rd, \val@PAGE + add \rd, \rd, \val@PAGEOFF + sub \rd, \rd, -(\offset) + .else + adrp \rd, \val+(\offset)@PAGE + add \rd, \rd, \val+(\offset)@PAGEOFF + .endif +#elif defined(PIC) && defined(_WIN32) + .if \offset < 0 + adrp \rd, \val + add \rd, \rd, :lo12:\val + sub \rd, \rd, -(\offset) + .else + adrp \rd, \val+(\offset) + add \rd, \rd, :lo12:\val+(\offset) + .endif +#elif defined(PIC) + adrp \rd, \val+(\offset) + add \rd, \rd, :lo12:\val+(\offset) +#else + ldr \rd, =\val+\offset +#endif +.endm #define FENC_STRIDE 64 #define FDEC_STRIDE 32 + +.macro SUMSUB_AB sum, diff, a, b + add \sum, \a, \b + sub \diff, \a, \b +.endm + +.macro SUMSUB_ABCD s1, d1, s2, d2, a, b, c, d + SUMSUB_AB \s1, \d1, \a, \b + SUMSUB_AB \s2, \d2, \c, \d +.endm + +.macro HADAMARD4_V r1, r2, r3, r4, t1, t2, t3, t4 + SUMSUB_ABCD \t1, \t2, \t3, \t4, \r1, \r2, \r3, \r4 + SUMSUB_ABCD \r1, \r3, \r2, \r4, \t1, \t3, \t2, \t4 +.endm + +.macro ABS2 a b + abs \a, \a + abs \b, \b +.endm + +.macro ABS8 v0, v1, v2, v3, v4, v5, v6, v7 + ABS2 \v0, \v1 + ABS2 \v2, \v3 + ABS2 \v4, \v5 + ABS2 \v6, \v7 +.endm + +.macro vtrn t1, t2, s1, s2 + trn1 \t1, \s1, \s2 + trn2 \t2, \s1, \s2 +.endm + +.macro trn4 t1, t2, t3, t4, s1, s2, s3, s4 + vtrn \t1, \t2, \s1, \s2 + vtrn \t3, \t4, \s3, \s4 +.endm
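Review note: the SUMSUB_AB, SUMSUB_ABCD and HADAMARD4_V macros added to asm.S chain sum/difference butterflies. A scalar sketch that follows the same operand order may help when reading the macro expansion; sumsubAB and hadamard4 are hypothetical names used only for this illustration, and each array element stands in for one vector register.

```cpp
#include <array>
#include <cstdint>

// Scalar stand-in for SUMSUB_AB: sum = a + b, diff = a - b.
static inline void sumsubAB(int32_t &sum, int32_t &diff, int32_t a, int32_t b)
{
    sum = a + b;
    diff = a - b;
}

// Follows the operand order of HADAMARD4_V r1, r2, r3, r4, t1, t2, t3, t4.
std::array<int32_t, 4> hadamard4(std::array<int32_t, 4> r)
{
    int32_t t1, t2, t3, t4;
    // SUMSUB_ABCD t1, t2, t3, t4, r1, r2, r3, r4
    sumsubAB(t1, t2, r[0], r[1]);
    sumsubAB(t3, t4, r[2], r[3]);
    // SUMSUB_ABCD r1, r3, r2, r4, t1, t3, t2, t4
    sumsubAB(r[0], r[2], t1, t3);
    sumsubAB(r[1], r[3], t2, t4);
    return r;
}
```

With the input {1, 2, 3, 4} the first stage gives t = {3, -1, 7, -1} and the second stage gives r = {10, -2, -4, 0}.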
x265_3.5.tar.gz/source/common/aarch64/blockcopy8.S
Added
@@ -0,0 +1,1322 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
+.section .rodata
+#endif
+
+.align 4
+
+.text
+
+/* void blockcopy_sp(pixel* a, intptr_t stridea, const int16_t* b, intptr_t strideb)
+ *
+ * r0 - a
+ * r1 - stridea
+ * r2 - b
+ * r3 - strideb */
+function PFX(blockcopy_sp_4x4_neon)
+    lsl x3, x3, #1
+.rept 2
+    ld1 {v0.8h}, [x2], x3
+    ld1 {v1.8h}, [x2], x3
+    xtn v0.8b, v0.8h
+    xtn v1.8b, v1.8h
+    st1 {v0.s}[0], [x0], x1
+    st1 {v1.s}[0], [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_8x8_neon)
+    lsl x3, x3, #1
+.rept 4
+    ld1 {v0.8h}, [x2], x3
+    ld1 {v1.8h}, [x2], x3
+    xtn v0.8b, v0.8h
+    xtn v1.8b, v1.8h
+    st1 {v0.d}[0], [x0], x1
+    st1 {v1.d}[0], [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_16x16_neon)
+    lsl x3, x3, #1
+    movrel x11, xtn_xtn2_table
+    ld1 {v31.16b}, [x11]
+.rept 8
+    ld1 {v0.8h-v1.8h}, [x2], x3
+    ld1 {v2.8h-v3.8h}, [x2], x3
+    tbl v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl v1.16b, {v2.16b,v3.16b}, v31.16b
+    st1 {v0.16b}, [x0], x1
+    st1 {v1.16b}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_32x32_neon)
+    mov w12, #4
+    lsl x3, x3, #1
+    movrel x11, xtn_xtn2_table
+    ld1 {v31.16b}, [x11]
+.loop_csp32:
+    sub w12, w12, #1
+.rept 4
+    ld1 {v0.8h-v3.8h}, [x2], x3
+    ld1 {v4.8h-v7.8h}, [x2], x3
+    tbl v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl v1.16b, {v2.16b,v3.16b}, v31.16b
+    tbl v2.16b, {v4.16b,v5.16b}, v31.16b
+    tbl v3.16b, {v6.16b,v7.16b}, v31.16b
+    st1 {v0.16b-v1.16b}, [x0], x1
+    st1 {v2.16b-v3.16b}, [x0], x1
+.endr
+    cbnz w12, .loop_csp32
+    ret
+endfunc
+
+function PFX(blockcopy_sp_64x64_neon)
+    mov w12, #16
+    lsl x3, x3, #1
+    sub x3, x3, #64
+    movrel x11, xtn_xtn2_table
+    ld1 {v31.16b}, [x11]
+.loop_csp64:
+    sub w12, w12, #1
+.rept 4
+    ld1 {v0.8h-v3.8h}, [x2], #64
+    ld1 {v4.8h-v7.8h}, [x2], x3
+    tbl v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl v1.16b, {v2.16b,v3.16b}, v31.16b
+    tbl v2.16b, {v4.16b,v5.16b}, v31.16b
+    tbl v3.16b, {v6.16b,v7.16b}, v31.16b
+    st1 {v0.16b-v3.16b}, [x0], x1
+.endr
+    cbnz w12, .loop_csp64
+    ret
+endfunc
+
+// void blockcopy_ps(int16_t* a, intptr_t stridea, const pixel* b, intptr_t strideb)
+function PFX(blockcopy_ps_4x4_neon)
+    lsl x1, x1, #1
+.rept 2
+    ld1 {v0.8b}, [x2], x3
+    ld1 {v1.8b}, [x2], x3
+    uxtl v0.8h, v0.8b
+    uxtl v1.8h, v1.8b
+    st1 {v0.4h}, [x0], x1
+    st1 {v1.4h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_8x8_neon)
+    lsl x1, x1, #1
+.rept 4
+    ld1 {v0.8b}, [x2], x3
+    ld1 {v1.8b}, [x2], x3
+    uxtl v0.8h, v0.8b
+    uxtl v1.8h, v1.8b
+    st1 {v0.8h}, [x0], x1
+    st1 {v1.8h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_16x16_neon)
+    lsl x1, x1, #1
+.rept 8
+    ld1 {v4.16b}, [x2], x3
+    ld1 {v5.16b}, [x2], x3
+    uxtl v0.8h, v4.8b
+    uxtl2 v1.8h, v4.16b
+    uxtl v2.8h, v5.8b
+    uxtl2 v3.8h, v5.16b
+    st1 {v0.8h-v1.8h}, [x0], x1
+    st1 {v2.8h-v3.8h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_32x32_neon)
+    lsl x1, x1, #1
+    mov w12, #4
+.loop_cps32:
+    sub w12, w12, #1
+.rept 4
+    ld1 {v16.16b-v17.16b}, [x2], x3
+    ld1 {v18.16b-v19.16b}, [x2], x3
+    uxtl v0.8h, v16.8b
+    uxtl2 v1.8h, v16.16b
+    uxtl v2.8h, v17.8b
+    uxtl2 v3.8h, v17.16b
+    uxtl v4.8h, v18.8b
+    uxtl2 v5.8h, v18.16b
+    uxtl v6.8h, v19.8b
+    uxtl2 v7.8h, v19.16b
+    st1 {v0.8h-v3.8h}, [x0], x1
+    st1 {v4.8h-v7.8h}, [x0], x1
+.endr
+    cbnz w12, .loop_cps32
+    ret
+endfunc
+
+function PFX(blockcopy_ps_64x64_neon)
+    lsl x1, x1, #1
+    sub x1, x1, #64
+    mov w12, #16
+.loop_cps64:
+    sub w12, w12, #1
+.rept 4
+    ld1 {v16.16b-v19.16b}, [x2], x3
+    uxtl v0.8h, v16.8b
+    uxtl2 v1.8h, v16.16b
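Review note: the blockcopy_sp_* routines narrow int16_t samples to pixels (xtn, or tbl with a narrowing table) and the blockcopy_ps_* routines widen pixels to int16_t (uxtl/uxtl2). The lsl by 1 on the int16_t stride suggests the strides arrive in elements and are converted to bytes inside the assembly. A scalar sketch of both directions, assuming an 8-bit build; blockcopySpRef, blockcopyPsRef and the explicit bx/by size parameters are hypothetical and not taken from the patch.

```cpp
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit pixels (not a HIGH_BIT_DEPTH build)

// Hypothetical scalar counterpart of blockcopy_sp: int16_t -> pixel.
// Strides are in elements; the cast keeps the low 8 bits, like xtn.
void blockcopySpRef(pixel *a, intptr_t stridea, const int16_t *b, intptr_t strideb,
                    int bx, int by)
{
    for (int y = 0; y < by; y++)
    {
        for (int x = 0; x < bx; x++)
            a[x] = (pixel)b[x];
        a += stridea;
        b += strideb;
    }
}

// Hypothetical scalar counterpart of blockcopy_ps: pixel -> int16_t,
// zero-extended like uxtl.
void blockcopyPsRef(int16_t *a, intptr_t stridea, const pixel *b, intptr_t strideb,
                    int bx, int by)
{
    for (int y = 0; y < by; y++)
    {
        for (int x = 0; x < bx; x++)
            a[x] = (int16_t)b[x];
        a += stridea;
        b += strideb;
    }
}
```

For values already in the 0..255 range, copying through the narrowing routine and back through the widening one should return the original samples.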
x265_3.5.tar.gz/source/common/aarch64/dct-prim.cpp
Added
@@ -0,0 +1,947 @@
+#include "dct-prim.h"
+
+
+#if HAVE_NEON
+
+#include <arm_neon.h>
+
+
+namespace
+{
+using namespace X265_NS;
+
+
+static int16x8_t rev16(const int16x8_t a)
+{
+    static const int8x16_t tbl = {14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1};
+    return vqtbx1q_u8(a, a, tbl);
+}
+
+static int32x4_t rev32(const int32x4_t a)
+{
+    static const int8x16_t tbl = {12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3};
+    return vqtbx1q_u8(a, a, tbl);
+}
+
+static void transpose_4x4x16(int16x4_t &x0, int16x4_t &x1, int16x4_t &x2, int16x4_t &x3)
+{
+    int16x4_t s0, s1, s2, s3;
+    s0 = vtrn1_s32(x0, x2);
+    s1 = vtrn1_s32(x1, x3);
+    s2 = vtrn2_s32(x0, x2);
+    s3 = vtrn2_s32(x1, x3);
+
+    x0 = vtrn1_s16(s0, s1);
+    x1 = vtrn2_s16(s0, s1);
+    x2 = vtrn1_s16(s2, s3);
+    x3 = vtrn2_s16(s2, s3);
+}
+
+
+
+static int scanPosLast_opt(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag,
+                           uint8_t *coeffNum, int numSig, const uint16_t * /*scanCG4x4*/, const int /*trSize*/)
+{
+
+    // This is an optimized function for scanPosLast, which removes the rmw dependency, once integrated into mainline x265, should replace reference implementation
+    // For clarity, left the original reference code in comments
+    int scanPosLast = 0;
+
+    uint16_t cSign = 0;
+    uint16_t cFlag = 0;
+    uint8_t cNum = 0;
+
+    uint32_t prevcgIdx = 0;
+    do
+    {
+        const uint32_t cgIdx = (uint32_t)scanPosLast >> MLS_CG_SIZE;
+
+        const uint32_t posLast = scan[scanPosLast];
+
+        const int curCoeff = coeff[posLast];
+        const uint32_t isNZCoeff = (curCoeff != 0);
+        /*
+        NOTE: the new algorithm is complicated, so I keep reference code here
+        uint32_t posy = posLast >> log2TrSize;
+        uint32_t posx = posLast - (posy << log2TrSize);
+        uint32_t blkIdx0 = ((posy >> MLS_CG_LOG2_SIZE) << codingParameters.log2TrSizeCG) + (posx >> MLS_CG_LOG2_SIZE);
+        const uint32_t blkIdx = ((posLast >> (2 * MLS_CG_LOG2_SIZE)) & ~maskPosXY) + ((posLast >> MLS_CG_LOG2_SIZE) & maskPosXY);
+        sigCoeffGroupFlag64 |= ((uint64_t)isNZCoeff << blkIdx);
+        */
+
+        // get L1 sig map
+        numSig -= isNZCoeff;
+
+        if (scanPosLast % (1 << MLS_CG_SIZE) == 0)
+        {
+            coeffSign[prevcgIdx] = cSign;
+            coeffFlag[prevcgIdx] = cFlag;
+            coeffNum[prevcgIdx] = cNum;
+            cSign = 0;
+            cFlag = 0;
+            cNum = 0;
+        }
+        // TODO: optimize by instruction BTS
+        cSign += (uint16_t)(((curCoeff < 0) ? 1 : 0) << cNum);
+        cFlag = (cFlag << 1) + (uint16_t)isNZCoeff;
+        cNum += (uint8_t)isNZCoeff;
+        prevcgIdx = cgIdx;
+        scanPosLast++;
+    }
+    while (numSig > 0);
+
+    coeffSign[prevcgIdx] = cSign;
+    coeffFlag[prevcgIdx] = cFlag;
+    coeffNum[prevcgIdx] = cNum;
+    return scanPosLast - 1;
+}
+
+
+#if (MLS_CG_SIZE == 4)
+template<int log2TrSize>
+static void nonPsyRdoQuant_neon(int16_t *m_resiDctCoeff, int64_t *costUncoded, int64_t *totalUncodedCost,
+                                int64_t *totalRdCost, uint32_t blkPos)
+{
+    const int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH -
+                               log2TrSize; /* Represents scaling through forward transform */
+    const int scaleBits = SCALE_BITS - 2 * transformShift;
+    const uint32_t trSize = 1 << log2TrSize;
+
+    int64x2_t vcost_sum_0 = vdupq_n_s64(0);
+    int64x2_t vcost_sum_1 = vdupq_n_s64(0);
+    for (int y = 0; y < MLS_CG_SIZE; y++)
+    {
+        int16x4_t in = *(int16x4_t *)&m_resiDctCoeff[blkPos];
+        int32x4_t mul = vmull_s16(in, in);
+        int64x2_t cost0, cost1;
+        cost0 = vshll_n_s32(vget_low_s32(mul), scaleBits);
+        cost1 = vshll_high_n_s32(mul, scaleBits);
+        *(int64x2_t *)&costUncoded[blkPos + 0] = cost0;
+        *(int64x2_t *)&costUncoded[blkPos + 2] = cost1;
+        vcost_sum_0 = vaddq_s64(vcost_sum_0, cost0);
+        vcost_sum_1 = vaddq_s64(vcost_sum_1, cost1);
+        blkPos += trSize;
+    }
+    int64_t sum = vaddvq_s64(vaddq_s64(vcost_sum_0, vcost_sum_1));
+    *totalUncodedCost += sum;
+    *totalRdCost += sum;
+}
+
+template<int log2TrSize>
+static void psyRdoQuant_neon(int16_t *m_resiDctCoeff, int16_t *m_fencDctCoeff, int64_t *costUncoded,
+                             int64_t *totalUncodedCost, int64_t *totalRdCost, int64_t *psyScale, uint32_t blkPos)
+{
+    const int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH -
+                               log2TrSize; /* Represents scaling through forward transform */
+    const int scaleBits = SCALE_BITS - 2 * transformShift;
+    const uint32_t trSize = 1 << log2TrSize;
+    //using preprocessor to bypass clang bug
+    const int max = X265_MAX(0, (2 * transformShift + 1));
+
+    int64x2_t vcost_sum_0 = vdupq_n_s64(0);
+    int64x2_t vcost_sum_1 = vdupq_n_s64(0);
+    int32x4_t vpsy = vdupq_n_s32(*psyScale);
+    for (int y = 0; y < MLS_CG_SIZE; y++)
+    {
+        int32x4_t signCoef = vmovl_s16(*(int16x4_t *)&m_resiDctCoeff[blkPos]);
+        int32x4_t predictedCoef = vsubq_s32(vmovl_s16(*(int16x4_t *)&m_fencDctCoeff[blkPos]), signCoef);
+        int64x2_t cost0, cost1;
+        cost0 = vmull_s32(vget_low_s32(signCoef), vget_low_s32(signCoef));
+        cost1 = vmull_high_s32(signCoef, signCoef);
+        cost0 = vshlq_n_s64(cost0, scaleBits);
+        cost1 = vshlq_n_s64(cost1, scaleBits);
+        int64x2_t neg0 = vmull_s32(vget_low_s32(predictedCoef), vget_low_s32(vpsy));
+        int64x2_t neg1 = vmull_high_s32(predictedCoef, vpsy);
+        if (max > 0)
+        {
+            int64x2_t shift = vdupq_n_s64(-max);
+            neg0 = vshlq_s64(neg0, shift);
+            neg1 = vshlq_s64(neg1, shift);
+        }
+        cost0 = vsubq_s64(cost0, neg0);
+        cost1 = vsubq_s64(cost1, neg1);
+        *(int64x2_t *)&costUncoded[blkPos + 0] = cost0;
+        *(int64x2_t *)&costUncoded[blkPos + 2] = cost1;
+        vcost_sum_0 = vaddq_s64(vcost_sum_0, cost0);
+        vcost_sum_1 = vaddq_s64(vcost_sum_1, cost1);
+
+        blkPos += trSize;
+    }
+    int64_t sum = vaddvq_s64(vaddq_s64(vcost_sum_0, vcost_sum_1));
+    *totalUncodedCost += sum;
+    *totalRdCost += sum;
+}
+
+#else
+#error "MLS_CG_SIZE must be 4 for neon version"
+#endif
+
+
+
+template<int trSize>
+int count_nonzero_neon(const int16_t *quantCoeff)
+{
+    X265_CHECK(((intptr_t)quantCoeff & 15) == 0, "quant buffer not aligned\n");
+    int count = 0;
+    int16x8_t vcount = vdupq_n_s16(0);
+    const int numCoeff = trSize * trSize;
+    int i = 0;
+    for (; (i + 8) <= numCoeff; i += 8)
+    {
+        int16x8_t in = *(int16x8_t *)&quantCoeff[i];
+        vcount = vaddq_s16(vcount, vtstq_s16(in, in));
+    }
+    for (; i < numCoeff; i++)
+    {
+        count += quantCoeff[i] != 0;
+    }
+
+    return count - vaddvq_s16(vcount);
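Review note: count_nonzero_neon relies on vtstq_s16(in, in) producing an all-ones lane (which reads as -1 in a signed 16-bit lane) for every non-zero coefficient, so the vector accumulator ends up holding minus the number of non-zero values and the final "count - vaddvq_s16(vcount)" flips the sign. The scalar model below illustrates that arithmetic with a single accumulator instead of eight lanes; tstLane and countNonzeroModel are hypothetical names, not functions from the patch.

```cpp
#include <cstdint>

// Scalar model of one vtstq_s16(v, v) lane: all ones (-1 as int16_t)
// when (v & v) != 0, otherwise 0.
static inline int16_t tstLane(int16_t v)
{
    return (v & v) ? int16_t(-1) : int16_t(0);
}

// Hypothetical single-accumulator model of the vector loop.
int countNonzeroModel(const int16_t *coeff, int numCoeff)
{
    int16_t acc = 0;
    for (int i = 0; i < numCoeff; i++)
        acc = int16_t(acc + tstLane(coeff[i])); // each non-zero adds -1
    return 0 - acc; // mirrors "count - vaddvq_s16(vcount)" with count == 0
}
```

One consequence of the 16-bit accumulation is that the total must stay within the int16_t range; for a 32x32 block that is at most 1024 non-zero coefficients, which fits.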
x265_3.5.tar.gz/source/common/aarch64/dct-prim.h
Added
@@ -0,0 +1,19 @@
+#ifndef __DCT_PRIM_NEON_H__
+#define __DCT_PRIM_NEON_H__
+
+
+#include "common.h"
+#include "primitives.h"
+#include "contexts.h" // costCoeffNxN_c
+#include "threading.h" // CLZ
+
+namespace X265_NS
+{
+// x265 private namespace
+void setupDCTPrimitives_neon(EncoderPrimitives &p);
+};
+
+
+
+#endif
+
x265_3.5.tar.gz/source/common/aarch64/filter-prim.cpp
Added
@@ -0,0 +1,973 @@ +#if HAVE_NEON + +#include "filter-prim.h" +#include <arm_neon.h> + +namespace +{ + +using namespace X265_NS; + + +template<int width, int height> +void filterPixelToShort_neon(const pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride) +{ + const int shift = IF_INTERNAL_PREC - X265_DEPTH; + int row, col; + const int16x8_t off = vdupq_n_s16(IF_INTERNAL_OFFS); + for (row = 0; row < height; row++) + { + + for (col = 0; col < width; col += 8) + { + int16x8_t in; + +#if HIGH_BIT_DEPTH + in = *(int16x8_t *)&src[col]; +#else + in = vmovl_u8(*(uint8x8_t *)&src[col]); +#endif + + int16x8_t tmp = vshlq_n_s16(in, shift); + tmp = vsubq_s16(tmp, off); + *(int16x8_t *)&dst[col] = tmp; + + } + + src += srcStride; + dst += dstStride; + } +} + + +template<int N, int width, int height> +void interp_horiz_pp_neon(const pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx) +{ + const int16_t *coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx]; + int headRoom = IF_FILTER_PREC; + int offset = (1 << (headRoom - 1)); + uint16_t maxVal = (1 << X265_DEPTH) - 1; + int cStride = 1; + + src -= (N / 2 - 1) * cStride; + int16x8_t vc; + vc = *(int16x8_t *)coeff; + int16x4_t low_vc = vget_low_s16(vc); + int16x4_t high_vc = vget_high_s16(vc); + + const int32x4_t voffset = vdupq_n_s32(offset); + const int32x4_t vhr = vdupq_n_s32(-headRoom); + + int row, col; + for (row = 0; row < height; row++) + { + for (col = 0; col < width; col += 8) + { + int32x4_t vsum1, vsum2; + + int16x8_t input[N]; + + for (int i = 0; i < N; i++) + { +#if HIGH_BIT_DEPTH + input[i] = *(int16x8_t *)&src[col + i]; +#else + input[i] = vmovl_u8(*(uint8x8_t *)&src[col + i]); +#endif + } + vsum1 = voffset; + vsum2 = voffset; + + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[0]), low_vc, 0); + vsum2 = vmlal_high_lane_s16(vsum2, input[0], low_vc, 0); + + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[1]), low_vc, 1); + vsum2 = vmlal_high_lane_s16(vsum2, input[1], low_vc, 1); + + vsum1 
= vmlal_lane_s16(vsum1, vget_low_s16(input[2]), low_vc, 2); + vsum2 = vmlal_high_lane_s16(vsum2, input[2], low_vc, 2); + + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[3]), low_vc, 3); + vsum2 = vmlal_high_lane_s16(vsum2, input[3], low_vc, 3); + + if (N == 8) + { + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[4]), high_vc, 0); + vsum2 = vmlal_high_lane_s16(vsum2, input[4], high_vc, 0); + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[5]), high_vc, 1); + vsum2 = vmlal_high_lane_s16(vsum2, input[5], high_vc, 1); + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[6]), high_vc, 2); + vsum2 = vmlal_high_lane_s16(vsum2, input[6], high_vc, 2); + vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[7]), high_vc, 3); + vsum2 = vmlal_high_lane_s16(vsum2, input[7], high_vc, 3); + + } + + vsum1 = vshlq_s32(vsum1, vhr); + vsum2 = vshlq_s32(vsum2, vhr); + + int16x8_t vsum = vuzp1q_s16(vsum1, vsum2); + vsum = vminq_s16(vsum, vdupq_n_s16(maxVal)); + vsum = vmaxq_s16(vsum, vdupq_n_s16(0)); +#if HIGH_BIT_DEPTH + *(int16x8_t *)&dst[col] = vsum; +#else + uint8x16_t usum = vuzp1q_u8(vsum, vsum); + *(uint8x8_t *)&dst[col] = vget_low_u8(usum); +#endif + + } + + src += srcStride; + dst += dstStride; + } +} + +#if HIGH_BIT_DEPTH + +template<int N, int width, int height> +void interp_horiz_ps_neon(const uint16_t *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, + int isRowExt) +{ + const int16_t *coeff = (N == 4) ? 
g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx]; + const int headRoom = IF_INTERNAL_PREC - X265_DEPTH; + const int shift = IF_FILTER_PREC - headRoom; + const int offset = (unsigned) - IF_INTERNAL_OFFS << shift; + + int blkheight = height; + src -= N / 2 - 1; + + if (isRowExt) + { + src -= (N / 2 - 1) * srcStride; + blkheight += N - 1; + } + int32x4_t vc0 = vmovl_s16(*(int16x4_t *)coeff); + int32x4_t vc1; + + if (N == 8) + { + vc1 = vmovl_s16(*(int16x4_t *)(coeff + 4)); + } + + const int32x4_t voffset = vdupq_n_s32(offset); + const int32x4_t vhr = vdupq_n_s32(-shift); + + int row, col; + for (row = 0; row < blkheight; row++) + { + for (col = 0; col < width; col += 4) + { + int32x4_t vsum; + + int32x4_t input[N]; + + for (int i = 0; i < N; i++) + { + input[i] = vmovl_s16(*(int16x4_t *)&src[col + i]); + } + vsum = voffset; + vsum = vmlaq_laneq_s32(vsum, (input[0]), vc0, 0); + vsum = vmlaq_laneq_s32(vsum, (input[1]), vc0, 1); + vsum = vmlaq_laneq_s32(vsum, (input[2]), vc0, 2); + vsum = vmlaq_laneq_s32(vsum, (input[3]), vc0, 3); + + + if (N == 8) + { + vsum = vmlaq_laneq_s32(vsum, (input[4]), vc1, 0); + vsum = vmlaq_laneq_s32(vsum, (input[5]), vc1, 1); + vsum = vmlaq_laneq_s32(vsum, (input[6]), vc1, 2); + vsum = vmlaq_laneq_s32(vsum, (input[7]), vc1, 3); + + } + + vsum = vshlq_s32(vsum, vhr); + *(int16x4_t *)&dst[col] = vmovn_u32(vsum); + } + + src += srcStride; + dst += dstStride; + } +} + + +#else + +template<int N, int width, int height> +void interp_horiz_ps_neon(const uint8_t *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx, + int isRowExt)
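Note on filterPixelToShort_neon above: each vector lane widens a pixel, shifts it left by IF_INTERNAL_PREC - X265_DEPTH and subtracts IF_INTERNAL_OFFS. A minimal scalar sketch of one lane follows. The constant values are assumptions for an 8-bit build (precision 14, offset 1 << 13) and the names kDepth, kInternalPrec, kInternalOffs and pixelToShort are hypothetical, not taken from the patch.

```cpp
#include <cstdint>

// Assumed values for an 8-bit build; the real ones come from the x265 headers.
constexpr int kDepth = 8;                                // stands in for X265_DEPTH
constexpr int kInternalPrec = 14;                        // stands in for IF_INTERNAL_PREC
constexpr int kInternalOffs = 1 << (kInternalPrec - 1);  // stands in for IF_INTERNAL_OFFS

// Scalar equivalent of one NEON lane: widen, shift left, subtract the offset.
inline int16_t pixelToShort(uint8_t px)
{
    const int shift = kInternalPrec - kDepth;
    return static_cast<int16_t>((px << shift) - kInternalOffs);
}
```

Under these assumptions the 8-bit range 0..255 maps onto a signed range centred near zero, which is why the result fits in int16_t.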
x265_3.5.tar.gz/source/common/aarch64/filter-prim.h
Added
@@ -0,0 +1,21 @@ +#ifndef _FILTER_PRIM_ARM64_H__ +#define _FILTER_PRIM_ARM64_H__ + + +#include "common.h" +#include "slicetype.h" // LOWRES_COST_MASK +#include "primitives.h" +#include "x265.h" + + +namespace X265_NS +{ + + +void setupFilterPrimitives_neon(EncoderPrimitives &p); + +}; + + +#endif +
x265_3.5.tar.gz/source/common/aarch64/fun-decls.h
Added
@@ -0,0 +1,224 @@ +/***************************************************************************** + * Copyright (C) 2021 MulticoreWare, Inc + * + * Authors: Sebastian Pop <spop@amazon.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. + * For more information, contact us at license @ x265.com. + *****************************************************************************/ + +#define FUNCDEF_TU(ret, name, cpu, ...) \ + ret PFX(name ## _4x4_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _8x8_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _16x16_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _32x32_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _64x64_ ## cpu(__VA_ARGS__)) + +#define FUNCDEF_TU_S(ret, name, cpu, ...) \ + ret PFX(name ## _4_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _8_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _16_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _32_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## _64_ ## cpu(__VA_ARGS__)) + +#define FUNCDEF_TU_S2(ret, name, cpu, ...) 
\ + ret PFX(name ## 4_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## 8_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## 16_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## 32_ ## cpu(__VA_ARGS__)); \ + ret PFX(name ## 64_ ## cpu(__VA_ARGS__)) + +#define FUNCDEF_PU(ret, name, cpu, ...) \ + ret PFX(name ## _4x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x64_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _4x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x64_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x12_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _12x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _4x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x24_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _24x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x48_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _48x64_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x64_ ## cpu)(__VA_ARGS__) + +#define FUNCDEF_CHROMA_PU(ret, name, cpu, ...) 
\ + FUNCDEF_PU(ret, name, cpu, __VA_ARGS__); \ + ret PFX(name ## _4x2_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _4x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _2x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x2_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _2x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x6_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _6x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x12_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _12x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _6x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x6_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _2x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x2_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _4x12_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _12x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x12_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _12x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x4_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _4x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _32x48_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _48x32_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _16x24_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _24x16_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _8x64_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x8_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _64x24_ ## cpu)(__VA_ARGS__); \ + ret PFX(name ## _24x64_ ## cpu)(__VA_ARGS__); + +#define DECLS(cpu) \ + FUNCDEF_TU(void, cpy2Dto1D_shl, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \ + FUNCDEF_TU(void, cpy2Dto1D_shr, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \ + FUNCDEF_TU(void, cpy1Dto2D_shl, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \ + FUNCDEF_TU(void, cpy1Dto2D_shl_aligned, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \ + FUNCDEF_TU(void, cpy1Dto2D_shr, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \ + FUNCDEF_TU_S(uint32_t, copy_cnt, cpu, int16_t* dst, const 
int16_t* src, intptr_t srcStride); \ + FUNCDEF_TU_S(int, count_nonzero, cpu, const int16_t* quantCoeff); \ + FUNCDEF_TU(void, blockfill_s, cpu, int16_t* dst, intptr_t dstride, int16_t val); \ + FUNCDEF_TU(void, blockfill_s_aligned, cpu, int16_t* dst, intptr_t dstride, int16_t val); \ + FUNCDEF_CHROMA_PU(void, blockcopy_ss, cpu, int16_t* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride); \ + FUNCDEF_CHROMA_PU(void, blockcopy_pp, cpu, pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); \ + FUNCDEF_PU(void, blockcopy_sp, cpu, pixel* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride); \ + FUNCDEF_PU(void, blockcopy_ps, cpu, int16_t* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); \ + FUNCDEF_PU(void, interp_8tap_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_PU(void, interp_8tap_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \ + FUNCDEF_PU(void, interp_8tap_vert_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_PU(void, interp_8tap_vert_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_PU(void, interp_8tap_vert_sp, cpu, const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_PU(void, interp_8tap_vert_ss, cpu, const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_PU(void, interp_8tap_hv_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int idxX, int idxY); \ + FUNCDEF_CHROMA_PU(void, filterPixelToShort, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride); \ + FUNCDEF_CHROMA_PU(void, filterPixelToShort_aligned, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride); \ + FUNCDEF_CHROMA_PU(void, 
interp_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, interp_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_vert_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_vert_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_vert_sp, cpu, const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, interp_4tap_vert_ss, cpu, const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \ + FUNCDEF_CHROMA_PU(void, addAvg, cpu, const int16_t*, const int16_t*, pixel*, intptr_t, intptr_t, intptr_t); \ + FUNCDEF_CHROMA_PU(void, addAvg_aligned, cpu, const int16_t*, const int16_t*, pixel*, intptr_t, intptr_t, intptr_t); \ + FUNCDEF_PU(void, pixel_avg_pp, cpu, pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); \ + FUNCDEF_PU(void, pixel_avg_pp_aligned, cpu, pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); \ + FUNCDEF_PU(void, sad_x3, cpu, const pixel*, const pixel*, const pixel*, const pixel*, intptr_t, int32_t*); \ + FUNCDEF_PU(void, sad_x4, cpu, const pixel*, const pixel*, const pixel*, const pixel*, const pixel*, intptr_t, int32_t*); \ + FUNCDEF_CHROMA_PU(int, pixel_sad, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \ + 
FUNCDEF_CHROMA_PU(sse_t, pixel_ssd_s, cpu, const int16_t*, intptr_t); \ + FUNCDEF_CHROMA_PU(sse_t, pixel_ssd_s_aligned, cpu, const int16_t*, intptr_t); \ + FUNCDEF_TU_S(sse_t, pixel_ssd_s, cpu, const int16_t*, intptr_t); \ + FUNCDEF_TU_S(sse_t, pixel_ssd_s_aligned, cpu, const int16_t*, intptr_t); \ + FUNCDEF_PU(sse_t, pixel_sse_pp, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \ + FUNCDEF_CHROMA_PU(sse_t, pixel_sse_ss, cpu, const int16_t*, intptr_t, const int16_t*, intptr_t); \ + FUNCDEF_PU(void, pixel_sub_ps, cpu, int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1); \ + FUNCDEF_PU(void, pixel_add_ps, cpu, pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1); \ + FUNCDEF_PU(void, pixel_add_ps_aligned, cpu, pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1); \ + FUNCDEF_CHROMA_PU(int, pixel_satd, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \ + FUNCDEF_TU_S2(void, ssimDist, cpu, const pixel *fenc, uint32_t fStride, const pixel *recon, intptr_t rstride, uint64_t *ssBlock, int shift, uint64_t *ac_k); \ + FUNCDEF_TU_S2(void, normFact, cpu, const pixel *src, uint32_t blockSize, int shift, uint64_t *z_k) + +DECLS(neon); + +void x265_pixel_planecopy_cp_neon(const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift); + +uint64_t x265_pixel_var_8x8_neon(const pixel* pix, intptr_t stride); +uint64_t x265_pixel_var_16x16_neon(const pixel* pix, intptr_t stride); +uint64_t x265_pixel_var_32x32_neon(const pixel* pix, intptr_t stride); +uint64_t x265_pixel_var_64x64_neon(const pixel* pix, intptr_t stride); + +void x265_getResidual4_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride); +void x265_getResidual8_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride); +void x265_getResidual16_neon(const pixel* fenc, const 
pixel* pred, int16_t* residual, intptr_t stride); +void x265_getResidual32_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride); + +void x265_scale1D_128to64_neon(pixel *dst, const pixel *src); +void x265_scale2D_64to32_neon(pixel* dst, const pixel* src, intptr_t stride); + +int x265_pixel_satd_4x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_4x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_4x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_4x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x12_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_8x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_12x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_12x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x12_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t 
stride_pix2); +int x265_pixel_satd_16x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x24_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_16x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_24x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_24x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x24_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x48_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_32x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_48x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_64x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); +int x265_pixel_satd_64x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
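Note on the FUNCDEF_* macros in fun-decls.h: they rely on token pasting to stamp out one prototype per block size. The sketch below shows the mechanism with a reduced two-size variant. PFX is replaced by a simplified stand-in that is assumed to prepend "x265_", and FUNCDEF_TU_S_DEMO and countNonZero are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for x265's PFX(); assumed to prepend "x265_".
#define PFX(name) x265_ ## name

// Reduced copy of FUNCDEF_TU_S covering two sizes instead of five.
#define FUNCDEF_TU_S_DEMO(ret, name, cpu, ...) \
    ret PFX(name ## _4_ ## cpu(__VA_ARGS__)); \
    ret PFX(name ## _8_ ## cpu(__VA_ARGS__))

// Declares x265_count_nonzero_4_neon and x265_count_nonzero_8_neon.
FUNCDEF_TU_S_DEMO(int, count_nonzero, neon, const int16_t* quantCoeff);

// Plain C++ bodies so the demo links; the real symbols are NEON routines.
static int countNonZero(const int16_t* c, int n)
{
    int count = 0;
    for (int i = 0; i < n * n; i++)
        count += (c[i] != 0);
    return count;
}

int x265_count_nonzero_4_neon(const int16_t* quantCoeff) { return countNonZero(quantCoeff, 4); }
int x265_count_nonzero_8_neon(const int16_t* quantCoeff) { return countNonZero(quantCoeff, 8); }
```

The definitions compile only because the macro produced matching prototypes, which is a quick way to confirm the generated names.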
x265_3.5.tar.gz/source/common/aarch64/intrapred-prim.cpp
Added
@@ -0,0 +1,264 @@ +#include "common.h" +#include "primitives.h" + + +#if 1 +#include "arm64-utils.h" +#include <arm_neon.h> + +using namespace X265_NS; + +namespace +{ + + + +template<int width> +void intra_pred_ang_neon(pixel *dst, intptr_t dstStride, const pixel *srcPix0, int dirMode, int bFilter) +{ + int width2 = width << 1; + // Flip the neighbours in the horizontal case. + int horMode = dirMode < 18; + pixel neighbourBuf[129]; + const pixel *srcPix = srcPix0; + + if (horMode) + { + neighbourBuf[0] = srcPix[0]; + //for (int i = 0; i < width << 1; i++) + //{ + // neighbourBuf[1 + i] = srcPix[width2 + 1 + i]; + // neighbourBuf[width2 + 1 + i] = srcPix[1 + i]; + //} + memcpy(&neighbourBuf[1], &srcPix[width2 + 1], sizeof(pixel) * (width << 1)); + memcpy(&neighbourBuf[width2 + 1], &srcPix[1], sizeof(pixel) * (width << 1)); + srcPix = neighbourBuf; + } + + // Intra prediction angle and inverse angle tables. + const int8_t angleTable[17] = { -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32 }; + const int16_t invAngleTable[8] = { 4096, 1638, 910, 630, 482, 390, 315, 256 }; + + // Get the prediction angle. + int angleOffset = horMode ? 10 - dirMode : dirMode - 26; + int angle = angleTable[8 + angleOffset]; + + // Vertical Prediction. + if (!angle) + { + for (int y = 0; y < width; y++) + { + memcpy(&dst[y * dstStride], srcPix + 1, sizeof(pixel)*width); + } + if (bFilter) + { + int topLeft = srcPix[0], top = srcPix[1]; + for (int y = 0; y < width; y++) + { + dst[y * dstStride] = x265_clip((int16_t)(top + ((srcPix[width2 + 1 + y] - topLeft) >> 1))); + } + } + } + else // Angular prediction. + { + // Get the reference pixels. The reference base is the first pixel to the top (neighbourBuf[1]). + pixel refBuf[64]; + const pixel *ref; + + // Use the projected left neighbours and the top neighbours. + if (angle < 0) + { + // Number of neighbours projected. + int nbProjected = -((width * angle) >> 5) - 1; + pixel *ref_pix = refBuf + nbProjected + 1; + + // Project the neighbours. 
+ int invAngle = invAngleTable[- angleOffset - 1]; + int invAngleSum = 128; + for (int i = 0; i < nbProjected; i++) + { + invAngleSum += invAngle; + ref_pix[- 2 - i] = srcPix[width2 + (invAngleSum >> 8)]; + } + + // Copy the top-left and top pixels. + //for (int i = 0; i < width + 1; i++) + //ref_pix[-1 + i] = srcPix[i]; + + memcpy(&ref_pix[-1], srcPix, (width + 1)*sizeof(pixel)); + ref = ref_pix; + } + else // Use the top and top-right neighbours. + { + ref = srcPix + 1; + } + + // Pass every row. + int angleSum = 0; + for (int y = 0; y < width; y++) + { + angleSum += angle; + int offset = angleSum >> 5; + int fraction = angleSum & 31; + + if (fraction) // Interpolate + { + if (width >= 8 && sizeof(pixel) == 1) + { + const int16x8_t f0 = vdupq_n_s16(32 - fraction); + const int16x8_t f1 = vdupq_n_s16(fraction); + for (int x = 0; x < width; x += 8) + { + uint8x8_t in0 = *(uint8x8_t *)&ref[offset + x]; + uint8x8_t in1 = *(uint8x8_t *)&ref[offset + x + 1]; + int16x8_t lo = vmlaq_s16(vdupq_n_s16(16), vmovl_u8(in0), f0); + lo = vmlaq_s16(lo, vmovl_u8(in1), f1); + lo = vshrq_n_s16(lo, 5); + *(uint8x8_t *)&dst[y * dstStride + x] = vmovn_u16(lo); + } + } + else if (width >= 4 && sizeof(pixel) == 2) + { + const int32x4_t f0 = vdupq_n_s32(32 - fraction); + const int32x4_t f1 = vdupq_n_s32(fraction); + for (int x = 0; x < width; x += 4) + { + uint16x4_t in0 = *(uint16x4_t *)&ref[offset + x]; + uint16x4_t in1 = *(uint16x4_t *)&ref[offset + x + 1]; + int32x4_t lo = vmlaq_s32(vdupq_n_s32(16), vmovl_u16(in0), f0); + lo = vmlaq_s32(lo, vmovl_u16(in1), f1); + lo = vshrq_n_s32(lo, 5); + *(uint16x4_t *)&dst[y * dstStride + x] = vmovn_u32(lo); + } + } + else + { + for (int x = 0; x < width; x++) + { + dst[y * dstStride + x] = (pixel)(((32 - fraction) * ref[offset + x] + fraction * ref[offset + x + 1] + 16) >> 5); + } + } + } + else // Copy. + { + memcpy(&dst[y * dstStride], &ref[offset], sizeof(pixel)*width); + } + } + } + + // Flip for horizontal. 
+ if (horMode) + { + if (width == 8) + { + transpose8x8(dst, dst, dstStride, dstStride); + } + else if (width == 16) + { + transpose16x16(dst, dst, dstStride, dstStride); + } + else if (width == 32) + { + transpose32x32(dst, dst, dstStride, dstStride); + } + else + { + for (int y = 0; y < width - 1; y++) + { + for (int x = y + 1; x < width; x++) + { + pixel tmp = dst[y * dstStride + x]; + dst[y * dstStride + x] = dst[x * dstStride + y]; + dst[x * dstStride + y] = tmp; + } + } + } + } +} + +template<int log2Size> +void all_angs_pred_neon(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma) +{ + const int size = 1 << log2Size; + for (int mode = 2; mode <= 34; mode++) + { + pixel *srcPix = (g_intraFilterFlags[mode] & size ? filtPix : refPix); + pixel *out = dest + ((mode - 2) << (log2Size * 2)); + + intra_pred_ang_neon<size>(out, size, srcPix, mode, bLuma); + + // Optimize code don't flip buffer + bool modeHor = (mode < 18); + + // transpose the block if this is a horizontal mode + if (modeHor) + { + if (size == 8) + { + transpose8x8(out, out, size, size); + }
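Note on the interpolation branch of intra_pred_ang_neon: both NEON paths and the scalar fallback compute the same two-tap blend between neighbouring reference pixels, weighted by the 5-bit fraction of the accumulated angle. A minimal scalar sketch for 8-bit pixels follows; blendRef is a hypothetical helper name, not part of the patch.

```cpp
#include <cstdint>

// Two-tap blend for a fractional angular position:
// weights (32 - fraction) and fraction, rounded with +16, scaled back by >> 5.
inline uint8_t blendRef(uint8_t a, uint8_t b, int fraction)
{
    return static_cast<uint8_t>(((32 - fraction) * a + fraction * b + 16) >> 5);
}
```

Because the two weights always sum to 32, the result stays within the range spanned by the two inputs, so no clipping is needed at this step.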
x265_3.5.tar.gz/source/common/aarch64/intrapred-prim.h
Added
@@ -0,0 +1,15 @@ +#ifndef INTRAPRED_PRIM_H__ + +#if defined(__aarch64__) + +namespace X265_NS +{ +// x265 private namespace + +void setupIntraPrimitives_neon(EncoderPrimitives &p); +} + +#endif + +#endif +
x265_3.5.tar.gz/source/common/aarch64/ipfilter.S
Added
@@ -0,0 +1,2452 @@ +/***************************************************************************** + * Copyright (C) 2021 MulticoreWare, Inc + * + * Authors: Sebastian Pop <spop@amazon.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. + * For more information, contact us at license @ x265.com. + *****************************************************************************/ + +// Functions in this file: +// ***** luma_vpp ***** +// ***** luma_vps ***** +// ***** luma_vsp ***** +// ***** luma_vss ***** +// ***** luma_hpp ***** +// ***** luma_hps ***** +// ***** chroma_vpp ***** +// ***** chroma_vps ***** +// ***** chroma_vsp ***** +// ***** chroma_vss ***** +// ***** chroma_hpp ***** +// ***** chroma_hps ***** + +#include "asm.S" + +#ifdef __APPLE__ +.section __RODATA,__rodata +#else +.section .rodata +#endif + +.align 4 + +.text + +// Macros below follow these conventions: +// - input data in registers: v0, v1, v2, v3, v4, v5, v6, v7 +// - constants in registers: v24, v25, v26, v27, v31 +// - temporary registers: v16, v17, v18, v19, v20, v21, v22, v23, v28, v29, v30. 
+// - _32b macros output a result in v17.4s +// - _64b and _32b_1 macros output results in v17.4s, v18.4s + +.macro vextin8 v + ldp d6, d7, [x11], #16 +.if \v == 0 + // qpel_filter_0 only uses values in v3 + ext v3.8b, v6.8b, v7.8b, #4 +.else +.if \v != 3 + ext v0.8b, v6.8b, v7.8b, #1 +.endif + ext v1.8b, v6.8b, v7.8b, #2 + ext v2.8b, v6.8b, v7.8b, #3 + ext v3.8b, v6.8b, v7.8b, #4 + ext v4.8b, v6.8b, v7.8b, #5 + ext v5.8b, v6.8b, v7.8b, #6 + ext v6.8b, v6.8b, v7.8b, #7 +.endif +.endm + +.macro vextin8_64 v + ldp q6, q7, [x11], #32 +.if \v == 0 + // qpel_filter_0 only uses values in v3 + ext v3.16b, v6.16b, v7.16b, #4 +.else +.if \v != 3 + // qpel_filter_3 does not use values in v0 + ext v0.16b, v6.16b, v7.16b, #1 +.endif + ext v1.16b, v6.16b, v7.16b, #2 + ext v2.16b, v6.16b, v7.16b, #3 + ext v3.16b, v6.16b, v7.16b, #4 + ext v4.16b, v6.16b, v7.16b, #5 + ext v5.16b, v6.16b, v7.16b, #6 +.if \v == 1 + ext v6.16b, v6.16b, v7.16b, #7 + // qpel_filter_1 does not use v7 +.else + ext v16.16b, v6.16b, v7.16b, #7 + ext v7.16b, v6.16b, v7.16b, #8 + mov v6.16b, v16.16b +.endif +.endif +.endm + +.macro vextin8_chroma v + ldp d6, d7, [x11], #16 +.if \v == 0 + // qpel_filter_chroma_0 only uses values in v1 + ext v1.8b, v6.8b, v7.8b, #2 +.else + ext v0.8b, v6.8b, v7.8b, #1 + ext v1.8b, v6.8b, v7.8b, #2 + ext v2.8b, v6.8b, v7.8b, #3 + ext v3.8b, v6.8b, v7.8b, #4 +.endif +.endm + +.macro vextin8_chroma_64 v + ldp q16, q17, [x11], #32 +.if \v == 0 + // qpel_filter_chroma_0 only uses values in v1 + ext v1.16b, v16.16b, v17.16b, #2 +.else + ext v0.16b, v16.16b, v17.16b, #1 + ext v1.16b, v16.16b, v17.16b, #2 + ext v2.16b, v16.16b, v17.16b, #3 + ext v3.16b, v16.16b, v17.16b, #4 +.endif +.endm + +.macro qpel_load_32b v +.if \v == 0 + add x6, x6, x11 // do not load 3 values that are not used in qpel_filter_0 + ld1 {v3.8b}, [x6], x1 +.elseif \v == 1 || \v == 2 || \v == 3 +.if \v != 3 // not used in qpel_filter_3 + ld1 {v0.8b}, [x6], x1 +.else + add x6, x6, x1 +.endif + ld1 {v1.8b}, [x6], x1 + ld1 {v2.8b}, 
[x6], x1 + ld1 {v3.8b}, [x6], x1 + ld1 {v4.8b}, [x6], x1 + ld1 {v5.8b}, [x6], x1 +.if \v != 1 // not used in qpel_filter_1 + ld1 {v6.8b}, [x6], x1 + ld1 {v7.8b}, [x6] +.else + ld1 {v6.8b}, [x6] +.endif +.endif +.endm + +.macro qpel_load_64b v +.if \v == 0 + add x6, x6, x11 // do not load 3 values that are not used in qpel_filter_0 + ld1 {v3.16b}, [x6], x1 +.elseif \v == 1 || \v == 2 || \v == 3 +.if \v != 3 // not used in qpel_filter_3 + ld1 {v0.16b}, [x6], x1 +.else + add x6, x6, x1 +.endif + ld1 {v1.16b}, [x6], x1 + ld1 {v2.16b}, [x6], x1 + ld1 {v3.16b}, [x6], x1 + ld1 {v4.16b}, [x6], x1 + ld1 {v5.16b}, [x6], x1 +.if \v != 1 // not used in qpel_filter_1 + ld1 {v6.16b}, [x6], x1 + ld1 {v7.16b}, [x6] +.else + ld1 {v6.16b}, [x6] +.endif +.endif +.endm + +.macro qpel_chroma_load_32b v +.if \v == 0 + // qpel_filter_chroma_0 only uses values in v1 + add x6, x6, x1 + ldr d1, [x6] +.else + ld1 {v0.8b}, [x6], x1 + ld1 {v1.8b}, [x6], x1 + ld1 {v2.8b}, [x6], x1 + ld1 {v3.8b}, [x6] +.endif +.endm + +.macro qpel_chroma_load_64b v +.if \v == 0 + // qpel_filter_chroma_0 only uses values in v1 + add x6, x6, x1 + ldr q1, [x6] +.else + ld1 {v0.16b}, [x6], x1 + ld1 {v1.16b}, [x6], x1 + ld1 {v2.16b}, [x6], x1 + ld1 {v3.16b}, [x6] +.endif +.endm
x265_3.5.tar.gz/source/common/aarch64/ipfilter8.S
Deleted
@@ -1,414 +0,0 @@ -/***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc - * - * Authors: Yimeng Su <yimeng.su@huawei.com> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. - * - * This program is also available under a commercial proprietary license. - * For more information, contact us at license @ x265.com. 
- *****************************************************************************/ - -#include "asm.S" - -.section .rodata - -.align 4 - -.text - - - -.macro qpel_filter_0_32b - movi v24.8h, #64 - uxtl v19.8h, v5.8b - smull v17.4s, v19.4h, v24.4h - smull2 v18.4s, v19.8h, v24.8h -.endm - -.macro qpel_filter_1_32b - movi v16.8h, #58 - uxtl v19.8h, v5.8b - smull v17.4s, v19.4h, v16.4h - smull2 v18.4s, v19.8h, v16.8h - - movi v24.8h, #10 - uxtl v21.8h, v1.8b - smull v19.4s, v21.4h, v24.4h - smull2 v20.4s, v21.8h, v24.8h - - movi v16.8h, #17 - uxtl v23.8h, v2.8b - smull v21.4s, v23.4h, v16.4h - smull2 v22.4s, v23.8h, v16.8h - - movi v24.8h, #5 - uxtl v1.8h, v6.8b - smull v23.4s, v1.4h, v24.4h - smull2 v16.4s, v1.8h, v24.8h - - sub v17.4s, v17.4s, v19.4s - sub v18.4s, v18.4s, v20.4s - - uxtl v1.8h, v4.8b - sshll v19.4s, v1.4h, #2 - sshll2 v20.4s, v1.8h, #2 - - add v17.4s, v17.4s, v21.4s - add v18.4s, v18.4s, v22.4s - - uxtl v1.8h, v0.8b - uxtl v2.8h, v3.8b - ssubl v21.4s, v2.4h, v1.4h - ssubl2 v22.4s, v2.8h, v1.8h - - add v17.4s, v17.4s, v19.4s - add v18.4s, v18.4s, v20.4s - sub v21.4s, v21.4s, v23.4s - sub v22.4s, v22.4s, v16.4s - add v17.4s, v17.4s, v21.4s - add v18.4s, v18.4s, v22.4s -.endm - -.macro qpel_filter_2_32b - movi v16.4s, #11 - uxtl v19.8h, v5.8b - uxtl v20.8h, v2.8b - saddl v17.4s, v19.4h, v20.4h - saddl2 v18.4s, v19.8h, v20.8h - - uxtl v21.8h, v1.8b - uxtl v22.8h, v6.8b - saddl v19.4s, v21.4h, v22.4h - saddl2 v20.4s, v21.8h, v22.8h - - mul v19.4s, v19.4s, v16.4s - mul v20.4s, v20.4s, v16.4s - - movi v16.4s, #40 - mul v17.4s, v17.4s, v16.4s - mul v18.4s, v18.4s, v16.4s - - uxtl v21.8h, v4.8b - uxtl v22.8h, v3.8b - saddl v23.4s, v21.4h, v22.4h - saddl2 v16.4s, v21.8h, v22.8h - - uxtl v1.8h, v0.8b - uxtl v2.8h, v7.8b - saddl v21.4s, v1.4h, v2.4h - saddl2 v22.4s, v1.8h, v2.8h - - shl v23.4s, v23.4s, #2 - shl v16.4s, v16.4s, #2 - - add v19.4s, v19.4s, v21.4s - add v20.4s, v20.4s, v22.4s - add v17.4s, v17.4s, v23.4s - add v18.4s, v18.4s, v16.4s - sub v17.4s, 
v17.4s, v19.4s - sub v18.4s, v18.4s, v20.4s -.endm - -.macro qpel_filter_3_32b - movi v16.8h, #17 - movi v24.8h, #5 - - uxtl v19.8h, v5.8b - smull v17.4s, v19.4h, v16.4h - smull2 v18.4s, v19.8h, v16.8h - - uxtl v21.8h, v1.8b - smull v19.4s, v21.4h, v24.4h - smull2 v20.4s, v21.8h, v24.8h - - movi v16.8h, #58 - uxtl v23.8h, v2.8b - smull v21.4s, v23.4h, v16.4h - smull2 v22.4s, v23.8h, v16.8h - - movi v24.8h, #10 - uxtl v1.8h, v6.8b - smull v23.4s, v1.4h, v24.4h - smull2 v16.4s, v1.8h, v24.8h - - sub v17.4s, v17.4s, v19.4s - sub v18.4s, v18.4s, v20.4s - - uxtl v1.8h, v3.8b - sshll v19.4s, v1.4h, #2 - sshll2 v20.4s, v1.8h, #2 - - add v17.4s, v17.4s, v21.4s - add v18.4s, v18.4s, v22.4s - - uxtl v1.8h, v4.8b - uxtl v2.8h, v7.8b - ssubl v21.4s, v1.4h, v2.4h - ssubl2 v22.4s, v1.8h, v2.8h - - add v17.4s, v17.4s, v19.4s - add v18.4s, v18.4s, v20.4s - sub v21.4s, v21.4s, v23.4s - sub v22.4s, v22.4s, v16.4s - add v17.4s, v17.4s, v21.4s - add v18.4s, v18.4s, v22.4s -.endm - - - - -.macro vextin8 - ld1 {v3.16b}, [x11], #16 - mov v7.d[0], v3.d[1] - ext v0.8b, v3.8b, v7.8b, #1 - ext v4.8b, v3.8b, v7.8b, #2 - ext v1.8b, v3.8b, v7.8b, #3 - ext v5.8b, v3.8b, v7.8b, #4 - ext v2.8b, v3.8b, v7.8b, #5 - ext v6.8b, v3.8b, v7.8b, #6 - ext v3.8b, v3.8b, v7.8b, #7 -.endm - - - -// void interp_horiz_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt) -.macro HPS_FILTER a b filterhps - mov w12, #8192 - mov w6, w10 - sub x3, x3, #\a - lsl x3, x3, #1 - mov w9, #\a - cmp w9, #4 - b.eq 14f - cmp w9, #12 - b.eq 15f - b 7f -14:
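Note on the removed qpel_filter_*_32b macros: the immediates they load (64; 58, 10, 17, 5; 40, 11; plus the shifts by 2 and the plain adds and subtracts) correspond to the standard HEVC 8-tap luma quarter-pel coefficients. The scalar sketch below spells the taps out as a table. kLumaTaps and filter8 are hypothetical names, and the tap ordering is the usual left-to-right one, which is assumed rather than read from the register allocation in the macros.

```cpp
#include <cstdint>

// Standard HEVC luma interpolation taps for the four quarter-pel phases.
constexpr int16_t kLumaTaps[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

// Unnormalised 8-tap sum over eight neighbouring 8-bit samples.
inline int filter8(const uint8_t* src, int coeffIdx)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += kLumaTaps[coeffIdx][i] * src[i];
    return sum;
}
```

Each row sums to 64, so a flat input is scaled by exactly 64 regardless of phase; the later shift by the filter precision restores the original level.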
x265_3.5.tar.gz/source/common/aarch64/ipfilter8.h
Deleted
@@ -1,55 +0,0 @@ -/***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc - * - * Authors: Yimeng Su <yimeng.su@huawei.com> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. - * - * This program is also available under a commercial proprietary license. - * For more information, contact us at license @ x265.com. 
- *****************************************************************************/ - -#ifndef X265_IPFILTER8_AARCH64_H -#define X265_IPFILTER8_AARCH64_H - - -void x265_interp_8tap_horiz_ps_4x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_4x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_4x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_8x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_8x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_8x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_8x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_12x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_16x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_16x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_16x12_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_16x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_16x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, 
int isRowExt); -void x265_interp_8tap_horiz_ps_16x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_24x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_32x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_32x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_32x24_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_32x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_32x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_48x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_64x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_64x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_64x48_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); -void x265_interp_8tap_horiz_ps_64x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); - - -#endif // ifndef X265_IPFILTER8_AARCH64_H
x265_3.5.tar.gz/source/common/aarch64/loopfilter-prim.cpp
Added
@@ -0,0 +1,290 @@ +#include "loopfilter-prim.h" + +#define PIXEL_MIN 0 + + + +#if !(HIGH_BIT_DEPTH) && defined(HAVE_NEON) +#include<arm_neon.h> + +namespace +{ + + +/* get the sign of input variable (TODO: this is a dup, make common) */ +static inline int8_t signOf(int x) +{ + return (x >> 31) | ((int)((((uint32_t) - x)) >> 31)); +} + +static inline int8x8_t sign_diff_neon(const uint8x8_t in0, const uint8x8_t in1) +{ + int16x8_t in = vsubl_u8(in0, in1); + return vmovn_s16(vmaxq_s16(vminq_s16(in, vdupq_n_s16(1)), vdupq_n_s16(-1))); +} + +static void calSign_neon(int8_t *dst, const pixel *src1, const pixel *src2, const int endX) +{ + int x = 0; + for (; (x + 8) <= endX; x += 8) + { + *(int8x8_t *)&dstx = sign_diff_neon(*(uint8x8_t *)&src1x, *(uint8x8_t *)&src2x); + } + + for (; x < endX; x++) + { + dstx = signOf(src1x - src2x); + } +} + +static void processSaoCUE0_neon(pixel *rec, int8_t *offsetEo, int width, int8_t *signLeft, intptr_t stride) +{ + + + int y; + int8_t signRight, signLeft0; + int8_t edgeType; + + for (y = 0; y < 2; y++) + { + signLeft0 = signLefty; + int x = 0; + + if (width >= 8) + { + int8x8_t vsignRight; + int8x8x2_t shifter; + shifter.val10 = signLeft0; + static const int8x8_t index = {8, 0, 1, 2, 3, 4, 5, 6}; + int8x8_t tbl = *(int8x8_t *)offsetEo; + for (; (x + 8) <= width; x += 8) + { + uint8x8_t in = *(uint8x8_t *)&recx; + vsignRight = sign_diff_neon(in, *(uint8x8_t *)&recx + 1); + shifter.val0 = vneg_s8(vsignRight); + int8x8_t tmp = shifter.val0; + int8x8_t edge = vtbl2_s8(shifter, index); + int8x8_t vedgeType = vadd_s8(vadd_s8(vsignRight, edge), vdup_n_s8(2)); + shifter.val10 = tmp7; + int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType)); + t1 = vaddw_u8(t1, in); + t1 = vmaxq_s16(t1, vdupq_n_s16(0)); + t1 = vminq_s16(t1, vdupq_n_s16(255)); + *(uint8x8_t *)&recx = vmovn_u16(t1); + } + signLeft0 = shifter.val10; + } + for (; x < width; x++) + { + signRight = ((recx - recx + 1) < 0) ? -1 : ((recx - recx + 1) > 0) ? 
1 : 0; + edgeType = signRight + signLeft0 + 2; + signLeft0 = -signRight; + recx = x265_clip(recx + offsetEoedgeType); + } + rec += stride; + } +} + +static void processSaoCUE1_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int width) +{ + int x = 0; + int8_t signDown; + int edgeType; + + if (width >= 8) + { + int8x8_t tbl = *(int8x8_t *)offsetEo; + for (; (x + 8) <= width; x += 8) + { + uint8x8_t in0 = *(uint8x8_t *)&recx; + uint8x8_t in1 = *(uint8x8_t *)&recx + stride; + int8x8_t vsignDown = sign_diff_neon(in0, in1); + int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&upBuff1x), vdup_n_s8(2)); + *(int8x8_t *)&upBuff1x = vneg_s8(vsignDown); + int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType)); + t1 = vaddw_u8(t1, in0); + *(uint8x8_t *)&recx = vqmovun_s16(t1); + } + } + for (; x < width; x++) + { + signDown = signOf(recx - recx + stride); + edgeType = signDown + upBuff1x + 2; + upBuff1x = -signDown; + recx = x265_clip(recx + offsetEoedgeType); + } +} + +static void processSaoCUE1_2Rows_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int width) +{ + int y; + int8_t signDown; + int edgeType; + + for (y = 0; y < 2; y++) + { + int x = 0; + if (width >= 8) + { + int8x8_t tbl = *(int8x8_t *)offsetEo; + for (; (x + 8) <= width; x += 8) + { + uint8x8_t in0 = *(uint8x8_t *)&recx; + uint8x8_t in1 = *(uint8x8_t *)&recx + stride; + int8x8_t vsignDown = sign_diff_neon(in0, in1); + int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&upBuff1x), vdup_n_s8(2)); + *(int8x8_t *)&upBuff1x = vneg_s8(vsignDown); + int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType)); + t1 = vaddw_u8(t1, in0); + t1 = vmaxq_s16(t1, vdupq_n_s16(0)); + t1 = vminq_s16(t1, vdupq_n_s16(255)); + *(uint8x8_t *)&recx = vmovn_u16(t1); + + } + } + for (; x < width; x++) + { + signDown = signOf(recx - recx + stride); + edgeType = signDown + upBuff1x + 2; + upBuff1x = -signDown; + recx = x265_clip(recx + offsetEoedgeType); + } + rec += stride; + } +} + 
+static void processSaoCUE2_neon(pixel *rec, int8_t *bufft, int8_t *buff1, int8_t *offsetEo, int width, intptr_t stride) +{ + int x; + + if (abs(buff1 - bufft) < 16) + { + for (x = 0; x < width; x++) + { + int8_t signDown = signOf(recx - recx + stride + 1); + int edgeType = signDown + buff1x + 2; + bufftx + 1 = -signDown; + recx = x265_clip(recx + offsetEoedgeType);; + } + } + else + { + int8x8_t tbl = *(int8x8_t *)offsetEo; + x = 0; + for (; (x + 8) <= width; x += 8) + { + uint8x8_t in0 = *(uint8x8_t *)&recx; + uint8x8_t in1 = *(uint8x8_t *)&recx + stride + 1; + int8x8_t vsignDown = sign_diff_neon(in0, in1); + int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&buff1x), vdup_n_s8(2)); + *(int8x8_t *)&bufftx + 1 = vneg_s8(vsignDown); + int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType)); + t1 = vaddw_u8(t1, in0); + t1 = vmaxq_s16(t1, vdupq_n_s16(0)); + t1 = vminq_s16(t1, vdupq_n_s16(255)); + *(uint8x8_t *)&recx = vmovn_u16(t1); + } + for (; x < width; x++) + { + int8_t signDown = signOf(recx - recx + stride + 1); + int edgeType = signDown + buff1x + 2; + bufftx + 1 = -signDown; + recx = x265_clip(recx + offsetEoedgeType);; + } + + } +} + + +static void processSaoCUE3_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int startX, int endX)
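Note on the SAO edge-offset routines added above: each pixel is classified by the signs of its differences to two neighbours, and the resulting edge type (0..4) indexes the offset table. The following is a minimal scalar sketch of the horizontal (E0) case for one row, mirroring the scalar tail loop in processSaoCUE0_neon. The name saoEdge0Row is hypothetical, 8-bit pixels are assumed, and the clamp to 0..255 stands in for x265_clip.

```cpp
#include <algorithm>
#include <cstdint>

// Returns -1, 0 or +1; same result as the bit-trick signOf in the diff.
int signOf(int x) { return (x > 0) - (x < 0); }

// Hypothetical one-row E0 pass. rec must have width + 1 readable pixels.
// signLeft is the sign of (rec[0] - left neighbour) computed by the caller.
void saoEdge0Row(uint8_t* rec, const int8_t* offsetEo, int width, int8_t signLeft)
{
    for (int x = 0; x < width; x++)
    {
        // rec[x + 1] has not been modified yet, so this uses original pixels.
        const int signRight = signOf(rec[x] - rec[x + 1]);
        const int edgeType = signRight + signLeft + 2; // 0 = local min, 4 = local max
        signLeft = (int8_t)-signRight;                 // becomes the next pixel's left sign
        const int v = rec[x] + offsetEo[edgeType];
        rec[x] = (uint8_t)std::min(255, std::max(0, v));
    }
}
```

The NEON path computes the same signs for eight pixels at once by clamping the widened difference to [-1, 1] in sign_diff_neon, then uses a table lookup on the edge type.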
x265_3.5.tar.gz/source/common/aarch64/loopfilter-prim.h
Added
@@ -0,0 +1,16 @@
+#ifndef _LOOPFILTER_NEON_H__
+#define _LOOPFILTER_NEON_H__
+
+#include "common.h"
+#include "primitives.h"
+
+#define PIXEL_MIN 0
+
+namespace X265_NS
+{
+void setupLoopFilterPrimitives_neon(EncoderPrimitives &p);
+
+};
+
+
+#endif
x265_3.5.tar.gz/source/common/aarch64/mc-a.S
Changed
@@ -1,7 +1,8 @@ /***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc + * Copyright (C) 2020-2021 MulticoreWare, Inc * * Authors: Hongbin Liu <liuhongbin1@huawei.com> + * Sebastian Pop <spop@amazon.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -23,14 +24,18 @@ #include "asm.S" +#ifdef __APPLE__ +.section __RODATA,__rodata +#else .section .rodata +#endif .align 4 .text .macro pixel_avg_pp_4xN_neon h -function x265_pixel_avg_pp_4x\h\()_neon +function PFX(pixel_avg_pp_4x\h\()_neon) .rept \h ld1 {v0.s}0, x2, x3 ld1 {v1.s}0, x4, x5 @@ -46,7 +51,7 @@ pixel_avg_pp_4xN_neon 16 .macro pixel_avg_pp_8xN_neon h -function x265_pixel_avg_pp_8x\h\()_neon +function PFX(pixel_avg_pp_8x\h\()_neon) .rept \h ld1 {v0.8b}, x2, x3 ld1 {v1.8b}, x4, x5 @@ -61,3 +66,506 @@ pixel_avg_pp_8xN_neon 8 pixel_avg_pp_8xN_neon 16 pixel_avg_pp_8xN_neon 32 + +function PFX(pixel_avg_pp_12x16_neon) + sub x1, x1, #4 + sub x3, x3, #4 + sub x5, x5, #4 +.rept 16 + ld1 {v0.s}0, x2, #4 + ld1 {v1.8b}, x2, x3 + ld1 {v2.s}0, x4, #4 + ld1 {v3.8b}, x4, x5 + urhadd v4.8b, v0.8b, v2.8b + urhadd v5.8b, v1.8b, v3.8b + st1 {v4.s}0, x0, #4 + st1 {v5.8b}, x0, x1 +.endr + ret +endfunc + +.macro pixel_avg_pp_16xN_neon h +function PFX(pixel_avg_pp_16x\h\()_neon) +.rept \h + ld1 {v0.16b}, x2, x3 + ld1 {v1.16b}, x4, x5 + urhadd v2.16b, v0.16b, v1.16b + st1 {v2.16b}, x0, x1 +.endr + ret +endfunc +.endm + +pixel_avg_pp_16xN_neon 4 +pixel_avg_pp_16xN_neon 8 +pixel_avg_pp_16xN_neon 12 +pixel_avg_pp_16xN_neon 16 +pixel_avg_pp_16xN_neon 32 + +function PFX(pixel_avg_pp_16x64_neon) + mov w12, #8 +.lpavg_16x64: + sub w12, w12, #1 +.rept 8 + ld1 {v0.16b}, x2, x3 + ld1 {v1.16b}, x4, x5 + urhadd v2.16b, v0.16b, v1.16b + st1 {v2.16b}, x0, x1 +.endr + cbnz w12, .lpavg_16x64 + ret +endfunc + +function PFX(pixel_avg_pp_24x32_neon) + sub x1, x1, #16 + sub x3, x3, #16 + sub 
x5, x5, #16 + mov w12, #4 +.lpavg_24x32: + sub w12, w12, #1 +.rept 8 + ld1 {v0.16b}, x2, #16 + ld1 {v1.8b}, x2, x3 + ld1 {v2.16b}, x4, #16 + ld1 {v3.8b}, x4, x5 + urhadd v0.16b, v0.16b, v2.16b + urhadd v1.8b, v1.8b, v3.8b + st1 {v0.16b}, x0, #16 + st1 {v1.8b}, x0, x1 +.endr + cbnz w12, .lpavg_24x32 + ret +endfunc + +.macro pixel_avg_pp_32xN_neon h +function PFX(pixel_avg_pp_32x\h\()_neon) +.rept \h + ld1 {v0.16b-v1.16b}, x2, x3 + ld1 {v2.16b-v3.16b}, x4, x5 + urhadd v0.16b, v0.16b, v2.16b + urhadd v1.16b, v1.16b, v3.16b + st1 {v0.16b-v1.16b}, x0, x1 +.endr + ret +endfunc +.endm + +pixel_avg_pp_32xN_neon 8 +pixel_avg_pp_32xN_neon 16 +pixel_avg_pp_32xN_neon 24 + +.macro pixel_avg_pp_32xN1_neon h +function PFX(pixel_avg_pp_32x\h\()_neon) + mov w12, #\h / 8 +.lpavg_32x\h\(): + sub w12, w12, #1 +.rept 8 + ld1 {v0.16b-v1.16b}, x2, x3 + ld1 {v2.16b-v3.16b}, x4, x5 + urhadd v0.16b, v0.16b, v2.16b + urhadd v1.16b, v1.16b, v3.16b + st1 {v0.16b-v1.16b}, x0, x1 +.endr + cbnz w12, .lpavg_32x\h + ret +endfunc +.endm + +pixel_avg_pp_32xN1_neon 32 +pixel_avg_pp_32xN1_neon 64 + +function PFX(pixel_avg_pp_48x64_neon) + mov w12, #8 +.lpavg_48x64: + sub w12, w12, #1 +.rept 8 + ld1 {v0.16b-v2.16b}, x2, x3 + ld1 {v3.16b-v5.16b}, x4, x5 + urhadd v0.16b, v0.16b, v3.16b + urhadd v1.16b, v1.16b, v4.16b + urhadd v2.16b, v2.16b, v5.16b + st1 {v0.16b-v2.16b}, x0, x1 +.endr + cbnz w12, .lpavg_48x64 + ret +endfunc + +.macro pixel_avg_pp_64xN_neon h +function PFX(pixel_avg_pp_64x\h\()_neon) + mov w12, #\h / 4 +.lpavg_64x\h\(): + sub w12, w12, #1 +.rept 4 + ld1 {v0.16b-v3.16b}, x2, x3 + ld1 {v4.16b-v7.16b}, x4, x5 + urhadd v0.16b, v0.16b, v4.16b + urhadd v1.16b, v1.16b, v5.16b + urhadd v2.16b, v2.16b, v6.16b + urhadd v3.16b, v3.16b, v7.16b + st1 {v0.16b-v3.16b}, x0, x1 +.endr + cbnz w12, .lpavg_64x\h + ret +endfunc +.endm + +pixel_avg_pp_64xN_neon 16 +pixel_avg_pp_64xN_neon 32 +pixel_avg_pp_64xN_neon 48 +pixel_avg_pp_64xN_neon 64 + +// void addAvg(const int16_t* src0, const int16_t* src1, pixel* 
dst, intptr_t src0Stride, intptr_t src1Stride, intptr_t dstStride) +.macro addAvg_start + lsl x3, x3, #1 + lsl x4, x4, #1 + mov w11, #0x40 + dup v30.16b, w11 +.endm + +.macro addAvg_2xN h
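Note on the pixel_avg_pp_* functions added above: every block size is built from the same urhadd (unsigned rounding halving add) instruction, which computes (a + b + 1) >> 1 per lane. The following is a minimal scalar sketch under the assumption of 8-bit pixels; the name pixelAvgRef and its explicit width/height parameters are hypothetical, since the assembly bakes the block size into each function.

```cpp
#include <cstdint>

// Hypothetical scalar reference: rounded average of two pixel blocks.
void pixelAvgRef(uint8_t* dst, intptr_t dstStride,
                 const uint8_t* src0, intptr_t stride0,
                 const uint8_t* src1, intptr_t stride1,
                 int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((src0[x] + src1[x] + 1) >> 1); // what urhadd does per lane
        dst += dstStride;
        src0 += stride0;
        src1 += stride1;
    }
}
```

The larger sizes in the diff differ only in how many vector registers are loaded per row and whether the rows are unrolled with .rept or wrapped in a counted loop.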
x265_3.5.tar.gz/source/common/aarch64/p2s.S
Added
@@ -0,0 +1,452 @@ +/***************************************************************************** + * Copyright (C) 2021 MulticoreWare, Inc + * + * Authors: Sebastian Pop <spop@amazon.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. + * For more information, contact us at license @ x265.com. 
+ *****************************************************************************/ + +#include "asm.S" + +#ifdef __APPLE__ +.section __RODATA,__rodata +#else +.section .rodata +#endif + +.align 4 + +.text + +#if HIGH_BIT_DEPTH +# if BIT_DEPTH == 10 +# define P2S_SHIFT 4 +# elif BIT_DEPTH == 12 +# define P2S_SHIFT 2 +# endif +.macro p2s_start + add x3, x3, x3 + add x1, x1, x1 + movi v31.8h, #0xe0, lsl #8 +.endm + +#else // if !HIGH_BIT_DEPTH +# define P2S_SHIFT 6 +.macro p2s_start + add x3, x3, x3 + movi v31.8h, #0xe0, lsl #8 +.endm +#endif // HIGH_BIT_DEPTH + +.macro p2s_2x2 +#if HIGH_BIT_DEPTH + ld1 {v0.s}0, x0, x1 + ld1 {v0.s}1, x0, x1 + shl v3.8h, v0.8h, #P2S_SHIFT +#else + ldrh w10, x0 + add x0, x0, x1 + ldrh w11, x0 + orr w10, w10, w11, lsl #16 + add x0, x0, x1 + dup v0.4s, w10 + ushll v3.8h, v0.8b, #P2S_SHIFT +#endif + add v3.8h, v3.8h, v31.8h + st1 {v3.s}0, x2, x3 + st1 {v3.s}1, x2, x3 +.endm + +// filterPixelToShort(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride) +.macro p2s_2xN h +function PFX(filterPixelToShort_2x\h\()_neon) + p2s_start +.rept \h / 2 + p2s_2x2 +.endr + ret +endfunc +.endm + +p2s_2xN 4 +p2s_2xN 8 +p2s_2xN 16 + +.macro p2s_6x2 +#if HIGH_BIT_DEPTH + ld1 {v0.d}0, x0, #8 + ld1 {v1.s}0, x0, x1 + ld1 {v0.d}1, x0, #8 + ld1 {v1.s}1, x0, x1 + shl v3.8h, v0.8h, #P2S_SHIFT + shl v4.8h, v1.8h, #P2S_SHIFT +#else + ldr s0, x0 + ldrh w10, x0, #4 + add x0, x0, x1 + ld1 {v0.s}1, x0 + ldrh w11, x0, #4 + add x0, x0, x1 + orr w10, w10, w11, lsl #16 + dup v1.4s, w10 + ushll v3.8h, v0.8b, #P2S_SHIFT + ushll v4.8h, v1.8b, #P2S_SHIFT +#endif + add v3.8h, v3.8h, v31.8h + add v4.8h, v4.8h, v31.8h + st1 {v3.d}0, x2, #8 + st1 {v4.s}0, x2, x3 + st1 {v3.d}1, x2, #8 + st1 {v4.s}1, x2, x3 +.endm + +.macro p2s_6xN h +function PFX(filterPixelToShort_6x\h\()_neon) + p2s_start + sub x3, x3, #8 +#if HIGH_BIT_DEPTH + sub x1, x1, #8 +#endif +.rept \h / 2 + p2s_6x2 +.endr + ret +endfunc +.endm + +p2s_6xN 8 +p2s_6xN 16 + +function 
PFX(filterPixelToShort_4x2_neon) + p2s_start +#if HIGH_BIT_DEPTH + ld1 {v0.d}0, x0, x1 + ld1 {v0.d}1, x0, x1 + shl v3.8h, v0.8h, #P2S_SHIFT +#else + ld1 {v0.s}0, x0, x1 + ld1 {v0.s}1, x0, x1 + ushll v3.8h, v0.8b, #P2S_SHIFT +#endif + add v3.8h, v3.8h, v31.8h + st1 {v3.d}0, x2, x3 + st1 {v3.d}1, x2, x3 + ret +endfunc + +function PFX(filterPixelToShort_4x4_neon) + p2s_start +#if HIGH_BIT_DEPTH + ld1 {v0.d}0, x0, x1 + ld1 {v0.d}1, x0, x1 + shl v3.8h, v0.8h, #P2S_SHIFT +#else + ld1 {v0.s}0, x0, x1 + ld1 {v0.s}1, x0, x1 + ushll v3.8h, v0.8b, #P2S_SHIFT +#endif + add v3.8h, v3.8h, v31.8h + st1 {v3.d}0, x2, x3 + st1 {v3.d}1, x2, x3 +#if HIGH_BIT_DEPTH + ld1 {v1.d}0, x0, x1 + ld1 {v1.d}1, x0, x1 + shl v4.8h, v1.8h, #P2S_SHIFT +#else + ld1 {v1.s}0, x0, x1 + ld1 {v1.s}1, x0, x1 + ushll v4.8h, v1.8b, #P2S_SHIFT +#endif + add v4.8h, v4.8h, v31.8h + st1 {v4.d}0, x2, x3 + st1 {v4.d}1, x2, x3 + ret +endfunc + +.macro p2s_4xN h +function PFX(filterPixelToShort_4x\h\()_neon) + p2s_start +.rept \h / 2 +#if HIGH_BIT_DEPTH + ld1 {v0.16b}, x0, x1 + shl v0.8h, v0.8h, #P2S_SHIFT +#else + ld1 {v0.8b}, x0, x1 + ushll v0.8h, v0.8b, #P2S_SHIFT +#endif + add v2.4h, v0.4h, v31.4h + st1 {v2.4h}, x2, x3 +#if HIGH_BIT_DEPTH + ld1 {v1.16b}, x0, x1 + shl v1.8h, v1.8h, #P2S_SHIFT +#else + ld1 {v1.8b}, x0, x1 + ushll v1.8h, v1.8b, #P2S_SHIFT
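Note on the filterPixelToShort_* functions added above: each pixel is shifted left by P2S_SHIFT (6, 4 or 2 for 8-, 10- and 12-bit depth) and then has the constant in v31 added. That constant is 0xe000 per 16-bit lane, which is -8192 when read as a signed 16-bit value. The following is a minimal scalar sketch; the name pixelToShort is hypothetical, and writing the shift as 14 - bitDepth is an assumption that simply reproduces the three P2S_SHIFT values in the diff.

```cpp
#include <cstdint>

// Hypothetical scalar equivalent of one lane of the p2s macros.
int16_t pixelToShort(uint16_t pix, int bitDepth)
{
    const int shift = 14 - bitDepth;             // 6, 4, 2 for 8-, 10-, 12-bit input
    return (int16_t)((pix << shift) - 8192);     // adding 0xe000 in 16 bits == subtracting 8192
}
```

With this mapping the full pixel range of every supported bit depth lands inside the signed 16-bit range, roughly centred on zero.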
x265_3.5.tar.gz/source/common/aarch64/pixel-prim.cpp
Added
@@ -0,0 +1,2063 @@ +#include "common.h" +#include "slicetype.h" // LOWRES_COST_MASK +#include "primitives.h" +#include "x265.h" + +#include "pixel-prim.h" +#include "arm64-utils.h" +#if HAVE_NEON + +#include <arm_neon.h> + +using namespace X265_NS; + + + +namespace +{ + + +/* SATD SA8D variants - based on x264 */ +static inline void SUMSUB_AB(int16x8_t &sum, int16x8_t &sub, const int16x8_t a, const int16x8_t b) +{ + sum = vaddq_s16(a, b); + sub = vsubq_s16(a, b); +} + +static inline void transpose_8h(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2) +{ + t1 = vtrn1q_s16(s1, s2); + t2 = vtrn2q_s16(s1, s2); +} + +static inline void transpose_4s(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2) +{ + t1 = vtrn1q_s32(s1, s2); + t2 = vtrn2q_s32(s1, s2); +} + +#if (X265_DEPTH <= 10) +static inline void transpose_2d(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2) +{ + t1 = vtrn1q_s64(s1, s2); + t2 = vtrn2q_s64(s1, s2); +} +#endif + + +static inline void SUMSUB_ABCD(int16x8_t &s1, int16x8_t &d1, int16x8_t &s2, int16x8_t &d2, + int16x8_t a, int16x8_t b, int16x8_t c, int16x8_t d) +{ + SUMSUB_AB(s1, d1, a, b); + SUMSUB_AB(s2, d2, c, d); +} + +static inline void HADAMARD4_V(int16x8_t &r1, int16x8_t &r2, int16x8_t &r3, int16x8_t &r4, + int16x8_t &t1, int16x8_t &t2, int16x8_t &t3, int16x8_t &t4) +{ + SUMSUB_ABCD(t1, t2, t3, t4, r1, r2, r3, r4); + SUMSUB_ABCD(r1, r3, r2, r4, t1, t3, t2, t4); +} + + +static int _satd_4x8_8x4_end_neon(int16x8_t v0, int16x8_t v1, int16x8_t v2, int16x8_t v3) + +{ + + int16x8_t v4, v5, v6, v7, v16, v17, v18, v19; + + + SUMSUB_AB(v16, v17, v0, v1); + SUMSUB_AB(v18, v19, v2, v3); + + SUMSUB_AB(v4 , v6 , v16, v18); + SUMSUB_AB(v5 , v7 , v17, v19); + + v0 = vtrn1q_s16(v4, v5); + v1 = vtrn2q_s16(v4, v5); + v2 = vtrn1q_s16(v6, v7); + v3 = vtrn2q_s16(v6, v7); + + SUMSUB_AB(v16, v17, v0, v1); + SUMSUB_AB(v18, v19, v2, v3); + + v0 = vtrn1q_s32(v16, v18); + v1 = vtrn2q_s32(v16, v18); + v2 = 
vtrn1q_s32(v17, v19); + v3 = vtrn2q_s32(v17, v19); + + v0 = vabsq_s16(v0); + v1 = vabsq_s16(v1); + v2 = vabsq_s16(v2); + v3 = vabsq_s16(v3); + + v0 = vmaxq_u16(v0, v1); + v1 = vmaxq_u16(v2, v3); + + v0 = vaddq_u16(v0, v1); + return vaddlvq_u16(v0); +} + +static inline int _satd_4x4_neon(int16x8_t v0, int16x8_t v1) +{ + int16x8_t v2, v3; + SUMSUB_AB(v2, v3, v0, v1); + + v0 = vzip1q_s64(v2, v3); + v1 = vzip2q_s64(v2, v3); + SUMSUB_AB(v2, v3, v0, v1); + + v0 = vtrn1q_s16(v2, v3); + v1 = vtrn2q_s16(v2, v3); + SUMSUB_AB(v2, v3, v0, v1); + + v0 = vtrn1q_s32(v2, v3); + v1 = vtrn2q_s32(v2, v3); + + v0 = vabsq_s16(v0); + v1 = vabsq_s16(v1); + v0 = vmaxq_u16(v0, v1); + + return vaddlvq_s16(v0); +} + +static void _satd_8x4v_8x8h_neon(int16x8_t &v0, int16x8_t &v1, int16x8_t &v2, int16x8_t &v3, int16x8_t &v20, + int16x8_t &v21, int16x8_t &v22, int16x8_t &v23) +{ + int16x8_t v16, v17, v18, v19, v4, v5, v6, v7; + + SUMSUB_AB(v16, v18, v0, v2); + SUMSUB_AB(v17, v19, v1, v3); + + HADAMARD4_V(v20, v21, v22, v23, v0, v1, v2, v3); + + transpose_8h(v0, v1, v16, v17); + transpose_8h(v2, v3, v18, v19); + transpose_8h(v4, v5, v20, v21); + transpose_8h(v6, v7, v22, v23); + + SUMSUB_AB(v16, v17, v0, v1); + SUMSUB_AB(v18, v19, v2, v3); + SUMSUB_AB(v20, v21, v4, v5); + SUMSUB_AB(v22, v23, v6, v7); + + transpose_4s(v0, v2, v16, v18); + transpose_4s(v1, v3, v17, v19); + transpose_4s(v4, v6, v20, v22); + transpose_4s(v5, v7, v21, v23); + + v0 = vabsq_s16(v0); + v1 = vabsq_s16(v1); + v2 = vabsq_s16(v2); + v3 = vabsq_s16(v3); + v4 = vabsq_s16(v4); + v5 = vabsq_s16(v5); + v6 = vabsq_s16(v6); + v7 = vabsq_s16(v7); + + v0 = vmaxq_u16(v0, v2); + v1 = vmaxq_u16(v1, v3); + v2 = vmaxq_u16(v4, v6); + v3 = vmaxq_u16(v5, v7); + +} + +#if HIGH_BIT_DEPTH + +#if (X265_DEPTH > 10) +static inline void transpose_2d(int32x4_t &t1, int32x4_t &t2, const int32x4_t s1, const int32x4_t s2) +{ + t1 = vtrn1q_s64(s1, s2); + t2 = vtrn2q_s64(s1, s2); +} + +static inline void ISUMSUB_AB(int32x4_t &sum, int32x4_t &sub, const 
int32x4_t a, const int32x4_t b) +{ + sum = vaddq_s32(a, b); + sub = vsubq_s32(a, b); +} + +static inline void ISUMSUB_AB_FROM_INT16(int32x4_t &suml, int32x4_t &sumh, int32x4_t &subl, int32x4_t &subh, + const int16x8_t a, const int16x8_t b) +{ + suml = vaddl_s16(vget_low_s16(a), vget_low_s16(b)); + sumh = vaddl_high_s16(a, b); + subl = vsubl_s16(vget_low_s16(a), vget_low_s16(b)); + subh = vsubl_high_s16(a, b); +} + +#endif + +static inline void _sub_8x8_fly(const uint16_t *pix1, intptr_t stride_pix1, const uint16_t *pix2, intptr_t stride_pix2, + int16x8_t &v0, int16x8_t &v1, int16x8_t &v2, int16x8_t &v3, + int16x8_t &v20, int16x8_t &v21, int16x8_t &v22, int16x8_t &v23) +{ + uint16x8_t r0, r1, r2, r3; + uint16x8_t t0, t1, t2, t3; + int16x8_t v16, v17; + int16x8_t v18, v19; +
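Note on the SATD helpers added above: SUMSUB_AB is one butterfly stage of a Hadamard transform, and the vabsq/vmaxq pairs at the end of the helpers appear to rely on the identity |a + b| + |a - b| = 2 * max(|a|, |b|) to fold the last stage and the final halving into one step. The following is a minimal scalar sketch of a 4x4 SATD under that reading. All names are hypothetical, 8-bit pixels are assumed, and it is a plain reference rather than a transcription of the NEON code.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// One 4-point Hadamard pass, written as two butterfly (sum/sub) stages.
void hadamard4(const int in[4], int out[4])
{
    const int s01 = in[0] + in[1], d01 = in[0] - in[1];
    const int s23 = in[2] + in[3], d23 = in[2] - in[3];
    out[0] = s01 + s23;
    out[1] = s01 - s23;
    out[2] = d01 + d23;
    out[3] = d01 - d23;
}

// Hypothetical scalar 4x4 SATD: transform the difference block along rows,
// then along columns, sum the absolute values and halve.
int satd4x4Ref(const uint8_t* pix1, intptr_t stride1,
               const uint8_t* pix2, intptr_t stride2)
{
    int rows[4][4];
    for (int y = 0; y < 4; y++)
    {
        int diff[4];
        for (int x = 0; x < 4; x++)
            diff[x] = pix1[y * stride1 + x] - pix2[y * stride2 + x];
        hadamard4(diff, rows[y]);
    }
    int sum = 0;
    for (int x = 0; x < 4; x++)
    {
        const int col[4] = { rows[0][x], rows[1][x], rows[2][x], rows[3][x] };
        int out[4];
        hadamard4(col, out);
        for (int i = 0; i < 4; i++)
            sum += std::abs(out[i]);
    }
    return sum >> 1;
}

// Half of (|a + b| + |a - b|), computed without the final butterfly.
int foldedLastStage(int a, int b)
{
    return std::max(std::abs(a), std::abs(b));
}
```

A block that differs from its reference by a constant 1 has only a DC coefficient of 16, so the sketch returns 8 for it.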
x265_3.5.tar.gz/source/common/aarch64/pixel-prim.h
Added
@@ -0,0 +1,23 @@
+#ifndef PIXEL_PRIM_NEON_H__
+#define PIXEL_PRIM_NEON_H__
+
+#include "common.h"
+#include "slicetype.h" // LOWRES_COST_MASK
+#include "primitives.h"
+#include "x265.h"
+
+
+
+namespace X265_NS
+{
+
+
+
+void setupPixelPrimitives_neon(EncoderPrimitives &p);
+
+
+}
+
+
+#endif
+
x265_3.5.tar.gz/source/common/aarch64/pixel-util.S
Changed
@@ -1,8 +1,9 @@ /***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc + * Copyright (C) 2020-2021 MulticoreWare, Inc * * Authors: Yimeng Su <yimeng.su@huawei.com> * Hongbin Liu <liuhongbin1@huawei.com> + * Sebastian Pop <spop@amazon.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -24,12 +25,677 @@ #include "asm.S" +#ifdef __APPLE__ +.section __RODATA,__rodata +#else .section .rodata +#endif .align 4 .text +// uint64_t pixel_var(const pixel* pix, intptr_t i_stride) +function PFX(pixel_var_8x8_neon) + ld1 {v4.8b}, x0, x1 // pixx + uxtl v0.8h, v4.8b // sum = pixx + umull v1.8h, v4.8b, v4.8b + uaddlp v1.4s, v1.8h // sqr = pixx * pixx + +.rept 7 + ld1 {v4.8b}, x0, x1 // pixx + umull v31.8h, v4.8b, v4.8b + uaddw v0.8h, v0.8h, v4.8b // sum += pixx + uadalp v1.4s, v31.8h // sqr += pixx * pixx +.endr + uaddlv s0, v0.8h + uaddlv d1, v1.4s + fmov w0, s0 + fmov x1, d1 + orr x0, x0, x1, lsl #32 // return sum + ((uint64_t)sqr << 32); + ret +endfunc + +.macro pixel_var_start + movi v0.16b, #0 + movi v1.16b, #0 + movi v2.16b, #0 + movi v3.16b, #0 +.endm + +.macro pixel_var_1 v + uaddw v0.8h, v0.8h, \v\().8b + umull v30.8h, \v\().8b, \v\().8b + uaddw2 v1.8h, v1.8h, \v\().16b + umull2 v31.8h, \v\().16b, \v\().16b + uadalp v2.4s, v30.8h + uadalp v3.4s, v31.8h +.endm + +.macro pixel_var_end + uaddlv s0, v0.8h + uaddlv s1, v1.8h + add v2.4s, v2.4s, v3.4s + fadd s0, s0, s1 + uaddlv d2, v2.4s + fmov w0, s0 + fmov x2, d2 + orr x0, x0, x2, lsl #32 +.endm + +function PFX(pixel_var_16x16_neon) + pixel_var_start + mov w12, #16 +.loop_var_16: + sub w12, w12, #1 + ld1 {v4.16b}, x0, x1 + pixel_var_1 v4 + cbnz w12, .loop_var_16 + pixel_var_end + ret +endfunc + +function PFX(pixel_var_32x32_neon) + pixel_var_start + mov w12, #32 +.loop_var_32: + sub w12, w12, #1 + ld1 {v4.16b-v5.16b}, x0, x1 + pixel_var_1 v4 + pixel_var_1 v5 + 
cbnz w12, .loop_var_32 + pixel_var_end + ret +endfunc + +function PFX(pixel_var_64x64_neon) + pixel_var_start + mov w12, #64 +.loop_var_64: + sub w12, w12, #1 + ld1 {v4.16b-v7.16b}, x0, x1 + pixel_var_1 v4 + pixel_var_1 v5 + pixel_var_1 v6 + pixel_var_1 v7 + cbnz w12, .loop_var_64 + pixel_var_end + ret +endfunc + +// void getResidual4_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride) +function PFX(getResidual4_neon) + lsl x4, x3, #1 +.rept 2 + ld1 {v0.8b}, x0, x3 + ld1 {v1.8b}, x1, x3 + ld1 {v2.8b}, x0, x3 + ld1 {v3.8b}, x1, x3 + usubl v4.8h, v0.8b, v1.8b + usubl v5.8h, v2.8b, v3.8b + st1 {v4.8b}, x2, x4 + st1 {v5.8b}, x2, x4 +.endr + ret +endfunc + +function PFX(getResidual8_neon) + lsl x4, x3, #1 +.rept 4 + ld1 {v0.8b}, x0, x3 + ld1 {v1.8b}, x1, x3 + ld1 {v2.8b}, x0, x3 + ld1 {v3.8b}, x1, x3 + usubl v4.8h, v0.8b, v1.8b + usubl v5.8h, v2.8b, v3.8b + st1 {v4.16b}, x2, x4 + st1 {v5.16b}, x2, x4 +.endr + ret +endfunc + +function PFX(getResidual16_neon) + lsl x4, x3, #1 +.rept 8 + ld1 {v0.16b}, x0, x3 + ld1 {v1.16b}, x1, x3 + ld1 {v2.16b}, x0, x3 + ld1 {v3.16b}, x1, x3 + usubl v4.8h, v0.8b, v1.8b + usubl2 v5.8h, v0.16b, v1.16b + usubl v6.8h, v2.8b, v3.8b + usubl2 v7.8h, v2.16b, v3.16b + st1 {v4.8h-v5.8h}, x2, x4 + st1 {v6.8h-v7.8h}, x2, x4 +.endr + ret +endfunc + +function PFX(getResidual32_neon) + lsl x4, x3, #1 + mov w12, #4 +.loop_residual_32: + sub w12, w12, #1 +.rept 4 + ld1 {v0.16b-v1.16b}, x0, x3 + ld1 {v2.16b-v3.16b}, x1, x3 + ld1 {v4.16b-v5.16b}, x0, x3 + ld1 {v6.16b-v7.16b}, x1, x3 + usubl v16.8h, v0.8b, v2.8b + usubl2 v17.8h, v0.16b, v2.16b + usubl v18.8h, v1.8b, v3.8b + usubl2 v19.8h, v1.16b, v3.16b + usubl v20.8h, v4.8b, v6.8b + usubl2 v21.8h, v4.16b, v6.16b + usubl v22.8h, v5.8b, v7.8b + usubl2 v23.8h, v5.16b, v7.16b + st1 {v16.8h-v19.8h}, x2, x4 + st1 {v20.8h-v23.8h}, x2, x4 +.endr + cbnz w12, .loop_residual_32 + ret +endfunc + +// void pixel_sub_ps_neon(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, 
intptr_t sstride0, intptr_t sstride1) +function PFX(pixel_sub_ps_4x4_neon) + lsl x1, x1, #1 +.rept 2 + ld1 {v0.8b}, x2, x4 + ld1 {v1.8b}, x3, x5 + ld1 {v2.8b}, x2, x4 + ld1 {v3.8b}, x3, x5 + usubl v4.8h, v0.8b, v1.8b + usubl v5.8h, v2.8b, v3.8b + st1 {v4.4h}, x0, x1 + st1 {v5.4h}, x0, x1 +.endr + ret
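Note on the pixel_var_* functions added above: as the comment in pixel_var_8x8_neon states, the return value packs the pixel sum into the low 32 bits and the sum of squares into the high 32 bits. The following is a minimal scalar sketch of that packing, assuming 8-bit pixels and square blocks; the name pixelVarRef and its size parameter are hypothetical.

```cpp
#include <cstdint>

// Hypothetical scalar reference: returns sum + ((uint64_t)sqr << 32)
// for a size x size block of 8-bit pixels.
uint64_t pixelVarRef(const uint8_t* pix, intptr_t stride, int size)
{
    uint32_t sum = 0;
    uint32_t sqr = 0;
    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            sum += pix[x];
            sqr += (uint32_t)pix[x] * pix[x];
        }
        pix += stride;
    }
    return sum + ((uint64_t)sqr << 32);
}
```

A caller would typically split the two halves again and derive a variance-like measure from them, for example sqr - sum * sum / N for N pixels; how x265 itself combines them is not shown in this diff.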
x265_3.5.tar.gz/source/common/aarch64/pixel-util.h
Deleted
@@ -1,40 +0,0 @@ -/***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc - * - * Authors: Yimeng Su <yimeng.su@huawei.com> - * Hongbin Liu <liuhongbin1@huawei.com> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. - * - * This program is also available under a commercial proprietary license. - * For more information, contact us at license @ x265.com. 
- *****************************************************************************/ - -#ifndef X265_PIXEL_UTIL_AARCH64_H -#define X265_PIXEL_UTIL_AARCH64_H - -int x265_pixel_satd_4x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_4x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_4x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_4x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_8x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_8x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_12x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); -int x265_pixel_satd_12x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2); - -uint32_t x265_quant_neon(const int16_t* coef, const int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff); -int PFX(psyCost_4x4_neon)(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride); - -#endif // ifndef X265_PIXEL_UTIL_AARCH64_H
x265_3.5.tar.gz/source/common/aarch64/pixel.h
Deleted
@@ -1,105 +0,0 @@ -/***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc - * - * Authors: Hongbin Liu <liuhongbin1@huawei.com> - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. - * - * This program is also available under a commercial proprietary license. - * For more information, contact us at license @ x265.com. 
- *****************************************************************************/ - -#ifndef X265_I386_PIXEL_AARCH64_H -#define X265_I386_PIXEL_AARCH64_H - -void x265_pixel_avg_pp_4x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_4x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_4x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_8x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_8x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_8x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_8x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_12x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x12_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x32_neon (pixel* dst, intptr_t dstride, const pixel* 
src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_16x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_24x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_32x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_32x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_32x24_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_32x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_32x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_48x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_64x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_64x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_64x48_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); -void x265_pixel_avg_pp_64x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); - -void x265_sad_x3_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void 
x265_sad_x3_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_8x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); 
-void x265_sad_x3_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); -void x265_sad_x3_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res); - -void x265_sad_x4_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_8x4_neon(const pixel* fenc, const pixel* fref0, const 
pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, 
const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); -void x265_sad_x4_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res); - -#endif // ifndef X265_I386_PIXEL_AARCH64_H
x265_3.5.tar.gz/source/common/aarch64/sad-a.S
Changed
@@ -1,7 +1,8 @@ /***************************************************************************** - * Copyright (C) 2020 MulticoreWare, Inc + * Copyright (C) 2020-2021 MulticoreWare, Inc * * Authors: Hongbin Liu <liuhongbin1@huawei.com> + * Sebastian Pop <spop@amazon.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -23,83 +24,661 @@ #include "asm.S" +#ifdef __APPLE__ +.section __RODATA,__rodata +#else .section .rodata +#endif .align 4 .text -.macro SAD_X_START_8 x - ld1 {v0.8b}, x0, x9 -.if \x == 3 - ld1 {v1.8b}, x1, x4 - ld1 {v2.8b}, x2, x4 - ld1 {v3.8b}, x3, x4 -.elseif \x == 4 - ld1 {v1.8b}, x1, x5 - ld1 {v2.8b}, x2, x5 - ld1 {v3.8b}, x3, x5 - ld1 {v4.8b}, x4, x5 +.macro SAD_START_4 f + ld1 {v0.s}0, x0, x1 + ld1 {v0.s}1, x0, x1 + ld1 {v1.s}0, x2, x3 + ld1 {v1.s}1, x2, x3 + \f v16.8h, v0.8b, v1.8b +.endm + +.macro SAD_4 h +.rept \h / 2 - 1 + SAD_START_4 uabal +.endr +.endm + +.macro SAD_START_8 f + ld1 {v0.8b}, x0, x1 + ld1 {v1.8b}, x2, x3 + ld1 {v2.8b}, x0, x1 + ld1 {v3.8b}, x2, x3 + \f v16.8h, v0.8b, v1.8b + \f v17.8h, v2.8b, v3.8b +.endm + +.macro SAD_8 h +.rept \h / 2 - 1 + SAD_START_8 uabal +.endr +.endm + +.macro SAD_START_16 f + ld1 {v0.16b}, x0, x1 + ld1 {v1.16b}, x2, x3 + ld1 {v2.16b}, x0, x1 + ld1 {v3.16b}, x2, x3 + \f v16.8h, v0.8b, v1.8b + \f\()2 v17.8h, v0.16b, v1.16b + uabal v16.8h, v2.8b, v3.8b + uabal2 v17.8h, v2.16b, v3.16b +.endm + +.macro SAD_16 h +.rept \h / 2 - 1 + SAD_START_16 uabal +.endr +.endm + +.macro SAD_START_32 + movi v16.16b, #0 + movi v17.16b, #0 + movi v18.16b, #0 + movi v19.16b, #0 +.endm + +.macro SAD_32 + ld1 {v0.16b-v1.16b}, x0, x1 + ld1 {v2.16b-v3.16b}, x2, x3 + ld1 {v4.16b-v5.16b}, x0, x1 + ld1 {v6.16b-v7.16b}, x2, x3 + uabal v16.8h, v0.8b, v2.8b + uabal2 v17.8h, v0.16b, v2.16b + uabal v18.8h, v1.8b, v3.8b + uabal2 v19.8h, v1.16b, v3.16b + uabal v16.8h, v4.8b, v6.8b + uabal2 v17.8h, v4.16b, v6.16b + uabal v18.8h, v5.8b, v7.8b 
+ uabal2 v19.8h, v5.16b, v7.16b +.endm + +.macro SAD_END_32 + add v16.8h, v16.8h, v17.8h + add v17.8h, v18.8h, v19.8h + add v16.8h, v16.8h, v17.8h + uaddlv s0, v16.8h + fmov w0, s0 + ret +.endm + +.macro SAD_START_64 + movi v16.16b, #0 + movi v17.16b, #0 + movi v18.16b, #0 + movi v19.16b, #0 + movi v20.16b, #0 + movi v21.16b, #0 + movi v22.16b, #0 + movi v23.16b, #0 +.endm + +.macro SAD_64 + ld1 {v0.16b-v3.16b}, x0, x1 + ld1 {v4.16b-v7.16b}, x2, x3 + ld1 {v24.16b-v27.16b}, x0, x1 + ld1 {v28.16b-v31.16b}, x2, x3 + uabal v16.8h, v0.8b, v4.8b + uabal2 v17.8h, v0.16b, v4.16b + uabal v18.8h, v1.8b, v5.8b + uabal2 v19.8h, v1.16b, v5.16b + uabal v20.8h, v2.8b, v6.8b + uabal2 v21.8h, v2.16b, v6.16b + uabal v22.8h, v3.8b, v7.8b + uabal2 v23.8h, v3.16b, v7.16b + + uabal v16.8h, v24.8b, v28.8b + uabal2 v17.8h, v24.16b, v28.16b + uabal v18.8h, v25.8b, v29.8b + uabal2 v19.8h, v25.16b, v29.16b + uabal v20.8h, v26.8b, v30.8b + uabal2 v21.8h, v26.16b, v30.16b + uabal v22.8h, v27.8b, v31.8b + uabal2 v23.8h, v27.16b, v31.16b +.endm + +.macro SAD_END_64 + add v16.8h, v16.8h, v17.8h + add v17.8h, v18.8h, v19.8h + add v16.8h, v16.8h, v17.8h + uaddlp v16.4s, v16.8h + add v18.8h, v20.8h, v21.8h + add v19.8h, v22.8h, v23.8h + add v17.8h, v18.8h, v19.8h + uaddlp v17.4s, v17.8h + add v16.4s, v16.4s, v17.4s + uaddlv d0, v16.4s + fmov x0, d0 + ret +.endm + +.macro SAD_START_12 + movrel x12, sad12_mask + ld1 {v31.16b}, x12 + movi v16.16b, #0 + movi v17.16b, #0 +.endm + +.macro SAD_12 + ld1 {v0.16b}, x0, x1 + and v0.16b, v0.16b, v31.16b + ld1 {v1.16b}, x2, x3 + and v1.16b, v1.16b, v31.16b + ld1 {v2.16b}, x0, x1 + and v2.16b, v2.16b, v31.16b + ld1 {v3.16b}, x2, x3 + and v3.16b, v3.16b, v31.16b + uabal v16.8h, v0.8b, v1.8b + uabal2 v17.8h, v0.16b, v1.16b + uabal v16.8h, v2.8b, v3.8b + uabal2 v17.8h, v2.16b, v3.16b +.endm + +.macro SAD_END_12 + add v16.8h, v16.8h, v17.8h + uaddlv s0, v16.8h + fmov w0, s0 + ret +.endm + +.macro SAD_START_24 + movi v16.16b, #0 + movi v17.16b, #0 + movi v18.16b, #0 + 
sub x1, x1, #16 + sub x3, x3, #16 +.endm + +.macro SAD_24
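For readers following the NEON macros in this hunk, a scalar reference of what a `uabal`-based SAD kernel accumulates may help. This is a minimal sketch under stated assumptions, not the x265 C fallback: `sad_ref` and its parameters are hypothetical names, and an 8-bit `pixel` type is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumption: 8-bit build, so a pixel is one byte.
typedef uint8_t pixel;

// Hypothetical scalar reference: sum of absolute differences over a
// width x height block. Strides are in pixels, as in the asm prototypes.
int sad_ref(const pixel* pix1, intptr_t stride1,
            const pixel* pix2, intptr_t stride2,
            int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            sum += std::abs(pix1[x] - pix2[x]); // operands promote to int
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

The vector code reaches the same total by accumulating per-lane partial sums in 16-bit lanes and reducing them once at the end (`uaddlv`), rather than summing pixel by pixel.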
x265_3.5.tar.gz/source/common/aarch64/ssd-a.S
Added
@@ -0,0 +1,483 @@ +/***************************************************************************** + * Copyright (C) 2021 MulticoreWare, Inc + * + * Authors: Sebastian Pop <spop@amazon.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. + * For more information, contact us at license @ x265.com. 
+ *****************************************************************************/ + +#include "asm.S" + +#ifdef __APPLE__ +.section __RODATA,__rodata +#else +.section .rodata +#endif + +.align 4 + +.text + +.macro ret_v0_w0 + trn2 v1.2d, v0.2d, v0.2d + add v0.2s, v0.2s, v1.2s + addp v0.2s, v0.2s, v0.2s + fmov w0, s0 + ret +.endm + +function PFX(pixel_sse_pp_4x4_neon) + ld1 {v16.s}0, x0, x1 + ld1 {v17.s}0, x2, x3 + ld1 {v18.s}0, x0, x1 + ld1 {v19.s}0, x2, x3 + ld1 {v20.s}0, x0, x1 + ld1 {v21.s}0, x2, x3 + ld1 {v22.s}0, x0, x1 + ld1 {v23.s}0, x2, x3 + + usubl v1.8h, v16.8b, v17.8b + usubl v2.8h, v18.8b, v19.8b + usubl v3.8h, v20.8b, v21.8b + usubl v4.8h, v22.8b, v23.8b + + smull v0.4s, v1.4h, v1.4h + smlal v0.4s, v2.4h, v2.4h + smlal v0.4s, v3.4h, v3.4h + smlal v0.4s, v4.4h, v4.4h + ret_v0_w0 +endfunc + +function PFX(pixel_sse_pp_4x8_neon) + ld1 {v16.s}0, x0, x1 + ld1 {v17.s}0, x2, x3 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.s}0, x0, x1 + ld1 {v17.s}0, x2, x3 + smull v0.4s, v1.4h, v1.4h +.rept 6 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.s}0, x0, x1 + smlal v0.4s, v1.4h, v1.4h + ld1 {v17.s}0, x2, x3 +.endr + usubl v1.8h, v16.8b, v17.8b + smlal v0.4s, v1.4h, v1.4h + ret_v0_w0 +endfunc + +function PFX(pixel_sse_pp_8x8_neon) + ld1 {v16.8b}, x0, x1 + ld1 {v17.8b}, x2, x3 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.8b}, x0, x1 + smull v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ld1 {v17.8b}, x2, x3 + +.rept 6 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.8b}, x0, x1 + smlal v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ld1 {v17.8b}, x2, x3 +.endr + usubl v1.8h, v16.8b, v17.8b + smlal v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ret_v0_w0 +endfunc + +function PFX(pixel_sse_pp_8x16_neon) + ld1 {v16.8b}, x0, x1 + ld1 {v17.8b}, x2, x3 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.8b}, x0, x1 + smull v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ld1 {v17.8b}, x2, x3 + +.rept 14 + usubl v1.8h, v16.8b, v17.8b + ld1 {v16.8b}, x0, x1 + smlal v0.4s, v1.4h, v1.4h + smlal2 
v0.4s, v1.8h, v1.8h + ld1 {v17.8b}, x2, x3 +.endr + usubl v1.8h, v16.8b, v17.8b + smlal v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ret_v0_w0 +endfunc + +.macro sse_pp_16xN h +function PFX(pixel_sse_pp_16x\h\()_neon) + ld1 {v16.16b}, x0, x1 + ld1 {v17.16b}, x2, x3 + usubl v1.8h, v16.8b, v17.8b + usubl2 v2.8h, v16.16b, v17.16b + ld1 {v16.16b}, x0, x1 + ld1 {v17.16b}, x2, x3 + smull v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + smlal v0.4s, v2.4h, v2.4h + smlal2 v0.4s, v2.8h, v2.8h +.rept \h - 2 + usubl v1.8h, v16.8b, v17.8b + usubl2 v2.8h, v16.16b, v17.16b + ld1 {v16.16b}, x0, x1 + smlal v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + ld1 {v17.16b}, x2, x3 + smlal v0.4s, v2.4h, v2.4h + smlal2 v0.4s, v2.8h, v2.8h +.endr + usubl v1.8h, v16.8b, v17.8b + usubl2 v2.8h, v16.16b, v17.16b + smlal v0.4s, v1.4h, v1.4h + smlal2 v0.4s, v1.8h, v1.8h + smlal v0.4s, v2.4h, v2.4h + smlal2 v0.4s, v2.8h, v2.8h + ret_v0_w0 +endfunc +.endm + +sse_pp_16xN 16 +sse_pp_16xN 32 + +function PFX(pixel_sse_pp_32x32_neon) + mov w12, #8 + movi v0.16b, #0 + movi v1.16b, #0 +.loop_sse_pp_32: + sub w12, w12, #1 +.rept 4 + ld1 {v16.16b,v17.16b}, x0, x1 + ld1 {v18.16b,v19.16b}, x2, x3 + usubl v2.8h, v16.8b, v18.8b + usubl2 v3.8h, v16.16b, v18.16b + usubl v4.8h, v17.8b, v19.8b + usubl2 v5.8h, v17.16b, v19.16b + smlal v0.4s, v2.4h, v2.4h + smlal2 v1.4s, v2.8h, v2.8h + smlal v0.4s, v3.4h, v3.4h + smlal2 v1.4s, v3.8h, v3.8h + smlal v0.4s, v4.4h, v4.4h + smlal2 v1.4s, v4.8h, v4.8h + smlal v0.4s, v5.4h, v5.4h + smlal2 v1.4s, v5.8h, v5.8h +.endr + cbnz w12, .loop_sse_pp_32 + add v0.4s, v0.4s, v1.4s + ret_v0_w0 +endfunc + +function PFX(pixel_sse_pp_32x64_neon) + mov w12, #16 + movi v0.16b, #0 + movi v1.16b, #0 +.loop_sse_pp_32x64: + sub w12, w12, #1 +.rept 4 + ld1 {v16.16b,v17.16b}, x0, x1 + ld1 {v18.16b,v19.16b}, x2, x3 + usubl v2.8h, v16.8b, v18.8b
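The `pixel_sse_pp_*` functions added here follow a widen, subtract, square, accumulate pattern (`usubl` followed by `smull`/`smlal`). A scalar sketch of the same computation is given below for orientation; `sse_ref`, its parameters, and the `uint32_t` return type are assumptions made for illustration, and 8-bit pixels are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-bit build, so a pixel is one byte.
typedef uint8_t pixel;

// Hypothetical scalar reference: sum of squared differences over a
// width x height block. Strides are in pixels.
uint32_t sse_ref(const pixel* pix1, intptr_t stride1,
                 const pixel* pix2, intptr_t stride2,
                 int width, int height)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int d = pix1[x] - pix2[x];   // widened difference, may be negative
            sum += (uint32_t)(d * d);    // square is always non-negative
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

For 8-bit input the largest block in this file (64x64) gives at most 64 * 64 * 255 * 255, which fits in 32 bits.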
x265_3.5.tar.gz/source/common/common.h
Changed
@@ -130,7 +130,7 @@
 typedef uint64_t pixel4;
 typedef int64_t ssum2_t;
 #define SHIFT_TO_BITPLANE 9
-#define HISTOGRAM_BINS 1024
+#define BRIGHTNESS_THRESHOLD 120 // The threshold above which a pixel is bright
 #else
 typedef uint8_t pixel;
 typedef uint16_t sum_t;
@@ -138,7 +138,7 @@
 typedef uint32_t pixel4;
 typedef int32_t ssum2_t; // Signed sum
 #define SHIFT_TO_BITPLANE 7
-#define HISTOGRAM_BINS 256
+#define BRIGHTNESS_THRESHOLD 30 // The threshold above which a pixel is bright
 #endif // if HIGH_BIT_DEPTH
 #if X265_DEPTH < 10
@@ -162,6 +162,8 @@
 #define MIN_QPSCALE 0.21249999999999999
 #define MAX_MAX_QPSCALE 615.46574234477100
+#define FRAME_BRIGHTNESS_THRESHOLD 50.0 // Min % of pixels in a frame, that are above BRIGHTNESS_THRESHOLD for it to be considered a bright frame
+#define FRAME_EDGE_THRESHOLD 10.0 // Min % of edge pixels in a frame, for it to be considered to have high edge density
 template<typename T>
@@ -340,6 +342,9 @@
 #define FILLER_OVERHEAD (NAL_TYPE_OVERHEAD + START_CODE_OVERHEAD + 1)
 #define MAX_NUM_DYN_REFINE (NUM_CU_DEPTH * X265_REFINE_INTER_LEVELS)
+#define X265_BYTE 8
+
+#define MAX_MCSTF_TEMPORAL_WINDOW_LENGTH 8
 namespace X265_NS {
@@ -434,6 +439,14 @@
 #define x265_unlink(fileName) unlink(fileName)
 #define x265_rename(oldName, newName) rename(oldName, newName)
 #endif
+/* Close a file */
+#define x265_fclose(file) if (file != NULL) fclose(file); file=NULL;
+#define x265_fread(val, size, readSize, fileOffset,errorMessage)\
+    if (fread(val, size, readSize, fileOffset) != readSize)\
+    {\
+        x265_log(NULL, X265_LOG_ERROR, errorMessage); \
+        return; \
+    }
 int x265_exp2fix8(double x);
 double x265_ssim2dB(double ssim);
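The comments on `BRIGHTNESS_THRESHOLD` and `FRAME_BRIGHTNESS_THRESHOLD` describe a two-level test: a pixel is bright above a per-pixel threshold, and a frame is bright when a minimum percentage of its pixels are bright. A minimal sketch of that rule follows. It assumes the 8-bit values from the hunk (30 and 50.0); `is_bright_frame` is a hypothetical helper, and the exact comparisons (`>` for pixels, `>=` for the percentage) are an assumption read off the comments, not confirmed from the encoder source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values copied from the 8-bit branch of the diff.
const int    kBrightnessThreshold      = 30;   // BRIGHTNESS_THRESHOLD
const double kFrameBrightnessThreshold = 50.0; // FRAME_BRIGHTNESS_THRESHOLD, in percent

// Hypothetical helper: returns true when at least 50% of the luma
// samples are strictly above the per-pixel brightness threshold.
bool is_bright_frame(const uint8_t* luma, size_t count)
{
    if (count == 0)
        return false;

    size_t bright = 0;
    for (size_t i = 0; i < count; i++)
        if (luma[i] > kBrightnessThreshold)
            bright++;

    double percent = 100.0 * bright / count;
    return percent >= kFrameBrightnessThreshold;
}
```

`FRAME_EDGE_THRESHOLD` reads as the same kind of percentage test applied to edge pixels instead of bright pixels.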
x265_3.5.tar.gz/source/common/cpu.cpp
Changed
@@ -7,6 +7,8 @@
  * Steve Borho <steve@borho.org>
  * Hongbin Liu <liuhongbin1@huawei.com>
  * Yimeng Su <yimeng.su@huawei.com>
+ * Josh Dekker <josh@itanimul.li>
+ * Jean-Baptiste Kempf <jb@videolan.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -105,6 +107,9 @@
     { "NEON", X265_CPU_NEON },
     { "FastNeonMRC", X265_CPU_FAST_NEON_MRC },
+#elif X265_ARCH_ARM64
+    { "NEON", X265_CPU_NEON },
+
 #elif X265_ARCH_POWER8
     { "Altivec", X265_CPU_ALTIVEC },
@@ -369,12 +374,23 @@
     flags |= PFX(cpu_fast_neon_mrc_test)() ? X265_CPU_FAST_NEON_MRC : 0;
 #endif
     // TODO: write dual issue test? currently it's A8 (dual issue) vs. A9 (fast mrc)
-#elif X265_ARCH_ARM64
-    flags |= X265_CPU_NEON;
 #endif // if HAVE_ARMV6
     return flags;
 }
+#elif X265_ARCH_ARM64
+
+uint32_t cpu_detect(bool benableavx512)
+{
+    int flags = 0;
+
+    #if HAVE_NEON
+    flags |= X265_CPU_NEON;
+    #endif
+
+    return flags;
+}
+
 #elif X265_ARCH_POWER8
 uint32_t cpu_detect(bool benableavx512)
x265_3.5.tar.gz/source/common/frame.cpp
Changed
@@ -63,13 +63,40 @@
     m_thetaPic = NULL;
     m_edgeBitPlane = NULL;
     m_edgeBitPic = NULL;
+    m_frameSegment = 0;
     m_isInsideWindow = 0;
+
+    // mcstf
+    m_isSubSampled = NULL;
+    m_mcstf = NULL;
+    m_refPicCnt[0] = 0;
+    m_refPicCnt[1] = 0;
+    m_nextMCSTF = NULL;
+    m_prevMCSTF = NULL;
+
 }
 bool Frame::create(x265_param *param, float* quantOffsets)
 {
     m_fencPic = new PicYuv;
     m_param = param;
+
+    if (m_param->bEnableTemporalFilter)
+    {
+        m_mcstf = new TemporalFilter;
+        m_mcstf->init(param);
+
+        m_fencPicSubsampled2 = new PicYuv;
+        m_fencPicSubsampled4 = new PicYuv;
+
+        if (!m_fencPicSubsampled2->createScaledPicYUV(param, 2))
+            return false;
+        if (!m_fencPicSubsampled4->createScaledPicYUV(param, 4))
+            return false;
+
+        CHECKED_MALLOC_ZERO(m_isSubSampled, int, 1);
+    }
+
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
     if (param->bCTUInfo)
@@ -104,7 +131,7 @@
         CHECKED_MALLOC_ZERO(m_classifyCount, uint32_t, size);
     }
-    if (param->rc.aqMode == X265_AQ_EDGE || (param->rc.zonefileCount && param->rc.aqMode != 0))
+    if (param->rc.aqMode == X265_AQ_EDGE || param->rc.frameSegment || (param->rc.zonefileCount && param->rc.aqMode != 0))
     {
         uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize - 1) / param->maxCUSize;
         uint32_t numCuInHeight = (param->sourceHeight + param->maxCUSize - 1) / param->maxCUSize;
@@ -151,6 +178,22 @@
     return false;
 }
+bool Frame::createSubSample()
+{
+
+    m_fencPicSubsampled2 = new PicYuv;
+    m_fencPicSubsampled4 = new PicYuv;
+
+    if (!m_fencPicSubsampled2->createScaledPicYUV(m_param, 2))
+        return false;
+    if (!m_fencPicSubsampled4->createScaledPicYUV(m_param, 4))
+        return false;
+    CHECKED_MALLOC_ZERO(m_isSubSampled, int, 1);
+    return true;
+fail:
+    return false;
+}
+
 bool Frame::allocEncodeData(x265_param *param, const SPS& sps)
 {
     m_encData = new FrameData;
@@ -207,6 +250,26 @@
         m_fencPic = NULL;
     }
+    if (m_param->bEnableTemporalFilter)
+    {
+
+        if (m_fencPicSubsampled2)
+        {
+            m_fencPicSubsampled2->destroy();
+            delete m_fencPicSubsampled2;
+            m_fencPicSubsampled2 = NULL;
+        }
+
+        if (m_fencPicSubsampled4)
+        {
+            m_fencPicSubsampled4->destroy();
+            delete m_fencPicSubsampled4;
+            m_fencPicSubsampled4 = NULL;
+        }
+        delete m_mcstf;
+        X265_FREE(m_isSubSampled);
+    }
+
     if (m_reconPic)
     {
         m_reconPic->destroy();
@@ -267,7 +330,8 @@
         X265_FREE(m_addOnPrevChange);
         m_addOnPrevChange = NULL;
     }
-    m_lowres.destroy();
+
+    m_lowres.destroy(m_param);
     X265_FREE(m_rcData);
     if (m_param->bDynamicRefine)
x265_3.5.tar.gz/source/common/frame.h
Changed
@@ -28,6 +28,7 @@
 #include "common.h"
 #include "lowres.h"
 #include "threading.h"
+#include "temporalfilter.h"
 namespace X265_NS {
 // private namespace
@@ -70,6 +71,7 @@
     double count[4];
     double offset[4];
     double bufferFillFinal;
+    int64_t currentSatd;
 };
 class Frame
@@ -83,6 +85,9 @@
     /* Data associated with x265_picture */
     PicYuv* m_fencPic;
+    PicYuv* m_fencPicSubsampled2;
+    PicYuv* m_fencPicSubsampled4;
+
     int m_poc;
     int m_encodeOrder;
     int64_t m_pts; // user provided presentation time stamp
@@ -132,6 +137,13 @@
     bool m_classifyFrame;
     int m_fieldNum;
+    /*MCSTF*/
+    TemporalFilter* m_mcstf;
+    int m_refPicCnt[2];
+    Frame* m_nextMCSTF; // PicList doubly linked list pointers
+    Frame* m_prevMCSTF;
+    int* m_isSubSampled;
+
     /* aq-mode 4 : Gaussian, edge and theta frames for edge information */
     pixel* m_edgePic;
     pixel* m_gaussianPic;
@@ -141,11 +153,16 @@
     pixel* m_edgeBitPlane;
     pixel* m_edgeBitPic;
+    /* segment for each frame */
+    int m_frameSegment;
+
+
     int m_isInsideWindow;
     Frame();
     bool create(x265_param *param, float* quantOffsets);
+    bool createSubSample();
     bool allocEncodeData(x265_param *param, const SPS& sps);
     void reinit(const SPS& sps);
     void destroy();
x265_3.5.tar.gz/source/common/framedata.cpp
Changed
@@ -62,7 +62,7 @@
     }
     else
         return false;
-    CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
+    CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame + 1);
     CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);
     reinit(sps);
x265_3.5.tar.gz/source/common/lowres.cpp
Changed
@@ -28,6 +28,28 @@
 using namespace X265_NS;
+/*
+ * Down Sample input picture
+ */
+static
+void frame_lowres_core(const pixel* src0, pixel* dst0,
+    intptr_t src_stride, intptr_t dst_stride, int width, int height)
+{
+    for (int y = 0; y < height; y++)
+    {
+        const pixel* src1 = src0 + src_stride;
+        for (int x = 0; x < width; x++)
+        {
+            // slower than naive bilinear, but matches asm
+#define FILTER(a, b, c, d) ((((a + b + 1) >> 1) + ((c + d + 1) >> 1) + 1) >> 1)
+            dst0[x] = FILTER(src0[2 * x], src1[2 * x], src0[2 * x + 1], src1[2 * x + 1]);
+#undef FILTER
+        }
+        src0 += src_stride * 2;
+        dst0 += dst_stride;
+    }
+}
+
 bool PicQPAdaptationLayer::create(uint32_t width, uint32_t height, uint32_t partWidth, uint32_t partHeight, uint32_t numAQPartInWidthExt, uint32_t numAQPartInHeightExt)
 {
     aqPartWidth = partWidth;
@@ -190,13 +212,48 @@
         }
     }
+    if (param->rc.frameSegment)
+        lowresEdgePlane = X265_MALLOC(pixel, lumaStride * (lines + (origPic->m_lumaMarginY * 2)));
+
+    if (param->bHistBasedSceneCut)
+    {
+        quarterSampleLowResWidth = widthFullRes / 4;
+        quarterSampleLowResHeight = heightFullRes / 4;
+        quarterSampleLowResOriginX = 16;
+        quarterSampleLowResOriginY = 16;
+        quarterSampleLowResStrideY = quarterSampleLowResWidth + 2 * quarterSampleLowResOriginY;
+
+        size_t quarterSampleLowResPlanesize = quarterSampleLowResStrideY * (quarterSampleLowResHeight + 2 * quarterSampleLowResOriginX);
+        /* allocate quarter sampled lowres buffers */
+        CHECKED_MALLOC_ZERO(quarterSampleLowResBuffer, pixel, quarterSampleLowResPlanesize);
+
+        // Allocate memory for Histograms
+        picHistogram = X265_MALLOC(uint32_t***, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t***));
+        picHistogram[0] = X265_MALLOC(uint32_t**, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+        for (uint32_t wd = 1; wd < NUMBER_OF_SEGMENTS_IN_WIDTH; wd++) {
+            picHistogram[wd] = picHistogram[0] + wd * NUMBER_OF_SEGMENTS_IN_HEIGHT;
+        }
+
+        for (uint32_t regionInPictureWidthIndex = 0; regionInPictureWidthIndex < NUMBER_OF_SEGMENTS_IN_WIDTH; regionInPictureWidthIndex++)
+        {
+            for (uint32_t regionInPictureHeightIndex = 0; regionInPictureHeightIndex < NUMBER_OF_SEGMENTS_IN_HEIGHT; regionInPictureHeightIndex++)
+            {
+                picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex] = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH *sizeof(uint32_t*));
+                picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][0] = X265_MALLOC(uint32_t, 3 * HISTOGRAM_NUMBER_OF_BINS * sizeof(uint32_t));
+                for (uint32_t wd = 1; wd < 3; wd++) {
+                    picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][wd] = picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][0] + wd * HISTOGRAM_NUMBER_OF_BINS;
+                }
+            }
+        }
+    }
+
     return true;
 fail:
     return false;
 }
-void Lowres::destroy()
+void Lowres::destroy(x265_param* param)
 {
     X265_FREE(buffer[0]);
     if(bEnableHME)
@@ -234,7 +291,9 @@
     X265_FREE(invQscaleFactor8x8);
     X265_FREE(edgeInclined);
     X265_FREE(qpAqMotionOffset);
-    X265_FREE(blockVariance);
+    if (param->bDynamicRefine || param->bEnableFades)
+        X265_FREE(blockVariance);
+
     if (maxAQDepth > 0)
     {
         for (uint32_t d = 0; d < 4; d++)
@@ -254,6 +313,29 @@
         delete pAQLayer;
     }
+
+    // Histograms
+    if (param->bHistBasedSceneCut)
+    {
+        for (uint32_t segmentInFrameWidthIdx = 0; segmentInFrameWidthIdx < NUMBER_OF_SEGMENTS_IN_WIDTH; segmentInFrameWidthIdx++)
+        {
+            if (picHistogram[segmentInFrameWidthIdx])
+            {
+                for (uint32_t segmentInFrameHeightIdx = 0; segmentInFrameHeightIdx < NUMBER_OF_SEGMENTS_IN_HEIGHT; segmentInFrameHeightIdx++)
+                {
+                    if (picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx])
+                        X265_FREE(picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx][0]);
+                    X265_FREE(picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx]);
+                }
+            }
+        }
+        if (picHistogram)
+            X265_FREE(picHistogram[0]);
+        X265_FREE(picHistogram);
+
+        X265_FREE(quarterSampleLowResBuffer);
+
+    }
 }
 // (re) initialize lowres state
 void Lowres::init(PicYuv *origPic, int poc)
@@ -266,10 +348,6 @@
     indB = 0;
     memset(costEst, -1, sizeof(costEst));
     memset(weightedCostDelta, 0, sizeof(weightedCostDelta));
-    interPCostPercDiff = 0.0;
-    intraCostPercDiff = 0.0;
-    m_bIsMaxThres = false;
-    m_bIsHardScenecut = false;
     if (qpAqOffset && invQscaleFactor)
         memset(costEstAq, -1, sizeof(costEstAq));
@@ -314,4 +392,16 @@
     }
     fpelPlane[0] = lowresPlane[0];
+
+    if (origPic->m_param->bHistBasedSceneCut)
+    {
+        // Quarter Sampled Input Picture Formation
+        // TO DO: Replace with ASM function
+        frame_lowres_core(
+            lowresPlane[0],
+            quarterSampleLowResBuffer + quarterSampleLowResOriginX + quarterSampleLowResOriginY * quarterSampleLowResStrideY,
+            lumaStride,
+            quarterSampleLowResStrideY,
+            widthFullRes / 4, heightFullRes / 4);
+    }
 }
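The 2x2 averaging filter that `frame_lowres_core` applies can be exercised in isolation. The sketch below restates it as a standalone function under stated assumptions: an 8-bit `pixel` type, and the hypothetical names `filter2x2` and `downsample_by_two`, which do not appear in the patch. The rounding order (two vertical averages, then one horizontal average) follows the `FILTER` macro in the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-bit build, so a pixel is one byte.
typedef uint8_t pixel;

// Rounded average of a 2x2 neighbourhood: average each column pair
// (a,b) and (c,d) with rounding, then average the two results.
static inline pixel filter2x2(int a, int b, int c, int d)
{
    return (pixel)((((a + b + 1) >> 1) + ((c + d + 1) >> 1) + 1) >> 1);
}

// Hypothetical standalone version of the downsampler: writes a
// width x height output, reading a (2*width) x (2*height) input.
void downsample_by_two(const pixel* src0, pixel* dst0,
                       intptr_t src_stride, intptr_t dst_stride,
                       int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        const pixel* src1 = src0 + src_stride; // row below src0
        for (int x = 0; x < width; x++)
            dst0[x] = filter2x2(src0[2 * x], src1[2 * x],
                                src0[2 * x + 1], src1[2 * x + 1]);
        src0 += src_stride * 2;
        dst0 += dst_stride;
    }
}
```

Because each stage rounds up, the result can differ from a single rounded mean: the block {1, 2; 3, 4} produces 3, while its exact mean is 2.5. That is the trade-off the "matches asm" comment in the patch refers to.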
View file
x265_3.5.tar.gz/source/common/lowres.h
Changed
@@ -32,6 +32,10 @@
 namespace X265_NS {
 // private namespace
+#define HISTOGRAM_NUMBER_OF_BINS 256
+#define NUMBER_OF_SEGMENTS_IN_WIDTH 4
+#define NUMBER_OF_SEGMENTS_IN_HEIGHT 4
+
 struct ReferencePlanes
 {
     ReferencePlanes() { memset(this, 0, sizeof(ReferencePlanes)); }
@@ -44,6 +48,9 @@
     pixel* fpelLowerResPlane[3];
     pixel* lowerResPlane[4];
+    /* Edge Plane in Lowres */
+    pixel* lowresEdgePlane;
+
     bool isWeighted;
     bool isLowres;
     bool isHMELowres;
@@ -214,13 +221,13 @@
     double* qpAqOffset; // AQ QP offset values for each 16x16 CU
     double* qpCuTreeOffset; // cuTree QP offset values for each 16x16 CU
     double* qpAqMotionOffset;
-    int* invQscaleFactor; // qScale values for qp Aq Offsets
+    int* invQscaleFactor; // qScale values for qp Aq Offsets
     int* invQscaleFactor8x8; // temporary buffer for qg-size 8
     uint32_t* blockVariance;
     uint64_t wp_ssd[3]; // This is different than SSDY, this is sum(pixel^2) - sum(pixel)^2 for entire frame
     uint64_t wp_sum[3];
     double frameVariance;
-    int* edgeInclined;
+    int* edgeInclined;
     /* cutree intermediate data */
@@ -230,18 +237,30 @@
     uint32_t heightFullRes;
     uint32_t m_maxCUSize;
     uint32_t m_qgSize;
-
+
     uint16_t* propagateCost;
     double weightedCostDelta[X265_BFRAME_MAX + 2];
     ReferencePlanes weightedRef[X265_BFRAME_MAX + 2];
+
     /* For hist-based scenecut */
-    bool m_bIsMaxThres;
-    double interPCostPercDiff;
-    double intraCostPercDiff;
-    bool m_bIsHardScenecut;
+    int quarterSampleLowResWidth; // width of 1/4 lowres frame in pixels
+    int quarterSampleLowResHeight; // height of 1/4 lowres frame in pixels
+    int quarterSampleLowResStrideY;
+    int quarterSampleLowResOriginX;
+    int quarterSampleLowResOriginY;
+    pixel *quarterSampleLowResBuffer;
+    bool bHistScenecutAnalyzed;
+
+    uint16_t picAvgVariance;
+    uint16_t picAvgVarianceCb;
+    uint16_t picAvgVarianceCr;
+
+    uint32_t ****picHistogram;
+    uint64_t averageIntensityPerSegment[NUMBER_OF_SEGMENTS_IN_WIDTH][NUMBER_OF_SEGMENTS_IN_HEIGHT][3];
+    uint8_t averageIntensity[3];
     bool create(x265_param* param, PicYuv *origPic, uint32_t qgSize);
-    void destroy();
+    void destroy(x265_param* param);
     void init(PicYuv *origPic, int poc);
 };
 }
View file
x265_3.5.tar.gz/source/common/mv.h
Changed
@@ -105,6 +105,8 @@
     {
         return x >= _min.x && x <= _max.x && y >= _min.y && y <= _max.y;
     }
+
+    void set(int32_t _x, int32_t _y) { x = _x; y = _y; }
 };
 }
View file
x265_3.5.tar.gz/source/common/param.cpp
Changed
@@ -145,6 +145,8 @@ param->bAnnexB = 1; param->bRepeatHeaders = 0; param->bEnableAccessUnitDelimiters = 0; + param->bEnableEndOfBitstream = 0; + param->bEnableEndOfSequence = 0; param->bEmitHRDSEI = 0; param->bEmitInfoSEI = 1; param->bEmitHDRSEI = 0; /*Deprecated*/ @@ -168,7 +170,6 @@ param->bFrameAdaptive = X265_B_ADAPT_TRELLIS; param->bBPyramid = 1; param->scenecutThreshold = 40; /* Magic number pulled in from x264 */ - param->edgeTransitionThreshold = 0.03; param->bHistBasedSceneCut = 0; param->lookaheadSlices = 8; param->lookaheadThreads = 0; @@ -278,7 +279,10 @@ param->rc.rfConstantMin = 0; param->rc.bStatRead = 0; param->rc.bStatWrite = 0; + param->rc.dataShareMode = X265_SHARE_MODE_FILE; param->rc.statFileName = NULL; + param->rc.sharedMemName = NULL; + param->rc.bEncFocusedFramesOnly = 0; param->rc.complexityBlur = 20; param->rc.qblur = 0.5; param->rc.zoneCount = 0; @@ -292,6 +296,7 @@ param->rc.bEnableConstVbv = 0; param->bResetZoneConfig = 1; param->reconfigWindowSize = 0; + param->rc.frameSegment = 0; param->decoderVbvMaxRate = 0; param->bliveVBV2pass = 0; @@ -321,6 +326,7 @@ param->maxLuma = PIXEL_MAX; param->log2MaxPocLsb = 8; param->maxSlices = 1; + param->videoSignalTypePreset = NULL; /*Conformance window*/ param->confWinRightOffset = 0; @@ -373,10 +379,16 @@ param->bEnableSvtHevc = 0; param->svtHevcParam = NULL; + /* MCSTF */ + param->bEnableTemporalFilter = 0; + param->temporalFilterStrength = 0.95; + #ifdef SVT_HEVC param->svtHevcParam = svtParam; svt_param_default(param); #endif + /* Film grain characteristics model filename */ + param->filmGrain = NULL; } int x265_param_default_preset(x265_param* param, const char* preset, const char* tune) @@ -949,7 +961,6 @@ { bError = false; p->scenecutThreshold = atoi(value); - p->bHistBasedSceneCut = 0; } } OPT("temporal-layers") p->bEnableTemporalSubLayers = atobool(value); @@ -1184,6 +1195,7 @@ int pass = x265_clip3(0, 3, atoi(value)); p->rc.bStatWrite = pass & 1; p->rc.bStatRead = pass & 2; + 
p->rc.dataShareMode = X265_SHARE_MODE_FILE; } OPT("stats") p->rc.statFileName = strdup(value); OPT("scaling-list") p->scalingLists = strdup(value); @@ -1216,27 +1228,14 @@ OPT("opt-ref-list-length-pps") p->bOptRefListLengthPPS = atobool(value); OPT("multi-pass-opt-rps") p->bMultiPassOptRPS = atobool(value); OPT("scenecut-bias") p->scenecutBias = atof(value); - OPT("hist-scenecut") - { - p->bHistBasedSceneCut = atobool(value); - if (bError) - { - bError = false; - p->bHistBasedSceneCut = 0; - } - if (p->bHistBasedSceneCut) - { - bError = false; - p->scenecutThreshold = 0; - } - } - OPT("hist-threshold") p->edgeTransitionThreshold = atof(value); + OPT("hist-scenecut") p->bHistBasedSceneCut = atobool(value); OPT("rskip-edge-threshold") p->edgeVarThreshold = atoi(value)/100.0f; OPT("lookahead-threads") p->lookaheadThreads = atoi(value); OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value); OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine = atobool(value); OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value); OPT("aq-motion") p->bAQMotion = atobool(value); + OPT("sbrc") p->rc.frameSegment = atobool(value); OPT("dynamic-rd") p->dynamicRd = atof(value); OPT("analysis-reuse-level") { @@ -1446,6 +1445,12 @@ OPT("vbv-live-multi-pass") p->bliveVBV2pass = atobool(value); OPT("min-vbv-fullness") p->minVbvFullness = atof(value); OPT("max-vbv-fullness") p->maxVbvFullness = atof(value); + OPT("video-signal-type-preset") p->videoSignalTypePreset = strdup(value); + OPT("eob") p->bEnableEndOfBitstream = atobool(value); + OPT("eos") p->bEnableEndOfSequence = atobool(value); + /* Film grain characterstics model filename */ + OPT("film-grain") p->filmGrain = (char* )value; + OPT("mcstf") p->bEnableTemporalFilter = atobool(value); else return X265_PARAM_BAD_NAME; } @@ -1761,8 +1766,6 @@ "scenecutThreshold must be greater than 0"); CHECK(param->scenecutBias < 0 || 100 < param->scenecutBias, "scenecut-bias must be between 0 and 100"); - 
CHECK(param->edgeTransitionThreshold < 0.0 || 1.0 < param->edgeTransitionThreshold, - "hist-threshold must be between 0.0 and 1.0"); CHECK(param->radl < 0 || param->radl > param->bframes, "radl must be between 0 and bframes"); CHECK(param->rdPenalty < 0 || param->rdPenalty > 2, @@ -1824,15 +1827,15 @@ "Invalid refine-ctu-distortion value, must be either 0 or 1"); CHECK(param->maxAUSizeFactor < 0.5 || param->maxAUSizeFactor > 1.0, "Supported factor for controlling max AU size is from 0.5 to 1"); - CHECK((param->dolbyProfile != 0) && (param->dolbyProfile != 50) && (param->dolbyProfile != 81) && (param->dolbyProfile != 82), - "Unsupported Dolby Vision profile, only profile 5, profile 8.1 and profile 8.2 enabled"); + CHECK((param->dolbyProfile != 0) && (param->dolbyProfile != 50) && (param->dolbyProfile != 81) && (param->dolbyProfile != 82) && (param->dolbyProfile != 84), + "Unsupported Dolby Vision profile, only profile 5, profile 8.1, profile 8.2 and profile 8.4 enabled"); CHECK(param->dupThreshold < 1 || 99 < param->dupThreshold, "Invalid frame-duplication threshold. 
Value must be between 1 and 99."); if (param->dolbyProfile) { CHECK((param->rc.vbvMaxBitrate <= 0 || param->rc.vbvBufferSize <= 0), "Dolby Vision requires VBV settings to enable HRD.\n"); - CHECK((param->internalBitDepth != 10), "Dolby Vision profile - 5, profile - 8.1 and profile - 8.2 is Main10 only\n"); - CHECK((param->internalCsp != X265_CSP_I420), "Dolby Vision profile - 5, profile - 8.1 and profile - 8.2 requires YCbCr 4:2:0 color space\n"); + CHECK((param->internalBitDepth != 10), "Dolby Vision profile - 5, profile - 8.1, profile - 8.2 and profile - 8.4 are Main10 only\n"); + CHECK((param->internalCsp != X265_CSP_I420), "Dolby Vision profile - 5, profile - 8.1, profile - 8.2 and profile - 8.4 requires YCbCr 4:2:0 color space\n"); if (param->dolbyProfile == 81) CHECK(!(param->masteringDisplayColorVolume), "Dolby Vision profile - 8.1 requires Mastering display color volume information\n"); } @@ -1898,6 +1901,11 @@ param->bSingleSeiNal = 0; x265_log(param, X265_LOG_WARNING, "None of the SEI messages are enabled. Disabling Single SEI NAL\n"); } + if (param->bEnableTemporalFilter && (param->frameNumThreads > 1)) + { + param->bEnableTemporalFilter = 0; + x265_log(param, X265_LOG_WARNING, "MCSTF can be enabled with frame thread = 1 only. Disabling MCSTF\n"); + } CHECK(param->confWinRightOffset < 0, "Conformance Window Right Offset must be 0 or greater"); CHECK(param->confWinBottomOffset < 0, "Conformance Window Bottom Offset must be 0 or greater"); CHECK(param->decoderVbvMaxRate < 0, "Invalid Decoder Vbv Maxrate. Value can not be less than zero"); @@ -1910,6 +1918,7 @@ x265_log(param, X265_LOG_WARNING, "Live VBV enabled without VBV settings.Disabling live VBV in 2 pass\n"); } } + CHECK(param->rc.dataShareMode != X265_SHARE_MODE_FILE && param->rc.dataShareMode != X265_SHARE_MODE_SHAREDMEM, "Invalid data share mode. 
It must be one of the X265_DATA_SHARE_MODES enum values\n" ); return check_failed; } @@ -1970,8 +1979,8 @@ x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut / bias : %d / %d / %d / %.2lf \n", param->keyframeMin, param->keyframeMax, param->scenecutThreshold, param->scenecutBias * 100); else if (param->bHistBasedSceneCut && param->keyframeMax != INT_MAX) - x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut / edge threshold : %d / %d / %d / %.2lf\n", - param->keyframeMin, param->keyframeMax, param->bHistBasedSceneCut, param->edgeTransitionThreshold); + x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut : %d / %d / %d\n", + param->keyframeMin, param->keyframeMax, param->bHistBasedSceneCut); else if (param->keyframeMax == INT_MAX) x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut : disabled\n"); @@ -1988,9 +1997,11 @@ param->maxNumReferences, (param->limitReferences & X265_REF_LIMIT_CU) ? "on" : "off", (param->limitReferences & X265_REF_LIMIT_DEPTH) ? "on" : "off"); - if (param->rc.aqMode) + if (param->rc.aqMode && !param->rc.frameSegment) x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree : %d / %0.1f / %d / %d\n", param->rc.aqMode, param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree); + else if (param->rc.frameSegment) + x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree : auto / %0.1f / %d / %d\n", param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree); if (param->bLossless) x265_log(param, X265_LOG_INFO, "Rate Control : Lossless\n"); @@ -2089,6 +2100,8 @@ bufSize += strlen(p->numaPools); if (p->masteringDisplayColorVolume) bufSize += strlen(p->masteringDisplayColorVolume); + if (p->videoSignalTypePreset) + bufSize += strlen(p->videoSignalTypePreset);
View file
x265_3.5.tar.gz/source/common/piclist.cpp
Changed
@@ -45,6 +45,25 @@ m_count++; } +void PicList::pushFrontMCSTF(Frame& curFrame) +{ + X265_CHECK(!curFrame.m_nextMCSTF && !curFrame.m_nextMCSTF, "piclist: picture already in OPB list\n"); // ensure frame is not in a list + curFrame.m_nextMCSTF = m_start; + curFrame.m_prevMCSTF = NULL; + + if (m_count) + { + m_start->m_prevMCSTF = &curFrame; + m_start = &curFrame; + } + else + { + m_start = m_end = &curFrame; + } + m_count++; + +} + void PicList::pushBack(Frame& curFrame) { X265_CHECK(!curFrame.m_next && !curFrame.m_prev, "piclist: picture already in list\n"); // ensure frame is not in a list @@ -63,6 +82,24 @@ m_count++; } +void PicList::pushBackMCSTF(Frame& curFrame) +{ + X265_CHECK(!curFrame.m_nextMCSTF && !curFrame.m_prevMCSTF, "piclist: picture already in OPB list\n"); // ensure frame is not in a list + curFrame.m_nextMCSTF = NULL; + curFrame.m_prevMCSTF = m_end; + + if (m_count) + { + m_end->m_nextMCSTF = &curFrame; + m_end = &curFrame; + } + else + { + m_start = m_end = &curFrame; + } + m_count++; +} + Frame *PicList::popFront() { if (m_start) @@ -94,6 +131,14 @@ return curFrame; } +Frame* PicList::getPOCMCSTF(int poc) +{ + Frame *curFrame = m_start; + while (curFrame && curFrame->m_poc != poc) + curFrame = curFrame->m_nextMCSTF; + return curFrame; +} + Frame *PicList::popBack() { if (m_end) @@ -117,6 +162,29 @@ return NULL; } +Frame *PicList::popBackMCSTF() +{ + if (m_end) + { + Frame* temp = m_end; + m_count--; + + if (m_count) + { + m_end = m_end->m_prevMCSTF; + m_end->m_nextMCSTF = NULL; + } + else + { + m_start = m_end = NULL; + } + temp->m_nextMCSTF = temp->m_prevMCSTF = NULL; + return temp; + } + else + return NULL; +} + Frame* PicList::getCurFrame(void) { Frame *curFrame = m_start; @@ -158,3 +226,36 @@ curFrame.m_next = curFrame.m_prev = NULL; } + +void PicList::removeMCSTF(Frame& curFrame) +{ +#if _DEBUG + Frame *tmp = m_start; + while (tmp && tmp != &curFrame) + { + tmp = tmp->m_nextMCSTF; + } + + X265_CHECK(tmp == &curFrame, "framelist: pic being 
removed was not in list\n"); // verify pic is in this list +#endif + + m_count--; + if (m_count) + { + if (m_start == &curFrame) + m_start = curFrame.m_nextMCSTF; + if (m_end == &curFrame) + m_end = curFrame.m_prevMCSTF; + + if (curFrame.m_nextMCSTF) + curFrame.m_nextMCSTF->m_prevMCSTF = curFrame.m_prevMCSTF; + if (curFrame.m_prevMCSTF) + curFrame.m_prevMCSTF->m_nextMCSTF = curFrame.m_nextMCSTF; + } + else + { + m_start = m_end = NULL; + } + + curFrame.m_nextMCSTF = curFrame.m_prevMCSTF = NULL; +}
View file
x265_3.5.tar.gz/source/common/piclist.h
Changed
@@ -49,24 +49,31 @@
     /** Push picture to end of the list */
     void pushBack(Frame& pic);
+    void pushBackMCSTF(Frame& pic);
     /** Push picture to beginning of the list */
     void pushFront(Frame& pic);
+    void pushFrontMCSTF(Frame& pic);
     /** Pop picture from end of the list */
     Frame* popBack();
+    Frame* popBackMCSTF();
     /** Pop picture from beginning of the list */
     Frame* popFront();
     /** Find frame with specified POC */
     Frame* getPOC(int poc);
+    /* Find next MCSTF frame with specified POC */
+    Frame* getPOCMCSTF(int poc);
     /** Get the current Frame from the list **/
     Frame* getCurFrame(void);
     /** Remove picture from list */
     void remove(Frame& pic);
+    /* Remove MCSTF picture from list */
+    void removeMCSTF(Frame& pic);
     Frame* first() { return m_start; }
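The `*MCSTF` methods above appear to thread frames through a second pair of link pointers (`m_nextMCSTF` / `m_prevMCSTF`), so one frame can sit in the regular list and an MCSTF list at the same time. The following is a minimal sketch of that intrusive-list idea, not the x265 implementation: `Node`, `AltList`, `nextAlt` and `prevAlt` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a frame: it carries two independent pairs of
// links, so it can be a member of two intrusive lists at once.
struct Node
{
    int   poc;
    Node* next    = nullptr;  // links used by the primary list (unused here)
    Node* prev    = nullptr;
    Node* nextAlt = nullptr;  // links used by the secondary list
    Node* prevAlt = nullptr;
    explicit Node(int p) : poc(p) {}
};

// Doubly linked list that only ever touches the secondary links.
class AltList
{
public:
    void pushFront(Node& n)
    {
        // Best-effort check only: a node that is the sole member of a list
        // also has both links null, so this cannot catch every misuse.
        assert(!n.nextAlt && !n.prevAlt);
        n.nextAlt = m_start;
        n.prevAlt = nullptr;
        if (m_count)
            m_start->prevAlt = &n;
        else
            m_end = &n;
        m_start = &n;
        m_count++;
    }

    // Linear search from the front for a matching POC; nullptr if absent.
    Node* find(int poc) const
    {
        Node* cur = m_start;
        while (cur && cur->poc != poc)
            cur = cur->nextAlt;
        return cur;
    }

    // Unlink a node; the caller must guarantee the node is in this list.
    void remove(Node& n)
    {
        if (n.prevAlt) n.prevAlt->nextAlt = n.nextAlt; else m_start = n.nextAlt;
        if (n.nextAlt) n.nextAlt->prevAlt = n.prevAlt; else m_end = n.prevAlt;
        n.nextAlt = n.prevAlt = nullptr;
        m_count--;
    }

    Node* popBack()
    {
        if (!m_end)
            return nullptr;
        Node* n = m_end;
        remove(*n);
        return n;
    }

    int size() const { return m_count; }

private:
    Node* m_start = nullptr;
    Node* m_end   = nullptr;
    int   m_count = 0;
};
```

Keeping the secondary links inside the node avoids a per-list allocation, at the cost that each node can belong to at most one list per link pair.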
View file
x265_3.5.tar.gz/source/common/picyuv.cpp
Changed
@@ -125,6 +125,58 @@
     return false;
 }
+/*Copy pixels from the picture buffer of a frame to picture buffer of another frame*/
+void PicYuv::copyFromFrame(PicYuv* source)
+{
+    uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+    int maxHeight = numCuInHeight * m_param->maxCUSize;
+    memcpy(m_picBuf[0], source->m_picBuf[0], sizeof(pixel)* m_stride * (maxHeight + (m_lumaMarginY * 2)));
+    m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+
+    if (m_picCsp != X265_CSP_I400)
+    {
+        memcpy(m_picBuf[1], source->m_picBuf[1], sizeof(pixel)* m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
+        memcpy(m_picBuf[2], source->m_picBuf[2], sizeof(pixel)* m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
+
+        m_picOrg[1] = m_picBuf[1] + m_chromaMarginY * m_strideC + m_chromaMarginX;
+        m_picOrg[2] = m_picBuf[2] + m_chromaMarginY * m_strideC + m_chromaMarginX;
+    }
+    else
+    {
+        m_picBuf[1] = m_picBuf[2] = NULL;
+        m_picOrg[1] = m_picOrg[2] = NULL;
+    }
+}
+
+bool PicYuv::createScaledPicYUV(x265_param* param, uint8_t scaleFactor)
+{
+    m_param = param;
+    m_picWidth = m_param->sourceWidth / scaleFactor;
+    m_picHeight = m_param->sourceHeight / scaleFactor;
+
+    m_picCsp = m_param->internalCsp;
+    m_hChromaShift = CHROMA_H_SHIFT(m_picCsp);
+    m_vChromaShift = CHROMA_V_SHIFT(m_picCsp);
+
+    uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1) / param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
+
+    m_lumaMarginX = 128; // search margin for L0 and L1 ME in horizontal direction
+    m_lumaMarginY = 128; // search margin for L0 and L1 ME in vertical direction
+    m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
+
+    int maxHeight = numCuInHeight * param->maxCUSize;
+    CHECKED_MALLOC_ZERO(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+    m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    m_picBuf[1] = m_picBuf[2] = NULL;
+    m_picOrg[1] = m_picOrg[2] = NULL;
+    return true;
+
+fail:
+    return false;
+}
+
 int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
 {
     m_picWidth = picWidth;
View file
x265_3.5.tar.gz/source/common/picyuv.h
Changed
@@ -78,11 +78,13 @@
     PicYuv();
     bool create(x265_param* param, bool picAlloc = true, pixel *pixelbuf = NULL);
+    bool createScaledPicYUV(x265_param* param, uint8_t scaleFactor);
     bool createOffsets(const SPS& sps);
     void destroy();
     int getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp);
     void copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
+    void copyFromFrame(PicYuv* source);
     intptr_t getChromaAddrOffset(uint32_t ctuAddr, uint32_t absPartIdx) const { return m_cuOffsetC[ctuAddr] + m_buOffsetC[absPartIdx]; }
View file
x265_3.5.tar.gz/source/common/pixel.cpp
Changed
@@ -266,7 +266,7 @@
 {
     int satd = 0;
-#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64 && !HIGH_BIT_DEPTH
     pixelcmp_t satd_4x4 = x265_pixel_satd_4x4_neon;
 #endif
@@ -284,7 +284,7 @@
 {
     int satd = 0;
-#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64 && !HIGH_BIT_DEPTH
     pixelcmp_t satd_8x4 = x265_pixel_satd_8x4_neon;
 #endif
@@ -627,6 +627,23 @@
     }
 }
+static
+void frame_subsample_luma(const pixel* src0, pixel* dst0, intptr_t src_stride, intptr_t dst_stride, int width, int height)
+{
+    for (int y = 0; y < height; y++, src0 += 2 * src_stride, dst0 += dst_stride)
+    {
+        const pixel *inRow = src0;
+        const pixel *inRowBelow = src0 + src_stride;
+        pixel *target = dst0;
+        for (int x = 0; x < width; x++)
+        {
+            target[x] = (((inRow[0] + inRowBelow[0] + 1) >> 1) + ((inRow[1] + inRowBelow[1] + 1) >> 1) + 1) >> 1;
+            inRow += 2;
+            inRowBelow += 2;
+        }
+    }
+}
+
 /* structural similarity metric */
 static void ssim_4x4x2_core(const pixel* pix1, intptr_t stride1, const pixel* pix2, intptr_t stride2, int sums[2][4])
 {
@@ -1355,5 +1372,7 @@
     p.cu[BLOCK_16x16].normFact = normFact_c;
     p.cu[BLOCK_32x32].normFact = normFact_c;
     p.cu[BLOCK_64x64].normFact = normFact_c;
+    /* SubSample Luma*/
+    p.frameSubSampleLuma = frame_subsample_luma;
 }
 }
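The new `frame_subsample_luma` primitive halves a luma plane in both directions by averaging each 2x2 block with rounding: first two vertical averages, then their horizontal average. Below is a standalone sketch of the same arithmetic, assuming an 8-bit build; the `pixel` alias, `subsampleLuma` and `subsample4x4Example` are illustrative names, not part of x265.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using pixel = uint8_t; // assumption: 8-bit pixels (x265 can also build with 16-bit)

// Write a width x height output plane, where each output pixel is the rounded
// average of one 2x2 block of the source plane. Strides are in pixels.
static void subsampleLuma(const pixel* src, pixel* dst,
                          std::ptrdiff_t srcStride, std::ptrdiff_t dstStride,
                          int width, int height)
{
    for (int y = 0; y < height; y++, src += 2 * srcStride, dst += dstStride)
    {
        const pixel* row      = src;
        const pixel* rowBelow = src + srcStride;
        for (int x = 0; x < width; x++, row += 2, rowBelow += 2)
        {
            int left  = (row[0] + rowBelow[0] + 1) >> 1; // vertical average, left column
            int right = (row[1] + rowBelow[1] + 1) >> 1; // vertical average, right column
            dst[x] = static_cast<pixel>((left + right + 1) >> 1);
        }
    }
}

// Hypothetical helper: subsample a 4x4 plane down to 2x2.
static std::vector<pixel> subsample4x4Example()
{
    const std::vector<pixel> src = {
         10,  20, 0, 0,
         30,  40, 0, 0,
        255, 255, 1, 2,
        255, 255, 3, 4 };
    std::vector<pixel> dst(4);
    subsampleLuma(src.data(), dst.data(), 4, 2, 2, 2);
    return dst;
}
```

Because rounding is applied twice, the result can differ by one from a single `(a + b + c + d + 2) >> 2` average; the two-stage form matches the expression in the diff.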
View file
x265_3.5.tar.gz/source/common/primitives.h
Changed
@@ -232,6 +232,8 @@
 typedef void(*psyRdoQuant_t2)(int16_t *m_resiDctCoeff, int16_t *m_fencDctCoeff, int64_t *costUncoded, int64_t *totalUncodedCost, int64_t *totalRdCost, int64_t *psyScale, uint32_t blkPos);
 typedef void(*ssimDistortion_t)(const pixel *fenc, uint32_t fStride, const pixel *recon, intptr_t rstride, uint64_t *ssBlock, int shift, uint64_t *ac_k);
 typedef void(*normFactor_t)(const pixel *src, uint32_t blockSize, int shift, uint64_t *z_k);
+/* SubSampling Luma */
+typedef void (*downscaleluma_t)(const pixel* src0, pixel* dstf, intptr_t src_stride, intptr_t dst_stride, int width, int height);
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
@@ -353,6 +355,8 @@
     downscale_t frameInitLowres;
     downscale_t frameInitLowerRes;
+    /* Sub Sample Luma */
+    downscaleluma_t frameSubSampleLuma;
     cutree_propagate_cost propagateCost;
     cutree_fix8_unpack fix8Unpack;
     cutree_fix8_pack fix8Pack;
@@ -488,7 +492,7 @@
 #if ENABLE_ASSEMBLY && X265_ARCH_ARM64
 extern "C" {
-#include "aarch64/pixel-util.h"
+#include "aarch64/fun-decls.h"
 }
 #endif
View file
x265_3.5.tar.gz/source/common/ringmem.cpp
Added
@@ -0,0 +1,357 @@ +/***************************************************************************** + * Copyright (C) 2013-2017 MulticoreWare, Inc + * + * Authors: liwei <liwei@multicorewareinc.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. 
+ * For more information, contact us at license @ x265.com + *****************************************************************************/ + +#include "ringmem.h" + +#ifndef _WIN32 +#include <sys/mman.h> +#endif ////< _WIN32 + +#ifdef _WIN32 +#define X265_SHARED_MEM_NAME "Local\\_x265_shr_mem_" +#define X265_SEMAPHORE_RINGMEM_WRITER_NAME "_x265_semW_" +#define X265_SEMAPHORE_RINGMEM_READER_NAME "_x265_semR_" +#else /* POSIX / pthreads */ +#define X265_SHARED_MEM_NAME "/tmp/_x265_shr_mem_" +#define X265_SEMAPHORE_RINGMEM_WRITER_NAME "/tmp/_x265_semW_" +#define X265_SEMAPHORE_RINGMEM_READER_NAME "/tmp/_x265_semR_" +#endif + +#define RINGMEM_ALLIGNMENT 64 + +namespace X265_NS { + RingMem::RingMem() + : m_initialized(false) + , m_protectRW(false) + , m_itemSize(0) + , m_itemCnt(0) + , m_dataPool(NULL) + , m_shrMem(NULL) +#ifdef _WIN32 + , m_handle(NULL) +#else //_WIN32 + , m_filepath(NULL) +#endif //_WIN32 + , m_writeSem(NULL) + , m_readSem(NULL) + { + } + + + RingMem::~RingMem() + { + } + + bool RingMem::skipRead(int32_t cnt) { + if (!m_initialized) + { + return false; + } + + if (m_protectRW) + { + for (int i = 0; i < cnt; i++) + { + m_readSem->take(); + } + } + + ATOMIC_ADD(&m_shrMem->m_read, cnt); + + if (m_protectRW) + { + m_writeSem->give(cnt); + } + + return true; + } + + bool RingMem::skipWrite(int32_t cnt) { + if (!m_initialized) + { + return false; + } + + if (m_protectRW) + { + for (int i = 0; i < cnt; i++) + { + m_writeSem->take(); + } + } + + ATOMIC_ADD(&m_shrMem->m_write, cnt); + + if (m_protectRW) + { + m_readSem->give(cnt); + } + + return true; + } + + ///< initialize + bool RingMem::init(int32_t itemSize, int32_t itemCnt, const char *name, bool protectRW) + { + ///< check parameters + if (itemSize <= 0 || itemCnt <= 0 || NULL == name) + { + ///< invalid parameters + return false; + } + + if (!m_initialized) + { + ///< formating names + char nameBufMAX_SHR_NAME_LEN = { 0 }; + + ///< shared memory name + snprintf(nameBuf, sizeof(nameBuf) - 1, "%s%s", 
X265_SHARED_MEM_NAME, name); + + ///< create or open shared memory + bool newCreated = false; + + ///< calculate the size of the shared memory + int32_t shrMemSize = (itemSize * itemCnt + sizeof(ShrMemCtrl) + RINGMEM_ALLIGNMENT - 1) & ~(RINGMEM_ALLIGNMENT - 1); + +#ifdef _WIN32 + HANDLE h = OpenFileMappingA(FILE_MAP_WRITE | FILE_MAP_READ, FALSE, nameBuf); + if (!h) + { + h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, shrMemSize, nameBuf); + + if (!h) + { + return false; + } + + newCreated = true; + } + + void *pool = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0); + + ///< should not close the handle here, otherwise the OpenFileMapping would fail + //CloseHandle(h); + m_handle = h; + + if (!pool) + { + return false; + } + +#else /* POSIX / pthreads */ + mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; + int flag = O_RDWR; + int shrfd = -1; + if ((shrfd = open(nameBuf, flag, mode)) < 0) + { + flag |= O_CREAT; + + shrfd = open(nameBuf, flag, mode); + if (shrfd < 0) + { + return false; + } + newCreated = true; + + lseek(shrfd, shrMemSize - 1, SEEK_SET); + + if (-1 == write(shrfd, "\0", 1)) + { + close(shrfd); + return false; + } + + if (lseek(shrfd, 0, SEEK_END) < shrMemSize) + { + close(shrfd); + return false; + } + } + + void *pool = mmap(0, + shrMemSize, + PROT_READ | PROT_WRITE, + MAP_SHARED, + shrfd, + 0); + + close(shrfd);
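The shared-memory size in `RingMem::init` is rounded up to a multiple of `RINGMEM_ALLIGNMENT` (64) with the usual add-then-mask idiom, which only works when the alignment is a power of two. A minimal sketch of that rounding follows; `alignUp` is a hypothetical helper name, not something defined in ringmem.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of alignment.
// Assumption: alignment is a power of two, so ~(alignment - 1) is a mask
// that clears the low bits.
constexpr int32_t alignUp(int32_t size, int32_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, a pool of 65 bytes would be padded to 128, while a pool of exactly 64 bytes stays at 64.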
View file
x265_3.5.tar.gz/source/common/ringmem.h
Added
@@ -0,0 +1,90 @@ +/***************************************************************************** + * Copyright (C) 2013-2017 MulticoreWare, Inc + * + * Authors: liwei <liwei@multicorewareinc.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. + * + * This program is also available under a commercial proprietary license. + * For more information, contact us at license @ x265.com + *****************************************************************************/ + +#ifndef X265_RINGMEM_H +#define X265_RINGMEM_H + +#include "common.h" +#include "threading.h" + +#if _MSC_VER +#define snprintf _snprintf +#define strdup _strdup +#endif + +namespace X265_NS { + +#define MAX_SHR_NAME_LEN 256 + + class RingMem { + public: + RingMem(); + ~RingMem(); + + bool skipRead(int32_t cnt); + + bool skipWrite(int32_t cnt); + + ///< initialize + ///< protectRW: if use the semaphore the protect the write and read operation. 
+ bool init(int32_t itemSize, int32_t itemCnt, const char *name, bool protectRW = false); + ///< finalize + void release(); + + typedef void(*fnRWSharedData)(void *dst, void *src, int32_t size); + + ///< data read + bool readNext(void* dst, fnRWSharedData callback); + ///< data write + bool writeData(void *data, fnRWSharedData callback); + + private: + bool m_initialized; + bool m_protectRW; + + int32_t m_itemSize; + int32_t m_itemCnt; + ///< data pool + void *m_dataPool; + typedef struct { + ///< index to write + int32_t m_write; + ///< index to read + int32_t m_read; + + }ShrMemCtrl; + + ShrMemCtrl *m_shrMem; +#ifdef _WIN32 + void *m_handle; +#else // _WIN32 + char *m_filepath; +#endif // _WIN32 + + ///< Semaphores + NamedSemaphore *m_writeSem; + NamedSemaphore *m_readSem; + }; +}; + +#endif // ifndef X265_RINGMEM_H
View file
x265_3.5.tar.gz/source/common/slice.h
Changed
@@ -363,6 +363,7 @@
     int m_iNumRPSInSPS;
     const x265_param *m_param;
     int m_fieldNum;
+    Frame* m_mcstfRefFrameList[2][MAX_MCSTF_TEMPORAL_WINDOW_LENGTH];
     Slice()
     {
View file
x265_3.5.tar.gz/source/common/temporalfilter.cpp
Added
@@ -0,0 +1,1017 @@ +/***************************************************************************** +* Copyright (C) 2013-2021 MulticoreWare, Inc +* + * Authors: Ashok Kumar Mishra <ashok@multicorewareinc.com> + * +* This program is free software; you can redistribute it and/or modify +* it under the terms of the GNU General Public License as published by +* the Free Software Foundation; either version 2 of the License, or +* (at your option) any later version. +* +* This program is distributed in the hope that it will be useful, +* but WITHOUT ANY WARRANTY; without even the implied warranty of +* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +* GNU General Public License for more details. +* +* You should have received a copy of the GNU General Public License +* along with this program; if not, write to the Free Software +* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. +* +* This program is also available under a commercial proprietary license. +* For more information, contact us at license @ x265.com. 
+*****************************************************************************/ +#include "common.h" +#include "temporalfilter.h" +#include "primitives.h" + +#include "frame.h" +#include "slice.h" +#include "framedata.h" +#include "analysis.h" + +using namespace X265_NS; + +void OrigPicBuffer::addPicture(Frame* inFrame) +{ + m_mcstfPicList.pushFrontMCSTF(*inFrame); +} + +void OrigPicBuffer::addEncPicture(Frame* inFrame) +{ + m_mcstfOrigPicFreeList.pushFrontMCSTF(*inFrame); +} + +void OrigPicBuffer::addEncPictureToPicList(Frame* inFrame) +{ + m_mcstfOrigPicList.pushFrontMCSTF(*inFrame); +} + +OrigPicBuffer::~OrigPicBuffer() +{ + while (!m_mcstfOrigPicList.empty()) + { + Frame* curFrame = m_mcstfOrigPicList.popBackMCSTF(); + curFrame->destroy(); + delete curFrame; + } + + while (!m_mcstfOrigPicFreeList.empty()) + { + Frame* curFrame = m_mcstfOrigPicFreeList.popBackMCSTF(); + curFrame->destroy(); + delete curFrame; + } +} + +void OrigPicBuffer::setOrigPicList(Frame* inFrame, int frameCnt) +{ + Slice* slice = inFrame->m_encData->m_slice; + uint8_t j = 0; + for (int iterPOC = (inFrame->m_poc - inFrame->m_mcstf->m_range); + iterPOC <= (inFrame->m_poc + inFrame->m_mcstf->m_range); iterPOC++) + { + if (iterPOC != inFrame->m_poc) + { + if (iterPOC < 0) + continue; + if (iterPOC >= frameCnt) + break; + + Frame *iterFrame = m_mcstfPicList.getPOCMCSTF(iterPOC); + X265_CHECK(iterFrame, "Reference frame not found in OPB"); + if (iterFrame != NULL) + { + slice->m_mcstfRefFrameList1j = iterFrame; + iterFrame->m_refPicCnt1--; + } + + iterFrame = m_mcstfOrigPicList.getPOCMCSTF(iterPOC); + if (iterFrame != NULL) + { + + slice->m_mcstfRefFrameList1j = iterFrame; + + iterFrame->m_refPicCnt1--; + Frame *cFrame = m_mcstfOrigPicList.getPOCMCSTF(inFrame->m_poc); + X265_CHECK(cFrame, "Reference frame not found in encoded OPB"); + cFrame->m_refPicCnt1--; + } + j++; + } + } +} + +void OrigPicBuffer::recycleOrigPicList() +{ + Frame *iterFrame = m_mcstfPicList.first(); + + while (iterFrame) + { 
+ Frame *curFrame = iterFrame; + iterFrame = iterFrame->m_nextMCSTF; + if (!curFrame->m_refPicCnt1) + { + m_mcstfPicList.removeMCSTF(*curFrame); + iterFrame = m_mcstfPicList.first(); + } + } + + iterFrame = m_mcstfOrigPicList.first(); + + while (iterFrame) + { + Frame *curFrame = iterFrame; + iterFrame = iterFrame->m_nextMCSTF; + if (!curFrame->m_refPicCnt1) + { + m_mcstfOrigPicList.removeMCSTF(*curFrame); + *curFrame->m_isSubSampled = false; + m_mcstfOrigPicFreeList.pushFrontMCSTF(*curFrame); + iterFrame = m_mcstfOrigPicList.first(); + } + } +} + +void OrigPicBuffer::addPictureToFreelist(Frame* inFrame) +{ + m_mcstfOrigPicFreeList.pushBack(*inFrame); +} + +TemporalFilter::TemporalFilter() +{ + m_sourceWidth = 0; + m_sourceHeight = 0, + m_QP = 0; + m_sliceTypeConfig = 3; + m_numRef = 0; + m_useSADinME = 1; + + m_range = 2; + m_chromaFactor = 0.55; + m_sigmaMultiplier = 9.0; + m_sigmaZeroPoint = 10.0; + m_motionVectorFactor = 16; +} + +void TemporalFilter::init(const x265_param* param) +{ + m_param = param; + m_bitDepth = param->internalBitDepth; + m_sourceWidth = param->sourceWidth; + m_sourceHeight = param->sourceHeight; + m_internalCsp = param->internalCsp; + m_numComponents = (m_internalCsp != X265_CSP_I400) ? 
MAX_NUM_COMPONENT : 1; + + m_metld = new MotionEstimatorTLD; + + predPUYuv.create(FENC_STRIDE, X265_CSP_I400); +} + +int TemporalFilter::createRefPicInfo(TemporalFilterRefPicInfo* refFrame, x265_param* param) +{ + CHECKED_MALLOC_ZERO(refFrame->mvs, MV, sizeof(MV)* ((m_sourceWidth ) / 4) * ((m_sourceHeight ) / 4)); + refFrame->mvsStride = m_sourceWidth / 4; + CHECKED_MALLOC_ZERO(refFrame->mvs0, MV, sizeof(MV)* ((m_sourceWidth ) / 16) * ((m_sourceHeight ) / 16)); + refFrame->mvsStride0 = m_sourceWidth / 16; + CHECKED_MALLOC_ZERO(refFrame->mvs1, MV, sizeof(MV)* ((m_sourceWidth ) / 16) * ((m_sourceHeight ) / 16)); + refFrame->mvsStride1 = m_sourceWidth / 16; + CHECKED_MALLOC_ZERO(refFrame->mvs2, MV, sizeof(MV)* ((m_sourceWidth ) / 16)*((m_sourceHeight ) / 16)); + refFrame->mvsStride2 = m_sourceWidth / 16; + + CHECKED_MALLOC_ZERO(refFrame->noise, int, sizeof(int) * ((m_sourceWidth) / 4) * ((m_sourceHeight) / 4)); + CHECKED_MALLOC_ZERO(refFrame->error, int, sizeof(int) * ((m_sourceWidth) / 4) * ((m_sourceHeight) / 4)); + + refFrame->slicetype = X265_TYPE_AUTO; + + refFrame->compensatedPic = new PicYuv; + refFrame->compensatedPic->create(param, true); + + return 1; +fail: + return 0; +} + +int TemporalFilter::motionErrorLumaSAD( + PicYuv *orig, + PicYuv *buffer, + int x, + int y, + int dx,
View file
x265_3.5.tar.gz/source/common/temporalfilter.h
Added
@@ -0,0 +1,185 @@ +/***************************************************************************** +* Copyright (C) 2013-2021 MulticoreWare, Inc +* + * Authors: Ashok Kumar Mishra <ashok@multicorewareinc.com> + * +* This program is free software; you can redistribute it and/or modify +* it under the terms of the GNU General Public License as published by +* the Free Software Foundation; either version 2 of the License, or +* (at your option) any later version. +* +* This program is distributed in the hope that it will be useful, +* but WITHOUT ANY WARRANTY; without even the implied warranty of +* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +* GNU General Public License for more details. +* +* You should have received a copy of the GNU General Public License +* along with this program; if not, write to the Free Software +* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. +* +* This program is also available under a commercial proprietary license. +* For more information, contact us at license @ x265.com. 
+*****************************************************************************/ + +#ifndef X265_TEMPORAL_FILTER_H +#define X265_TEMPORAL_FILTER_H + +#include "x265.h" +#include "picyuv.h" +#include "mv.h" +#include "piclist.h" +#include "yuv.h" +#include "motion.h" + +const int s_interpolationFilter[16][8] = +{ + { 0, 0, 0, 64, 0, 0, 0, 0 }, //0 + { 0, 1, -3, 64, 4, -2, 0, 0 }, //1 -->--> + { 0, 1, -6, 62, 9, -3, 1, 0 }, //2 --> + { 0, 2, -8, 60, 14, -5, 1, 0 }, //3 -->--> + { 0, 2, -9, 57, 19, -7, 2, 0 }, //4 + { 0, 3, -10, 53, 24, -8, 2, 0 }, //5 -->--> + { 0, 3, -11, 50, 29, -9, 2, 0 }, //6 --> + { 0, 3, -11, 44, 35, -10, 3, 0 }, //7 -->--> + { 0, 1, -7, 38, 38, -7, 1, 0 }, //8 + { 0, 3, -10, 35, 44, -11, 3, 0 }, //9 -->--> + { 0, 2, -9, 29, 50, -11, 3, 0 }, //10--> + { 0, 2, -8, 24, 53, -10, 3, 0 }, //11-->--> + { 0, 2, -7, 19, 57, -9, 2, 0 }, //12 + { 0, 1, -5, 14, 60, -8, 2, 0 }, //13-->--> + { 0, 1, -3, 9, 62, -6, 1, 0 }, //14--> + { 0, 0, -2, 4, 64, -3, 1, 0 } //15-->--> +}; + +const double s_refStrengths[3][4] = +{ // abs(POC offset) + // 1, 2 3 4 + {0.85, 0.57, 0.41, 0.33}, // m_range * 2 + {1.13, 0.97, 0.81, 0.57}, // m_range + {0.30, 0.30, 0.30, 0.30} // otherwise +}; + +namespace X265_NS { + class OrigPicBuffer + { + public: + PicList m_mcstfPicList; + PicList m_mcstfOrigPicFreeList; + PicList m_mcstfOrigPicList; + + ~OrigPicBuffer(); + void addPicture(Frame*); + void addEncPicture(Frame*); + void setOrigPicList(Frame*, int); + void recycleOrigPicList(); + void addPictureToFreelist(Frame*); + void addEncPictureToPicList(Frame*); + }; + + struct MotionEstimatorTLD + { + MotionEstimate me; + + MotionEstimatorTLD() + { + me.init(X265_CSP_I400); + me.setQP(X265_LOOKAHEAD_QP); + } + + ~MotionEstimatorTLD() {} + }; + + struct TemporalFilterRefPicInfo + { + PicYuv* picBuffer; + PicYuv* picBufferSubSampled2; + PicYuv* picBufferSubSampled4; + MV* mvs; + MV* mvs0; + MV* mvs1; + MV* mvs2; + uint32_t mvsStride; + uint32_t mvsStride0; + uint32_t mvsStride1; + uint32_t 
mvsStride2; + int* error; + int* noise; + + int16_t origOffset; + bool isFilteredFrame; + PicYuv* compensatedPic; + + int* isSubsampled; + + int slicetype; + }; + + class TemporalFilter + { + public: + TemporalFilter(); + ~TemporalFilter() {} + + void init(const x265_param* param); + + //private: + // Private static member variables + const x265_param *m_param; + int32_t m_bitDepth; + int m_range; + uint8_t m_numRef; + double m_chromaFactor; + double m_sigmaMultiplier; + double m_sigmaZeroPoint; + int m_motionVectorFactor; + int m_padding; + + // Private member variables + + int m_sourceWidth; + int m_sourceHeight; + int m_QP; + + int m_internalCsp; + int m_numComponents; + uint8_t m_sliceTypeConfig; + + MotionEstimatorTLD* m_metld; + Yuv predPUYuv; + int m_useSADinME; + + int createRefPicInfo(TemporalFilterRefPicInfo* refFrame, x265_param* param); + + void bilateralFilter(Frame* frame, TemporalFilterRefPicInfo* mctfRefList, double overallStrength); + + void motionEstimationLuma(MV *mvs, uint32_t mvStride, PicYuv *orig, PicYuv *buffer, int bs, + MV *previous = 0, uint32_t prevmvStride = 0, int factor = 1); + + void motionEstimationLumaDoubleRes(MV *mvs, uint32_t mvStride, PicYuv *orig, PicYuv *buffer, int blockSize, + MV *previous, uint32_t prevMvStride, int factor, int* minError); + + int motionErrorLumaSSD(PicYuv *orig, + PicYuv *buffer, + int x, + int y, + int dx, + int dy, + int bs, + int besterror = 8 * 8 * 1024 * 1024); + + int motionErrorLumaSAD(PicYuv *orig, + PicYuv *buffer, + int x, + int y, + int dx, + int dy, + int bs, + int besterror = 8 * 8 * 1024 * 1024); + + void destroyRefPicInfo(TemporalFilterRefPicInfo* curFrame); + + void applyMotion(MV *mvs, uint32_t mvsStride, PicYuv *input, PicYuv *output); + + }; +} +#endif
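Note on `s_refStrengths`: the comments in the table suggest that the row is chosen from how many reference frames are available relative to `m_range`, and the column from the absolute POC distance to the reference. The following is a minimal sketch of such a lookup, not the code from this patch. The helper name `refStrengthFor`, the copied table `kRefStrengths`, and the clamping of distances above 4 to the last column are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdlib>

// Copy of the s_refStrengths values from temporalfilter.h, repeated here
// only so that the sketch compiles on its own.
static const double kRefStrengths[3][4] = {
    {0.85, 0.57, 0.41, 0.33},  // numRefs == range * 2
    {1.13, 0.97, 0.81, 0.57},  // numRefs == range
    {0.30, 0.30, 0.30, 0.30}   // otherwise
};

// Hypothetical helper (not part of the patch): picks a strength from the
// number of available references and the signed POC offset of one reference.
double refStrengthFor(int numRefs, int range, int pocOffset)
{
    int row = 2;                       // "otherwise" row
    if (numRefs == range * 2)
        row = 0;
    else if (numRefs == range)
        row = 1;

    // Columns are indexed by abs(POC offset) - 1; clamping to [0, 3] is an
    // assumption for offsets outside the 1..4 range shown in the comments.
    int col = std::min(3, std::max(0, std::abs(pocOffset) - 1));
    return kRefStrengths[row][col];
}
```

With `m_range = 2` as set in the constructor, a frame with four references at distance 1 would get 0.85 under this reading, while a frame with only one reference would fall into the flat 0.30 row.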
View file
x265_3.5.tar.gz/source/common/threading.h
Changed
@@ -3,6 +3,7 @@ * * Authors: Steve Borho <steve@borho.org> * Min Chen <chenm003@163.com> + liwei <liwei@multicorewareinc.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -253,6 +254,47 @@ int m_val; }; +class NamedSemaphore +{ +public: + NamedSemaphore() : m_sem(NULL) + { + } + + ~NamedSemaphore() + { + } + + bool create(const char* name, const int initcnt, const int maxcnt) + { + if(!m_sem) + { + m_sem = CreateSemaphoreA(NULL, initcnt, maxcnt, name); + } + return m_sem != NULL; + } + + bool give(const int32_t cnt) + { + return ReleaseSemaphore(m_sem, (LONG)cnt, NULL) != FALSE; + } + + bool take(const uint32_t time_out = INFINITE) + { + int32_t rt = WaitForSingleObject(m_sem, time_out); + return rt != WAIT_TIMEOUT && rt != WAIT_FAILED; + } + + void release() + { + CloseHandle(m_sem); + m_sem = NULL; + } + +private: + HANDLE m_sem; +}; + #else /* POSIX / pthreads */ typedef pthread_t ThreadHandle; @@ -459,6 +501,282 @@ int m_val; }; +#define TIMEOUT_INFINITE 0xFFFFFFFF + +class NamedSemaphore +{ +public: + NamedSemaphore() + : m_sem(NULL) +#ifndef __APPLE__ + , m_name(NULL) +#endif //__APPLE__ + { + } + + ~NamedSemaphore() + { + } + + bool create(const char* name, const int initcnt, const int maxcnt) + { + bool ret = false; + + if (initcnt >= maxcnt) + { + return false; + } + +#ifdef __APPLE__ + do + { + int32_t pshared = name != NULL ? 
PTHREAD_PROCESS_SHARED : PTHREAD_PROCESS_PRIVATE; + + m_sem = (mac_sem_t *)malloc(sizeof(mac_sem_t)); + if (!m_sem) + { + break; + } + + if (pthread_mutexattr_init(&m_sem->mutexAttr)) + { + break; + } + + if (pthread_mutexattr_setpshared(&m_sem->mutexAttr, pshared)) + { + break; + } + + if (pthread_condattr_init(&m_sem->condAttr)) + { + break; + } + + if (pthread_condattr_setpshared(&m_sem->condAttr, pshared)) + { + break; + } + + if (pthread_mutex_init(&m_sem->mutex, &m_sem->mutexAttr)) + { + break; + } + + if (pthread_cond_init(&m_sem->cond, &m_sem->condAttr)) + { + break; + } + + m_sem->curCnt = initcnt; + m_sem->maxCnt = maxcnt; + + ret = true; + } while (0); + + if (!ret) + { + release(); + } + +#else //__APPLE__ + m_sem = sem_open(name, O_CREAT | O_EXCL, 0666, initcnt); + if (m_sem != SEM_FAILED) + { + m_name = strdup(name); + ret = true; + } + else + { + if (EEXIST == errno) + { + m_sem = sem_open(name, 0); + if (m_sem != SEM_FAILED) + { + m_name = strdup(name); + ret = true; + } + } + } +#endif //__APPLE__ + + return ret; + } + + bool give(const int32_t cnt) + { + if (!m_sem) + { + return false; + } + +#ifdef __APPLE__ + if (pthread_mutex_lock(&m_sem->mutex)) + { + return false; + } + + int oldCnt = m_sem->curCnt; + m_sem->curCnt += cnt; + if (m_sem->curCnt > m_sem->maxCnt) + { + m_sem->curCnt = m_sem->maxCnt; + } + + bool ret = true; + if (!oldCnt) + { + ret = 0 == pthread_cond_broadcast(&m_sem->cond); + } + + if (pthread_mutex_unlock(&m_sem->mutex)) + { + return false; + } + + return ret; +#else //__APPLE__ + int ret = 0; + int32_t curCnt = cnt; + while (curCnt-- && !ret) { + ret = sem_post(m_sem); + }
View file
x265_3.5.tar.gz/source/common/threadpool.cpp
Changed
@@ -301,7 +301,7 @@ /* limit threads based on param->numaPools * For windows because threads can't be allocated to live across sockets * changing the default behavior to be per-socket pools -- FIXME */ -#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7 +#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7 || HAVE_LIBNUMA if (!p->numaPools || (strcmp(p->numaPools, "NULL") == 0 || strcmp(p->numaPools, "*") == 0 || strcmp(p->numaPools, "") == 0)) { char poolString[50] = "";
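Note on the changed `#if`: the new condition mixes `&&` and `||` without parentheses. In preprocessor expressions `&&` binds tighter than `||`, so the condition reads as `(defined(...) && ... >= ...) || HAVE_LIBNUMA` and is true whenever `HAVE_LIBNUMA` is non-zero, independent of the Windows macros. The sketch below illustrates this with stand-in macros; every `DEMO_*` name and `perSocketPools` are hypothetical and not taken from x265.

```cpp
// Stand-in macros (hypothetical): a "Windows version" below the threshold
// and libnuma support switched on.
#define DEMO_WINVER 0
#define DEMO_WIN7 0x0601
#define DEMO_HAVE_LIBNUMA 1

// Same shape as the condition in the patch: A && B || C parses as (A && B) || C.
#if defined(DEMO_WINVER) && DEMO_WINVER >= DEMO_WIN7 || DEMO_HAVE_LIBNUMA
constexpr bool kPerSocketPools = true;
#else
constexpr bool kPerSocketPools = false;
#endif

// Reports which branch the preprocessor selected.
bool perSocketPools() { return kPerSocketPools; }
```

Here the Windows half of the test is false, yet the branch is still taken because the libnuma flag alone satisfies the `||`.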
View file
x265_3.5.tar.gz/source/common/version.cpp
Changed
@@ -71,7 +71,7 @@ #define ONOS "Unk-OS" #endif -#if X86_64 +#if X86_64 || X265_ARCH_ARM64 #define BITS "64 bit" #else #define BITS "32 bit"
View file
x265_3.5.tar.gz/source/common/x86/asm-primitives.cpp
Changed
@@ -1091,6 +1091,7 @@ p.frameInitLowres = PFX(frame_init_lowres_core_sse2); p.frameInitLowerRes = PFX(frame_init_lowres_core_sse2); + p.frameSubSampleLuma = PFX(frame_subsample_luma_sse2); // TODO: the planecopy_sp is really planecopy_SC now, must be fix it //p.planecopy_sp = PFX(downShift_16_sse2); p.planecopy_sp_shl = PFX(upShift_16_sse2); @@ -1121,6 +1122,7 @@ { ASSIGN2(p.scale1D_128to64, scale1D_128to64_ssse3); p.scale2D_64to32 = PFX(scale2D_64to32_ssse3); + p.frameSubSampleLuma = PFX(frame_subsample_luma_ssse3); // p.pu[LUMA_4x4].satd = p.cu[BLOCK_4x4].sa8d = PFX(pixel_satd_4x4_ssse3); this one is broken ALL_LUMA_PU(satd, pixel_satd, ssse3); @@ -1462,6 +1464,7 @@ p.pu[LUMA_64x48].copy_pp = (copy_pp_t)PFX(blockcopy_ss_64x48_avx); p.pu[LUMA_64x64].copy_pp = (copy_pp_t)PFX(blockcopy_ss_64x64_avx); p.propagateCost = PFX(mbtree_propagate_cost_avx); + p.frameSubSampleLuma = PFX(frame_subsample_luma_avx); } if (cpuMask & X265_CPU_XOP) { @@ -1473,6 +1476,7 @@ LUMA_VAR(xop); p.frameInitLowres = PFX(frame_init_lowres_core_xop); p.frameInitLowerRes = PFX(frame_init_lowres_core_xop); + p.frameSubSampleLuma = PFX(frame_subsample_luma_xop); } if (cpuMask & X265_CPU_AVX2) { @@ -2301,6 +2305,9 @@ p.frameInitLowres = PFX(frame_init_lowres_core_avx2); p.frameInitLowerRes = PFX(frame_init_lowres_core_avx2); + + p.frameSubSampleLuma = PFX(frame_subsample_luma_avx2); + p.propagateCost = PFX(mbtree_propagate_cost_avx2); p.fix8Unpack = PFX(cutree_fix8_unpack_avx2); p.fix8Pack = PFX(cutree_fix8_pack_avx2); @@ -3300,6 +3307,7 @@ //p.frameInitLowres = PFX(frame_init_lowres_core_mmx2); p.frameInitLowres = PFX(frame_init_lowres_core_sse2); p.frameInitLowerRes = PFX(frame_init_lowres_core_sse2); + p.frameSubSampleLuma = PFX(frame_subsample_luma_sse2); ALL_LUMA_TU(blockfill_s[NONALIGNED], blockfill_s, sse2); ALL_LUMA_TU(blockfill_s[ALIGNED], blockfill_s, sse2); @@ -3424,6 +3432,8 @@ ASSIGN2(p.scale1D_128to64, scale1D_128to64_ssse3); p.scale2D_64to32 = PFX(scale2D_64to32_ssse3); + p.frameSubSampleLuma 
= PFX(frame_subsample_luma_ssse3); + ASSIGN2(p.pu[LUMA_8x4].convert_p2s, filterPixelToShort_8x4_ssse3); ASSIGN2(p.pu[LUMA_8x8].convert_p2s, filterPixelToShort_8x8_ssse3); ASSIGN2(p.pu[LUMA_8x16].convert_p2s, filterPixelToShort_8x16_ssse3); @@ -3691,6 +3701,7 @@ p.frameInitLowres = PFX(frame_init_lowres_core_avx); p.frameInitLowerRes = PFX(frame_init_lowres_core_avx); p.propagateCost = PFX(mbtree_propagate_cost_avx); + p.frameSubSampleLuma = PFX(frame_subsample_luma_avx); } if (cpuMask & X265_CPU_XOP) { @@ -3702,6 +3713,7 @@ p.cu[BLOCK_16x16].sse_pp = PFX(pixel_ssd_16x16_xop); p.frameInitLowres = PFX(frame_init_lowres_core_xop); p.frameInitLowerRes = PFX(frame_init_lowres_core_xop); + p.frameSubSampleLuma = PFX(frame_subsample_luma_xop); } #if X86_64 @@ -4684,6 +4696,8 @@ p.saoCuStatsE2 = PFX(saoCuStatsE2_avx2); p.saoCuStatsE3 = PFX(saoCuStatsE3_avx2); + p.frameSubSampleLuma = PFX(frame_subsample_luma_avx2); + if (cpuMask & X265_CPU_BMI2) { p.scanPosLast = PFX(scanPosLast_avx2_bmi2);
View file
x265_3.5.tar.gz/source/common/x86/mc-a2.asm
Changed
@@ -992,6 +992,262 @@ FRAME_INIT_LOWRES %endif +%macro SUBSAMPLEFILT8x4 7 + mova %3, r0+%7 + mova %4, r0+r2+%7 + pavgb %3, %4 + pavgb %4, r0+r2*2+%7 + PALIGNR %1, %3, 1, m6 + PALIGNR %2, %4, 1, m6 +%if cpuflag(xop) + pavgb %1, %3 + pavgb %2, %4 +%else + pavgb %1, %3 + pavgb %2, %4 + psrlw %5, %1, 8 + psrlw %6, %2, 8 + pand %1, m7 + pand %2, m7 +%endif +%endmacro + +%macro SUBSAMPLEFILT32x4U 1 + movu m1, r0+r2 + pavgb m0, m1, r0 + movu m3, r0+r2+1 + pavgb m2, m3, r0+1 + pavgb m1, r0+r2*2 + pavgb m3, r0+r2*2+1 + pavgb m0, m2 + pavgb m1, m3 + + movu m3, r0+r2+mmsize + pavgb m2, m3, r0+mmsize + movu m5, r0+r2+1+mmsize + pavgb m4, m5, r0+1+mmsize + pavgb m2, m4 + + pshufb m0, m7 + pshufb m2, m7 + punpcklqdq m0, m0, m2 + vpermq m0, m0, q3120 + movu %1, m0 +%endmacro + +%macro SUBSAMPLEFILT16x2 3 + mova m3, r0+%3+mmsize + mova m2, r0+%3 + pavgb m3, r0+%3+r2+mmsize + pavgb m2, r0+%3+r2 + PALIGNR %1, m3, 1, m6 + pavgb %1, m3 + PALIGNR m3, m2, 1, m6 + pavgb m3, m2 +%if cpuflag(xop) + vpperm m3, m3, %1, m6 +%else + pand m3, m7 + pand %1, m7 + packuswb m3, %1 +%endif + mova %2, m3 + mova %1, m2 +%endmacro + +%macro SUBSAMPLEFILT8x2U 2 + mova m2, r0+%2 + pavgb m2, r0+%2+r2 + mova m0, r0+%2+1 + pavgb m0, r0+%2+r2+1 + pavgb m1, m3 + pavgb m0, m2 + pand m1, m7 + pand m0, m7 + packuswb m0, m1 + mova %1, m0 +%endmacro + +%macro SUBSAMPLEFILT8xU 2 + mova m3, r0+%2+8 + mova m2, r0+%2 + pavgw m3, r0+%2+r2+8 + pavgw m2, r0+%2+r2 + movu m1, r0+%2+10 + movu m0, r0+%2+2 + pavgw m1, r0+%2+r2+10 + pavgw m0, r0+%2+r2+2 + pavgw m1, m3 + pavgw m0, m2 + psrld m3, m1, 16 + pand m1, m7 + pand m0, m7 + packssdw m0, m1 + movu %1, m0 +%endmacro + +%macro SUBSAMPLEFILT8xA 3 + movu m3, r0+%3+mmsize + movu m2, r0+%3 + pavgw m3, r0+%3+r2+mmsize + pavgw m2, r0+%3+r2 + PALIGNR %1, m3, 2, m6 + pavgw %1, m3 + PALIGNR m3, m2, 2, m6 + pavgw m3, m2 +%if cpuflag(xop) + vpperm m3, m3, %1, m6 +%else + pand m3, m7 + pand %1, m7 + packssdw m3, %1 +%endif +%if cpuflag(avx2) + vpermq m3, m3, q3120 +%endif + movu %2, 
m3 + movu %1, m2 +%endmacro + +;----------------------------------------------------------------------------- +; void frame_subsample_luma( uint8_t *src0, uint8_t *dst0, +; intptr_t src_stride, intptr_t dst_stride, int width, int height ) +;----------------------------------------------------------------------------- + +%macro FRAME_SUBSAMPLE_LUMA 0 +cglobal frame_subsample_luma, 6,7,(12-4*(BIT_DEPTH/9)) ; 8 for HIGH_BIT_DEPTH, 12 otherwise +%if HIGH_BIT_DEPTH + shl dword r3m, 1 + FIX_STRIDES r2 + shl dword r4m, 1 +%endif +%if mmsize >= 16 + add dword r4m, mmsize-1 + and dword r4m, ~(mmsize-1) +%endif + ; src += 2*(height-1)*stride + 2*width + mov r6d, r5m + dec r6d + imul r6d, r2d + add r6d, r4m + lea r0, r0+r6*2 + ; dst += (height-1)*stride + width + mov r6d, r5m + dec r6d + imul r6d, r3m + add r6d, r4m + add r1, r6 + ; gap = stride - width + mov r6d, r3m + sub r6d, r4m + PUSH r6 + %define dst_gap rsp+gprsize + mov r6d, r2d + sub r6d, r4m + shl r6d, 1 + PUSH r6 + %define src_gap rsp +%if HIGH_BIT_DEPTH +%if cpuflag(xop) + mova m6, deinterleave_shuf32a + mova m7, deinterleave_shuf32b +%else + pcmpeqw m7, m7 + psrld m7, 16 +%endif +.vloop: + mov r6d, r4m +%ifnidn cpuname, mmx2 + movu m0, r0 + movu m1, r0+r2 + pavgw m0, m1 + pavgw m1, r0+r2*2 +%endif +.hloop: + sub r0, mmsize*2 + sub r1, mmsize +%ifidn cpuname, mmx2 + SUBSAMPLEFILT8xU r1, 0 +%else + SUBSAMPLEFILT8xA m0, r1, 0 +%endif + sub r6d, mmsize + jg .hloop +%else ; !HIGH_BIT_DEPTH +%if cpuflag(avx2) + mova m7, deinterleave_shuf +%elif cpuflag(xop) + mova m6, deinterleave_shuf32a + mova m7, deinterleave_shuf32b +%else + pcmpeqb m7, m7 + psrlw m7, 8 +%endif +.vloop: + mov r6d, r4m +%ifnidn cpuname, mmx2 +%if mmsize <= 16 + mova m0, r0
View file
x265_3.5.tar.gz/source/common/x86/mc.h
Changed
@@ -36,6 +36,17 @@ #undef LOWRES +#define SUBSAMPLELUMA(cpu) \ + void PFX(frame_subsample_luma_ ## cpu)(const pixel* src0, pixel* dst0, intptr_t src_stride, intptr_t dst_stride, int width, int height); +SUBSAMPLELUMA(mmx2) +SUBSAMPLELUMA(sse2) +SUBSAMPLELUMA(ssse3) +SUBSAMPLELUMA(avx) +SUBSAMPLELUMA(avx2) +SUBSAMPLELUMA(xop) + +#undef SUBSAMPLELUMA + #define PROPAGATE_COST(cpu) \ void PFX(mbtree_propagate_cost_ ## cpu)(int* dst, const uint16_t* propagateIn, const int32_t* intraCosts, \ const uint16_t* interCosts, const int32_t* invQscales, const double* fpsFactor, int len);
View file
x265_3.5.tar.gz/source/encoder/analysis.cpp
Changed
@@ -3645,7 +3645,7 @@ qp += distortionData->offset[ctu.m_cuAddr]; } - if (m_param->analysisLoadReuseLevel == 10 && m_param->rc.cuTree) + if (m_param->analysisLoadReuseLevel >= 2 && m_param->rc.cuTree) { int cuIdx = (ctu.m_cuAddr * ctu.m_numPartitions) + cuGeom.absPartIdx; if (ctu.m_slice->m_sliceType == I_SLICE)
View file
x265_3.5.tar.gz/source/encoder/api.cpp
Changed
@@ -608,6 +608,14 @@ if (numEncoded < 0) encoder->m_aborted = true; + if ((!encoder->m_numDelayedPic && !numEncoded) && (encoder->m_param->bEnableEndOfSequence || encoder->m_param->bEnableEndOfBitstream)) + { + Bitstream bs; + encoder->getEndNalUnits(encoder->m_nalList, bs); + *pp_nal = &encoder->m_nalList.m_nal[0]; + if (pi_nal) *pi_nal = encoder->m_nalList.m_numNal; + } + return numEncoded; }
View file
x265_3.5.tar.gz/source/encoder/dpb.cpp
Changed
@@ -70,10 +70,18 @@ { Frame *curFrame = iterFrame; iterFrame = iterFrame->m_next; - if (!curFrame->m_encData->m_bHasReferences && !curFrame->m_countRefEncoders) + bool isMCSTFReferenced = false; + + if (curFrame->m_param->bEnableTemporalFilter) + isMCSTFReferenced =!!(curFrame->m_refPicCnt[1]); + + if (!curFrame->m_encData->m_bHasReferences && !curFrame->m_countRefEncoders && !isMCSTFReferenced) { curFrame->m_bChromaExtended = false; + if (curFrame->m_param->bEnableTemporalFilter) + *curFrame->m_isSubSampled = false; + // Reset column counter X265_CHECK(curFrame->m_reconRowFlag != NULL, "curFrame->m_reconRowFlag check failure"); X265_CHECK(curFrame->m_reconColCount != NULL, "curFrame->m_reconColCount check failure");
View file
x265_3.5.tar.gz/source/encoder/encoder.cpp
Changed
@@ -72,7 +72,40 @@ { { 1, 1, 1, 1, 1, 5, 1, 2, 2, 2, 50 }, { 1, 1, 1, 1, 1, 5, 0, 16, 9, 9, 81 }, - { 1, 1, 1, 1, 1, 5, 0, 1, 1, 1, 82 } + { 1, 1, 1, 1, 1, 5, 0, 1, 1, 1, 82 }, + { 1, 1, 1, 1, 1, 5, 0, 18, 9, 9, 84 } +}; + +typedef struct +{ + int bEnableVideoSignalTypePresentFlag; + int bEnableColorDescriptionPresentFlag; + int bEnableChromaLocInfoPresentFlag; + int colorPrimaries; + int transferCharacteristics; + int matrixCoeffs; + int bEnableVideoFullRangeFlag; + int chromaSampleLocTypeTopField; + int chromaSampleLocTypeBottomField; + const char* systemId; +}VideoSignalTypePresets; + +VideoSignalTypePresets vstPresets[] = +{ + {1, 1, 1, 6, 6, 6, 0, 0, 0, "BT601_525"}, + {1, 1, 1, 5, 6, 5, 0, 0, 0, "BT601_626"}, + {1, 1, 1, 1, 1, 1, 0, 0, 0, "BT709_YCC"}, + {1, 1, 0, 1, 1, 0, 0, 0, 0, "BT709_RGB"}, + {1, 1, 1, 9, 14, 1, 0, 2, 2, "BT2020_YCC_NCL"}, + {1, 1, 0, 9, 16, 9, 0, 0, 0, "BT2020_RGB"}, + {1, 1, 1, 9, 16, 9, 0, 2, 2, "BT2100_PQ_YCC"}, + {1, 1, 1, 9, 16, 14, 0, 2, 2, "BT2100_PQ_ICTCP"}, + {1, 1, 0, 9, 16, 0, 0, 0, 0, "BT2100_PQ_RGB"}, + {1, 1, 1, 9, 18, 9, 0, 2, 2, "BT2100_HLG_YCC"}, + {1, 1, 0, 9, 18, 0, 0, 0, 0, "BT2100_HLG_RGB"}, + {1, 1, 0, 1, 1, 0, 1, 0, 0, "FR709_RGB"}, + {1, 1, 0, 9, 14, 0, 1, 0, 0, "FR2020_RGB"}, + {1, 1, 1, 12, 1, 6, 1, 1, 1, "FRP3D65_YCC"} }; } @@ -109,6 +142,7 @@ m_threadPool = NULL; m_analysisFileIn = NULL; m_analysisFileOut = NULL; + m_filmGrainIn = NULL; m_naluFile = NULL; m_offsetEmergency = NULL; m_iFrameNum = 0; @@ -134,12 +168,8 @@ m_prevTonemapPayload.payload = NULL; m_startPoint = 0; m_saveCTUSize = 0; - m_edgePic = NULL; - m_edgeHistThreshold = 0; - m_chromaHistThreshold = 0.0; - m_scaledEdgeThreshold = 0.0; - m_scaledChromaThreshold = 0.0; m_zoneIndex = 0; + m_origPicBuffer = 0; } inline char *strcatFilename(const char *input, const char *suffix) @@ -216,34 +246,6 @@ } } - if (m_param->bHistBasedSceneCut) - { - m_planeSizes[0] = (m_param->sourceWidth >> x265_cli_csps[p->internalCsp].width[0]) * (m_param->sourceHeight >> 
x265_cli_csps[m_param->internalCsp].height[0]); - uint32_t pixelbytes = m_param->internalBitDepth > 8 ? 2 : 1; - m_edgePic = X265_MALLOC(pixel, m_planeSizes[0] * pixelbytes); - m_edgeHistThreshold = m_param->edgeTransitionThreshold; - m_chromaHistThreshold = x265_min(m_edgeHistThreshold * 10.0, MAX_SCENECUT_THRESHOLD); - m_scaledEdgeThreshold = x265_min(m_edgeHistThreshold * SCENECUT_STRENGTH_FACTOR, MAX_SCENECUT_THRESHOLD); - m_scaledChromaThreshold = x265_min(m_chromaHistThreshold * SCENECUT_STRENGTH_FACTOR, MAX_SCENECUT_THRESHOLD); - if (m_param->sourceBitDepth != m_param->internalBitDepth) - { - int size = m_param->sourceWidth * m_param->sourceHeight; - int hshift = CHROMA_H_SHIFT(m_param->internalCsp); - int vshift = CHROMA_V_SHIFT(m_param->internalCsp); - int widthC = m_param->sourceWidth >> hshift; - int heightC = m_param->sourceHeight >> vshift; - - m_inputPic[0] = X265_MALLOC(pixel, size); - if (m_param->internalCsp != X265_CSP_I400) - { - for (int j = 1; j < 3; j++) - { - m_inputPic[j] = X265_MALLOC(pixel, widthC * heightC); - } - } - } - } - // Do not allow WPP if only one row or fewer than 3 columns, it is pointless and unstable if (rows == 1 || cols < 3) { @@ -357,6 +359,10 @@ lookAheadThreadPool[i].start(); m_lookahead->m_numPools = pools; m_dpb = new DPB(m_param); + + if (m_param->bEnableTemporalFilter) + m_origPicBuffer = new OrigPicBuffer(); + m_rateControl = new RateControl(*m_param, this); if (!m_param->bResetZoneConfig) { @@ -518,6 +524,15 @@ } } } + if (m_param->filmGrain) + { + m_filmGrainIn = x265_fopen(m_param->filmGrain, "rb"); + if (!m_filmGrainIn) + { + x265_log_file(NULL, X265_LOG_ERROR, "Failed to open film grain characteristics binary file %s\n", m_param->filmGrain); + } + } + m_bZeroLatency = !m_param->bframes && !m_param->lookaheadDepth && m_param->frameNumThreads == 1 && m_param->maxSlices == 1; m_aborted |= parseLambdaFile(m_param); @@ -879,26 +894,6 @@ } } - if (m_param->bHistBasedSceneCut) - { - if (m_edgePic != NULL) - { - 
X265_FREE_ZERO(m_edgePic); - } - - if (m_param->sourceBitDepth != m_param->internalBitDepth) - { - X265_FREE_ZERO(m_inputPic[0]); - if (m_param->internalCsp != X265_CSP_I400) - { - for (int i = 1; i < 3; i++) - { - X265_FREE_ZERO(m_inputPic[i]); - } - } - } - } - for (int i = 0; i < m_param->frameNumThreads; i++) { if (m_frameEncoder[i]) @@ -924,6 +919,10 @@ delete zoneReadCount; delete zoneWriteCount; } + + if (m_param->bEnableTemporalFilter) + delete m_origPicBuffer; + if (m_rateControl) { m_rateControl->destroy(); @@ -963,6 +962,8 @@ } if (m_naluFile) fclose(m_naluFile); + if (m_filmGrainIn) + x265_fclose(m_filmGrainIn); #ifdef SVT_HEVC X265_FREE(m_svtAppData); @@ -974,6 +975,7 @@ /* release string arguments that were strdup'd */ free((char*)m_param->rc.lambdaFileName); free((char*)m_param->rc.statFileName); + free((char*)m_param->rc.sharedMemName); free((char*)m_param->analysisReuseFileName); free((char*)m_param->scalingLists); free((char*)m_param->csvfn); @@ -982,6 +984,7 @@ free((char*)m_param->toneMapFile); free((char*)m_param->analysisSave); free((char*)m_param->analysisLoad); + free((char*)m_param->videoSignalTypePreset); PARAM_NS::x265_param_free(m_param); } } @@ -1358,215 +1361,90 @@ dest->planes[2] = (char*)dest->planes[1] + src->stride[1] * (src->height >> x265_cli_csps[src->colorSpace].height[1]); } -bool Encoder::computeHistograms(x265_picture *pic) +bool Encoder::isFilterThisframe(uint8_t sliceTypeConfig, int curSliceType) { - pixel *src = NULL, *planeV = NULL, *planeU = NULL; - uint32_t widthC, heightC; - int hshift, vshift; -
View file
x265_3.5.tar.gz/source/encoder/encoder.h
Changed
@@ -32,6 +32,7 @@ #include "nal.h" #include "framedata.h" #include "svt.h" +#include "temporalfilter.h" #ifdef ENABLE_HDR10_PLUS #include "dynamicHDR10/hdr10plus.h" #endif @@ -256,19 +257,6 @@ int m_bToneMap; // Enables tone-mapping int m_enableNal; - /* For histogram based scene-cut detection */ - pixel* m_edgePic; - pixel* m_inputPic[3]; - int32_t m_curYUVHist[3][HISTOGRAM_BINS]; - int32_t m_prevYUVHist[3][HISTOGRAM_BINS]; - int32_t m_curEdgeHist[2]; - int32_t m_prevEdgeHist[2]; - uint32_t m_planeSizes[3]; - double m_edgeHistThreshold; - double m_chromaHistThreshold; - double m_scaledEdgeThreshold; - double m_scaledChromaThreshold; - #ifdef ENABLE_HDR10_PLUS const hdr10plus_api *m_hdr10plus_api; uint8_t **m_cim; @@ -295,6 +283,9 @@ ThreadSafeInteger* zoneReadCount; ThreadSafeInteger* zoneWriteCount; + /* Film grain model file */ + FILE* m_filmGrainIn; + OrigPicBuffer* m_origPicBuffer; Encoder(); ~Encoder() @@ -327,6 +318,8 @@ void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs); + void getEndNalUnits(NALList& list, Bitstream& bs); + void fetchStats(x265_stats* stats, size_t statsSizeBytes); void printSummary(); @@ -373,11 +366,6 @@ void copyPicture(x265_picture *dest, const x265_picture *src); - bool computeHistograms(x265_picture *pic); - void computeHistogramSAD(double *maxUVNormalizedSAD, double *edgeNormalizedSAD, int curPoc); - double normalizeRange(int32_t value, int32_t minValue, int32_t maxValue, double rangeStart, double rangeEnd); - void findSceneCuts(x265_picture *pic, bool& bDup, double m_maxUVSADVal, double m_edgeSADVal, bool& isMaxThres, bool& isHardSC); - void initRefIdx(); void analyseRefIdx(int *numRefIdx); void updateRefIdx(); @@ -387,6 +375,11 @@ void configureDolbyVisionParams(x265_param* p); + void configureVideoSignalTypePreset(x265_param* p); + + bool isFilterThisframe(uint8_t sliceTypeConfig, int curSliceType); + bool generateMcstfRef(Frame* frameEnc, FrameEncoder* currEncoder); + protected: void initVPS(VPS *vps);
View file
x265_3.5.tar.gz/source/encoder/frameencoder.cpp
Changed
@@ -34,6 +34,7 @@ #include "common.h" #include "slicetype.h" #include "nal.h" +#include "temporalfilter.h" namespace X265_NS { void weightAnalyse(Slice& slice, Frame& frame, x265_param& param); @@ -101,6 +102,16 @@ delete m_rce.picTimingSEI; delete m_rce.hrdTiming; } + + if (m_param->bEnableTemporalFilter) + { + delete m_frameEncTF->m_metld; + + for (int i = 0; i < (m_frameEncTF->m_range << 1); i++) + m_frameEncTF->destroyRefPicInfo(&m_mcstfRefList[i]); + + delete m_frameEncTF; + } } bool FrameEncoder::init(Encoder *top, int numRows, int numCols) @@ -195,6 +206,16 @@ m_sliceAddrBits = (uint16_t)(tmp + 1); } + if (m_param->bEnableTemporalFilter) + { + m_frameEncTF = new TemporalFilter(); + if (m_frameEncTF) + m_frameEncTF->init(m_param); + + for (int i = 0; i < (m_frameEncTF->m_range << 1); i++) + ok &= !!m_frameEncTF->createRefPicInfo(&m_mcstfRefList[i], m_param); + } + return ok; } @@ -450,7 +471,7 @@ m_ssimCnt = 0; memset(&(m_frame->m_encData->m_frameStats), 0, sizeof(m_frame->m_encData->m_frameStats)); - if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode != X265_AQ_EDGE && m_param->recursionSkipMode == EDGE_BASED_RSKIP) + if (m_param->rc.aqMode != X265_AQ_EDGE && m_param->recursionSkipMode == EDGE_BASED_RSKIP) { int height = m_frame->m_fencPic->m_picHeight; int width = m_frame->m_fencPic->m_picWidth; @@ -467,6 +488,12 @@ * unit) */ Slice* slice = m_frame->m_encData->m_slice; + if (m_param->bEnableEndOfSequence && m_frame->m_lowres.sliceType == X265_TYPE_IDR && m_frame->m_poc) + { + m_bs.resetBits(); + m_nalList.serialize(NAL_UNIT_EOS, m_bs); + } + if (m_param->bEnableAccessUnitDelimiters && (m_frame->m_poc || m_param->bRepeatHeaders)) { m_bs.resetBits(); @@ -573,6 +600,12 @@ int qp = m_top->m_rateControl->rateControlStart(m_frame, &m_rce, m_top); m_rce.newQp = qp; + if (m_param->bEnableTemporalFilter) + { + m_frameEncTF->m_QP = qp; + m_frameEncTF->bilateralFilter(m_frame, m_mcstfRefList, m_param->temporalFilterStrength); + } + if (m_nr) { if (qp > QP_MAX_SPEC && 
m_frame->m_param->rc.vbvBufferSize) @@ -756,7 +789,14 @@ m_seiAlternativeTC.m_preferredTransferCharacteristics = m_param->preferredTransferCharacteristics; m_seiAlternativeTC.writeSEImessages(m_bs, *slice->m_sps, NAL_UNIT_PREFIX_SEI, m_nalList, m_param->bSingleSeiNal); } - + /* Write Film grain characteristics if present */ + if (this->m_top->m_filmGrainIn) + { + FilmGrainCharacteristics m_filmGrain; + /* Read the Film grain model file */ + readModel(&m_filmGrain, this->m_top->m_filmGrainIn); + m_filmGrain.writeSEImessages(m_bs, *slice->m_sps, NAL_UNIT_PREFIX_SEI, m_nalList, m_param->bSingleSeiNal); + } /* Write user SEI */ for (int i = 0; i < m_frame->m_userSEI.numPayloads; i++) { @@ -933,6 +973,23 @@ if (m_param->bDynamicRefine && m_top->m_startPoint <= m_frame->m_encodeOrder) //Avoid collecting data that will not be used by future frames. collectDynDataFrame(); + if (m_param->bEnableTemporalFilter && m_top->isFilterThisframe(m_frame->m_mcstf->m_sliceTypeConfig, m_frame->m_lowres.sliceType)) + { + //Reset the MCSTF context in Frame Encoder and Frame + for (int i = 0; i < (m_frameEncTF->m_range << 1); i++) + { + memset(m_mcstfRefList[i].mvs0, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16))); + memset(m_mcstfRefList[i].mvs1, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16))); + memset(m_mcstfRefList[i].mvs2, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16))); + memset(m_mcstfRefList[i].mvs, 0, sizeof(MV) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4))); + memset(m_mcstfRefList[i].noise, 0, sizeof(int) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4))); + memset(m_mcstfRefList[i].error, 0, sizeof(int) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4))); + + m_frame->m_mcstf->m_numRef = 0; + } + } + + if (m_param->rc.bStatWrite) { int totalI = 0, totalP = 0, totalSkip = 0; @@ -2127,6 +2184,54 @@ m_nr->nrOffsetDenoise[cat][0] = 0; } } + +void 
FrameEncoder::readModel(FilmGrainCharacteristics* m_filmGrain, FILE* filmgrain) +{ + char const* errorMessage = "Error reading FilmGrain characteristics\n"; + FilmGrain m_fg; + x265_fread((char* )&m_fg, sizeof(bool) * 3 + sizeof(uint8_t), 1, filmgrain, errorMessage); + m_filmGrain->m_filmGrainCharacteristicsCancelFlag = m_fg.m_filmGrainCharacteristicsCancelFlag; + m_filmGrain->m_filmGrainCharacteristicsPersistenceFlag = m_fg.m_filmGrainCharacteristicsPersistenceFlag; + m_filmGrain->m_filmGrainModelId = m_fg.m_filmGrainModelId; + m_filmGrain->m_separateColourDescriptionPresentFlag = m_fg.m_separateColourDescriptionPresentFlag; + if (m_filmGrain->m_separateColourDescriptionPresentFlag) + { + ColourDescription m_clr; + x265_fread((char* )&m_clr, sizeof(bool) + sizeof(uint8_t) * 5, 1, filmgrain, errorMessage); + m_filmGrain->m_filmGrainBitDepthLumaMinus8 = m_clr.m_filmGrainBitDepthLumaMinus8; + m_filmGrain->m_filmGrainBitDepthChromaMinus8 = m_clr.m_filmGrainBitDepthChromaMinus8; + m_filmGrain->m_filmGrainFullRangeFlag = m_clr.m_filmGrainFullRangeFlag; + m_filmGrain->m_filmGrainColourPrimaries = m_clr.m_filmGrainColourPrimaries; + m_filmGrain->m_filmGrainTransferCharacteristics = m_clr.m_filmGrainTransferCharacteristics; + m_filmGrain->m_filmGrainMatrixCoeffs = m_clr.m_filmGrainMatrixCoeffs; + } + FGPresent m_present; + x265_fread((char* )&m_present, sizeof(bool) * 3 + sizeof(uint8_t) * 2, 1, filmgrain, errorMessage); + m_filmGrain->m_blendingModeId = m_present.m_blendingModeId; + m_filmGrain->m_log2ScaleFactor = m_present.m_log2ScaleFactor; + m_filmGrain->m_compModel[0].bPresentFlag = m_present.m_presentFlag[0]; + m_filmGrain->m_compModel[1].bPresentFlag = m_present.m_presentFlag[1]; + m_filmGrain->m_compModel[2].bPresentFlag = m_present.m_presentFlag[2]; + for (int i = 0; i < MAX_NUM_COMPONENT; i++) + { + if (m_filmGrain->m_compModel[i].bPresentFlag) + { + x265_fread((char* )(&m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1), sizeof(uint8_t), 1, filmgrain, 
errorMessage); + x265_fread((char* )(&m_filmGrain->m_compModel[i].numModelValues), sizeof(uint8_t), 1, filmgrain, errorMessage); + m_filmGrain->m_compModel[i].intensityValues = (FilmGrainCharacteristics::CompModelIntensityValues* ) malloc(sizeof(FilmGrainCharacteristics::CompModelIntensityValues) * (m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1+1)) ; + for (int j = 0; j <= m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1; j++) + { + x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].intensityIntervalLowerBound), sizeof(uint8_t), 1, filmgrain, errorMessage); + x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].intensityIntervalUpperBound), sizeof(uint8_t), 1, filmgrain, errorMessage); + m_filmGrain->m_compModel[i].intensityValues[j].compModelValue = (int* ) malloc(sizeof(int) * (m_filmGrain->m_compModel[i].numModelValues)); + for (int k = 0; k < m_filmGrain->m_compModel[i].numModelValues; k++) + { + x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].compModelValue[k]), sizeof(int), 1, filmgrain, errorMessage); + } + } + } + } +} #if ENABLE_LIBVMAF void FrameEncoder::vmafFrameLevelScore() {
x265_3.5.tar.gz/source/encoder/frameencoder.h
Changed
@@ -40,6 +40,7 @@ #include "ratecontrol.h" #include "reference.h" #include "nal.h" +#include "temporalfilter.h" namespace X265_NS { // private x265 namespace @@ -113,6 +114,34 @@ } }; +/*Film grain characteristics*/ +struct FilmGrain +{ + bool m_filmGrainCharacteristicsCancelFlag; + bool m_filmGrainCharacteristicsPersistenceFlag; + bool m_separateColourDescriptionPresentFlag; + uint8_t m_filmGrainModelId; + uint8_t m_blendingModeId; + uint8_t m_log2ScaleFactor; +}; + +struct ColourDescription +{ + bool m_filmGrainFullRangeFlag; + uint8_t m_filmGrainBitDepthLumaMinus8; + uint8_t m_filmGrainBitDepthChromaMinus8; + uint8_t m_filmGrainColourPrimaries; + uint8_t m_filmGrainTransferCharacteristics; + uint8_t m_filmGrainMatrixCoeffs; +}; + +struct FGPresent +{ + uint8_t m_blendingModeId; + uint8_t m_log2ScaleFactor; + bool m_presentFlag[3]; +}; + // Manages the wave-front processing of a single encoding frame class FrameEncoder : public WaveFront, public Thread { @@ -205,6 +234,10 @@ FrameFilter m_frameFilter; NALList m_nalList; + // initialization for mcstf + TemporalFilter* m_frameEncTF; + TemporalFilterRefPicInfo m_mcstfRefList[MAX_MCSTF_TEMPORAL_WINDOW_LENGTH]; + class WeightAnalysis : public BondedTaskGroup { public: @@ -250,6 +283,7 @@ void collectDynDataFrame(); void computeAvgTrainingData(); void collectDynDataRow(CUData& ctu, FrameStats* rowStats); + void readModel(FilmGrainCharacteristics* m_filmGrain, FILE* filmgrain); }; }
x265_3.5.tar.gz/source/encoder/motion.cpp
Changed
@@ -190,6 +190,31 @@ X265_CHECK(!bChromaSATD, "chroma distortion measurements impossible in this code path\n"); } +/* Called by lookahead, luma only, no use of PicYuv */ +void MotionEstimate::setSourcePU(pixel *fencY, intptr_t stride, intptr_t offset, int pwidth, int pheight, const int method, const int refine) +{ + partEnum = partitionFromSizes(pwidth, pheight); + X265_CHECK(LUMA_4x4 != partEnum, "4x4 inter partition detected!\n"); + sad = primitives.pu[partEnum].sad; + ads = primitives.pu[partEnum].ads; + satd = primitives.pu[partEnum].satd; + sad_x3 = primitives.pu[partEnum].sad_x3; + sad_x4 = primitives.pu[partEnum].sad_x4; + + + blockwidth = pwidth; + blockOffset = offset; + absPartIdx = ctuAddr = -1; + + /* Search params */ + searchMethod = method; + subpelRefine = refine; + + /* copy PU block into cache */ + primitives.pu[partEnum].copy_pp(fencPUYuv.m_buf[0], FENC_STRIDE, fencY + offset, stride); + X265_CHECK(!bChromaSATD, "chroma distortion measurements impossible in this code path\n"); +} + /* Called by Search::predInterSearch() or --pme equivalent, chroma residual might be considered */ void MotionEstimate::setSourcePU(const Yuv& srcFencYuv, int _ctuAddr, int cuPartIdx, int puPartIdx, int pwidth, int pheight, const int method, const int refine, bool bChroma) {
x265_3.5.tar.gz/source/encoder/motion.h
Changed
@@ -77,7 +77,7 @@ void init(int csp); /* Methods called at slice setup */ - + void setSourcePU(pixel *fencY, intptr_t stride, intptr_t offset, int pwidth, int pheight, const int searchMethod, const int subpelRefine); void setSourcePU(pixel *fencY, intptr_t stride, intptr_t offset, int pwidth, int pheight, const int searchMethod, const int searchL0, const int searchL1, const int subpelRefine); void setSourcePU(const Yuv& srcFencYuv, int ctuAddr, int cuPartIdx, int puPartIdx, int pwidth, int pheight, const int searchMethod, const int subpelRefine, bool bChroma);
x265_3.5.tar.gz/source/encoder/ratecontrol.cpp
Changed
@@ -41,6 +41,10 @@ #define BR_SHIFT 6 #define CPB_SHIFT 4 +#define SHARED_DATA_ALIGNMENT 4 ///< 4btye, 32bit +#define CUTREE_SHARED_MEM_NAME "cutree" +#define GOP_CNT_CU_TREE 3 + using namespace X265_NS; /* Amortize the partial cost of I frames over the next N frames */ @@ -104,6 +108,37 @@ return output; } +typedef struct CUTreeSharedDataItem +{ + uint8_t *type; + uint16_t *stats; +}CUTreeSharedDataItem; + +void static ReadSharedCUTreeData(void *dst, void *src, int32_t size) +{ + CUTreeSharedDataItem *statsDst = reinterpret_cast<CUTreeSharedDataItem *>(dst); + uint8_t *typeSrc = reinterpret_cast<uint8_t *>(src); + *statsDst->type = *typeSrc; + + ///< for memory alignment, the type will take 32bit in the shared memory + int32_t offset = (sizeof(*statsDst->type) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1); + uint16_t *statsSrc = reinterpret_cast<uint16_t *>(typeSrc + offset); + memcpy(statsDst->stats, statsSrc, size - offset); +} + +void static WriteSharedCUTreeData(void *dst, void *src, int32_t size) +{ + CUTreeSharedDataItem *statsSrc = reinterpret_cast<CUTreeSharedDataItem *>(src); + uint8_t *typeDst = reinterpret_cast<uint8_t *>(dst); + *typeDst = *statsSrc->type; + + ///< for memory alignment, the type will take 32bit in the shared memory + int32_t offset = (sizeof(*statsSrc->type) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1); + uint16_t *statsDst = reinterpret_cast<uint16_t *>(typeDst + offset); + memcpy(statsDst, statsSrc->stats, size - offset); +} + + inline double qScale2bits(RateControlEntry *rce, double qScale) { if (qScale < 0.1) @@ -209,6 +244,7 @@ m_lastAbrResetPoc = -1; m_statFileOut = NULL; m_cutreeStatFileOut = m_cutreeStatFileIn = NULL; + m_cutreeShrMem = NULL; m_rce2Pass = NULL; m_encOrder = NULL; m_lastBsliceSatdCost = 0; @@ -320,6 +356,42 @@ m_cuTreeStats.qpBuffer[i] = NULL; } +bool RateControl::initCUTreeSharedMem() +{ + if (!m_cutreeShrMem) { + m_cutreeShrMem = new RingMem(); + if (!m_cutreeShrMem) + { + return 
false; + } + + ///< now cutree data form at most 3 gops would be stored in the shared memory at the same time + int32_t itemSize = (sizeof(uint8_t) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1); + if (m_param->rc.qgSize == 8) + { + itemSize += sizeof(uint16_t) * m_ncu * 4; + } + else + { + itemSize += sizeof(uint16_t) * m_ncu; + } + + int32_t itemCnt = X265_MIN(m_param->keyframeMax, (int)(m_fps + 0.5)); + itemCnt *= GOP_CNT_CU_TREE; + + char shrname[MAX_SHR_NAME_LEN] = { 0 }; + strcpy(shrname, m_param->rc.sharedMemName); + strcat(shrname, CUTREE_SHARED_MEM_NAME); + + if (!m_cutreeShrMem->init(itemSize, itemCnt, shrname)) + { + return false; + } + } + + return true; +} + bool RateControl::init(const SPS& sps) { if (m_isVbv && !m_initVbv) @@ -421,244 +493,257 @@ /* Load stat file and init 2pass algo */ if (m_param->rc.bStatRead) { - m_expectedBitsSum = 0; - char *p, *statsIn, *statsBuf; - /* read 1st pass stats */ - statsIn = statsBuf = x265_slurp_file(fileName); - if (!statsBuf) - return false; - if (m_param->rc.cuTree) + if (X265_SHARE_MODE_FILE == m_param->rc.dataShareMode) { - char *tmpFile = strcatFilename(fileName, ".cutree"); - if (!tmpFile) + m_expectedBitsSum = 0; + char *p, *statsIn, *statsBuf; + /* read 1st pass stats */ + statsIn = statsBuf = x265_slurp_file(fileName); + if (!statsBuf) return false; - m_cutreeStatFileIn = x265_fopen(tmpFile, "rb"); - X265_FREE(tmpFile); - if (!m_cutreeStatFileIn) + if (m_param->rc.cuTree) { - x265_log_file(m_param, X265_LOG_ERROR, "can't open stats file %s.cutree\n", fileName); - return false; + char *tmpFile = strcatFilename(fileName, ".cutree"); + if (!tmpFile) + return false; + m_cutreeStatFileIn = x265_fopen(tmpFile, "rb"); + X265_FREE(tmpFile); + if (!m_cutreeStatFileIn) + { + x265_log_file(m_param, X265_LOG_ERROR, "can't open stats file %s.cutree\n", fileName); + return false; + } } - } - /* check whether 1st pass options were compatible with current options */ - if (strncmp(statsBuf, "#options:", 9)) - 
{ - x265_log(m_param, X265_LOG_ERROR,"options list in stats file not valid\n"); - return false; - } - { - int i, j, m; - uint32_t k , l; - bool bErr = false; - char *opts = statsBuf; - statsIn = strchr(statsBuf, '\n'); - if (!statsIn) - { - x265_log(m_param, X265_LOG_ERROR, "Malformed stats file\n"); - return false; - } - *statsIn = '\0'; - statsIn++; - if ((p = strstr(opts, " input-res=")) == 0 || sscanf(p, " input-res=%dx%d", &i, &j) != 2) - { - x265_log(m_param, X265_LOG_ERROR, "Resolution specified in stats file not valid\n"); - return false; - } - if ((p = strstr(opts, " fps=")) == 0 || sscanf(p, " fps=%u/%u", &k, &l) != 2) + /* check whether 1st pass options were compatible with current options */ + if (strncmp(statsBuf, "#options:", 9)) { - x265_log(m_param, X265_LOG_ERROR, "fps specified in stats file not valid\n"); + x265_log(m_param, X265_LOG_ERROR, "options list in stats file not valid\n"); return false; } - if (((p = strstr(opts, " vbv-maxrate=")) == 0 || sscanf(p, " vbv-maxrate=%d", &m) != 1) && m_param->rc.rateControlMode == X265_RC_CRF) { - x265_log(m_param, X265_LOG_ERROR, "Constant rate-factor is incompatible with 2pass without vbv-maxrate in the previous pass\n"); - return false; - } - if (k != m_param->fpsNum || l != m_param->fpsDenom) - { - x265_log(m_param, X265_LOG_ERROR, "fps mismatch with 1st pass (%u/%u vs %u/%u)\n", - m_param->fpsNum, m_param->fpsDenom, k, l); - return false; - } - if (m_param->analysisMultiPassRefine) - { - p = strstr(opts, "ref="); - sscanf(p, "ref=%d", &i); - if (i > m_param->maxNumReferences) + int i, j, m; + uint32_t k, l; + bool bErr = false; + char *opts = statsBuf; + statsIn = strchr(statsBuf, '\n'); + if (!statsIn) { - x265_log(m_param, X265_LOG_ERROR, "maxNumReferences cannot be less than 1st pass (%d vs %d)\n", - i, m_param->maxNumReferences); + x265_log(m_param, X265_LOG_ERROR, "Malformed stats file\n"); return false;
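Note on the shared-memory item layout used by ReadSharedCUTreeData(), WriteSharedCUTreeData() and initCUTreeSharedMem(): the one-byte frame type is padded to SHARED_DATA_ALIGNMENT (4) bytes, and the 16-bit stats follow at that offset. The sketch below reproduces the rounding and the resulting item size under those assumptions. alignUp, cuTreeItemSize, packItem and unpackItem are hypothetical helper names, and plain buffers stand in for the RingMem class, whose interface is not shown in this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors SHARED_DATA_ALIGNMENT in the diff (4 bytes).
constexpr int32_t kSharedDataAlignment = 4;

// Round size up to the next multiple of a power-of-two alignment.
constexpr int32_t alignUp(int32_t size, int32_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes per item: the padded type byte plus the 16-bit stats.
// As in initCUTreeSharedMem(), a qgSize of 8 stores four entries per CU.
int32_t cuTreeItemSize(int32_t ncu, int qgSize)
{
    const int32_t entriesPerCu = (qgSize == 8) ? 4 : 1;
    return alignUp(static_cast<int32_t>(sizeof(uint8_t)), kSharedDataAlignment)
         + static_cast<int32_t>(sizeof(uint16_t)) * ncu * entriesPerCu;
}

// Write one item: type at byte 0, stats starting at the aligned offset.
void packItem(uint8_t* dst, uint8_t type, const uint16_t* stats, int32_t size)
{
    const int32_t offset = alignUp(static_cast<int32_t>(sizeof(uint8_t)), kSharedDataAlignment);
    dst[0] = type;
    std::memcpy(dst + offset, stats, static_cast<size_t>(size - offset));
}

// Read one item back using the same layout.
void unpackItem(const uint8_t* src, uint8_t& type, uint16_t* stats, int32_t size)
{
    const int32_t offset = alignUp(static_cast<int32_t>(sizeof(uint8_t)), kSharedDataAlignment);
    type = src[0];
    std::memcpy(stats, src + offset, static_cast<size_t>(size - offset));
}
```

The mask form of the rounding only works for power-of-two alignments, which holds for the value 4 used here.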
x265_3.5.tar.gz/source/encoder/ratecontrol.h
Changed
@@ -28,6 +28,7 @@ #include "common.h" #include "sei.h" +#include "ringmem.h" namespace X265_NS { // encoder namespace @@ -73,6 +74,7 @@ Predictor rowPreds[3][2]; Predictor* rowPred[2]; + int64_t currentSatd; int64_t lastSatd; /* Contains the picture cost of the previous frame, required for resetAbr and VBV */ int64_t leadingNoBSatd; int64_t rowTotalBits; /* update cplxrsum and totalbits at the end of 2 rows */ @@ -87,6 +89,8 @@ double rowCplxrSum; double qpNoVbv; double bufferFill; + double bufferFillFinal; + double bufferFillActual; double targetFill; bool vbvEndAdj; double frameDuration; @@ -237,6 +241,8 @@ FILE* m_statFileOut; FILE* m_cutreeStatFileOut; FILE* m_cutreeStatFileIn; + ///< store the cutree data in memory instead of file + RingMem *m_cutreeShrMem; double m_lastAccumPNorm; double m_expectedBitsSum; /* sum of qscale2bits after rceq, ratefactor, and overflow, only includes finished frames */ int64_t m_predictedBits; @@ -271,6 +277,9 @@ int writeRateControlFrameStats(Frame* curFrame, RateControlEntry* rce); bool initPass2(); + bool initCUTreeSharedMem(); + void skipCUTreeSharedMemRead(int32_t cnt); + double forwardMasking(Frame* curFrame, double q); double backwardMasking(Frame* curFrame, double q);
x265_3.5.tar.gz/source/encoder/sei.h
Changed
@@ -73,6 +73,101 @@ } }; +/* Film grain characteristics */ +class FilmGrainCharacteristics : public SEI +{ + public: + + FilmGrainCharacteristics() + { + m_payloadType = FILM_GRAIN_CHARACTERISTICS; + m_payloadSize = 0; + } + + struct CompModelIntensityValues + { + uint8_t intensityIntervalLowerBound; + uint8_t intensityIntervalUpperBound; + int* compModelValue; + }; + + struct CompModel + { + bool bPresentFlag; + uint8_t numModelValues; + uint8_t m_filmGrainNumIntensityIntervalMinus1; + CompModelIntensityValues* intensityValues; + }; + + CompModel m_compModel[MAX_NUM_COMPONENT]; + bool m_filmGrainCharacteristicsPersistenceFlag; + bool m_filmGrainCharacteristicsCancelFlag; + bool m_separateColourDescriptionPresentFlag; + bool m_filmGrainFullRangeFlag; + uint8_t m_filmGrainModelId; + uint8_t m_blendingModeId; + uint8_t m_log2ScaleFactor; + uint8_t m_filmGrainBitDepthLumaMinus8; + uint8_t m_filmGrainBitDepthChromaMinus8; + uint8_t m_filmGrainColourPrimaries; + uint8_t m_filmGrainTransferCharacteristics; + uint8_t m_filmGrainMatrixCoeffs; + + void writeSEI(const SPS&) + { + WRITE_FLAG(m_filmGrainCharacteristicsCancelFlag, "film_grain_characteristics_cancel_flag"); + + if (!m_filmGrainCharacteristicsCancelFlag) + { + WRITE_CODE(m_filmGrainModelId, 2, "film_grain_model_id"); + WRITE_FLAG(m_separateColourDescriptionPresentFlag, "separate_colour_description_present_flag"); + if (m_separateColourDescriptionPresentFlag) + { + WRITE_CODE(m_filmGrainBitDepthLumaMinus8, 3, "film_grain_bit_depth_luma_minus8"); + WRITE_CODE(m_filmGrainBitDepthChromaMinus8, 3, "film_grain_bit_depth_chroma_minus8"); + WRITE_FLAG(m_filmGrainFullRangeFlag, "film_grain_full_range_flag"); + WRITE_CODE(m_filmGrainColourPrimaries, X265_BYTE, "film_grain_colour_primaries"); + WRITE_CODE(m_filmGrainTransferCharacteristics, X265_BYTE, "film_grain_transfer_characteristics"); + WRITE_CODE(m_filmGrainMatrixCoeffs, X265_BYTE, "film_grain_matrix_coeffs"); + } + WRITE_CODE(m_blendingModeId, 2, "blending_mode_id"); 
+ WRITE_CODE(m_log2ScaleFactor, 4, "log2_scale_factor"); + for (uint8_t c = 0; c < 3; c++) + { + WRITE_FLAG(m_compModel[c].bPresentFlag && m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 > 0 && m_compModel[c].numModelValues > 0, "comp_model_present_flag[c]"); + } + for (uint8_t c = 0; c < 3; c++) + { + if (m_compModel[c].bPresentFlag && m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 > 0 && m_compModel[c].numModelValues > 0) + { + assert(m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 <= 256); + assert(m_compModel[c].numModelValues <= X265_BYTE); + WRITE_CODE(m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 , X265_BYTE, "num_intensity_intervals_minus1[c]"); + WRITE_CODE(m_compModel[c].numModelValues - 1, 3, "num_model_values_minus1[c]"); + for (uint8_t interval = 0; interval < m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1; interval++) + { + WRITE_CODE(m_compModel[c].intensityValues[interval].intensityIntervalLowerBound, X265_BYTE, "intensity_interval_lower_bound[c][i]"); + WRITE_CODE(m_compModel[c].intensityValues[interval].intensityIntervalUpperBound, X265_BYTE, "intensity_interval_upper_bound[c][i]"); + for (uint8_t j = 0; j < m_compModel[c].numModelValues; j++) + { + WRITE_SVLC(m_compModel[c].intensityValues[interval].compModelValue[j],"comp_model_value[c][i]"); + } + } + } + } + WRITE_FLAG(m_filmGrainCharacteristicsPersistenceFlag, "film_grain_characteristics_persistence_flag"); + } + if (m_bitIf->getNumberOfWrittenBits() % X265_BYTE != 0) + { + WRITE_FLAG(1, "payload_bit_equal_to_one"); + while (m_bitIf->getNumberOfWrittenBits() % X265_BYTE != 0) + { + WRITE_FLAG(0, "payload_bit_equal_to_zero"); + } + } + } +}; + static const uint32_t ISO_IEC_11578_LEN = 16; class SEIuserDataUnregistered : public SEI
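Note on the tail of writeSEI(): when the payload does not end on a byte boundary, one 1 bit is written, followed by 0 bits until the bit count is a multiple of 8. A minimal sketch of that padding rule follows, assuming an MSB-first writer. BitWriter is a hypothetical stand-in written for this note; it is not x265's Bitstream class or its WRITE_FLAG / WRITE_CODE macros.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit writer used only to illustrate the padding rule.
class BitWriter
{
public:
    void writeBit(bool bit)
    {
        if (m_bitCount % 8 == 0)
            m_bytes.push_back(0);  // start a new byte
        if (bit)
            m_bytes.back() |= static_cast<uint8_t>(0x80u >> (m_bitCount % 8));
        m_bitCount++;
    }

    // Write the low numBits bits of value, most significant bit first.
    void writeCode(uint32_t value, int numBits)
    {
        for (int i = numBits - 1; i >= 0; i--)
            writeBit(((value >> i) & 1u) != 0);
    }

    // Same rule as the end of writeSEI(): pad only when misaligned,
    // with a single 1 bit followed by 0 bits up to the byte boundary.
    void alignPayload()
    {
        if (m_bitCount % 8 != 0)
        {
            writeBit(true);
            while (m_bitCount % 8 != 0)
                writeBit(false);
        }
    }

    size_t bitCount() const { return m_bitCount; }
    const std::vector<uint8_t>& bytes() const { return m_bytes; }

private:
    std::vector<uint8_t> m_bytes;
    size_t m_bitCount = 0;
};
```

An already aligned payload is left untouched, so no extra byte is emitted in that case.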
x265_3.5.tar.gz/source/encoder/slicetype.cpp
Changed
@@ -45,6 +45,14 @@ namespace { +uint32_t acEnergyVarHist(uint64_t sum_ssd, int shift) +{ + uint32_t sum = (uint32_t)sum_ssd; + uint32_t ssd = (uint32_t)(sum_ssd >> 32); + + return ssd - ((uint64_t)sum * sum >> shift); +} + /* Compute variance to derive AC energy of each block */ inline uint32_t acEnergyVar(Frame *curFrame, uint64_t sum_ssd, int shift, int plane) { @@ -184,7 +192,7 @@ { for (int colNum = 0; colNum < width; colNum++) { - if ((rowNum >= 2) && (colNum >= 2) && (rowNum != height - 2) && (colNum != width - 2)) //Ignoring the border pixels of the picture + if ((rowNum >= 2) && (colNum >= 2) && (rowNum < height - 2) && (colNum < width - 2)) //Ignoring the border pixels of the picture { /* 5x5 Gaussian filter 2 4 5 4 2 @@ -519,7 +527,7 @@ if (param->rc.aqMode == X265_AQ_EDGE) edgeFilter(curFrame, param); - if (param->rc.aqMode == X265_AQ_EDGE && !param->bHistBasedSceneCut && param->recursionSkipMode == EDGE_BASED_RSKIP) + if (param->rc.aqMode == X265_AQ_EDGE && param->recursionSkipMode == EDGE_BASED_RSKIP) { pixel* src = curFrame->m_edgePic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX; primitives.planecopy_pp_shr(src, curFrame->m_fencPic->m_stride, curFrame->m_edgeBitPic, @@ -1050,7 +1058,30 @@ m_countPreLookahead = 0; #endif - memset(m_histogram, 0, sizeof(m_histogram)); + m_accHistDiffRunningAvgCb = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*)); + m_accHistDiffRunningAvgCb[0] = X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + memset(m_accHistDiffRunningAvgCb[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) { + m_accHistDiffRunningAvgCb[w] = m_accHistDiffRunningAvgCb[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT; + } + + m_accHistDiffRunningAvgCr = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*)); + m_accHistDiffRunningAvgCr[0] = 
X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + memset(m_accHistDiffRunningAvgCr[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) { + m_accHistDiffRunningAvgCr[w] = m_accHistDiffRunningAvgCr[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT; + } + + m_accHistDiffRunningAvg = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*)); + m_accHistDiffRunningAvg[0] = X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + memset(m_accHistDiffRunningAvg[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT); + for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) { + m_accHistDiffRunningAvg[w] = m_accHistDiffRunningAvg[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT; + } + + m_resetRunningAvg = true; + + m_segmentCountThreshold = (uint32_t)(((float)((NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT) * 50) / 100) + 0.5); } #if DETAILED_CU_STATS @@ -1098,6 +1129,7 @@ m_pool[i].stopWorkers(); } } + void Lookahead::destroy() { // these two queues will be empty unless the encode was aborted @@ -1377,6 +1409,357 @@ } } +double computeBrightnessIntensity(pixel *inPlane, int width, int height, intptr_t stride) +{ + pixel* rowStart = inPlane; + double count = 0; + + for (int i = 0; i < height; i++) + { + for (int j = 0; j < width; j++) + { + if (rowStart[j] > BRIGHTNESS_THRESHOLD) + count++; + } + rowStart += stride; + } + + /* Returns the brightness percentage of the input plane */ + return (count / (width * height)) * 100; +} + +double computeEdgeIntensity(pixel *inPlane, int width, int height, intptr_t stride) +{ + pixel* rowStart = inPlane; + double count = 0; + + for (int i = 0; i < height; i++) + { + for (int j = 0; j < width; j++) + { + if (rowStart[j] > 0) + count++; + } + rowStart += stride; + } + + /* Returns the edge percentage of the input plane */ + return (count / (width * height)) * 
100; +} + +uint32_t LookaheadTLD::calcVariance(pixel* inpSrc, intptr_t stride, intptr_t blockOffset, uint32_t plane) +{ + pixel* src = inpSrc + blockOffset; + + uint32_t var; + if (!plane) + var = acEnergyVarHist(primitives.cu[BLOCK_8x8].var(src, stride), 6); + else + var = acEnergyVarHist(primitives.cu[BLOCK_4x4].var(src, stride), 4); + + x265_emms(); + return var; +} + +/* +** Compute Block and Picture Variance, Block Mean for all blocks in the picture +*/ +void LookaheadTLD::computePictureStatistics(Frame *curFrame) +{ + int maxCol = curFrame->m_fencPic->m_picWidth; + int maxRow = curFrame->m_fencPic->m_picHeight; + intptr_t inpStride = curFrame->m_fencPic->m_stride; + + // Variance + uint64_t picTotVariance = 0; + uint32_t variance; + + uint64_t blockXY = 0; + pixel* src = curFrame->m_fencPic->m_picOrg[0]; + + for (int blockY = 0; blockY < maxRow; blockY += 8) + { + uint64_t rowVariance = 0; + for (int blockX = 0; blockX < maxCol; blockX += 8) + { + intptr_t blockOffsetLuma = blockX + (blockY * inpStride); + + variance = calcVariance( + src, + inpStride, + blockOffsetLuma, 0); + + rowVariance += variance; + blockXY++; + } + picTotVariance += (uint16_t)(rowVariance / maxCol); + } + + curFrame->m_lowres.picAvgVariance = (uint16_t)(picTotVariance / maxRow); + + // Collect chroma variance + int hShift = curFrame->m_fencPic->m_hChromaShift; + int vShift = curFrame->m_fencPic->m_vChromaShift; + + int maxColChroma = curFrame->m_fencPic->m_picWidth >> hShift; + int maxRowChroma = curFrame->m_fencPic->m_picHeight >> vShift; + intptr_t cStride = curFrame->m_fencPic->m_strideC; + + pixel* srcCb = curFrame->m_fencPic->m_picOrg[1]; + + picTotVariance = 0; + for (int blockY = 0; blockY < maxRowChroma; blockY += 4) + { + uint64_t rowVariance = 0; + for (int blockX = 0; blockX < maxColChroma; blockX += 4) + { + intptr_t blockOffsetChroma = blockX + blockY * cStride; + + variance = calcVariance( + srcCb, + cStride, + blockOffsetChroma, 1); + + rowVariance += variance; + blockXY++; + } 
+ picTotVariance += (uint16_t)(rowVariance / maxColChroma); + } + + curFrame->m_lowres.picAvgVarianceCb = (uint16_t)(picTotVariance / maxRowChroma); + + + pixel* srcCr = curFrame->m_fencPic->m_picOrg[2]; + + picTotVariance = 0;
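Note on acEnergyVarHist(): the function takes a 64-bit value with the pixel sum in the low 32 bits and the sum of squares in the high 32 bits, and returns ssd - sum² / N with N = 1 << shift (shift 6 for 8x8 luma blocks, shift 4 for 4x4 chroma blocks). That is N times the block variance. The sketch below reproduces the arithmetic on 8-bit samples. sumAndSsd is a hypothetical scalar stand-in for the primitives.cu[...].var primitive, and blockAcEnergy restates the same formula under a different name.

```cpp
#include <cstdint>

// Hypothetical scalar stand-in for the var primitive: returns the pixel sum
// in the low 32 bits and the sum of squares in the high 32 bits.
uint64_t sumAndSsd(const uint8_t* src, int stride, int size)
{
    uint32_t sum = 0;
    uint32_t ssd = 0;
    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            const uint32_t v = src[y * stride + x];
            sum += v;
            ssd += v * v;
        }
    }
    return sum + (static_cast<uint64_t>(ssd) << 32);
}

// Same arithmetic as acEnergyVarHist() in the diff:
// ssd - (sum * sum) / N, where N = 1 << shift is the pixel count.
uint32_t blockAcEnergy(uint64_t sum_ssd, int shift)
{
    const uint32_t sum = static_cast<uint32_t>(sum_ssd);
    const uint32_t ssd = static_cast<uint32_t>(sum_ssd >> 32);
    return ssd - static_cast<uint32_t>((static_cast<uint64_t>(sum) * sum) >> shift);
}
```

A flat block yields 0, and a block split evenly between two values two steps apart yields N (here 64), since its per-pixel variance is 1.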
x265_3.5.tar.gz/source/encoder/slicetype.h
Changed
@@ -44,6 +44,24 @@ #define EDGE_INCLINATION 45 #define TEMPORAL_SCENECUT_THRESHOLD 50 +#define X265_ABS(a) (((a) < 0) ? (-(a)) : (a)) + +#define PICTURE_DIFF_VARIANCE_TH 390 +#define PICTURE_VARIANCE_TH 1500 +#define LOW_VAR_SCENE_CHANGE_TH 2250 +#define HIGH_VAR_SCENE_CHANGE_TH 3500 + +#define PICTURE_DIFF_VARIANCE_CHROMA_TH 10 +#define PICTURE_VARIANCE_CHROMA_TH 20 +#define LOW_VAR_SCENE_CHANGE_CHROMA_TH 2250/4 +#define HIGH_VAR_SCENE_CHANGE_CHROMA_TH 3500/4 + +#define FLASH_TH 1.5 +#define FADE_TH 4 +#define INTENSITY_CHANGE_TH 4 + +#define NUM64x64INPIC(w,h) ((w*h)>> (MAX_LOG2_CU_SIZE<<1)) + #if HIGH_BIT_DEPTH #define EDGE_THRESHOLD 1023.0 #else @@ -93,7 +111,29 @@ ~LookaheadTLD() { X265_FREE(wbuffer[0]); } + void collectPictureStatistics(Frame *curFrame); + void computeIntensityHistogramBinsLuma(Frame *curFrame, uint64_t *sumAvgIntensityTotalSegmentsLuma); + + void computeIntensityHistogramBinsChroma( + Frame *curFrame, + uint64_t *sumAverageIntensityCb, + uint64_t *sumAverageIntensityCr); + + void calculateHistogram( + pixel *inputSrc, + uint32_t inputWidth, + uint32_t inputHeight, + intptr_t stride, + uint8_t dsFactor, + uint32_t *histogram, + uint64_t *sum); + + void computePictureStatistics(Frame *curFrame); + + uint32_t calcVariance(pixel* src, intptr_t stride, intptr_t blockOffset, uint32_t plane); + void calcAdaptiveQuantFrame(Frame *curFrame, x265_param* param); + void calcFrameSegment(Frame *curFrame); void lowresIntraEstimate(Lowres& fenc, uint32_t qgSize); void weightsAnalyse(Lowres& fenc, Lowres& ref); @@ -124,7 +164,6 @@ /* pre-lookahead */ int m_fullQueueSize; - int m_histogram[X265_BFRAME_MAX + 1]; int m_lastKeyframe; int m_8x8Width; int m_8x8Height; @@ -153,6 +192,14 @@ bool m_isFadeIn; uint64_t m_fadeCount; int m_fadeStart; + + uint32_t **m_accHistDiffRunningAvgCb; + uint32_t **m_accHistDiffRunningAvgCr; + uint32_t **m_accHistDiffRunningAvg; + + bool m_resetRunningAvg; + uint32_t m_segmentCountThreshold; + Lookahead(x265_param *param, ThreadPool 
*pool); #if DETAILED_CU_STATS int64_t m_slicetypeDecideElapsedTime; @@ -174,6 +221,7 @@ void getEstimatedPictureCost(Frame *pic); void setLookaheadQueue(); + int findSliceType(int poc); protected: @@ -184,6 +232,10 @@ /* called by slicetypeAnalyse() to make slice decisions */ bool scenecut(Lowres **frames, int p0, int p1, bool bRealScenecut, int numFrames); bool scenecutInternal(Lowres **frames, int p0, int p1, bool bRealScenecut); + + bool histBasedScenecut(Lowres **frames, int p0, int p1, int numFrames); + bool detectHistBasedSceneChange(Lowres **frames, int p0, int p1, int p2); + void slicetypePath(Lowres **frames, int length, char(*best_paths)[X265_LOOKAHEAD_MAX + 1]); int64_t slicetypePathCost(Lowres **frames, char *path, int64_t threshold); int64_t vbvFrameCost(Lowres **frames, int p0, int p1, int b);
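Note on the threshold macros added in this header: LOW_VAR_SCENE_CHANGE_CHROMA_TH and HIGH_VAR_SCENE_CHANGE_CHROMA_TH expand to the unparenthesized expressions 2250/4 and 3500/4. A plain comparison such as x > TH behaves as expected, but an operator of equal or higher precedence next to the macro changes the result. Whether that matters depends on how the macros are used, which this excerpt does not show. The sketch below illustrates the difference with hypothetical macro and function names.

```cpp
// Hypothetical copies of the macro body, with and without parentheses.
#define TH_UNPARENTHESIZED 2250/4
#define TH_PARENTHESIZED   (2250/4)

// Expands to value / 2250 / 4, which groups as (value / 2250) / 4.
int divideByUnparenthesized(int value)
{
    return value / TH_UNPARENTHESIZED;
}

// Expands to value / (2250/4), i.e. value / 562 in integer arithmetic.
int divideByParenthesized(int value)
{
    return value / TH_PARENTHESIZED;
}

// A comparison is unaffected, because '/' binds tighter than '>'.
bool exceedsUnparenthesized(int value)
{
    return value > TH_UNPARENTHESIZED;
}
```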
x265_3.5.tar.gz/source/test/CMakeLists.txt
Changed
@@ -23,15 +23,13 @@ # add ARM assembly files if(ARM OR CROSS_COMPILE_ARM) - if(NOT ARM64) - enable_language(ASM) - set(NASM_SRC checkasm-arm.S) - add_custom_command( - OUTPUT checkasm-arm.obj - COMMAND ${CMAKE_CXX_COMPILER} - ARGS ${NASM_FLAGS} ${CMAKE_CURRENT_SOURCE_DIR}/checkasm-arm.S -o checkasm-arm.obj - DEPENDS checkasm-arm.S) - endif() + enable_language(ASM) + set(NASM_SRC checkasm-arm.S) + add_custom_command( + OUTPUT checkasm-arm.obj + COMMAND ${CMAKE_CXX_COMPILER} + ARGS ${NASM_FLAGS} ${CMAKE_CURRENT_SOURCE_DIR}/checkasm-arm.S -o checkasm-arm.obj + DEPENDS checkasm-arm.S) endif(ARM OR CROSS_COMPILE_ARM) # add PowerPC assembly files
x265_3.5.tar.gz/source/test/pixelharness.cpp
Changed
@@ -406,6 +406,32 @@ return true; } +bool PixelHarness::check_downscaleluma_t(downscaleluma_t ref, downscaleluma_t opt) +{ + ALIGN_VAR_16(pixel, ref_destf[32 * 32]); + ALIGN_VAR_16(pixel, opt_destf[32 * 32]); + + intptr_t src_stride = 64; + intptr_t dst_stride = 32; + int bx = 32; + int by = 32; + int j = 0; + for (int i = 0; i < ITERS; i++) + { + int index = i % TEST_CASES; + ref(pixel_test_buff[index] + j, ref_destf, src_stride, dst_stride, bx, by); + checked(opt, pixel_test_buff[index] + j, opt_destf, src_stride, dst_stride, bx, by); + + if (memcmp(ref_destf, opt_destf, 32 * 32 * sizeof(pixel))) + return false; + + reportfail(); + j += INCR; + } + + return true; +} + bool PixelHarness::check_cpy2Dto1D_shl_t(cpy2Dto1D_shl_t ref, cpy2Dto1D_shl_t opt) { ALIGN_VAR_16(int16_t, ref_dest[64 * 64]); @@ -2793,6 +2819,15 @@ } } + if (opt.frameSubSampleLuma) + { + if (!check_downscaleluma_t(ref.frameSubSampleLuma, opt.frameSubSampleLuma)) + { + printf("SubSample Luma failed!\n"); + return false; + } + } + if (opt.scale1D_128to64[NONALIGNED]) { if (!check_scale1D_pp(ref.scale1D_128to64[NONALIGNED], opt.scale1D_128to64[NONALIGNED])) @@ -3492,6 +3527,12 @@ REPORT_SPEEDUP(opt.frameInitLowres, ref.frameInitLowres, pbuf2, pbuf1, pbuf2, pbuf3, pbuf4, 64, 64, 64, 64); } + if (opt.frameSubSampleLuma) + { + HEADER0("downscaleluma"); + REPORT_SPEEDUP(opt.frameSubSampleLuma, ref.frameSubSampleLuma, pbuf2, pbuf1, 64, 64, 64, 64); + } + if (opt.scale1D_128to64[NONALIGNED]) { HEADER0("scale1D_128to64");
x265_3.5.tar.gz/source/test/pixelharness.h
Changed
@@ -138,6 +138,7 @@ bool check_integral_inith(integralh_t ref, integralh_t opt); bool check_ssimDist(ssimDistortion_t ref, ssimDistortion_t opt); bool check_normFact(normFactor_t ref, normFactor_t opt, int block); + bool check_downscaleluma_t(downscaleluma_t ref, downscaleluma_t opt); public:
x265_3.5.tar.gz/source/test/regression-tests.txt
Changed
@@ -18,12 +18,12 @@ BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao -BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --bitrate 7000 --limit-modes::--preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --bitrate 7000 --limit-modes +BasketballDrive_1920x1080_50.y4m,--preset medium --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --bitrate 7000 --limit-modes::--preset medium --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --bitrate 7000 --limit-modes BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4 -BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --bitrate 7000 --limit-tu 0::--preset slower --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --bitrate 7000 --limit-tu 0 +BasketballDrive_1920x1080_50.y4m,--preset slower --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --bitrate 7000 --limit-tu 0::--preset slower --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --bitrate 7000 --limit-tu 0 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3 -BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2::--preset veryslow --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --crf 18 
--tskip-fast --limit-tu 2 +BasketballDrive_1920x1080_50.y4m,--preset veryslow --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2::--preset veryslow --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit" Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit" Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop @@ -53,7 +53,7 @@ DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3 -DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1::--preset fast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1 +DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1::--preset fast --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd @@ -160,11 +160,14 @@ Kimono1_1920x1080_24_400.yuv,--preset superfast --qp 28 --zones 0,139,q=32 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000 
--vbv-bufsize 15000 --vbv-maxrate 12000 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 +sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --no-traditional-scenecut sintel_trailer_2k_1920x1080_24.yuv, --preset ultrafast --hist-scenecut --hist-threshold 0.02 crowd_run_1920x1080_50.yuv, --preset faster --ctu 32 --rskip 2 --rskip-edge-threshold 5 crowd_run_1920x1080_50.yuv, --preset fast --ctu 64 --rskip 2 --rskip-edge-threshold 5 --aq-mode 4 crowd_run_1920x1080_50.yuv, --preset slow --ctu 32 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1 crowd_run_1920x1080_50.yuv, --preset slower --ctu 16 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1 --aq-mode 4 +crowd_run_1920x1080_50.yuv, --preset ultrafast --video-signal-type-preset BT2100_PQ_YCC:BT2100x108n0005 +crowd_run_1920x1080_50.yuv, --preset ultrafast --eob --eos # Main12 intraCost overflow bug test 720p50_parkrun_ter.y4m,--preset medium @@ -182,14 +185,18 @@ #scaled save/load test crowd_run_1080p50.y4m,--preset ultrafast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_2160p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000 -crowd_run_1080p50.y4m,--preset superfast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 -crowd_run_1080p50.y4m,--preset fast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --no-cutree 
--analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --scale-factor 2 --qp 18 +crowd_run_1080p50.y4m,--preset superfast --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 +crowd_run_1080p50.y4m,--preset fast --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --scale-factor 2 --qp 18 crowd_run_1080p50.y4m,--preset medium --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3 -RaceHorses_416x240_30.y4m,--preset slow --no-cutree --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22 --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m, --preset slow --no-cutree --ctu 32 --analysis-load x265_analysis.dat --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,--preset slow --no-cutree --ctu 64 --analysis-load x265_analysis_2.dat 
--analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2 +RaceHorses_416x240_30.y4m,--preset slow --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22 --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m, --preset slow --ctu 32 --analysis-load x265_analysis.dat --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,--preset slow --ctu 64 --analysis-load x265_analysis_2.dat --analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2 ElFunete_960x540_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune psnr --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500 --analysis-save-reuse-level 10 --analysis-save elfuente_960x540.dat --scale-factor 2::ElFunete_1920x1080_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune psnr --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500 --analysis-load-reuse-level 10 
--analysis-save-reuse-level 10 --analysis-save elfuente_1920x1080.dat --limit-tu 0 --scale-factor 2 --analysis-load elfuente_960x540.dat --refine-intra 4 --refine-inter 2::ElFuente_3840x2160_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune=psnr --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000 --analysis-load-reuse-level 10 --limit-tu 0 --scale-factor 2 --analysis-load elfuente_1920x1080.dat --refine-intra 4 --refine-inter 2 #save/load with ctu distortion refinement CrowdRun_1920x1080_50_10bit_422.yuv,--no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --refine-ctu-distortion 1 --bitrate 7000::--no-cutree --analysis-load x265_analysis.dat --refine-ctu-distortion 1 --bitrate 7000 --analysis-load-reuse-level 5 #segment encoding BasketballDrive_1920x1080_50.y4m, --preset ultrafast --no-open-gop --chunk-start 100 --chunk-end 200 +#Test FG SEI message addition +#OldTownCross_1920x1080_50_10bit_422.yuv,--preset slower --tune grain --film-grain "OldTownCross_1920x1080_50_10bit_422.bin" +#RaceHorses_416x240_30_10bit.yuv,--preset ultrafast --signhide --colormatrix bt709 --film-grain "RaceHorses_416x240_30_10bit.bin" + # vim: tw=200
View file
x265_3.5.tar.gz/source/test/save-load-tests.txt
Changed
@@ -12,10 +12,10 @@ # not auto-detected. crowd_run_1080p50.y4m, --preset ultrafast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_2160p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000 crowd_run_540p50.y4m, --preset ultrafast --no-cutree --analysis-save x265_analysis.dat --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_1080p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000 -crowd_run_1080p50.y4m, --preset superfast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 -crowd_run_1080p50.y4m, --preset fast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --scale-factor 2 --qp 18 -crowd_run_1080p50.y4m, --preset medium --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 
10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3 +crowd_run_1080p50.y4m, --preset superfast --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 +crowd_run_1080p50.y4m, --preset fast --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --analysis-load x265_analysis.dat --analysis-load-reuse-level 5 --scale-factor 2 --qp 18 +crowd_run_1080p50.y4m, --preset medium --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m, --preset medium --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m, --preset medium --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3 RaceHorses_416x240_30.y4m, --preset slow --no-cutree --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22 --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m, --preset slow --no-cutree --ctu 32 --analysis-load x265_analysis.dat --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m, --preset slow --no-cutree --ctu 64 --analysis-load x265_analysis_2.dat --analysis-load-reuse-level 10 
--scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2 -crowd_run_540p50.y4m, --preset veryslow --no-cutree --analysis-save x265_analysis_540.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m, --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m, --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m, --preset veryslow --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m, --preset veryslow --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000 +crowd_run_540p50.y4m, --preset veryslow --analysis-save x265_analysis_540.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m, --preset veryslow --analysis-save x265_analysis_1080.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m, --preset veryslow --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m, 
--preset veryslow --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m, --preset veryslow --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000 crowd_run_540p50.y4m, --preset medium --no-cutree --analysis-save x265_analysis_540.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m, --preset medium --no-cutree --analysis-save x265_analysis_1080.dat --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m, --preset medium --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000 News-4k.y4m, --preset medium --analysis-save x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000::News-4k.y4m, --analysis-load x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000
View file
x265_3.5.tar.gz/source/test/smoke-tests.txt
Changed
@@ -23,3 +23,7 @@
 # Main12 intraCost overflow bug test
 720p50_parkrun_ter.y4m,--preset medium
 720p50_parkrun_ter.y4m,--preset=fast --hevc-aq --no-cutree
+# Test FG SEI message addition
+# CrowdRun_1920x1080_50_10bit_444.yuv,--preset=ultrafast --weightp --keyint -1 --film-grain "CrowdRun_1920x1080_50_10bit_444.bin"
+# DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset=veryfast --min-cu 16 --film-grain "DucksAndLegs_1920x1080_60_10bit_422.bin"
+# NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset=superfast --bitrate 10000 --sao --limit-sao --cll --max-cll "1000,400" --film-grain "NebutaFestival_2560x1600_60_10bit_crop.bin"
View file
x265_3.5.tar.gz/source/test/testbench.cpp
Changed
@@ -208,15 +208,8 @@
     EncoderPrimitives asmprim;
     memset(&asmprim, 0, sizeof(asmprim));
-    setupAssemblyPrimitives(asmprim, test_archi.flag);
-
-#if X265_ARCH_ARM64
-    /* Temporary workaround because luma_vsp assembly primitive has not been completed
-     * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
-     * Otherwise, segment fault occurs. */
-    setupAliasCPrimitives(cprim, asmprim, test_archi.flag);
-#endif
+    setupAssemblyPrimitives(asmprim, test_archi.flag);
     setupAliasPrimitives(asmprim);
     memcpy(&primitives, &asmprim, sizeof(EncoderPrimitives));
     for (size_t h = 0; h < sizeof(harness) / sizeof(TestHarness*); h++)
@@ -239,14 +232,8 @@
 #if X265_ARCH_X86
     setupInstrinsicPrimitives(optprim, cpuid);
 #endif
-    setupAssemblyPrimitives(optprim, cpuid);
-#if X265_ARCH_ARM64
-    /* Temporary workaround because luma_vsp assembly primitive has not been completed
-     * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
-     * Otherwise, segment fault occurs. */
-    setupAliasCPrimitives(cprim, optprim, cpuid);
-#endif
+    setupAssemblyPrimitives(optprim, cpuid);
     /* Note that we do not setup aliases for performance tests, that would be
      * redundant. The testbench only verifies they are correctly aliased */
View file
x265_3.5.tar.gz/source/test/testharness.h
Changed
@@ -73,7 +73,7 @@
 #include <x86intrin.h>
 #elif ( !defined(__APPLE__) && defined (__GNUC__) && defined(__ARM_NEON__))
 #include <arm_neon.h>
-#elif defined(__GNUC__) && (!defined(__clang__) || __clang_major__ < 4)
+#else
 /* fallback for older GCC/MinGW */
 static inline uint32_t __rdtsc(void)
 {
@@ -82,15 +82,13 @@
 #if X265_ARCH_X86
     asm volatile("rdtsc" : "=a" (a) ::"edx");
 #elif X265_ARCH_ARM
-#if X265_ARCH_ARM64
-    asm volatile("mrs %0, cntvct_el0" : "=r"(a));
-#else
     // TOD-DO: verify following inline asm to get cpu Timestamp Counter for ARM arch
     // asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(a));
     // TO-DO: replace clock() function with appropriate ARM cpu instructions
     a = clock();
-#endif
+#elif X265_ARCH_ARM64
+    asm volatile("mrs %0, cntvct_el0" : "=r"(a));
 #endif
     return a;
 }
@@ -140,7 +138,7 @@
  * needs an explicit asm check because it only sometimes crashes in normal use. */
 intptr_t PFX(checkasm_call)(intptr_t (*func)(), int *ok, ...);
 float PFX(checkasm_call_float)(float (*func)(), int *ok, ...);
-#elif X265_ARCH_ARM == 0
+#elif (X265_ARCH_ARM == 0 && X265_ARCH_ARM64 == 0)
 #define PFX(stack_pagealign)(func, align) func()
 #endif
View file
x265_3.5.tar.gz/source/x265.h
Changed
@@ -747,6 +747,13 @@ static const x265_vmaf_commondata vcd[] = { { NULL, (char *)"/usr/local/share/model/vmaf_v0.6.1.pkl", NULL, NULL, 0, 0, 0, 0, 0, 0, 0, NULL, 0, 1, 0 } }; + +typedef enum +{ + X265_SHARE_MODE_FILE = 0, + X265_SHARE_MODE_SHAREDMEM +}X265_DATA_SHARE_MODES; + /* x265 input parameters * * For version safety you may use x265_param_alloc/free() to manage the @@ -1433,10 +1440,10 @@ double rfConstantMin; /* Multi-pass encoding */ - /* Enable writing the stats in a multi-pass encode to the stat output file */ + /* Enable writing the stats in a multi-pass encode to the stat output file/memory */ int bStatWrite; - /* Enable loading data from the stat input file in a multi pass encode */ + /* Enable loading data from the stat input file/memory in a multi pass encode */ int bStatRead; /* Filename of the 2pass output/input stats file, if unspecified the @@ -1489,6 +1496,24 @@ /* internally enable if tune grain is set */ int bEnableConstVbv; + /* enable SBRC mode for each sequence */ + int frameSegment; + + /* if only the focused frames would be re-encode or not */ + int bEncFocusedFramesOnly; + + /* Share the data with stats file or shared memory. + It must be one of the X265_DATA_SHARE_MODES enum values + Available if the bStatWrite or bStatRead is true. + Use stats file by default. + The stats file mode would be used among the encoders running in sequence. + The shared memory mode could only be used among the encoders running in parallel. + Now only the cutree data could be shared among shared memory. More data would be support in the future.*/ + int dataShareMode; + + /* Unique shared memory name. Required if the shared memory mode enabled. NULL by default */ + const char* sharedMemName; + } rc; /*== Video Usability Information ==*/ @@ -1869,12 +1894,6 @@ /* The offset by which QP is incremented for non-referenced inter-frames after a scenecut when bEnableSceneCutAwareQp is 1 or 3. 
*/ double fwdNonRefQpDelta; - /* A genuine threshold used for histogram based scene cut detection. - * This threshold determines whether a frame is a scenecut or not - * when compared against the edge and chroma histogram sad values. - * Default 0.03. Range: Real number in the interval (0,1). */ - double edgeTransitionThreshold; - /* Enables histogram based scenecut detection algorithm to detect scenecuts. Default disabled */ int bHistBasedSceneCut; @@ -1948,6 +1967,28 @@ /* The offset by which QP is incremented for non-referenced inter-frames before a scenecut when bEnableSceneCutAwareQp is 2 or 3. */ double bwdNonRefQpDelta; + + /* Specify combinations of color primaries, transfer characteristics, color matrix, + * range of luma and chroma signals, and chroma sample location. This has higher + * precedence than individual VUI parameters. If any individual VUI option is specified + * together with this, which changes the values set corresponding to the system-id + * or color-volume, it will be discarded. */ + const char* videoSignalTypePreset; + + /* Flag indicating whether the encoder should emit an End of Bitstream + * NAL at the end of bitstream. Default false */ + int bEnableEndOfBitstream; + + /* Flag indicating whether the encoder should emit an End of Sequence + * NAL at the end of every Coded Video Sequence. Default false */ + int bEnableEndOfSequence; + + /* Film Grain Characteristic file */ + char* filmGrain; + + /*Motion compensated temporal filter*/ + int bEnableTemporalFilter; + double temporalFilterStrength; } x265_param; /* x265_param_alloc:
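The header comments above state two constraints on the new rate-control fields: `dataShareMode` is only consulted when `bStatWrite` or `bStatRead` is set, and shared-memory mode requires a name in `sharedMemName`. A minimal sketch of that check follows. The struct `RcShareSettings`, the enum `DataShareMode`, and the function `shareSettingsValid` are hypothetical stand-ins written for illustration and are not part of x265; treating an empty string like a missing name is also an assumption.

```cpp
// Mirrors the two values of X265_DATA_SHARE_MODES added in this revision.
enum DataShareMode
{
    SHARE_MODE_FILE = 0,
    SHARE_MODE_SHAREDMEM
};

// Hypothetical stand-in for the relevant x265_param::rc fields.
struct RcShareSettings
{
    int bStatWrite;
    int bStatRead;
    int dataShareMode;
    const char* sharedMemName;
};

// Returns true when the settings satisfy the constraints described in the header comments.
bool shareSettingsValid(const RcShareSettings& rc)
{
    // The share mode only matters for multi-pass stats reading or writing.
    if (!rc.bStatWrite && !rc.bStatRead)
        return true;

    // Stats-file mode is the default and needs no extra field.
    if (rc.dataShareMode == SHARE_MODE_FILE)
        return true;

    // Shared-memory mode needs a name (assumption: an empty name counts as missing).
    if (rc.dataShareMode == SHARE_MODE_SHAREDMEM)
        return rc.sharedMemName != nullptr && rc.sharedMemName[0] != '\0';

    // Anything else is not one of the enum values.
    return false;
}
```

Whether the encoder itself rejects such a configuration, or how it reports it, is not visible in this diff.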
View file
x265_3.5.tar.gz/source/x265cli.cpp
Changed
@@ -28,8 +28,8 @@ #include "x265cli.h" #include "svt.h" -#define START_CODE 0x00000001 -#define START_CODE_BYTES 4 +#define START_CODE 0x00000001 +#define START_CODE_BYTES 4 #ifdef __cplusplus namespace X265_NS { @@ -174,7 +174,6 @@ H1(" --scenecut-bias <0..100.0> Bias for scenecut detection. Default %.2f\n", param->scenecutBias); H0(" --hist-scenecut Enables histogram based scene-cut detection using histogram based algorithm.\n"); H0(" --no-hist-scenecut Disables histogram based scene-cut detection using histogram based algorithm.\n"); - H1(" --hist-threshold <0.0..1.0> Luma Edge histogram's Normalized SAD threshold for histogram based scenecut detection Default %.2f\n", param->edgeTransitionThreshold); H0(" --no-fades Enable detection and handling of fade-in regions. Default %s\n", OPT(param->bEnableFades)); H1(" --scenecut-aware-qp <0..3> Enable increasing QP for frames inside the scenecut window around scenecut. Default %s\n", OPT(param->bEnableSceneCutAwareQp)); H1(" 0 - Disabled\n"); @@ -262,6 +261,7 @@ H0(" --aq-strength <float> Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength); H0(" --qp-adaptation-range <float> Delta QP range by QP adaptation based on a psycho-visual model (1.0 to 6.0). Default %.2f\n", param->rc.qpAdaptationRange); H0(" --no-aq-motion Block level QP adaptation based on the relative motion between the block and the frame. Default %s\n", OPT(param->bAQMotion)); + H1(" --no-sbrc Enables the segment based rate control, using its scene statistics. Default %s\n", OPT(param->rc.frameSegment)); H0(" --qg-size <int> Specifies the size of the quantization group (64, 32, 16, 8). Default %d\n", param->rc.qgSize); H0(" --no-cutree Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree)); H0(" --no-rc-grain Enable ratecontrol mode to handle grains specifically. turned on with tune grain. 
Default %s\n", OPT(param->rc.bEnableGrain)); @@ -314,6 +314,30 @@ H0(" --master-display <string> SMPTE ST 2086 master display color volume info SEI (HDR)\n"); H0(" format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n"); H0(" --max-cll <string> Specify content light level info SEI as \"cll,fall\" (HDR).\n"); + H0(" --video-signal-type-preset <string> Specify combinations of color primaries, transfer characteristics, color matrix, range of luma and chroma signals, and chroma sample location\n"); + H0(" format: <system-id>:<color-volume>\n"); + H0(" This has higher precedence than individual VUI parameters. If any individual VUI option is specified together with this,\n"); + H0(" which changes the values set corresponding to the system-id or color-volume, it will be discarded.\n"); + H0(" The color-volume can be used only with the system-id options BT2100_PQ_YCC, BT2100_PQ_ICTCP, and BT2100_PQ_RGB.\n"); + H0(" system-id options and their corresponding values:\n"); + H0(" BT601_525: --colorprim smpte170m --transfer smpte170m --colormatrix smpte170m --range limited --chromaloc 0\n"); + H0(" BT601_626: --colorprim bt470bg --transfer smpte170m --colormatrix bt470bg --range limited --chromaloc 0\n"); + H0(" BT709_YCC: --colorprim bt709 --transfer bt709 --colormatrix bt709 --range limited --chromaloc 0\n"); + H0(" BT709_RGB: --colorprim bt709 --transfer bt709 --colormatrix gbr --range limited\n"); + H0(" BT2020_YCC_NCL: --colorprim bt2020 --transfer bt2020-10 --colormatrix bt709 --range limited --chromaloc 2\n"); + H0(" BT2020_RGB: --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --range limited\n"); + H0(" BT2100_PQ_YCC: --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --range limited --chromaloc 2\n"); + H0(" BT2100_PQ_ICTCP: --colorprim bt2020 --transfer smpte2084 --colormatrix ictcp --range limited --chromaloc 2\n"); + H0(" BT2100_PQ_RGB: --colorprim bt2020 --transfer smpte2084 --colormatrix gbr --range limited\n"); + H0(" BT2100_HLG_YCC: 
--colorprim bt2020 --transfer arib-std-b67 --colormatrix bt2020nc --range limited --chromaloc 2\n"); + H0(" BT2100_HLG_RGB: --colorprim bt2020 --transfer arib-std-b67 --colormatrix gbr --range limited\n"); + H0(" FR709_RGB: --colorprim bt709 --transfer bt709 --colormatrix gbr --range full\n"); + H0(" FR2020_RGB: --colorprim bt2020 --transfer bt2020-10 --colormatrix gbr --range full\n"); + H0(" FRP3D65_YCC: --colorprim smpte432 --transfer bt709 --colormatrix smpte170m --range full --chromaloc 1\n"); + H0(" color-volume options and their corresponding values:\n"); + H0(" P3D65x1000n0005: --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,5)\n"); + H0(" P3D65x4000n005: --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(40000000,50)\n"); + H0(" BT2100x108n0005: --master-display G(8500,39850)B(6550,2300)R(34000,146000)WP(15635,16450)L(10000000,1)\n"); H0(" --no-cll Emit content light level info SEI. Default %s\n", OPT(param->bEmitCLL)); H0(" --no-hdr10 Control dumping of HDR10 SEI packet. If max-cll or master-display has non-zero values, this is enabled. Default %s\n", OPT(param->bEmitHDR10SEI)); H0(" --no-hdr-opt Add luma and chroma offsets for HDR/WCG content. Default %s. Now deprecated.\n", OPT(param->bHDROpt)); @@ -327,6 +351,8 @@ H0(" --no-idr-recovery-sei Emit recovery point infor SEI at each IDR frame \n"); H0(" --no-temporal-layers Enable a temporal sublayer for unreferenced B frames. Default %s\n", OPT(param->bEnableTemporalSubLayers)); H0(" --no-aud Emit access unit delimiters at the start of each access unit. Default %s\n", OPT(param->bEnableAccessUnitDelimiters)); + H0(" --no-eob Emit end of bitstream nal unit at the end of the bitstream. Default %s\n", OPT(param->bEnableEndOfBitstream)); + H0(" --no-eos Emit end of sequence nal unit at the end of every coded video sequence. 
Default %s\n", OPT(param->bEnableEndOfSequence)); H1(" --hash <integer> Decoded Picture Hash SEI 0: disabled, 1: MD5, 2: CRC, 3: Checksum. Default %d\n", param->decodedPictureHashSEI); H0(" --atc-sei <integer> Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled\n"); H0(" --pic-struct <integer> Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.\n"); @@ -344,6 +370,7 @@ H0(" --lowpass-dct Use low-pass subband dct approximation. Default %s\n", OPT(param->bLowPassDct)); H0(" --no-frame-dup Enable Frame duplication. Default %s\n", OPT(param->bEnableFrameDuplication)); H0(" --dup-threshold <integer> PSNR threshold for Frame duplication. Default %d\n", param->dupThreshold); + H0(" --no-mcstf Enable GOP based temporal filter. Default %d\n", param->bEnableTemporalFilter); #ifdef SVT_HEVC H0(" --nosvt Enable SVT HEVC encoder %s\n", OPT(param->bEnableSvtHevc)); H0(" --no-svt-hme Enable Hierarchial motion estimation(HME) in SVT HEVC encoder \n"); @@ -365,6 +392,9 @@ H1(" 2 - unable to open encoder\n"); H1(" 3 - unable to generate stream headers\n"); H1(" 4 - encoder abort\n"); + H0("\nSEI Message Options\n"); + H0(" --film-grain <filename> File containing Film Grain Characteristics to be written as a SEI Message\n"); + #undef OPT #undef H0 #undef H1 @@ -1010,57 +1040,57 @@ return 1; } - /* Parse the RPU file and extract the RPU corresponding to the current picture - * and fill the rpu field of the input picture */ - int CLIOptions::rpuParser(x265_picture * pic) - { - uint8_t byteVal; - uint32_t code = 0; - int bytesRead = 0; - pic->rpu.payloadSize = 0; - - if (!pic->pts) - { - while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu)) - code = (code << 8) | byteVal; - - if (code != START_CODE) - { - x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in 
POC %d\n", pic->pts); - return 1; - } - } - - bytesRead = 0; - while (fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu)) - { - code = (code << 8) | byteVal; - if (bytesRead++ < 3) - continue; - if (bytesRead >= 1024) - { - x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts); - return 1; - } - - if (code != START_CODE) - pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF; - else - return 0; - } - - int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize); - int bytesLeft = bytesRead - pic->rpu.payloadSize; - code = (code << ShiftBytes * 8); - for (int i = 0; i < bytesLeft; i++) - { - pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF; - code = (code << 8); - } - if (!pic->rpu.payloadSize) - x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts); - return 0; - } + /* Parse the RPU file and extract the RPU corresponding to the current picture + * and fill the rpu field of the input picture */ + int CLIOptions::rpuParser(x265_picture * pic) + { + uint8_t byteVal; + uint32_t code = 0; + int bytesRead = 0; + pic->rpu.payloadSize = 0; + + if (!pic->pts) + { + while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu)) + code = (code << 8) | byteVal; + + if (code != START_CODE) + { + x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in POC %d\n", pic->pts); + return 1; + } + } + + bytesRead = 0; + while (fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu)) + { + code = (code << 8) | byteVal; + if (bytesRead++ < 3) + continue; + if (bytesRead >= 1024) + { + x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts); + return 1; + } + + if (code != START_CODE) + pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF; + else + return 0; + } + + int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize); + int bytesLeft = bytesRead - pic->rpu.payloadSize; + code = (code << ShiftBytes 
* 8); + for (int i = 0; i < bytesLeft; i++) + { + pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF; + code = (code << 8); + } + if (!pic->rpu.payloadSize) + x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts); + return 0; + } #ifdef __cplusplus }
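The `rpuParser` hunk above (a whitespace-only change) frames each RPU by scanning for the 4-byte `START_CODE` value `0x00000001` with a sliding 32-bit window. The same framing idea can be sketched over an in-memory buffer. This is an illustration, not the x265 parser: `splitOnStartCodes` is a hypothetical name, it processes a whole buffer instead of reading one unit per call from a `FILE*`, it silently skips bytes before the first start code instead of reporting an error, and it does not apply the 1024-byte size limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits a buffer of 0x00000001-prefixed units into their payloads.
std::vector<std::vector<uint8_t>> splitOnStartCodes(const std::vector<uint8_t>& data)
{
    const uint32_t startCode = 0x00000001;
    std::vector<std::vector<uint8_t>> units;
    std::vector<uint8_t> current;
    uint32_t window = 0;   // the last four bytes seen, most recent in the low byte
    size_t seen = 0;       // bytes consumed since the previous start code
    bool inUnit = false;   // false until the first start code is found

    for (uint8_t b : data)
    {
        window = (window << 8) | b;
        ++seen;
        if (inUnit)
            current.push_back(b);

        if (seen >= 4 && window == startCode)
        {
            if (inUnit)
            {
                // The last four bytes pushed belong to the next unit's start code.
                current.resize(current.size() - 4);
                units.push_back(current);
                current.clear();
            }
            inUnit = true;
            seen = 0;
        }
    }

    // The final unit has no trailing start code; it ends with the buffer.
    if (inUnit)
        units.push_back(current);
    return units;
}
```

As in the original loop, a payload that itself contains the byte sequence `00 00 00 01` would be split at that point; this sketch does nothing to prevent that.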
View file
x265_3.5.tar.gz/source/x265cli.h
Changed
@@ -143,7 +143,6 @@ { "scenecut-bias", required_argument, NULL, 0 }, { "hist-scenecut", no_argument, NULL, 0}, { "no-hist-scenecut", no_argument, NULL, 0}, - { "hist-threshold", required_argument, NULL, 0}, { "fades", no_argument, NULL, 0 }, { "no-fades", no_argument, NULL, 0 }, { "scenecut-aware-qp", required_argument, NULL, 0 }, @@ -182,6 +181,8 @@ { "qp", required_argument, NULL, 'q' }, { "aq-mode", required_argument, NULL, 0 }, { "aq-strength", required_argument, NULL, 0 }, + { "sbrc", no_argument, NULL, 0 }, + { "no-sbrc", no_argument, NULL, 0 }, { "rc-grain", no_argument, NULL, 0 }, { "no-rc-grain", no_argument, NULL, 0 }, { "ipratio", required_argument, NULL, 0 }, @@ -244,6 +245,7 @@ { "crop-rect", required_argument, NULL, 0 }, /* DEPRECATED */ { "master-display", required_argument, NULL, 0 }, { "max-cll", required_argument, NULL, 0 }, + {"video-signal-type-preset", required_argument, NULL, 0 }, { "min-luma", required_argument, NULL, 0 }, { "max-luma", required_argument, NULL, 0 }, { "log2-max-poc-lsb", required_argument, NULL, 8 }, @@ -263,6 +265,10 @@ { "repeat-headers", no_argument, NULL, 0 }, { "aud", no_argument, NULL, 0 }, { "no-aud", no_argument, NULL, 0 }, + { "eob", no_argument, NULL, 0 }, + { "no-eob", no_argument, NULL, 0 }, + { "eos", no_argument, NULL, 0 }, + { "no-eos", no_argument, NULL, 0 }, { "info", no_argument, NULL, 0 }, { "no-info", no_argument, NULL, 0 }, { "zones", required_argument, NULL, 0 }, @@ -349,6 +355,8 @@ { "frame-dup", no_argument, NULL, 0 }, { "no-frame-dup", no_argument, NULL, 0 }, { "dup-threshold", required_argument, NULL, 0 }, + { "mcstf", no_argument, NULL, 0 }, + { "no-mcstf", no_argument, NULL, 0 }, #ifdef SVT_HEVC { "svt", no_argument, NULL, 0 }, { "no-svt", no_argument, NULL, 0 }, @@ -373,6 +381,7 @@ { "abr-ladder", required_argument, NULL, 0 }, { "min-vbv-fullness", required_argument, NULL, 0 }, { "max-vbv-fullness", required_argument, NULL, 0 }, + { "film-grain", required_argument, NULL, 0 }, { 0, 0, 0, 0 }, { 0, 
0, 0, 0 }, { 0, 0, 0, 0 },
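The new `--video-signal-type-preset` option takes a string of the form `<system-id>:<color-volume>`, and the help text in `x265cli.cpp` says the color-volume part can be used only with the system-ids `BT2100_PQ_YCC`, `BT2100_PQ_ICTCP`, and `BT2100_PQ_RGB`. A minimal sketch of splitting and checking such a string follows. `SignalTypePreset` and `parseSignalTypePreset` are hypothetical names, not x265 functions; the sketch assumes the color-volume part is optional and does not check the system-id against the full list in the help text.

```cpp
#include <string>

// Hypothetical result type for illustration; not an x265 structure.
struct SignalTypePreset
{
    std::string systemId;
    std::string colorVolume;   // empty when no color-volume was given
    bool valid;
};

// Splits "<system-id>:<color-volume>" and applies the one rule stated in the help text.
SignalTypePreset parseSignalTypePreset(const std::string& arg)
{
    SignalTypePreset preset;
    preset.valid = false;

    const std::string::size_type colon = arg.find(':');
    preset.systemId = arg.substr(0, colon);   // whole string when there is no colon
    if (colon != std::string::npos)
        preset.colorVolume = arg.substr(colon + 1);

    if (preset.systemId.empty())
        return preset;

    // A color-volume is only allowed with the three BT2100_PQ_* system-ids.
    const bool acceptsColorVolume = preset.systemId == "BT2100_PQ_YCC" ||
                                    preset.systemId == "BT2100_PQ_ICTCP" ||
                                    preset.systemId == "BT2100_PQ_RGB";
    preset.valid = preset.colorVolume.empty() || acceptsColorVolume;
    return preset;
}
```

With the value used in the new regression test, `BT2100_PQ_YCC:BT2100x108n0005`, this yields system-id `BT2100_PQ_YCC` and color-volume `BT2100x108n0005`. How x265 itself handles a rejected combination is not shown in this diff.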