Packman Build Service PMBS


Difference Between Revision 2 and Essentials / x265

x265.changes Changed

x265.spec Changed

baselibs.conf Changed

x265_3.5.tar.gz/build/aarch64-linux/crosscompile.cmake Changed

x265_3.5.tar.gz/build/arm-linux/make-Makefiles.bash Changed

x265_3.5.tar.gz/doc/reST/cli.rst Changed

@@ -632,9 +632,8 @@
 	auto-detection by the encoder. If specified, the encoder will
 	attempt to bring the encode specifications within that specified
 	level. If the encoder is unable to reach the level it issues a
-	warning and aborts the encode. If the requested requirement level is
-	higher than the actual level, the actual requirement level is
-	signaled.
+	warning and aborts the encode. The requested level will be signaled 
+	in the bitstream even if it is higher than the actual level.
 
 	Beware, specifying a decoder level will force the encoder to enable
 	VBV for constant rate factor encodes, which may introduce
@@ -714,11 +713,8 @@
 	(main, main10, etc). Second, an encoder is created from this
 	x265_param instance and the :option:`--level-idc` and
 	:option:`--high-tier` parameters are used to reduce bitrate or other
-	features in order to enforce the target level. Finally, the encoder
-	re-examines the final set of parameters and detects the actual
-	minimum decoder requirement level and this is what is signaled in
-	the bitstream headers. The detected decoder level will only use High
-	tier if the user specified a High tier level.
+	features in order to enforce the target level. The detected decoder level
+	will only use High tier if the user specified a High tier level.
 
 	The signaled profile will be determined by the encoder's internal
 	bitdepth and input color space. If :option:`--keyint` is 0 or 1,
@@ -961,21 +957,21 @@
 	Note that :option:`--analysis-save-reuse-level` and :option:`--analysis-load-reuse-level` must be paired
 	with :option:`--analysis-save` and :option:`--analysis-load` respectively.
 
-	+--------------+------------------------------------------+
-	| Level        | Description                              |
-	+==============+==========================================+
-	| 1            | Lookahead information                    |
-	+--------------+------------------------------------------+
-	| 2 to 4       | Level 1 + intra/inter modes, ref's       |
-	+--------------+------------------------------------------+
-	| 5 and 6      | Level 2 + rect-amp                       |
-	+--------------+------------------------------------------+
-	| 7            | Level 5 + AVC size CU refinement         |
-	+--------------+------------------------------------------+
-	| 8 and 9      | Level 5 + AVC size Full CU analysis-info |
-	+--------------+------------------------------------------+
-	| 10           | Level 5 + Full CU analysis-info          |
-	+--------------+------------------------------------------+
+	+--------------+---------------------------------------------------+
+	| Level        | Description                                       |
+	+==============+===================================================+
+	| 1            | Lookahead information                             |
+	+--------------+---------------------------------------------------+
+	| 2 to 4       | Level 1 + intra/inter modes, depth, ref's, cutree |
+	+--------------+---------------------------------------------------+
+	| 5 and 6      | Level 2 + rect-amp                                |
+	+--------------+---------------------------------------------------+
+	| 7            | Level 5 + AVC size CU refinement                  |
+	+--------------+---------------------------------------------------+
+	| 8 and 9      | Level 5 + AVC size Full CU analysis-info          |
+	+--------------+---------------------------------------------------+
+	| 10           | Level 5 + Full CU analysis-info                   |
+	+--------------+---------------------------------------------------+
 
 .. option:: --refine-mv-type <string>
 
@@ -1332,6 +1328,11 @@
 	Search range for HME level 0, 1 and 2.
 	The Search Range for each HME level must be between 0 and 32768(excluding).
 	Default search range is 16,32,48 for level 0,1,2 respectively.
+	
+.. option:: --mcstf, --no-mcstf
+
+    Enable Motion Compensated Temporal filtering.
+	Default: disabled
 
 Spatial/intra options
 =====================
@@ -1473,17 +1474,9 @@
 
 .. option:: --hist-scenecut, --no-hist-scenecut
 
-	Indicates that scenecuts need to be detected using luma edge and chroma histograms.
-	:option:`--hist-scenecut` enables scenecut detection using the histograms and disables the default scene cut algorithm.
-	:option:`--no-hist-scenecut` disables histogram based scenecut algorithm.
-	
-.. option:: --hist-threshold <0.0..1.0>
-
-	This value represents the threshold for normalized SAD of edge histograms used in scenecut detection.
-	This requires :option:`--hist-scenecut` to be enabled. For example, a value of 0.2 indicates that a frame with normalized SAD value 
-	greater than 0.2 against the previous frame as scenecut. 
-	Increasing the threshold reduces the number of scenecuts detected.
-	Default 0.03.
+	Scenecuts detected based on histogram, intensity and variance of the picture.
+	:option:`--hist-scenecut` enables or :option:`--no-hist-scenecut` disables scenecut detection based on
+	histogram.
 	
 .. option:: --radl <integer>
 	
@@ -1766,6 +1759,11 @@
 	Default 1.0.
 	**Range of values:** 0.0 to 3.0
 
+.. option:: --sbrc --no-sbrc
+
+	To enable and disable segment based rate control.
+	Default: disabled.
+
 .. option:: --hevc-aq
 
 	Enable adaptive quantization
@@ -2057,6 +2055,14 @@
    rate control mode.
 
    Default disabled. **Experimental feature**
+   
+
+.. option:: bEncFocusedFramesOnly
+
+	Used to trigger encoding of selective GOPs; Disabled by default.
+	
+	**API ONLY**
+	
 
 Quantization Options
 ====================
@@ -2427,6 +2433,81 @@
 	Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.
 	Required for HLG (Hybrid Log Gamma) signaling. Not signaled by default.
 
+.. option:: --video-signal-type-preset <string>
+
+	Specify combinations of color primaries, transfer characteristics, color matrix,
+	range of luma and chroma signals, and chroma sample location.
+	String format: <system-id>:<color-volume>
+	
+	This has higher precedence than individual VUI parameters. If any individual VUI option
+	is specified together with this, which changes the values set corresponding to the system-id
+	or color-volume, it will be discarded.
+
+	system-id options and their corresponding values:
+	+----------------+---------------------------------------------------------------+
+	| system-id      | Value                                                         |
+	+================+===============================================================+
+	| BT601_525      | --colorprim smpte170m --transfer smpte170m                    |
+	|                | --colormatrix smpte170m --range limited --chromaloc 0         |
+	+----------------+---------------------------------------------------------------+
+	| BT601_626      | --colorprim bt470bg --transfer smpte170m --colormatrix bt470bg|
+	|                | --range limited --chromaloc 0                                 |
+	+----------------+---------------------------------------------------------------+
+	| BT709_YCC      | --colorprim bt709 --transfer bt709 --colormatrix bt709        |
+	|                | --range limited --chromaloc 0                                 |
+	+----------------+---------------------------------------------------------------+
+	| BT709_RGB      | --colorprim bt709 --transfer bt709 --colormatrix gbr          |
+	|                | --range limited                                               |
+	+----------------+---------------------------------------------------------------+
+	| BT2020_YCC_NCL | --colorprim bt2020 --transfer bt2020-10 --colormatrix bt709   |
+	|                | --range limited --chromaloc 2                                 |
+	+----------------+---------------------------------------------------------------+
+	| BT2020_RGB     | --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc|
+	|                | --range limited                                               |
+	+----------------+---------------------------------------------------------------+
+	| BT2100_PQ_YCC  | --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc|
+	|                | --range limited --chromaloc 2                                 |
+	+----------------+---------------------------------------------------------------+
+	| BT2100_PQ_ICTCP| --colorprim bt2020 --transfer smpte2084 --colormatrix ictcp   |
+	|                | --range limited --chromaloc 2                                 |
+	+----------------+---------------------------------------------------------------+
+	| BT2100_PQ_RGB  | --colorprim bt2020 --transfer smpte2084 --colormatrix gbr     |
+	|                | --range limited                                               |
+	+----------------+---------------------------------------------------------------+
+	| BT2100_HLG_YCC | --colorprim bt2020 --transfer arib-std-b67                    |
+	|                | --colormatrix bt2020nc --range limited --chromaloc 2          |
+	+----------------+---------------------------------------------------------------+
+	| BT2100_HLG_RGB | --colorprim bt2020 --transfer arib-std-b67 --colormatrix gbr  |
+	|                | --range limited                                               |
+	+----------------+---------------------------------------------------------------+
+	| FR709_RGB      | --colorprim bt709 --transfer bt709 --colormatrix gbr          |
+	|                | --range full                                                  |
+	+----------------+---------------------------------------------------------------+
+	| FR2020_RGB     | --colorprim bt2020 --transfer bt2020-10 --colormatrix gbr     |
+	|                | --range full                                                  |
+	+----------------+---------------------------------------------------------------+
+	| FRP3D65_YCC    | --colorprim smpte432 --transfer bt709 --colormatrix smpte170m |
+	|                | --range full --chromaloc 1                                    |
+	+----------------+---------------------------------------------------------------+
+
+	color-volume options and their corresponding values:
+	+----------------+---------------------------------------------------------------+
+	| color-volume   | Value                                                         |
+	+================+===============================================================+
+	| P3D65x1000n0005| --master-display G(13250,34500)B(7500,3000)R(34000,16000)     |
+	|                |                  WP(15635,16450)L(10000000,5)                 |
+	+----------------+---------------------------------------------------------------+
+	| P3D65x4000n005 | --master-display G(13250,34500)B(7500,3000)R(34000,16000)     |
+	|                |                  WP(15635,16450)L(40000000,50)                |
+	+----------------+---------------------------------------------------------------+
+	| BT2100x108n0005| --master-display G(8500,39850)B(6550,2300)R(34000,146000)     |
+	|                |                  WP(15635,16450)L(10000000,1)                 |
+	+----------------+---------------------------------------------------------------+
+
+	Note: The color-volume options can be used only with the system-id options BT2100_PQ_YCC,
+	       BT2100_PQ_ICTCP, and BT2100_PQ_RGB. It is incompatible with other options.
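Reviewer note on the ``--video-signal-type-preset`` hunk above: the documented string format is ``<system-id>:<color-volume>``, and the closing note restricts color-volume values to the three ``BT2100_PQ_*`` system-ids. The following is a minimal sketch of that rule only, not x265's own parser. ``SignalPreset``, ``allowsColorVolume`` and ``parseSignalPreset`` are hypothetical names, and treating the color-volume part as optional is an assumption. The sketch does not validate the system-id against the table.

```cpp
#include <string>

// Hypothetical container for the two halves of "<system-id>:<color-volume>".
struct SignalPreset {
    std::string systemId;
    std::string colorVolume; // empty when no color-volume was given (assumed optional)
};

// Per the documentation note, only these system-ids accept a color-volume.
inline bool allowsColorVolume(const std::string& systemId)
{
    return systemId == "BT2100_PQ_YCC" ||
           systemId == "BT2100_PQ_ICTCP" ||
           systemId == "BT2100_PQ_RGB";
}

// Splits the preset string at the first ':' and applies the compatibility
// rule. Returns false (leaving `out` untouched) when the system-id is empty
// or a color-volume is paired with an incompatible system-id.
inline bool parseSignalPreset(const std::string& text, SignalPreset& out)
{
    const std::string::size_type colon = text.find(':');

    SignalPreset parsed;
    parsed.systemId = text.substr(0, colon); // whole string when no ':' is present
    if (colon != std::string::npos)
        parsed.colorVolume = text.substr(colon + 1);

    if (parsed.systemId.empty())
        return false;
    if (!parsed.colorVolume.empty() && !allowsColorVolume(parsed.systemId))
        return false;

    out = parsed;
    return true;
}
```

Under these assumptions, ``BT2100_PQ_YCC:P3D65x1000n0005`` is accepted, while ``BT709_YCC:P3D65x1000n0005`` is rejected because ``BT709_YCC`` is not one of the PQ system-ids.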

x265_3.5.tar.gz/doc/reST/introduction.rst Changed

x265_3.5.tar.gz/readme.rst Changed

x265_3.5.tar.gz/source/CMakeLists.txt Changed

@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 199)
+set(X265_BUILD 206)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -38,14 +38,20 @@
 SET(CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake" "${CMAKE_MODULE_PATH}")
 
 # System architecture detection
-string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" SYSPROC)
+if (APPLE AND CMAKE_OSX_ARCHITECTURES)
+    string(TOLOWER "${CMAKE_OSX_ARCHITECTURES}" SYSPROC)
+else()
+    string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" SYSPROC)
+endif()
 set(X86_ALIASES x86 i386 i686 x86_64 amd64)
-set(ARM_ALIASES armv6l armv7l aarch64)
+set(ARM_ALIASES armv6l armv7l)
+set(ARM64_ALIASES arm64 arm64e aarch64)
 list(FIND X86_ALIASES "${SYSPROC}" X86MATCH)
 list(FIND ARM_ALIASES "${SYSPROC}" ARMMATCH)
+list(FIND ARM64_ALIASES "${SYSPROC}" ARM64MATCH)
 set(POWER_ALIASES ppc64 ppc64le)
 list(FIND POWER_ALIASES "${SYSPROC}" POWERMATCH)
-if("${SYSPROC}" STREQUAL "" OR X86MATCH GREATER "-1")
+if(X86MATCH GREATER "-1")
     set(X86 1)
     add_definitions(-DX265_ARCH_X86=1)
     if(CMAKE_CXX_FLAGS STREQUAL "-m32")
@@ -70,15 +76,18 @@
     else()
         set(CROSS_COMPILE_ARM 0)
     endif()
+	message(STATUS "Detected ARM target processor")
     set(ARM 1)
-    if("${CMAKE_SIZEOF_VOID_P}" MATCHES 8)
-        message(STATUS "Detected ARM64 target processor")
-        set(ARM64 1)
-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=1 -DHAVE_ARMV6=0)
-    else()
-        message(STATUS "Detected ARM target processor")
-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1)
-    endif()
+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1)
+elseif(ARM64MATCH GREATER "-1")
+    #if(CROSS_COMPILE_ARM64)
+        #message(STATUS "Cross compiling for ARM64 arch")
+    #else()
+        #set(CROSS_COMPILE_ARM64 0)
+    #endif()
+    message(STATUS "Detected ARM64 target processor")
+    set(ARM64 1)
+    add_definitions(-DX265_ARCH_ARM64=1 -DHAVE_NEON)
 else()
     message(STATUS "CMAKE_SYSTEM_PROCESSOR value `${CMAKE_SYSTEM_PROCESSOR}` is unknown")
     message(STATUS "Please add this value near ${CMAKE_CURRENT_LIST_FILE}:${CMAKE_CURRENT_LIST_LINE}")
@@ -239,24 +248,22 @@
         endif()
     endif()
     if(ARM AND CROSS_COMPILE_ARM)
-        if(ARM64)
-            set(ARM_ARGS -fPIC)
-        else()
-            set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
-        endif()
         message(STATUS "cross compile arm")
+		set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
     elseif(ARM)
-        if(ARM64)
-            set(ARM_ARGS -fPIC)
+        find_package(Neon)
+        if(CPU_HAS_NEON)
+            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
             add_definitions(-DHAVE_NEON)
         else()
-            find_package(Neon)
-            if(CPU_HAS_NEON)
-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
-                add_definitions(-DHAVE_NEON)
-            else()
-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
-            endif()
+            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+        endif()
+    endif()
+	if(ARM64 OR CROSS_COMPILE_ARM64)
+	    set(ARM_ARGS -fPIC -flax-vector-conversions)
+        find_package(Neon)
+        if(CPU_HAS_NEON)
+            add_definitions(-DHAVE_NEON)
         endif()
     endif()
     add_definitions(${ARM_ARGS})
@@ -350,7 +357,7 @@
 endif(GCC)
 
 find_package(Nasm)
-if(ARM OR CROSS_COMPILE_ARM)
+if(ARM OR CROSS_COMPILE_ARM OR ARM64 OR CROSS_COMPILE_ARM64)
     option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" ON)
 elseif(NASM_FOUND AND X86)
     if (NASM_VERSION_STRING VERSION_LESS "2.13.0")
@@ -440,6 +447,18 @@
 endif()
 add_definitions(-DX265_NS=${X265_NS})
 
+if(ARM64)
+  if(HIGH_BIT_DEPTH)
+    if(MAIN12)
+      list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=1 -DBIT_DEPTH=12 -DX265_NS=${X265_NS})
+    else()
+      list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=1 -DBIT_DEPTH=10 -DX265_NS=${X265_NS})
+    endif()
+  else()
+    list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=0 -DBIT_DEPTH=8 -DX265_NS=${X265_NS})
+  endif()
+endif(ARM64)
+
 option(WARNINGS_AS_ERRORS "Stop compiles on first warning" OFF)
 if(WARNINGS_AS_ERRORS)
     if(GCC)
@@ -536,11 +555,7 @@
     # compile ARM arch asm files here
         enable_language(ASM)
         foreach(ASM ${ARM_ASMS})
-            if(ARM64)
-                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/aarch64/${ASM})
-            else()
-                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
-            endif()
+			set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
             list(APPEND ASM_SRCS ${ASM_SRC})
             list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
             add_custom_command(
@@ -549,6 +564,19 @@
                 ARGS ${ARM_ARGS} -c ${ASM_SRC} -o ${ASM}.${SUFFIX}
                 DEPENDS ${ASM_SRC})
         endforeach()
+	elseif(ARM64 OR CROSS_COMPILE_ARM64)
+    # compile ARM64 arch asm files here
+        enable_language(ASM)
+        foreach(ASM ${ARM_ASMS})
+            set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/aarch64/${ASM})
+            list(APPEND ASM_SRCS ${ASM_SRC})
+            list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
+            add_custom_command(
+                OUTPUT ${ASM}.${SUFFIX}
+                COMMAND ${CMAKE_CXX_COMPILER}
+                ARGS ${ARM_ARGS} ${ASM_FLAGS} -c ${ASM_SRC} -o ${ASM}.${SUFFIX}
+                DEPENDS ${ASM_SRC})
+        endforeach()
     elseif(X86)
     # compile X86 arch asm files here
         foreach(ASM ${MSVC_ASMS})

x265_3.5.tar.gz/source/abrEncApp.cpp Changed

@@ -1,1111 +1,1111 @@
-/*****************************************************************************
-* Copyright (C) 2013-2020 MulticoreWare, Inc
-*
-* Authors: Pooja Venkatesan <pooja@multicorewareinc.com>
-*          Aruna Matheswaran <aruna@multicorewareinc.com>
-*
-* This program is free software; you can redistribute it and/or modify
-* it under the terms of the GNU General Public License as published by
-* the Free Software Foundation; either version 2 of the License, or
-* (at your option) any later version.
-*
-* This program is distributed in the hope that it will be useful,
-* but WITHOUT ANY WARRANTY; without even the implied warranty of
-* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-* GNU General Public License for more details.
-*
-* You should have received a copy of the GNU General Public License
-* along with this program; if not, write to the Free Software
-* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
-*
-* This program is also available under a commercial proprietary license.
-* For more information, contact us at license @ x265.com.
-*****************************************************************************/
-
-#include "abrEncApp.h"
-#include "mv.h"
-#include "slice.h"
-#include "param.h"
-
-#include <signal.h>
-#include <errno.h>
-
-#include <queue>
-
-using namespace X265_NS;
-
-/* Ctrl-C handler */
-static volatile sig_atomic_t b_ctrl_c /* = 0 */;
-static void sigint_handler(int)
-{
-    b_ctrl_c = 1;
-}
-
-namespace X265_NS {
-    // private namespace
-#define X265_INPUT_QUEUE_SIZE 250
-
-    AbrEncoder::AbrEncoder(CLIOptions cliopt[], uint8_t numEncodes, int &ret)
-    {
-        m_numEncodes = numEncodes;
-        m_numActiveEncodes.set(numEncodes);
-        m_queueSize = (numEncodes > 1) ? X265_INPUT_QUEUE_SIZE : 1;
-        m_passEnc = X265_MALLOC(PassEncoder*, m_numEncodes);
-
-        for (uint8_t i = 0; i < m_numEncodes; i++)
-        {
-            m_passEnc[i] = new PassEncoder(i, cliopt[i], this);
-            if (!m_passEnc[i])
-            {
-                x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for passEncoder\n");
-                ret = 4;
-            }
-            m_passEnc[i]->init(ret);
-        }
-
-        if (!allocBuffers())
-        {
-            x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for buffers\n");
-            ret = 4;
-        }
-
-        /* start passEncoder worker threads */
-        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
-            m_passEnc[pass]->startThreads();
-    }
-
-    bool AbrEncoder::allocBuffers()
-    {
-        m_inputPicBuffer = X265_MALLOC(x265_picture**, m_numEncodes);
-        m_analysisBuffer = X265_MALLOC(x265_analysis_data*, m_numEncodes);
-
-        m_picWriteCnt = new ThreadSafeInteger[m_numEncodes];
-        m_picReadCnt = new ThreadSafeInteger[m_numEncodes];
-        m_analysisWriteCnt = new ThreadSafeInteger[m_numEncodes];
-        m_analysisReadCnt = new ThreadSafeInteger[m_numEncodes];
-
-        m_picIdxReadCnt = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
-        m_analysisWrite = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
-        m_analysisRead = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
-        m_readFlag = X265_MALLOC(int*, m_numEncodes);
-
-        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
-        {
-            m_inputPicBuffer[pass] = X265_MALLOC(x265_picture*, m_queueSize);
-            for (uint32_t idx = 0; idx < m_queueSize; idx++)
-            {
-                m_inputPicBuffer[pass][idx] = x265_picture_alloc();
-                x265_picture_init(m_passEnc[pass]->m_param, m_inputPicBuffer[pass][idx]);
-            }
-
-            CHECKED_MALLOC_ZERO(m_analysisBuffer[pass], x265_analysis_data, m_queueSize);
-            m_picIdxReadCnt[pass] = new ThreadSafeInteger[m_queueSize];
-            m_analysisWrite[pass] = new ThreadSafeInteger[m_queueSize];
-            m_analysisRead[pass] = new ThreadSafeInteger[m_queueSize];
-            m_readFlag[pass] = X265_MALLOC(int, m_queueSize);
-        }
-        return true;
-    fail:
-        return false;
-    }
-
-    void AbrEncoder::destroy()
-    {
-        x265_cleanup(); /* Free library singletons */
-        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
-        {
-            for (uint32_t index = 0; index < m_queueSize; index++)
-            {
-                X265_FREE(m_inputPicBuffer[pass][index]->planes[0]);
-                x265_picture_free(m_inputPicBuffer[pass][index]);
-            }
-
-            X265_FREE(m_inputPicBuffer[pass]);
-            X265_FREE(m_analysisBuffer[pass]);
-            X265_FREE(m_readFlag[pass]);
-            delete[] m_picIdxReadCnt[pass];
-            delete[] m_analysisWrite[pass];
-            delete[] m_analysisRead[pass];
-            m_passEnc[pass]->destroy();
-            delete m_passEnc[pass];
-        }
-        X265_FREE(m_inputPicBuffer);
-        X265_FREE(m_analysisBuffer);
-        X265_FREE(m_readFlag);
-
-        delete[] m_picWriteCnt;
-        delete[] m_picReadCnt;
-        delete[] m_analysisWriteCnt;
-        delete[] m_analysisReadCnt;
-
-        X265_FREE(m_picIdxReadCnt);
-        X265_FREE(m_analysisWrite);
-        X265_FREE(m_analysisRead);
-
-        X265_FREE(m_passEnc);
-    }
-
-    PassEncoder::PassEncoder(uint32_t id, CLIOptions cliopt, AbrEncoder *parent)
-    {
-        m_id = id;
-        m_cliopt = cliopt;
-        m_parent = parent;
-        if(!(m_cliopt.enableScaler && m_id))
-            m_input = m_cliopt.input;
-        m_param = cliopt.param;
-        m_inputOver = false;
-        m_lastIdx = -1;
-        m_encoder = NULL;
-        m_scaler = NULL;
-        m_reader = NULL;
-        m_ret = 0;
-    }
-
-    int PassEncoder::init(int &result)
-    {
-        if (m_parent->m_numEncodes > 1)
-            setReuseLevel();
-                
-        if (!(m_cliopt.enableScaler && m_id))
-            m_reader = new Reader(m_id, this);
-        else
-        {
-            VideoDesc *src = NULL, *dst = NULL;
-            dst = new VideoDesc(m_param->sourceWidth, m_param->sourceHeight, m_param->internalCsp, m_param->internalBitDepth);
-            int dstW = m_parent->m_passEnc[m_id - 1]->m_param->sourceWidth;
-            int dstH = m_parent->m_passEnc[m_id - 1]->m_param->sourceHeight;
-            src = new VideoDesc(dstW, dstH, m_param->internalCsp, m_param->internalBitDepth);
-            if (src != NULL && dst != NULL)
-            {
-                m_scaler = new Scaler(0, 1, m_id, src, dst, this);
-                if (!m_scaler)
-                {
-                    x265_log(m_param, X265_LOG_ERROR, "\n MALLOC failure in Scaler");
-                    result = 4;
-                }
-            }
-        }
-
-        /* note: we could try to acquire a different libx265 API here based on
-        * the profile found during option parsing, but it must be done before
-        * opening an encoder */
-
-        if (m_param)
-            m_encoder = m_cliopt.api->encoder_open(m_param);
-        if (!m_encoder)
-        {
-            x265_log(NULL, X265_LOG_ERROR, "x265_encoder_open() failed for Enc, \n");
-            m_ret = 2;
-            return -1;

x265_3.5.tar.gz/source/cmake/FindNeon.cmake Changed

x265_3.5.tar.gz/source/common/CMakeLists.txt Changed

@@ -84,35 +84,38 @@
 endif(ENABLE_ASSEMBLY AND X86)
 
 if(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
-    if(ARM64)
-        if(GCC AND (CMAKE_CXX_FLAGS_RELEASE MATCHES "-O3"))
-            message(STATUS "Detected CXX compiler using -O3 optimization level")
-            add_definitions(-DAUTO_VECTORIZE=1)
-        endif()
-        set(C_SRCS asm-primitives.cpp pixel.h ipfilter8.h)
-
-        # add ARM assembly/intrinsic files here
-        set(A_SRCS asm.S mc-a.S sad-a.S pixel-util.S ipfilter8.S)
-        set(VEC_PRIMITIVES)
+    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
 
-        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
-        foreach(SRC ${C_SRCS})
-            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} aarch64/${SRC})
-        endforeach()
-    else()
-        set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+    # add ARM assembly/intrinsic files here
+    set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
+    set(VEC_PRIMITIVES)
 
-        # add ARM assembly/intrinsic files here
-        set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
-        set(VEC_PRIMITIVES)
+    set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+    foreach(SRC ${C_SRCS})
+        set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
+    endforeach()
+    source_group(Assembly FILES ${ASM_PRIMITIVES})
+endif(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
 
-        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
-        foreach(SRC ${C_SRCS})
-            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
-        endforeach()
+if(ENABLE_ASSEMBLY AND (ARM64 OR CROSS_COMPILE_ARM64))
+    if(GCC AND (CMAKE_CXX_FLAGS_RELEASE MATCHES "-O3"))
+        message(STATUS "Detected CXX compiler using -O3 optimization level")
+        add_definitions(-DAUTO_VECTORIZE=1)
     endif()
+
+    set(C_SRCS asm-primitives.cpp pixel-prim.h pixel-prim.cpp filter-prim.h filter-prim.cpp dct-prim.h dct-prim.cpp loopfilter-prim.cpp loopfilter-prim.h intrapred-prim.cpp arm64-utils.cpp arm64-utils.h fun-decls.h)
+    enable_language(ASM)
+
+    # add ARM assembly/intrinsic files here
+    set(A_SRCS asm.S mc-a.S sad-a.S pixel-util.S p2s.S ipfilter.S blockcopy8.S ssd-a.S)
+    set(VEC_PRIMITIVES)
+
+    set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+    foreach(SRC ${C_SRCS})
+        set(ASM_PRIMITIVES ${ASM_PRIMITIVES} aarch64/${SRC})
+    endforeach()
     source_group(Assembly FILES ${ASM_PRIMITIVES})
-endif(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
+endif(ENABLE_ASSEMBLY AND (ARM64 OR CROSS_COMPILE_ARM64))
 
 if(POWER)
     set_source_files_properties(version.cpp PROPERTIES COMPILE_FLAGS -DX265_VERSION=${X265_VERSION})
@@ -169,4 +172,6 @@
     scalinglist.cpp scalinglist.h
     quant.cpp quant.h contexts.h
     deblock.cpp deblock.h
-    scaler.cpp scaler.h)
+    scaler.cpp scaler.h
+    ringmem.cpp ringmem.h
+    temporalfilter.cpp temporalfilter.h)

x265_3.5.tar.gz/source/common/aarch64/arm64-utils.cpp Added

@@ -0,0 +1,300 @@
+#include "common.h"
+#include "x265.h"
+#include "arm64-utils.h"
+#include <arm_neon.h>
+
+#define COPY_16(d,s) *(uint8x16_t *)(d) = *(uint8x16_t *)(s)
+namespace X265_NS
+{
+
+
+
+void transpose8x8(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint8x8_t a0, a1, a2, a3, a4, a5, a6, a7;
+    uint8x8_t b0, b1, b2, b3, b4, b5, b6, b7;
+
+    a0 = *(uint8x8_t *)(src + 0 * sstride);
+    a1 = *(uint8x8_t *)(src + 1 * sstride);
+    a2 = *(uint8x8_t *)(src + 2 * sstride);
+    a3 = *(uint8x8_t *)(src + 3 * sstride);
+    a4 = *(uint8x8_t *)(src + 4 * sstride);
+    a5 = *(uint8x8_t *)(src + 5 * sstride);
+    a6 = *(uint8x8_t *)(src + 6 * sstride);
+    a7 = *(uint8x8_t *)(src + 7 * sstride);
+
+    b0 = vtrn1_u32(a0, a4);
+    b1 = vtrn1_u32(a1, a5);
+    b2 = vtrn1_u32(a2, a6);
+    b3 = vtrn1_u32(a3, a7);
+    b4 = vtrn2_u32(a0, a4);
+    b5 = vtrn2_u32(a1, a5);
+    b6 = vtrn2_u32(a2, a6);
+    b7 = vtrn2_u32(a3, a7);
+
+    a0 = vtrn1_u16(b0, b2);
+    a1 = vtrn1_u16(b1, b3);
+    a2 = vtrn2_u16(b0, b2);
+    a3 = vtrn2_u16(b1, b3);
+    a4 = vtrn1_u16(b4, b6);
+    a5 = vtrn1_u16(b5, b7);
+    a6 = vtrn2_u16(b4, b6);
+    a7 = vtrn2_u16(b5, b7);
+
+    b0 = vtrn1_u8(a0, a1);
+    b1 = vtrn2_u8(a0, a1);
+    b2 = vtrn1_u8(a2, a3);
+    b3 = vtrn2_u8(a2, a3);
+    b4 = vtrn1_u8(a4, a5);
+    b5 = vtrn2_u8(a4, a5);
+    b6 = vtrn1_u8(a6, a7);
+    b7 = vtrn2_u8(a6, a7);
+
+    *(uint8x8_t *)(dst + 0 * dstride) = b0;
+    *(uint8x8_t *)(dst + 1 * dstride) = b1;
+    *(uint8x8_t *)(dst + 2 * dstride) = b2;
+    *(uint8x8_t *)(dst + 3 * dstride) = b3;
+    *(uint8x8_t *)(dst + 4 * dstride) = b4;
+    *(uint8x8_t *)(dst + 5 * dstride) = b5;
+    *(uint8x8_t *)(dst + 6 * dstride) = b6;
+    *(uint8x8_t *)(dst + 7 * dstride) = b7;
+}
+
+
+
+
+
+
+void transpose16x16(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint16x8_t a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, aA, aB, aC, aD, aE, aF;
+    uint16x8_t b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, bA, bB, bC, bD, bE, bF;
+    uint16x8_t c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, cA, cB, cC, cD, cE, cF;
+    uint16x8_t d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, dA, dB, dC, dD, dE, dF;
+
+    a0 = *(uint16x8_t *)(src + 0 * sstride);
+    a1 = *(uint16x8_t *)(src + 1 * sstride);
+    a2 = *(uint16x8_t *)(src + 2 * sstride);
+    a3 = *(uint16x8_t *)(src + 3 * sstride);
+    a4 = *(uint16x8_t *)(src + 4 * sstride);
+    a5 = *(uint16x8_t *)(src + 5 * sstride);
+    a6 = *(uint16x8_t *)(src + 6 * sstride);
+    a7 = *(uint16x8_t *)(src + 7 * sstride);
+    a8 = *(uint16x8_t *)(src + 8 * sstride);
+    a9 = *(uint16x8_t *)(src + 9 * sstride);
+    aA = *(uint16x8_t *)(src + 10 * sstride);
+    aB = *(uint16x8_t *)(src + 11 * sstride);
+    aC = *(uint16x8_t *)(src + 12 * sstride);
+    aD = *(uint16x8_t *)(src + 13 * sstride);
+    aE = *(uint16x8_t *)(src + 14 * sstride);
+    aF = *(uint16x8_t *)(src + 15 * sstride);
+
+    b0 = vtrn1q_u64(a0, a8);
+    b1 = vtrn1q_u64(a1, a9);
+    b2 = vtrn1q_u64(a2, aA);
+    b3 = vtrn1q_u64(a3, aB);
+    b4 = vtrn1q_u64(a4, aC);
+    b5 = vtrn1q_u64(a5, aD);
+    b6 = vtrn1q_u64(a6, aE);
+    b7 = vtrn1q_u64(a7, aF);
+    b8 = vtrn2q_u64(a0, a8);
+    b9 = vtrn2q_u64(a1, a9);
+    bA = vtrn2q_u64(a2, aA);
+    bB = vtrn2q_u64(a3, aB);
+    bC = vtrn2q_u64(a4, aC);
+    bD = vtrn2q_u64(a5, aD);
+    bE = vtrn2q_u64(a6, aE);
+    bF = vtrn2q_u64(a7, aF);
+
+    c0 = vtrn1q_u32(b0, b4);
+    c1 = vtrn1q_u32(b1, b5);
+    c2 = vtrn1q_u32(b2, b6);
+    c3 = vtrn1q_u32(b3, b7);
+    c4 = vtrn2q_u32(b0, b4);
+    c5 = vtrn2q_u32(b1, b5);
+    c6 = vtrn2q_u32(b2, b6);
+    c7 = vtrn2q_u32(b3, b7);
+    c8 = vtrn1q_u32(b8, bC);
+    c9 = vtrn1q_u32(b9, bD);
+    cA = vtrn1q_u32(bA, bE);
+    cB = vtrn1q_u32(bB, bF);
+    cC = vtrn2q_u32(b8, bC);
+    cD = vtrn2q_u32(b9, bD);
+    cE = vtrn2q_u32(bA, bE);
+    cF = vtrn2q_u32(bB, bF);
+
+    d0 = vtrn1q_u16(c0, c2);
+    d1 = vtrn1q_u16(c1, c3);
+    d2 = vtrn2q_u16(c0, c2);
+    d3 = vtrn2q_u16(c1, c3);
+    d4 = vtrn1q_u16(c4, c6);
+    d5 = vtrn1q_u16(c5, c7);
+    d6 = vtrn2q_u16(c4, c6);
+    d7 = vtrn2q_u16(c5, c7);
+    d8 = vtrn1q_u16(c8, cA);
+    d9 = vtrn1q_u16(c9, cB);
+    dA = vtrn2q_u16(c8, cA);
+    dB = vtrn2q_u16(c9, cB);
+    dC = vtrn1q_u16(cC, cE);
+    dD = vtrn1q_u16(cD, cF);
+    dE = vtrn2q_u16(cC, cE);
+    dF = vtrn2q_u16(cD, cF);
+
+    *(uint16x8_t *)(dst + 0 * dstride)  = vtrn1q_u8(d0, d1);
+    *(uint16x8_t *)(dst + 1 * dstride)  = vtrn2q_u8(d0, d1);
+    *(uint16x8_t *)(dst + 2 * dstride)  = vtrn1q_u8(d2, d3);
+    *(uint16x8_t *)(dst + 3 * dstride)  = vtrn2q_u8(d2, d3);
+    *(uint16x8_t *)(dst + 4 * dstride)  = vtrn1q_u8(d4, d5);
+    *(uint16x8_t *)(dst + 5 * dstride)  = vtrn2q_u8(d4, d5);
+    *(uint16x8_t *)(dst + 6 * dstride)  = vtrn1q_u8(d6, d7);
+    *(uint16x8_t *)(dst + 7 * dstride)  = vtrn2q_u8(d6, d7);
+    *(uint16x8_t *)(dst + 8 * dstride)  = vtrn1q_u8(d8, d9);
+    *(uint16x8_t *)(dst + 9 * dstride)  = vtrn2q_u8(d8, d9);
+    *(uint16x8_t *)(dst + 10 * dstride)  = vtrn1q_u8(dA, dB);
+    *(uint16x8_t *)(dst + 11 * dstride)  = vtrn2q_u8(dA, dB);
+    *(uint16x8_t *)(dst + 12 * dstride)  = vtrn1q_u8(dC, dD);
+    *(uint16x8_t *)(dst + 13 * dstride)  = vtrn2q_u8(dC, dD);
+    *(uint16x8_t *)(dst + 14 * dstride)  = vtrn1q_u8(dE, dF);
+    *(uint16x8_t *)(dst + 15 * dstride)  = vtrn2q_u8(dE, dF);
+
+
+}
+
+
+void transpose32x32(uint8_t *dst, const uint8_t *src, intptr_t dstride, intptr_t sstride)
+{
+    //assumption: there is no partial overlap
+    transpose16x16(dst, src, dstride, sstride);
+    transpose16x16(dst + 16 * dstride + 16, src + 16 * sstride + 16, dstride, sstride);
+    if (dst == src)
+    {
+        uint8_t tmp[16 * 16] __attribute__((aligned(64)));
+        transpose16x16(tmp, src + 16, 16, sstride);
+        transpose16x16(dst + 16, src + 16 * sstride, dstride, sstride);
+        for (int i = 0; i < 16; i++)
+        {
+            COPY_16(dst + (16 + i)*dstride, tmp + 16 * i);
+        }
+    }
+    else
+    {
+        transpose16x16(dst + 16 * dstride, src + 16, dstride, sstride);
+        transpose16x16(dst + 16, src + 16 * sstride, dstride, sstride);
+    }
+
+}
+
+
+
+void transpose8x8(uint16_t *dst, const uint16_t *src, intptr_t dstride, intptr_t sstride)
+{
+    uint16x8_t a0, a1, a2, a3, a4, a5, a6, a7;
+    uint16x8_t b0, b1, b2, b3, b4, b5, b6, b7;
+
+    a0 = *(uint16x8_t *)(src + 0 * sstride);
+    a1 = *(uint16x8_t *)(src + 1 * sstride);
+    a2 = *(uint16x8_t *)(src + 2 * sstride);
+    a3 = *(uint16x8_t *)(src + 3 * sstride);
+    a4 = *(uint16x8_t *)(src + 4 * sstride);
+    a5 = *(uint16x8_t *)(src + 5 * sstride);

x265_3.5.tar.gz/source/common/aarch64/arm64-utils.h Added

x265_3.5.tar.gz/source/common/aarch64/asm-primitives.cpp Changed

@@ -3,6 +3,7 @@
  *
  * Authors: Hongbin Liu <liuhongbin1@huawei.com>
  *          Yimeng Su <yimeng.su@huawei.com>
+ *          Sebastian Pop <spop@amazon.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -22,11 +23,267 @@
  * For more information, contact us at license @ x265.com.
  *****************************************************************************/
 
+
 #include "common.h"
 #include "primitives.h"
 #include "x265.h"
 #include "cpu.h"
 
+extern "C" {
+#include "fun-decls.h"
+}
+
+#define ALL_LUMA_TU_TYPED(prim, fncdef, fname, cpu) \
+    p.cu[BLOCK_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.cu[BLOCK_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.cu[BLOCK_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.cu[BLOCK_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.cu[BLOCK_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu)
+
+#define ALL_LUMA_TU(prim, fname, cpu)      ALL_LUMA_TU_TYPED(prim, , fname, cpu)
+
+#define ALL_LUMA_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.pu[LUMA_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.pu[LUMA_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.pu[LUMA_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.pu[LUMA_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.pu[LUMA_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu); \
+    p.pu[LUMA_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.pu[LUMA_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.pu[LUMA_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.pu[LUMA_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.pu[LUMA_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.pu[LUMA_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.pu[LUMA_64x32].prim = fncdef PFX(fname ## _64x32_ ## cpu); \
+    p.pu[LUMA_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.pu[LUMA_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.pu[LUMA_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.pu[LUMA_16x4].prim  = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.pu[LUMA_4x16].prim  = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.pu[LUMA_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.pu[LUMA_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.pu[LUMA_32x8].prim  = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.pu[LUMA_8x32].prim  = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.pu[LUMA_64x48].prim = fncdef PFX(fname ## _64x48_ ## cpu); \
+    p.pu[LUMA_48x64].prim = fncdef PFX(fname ## _48x64_ ## cpu); \
+    p.pu[LUMA_64x16].prim = fncdef PFX(fname ## _64x16_ ## cpu); \
+    p.pu[LUMA_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu)
+#define ALL_LUMA_PU(prim, fname, cpu) ALL_LUMA_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_LUMA_PU_T(prim, fname) \
+    p.pu[LUMA_4x4].prim   = fname<LUMA_4x4>; \
+    p.pu[LUMA_8x8].prim   = fname<LUMA_8x8>; \
+    p.pu[LUMA_16x16].prim = fname<LUMA_16x16>; \
+    p.pu[LUMA_32x32].prim = fname<LUMA_32x32>; \
+    p.pu[LUMA_64x64].prim = fname<LUMA_64x64>; \
+    p.pu[LUMA_8x4].prim   = fname<LUMA_8x4>; \
+    p.pu[LUMA_4x8].prim   = fname<LUMA_4x8>; \
+    p.pu[LUMA_16x8].prim  = fname<LUMA_16x8>; \
+    p.pu[LUMA_8x16].prim  = fname<LUMA_8x16>; \
+    p.pu[LUMA_16x32].prim = fname<LUMA_16x32>; \
+    p.pu[LUMA_32x16].prim = fname<LUMA_32x16>; \
+    p.pu[LUMA_64x32].prim = fname<LUMA_64x32>; \
+    p.pu[LUMA_32x64].prim = fname<LUMA_32x64>; \
+    p.pu[LUMA_16x12].prim = fname<LUMA_16x12>; \
+    p.pu[LUMA_12x16].prim = fname<LUMA_12x16>; \
+    p.pu[LUMA_16x4].prim  = fname<LUMA_16x4>; \
+    p.pu[LUMA_4x16].prim  = fname<LUMA_4x16>; \
+    p.pu[LUMA_32x24].prim = fname<LUMA_32x24>; \
+    p.pu[LUMA_24x32].prim = fname<LUMA_24x32>; \
+    p.pu[LUMA_32x8].prim  = fname<LUMA_32x8>; \
+    p.pu[LUMA_8x32].prim  = fname<LUMA_8x32>; \
+    p.pu[LUMA_64x48].prim = fname<LUMA_64x48>; \
+    p.pu[LUMA_48x64].prim = fname<LUMA_48x64>; \
+    p.pu[LUMA_64x16].prim = fname<LUMA_64x16>; \
+    p.pu[LUMA_16x64].prim = fname<LUMA_16x64>
+
+#define ALL_CHROMA_420_PU_TYPED(prim, fncdef, fname, cpu)               \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].prim   = fncdef PFX(fname ## _4x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].prim   = fncdef PFX(fname ## _2x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].prim   = fncdef PFX(fname ## _8x6_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].prim   = fncdef PFX(fname ## _6x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].prim   = fncdef PFX(fname ## _8x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].prim   = fncdef PFX(fname ## _2x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].prim  = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].prim  = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].prim  = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].prim  = fncdef PFX(fname ## _8x32_ ## cpu)
+#define ALL_CHROMA_420_PU(prim, fname, cpu) ALL_CHROMA_420_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_420_4x4_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].prim   = fncdef PFX(fname ## _8x2_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].prim   = fncdef PFX(fname ## _8x6_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].prim = fncdef PFX(fname ## _16x12_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].prim = fncdef PFX(fname ## _12x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].prim  = fncdef PFX(fname ## _16x4_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].prim  = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].prim = fncdef PFX(fname ## _32x24_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].prim = fncdef PFX(fname ## _24x32_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].prim  = fncdef PFX(fname ## _32x8_ ## cpu); \
+    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].prim  = fncdef PFX(fname ## _8x32_ ## cpu)
+#define ALL_CHROMA_420_4x4_PU(prim, fname, cpu) ALL_CHROMA_420_4x4_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_422_PU_TYPED(prim, fncdef, fname, cpu)               \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].prim   = fncdef PFX(fname ## _2x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].prim  = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].prim  = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].prim  = fncdef PFX(fname ## _8x12_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].prim  = fncdef PFX(fname ## _6x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].prim  = fncdef PFX(fname ## _2x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].prim = fncdef PFX(fname ## _16x24_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].prim = fncdef PFX(fname ## _12x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].prim  = fncdef PFX(fname ## _4x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].prim = fncdef PFX(fname ## _32x48_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].prim = fncdef PFX(fname ## _24x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].prim  = fncdef PFX(fname ## _8x64_ ## cpu)
+#define ALL_CHROMA_422_PU(prim, fname, cpu) ALL_CHROMA_422_PU_TYPED(prim, , fname, cpu)
+
+#define ALL_CHROMA_422_PU_TYPED_1(prim, fncdef, fname, cpu)               \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].prim = fncdef PFX(fname ## _32x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].prim  = fncdef PFX(fname ## _4x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].prim  = fncdef PFX(fname ## _8x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].prim = fncdef PFX(fname ## _16x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].prim  = fncdef PFX(fname ## _8x12_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].prim = fncdef PFX(fname ## _16x24_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].prim = fncdef PFX(fname ## _12x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].prim  = fncdef PFX(fname ## _4x32_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].prim = fncdef PFX(fname ## _32x48_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].prim = fncdef PFX(fname ## _24x64_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].prim  = fncdef PFX(fname ## _8x64_ ## cpu)
+#define ALL_CHROMA_422_PU_1(prim, fname, cpu) ALL_CHROMA_422_PU_TYPED_1(prim, , fname, cpu)
+
+#define ALL_CHROMA_444_PU_TYPED(prim, fncdef, fname, cpu) \
+    p.chroma[X265_CSP_I444].pu[LUMA_4x4].prim   = fncdef PFX(fname ## _4x4_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x8].prim   = fncdef PFX(fname ## _8x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x16].prim = fncdef PFX(fname ## _16x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_32x32].prim = fncdef PFX(fname ## _32x32_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_64x64].prim = fncdef PFX(fname ## _64x64_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x4].prim   = fncdef PFX(fname ## _8x4_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_4x8].prim   = fncdef PFX(fname ## _4x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x8].prim  = fncdef PFX(fname ## _16x8_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_8x16].prim  = fncdef PFX(fname ## _8x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_16x32].prim = fncdef PFX(fname ## _16x32_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_32x16].prim = fncdef PFX(fname ## _32x16_ ## cpu); \
+    p.chroma[X265_CSP_I444].pu[LUMA_64x32].prim = fncdef PFX(fname ## _64x32_ ## cpu); \

x265_3.5.tar.gz/source/common/aarch64/asm.S Changed

x265_3.5.tar.gz/source/common/aarch64/blockcopy8.S Added

@@ -0,0 +1,1322 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
+.section .rodata
+#endif
+
+.align 4
+
+.text
+
+/* void blockcopy_sp(pixel* a, intptr_t stridea, const int16_t* b, intptr_t strideb)
+ *
+ * r0   - a
+ * r1   - stridea
+ * r2   - b
+ * r3   - strideb */
+function PFX(blockcopy_sp_4x4_neon)
+    lsl             x3, x3, #1
+.rept 2
+    ld1             {v0.8h}, [x2], x3
+    ld1             {v1.8h}, [x2], x3
+    xtn             v0.8b, v0.8h
+    xtn             v1.8b, v1.8h
+    st1             {v0.s}[0], [x0], x1
+    st1             {v1.s}[0], [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_8x8_neon)
+    lsl             x3, x3, #1
+.rept 4
+    ld1             {v0.8h}, [x2], x3
+    ld1             {v1.8h}, [x2], x3
+    xtn             v0.8b, v0.8h
+    xtn             v1.8b, v1.8h
+    st1             {v0.d}[0], [x0], x1
+    st1             {v1.d}[0], [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_16x16_neon)
+    lsl             x3, x3, #1
+    movrel          x11, xtn_xtn2_table
+    ld1             {v31.16b}, [x11]
+.rept 8
+    ld1             {v0.8h-v1.8h}, [x2], x3
+    ld1             {v2.8h-v3.8h}, [x2], x3
+    tbl             v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl             v1.16b, {v2.16b,v3.16b}, v31.16b
+    st1             {v0.16b}, [x0], x1
+    st1             {v1.16b}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_sp_32x32_neon)
+    mov             w12, #4
+    lsl             x3, x3, #1
+    movrel          x11, xtn_xtn2_table
+    ld1             {v31.16b}, [x11]
+.loop_csp32:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v0.8h-v3.8h}, [x2], x3
+    ld1             {v4.8h-v7.8h}, [x2], x3
+    tbl             v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl             v1.16b, {v2.16b,v3.16b}, v31.16b
+    tbl             v2.16b, {v4.16b,v5.16b}, v31.16b
+    tbl             v3.16b, {v6.16b,v7.16b}, v31.16b
+    st1             {v0.16b-v1.16b}, [x0], x1
+    st1             {v2.16b-v3.16b}, [x0], x1
+.endr
+    cbnz            w12, .loop_csp32
+    ret
+endfunc
+
+function PFX(blockcopy_sp_64x64_neon)
+    mov             w12, #16
+    lsl             x3, x3, #1
+    sub             x3, x3, #64
+    movrel          x11, xtn_xtn2_table
+    ld1             {v31.16b}, [x11]
+.loop_csp64:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v0.8h-v3.8h}, [x2], #64
+    ld1             {v4.8h-v7.8h}, [x2], x3
+    tbl             v0.16b, {v0.16b,v1.16b}, v31.16b
+    tbl             v1.16b, {v2.16b,v3.16b}, v31.16b
+    tbl             v2.16b, {v4.16b,v5.16b}, v31.16b
+    tbl             v3.16b, {v6.16b,v7.16b}, v31.16b
+    st1             {v0.16b-v3.16b}, [x0], x1
+.endr
+    cbnz            w12, .loop_csp64
+    ret
+endfunc
+
+// void blockcopy_ps(int16_t* a, intptr_t stridea, const pixel* b, intptr_t strideb)
+function PFX(blockcopy_ps_4x4_neon)
+    lsl             x1, x1, #1
+.rept 2
+    ld1             {v0.8b}, [x2], x3
+    ld1             {v1.8b}, [x2], x3
+    uxtl            v0.8h, v0.8b
+    uxtl            v1.8h, v1.8b
+    st1             {v0.4h}, [x0], x1
+    st1             {v1.4h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_8x8_neon)
+    lsl             x1, x1, #1
+.rept 4
+    ld1             {v0.8b}, [x2], x3
+    ld1             {v1.8b}, [x2], x3
+    uxtl            v0.8h, v0.8b
+    uxtl            v1.8h, v1.8b
+    st1             {v0.8h}, [x0], x1
+    st1             {v1.8h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_16x16_neon)
+    lsl             x1, x1, #1
+.rept 8
+    ld1             {v4.16b}, [x2], x3
+    ld1             {v5.16b}, [x2], x3
+    uxtl            v0.8h, v4.8b
+    uxtl2           v1.8h, v4.16b
+    uxtl            v2.8h, v5.8b
+    uxtl2           v3.8h, v5.16b
+    st1             {v0.8h-v1.8h}, [x0], x1
+    st1             {v2.8h-v3.8h}, [x0], x1
+.endr
+    ret
+endfunc
+
+function PFX(blockcopy_ps_32x32_neon)
+    lsl             x1, x1, #1
+    mov             w12, #4
+.loop_cps32:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v16.16b-v17.16b}, [x2], x3
+    ld1             {v18.16b-v19.16b}, [x2], x3
+    uxtl            v0.8h, v16.8b
+    uxtl2           v1.8h, v16.16b
+    uxtl            v2.8h, v17.8b
+    uxtl2           v3.8h, v17.16b
+    uxtl            v4.8h, v18.8b
+    uxtl2           v5.8h, v18.16b
+    uxtl            v6.8h, v19.8b
+    uxtl2           v7.8h, v19.16b
+    st1             {v0.8h-v3.8h}, [x0], x1
+    st1             {v4.8h-v7.8h}, [x0], x1
+.endr
+    cbnz            w12, .loop_cps32
+    ret
+endfunc
+
+function PFX(blockcopy_ps_64x64_neon)
+    lsl             x1, x1, #1
+    sub             x1, x1, #64
+    mov             w12, #16
+.loop_cps64:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v16.16b-v19.16b}, [x2], x3
+    uxtl            v0.8h, v16.8b
+    uxtl2           v1.8h, v16.16b

x265_3.5.tar.gz/source/common/aarch64/dct-prim.cpp Added

@@ -0,0 +1,947 @@
+#include "dct-prim.h"
+
+
+#if HAVE_NEON
+
+#include <arm_neon.h>
+
+
+namespace
+{
+using namespace X265_NS;
+
+
+static int16x8_t rev16(const int16x8_t a)
+{
+    static const int8x16_t tbl = {14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1};
+    return vqtbx1q_u8(a, a, tbl);
+}
+
+static int32x4_t rev32(const int32x4_t a)
+{
+    static const int8x16_t tbl = {12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3};
+    return vqtbx1q_u8(a, a, tbl);
+}
+
+static void transpose_4x4x16(int16x4_t &x0, int16x4_t &x1, int16x4_t &x2, int16x4_t &x3)
+{
+    int16x4_t s0, s1, s2, s3;
+    s0 = vtrn1_s32(x0, x2);
+    s1 = vtrn1_s32(x1, x3);
+    s2 = vtrn2_s32(x0, x2);
+    s3 = vtrn2_s32(x1, x3);
+
+    x0 = vtrn1_s16(s0, s1);
+    x1 = vtrn2_s16(s0, s1);
+    x2 = vtrn1_s16(s2, s3);
+    x3 = vtrn2_s16(s2, s3);
+}
+
+
+
+static int scanPosLast_opt(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag,
+                           uint8_t *coeffNum, int numSig, const uint16_t * /*scanCG4x4*/, const int /*trSize*/)
+{
+
+    // This is an optimized function for scanPosLast, which removes the rmw dependency, once integrated into mainline x265, should replace reference implementation
+    // For clarity, left the original reference code in comments
+    int scanPosLast = 0;
+
+    uint16_t cSign = 0;
+    uint16_t cFlag = 0;
+    uint8_t cNum = 0;
+
+    uint32_t prevcgIdx = 0;
+    do
+    {
+        const uint32_t cgIdx = (uint32_t)scanPosLast >> MLS_CG_SIZE;
+
+        const uint32_t posLast = scan[scanPosLast];
+
+        const int curCoeff = coeff[posLast];
+        const uint32_t isNZCoeff = (curCoeff != 0);
+        /*
+        NOTE: the new algorithm is complicated, so I keep reference code here
+        uint32_t posy   = posLast >> log2TrSize;
+        uint32_t posx   = posLast - (posy << log2TrSize);
+        uint32_t blkIdx0 = ((posy >> MLS_CG_LOG2_SIZE) << codingParameters.log2TrSizeCG) + (posx >> MLS_CG_LOG2_SIZE);
+        const uint32_t blkIdx = ((posLast >> (2 * MLS_CG_LOG2_SIZE)) & ~maskPosXY) + ((posLast >> MLS_CG_LOG2_SIZE) & maskPosXY);
+        sigCoeffGroupFlag64 |= ((uint64_t)isNZCoeff << blkIdx);
+        */
+
+        // get L1 sig map
+        numSig -= isNZCoeff;
+
+        if (scanPosLast % (1 << MLS_CG_SIZE) == 0)
+        {
+            coeffSign[prevcgIdx] = cSign;
+            coeffFlag[prevcgIdx] = cFlag;
+            coeffNum[prevcgIdx] = cNum;
+            cSign = 0;
+            cFlag = 0;
+            cNum = 0;
+        }
+        // TODO: optimize by instruction BTS
+        cSign += (uint16_t)(((curCoeff < 0) ? 1 : 0) << cNum);
+        cFlag = (cFlag << 1) + (uint16_t)isNZCoeff;
+        cNum += (uint8_t)isNZCoeff;
+        prevcgIdx = cgIdx;
+        scanPosLast++;
+    }
+    while (numSig > 0);
+
+    coeffSign[prevcgIdx] = cSign;
+    coeffFlag[prevcgIdx] = cFlag;
+    coeffNum[prevcgIdx] = cNum;
+    return scanPosLast - 1;
+}
+
+
+#if (MLS_CG_SIZE == 4)
+template<int log2TrSize>
+static void nonPsyRdoQuant_neon(int16_t *m_resiDctCoeff, int64_t *costUncoded, int64_t *totalUncodedCost,
+                                int64_t *totalRdCost, uint32_t blkPos)
+{
+    const int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH -
+                               log2TrSize; /* Represents scaling through forward transform */
+    const int scaleBits = SCALE_BITS - 2 * transformShift;
+    const uint32_t trSize = 1 << log2TrSize;
+
+    int64x2_t vcost_sum_0 = vdupq_n_s64(0);
+    int64x2_t vcost_sum_1 = vdupq_n_s64(0);
+    for (int y = 0; y < MLS_CG_SIZE; y++)
+    {
+        int16x4_t in = *(int16x4_t *)&m_resiDctCoeff[blkPos];
+        int32x4_t mul = vmull_s16(in, in);
+        int64x2_t cost0, cost1;
+        cost0 = vshll_n_s32(vget_low_s32(mul), scaleBits);
+        cost1 = vshll_high_n_s32(mul, scaleBits);
+        *(int64x2_t *)&costUncoded[blkPos + 0] = cost0;
+        *(int64x2_t *)&costUncoded[blkPos + 2] = cost1;
+        vcost_sum_0 = vaddq_s64(vcost_sum_0, cost0);
+        vcost_sum_1 = vaddq_s64(vcost_sum_1, cost1);
+        blkPos += trSize;
+    }
+    int64_t sum = vaddvq_s64(vaddq_s64(vcost_sum_0, vcost_sum_1));
+    *totalUncodedCost += sum;
+    *totalRdCost += sum;
+}
+
+template<int log2TrSize>
+static void psyRdoQuant_neon(int16_t *m_resiDctCoeff, int16_t *m_fencDctCoeff, int64_t *costUncoded,
+                             int64_t *totalUncodedCost, int64_t *totalRdCost, int64_t *psyScale, uint32_t blkPos)
+{
+    const int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH -
+                               log2TrSize; /* Represents scaling through forward transform */
+    const int scaleBits = SCALE_BITS - 2 * transformShift;
+    const uint32_t trSize = 1 << log2TrSize;
+    //using preprocessor to bypass clang bug
+    const int max = X265_MAX(0, (2 * transformShift + 1));
+
+    int64x2_t vcost_sum_0 = vdupq_n_s64(0);
+    int64x2_t vcost_sum_1 = vdupq_n_s64(0);
+    int32x4_t vpsy = vdupq_n_s32(*psyScale);
+    for (int y = 0; y < MLS_CG_SIZE; y++)
+    {
+        int32x4_t signCoef = vmovl_s16(*(int16x4_t *)&m_resiDctCoeff[blkPos]);
+        int32x4_t predictedCoef = vsubq_s32(vmovl_s16(*(int16x4_t *)&m_fencDctCoeff[blkPos]), signCoef);
+        int64x2_t cost0, cost1;
+        cost0 = vmull_s32(vget_low_s32(signCoef), vget_low_s32(signCoef));
+        cost1 = vmull_high_s32(signCoef, signCoef);
+        cost0 = vshlq_n_s64(cost0, scaleBits);
+        cost1 = vshlq_n_s64(cost1, scaleBits);
+        int64x2_t neg0 = vmull_s32(vget_low_s32(predictedCoef), vget_low_s32(vpsy));
+        int64x2_t neg1 = vmull_high_s32(predictedCoef, vpsy);
+        if (max > 0)
+        {
+            int64x2_t shift = vdupq_n_s64(-max);
+            neg0 = vshlq_s64(neg0, shift);
+            neg1 = vshlq_s64(neg1, shift);
+        }
+        cost0 = vsubq_s64(cost0, neg0);
+        cost1 = vsubq_s64(cost1, neg1);
+        *(int64x2_t *)&costUncoded[blkPos + 0] = cost0;
+        *(int64x2_t *)&costUncoded[blkPos + 2] = cost1;
+        vcost_sum_0 = vaddq_s64(vcost_sum_0, cost0);
+        vcost_sum_1 = vaddq_s64(vcost_sum_1, cost1);
+
+        blkPos += trSize;
+    }
+    int64_t sum = vaddvq_s64(vaddq_s64(vcost_sum_0, vcost_sum_1));
+    *totalUncodedCost += sum;
+    *totalRdCost += sum;
+}
+
+#else
+#error "MLS_CG_SIZE must be 4 for neon version"
+#endif
+
+
+
+template<int trSize>
+int  count_nonzero_neon(const int16_t *quantCoeff)
+{
+    X265_CHECK(((intptr_t)quantCoeff & 15) == 0, "quant buffer not aligned\n");
+    int count = 0;
+    int16x8_t vcount = vdupq_n_s16(0);
+    const int numCoeff = trSize * trSize;
+    int i = 0;
+    for (; (i + 8) <= numCoeff; i += 8)
+    {
+        int16x8_t in = *(int16x8_t *)&quantCoeff[i];
+        vcount = vaddq_s16(vcount, vtstq_s16(in, in));
+    }
+    for (; i < numCoeff; i++)
+    {
+        count += quantCoeff[i] != 0;
+    }
+
+    return count - vaddvq_s16(vcount);

x265_3.5.tar.gz/source/common/aarch64/dct-prim.h Added

x265_3.5.tar.gz/source/common/aarch64/filter-prim.cpp Added

@@ -0,0 +1,973 @@
+#if HAVE_NEON
+
+#include "filter-prim.h"
+#include <arm_neon.h>
+
+namespace
+{
+
+using namespace X265_NS;
+
+
+template<int width, int height>
+void filterPixelToShort_neon(const pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride)
+{
+    const int shift = IF_INTERNAL_PREC - X265_DEPTH;
+    int row, col;
+    const int16x8_t off = vdupq_n_s16(IF_INTERNAL_OFFS);
+    for (row = 0; row < height; row++)
+    {
+
+        for (col = 0; col < width; col += 8)
+        {
+            int16x8_t in;
+
+#if HIGH_BIT_DEPTH
+            in = *(int16x8_t *)&src[col];
+#else
+            in = vmovl_u8(*(uint8x8_t *)&src[col]);
+#endif
+
+            int16x8_t tmp = vshlq_n_s16(in, shift);
+            tmp = vsubq_s16(tmp, off);
+            *(int16x8_t *)&dst[col] = tmp;
+
+        }
+
+        src += srcStride;
+        dst += dstStride;
+    }
+}
+
+
+template<int N, int width, int height>
+void interp_horiz_pp_neon(const pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx)
+{
+    const int16_t *coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
+    int headRoom = IF_FILTER_PREC;
+    int offset = (1 << (headRoom - 1));
+    uint16_t maxVal = (1 << X265_DEPTH) - 1;
+    int cStride = 1;
+
+    src -= (N / 2 - 1) * cStride;
+    int16x8_t vc;
+    vc = *(int16x8_t *)coeff;
+    int16x4_t low_vc = vget_low_s16(vc);
+    int16x4_t high_vc = vget_high_s16(vc);
+
+    const int32x4_t voffset = vdupq_n_s32(offset);
+    const int32x4_t vhr = vdupq_n_s32(-headRoom);
+
+    int row, col;
+    for (row = 0; row < height; row++)
+    {
+        for (col = 0; col < width; col += 8)
+        {
+            int32x4_t vsum1, vsum2;
+
+            int16x8_t input[N];
+
+            for (int i = 0; i < N; i++)
+            {
+#if HIGH_BIT_DEPTH
+                input[i] = *(int16x8_t *)&src[col + i];
+#else
+                input[i] = vmovl_u8(*(uint8x8_t *)&src[col + i]);
+#endif
+            }
+            vsum1 = voffset;
+            vsum2 = voffset;
+
+            vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[0]), low_vc, 0);
+            vsum2 = vmlal_high_lane_s16(vsum2, input[0], low_vc, 0);
+
+            vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[1]), low_vc, 1);
+            vsum2 = vmlal_high_lane_s16(vsum2, input[1], low_vc, 1);
+
+            vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[2]), low_vc, 2);
+            vsum2 = vmlal_high_lane_s16(vsum2, input[2], low_vc, 2);
+
+            vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[3]), low_vc, 3);
+            vsum2 = vmlal_high_lane_s16(vsum2, input[3], low_vc, 3);
+
+            if (N == 8)
+            {
+                vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[4]), high_vc, 0);
+                vsum2 = vmlal_high_lane_s16(vsum2, input[4], high_vc, 0);
+                vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[5]), high_vc, 1);
+                vsum2 = vmlal_high_lane_s16(vsum2, input[5], high_vc, 1);
+                vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[6]), high_vc, 2);
+                vsum2 = vmlal_high_lane_s16(vsum2, input[6], high_vc, 2);
+                vsum1 = vmlal_lane_s16(vsum1, vget_low_s16(input[7]), high_vc, 3);
+                vsum2 = vmlal_high_lane_s16(vsum2, input[7], high_vc, 3);
+
+            }
+
+            vsum1 = vshlq_s32(vsum1, vhr);
+            vsum2 = vshlq_s32(vsum2, vhr);
+
+            int16x8_t vsum = vuzp1q_s16(vsum1, vsum2);
+            vsum = vminq_s16(vsum, vdupq_n_s16(maxVal));
+            vsum = vmaxq_s16(vsum, vdupq_n_s16(0));
+#if HIGH_BIT_DEPTH
+            *(int16x8_t *)&dst[col] = vsum;
+#else
+            uint8x16_t usum = vuzp1q_u8(vsum, vsum);
+            *(uint8x8_t *)&dst[col] = vget_low_u8(usum);
+#endif
+
+        }
+
+        src += srcStride;
+        dst += dstStride;
+    }
+}
+
+#if HIGH_BIT_DEPTH
+
+template<int N, int width, int height>
+void interp_horiz_ps_neon(const uint16_t *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx,
+                          int isRowExt)
+{
+    const int16_t *coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
+    const int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
+    const int shift = IF_FILTER_PREC - headRoom;
+    const int offset = (unsigned) - IF_INTERNAL_OFFS << shift;
+
+    int blkheight = height;
+    src -= N / 2 - 1;
+
+    if (isRowExt)
+    {
+        src -= (N / 2 - 1) * srcStride;
+        blkheight += N - 1;
+    }
+    int32x4_t vc0 = vmovl_s16(*(int16x4_t *)coeff);
+    int32x4_t vc1;
+
+    if (N == 8)
+    {
+        vc1 = vmovl_s16(*(int16x4_t *)(coeff + 4));
+    }
+
+    const int32x4_t voffset = vdupq_n_s32(offset);
+    const int32x4_t vhr = vdupq_n_s32(-shift);
+
+    int row, col;
+    for (row = 0; row < blkheight; row++)
+    {
+        for (col = 0; col < width; col += 4)
+        {
+            int32x4_t vsum;
+
+            int32x4_t input[N];
+
+            for (int i = 0; i < N; i++)
+            {
+                input[i] = vmovl_s16(*(int16x4_t *)&src[col + i]);
+            }
+            vsum = voffset;
+            vsum = vmlaq_laneq_s32(vsum, (input[0]), vc0, 0);
+            vsum = vmlaq_laneq_s32(vsum, (input[1]), vc0, 1);
+            vsum = vmlaq_laneq_s32(vsum, (input[2]), vc0, 2);
+            vsum = vmlaq_laneq_s32(vsum, (input[3]), vc0, 3);
+
+
+            if (N == 8)
+            {
+                vsum = vmlaq_laneq_s32(vsum, (input[4]), vc1, 0);
+                vsum = vmlaq_laneq_s32(vsum, (input[5]), vc1, 1);
+                vsum = vmlaq_laneq_s32(vsum, (input[6]), vc1, 2);
+                vsum = vmlaq_laneq_s32(vsum, (input[7]), vc1, 3);
+
+            }
+
+            vsum = vshlq_s32(vsum, vhr);
+            *(int16x4_t *)&dst[col] = vmovn_u32(vsum);
+        }
+
+        src += srcStride;
+        dst += dstStride;
+    }
+}
+
+
+#else
+
+template<int N, int width, int height>
+void interp_horiz_ps_neon(const uint8_t *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride, int coeffIdx,
+                          int isRowExt)
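
For comparison with the NEON `interp_horiz_pp_neon` template above, the following is a minimal scalar sketch of the same multiply-accumulate, round, shift, and clip sequence for 8-bit pixels. The function name `interp_horiz_pp_ref`, the runtime `taps`, `width`, and `height` parameters, and the hard-coded precision of 6 (assumed to match `IF_FILTER_PREC`) are illustrative assumptions, not taken from the x265 sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an N-tap horizontal pixel-to-pixel filter
// for 8-bit samples; names and fixed constants are illustrative only.
static void interp_horiz_pp_ref(const uint8_t *src, ptrdiff_t srcStride,
                                uint8_t *dst, ptrdiff_t dstStride,
                                const int16_t *coeff, int taps,
                                int width, int height)
{
    assert(taps == 4 || taps == 8);
    const int filterPrec = 6;                  // assumed value of IF_FILTER_PREC
    const int offset = 1 << (filterPrec - 1);  // rounding term
    const int maxVal = 255;                    // (1 << 8) - 1 for 8-bit pixels

    src -= taps / 2 - 1;                       // centre the filter window
    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
        {
            int sum = offset;
            for (int i = 0; i < taps; i++)
                sum += src[col + i] * coeff[i];
            // Shift back to pixel precision, then clip to the valid range.
            dst[col] = (uint8_t)std::min(std::max(sum >> filterPrec, 0), maxVal);
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

With the HEVC half-sample luma taps `{-1, 4, -11, 40, 40, -11, 4, -1}`, whose sum is 64, a flat input row passes through unchanged, which makes a convenient sanity check.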

x265_3.5.tar.gz/source/common/aarch64/filter-prim.h Added

x265_3.5.tar.gz/source/common/aarch64/fun-decls.h Added

@@ -0,0 +1,224 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#define FUNCDEF_TU(ret, name, cpu, ...) \
+    ret PFX(name ## _4x4_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _8x8_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _16x16_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _32x32_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _64x64_ ## cpu(__VA_ARGS__))
+
+#define FUNCDEF_TU_S(ret, name, cpu, ...) \
+    ret PFX(name ## _4_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _8_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _16_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _32_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## _64_ ## cpu(__VA_ARGS__))
+
+#define FUNCDEF_TU_S2(ret, name, cpu, ...) \
+    ret PFX(name ## 4_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## 8_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## 16_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## 32_ ## cpu(__VA_ARGS__)); \
+    ret PFX(name ## 64_ ## cpu(__VA_ARGS__))
+
+#define FUNCDEF_PU(ret, name, cpu, ...) \
+    ret PFX(name ## _4x4_   ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x8_   ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x64_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x4_   ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _4x8_   ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x8_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x16_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x64_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x12_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _12x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x4_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _4x16_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x24_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _24x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x8_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x32_  ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x48_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _48x64_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x64_ ## cpu)(__VA_ARGS__)
+
+#define FUNCDEF_CHROMA_PU(ret, name, cpu, ...) \
+    FUNCDEF_PU(ret, name, cpu, __VA_ARGS__); \
+    ret PFX(name ## _4x2_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _4x4_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _2x4_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x2_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _2x8_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x6_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _6x8_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x12_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _12x8_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _6x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x6_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _2x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x2_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _4x12_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _12x4_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x12_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _12x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x4_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _4x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _32x48_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _48x32_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _16x24_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _24x16_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _8x64_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x8_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _64x24_ ## cpu)(__VA_ARGS__); \
+    ret PFX(name ## _24x64_ ## cpu)(__VA_ARGS__);
+
+#define DECLS(cpu) \
+    FUNCDEF_TU(void, cpy2Dto1D_shl, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \
+    FUNCDEF_TU(void, cpy2Dto1D_shr, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \
+    FUNCDEF_TU(void, cpy1Dto2D_shl, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \
+    FUNCDEF_TU(void, cpy1Dto2D_shl_aligned, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \
+    FUNCDEF_TU(void, cpy1Dto2D_shr, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride, int shift); \
+    FUNCDEF_TU_S(uint32_t, copy_cnt, cpu, int16_t* dst, const int16_t* src, intptr_t srcStride); \
+    FUNCDEF_TU_S(int, count_nonzero, cpu, const int16_t* quantCoeff); \
+    FUNCDEF_TU(void, blockfill_s, cpu, int16_t* dst, intptr_t dstride, int16_t val); \
+    FUNCDEF_TU(void, blockfill_s_aligned, cpu, int16_t* dst, intptr_t dstride, int16_t val); \
+    FUNCDEF_CHROMA_PU(void, blockcopy_ss, cpu, int16_t* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride); \
+    FUNCDEF_CHROMA_PU(void, blockcopy_pp, cpu, pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); \
+    FUNCDEF_PU(void, blockcopy_sp, cpu, pixel* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride); \
+    FUNCDEF_PU(void, blockcopy_ps, cpu, int16_t* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); \
+    FUNCDEF_PU(void, interp_8tap_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_PU(void, interp_8tap_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \
+    FUNCDEF_PU(void, interp_8tap_vert_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_PU(void, interp_8tap_vert_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_PU(void, interp_8tap_vert_sp, cpu, const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_PU(void, interp_8tap_vert_ss, cpu, const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_PU(void, interp_8tap_hv_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int idxX, int idxY); \
+    FUNCDEF_CHROMA_PU(void, filterPixelToShort, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride); \
+    FUNCDEF_CHROMA_PU(void, filterPixelToShort_aligned, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride); \
+    FUNCDEF_CHROMA_PU(void, interp_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_horiz_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, interp_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_horiz_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_vert_pp, cpu, const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_vert_ps, cpu, const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_vert_sp, cpu, const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, interp_4tap_vert_ss, cpu, const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx); \
+    FUNCDEF_CHROMA_PU(void, addAvg, cpu, const int16_t*, const int16_t*, pixel*, intptr_t, intptr_t, intptr_t); \
+    FUNCDEF_CHROMA_PU(void, addAvg_aligned, cpu, const int16_t*, const int16_t*, pixel*, intptr_t, intptr_t, intptr_t); \
+    FUNCDEF_PU(void, pixel_avg_pp, cpu, pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); \
+    FUNCDEF_PU(void, pixel_avg_pp_aligned, cpu, pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int); \
+    FUNCDEF_PU(void, sad_x3, cpu, const pixel*, const pixel*, const pixel*, const pixel*, intptr_t, int32_t*); \
+    FUNCDEF_PU(void, sad_x4, cpu, const pixel*, const pixel*, const pixel*, const pixel*, const pixel*, intptr_t, int32_t*); \
+    FUNCDEF_CHROMA_PU(int, pixel_sad, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \
+    FUNCDEF_CHROMA_PU(sse_t, pixel_ssd_s, cpu, const int16_t*, intptr_t); \
+    FUNCDEF_CHROMA_PU(sse_t, pixel_ssd_s_aligned, cpu, const int16_t*, intptr_t); \
+    FUNCDEF_TU_S(sse_t, pixel_ssd_s, cpu, const int16_t*, intptr_t); \
+    FUNCDEF_TU_S(sse_t, pixel_ssd_s_aligned, cpu, const int16_t*, intptr_t); \
+    FUNCDEF_PU(sse_t, pixel_sse_pp, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \
+    FUNCDEF_CHROMA_PU(sse_t, pixel_sse_ss, cpu, const int16_t*, intptr_t, const int16_t*, intptr_t); \
+    FUNCDEF_PU(void, pixel_sub_ps, cpu, int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1); \
+    FUNCDEF_PU(void, pixel_add_ps, cpu, pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1); \
+    FUNCDEF_PU(void, pixel_add_ps_aligned, cpu, pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1); \
+    FUNCDEF_CHROMA_PU(int, pixel_satd, cpu, const pixel*, intptr_t, const pixel*, intptr_t); \
+    FUNCDEF_TU_S2(void, ssimDist, cpu, const pixel *fenc, uint32_t fStride, const pixel *recon, intptr_t rstride, uint64_t *ssBlock, int shift, uint64_t *ac_k); \
+    FUNCDEF_TU_S2(void, normFact, cpu, const pixel *src, uint32_t blockSize, int shift, uint64_t *z_k)
+
+DECLS(neon);
+
+void x265_pixel_planecopy_cp_neon(const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
+
+uint64_t x265_pixel_var_8x8_neon(const pixel* pix, intptr_t stride);
+uint64_t x265_pixel_var_16x16_neon(const pixel* pix, intptr_t stride);
+uint64_t x265_pixel_var_32x32_neon(const pixel* pix, intptr_t stride);
+uint64_t x265_pixel_var_64x64_neon(const pixel* pix, intptr_t stride);
+
+void x265_getResidual4_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride);
+void x265_getResidual8_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride);
+void x265_getResidual16_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride);
+void x265_getResidual32_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride);
+
+void x265_scale1D_128to64_neon(pixel *dst, const pixel *src);
+void x265_scale2D_64to32_neon(pixel* dst, const pixel* src, intptr_t stride);
+
+int x265_pixel_satd_4x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x12_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_12x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_12x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x12_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x24_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_16x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_24x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_24x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x24_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x48_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_32x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_48x64_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_64x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_64x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
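
The `FUNCDEF_*` macros above rely on preprocessor token pasting (`##`) to stamp out one declaration per block size. The following is a minimal sketch of that pattern in miniature; `DEMO_PFX`, `DEMO_FUNCDEF_TU`, and the `demo_*` functions are hypothetical stand-ins and do not exist in x265.

```cpp
#include <cassert>

// Hypothetical miniature of the FUNCDEF_* pattern; DEMO_PFX and the
// demo_* names are illustrative and do not exist in x265.
#define DEMO_PFX(name) demo_ ## name

// Pastes name, block size, and cpu suffix into one identifier, then
// lets DEMO_PFX prepend the namespace-like prefix.
#define DEMO_FUNCDEF_TU(ret, name, cpu, ...) \
    ret DEMO_PFX(name ## _4x4_ ## cpu)(__VA_ARGS__); \
    ret DEMO_PFX(name ## _8x8_ ## cpu)(__VA_ARGS__)

// Expands to declarations of demo_blockfill_s_4x4_neon and
// demo_blockfill_s_8x8_neon, each taking (int val).
DEMO_FUNCDEF_TU(int, blockfill_s, neon, int val);

// Definitions matching the generated declarations; they return the
// value multiplied by the number of samples in the block.
int demo_blockfill_s_4x4_neon(int val) { return val * 16; }
int demo_blockfill_s_8x8_neon(int val) { return val * 64; }
```

Keeping the size list in one macro means that adding a new primitive takes a single `FUNCDEF_*` line rather than one prototype per block size.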

x265_3.5.tar.gz/source/common/aarch64/intrapred-prim.cpp Added

@@ -0,0 +1,264 @@
+#include "common.h"
+#include "primitives.h"
+
+
+#if 1
+#include "arm64-utils.h"
+#include <arm_neon.h>
+
+using namespace X265_NS;
+
+namespace
+{
+
+
+
+template<int width>
+void intra_pred_ang_neon(pixel *dst, intptr_t dstStride, const pixel *srcPix0, int dirMode, int bFilter)
+{
+    int width2 = width << 1;
+    // Flip the neighbours in the horizontal case.
+    int horMode = dirMode < 18;
+    pixel neighbourBuf[129];
+    const pixel *srcPix = srcPix0;
+
+    if (horMode)
+    {
+        neighbourBuf[0] = srcPix[0];
+        //for (int i = 0; i < width << 1; i++)
+        //{
+        //    neighbourBuf[1 + i] = srcPix[width2 + 1 + i];
+        //    neighbourBuf[width2 + 1 + i] = srcPix[1 + i];
+        //}
+        memcpy(&neighbourBuf[1], &srcPix[width2 + 1], sizeof(pixel) * (width << 1));
+        memcpy(&neighbourBuf[width2 + 1], &srcPix[1], sizeof(pixel) * (width << 1));
+        srcPix = neighbourBuf;
+    }
+
+    // Intra prediction angle and inverse angle tables.
+    const int8_t angleTable[17] = { -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32 };
+    const int16_t invAngleTable[8] = { 4096, 1638, 910, 630, 482, 390, 315, 256 };
+
+    // Get the prediction angle.
+    int angleOffset = horMode ? 10 - dirMode : dirMode - 26;
+    int angle = angleTable[8 + angleOffset];
+
+    // Vertical Prediction.
+    if (!angle)
+    {
+        for (int y = 0; y < width; y++)
+        {
+            memcpy(&dst[y * dstStride], srcPix + 1, sizeof(pixel)*width);
+        }
+        if (bFilter)
+        {
+            int topLeft = srcPix[0], top = srcPix[1];
+            for (int y = 0; y < width; y++)
+            {
+                dst[y * dstStride] = x265_clip((int16_t)(top + ((srcPix[width2 + 1 + y] - topLeft) >> 1)));
+            }
+        }
+    }
+    else // Angular prediction.
+    {
+        // Get the reference pixels. The reference base is the first pixel to the top (neighbourBuf[1]).
+        pixel refBuf[64];
+        const pixel *ref;
+
+        // Use the projected left neighbours and the top neighbours.
+        if (angle < 0)
+        {
+            // Number of neighbours projected.
+            int nbProjected = -((width * angle) >> 5) - 1;
+            pixel *ref_pix = refBuf + nbProjected + 1;
+
+            // Project the neighbours.
+            int invAngle = invAngleTable[- angleOffset - 1];
+            int invAngleSum = 128;
+            for (int i = 0; i < nbProjected; i++)
+            {
+                invAngleSum += invAngle;
+                ref_pix[- 2 - i] = srcPix[width2 + (invAngleSum >> 8)];
+            }
+
+            // Copy the top-left and top pixels.
+            //for (int i = 0; i < width + 1; i++)
+            //ref_pix[-1 + i] = srcPix[i];
+
+            memcpy(&ref_pix[-1], srcPix, (width + 1)*sizeof(pixel));
+            ref = ref_pix;
+        }
+        else // Use the top and top-right neighbours.
+        {
+            ref = srcPix + 1;
+        }
+
+        // Pass every row.
+        int angleSum = 0;
+        for (int y = 0; y < width; y++)
+        {
+            angleSum += angle;
+            int offset = angleSum >> 5;
+            int fraction = angleSum & 31;
+
+            if (fraction) // Interpolate
+            {
+                if (width >= 8 && sizeof(pixel) == 1)
+                {
+                    const int16x8_t f0 = vdupq_n_s16(32 - fraction);
+                    const int16x8_t f1 = vdupq_n_s16(fraction);
+                    for (int x = 0; x < width; x += 8)
+                    {
+                        uint8x8_t in0 = *(uint8x8_t *)&ref[offset + x];
+                        uint8x8_t in1 = *(uint8x8_t *)&ref[offset + x + 1];
+                        int16x8_t lo = vmlaq_s16(vdupq_n_s16(16), vmovl_u8(in0), f0);
+                        lo = vmlaq_s16(lo, vmovl_u8(in1), f1);
+                        lo = vshrq_n_s16(lo, 5);
+                        *(uint8x8_t *)&dst[y * dstStride + x] = vmovn_u16(lo);
+                    }
+                }
+                else if (width >= 4 && sizeof(pixel) == 2)
+                {
+                    const int32x4_t f0 = vdupq_n_s32(32 - fraction);
+                    const int32x4_t f1 = vdupq_n_s32(fraction);
+                    for (int x = 0; x < width; x += 4)
+                    {
+                        uint16x4_t in0 = *(uint16x4_t *)&ref[offset + x];
+                        uint16x4_t in1 = *(uint16x4_t *)&ref[offset + x + 1];
+                        int32x4_t lo = vmlaq_s32(vdupq_n_s32(16), vmovl_u16(in0), f0);
+                        lo = vmlaq_s32(lo, vmovl_u16(in1), f1);
+                        lo = vshrq_n_s32(lo, 5);
+                        *(uint16x4_t *)&dst[y * dstStride + x] = vmovn_u32(lo);
+                    }
+                }
+                else
+                {
+                    for (int x = 0; x < width; x++)
+                    {
+                        dst[y * dstStride + x] = (pixel)(((32 - fraction) * ref[offset + x] + fraction * ref[offset + x + 1] + 16) >> 5);
+                    }
+                }
+            }
+            else // Copy.
+            {
+                memcpy(&dst[y * dstStride], &ref[offset], sizeof(pixel)*width);
+            }
+        }
+    }
+
+    // Flip for horizontal.
+    if (horMode)
+    {
+        if (width == 8)
+        {
+            transpose8x8(dst, dst, dstStride, dstStride);
+        }
+        else if (width == 16)
+        {
+            transpose16x16(dst, dst, dstStride, dstStride);
+        }
+        else if (width == 32)
+        {
+            transpose32x32(dst, dst, dstStride, dstStride);
+        }
+        else
+        {
+            for (int y = 0; y < width - 1; y++)
+            {
+                for (int x = y + 1; x < width; x++)
+                {
+                    pixel tmp              = dst[y * dstStride + x];
+                    dst[y * dstStride + x] = dst[x * dstStride + y];
+                    dst[x * dstStride + y] = tmp;
+                }
+            }
+        }
+    }
+}
+
+template<int log2Size>
+void all_angs_pred_neon(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma)
+{
+    const int size = 1 << log2Size;
+    for (int mode = 2; mode <= 34; mode++)
+    {
+        pixel *srcPix  = (g_intraFilterFlags[mode] & size ? filtPix  : refPix);
+        pixel *out = dest + ((mode - 2) << (log2Size * 2));
+
+        intra_pred_ang_neon<size>(out, size, srcPix, mode, bLuma);
+
+        // Optimize code don't flip buffer
+        bool modeHor = (mode < 18);
+
+        // transpose the block if this is a horizontal mode
+        if (modeHor)
+        {
+            if (size == 8)
+            {
+                transpose8x8(out, out, size, size);
+            }
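
The per-row work in `intra_pred_ang_neon` above splits the accumulated angle into a whole-sample offset (`angleSum >> 5`) and a 1/32-sample fraction (`angleSum & 31`), then blends two neighbouring reference samples. The following is a minimal scalar sketch of one such row for 8-bit pixels; `intra_ang_row_ref` is a hypothetical helper, not an x265 function, and it assumes `ref` has at least `offset + width + 1` readable samples.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar helper; mirrors the two-tap blend used in the
// fallback loop of the angular predictor. Not part of x265.
static void intra_ang_row_ref(uint8_t *dstRow, const uint8_t *ref,
                              int angleSum, int width)
{
    assert(dstRow != nullptr && ref != nullptr && width > 0);
    const int offset = angleSum >> 5;      // whole-sample displacement
    const int fraction = angleSum & 31;    // 1/32-sample remainder
    for (int x = 0; x < width; x++)
    {
        const int a = ref[offset + x];
        const int b = ref[offset + x + 1];
        // Weighted average with rounding; the weights sum to 32.
        dstRow[x] = (uint8_t)(((32 - fraction) * a + fraction * b + 16) >> 5);
    }
}
```

When `fraction` is zero the blend reduces to a plain copy of `ref[offset + x]`, which is why the code above takes the `memcpy` path in that case.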

x265_3.5.tar.gz/source/common/aarch64/intrapred-prim.h Added

x265_3.5.tar.gz/source/common/aarch64/ipfilter.S Added

@@ -0,0 +1,2452 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+// Functions in this file:
+// ***** luma_vpp *****
+// ***** luma_vps *****
+// ***** luma_vsp *****
+// ***** luma_vss *****
+// ***** luma_hpp *****
+// ***** luma_hps *****
+// ***** chroma_vpp *****
+// ***** chroma_vps *****
+// ***** chroma_vsp *****
+// ***** chroma_vss *****
+// ***** chroma_hpp *****
+// ***** chroma_hps *****
+
+#include "asm.S"
+
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
+.section .rodata
+#endif
+
+.align 4
+
+.text
+
+// Macros below follow these conventions:
+// - input data in registers: v0, v1, v2, v3, v4, v5, v6, v7
+// - constants in registers: v24, v25, v26, v27, v31
+// - temporary registers: v16, v17, v18, v19, v20, v21, v22, v23, v28, v29, v30.
+// - _32b macros output a result in v17.4s
+// - _64b and _32b_1 macros output results in v17.4s, v18.4s
+
+.macro vextin8 v
+    ldp             d6, d7, [x11], #16
+.if \v == 0
+    // qpel_filter_0 only uses values in v3
+    ext             v3.8b, v6.8b, v7.8b, #4
+.else
+.if \v != 3
+    ext             v0.8b, v6.8b, v7.8b, #1
+.endif
+    ext             v1.8b, v6.8b, v7.8b, #2
+    ext             v2.8b, v6.8b, v7.8b, #3
+    ext             v3.8b, v6.8b, v7.8b, #4
+    ext             v4.8b, v6.8b, v7.8b, #5
+    ext             v5.8b, v6.8b, v7.8b, #6
+    ext             v6.8b, v6.8b, v7.8b, #7
+.endif
+.endm
+
+.macro vextin8_64 v
+    ldp             q6, q7, [x11], #32
+.if \v == 0
+    // qpel_filter_0 only uses values in v3
+    ext             v3.16b, v6.16b, v7.16b, #4
+.else
+.if \v != 3
+    // qpel_filter_3 does not use values in v0
+    ext             v0.16b, v6.16b, v7.16b, #1
+.endif
+    ext             v1.16b, v6.16b, v7.16b, #2
+    ext             v2.16b, v6.16b, v7.16b, #3
+    ext             v3.16b, v6.16b, v7.16b, #4
+    ext             v4.16b, v6.16b, v7.16b, #5
+    ext             v5.16b, v6.16b, v7.16b, #6
+.if \v == 1
+    ext             v6.16b, v6.16b, v7.16b, #7
+    // qpel_filter_1 does not use v7
+.else
+    ext             v16.16b, v6.16b, v7.16b, #7
+    ext             v7.16b, v6.16b, v7.16b, #8
+    mov             v6.16b, v16.16b
+.endif
+.endif
+.endm
+
+.macro vextin8_chroma v
+    ldp             d6, d7, [x11], #16
+.if \v == 0
+    // qpel_filter_chroma_0 only uses values in v1
+    ext             v1.8b, v6.8b, v7.8b, #2
+.else
+    ext             v0.8b, v6.8b, v7.8b, #1
+    ext             v1.8b, v6.8b, v7.8b, #2
+    ext             v2.8b, v6.8b, v7.8b, #3
+    ext             v3.8b, v6.8b, v7.8b, #4
+.endif
+.endm
+
+.macro vextin8_chroma_64 v
+    ldp             q16, q17, [x11], #32
+.if \v == 0
+    // qpel_filter_chroma_0 only uses values in v1
+    ext             v1.16b, v16.16b, v17.16b, #2
+.else
+    ext             v0.16b, v16.16b, v17.16b, #1
+    ext             v1.16b, v16.16b, v17.16b, #2
+    ext             v2.16b, v16.16b, v17.16b, #3
+    ext             v3.16b, v16.16b, v17.16b, #4
+.endif
+.endm
+
+.macro qpel_load_32b v
+.if \v == 0
+    add             x6, x6, x11       // do not load 3 values that are not used in qpel_filter_0
+    ld1             {v3.8b}, [x6], x1
+.elseif \v == 1 || \v == 2 || \v == 3
+.if \v != 3                           // not used in qpel_filter_3
+    ld1             {v0.8b}, [x6], x1
+.else
+    add             x6, x6, x1
+.endif
+    ld1             {v1.8b}, [x6], x1
+    ld1             {v2.8b}, [x6], x1
+    ld1             {v3.8b}, [x6], x1
+    ld1             {v4.8b}, [x6], x1
+    ld1             {v5.8b}, [x6], x1
+.if \v != 1                           // not used in qpel_filter_1
+    ld1             {v6.8b}, [x6], x1
+    ld1             {v7.8b}, [x6]
+.else
+    ld1             {v6.8b}, [x6]
+.endif
+.endif
+.endm
+
+.macro qpel_load_64b v
+.if \v == 0
+    add             x6, x6, x11       // do not load 3 values that are not used in qpel_filter_0
+    ld1             {v3.16b}, [x6], x1
+.elseif \v == 1 || \v == 2 || \v == 3
+.if \v != 3                           // not used in qpel_filter_3
+    ld1             {v0.16b}, [x6], x1
+.else
+    add             x6, x6, x1
+.endif
+    ld1             {v1.16b}, [x6], x1
+    ld1             {v2.16b}, [x6], x1
+    ld1             {v3.16b}, [x6], x1
+    ld1             {v4.16b}, [x6], x1
+    ld1             {v5.16b}, [x6], x1
+.if \v != 1                           // not used in qpel_filter_1
+    ld1             {v6.16b}, [x6], x1
+    ld1             {v7.16b}, [x6]
+.else
+    ld1             {v6.16b}, [x6]
+.endif
+.endif
+.endm
+
+.macro qpel_chroma_load_32b v
+.if \v == 0
+    // qpel_filter_chroma_0 only uses values in v1
+    add             x6, x6, x1
+    ldr             d1, [x6]
+.else
+    ld1             {v0.8b}, [x6], x1
+    ld1             {v1.8b}, [x6], x1
+    ld1             {v2.8b}, [x6], x1
+    ld1             {v3.8b}, [x6]
+.endif
+.endm
+
+.macro qpel_chroma_load_64b v
+.if \v == 0
+    // qpel_filter_chroma_0 only uses values in v1
+    add             x6, x6, x1
+    ldr             q1, [x6]
+.else
+    ld1             {v0.16b}, [x6], x1
+    ld1             {v1.16b}, [x6], x1
+    ld1             {v2.16b}, [x6], x1
+    ld1             {v3.16b}, [x6]
+.endif
+.endm

x265_3.5.tar.gz/source/common/aarch64/ipfilter8.S Deleted

@@ -1,414 +0,0 @@
-/*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
- *
- * Authors: Yimeng Su <yimeng.su@huawei.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
- *
- * This program is also available under a commercial proprietary license.
- * For more information, contact us at license @ x265.com.
- *****************************************************************************/
-
-#include "asm.S"
-
-.section .rodata
-
-.align 4
-
-.text
-
-
-
-.macro qpel_filter_0_32b
-    movi            v24.8h, #64
-    uxtl            v19.8h, v5.8b
-    smull           v17.4s, v19.4h, v24.4h
-    smull2          v18.4s, v19.8h, v24.8h
-.endm
-
-.macro qpel_filter_1_32b
-    movi            v16.8h, #58
-    uxtl            v19.8h, v5.8b
-    smull           v17.4s, v19.4h, v16.4h
-    smull2          v18.4s, v19.8h, v16.8h
-
-    movi            v24.8h, #10
-    uxtl            v21.8h, v1.8b
-    smull           v19.4s, v21.4h, v24.4h
-    smull2          v20.4s, v21.8h, v24.8h
-
-    movi            v16.8h, #17
-    uxtl            v23.8h, v2.8b
-    smull           v21.4s, v23.4h, v16.4h
-    smull2          v22.4s, v23.8h, v16.8h
-
-    movi            v24.8h, #5
-    uxtl            v1.8h, v6.8b
-    smull           v23.4s, v1.4h, v24.4h
-    smull2          v16.4s, v1.8h, v24.8h
-
-    sub             v17.4s, v17.4s, v19.4s
-    sub             v18.4s, v18.4s, v20.4s
-
-    uxtl            v1.8h, v4.8b
-    sshll           v19.4s, v1.4h, #2
-    sshll2          v20.4s, v1.8h, #2
-
-    add             v17.4s, v17.4s, v21.4s
-    add             v18.4s, v18.4s, v22.4s
-
-    uxtl            v1.8h, v0.8b
-    uxtl            v2.8h, v3.8b
-    ssubl           v21.4s, v2.4h, v1.4h
-    ssubl2          v22.4s, v2.8h, v1.8h
-
-    add             v17.4s, v17.4s, v19.4s
-    add             v18.4s, v18.4s, v20.4s
-    sub             v21.4s, v21.4s, v23.4s
-    sub             v22.4s, v22.4s, v16.4s
-    add             v17.4s, v17.4s, v21.4s
-    add             v18.4s, v18.4s, v22.4s
-.endm
-
-.macro qpel_filter_2_32b
-    movi            v16.4s, #11
-    uxtl            v19.8h, v5.8b
-    uxtl            v20.8h, v2.8b
-    saddl           v17.4s, v19.4h, v20.4h
-    saddl2          v18.4s, v19.8h, v20.8h
-
-    uxtl            v21.8h, v1.8b
-    uxtl            v22.8h, v6.8b
-    saddl           v19.4s, v21.4h, v22.4h
-    saddl2          v20.4s, v21.8h, v22.8h
-
-    mul             v19.4s, v19.4s, v16.4s
-    mul             v20.4s, v20.4s, v16.4s
-
-    movi            v16.4s, #40
-    mul             v17.4s, v17.4s, v16.4s
-    mul             v18.4s, v18.4s, v16.4s
-
-    uxtl            v21.8h, v4.8b
-    uxtl            v22.8h, v3.8b
-    saddl           v23.4s, v21.4h, v22.4h
-    saddl2          v16.4s, v21.8h, v22.8h
-
-    uxtl            v1.8h, v0.8b
-    uxtl            v2.8h, v7.8b
-    saddl           v21.4s, v1.4h, v2.4h
-    saddl2          v22.4s, v1.8h, v2.8h
-
-    shl             v23.4s, v23.4s, #2
-    shl             v16.4s, v16.4s, #2
-
-    add             v19.4s, v19.4s, v21.4s
-    add             v20.4s, v20.4s, v22.4s
-    add             v17.4s, v17.4s, v23.4s
-    add             v18.4s, v18.4s, v16.4s
-    sub             v17.4s, v17.4s, v19.4s
-    sub             v18.4s, v18.4s, v20.4s
-.endm
-
-.macro qpel_filter_3_32b
-    movi            v16.8h, #17
-    movi            v24.8h, #5
-
-    uxtl            v19.8h, v5.8b
-    smull           v17.4s, v19.4h, v16.4h
-    smull2          v18.4s, v19.8h, v16.8h
-
-    uxtl            v21.8h, v1.8b
-    smull           v19.4s, v21.4h, v24.4h
-    smull2          v20.4s, v21.8h, v24.8h
-
-    movi            v16.8h, #58
-    uxtl            v23.8h, v2.8b
-    smull           v21.4s, v23.4h, v16.4h
-    smull2          v22.4s, v23.8h, v16.8h
-
-    movi            v24.8h, #10
-    uxtl            v1.8h, v6.8b
-    smull           v23.4s, v1.4h, v24.4h
-    smull2          v16.4s, v1.8h, v24.8h
-
-    sub             v17.4s, v17.4s, v19.4s
-    sub             v18.4s, v18.4s, v20.4s
-
-    uxtl            v1.8h, v3.8b
-    sshll           v19.4s, v1.4h, #2
-    sshll2          v20.4s, v1.8h, #2
-
-    add             v17.4s, v17.4s, v21.4s
-    add             v18.4s, v18.4s, v22.4s
-
-    uxtl            v1.8h, v4.8b
-    uxtl            v2.8h, v7.8b
-    ssubl           v21.4s, v1.4h, v2.4h
-    ssubl2          v22.4s, v1.8h, v2.8h
-
-    add             v17.4s, v17.4s, v19.4s
-    add             v18.4s, v18.4s, v20.4s
-    sub             v21.4s, v21.4s, v23.4s
-    sub             v22.4s, v22.4s, v16.4s
-    add             v17.4s, v17.4s, v21.4s
-    add             v18.4s, v18.4s, v22.4s
-.endm
-
-
-
-
-.macro vextin8
-    ld1             {v3.16b}, [x11], #16
-    mov             v7.d[0], v3.d[1]
-    ext             v0.8b, v3.8b, v7.8b, #1
-    ext             v4.8b, v3.8b, v7.8b, #2
-    ext             v1.8b, v3.8b, v7.8b, #3
-    ext             v5.8b, v3.8b, v7.8b, #4
-    ext             v2.8b, v3.8b, v7.8b, #5
-    ext             v6.8b, v3.8b, v7.8b, #6
-    ext             v3.8b, v3.8b, v7.8b, #7
-.endm
-
-
-
-// void interp_horiz_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt)
-.macro HPS_FILTER a b filterhps
-    mov             w12, #8192
-    mov             w6, w10
-    sub             x3, x3, #\a
-    lsl             x3, x3, #1
-    mov             w9, #\a
-    cmp             w9, #4
-    b.eq            14f
-    cmp             w9, #12
-    b.eq            15f
-    b               7f
-14:

x265_3.5.tar.gz/source/common/aarch64/ipfilter8.h Deleted

@@ -1,55 +0,0 @@
-/*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
- *
- * Authors: Yimeng Su <yimeng.su@huawei.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
- *
- * This program is also available under a commercial proprietary license.
- * For more information, contact us at license @ x265.com.
- *****************************************************************************/
-
-#ifndef X265_IPFILTER8_AARCH64_H
-#define X265_IPFILTER8_AARCH64_H
-
-
-void x265_interp_8tap_horiz_ps_4x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_4x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_4x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_8x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_8x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_8x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_8x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_12x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x12_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_16x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_24x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_32x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_32x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_32x24_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_32x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_32x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_48x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_64x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_64x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_64x48_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-void x265_interp_8tap_horiz_ps_64x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
-
-
-#endif // ifndef X265_IPFILTER8_AARCH64_H

x265_3.5.tar.gz/source/common/aarch64/loopfilter-prim.cpp Added

@@ -0,0 +1,290 @@
+#include "loopfilter-prim.h"
+
+#define PIXEL_MIN 0
+
+
+
+#if !(HIGH_BIT_DEPTH) && defined(HAVE_NEON)
+#include<arm_neon.h>
+
+namespace
+{
+
+
+/* get the sign of input variable (TODO: this is a dup, make common) */
+static inline int8_t signOf(int x)
+{
+    return (x >> 31) | ((int)((((uint32_t) - x)) >> 31));
+}
+
+static inline int8x8_t sign_diff_neon(const uint8x8_t in0, const uint8x8_t in1)
+{
+    int16x8_t in = vsubl_u8(in0, in1);
+    return vmovn_s16(vmaxq_s16(vminq_s16(in, vdupq_n_s16(1)), vdupq_n_s16(-1)));
+}
+
+static void calSign_neon(int8_t *dst, const pixel *src1, const pixel *src2, const int endX)
+{
+    int x = 0;
+    for (; (x + 8) <= endX; x += 8)
+    {
+        *(int8x8_t *)&dst[x]  = sign_diff_neon(*(uint8x8_t *)&src1[x], *(uint8x8_t *)&src2[x]);
+    }
+
+    for (; x < endX; x++)
+    {
+        dst[x] = signOf(src1[x] - src2[x]);
+    }
+}
+
+static void processSaoCUE0_neon(pixel *rec, int8_t *offsetEo, int width, int8_t *signLeft, intptr_t stride)
+{
+
+
+    int y;
+    int8_t signRight, signLeft0;
+    int8_t edgeType;
+
+    for (y = 0; y < 2; y++)
+    {
+        signLeft0 = signLeft[y];
+        int x = 0;
+
+        if (width >= 8)
+        {
+            int8x8_t vsignRight;
+            int8x8x2_t shifter;
+            shifter.val[1][0] = signLeft0;
+            static const int8x8_t index = {8, 0, 1, 2, 3, 4, 5, 6};
+            int8x8_t tbl = *(int8x8_t *)offsetEo;
+            for (; (x + 8) <= width; x += 8)
+            {
+                uint8x8_t in = *(uint8x8_t *)&rec[x];
+                vsignRight = sign_diff_neon(in, *(uint8x8_t *)&rec[x + 1]);
+                shifter.val[0] = vneg_s8(vsignRight);
+                int8x8_t tmp = shifter.val[0];
+                int8x8_t edge = vtbl2_s8(shifter, index);
+                int8x8_t vedgeType = vadd_s8(vadd_s8(vsignRight, edge), vdup_n_s8(2));
+                shifter.val[1][0] = tmp[7];
+                int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType));
+                t1 = vaddw_u8(t1, in);
+                t1 = vmaxq_s16(t1, vdupq_n_s16(0));
+                t1 = vminq_s16(t1, vdupq_n_s16(255));
+                *(uint8x8_t *)&rec[x] = vmovn_u16(t1);
+            }
+            signLeft0 = shifter.val[1][0];
+        }
+        for (; x < width; x++)
+        {
+            signRight = ((rec[x] - rec[x + 1]) < 0) ? -1 : ((rec[x] - rec[x + 1]) > 0) ? 1 : 0;
+            edgeType = signRight + signLeft0 + 2;
+            signLeft0 = -signRight;
+            rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+        }
+        rec += stride;
+    }
+}
+
+static void processSaoCUE1_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int width)
+{
+    int x = 0;
+    int8_t signDown;
+    int edgeType;
+
+    if (width >= 8)
+    {
+        int8x8_t tbl = *(int8x8_t *)offsetEo;
+        for (; (x + 8) <= width; x += 8)
+        {
+            uint8x8_t in0 = *(uint8x8_t *)&rec[x];
+            uint8x8_t in1 = *(uint8x8_t *)&rec[x + stride];
+            int8x8_t vsignDown = sign_diff_neon(in0, in1);
+            int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&upBuff1[x]), vdup_n_s8(2));
+            *(int8x8_t *)&upBuff1[x] = vneg_s8(vsignDown);
+            int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType));
+            t1 = vaddw_u8(t1, in0);
+            *(uint8x8_t *)&rec[x] = vqmovun_s16(t1);
+        }
+    }
+    for (; x < width; x++)
+    {
+        signDown = signOf(rec[x] - rec[x + stride]);
+        edgeType = signDown + upBuff1[x] + 2;
+        upBuff1[x] = -signDown;
+        rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+    }
+}
+
+static void processSaoCUE1_2Rows_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int width)
+{
+    int y;
+    int8_t signDown;
+    int edgeType;
+
+    for (y = 0; y < 2; y++)
+    {
+        int x = 0;
+        if (width >= 8)
+        {
+            int8x8_t tbl = *(int8x8_t *)offsetEo;
+            for (; (x + 8) <= width; x += 8)
+            {
+                uint8x8_t in0 = *(uint8x8_t *)&rec[x];
+                uint8x8_t in1 = *(uint8x8_t *)&rec[x + stride];
+                int8x8_t vsignDown = sign_diff_neon(in0, in1);
+                int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&upBuff1[x]), vdup_n_s8(2));
+                *(int8x8_t *)&upBuff1[x] = vneg_s8(vsignDown);
+                int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType));
+                t1 = vaddw_u8(t1, in0);
+                t1 = vmaxq_s16(t1, vdupq_n_s16(0));
+                t1 = vminq_s16(t1, vdupq_n_s16(255));
+                *(uint8x8_t *)&rec[x] = vmovn_u16(t1);
+
+            }
+        }
+        for (; x < width; x++)
+        {
+            signDown = signOf(rec[x] - rec[x + stride]);
+            edgeType = signDown + upBuff1[x] + 2;
+            upBuff1[x] = -signDown;
+            rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+        }
+        rec += stride;
+    }
+}
+
+static void processSaoCUE2_neon(pixel *rec, int8_t *bufft, int8_t *buff1, int8_t *offsetEo, int width, intptr_t stride)
+{
+    int x;
+
+    if (abs(buff1 - bufft) < 16)
+    {
+        for (x = 0; x < width; x++)
+        {
+            int8_t signDown = signOf(rec[x] - rec[x + stride + 1]);
+            int edgeType = signDown + buff1[x] + 2;
+            bufft[x + 1] = -signDown;
+            rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);;
+        }
+    }
+    else
+    {
+        int8x8_t tbl = *(int8x8_t *)offsetEo;
+        x = 0;
+        for (; (x + 8) <= width; x += 8)
+        {
+            uint8x8_t in0 = *(uint8x8_t *)&rec[x];
+            uint8x8_t in1 = *(uint8x8_t *)&rec[x + stride + 1];
+            int8x8_t vsignDown = sign_diff_neon(in0, in1);
+            int8x8_t vedgeType = vadd_s8(vadd_s8(vsignDown, *(int8x8_t *)&buff1[x]), vdup_n_s8(2));
+            *(int8x8_t *)&bufft[x + 1] = vneg_s8(vsignDown);
+            int16x8_t t1 = vmovl_s8(vtbl1_s8(tbl, vedgeType));
+            t1 = vaddw_u8(t1, in0);
+            t1 = vmaxq_s16(t1, vdupq_n_s16(0));
+            t1 = vminq_s16(t1, vdupq_n_s16(255));
+            *(uint8x8_t *)&rec[x] = vmovn_u16(t1);
+        }
+        for (; x < width; x++)
+        {
+            int8_t signDown = signOf(rec[x] - rec[x + stride + 1]);
+            int edgeType = signDown + buff1[x] + 2;
+            bufft[x + 1] = -signDown;
+            rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);;
+        }
+
+    }
+}
+
+
+static void processSaoCUE3_neon(pixel *rec, int8_t *upBuff1, int8_t *offsetEo, intptr_t stride, int startX, int endX)

x265_3.5.tar.gz/source/common/aarch64/loopfilter-prim.h Added

x265_3.5.tar.gz/source/common/aarch64/mc-a.S Changed

@@ -1,7 +1,8 @@
 /*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
+ * Copyright (C) 2020-2021 MulticoreWare, Inc
  *
  * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *          Sebastian Pop <spop@amazon.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -23,14 +24,18 @@
 
 #include "asm.S"
 
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
 .section .rodata
+#endif
 
 .align 4
 
 .text
 
 .macro pixel_avg_pp_4xN_neon h
-function x265_pixel_avg_pp_4x\h\()_neon
+function PFX(pixel_avg_pp_4x\h\()_neon)
 .rept \h
     ld1             {v0.s}[0], [x2], x3
     ld1             {v1.s}[0], [x4], x5
@@ -46,7 +51,7 @@
 pixel_avg_pp_4xN_neon 16
 
 .macro pixel_avg_pp_8xN_neon h
-function x265_pixel_avg_pp_8x\h\()_neon
+function PFX(pixel_avg_pp_8x\h\()_neon)
 .rept \h
     ld1             {v0.8b}, [x2], x3
     ld1             {v1.8b}, [x4], x5
@@ -61,3 +66,506 @@
 pixel_avg_pp_8xN_neon 8
 pixel_avg_pp_8xN_neon 16
 pixel_avg_pp_8xN_neon 32
+
+function PFX(pixel_avg_pp_12x16_neon)
+    sub             x1, x1, #4
+    sub             x3, x3, #4
+    sub             x5, x5, #4
+.rept 16
+    ld1             {v0.s}[0], [x2], #4
+    ld1             {v1.8b}, [x2], x3
+    ld1             {v2.s}[0], [x4], #4
+    ld1             {v3.8b}, [x4], x5
+    urhadd          v4.8b, v0.8b, v2.8b
+    urhadd          v5.8b, v1.8b, v3.8b
+    st1             {v4.s}[0], [x0], #4
+    st1             {v5.8b}, [x0], x1
+.endr
+    ret
+endfunc
+
+.macro pixel_avg_pp_16xN_neon h
+function PFX(pixel_avg_pp_16x\h\()_neon)
+.rept \h
+    ld1             {v0.16b}, [x2], x3
+    ld1             {v1.16b}, [x4], x5
+    urhadd          v2.16b, v0.16b, v1.16b
+    st1             {v2.16b}, [x0], x1
+.endr
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_16xN_neon 4
+pixel_avg_pp_16xN_neon 8
+pixel_avg_pp_16xN_neon 12
+pixel_avg_pp_16xN_neon 16
+pixel_avg_pp_16xN_neon 32
+
+function PFX(pixel_avg_pp_16x64_neon)
+    mov             w12, #8
+.lpavg_16x64:
+    sub             w12, w12, #1
+.rept 8
+    ld1             {v0.16b}, [x2], x3
+    ld1             {v1.16b}, [x4], x5
+    urhadd          v2.16b, v0.16b, v1.16b
+    st1             {v2.16b}, [x0], x1
+.endr
+    cbnz            w12, .lpavg_16x64
+    ret
+endfunc
+
+function PFX(pixel_avg_pp_24x32_neon)
+    sub             x1, x1, #16
+    sub             x3, x3, #16
+    sub             x5, x5, #16
+    mov             w12, #4
+.lpavg_24x32:
+    sub             w12, w12, #1
+.rept 8
+    ld1             {v0.16b}, [x2], #16
+    ld1             {v1.8b}, [x2], x3
+    ld1             {v2.16b}, [x4], #16
+    ld1             {v3.8b}, [x4], x5
+    urhadd          v0.16b, v0.16b, v2.16b
+    urhadd          v1.8b, v1.8b, v3.8b
+    st1             {v0.16b}, [x0], #16
+    st1             {v1.8b}, [x0], x1
+.endr
+    cbnz            w12, .lpavg_24x32
+    ret
+endfunc
+
+.macro pixel_avg_pp_32xN_neon h
+function PFX(pixel_avg_pp_32x\h\()_neon)
+.rept \h
+    ld1             {v0.16b-v1.16b}, [x2], x3
+    ld1             {v2.16b-v3.16b}, [x4], x5
+    urhadd          v0.16b, v0.16b, v2.16b
+    urhadd          v1.16b, v1.16b, v3.16b
+    st1             {v0.16b-v1.16b}, [x0], x1
+.endr
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_32xN_neon 8
+pixel_avg_pp_32xN_neon 16
+pixel_avg_pp_32xN_neon 24
+
+.macro pixel_avg_pp_32xN1_neon h
+function PFX(pixel_avg_pp_32x\h\()_neon)
+    mov             w12, #\h / 8
+.lpavg_32x\h\():
+    sub             w12, w12, #1
+.rept 8
+    ld1             {v0.16b-v1.16b}, [x2], x3
+    ld1             {v2.16b-v3.16b}, [x4], x5
+    urhadd          v0.16b, v0.16b, v2.16b
+    urhadd          v1.16b, v1.16b, v3.16b
+    st1             {v0.16b-v1.16b}, [x0], x1
+.endr
+    cbnz            w12, .lpavg_32x\h
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_32xN1_neon 32
+pixel_avg_pp_32xN1_neon 64
+
+function PFX(pixel_avg_pp_48x64_neon)
+    mov             w12, #8
+.lpavg_48x64:
+    sub             w12, w12, #1
+.rept 8
+    ld1             {v0.16b-v2.16b}, [x2], x3
+    ld1             {v3.16b-v5.16b}, [x4], x5
+    urhadd          v0.16b, v0.16b, v3.16b
+    urhadd          v1.16b, v1.16b, v4.16b
+    urhadd          v2.16b, v2.16b, v5.16b
+    st1             {v0.16b-v2.16b}, [x0], x1
+.endr
+    cbnz            w12, .lpavg_48x64
+    ret
+endfunc
+
+.macro pixel_avg_pp_64xN_neon h
+function PFX(pixel_avg_pp_64x\h\()_neon)
+    mov             w12, #\h / 4
+.lpavg_64x\h\():
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v0.16b-v3.16b}, [x2], x3
+    ld1             {v4.16b-v7.16b}, [x4], x5
+    urhadd          v0.16b, v0.16b, v4.16b
+    urhadd          v1.16b, v1.16b, v5.16b
+    urhadd          v2.16b, v2.16b, v6.16b
+    urhadd          v3.16b, v3.16b, v7.16b
+    st1             {v0.16b-v3.16b}, [x0], x1
+.endr
+    cbnz            w12, .lpavg_64x\h
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_64xN_neon 16
+pixel_avg_pp_64xN_neon 32
+pixel_avg_pp_64xN_neon 48
+pixel_avg_pp_64xN_neon 64
+
+// void addAvg(const int16_t* src0, const int16_t* src1, pixel* dst, intptr_t src0Stride, intptr_t src1Stride, intptr_t dstStride)
+.macro addAvg_start
+    lsl             x3, x3, #1
+    lsl             x4, x4, #1
+    mov             w11, #0x40
+    dup             v30.16b, w11
+.endm
+
+.macro addAvg_2xN h

x265_3.5.tar.gz/source/common/aarch64/p2s.S Added

@@ -0,0 +1,452 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
+.section .rodata
+#endif
+
+.align 4
+
+.text
+
+#if HIGH_BIT_DEPTH
+# if BIT_DEPTH == 10
+#  define P2S_SHIFT 4
+# elif BIT_DEPTH == 12
+#  define P2S_SHIFT 2
+# endif
+.macro p2s_start
+    add             x3, x3, x3
+    add             x1, x1, x1
+    movi            v31.8h, #0xe0, lsl #8
+.endm
+
+#else // if !HIGH_BIT_DEPTH
+# define P2S_SHIFT 6
+.macro p2s_start
+    add             x3, x3, x3
+    movi            v31.8h, #0xe0, lsl #8
+.endm
+#endif // HIGH_BIT_DEPTH
+
+.macro p2s_2x2
+#if HIGH_BIT_DEPTH
+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v0.s}[1], [x0], x1
+    shl             v3.8h, v0.8h, #P2S_SHIFT
+#else
+    ldrh            w10, [x0]
+    add             x0, x0, x1
+    ldrh            w11, [x0]
+    orr             w10, w10, w11, lsl #16
+    add             x0, x0, x1
+    dup             v0.4s, w10
+    ushll           v3.8h, v0.8b, #P2S_SHIFT
+#endif
+    add             v3.8h, v3.8h, v31.8h
+    st1             {v3.s}[0], [x2], x3
+    st1             {v3.s}[1], [x2], x3
+.endm
+
+// filterPixelToShort(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
+.macro p2s_2xN h
+function PFX(filterPixelToShort_2x\h\()_neon)
+    p2s_start
+.rept \h / 2
+    p2s_2x2
+.endr
+    ret
+endfunc
+.endm
+
+p2s_2xN 4
+p2s_2xN 8
+p2s_2xN 16
+
+.macro p2s_6x2
+#if HIGH_BIT_DEPTH
+    ld1             {v0.d}[0], [x0], #8
+    ld1             {v1.s}[0], [x0], x1
+    ld1             {v0.d}[1], [x0], #8
+    ld1             {v1.s}[1], [x0], x1
+    shl             v3.8h, v0.8h, #P2S_SHIFT
+    shl             v4.8h, v1.8h, #P2S_SHIFT
+#else
+    ldr             s0, [x0]
+    ldrh            w10, [x0, #4]
+    add             x0, x0, x1
+    ld1             {v0.s}[1], [x0]
+    ldrh            w11, [x0, #4]
+    add             x0, x0, x1
+    orr             w10, w10, w11, lsl #16
+    dup             v1.4s, w10
+    ushll           v3.8h, v0.8b, #P2S_SHIFT
+    ushll           v4.8h, v1.8b, #P2S_SHIFT
+#endif
+    add             v3.8h, v3.8h, v31.8h
+    add             v4.8h, v4.8h, v31.8h
+    st1             {v3.d}[0], [x2], #8
+    st1             {v4.s}[0], [x2], x3
+    st1             {v3.d}[1], [x2], #8
+    st1             {v4.s}[1], [x2], x3
+.endm
+
+.macro p2s_6xN h
+function PFX(filterPixelToShort_6x\h\()_neon)
+    p2s_start
+    sub             x3, x3, #8
+#if HIGH_BIT_DEPTH
+    sub             x1, x1, #8
+#endif
+.rept \h / 2
+    p2s_6x2
+.endr
+    ret
+endfunc
+.endm
+
+p2s_6xN 8
+p2s_6xN 16
+
+function PFX(filterPixelToShort_4x2_neon)
+    p2s_start
+#if HIGH_BIT_DEPTH
+    ld1             {v0.d}[0], [x0], x1
+    ld1             {v0.d}[1], [x0], x1
+    shl             v3.8h, v0.8h, #P2S_SHIFT
+#else
+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v0.s}[1], [x0], x1
+    ushll           v3.8h, v0.8b, #P2S_SHIFT
+#endif
+    add             v3.8h, v3.8h, v31.8h
+    st1             {v3.d}[0], [x2], x3
+    st1             {v3.d}[1], [x2], x3
+    ret
+endfunc
+
+function PFX(filterPixelToShort_4x4_neon)
+    p2s_start
+#if HIGH_BIT_DEPTH
+    ld1             {v0.d}[0], [x0], x1
+    ld1             {v0.d}[1], [x0], x1
+    shl             v3.8h, v0.8h, #P2S_SHIFT
+#else
+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v0.s}[1], [x0], x1
+    ushll           v3.8h, v0.8b, #P2S_SHIFT
+#endif
+    add             v3.8h, v3.8h, v31.8h
+    st1             {v3.d}[0], [x2], x3
+    st1             {v3.d}[1], [x2], x3
+#if HIGH_BIT_DEPTH
+    ld1             {v1.d}[0], [x0], x1
+    ld1             {v1.d}[1], [x0], x1
+    shl             v4.8h, v1.8h, #P2S_SHIFT
+#else
+    ld1             {v1.s}[0], [x0], x1
+    ld1             {v1.s}[1], [x0], x1
+    ushll           v4.8h, v1.8b, #P2S_SHIFT
+#endif
+    add             v4.8h, v4.8h, v31.8h
+    st1             {v4.d}[0], [x2], x3
+    st1             {v4.d}[1], [x2], x3
+    ret
+endfunc
+
+.macro p2s_4xN h
+function PFX(filterPixelToShort_4x\h\()_neon)
+    p2s_start
+.rept \h / 2
+#if HIGH_BIT_DEPTH
+    ld1             {v0.16b}, [x0], x1
+    shl             v0.8h, v0.8h, #P2S_SHIFT
+#else
+    ld1             {v0.8b}, [x0], x1
+    ushll           v0.8h, v0.8b, #P2S_SHIFT
+#endif
+    add             v2.4h, v0.4h, v31.4h
+    st1             {v2.4h}, [x2], x3
+#if HIGH_BIT_DEPTH
+    ld1             {v1.16b}, [x0], x1
+    shl             v1.8h, v1.8h, #P2S_SHIFT
+#else
+    ld1             {v1.8b}, [x0], x1
+    ushll           v1.8h, v1.8b, #P2S_SHIFT
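
The `p2s_*` macros above all follow one pattern: widen each pixel, shift it left by `P2S_SHIFT`, then add the constant held in `v31`. That constant is `0xe000` per 16-bit lane, which as a signed value is -8192, so the add is effectively a subtraction of 8192. A minimal scalar sketch for the 8-bit case (`P2S_SHIFT` of 6) follows. The function name `pixel_to_short_ref` and the explicit `width`/`height` parameters are assumptions for illustration, and strides are taken in elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for filterPixelToShort on 8-bit input.
static void pixel_to_short_ref(const uint8_t *src, intptr_t srcStride,
                               int16_t *dst, intptr_t dstStride,
                               int width, int height)
{
    const int shift = 6;      // P2S_SHIFT in the !HIGH_BIT_DEPTH branch
    const int offset = 8192;  // adding 0xe000 to a 16-bit lane subtracts 8192
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            dst[x] = static_cast<int16_t>((src[x] << shift) - offset);
        src += srcStride;
        dst += dstStride;
    }
}
```

In the assembly, `p2s_start` doubles `x3` because the destination stride is supplied in 16-bit elements while the stores advance by bytes. The sketch avoids that step by indexing `int16_t` directly.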

x265_3.5.tar.gz/source/common/aarch64/pixel-prim.cpp Added

@@ -0,0 +1,2063 @@
+#include "common.h"
+#include "slicetype.h"      // LOWRES_COST_MASK
+#include "primitives.h"
+#include "x265.h"
+
+#include "pixel-prim.h"
+#include "arm64-utils.h"
+#if HAVE_NEON
+
+#include <arm_neon.h>
+
+using namespace X265_NS;
+
+
+
+namespace
+{
+
+
+/* SATD SA8D variants - based on x264 */
+static inline void SUMSUB_AB(int16x8_t &sum, int16x8_t &sub, const int16x8_t a, const int16x8_t b)
+{
+    sum = vaddq_s16(a, b);
+    sub = vsubq_s16(a, b);
+}
+
+static inline void transpose_8h(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2)
+{
+    t1 = vtrn1q_s16(s1, s2);
+    t2 = vtrn2q_s16(s1, s2);
+}
+
+static inline void transpose_4s(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2)
+{
+    t1 = vtrn1q_s32(s1, s2);
+    t2 = vtrn2q_s32(s1, s2);
+}
+
+#if (X265_DEPTH <= 10)
+static inline void transpose_2d(int16x8_t &t1, int16x8_t &t2, const int16x8_t s1, const int16x8_t s2)
+{
+    t1 = vtrn1q_s64(s1, s2);
+    t2 = vtrn2q_s64(s1, s2);
+}
+#endif
+
+
+static inline void SUMSUB_ABCD(int16x8_t &s1, int16x8_t &d1, int16x8_t &s2, int16x8_t &d2,
+                               int16x8_t a, int16x8_t  b, int16x8_t  c, int16x8_t  d)
+{
+    SUMSUB_AB(s1, d1, a, b);
+    SUMSUB_AB(s2, d2, c, d);
+}
+
+static inline void HADAMARD4_V(int16x8_t &r1, int16x8_t &r2, int16x8_t &r3, int16x8_t &r4,
+                               int16x8_t &t1, int16x8_t &t2, int16x8_t &t3, int16x8_t &t4)
+{
+    SUMSUB_ABCD(t1, t2, t3, t4, r1, r2, r3, r4);
+    SUMSUB_ABCD(r1, r3, r2, r4, t1, t3, t2, t4);
+}
+
+
+static int _satd_4x8_8x4_end_neon(int16x8_t v0, int16x8_t v1, int16x8_t v2, int16x8_t v3)
+
+{
+
+    int16x8_t v4, v5, v6, v7, v16, v17, v18, v19;
+
+
+    SUMSUB_AB(v16, v17, v0,  v1);
+    SUMSUB_AB(v18, v19, v2,  v3);
+
+    SUMSUB_AB(v4 , v6 , v16, v18);
+    SUMSUB_AB(v5 , v7 , v17, v19);
+
+    v0 = vtrn1q_s16(v4, v5);
+    v1 = vtrn2q_s16(v4, v5);
+    v2 = vtrn1q_s16(v6, v7);
+    v3 = vtrn2q_s16(v6, v7);
+
+    SUMSUB_AB(v16, v17, v0,  v1);
+    SUMSUB_AB(v18, v19, v2,  v3);
+
+    v0 = vtrn1q_s32(v16, v18);
+    v1 = vtrn2q_s32(v16, v18);
+    v2 = vtrn1q_s32(v17, v19);
+    v3 = vtrn2q_s32(v17, v19);
+
+    v0 = vabsq_s16(v0);
+    v1 = vabsq_s16(v1);
+    v2 = vabsq_s16(v2);
+    v3 = vabsq_s16(v3);
+
+    v0 = vmaxq_u16(v0, v1);
+    v1 = vmaxq_u16(v2, v3);
+
+    v0 = vaddq_u16(v0, v1);
+    return vaddlvq_u16(v0);
+}
+
+static inline int _satd_4x4_neon(int16x8_t v0, int16x8_t v1)
+{
+    int16x8_t v2, v3;
+    SUMSUB_AB(v2,  v3,  v0,  v1);
+
+    v0 = vzip1q_s64(v2, v3);
+    v1 = vzip2q_s64(v2, v3);
+    SUMSUB_AB(v2,  v3,  v0,  v1);
+
+    v0 = vtrn1q_s16(v2, v3);
+    v1 = vtrn2q_s16(v2, v3);
+    SUMSUB_AB(v2,  v3,  v0,  v1);
+
+    v0 = vtrn1q_s32(v2, v3);
+    v1 = vtrn2q_s32(v2, v3);
+
+    v0 = vabsq_s16(v0);
+    v1 = vabsq_s16(v1);
+    v0 = vmaxq_u16(v0, v1);
+
+    return vaddlvq_s16(v0);
+}
+
+static void _satd_8x4v_8x8h_neon(int16x8_t &v0, int16x8_t &v1, int16x8_t &v2, int16x8_t &v3, int16x8_t &v20,
+                                 int16x8_t &v21, int16x8_t &v22, int16x8_t &v23)
+{
+    int16x8_t v16, v17, v18, v19, v4, v5, v6, v7;
+
+    SUMSUB_AB(v16, v18, v0,  v2);
+    SUMSUB_AB(v17, v19, v1,  v3);
+
+    HADAMARD4_V(v20, v21, v22, v23, v0,  v1, v2, v3);
+
+    transpose_8h(v0,  v1,  v16, v17);
+    transpose_8h(v2,  v3,  v18, v19);
+    transpose_8h(v4,  v5,  v20, v21);
+    transpose_8h(v6,  v7,  v22, v23);
+
+    SUMSUB_AB(v16, v17, v0,  v1);
+    SUMSUB_AB(v18, v19, v2,  v3);
+    SUMSUB_AB(v20, v21, v4,  v5);
+    SUMSUB_AB(v22, v23, v6,  v7);
+
+    transpose_4s(v0,  v2,  v16, v18);
+    transpose_4s(v1,  v3,  v17, v19);
+    transpose_4s(v4,  v6,  v20, v22);
+    transpose_4s(v5,  v7,  v21, v23);
+
+    v0 = vabsq_s16(v0);
+    v1 = vabsq_s16(v1);
+    v2 = vabsq_s16(v2);
+    v3 = vabsq_s16(v3);
+    v4 = vabsq_s16(v4);
+    v5 = vabsq_s16(v5);
+    v6 = vabsq_s16(v6);
+    v7 = vabsq_s16(v7);
+
+    v0 = vmaxq_u16(v0, v2);
+    v1 = vmaxq_u16(v1, v3);
+    v2 = vmaxq_u16(v4, v6);
+    v3 = vmaxq_u16(v5, v7);
+
+}
+
+#if HIGH_BIT_DEPTH
+
+#if (X265_DEPTH > 10)
+static inline void transpose_2d(int32x4_t &t1, int32x4_t &t2, const int32x4_t s1, const int32x4_t s2)
+{
+    t1 = vtrn1q_s64(s1, s2);
+    t2 = vtrn2q_s64(s1, s2);
+}
+
+static inline void ISUMSUB_AB(int32x4_t &sum, int32x4_t &sub, const int32x4_t a, const int32x4_t b)
+{
+    sum = vaddq_s32(a, b);
+    sub = vsubq_s32(a, b);
+}
+
+static inline void ISUMSUB_AB_FROM_INT16(int32x4_t &suml, int32x4_t &sumh, int32x4_t &subl, int32x4_t &subh,
+        const int16x8_t a, const int16x8_t b)
+{
+    suml = vaddl_s16(vget_low_s16(a), vget_low_s16(b));
+    sumh = vaddl_high_s16(a, b);
+    subl = vsubl_s16(vget_low_s16(a), vget_low_s16(b));
+    subh = vsubl_high_s16(a, b);
+}
+
+#endif
+
+static inline void _sub_8x8_fly(const uint16_t *pix1, intptr_t stride_pix1, const uint16_t *pix2, intptr_t stride_pix2,
+                                int16x8_t &v0, int16x8_t &v1, int16x8_t &v2, int16x8_t &v3,
+                                int16x8_t &v20, int16x8_t &v21, int16x8_t &v22, int16x8_t &v23)
+{
+    uint16x8_t r0, r1, r2, r3;
+    uint16x8_t t0, t1, t2, t3;
+    int16x8_t v16, v17;
+    int16x8_t v18, v19;
+
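
Reviewer note on the SATD helpers above: `_satd_4x4_neon` appears to run three butterfly stages and then take `vmaxq_u16` of absolute values instead of a fourth stage. That shortcut relies on the identity |p + q| + |p - q| = 2 * max(|p|, |q|), which also absorbs the usual halving of the SATD sum. The scalar sketch below is not taken from x265; the names `Stage`, `hadamardStages`, `satd4x4_ref` and `satd4x4_max` are hypothetical and the code assumes 8-bit pixels.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from x265): per-column values after the
// horizontal 4-point Hadamard and the first vertical butterfly stage.
struct Stage { int s01, s23, d01, d23; };

static void hadamardStages(const uint8_t* a, intptr_t strideA,
                           const uint8_t* b, intptr_t strideB, Stage out[4])
{
    int t[4][4];
    for (int y = 0; y < 4; y++)
    {
        const uint8_t* pa = a + y * strideA;
        const uint8_t* pb = b + y * strideB;
        int d0 = pa[0] - pb[0], d1 = pa[1] - pb[1];
        int d2 = pa[2] - pb[2], d3 = pa[3] - pb[3];
        int s01 = d0 + d1, m01 = d0 - d1;
        int s23 = d2 + d3, m23 = d2 - d3;
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = m01 + m23;
        t[y][3] = m01 - m23;
    }
    for (int x = 0; x < 4; x++)
    {
        out[x].s01 = t[0][x] + t[1][x];
        out[x].d01 = t[0][x] - t[1][x];
        out[x].s23 = t[2][x] + t[3][x];
        out[x].d23 = t[2][x] - t[3][x];
    }
}

// Textbook form: finish the last butterfly, sum |coefficient|, halve.
int satd4x4_ref(const uint8_t* a, intptr_t strideA, const uint8_t* b, intptr_t strideB)
{
    Stage c[4];
    hadamardStages(a, strideA, b, strideB, c);
    int sum = 0;
    for (int x = 0; x < 4; x++)
        sum += std::abs(c[x].s01 + c[x].s23) + std::abs(c[x].s01 - c[x].s23)
             + std::abs(c[x].d01 + c[x].d23) + std::abs(c[x].d01 - c[x].d23);
    return sum >> 1;
}

// Shortcut form: skip the last butterfly and take the larger magnitude,
// using |p + q| + |p - q| == 2 * max(|p|, |q|).
int satd4x4_max(const uint8_t* a, intptr_t strideA, const uint8_t* b, intptr_t strideB)
{
    Stage c[4];
    hadamardStages(a, strideA, b, strideB, c);
    int sum = 0;
    for (int x = 0; x < 4; x++)
        sum += std::max(std::abs(c[x].s01), std::abs(c[x].s23))
             + std::max(std::abs(c[x].d01), std::abs(c[x].d23));
    return sum;
}
```

Both functions should agree on every input, which makes the scalar pair usable as a cross-check when reading the vector code.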

x265_3.5.tar.gz/source/common/aarch64/pixel-prim.h Added

x265_3.5.tar.gz/source/common/aarch64/pixel-util.S Changed

@@ -1,8 +1,9 @@
 /*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
+ * Copyright (C) 2020-2021 MulticoreWare, Inc
  *
  * Authors: Yimeng Su <yimeng.su@huawei.com>
  *          Hongbin Liu <liuhongbin1@huawei.com>
+ *          Sebastian Pop <spop@amazon.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -24,12 +25,677 @@
 
 #include "asm.S"
 
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
 .section .rodata
+#endif
 
 .align 4
 
 .text
 
+// uint64_t pixel_var(const pixel* pix, intptr_t i_stride)
+function PFX(pixel_var_8x8_neon)
+    ld1             {v4.8b}, [x0], x1        // pixx
+    uxtl            v0.8h, v4.8b             // sum = pixx
+    umull           v1.8h, v4.8b, v4.8b
+    uaddlp          v1.4s, v1.8h             // sqr = pixx * pixx
+
+.rept 7
+    ld1             {v4.8b}, [x0], x1        // pixx
+    umull           v31.8h, v4.8b, v4.8b
+    uaddw           v0.8h, v0.8h, v4.8b      // sum += pixx
+    uadalp          v1.4s, v31.8h            // sqr += pixx * pixx
+.endr
+    uaddlv          s0, v0.8h
+    uaddlv          d1, v1.4s
+    fmov            w0, s0
+    fmov            x1, d1
+    orr             x0, x0, x1, lsl #32      // return sum + ((uint64_t)sqr << 32);
+    ret
+endfunc
+
+.macro pixel_var_start
+    movi            v0.16b, #0
+    movi            v1.16b, #0
+    movi            v2.16b, #0
+    movi            v3.16b, #0
+.endm
+
+.macro pixel_var_1 v
+    uaddw           v0.8h, v0.8h, \v\().8b
+    umull           v30.8h, \v\().8b, \v\().8b
+    uaddw2          v1.8h, v1.8h, \v\().16b
+    umull2          v31.8h, \v\().16b, \v\().16b
+    uadalp          v2.4s, v30.8h
+    uadalp          v3.4s, v31.8h
+.endm
+
+.macro pixel_var_end
+    uaddlv          s0, v0.8h
+    uaddlv          s1, v1.8h
+    add             v2.4s, v2.4s, v3.4s
+    fadd            s0, s0, s1
+    uaddlv          d2, v2.4s
+    fmov            w0, s0
+    fmov            x2, d2
+    orr             x0, x0, x2, lsl #32
+.endm
+
+function PFX(pixel_var_16x16_neon)
+    pixel_var_start
+    mov             w12, #16
+.loop_var_16:
+    sub             w12, w12, #1
+    ld1             {v4.16b}, [x0], x1
+    pixel_var_1 v4
+    cbnz            w12, .loop_var_16
+    pixel_var_end
+    ret
+endfunc
+
+function PFX(pixel_var_32x32_neon)
+    pixel_var_start
+    mov             w12, #32
+.loop_var_32:
+    sub             w12, w12, #1
+    ld1             {v4.16b-v5.16b}, [x0], x1
+    pixel_var_1 v4
+    pixel_var_1 v5
+    cbnz            w12, .loop_var_32
+    pixel_var_end
+    ret
+endfunc
+
+function PFX(pixel_var_64x64_neon)
+    pixel_var_start
+    mov             w12, #64
+.loop_var_64:
+    sub             w12, w12, #1
+    ld1             {v4.16b-v7.16b}, [x0], x1
+    pixel_var_1 v4
+    pixel_var_1 v5
+    pixel_var_1 v6
+    pixel_var_1 v7
+    cbnz            w12, .loop_var_64
+    pixel_var_end
+    ret
+endfunc
+
+// void getResidual4_neon(const pixel* fenc, const pixel* pred, int16_t* residual, intptr_t stride)
+function PFX(getResidual4_neon)
+    lsl             x4, x3, #1
+.rept 2
+    ld1             {v0.8b}, [x0], x3
+    ld1             {v1.8b}, [x1], x3
+    ld1             {v2.8b}, [x0], x3
+    ld1             {v3.8b}, [x1], x3
+    usubl           v4.8h, v0.8b, v1.8b
+    usubl           v5.8h, v2.8b, v3.8b
+    st1             {v4.8b}, [x2], x4
+    st1             {v5.8b}, [x2], x4
+.endr
+    ret
+endfunc
+
+function PFX(getResidual8_neon)
+    lsl             x4, x3, #1
+.rept 4
+    ld1             {v0.8b}, [x0], x3
+    ld1             {v1.8b}, [x1], x3
+    ld1             {v2.8b}, [x0], x3
+    ld1             {v3.8b}, [x1], x3
+    usubl           v4.8h, v0.8b, v1.8b
+    usubl           v5.8h, v2.8b, v3.8b
+    st1             {v4.16b}, [x2], x4
+    st1             {v5.16b}, [x2], x4
+.endr
+    ret
+endfunc
+
+function PFX(getResidual16_neon)
+    lsl             x4, x3, #1
+.rept 8
+    ld1             {v0.16b}, [x0], x3
+    ld1             {v1.16b}, [x1], x3
+    ld1             {v2.16b}, [x0], x3
+    ld1             {v3.16b}, [x1], x3
+    usubl           v4.8h, v0.8b, v1.8b
+    usubl2          v5.8h, v0.16b, v1.16b
+    usubl           v6.8h, v2.8b, v3.8b
+    usubl2          v7.8h, v2.16b, v3.16b
+    st1             {v4.8h-v5.8h}, [x2], x4
+    st1             {v6.8h-v7.8h}, [x2], x4
+.endr
+    ret
+endfunc
+
+function PFX(getResidual32_neon)
+    lsl             x4, x3, #1
+    mov             w12, #4
+.loop_residual_32:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v0.16b-v1.16b}, [x0], x3
+    ld1             {v2.16b-v3.16b}, [x1], x3
+    ld1             {v4.16b-v5.16b}, [x0], x3
+    ld1             {v6.16b-v7.16b}, [x1], x3
+    usubl           v16.8h, v0.8b, v2.8b
+    usubl2          v17.8h, v0.16b, v2.16b
+    usubl           v18.8h, v1.8b, v3.8b
+    usubl2          v19.8h, v1.16b, v3.16b
+    usubl           v20.8h, v4.8b, v6.8b
+    usubl2          v21.8h, v4.16b, v6.16b
+    usubl           v22.8h, v5.8b, v7.8b
+    usubl2          v23.8h, v5.16b, v7.16b
+    st1             {v16.8h-v19.8h}, [x2], x4
+    st1             {v20.8h-v23.8h}, [x2], x4
+.endr
+    cbnz            w12, .loop_residual_32
+    ret
+endfunc
+
+// void pixel_sub_ps_neon(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1)
+function PFX(pixel_sub_ps_4x4_neon)
+    lsl             x1, x1, #1
+.rept 2
+    ld1             {v0.8b}, [x2], x4
+    ld1             {v1.8b}, [x3], x5
+    ld1             {v2.8b}, [x2], x4
+    ld1             {v3.8b}, [x3], x5
+    usubl           v4.8h, v0.8b, v1.8b
+    usubl           v5.8h, v2.8b, v3.8b
+    st1             {v4.4h}, [x0], x1
+    st1             {v5.4h}, [x0], x1
+.endr
+    ret

x265_3.5.tar.gz/source/common/aarch64/pixel-util.h Deleted

@@ -1,40 +0,0 @@
-/*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
- *
- * Authors: Yimeng Su <yimeng.su@huawei.com>
- *          Hongbin Liu <liuhongbin1@huawei.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
- *
- * This program is also available under a commercial proprietary license.
- * For more information, contact us at license @ x265.com.
- *****************************************************************************/
-
-#ifndef X265_PIXEL_UTIL_AARCH64_H
-#define X265_PIXEL_UTIL_AARCH64_H
-
-int x265_pixel_satd_4x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_4x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_4x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_4x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_8x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_8x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_12x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-int x265_pixel_satd_12x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
-
-uint32_t x265_quant_neon(const int16_t* coef, const int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff);
-int PFX(psyCost_4x4_neon)(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);
-
-#endif // ifndef X265_PIXEL_UTIL_AARCH64_H

x265_3.5.tar.gz/source/common/aarch64/pixel.h Deleted

@@ -1,105 +0,0 @@
-/*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
- *
- * Authors: Hongbin Liu <liuhongbin1@huawei.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
- *
- * This program is also available under a commercial proprietary license.
- * For more information, contact us at license @ x265.com.
- *****************************************************************************/
-
-#ifndef X265_I386_PIXEL_AARCH64_H
-#define X265_I386_PIXEL_AARCH64_H
-
-void x265_pixel_avg_pp_4x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_4x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_4x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_8x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_8x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_8x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_8x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_12x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x12_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_16x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_24x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_32x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_32x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_32x24_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_32x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_32x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_48x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_64x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_64x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_64x48_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-void x265_pixel_avg_pp_64x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
-
-void x265_sad_x3_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_8x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-void x265_sad_x3_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
-
-void x265_sad_x4_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_8x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-void x265_sad_x4_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
-
-#endif // ifndef X265_I386_PIXEL_AARCH64_H

x265_3.5.tar.gz/source/common/aarch64/sad-a.S Changed

@@ -1,7 +1,8 @@
 /*****************************************************************************
- * Copyright (C) 2020 MulticoreWare, Inc
+ * Copyright (C) 2020-2021 MulticoreWare, Inc
  *
  * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *          Sebastian Pop <spop@amazon.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -23,83 +24,661 @@
 
 #include "asm.S"
 
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
 .section .rodata
+#endif
 
 .align 4
 
 .text
 
-.macro SAD_X_START_8 x
-    ld1             {v0.8b}, [x0], x9
-.if \x == 3
-    ld1             {v1.8b}, [x1], x4
-    ld1             {v2.8b}, [x2], x4
-    ld1             {v3.8b}, [x3], x4
-.elseif \x == 4
-    ld1             {v1.8b}, [x1], x5
-    ld1             {v2.8b}, [x2], x5
-    ld1             {v3.8b}, [x3], x5
-    ld1             {v4.8b}, [x4], x5
+.macro SAD_START_4 f
+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v0.s}[1], [x0], x1
+    ld1             {v1.s}[0], [x2], x3
+    ld1             {v1.s}[1], [x2], x3
+    \f              v16.8h, v0.8b, v1.8b
+.endm
+
+.macro SAD_4 h
+.rept \h / 2 - 1
+    SAD_START_4 uabal
+.endr
+.endm
+
+.macro SAD_START_8 f
+    ld1             {v0.8b}, [x0], x1
+    ld1             {v1.8b}, [x2], x3
+    ld1             {v2.8b}, [x0], x1
+    ld1             {v3.8b}, [x2], x3
+    \f              v16.8h, v0.8b, v1.8b
+    \f              v17.8h, v2.8b, v3.8b
+.endm
+
+.macro SAD_8 h
+.rept \h / 2 - 1
+    SAD_START_8 uabal
+.endr
+.endm
+
+.macro SAD_START_16 f
+    ld1             {v0.16b}, [x0], x1
+    ld1             {v1.16b}, [x2], x3
+    ld1             {v2.16b}, [x0], x1
+    ld1             {v3.16b}, [x2], x3
+    \f              v16.8h, v0.8b, v1.8b
+    \f\()2          v17.8h, v0.16b, v1.16b
+    uabal           v16.8h, v2.8b, v3.8b
+    uabal2          v17.8h, v2.16b, v3.16b
+.endm
+
+.macro SAD_16 h
+.rept \h / 2 - 1
+    SAD_START_16 uabal
+.endr
+.endm
+
+.macro SAD_START_32
+    movi            v16.16b, #0
+    movi            v17.16b, #0
+    movi            v18.16b, #0
+    movi            v19.16b, #0
+.endm
+
+.macro SAD_32
+    ld1             {v0.16b-v1.16b}, [x0], x1
+    ld1             {v2.16b-v3.16b}, [x2], x3
+    ld1             {v4.16b-v5.16b}, [x0], x1
+    ld1             {v6.16b-v7.16b}, [x2], x3
+    uabal           v16.8h, v0.8b, v2.8b
+    uabal2          v17.8h, v0.16b, v2.16b
+    uabal           v18.8h, v1.8b, v3.8b
+    uabal2          v19.8h, v1.16b, v3.16b
+    uabal           v16.8h, v4.8b, v6.8b
+    uabal2          v17.8h, v4.16b, v6.16b
+    uabal           v18.8h, v5.8b, v7.8b
+    uabal2          v19.8h, v5.16b, v7.16b
+.endm
+
+.macro SAD_END_32
+    add             v16.8h, v16.8h, v17.8h
+    add             v17.8h, v18.8h, v19.8h
+    add             v16.8h, v16.8h, v17.8h
+    uaddlv          s0, v16.8h
+    fmov            w0, s0
+    ret
+.endm
+
+.macro SAD_START_64
+    movi            v16.16b, #0
+    movi            v17.16b, #0
+    movi            v18.16b, #0
+    movi            v19.16b, #0
+    movi            v20.16b, #0
+    movi            v21.16b, #0
+    movi            v22.16b, #0
+    movi            v23.16b, #0
+.endm
+
+.macro SAD_64
+    ld1             {v0.16b-v3.16b}, [x0], x1
+    ld1             {v4.16b-v7.16b}, [x2], x3
+    ld1             {v24.16b-v27.16b}, [x0], x1
+    ld1             {v28.16b-v31.16b}, [x2], x3
+    uabal           v16.8h, v0.8b, v4.8b
+    uabal2          v17.8h, v0.16b, v4.16b
+    uabal           v18.8h, v1.8b, v5.8b
+    uabal2          v19.8h, v1.16b, v5.16b
+    uabal           v20.8h, v2.8b, v6.8b
+    uabal2          v21.8h, v2.16b, v6.16b
+    uabal           v22.8h, v3.8b, v7.8b
+    uabal2          v23.8h, v3.16b, v7.16b
+
+    uabal           v16.8h, v24.8b, v28.8b
+    uabal2          v17.8h, v24.16b, v28.16b
+    uabal           v18.8h, v25.8b, v29.8b
+    uabal2          v19.8h, v25.16b, v29.16b
+    uabal           v20.8h, v26.8b, v30.8b
+    uabal2          v21.8h, v26.16b, v30.16b
+    uabal           v22.8h, v27.8b, v31.8b
+    uabal2          v23.8h, v27.16b, v31.16b
+.endm
+
+.macro SAD_END_64
+    add             v16.8h, v16.8h, v17.8h
+    add             v17.8h, v18.8h, v19.8h
+    add             v16.8h, v16.8h, v17.8h
+    uaddlp          v16.4s, v16.8h
+    add             v18.8h, v20.8h, v21.8h
+    add             v19.8h, v22.8h, v23.8h
+    add             v17.8h, v18.8h, v19.8h
+    uaddlp          v17.4s, v17.8h
+    add             v16.4s, v16.4s, v17.4s
+    uaddlv          d0, v16.4s
+    fmov            x0, d0
+    ret
+.endm
+
+.macro SAD_START_12
+    movrel          x12, sad12_mask
+    ld1             {v31.16b}, [x12]
+    movi            v16.16b, #0
+    movi            v17.16b, #0
+.endm
+
+.macro SAD_12
+    ld1             {v0.16b}, [x0], x1
+    and             v0.16b, v0.16b, v31.16b
+    ld1             {v1.16b}, [x2], x3
+    and             v1.16b, v1.16b, v31.16b
+    ld1             {v2.16b}, [x0], x1
+    and             v2.16b, v2.16b, v31.16b
+    ld1             {v3.16b}, [x2], x3
+    and             v3.16b, v3.16b, v31.16b
+    uabal           v16.8h, v0.8b, v1.8b
+    uabal2          v17.8h, v0.16b, v1.16b
+    uabal           v16.8h, v2.8b, v3.8b
+    uabal2          v17.8h, v2.16b, v3.16b
+.endm
+
+.macro SAD_END_12
+    add             v16.8h, v16.8h, v17.8h
+    uaddlv          s0, v16.8h
+    fmov            w0, s0
+    ret
+.endm
+
+.macro SAD_START_24
+    movi            v16.16b, #0
+    movi            v17.16b, #0
+    movi            v18.16b, #0
+    sub             x1, x1, #16
+    sub             x3, x3, #16
+.endm
+
+.macro SAD_24

x265_3.5.tar.gz/source/common/aarch64/ssd-a.S Added

@@ -0,0 +1,483 @@
+/*****************************************************************************
+ * Copyright (C) 2021 MulticoreWare, Inc
+ *
+ * Authors: Sebastian Pop <spop@amazon.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+#ifdef __APPLE__
+.section __RODATA,__rodata
+#else
+.section .rodata
+#endif
+
+.align 4
+
+.text
+
+.macro ret_v0_w0
+    trn2            v1.2d, v0.2d, v0.2d
+    add             v0.2s, v0.2s, v1.2s
+    addp            v0.2s, v0.2s, v0.2s
+    fmov            w0, s0
+    ret
+.endm
+
+function PFX(pixel_sse_pp_4x4_neon)
+    ld1             {v16.s}[0], [x0], x1
+    ld1             {v17.s}[0], [x2], x3
+    ld1             {v18.s}[0], [x0], x1
+    ld1             {v19.s}[0], [x2], x3
+    ld1             {v20.s}[0], [x0], x1
+    ld1             {v21.s}[0], [x2], x3
+    ld1             {v22.s}[0], [x0], x1
+    ld1             {v23.s}[0], [x2], x3
+
+    usubl           v1.8h, v16.8b, v17.8b
+    usubl           v2.8h, v18.8b, v19.8b
+    usubl           v3.8h, v20.8b, v21.8b
+    usubl           v4.8h, v22.8b, v23.8b
+
+    smull           v0.4s, v1.4h, v1.4h
+    smlal           v0.4s, v2.4h, v2.4h
+    smlal           v0.4s, v3.4h, v3.4h
+    smlal           v0.4s, v4.4h, v4.4h
+    ret_v0_w0
+endfunc
+
+function PFX(pixel_sse_pp_4x8_neon)
+    ld1             {v16.s}[0], [x0], x1
+    ld1             {v17.s}[0], [x2], x3
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.s}[0], [x0], x1
+    ld1             {v17.s}[0], [x2], x3
+    smull           v0.4s, v1.4h, v1.4h
+.rept 6
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.s}[0], [x0], x1
+    smlal           v0.4s, v1.4h, v1.4h
+    ld1             {v17.s}[0], [x2], x3
+.endr
+    usubl           v1.8h, v16.8b, v17.8b
+    smlal           v0.4s, v1.4h, v1.4h
+    ret_v0_w0
+endfunc
+
+function PFX(pixel_sse_pp_8x8_neon)
+    ld1             {v16.8b}, [x0], x1
+    ld1             {v17.8b}, [x2], x3
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.8b}, [x0], x1
+    smull           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ld1             {v17.8b}, [x2], x3
+
+.rept 6
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.8b}, [x0], x1
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ld1             {v17.8b}, [x2], x3
+.endr
+    usubl           v1.8h, v16.8b, v17.8b
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ret_v0_w0
+endfunc
+
+function PFX(pixel_sse_pp_8x16_neon)
+    ld1             {v16.8b}, [x0], x1
+    ld1             {v17.8b}, [x2], x3
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.8b}, [x0], x1
+    smull           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ld1             {v17.8b}, [x2], x3
+
+.rept 14
+    usubl           v1.8h, v16.8b, v17.8b
+    ld1             {v16.8b}, [x0], x1
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ld1             {v17.8b}, [x2], x3
+.endr
+    usubl           v1.8h, v16.8b, v17.8b
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ret_v0_w0
+endfunc
+
+.macro sse_pp_16xN h
+function PFX(pixel_sse_pp_16x\h\()_neon)
+    ld1             {v16.16b}, [x0], x1
+    ld1             {v17.16b}, [x2], x3
+    usubl           v1.8h, v16.8b, v17.8b
+    usubl2          v2.8h, v16.16b, v17.16b
+    ld1             {v16.16b}, [x0], x1
+    ld1             {v17.16b}, [x2], x3
+    smull           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    smlal           v0.4s, v2.4h, v2.4h
+    smlal2          v0.4s, v2.8h, v2.8h
+.rept \h - 2
+    usubl           v1.8h, v16.8b, v17.8b
+    usubl2          v2.8h, v16.16b, v17.16b
+    ld1             {v16.16b}, [x0], x1
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    ld1             {v17.16b}, [x2], x3
+    smlal           v0.4s, v2.4h, v2.4h
+    smlal2          v0.4s, v2.8h, v2.8h
+.endr
+    usubl           v1.8h, v16.8b, v17.8b
+    usubl2          v2.8h, v16.16b, v17.16b
+    smlal           v0.4s, v1.4h, v1.4h
+    smlal2          v0.4s, v1.8h, v1.8h
+    smlal           v0.4s, v2.4h, v2.4h
+    smlal2          v0.4s, v2.8h, v2.8h
+    ret_v0_w0
+endfunc
+.endm
+
+sse_pp_16xN 16
+sse_pp_16xN 32
+
+function PFX(pixel_sse_pp_32x32_neon)
+    mov             w12, #8
+    movi            v0.16b, #0
+    movi            v1.16b, #0
+.loop_sse_pp_32:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v16.16b,v17.16b}, [x0], x1
+    ld1             {v18.16b,v19.16b}, [x2], x3
+    usubl           v2.8h, v16.8b, v18.8b
+    usubl2          v3.8h, v16.16b, v18.16b
+    usubl           v4.8h, v17.8b, v19.8b
+    usubl2          v5.8h, v17.16b, v19.16b
+    smlal           v0.4s, v2.4h, v2.4h
+    smlal2          v1.4s, v2.8h, v2.8h
+    smlal           v0.4s, v3.4h, v3.4h
+    smlal2          v1.4s, v3.8h, v3.8h
+    smlal           v0.4s, v4.4h, v4.4h
+    smlal2          v1.4s, v4.8h, v4.8h
+    smlal           v0.4s, v5.4h, v5.4h
+    smlal2          v1.4s, v5.8h, v5.8h
+.endr
+    cbnz            w12, .loop_sse_pp_32
+    add             v0.4s, v0.4s, v1.4s
+    ret_v0_w0
+endfunc
+
+function PFX(pixel_sse_pp_32x64_neon)
+    mov             w12, #16
+    movi            v0.16b, #0
+    movi            v1.16b, #0
+.loop_sse_pp_32x64:
+    sub             w12, w12, #1
+.rept 4
+    ld1             {v16.16b,v17.16b}, [x0], x1
+    ld1             {v18.16b,v19.16b}, [x2], x3
+    usubl           v2.8h, v16.8b, v18.8b
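
Reviewer note: the NEON routines in this file all compute the same quantity, the sum of squared differences between two 8-bit pixel blocks. The sketch below is a hypothetical scalar reference (the name `sse_pp_ref` and its signature are assumptions, not part of the patch) that can help when checking the assembly. It assumes x0/x2 hold the two block pointers and x1/x3 their strides, as the load pattern suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for the pixel_sse_pp_WxH_neon routines.
// Assumes 8-bit pixels, matching the .8b/.16b loads in the assembly.
uint32_t sse_pp_ref(const uint8_t* pix1, std::ptrdiff_t stride1,
                    const uint8_t* pix2, std::ptrdiff_t stride2,
                    int width, int height)
{
    assert(width > 0 && height > 0);
    uint32_t sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // usubl: widening subtract of two unsigned bytes
            int diff = int(pix1[x]) - int(pix2[x]);
            // smull/smlal: widening multiply (and accumulate) of the difference
            sum += uint32_t(diff * diff);
        }
        pix1 += stride1; // post-indexed load advances by the stride
        pix2 += stride2;
    }
    return sum;
}
```

Calling `sse_pp_ref(a, strideA, b, strideB, 8, 8)` should, under these assumptions, give the value the 8x8 NEON routine returns in w0 for the same inputs.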

x265_3.5.tar.gz/source/common/common.h Changed

@@ -130,7 +130,7 @@
 typedef uint64_t pixel4;
 typedef int64_t  ssum2_t;
 #define SHIFT_TO_BITPLANE 9
-#define HISTOGRAM_BINS 1024
+#define BRIGHTNESS_THRESHOLD 120 // The threshold above which a pixel is bright
 #else
 typedef uint8_t  pixel;
 typedef uint16_t sum_t;
@@ -138,7 +138,7 @@
 typedef uint32_t pixel4;
 typedef int32_t  ssum2_t; // Signed sum
 #define SHIFT_TO_BITPLANE 7
-#define HISTOGRAM_BINS 256
+#define BRIGHTNESS_THRESHOLD 30 // The threshold above which a pixel is bright
 #endif // if HIGH_BIT_DEPTH
 
 #if X265_DEPTH < 10
@@ -162,6 +162,8 @@
 
 #define MIN_QPSCALE     0.21249999999999999
 #define MAX_MAX_QPSCALE 615.46574234477100
+#define FRAME_BRIGHTNESS_THRESHOLD  50.0 // Min % of pixels in a frame, that are above BRIGHTNESS_THRESHOLD for it to be considered a bright frame
+#define FRAME_EDGE_THRESHOLD  10.0 // Min % of edge pixels in a frame, for it to be considered to have high edge density
 
 
 template<typename T>
@@ -340,6 +342,9 @@
 #define FILLER_OVERHEAD (NAL_TYPE_OVERHEAD + START_CODE_OVERHEAD + 1)
 
 #define MAX_NUM_DYN_REFINE          (NUM_CU_DEPTH * X265_REFINE_INTER_LEVELS)
+#define X265_BYTE 8
+
+#define MAX_MCSTF_TEMPORAL_WINDOW_LENGTH 8
 
 namespace X265_NS {
 
@@ -434,6 +439,14 @@
 #define  x265_unlink(fileName) unlink(fileName)
 #define  x265_rename(oldName, newName) rename(oldName, newName)
 #endif
+/* Close a file */
+#define  x265_fclose(file) if (file != NULL) fclose(file); file=NULL;
+#define x265_fread(val, size, readSize, fileOffset,errorMessage)\
+    if (fread(val, size, readSize, fileOffset) != readSize)\
+    {\
+        x265_log(NULL, X265_LOG_ERROR, errorMessage); \
+        return; \
+    }
 int      x265_exp2fix8(double x);
 
 double   x265_ssim2dB(double ssim);

x265_3.5.tar.gz/source/common/cpu.cpp Changed

@@ -7,6 +7,8 @@
  *          Steve Borho <steve@borho.org>
  *          Hongbin Liu <liuhongbin1@huawei.com>
  *          Yimeng Su <yimeng.su@huawei.com>
+ *          Josh Dekker <josh@itanimul.li>
+ *          Jean-Baptiste Kempf <jb@videolan.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -105,6 +107,9 @@
     { "NEON",            X265_CPU_NEON },
     { "FastNeonMRC",     X265_CPU_FAST_NEON_MRC },
 
+#elif X265_ARCH_ARM64
+    { "NEON",            X265_CPU_NEON },
+
 #elif X265_ARCH_POWER8
     { "Altivec",         X265_CPU_ALTIVEC },
 
@@ -369,12 +374,23 @@
     flags |= PFX(cpu_fast_neon_mrc_test)() ? X265_CPU_FAST_NEON_MRC : 0;
 #endif
     // TODO: write dual issue test? currently it's A8 (dual issue) vs. A9 (fast mrc)
-#elif X265_ARCH_ARM64
-    flags |= X265_CPU_NEON;
 #endif // if HAVE_ARMV6
     return flags;
 }
 
+#elif X265_ARCH_ARM64
+
+uint32_t cpu_detect(bool benableavx512)
+{
+    int flags = 0;
+
+    #if HAVE_NEON
+         flags |= X265_CPU_NEON;
+    #endif
+        
+    return flags;
+}
+
 #elif X265_ARCH_POWER8
 
 uint32_t cpu_detect(bool benableavx512)

x265_3.5.tar.gz/source/common/frame.cpp Changed

@@ -63,13 +63,40 @@
     m_thetaPic = NULL;
     m_edgeBitPlane = NULL;
     m_edgeBitPic = NULL;
+    m_frameSegment = 0;
     m_isInsideWindow = 0;
+
+    // mcstf
+    m_isSubSampled = NULL;
+    m_mcstf = NULL;
+    m_refPicCnt[0] = 0;
+    m_refPicCnt[1] = 0;
+    m_nextMCSTF = NULL;
+    m_prevMCSTF = NULL;
+
 }
 
 bool Frame::create(x265_param *param, float* quantOffsets)
 {
     m_fencPic = new PicYuv;
     m_param = param;
+
+    if (m_param->bEnableTemporalFilter)
+    {
+        m_mcstf = new TemporalFilter;
+        m_mcstf->init(param);
+
+        m_fencPicSubsampled2 = new PicYuv;
+        m_fencPicSubsampled4 = new PicYuv;
+
+        if (!m_fencPicSubsampled2->createScaledPicYUV(param, 2))
+            return false;
+        if (!m_fencPicSubsampled4->createScaledPicYUV(param, 4))
+            return false;
+
+        CHECKED_MALLOC_ZERO(m_isSubSampled, int, 1);
+    }
+
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
 
     if (param->bCTUInfo)
@@ -104,7 +131,7 @@
         CHECKED_MALLOC_ZERO(m_classifyCount, uint32_t, size);
     }
 
-    if (param->rc.aqMode == X265_AQ_EDGE || (param->rc.zonefileCount && param->rc.aqMode != 0))
+    if (param->rc.aqMode == X265_AQ_EDGE || param->rc.frameSegment || (param->rc.zonefileCount && param->rc.aqMode != 0))
     {
         uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize - 1) / param->maxCUSize;
         uint32_t numCuInHeight = (param->sourceHeight + param->maxCUSize - 1) / param->maxCUSize;
@@ -151,6 +178,22 @@
     return false;
 }
 
+bool Frame::createSubSample()
+{
+
+    m_fencPicSubsampled2 = new PicYuv;
+    m_fencPicSubsampled4 = new PicYuv;
+
+    if (!m_fencPicSubsampled2->createScaledPicYUV(m_param, 2))
+        return false;
+    if (!m_fencPicSubsampled4->createScaledPicYUV(m_param, 4))
+        return false;
+    CHECKED_MALLOC_ZERO(m_isSubSampled, int, 1);
+    return true;
+fail:
+    return false;
+}
+
 bool Frame::allocEncodeData(x265_param *param, const SPS& sps)
 {
     m_encData = new FrameData;
@@ -207,6 +250,26 @@
         m_fencPic = NULL;
     }
 
+    if (m_param->bEnableTemporalFilter)
+    {
+
+        if (m_fencPicSubsampled2)
+        {
+            m_fencPicSubsampled2->destroy();
+            delete m_fencPicSubsampled2;
+            m_fencPicSubsampled2 = NULL;
+        }
+
+        if (m_fencPicSubsampled4)
+        {
+            m_fencPicSubsampled4->destroy();
+            delete m_fencPicSubsampled4;
+            m_fencPicSubsampled4 = NULL;
+        }
+        delete m_mcstf;
+        X265_FREE(m_isSubSampled);
+    }
+
     if (m_reconPic)
     {
         m_reconPic->destroy();
@@ -267,7 +330,8 @@
         X265_FREE(m_addOnPrevChange);
         m_addOnPrevChange = NULL;
     }
-    m_lowres.destroy();
+
+    m_lowres.destroy(m_param);
     X265_FREE(m_rcData);
 
     if (m_param->bDynamicRefine)

x265_3.5.tar.gz/source/common/frame.h Changed

@@ -28,6 +28,7 @@
 #include "common.h"
 #include "lowres.h"
 #include "threading.h"
+#include "temporalfilter.h"
 
 namespace X265_NS {
 // private namespace
@@ -70,6 +71,7 @@
     double   count4;
     double   offset4;
     double   bufferFillFinal;
+    int64_t  currentSatd;
 };
 
 class Frame
@@ -83,6 +85,9 @@
 
     /* Data associated with x265_picture */
     PicYuv*                m_fencPic;
+    PicYuv*                m_fencPicSubsampled2;
+    PicYuv*                m_fencPicSubsampled4;
+
     int                    m_poc;
     int                    m_encodeOrder;
     int64_t                m_pts;                // user provided presentation time stamp
@@ -132,6 +137,13 @@
     bool                   m_classifyFrame;
     int                    m_fieldNum;
 
+    /*MCSTF*/
+    TemporalFilter*        m_mcstf;
+    int                    m_refPicCnt[2];
+    Frame*                 m_nextMCSTF;           // PicList doubly linked list pointers
+    Frame*                 m_prevMCSTF;
+    int*                   m_isSubSampled;
+
     /* aq-mode 4 : Gaussian, edge and theta frames for edge information */
     pixel*                 m_edgePic;
     pixel*                 m_gaussianPic;
@@ -141,11 +153,16 @@
     pixel*                 m_edgeBitPlane;
     pixel*                 m_edgeBitPic;
 
+    /* segment for each frame */
+    int                    m_frameSegment;
+
+
     int                    m_isInsideWindow;
 
     Frame();
 
     bool create(x265_param *param, float* quantOffsets);
+    bool createSubSample();
     bool allocEncodeData(x265_param *param, const SPS& sps);
     void reinit(const SPS& sps);
     void destroy();

x265_3.5.tar.gz/source/common/framedata.cpp Changed

x265_3.5.tar.gz/source/common/lowres.cpp Changed

@@ -28,6 +28,28 @@
 
 using namespace X265_NS;
 
+/*
+ * Down Sample input picture
+ */
+static
+void frame_lowres_core(const pixel* src0, pixel* dst0,
+    intptr_t src_stride, intptr_t dst_stride, int width, int height)
+{
+    for (int y = 0; y < height; y++)
+    {
+        const pixel* src1 = src0 + src_stride;
+        for (int x = 0; x < width; x++)
+        {
+            // slower than naive bilinear, but matches asm
+#define FILTER(a, b, c, d) ((((a + b + 1) >> 1) + ((c + d + 1) >> 1) + 1) >> 1)
+            dst0[x] = FILTER(src0[2 * x], src1[2 * x], src0[2 * x + 1], src1[2 * x + 1]);
+#undef FILTER
+        }
+        src0 += src_stride * 2;
+        dst0 += dst_stride;
+    }
+}
+
 bool PicQPAdaptationLayer::create(uint32_t width, uint32_t height, uint32_t partWidth, uint32_t partHeight, uint32_t numAQPartInWidthExt, uint32_t numAQPartInHeightExt)
 {
     aqPartWidth = partWidth;
@@ -190,13 +212,48 @@
         }
     }
 
+    if (param->rc.frameSegment)
+        lowresEdgePlane = X265_MALLOC(pixel, lumaStride * (lines + (origPic->m_lumaMarginY * 2)));
+
+    if (param->bHistBasedSceneCut)
+    {
+        quarterSampleLowResWidth = widthFullRes / 4;
+        quarterSampleLowResHeight = heightFullRes / 4;
+        quarterSampleLowResOriginX = 16;
+        quarterSampleLowResOriginY = 16;
+        quarterSampleLowResStrideY = quarterSampleLowResWidth + 2 * quarterSampleLowResOriginY;
+
+        size_t quarterSampleLowResPlanesize = quarterSampleLowResStrideY * (quarterSampleLowResHeight + 2 * quarterSampleLowResOriginX);
+        /* allocate quarter sampled lowres buffers */
+        CHECKED_MALLOC_ZERO(quarterSampleLowResBuffer, pixel, quarterSampleLowResPlanesize);
+
+        // Allocate memory for Histograms
+        picHistogram = X265_MALLOC(uint32_t***, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t***));
+        picHistogram[0] = X265_MALLOC(uint32_t**, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+        for (uint32_t wd = 1; wd < NUMBER_OF_SEGMENTS_IN_WIDTH; wd++) {
+            picHistogram[wd] = picHistogram[0] + wd * NUMBER_OF_SEGMENTS_IN_HEIGHT;
+        }
+
+        for (uint32_t regionInPictureWidthIndex = 0; regionInPictureWidthIndex < NUMBER_OF_SEGMENTS_IN_WIDTH; regionInPictureWidthIndex++)
+        {
+            for (uint32_t regionInPictureHeightIndex = 0; regionInPictureHeightIndex < NUMBER_OF_SEGMENTS_IN_HEIGHT; regionInPictureHeightIndex++)
+            {
+                picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex] = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH *sizeof(uint32_t*));
+                picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][0] = X265_MALLOC(uint32_t, 3 * HISTOGRAM_NUMBER_OF_BINS * sizeof(uint32_t));
+                for (uint32_t wd = 1; wd < 3; wd++) {
+                    picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][wd] = picHistogram[regionInPictureWidthIndex][regionInPictureHeightIndex][0] + wd * HISTOGRAM_NUMBER_OF_BINS;
+                }
+            }
+        }
+    }
+
     return true;
 
 fail:
     return false;
 }
 
-void Lowres::destroy()
+void Lowres::destroy(x265_param* param)
 {
     X265_FREE(buffer[0]);
     if(bEnableHME)
@@ -234,7 +291,9 @@
     X265_FREE(invQscaleFactor8x8);
     X265_FREE(edgeInclined);
     X265_FREE(qpAqMotionOffset);
-    X265_FREE(blockVariance);
+    if (param->bDynamicRefine || param->bEnableFades)
+        X265_FREE(blockVariance);
+
     if (maxAQDepth > 0)
     {
         for (uint32_t d = 0; d < 4; d++)
@@ -254,6 +313,29 @@
 
         delete pAQLayer;
     }
+
+    // Histograms
+    if (param->bHistBasedSceneCut)
+    {
+        for (uint32_t segmentInFrameWidthIdx = 0; segmentInFrameWidthIdx < NUMBER_OF_SEGMENTS_IN_WIDTH; segmentInFrameWidthIdx++)
+        {
+            if (picHistogram[segmentInFrameWidthIdx])
+            {
+                for (uint32_t segmentInFrameHeightIdx = 0; segmentInFrameHeightIdx < NUMBER_OF_SEGMENTS_IN_HEIGHT; segmentInFrameHeightIdx++)
+                {
+                    if (picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx])
+                        X265_FREE(picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx][0]);
+                    X265_FREE(picHistogram[segmentInFrameWidthIdx][segmentInFrameHeightIdx]);
+                }
+            }
+        }
+        if (picHistogram)
+            X265_FREE(picHistogram[0]);
+        X265_FREE(picHistogram);
+
+        X265_FREE(quarterSampleLowResBuffer);
+
+    }
 }
 // (re) initialize lowres state
 void Lowres::init(PicYuv *origPic, int poc)
@@ -266,10 +348,6 @@
     indB = 0;
     memset(costEst, -1, sizeof(costEst));
     memset(weightedCostDelta, 0, sizeof(weightedCostDelta));
-    interPCostPercDiff = 0.0;
-    intraCostPercDiff = 0.0;
-    m_bIsMaxThres = false;
-    m_bIsHardScenecut = false;
 
     if (qpAqOffset && invQscaleFactor)
         memset(costEstAq, -1, sizeof(costEstAq));
@@ -314,4 +392,16 @@
     }
 
     fpelPlane[0] = lowresPlane[0];
+
+    if (origPic->m_param->bHistBasedSceneCut)
+    {
+        // Quarter Sampled Input Picture Formation
+        // TO DO: Replace with ASM function
+        frame_lowres_core(
+            lowresPlane[0],
+            quarterSampleLowResBuffer + quarterSampleLowResOriginX + quarterSampleLowResOriginY * quarterSampleLowResStrideY,
+            lumaStride,
+            quarterSampleLowResStrideY,
+            widthFullRes / 4, heightFullRes / 4);
+    }
 }
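
Reviewer note: the comment in `frame_lowres_core` says the filter is "slower than naive bilinear, but matches asm". The difference is in rounding: the macro averages two pairs and then averages the results, rounding up at each stage. The sketch below restates that arithmetic next to a single-stage average so the two can be compared. The function names are hypothetical and only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same arithmetic as the FILTER macro in frame_lowres_core:
// two rounded pair averages, then a rounded average of those.
inline int filterCascaded(int a, int b, int c, int d)
{
    return (((a + b + 1) >> 1) + ((c + d + 1) >> 1) + 1) >> 1;
}

// Single-stage rounded average of four samples, shown only for comparison.
inline int filterNaive(int a, int b, int c, int d)
{
    return (a + b + c + d + 2) >> 2;
}
```

For most inputs the two agree, for example (10, 20, 30, 40) gives 25 in both. They can differ by one when intermediate rounding accumulates: (0, 1, 0, 0) gives 1 with the cascaded form and 0 with the single-stage form, which is why a C fallback has to use the cascaded form to stay bit-exact with an assembly version that averages in stages.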

x265_3.5.tar.gz/source/common/lowres.h Changed

@@ -32,6 +32,10 @@
 namespace X265_NS {
 // private namespace
 
+#define HISTOGRAM_NUMBER_OF_BINS         256
+#define NUMBER_OF_SEGMENTS_IN_WIDTH      4
+#define NUMBER_OF_SEGMENTS_IN_HEIGHT     4
+
 struct ReferencePlanes
 {
     ReferencePlanes() { memset(this, 0, sizeof(ReferencePlanes)); }
@@ -44,6 +48,9 @@
     pixel*   fpelLowerResPlane[3];
     pixel*   lowerResPlane[4];
 
+    /* Edge Plane in Lowres */
+    pixel*   lowresEdgePlane;
+
     bool     isWeighted;
     bool     isLowres;
     bool     isHMELowres;
@@ -214,13 +221,13 @@
     double*   qpAqOffset;      // AQ QP offset values for each 16x16 CU
     double*   qpCuTreeOffset;  // cuTree QP offset values for each 16x16 CU
     double*   qpAqMotionOffset;
-    int*      invQscaleFactor; // qScale values for qp Aq Offsets
+    int*      invQscaleFactor;    // qScale values for qp Aq Offsets
     int*      invQscaleFactor8x8; // temporary buffer for qg-size 8
     uint32_t* blockVariance;
     uint64_t  wp_ssd[3];       // This is different than SSDY, this is sum(pixel^2) - sum(pixel)^2 for entire frame
     uint64_t  wp_sum[3];
     double    frameVariance;
-    int* edgeInclined;
+    int*      edgeInclined;
 
 
     /* cutree intermediate data */
@@ -230,18 +237,30 @@
     uint32_t heightFullRes;
     uint32_t m_maxCUSize;
     uint32_t m_qgSize;
-    
+
     uint16_t* propagateCost;
     double    weightedCostDelta[X265_BFRAME_MAX + 2];
     ReferencePlanes weightedRef[X265_BFRAME_MAX + 2];
+
     /* For hist-based scenecut */
-    bool   m_bIsMaxThres;
-    double interPCostPercDiff;
-    double intraCostPercDiff;
-    bool   m_bIsHardScenecut;
+    int          quarterSampleLowResWidth;     // width of 1/4 lowres frame in pixels
+    int          quarterSampleLowResHeight;    // height of 1/4 lowres frame in pixels
+    int          quarterSampleLowResStrideY;
+    int          quarterSampleLowResOriginX;
+    int          quarterSampleLowResOriginY;
+    pixel       *quarterSampleLowResBuffer;
+    bool         bHistScenecutAnalyzed;
+
+    uint16_t     picAvgVariance;
+    uint16_t     picAvgVarianceCb;
+    uint16_t     picAvgVarianceCr;
+
+    uint32_t ****picHistogram;
+    uint64_t     averageIntensityPerSegment[NUMBER_OF_SEGMENTS_IN_WIDTH][NUMBER_OF_SEGMENTS_IN_HEIGHT][3];
+    uint8_t      averageIntensity[3];
 
     bool create(x265_param* param, PicYuv *origPic, uint32_t qgSize);
-    void destroy();
+    void destroy(x265_param* param);
     void init(PicYuv *origPic, int poc);
 };
 }
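
Reviewer note: going by the allocation loops in lowres.cpp, `picHistogram` appears to be indexed as width segment, height segment, channel, bin, with three channels per segment (presumably Y, Cb and Cr, which is an assumption). The sketch below shows the same layout as one flat buffer, which can make the indexing easier to reason about. `histIndex` and `FlatHistogram` are hypothetical names and not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values copied from the lowres.h hunk above.
constexpr std::size_t kBins = 256;   // HISTOGRAM_NUMBER_OF_BINS
constexpr std::size_t kSegW = 4;     // NUMBER_OF_SEGMENTS_IN_WIDTH
constexpr std::size_t kSegH = 4;     // NUMBER_OF_SEGMENTS_IN_HEIGHT
constexpr std::size_t kChannels = 3; // assumption: one histogram per plane

// Hypothetical helper: flat offset of picHistogram[w][h][c][bin].
inline std::size_t histIndex(std::size_t w, std::size_t h,
                             std::size_t c, std::size_t bin)
{
    assert(w < kSegW && h < kSegH && c < kChannels && bin < kBins);
    return ((w * kSegH + h) * kChannels + c) * kBins + bin;
}

// Hypothetical flat container holding every bin in one zeroed allocation.
struct FlatHistogram
{
    std::vector<uint32_t> counts =
        std::vector<uint32_t>(kSegW * kSegH * kChannels * kBins, 0);

    uint32_t& at(std::size_t w, std::size_t h, std::size_t c, std::size_t bin)
    {
        return counts[histIndex(w, h, c, bin)];
    }
};
```

A single allocation like this would need only one free, whereas the pointer-table layout in the patch needs the nested cleanup loop added to `Lowres::destroy`. Whether that trade-off is worthwhile for x265 is not something this diff settles.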

x265_3.5.tar.gz/source/common/mv.h Changed

x265_3.5.tar.gz/source/common/param.cpp Changed

@@ -145,6 +145,8 @@
     param->bAnnexB = 1;
     param->bRepeatHeaders = 0;
     param->bEnableAccessUnitDelimiters = 0;
+    param->bEnableEndOfBitstream = 0;
+    param->bEnableEndOfSequence = 0;
     param->bEmitHRDSEI = 0;
     param->bEmitInfoSEI = 1;
     param->bEmitHDRSEI = 0; /*Deprecated*/
@@ -168,7 +170,6 @@
     param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
     param->bBPyramid = 1;
     param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
-    param->edgeTransitionThreshold = 0.03;
     param->bHistBasedSceneCut = 0;
     param->lookaheadSlices = 8;
     param->lookaheadThreads = 0;
@@ -278,7 +279,10 @@
     param->rc.rfConstantMin = 0;
     param->rc.bStatRead = 0;
     param->rc.bStatWrite = 0;
+    param->rc.dataShareMode = X265_SHARE_MODE_FILE;
     param->rc.statFileName = NULL;
+    param->rc.sharedMemName = NULL;
+    param->rc.bEncFocusedFramesOnly = 0;
     param->rc.complexityBlur = 20;
     param->rc.qblur = 0.5;
     param->rc.zoneCount = 0;
@@ -292,6 +296,7 @@
     param->rc.bEnableConstVbv = 0;
     param->bResetZoneConfig = 1;
     param->reconfigWindowSize = 0;
+    param->rc.frameSegment = 0;
     param->decoderVbvMaxRate = 0;
     param->bliveVBV2pass = 0;
 
@@ -321,6 +326,7 @@
     param->maxLuma = PIXEL_MAX;
     param->log2MaxPocLsb = 8;
     param->maxSlices = 1;
+    param->videoSignalTypePreset = NULL;
 
     /*Conformance window*/
     param->confWinRightOffset = 0;
@@ -373,10 +379,16 @@
     param->bEnableSvtHevc = 0;
     param->svtHevcParam = NULL;
 
+    /* MCSTF */
+    param->bEnableTemporalFilter = 0;
+    param->temporalFilterStrength = 0.95;
+
 #ifdef SVT_HEVC
     param->svtHevcParam = svtParam;
     svt_param_default(param);
 #endif
+    /* Film grain characteristics model filename */
+    param->filmGrain = NULL;
 }
 
 int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)
@@ -949,7 +961,6 @@
        {
            bError = false;
            p->scenecutThreshold = atoi(value);
-           p->bHistBasedSceneCut = 0;
        }
     }
     OPT("temporal-layers") p->bEnableTemporalSubLayers = atobool(value);
@@ -1184,6 +1195,7 @@
         int pass = x265_clip3(0, 3, atoi(value));
         p->rc.bStatWrite = pass & 1;
         p->rc.bStatRead = pass & 2;
+        p->rc.dataShareMode = X265_SHARE_MODE_FILE;
     }
     OPT("stats") p->rc.statFileName = strdup(value);
     OPT("scaling-list") p->scalingLists = strdup(value);
@@ -1216,27 +1228,14 @@
         OPT("opt-ref-list-length-pps") p->bOptRefListLengthPPS = atobool(value);
         OPT("multi-pass-opt-rps") p->bMultiPassOptRPS = atobool(value);
         OPT("scenecut-bias") p->scenecutBias = atof(value);
-        OPT("hist-scenecut")
-        {
-            p->bHistBasedSceneCut = atobool(value);
-            if (bError)
-            {
-                bError = false;
-                p->bHistBasedSceneCut = 0;
-            }
-            if (p->bHistBasedSceneCut)
-            {
-                bError = false;
-                p->scenecutThreshold = 0;
-            }
-        }
-        OPT("hist-threshold") p->edgeTransitionThreshold = atof(value);
+        OPT("hist-scenecut") p->bHistBasedSceneCut = atobool(value);
         OPT("rskip-edge-threshold") p->edgeVarThreshold = atoi(value)/100.0f;
         OPT("lookahead-threads") p->lookaheadThreads = atoi(value);
         OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value);
         OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine = atobool(value);
         OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
         OPT("aq-motion") p->bAQMotion = atobool(value);
+        OPT("sbrc") p->rc.frameSegment = atobool(value);
         OPT("dynamic-rd") p->dynamicRd = atof(value);
         OPT("analysis-reuse-level")
         {
@@ -1446,6 +1445,12 @@
         OPT("vbv-live-multi-pass") p->bliveVBV2pass = atobool(value);
         OPT("min-vbv-fullness") p->minVbvFullness = atof(value);
         OPT("max-vbv-fullness") p->maxVbvFullness = atof(value);
+        OPT("video-signal-type-preset") p->videoSignalTypePreset = strdup(value);
+        OPT("eob") p->bEnableEndOfBitstream = atobool(value);
+        OPT("eos") p->bEnableEndOfSequence = atobool(value);
+        /* Film grain characterstics model filename */
+        OPT("film-grain") p->filmGrain = (char* )value;
+        OPT("mcstf") p->bEnableTemporalFilter = atobool(value);
         else
             return X265_PARAM_BAD_NAME;
     }
@@ -1761,8 +1766,6 @@
           "scenecutThreshold must be greater than 0");
     CHECK(param->scenecutBias < 0 || 100 < param->scenecutBias,
             "scenecut-bias must be between 0 and 100");
-    CHECK(param->edgeTransitionThreshold < 0.0 || 1.0 < param->edgeTransitionThreshold,
-            "hist-threshold must be between 0.0 and 1.0");
     CHECK(param->radl < 0 || param->radl > param->bframes,
           "radl must be between 0 and bframes");
     CHECK(param->rdPenalty < 0 || param->rdPenalty > 2,
@@ -1824,15 +1827,15 @@
         "Invalid refine-ctu-distortion value, must be either 0 or 1");
     CHECK(param->maxAUSizeFactor < 0.5 || param->maxAUSizeFactor > 1.0,
         "Supported factor for controlling max AU size is from 0.5 to 1");
-    CHECK((param->dolbyProfile != 0) && (param->dolbyProfile != 50) && (param->dolbyProfile != 81) && (param->dolbyProfile != 82),
-        "Unsupported Dolby Vision profile, only profile 5, profile 8.1 and profile 8.2 enabled");
+    CHECK((param->dolbyProfile != 0) && (param->dolbyProfile != 50) && (param->dolbyProfile != 81) && (param->dolbyProfile != 82) && (param->dolbyProfile != 84),
+        "Unsupported Dolby Vision profile, only profile 5, profile 8.1, profile 8.2 and profile 8.4 enabled");
     CHECK(param->dupThreshold < 1 || 99 < param->dupThreshold,
         "Invalid frame-duplication threshold. Value must be between 1 and 99.");
     if (param->dolbyProfile)
     {
         CHECK((param->rc.vbvMaxBitrate <= 0 || param->rc.vbvBufferSize <= 0), "Dolby Vision requires VBV settings to enable HRD.\n");
-        CHECK((param->internalBitDepth != 10), "Dolby Vision profile - 5, profile - 8.1 and profile - 8.2 is Main10 only\n");
-        CHECK((param->internalCsp != X265_CSP_I420), "Dolby Vision profile - 5, profile - 8.1 and profile - 8.2 requires YCbCr 4:2:0 color space\n");
+        CHECK((param->internalBitDepth != 10), "Dolby Vision profile - 5, profile - 8.1, profile - 8.2 and profile - 8.4 are Main10 only\n");
+        CHECK((param->internalCsp != X265_CSP_I420), "Dolby Vision profile - 5, profile - 8.1, profile - 8.2 and profile - 8.4 requires YCbCr 4:2:0 color space\n");
         if (param->dolbyProfile == 81)
             CHECK(!(param->masteringDisplayColorVolume), "Dolby Vision profile - 8.1 requires Mastering display color volume information\n");
     }
@@ -1898,6 +1901,11 @@
         param->bSingleSeiNal = 0;
         x265_log(param, X265_LOG_WARNING, "None of the SEI messages are enabled. Disabling Single SEI NAL\n");
     }
+    if (param->bEnableTemporalFilter && (param->frameNumThreads > 1))
+    {
+        param->bEnableTemporalFilter = 0;
+        x265_log(param, X265_LOG_WARNING, "MCSTF can be enabled with frame thread = 1 only. Disabling MCSTF\n");
+    }
     CHECK(param->confWinRightOffset < 0, "Conformance Window Right Offset must be 0 or greater");
     CHECK(param->confWinBottomOffset < 0, "Conformance Window Bottom Offset must be 0 or greater");
     CHECK(param->decoderVbvMaxRate < 0, "Invalid Decoder Vbv Maxrate. Value can not be less than zero");
@@ -1910,6 +1918,7 @@
             x265_log(param, X265_LOG_WARNING, "Live VBV enabled without VBV settings.Disabling live VBV in 2 pass\n");
         }
     }
+    CHECK(param->rc.dataShareMode != X265_SHARE_MODE_FILE && param->rc.dataShareMode != X265_SHARE_MODE_SHAREDMEM, "Invalid data share mode. It must be one of the X265_DATA_SHARE_MODES enum values\n" );
     return check_failed;
 }
 
@@ -1970,8 +1979,8 @@
         x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut / bias  : %d / %d / %d / %.2lf \n",
                  param->keyframeMin, param->keyframeMax, param->scenecutThreshold, param->scenecutBias * 100);
     else if (param->bHistBasedSceneCut && param->keyframeMax != INT_MAX) 
-        x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut / edge threshold  : %d / %d / %d / %.2lf\n",
-                 param->keyframeMin, param->keyframeMax, param->bHistBasedSceneCut, param->edgeTransitionThreshold);
+        x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut  : %d / %d / %d\n",
+                 param->keyframeMin, param->keyframeMax, param->bHistBasedSceneCut);
     else if (param->keyframeMax == INT_MAX)
         x265_log(param, X265_LOG_INFO, "Keyframe min / max / scenecut       : disabled\n");
 
@@ -1988,9 +1997,11 @@
              param->maxNumReferences, (param->limitReferences & X265_REF_LIMIT_CU) ? "on" : "off",
              (param->limitReferences & X265_REF_LIMIT_DEPTH) ? "on" : "off");
 
-    if (param->rc.aqMode)
+    if (param->rc.aqMode && !param->rc.frameSegment)
         x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree  : %d / %0.1f / %d / %d\n", param->rc.aqMode,
                  param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree);
+	else if (param->rc.frameSegment)
+        x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree  : auto / %0.1f / %d / %d\n", param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree);
 
     if (param->bLossless)
         x265_log(param, X265_LOG_INFO, "Rate Control                        : Lossless\n");
@@ -2089,6 +2100,8 @@
         bufSize += strlen(p->numaPools);
     if (p->masteringDisplayColorVolume)
         bufSize += strlen(p->masteringDisplayColorVolume);
+    if (p->videoSignalTypePreset)
+        bufSize += strlen(p->videoSignalTypePreset);

x265_3.5.tar.gz/source/common/piclist.cpp Changed

@@ -45,6 +45,25 @@
     m_count++;
 }
 
+void PicList::pushFrontMCSTF(Frame& curFrame)
+{
+    X265_CHECK(!curFrame.m_nextMCSTF && !curFrame.m_nextMCSTF, "piclist: picture already in OPB list\n"); // ensure frame is not in a list
+    curFrame.m_nextMCSTF = m_start;
+    curFrame.m_prevMCSTF = NULL;
+
+    if (m_count)
+    {
+        m_start->m_prevMCSTF = &curFrame;
+        m_start = &curFrame;
+    }
+    else
+    {
+        m_start = m_end = &curFrame;
+    }
+    m_count++;
+
+}
+
 void PicList::pushBack(Frame& curFrame)
 {
     X265_CHECK(!curFrame.m_next && !curFrame.m_prev, "piclist: picture already in list\n"); // ensure frame is not in a list
@@ -63,6 +82,24 @@
     m_count++;
 }
 
+void PicList::pushBackMCSTF(Frame& curFrame)
+{
+    X265_CHECK(!curFrame.m_nextMCSTF && !curFrame.m_prevMCSTF, "piclist: picture already in OPB list\n"); // ensure frame is not in a list
+    curFrame.m_nextMCSTF = NULL;
+    curFrame.m_prevMCSTF = m_end;
+
+    if (m_count)
+    {
+        m_end->m_nextMCSTF = &curFrame;
+        m_end = &curFrame;
+    }
+    else
+    {
+        m_start = m_end = &curFrame;
+    }
+    m_count++;
+}
+
 Frame *PicList::popFront()
 {
     if (m_start)
@@ -94,6 +131,14 @@
     return curFrame;
 }
 
+Frame* PicList::getPOCMCSTF(int poc)
+{
+    Frame *curFrame = m_start;
+    while (curFrame && curFrame->m_poc != poc)
+        curFrame = curFrame->m_nextMCSTF;
+    return curFrame;
+}
+
 Frame *PicList::popBack()
 {
     if (m_end)
@@ -117,6 +162,29 @@
         return NULL;
 }
 
+Frame *PicList::popBackMCSTF()
+{
+    if (m_end)
+    {
+        Frame* temp = m_end;
+        m_count--;
+
+        if (m_count)
+        {
+            m_end = m_end->m_prevMCSTF;
+            m_end->m_nextMCSTF = NULL;
+        }
+        else
+        {
+            m_start = m_end = NULL;
+        }
+        temp->m_nextMCSTF = temp->m_prevMCSTF = NULL;
+        return temp;
+    }
+    else
+        return NULL;
+}
+
 Frame* PicList::getCurFrame(void)
 {
     Frame *curFrame = m_start;
@@ -158,3 +226,36 @@
 
     curFrame.m_next = curFrame.m_prev = NULL;
 }
+
+void PicList::removeMCSTF(Frame& curFrame)
+{
+#if _DEBUG
+    Frame *tmp = m_start;
+    while (tmp && tmp != &curFrame)
+    {
+        tmp = tmp->m_nextMCSTF;
+    }
+
+    X265_CHECK(tmp == &curFrame, "framelist: pic being removed was not in list\n"); // verify pic is in this list
+#endif
+
+    m_count--;
+    if (m_count)
+    {
+        if (m_start == &curFrame)
+            m_start = curFrame.m_nextMCSTF;
+        if (m_end == &curFrame)
+            m_end = curFrame.m_prevMCSTF;
+
+        if (curFrame.m_nextMCSTF)
+            curFrame.m_nextMCSTF->m_prevMCSTF = curFrame.m_prevMCSTF;
+        if (curFrame.m_prevMCSTF)
+            curFrame.m_prevMCSTF->m_nextMCSTF = curFrame.m_nextMCSTF;
+    }
+    else
+    {
+        m_start = m_end = NULL;
+    }
+
+    curFrame.m_nextMCSTF = curFrame.m_prevMCSTF = NULL;
+}
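
The MCSTF list operations above follow the usual intrusive doubly linked list pattern: the links live inside the frame itself, so insertion and removal never allocate. A minimal sketch of that pattern, assuming hypothetical stand-in types `Node` and `NodeList` (these are not x265 names) in place of `Frame` and `PicList`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Frame: the list links are members of the node.
struct Node
{
    int   poc  = 0;
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Hypothetical stand-in for PicList, reduced to three of the operations in the diff.
struct NodeList
{
    Node* start = nullptr;
    Node* end   = nullptr;
    int   count = 0;

    // Same shape as pushFrontMCSTF: the new node becomes the head.
    void pushFront(Node& n)
    {
        assert(!n.next && !n.prev); // node must not already be linked
        n.next = start;
        n.prev = nullptr;
        if (count)
        {
            start->prev = &n;
            start = &n;
        }
        else
        {
            start = end = &n;
        }
        count++;
    }

    // Same shape as popBackMCSTF: detach and return the tail, or nullptr if empty.
    Node* popBack()
    {
        if (!end)
            return nullptr;
        Node* temp = end;
        count--;
        if (count)
        {
            end = end->prev;
            end->next = nullptr;
        }
        else
        {
            start = end = nullptr;
        }
        temp->next = temp->prev = nullptr;
        return temp;
    }

    // Same shape as getPOCMCSTF: linear search from the head.
    Node* findPoc(int poc) const
    {
        Node* cur = start;
        while (cur && cur->poc != poc)
            cur = cur->next;
        return cur;
    }
};
```

Because nodes are pushed at the front and popped from the back, the list behaves as a FIFO queue; the lookup is linear in the list length, which is presumably acceptable for the handful of frames inside a temporal window.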

x265_3.5.tar.gz/source/common/piclist.h Changed

x265_3.5.tar.gz/source/common/picyuv.cpp Changed

@@ -125,6 +125,58 @@
     return false;
 }
 
+/*Copy pixels from the picture buffer of a frame to picture buffer of another frame*/
+void PicYuv::copyFromFrame(PicYuv* source)
+{
+    uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+    int maxHeight = numCuInHeight * m_param->maxCUSize;
+    memcpy(m_picBuf0, source->m_picBuf0, sizeof(pixel)* m_stride * (maxHeight + (m_lumaMarginY * 2)));
+    m_picOrg0 = m_picBuf0 + m_lumaMarginY * m_stride + m_lumaMarginX;
+
+    if (m_picCsp != X265_CSP_I400)
+    {
+        memcpy(m_picBuf1, source->m_picBuf1, sizeof(pixel)* m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
+        memcpy(m_picBuf2, source->m_picBuf2, sizeof(pixel)* m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
+
+        m_picOrg1 = m_picBuf1 + m_chromaMarginY * m_strideC + m_chromaMarginX;
+        m_picOrg2 = m_picBuf2 + m_chromaMarginY * m_strideC + m_chromaMarginX;
+    }
+    else
+    {
+        m_picBuf1 = m_picBuf2 = NULL;
+        m_picOrg1 = m_picOrg2 = NULL;
+    }
+}
+
+bool PicYuv::createScaledPicYUV(x265_param* param, uint8_t scaleFactor)
+{
+    m_param = param;
+    m_picWidth = m_param->sourceWidth / scaleFactor;
+    m_picHeight = m_param->sourceHeight / scaleFactor;
+
+    m_picCsp = m_param->internalCsp;
+    m_hChromaShift = CHROMA_H_SHIFT(m_picCsp);
+    m_vChromaShift = CHROMA_V_SHIFT(m_picCsp);
+
+    uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1) / param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
+
+    m_lumaMarginX = 128; // search margin for L0 and L1 ME in horizontal direction
+    m_lumaMarginY = 128; // search margin for L0 and L1 ME in vertical direction
+    m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
+
+    int maxHeight = numCuInHeight * param->maxCUSize;
+    CHECKED_MALLOC_ZERO(m_picBuf0, pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+    m_picOrg0 = m_picBuf0 + m_lumaMarginY * m_stride + m_lumaMarginX;
+    m_picBuf1 = m_picBuf2 = NULL;
+    m_picOrg1 = m_picOrg2 = NULL;
+    return true;
+
+fail:
+    return false;
+}
+
 int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
 {
     m_picWidth = picWidth;
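
The buffer arithmetic in `createScaledPicYUV` can be checked in isolation: the picture is rounded up to whole CUs and padded with a search margin on every side. A rough sketch, assuming a hypothetical helper `lumaLayout` and struct `LumaLayout` (neither exists in x265) that reproduce only the arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the padded luma plane geometry, in pixels.
struct LumaLayout
{
    uint32_t stride;       // padded row length
    uint32_t rows;         // padded number of rows
    uint32_t originOffset; // offset of the first visible pixel inside the buffer
};

// Mirrors the arithmetic of createScaledPicYUV for one luma plane.
LumaLayout lumaLayout(uint32_t width, uint32_t height, uint32_t maxCUSize, uint32_t margin)
{
    assert(maxCUSize > 0);
    // Round the picture up to a whole number of CUs in each direction.
    uint32_t numCuInWidth  = (width  + maxCUSize - 1) / maxCUSize;
    uint32_t numCuInHeight = (height + maxCUSize - 1) / maxCUSize;

    LumaLayout l;
    l.stride       = numCuInWidth * maxCUSize + (margin << 1);
    l.rows         = numCuInHeight * maxCUSize + margin * 2;
    l.originOffset = margin * l.stride + margin;
    return l;
}
```

For a 1920x1080 source scaled by 2 (960x540) with 64-pixel CUs and the 128-pixel margin used above, this gives a stride of 1216, 832 rows, and an origin offset of 155776 pixels.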

x265_3.5.tar.gz/source/common/picyuv.h Changed

x265_3.5.tar.gz/source/common/pixel.cpp Changed

@@ -266,7 +266,7 @@
 {
     int satd = 0;
 
-#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64 && !HIGH_BIT_DEPTH
     pixelcmp_t satd_4x4 = x265_pixel_satd_4x4_neon;
 #endif
 
@@ -284,7 +284,7 @@
 {
     int satd = 0;
 
-#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64 && !HIGH_BIT_DEPTH
     pixelcmp_t satd_8x4 = x265_pixel_satd_8x4_neon;
 #endif
 
@@ -627,6 +627,23 @@
     }
 }
 
+static
+void frame_subsample_luma(const pixel* src0, pixel* dst0, intptr_t src_stride, intptr_t dst_stride, int width, int height)
+{
+    for (int y = 0; y < height; y++, src0 += 2 * src_stride, dst0 += dst_stride)
+    {
+        const pixel *inRow = src0;
+        const pixel *inRowBelow = src0 + src_stride;
+        pixel *target = dst0;
+        for (int x = 0; x < width; x++)
+        {
+            target[x] = (((inRow[0] + inRowBelow[0] + 1) >> 1) + ((inRow[1] + inRowBelow[1] + 1) >> 1) + 1) >> 1;
+            inRow += 2;
+            inRowBelow += 2;
+        }
+    }
+}
+
 /* structural similarity metric */
 static void ssim_4x4x2_core(const pixel* pix1, intptr_t stride1, const pixel* pix2, intptr_t stride2, int sums[2][4])
 {
@@ -1355,5 +1372,7 @@
     p.cu[BLOCK_16x16].normFact = normFact_c;
     p.cu[BLOCK_32x32].normFact = normFact_c;
     p.cu[BLOCK_64x64].normFact = normFact_c;
+    /* SubSample Luma*/
+    p.frameSubSampleLuma = frame_subsample_luma;
 }
 }
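
`frame_subsample_luma` halves the plane in both directions by averaging each 2x2 block in two rounded steps: first vertically within each column, then horizontally across the two column averages. A small self-contained sketch of the same arithmetic, assuming 8-bit pixels and a hypothetical wrapper `subsampleLuma` that works on tightly packed vectors rather than strided pointers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint8_t pixel; // assumption: 8-bit build

// Hypothetical wrapper: same per-pixel formula as frame_subsample_luma,
// with stride equal to width for simplicity.
std::vector<pixel> subsampleLuma(const std::vector<pixel>& src, int srcWidth, int srcHeight)
{
    assert(srcWidth % 2 == 0 && srcHeight % 2 == 0);
    assert(src.size() == static_cast<size_t>(srcWidth) * srcHeight);

    const int dstWidth = srcWidth / 2;
    const int dstHeight = srcHeight / 2;
    std::vector<pixel> dst(static_cast<size_t>(dstWidth) * dstHeight);

    for (int y = 0; y < dstHeight; y++)
    {
        const pixel* inRow = src.data() + static_cast<size_t>(2 * y) * srcWidth;
        const pixel* inRowBelow = inRow + srcWidth;
        for (int x = 0; x < dstWidth; x++)
        {
            // Rounded vertical average of the left and right columns of the 2x2 block.
            int left  = (inRow[2 * x] + inRowBelow[2 * x] + 1) >> 1;
            int right = (inRow[2 * x + 1] + inRowBelow[2 * x + 1] + 1) >> 1;
            // Rounded horizontal average of the two column averages.
            dst[static_cast<size_t>(y) * dstWidth + x] = static_cast<pixel>((left + right + 1) >> 1);
        }
    }
    return dst;
}
```

The two-step rounding is not identical to a single `(a + b + c + d + 2) >> 2`: for the block `{0, 0; 0, 1}` the cascaded form yields 1, while the single-step form would yield 0.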

x265_3.5.tar.gz/source/common/primitives.h Changed

@@ -232,6 +232,8 @@
 typedef void(*psyRdoQuant_t2)(int16_t *m_resiDctCoeff, int16_t *m_fencDctCoeff, int64_t *costUncoded, int64_t *totalUncodedCost, int64_t *totalRdCost, int64_t *psyScale, uint32_t blkPos);
 typedef void(*ssimDistortion_t)(const pixel *fenc, uint32_t fStride, const pixel *recon,  intptr_t rstride, uint64_t *ssBlock, int shift, uint64_t *ac_k);
 typedef void(*normFactor_t)(const pixel *src, uint32_t blockSize, int shift, uint64_t *z_k);
+/* SubSampling Luma */
+typedef void (*downscaleluma_t)(const pixel* src0, pixel* dstf, intptr_t src_stride, intptr_t dst_stride, int width, int height);
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
@@ -353,6 +355,8 @@
 
     downscale_t           frameInitLowres;
     downscale_t           frameInitLowerRes;
+    /* Sub Sample Luma */
+    downscaleluma_t        frameSubSampleLuma;
     cutree_propagate_cost propagateCost;
     cutree_fix8_unpack    fix8Unpack;
     cutree_fix8_pack      fix8Pack;
@@ -488,7 +492,7 @@
 
 #if ENABLE_ASSEMBLY && X265_ARCH_ARM64
 extern "C" {
-#include "aarch64/pixel-util.h"
+#include "aarch64/fun-decls.h"
 }
 #endif

x265_3.5.tar.gz/source/common/ringmem.cpp Added

@@ -0,0 +1,357 @@
+/*****************************************************************************
+ * Copyright (C) 2013-2017 MulticoreWare, Inc
+ *
+ * Authors: liwei <liwei@multicorewareinc.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com
+ *****************************************************************************/
+
+#include "ringmem.h"
+
+#ifndef _WIN32
+#include <sys/mman.h>
+#endif ////< _WIN32
+
+#ifdef _WIN32
+#define X265_SHARED_MEM_NAME                    "Local\\_x265_shr_mem_"
+#define X265_SEMAPHORE_RINGMEM_WRITER_NAME	    "_x265_semW_"
+#define X265_SEMAPHORE_RINGMEM_READER_NAME	    "_x265_semR_"
+#else /* POSIX / pthreads */
+#define X265_SHARED_MEM_NAME                    "/tmp/_x265_shr_mem_"
+#define X265_SEMAPHORE_RINGMEM_WRITER_NAME	    "/tmp/_x265_semW_"
+#define X265_SEMAPHORE_RINGMEM_READER_NAME	    "/tmp/_x265_semR_"
+#endif
+
+#define RINGMEM_ALLIGNMENT                       64
+
+namespace X265_NS {
+    RingMem::RingMem() 
+        : m_initialized(false)
+        , m_protectRW(false)
+        , m_itemSize(0)
+        , m_itemCnt(0)
+        , m_dataPool(NULL)
+        , m_shrMem(NULL)
+#ifdef _WIN32
+        , m_handle(NULL)
+#else //_WIN32
+        , m_filepath(NULL)
+#endif //_WIN32
+        , m_writeSem(NULL)
+        , m_readSem(NULL)
+    {
+    }
+
+
+    RingMem::~RingMem()
+    {
+    }
+
+    bool RingMem::skipRead(int32_t cnt) {
+        if (!m_initialized)
+        {
+            return false;
+        }
+
+        if (m_protectRW)
+        {
+            for (int i = 0; i < cnt; i++)
+            {
+                m_readSem->take();
+            }
+        }
+        
+        ATOMIC_ADD(&m_shrMem->m_read, cnt);
+
+        if (m_protectRW)
+        {
+            m_writeSem->give(cnt);
+        }
+
+        return true;
+    }
+
+    bool RingMem::skipWrite(int32_t cnt) {
+        if (!m_initialized)
+        {
+            return false;
+        }
+
+        if (m_protectRW)
+        {
+            for (int i = 0; i < cnt; i++)
+            {
+                m_writeSem->take();
+            }
+        }
+
+        ATOMIC_ADD(&m_shrMem->m_write, cnt);
+
+        if (m_protectRW)
+        {
+            m_readSem->give(cnt);
+        }
+
+        return true;
+    }
+
+    ///< initialize
+    bool RingMem::init(int32_t itemSize, int32_t itemCnt, const char *name, bool protectRW)
+    {
+        ///< check parameters
+        if (itemSize <= 0 || itemCnt <= 0 || NULL == name)
+        {
+            ///< invalid parameters 
+            return false;
+        }
+
+        if (!m_initialized)
+        {
+            ///< formatting names
+            char nameBuf[MAX_SHR_NAME_LEN] = { 0 };
+
+            ///< shared memory name
+            snprintf(nameBuf, sizeof(nameBuf) - 1, "%s%s", X265_SHARED_MEM_NAME, name);
+
+            ///< create or open shared memory
+            bool newCreated = false;
+
+            ///< calculate the size of the shared memory
+            int32_t shrMemSize = (itemSize * itemCnt + sizeof(ShrMemCtrl) + RINGMEM_ALLIGNMENT - 1) & ~(RINGMEM_ALLIGNMENT - 1);
+
+#ifdef _WIN32
+            HANDLE h = OpenFileMappingA(FILE_MAP_WRITE | FILE_MAP_READ, FALSE, nameBuf);
+            if (!h)
+            {
+                h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, shrMemSize, nameBuf);
+
+                if (!h)
+                {
+                    return false;
+                }
+
+                newCreated = true;
+            }
+
+            void *pool = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
+
+            ///< should not close the handle here, otherwise the OpenFileMapping would fail
+            //CloseHandle(h);
+            m_handle = h;
+
+            if (!pool)
+            {
+                return false;
+            }
+
+#else /* POSIX / pthreads */
+            mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
+            int flag = O_RDWR;
+            int shrfd = -1;
+            if ((shrfd = open(nameBuf, flag, mode)) < 0)
+            {
+                flag |= O_CREAT;
+                
+                shrfd = open(nameBuf, flag, mode);
+                if (shrfd < 0)
+                {
+                    return false;
+                }
+                newCreated = true;
+
+                lseek(shrfd, shrMemSize - 1, SEEK_SET);
+
+                if (-1 == write(shrfd, "\0", 1))
+                {
+                    close(shrfd);
+                    return false;
+                }
+
+                if (lseek(shrfd, 0, SEEK_END) < shrMemSize)
+                {
+                    close(shrfd);
+                    return false;
+                }
+            }
+
+            void *pool = mmap(0,
+                shrMemSize,
+                PROT_READ | PROT_WRITE,
+                MAP_SHARED,
+                shrfd,
+                0);
+
+            close(shrfd);
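
The `shrMemSize` expression in `RingMem::init` rounds the payload plus control block up to the next multiple of `RINGMEM_ALLIGNMENT` using the standard power-of-two mask trick. A minimal sketch of that rounding, with a hypothetical helper name `alignUp`:

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of alignment.
// The mask trick only works when alignment is a power of two.
int32_t alignUp(int32_t size, int32_t alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

As an illustration, ten 100-byte items plus an 8-byte control block (two `int32_t` indices, assuming no padding) need 1008 bytes, which rounds up to 1024 with a 64-byte alignment.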

x265_3.5.tar.gz/source/common/ringmem.h Added

@@ -0,0 +1,90 @@
+/*****************************************************************************
+ * Copyright (C) 2013-2017 MulticoreWare, Inc
+ *
+ * Authors: liwei <liwei@multicorewareinc.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com
+ *****************************************************************************/
+
+#ifndef X265_RINGMEM_H
+#define X265_RINGMEM_H
+
+#include "common.h"
+#include "threading.h"
+
+#if _MSC_VER
+#define snprintf _snprintf
+#define strdup _strdup
+#endif
+
+namespace X265_NS {
+
+#define MAX_SHR_NAME_LEN                         256
+
+    class RingMem {
+    public:
+        RingMem();
+        ~RingMem();
+
+        bool skipRead(int32_t cnt);
+
+        bool skipWrite(int32_t cnt);
+
+        ///< initialize
+        ///< protectRW: whether to use semaphores to protect the write and read operations.
+        bool init(int32_t itemSize, int32_t itemCnt, const char *name, bool protectRW = false);
+        ///< finalize
+        void release();
+
+        typedef void(*fnRWSharedData)(void *dst, void *src, int32_t size);
+
+        ///< data read
+        bool readNext(void* dst, fnRWSharedData callback);
+        ///< data write
+        bool writeData(void *data, fnRWSharedData callback);
+
+    private:        
+        bool    m_initialized;
+        bool    m_protectRW;
+
+        int32_t m_itemSize;
+        int32_t m_itemCnt;
+        ///< data pool
+        void   *m_dataPool;
+        typedef struct {
+            ///< index to write
+            int32_t m_write;
+            ///< index to read
+            int32_t m_read;
+            
+        }ShrMemCtrl;
+
+        ShrMemCtrl *m_shrMem;
+#ifdef _WIN32
+        void       *m_handle;
+#else // _WIN32
+        char       *m_filepath;
+#endif // _WIN32
+
+        ///< Semaphores
+        NamedSemaphore *m_writeSem;
+        NamedSemaphore *m_readSem;
+    };
+};
+
+#endif // ifndef X265_RINGMEM_H
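
The bodies of `readNext` and `writeData` are not visible in this diff, so the following is only a guess at the indexing scheme that a pair of read and write counters such as `m_read` and `m_write` usually implies. The class `SimpleRing` is hypothetical and single-process; it returns `false` where the real class would presumably block on its named semaphores, and it ignores counter overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical single-process ring of fixed-size items (not the x265 RingMem).
class SimpleRing
{
public:
    SimpleRing(int32_t itemSize, int32_t itemCnt)
        : m_itemSize(itemSize)
        , m_itemCnt(itemCnt)
        , m_write(0)
        , m_read(0)
        , m_pool(static_cast<size_t>(itemSize) * itemCnt)
    {
        assert(itemSize > 0 && itemCnt > 0);
    }

    // Copy one item into the next free slot; fails when the ring is full.
    bool write(const void* data)
    {
        if (m_write - m_read >= m_itemCnt)
            return false;
        std::memcpy(slot(m_write), data, static_cast<size_t>(m_itemSize));
        m_write++;
        return true;
    }

    // Copy the oldest unread item out; fails when the ring is empty.
    bool read(void* dst)
    {
        if (m_read == m_write)
            return false;
        std::memcpy(dst, slot(m_read), static_cast<size_t>(m_itemSize));
        m_read++;
        return true;
    }

private:
    // Counters grow monotonically; the slot is the counter modulo the capacity.
    unsigned char* slot(int32_t counter)
    {
        return m_pool.data() + static_cast<size_t>(counter % m_itemCnt) * m_itemSize;
    }

    int32_t m_itemSize;
    int32_t m_itemCnt;
    int32_t m_write;
    int32_t m_read;
    std::vector<unsigned char> m_pool;
};
```

Keeping the counters monotonic and reducing them modulo the capacity only when addressing a slot makes the full and empty conditions distinguishable without a separate flag.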

x265_3.5.tar.gz/source/common/slice.h Changed

x265_3.5.tar.gz/source/common/temporalfilter.cpp Added

@@ -0,0 +1,1017 @@
+/*****************************************************************************
+* Copyright (C) 2013-2021 MulticoreWare, Inc
+*
+ * Authors: Ashok Kumar Mishra <ashok@multicorewareinc.com>
+ *
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+#include "common.h"
+#include "temporalfilter.h"
+#include "primitives.h"
+
+#include "frame.h"
+#include "slice.h"
+#include "framedata.h"
+#include "analysis.h"
+
+using namespace X265_NS;
+
+void OrigPicBuffer::addPicture(Frame* inFrame)
+{
+    m_mcstfPicList.pushFrontMCSTF(*inFrame);
+}
+
+void OrigPicBuffer::addEncPicture(Frame* inFrame)
+{
+    m_mcstfOrigPicFreeList.pushFrontMCSTF(*inFrame);
+}
+
+void OrigPicBuffer::addEncPictureToPicList(Frame* inFrame)
+{
+    m_mcstfOrigPicList.pushFrontMCSTF(*inFrame);
+}
+
+OrigPicBuffer::~OrigPicBuffer()
+{
+    while (!m_mcstfOrigPicList.empty())
+    {
+        Frame* curFrame = m_mcstfOrigPicList.popBackMCSTF();
+        curFrame->destroy();
+        delete curFrame;
+    }
+
+    while (!m_mcstfOrigPicFreeList.empty())
+    {
+        Frame* curFrame = m_mcstfOrigPicFreeList.popBackMCSTF();
+        curFrame->destroy();
+        delete curFrame;
+    }
+}
+
+void OrigPicBuffer::setOrigPicList(Frame* inFrame, int frameCnt)
+{
+    Slice* slice = inFrame->m_encData->m_slice;
+    uint8_t j = 0;
+    for (int iterPOC = (inFrame->m_poc - inFrame->m_mcstf->m_range);
+        iterPOC <= (inFrame->m_poc + inFrame->m_mcstf->m_range); iterPOC++)
+    {
+        if (iterPOC != inFrame->m_poc)
+        {
+            if (iterPOC < 0)
+                continue;
+            if (iterPOC >= frameCnt)
+                break;
+
+            Frame *iterFrame = m_mcstfPicList.getPOCMCSTF(iterPOC);
+            X265_CHECK(iterFrame, "Reference frame not found in OPB");
+            if (iterFrame != NULL)
+            {
+                slice->m_mcstfRefFrameList[1][j] = iterFrame;
+                iterFrame->m_refPicCnt[1]--;
+            }
+
+            iterFrame = m_mcstfOrigPicList.getPOCMCSTF(iterPOC);
+            if (iterFrame != NULL)
+            {
+
+                slice->m_mcstfRefFrameList[1][j] = iterFrame;
+
+                iterFrame->m_refPicCnt[1]--;
+                Frame *cFrame = m_mcstfOrigPicList.getPOCMCSTF(inFrame->m_poc);
+                X265_CHECK(cFrame, "Reference frame not found in encoded OPB");
+                cFrame->m_refPicCnt[1]--;
+            }
+            j++;
+        }
+    }
+}
+
+void OrigPicBuffer::recycleOrigPicList()
+{
+    Frame *iterFrame = m_mcstfPicList.first();
+
+    while (iterFrame)
+    {
+        Frame *curFrame = iterFrame;
+        iterFrame = iterFrame->m_nextMCSTF;
+        if (!curFrame->m_refPicCnt[1])
+        {
+            m_mcstfPicList.removeMCSTF(*curFrame);
+            iterFrame = m_mcstfPicList.first();
+        }
+    }
+
+    iterFrame = m_mcstfOrigPicList.first();
+
+    while (iterFrame)
+    {
+        Frame *curFrame = iterFrame;
+        iterFrame = iterFrame->m_nextMCSTF;
+        if (!curFrame->m_refPicCnt[1])
+        {
+            m_mcstfOrigPicList.removeMCSTF(*curFrame);
+            *curFrame->m_isSubSampled = false;
+            m_mcstfOrigPicFreeList.pushFrontMCSTF(*curFrame);
+            iterFrame = m_mcstfOrigPicList.first();
+        }
+    }
+}
+
+void OrigPicBuffer::addPictureToFreelist(Frame* inFrame)
+{
+    m_mcstfOrigPicFreeList.pushBack(*inFrame);
+}
+
+TemporalFilter::TemporalFilter()
+{
+    m_sourceWidth = 0;
+    m_sourceHeight = 0;
+    m_QP = 0;
+    m_sliceTypeConfig = 3;
+    m_numRef = 0;
+    m_useSADinME = 1;
+
+    m_range = 2;
+    m_chromaFactor = 0.55;
+    m_sigmaMultiplier = 9.0;
+    m_sigmaZeroPoint = 10.0;
+    m_motionVectorFactor = 16;
+}
+
+void TemporalFilter::init(const x265_param* param)
+{
+    m_param = param;
+    m_bitDepth = param->internalBitDepth;
+    m_sourceWidth = param->sourceWidth;
+    m_sourceHeight = param->sourceHeight;
+    m_internalCsp = param->internalCsp;
+    m_numComponents = (m_internalCsp != X265_CSP_I400) ? MAX_NUM_COMPONENT : 1;
+
+    m_metld = new MotionEstimatorTLD;
+
+    predPUYuv.create(FENC_STRIDE, X265_CSP_I400);
+}
+
+int TemporalFilter::createRefPicInfo(TemporalFilterRefPicInfo* refFrame, x265_param* param)
+{
+    CHECKED_MALLOC_ZERO(refFrame->mvs, MV, sizeof(MV)* ((m_sourceWidth ) / 4) * ((m_sourceHeight ) / 4));
+    refFrame->mvsStride = m_sourceWidth / 4;
+    CHECKED_MALLOC_ZERO(refFrame->mvs0, MV, sizeof(MV)* ((m_sourceWidth ) / 16) * ((m_sourceHeight ) / 16));
+    refFrame->mvsStride0 = m_sourceWidth / 16;
+    CHECKED_MALLOC_ZERO(refFrame->mvs1, MV, sizeof(MV)* ((m_sourceWidth ) / 16) * ((m_sourceHeight ) / 16));
+    refFrame->mvsStride1 = m_sourceWidth / 16;
+    CHECKED_MALLOC_ZERO(refFrame->mvs2, MV, sizeof(MV)* ((m_sourceWidth ) / 16)*((m_sourceHeight ) / 16));
+    refFrame->mvsStride2 = m_sourceWidth / 16;
+
+    CHECKED_MALLOC_ZERO(refFrame->noise, int, sizeof(int) * ((m_sourceWidth) / 4) * ((m_sourceHeight) / 4));
+    CHECKED_MALLOC_ZERO(refFrame->error, int, sizeof(int) * ((m_sourceWidth) / 4) * ((m_sourceHeight) / 4));
+
+    refFrame->slicetype = X265_TYPE_AUTO;
+
+    refFrame->compensatedPic = new PicYuv;
+    refFrame->compensatedPic->create(param, true);
+
+    return 1;
+fail:
+    return 0;
+}
+
+int TemporalFilter::motionErrorLumaSAD(
+    PicYuv *orig,
+    PicYuv *buffer,
+    int x,
+    int y,
+    int dx,

x265_3.5.tar.gz/source/common/temporalfilter.h Added

@@ -0,0 +1,185 @@
+/*****************************************************************************
+* Copyright (C) 2013-2021 MulticoreWare, Inc
+*
+ * Authors: Ashok Kumar Mishra <ashok@multicorewareinc.com>
+ *
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_TEMPORAL_FILTER_H
+#define X265_TEMPORAL_FILTER_H
+
+#include "x265.h"
+#include "picyuv.h"
+#include "mv.h"
+#include "piclist.h"
+#include "yuv.h"
+#include "motion.h"
+
+const int s_interpolationFilter[16][8] =
+{
+    {   0,   0,   0,  64,   0,   0,   0,   0 },   //0
+    {   0,   1,  -3,  64,   4,  -2,   0,   0 },   //1 -->-->
+    {   0,   1,  -6,  62,   9,  -3,   1,   0 },   //2 -->
+    {   0,   2,  -8,  60,  14,  -5,   1,   0 },   //3 -->-->
+    {   0,   2,  -9,  57,  19,  -7,   2,   0 },   //4
+    {   0,   3, -10,  53,  24,  -8,   2,   0 },   //5 -->-->
+    {   0,   3, -11,  50,  29,  -9,   2,   0 },   //6 -->
+    {   0,   3, -11,  44,  35, -10,   3,   0 },   //7 -->-->
+    {   0,   1,  -7,  38,  38,  -7,   1,   0 },   //8
+    {   0,   3, -10,  35,  44, -11,   3,   0 },   //9 -->-->
+    {   0,   2,  -9,  29,  50, -11,   3,   0 },   //10-->
+    {   0,   2,  -8,  24,  53, -10,   3,   0 },   //11-->-->
+    {   0,   2,  -7,  19,  57,  -9,   2,   0 },   //12
+    {   0,   1,  -5,  14,  60,  -8,   2,   0 },   //13-->-->
+    {   0,   1,  -3,   9,  62,  -6,   1,   0 },   //14-->
+    {   0,   0,  -2,   4,  64,  -3,   1,   0 }    //15-->-->
+};
+
+const double s_refStrengths[3][4] =
+{ // abs(POC offset)
+  //  1,    2     3     4
+  {0.85, 0.57, 0.41, 0.33},  // m_range * 2
+  {1.13, 0.97, 0.81, 0.57},  // m_range
+  {0.30, 0.30, 0.30, 0.30}   // otherwise
+};
+
+namespace X265_NS {
+    class OrigPicBuffer
+    {
+    public:
+        PicList    m_mcstfPicList;
+        PicList    m_mcstfOrigPicFreeList;
+        PicList    m_mcstfOrigPicList;
+
+        ~OrigPicBuffer();
+        void addPicture(Frame*);
+        void addEncPicture(Frame*);
+        void setOrigPicList(Frame*, int);
+        void recycleOrigPicList();
+        void addPictureToFreelist(Frame*);
+        void addEncPictureToPicList(Frame*);
+    };
+
+    struct MotionEstimatorTLD
+    {
+        MotionEstimate  me;
+
+        MotionEstimatorTLD()
+        {
+            me.init(X265_CSP_I400);
+            me.setQP(X265_LOOKAHEAD_QP);
+        }
+
+        ~MotionEstimatorTLD() {}
+    };
+
+    struct TemporalFilterRefPicInfo
+    {
+        PicYuv*    picBuffer;
+        PicYuv*    picBufferSubSampled2;
+        PicYuv*    picBufferSubSampled4;
+        MV*        mvs;
+        MV*        mvs0;
+        MV*        mvs1;
+        MV*        mvs2;
+        uint32_t   mvsStride;
+        uint32_t   mvsStride0;
+        uint32_t   mvsStride1;
+        uint32_t   mvsStride2;
+        int*       error;
+        int*       noise;
+
+        int16_t    origOffset;
+        bool       isFilteredFrame;
+        PicYuv*    compensatedPic;
+
+        int*       isSubsampled;
+
+        int        slicetype;
+    };
+
+    class TemporalFilter
+    {
+    public:
+        TemporalFilter();
+        ~TemporalFilter() {}
+
+        void init(const x265_param* param);
+
+        //private:
+            // Private static member variables
+        const x265_param *m_param;
+        int32_t  m_bitDepth;
+        int m_range;
+        uint8_t m_numRef;
+        double m_chromaFactor;
+        double m_sigmaMultiplier;
+        double m_sigmaZeroPoint;
+        int m_motionVectorFactor;
+        int m_padding;
+
+        // Private member variables
+
+        int m_sourceWidth;
+        int m_sourceHeight;
+        int m_QP;
+
+        int m_internalCsp;
+        int m_numComponents;
+        uint8_t m_sliceTypeConfig;
+
+        MotionEstimatorTLD* m_metld;
+        Yuv  predPUYuv;
+        int m_useSADinME;
+
+        int createRefPicInfo(TemporalFilterRefPicInfo* refFrame, x265_param* param);
+
+        void bilateralFilter(Frame* frame, TemporalFilterRefPicInfo* mctfRefList, double overallStrength);
+
+        void motionEstimationLuma(MV *mvs, uint32_t mvStride, PicYuv *orig, PicYuv *buffer, int bs,
+            MV *previous = 0, uint32_t prevmvStride = 0, int factor = 1);
+
+        void motionEstimationLumaDoubleRes(MV *mvs, uint32_t mvStride, PicYuv *orig, PicYuv *buffer, int blockSize,
+            MV *previous, uint32_t prevMvStride, int factor, int* minError);
+
+        int motionErrorLumaSSD(PicYuv *orig,
+            PicYuv *buffer,
+            int x,
+            int y,
+            int dx,
+            int dy,
+            int bs,
+            int besterror = 8 * 8 * 1024 * 1024);
+
+        int motionErrorLumaSAD(PicYuv *orig,
+            PicYuv *buffer,
+            int x,
+            int y,
+            int dx,
+            int dy,
+            int bs,
+            int besterror = 8 * 8 * 1024 * 1024);
+
+        void destroyRefPicInfo(TemporalFilterRefPicInfo* curFrame);
+
+        void applyMotion(MV *mvs, uint32_t mvsStride, PicYuv *input, PicYuv *output);
+
+    };
+}
+#endif
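
Each row of the interpolation table in this header holds eight taps for one of 16 fractional positions, and every row sums to 64, so a weighted sum can be normalized with a shift by 6. The sketch below copies the table and checks that property. The helper names `tapSum` and `interpolate` are hypothetical, and the rounding offset of 32 is an assumption, since the code that applies the filter is not part of the visible diff.

```cpp
#include <cassert>

// Table copied from the temporalfilter.h hunk (one row per 1/16-pel phase).
static const int kFilter[16][8] =
{
    { 0, 0,   0, 64,  0,   0, 0, 0 },
    { 0, 1,  -3, 64,  4,  -2, 0, 0 },
    { 0, 1,  -6, 62,  9,  -3, 1, 0 },
    { 0, 2,  -8, 60, 14,  -5, 1, 0 },
    { 0, 2,  -9, 57, 19,  -7, 2, 0 },
    { 0, 3, -10, 53, 24,  -8, 2, 0 },
    { 0, 3, -11, 50, 29,  -9, 2, 0 },
    { 0, 3, -11, 44, 35, -10, 3, 0 },
    { 0, 1,  -7, 38, 38,  -7, 1, 0 },
    { 0, 3, -10, 35, 44, -11, 3, 0 },
    { 0, 2,  -9, 29, 50, -11, 3, 0 },
    { 0, 2,  -8, 24, 53, -10, 3, 0 },
    { 0, 2,  -7, 19, 57,  -9, 2, 0 },
    { 0, 1,  -5, 14, 60,  -8, 2, 0 },
    { 0, 1,  -3,  9, 62,  -6, 1, 0 },
    { 0, 0,  -2,  4, 64,  -3, 1, 0 }
};

// Sum of the eight taps for one fractional phase.
int tapSum(int phase)
{
    assert(phase >= 0 && phase < 16);
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += kFilter[phase][i];
    return sum;
}

// Assumed application: weighted sum, rounding offset 32, then divide by 64.
int interpolate(const int samples[8], int phase)
{
    assert(phase >= 0 && phase < 16);
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc += kFilter[phase][i] * samples[i];
    return (acc + 32) >> 6;
}
```

With unit gain in every row, a flat region of the picture passes through the filter unchanged at any phase, which is the property a motion-compensated filter needs to avoid brightness shifts.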

x265_3.5.tar.gz/source/common/threading.h Changed

@@ -3,6 +3,7 @@
  *
  * Authors: Steve Borho <steve@borho.org>
  *          Min Chen <chenm003@163.com>
+            liwei <liwei@multicorewareinc.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -253,6 +254,47 @@
     int                m_val;
 };
 
+class NamedSemaphore
+{
+public:
+    NamedSemaphore() : m_sem(NULL)
+    {
+    }
+
+    ~NamedSemaphore()
+    {
+    }
+
+    bool create(const char* name, const int initcnt, const int maxcnt)
+    {
+        if(!m_sem)
+        {
+            m_sem = CreateSemaphoreA(NULL, initcnt, maxcnt, name);
+        }
+        return m_sem != NULL;
+    }
+
+    bool give(const int32_t cnt)
+    {
+        return ReleaseSemaphore(m_sem, (LONG)cnt, NULL) != FALSE;
+    }
+
+    bool take(const uint32_t time_out = INFINITE)
+    {
+        int32_t rt = WaitForSingleObject(m_sem, time_out);
+        return rt != WAIT_TIMEOUT && rt != WAIT_FAILED;
+    }
+
+    void release()
+    {
+        CloseHandle(m_sem);
+        m_sem = NULL;
+    }
+
+private:
+    HANDLE m_sem;
+};
+
 #else /* POSIX / pthreads */
 
 typedef pthread_t ThreadHandle;
@@ -459,6 +501,282 @@
     int             m_val;
 };
 
+#define TIMEOUT_INFINITE 0xFFFFFFFF
+
+class NamedSemaphore
+{
+public:
+    NamedSemaphore() 
+        : m_sem(NULL)
+#ifndef __APPLE__
+        , m_name(NULL)
+#endif //__APPLE__
+    {
+    }
+
+    ~NamedSemaphore()
+    {
+    }
+
+    bool create(const char* name, const int initcnt, const int maxcnt)
+    {
+        bool ret = false;
+
+        if (initcnt >= maxcnt)
+        {
+            return false;
+        }
+
+#ifdef __APPLE__
+        do
+        {
+            int32_t pshared = name != NULL ? PTHREAD_PROCESS_SHARED : PTHREAD_PROCESS_PRIVATE;
+
+            m_sem = (mac_sem_t *)malloc(sizeof(mac_sem_t));
+            if (!m_sem)
+            {
+                break;
+            }
+
+            if (pthread_mutexattr_init(&m_sem->mutexAttr))
+            {
+                break;
+            }
+
+            if (pthread_mutexattr_setpshared(&m_sem->mutexAttr, pshared))
+            {
+                break;
+            }
+
+            if (pthread_condattr_init(&m_sem->condAttr))
+            {
+                break;
+            }
+
+            if (pthread_condattr_setpshared(&m_sem->condAttr, pshared))
+            {
+                break;
+            }
+
+            if (pthread_mutex_init(&m_sem->mutex, &m_sem->mutexAttr))
+            {
+                break;
+            }
+
+            if (pthread_cond_init(&m_sem->cond, &m_sem->condAttr))
+            {
+                break;
+            }
+
+            m_sem->curCnt = initcnt;
+            m_sem->maxCnt = maxcnt;
+
+            ret = true;
+        } while (0);
+        
+        if (!ret)
+        {
+            release();
+        }
+
+#else  //__APPLE__
+        m_sem = sem_open(name, O_CREAT | O_EXCL, 0666, initcnt);
+        if (m_sem != SEM_FAILED) 
+        {
+            m_name = strdup(name);
+            ret = true;
+        }
+        else 
+        {
+            if (EEXIST == errno) 
+            {
+                m_sem = sem_open(name, 0);
+                if (m_sem != SEM_FAILED) 
+                {
+                    m_name = strdup(name);
+                    ret = true;
+                }
+            }
+        }
+#endif //__APPLE__
+
+        return ret;
+    }
+
+    bool give(const int32_t cnt)
+    {
+        if (!m_sem)
+        {
+            return false;
+        }
+
+#ifdef __APPLE__
+        if (pthread_mutex_lock(&m_sem->mutex))
+        {
+            return false;
+        }
+
+        int oldCnt = m_sem->curCnt;
+        m_sem->curCnt += cnt;
+        if (m_sem->curCnt > m_sem->maxCnt)
+        {
+            m_sem->curCnt = m_sem->maxCnt;
+        }
+
+        bool ret = true;
+        if (!oldCnt)
+        {
+            ret = 0 == pthread_cond_broadcast(&m_sem->cond);
+        }
+
+        if (pthread_mutex_unlock(&m_sem->mutex))
+        {
+            return false;
+        }
+
+        return ret;
+#else //__APPLE__
+        int ret = 0;
+        int32_t curCnt = cnt;
+        while (curCnt-- && !ret) {
+            ret = sem_post(m_sem);
+        }

x265_3.5.tar.gz/source/common/threadpool.cpp Changed

x265_3.5.tar.gz/source/common/version.cpp Changed

x265_3.5.tar.gz/source/common/x86/asm-primitives.cpp Changed

@@ -1091,6 +1091,7 @@
 
         p.frameInitLowres = PFX(frame_init_lowres_core_sse2);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_sse2);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_sse2);
         // TODO: the planecopy_sp is really planecopy_SC now, must be fix it 
         //p.planecopy_sp = PFX(downShift_16_sse2);
         p.planecopy_sp_shl = PFX(upShift_16_sse2);
@@ -1121,6 +1122,7 @@
     {
         ASSIGN2(p.scale1D_128to64, scale1D_128to64_ssse3);
         p.scale2D_64to32 = PFX(scale2D_64to32_ssse3);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_ssse3);
 
         // p.pu[LUMA_4x4].satd = p.cu[BLOCK_4x4].sa8d = PFX(pixel_satd_4x4_ssse3); this one is broken
         ALL_LUMA_PU(satd, pixel_satd, ssse3);
@@ -1462,6 +1464,7 @@
         p.pu[LUMA_64x48].copy_pp = (copy_pp_t)PFX(blockcopy_ss_64x48_avx);
         p.pu[LUMA_64x64].copy_pp = (copy_pp_t)PFX(blockcopy_ss_64x64_avx);
         p.propagateCost = PFX(mbtree_propagate_cost_avx);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_avx);
     }
     if (cpuMask & X265_CPU_XOP)
     {
@@ -1473,6 +1476,7 @@
         LUMA_VAR(xop);
         p.frameInitLowres = PFX(frame_init_lowres_core_xop);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_xop);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_xop);
     }
     if (cpuMask & X265_CPU_AVX2)
     {
@@ -2301,6 +2305,9 @@
 
         p.frameInitLowres = PFX(frame_init_lowres_core_avx2);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_avx2);
+
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_avx2);
+
         p.propagateCost = PFX(mbtree_propagate_cost_avx2);
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
@@ -3300,6 +3307,7 @@
         //p.frameInitLowres = PFX(frame_init_lowres_core_mmx2);
         p.frameInitLowres = PFX(frame_init_lowres_core_sse2);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_sse2);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_sse2);
 
         ALL_LUMA_TU(blockfill_s[NONALIGNED], blockfill_s, sse2);
         ALL_LUMA_TU(blockfill_s[ALIGNED], blockfill_s, sse2);
@@ -3424,6 +3432,8 @@
         ASSIGN2(p.scale1D_128to64, scale1D_128to64_ssse3);
         p.scale2D_64to32 = PFX(scale2D_64to32_ssse3);
 
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_ssse3);
+
         ASSIGN2(p.pu[LUMA_8x4].convert_p2s, filterPixelToShort_8x4_ssse3);
         ASSIGN2(p.pu[LUMA_8x8].convert_p2s, filterPixelToShort_8x8_ssse3);
         ASSIGN2(p.pu[LUMA_8x16].convert_p2s, filterPixelToShort_8x16_ssse3);
@@ -3691,6 +3701,7 @@
         p.frameInitLowres = PFX(frame_init_lowres_core_avx);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_avx);
         p.propagateCost = PFX(mbtree_propagate_cost_avx);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_avx);
     }
     if (cpuMask & X265_CPU_XOP)
     {
@@ -3702,6 +3713,7 @@
         p.cu[BLOCK_16x16].sse_pp = PFX(pixel_ssd_16x16_xop);
         p.frameInitLowres = PFX(frame_init_lowres_core_xop);
         p.frameInitLowerRes = PFX(frame_init_lowres_core_xop);
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_xop);
 
     }
 #if X86_64
@@ -4684,6 +4696,8 @@
         p.saoCuStatsE2 = PFX(saoCuStatsE2_avx2);
         p.saoCuStatsE3 = PFX(saoCuStatsE3_avx2);
 
+        p.frameSubSampleLuma = PFX(frame_subsample_luma_avx2);
+
         if (cpuMask & X265_CPU_BMI2)
         {
             p.scanPosLast = PFX(scanPosLast_avx2_bmi2);

x265_3.5.tar.gz/source/common/x86/mc-a2.asm Changed

@@ -992,6 +992,262 @@
 FRAME_INIT_LOWRES
 %endif
 
+%macro SUBSAMPLEFILT8x4 7
+    mova      %3, [r0+%7]
+    mova      %4, [r0+r2+%7]
+    pavgb     %3, %4
+    pavgb     %4, [r0+r2*2+%7]
+    PALIGNR   %1, %3, 1, m6
+    PALIGNR   %2, %4, 1, m6
+%if cpuflag(xop)
+    pavgb     %1, %3
+    pavgb     %2, %4
+%else
+    pavgb     %1, %3
+    pavgb     %2, %4
+    psrlw     %5, %1, 8
+    psrlw     %6, %2, 8
+    pand      %1, m7
+    pand      %2, m7
+%endif
+%endmacro
+
+%macro SUBSAMPLEFILT32x4U 1
+    movu      m1, [r0+r2]
+    pavgb     m0, m1, [r0]
+    movu      m3, [r0+r2+1]
+    pavgb     m2, m3, [r0+1]
+    pavgb     m1, [r0+r2*2]
+    pavgb     m3, [r0+r2*2+1]
+    pavgb     m0, m2
+    pavgb     m1, m3
+
+    movu      m3, [r0+r2+mmsize]
+    pavgb     m2, m3, [r0+mmsize]
+    movu      m5, [r0+r2+1+mmsize]
+    pavgb     m4, m5, [r0+1+mmsize]
+    pavgb     m2, m4
+
+    pshufb    m0, m7
+    pshufb    m2, m7
+    punpcklqdq m0, m0, m2
+    vpermq    m0, m0, q3120
+    movu    [%1], m0
+%endmacro
+
+%macro SUBSAMPLEFILT16x2 3
+    mova      m3, [r0+%3+mmsize]
+    mova      m2, [r0+%3]
+    pavgb     m3, [r0+%3+r2+mmsize]
+    pavgb     m2, [r0+%3+r2]
+    PALIGNR   %1, m3, 1, m6
+    pavgb     %1, m3
+    PALIGNR   m3, m2, 1, m6
+    pavgb     m3, m2
+%if cpuflag(xop)
+    vpperm    m3, m3, %1, m6
+%else
+    pand      m3, m7
+    pand      %1, m7
+    packuswb  m3, %1
+%endif
+    mova    [%2], m3
+    mova      %1, m2
+%endmacro
+
+%macro SUBSAMPLEFILT8x2U 2
+    mova      m2, [r0+%2]
+    pavgb     m2, [r0+%2+r2]
+    mova      m0, [r0+%2+1]
+    pavgb     m0, [r0+%2+r2+1]
+    pavgb     m1, m3
+    pavgb     m0, m2
+    pand      m1, m7
+    pand      m0, m7
+    packuswb  m0, m1
+    mova    [%1], m0
+%endmacro
+
+%macro SUBSAMPLEFILT8xU 2
+    mova      m3, [r0+%2+8]
+    mova      m2, [r0+%2]
+    pavgw     m3, [r0+%2+r2+8]
+    pavgw     m2, [r0+%2+r2]
+    movu      m1, [r0+%2+10]
+    movu      m0, [r0+%2+2]
+    pavgw     m1, [r0+%2+r2+10]
+    pavgw     m0, [r0+%2+r2+2]
+    pavgw     m1, m3
+    pavgw     m0, m2
+    psrld     m3, m1, 16
+    pand      m1, m7
+    pand      m0, m7
+    packssdw  m0, m1
+    movu    [%1], m0
+%endmacro
+
+%macro SUBSAMPLEFILT8xA 3
+    movu      m3, [r0+%3+mmsize]
+    movu      m2, [r0+%3]
+    pavgw     m3, [r0+%3+r2+mmsize]
+    pavgw     m2, [r0+%3+r2]
+    PALIGNR   %1, m3, 2, m6
+    pavgw     %1, m3
+    PALIGNR   m3, m2, 2, m6
+    pavgw     m3, m2
+%if cpuflag(xop)
+    vpperm    m3, m3, %1, m6
+%else
+    pand      m3, m7
+    pand      %1, m7
+    packssdw  m3, %1
+%endif
+%if cpuflag(avx2)
+    vpermq     m3, m3, q3120
+%endif
+    movu    [%2], m3
+    movu      %1, m2
+%endmacro
+
+;-----------------------------------------------------------------------------
+; void frame_subsample_luma( uint8_t *src0, uint8_t *dst0,
+;                              intptr_t src_stride, intptr_t dst_stride, int width, int height )
+;-----------------------------------------------------------------------------
+
+%macro FRAME_SUBSAMPLE_LUMA 0
+cglobal frame_subsample_luma, 6,7,(12-4*(BIT_DEPTH/9)) ; 8 for HIGH_BIT_DEPTH, 12 otherwise
+%if HIGH_BIT_DEPTH
+    shl   dword r3m, 1
+    FIX_STRIDES r2
+    shl   dword r4m, 1
+%endif
+%if mmsize >= 16
+    add   dword r4m, mmsize-1
+    and   dword r4m, ~(mmsize-1)
+%endif
+    ; src += 2*(height-1)*stride + 2*width
+    mov      r6d, r5m
+    dec      r6d
+    imul     r6d, r2d
+    add      r6d, r4m
+    lea       r0, [r0+r6*2]
+    ; dst += (height-1)*stride + width
+    mov      r6d, r5m
+    dec      r6d
+    imul     r6d, r3m
+    add      r6d, r4m
+    add       r1, r6
+    ; gap = stride - width
+    mov      r6d, r3m
+    sub      r6d, r4m
+    PUSH      r6
+    %define dst_gap [rsp+gprsize]
+    mov      r6d, r2d
+    sub      r6d, r4m
+    shl      r6d, 1
+    PUSH      r6
+    %define src_gap [rsp]
+%if HIGH_BIT_DEPTH
+%if cpuflag(xop)
+    mova      m6, [deinterleave_shuf32a]
+    mova      m7, [deinterleave_shuf32b]
+%else
+    pcmpeqw   m7, m7
+    psrld     m7, 16
+%endif
+.vloop:
+    mov      r6d, r4m
+%ifnidn cpuname, mmx2
+    movu      m0, [r0]
+    movu      m1, [r0+r2]
+    pavgw     m0, m1
+    pavgw     m1, [r0+r2*2]
+%endif
+.hloop:
+    sub       r0, mmsize*2
+    sub       r1, mmsize
+%ifidn cpuname, mmx2
+    SUBSAMPLEFILT8xU r1, 0
+%else
+    SUBSAMPLEFILT8xA m0, r1, 0
+%endif
+    sub      r6d, mmsize
+    jg .hloop
+%else ; !HIGH_BIT_DEPTH
+%if cpuflag(avx2)
+    mova      m7, [deinterleave_shuf]
+%elif cpuflag(xop)
+    mova      m6, [deinterleave_shuf32a]
+    mova      m7, [deinterleave_shuf32b]
+%else
+    pcmpeqb   m7, m7
+    psrlw     m7, 8
+%endif
+.vloop:
+    mov      r6d, r4m
+%ifnidn cpuname, mmx2
+%if mmsize <= 16
+    mova      m0, [r0]
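
The `SUBSAMPLEFILT*` macros above are built from `pavgb`/`pavgw`, which compute a rounding average `(a + b + 1) >> 1` per lane. A scalar sketch of the 2x luma subsample they appear to implement is given below for 8-bit pixels. It assumes each output pixel is the nested rounding average of a 2x2 source block (vertical averages first, then a horizontal average), and `subsampleLumaRef` and `avgRound` are hypothetical names, not functions from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Rounding average, the scalar equivalent of one pavgb lane.
static inline uint8_t avgRound(uint8_t a, uint8_t b)
{
    return static_cast<uint8_t>((a + b + 1) >> 1);
}

// Hypothetical scalar reference: width and height are the OUTPUT dimensions,
// so the source must hold at least 2*width x 2*height pixels.
void subsampleLumaRef(const uint8_t* src, uint8_t* dst,
                      ptrdiff_t srcStride, ptrdiff_t dstStride,
                      int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        const uint8_t* row0 = src + 2 * y * srcStride;
        const uint8_t* row1 = row0 + srcStride;
        uint8_t* out = dst + y * dstStride;
        for (int x = 0; x < width; x++)
        {
            // Average each column of the 2x2 block vertically, then combine.
            uint8_t left  = avgRound(row0[2 * x],     row1[2 * x]);
            uint8_t right = avgRound(row0[2 * x + 1], row1[2 * x + 1]);
            out[x] = avgRound(left, right);
        }
    }
}
```

Because the rounding is applied twice, the result can differ by one from a single `(a + b + c + d + 2) >> 2` average; a scalar reference has to nest the averages the same way to be bit-exact with SIMD code built on `pavgb`.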

x265_3.5.tar.gz/source/common/x86/mc.h Changed

x265_3.5.tar.gz/source/encoder/analysis.cpp Changed

x265_3.5.tar.gz/source/encoder/api.cpp Changed

x265_3.5.tar.gz/source/encoder/dpb.cpp Changed

x265_3.5.tar.gz/source/encoder/encoder.cpp Changed

@@ -72,7 +72,40 @@
 {
     { 1, 1, 1, 1, 1, 5, 1,  2, 2, 2, 50 },
     { 1, 1, 1, 1, 1, 5, 0, 16, 9, 9, 81 },
-    { 1, 1, 1, 1, 1, 5, 0,  1, 1, 1, 82 }
+    { 1, 1, 1, 1, 1, 5, 0,  1, 1, 1, 82 },
+    { 1, 1, 1, 1, 1, 5, 0, 18, 9, 9, 84 }
+};
+
+typedef struct
+{
+    int bEnableVideoSignalTypePresentFlag;
+    int bEnableColorDescriptionPresentFlag;
+    int bEnableChromaLocInfoPresentFlag;
+    int colorPrimaries;
+    int transferCharacteristics;
+    int matrixCoeffs;
+    int bEnableVideoFullRangeFlag;
+    int chromaSampleLocTypeTopField;
+    int chromaSampleLocTypeBottomField;
+    const char* systemId;
+}VideoSignalTypePresets;
+
+VideoSignalTypePresets vstPresets[] =
+{
+    {1, 1, 1, 6, 6, 6, 0, 0, 0, "BT601_525"},
+    {1, 1, 1, 5, 6, 5, 0, 0, 0, "BT601_626"},
+    {1, 1, 1, 1, 1, 1, 0, 0, 0, "BT709_YCC"},
+    {1, 1, 0, 1, 1, 0, 0, 0, 0, "BT709_RGB"},
+    {1, 1, 1, 9, 14, 1, 0, 2, 2, "BT2020_YCC_NCL"},
+    {1, 1, 0, 9, 16, 9, 0, 0, 0, "BT2020_RGB"},
+    {1, 1, 1, 9, 16, 9, 0, 2, 2, "BT2100_PQ_YCC"},
+    {1, 1, 1, 9, 16, 14, 0, 2, 2, "BT2100_PQ_ICTCP"},
+    {1, 1, 0, 9, 16, 0, 0, 0, 0, "BT2100_PQ_RGB"},
+    {1, 1, 1, 9, 18, 9, 0, 2, 2, "BT2100_HLG_YCC"},
+    {1, 1, 0, 9, 18, 0, 0, 0, 0, "BT2100_HLG_RGB"},
+    {1, 1, 0, 1, 1, 0, 1, 0, 0, "FR709_RGB"},
+    {1, 1, 0, 9, 14, 0, 1, 0, 0, "FR2020_RGB"},
+    {1, 1, 1, 12, 1, 6, 1, 1, 1, "FRP3D65_YCC"}
 };
 }
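
The table added in this hunk maps a `systemId` string to a set of VUI colour fields. How `configureVideoSignalTypePreset()` consumes it is not visible in this diff, so the following is only a sketch of a name-based lookup over such a table. `SignalPreset`, `kPresets` and `findPreset` are hypothetical names, and the struct keeps just three of the fields; the values are copied from the corresponding rows above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, reduced version of the preset record from the hunk above.
struct SignalPreset
{
    int colorPrimaries;
    int transferCharacteristics;
    int matrixCoeffs;
    const char* systemId;
};

// A few rows copied from the table in the diff.
static const SignalPreset kPresets[] =
{
    { 1,  1, 1, "BT709_YCC" },
    { 9, 16, 9, "BT2100_PQ_YCC" },
    { 9, 18, 9, "BT2100_HLG_YCC" },
};

// Linear search by system identifier; returns nullptr when nothing matches.
const SignalPreset* findPreset(const char* systemId)
{
    if (!systemId)
        return nullptr;
    for (size_t i = 0; i < sizeof(kPresets) / sizeof(kPresets[0]); i++)
    {
        if (std::strcmp(kPresets[i].systemId, systemId) == 0)
            return &kPresets[i];
    }
    return nullptr;
}
```

A linear scan is adequate for a table of this size; the caller still has to decide what to do when the identifier is unknown.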
 
@@ -109,6 +142,7 @@
     m_threadPool = NULL;
     m_analysisFileIn = NULL;
     m_analysisFileOut = NULL;
+    m_filmGrainIn = NULL;
     m_naluFile = NULL;
     m_offsetEmergency = NULL;
     m_iFrameNum = 0;
@@ -134,12 +168,8 @@
     m_prevTonemapPayload.payload = NULL;
     m_startPoint = 0;
     m_saveCTUSize = 0;
-    m_edgePic = NULL;
-    m_edgeHistThreshold = 0;
-    m_chromaHistThreshold = 0.0;
-    m_scaledEdgeThreshold = 0.0;
-    m_scaledChromaThreshold = 0.0;
     m_zoneIndex = 0;
+    m_origPicBuffer = 0;
 }
 
 inline char *strcatFilename(const char *input, const char *suffix)
@@ -216,34 +246,6 @@
         }
     }
 
-    if (m_param->bHistBasedSceneCut)
-    {
-        m_planeSizes[0] = (m_param->sourceWidth >> x265_cli_csps[p->internalCsp].width[0]) * (m_param->sourceHeight >> x265_cli_csps[m_param->internalCsp].height[0]);
-        uint32_t pixelbytes = m_param->internalBitDepth > 8 ? 2 : 1;
-        m_edgePic = X265_MALLOC(pixel, m_planeSizes0 * pixelbytes);
-        m_edgeHistThreshold = m_param->edgeTransitionThreshold;
-        m_chromaHistThreshold = x265_min(m_edgeHistThreshold * 10.0, MAX_SCENECUT_THRESHOLD);
-        m_scaledEdgeThreshold = x265_min(m_edgeHistThreshold * SCENECUT_STRENGTH_FACTOR, MAX_SCENECUT_THRESHOLD);
-        m_scaledChromaThreshold = x265_min(m_chromaHistThreshold * SCENECUT_STRENGTH_FACTOR, MAX_SCENECUT_THRESHOLD);
-        if (m_param->sourceBitDepth != m_param->internalBitDepth)
-        {
-            int size = m_param->sourceWidth * m_param->sourceHeight;
-            int hshift = CHROMA_H_SHIFT(m_param->internalCsp);
-            int vshift = CHROMA_V_SHIFT(m_param->internalCsp);
-            int widthC = m_param->sourceWidth >> hshift;
-            int heightC = m_param->sourceHeight >> vshift;
-
-            m_inputPic[0] = X265_MALLOC(pixel, size);
-            if (m_param->internalCsp != X265_CSP_I400)
-            {
-                for (int j = 1; j < 3; j++)
-                {
-                    m_inputPic[j] = X265_MALLOC(pixel, widthC * heightC);
-                }
-            }
-        }
-    }
-
     // Do not allow WPP if only one row or fewer than 3 columns, it is pointless and unstable
     if (rows == 1 || cols < 3)
     {
@@ -357,6 +359,10 @@
             lookAheadThreadPool[i].start();
     m_lookahead->m_numPools = pools;
     m_dpb = new DPB(m_param);
+
+    if (m_param->bEnableTemporalFilter)
+        m_origPicBuffer = new OrigPicBuffer();
+
     m_rateControl = new RateControl(*m_param, this);
     if (!m_param->bResetZoneConfig)
     {
@@ -518,6 +524,15 @@
             }
         }
     }
+    if (m_param->filmGrain)
+    {
+        m_filmGrainIn = x265_fopen(m_param->filmGrain, "rb");
+        if (!m_filmGrainIn)
+        {
+            x265_log_file(NULL, X265_LOG_ERROR, "Failed to open film grain characteristics binary file %s\n", m_param->filmGrain);
+        }
+    }
+
     m_bZeroLatency = !m_param->bframes && !m_param->lookaheadDepth && m_param->frameNumThreads == 1 && m_param->maxSlices == 1;
     m_aborted |= parseLambdaFile(m_param);
 
@@ -879,26 +894,6 @@
         }
     }
 
-    if (m_param->bHistBasedSceneCut)
-    {
-        if (m_edgePic != NULL)
-        {
-            X265_FREE_ZERO(m_edgePic);
-        }
-
-        if (m_param->sourceBitDepth != m_param->internalBitDepth)
-        {
-            X265_FREE_ZERO(m_inputPic[0]);
-            if (m_param->internalCsp != X265_CSP_I400)
-            {
-                for (int i = 1; i < 3; i++)
-                {
-                    X265_FREE_ZERO(m_inputPic[i]);
-                }
-            }
-        }
-    }
-
     for (int i = 0; i < m_param->frameNumThreads; i++)
     {
         if (m_frameEncoder[i])
@@ -924,6 +919,10 @@
         delete zoneReadCount;
         delete zoneWriteCount;
     }
+
+    if (m_param->bEnableTemporalFilter)
+        delete m_origPicBuffer;
+
     if (m_rateControl)
     {
         m_rateControl->destroy();
@@ -963,6 +962,8 @@
      }
     if (m_naluFile)
         fclose(m_naluFile);
+    if (m_filmGrainIn)
+        x265_fclose(m_filmGrainIn);
 
 #ifdef SVT_HEVC
     X265_FREE(m_svtAppData);
@@ -974,6 +975,7 @@
         /* release string arguments that were strdup'd */
         free((char*)m_param->rc.lambdaFileName);
         free((char*)m_param->rc.statFileName);
+        free((char*)m_param->rc.sharedMemName);
         free((char*)m_param->analysisReuseFileName);
         free((char*)m_param->scalingLists);
         free((char*)m_param->csvfn);
@@ -982,6 +984,7 @@
         free((char*)m_param->toneMapFile);
         free((char*)m_param->analysisSave);
         free((char*)m_param->analysisLoad);
+        free((char*)m_param->videoSignalTypePreset);
         PARAM_NS::x265_param_free(m_param);
     }
 }
@@ -1358,215 +1361,90 @@
     dest->planes[2] = (char*)dest->planes[1] + src->stride[1] * (src->height >> x265_cli_csps[src->colorSpace].height[1]);
 }
 
-bool Encoder::computeHistograms(x265_picture *pic)
+bool Encoder::isFilterThisframe(uint8_t sliceTypeConfig, int curSliceType)
 {
-    pixel *src = NULL, *planeV = NULL, *planeU = NULL;
-    uint32_t widthC, heightC;
-    int hshift, vshift;
-

x265_3.5.tar.gz/source/encoder/encoder.h Changed

@@ -32,6 +32,7 @@
 #include "nal.h"
 #include "framedata.h"
 #include "svt.h"
+#include "temporalfilter.h"
 #ifdef ENABLE_HDR10_PLUS
     #include "dynamicHDR10/hdr10plus.h"
 #endif
@@ -256,19 +257,6 @@
     int                m_bToneMap; // Enables tone-mapping
     int                m_enableNal;
 
-    /* For histogram based scene-cut detection */
-    pixel*             m_edgePic;
-    pixel*             m_inputPic[3];
-    int32_t            m_curYUVHist[3][HISTOGRAM_BINS];
-    int32_t            m_prevYUVHist[3][HISTOGRAM_BINS];
-    int32_t            m_curEdgeHist[2];
-    int32_t            m_prevEdgeHist[2];
-    uint32_t           m_planeSizes[3];
-    double             m_edgeHistThreshold;
-    double             m_chromaHistThreshold;
-    double             m_scaledEdgeThreshold;
-    double             m_scaledChromaThreshold;
-
 #ifdef ENABLE_HDR10_PLUS
     const hdr10plus_api     *m_hdr10plus_api;
     uint8_t                 **m_cim;
@@ -295,6 +283,9 @@
 
     ThreadSafeInteger* zoneReadCount;
     ThreadSafeInteger* zoneWriteCount;
+    /* Film grain model file */
+    FILE* m_filmGrainIn;
+    OrigPicBuffer*          m_origPicBuffer;
 
     Encoder();
     ~Encoder()
@@ -327,6 +318,8 @@
 
     void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);
 
+    void getEndNalUnits(NALList& list, Bitstream& bs);
+
     void fetchStats(x265_stats* stats, size_t statsSizeBytes);
 
     void printSummary();
@@ -373,11 +366,6 @@
 
     void copyPicture(x265_picture *dest, const x265_picture *src);
 
-    bool computeHistograms(x265_picture *pic);
-    void computeHistogramSAD(double *maxUVNormalizedSAD, double *edgeNormalizedSAD, int curPoc);
-    double normalizeRange(int32_t value, int32_t minValue, int32_t maxValue, double rangeStart, double rangeEnd);
-    void findSceneCuts(x265_picture *pic, bool& bDup, double m_maxUVSADVal, double m_edgeSADVal, bool& isMaxThres, bool& isHardSC);
-
     void initRefIdx();
     void analyseRefIdx(int *numRefIdx);
     void updateRefIdx();
@@ -387,6 +375,11 @@
 
     void configureDolbyVisionParams(x265_param* p);
 
+    void configureVideoSignalTypePreset(x265_param* p);
+
+    bool isFilterThisframe(uint8_t sliceTypeConfig, int curSliceType);
+    bool generateMcstfRef(Frame* frameEnc, FrameEncoder* currEncoder);
+
 protected:
 
     void initVPS(VPS *vps);

x265_3.5.tar.gz/source/encoder/frameencoder.cpp Changed

@@ -34,6 +34,7 @@
 #include "common.h"
 #include "slicetype.h"
 #include "nal.h"
+#include "temporalfilter.h"
 
 namespace X265_NS {
 void weightAnalyse(Slice& slice, Frame& frame, x265_param& param);
@@ -101,6 +102,16 @@
         delete m_rce.picTimingSEI;
         delete m_rce.hrdTiming;
     }
+
+    if (m_param->bEnableTemporalFilter)
+    {
+        delete m_frameEncTF->m_metld;
+
+        for (int i = 0; i < (m_frameEncTF->m_range << 1); i++)
+            m_frameEncTF->destroyRefPicInfo(&m_mcstfRefList[i]);
+
+        delete m_frameEncTF;
+    }
 }
 
 bool FrameEncoder::init(Encoder *top, int numRows, int numCols)
@@ -195,6 +206,16 @@
         m_sliceAddrBits = (uint16_t)(tmp + 1);
     }
 
+    if (m_param->bEnableTemporalFilter)
+    {
+        m_frameEncTF = new TemporalFilter();
+        if (m_frameEncTF)
+            m_frameEncTF->init(m_param);
+
+        for (int i = 0; i < (m_frameEncTF->m_range << 1); i++)
+            ok &= !!m_frameEncTF->createRefPicInfo(&m_mcstfRefList[i], m_param);
+    }
+
     return ok;
 }
 
@@ -450,7 +471,7 @@
     m_ssimCnt = 0;
     memset(&(m_frame->m_encData->m_frameStats), 0, sizeof(m_frame->m_encData->m_frameStats));
 
-    if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode != X265_AQ_EDGE && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
+    if (m_param->rc.aqMode != X265_AQ_EDGE && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
     {
         int height = m_frame->m_fencPic->m_picHeight;
         int width = m_frame->m_fencPic->m_picWidth;
@@ -467,6 +488,12 @@
      * unit) */
     Slice* slice = m_frame->m_encData->m_slice;
 
+    if (m_param->bEnableEndOfSequence && m_frame->m_lowres.sliceType == X265_TYPE_IDR && m_frame->m_poc)
+    {
+        m_bs.resetBits();
+        m_nalList.serialize(NAL_UNIT_EOS, m_bs);
+    }
+
     if (m_param->bEnableAccessUnitDelimiters && (m_frame->m_poc || m_param->bRepeatHeaders))
     {
         m_bs.resetBits();
@@ -573,6 +600,12 @@
     int qp = m_top->m_rateControl->rateControlStart(m_frame, &m_rce, m_top);
     m_rce.newQp = qp;
 
+    if (m_param->bEnableTemporalFilter)
+    {
+        m_frameEncTF->m_QP = qp;
+        m_frameEncTF->bilateralFilter(m_frame, m_mcstfRefList, m_param->temporalFilterStrength);
+    }
+
     if (m_nr)
     {
         if (qp > QP_MAX_SPEC && m_frame->m_param->rc.vbvBufferSize)
@@ -756,7 +789,14 @@
         m_seiAlternativeTC.m_preferredTransferCharacteristics = m_param->preferredTransferCharacteristics;
         m_seiAlternativeTC.writeSEImessages(m_bs, *slice->m_sps, NAL_UNIT_PREFIX_SEI, m_nalList, m_param->bSingleSeiNal);
     }
-
+    /* Write Film grain characteristics if present */
+    if (this->m_top->m_filmGrainIn)
+    {
+        FilmGrainCharacteristics m_filmGrain;
+        /* Read the Film grain model file */
+        readModel(&m_filmGrain, this->m_top->m_filmGrainIn);
+        m_filmGrain.writeSEImessages(m_bs, *slice->m_sps, NAL_UNIT_PREFIX_SEI, m_nalList, m_param->bSingleSeiNal);
+    }
     /* Write user SEI */
     for (int i = 0; i < m_frame->m_userSEI.numPayloads; i++)
     {
@@ -933,6 +973,23 @@
     if (m_param->bDynamicRefine && m_top->m_startPoint <= m_frame->m_encodeOrder) //Avoid collecting data that will not be used by future frames.
         collectDynDataFrame();
 
+    if (m_param->bEnableTemporalFilter && m_top->isFilterThisframe(m_frame->m_mcstf->m_sliceTypeConfig, m_frame->m_lowres.sliceType))
+    {
+        //Reset the MCSTF context in Frame Encoder and Frame
+        for (int i = 0; i < (m_frameEncTF->m_range << 1); i++)
+        {
+            memset(m_mcstfRefList[i].mvs0, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16)));
+            memset(m_mcstfRefList[i].mvs1, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16)));
+            memset(m_mcstfRefList[i].mvs2, 0, sizeof(MV) * ((m_param->sourceWidth / 16) * (m_param->sourceHeight / 16)));
+            memset(m_mcstfRefList[i].mvs,  0, sizeof(MV) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4)));
+            memset(m_mcstfRefList[i].noise, 0, sizeof(int) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4)));
+            memset(m_mcstfRefList[i].error, 0, sizeof(int) * ((m_param->sourceWidth / 4) * (m_param->sourceHeight / 4)));
+
+            m_frame->m_mcstf->m_numRef = 0;
+        }
+    }
+
+
     if (m_param->rc.bStatWrite)
     {
         int totalI = 0, totalP = 0, totalSkip = 0;
@@ -2127,6 +2184,54 @@
         m_nr->nrOffsetDenoise[cat][0] = 0;
     }
 }
+
+void FrameEncoder::readModel(FilmGrainCharacteristics* m_filmGrain, FILE* filmgrain)
+{
+    char const* errorMessage = "Error reading FilmGrain characteristics\n";
+    FilmGrain m_fg;
+    x265_fread((char* )&m_fg, sizeof(bool) * 3 + sizeof(uint8_t), 1, filmgrain, errorMessage);
+    m_filmGrain->m_filmGrainCharacteristicsCancelFlag = m_fg.m_filmGrainCharacteristicsCancelFlag;
+    m_filmGrain->m_filmGrainCharacteristicsPersistenceFlag = m_fg.m_filmGrainCharacteristicsPersistenceFlag;
+    m_filmGrain->m_filmGrainModelId = m_fg.m_filmGrainModelId;
+    m_filmGrain->m_separateColourDescriptionPresentFlag = m_fg.m_separateColourDescriptionPresentFlag;
+    if (m_filmGrain->m_separateColourDescriptionPresentFlag)
+    {
+        ColourDescription m_clr;
+        x265_fread((char* )&m_clr, sizeof(bool) + sizeof(uint8_t) * 5, 1, filmgrain, errorMessage);
+        m_filmGrain->m_filmGrainBitDepthLumaMinus8 = m_clr.m_filmGrainBitDepthLumaMinus8;
+        m_filmGrain->m_filmGrainBitDepthChromaMinus8 = m_clr.m_filmGrainBitDepthChromaMinus8;
+        m_filmGrain->m_filmGrainFullRangeFlag = m_clr.m_filmGrainFullRangeFlag;
+        m_filmGrain->m_filmGrainColourPrimaries = m_clr.m_filmGrainColourPrimaries;
+        m_filmGrain->m_filmGrainTransferCharacteristics = m_clr.m_filmGrainTransferCharacteristics;
+        m_filmGrain->m_filmGrainMatrixCoeffs = m_clr.m_filmGrainMatrixCoeffs;
+    }
+    FGPresent m_present;
+    x265_fread((char* )&m_present, sizeof(bool) * 3 + sizeof(uint8_t) * 2, 1, filmgrain, errorMessage);
+    m_filmGrain->m_blendingModeId = m_present.m_blendingModeId;
+    m_filmGrain->m_log2ScaleFactor = m_present.m_log2ScaleFactor;
+    m_filmGrain->m_compModel[0].bPresentFlag = m_present.m_presentFlag[0];
+    m_filmGrain->m_compModel[1].bPresentFlag = m_present.m_presentFlag[1];
+    m_filmGrain->m_compModel[2].bPresentFlag = m_present.m_presentFlag[2];
+    for (int i = 0; i < MAX_NUM_COMPONENT; i++)
+    {
+        if (m_filmGrain->m_compModel[i].bPresentFlag)
+        {
+            x265_fread((char* )(&m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1), sizeof(uint8_t), 1, filmgrain, errorMessage);
+            x265_fread((char* )(&m_filmGrain->m_compModel[i].numModelValues), sizeof(uint8_t), 1, filmgrain, errorMessage);
+            m_filmGrain->m_compModel[i].intensityValues = (FilmGrainCharacteristics::CompModelIntensityValues* ) malloc(sizeof(FilmGrainCharacteristics::CompModelIntensityValues) * (m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1+1)) ;
+            for (int j = 0; j <= m_filmGrain->m_compModel[i].m_filmGrainNumIntensityIntervalMinus1; j++)
+            {
+                x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].intensityIntervalLowerBound), sizeof(uint8_t), 1, filmgrain, errorMessage);
+                x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].intensityIntervalUpperBound), sizeof(uint8_t), 1, filmgrain, errorMessage);
+                m_filmGrain->m_compModel[i].intensityValues[j].compModelValue = (int* ) malloc(sizeof(int) * (m_filmGrain->m_compModel[i].numModelValues));
+                for (int k = 0; k < m_filmGrain->m_compModel[i].numModelValues; k++)
+                {
+                    x265_fread((char* )(&m_filmGrain->m_compModel[i].intensityValues[j].compModelValue[k]), sizeof(int), 1, filmgrain, errorMessage);
+                }
+            }
+        }
+    }
+}
 #if ENABLE_LIBVMAF
 void FrameEncoder::vmafFrameLevelScore()
 {
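
`readModel()` above fills the first members of `FilmGrain` with a single read of `sizeof(bool) * 3 + sizeof(uint8_t)` bytes, which relies on the struct layout matching the file layout. A more explicit alternative is to decode the header byte by byte. The sketch below assumes the file starts with four one-byte fields in the order of the struct members (cancel flag, persistence flag, separate colour description flag, model id); that order is inferred from the struct declaration in this diff, and `FilmGrainHeader` and `parseFilmGrainHeader` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoded form of the first four bytes of the model file.
struct FilmGrainHeader
{
    bool    cancelFlag;
    bool    persistenceFlag;
    bool    separateColourDescriptionPresentFlag;
    uint8_t modelId;
};

// Decode the header from a byte buffer without depending on struct padding
// or on sizeof(bool). Returns false when the buffer is too short.
bool parseFilmGrainHeader(const uint8_t* data, size_t size, FilmGrainHeader& out)
{
    if (!data || size < 4)
        return false;
    out.cancelFlag = data[0] != 0;
    out.persistenceFlag = data[1] != 0;
    out.separateColourDescriptionPresentFlag = data[2] != 0;
    out.modelId = data[3];
    return true;
}
```

Decoding field by field also gives a natural place to reject truncated input, which the struct-sized read cannot distinguish from valid data.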

x265_3.5.tar.gz/source/encoder/frameencoder.h Changed

@@ -40,6 +40,7 @@
 #include "ratecontrol.h"
 #include "reference.h"
 #include "nal.h"
+#include "temporalfilter.h"
 
 namespace X265_NS {
 // private x265 namespace
@@ -113,6 +114,34 @@
     }
 };
 
+/*Film grain characteristics*/
+struct FilmGrain
+{
+    bool    m_filmGrainCharacteristicsCancelFlag;
+    bool    m_filmGrainCharacteristicsPersistenceFlag;
+    bool    m_separateColourDescriptionPresentFlag;
+    uint8_t m_filmGrainModelId;
+    uint8_t m_blendingModeId;
+    uint8_t m_log2ScaleFactor;
+};
+
+struct ColourDescription
+{
+    bool        m_filmGrainFullRangeFlag;
+    uint8_t     m_filmGrainBitDepthLumaMinus8;
+    uint8_t     m_filmGrainBitDepthChromaMinus8;
+    uint8_t     m_filmGrainColourPrimaries;
+    uint8_t     m_filmGrainTransferCharacteristics;
+    uint8_t     m_filmGrainMatrixCoeffs;
+};
+
+struct FGPresent
+{
+    uint8_t     m_blendingModeId;
+    uint8_t     m_log2ScaleFactor;
+    bool        m_presentFlag[3];
+};
+
 // Manages the wave-front processing of a single encoding frame
 class FrameEncoder : public WaveFront, public Thread
 {
@@ -205,6 +234,10 @@
     FrameFilter              m_frameFilter;
     NALList                  m_nalList;
 
+    // initialization for mcstf
+    TemporalFilter*          m_frameEncTF;
+    TemporalFilterRefPicInfo m_mcstfRefList[MAX_MCSTF_TEMPORAL_WINDOW_LENGTH];
+
     class WeightAnalysis : public BondedTaskGroup
     {
     public:
@@ -250,6 +283,7 @@
     void collectDynDataFrame();
     void computeAvgTrainingData();
     void collectDynDataRow(CUData& ctu, FrameStats* rowStats);    
+    void readModel(FilmGrainCharacteristics* m_filmGrain, FILE* filmgrain);
 };
 }

x265_3.5.tar.gz/source/encoder/motion.cpp Changed

@@ -190,6 +190,31 @@
     X265_CHECK(!bChromaSATD, "chroma distortion measurements impossible in this code path\n");
 }
 
+/* Called by lookahead, luma only, no use of PicYuv */
+void MotionEstimate::setSourcePU(pixel *fencY, intptr_t stride, intptr_t offset, int pwidth, int pheight, const int method, const int refine)
+{
+    partEnum = partitionFromSizes(pwidth, pheight);
+    X265_CHECK(LUMA_4x4 != partEnum, "4x4 inter partition detected!\n");
+    sad = primitives.pu[partEnum].sad;
+    ads = primitives.pu[partEnum].ads;
+    satd = primitives.pu[partEnum].satd;
+    sad_x3 = primitives.pu[partEnum].sad_x3;
+    sad_x4 = primitives.pu[partEnum].sad_x4;
+
+
+    blockwidth = pwidth;
+    blockOffset = offset;
+    absPartIdx = ctuAddr = -1;
+
+    /* Search params */
+    searchMethod = method;
+    subpelRefine = refine;
+
+    /* copy PU block into cache */
+    primitives.pu[partEnum].copy_pp(fencPUYuv.m_buf[0], FENC_STRIDE, fencY + offset, stride);
+    X265_CHECK(!bChromaSATD, "chroma distortion measurements impossible in this code path\n");
+}
+
 /* Called by Search::predInterSearch() or --pme equivalent, chroma residual might be considered */
 void MotionEstimate::setSourcePU(const Yuv& srcFencYuv, int _ctuAddr, int cuPartIdx, int puPartIdx, int pwidth, int pheight, const int method, const int refine, bool bChroma)
 {

x265_3.5.tar.gz/source/encoder/motion.h Changed

x265_3.5.tar.gz/source/encoder/ratecontrol.cpp Changed

@@ -41,6 +41,10 @@
 #define BR_SHIFT  6
 #define CPB_SHIFT 4
 
+#define SHARED_DATA_ALIGNMENT      4 ///< 4btye, 32bit
+#define CUTREE_SHARED_MEM_NAME     "cutree"
+#define GOP_CNT_CU_TREE            3
+
 using namespace X265_NS;
 
 /* Amortize the partial cost of I frames over the next N frames */
@@ -104,6 +108,37 @@
     return output;
 }
 
+typedef struct CUTreeSharedDataItem
+{
+    uint8_t  *type;
+    uint16_t *stats;
+}CUTreeSharedDataItem;
+
+void static ReadSharedCUTreeData(void *dst, void *src, int32_t size)
+{
+    CUTreeSharedDataItem *statsDst = reinterpret_cast<CUTreeSharedDataItem *>(dst);
+    uint8_t *typeSrc = reinterpret_cast<uint8_t *>(src);
+    *statsDst->type = *typeSrc;
+
+    ///< for memory alignment, the type will take 32bit in the shared memory
+    int32_t offset = (sizeof(*statsDst->type) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1);
+    uint16_t *statsSrc = reinterpret_cast<uint16_t *>(typeSrc + offset);
+    memcpy(statsDst->stats, statsSrc, size - offset);
+}
+
+void static WriteSharedCUTreeData(void *dst, void *src, int32_t size)
+{
+    CUTreeSharedDataItem *statsSrc = reinterpret_cast<CUTreeSharedDataItem *>(src);
+    uint8_t *typeDst = reinterpret_cast<uint8_t *>(dst);
+    *typeDst = *statsSrc->type;
+
+    ///< for memory alignment, the type will take 32bit in the shared memory
+    int32_t offset = (sizeof(*statsSrc->type) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1);
+    uint16_t *statsDst = reinterpret_cast<uint16_t *>(typeDst + offset);
+    memcpy(statsDst, statsSrc->stats, size - offset);
+}
+
+
 inline double qScale2bits(RateControlEntry *rce, double qScale)
 {
     if (qScale < 0.1)
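
`ReadSharedCUTreeData` and `WriteSharedCUTreeData` above store a one-byte slice type followed by the `uint16_t` stats, with the stats starting at the next 4-byte boundary. The offset comes from the usual power-of-two round-up `(n + A - 1) & ~(A - 1)`. A small standalone sketch of that layout follows; `alignUp`, `packItem` and `unpackItem` are hypothetical helpers, not functions from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors SHARED_DATA_ALIGNMENT from the hunk; must be a power of two.
constexpr size_t kAlign = 4;

// Round n up to the next multiple of a (a must be a power of two).
constexpr size_t alignUp(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

// Write the type byte at offset 0 and the stats at the aligned offset.
// Returns the number of bytes the item occupies in dst.
size_t packItem(uint8_t* dst, uint8_t type, const uint16_t* stats, size_t count)
{
    const size_t offset = alignUp(sizeof(type), kAlign);
    dst[0] = type;
    std::memcpy(dst + offset, stats, count * sizeof(uint16_t));
    return offset + count * sizeof(uint16_t);
}

// Inverse of packItem: read the type byte, then the stats after the padding.
void unpackItem(const uint8_t* src, uint8_t& type, uint16_t* stats, size_t count)
{
    const size_t offset = alignUp(sizeof(type), kAlign);
    type = src[0];
    std::memcpy(stats, src + offset, count * sizeof(uint16_t));
}
```

With a 4-byte alignment the type byte is followed by three padding bytes, so an item holding three stats occupies 4 + 3 * 2 = 10 bytes.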
@@ -209,6 +244,7 @@
     m_lastAbrResetPoc = -1;
     m_statFileOut = NULL;
     m_cutreeStatFileOut = m_cutreeStatFileIn = NULL;
+    m_cutreeShrMem = NULL;
     m_rce2Pass = NULL;
     m_encOrder = NULL;
     m_lastBsliceSatdCost = 0;
@@ -320,6 +356,42 @@
         m_cuTreeStats.qpBuffer[i] = NULL;
 }
 
+bool RateControl::initCUTreeSharedMem()
+{
+    if (!m_cutreeShrMem) {
+        m_cutreeShrMem = new RingMem();
+        if (!m_cutreeShrMem)
+        {
+            return false;
+        }
+
+        ///< now cutree data form at most 3 gops would be stored in the shared memory at the same time
+        int32_t itemSize = (sizeof(uint8_t) + SHARED_DATA_ALIGNMENT - 1) & ~(SHARED_DATA_ALIGNMENT - 1);
+        if (m_param->rc.qgSize == 8)
+        {
+            itemSize += sizeof(uint16_t) * m_ncu * 4;
+        }
+        else
+        {
+            itemSize += sizeof(uint16_t) * m_ncu;
+        }
+
+        int32_t itemCnt = X265_MIN(m_param->keyframeMax, (int)(m_fps + 0.5));
+        itemCnt *= GOP_CNT_CU_TREE;
+
+        char shrname[MAX_SHR_NAME_LEN] = { 0 };
+        strcpy(shrname, m_param->rc.sharedMemName);
+        strcat(shrname, CUTREE_SHARED_MEM_NAME);
+
+        if (!m_cutreeShrMem->init(itemSize, itemCnt, shrname))
+        {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 bool RateControl::init(const SPS& sps)
 {
     if (m_isVbv && !m_initVbv)
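
The sizing logic in `initCUTreeSharedMem()` above can be checked with a little arithmetic: each ring item is a 4-byte aligned type field plus one `uint16_t` per CU (four per CU when `qgSize` is 8), and the ring holds three GOPs' worth of items, where a GOP is capped at roughly one second of frames. The sketch below restates that calculation; `cuTreeItemSize` and `cuTreeItemCount` are hypothetical free functions with the encoder state passed in as plain parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Mirrors SHARED_DATA_ALIGNMENT and GOP_CNT_CU_TREE from the diff.
constexpr int32_t kSharedAlign = 4;
constexpr int32_t kGopCount = 3;

// Bytes per ring item: aligned type byte plus the per-CU stats.
int32_t cuTreeItemSize(int32_t ncu, int qgSize)
{
    int32_t size = (static_cast<int32_t>(sizeof(uint8_t)) + kSharedAlign - 1) & ~(kSharedAlign - 1);
    const int32_t statsPerCu = (qgSize == 8) ? 4 : 1;
    size += static_cast<int32_t>(sizeof(uint16_t)) * ncu * statsPerCu;
    return size;
}

// Number of ring items: min(keyframe interval, rounded fps) frames per GOP,
// times the number of GOPs kept in shared memory.
int32_t cuTreeItemCount(int keyframeMax, double fps)
{
    return std::min(keyframeMax, static_cast<int>(fps + 0.5)) * kGopCount;
}
```

For example, 100 CUs with `qgSize` 8 give 4 + 2 * 100 * 4 = 804 bytes per item, and a 250-frame keyframe interval at 29.97 fps gives min(250, 30) * 3 = 90 items.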
@@ -421,244 +493,257 @@
         /* Load stat file and init 2pass algo */
         if (m_param->rc.bStatRead)
         {
-            m_expectedBitsSum = 0;
-            char *p, *statsIn, *statsBuf;
-            /* read 1st pass stats */
-            statsIn = statsBuf = x265_slurp_file(fileName);
-            if (!statsBuf)
-                return false;
-            if (m_param->rc.cuTree)
+            if (X265_SHARE_MODE_FILE == m_param->rc.dataShareMode)
             {
-                char *tmpFile = strcatFilename(fileName, ".cutree");
-                if (!tmpFile)
+                m_expectedBitsSum = 0;
+                char *p, *statsIn, *statsBuf;
+                /* read 1st pass stats */
+                statsIn = statsBuf = x265_slurp_file(fileName);
+                if (!statsBuf)
                     return false;
-                m_cutreeStatFileIn = x265_fopen(tmpFile, "rb");
-                X265_FREE(tmpFile);
-                if (!m_cutreeStatFileIn)
+                if (m_param->rc.cuTree)
                 {
-                    x265_log_file(m_param, X265_LOG_ERROR, "can't open stats file %s.cutree\n", fileName);
-                    return false;
+                    char *tmpFile = strcatFilename(fileName, ".cutree");
+                    if (!tmpFile)
+                        return false;
+                    m_cutreeStatFileIn = x265_fopen(tmpFile, "rb");
+                    X265_FREE(tmpFile);
+                    if (!m_cutreeStatFileIn)
+                    {
+                        x265_log_file(m_param, X265_LOG_ERROR, "can't open stats file %s.cutree\n", fileName);
+                        return false;
+                    }
                 }
-            }
 
-            /* check whether 1st pass options were compatible with current options */
-            if (strncmp(statsBuf, "#options:", 9))
-            {
-                x265_log(m_param, X265_LOG_ERROR,"options list in stats file not valid\n");
-                return false;
-            }
-            {
-                int i, j, m;
-                uint32_t k , l;
-                bool bErr = false;
-                char *opts = statsBuf;
-                statsIn = strchr(statsBuf, '\n');
-                if (!statsIn)
-                {
-                    x265_log(m_param, X265_LOG_ERROR, "Malformed stats file\n");
-                    return false;
-                }
-                *statsIn = '\0';
-                statsIn++;
-                if ((p = strstr(opts, " input-res=")) == 0 || sscanf(p, " input-res=%dx%d", &i, &j) != 2)
-                {
-                    x265_log(m_param, X265_LOG_ERROR, "Resolution specified in stats file not valid\n");
-                    return false;
-                }
-                if ((p = strstr(opts, " fps=")) == 0 || sscanf(p, " fps=%u/%u", &k, &l) != 2)
+                /* check whether 1st pass options were compatible with current options */
+                if (strncmp(statsBuf, "#options:", 9))
                 {
-                    x265_log(m_param, X265_LOG_ERROR, "fps specified in stats file not valid\n");
+                    x265_log(m_param, X265_LOG_ERROR, "options list in stats file not valid\n");
                     return false;
                 }
-                if (((p = strstr(opts, " vbv-maxrate=")) == 0 || sscanf(p, " vbv-maxrate=%d", &m) != 1) && m_param->rc.rateControlMode == X265_RC_CRF)
                 {
-                    x265_log(m_param, X265_LOG_ERROR, "Constant rate-factor is incompatible with 2pass without vbv-maxrate in the previous pass\n");
-                    return false;
-                }
-                if (k != m_param->fpsNum || l != m_param->fpsDenom)
-                {
-                    x265_log(m_param, X265_LOG_ERROR, "fps mismatch with 1st pass (%u/%u vs %u/%u)\n",
-                              m_param->fpsNum, m_param->fpsDenom, k, l);
-                    return false;
-                }
-                if (m_param->analysisMultiPassRefine)
-                {
-                    p = strstr(opts, "ref=");
-                    sscanf(p, "ref=%d", &i);
-                    if (i > m_param->maxNumReferences)
+                    int i, j, m;
+                    uint32_t k, l;
+                    bool bErr = false;
+                    char *opts = statsBuf;
+                    statsIn = strchr(statsBuf, '\n');
+                    if (!statsIn)
                     {
-                        x265_log(m_param, X265_LOG_ERROR, "maxNumReferences cannot be less than 1st pass (%d vs %d)\n",
-                            i, m_param->maxNumReferences);
+                        x265_log(m_param, X265_LOG_ERROR, "Malformed stats file\n");
                         return false;
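
Review note on `initCUTreeSharedMem()`: the item size is a one-byte header rounded up to `SHARED_DATA_ALIGNMENT`, plus one `uint16_t` per CU (four per CU when `qgSize` is 8), and the item count is `min(keyframeMax, round(fps))` times `GOP_CNT_CU_TREE`. The sketch below reproduces that arithmetic in isolation. The values of `kAlignment` and `kGopCount` are assumptions (the real constants are not visible in this diff; the count of 3 is taken from the "at most 3 gops" comment), and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round `size` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Assumed stand-ins for SHARED_DATA_ALIGNMENT and GOP_CNT_CU_TREE;
// the real definitions are not part of the hunk shown above.
constexpr std::size_t kAlignment = 4;
constexpr int kGopCount = 3;

// Bytes per ring-buffer item: aligned one-byte header plus one uint16_t
// per CU, or four per CU when the quantization group size is 8.
std::size_t cutreeItemSize(std::size_t numCu, int qgSize)
{
    assert(numCu > 0);
    const std::size_t perCu = (qgSize == 8) ? 4 : 1;
    return alignUp(sizeof(std::uint8_t), kAlignment) + sizeof(std::uint16_t) * numCu * perCu;
}

// Number of items: min(keyframeMax, round(fps)) frames per GOP, times the GOP count.
int cutreeItemCount(int keyframeMax, double fps)
{
    return std::min(keyframeMax, static_cast<int>(fps + 0.5)) * kGopCount;
}
```

With these assumed constants, 100 CUs at `qgSize` 16 give 204 bytes per item, and `keyframeMax` 250 at 29.97 fps gives 90 items. Separately, the hunk builds `shrname` with unbounded `strcpy`/`strcat`, so the combined length of `sharedMemName` and the suffix has to stay below `MAX_SHR_NAME_LEN`; whether that is validated elsewhere is not visible here.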

x265_3.5.tar.gz/source/encoder/ratecontrol.h Changed

@@ -28,6 +28,7 @@
 
 #include "common.h"
 #include "sei.h"
+#include "ringmem.h"
 
 namespace X265_NS {
 // encoder namespace
@@ -73,6 +74,7 @@
     Predictor  rowPreds[3][2];
     Predictor* rowPred[2];
 
+    int64_t currentSatd;
     int64_t lastSatd;      /* Contains the picture cost of the previous frame, required for resetAbr and VBV */
     int64_t leadingNoBSatd;
     int64_t rowTotalBits;  /* update cplxrsum and totalbits at the end of 2 rows */
@@ -87,6 +89,8 @@
     double  rowCplxrSum;
     double  qpNoVbv;
     double  bufferFill;
+    double  bufferFillFinal;
+    double  bufferFillActual;
     double  targetFill;
     bool    vbvEndAdj;
     double  frameDuration;
@@ -237,6 +241,8 @@
     FILE*   m_statFileOut;
     FILE*   m_cutreeStatFileOut;
     FILE*   m_cutreeStatFileIn;
+    ///< store the cutree data in memory instead of file
+    RingMem *m_cutreeShrMem;
     double  m_lastAccumPNorm;
     double  m_expectedBitsSum;   /* sum of qscale2bits after rceq, ratefactor, and overflow, only includes finished frames */
     int64_t m_predictedBits;
@@ -271,6 +277,9 @@
     int writeRateControlFrameStats(Frame* curFrame, RateControlEntry* rce);
     bool   initPass2();
 
+    bool initCUTreeSharedMem();
+    void skipCUTreeSharedMemRead(int32_t cnt);
+
     double forwardMasking(Frame* curFrame, double q);
     double backwardMasking(Frame* curFrame, double q);

x265_3.5.tar.gz/source/encoder/sei.h Changed

@@ -73,6 +73,101 @@
     }
 };
 
+/* Film grain characteristics */
+class FilmGrainCharacteristics : public SEI
+{
+  public:
+
+    FilmGrainCharacteristics()
+    {
+        m_payloadType = FILM_GRAIN_CHARACTERISTICS;
+        m_payloadSize = 0;
+    }
+
+    struct CompModelIntensityValues
+    {
+        uint8_t intensityIntervalLowerBound;
+        uint8_t intensityIntervalUpperBound;
+        int*    compModelValue;
+    };
+
+    struct CompModel
+    {
+        bool    bPresentFlag;
+        uint8_t numModelValues;
+        uint8_t m_filmGrainNumIntensityIntervalMinus1;
+        CompModelIntensityValues* intensityValues;
+    };
+
+    CompModel   m_compModel[MAX_NUM_COMPONENT];
+    bool        m_filmGrainCharacteristicsPersistenceFlag;
+    bool        m_filmGrainCharacteristicsCancelFlag;
+    bool        m_separateColourDescriptionPresentFlag;
+    bool        m_filmGrainFullRangeFlag;
+    uint8_t     m_filmGrainModelId;
+    uint8_t     m_blendingModeId;
+    uint8_t     m_log2ScaleFactor;
+    uint8_t     m_filmGrainBitDepthLumaMinus8;
+    uint8_t     m_filmGrainBitDepthChromaMinus8;
+    uint8_t     m_filmGrainColourPrimaries;
+    uint8_t     m_filmGrainTransferCharacteristics;
+    uint8_t     m_filmGrainMatrixCoeffs;
+
+    void writeSEI(const SPS&)
+    {
+        WRITE_FLAG(m_filmGrainCharacteristicsCancelFlag, "film_grain_characteristics_cancel_flag");
+
+        if (!m_filmGrainCharacteristicsCancelFlag)
+        {
+            WRITE_CODE(m_filmGrainModelId, 2, "film_grain_model_id");
+            WRITE_FLAG(m_separateColourDescriptionPresentFlag, "separate_colour_description_present_flag");
+            if (m_separateColourDescriptionPresentFlag)
+            {
+                WRITE_CODE(m_filmGrainBitDepthLumaMinus8, 3, "film_grain_bit_depth_luma_minus8");
+                WRITE_CODE(m_filmGrainBitDepthChromaMinus8, 3, "film_grain_bit_depth_chroma_minus8");
+                WRITE_FLAG(m_filmGrainFullRangeFlag, "film_grain_full_range_flag");
+                WRITE_CODE(m_filmGrainColourPrimaries, X265_BYTE, "film_grain_colour_primaries");
+                WRITE_CODE(m_filmGrainTransferCharacteristics, X265_BYTE, "film_grain_transfer_characteristics");
+                WRITE_CODE(m_filmGrainMatrixCoeffs, X265_BYTE, "film_grain_matrix_coeffs");
+            }
+            WRITE_CODE(m_blendingModeId, 2, "blending_mode_id");
+            WRITE_CODE(m_log2ScaleFactor, 4, "log2_scale_factor");
+            for (uint8_t c = 0; c < 3; c++)
+            {
+                WRITE_FLAG(m_compModel[c].bPresentFlag && m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 > 0 && m_compModel[c].numModelValues > 0, "comp_model_present_flag[c]");
+            }
+            for (uint8_t c = 0; c < 3; c++)
+            {
+                if (m_compModel[c].bPresentFlag && m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 > 0 && m_compModel[c].numModelValues > 0)
+                {
+                    assert(m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1 <= 256);
+                    assert(m_compModel[c].numModelValues <= X265_BYTE);
+                    WRITE_CODE(m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 , X265_BYTE, "num_intensity_intervals_minus1[c]");
+                    WRITE_CODE(m_compModel[c].numModelValues - 1, 3, "num_model_values_minus1[c]");
+                    for (uint8_t interval = 0; interval < m_compModel[c].m_filmGrainNumIntensityIntervalMinus1 + 1; interval++)
+                    {
+                        WRITE_CODE(m_compModel[c].intensityValues[interval].intensityIntervalLowerBound, X265_BYTE, "intensity_interval_lower_bound[c][i]");
+                        WRITE_CODE(m_compModel[c].intensityValues[interval].intensityIntervalUpperBound, X265_BYTE, "intensity_interval_upper_bound[c][i]");
+                        for (uint8_t j = 0; j < m_compModel[c].numModelValues; j++)
+                        {
+                            WRITE_SVLC(m_compModel[c].intensityValues[interval].compModelValue[j],"comp_model_value[c][i]");
+                        }
+                    }
+                }
+            }
+            WRITE_FLAG(m_filmGrainCharacteristicsPersistenceFlag, "film_grain_characteristics_persistence_flag");
+        }
+        if (m_bitIf->getNumberOfWrittenBits() % X265_BYTE != 0)
+        {
+            WRITE_FLAG(1, "payload_bit_equal_to_one");
+            while (m_bitIf->getNumberOfWrittenBits() % X265_BYTE != 0)
+            {
+                WRITE_FLAG(0, "payload_bit_equal_to_zero");
+            }
+        }
+    }
+};
+
 static const uint32_t ISO_IEC_11578_LEN = 16;
 
 class SEIuserDataUnregistered : public SEI
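
Review note on the tail of `FilmGrainCharacteristics::writeSEI()`: when the payload does not end on a byte boundary, the hunk writes a single 1 bit followed by 0 bits until the bit count is a multiple of 8. The sketch below shows that padding rule on its own. `BitWriter` is a hypothetical MSB-first writer used only for illustration; it is not the encoder's bitstream interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit writer, standing in for the encoder's bitstream class.
class BitWriter
{
public:
    // Append the low `numBits` bits of `value`, most significant bit first.
    void writeBits(std::uint32_t value, int numBits)
    {
        assert(numBits >= 0 && numBits <= 32);
        for (int i = numBits - 1; i >= 0; i--)
        {
            if (m_bitCount % 8 == 0)
                m_bytes.push_back(0);
            if ((value >> i) & 1u)
                m_bytes.back() |= static_cast<std::uint8_t>(0x80u >> (m_bitCount % 8));
            m_bitCount++;
        }
    }

    // Same rule as the hunk: if unaligned, write a 1 bit, then 0 bits up to the byte boundary.
    void alignPayload()
    {
        if (m_bitCount % 8 != 0)
        {
            writeBits(1, 1);
            while (m_bitCount % 8 != 0)
                writeBits(0, 1);
        }
    }

    std::size_t bitCount() const { return m_bitCount; }
    const std::vector<std::uint8_t>& bytes() const { return m_bytes; }

private:
    std::vector<std::uint8_t> m_bytes;
    std::size_t m_bitCount = 0;
};
```

Writing the three bits `101` and then calling `alignPayload()` yields the single byte `0xB0` (`1011 0000`); a payload that is already byte-aligned is left unchanged.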

x265_3.5.tar.gz/source/encoder/slicetype.cpp Changed

@@ -45,6 +45,14 @@
 
 namespace {
 
+uint32_t acEnergyVarHist(uint64_t sum_ssd, int shift)
+{
+    uint32_t sum = (uint32_t)sum_ssd;
+    uint32_t ssd = (uint32_t)(sum_ssd >> 32);
+
+    return ssd - ((uint64_t)sum * sum >> shift);
+}
+
 /* Compute variance to derive AC energy of each block */
 inline uint32_t acEnergyVar(Frame *curFrame, uint64_t sum_ssd, int shift, int plane)
 {
@@ -184,7 +192,7 @@
     {
         for (int colNum = 0; colNum < width; colNum++)
         {
-            if ((rowNum >= 2) && (colNum >= 2) && (rowNum != height - 2) && (colNum != width - 2)) //Ignoring the border pixels of the picture
+            if ((rowNum >= 2) && (colNum >= 2) && (rowNum < height - 2) && (colNum < width - 2)) //Ignoring the border pixels of the picture
             {
                 /*  5x5 Gaussian filter
                     2   4   5   4   2
@@ -519,7 +527,7 @@
                 if (param->rc.aqMode == X265_AQ_EDGE)
                     edgeFilter(curFrame, param);
 
-                if (param->rc.aqMode == X265_AQ_EDGE && !param->bHistBasedSceneCut && param->recursionSkipMode == EDGE_BASED_RSKIP)
+                if (param->rc.aqMode == X265_AQ_EDGE && param->recursionSkipMode == EDGE_BASED_RSKIP)
                 {
                     pixel* src = curFrame->m_edgePic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX;
                     primitives.planecopy_pp_shr(src, curFrame->m_fencPic->m_stride, curFrame->m_edgeBitPic,
@@ -1050,7 +1058,30 @@
     m_countPreLookahead = 0;
 #endif
 
-    memset(m_histogram, 0, sizeof(m_histogram));
+    m_accHistDiffRunningAvgCb = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*));
+    m_accHistDiffRunningAvgCb[0] = X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    memset(m_accHistDiffRunningAvgCb[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) {
+        m_accHistDiffRunningAvgCb[w] = m_accHistDiffRunningAvgCb[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT;
+    }
+
+    m_accHistDiffRunningAvgCr = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*));
+    m_accHistDiffRunningAvgCr[0] = X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    memset(m_accHistDiffRunningAvgCr[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) {
+        m_accHistDiffRunningAvgCr[w] = m_accHistDiffRunningAvgCr[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT;
+    }
+
+    m_accHistDiffRunningAvg = X265_MALLOC(uint32_t*, NUMBER_OF_SEGMENTS_IN_WIDTH * sizeof(uint32_t*));
+    m_accHistDiffRunningAvg[0] = X265_MALLOC(uint32_t, NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    memset(m_accHistDiffRunningAvg[0], 0, sizeof(uint32_t) * NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT);
+    for (uint32_t w = 1; w < NUMBER_OF_SEGMENTS_IN_WIDTH; w++) {
+        m_accHistDiffRunningAvg[w] = m_accHistDiffRunningAvg[0] + w * NUMBER_OF_SEGMENTS_IN_HEIGHT;
+    }
+
+    m_resetRunningAvg = true;
+
+    m_segmentCountThreshold = (uint32_t)(((float)((NUMBER_OF_SEGMENTS_IN_WIDTH * NUMBER_OF_SEGMENTS_IN_HEIGHT) * 50) / 100) + 0.5);
 }
 
 #if DETAILED_CU_STATS
@@ -1098,6 +1129,7 @@
             m_pool[i].stopWorkers();
     }
 }
+
 void Lookahead::destroy()
 {
     // these two queues will be empty unless the encode was aborted
@@ -1377,6 +1409,357 @@
     }
 }
 
+double computeBrightnessIntensity(pixel *inPlane, int width, int height, intptr_t stride)
+{
+    pixel* rowStart = inPlane;
+    double count = 0;
+
+    for (int i = 0; i < height; i++)
+    {
+        for (int j = 0; j < width; j++)
+        {
+            if (rowStart[j] > BRIGHTNESS_THRESHOLD)
+                count++;
+        }
+        rowStart += stride;
+    }
+
+    /* Returns the brightness percentage of the input plane */
+    return (count / (width * height)) * 100;
+}
+
+double computeEdgeIntensity(pixel *inPlane, int width, int height, intptr_t stride)
+{
+    pixel* rowStart = inPlane;
+    double count = 0;
+
+    for (int i = 0; i < height; i++)
+    {
+        for (int j = 0; j < width; j++)
+        {
+            if (rowStart[j] > 0)
+                count++;
+        }
+        rowStart += stride;
+    }
+
+    /* Returns the edge percentage of the input plane */
+    return (count / (width * height)) * 100;
+}
+
+uint32_t LookaheadTLD::calcVariance(pixel* inpSrc, intptr_t stride, intptr_t blockOffset, uint32_t plane)
+{
+    pixel* src = inpSrc + blockOffset;
+
+    uint32_t var;
+    if (!plane)
+        var = acEnergyVarHist(primitives.cu[BLOCK_8x8].var(src, stride), 6);
+    else
+        var = acEnergyVarHist(primitives.cu[BLOCK_4x4].var(src, stride), 4);
+
+    x265_emms();
+    return var;
+}
+
+/*
+** Compute Block and Picture Variance, Block Mean for all blocks in the picture
+*/
+void LookaheadTLD::computePictureStatistics(Frame *curFrame)
+{
+    int maxCol = curFrame->m_fencPic->m_picWidth;
+    int maxRow = curFrame->m_fencPic->m_picHeight;
+    intptr_t inpStride = curFrame->m_fencPic->m_stride;
+
+    // Variance
+    uint64_t picTotVariance = 0;
+    uint32_t variance;
+
+    uint64_t blockXY = 0;
+    pixel* src = curFrame->m_fencPic->m_picOrg[0];
+
+    for (int blockY = 0; blockY < maxRow; blockY += 8)
+    {
+        uint64_t rowVariance = 0;
+        for (int blockX = 0; blockX < maxCol; blockX += 8)
+        {
+            intptr_t blockOffsetLuma = blockX + (blockY * inpStride);
+
+            variance = calcVariance(
+                src,
+                inpStride,
+                blockOffsetLuma, 0);
+
+            rowVariance += variance;
+            blockXY++;
+        }
+        picTotVariance += (uint16_t)(rowVariance / maxCol);
+    }
+
+    curFrame->m_lowres.picAvgVariance = (uint16_t)(picTotVariance / maxRow);
+
+    // Collect chroma variance
+    int hShift = curFrame->m_fencPic->m_hChromaShift;
+    int vShift = curFrame->m_fencPic->m_vChromaShift;
+
+    int maxColChroma = curFrame->m_fencPic->m_picWidth >> hShift;
+    int maxRowChroma = curFrame->m_fencPic->m_picHeight >> vShift;
+    intptr_t cStride = curFrame->m_fencPic->m_strideC;
+
+    pixel* srcCb = curFrame->m_fencPic->m_picOrg[1];
+
+    picTotVariance = 0;
+    for (int blockY = 0; blockY < maxRowChroma; blockY += 4)
+    {
+        uint64_t rowVariance = 0;
+        for (int blockX = 0; blockX < maxColChroma; blockX += 4)
+        {
+            intptr_t blockOffsetChroma = blockX + blockY * cStride;
+
+            variance = calcVariance(
+                srcCb,
+                cStride,
+                blockOffsetChroma, 1);
+
+            rowVariance += variance;
+            blockXY++;
+        }
+        picTotVariance += (uint16_t)(rowVariance / maxColChroma);
+    }
+
+    curFrame->m_lowres.picAvgVarianceCb = (uint16_t)(picTotVariance / maxRowChroma);
+
+
+    pixel* srcCr = curFrame->m_fencPic->m_picOrg[2];
+
+    picTotVariance = 0;
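
Review note on `acEnergyVarHist()`: the function unpacks a 64-bit value whose low 32 bits hold the pixel sum and whose high 32 bits hold the sum of squares, then returns `ssd - (sum * sum >> shift)`, which is the sum of squared deviations from the mean for a block of `1 << shift` pixels. The sketch below reproduces that arithmetic with a plain C++ packing helper. `packSumSsd` and `acEnergy` are hypothetical names, and 8-bit samples are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Pack a square block's pixel sum (low 32 bits) and sum of squares
// (high 32 bits) into one 64-bit value, matching what the hunk unpacks.
std::uint64_t packSumSsd(const std::uint8_t* block, int size, int stride)
{
    assert(size > 0 && stride >= size);
    std::uint32_t sum = 0, ssd = 0;
    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            const std::uint32_t p = block[y * stride + x];
            sum += p;
            ssd += p * p;
        }
    }
    return (static_cast<std::uint64_t>(ssd) << 32) | sum;
}

// Same arithmetic as the hunk: ssd - sum^2 / N, with N = 1 << shift pixels.
std::uint32_t acEnergy(std::uint64_t sumSsd, int shift)
{
    const std::uint32_t sum = static_cast<std::uint32_t>(sumSsd);
    const std::uint32_t ssd = static_cast<std::uint32_t>(sumSsd >> 32);
    return ssd - static_cast<std::uint32_t>((static_cast<std::uint64_t>(sum) * sum) >> shift);
}
```

A flat 8x8 block gives 0 with `shift` 6, and a 4x4 block that is half 0 and half 2 gives 16 with `shift` 4 (mean 1, sixteen deviations of magnitude 1).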

x265_3.5.tar.gz/source/encoder/slicetype.h Changed

@@ -44,6 +44,24 @@
 #define EDGE_INCLINATION 45
 #define TEMPORAL_SCENECUT_THRESHOLD 50
 
+#define X265_ABS(a)                        (((a) < 0) ? (-(a)) : (a))
+
+#define PICTURE_DIFF_VARIANCE_TH            390
+#define PICTURE_VARIANCE_TH                 1500
+#define LOW_VAR_SCENE_CHANGE_TH             2250
+#define HIGH_VAR_SCENE_CHANGE_TH            3500
+
+#define PICTURE_DIFF_VARIANCE_CHROMA_TH     10
+#define PICTURE_VARIANCE_CHROMA_TH          20
+#define LOW_VAR_SCENE_CHANGE_CHROMA_TH      2250/4
+#define HIGH_VAR_SCENE_CHANGE_CHROMA_TH     3500/4
+
+#define FLASH_TH                            1.5
+#define FADE_TH                             4
+#define INTENSITY_CHANGE_TH                 4
+
+#define NUM64x64INPIC(w,h)                  ((w*h)>> (MAX_LOG2_CU_SIZE<<1))
+
 #if HIGH_BIT_DEPTH
 #define EDGE_THRESHOLD 1023.0
 #else
@@ -93,7 +111,29 @@
 
     ~LookaheadTLD() { X265_FREE(wbuffer[0]); }
 
+    void collectPictureStatistics(Frame *curFrame);
+    void computeIntensityHistogramBinsLuma(Frame *curFrame, uint64_t *sumAvgIntensityTotalSegmentsLuma);
+
+    void computeIntensityHistogramBinsChroma(
+        Frame    *curFrame,
+        uint64_t *sumAverageIntensityCb,
+        uint64_t *sumAverageIntensityCr);
+
+    void calculateHistogram(
+        pixel    *inputSrc,
+        uint32_t  inputWidth,
+        uint32_t  inputHeight,
+        intptr_t  stride,
+        uint8_t   dsFactor,
+        uint32_t *histogram,
+        uint64_t *sum);
+
+    void computePictureStatistics(Frame *curFrame);
+
+    uint32_t calcVariance(pixel* src, intptr_t stride, intptr_t blockOffset, uint32_t plane);
+
     void calcAdaptiveQuantFrame(Frame *curFrame, x265_param* param);
+    void calcFrameSegment(Frame *curFrame);
     void lowresIntraEstimate(Lowres& fenc, uint32_t qgSize);
 
     void weightsAnalyse(Lowres& fenc, Lowres& ref);
@@ -124,7 +164,6 @@
 
     /* pre-lookahead */
     int           m_fullQueueSize;
-    int           m_histogram[X265_BFRAME_MAX + 1];
     int           m_lastKeyframe;
     int           m_8x8Width;
     int           m_8x8Height;
@@ -153,6 +192,14 @@
     bool          m_isFadeIn;
     uint64_t      m_fadeCount;
     int           m_fadeStart;
+
+    uint32_t    **m_accHistDiffRunningAvgCb;
+    uint32_t    **m_accHistDiffRunningAvgCr;
+    uint32_t    **m_accHistDiffRunningAvg;
+
+    bool          m_resetRunningAvg;
+    uint32_t      m_segmentCountThreshold;
+
     Lookahead(x265_param *param, ThreadPool *pool);
 #if DETAILED_CU_STATS
     int64_t       m_slicetypeDecideElapsedTime;
@@ -174,6 +221,7 @@
 
     void    getEstimatedPictureCost(Frame *pic);
     void    setLookaheadQueue();
+    int     findSliceType(int poc);
 
 protected:
 
@@ -184,6 +232,10 @@
     /* called by slicetypeAnalyse() to make slice decisions */
     bool    scenecut(Lowres **frames, int p0, int p1, bool bRealScenecut, int numFrames);
     bool    scenecutInternal(Lowres **frames, int p0, int p1, bool bRealScenecut);
+
+    bool    histBasedScenecut(Lowres **frames, int p0, int p1, int numFrames);
+    bool    detectHistBasedSceneChange(Lowres **frames, int p0, int p1, int p2);
+
     void    slicetypePath(Lowres **frames, int length, char(*best_paths)[X265_LOOKAHEAD_MAX + 1]);
     int64_t slicetypePathCost(Lowres **frames, char *path, int64_t threshold);
     int64_t vbvFrameCost(Lowres **frames, int p0, int p1, int b);
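
Review note on the new macros in `slicetype.h`: `LOW_VAR_SCENE_CHANGE_CHROMA_TH` and `HIGH_VAR_SCENE_CHANGE_CHROMA_TH` expand to unparenthesized `2250/4` and `3500/4`, and `NUM64x64INPIC(w,h)` does not parenthesize its arguments. Whether any call site is affected is not visible in this truncated diff; the sketch below only illustrates how the textual expansion can change a result. The `_SAFE` variants are hypothetical, and `MAX_LOG2_CU_SIZE` is assumed to be 6.

```cpp
#include <cassert>

// Assumed value; the real definition lives outside this diff.
#define MAX_LOG2_CU_SIZE 6

// Copied from the hunk.
#define LOW_VAR_SCENE_CHANGE_CHROMA_TH      2250/4
#define NUM64x64INPIC(w,h)                  ((w*h)>> (MAX_LOG2_CU_SIZE<<1))

// Hypothetical parenthesized variants, for comparison only.
#define LOW_VAR_SCENE_CHANGE_CHROMA_TH_SAFE (2250 / 4)
#define NUM64x64INPIC_SAFE(w, h)            (((w) * (h)) >> (MAX_LOG2_CU_SIZE << 1))

// Expands to factor * 2250 / 4, evaluated left to right.
int scaleRaw(int factor) { return factor * LOW_VAR_SCENE_CHANGE_CHROMA_TH; }

// The integer quotient 2250 / 4 is formed first, then multiplied.
int scaleSafe(int factor) { return factor * LOW_VAR_SCENE_CHANGE_CHROMA_TH_SAFE; }

// Expands to ((a + b*h) >> 12): only b is multiplied by h.
int count64Raw(int a, int b, int h) { return NUM64x64INPIC(a + b, h); }

// Expands to (((a + b) * (h)) >> 12), as a caller would expect.
int count64Safe(int a, int b, int h) { return NUM64x64INPIC_SAFE(a + b, h); }
```

For example, `scaleRaw(2)` is 1125 while `scaleSafe(2)` is 1124, and `count64Raw(64, 64, 64)` is 1 while `count64Safe(64, 64, 64)` is 2. The macros behave as intended as long as they are used as standalone operands with simple identifiers as arguments.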

x265_3.5.tar.gz/source/test/CMakeLists.txt Changed

x265_3.5.tar.gz/source/test/pixelharness.cpp Changed

@@ -406,6 +406,32 @@
     return true;
 }
 
+bool PixelHarness::check_downscaleluma_t(downscaleluma_t ref, downscaleluma_t opt)
+{
+    ALIGN_VAR_16(pixel, ref_destf[32 * 32]);
+    ALIGN_VAR_16(pixel, opt_destf[32 * 32]);
+
+    intptr_t src_stride = 64;
+    intptr_t dst_stride = 32;
+    int bx = 32;
+    int by = 32;
+    int j = 0;
+    for (int i = 0; i < ITERS; i++)
+    {
+        int index = i % TEST_CASES;
+        ref(pixel_test_buff[index] + j, ref_destf, src_stride, dst_stride, bx, by);
+        checked(opt, pixel_test_buff[index] + j, opt_destf, src_stride, dst_stride, bx, by);
+
+        if (memcmp(ref_destf, opt_destf, 32 * 32 * sizeof(pixel)))
+            return false;
+
+        reportfail();
+        j += INCR;
+    }
+
+    return true;
+}
+
 bool PixelHarness::check_cpy2Dto1D_shl_t(cpy2Dto1D_shl_t ref, cpy2Dto1D_shl_t opt)
 {
     ALIGN_VAR_16(int16_t, ref_dest[64 * 64]);
@@ -2793,6 +2819,15 @@
         }
     }
 
+    if (opt.frameSubSampleLuma)
+    {
+        if (!check_downscaleluma_t(ref.frameSubSampleLuma, opt.frameSubSampleLuma))
+        {
+            printf("SubSample Luma failed!\n");
+            return false;
+        }
+    }
+
     if (opt.scale1D_128to64[NONALIGNED])
     {
         if (!check_scale1D_pp(ref.scale1D_128to64[NONALIGNED], opt.scale1D_128to64[NONALIGNED]))
@@ -3492,6 +3527,12 @@
         REPORT_SPEEDUP(opt.frameInitLowres, ref.frameInitLowres, pbuf2, pbuf1, pbuf2, pbuf3, pbuf4, 64, 64, 64, 64);
     }
 
+    if (opt.frameSubSampleLuma)
+    {
+        HEADER0("downscaleluma");
+        REPORT_SPEEDUP(opt.frameSubSampleLuma, ref.frameSubSampleLuma, pbuf2, pbuf1, 64, 64, 64, 64);
+    }
+
     if (opt.scale1D_128to64[NONALIGNED])
     {
         HEADER0("scale1D_128to64");

x265_3.5.tar.gz/source/test/pixelharness.h Changed

x265_3.5.tar.gz/source/test/regression-tests.txt Changed

@@ -18,12 +18,12 @@
 BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --bitrate 7000 --limit-modes::--preset medium --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --analysis-save x265_analysis.dat --analysis-save-reuse-level 2 --bitrate 7000 --limit-modes::--preset medium --analysis-load x265_analysis.dat --analysis-load-reuse-level 2 --bitrate 7000 --limit-modes
 BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --bitrate 7000 --limit-tu 0::--preset slower --no-cutree --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --bitrate 7000 --limit-tu 0::--preset slower --analysis-load x265_analysis.dat --analysis-load-reuse-level 10 --bitrate 7000 --limit-tu 0
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2::--preset veryslow --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2::--preset veryslow --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --crf 18 --tskip-fast --limit-tu 2
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
@@ -53,7 +53,7 @@
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1::--preset fast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1::--preset fast --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -160,11 +160,14 @@
 Kimono1_1920x1080_24_400.yuv,--preset superfast --qp 28 --zones 0,139,q=32
 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000
 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02
+sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --no-traditional-scenecut
 sintel_trailer_2k_1920x1080_24.yuv, --preset ultrafast --hist-scenecut --hist-threshold 0.02
 crowd_run_1920x1080_50.yuv, --preset faster --ctu 32 --rskip 2 --rskip-edge-threshold 5
 crowd_run_1920x1080_50.yuv, --preset fast --ctu 64 --rskip 2 --rskip-edge-threshold 5 --aq-mode 4
 crowd_run_1920x1080_50.yuv, --preset slow --ctu 32 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1
 crowd_run_1920x1080_50.yuv, --preset slower --ctu 16 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1 --aq-mode 4
+crowd_run_1920x1080_50.yuv, --preset ultrafast --video-signal-type-preset BT2100_PQ_YCC:BT2100x108n0005
+crowd_run_1920x1080_50.yuv, --preset ultrafast --eob --eos
  
 # Main12 intraCost overflow bug test
 720p50_parkrun_ter.y4m,--preset medium
@@ -182,14 +185,18 @@
 
 #scaled save/load test
 crowd_run_1080p50.y4m,--preset ultrafast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_2160p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000 
-crowd_run_1080p50.y4m,--preset superfast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 
-crowd_run_1080p50.y4m,--preset fast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --scale-factor 2 --qp 18
+crowd_run_1080p50.y4m,--preset superfast --analysis-save x265_analysis.dat  --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m, --preset superfast --analysis-load x265_analysis.dat  --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000 
+crowd_run_1080p50.y4m,--preset fast --analysis-save x265_analysis.dat  --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m, --preset fast --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --scale-factor 2 --qp 18
 crowd_run_1080p50.y4m,--preset medium --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000  --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m, --preset medium --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3
-RaceHorses_416x240_30.y4m,--preset slow --no-cutree --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22  --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m, --preset slow --no-cutree --ctu 32 --analysis-load x265_analysis.dat  --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,--preset slow --no-cutree --ctu 64 --analysis-load x265_analysis_2.dat  --analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2
+RaceHorses_416x240_30.y4m,--preset slow --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22  --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m, --preset slow --ctu 32 --analysis-load x265_analysis.dat  --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,--preset slow --ctu 64 --analysis-load x265_analysis_2.dat  --analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2
 ElFunete_960x540_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune psnr --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500 --analysis-save-reuse-level 10 --analysis-save elfuente_960x540.dat --scale-factor 2::ElFunete_1920x1080_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune psnr --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500 --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --analysis-save elfuente_1920x1080.dat --limit-tu 0 --scale-factor 2 --analysis-load elfuente_960x540.dat --refine-intra 4 --refine-inter 2::ElFuente_3840x2160_60.yuv,--colorprim bt709 --transfer bt709 --chromaloc 2 --aud --repeat-headers --no-opt-qp-pps --no-opt-ref-list-length-pps --wpp --no-interlace --sar 1:1 --min-keyint 60 --no-open-gop --rc-lookahead 180 --bframes 5 --b-intra --ref 4 --cbqpoffs -2 --crqpoffs -2 --lookahead-threads 0 --weightb --qg-size 8 --me star --preset veryslow --frame-threads 1 --b-adapt 2 --aq-mode 3 --rd 6 --pools 15 --colormatrix bt709 --keyint 120 --high-tier --ctu 64 --tune=psnr --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000 --analysis-load-reuse-level 10 --limit-tu 0 --scale-factor 2 --analysis-load elfuente_1920x1080.dat 
--refine-intra 4 --refine-inter 2
 #save/load with ctu distortion refinement
 CrowdRun_1920x1080_50_10bit_422.yuv,--no-cutree --analysis-save x265_analysis.dat --analysis-save-reuse-level 5 --refine-ctu-distortion 1 --bitrate 7000::--no-cutree --analysis-load x265_analysis.dat --refine-ctu-distortion 1 --bitrate 7000 --analysis-load-reuse-level 5
 #segment encoding
 BasketballDrive_1920x1080_50.y4m, --preset ultrafast --no-open-gop --chunk-start 100 --chunk-end 200
 
+#Test FG SEI message addition
+#OldTownCross_1920x1080_50_10bit_422.yuv,--preset slower --tune grain --film-grain "OldTownCross_1920x1080_50_10bit_422.bin"
+#RaceHorses_416x240_30_10bit.yuv,--preset ultrafast --signhide --colormatrix bt709 --film-grain "RaceHorses_416x240_30_10bit.bin"
+
 # vim: tw=200

x265_3.5.tar.gz/source/test/save-load-tests.txt Changed

@@ -12,10 +12,10 @@
 # not auto-detected.
 crowd_run_1080p50.y4m, --preset ultrafast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_2160p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 1 --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000
 crowd_run_540p50.y4m, --preset ultrafast --no-cutree --analysis-save x265_analysis.dat --scale-factor 2 --crf 26 --vbv-maxrate 8000 --vbv-bufsize 8000::crowd_run_1080p50.y4m, --preset ultrafast --no-cutree --analysis-load x265_analysis.dat --scale-factor 2 --crf 26 --vbv-maxrate 12000 --vbv-bufsize 12000
-crowd_run_1080p50.y4m, --preset superfast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m,   --preset superfast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000
-crowd_run_1080p50.y4m,  --preset fast --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m,   --preset fast --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --scale-factor 2 --qp 18
-crowd_run_1080p50.y4m,   --preset medium --no-cutree --analysis-save x265_analysis.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000  --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m,    --preset medium --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m,    --preset medium --no-cutree --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3
+crowd_run_1080p50.y4m, --preset superfast --analysis-save x265_analysis.dat  --analysis-save-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 5000 --vbv-bufsize 5000::crowd_run_2160p50.y4m,   --preset superfast --analysis-load x265_analysis.dat  --analysis-load-reuse-level 2 --scale-factor 2 --crf 22 --vbv-maxrate 10000 --vbv-bufsize 10000
+crowd_run_1080p50.y4m,  --preset fast --analysis-save x265_analysis.dat  --analysis-save-reuse-level 5 --scale-factor 2 --qp 18::crowd_run_2160p50.y4m,   --preset fast --analysis-load x265_analysis.dat  --analysis-load-reuse-level 5 --scale-factor 2 --qp 18
+crowd_run_1080p50.y4m,   --preset medium --analysis-save x265_analysis.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000  --vbv-maxrate 5000 --vbv-bufsize 5000 --early-skip --tu-inter-depth 3::crowd_run_2160p50.y4m,    --preset medium --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 4 --dynamic-refine::crowd_run_2160p50.y4m,    --preset medium --analysis-load x265_analysis.dat  --analysis-load-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 10000 --early-skip --tu-inter-depth 3 --refine-intra 3 --refine-inter 3
 RaceHorses_416x240_30.y4m,   --preset slow --no-cutree --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22  --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m,    --preset slow --no-cutree --ctu 32 --analysis-load x265_analysis.dat  --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,   --preset slow --no-cutree --ctu 64 --analysis-load x265_analysis_2.dat  --analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2
-crowd_run_540p50.y4m,   --preset veryslow --no-cutree --analysis-save x265_analysis_540.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m,   --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m,  --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m,  --preset veryslow --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m,  --preset veryslow --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000
+crowd_run_540p50.y4m,   --preset veryslow --analysis-save x265_analysis_540.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m,   --preset veryslow --analysis-save x265_analysis_1080.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m,  --preset veryslow --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m,  --preset veryslow --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m,  --preset veryslow --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000
 crowd_run_540p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_540.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_1080.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m,  --preset medium --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000
 News-4k.y4m,  --preset medium --analysis-save x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000::News-4k.y4m, --analysis-load x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000

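The test lines in this hunk chain several encodes with `::`, and each segment has the form `input, options`; some segments elsewhere in these test files omit the input and carry options only. A minimal sketch of splitting such a line is given here. It assumes that the first comma in a segment separates the input file from the options and that `::` never occurs inside an option value. The names `EncodeStep` and `parseTestLine` are hypothetical, and this is not the parser used by the actual x265 test harness, which is not shown in this diff.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One encode in a chained test case. `input` is empty when the segment
// carries options only.
struct EncodeStep
{
    std::string input;
    std::string options;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s)
{
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split "input, opts::input, opts::..." into steps.
// Blank lines and lines starting with '#' yield no steps.
std::vector<EncodeStep> parseTestLine(const std::string& rawLine)
{
    std::vector<EncodeStep> steps;
    const std::string line = trim(rawLine);
    if (line.empty() || line[0] == '#')
        return steps;

    std::size_t start = 0;
    while (start <= line.size())
    {
        std::size_t end = line.find("::", start);
        if (end == std::string::npos)
            end = line.size();
        const std::string seg = line.substr(start, end - start);

        EncodeStep step;
        const std::size_t comma = seg.find(',');
        if (comma == std::string::npos)
        {
            // Assumption: no comma means the segment has options only.
            step.options = trim(seg);
        }
        else
        {
            step.input = trim(seg.substr(0, comma));
            step.options = trim(seg.substr(comma + 1));
        }
        steps.push_back(step);
        start = end + 2; // skip past the "::" separator
    }
    return steps;
}
```

Calling `parseTestLine` on one of the `crowd_run` lines above would return one `EncodeStep` per encode in the chain, in order, so that the analysis file saved by one step can be loaded by the next.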
x265_3.5.tar.gz/source/test/smoke-tests.txt Changed

x265_3.5.tar.gz/source/test/testbench.cpp Changed

@@ -208,15 +208,8 @@
 
         EncoderPrimitives asmprim;
         memset(&asmprim, 0, sizeof(asmprim));
-        setupAssemblyPrimitives(asmprim, test_archi.flag);
-
-#if X265_ARCH_ARM64
-        /* Temporary workaround because luma_vsp assembly primitive has not been completed
-         * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
-         * Otherwise, segment fault occurs. */
-        setupAliasCPrimitives(cprim, asmprim, test_archi.flag);
-#endif
 
+        setupAssemblyPrimitives(asmprim, test_archi.flag);
         setupAliasPrimitives(asmprim);
         memcpy(&primitives, &asmprim, sizeof(EncoderPrimitives));
         for (size_t h = 0; h < sizeof(harness) / sizeof(TestHarness*); h++)
@@ -239,14 +232,8 @@
 #if X265_ARCH_X86
     setupInstrinsicPrimitives(optprim, cpuid);
 #endif
-    setupAssemblyPrimitives(optprim, cpuid);
 
-#if X265_ARCH_ARM64
-    /* Temporary workaround because luma_vsp assembly primitive has not been completed
-     * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
-     * Otherwise, segment fault occurs. */
-    setupAliasCPrimitives(cprim, optprim, cpuid);
-#endif
+    setupAssemblyPrimitives(optprim, cpuid);
 
     /* Note that we do not setup aliases for performance tests, that would be
      * redundant. The testbench only verifies they are correctly aliased */

x265_3.5.tar.gz/source/test/testharness.h Changed

@@ -73,7 +73,7 @@
 #include <x86intrin.h>
 #elif ( !defined(__APPLE__) && defined (__GNUC__) && defined(__ARM_NEON__))
 #include <arm_neon.h>
-#elif defined(__GNUC__) && (!defined(__clang__) || __clang_major__ < 4)
+#else
 /* fallback for older GCC/MinGW */
 static inline uint32_t __rdtsc(void)
 {
@@ -82,15 +82,13 @@
 #if X265_ARCH_X86
     asm volatile("rdtsc" : "=a" (a) ::"edx");
 #elif X265_ARCH_ARM
-#if X265_ARCH_ARM64
-    asm volatile("mrs %0, cntvct_el0" : "=r"(a));
-#else
     // TOD-DO: verify following inline asm to get cpu Timestamp Counter for ARM arch
     // asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(a));
 
     // TO-DO: replace clock() function with appropriate ARM cpu instructions
     a = clock();
-#endif
+#elif  X265_ARCH_ARM64
+    asm volatile("mrs %0, cntvct_el0" : "=r"(a));
 #endif
     return a;
 }
@@ -140,7 +138,7 @@
  * needs an explicit asm check because it only sometimes crashes in normal use. */
 intptr_t PFX(checkasm_call)(intptr_t (*func)(), int *ok, ...);
 float PFX(checkasm_call_float)(float (*func)(), int *ok, ...);
-#elif X265_ARCH_ARM == 0
+#elif (X265_ARCH_ARM == 0 && X265_ARCH_ARM64 == 0)
 #define PFX(stack_pagealign)(func, align) func()
 #endif

x265_3.5.tar.gz/source/x265.h Changed

@@ -747,6 +747,13 @@
 
 static const x265_vmaf_commondata vcd = { { NULL, (char *)"/usr/local/share/model/vmaf_v0.6.1.pkl", NULL, NULL, 0, 0, 0, 0, 0, 0, 0, NULL, 0, 1, 0 } };
 
+
+typedef enum
+{
+    X265_SHARE_MODE_FILE = 0,
+    X265_SHARE_MODE_SHAREDMEM
+}X265_DATA_SHARE_MODES;
+
 /* x265 input parameters
  *
  * For version safety you may use x265_param_alloc/free() to manage the
@@ -1433,10 +1440,10 @@
         double    rfConstantMin;
 
         /* Multi-pass encoding */
-        /* Enable writing the stats in a multi-pass encode to the stat output file */
+        /* Enable writing the stats in a multi-pass encode to the stat output file/memory */
         int       bStatWrite;
 
-        /* Enable loading data from the stat input file in a multi pass encode */
+        /* Enable loading data from the stat input file/memory in a multi pass encode */
         int       bStatRead;
 
         /* Filename of the 2pass output/input stats file, if unspecified the
@@ -1489,6 +1496,24 @@
         /* internally enable if tune grain is set */
         int      bEnableConstVbv;
 
+        /* enable SBRC mode for each sequence */
+        int      frameSegment;
+
+        /* if only the focused frames would be re-encode or not */
+        int       bEncFocusedFramesOnly;
+
+        /* Share the data with stats file or shared memory.
+        It must be one of the X265_DATA_SHARE_MODES enum values
+        Available if the bStatWrite or bStatRead is true.
+        Use stats file by default.
+        The stats file mode would be used among the encoders running in sequence.
+        The shared memory mode could only be used among the encoders running in parallel.
+        Now only the cutree data could be shared among shared memory. More data would be support in the future.*/
+        int       dataShareMode;
+
+        /* Unique shared memory name. Required if the shared memory mode enabled. NULL by default */
+        const char* sharedMemName;
+
     } rc;
 
     /*== Video Usability Information ==*/
@@ -1869,12 +1894,6 @@
     /* The offset by which QP is incremented for non-referenced inter-frames after a scenecut when bEnableSceneCutAwareQp is 1 or 3. */
     double    fwdNonRefQpDelta;
 
-    /* A genuine threshold used for histogram based scene cut detection.
-     * This threshold determines whether a frame is a scenecut or not
-     * when compared against the edge and chroma histogram sad values.
-     * Default 0.03. Range: Real number in the interval (0,1). */
-    double    edgeTransitionThreshold;
-
     /* Enables histogram based scenecut detection algorithm to detect scenecuts. Default disabled */
     int       bHistBasedSceneCut;
 
@@ -1948,6 +1967,28 @@
 
     /* The offset by which QP is incremented for non-referenced inter-frames before a scenecut when bEnableSceneCutAwareQp is 2 or 3. */
     double    bwdNonRefQpDelta;
+
+    /* Specify combinations of color primaries, transfer characteristics, color matrix,
+    * range of luma and chroma signals, and chroma sample location. This has higher
+    * precedence than individual VUI parameters. If any individual VUI option is specified
+    * together with this, which changes the values set corresponding to the system-id
+    * or color-volume, it will be discarded. */
+    const char* videoSignalTypePreset;
+
+    /* Flag indicating whether the encoder should emit an End of Bitstream
+     * NAL at the end of bitstream. Default false */
+    int      bEnableEndOfBitstream;
+
+    /* Flag indicating whether the encoder should emit an End of Sequence
+     * NAL at the end of every Coded Video Sequence. Default false */
+    int      bEnableEndOfSequence;
+
+    /* Film Grain Characteristic file */
+    char* filmGrain;
+
+    /*Motion compensated temporal filter*/
+    int      bEnableTemporalFilter;
+    double   temporalFilterStrength;
 } x265_param;
 
 /* x265_param_alloc:

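The comment on `dataShareMode` in the x265.h hunk states three constraints: the value must be one of `X265_DATA_SHARE_MODES`, the setting applies only when `bStatWrite` or `bStatRead` is set, and `sharedMemName` is required in shared memory mode. The sketch below checks those constraints on a stand-in struct. `RcShareSettings` and `checkShareSettings` are hypothetical names that only mirror the fields added in this diff; whether and where x265 itself enforces these rules is not shown here.

```cpp
#include <string>

// Mirrors the X265_DATA_SHARE_MODES enum added in the x265.h hunk.
enum DataShareMode
{
    SHARE_MODE_FILE = 0,
    SHARE_MODE_SHAREDMEM = 1
};

// Hypothetical stand-in for the relevant x265_param::rc fields.
struct RcShareSettings
{
    int bStatWrite = 0;
    int bStatRead = 0;
    int dataShareMode = SHARE_MODE_FILE;
    const char* sharedMemName = nullptr;
};

// Returns an empty string when the settings are consistent with the
// header comment, otherwise a short description of the problem.
std::string checkShareSettings(const RcShareSettings& rc)
{
    if (rc.dataShareMode != SHARE_MODE_FILE &&
        rc.dataShareMode != SHARE_MODE_SHAREDMEM)
        return "dataShareMode is not a valid share mode";

    if (rc.dataShareMode == SHARE_MODE_SHAREDMEM)
    {
        // The header says the mode is available only with stats read/write.
        if (!rc.bStatWrite && !rc.bStatRead)
            return "shared memory mode needs bStatWrite or bStatRead";
        // The header says a unique name is required in this mode.
        if (rc.sharedMemName == nullptr || rc.sharedMemName[0] == '\0')
            return "shared memory mode needs sharedMemName";
    }
    return "";
}
```

A caller could run such a check before opening the encoder, which keeps a missing shared memory name from surfacing only once parallel encoders try to attach to it.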
x265_3.5.tar.gz/source/x265cli.cpp Changed

@@ -28,8 +28,8 @@
 #include "x265cli.h"
 #include "svt.h"
 
-#define START_CODE 0x00000001
-#define START_CODE_BYTES 4
+#define START_CODE 0x00000001
+#define START_CODE_BYTES 4
 
 #ifdef __cplusplus
 namespace X265_NS {
@@ -174,7 +174,6 @@
         H1("   --scenecut-bias <0..100.0>    Bias for scenecut detection. Default %.2f\n", param->scenecutBias);
         H0("   --hist-scenecut               Enables histogram based scene-cut detection using histogram based algorithm.\n");
         H0("   --no-hist-scenecut            Disables histogram based scene-cut detection using histogram based algorithm.\n");
-        H1("   --hist-threshold <0.0..1.0>   Luma Edge histogram's Normalized SAD threshold for histogram based scenecut detection Default %.2f\n", param->edgeTransitionThreshold);
         H0("   --no-fades                  Enable detection and handling of fade-in regions. Default %s\n", OPT(param->bEnableFades));
         H1("   --scenecut-aware-qp <0..3>    Enable increasing QP for frames inside the scenecut window around scenecut. Default %s\n", OPT(param->bEnableSceneCutAwareQp));
         H1("                                 0 - Disabled\n");
@@ -262,6 +261,7 @@
         H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
         H0("   --qp-adaptation-range <float> Delta QP range by QP adaptation based on a psycho-visual model (1.0 to 6.0). Default %.2f\n", param->rc.qpAdaptationRange);
         H0("   --no-aq-motion              Block level QP adaptation based on the relative motion between the block and the frame. Default %s\n", OPT(param->bAQMotion));
+        H1("   --no-sbrc                   Enables the segment based rate control, using its scene statistics. Default %s\n", OPT(param->rc.frameSegment));
         H0("   --qg-size <int>               Specifies the size of the quantization group (64, 32, 16, 8). Default %d\n", param->rc.qgSize);
         H0("   --no-cutree                 Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree));
         H0("   --no-rc-grain               Enable ratecontrol mode to handle grains specifically. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableGrain));
@@ -314,6 +314,30 @@
         H0("   --master-display <string>     SMPTE ST 2086 master display color volume info SEI (HDR)\n");
         H0("                                    format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n");
         H0("   --max-cll <string>            Specify content light level info SEI as \"cll,fall\" (HDR).\n");
+        H0("   --video-signal-type-preset <string>    Specify combinations of color primaries, transfer characteristics, color matrix, range of luma and chroma signals, and chroma sample location\n");
+        H0("                                            format: <system-id>:<color-volume>\n");
+        H0("                                            This has higher precedence than individual VUI parameters. If any individual VUI option is specified together with this,\n");
+        H0("                                            which changes the values set corresponding to the system-id or color-volume, it will be discarded.\n");
+        H0("                                            The color-volume can be used only with the system-id options BT2100_PQ_YCC, BT2100_PQ_ICTCP, and BT2100_PQ_RGB.\n");
+        H0("                                            system-id options and their corresponding values:\n");
+        H0("                                              BT601_525:       --colorprim smpte170m --transfer smpte170m --colormatrix smpte170m --range limited --chromaloc 0\n");
+        H0("                                              BT601_626:       --colorprim bt470bg --transfer smpte170m --colormatrix bt470bg --range limited --chromaloc 0\n");
+        H0("                                              BT709_YCC:       --colorprim bt709 --transfer bt709 --colormatrix bt709 --range limited --chromaloc 0\n");
+        H0("                                              BT709_RGB:       --colorprim bt709 --transfer bt709 --colormatrix gbr --range limited\n");
+        H0("                                              BT2020_YCC_NCL:  --colorprim bt2020 --transfer bt2020-10 --colormatrix bt709 --range limited --chromaloc 2\n");
+        H0("                                              BT2020_RGB:      --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --range limited\n");
+        H0("                                              BT2100_PQ_YCC:   --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --range limited --chromaloc 2\n");
+        H0("                                              BT2100_PQ_ICTCP: --colorprim bt2020 --transfer smpte2084 --colormatrix ictcp --range limited --chromaloc 2\n");
+        H0("                                              BT2100_PQ_RGB:   --colorprim bt2020 --transfer smpte2084 --colormatrix gbr --range limited\n");
+        H0("                                              BT2100_HLG_YCC:  --colorprim bt2020 --transfer arib-std-b67 --colormatrix bt2020nc --range limited --chromaloc 2\n");
+        H0("                                              BT2100_HLG_RGB:  --colorprim bt2020 --transfer arib-std-b67 --colormatrix gbr --range limited\n");
+        H0("                                              FR709_RGB:       --colorprim bt709 --transfer bt709 --colormatrix gbr --range full\n");
+        H0("                                              FR2020_RGB:      --colorprim bt2020 --transfer bt2020-10 --colormatrix gbr --range full\n");
+        H0("                                              FRP3D65_YCC:     --colorprim smpte432 --transfer bt709 --colormatrix smpte170m --range full --chromaloc 1\n");
+        H0("                                            color-volume options and their corresponding values:\n");
+        H0("                                              P3D65x1000n0005: --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,5)\n");
+        H0("                                              P3D65x4000n005:  --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(40000000,50)\n");
+        H0("                                              BT2100x108n0005: --master-display G(8500,39850)B(6550,2300)R(34000,146000)WP(15635,16450)L(10000000,1)\n");
         H0("   --no-cll                    Emit content light level info SEI. Default %s\n", OPT(param->bEmitCLL));
         H0("   --no-hdr10                  Control dumping of HDR10 SEI packet. If max-cll or master-display has non-zero values, this is enabled. Default %s\n", OPT(param->bEmitHDR10SEI));
         H0("   --no-hdr-opt                Add luma and chroma offsets for HDR/WCG content. Default %s. Now deprecated.\n", OPT(param->bHDROpt));
@@ -327,6 +351,8 @@
         H0("   --no-idr-recovery-sei      Emit recovery point infor SEI at each IDR frame \n");
         H0("   --no-temporal-layers        Enable a temporal sublayer for unreferenced B frames. Default %s\n", OPT(param->bEnableTemporalSubLayers));
         H0("   --no-aud                    Emit access unit delimiters at the start of each access unit. Default %s\n", OPT(param->bEnableAccessUnitDelimiters));
+        H0("   --no-eob                    Emit end of bitstream nal unit at the end of the bitstream. Default %s\n", OPT(param->bEnableEndOfBitstream));
+        H0("   --no-eos                    Emit end of sequence nal unit at the end of every coded video sequence. Default %s\n", OPT(param->bEnableEndOfSequence));
         H1("   --hash <integer>              Decoded Picture Hash SEI 0: disabled, 1: MD5, 2: CRC, 3: Checksum. Default %d\n", param->decodedPictureHashSEI);
         H0("   --atc-sei <integer>           Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled\n");
         H0("   --pic-struct <integer>        Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.\n");
@@ -344,6 +370,7 @@
         H0("   --lowpass-dct                 Use low-pass subband dct approximation. Default %s\n", OPT(param->bLowPassDct));
         H0("   --no-frame-dup              Enable Frame duplication. Default %s\n", OPT(param->bEnableFrameDuplication));
         H0("   --dup-threshold <integer>     PSNR threshold for Frame duplication. Default %d\n", param->dupThreshold);
+        H0("   --no-mcstf                  Enable GOP based temporal filter. Default %d\n", param->bEnableTemporalFilter);
 #ifdef SVT_HEVC
         H0("   --nosvt                     Enable SVT HEVC encoder %s\n", OPT(param->bEnableSvtHevc));
         H0("   --no-svt-hme                Enable Hierarchial motion estimation(HME) in SVT HEVC encoder \n");
@@ -365,6 +392,9 @@
         H1("    2 - unable to open encoder\n");
         H1("    3 - unable to generate stream headers\n");
         H1("    4 - encoder abort\n");
+        H0("\nSEI Message Options\n");
+        H0("   --film-grain <filename>           File containing Film Grain Characteristics to be written as a SEI Message\n");
+
 #undef OPT
 #undef H0
 #undef H1
@@ -1010,57 +1040,57 @@
         return 1;
     }
 
-    /* Parse the RPU file and extract the RPU corresponding to the current picture
-    * and fill the rpu field of the input picture */
-    int CLIOptions::rpuParser(x265_picture * pic)
-    {
-        uint8_t byteVal;
-        uint32_t code = 0;
-        int bytesRead = 0;
-        pic->rpu.payloadSize = 0;
-
-        if (!pic->pts)
-        {
-            while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
-                code = (code << 8) | byteVal;
-
-            if (code != START_CODE)
-            {
-                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in POC %d\n", pic->pts);
-                return 1;
-            }
-        }
-
-        bytesRead = 0;
-        while (fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
-        {
-            code = (code << 8) | byteVal;
-            if (bytesRead++ < 3)
-                continue;
-            if (bytesRead >= 1024)
-            {
-                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts);
-                return 1;
-            }
-
-            if (code != START_CODE)
-                pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
-            else
-                return 0;
-        }
-
-        int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize);
-        int bytesLeft = bytesRead - pic->rpu.payloadSize;
-        code = (code << ShiftBytes * 8);
-        for (int i = 0; i < bytesLeft; i++)
-        {
-            pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
-            code = (code << 8);
-        }
-        if (!pic->rpu.payloadSize)
-            x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts);
-        return 0;
-    }
+    /* Parse the RPU file and extract the RPU corresponding to the current picture
+    * and fill the rpu field of the input picture */
+    int CLIOptions::rpuParser(x265_picture * pic)
+    {
+        uint8_t byteVal;
+        uint32_t code = 0;
+        int bytesRead = 0;
+        pic->rpu.payloadSize = 0;
+
+        if (!pic->pts)
+        {
+            while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
+                code = (code << 8) | byteVal;
+
+            if (code != START_CODE)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in POC %d\n", pic->pts);
+                return 1;
+            }
+        }
+
+        bytesRead = 0;
+        while (fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
+        {
+            code = (code << 8) | byteVal;
+            if (bytesRead++ < 3)
+                continue;
+            if (bytesRead >= 1024)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts);
+                return 1;
+            }
+
+            if (code != START_CODE)
+                pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
+            else
+                return 0;
+        }
+
+        int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize);
+        int bytesLeft = bytesRead - pic->rpu.payloadSize;
+        code = (code << ShiftBytes * 8);
+        for (int i = 0; i < bytesLeft; i++)
+        {
+            pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
+            code = (code << 8);
+        }
+        if (!pic->rpu.payloadSize)
+            x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts);
+        return 0;
+    }
 
 #ifdef __cplusplus
 }

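`rpuParser` in the hunk above scans the RPU file with a 32-bit sliding window: each new byte is shifted into `code`, the byte that falls out of the top is appended to the payload, and the scan stops when the window equals `START_CODE`. The same idea can be sketched on an in-memory buffer as follows. This is an illustration under stated assumptions, not the x265 implementation: `splitOnStartCodes` is a hypothetical name, it returns every payload in one call instead of one per picture, and it does not apply the 1024-byte size limit the original enforces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static const uint32_t kStartCode = 0x00000001; // same value as START_CODE

// Split a buffer that begins with a 4-byte start code into the payloads
// found between consecutive start codes. Returns an empty list if the
// buffer does not begin with a start code.
std::vector<std::vector<uint8_t> > splitOnStartCodes(const std::vector<uint8_t>& data)
{
    std::vector<std::vector<uint8_t> > payloads;
    if (data.size() < 4)
        return payloads;

    uint32_t code = 0;
    std::size_t pos = 0;
    for (; pos < 4; ++pos)
        code = (code << 8) | data[pos];
    if (code != kStartCode)
        return payloads;

    std::vector<uint8_t> current;
    int pending = 0; // bytes held in the window, not yet emitted
    for (; pos < data.size(); ++pos)
    {
        code = (code << 8) | data[pos];
        if (++pending < 4)
            continue; // window not full yet

        if (code == kStartCode)
        {
            // The window holds a start code: the current payload is complete.
            payloads.push_back(current);
            current.clear();
            pending = 0;
        }
        else
        {
            // Emit the oldest byte; the other three stay in the window.
            current.push_back(static_cast<uint8_t>(code >> 24));
            pending = 3;
        }
    }

    // Flush the bytes still held in the window at end of input.
    for (int i = pending - 1; i >= 0; --i)
        current.push_back(static_cast<uint8_t>((code >> (8 * i)) & 0xFF));
    if (!current.empty())
        payloads.push_back(current);
    return payloads;
}
```

Holding up to three bytes back is what lets the scan recognise a start code before any of its bytes are written into the payload; the final flush loop plays the role of the `bytesLeft` loop in the original.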
x265_3.5.tar.gz/source/x265cli.h Changed

@@ -143,7 +143,6 @@
     { "scenecut-bias",  required_argument, NULL, 0 },
     { "hist-scenecut",        no_argument, NULL, 0},
     { "no-hist-scenecut",     no_argument, NULL, 0},
-    { "hist-threshold", required_argument, NULL, 0},
     { "fades",                no_argument, NULL, 0 },
     { "no-fades",             no_argument, NULL, 0 },
     { "scenecut-aware-qp", required_argument, NULL, 0 },
@@ -182,6 +181,8 @@
     { "qp",             required_argument, NULL, 'q' },
     { "aq-mode",        required_argument, NULL, 0 },
     { "aq-strength",    required_argument, NULL, 0 },
+    { "sbrc",                 no_argument, NULL, 0 },
+    { "no-sbrc",              no_argument, NULL, 0 },
     { "rc-grain",             no_argument, NULL, 0 },
     { "no-rc-grain",          no_argument, NULL, 0 },
     { "ipratio",        required_argument, NULL, 0 },
@@ -244,6 +245,7 @@
     { "crop-rect",      required_argument, NULL, 0 }, /* DEPRECATED */
     { "master-display", required_argument, NULL, 0 },
     { "max-cll",        required_argument, NULL, 0 },
+    {"video-signal-type-preset", required_argument, NULL, 0 },
     { "min-luma",       required_argument, NULL, 0 },
     { "max-luma",       required_argument, NULL, 0 },
     { "log2-max-poc-lsb", required_argument, NULL, 8 },
@@ -263,6 +265,10 @@
     { "repeat-headers",       no_argument, NULL, 0 },
     { "aud",                  no_argument, NULL, 0 },
     { "no-aud",               no_argument, NULL, 0 },
+    { "eob",                  no_argument, NULL, 0 },
+    { "no-eob",               no_argument, NULL, 0 },
+    { "eos",                  no_argument, NULL, 0 },
+    { "no-eos",               no_argument, NULL, 0 },
     { "info",                 no_argument, NULL, 0 },
     { "no-info",              no_argument, NULL, 0 },
     { "zones",          required_argument, NULL, 0 },
@@ -349,6 +355,8 @@
     { "frame-dup",            no_argument, NULL, 0 },
     { "no-frame-dup", no_argument, NULL, 0 },
     { "dup-threshold", required_argument, NULL, 0 },
+    { "mcstf",                 no_argument, NULL, 0 },
+    { "no-mcstf",              no_argument, NULL, 0 },
 #ifdef SVT_HEVC
     { "svt",     no_argument, NULL, 0 },
     { "no-svt",  no_argument, NULL, 0 },
@@ -373,6 +381,7 @@
     { "abr-ladder", required_argument, NULL, 0 },
     { "min-vbv-fullness", required_argument, NULL, 0 },
     { "max-vbv-fullness", required_argument, NULL, 0 },
+    { "film-grain", required_argument, NULL, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },