The diffs of some files are truncated because they were too big.
Changes of Revision 24
x265.changes
Changed
@@ -1,4 +1,57 @@
 -------------------------------------------------------------------
+Thu Jul 27 08:33:52 UTC 2017 - joerg.lorenzen@ki.tng.de
+
+- Update to version 2.5
+ Encoder enhancements
+ * Improved grain handling with --tune grain option by throttling
+ VBV operations to limit QP jumps.
+ * Frame threads are now decided based on number of threads
+ specified in the --pools, as opposed to the number of hardware
+ threads available. The mapping was also adjusted to improve
+ quality of the encodes with minimal impact to performance.
+ * CSV logging feature (enabled by --csv) is now part of the
+ library; it was previously part of the x265 application.
+ Applications that integrate libx265 can now extract frame level
+ statistics for their encodes by exercising this option in the
+ library.
+ * Globals that track min and max CU sizes, number of slices, and
+ other parameters have now been moved into instance-specific
+ variables. Consequently, applications that invoke multiple
+ instances of x265 library are no longer restricted to use the
+ same settings for these parameter options across the multiple
+ instances.
+ * x265 can now generate a seprate library that exports the HDR10+
+ parsing API. Other libraries that wish to use this API may do
+ so by linking against this library. Enable ENABLE_HDR10_PLUS in
+ CMake options and build to generate this library.
+ * SEA motion search receives a 10% performance boost from AVX2
+ optimization of its kernels.
+ * The CSV log is now more elaborate with additional fields such
+ as PU statistics, average-min-max luma and chroma values, etc.
+ Refer to documentation of --csv for details of all fields.
+ * x86inc.asm cleaned-up for improved instruction handling.
+ API changes
+ * New API x265_encoder_ctu_info() introduced to specify suggested
+ partition sizes for various CTUs in a frame. To be used in
+ conjunction with --ctu-info to react to the specified
+ partitions appropriately.
+ * Rate-control statistics passed through the x265_picture object
+ for an incoming frame are now used by the encoder.
+ * Options to scale, reuse, and refine analysis for incoming
+ analysis shared through the x265_analysis_data field in
+ x265_picture for runs that use --analysis-reuse-mode load; use
+ options --scale, --refine-mv, --refine-inter, and
+ --refine-intra to explore.
+ * VBV now has a deterministic mode. Use --const-vbv to exercise.
+ Bug fixes
+ * Several fixes for HDR10+ parsing code including incompatibility
+ with user-specific SEI, removal of warnings, linking issues in
+ linux, etc.
+ * SEI messages for HDR10 repeated every keyint when HDR options
+ (--hdr-opt, --master-display) specified.
+- soname bump to 130.
+
+-------------------------------------------------------------------
 Thu Apr 27 14:15:13 UTC 2017 - joerg.lorenzen@ki.tng.de

 - Update to version 2.4
x265.spec
Changed
@@ -1,10 +1,10 @@
 # based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/
 Name: x265
-%define soname 116
+%define soname 130
 %define libname lib%{name}
 %define libsoname %{libname}-%{soname}
-Version: 2.4
+Version: 2.5
 Release: 0
 License: GPL-2.0+
 Summary: A free h265/HEVC encoder - encoder binary
baselibs.conf
Changed
@@ -1,1 +1,1 @@
-libx265-116
+libx265-130
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.cpp
Deleted
@@ -1,40 +0,0 @@
-/**
- * @file BasicStructures.cpp
- * @brief Defines the structure of metadata parameters
- * @author Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date 03/01/2017
- * @version 0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
- float averageLuminance = 0.0;
- float maxRLuminance = 0.0;
- float maxGLuminance = 0.0;
- float maxBLuminance = 0.0;
- int order;
- std::vector<unsigned int> percentiles;
-};
-
-
-
x265_2.4.tar.gz/.hg_archival.txt -> x265_2.5.tar.gz/.hg_archival.txt
Changed
@@ -1,4 +1,4 @@
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: e7a4dd48293b7956d4a20df257d23904cc78e376
+node: 64b2d0bf45a52511e57a6b7299160b961ca3d51c
 branch: stable
-tag: 2.4
+tag: 2.5
x265_2.4.tar.gz/.hgtags -> x265_2.5.tar.gz/.hgtags
Changed
@@ -22,3 +22,4 @@
 981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
 be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
 3037c1448549ca920967831482c653e5892fa8ed 2.3
+e7a4dd48293b7956d4a20df257d23904cc78e376 2.4
x265_2.4.tar.gz/doc/reST/api.rst -> x265_2.5.tar.gz/doc/reST/api.rst
Changed
@@ -192,6 +192,12 @@ * presets is not recommended without a more fine-grained breakdown of * parameters to take this into account. */ int x265_encoder_reconfig(x265_encoder *, x265_param *); +**x265_encoder_ctu_info** + /* x265_encoder_ctu_info: + * Copy CTU information such as ctu address and ctu partition structure of all + * CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and + * the encoder will wait for this copy to complete if enabled. + */ Pictures ======== @@ -341,6 +347,14 @@ Cleanup ======= +At the end of the encode, the application will want to trigger logging +of the final encode statistics, if :option:`--csv` had been specified:: + + /* x265_encoder_log: + * write a line to the configured CSV file. If a CSV filename was not + * configured, or file open failed, this function will perform no write. */ + void x265_encoder_log(x265_encoder *encoder, int argc, char **argv); + Finally, the encoder must be closed in order to free all of its resources. An encoder that has been flushed cannot be restarted and reused. Once **x265_encoder_close()** has been called, the encoder
x265_2.4.tar.gz/doc/reST/cli.rst -> x265_2.5.tar.gz/doc/reST/cli.rst
Changed
@@ -52,8 +52,7 @@ 2. unable to open encoder 3. unable to generate stream headers 4. encoder abort - 5. unable to open csv file - + Logging/Statistic Options ========================= @@ -83,9 +82,66 @@ it adds one line per run. If :option:`--csv-log-level` is greater than 0, it writes one line per frame. Default none - Several frame performance statistics are available when - :option:`--csv-log-level` is greater than or equal to 2: - + The following statistics are available when :option:`--csv-log-level` is + greater than or equal to 1: + + **Encode Order** The frame order in which the encoder encodes. + + **Type** Slice type of the frame. + + **POC** Picture Order Count - The display order of the frames. + + **QP** Quantization Parameter decided for the frame. + + **Bits** Number of bits consumed by the frame. + + **Scenecut** 1 if the frame is a scenecut, 0 otherwise. + + **RateFactor** Applicable only when CRF is enabled. The rate factor depends + on the CRF given by the user. This is used to determine the QP so as to + target a certain quality. + + **BufferFill** Bits available for the next frame. Includes bits carried + over from the current frame. + + **Latency** Latency in terms of number of frames between when the frame + was given in and when the frame is given out. + + **PSNR** Peak signal to noise ratio for Y, U and V planes. + + **SSIM** A quality metric that denotes the structural similarity between frames. + + **Ref lists** POC of references in lists 0 and 1 for the frame. + + Several statistics about the encoded bitstream and encoder performance are + available when :option:`--csv-log-level` is greater than or equal to 2: + + **I/P cost ratio:** The ratio between the cost when a frame is decided as an + I frame to that when it is decided as a P frame as computed from the + quarter-resolution frame in look-ahead. This, in combination with other parameters + such as position of the frame in the GOP, is used to decide scene transitions. 
+ + **Analysis statistics:** + + **CU Statistics** percentage of CU modes. + + **Distortion** Average luma and chroma distortion. Calculated as + SSE is done on fenc and recon(after quantization). + + **Psy Energy** Average psy energy calculated as the sum of absolute + difference between source and recon energy. Energy is measured by sa8d + minus SAD. + + **Residual Energy** Average residual energy. SSE is calculated on fenc + and pred(before quantization). + + **Luma/Chroma Values** minumum, maximum and average(averaged by area) + luma and chroma values of source for each frame. + + **PU Statistics** percentage of PU modes at each depth. + + **Performance statistics:** + **DecideWait ms** number of milliseconds the frame encoder had to wait, since the previous frame was retrieved by the API thread, before a new frame has been given to it. This is the latency @@ -111,6 +167,8 @@ **Stall Time ms** the number of milliseconds of the reported wall time that were spent with zero worker threads, aka all compression was completely stalled. + + **Total frame time** Total time spent to encode the frame. **Avg WPP** the average number of worker threads working on this frame, at any given time. This value is sampled at the completion of @@ -123,8 +181,6 @@ is more of a problem for P frames where some blocks are much more expensive than others. - **CLI ONLY** - .. option:: --csv-log-level <integer> Controls the level of detail (and size) of --csv log files @@ -133,8 +189,6 @@ 1. frame level logging 2. frame level logging with performance statistics - **CLI ONLY** - .. option:: --ssim, --no-ssim Calculate and report Structural Similarity values. It is @@ -795,33 +849,31 @@ Analysis re-use options, to improve performance when encoding the same sequence multiple times (presumably at varying bitrates). The encoder -will not reuse analysis if the resolution and slice type parameters do -not match. +will not reuse analysis if slice type parameters do not match. -.. 
option:: --analysis-mode <string|int> +.. option:: --analysis-reuse-mode <string|int> - Specify whether analysis information of each frame is output by encoder - or input for reuse. By reading the analysis data writen by an - earlier encode of the same sequence, substantial redundant work may - be avoided. - - The following data may be stored and reused: - I frames - split decisions and luma intra directions of all CUs. - P/B frames - motion vectors are dumped at each depth for all CUs. + This option allows reuse of analysis information from first pass to second pass. + :option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame. + :option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass. + There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is + written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused + is determined by :option:`--analysis-reuse-level`. By reading the analysis data writen by an earlier encode + of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0. **Values:** off(0), save(1): dump analysis data, load(2): read analysis data -.. option:: --analysis-file <filename> +.. option:: --analysis-reuse-file <filename> - Specify a filename for analysis data (see :option:`--analysis-mode`) + Specify a filename for analysis data (see :option:`--analysis-reuse-mode`) If no filename is specified, x265_analysis.dat is used. -.. option:: --refine-level <1..10> +.. option:: --analysis-reuse-level <1..10> - Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels. + Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels. Higher the value, higher the information stored/reused, faster the encode. Default 5. 
- Note that --refine-level must be paired with analysis-mode. + Note that --analysis-reuse-level must be paired with analysis-reuse-mode. +--------+-----------------------------------------+ | Level | Description | @@ -835,6 +887,41 @@ | 10 | Level 5 + Full CU analysis-info | +--------+-----------------------------------------+ +.. option:: --scale-factor + + Factor by which input video is scaled down for analysis save mode. + This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10. + The ctu size of load should be double the size of save. Default 0. + +.. option:: --refine-intra <0|1|2> + + Enables refinement of intra blocks in current encode. + + Level 0 - Forces both mode and depth from the previous encode. + + Level 1 - Evaluates all intra modes for blocks of size one smaller than + the min-cu-size of the incoming analysis data from the previous encode, + forces modes for blocks of larger size. + + Level 2 - Evaluates all intra modes for blocks of size one smaller than + the min-cu-size of the incoming analysis data from the previous encode. + For larger blocks, force only depth when angular mode is chosen by the + previous encode, force depth and mode when other intra modes are chosen. + + Default 0. + +.. option:: --refine-inter-depth + + Enables refinement of inter blocks in current encode. Evaluates all + inter modes for blocks of size one smaller than the min-cu-size of the + incoming analysis data from the previous encode. Default disabled. + +.. option:: --refine-mv + + Enables refinement of motion vector for scaled video. Evaluates the best + motion vector by searching the surrounding eight integer and subpel pixel + positions. + Options which affect the transform unit quad-tree, sometimes referred to as the residual quad-tree (RQT). @@ -1221,7 +1308,16 @@ intra cost of a frame used in scenecut detection. 
For example, a value of 5 indicates, if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,
x265_2.4.tar.gz/doc/reST/releasenotes.rst -> x265_2.5.tar.gz/doc/reST/releasenotes.rst
Changed
@@ -2,8 +2,33 @@ Release Notes ************* -Release Notes -************* +Version 2.5 +=========== + +Release date - 13th July, 2017. + +Encoder enhancements +-------------------- +1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps. +2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance. +3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library. +4. Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances. +5. x265 can now generate a seprate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library. +6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels. +7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields. +8. x86inc.asm cleaned-up for improved instruction handling. + +API changes +----------- +1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately. +2. 
Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder. +3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore. +4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise. + +Bug fixes +--------- +1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc. +2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified. Version 2.4 ===========
x265_2.4.tar.gz/source/CMakeLists.txt -> x265_2.5.tar.gz/source/CMakeLists.txt
Changed
@@ -29,7 +29,7 @@ option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF) mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD) # X265_BUILD must be incremented each time the public API is changed -set(X265_BUILD 116) +set(X265_BUILD 130) configure_file("${PROJECT_SOURCE_DIR}/x265.def.in" "${PROJECT_BINARY_DIR}/x265.def") configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in" @@ -182,12 +182,19 @@ add_definitions(-O3 -qstrict -qhot -qaltivec) add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/) endif() - - +# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation +option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF) if(GCC) add_definitions(-Wall -Wextra -Wshadow) add_definitions(-D__STDC_LIMIT_MACROS=1) - add_definitions(-std=gnu++98) + if(ENABLE_HDR10_PLUS) + if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8") + message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus") + endif() + add_definitions(-std=gnu++11) + else() + add_definitions(-std=gnu++98) + endif() if(ENABLE_PIC) add_definitions(-fPIC) endif(ENABLE_PIC) @@ -363,14 +370,12 @@ else(HIGH_BIT_DEPTH) add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8) endif(HIGH_BIT_DEPTH) -# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation -option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF) -if (ENABLE_DYNAMIC_HDR10) - add_subdirectory(dynamicHDR10) - include_directories(dynamicHDR10) - add_definitions(-DENABLE_DYNAMIC_HDR10) -endif(ENABLE_DYNAMIC_HDR10) +if (ENABLE_HDR10_PLUS) + include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}") + add_subdirectory(dynamicHDR10) + add_definitions(-DENABLE_HDR10_PLUS) +endif(ENABLE_HDR10_PLUS) # this option can only be used when linking multiple libx265 libraries # together, and some alternate API access method is implemented. 
option(EXPORT_C_API "Implement public C programming interface" ON) @@ -510,8 +515,10 @@ endif() endif() source_group(ASM FILES ${ASM_SRCS}) -if(ENABLE_DYNAMIC_HDR10) +if(ENABLE_HDR10_PLUS) add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS}) + add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>) + set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus) else() add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS}) endif() @@ -524,6 +531,12 @@ install(TARGETS x265-static LIBRARY DESTINATION ${LIB_INSTALL_DIR} ARCHIVE DESTINATION ${LIB_INSTALL_DIR}) + +if(ENABLE_HDR10_PLUS) + install(TARGETS hdr10plus-static + LIBRARY DESTINATION ${LIB_INSTALL_DIR} + ARCHIVE DESTINATION ${LIB_INSTALL_DIR}) +endif() install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include) if(CMAKE_RC_COMPILER) @@ -547,10 +560,16 @@ endif() option(ENABLE_SHARED "Build shared library" ON) if(ENABLE_SHARED) - - if(ENABLE_DYNAMIC_HDR10) + if(ENABLE_HDR10_PLUS) add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS} ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>) + add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>) + + if(MSVC) + set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus) + else() + set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus) + endif() else() add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS} ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>) @@ -585,6 +604,11 @@ ARCHIVE DESTINATION ${LIB_INSTALL_DIR} RUNTIME DESTINATION ${BIN_INSTALL_DIR}) endif() + if(ENABLE_HDR10_PLUS) + install(TARGETS hdr10plus-shared + LIBRARY DESTINATION ${LIB_INSTALL_DIR} + ARCHIVE DESTINATION ${LIB_INSTALL_DIR}) + endif() if(LINKER_OPTIONS) # 
set_target_properties can't do list expansion string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}") @@ -646,18 +670,18 @@ endif(WIN32) if(XCODE) # Xcode seems unable to link the CLI with libs, so link as one targget - if(ENABLE_DYNAMIC_HDR10) + if(ENABLE_HDR10_PLUS) add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} - x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp + x265.cpp x265.h x265cli.h $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS}) else() add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} - x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp + x265.cpp x265.h x265cli.h $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS}) endif() else() add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE} - ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp) + ${ExportDefs} x265.cpp x265.h x265cli.h) if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX) # The CLI cannot link to the shared library on Windows, it # requires internal APIs not exported from the DLL
x265_2.4.tar.gz/source/common/CMakeLists.txt -> x265_2.5.tar.gz/source/common/CMakeLists.txt
Changed
@@ -57,10 +57,10 @@ set(VEC_PRIMITIVES vec/vec-primitives.cpp ${PRIMITIVES}) source_group(Intrinsics FILES ${VEC_PRIMITIVES}) - set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h) + set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h seaintegral.h) set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm mc-a2.asm pixel-util8.asm blockcopy8.asm - pixeladd8.asm dct8.asm) + pixeladd8.asm dct8.asm seaintegral.asm) if(HIGH_BIT_DEPTH) set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm ipfilter16.asm loopfilter.asm) else()
x265_2.4.tar.gz/source/common/common.h -> x265_2.5.tar.gz/source/common/common.h
Changed
@@ -259,7 +259,6 @@
 #define LOG2_RASTER_SIZE (MAX_LOG2_CU_SIZE - LOG2_UNIT_SIZE)
 #define RASTER_SIZE (1 << LOG2_RASTER_SIZE)
 #define MAX_NUM_PARTITIONS (RASTER_SIZE * RASTER_SIZE)
-#define NUM_4x4_PARTITIONS (1U << (g_unitSizeDepth << 1)) // number of 4x4 units in max CU size

 #define MIN_PU_SIZE 4
 #define MIN_TU_SIZE 4
x265_2.4.tar.gz/source/common/constants.cpp -> x265_2.5.tar.gz/source/common/constants.cpp
Changed
@@ -161,7 +161,6 @@
 65535
 };

-int g_ctuSizeConfigured = 0;
 uint32_t g_maxLog2CUSize = MAX_LOG2_CU_SIZE;
 uint32_t g_maxCUSize = MAX_CU_SIZE;
 uint32_t g_unitSizeDepth = NUM_CU_DEPTH;
x265_2.4.tar.gz/source/common/constants.h -> x265_2.5.tar.gz/source/common/constants.h
Changed
@@ -30,8 +30,6 @@
 namespace X265_NS {
 // private namespace

-extern int g_ctuSizeConfigured;
-
 extern double x265_lambda_tab[QP_MAX_MAX + 1];
 extern double x265_lambda2_tab[QP_MAX_MAX + 1];
 extern const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET + 1];
x265_2.4.tar.gz/source/common/cpu.cpp -> x265_2.5.tar.gz/source/common/cpu.cpp
Changed
@@ -69,6 +69,7 @@
 { "SSE2Slow", SSE2 | X265_CPU_SSE2_IS_SLOW },
 { "SSE2", SSE2 },
 { "SSE2Fast", SSE2 | X265_CPU_SSE2_IS_FAST },
+ { "LZCNT", X265_CPU_LZCNT },
 { "SSE3", SSE2 | X265_CPU_SSE3 },
 { "SSSE3", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
 { "SSE4.1", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },
@@ -78,16 +79,17 @@
 { "AVX", AVX },
 { "XOP", AVX | X265_CPU_XOP },
 { "FMA4", AVX | X265_CPU_FMA4 },
- { "AVX2", AVX | X265_CPU_AVX2 },
 { "FMA3", AVX | X265_CPU_FMA3 },
+ { "BMI1", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+ { "BMI2", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+ { "AVX2", AVX2},
+#undef AVX2
 #undef AVX
 #undef SSE2
 #undef MMX2
 { "Cache32", X265_CPU_CACHELINE_32 },
 { "Cache64", X265_CPU_CACHELINE_64 },
- { "LZCNT", X265_CPU_LZCNT },
- { "BMI1", X265_CPU_BMI1 },
- { "BMI2", X265_CPU_BMI1 | X265_CPU_BMI2 },
 { "SlowCTZ", X265_CPU_SLOW_CTZ },
 { "SlowAtom", X265_CPU_SLOW_ATOM },
 { "SlowPshufb", X265_CPU_SLOW_PSHUFB },
x265_2.4.tar.gz/source/common/cudata.cpp -> x265_2.5.tar.gz/source/common/cudata.cpp
Changed
@@ -28,6 +28,7 @@ #include "picyuv.h" #include "mv.h" #include "cudata.h" +#define MAX_MV 1 << 14 using namespace X265_NS; @@ -110,25 +111,23 @@ } -cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL }; -uint32_t CUData::s_numPartInCUSize; - CUData::CUData() { memset(this, 0, sizeof(*this)); } -void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance) +void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance) { + int csp = param.internalCsp; m_chromaFormat = csp; m_hChromaShift = CHROMA_H_SHIFT(csp); m_vChromaShift = CHROMA_V_SHIFT(csp); - m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2); + m_numPartitions = param.num4x4Partitions >> (depth * 2); if (!s_partSet[0]) { - s_numPartInCUSize = 1 << g_unitSizeDepth; - switch (g_maxLog2CUSize) + s_numPartInCUSize = 1 << param.unitSizeDepth; + switch (param.maxLog2CUSize) { case 6: s_partSet[0] = bcast256; @@ -220,7 +219,7 @@ m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions; - uint32_t cuSize = g_maxCUSize >> depth; + uint32_t cuSize = param.maxCUSize >> depth; m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize); m_trCoeff[1] = m_trCoeff[2] = 0; m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0; @@ -262,7 +261,7 @@ m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions; - uint32_t cuSize = g_maxCUSize >> depth; + uint32_t cuSize = param.maxCUSize >> depth; uint32_t sizeL = cuSize * cuSize; uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2); @@ -278,17 +277,17 @@ m_encData = frame.m_encData; m_slice = m_encData->m_slice; m_cuAddr = cuAddr; - m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize; - m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize; + m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) 
<< m_slice->m_param->maxLog2CUSize; + m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize; m_absIdxInCTU = 0; - m_numPartitions = NUM_4x4_PARTITIONS; + m_numPartitions = m_encData->m_param->num4x4Partitions; m_bFirstRowInSlice = (uint8_t)firstRowInSlice; m_bLastRowInSlice = (uint8_t)lastRowInSlice; m_bLastCuInSlice = (uint8_t)lastCuInSlice; /* sequential memsets */ m_partSet((uint8_t*)m_qp, (uint8_t)qp); - m_partSet(m_log2CUSize, (uint8_t)g_maxLog2CUSize); + m_partSet(m_log2CUSize, (uint8_t)m_slice->m_param->maxLog2CUSize); m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX); m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX); m_partSet(m_tqBypass, (uint8_t)frame.m_encData->m_param->bLossless); @@ -390,7 +389,7 @@ memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t)); - uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2); + uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2); uint32_t tmp2 = subPartIdx * tmp; memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp); @@ -489,7 +488,7 @@ memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t)); - uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2); + uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2); uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2); memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY); @@ -568,7 +567,7 @@ m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth); m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]); - uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2); + uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2); uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2); memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY); @@ -656,7 +655,7 @@ return m_cuLeft; } - alPartUnitIdx = NUM_4x4_PARTITIONS - 1; + alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1; return m_cuAboveLeft; } 
@@ -799,7 +798,7 @@ /* Get left QpMinCu */ const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const { - uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2); + uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2); uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx]; // check for left CTU boundary @@ -816,7 +815,7 @@ /* Get above QpMinCu */ const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const { - uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2); + uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2); uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx]; // check for top CTU boundary @@ -855,7 +854,7 @@ int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const { - uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2; + uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2; int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask); if (lastValidPartIdx >= 0) @@ -865,7 +864,7 @@ if (m_absIdxInCTU) return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU); else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth))) - return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS); + return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions); else return (int8_t)m_slice->m_sliceQp; } @@ -997,7 +996,7 @@ bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth) { - uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1); + uint32_t curPartNumb = 
m_encData->m_param->num4x4Partitions >> (depth << 1); uint32_t curPartNumQ = curPartNumb >> 2; if (m_cuDepth[absPartIdx] > depth) @@ -1623,6 +1622,11 @@ dir |= (1 << list); candMvField[count][list].mv = colmv; candMvField[count][list].refIdx = refIdx; + if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4) + { + MV dist(MAX_MV, MAX_MV); + candMvField[count][list].mv = dist; + } } } @@ -1783,7 +1787,13 @@ int curRefPOC = m_slice->m_refPOCList[picList][refIdx]; int curPOC = m_slice->m_poc; - pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC); + if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4)) + { + MV dist(MAX_MV, MAX_MV); + pmv[numMvc++] = amvpCand[num++] = dist; + } + else + pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC); } } @@ -1905,10 +1915,10 @@ uint32_t offset = 8; int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift); - int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift); + int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift); int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift); - int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift); + int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift); outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x)); outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));
x265_2.4.tar.gz/source/common/cudata.h -> x265_2.5.tar.gz/source/common/cudata.h
Changed
@@ -161,8 +161,8 @@ { public: - static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth - static uint32_t s_numPartInCUSize; + cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth + uint32_t s_numPartInCUSize; bool m_vbvAffected; @@ -225,7 +225,7 @@ CUData(); - void initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance); + void initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance); static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]); void initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice); @@ -271,7 +271,7 @@ void getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const; uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) | (((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); } - uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; } + uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; } uint32_t getNumPartInter(uint32_t absPartIdx) const { return nbPartsTable[(int)m_partSize[absPartIdx]]; } bool isIntra(uint32_t absPartIdx) const { return m_predMode[absPartIdx] == MODE_INTRA; } @@ -285,7 +285,7 @@ void getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const; int getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const; - uint32_t getSCUAddr() const { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; } + uint32_t getSCUAddr() const { return 
(m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; } uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const; uint32_t getCtxSkipFlag(uint32_t absPartIdx) const; void getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const; @@ -350,10 +350,10 @@ CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; } - bool create(uint32_t depth, uint32_t csp, uint32_t numInstances) + bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param) { - uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2); - uint32_t cuSize = g_maxCUSize >> depth; + uint32_t numPartition = param.num4x4Partitions >> (depth * 2); + uint32_t cuSize = param.maxCUSize >> depth; uint32_t sizeL = cuSize * cuSize; if (csp == X265_CSP_I400) {
x265_2.4.tar.gz/source/common/frame.cpp -> x265_2.5.tar.gz/source/common/frame.cpp
Changed
@@ -48,6 +48,11 @@ m_rcData = NULL; m_encodeStartTime = 0; m_reconfigureRc = false; + m_ctuInfo = NULL; + m_prevCtuInfoChange = NULL; + m_addOnDepth = NULL; + m_addOnCtuInfo = NULL; + m_addOnPrevChange = NULL; } bool Frame::create(x265_param *param, float* quantOffsets) @@ -56,11 +61,26 @@ m_param = param; CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1); - if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) && - m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize)) + if (param->bCTUInfo) + { + uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize; + uint32_t heightInCTU = (m_param->sourceHeight + param->maxCUSize - 1) >> m_param->maxLog2CUSize; + uint32_t numCTUsInFrame = widthInCTU * heightInCTU; + CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame); + CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame); + CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame); + for (uint32_t i = 0; i < numCTUsInFrame; i++) + { + CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions)); + CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions)); + CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions)); + } + } + + if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize)) { X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized"); - m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize; + m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1) / param->maxCUSize; m_reconRowFlag = new ThreadSafeInteger[m_numRows]; m_reconColCount = new ThreadSafeInteger[m_numRows]; @@ -86,12 +106,12 @@ m_reconPic = new PicYuv; m_param = param; m_encData->m_reconPic = m_reconPic; - bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && 
m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp); + bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param); if (ok) { /* initialize right border of m_reconpicYuv as SAO may read beyond the * end of the picture accessing uninitialized pixels */ - int maxHeight = sps.numCuInHeight * g_maxCUSize; + int maxHeight = sps.numCuInHeight * param->maxCUSize; memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight); /* use pre-calculated cu/pu offsets cached in the SPS structure */ @@ -166,6 +186,35 @@ delete[] m_userSEI.payloads; } + if (m_ctuInfo) + { + uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize; + uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize; + uint32_t numCUsInFrame = widthInCU * heightInCU; + for (uint32_t i = 0; i < numCUsInFrame; i++) + { + X265_FREE((*m_ctuInfo + i)->ctuInfo); + (*m_ctuInfo + i)->ctuInfo = NULL; + X265_FREE(m_addOnDepth[i]); + m_addOnDepth[i] = NULL; + X265_FREE(m_addOnCtuInfo[i]); + m_addOnCtuInfo[i] = NULL; + X265_FREE(m_addOnPrevChange[i]); + m_addOnPrevChange[i] = NULL; + } + X265_FREE(*m_ctuInfo); + *m_ctuInfo = NULL; + X265_FREE(m_ctuInfo); + m_ctuInfo = NULL; + X265_FREE(m_prevCtuInfoChange); + m_prevCtuInfoChange = NULL; + X265_FREE(m_addOnDepth); + m_addOnDepth = NULL; + X265_FREE(m_addOnCtuInfo); + m_addOnCtuInfo = NULL; + X265_FREE(m_addOnPrevChange); + m_addOnPrevChange = NULL; + } m_lowres.destroy(); X265_FREE(m_rcData); }
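The `bCTUInfo` branch in this diff sizes its per-CTU arrays by rounding the picture dimensions up to whole CTUs. A minimal sketch of that arithmetic, assuming `maxCUSize` is a power of two and `maxLog2CUSize` is its base-2 logarithm; the function name `numCtusInFrame` is hypothetical and not part of x265:

```cpp
#include <cstdint>

// Hypothetical helper (not an x265 function): number of CTUs covering a
// picture, rounding partial CTUs at the right and bottom edges up to one.
uint32_t numCtusInFrame(uint32_t width, uint32_t height,
                        uint32_t maxCUSize, uint32_t maxLog2CUSize)
{
    // Adding (maxCUSize - 1) before the shift performs a ceiling division.
    uint32_t widthInCTU  = (width  + maxCUSize - 1) >> maxLog2CUSize;
    uint32_t heightInCTU = (height + maxCUSize - 1) >> maxLog2CUSize;
    return widthInCTU * heightInCTU;
}
```

For a 1920x1080 picture with 64x64 CTUs this yields 30 columns and 17 rows, because the last row of CTUs is only partially covered by the picture.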
x265_2.4.tar.gz/source/common/frame.h -> x265_2.5.tar.gz/source/common/frame.h
Changed
@@ -66,6 +66,10 @@ double shortTermCplxCount; int64_t totalBits; int64_t encodedBits; + double coeff[4]; + double count[4]; + double offset[4]; + double bufferFillFinal; }; class Frame @@ -108,7 +112,14 @@ x265_analysis_2Pass m_analysis2Pass; RcStats* m_rcData; + x265_ctu_info_t** m_ctuInfo; + Event m_copied; + int* m_prevCtuInfoChange; int64_t m_encodeStartTime; + + uint8_t** m_addOnDepth; + uint8_t** m_addOnCtuInfo; + int** m_addOnPrevChange; Frame(); bool create(x265_param *param, float* quantOffsets);
x265_2.4.tar.gz/source/common/framedata.cpp -> x265_2.5.tar.gz/source/common/framedata.cpp
Changed
@@ -41,9 +41,9 @@
     if (param.rc.bStatWrite)
         m_spsrps = const_cast<RPS*>(sps.spsrps);
 
-    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame);
+    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame, param);
     for (uint32_t ctuAddr = 0; ctuAddr < sps.numCUsInFrame; ctuAddr++)
-        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param.internalCsp, ctuAddr);
+        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param, ctuAddr);
 
     CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
     CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);
x265_2.4.tar.gz/source/common/framedata.h -> x265_2.5.tar.gz/source/common/framedata.h
Changed
@@ -62,6 +62,7 @@ double percentMergeCu[NUM_CU_DEPTH]; double percentIntraDistribution[NUM_CU_DEPTH][INTRA_MODES]; double percentInterDistribution[NUM_CU_DEPTH][3]; // 2Nx2N, RECT, AMP modes percentage + double ipCostRatio; uint64_t cntIntraNxN; uint64_t totalCu; @@ -78,6 +79,15 @@ uint64_t cuInterDistribution[NUM_CU_DEPTH][INTER_MODES]; uint64_t cuIntraDistribution[NUM_CU_DEPTH][INTRA_MODES]; + + uint64_t totalPu[NUM_CU_DEPTH + 1]; + uint64_t cntSkipPu[NUM_CU_DEPTH]; + uint64_t cntIntraPu[NUM_CU_DEPTH]; + uint64_t cntAmp[NUM_CU_DEPTH]; + uint64_t cnt4x4; + uint64_t cntInterPu[NUM_CU_DEPTH][INTER_MODES - 1]; + uint64_t cntMergePu[NUM_CU_DEPTH][INTER_MODES - 1]; + FrameStats() { memset(this, 0, sizeof(FrameStats));
x265_2.4.tar.gz/source/common/ipfilter.cpp -> x265_2.5.tar.gz/source/common/ipfilter.cpp
Changed
@@ -123,9 +123,8 @@
     const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     int blkheight = height;
-
     src -= N / 2 - 1;
 
     if (isRowExt)
@@ -209,10 +208,8 @@
     const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
-
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     src -= (N / 2 - 1) * srcStride;
-
     int row, col;
     for (row = 0; row < height; row++)
     {
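The `(unsigned)` cast in this change appears to target the left shift of a negative signed value, which is undefined behavior in C++ before C++20. Shifting the unsigned representation instead is well defined (it wraps modulo 2^32), and converting back to `int` gives the expected negative offset on two's-complement targets. A minimal sketch of the idea; the constant values below (14-bit internal precision, 6-bit filter precision) are assumptions standing in for `IF_INTERNAL_PREC`, `IF_FILTER_PREC` and `IF_INTERNAL_OFFS`, and `filterOffset` is a hypothetical name:

```cpp
// Assumed stand-ins for the x265 interpolation constants; the real values
// live in the x265 headers and are not shown in this diff.
constexpr int kInternalPrec = 14;
constexpr int kFilterPrec   = 6;
constexpr int kInternalOffs = 1 << (kInternalPrec - 1);

// Hypothetical helper: computes the rounding offset without ever
// left-shifting a negative signed integer.
int filterOffset(int bitDepth)
{
    int headRoom = kInternalPrec - bitDepth;
    int shift    = kFilterPrec - headRoom;
    // Negate first, reinterpret as unsigned, then shift: unsigned shifts wrap
    // modulo 2^32. The conversion back to int is implementation-defined
    // before C++20 but yields the two's-complement value on common compilers.
    return static_cast<int>(static_cast<unsigned>(-kInternalOffs) << shift);
}
```

Under these assumed constants, an 8-bit build shifts by 0 and a 10-bit build shifts by 2, so the offsets are the same values the original expression was intended to produce.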
x265_2.4.tar.gz/source/common/lowres.h -> x265_2.5.tar.gz/source/common/lowres.h
Changed
@@ -118,6 +118,8 @@
     bool bKeyframe;
     bool bLastMiniGopBFrame;
 
+    double ipCostRatio;
+
     /* lookahead output data */
     int64_t costEst[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
     int64_t costEstAq[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
x265_2.4.tar.gz/source/common/param.cpp -> x265_2.5.tar.gz/source/common/param.cpp
Changed
@@ -110,6 +110,7 @@ param->frameNumThreads = 0; param->logLevel = X265_LOG_INFO; + param->csvLogLevel = 0; param->csvfn = NULL; param->rc.lambdaFileName = NULL; param->bLogCuStats = 0; @@ -194,10 +195,10 @@ param->rdPenalty = 0; param->psyRd = 2.0; param->psyRdoq = 0.0; - param->analysisMode = 0; + param->analysisReuseMode = 0; param->analysisMultiPassRefine = 0; param->analysisMultiPassDistortion = 0; - param->analysisFileName = NULL; + param->analysisReuseFileName = NULL; param->bIntraInBFrames = 0; param->bLossless = 0; param->bCULossless = 0; @@ -236,6 +237,7 @@ param->rc.bEnableGrain = 0; param->rc.qpMin = 0; param->rc.qpMax = QP_MAX_MAX; + param->rc.bEnableConstVbv = 0; /* Video Usability Information (VUI) */ param->vui.aspectRatioIdc = 0; @@ -271,10 +273,18 @@ param->bOptCUDeltaQP = 0; param->bAQMotion = 0; param->bHDROpt = 0; - param->analysisRefineLevel = 5; + param->analysisReuseLevel = 5; param->toneMapFile = NULL; param->bDhdr10opt = 0; + param->bCTUInfo = 0; + param->bUseRcStats = 0; + param->scaleFactor = 0; + param->intraRefine = 0; + param->interRefine = 0; + param->mvRefine = 0; + param->bUseAnalysisFile = 1; + param->csvfpt = NULL; } int x265_param_default_preset(x265_param* param, const char* preset, const char* tune) @@ -494,6 +504,7 @@ param->psyRd = 4.0; param->psyRdoq = 10.0; param->bEnableSAO = 0; + param->rc.bEnableConstVbv = 1; } else return -1; @@ -828,7 +839,7 @@ p->rc.bStrictCbr = atobool(value); p->rc.pbFactor = 1.0; } - OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError); + OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError); OPT("sar") { p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError); @@ -907,7 +918,7 @@ OPT("scaling-list") p->scalingLists = strdup(value); OPT2("pools", "numa-pools") p->numaPools = strdup(value); OPT("lambda-file") p->rc.lambdaFileName = strdup(value); - OPT("analysis-file") p->analysisFileName = strdup(value); + 
OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value); OPT("qg-size") p->rc.qgSize = atoi(value); OPT("master-display") p->masteringDisplayColorVolume = strdup(value); OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2; @@ -921,6 +932,8 @@ if (bExtraParams) { if (0) ; + OPT("csv") p->csvfn = strdup(value); + OPT("csv-log-level") p->csvLogLevel = atoi(value); OPT("qpmin") p->rc.qpMin = atoi(value); OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value); OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value); @@ -938,7 +951,7 @@ OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value); OPT("aq-motion") p->bAQMotion = atobool(value); OPT("dynamic-rd") p->dynamicRd = atof(value); - OPT("refine-level") p->analysisRefineLevel = atoi(value); + OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value); OPT("ssim-rd") { int bval = atobool(value); @@ -954,6 +967,12 @@ OPT("limit-sao") p->bLimitSAO = atobool(value); OPT("dhdr10-info") p->toneMapFile = strdup(value); OPT("dhdr10-opt") p->bDhdr10opt = atobool(value); + OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value); + OPT("ctu-info") p->bCTUInfo = atoi(value); + OPT("scale-factor") p->scaleFactor = atoi(value); + OPT("refine-intra")p->intraRefine = atoi(value); + OPT("refine-inter")p->interRefine = atobool(value); + OPT("refine-mv")p->mvRefine = atobool(value); else return X265_PARAM_BAD_NAME; } @@ -1284,16 +1303,19 @@ "Constant QP is incompatible with 2pass"); CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0), "Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize"); - CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD), + CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD), "Invalid analysis mode. 
Analysis mode 0: OFF 1: SAVE : 2 LOAD"); - CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10), + CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10), "Invalid analysis refine level. Value must be between 1 and 10 (inclusive)"); + CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2"); CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX, "qpmax exceeds supported range (0 to 69)"); CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX, "qpmin exceeds supported range (0 to 69)"); CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16, "Supported range for log2MaxPocLsb is 4 to 16"); + CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6, + "Supported values for bCTUInfo are 0, 1, 2, 4, 6"); #if !X86_64 CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480), "SEA motion search does not support resolutions greater than 480p in 32 bit build"); @@ -1322,42 +1344,6 @@ } } -int x265_set_globals(x265_param* param) -{ - uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize]; - uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize]; - - Lock gLock; - ScopedLock sLock(gLock); - - if (++g_ctuSizeConfigured > 1) - { - if (g_maxCUSize != param->maxCUSize) - { - x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process"); - } - if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize) - { - x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process"); - } - param->maxCUSize = g_maxCUSize; - return x265_check_params(param); /* Check again, since param may have changed */ - } - else - { - // set max CU width & height - g_maxCUSize = param->maxCUSize; - g_maxLog2CUSize = maxLog2CUSize; - - // 
compute actual CU depth with respect to config depth and max transform size - g_maxCUDepth = maxLog2CUSize - minLog2CUSize; - g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE; - } - - g_maxSlices = param->maxSlices; - return 0; -} - static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr) { static const int overhead = (int)strlen("x265 [info]: tools: "); @@ -1457,6 +1443,7 @@ TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing"); TOOLVAL(param->lookaheadSlices, "lslices=%d"); TOOLVAL(param->lookaheadThreads, "lthreads=%d") + TOOLVAL(param->bCTUInfo, "ctu-info=%d"); if (param->maxSlices > 1) TOOLVAL(param->maxSlices, "slices=%d"); if (param->bEnableLoopFilter) @@ -1473,8 +1460,8 @@ TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao"); TOOLOPT(param->rc.bStatWrite, "stats-write"); TOOLOPT(param->rc.bStatRead, "stats-read"); -#if ENABLE_DYNAMIC_HDR10 - TOOLVAL(param->toneMapFile != NULL, "dhdr10-info"); +#if ENABLE_HDR10_PLUS + TOOLOPT(param->toneMapFile != NULL, "dhdr10-info"); #endif x265_log(param, X265_LOG_INFO, "tools:%s\n", buf); fflush(stderr); @@ -1501,6 +1488,8 @@ BOOL(p->bEnablePsnr, "psnr"); BOOL(p->bEnableSsim, "ssim"); s += sprintf(s, " log-level=%d", p->logLevel); + if (p->csvfn) + s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel); s += sprintf(s, " bitdepth=%d", p->internalBitDepth); s += sprintf(s, " input-csp=%d", p->internalCsp); s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom); @@ -1573,7 +1562,7 @@
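The removed `x265_set_globals()` shows how the former process-wide values were derived from `maxCUSize` and `minCUSize`; in 2.5 the same quantities are read from the per-encoder `x265_param`. A minimal sketch of that derivation as a pure function, under stated assumptions: `LOG2_UNIT_SIZE` is taken to be 2 (4x4 units), `num4x4Partitions` is taken to equal `1 << (unitSizeDepth * 2)` as in `PicYuv::createOffsets`, and the `CuGeometry` struct and function names are hypothetical:

```cpp
#include <cstdint>

// Assumption: the smallest addressable unit is 4x4, i.e. log2 size of 2.
constexpr uint32_t kLog2UnitSize = 2;

// Hypothetical container for the values that used to be globals.
struct CuGeometry
{
    uint32_t maxLog2CUSize;
    uint32_t maxCUDepth;
    uint32_t unitSizeDepth;
    uint32_t num4x4Partitions;
};

// Integer log2 for power-of-two sizes (stand-in for the g_log2Size table).
uint32_t log2Size(uint32_t size)
{
    uint32_t n = 0;
    while (size > 1)
    {
        size >>= 1;
        ++n;
    }
    return n;
}

// Derives the per-instance geometry from one encoder's CU size settings.
CuGeometry deriveCuGeometry(uint32_t maxCUSize, uint32_t minCUSize)
{
    CuGeometry g;
    g.maxLog2CUSize    = log2Size(maxCUSize);
    g.maxCUDepth       = g.maxLog2CUSize - log2Size(minCUSize);
    g.unitSizeDepth    = g.maxLog2CUSize - kLog2UnitSize;
    g.num4x4Partitions = 1u << (g.unitSizeDepth * 2);
    return g;
}
```

Because the result depends only on its arguments, two encoder instances with different CTU sizes can each hold their own copy, which is the restriction the changelog says was lifted.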
x265_2.4.tar.gz/source/common/param.h -> x265_2.5.tar.gz/source/common/param.h
Changed
@@ -28,7 +28,6 @@
 
 namespace X265_NS {
 int x265_check_params(x265_param *param);
-int x265_set_globals(x265_param *param);
 void x265_print_params(x265_param *param);
 void x265_param_apply_fastfirstpass(x265_param *p);
 char* x265_param2string(x265_param *param, int padx, int pady);
x265_2.4.tar.gz/source/common/picyuv.cpp -> x265_2.5.tar.gz/source/common/picyuv.cpp
Changed
@@ -46,36 +46,62 @@ m_maxLumaLevel = 0; m_avgLumaLevel = 0; + + m_maxChromaULevel = 0; + m_avgChromaULevel = 0; + + m_maxChromaVLevel = 0; + m_avgChromaVLevel = 0; + +#if (X265_DEPTH > 8) + m_minLumaLevel = 0xFFFF; + m_minChromaULevel = 0xFFFF; + m_minChromaVLevel = 0xFFFF; +#else + m_minLumaLevel = 0xFF; + m_minChromaULevel = 0xFF; + m_minChromaVLevel = 0xFF; +#endif + m_stride = 0; m_strideC = 0; m_hChromaShift = 0; m_vChromaShift = 0; } -bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp) +bool PicYuv::create(x265_param* param, pixel *pixelbuf) { + m_param = param; + uint32_t picWidth = m_param->sourceWidth; + uint32_t picHeight = m_param->sourceHeight; + uint32_t picCsp = m_param->internalCsp; m_picWidth = picWidth; m_picHeight = picHeight; m_hChromaShift = CHROMA_H_SHIFT(picCsp); m_vChromaShift = CHROMA_V_SHIFT(picCsp); m_picCsp = picCsp; - uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1) / g_maxCUSize; - uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize; + uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1) / param->maxCUSize; + uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize; - m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment - m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding - m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1); + m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment + m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding + m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1); - int maxHeight = numCuInHeight * g_maxCUSize; - CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2))); - m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX; + int maxHeight = numCuInHeight * param->maxCUSize; + if 
(pixelbuf) + m_picOrg[0] = pixelbuf; + else + { + CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2))); + m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX; + } if (picCsp != X265_CSP_I400) { m_chromaMarginX = m_lumaMarginX; // keep 16-byte alignment for chroma CTUs m_chromaMarginY = m_lumaMarginY >> m_vChromaShift; - m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2); + m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2); CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2))); CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2))); @@ -94,12 +120,33 @@ return false; } +int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp) +{ + m_picWidth = picWidth; + m_picHeight = picHeight; + m_hChromaShift = CHROMA_H_SHIFT(picCsp); + m_vChromaShift = CHROMA_V_SHIFT(picCsp); + m_picCsp = picCsp; + + uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize; + uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize; + + m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment + m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding + m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1); + + int maxHeight = numCuInHeight * m_param->maxCUSize; + int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2))); + + return bufLen; +} + /* the first picture allocated by the encoder will be asked to generate these * offset arrays. Once generated, they will be provided to all future PicYuv * allocated by the same encoder. 
*/ bool PicYuv::createOffsets(const SPS& sps) { - uint32_t numPartitions = 1 << (g_unitSizeDepth * 2); + uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2); if (m_picCsp != X265_CSP_I400) { @@ -109,8 +156,8 @@ { for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++) { - m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize; - m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift); + m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize; + m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift); } } @@ -129,7 +176,7 @@ CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight); for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++) for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++) - m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize; + m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize; CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions); for (uint32_t idx = 0; idx < numPartitions; ++idx) @@ -184,6 +231,11 @@ X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure"); + uint64_t lumaSum; + uint64_t cbSum; + uint64_t crSum; + lumaSum = cbSum = crSum = 0; + if (pic.bitDepth == 8) { #if (X265_DEPTH > 8) @@ -288,6 +340,47 @@ pixel *U = m_picOrg[1]; pixel *V = m_picOrg[2]; + pixel *yPic = m_picOrg[0]; + pixel *uPic = m_picOrg[1]; + pixel *vPic = m_picOrg[2]; + + for (int r = 0; r < height; r++) + { + for (int c = 0; c < width; c++) + { + m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel); + m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel); + lumaSum += yPic[c]; + } + yPic += m_stride; + } + m_avgLumaLevel = 
(double)lumaSum / (m_picHeight * m_picWidth); + + if (param.csvLogLevel >= 2) + { + if (param.internalCsp != X265_CSP_I400) + { + for (int r = 0; r < height >> m_vChromaShift; r++) + { + for (int c = 0; c < width >> m_hChromaShift; c++) + { + m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel); + m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel); + cbSum += uPic[c]; + + m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel); + m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel); + crSum += vPic[c]; + } + + uPic += m_strideC; + vPic += m_strideC; + } + m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift)); + m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift)); + } + } + #if HIGH_BIT_DEPTH bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX); /* Apply min/max luma bounds for HDR pixel manipulations */
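The statistics loop added to `copyFromPicture` walks each plane row by row, tracking the minimum, maximum and running sum, and advances by the stride rather than the width. A minimal sketch of that pattern for a single plane, assuming 16-bit samples; `PlaneStats` and `planeStats` are hypothetical names, not x265 API:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for one plane.
struct PlaneStats
{
    uint16_t minLevel;
    uint16_t maxLevel;
    double   avgLevel;
};

// Scans the visible width x height region of a plane whose rows are
// 'stride' samples apart; padding beyond 'width' is never read.
PlaneStats planeStats(const uint16_t* plane, int width, int height, std::ptrdiff_t stride)
{
    PlaneStats s = { 0xFFFF, 0, 0.0 };   // min starts high, max starts low
    uint64_t sum = 0;
    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
        {
            s.maxLevel = std::max(plane[c], s.maxLevel);
            s.minLevel = std::min(plane[c], s.minLevel);
            sum += plane[c];
        }
        plane += stride;                 // step to the next row
    }
    if (width > 0 && height > 0)
        s.avgLevel = static_cast<double>(sum) / (static_cast<double>(width) * height);
    return s;
}
```

For the chroma planes the diff applies the same loop with the width and height shifted right by `m_hChromaShift` and `m_vChromaShift`, and only when `csvLogLevel >= 2`.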
x265_2.4.tar.gz/source/common/picyuv.h -> x265_2.5.tar.gz/source/common/picyuv.h
Changed
@@ -60,14 +60,25 @@ uint32_t m_chromaMarginX; uint32_t m_chromaMarginY; - pixel m_maxLumaLevel; - double m_avgLumaLevel; + pixel m_maxLumaLevel; + pixel m_minLumaLevel; + double m_avgLumaLevel; + + pixel m_maxChromaULevel; + pixel m_minChromaULevel; + double m_avgChromaULevel; + + pixel m_maxChromaVLevel; + pixel m_minChromaVLevel; + double m_avgChromaVLevel; + x265_param *m_param; PicYuv(); - bool create(uint32_t picWidth, uint32_t picHeight, uint32_t csp); + bool create(x265_param* param, pixel *pixelbuf = NULL); bool createOffsets(const SPS& sps); void destroy(); + int getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp); void copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
x265_2.4.tar.gz/source/common/primitives.cpp -> x265_2.5.tar.gz/source/common/primitives.cpp
Changed
@@ -57,6 +57,7 @@
 void setupIntraPrimitives_c(EncoderPrimitives &p);
 void setupLoopFilterPrimitives_c(EncoderPrimitives &p);
 void setupSaoPrimitives_c(EncoderPrimitives &p);
+void setupSeaIntegralPrimitives_c(EncoderPrimitives &p);
 
 void setupCPrimitives(EncoderPrimitives &p)
 {
@@ -66,6 +67,7 @@
     setupIntraPrimitives_c(p);      // intrapred.cpp
     setupLoopFilterPrimitives_c(p); // loopfilter.cpp
     setupSaoPrimitives_c(p);        // sao.cpp
+    setupSeaIntegralPrimitives_c(p); // framefilter.cpp
 }
 
 void setupAliasPrimitives(EncoderPrimitives &p)
x265_2.4.tar.gz/source/common/primitives.h -> x265_2.5.tar.gz/source/common/primitives.h
Changed
@@ -110,6 +110,17 @@ BLOCK_422_32x64 }; +enum IntegralSize +{ + INTEGRAL_4, + INTEGRAL_8, + INTEGRAL_12, + INTEGRAL_16, + INTEGRAL_24, + INTEGRAL_32, + NUM_INTEGRAL_SIZE +}; + typedef int (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned typedef int (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride); typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned @@ -203,6 +214,9 @@ typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ); typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ); +typedef void (*integralv_t)(uint32_t *sum, intptr_t stride); +typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride); + /* Function pointers to optimized encoder primitives. Each pointer can reference * either an assembly routine, a SIMD intrinsic primitive, or a C function */ struct EncoderPrimitives @@ -342,6 +356,9 @@ pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1 pelFilterChroma_t pelFilterChroma[2]; // EDGE_VER = 0, EDGE_HOR = 1 + integralv_t integral_initv[NUM_INTEGRAL_SIZE]; + integralh_t integral_inith[NUM_INTEGRAL_SIZE]; + /* There is one set of chroma primitives per color space. An encoder will * have just a single color space and thus it will only ever use one entry * in this array. However we always fill all entries in the array in case
x265_2.4.tar.gz/source/common/slice.cpp -> x265_2.5.tar.gz/source/common/slice.cpp
Changed
@@ -185,22 +185,22 @@ uint32_t Slice::realEndAddress(uint32_t endCUAddr) const { // Calculate end address - uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS; - uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS; - uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize; - uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize; + uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions; + uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions; + uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize; + uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize; while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax) internalAddress--; internalAddress++; - if (internalAddress == NUM_4x4_PARTITIONS) + if (internalAddress == m_param->num4x4Partitions) { internalAddress = 0; externalAddress++; } - return externalAddress * NUM_4x4_PARTITIONS + internalAddress; + return externalAddress * m_param->num4x4Partitions + internalAddress; }
x265_2.4.tar.gz/source/common/slice.h -> x265_2.5.tar.gz/source/common/slice.h
Changed
@@ -360,6 +360,7 @@
     int m_iPPSQpMinus26;
     int numRefIdxDefault[2];
     int m_iNumRPSInSPS;
+    const x265_param *m_param;
 
     Slice()
     {
x265_2.4.tar.gz/source/common/threadpool.cpp -> x265_2.5.tar.gz/source/common/threadpool.cpp
Changed
@@ -253,6 +253,7 @@ int cpusPerNode[MAX_NODE_NUM + 1]; int threadsPerPool[MAX_NODE_NUM + 2]; uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2]; + int totalNumThreads = 0; memset(cpusPerNode, 0, sizeof(cpusPerNode)); memset(threadsPerPool, 0, sizeof(threadsPerPool)); @@ -388,9 +389,23 @@ if (bNumaSupport) x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]); if (threadsPerPool[i]) + { numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS; + totalNumThreads += threadsPerPool[i]; + } } + if (!isThreadsReserved) + { + if (!numPools) + { + x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n"); + totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads + } + if (!p->frameNumThreads) + ThreadPool::getFrameThreadsCount(p, totalNumThreads); + } + if (!numPools) return NULL; @@ -412,7 +427,7 @@ node++; int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]); int origNumThreads = numThreads; - if (p->lookaheadThreads > numThreads / 2) + if (i == 0 && p->lookaheadThreads > numThreads / 2) { p->lookaheadThreads = numThreads / 2; x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n"); @@ -423,7 +438,7 @@ maxProviders = 1; } - else + else if (i == 0) numThreads -= p->lookaheadThreads; if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node])) { @@ -643,4 +658,21 @@ #endif } +void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount) +{ + int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize]; + if (!p->bEnableWavefront) + p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS); + else if (cpuCount >= 32) + p->frameNumThreads = (p->sourceHeight > 2000) ? 
6 : 5; + else if (cpuCount >= 16) + p->frameNumThreads = 4; + else if (cpuCount >= 8) + p->frameNumThreads = 3; + else if (cpuCount >= 4) + p->frameNumThreads = 2; + else + p->frameNumThreads = 1; +} + } // end namespace X265_NS
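The new `ThreadPool::getFrameThreadsCount()` maps the pool thread count to a frame-thread count. A standalone sketch of the same mapping as a pure function, for experimenting with the thresholds outside the encoder; `frameThreadsFor` is a hypothetical name, the cap of 16 is an assumption standing in for `X265_MAX_FRAME_THREADS`, and plain division replaces the `g_log2Size` shift (equivalent when `maxCUSize` is a power of two):

```cpp
#include <algorithm>

// Assumed value of X265_MAX_FRAME_THREADS; check x265.h for the real one.
constexpr int kMaxFrameThreads = 16;

// Hypothetical pure-function version of the mapping shown in the diff.
int frameThreadsFor(int threadCount, int sourceHeight, int maxCUSize, bool wavefront)
{
    // Number of CTU rows, rounding a partial bottom row up.
    int rows = (sourceHeight + maxCUSize - 1) / maxCUSize;

    if (!wavefront)
        return std::min({ threadCount, (rows + 1) / 2, kMaxFrameThreads });
    if (threadCount >= 32)
        return sourceHeight > 2000 ? 6 : 5;
    if (threadCount >= 16)
        return 4;
    if (threadCount >= 8)
        return 3;
    if (threadCount >= 4)
        return 2;
    return 1;
}
```

In the diff, the value passed in is the sum of the per-pool thread counts, with the detected CPU count used only as a fallback when no pool threads are available.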
x265_2.4.tar.gz/source/common/threadpool.h -> x265_2.5.tar.gz/source/common/threadpool.h
Changed
@@ -105,6 +105,7 @@
     static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
     static int getCpuCount();
     static int getNumaNodeCount();
+    static void getFrameThreadsCount(x265_param* p,int cpuCount);
 };
 
 /* Any worker thread may enlist the help of idle worker threads from the same
x265_2.4.tar.gz/source/common/x86/asm-primitives.cpp -> x265_2.5.tar.gz/source/common/x86/asm-primitives.cpp
Changed
@@ -114,6 +114,7 @@
 #include "blockcopy8.h"
 #include "intrapred.h"
 #include "dct8.h"
+#include "seaintegral.h"
 }
 
 #define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \
@@ -2157,6 +2158,17 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
         /* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only
         p.planeClipAndMax = PFX(planeClipAndMax_avx2); */
@@ -3695,6 +3707,19 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+        p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+        p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
     }
 #endif
 }
x265_2.4.tar.gz/source/common/x86/loopfilter.asm -> x265_2.5.tar.gz/source/common/x86/loopfilter.asm
Changed
@@ -1583,7 +1583,7 @@
     pshufb      m1, m4, m0
     pcmpgtb     m0, [pb_15]         ; m0 = [mask]
-    pblendvb    m6, m6, m1, m0      ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m1, m0
     pmovsxbw    m0, m6              ; offset
     punpckhbw   m6, m6
@@ -1630,7 +1630,7 @@
     pshufb      m6, m3, m1
     pshufb      m5, m4, m1
-    pblendvb    m6, m6, m5, m0      ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m5, m0
     pmovzxbw    m1, m2              ; rec
     punpckhbw   m2, m7
@@ -1904,7 +1904,7 @@
     sub         r3, r4
     movu        xmm0, [r3]
     movu        m3, [r0]
-    pblendvb    m5, m5, m3, xmm0
+    pblendvb    m5, m3, xmm0
     movu        [r0], m5
 .end:
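The loopfilter.asm hunks above only change how pblendvb is spelled (three operands instead of four); the blend itself is unchanged. As a reading aid, here is a scalar C++ model of what a byte-wise variable blend computes, assuming the usual SSE4.1 behaviour where the top bit of each mask byte selects the source byte. The function name blendBytesRef is illustrative and does not exist in x265.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a byte-wise variable blend: where the top bit of mask[i]
// is set, take src[i]; otherwise keep dst[i]. The name is illustrative only.
void blendBytesRef(uint8_t* dst, const uint8_t* src, const uint8_t* mask, std::size_t count)
{
    assert(dst != nullptr && src != nullptr && mask != nullptr);
    for (std::size_t i = 0; i < count; i++)
    {
        if (mask[i] & 0x80)
            dst[i] = src[i];
    }
}
```

In the three-operand spelling the first register is both the destination and the "keep" input, which is why the duplicated m6 (or m5) operand could be dropped.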
x265_2.4.tar.gz/source/common/x86/pixel-a.asm -> x265_2.5.tar.gz/source/common/x86/pixel-a.asm
Changed
@@ -227,7 +227,7 @@
 ; clobber: m3..m7
 ; out: %1 = satd
 %macro SATD_4x4_MMX 3
-    %xdefine %%n n%1
+    %xdefine %%n nn%1
     %assign offset %2*SIZEOF_PIXEL
     LOAD_DIFF m4, m3, none, [r0+ offset], [r2+ offset]
     LOAD_DIFF m5, m3, none, [r0+ r1+offset], [r2+ r3+offset]
x265_2.4.tar.gz/source/common/x86/pixel-util8.asm -> x265_2.5.tar.gz/source/common/x86/pixel-util8.asm
Changed
@@ -1597,7 +1597,7 @@
 .widthLess8:
     movu        m6, [r1]
-    pblendvb    m6, m6, m7, m0
+    pblendvb    m6, m7, m0
     movu        [r1], m6
 .nextH:
x265_2.5.tar.gz/source/common/x86/seaintegral.asm
Added
@@ -0,0 +1,1062 @@ +;***************************************************************************** +;* Copyright (C) 2013-2017 MulticoreWare, Inc +;* +;* Authors: Jayashri Murugan <jayashri@multicorewareinc.com> +;* Vignesh V Menon <vignesh@multicorewareinc.com> +;* Praveen Tiwari <praveen@multicorewareinc.com> +;* +;* This program is free software; you can redistribute it and/or modify +;* it under the terms of the GNU General Public License as published by +;* the Free Software Foundation; either version 2 of the License, or +;* (at your option) any later version. +;* +;* This program is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;* GNU General Public License for more details. +;* +;* You should have received a copy of the GNU General Public License +;* along with this program; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA. +;* +;* This program is also available under a commercial proprietary license. +;* For more information, contact us at license @ x265.com. 
+;*****************************************************************************/ + +%include "x86inc.asm" +%include "x86util.asm" + +SECTION .text + +;----------------------------------------------------------------------------- +;void integral_init4v_c(uint32_t *sum4, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral4v, 2, 3, 2 + mov r2, r1 + shl r2, 4 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +;----------------------------------------------------------------------------- +;void integral_init8v_c(uint32_t *sum8, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral8v, 2, 3, 2 + mov r2, r1 + shl r2, 5 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +;----------------------------------------------------------------------------- +;void integral_init12v_c(uint32_t *sum12, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral12v, 2, 4, 2 + mov r2, r1 + mov r3, r1 + shl r2, 5 + shl r3, 4 + add r2, r3 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +;----------------------------------------------------------------------------- +;void integral_init16v_c(uint32_t *sum16, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral16v, 2, 3, 2 + mov r2, r1 + shl r2, 6 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +;----------------------------------------------------------------------------- +;void integral_init24v_c(uint32_t *sum24, intptr_t stride) 
+;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral24v, 2, 4, 2 + mov r2, r1 + mov r3, r1 + shl r2, 6 + shl r3, 5 + add r2, r3 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +;----------------------------------------------------------------------------- +;void integral_init32v_c(uint32_t *sum32, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +cglobal integral32v, 2, 3, 2 + mov r2, r1 + shl r2, 7 + +.loop + movu m0, [r0] + movu m1, [r0 + r2] + psubd m1, m0 + movu [r0], m1 + add r0, 32 + sub r1, 8 + jnz .loop + RET + +%macro INTEGRAL_FOUR_HORIZONTAL_16 0 + pmovzxbw m0, [r1] + pmovzxbw m1, [r1 + 1] + paddw m0, m1 + pmovzxbw m1, [r1 + 2] + paddw m0, m1 + pmovzxbw m1, [r1 + 3] + paddw m0, m1 +%endmacro + +%macro INTEGRAL_FOUR_HORIZONTAL_4 0 + movd xm0, [r1] + movd xm1, [r1 + 1] + pmovzxbw xm0, xm0 + pmovzxbw xm1, xm1 + paddw xm0, xm1 + movd xm1, [r1 + 2] + pmovzxbw xm1, xm1 + paddw xm0, xm1 + movd xm1, [r1 + 3] + pmovzxbw xm1, xm1 + paddw xm0, xm1 +%endmacro + +%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0 + pmovzxwd m0, [r1] + pmovzxwd m1, [r1 + 2] + paddd m0, m1 + pmovzxwd m1, [r1 + 4] + paddd m0, m1 + pmovzxwd m1, [r1 + 6] + paddd m0, m1 +%endmacro + +%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0 + pmovzxwd xm0, [r1] + pmovzxwd xm1, [r1 + 2] + paddd xm0, xm1 + pmovzxwd xm1, [r1 + 4] + paddd xm0, xm1 + pmovzxwd xm1, [r1 + 6] + paddd xm0, xm1 +%endmacro + +;----------------------------------------------------------------------------- +;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride) +;----------------------------------------------------------------------------- +INIT_YMM avx2 +%if HIGH_BIT_DEPTH +cglobal integral4h, 3, 5, 3 + lea r3, [4 * r2] + sub r0, r3 + sub r2, 4 ;stride - 4 + mov r4, r2 + shr r4, 3
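The six integral<N>v kernels in seaintegral.asm share one loop body and differ only in the byte offset held in r2, which works out to N rows of uint32_t entries. A scalar C++ sketch of that operation, as read from the assembly above, is given here; the function name integralVerticalRef and the runtime rows parameter are illustrative assumptions, not part of x265.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of the integral<N>v kernels: each entry in the current row is
// replaced by (entry `rows` rows below) minus (entry itself). In the assembly
// `rows` is fixed per kernel (4, 8, 12, 16, 24 or 32); it is a runtime
// parameter here purely for illustration.
void integralVerticalRef(uint32_t* sum, std::ptrdiff_t stride, int rows)
{
    assert(sum != nullptr && stride > 0 && rows > 0);
    for (std::ptrdiff_t x = 0; x < stride; x++)
        sum[x] = sum[x + rows * stride] - sum[x];
}
```

The AVX2 versions handle eight entries per iteration and count the stride down by 8 until it reaches zero, so they appear to rely on the stride being a multiple of 8; the scalar sketch has no such restriction.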
x265_2.5.tar.gz/source/common/x86/seaintegral.h
Added
@@ -0,0 +1,42 @@
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh@multicorewareinc.com>
+*          Jayashri Murugan <jayashri@multicorewareinc.com>
+*          Praveen Tiwari <praveen@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H
x265_2.4.tar.gz/source/common/x86/x86inc.asm -> x265_2.5.tar.gz/source/common/x86/x86inc.asm
Changed
@@ -76,10 +76,6 @@ SECTION .rodata align=%1 %endmacro -%macro SECTION_TEXT 0-1 16 - SECTION .text align=%1 -%endmacro - %if WIN64 %define PIC %elif ARCH_X86_64 == 0 @@ -139,6 +135,7 @@ %define r%1w %2w %define r%1b %2b %define r%1h %2h + %define %2q %2 %if %0 == 2 %define r%1m %2d %define r%1mp %2 @@ -163,9 +160,9 @@ %define e%1h %3 %define r%1b %2 %define e%1b %2 -%if ARCH_X86_64 == 0 - %define r%1 e%1 -%endif + %if ARCH_X86_64 == 0 + %define r%1 e%1 + %endif %endmacro DECLARE_REG_SIZE ax, al, ah @@ -275,7 +272,7 @@ %macro ASSERT 1 %if (%1) == 0 - %error assert failed + %error assertion ``%1'' failed %endif %endmacro @@ -365,9 +362,19 @@ %ifnum %1 %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT %if %1 > 0 + ; Reserve an additional register for storing the original stack pointer, but avoid using + ; eax/rax for this purpose since it can potentially get overwritten as a return value. %assign regs_used (regs_used + 1) - %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2 - %warning "Stack pointer will overwrite register argument" + %if ARCH_X86_64 && regs_used == 7 + %assign regs_used 8 + %elif ARCH_X86_64 == 0 && regs_used == 1 + %assign regs_used 2 + %endif + %endif + %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 + ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax) + ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used. + %assign regs_used 5 + UNIX64 * 3 %endif %endif %endif @@ -396,10 +403,10 @@ DECLARE_REG 8, rsi, 72 DECLARE_REG 9, rbx, 80 DECLARE_REG 10, rbp, 88 -DECLARE_REG 11, R12, 96 -DECLARE_REG 12, R13, 104 -DECLARE_REG 13, R14, 112 -DECLARE_REG 14, R15, 120 +DECLARE_REG 11, R14, 96 +DECLARE_REG 12, R15, 104 +DECLARE_REG 13, R12, 112 +DECLARE_REG 14, R13, 120 %macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names... 
%assign num_args %1 @@ -445,45 +452,46 @@ WIN64_PUSH_XMM %endmacro -%macro WIN64_RESTORE_XMM_INTERNAL 1 +%macro WIN64_RESTORE_XMM_INTERNAL 0 %assign %%pad_size 0 %if xmm_regs_used > 8 %assign %%i xmm_regs_used %rep xmm_regs_used-8 %assign %%i %%i-1 - movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32] + movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32] %endrep %endif %if stack_size_padded > 0 %if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT mov rsp, rstkm %else - add %1, stack_size_padded + add rsp, stack_size_padded %assign %%pad_size stack_size_padded %endif %endif %if xmm_regs_used > 7 - movaps xmm7, [%1 + stack_offset - %%pad_size + 24] + movaps xmm7, [rsp + stack_offset - %%pad_size + 24] %endif %if xmm_regs_used > 6 - movaps xmm6, [%1 + stack_offset - %%pad_size + 8] + movaps xmm6, [rsp + stack_offset - %%pad_size + 8] %endif %endmacro -%macro WIN64_RESTORE_XMM 1 - WIN64_RESTORE_XMM_INTERNAL %1 +%macro WIN64_RESTORE_XMM 0 + WIN64_RESTORE_XMM_INTERNAL %assign stack_offset (stack_offset-stack_size_padded) + %assign stack_size_padded 0 %assign xmm_regs_used 0 %endmacro %define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0 %macro RET 0 - WIN64_RESTORE_XMM_INTERNAL rsp + WIN64_RESTORE_XMM_INTERNAL POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7 -%if mmsize == 32 - vzeroupper -%endif + %if mmsize == 32 + vzeroupper + %endif AUTO_REP_RET %endmacro @@ -500,10 +508,10 @@ DECLARE_REG 8, R11, 24 DECLARE_REG 9, rbx, 32 DECLARE_REG 10, rbp, 40 -DECLARE_REG 11, R12, 48 -DECLARE_REG 12, R13, 56 -DECLARE_REG 13, R14, 64 -DECLARE_REG 14, R15, 72 +DECLARE_REG 11, R14, 48 +DECLARE_REG 12, R15, 56 +DECLARE_REG 13, R12, 64 +DECLARE_REG 14, R13, 72 %macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names... 
%assign num_args %1 @@ -520,17 +528,17 @@ %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT - mov rsp, rstkm -%else - add rsp, stack_size_padded -%endif -%endif + %if stack_size_padded > 0 + %if required_stack_alignment > STACK_ALIGNMENT + mov rsp, rstkm + %else + add rsp, stack_size_padded + %endif + %endif POP_IF_USED 14, 13, 12, 11, 10, 9 -%if mmsize == 32 - vzeroupper -%endif + %if mmsize == 32 + vzeroupper + %endif AUTO_REP_RET %endmacro @@ -576,29 +584,29 @@ %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT - mov rsp, rstkm -%else - add rsp, stack_size_padded -%endif -%endif + %if stack_size_padded > 0 + %if required_stack_alignment > STACK_ALIGNMENT + mov rsp, rstkm + %else + add rsp, stack_size_padded + %endif + %endif POP_IF_USED 6, 5, 4, 3 -%if mmsize == 32 - vzeroupper
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.h -> x265_2.5.tar.gz/source/dynamicHDR10/BasicStructures.h
Changed
@@ -35,16 +35,26 @@
     float maxRLuminance = 0.0;
     float maxGLuminance = 0.0;
     float maxBLuminance = 0.0;
-    int order;
+    int order = 0;
     std::vector<unsigned int> percentiles;
 };
 
 struct BezierCurveData
 {
-    int order;
-    int sPx;
-    int sPy;
+    int order = 0;
+    int sPx = 0;
+    int sPy = 0;
     std::vector<int> coeff;
 };
 
+struct PercentileLuminance{
+
+    float averageLuminance = 0.0;
+    float maxRLuminance = 0.0;
+    float maxGLuminance = 0.0;
+    float maxBLuminance = 0.0;
+    int order = 0;
+    std::vector<unsigned int> percentiles;
+};
+
 #endif // BASICSTRUCTURES_H
x265_2.4.tar.gz/source/dynamicHDR10/CMakeLists.txt -> x265_2.5.tar.gz/source/dynamicHDR10/CMakeLists.txt
Changed
@@ -1,8 +1,8 @@
 # vim: syntax=cmake
 
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
     add_library(dynamicHDR10 OBJECT
-        BasicStructures.cpp BasicStructures.h
+        BasicStructures.h
         json11/json11.cpp json11/json11.h
         JsonHelper.cpp JsonHelper.h
         metadataFromJson.cpp metadataFromJson.h
@@ -10,7 +10,6 @@
         hdr10plus.h api.cpp )
-else()
     cmake_minimum_required (VERSION 2.8.11)
     project(dynamicHDR10)
     include(CheckIncludeFiles)
@@ -150,26 +149,5 @@
 
     option(ENABLE_SHARED "Build shared library" OFF)
 
-    if(ENABLE_SHARED)
-        add_library(dynamicHDR10 SHARED
-            json11/json11.cpp json11/json11.h
-            BasicStructures.cpp BasicStructures.h
-            JsonHelper.cpp JsonHelper.h
-            metadataFromJson.cpp metadataFromJson.h
-            SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-            hdr10plus.h api.cpp )
-    else()
-        add_library(dynamicHDR10 STATIC
-            json11/json11.cpp json11/json11.h
-            BasicStructures.cpp BasicStructures.h
-            JsonHelper.cpp JsonHelper.h
-            metadataFromJson.cpp metadataFromJson.h
-            SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-            hdr10plus.h api.cpp )
-    endif()
-
-    install (TARGETS dynamicHDR10
-        LIBRARY DESTINATION ${LIB_INSTALL_DIR}
-        ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
     install(FILES hdr10plus.h DESTINATION include)
 endif()
\ No newline at end of file
x265_2.4.tar.gz/source/dynamicHDR10/json11/json11.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/json11/json11.cpp
Changed
@@ -26,6 +26,12 @@
 #include <cstdio>
 #include <limits>
 
+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
 namespace json11 {
 
 static const int max_depth = 200;
@@ -435,7 +441,7 @@
     char get_next_token() {
         consume_garbage();
         if (i == str.size())
-            return fail("unexpected end of input", 0);
+            return fail("unexpected end of input", '0');
 
         return str[i++];
     }
@@ -472,7 +478,7 @@
     string parse_string() {
         string out;
         long last_escaped_codepoint = -1;
-        while (true) {
+        for (;;) {
             if (i == str.size())
                 return fail("unexpected end of input in string", "");
 
@@ -665,7 +671,7 @@
             if (ch == '}')
                 return data;
 
-            while (1) {
+            for (;;) {
                 if (ch != '"')
                     return fail("expected '\"' in object, got " + esc(ch));
 
@@ -698,7 +704,7 @@
             if (ch == ']')
                 return data;
 
-            while (1) {
+            for (;;) {
                 i--;
                 data.push_back(parse_json(depth + 1));
                 if (failed)
x265_2.4.tar.gz/source/dynamicHDR10/metadataFromJson.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/metadataFromJson.cpp
Changed
@@ -168,7 +168,7 @@
 {
     int payloadBytes = 1;
 
-    for(;payload > 0xFF; payload -= 0xFF, ++payloadBytes);
+    for(;payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
 
     if(payloadBytes > 1)
     {
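The one-character change above (> to >=) matters exactly when the payload size is a multiple of 0xFF. A plausible reading is that the size is written as a run of 0xFF bytes followed by a final byte below 0xFF, so a size of 255 needs two bytes rather than one. The sketch below isolates the counting loop from the hunk; the function name payloadSizeBytes is hypothetical and the surrounding x265 function is not shown in the diff.

```cpp
#include <cassert>

// Counts how many bytes are needed to code `payload` when every byte except
// the last contributes 0xFF and the last byte must be smaller than 0xFF.
// Mirrors the corrected loop from the hunk; the function name is hypothetical.
int payloadSizeBytes(int payload)
{
    assert(payload >= 0);
    int payloadBytes = 1;
    for (; payload >= 0xFF; payload -= 0xFF)
        ++payloadBytes;
    return payloadBytes;
}
```

With the old comparison, a payload of exactly 255 was counted as a single byte, and that byte would itself have been 0xFF.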
x265_2.4.tar.gz/source/encoder/CMakeLists.txt -> x265_2.5.tar.gz/source/encoder/CMakeLists.txt
Changed
@@ -43,4 +43,5 @@
     reference.cpp reference.h
     encoder.cpp encoder.h
     api.cpp
-    weightPrediction.cpp)
+    weightPrediction.cpp
+    ../x265-extras.cpp ../x265-extras.h)
x265_2.4.tar.gz/source/encoder/analysis.cpp -> x265_2.5.tar.gz/source/encoder/analysis.cpp
Changed
@@ -75,6 +75,7 @@ m_reuseInterDataCTU = NULL; m_reuseRef = NULL; m_bHD = false; + m_evaluateInter = 0; } bool Analysis::create(ThreadLocalData *tld) @@ -89,19 +90,19 @@ cacheCost = X265_MALLOC(uint64_t, costArrSize); int csp = m_param->internalCsp; - uint32_t cuSize = g_maxCUSize; + uint32_t cuSize = m_param->maxCUSize; bool ok = true; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1) + for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1) { ModeDepth &md = m_modeDepth[depth]; - md.cuMemPool.create(depth, csp, MAX_PRED_TYPES); + md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param); ok &= md.fencYuv.create(cuSize, csp); for (int j = 0; j < MAX_PRED_TYPES; j++) { - md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j); + md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j); ok &= md.pred[j].predYuv.create(cuSize, csp); ok &= md.pred[j].reconYuv.create(cuSize, csp); md.pred[j].fencYuv = &md.fencYuv; @@ -115,7 +116,7 @@ void Analysis::destroy() { - for (uint32_t i = 0; i <= g_maxCUDepth; i++) + for (uint32_t i = 0; i <= m_param->maxCUDepth; i++) { m_modeDepth[i].cuMemPool.destroy(); m_modeDepth[i].fencYuv.destroy(); @@ -150,6 +151,41 @@ calculateNormFactor(ctu, qp); uint32_t numPartition = ctu.m_numPartitions; + if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr)) + { + x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr; + if (ctuTemp->ctuPartitions) + { + int32_t depthIdx = 0; + uint32_t maxNum8x8Partitions = 64; + uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr]; + uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr]; + int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr]; + do + { + uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx]; + uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx)); + int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx]; + memset(depthInfoPtr, depth, 
sizeof(uint8_t) * numPartition >> 2 * depth); + memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth); + memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth); + for (uint32_t l = 0; l < numPartition >> 2 * depth; l++) + prevCtuInfoChangePtr[l] = prevCtuInfoChange; + depthInfoPtr += ctu.m_numPartitions >> 2 * depth; + contentInfoPtr += ctu.m_numPartitions >> 2 * depth; + prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth; + depthIdx++; + } while (ctuTemp->ctuPartitions[depthIdx] != 0); + + m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr]; + m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr]; + memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition); + //Calculate log2CUSize from depth + for (uint32_t i = 0; i < cuGeom.numPartitions; i++) + ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i]; + } + } + if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead) { m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata; @@ -167,19 +203,19 @@ } } - if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10) + if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10) { int numPredDir = m_slice->isInterP() ? 
1 : 2; m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData; m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir]; m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions]; m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions]; - if (m_param->analysisRefineLevel > 4) + if (m_param->analysisReuseLevel > 4) { m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions]; m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions]; } - if (m_param->analysisMode == X265_ANALYSIS_SAVE) + if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE) for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++) m_reuseRef[i] = -1; } @@ -188,7 +224,7 @@ if (m_slice->m_sliceType == I_SLICE) { analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData; - if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1) + if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1) { memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition); memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition); @@ -200,8 +236,8 @@ else { if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE && - ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol - && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol) + ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol + && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol) compressIntraCU(ctu, cuGeom, qp); else if (!m_param->rdLevel) { @@ -214,7 +250,7 @@ /* generate residual for entire CTU at once and copy to reconPic */ encodeResidue(ctu, cuGeom); } - else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10) + 
else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10) { analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData; int posCTU = ctu.m_cuAddr * numPartition; @@ -229,7 +265,7 @@ } //Calculate log2CUSize from depth for (uint32_t i = 0; i < cuGeom.numPartitions; i++) - ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i]; + ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i]; qprdRefine (ctu, cuGeom, qp, qp); return *m_modeDepth[0].bestMode; @@ -245,9 +281,69 @@ if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP) qprdRefine(ctu, cuGeom, qp, qp); + if (m_param->csvLogLevel >= 2) + collectPUStatistics(ctu, cuGeom); + return *m_modeDepth[0].bestMode; } +void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom) +{ + uint8_t depth = 0; + uint8_t partSize = 0; + for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2)) + { + depth = ctu.m_cuDepth[absPartIdx]; + partSize = ctu.m_partSize[absPartIdx]; + uint32_t numPU = nbPartsTable[(int)partSize]; + int shift = 2 * (m_param->maxCUDepth + 1 - depth); + for (uint32_t puIdx = 0; puIdx < numPU; puIdx++) + { + PredictionUnit pu(ctu, cuGeom, puIdx); + int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx); + int mode = 1; + if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN) + mode = 2; + else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N) + mode = 3; + + if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP) + { + ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift); + ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift); + } + else if 
(ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA) + { + if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN) + { + ctu.m_encData->m_frameStats.cnt4x4++; + ctu.m_encData->m_frameStats.totalPu[4]++; + } + else + { + ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift); + ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift); + } + } + else if (mode == 3) + { + ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift); + ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift); + break; + } + else + {
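Several expressions in the analysis.cpp hunks have the form numPartition >> 2 * depth. Because multiplication binds tighter than the shift, this is numPartition >> (2 * depth): each additional depth level quarters the number of partitions a CU covers. A small sketch with explicit parentheses follows; the function name partitionsAtDepth is illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Number of minimum-size partitions covered by one CU at the given depth,
// given the partition count of the whole CTU. Each depth level splits a CU
// into four, hence the shift by 2 * depth. The name is illustrative only.
uint32_t partitionsAtDepth(uint32_t numPartitions, uint32_t depth)
{
    assert(2 * depth < 32); // keep the shift amount within the type width
    return numPartitions >> (2 * depth);
}
```

For a CTU with 256 partitions this gives 256, 64, 16 and 4 at depths 0 through 3, which is the step size used to advance depthInfoPtr and the loop index in collectPUStatistics.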
x265_2.4.tar.gz/source/encoder/analysis.h -> x265_2.5.tar.gz/source/encoder/analysis.h
Changed
@@ -137,6 +137,10 @@
     int* m_multipassMvpIdx[2];
     int32_t* m_multipassRef[2];
     uint8_t* m_multipassModes;
+
+    uint8_t m_evaluateInter;
+    uint8_t* m_additionalCtuInfo;
+    int* m_prevCtuInfoChange;
     /* refine RD based on QP for rd-levels 5 and 6 */
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
@@ -178,6 +182,9 @@
 
     void calculateNormFactor(CUData& ctu, int qp);
     void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
+
+    void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
+
     /* check whether current mode is the new best */
     inline void checkBestMode(Mode& mode, uint32_t depth)
     {
@@ -190,6 +197,7 @@
         else
             md.bestMode = &mode;
     }
+    int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
 };
 
 struct ThreadLocalData
x265_2.4.tar.gz/source/encoder/api.cpp -> x265_2.5.tar.gz/source/encoder/api.cpp
Changed
@@ -30,6 +30,7 @@ #include "level.h" #include "nal.h" #include "bitcost.h" +#include "x265-extras.h" /* multilib namespace reflectors */ #if LINKED_8BIT @@ -96,9 +97,6 @@ if (x265_check_params(param)) goto fail; - if (x265_set_globals(param)) - goto fail; - encoder = new Encoder; if (!param->rc.bEnableSlowFirstPass) PARAM_NS::x265_param_apply_fastfirstpass(param); @@ -119,6 +117,17 @@ } encoder->create(); + /* Try to open CSV file handle */ + if (encoder->m_param->csvfn) + { + encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel); + if (!encoder->m_param->csvfpt) + { + x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn); + encoder->m_aborted = true; + } + } + encoder->m_latestParam = latestParam; memcpy(latestParam, param, sizeof(x265_param)); if (encoder->m_aborted) @@ -144,7 +153,10 @@ if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS) { if (!encoder->computeSPSRPSIndex()) + { + encoder->m_aborted = true; return -1; + } } encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs); *pp_nal = &encoder->m_nalList.m_nal[0]; @@ -152,6 +164,11 @@ return encoder->m_nalList.m_occupancy; } + if (enc) + { + Encoder *encoder = static_cast<Encoder*>(enc); + encoder->m_aborted = true; + } return -1; } @@ -251,6 +268,12 @@ else if (pi_nal) *pi_nal = 0; + if (numEncoded && encoder->m_param->csvLogLevel) + x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel); + + if (numEncoded < 0) + encoder->m_aborted = true; + return numEncoded; } @@ -263,12 +286,17 @@ } } -void x265_encoder_log(x265_encoder* enc, int, char **) +void x265_encoder_log(x265_encoder* enc, int argc, char **argv) { if (enc) { Encoder *encoder = static_cast<Encoder*>(enc); - x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n"); + x265_stats stats; + int padx = 
encoder->m_sps.conformanceWindow.rightOffset; + int pady = encoder->m_sps.conformanceWindow.bottomOffset; + encoder->fetchStats(&stats, sizeof(stats)); + const x265_api * api = x265_api_get(0); + x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv); } } @@ -282,7 +310,6 @@ encoder->printSummary(); encoder->destroy(); delete encoder; - ATOMIC_DEC(&g_ctuSizeConfigured); } } @@ -295,14 +322,18 @@ encoder->m_bQueuedIntraRefresh = 1; return 0; } +int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu) +{ + if (!ctu || !enc) + return -1; + Encoder* encoder = static_cast<Encoder*>(enc); + encoder->copyCtuInfo(ctu, poc); + return 0; +} void x265_cleanup(void) { - if (!g_ctuSizeConfigured) - { - BitCost::destroy(); - CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */ - } + BitCost::destroy(); } x265_picture *x265_picture_alloc() @@ -321,14 +352,14 @@ pic->userSEI.payloads = NULL; pic->userSEI.numPayloads = 0; - if (param->analysisMode) + if (param->analysisReuseMode) { - uint32_t widthInCU = (param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize; - uint32_t heightInCU = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize; + uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize; + uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize; uint32_t numCUsInFrame = widthInCU * heightInCU; pic->analysisData.numCUsInFrame = numCUsInFrame; - pic->analysisData.numPartitions = NUM_4x4_PARTITIONS; + pic->analysisData.numPartitions = param->num4x4Partitions; } } @@ -372,6 +403,7 @@ sizeof(x265_frame_stats), &x265_encoder_intra_refresh, + &x265_encoder_ctu_info, }; typedef const x265_api* (*api_get_func)(int bitDepth);
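The api.cpp hunk above replaces the globals g_maxCUSize and g_maxLog2CUSize with per-instance fields when computing numCUsInFrame. The arithmetic is a round-up division by the CTU size in each dimension. A standalone sketch under stated assumptions: the function name ctusInFrame is hypothetical, and the asserted range of 16 to 64 samples reflects the CTU sizes HEVC permits rather than anything stated in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Number of CTUs needed to cover a picture, rounding partial CTUs up.
// Same arithmetic as the widthInCU/heightInCU lines in the hunk above;
// the function name is hypothetical.
uint32_t ctusInFrame(uint32_t width, uint32_t height, uint32_t maxLog2CUSize)
{
    assert(maxLog2CUSize >= 4 && maxLog2CUSize <= 6); // 16, 32 or 64 samples
    uint32_t maxCUSize = 1u << maxLog2CUSize;
    uint32_t widthInCU = (width + maxCUSize - 1) >> maxLog2CUSize;
    uint32_t heightInCU = (height + maxCUSize - 1) >> maxLog2CUSize;
    return widthInCU * heightInCU;
}
```

A 1920x1080 picture with 64x64 CTUs covers 30 by 17 CTUs, since 1080 is not a multiple of 64 and the last row is partial.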
x265_2.4.tar.gz/source/encoder/dpb.cpp -> x265_2.5.tar.gz/source/encoder/dpb.cpp
Changed
@@ -105,6 +105,23 @@ } } + if (curFrame->m_ctuInfo != NULL) + { + uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize; + uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize; + uint32_t numCUsInFrame = widthInCU * heightInCU; + for (uint32_t i = 0; i < numCUsInFrame; i++) + { + X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo); + (*curFrame->m_ctuInfo + i)->ctuInfo = NULL; + } + X265_FREE(*curFrame->m_ctuInfo); + *(curFrame->m_ctuInfo) = NULL; + X265_FREE(curFrame->m_ctuInfo); + curFrame->m_ctuInfo = NULL; + X265_FREE(curFrame->m_prevCtuInfoChange); + curFrame->m_prevCtuInfoChange = NULL; + } curFrame->m_encData = NULL; curFrame->m_reconPic = NULL; } @@ -187,7 +204,7 @@ } // Disable Loopfilter in bound area, because we will do slice-parallelism in future - slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0); + slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0); /* Increment reference count of all motion-referenced frames to prevent them * from being recycled. These counts are decremented at the end of
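The cleanup loop in the hunk above visits one entry per CTU, and the count comes from a ceiling division of the frame dimensions by the CTU size, written as an add-then-shift. A minimal sketch of that arithmetic follows, assuming `maxCUSize` is a power of two and `maxLog2CUSize` is its base-2 logarithm. The helper name `ctusInFrame` is hypothetical and is not part of x265.

```cpp
#include <cstdint>

// Hypothetical helper (not an x265 function): number of CTUs needed to
// cover a frame. Assumes maxCUSize == (1u << maxLog2CUSize).
uint32_t ctusInFrame(uint32_t width, uint32_t height,
                     uint32_t maxCUSize, uint32_t maxLog2CUSize)
{
    // Adding (maxCUSize - 1) before shifting rounds up, so a partially
    // covered CTU at the right or bottom edge still counts as a whole CTU.
    uint32_t widthInCU  = (width  + maxCUSize - 1) >> maxLog2CUSize;
    uint32_t heightInCU = (height + maxCUSize - 1) >> maxLog2CUSize;
    return widthInCU * heightInCU;
}
```

With 64x64 CTUs, a 1920x1080 frame needs 30 columns and 17 rows; the last row is only partly filled by picture data.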
x265_2.4.tar.gz/source/encoder/encoder.cpp -> x265_2.5.tar.gz/source/encoder/encoder.cpp
Changed
@@ -86,8 +86,10 @@ m_frameEncoder[i] = NULL; MotionEstimate::initScales(); -#if ENABLE_DYNAMIC_HDR10 +#if ENABLE_HDR10_PLUS m_hdr10plus_api = hdr10plus_api_get(); + numCimInfo = 0; + cim = NULL; #endif m_prevTonemapPayload.payload = NULL; @@ -132,26 +134,19 @@ if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices) allowPools = false; - if (!p->frameNumThreads) - { - // auto-detect frame threads - int cpuCount = ThreadPool::getCpuCount(); - if (!p->bEnableWavefront) - p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS); - else if (cpuCount >= 32) - p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher - else if (cpuCount >= 16) - p->frameNumThreads = 5; // 8 HT cores, or dual socket - else if (cpuCount >= 8) - p->frameNumThreads = 3; // 4 HT cores - else if (cpuCount >= 4) - p->frameNumThreads = 2; // Dual or Quad core - else - p->frameNumThreads = 1; - } m_numPools = 0; if (allowPools) m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0); + else + { + if (!p->frameNumThreads) + { + // auto-detect frame threads + int cpuCount = ThreadPool::getCpuCount(); + ThreadPool::getFrameThreadsCount(p, cpuCount); + } + } + if (!m_numPools) { // issue warnings if any of these features were requested @@ -320,8 +315,8 @@ else m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc); - int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize; - int numCols = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize; + int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize; + int numCols = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize; for (int i = 0; i < m_param->frameNumThreads; i++) { if (!m_frameEncoder[i]->init(this, numRows, numCols)) @@ -346,12 +341,12 @@ initRefIdx(); - if (m_param->analysisMode) + if (m_param->analysisReuseMode) { - const char* name = 
m_param->analysisFileName; + const char* name = m_param->analysisReuseFileName; if (!name) name = defaultAnalysisFileName; - const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb"; + const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb"; m_analysisFile = x265_fopen(name, mode); if (!m_analysisFile) { @@ -362,7 +357,7 @@ if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion) { - const char* name = m_param->analysisFileName; + const char* name = m_param->analysisReuseFileName; if (!name) name = defaultAnalysisFileName; if (m_param->rc.bStatWrite) @@ -431,6 +426,10 @@ void Encoder::destroy() { +#if ENABLE_HDR10_PLUS + m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo); +#endif + if (m_exportedPic) { ATOMIC_DEC(&m_exportedPic->m_countRefEncoders); @@ -482,7 +481,7 @@ { int bError = 1; fclose(m_analysisFileOut); - const char* name = m_param->analysisFileName; + const char* name = m_param->analysisReuseFileName; if (!name) name = defaultAnalysisFileName; char* temp = strcatFilename(name, ".temp"); @@ -499,11 +498,14 @@ } if (m_param) { + if (m_param->csvfpt) + fclose(m_param->csvfpt); /* release string arguments that were strdup'd */ free((char*)m_param->rc.lambdaFileName); free((char*)m_param->rc.statFileName); - free((char*)m_param->analysisFileName); + free((char*)m_param->analysisReuseFileName); free((char*)m_param->scalingLists); + free((char*)m_param->csvfn); free((char*)m_param->numaPools); free((char*)m_param->masteringDisplayColorVolume); free((char*)m_param->toneMapFile); @@ -518,7 +520,7 @@ FrameEncoder *encoder = m_frameEncoder[i]; if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc) { - int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned); + int64_t bits = m_param->rc.bEnableConstVbv ? 
(int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned); rc->m_bufferFill -= bits; rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0); rc->m_bufferFill += encoder->m_rce.bufferRate; @@ -593,6 +595,8 @@ if (m_exportedPic) { + if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE) + freeAnalysis(&m_exportedPic->m_analysisData); ATOMIC_DEC(&m_exportedPic->m_countRefEncoders); m_exportedPic = NULL; m_dpb->recycleUnreferenced(); @@ -601,16 +605,22 @@ { x265_sei_payload toneMap; toneMap.payload = NULL; -#if ENABLE_DYNAMIC_HDR10 +#if ENABLE_HDR10_PLUS if (m_bToneMap) { - uint8_t *cim = NULL; - if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim)) - { - toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]); - toneMap.payloadSize = cim[0]; + if (pic_in->poc == 0) + numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim); + if (pic_in->poc < numCimInfo) + { + int32_t i = 0; + toneMap.payloadSize = 0; + while (cim[pic_in->poc][i] == 0xFF) + toneMap.payloadSize += cim[pic_in->poc][i++]; + toneMap.payloadSize += cim[pic_in->poc][i++]; + + toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize); toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35; - memcpy(toneMap.payload, cim, toneMap.payloadSize); + memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize); } } #endif @@ -708,7 +718,7 @@ for (int i = 0; i < numPayloads; i++) { x265_sei_payload input; - if (i == (numPayloads - 1)) + if ((i == (numPayloads - 1)) && toneMapEnable) input = toneMap; else input = pic_in->userSEI.payloads[i]; @@ -754,24 +764,40 @@ /* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */ /* Load analysis data before lookahead->addPicture, since sliceType has been decided */ - if (m_param->analysisMode == X265_ANALYSIS_LOAD) + if (m_param->analysisReuseMode 
== X265_ANALYSIS_LOAD) { - x265_picture* inputPic = const_cast<x265_picture*>(pic_in); /* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */ - readAnalysisFile(&inputPic->analysisData, inFrame->m_poc); - inFrame->m_analysisData.poc = inFrame->m_poc; - inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType; - inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut; - inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost; - inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame; - inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions; - inFrame->m_analysisData.wt = inputPic->analysisData.wt; - inFrame->m_analysisData.interData = inputPic->analysisData.interData; - inFrame->m_analysisData.intraData = inputPic->analysisData.intraData; - sliceType = inputPic->analysisData.sliceType; + readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in); + sliceType = inFrame->m_analysisData.sliceType;
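The HDR10+ hunk above derives `toneMap.payloadSize` from a run of `0xFF` bytes followed by one final byte, then copies the payload starting just past that prefix. The sketch below isolates that size-prefix parse, assuming the buffer is well formed (it contains a terminating byte other than `0xFF`). The names `SizePrefix` and `readSizePrefix` are hypothetical and do not appear in x265.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: decoded payload size and prefix length in bytes.
struct SizePrefix
{
    uint32_t    payloadSize;
    std::size_t headerBytes;
};

// Hypothetical helper mirroring the loop in the diff. It performs no bounds
// checking, so the caller must guarantee a terminating non-0xFF byte exists.
SizePrefix readSizePrefix(const uint8_t* buf)
{
    std::size_t i = 0;
    uint32_t size = 0;
    while (buf[i] == 0xFF)   // each 0xFF byte contributes 255 to the size
        size += buf[i++];
    size += buf[i++];        // the final byte contributes its own value
    return { size, i };
}
```

The payload bytes then start at `buf + headerBytes`, which corresponds to the `cim[pic_in->poc] + i` argument of the `memcpy` in the diff.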
x265_2.4.tar.gz/source/encoder/encoder.h -> x265_2.5.tar.gz/source/encoder/encoder.h
Changed
@@ -31,11 +31,9 @@ #include "x265.h" #include "nal.h" #include "framedata.h" - -#ifdef ENABLE_DYNAMIC_HDR10 - #include "dynamicHDR10\hdr10plus.h" +#ifdef ENABLE_HDR10_PLUS + #include "dynamicHDR10/hdr10plus.h" #endif - struct x265_encoder {}; namespace X265_NS { // private namespace @@ -178,8 +176,10 @@ int m_bToneMap; // Enables tone-mapping -#ifdef ENABLE_DYNAMIC_HDR10 +#ifdef ENABLE_HDR10_PLUS const hdr10plus_api *m_hdr10plus_api; + uint8_t **cim; + int numCimInfo; #endif x265_sei_payload m_prevTonemapPayload; @@ -187,7 +187,7 @@ Encoder(); ~Encoder() { -#ifdef ENABLE_DYNAMIC_HDR10 +#ifdef ENABLE_HDR10_PLUS if (m_prevTonemapPayload.payload != NULL) X265_FREE(m_prevTonemapPayload.payload); #endif @@ -201,6 +201,8 @@ int reconfigureParam(x265_param* encParam, x265_param* param); + void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc); + void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs); void fetchStats(x265_stats* stats, size_t statsSizeBytes); @@ -223,7 +225,7 @@ void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType); - void readAnalysisFile(x265_analysis_data* analysis, int poc); + void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn); void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData); void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
x265_2.4.tar.gz/source/encoder/entropy.cpp -> x265_2.5.tar.gz/source/encoder/entropy.cpp
Changed
@@ -700,7 +700,7 @@ // TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1 // We didn't support filter across slice board, so disable it now - if (g_maxSlices <= 1) + if (encData.m_param->maxSlices <= 1) { bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false; bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter; @@ -783,7 +783,7 @@ if (cuSplitFlag) codeSplitFlag(ctu, absPartIdx, depth); - if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth) + if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth) { uint32_t qNumParts = cuGeom.numPartitions >> 2; if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP) @@ -863,7 +863,7 @@ case SIZE_nRx2N: bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]); bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]); - if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3)) + if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3)) bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]); if (cu.m_slice->m_sps->maxAMPDepth > depth) { @@ -888,7 +888,7 @@ uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx; X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n"); - uint32_t granularityMask = g_maxCUSize - 1; + uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1; uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx]; uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize; uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize; @@ -902,7 +902,7 @@ { // Encode slice finish uint32_t bTerminateSlice = ctu.m_bLastCuInSlice; - if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress) + if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress) bTerminateSlice = 1; // The 1-terminating bit is added to all streams, so don't add it here when it's 1. 
@@ -1512,7 +1512,7 @@ if (cu.isIntra(absPartIdx)) { - if (depth == g_maxCUDepth) + if (depth == cu.m_encData->m_param->maxCUDepth) encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]); return; } @@ -1541,7 +1541,7 @@ case SIZE_nRx2N: encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]); encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]); - if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3)) + if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3)) encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]); if (cu.m_slice->m_sps->maxAMPDepth > depth) {
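Several hunks above replace the compile-time constant `NUM_4x4_PARTITIONS` with the per-instance `num4x4Partitions`, and the slice-end check shifts it right by `depth << 1`. A short sketch of what that shift computes, assuming each quadtree split divides a CU into four equal children. The helper name `partitionsAtDepth` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not an x265 function): number of 4x4 partitions
// covered by one CU at the given quadtree depth. Each split quarters the
// area, which is a right shift by 2 per depth level.
uint32_t partitionsAtDepth(uint32_t num4x4Partitions, uint32_t depth)
{
    return num4x4Partitions >> (depth << 1);
}
```

For a 64x64 CTU the total is 16 * 16 = 256 partitions, so a depth-1 CU (32x32) covers 64 of them and a depth-2 CU (16x16) covers 16.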
x265_2.4.tar.gz/source/encoder/frameencoder.cpp -> x265_2.5.tar.gz/source/encoder/frameencoder.cpp
Changed
@@ -124,7 +124,7 @@ range += !!(m_param->searchMethod < 2); /* diamond/hex range check lag */ range += NTAPS_LUMA / 2; /* subpel filter half-length */ range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */ - m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize); + m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize); // NOTE: 2 times of numRows because both Encoder and Filter in same queue if (!WaveFront::init(m_numRows * 2)) @@ -295,6 +295,11 @@ while (m_threadActive) { + if (m_param->bCTUInfo) + { + while (!m_frame->m_ctuInfo) + m_frame->m_copied.wait(); + } compressFrame(); m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */ m_enable.wait(); @@ -383,7 +388,7 @@ bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred; WeightParam* reuseWP = NULL; - if (m_param->analysisMode && (bUseWeightP || bUseWeightB)) + if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB)) reuseWP = (WeightParam*)m_frame->m_analysisData.wt; if (bUseWeightP || bUseWeightB) @@ -392,7 +397,7 @@ m_cuStats.countWeightAnalyze++; ScopedElapsedTime time(m_cuStats.weightAnalyzeTime); #endif - if (m_param->analysisMode == X265_ANALYSIS_LOAD) + if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD) { for (int list = 0; list < slice->isInterB() + 1; list++) { @@ -431,7 +436,7 @@ slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic; m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param); } - if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB)) + if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB)) { for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 
3 : 1); i++) *(reuseWP++) = slice->m_weightPredTable[l][0][i]; @@ -664,7 +669,7 @@ if (writeSei) { SEICreativeIntentMeta sei; - sei.cim = payload->payload; + sei.m_payload = payload->payload; m_bs.resetBits(); sei.setSize(payload->payloadSize); sei.write(m_bs, *slice->m_sps); @@ -832,7 +837,7 @@ } else if (m_param->decodedPictureHashSEI == 3) { - uint32_t cuHeight = g_maxCUSize; + uint32_t cuHeight = m_param->maxCUSize; m_checksum[0] = 0; @@ -872,43 +877,52 @@ m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount; m_frame->m_encData->m_frameStats.percent8x8Skip = (double)totalSkip / totalCuCount; } - for (uint32_t i = 0; i < m_numRows; i++) + + if (m_param->csvLogLevel >= 1) { - m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN; - m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu; - m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu; - m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion; - m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion; - m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy; - m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy; - m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t i = 0; i < m_numRows; i++) { - m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth]; - m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth]; - for (int m = 0; m < INTER_MODES; m++) - m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m]; + m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN; + m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu; + 
m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu; + m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion; + m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion; + m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy; + m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy; + m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy; + for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++) + { + m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth]; + m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth]; + for (int m = 0; m < INTER_MODES; m++) + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m]; + for (int n = 0; n < INTRA_MODES; n++) + m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n]; + } + } + m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu; + + for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++) + { + m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu; + m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu; for (int n = 0; n < INTRA_MODES; n++) - m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n]; + m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu; + uint64_t 
cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts + cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2]; + m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu; + m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu; + m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu; } } - m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu; - m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu; - m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu; - m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu; - m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu; - m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + + if (m_param->csvLogLevel >= 2) { - m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu; - m_frame->m_encData->m_frameStats.percentMergeCu[depth] = 
(double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu; - for (int n = 0; n < INTRA_MODES; n++) - m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu; - uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts - cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2]; - m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu; - m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu; - m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu; + m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu; + m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu; + m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu; + m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu; + m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu; } m_bs.resetBits(); @@ -1096,7 +1110,7 @@ /* Accumulate CU statistics from each worker thread, we could report * per-frame stats here, but 
currently we do not. */ for (int i = 0; i < numTLD; i++) - m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]); + m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param); #endif m_endFrameTime = x265_mdate(); @@ -1106,7 +1120,7 @@ { Slice* slice = m_frame->m_encData->m_slice; const uint32_t widthInLCUs = slice->m_sps->numCuInWidth; - const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS; + const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions; const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1; SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL; @@ -1208,7 +1222,6 @@ const uint32_t row = (uint32_t)intRow; CTURow& curRow = m_rows[row]; - tld.analysis.m_param = m_param; if (m_param->bEnableWavefront) { ScopedLock self(curRow.lock); @@ -1241,7 +1254,7 @@ uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16; uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16; - uint32_t noOfBlocks = g_maxCUSize / 16; + uint32_t noOfBlocks = m_param->maxCUSize / 16; const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0; const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 
1 : 0; const uint32_t sliceId = curRow.sliceId; @@ -1320,8 +1333,8 @@ // TODO: specially case handle on first and last row // Initialize restrict on MV range in slices - tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4; - tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4); + tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4; + tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4); // Handle single row slice if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY) @@ -1361,8 +1374,8 @@ cuStat.baseQp = curEncData.m_rowStat[row].rowQp; /* TODO: use defines from slicetype.h for lowres block size */
x265_2.4.tar.gz/source/encoder/framefilter.cpp -> x265_2.5.tar.gz/source/encoder/framefilter.cpp
Changed
@@ -35,107 +35,126 @@ static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height); static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt); -static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride) +namespace X265_NS { - int32_t v = pix[0] + pix[1] + pix[2] + pix[3]; - for (int16_t x = 0; x < stride - 4; x++) + static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 4] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3]; + for (int16_t x = 0; x < stride - 4; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 4] - pix[x]; + } } -} -static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride) -{ - int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7]; - for (int16_t x = 0; x < stride - 8; x++) + static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 8] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7]; + for (int16_t x = 0; x < stride - 8; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 8] - pix[x]; + } } -} -static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride) -{ - int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + - pix[8] + pix[9] + pix[10] + pix[11]; - for (int16_t x = 0; x < stride - 12; x++) + static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 12] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + + pix[8] + pix[9] + pix[10] + pix[11]; + for (int16_t x = 0; x < stride - 12; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 12] - pix[x]; + } } -} -static void integral_init16h(uint32_t *sum, pixel *pix, 
intptr_t stride) -{ - int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + - pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15]; - for (int16_t x = 0; x < stride - 16; x++) + static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 16] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + + pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15]; + for (int16_t x = 0; x < stride - 16; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 16] - pix[x]; + } } -} -static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride) -{ - int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + - pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] + - pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23]; - for (int16_t x = 0; x < stride - 24; x++) + static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 24] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + + pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] + + pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23]; + for (int16_t x = 0; x < stride - 24; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 24] - pix[x]; + } } -} -static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride) -{ - int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + - pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] + - pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] + - pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31]; - for (int16_t x = 0; x < stride - 32; x++) + static void 
integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride) { - sum[x] = v + sum[x - stride]; - v += pix[x + 32] - pix[x]; + int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] + + pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] + + pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] + + pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31]; + for (int16_t x = 0; x < stride - 32; x++) + { + sum[x] = v + sum[x - stride]; + v += pix[x + 32] - pix[x]; + } } -} -static void integral_init4v(uint32_t *sum4, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum4[x] = sum4[x + 4 * stride] - sum4[x]; -} + static void integral_init4v_c(uint32_t *sum4, intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum4[x] = sum4[x + 4 * stride] - sum4[x]; + } -static void integral_init8v(uint32_t *sum8, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum8[x] = sum8[x + 8 * stride] - sum8[x]; -} + static void integral_init8v_c(uint32_t *sum8, intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum8[x] = sum8[x + 8 * stride] - sum8[x]; + } -static void integral_init12v(uint32_t *sum12, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum12[x] = sum12[x + 12 * stride] - sum12[x]; -} + static void integral_init12v_c(uint32_t *sum12, intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum12[x] = sum12[x + 12 * stride] - sum12[x]; + } -static void integral_init16v(uint32_t *sum16, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum16[x] = sum16[x + 16 * stride] - sum16[x]; -} + static void integral_init16v_c(uint32_t *sum16, intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum16[x] = sum16[x + 16 * stride] - sum16[x]; + } -static void integral_init24v(uint32_t *sum24, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum24[x] = sum24[x + 24 * stride] - sum24[x]; -} + static void integral_init24v_c(uint32_t *sum24, 
intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum24[x] = sum24[x + 24 * stride] - sum24[x]; + } -static void integral_init32v(uint32_t *sum32, intptr_t stride) -{ - for (int x = 0; x < stride; x++) - sum32[x] = sum32[x + 32 * stride] - sum32[x]; + static void integral_init32v_c(uint32_t *sum32, intptr_t stride) + { + for (int x = 0; x < stride; x++) + sum32[x] = sum32[x + 32 * stride] - sum32[x]; + } + + void setupSeaIntegralPrimitives_c(EncoderPrimitives &p) + { + p.integral_initv[INTEGRAL_4] = integral_init4v_c; + p.integral_initv[INTEGRAL_8] = integral_init8v_c; + p.integral_initv[INTEGRAL_12] = integral_init12v_c; + p.integral_initv[INTEGRAL_16] = integral_init16v_c; + p.integral_initv[INTEGRAL_24] = integral_init24v_c; + p.integral_initv[INTEGRAL_32] = integral_init32v_c; + p.integral_inith[INTEGRAL_4] = integral_init4h_c; + p.integral_inith[INTEGRAL_8] = integral_init8h_c; + p.integral_inith[INTEGRAL_12] = integral_init12h_c; + p.integral_inith[INTEGRAL_16] = integral_init16h_c; + p.integral_inith[INTEGRAL_24] = integral_init24h_c; + p.integral_inith[INTEGRAL_32] = integral_init32h_c;
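The `integral_initNh_c` functions above keep a running sum `v` over a window of N pixels and update it by adding the pixel that enters the window and subtracting the one that leaves. The sketch below shows only that sliding-window step on a single row. It is a simplification: the x265 functions also add the value from the row above (`sum[x - stride]`), which is omitted here. The function name `slidingRowSums` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified illustration of the running-sum update used by
// the integral_initNh_c functions. Returns the sum of every `window`-wide
// run of pixels in one row. Unlike the x265 code, it does not accumulate
// the previous row's sums.
std::vector<uint32_t> slidingRowSums(const std::vector<uint8_t>& pix, std::size_t window)
{
    std::vector<uint32_t> sums;
    if (window == 0 || pix.size() < window)
        return sums;

    uint32_t v = 0;
    for (std::size_t k = 0; k < window; k++)   // sum of the first window
        v += pix[k];
    sums.push_back(v);

    for (std::size_t x = 0; x + window < pix.size(); x++)
    {
        v += pix[x + window];   // pixel entering the window
        v -= pix[x];            // pixel leaving the window
        sums.push_back(v);
    }
    return sums;
}
```

Each step costs one addition and one subtraction regardless of the window width, which is why the same pattern is used for every size from 4 to 32.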
x265_2.4.tar.gz/source/encoder/framefilter.h -> x265_2.5.tar.gz/source/encoder/framefilter.h
Changed
@@ -123,7 +123,7 @@ uint32_t getCUWidth(int colNum) const { - return (colNum == (int)m_numCols - 1) ? m_lastWidth : g_maxCUSize; + return (colNum == (int)m_numCols - 1) ? m_lastWidth : m_param->maxCUSize; } void init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t numCols);
x265_2.4.tar.gz/source/encoder/motion.cpp -> x265_2.5.tar.gz/source/encoder/motion.cpp
Changed
@@ -598,6 +598,139 @@ } } +void MotionEstimate::refineMV(ReferencePlanes* ref, + const MV& mvmin, + const MV& mvmax, + const MV& qmvp, + MV& outQMv) +{ + ALIGN_VAR_16(int, costs[16]); + if (ctuAddr >= 0) + blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0); + intptr_t stride = ref->lumaStride; + pixel* fenc = fencPUYuv.m_buf[0]; + pixel* fref = ref->fpelPlane[0] + blockOffset; + + setMVP(qmvp); + + MV qmvmin = mvmin.toQPel(); + MV qmvmax = mvmax.toQPel(); + + /* The term cost used here means satd/sad values for that particular search. + * The costs used in ME integer search only includes the SAD cost of motion + * residual and sqrtLambda times MVD bits. The subpel refine steps use SATD + * cost of residual and sqrtLambda * MVD bits. + */ + + // measure SATD cost at clipped QPEL MVP + MV pmv = qmvp.clipped(qmvmin, qmvmax); + MV bestpre = pmv; + int bprecost; + + bprecost = subpelCompare(ref, pmv, sad); + + /* re-measure full pel rounded MVP with SAD as search start point */ + MV bmv = pmv.roundToFPel(); + int bcost = bprecost; + if (pmv.isSubpel()) + bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2); + + /* square refine */ + int dir = 0; + COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs); + if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[0], dir, 1); + if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[1], dir, 2); + COPY2_IF_LT(bcost, costs[2], dir, 3); + COPY2_IF_LT(bcost, costs[3], dir, 4); + COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs); + if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[0], dir, 5); + if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[1], dir, 6); + if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[2], dir, 7); + if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y)) + COPY2_IF_LT(bcost, costs[3], dir, 8); + bmv 
+= square1[dir]; + + if (bprecost < bcost) + { + bmv = bestpre; + bcost = bprecost; + } + else + bmv = bmv.toQPel(); // promote search bmv to qpel + + // TO DO: Change SubpelWorkload to fine tune MV + // Now it is set to 5 for experiment. + // const SubpelWorkload& wl = workload[this->subpelRefine]; + const SubpelWorkload& wl = workload[5]; + + pixelcmp_t hpelcomp; + + if (wl.hpel_satd) + { + bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv); + hpelcomp = satd; + } + else + hpelcomp = sad; + + for (int iter = 0; iter < wl.hpel_iters; iter++) + { + int bdir = 0; + for (int i = 1; i <= wl.hpel_dirs; i++) + { + MV qmv = bmv + square1[i] * 2; + + // check mv range for slice bound + if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y)) + continue; + + int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv); + COPY2_IF_LT(bcost, cost, bdir, i); + } + + if (bdir) + bmv += square1[bdir] * 2; + else + break; + } + + /* if HPEL search used SAD, remeasure with SATD before QPEL */ + if (!wl.hpel_satd) + bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv); + + for (int iter = 0; iter < wl.qpel_iters; iter++) + { + int bdir = 0; + for (int i = 1; i <= wl.qpel_dirs; i++) + { + MV qmv = bmv + square1[i]; + + // check mv range for slice bound + if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y)) + continue; + + int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv); + COPY2_IF_LT(bcost, cost, bdir, i); + } + + if (bdir) + bmv += square1[bdir]; + else + break; + } + + // check mv range for slice bound + X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!"); + + x265_emms(); + outQMv = bmv; +} + int MotionEstimate::motionEstimate(ReferencePlanes *ref, const MV & mvmin, const MV & mvmax, @@ -606,6 +739,7 @@ const MV * mvc, int merange, MV & outQMv, + uint32_t maxSlices, pixel * srcReferencePlane) { ALIGN_VAR_16(int, costs[16]); @@ -1306,7 +1440,7 @@ const SubpelWorkload& wl = workload[this->subpelRefine]; // check mv range for slice bound - if ((g_maxSlices > 1) & ((bmv.y < 
qmvmin.y) | (bmv.y > qmvmax.y))) + if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y))) { bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y); bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
x265_2.4.tar.gz/source/encoder/motion.h -> x265_2.5.tar.gz/source/encoder/motion.h
Changed
@@ -92,7 +92,8 @@
         chromaSatd(refYuv.getCrAddr(puPartIdx), refYuv.m_csize, fencPUYuv.m_buf[2], fencPUYuv.m_csize);
     }
-    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, pixel *srcReferencePlane = 0);
+    void refineMV(ReferencePlanes* ref, const MV& mvmin, const MV& mvmax, const MV& qmvp, MV& outQMv);
+    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, uint32_t maxSlices, pixel *srcReferencePlane = 0);
     int subpelCompare(ReferencePlanes* ref, const MV &qmv, pixelcmp_t);
x265_2.4.tar.gz/source/encoder/ratecontrol.cpp -> x265_2.5.tar.gz/source/encoder/ratecontrol.cpp
Changed
@@ -2272,7 +2272,7 @@
     uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
     double refQScale = 0;
-    if (picType != I_SLICE)
+    if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
     {
         FrameData& refEncData = *refFrame->m_encData;
         uint32_t endCuAddr = maxCols * (row + 1);
@@ -2301,7 +2301,8 @@
             && refFrame
             && refFrame->m_encData->m_slice->m_sliceType == picType
             && refQScale > 0
-            && refRowSatdCost > 0)
+            && refRowBits > 0
+            && !m_param->rc.bEnableConstVbv)
         {
             if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
             {
@@ -2343,7 +2344,7 @@
     }
     rowSatdCost >>= X265_DEPTH - 8;
     updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
-    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
     {
         Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
         if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)
@@ -2613,7 +2614,7 @@
             for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
                 avgQpAq += curEncData.m_rowStat[i].sumQpAq;
-            avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
+            avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
             curEncData.m_avgQpAq = avgQpAq;
         }
         else
@@ -2711,6 +2712,13 @@
     {
         *filler = updateVbv(actualBits, rce);
+        curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
+        for (int i = 0; i < 4; i++)
+        {
+            curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
+            curFrame->m_rcData->count[i] = m_pred[i].count;
+            curFrame->m_rcData->offset[i] = m_pred[i].offset;
+        }
         if (m_param->bEmitHRDSEI)
         {
             const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
x265_2.4.tar.gz/source/encoder/reference.cpp -> x265_2.5.tar.gz/source/encoder/reference.cpp
Changed
@@ -72,12 +72,12 @@
     if (wp)
     {
-        uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
         int marginX = reconPic->m_lumaMarginX;
         int marginY = reconPic->m_lumaMarginY;
         intptr_t stride = reconPic->m_stride;
-        int cuHeight = g_maxCUSize;
+        int cuHeight = p.maxCUSize;
         for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
         {
@@ -127,15 +127,15 @@
     int marginY = reconPic->m_lumaMarginY;
     intptr_t stride = reconPic->m_stride;
     int width = reconPic->m_picWidth;
-    int height = (finishedRows - numWeightedRows) * g_maxCUSize;
+    int height = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
     /* the last row may be partial height */
     if (finishedRows == maxNumRows - 1)
     {
-        const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
+        const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
-        height += leftRows ? leftRows : g_maxCUSize;
+        height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
     }
-    int cuHeight = g_maxCUSize;
+    int cuHeight = reconPic->m_param->maxCUSize;
     for (int c = 0; c < numInterpPlanes; c++)
     {
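Reviewer note: both hunks in reference.cpp derive row counts and the partial last-row height from the CTU size, which now comes from the per-instance parameters instead of the `g_maxCUSize` global. A small sketch of the two expressions, assuming a power-of-two CTU size; `ctuRows` and `lastRowHeight` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of CTU rows needed to cover picHeight (ceiling division).
uint32_t ctuRows(uint32_t picHeight, uint32_t maxCUSize)
{
    assert(maxCUSize != 0);
    return (picHeight + maxCUSize - 1) / maxCUSize;
}

// Height in pixels of the last CTU row. The mask trick is only valid
// when maxCUSize is a power of two, which this sketch asserts.
uint32_t lastRowHeight(uint32_t picHeight, uint32_t maxCUSize)
{
    assert(maxCUSize != 0 && (maxCUSize & (maxCUSize - 1)) == 0);
    const uint32_t leftRows = picHeight & (maxCUSize - 1);
    return leftRows ? leftRows : maxCUSize;
}
```

Two encoder instances configured with different CTU sizes get different answers for the same picture height, which is why a single process-wide value cannot serve both.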
x265_2.4.tar.gz/source/encoder/sao.cpp -> x265_2.5.tar.gz/source/encoder/sao.cpp
Changed
@@ -98,8 +98,8 @@ m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp); m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp); - m_numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize; - m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize; + m_numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize; + m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize; const pixel maxY = (1 << X265_DEPTH) - 1; const pixel rangeExt = maxY >> 1; @@ -107,12 +107,12 @@ for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++) { - CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1); - CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1); + CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1); + CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1); // SAO asm code will read 1 pixel before and after, so pad by 2 // NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here - CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32); + CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32); m_tmpU[i] += 1; } @@ -279,8 +279,8 @@ uint32_t picWidth = m_param->sourceWidth; uint32_t picHeight = m_param->sourceHeight; const CUData* cu = m_frame->m_encData->getPicCTU(addr); - int ctuWidth = g_maxCUSize; - int ctuHeight = g_maxCUSize; + int ctuWidth = m_param->maxCUSize; + int ctuHeight = m_param->maxCUSize; uint32_t lpelx = cu->m_cuPelX; uint32_t tpely = cu->m_cuPelY; const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice; @@ -573,8 +573,8 @@ { PicYuv* reconPic = m_frame->m_reconPic; intptr_t stride = reconPic->m_stride; - int ctuWidth = g_maxCUSize; - int ctuHeight = g_maxCUSize; + int ctuWidth = m_param->maxCUSize; + int ctuHeight = m_param->maxCUSize; int addr = idxY * m_numCuInWidth + idxX; pixel* rec = reconPic->getLumaAddr(addr); @@ -633,8 +633,8 @@ { PicYuv* reconPic = 
m_frame->m_reconPic; intptr_t stride = reconPic->m_strideC; - int ctuWidth = g_maxCUSize; - int ctuHeight = g_maxCUSize; + int ctuWidth = m_param->maxCUSize; + int ctuHeight = m_param->maxCUSize; { ctuWidth >>= m_hChromaShift; @@ -744,8 +744,8 @@ intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride; uint32_t picWidth = m_param->sourceWidth; uint32_t picHeight = m_param->sourceHeight; - int ctuWidth = g_maxCUSize; - int ctuHeight = g_maxCUSize; + int ctuWidth = m_param->maxCUSize; + int ctuHeight = m_param->maxCUSize; uint32_t lpelx = cu->m_cuPelX; uint32_t tpely = cu->m_cuPelY; const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice; @@ -791,9 +791,9 @@ // WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n"); if (plane) - primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride); + primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride); else - primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride); + primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride); } else { @@ -928,8 +928,8 @@ intptr_t stride = reconPic->m_stride; uint32_t picWidth = m_param->sourceWidth; uint32_t picHeight = m_param->sourceHeight; - int ctuWidth = g_maxCUSize; - int ctuHeight = g_maxCUSize; + int ctuWidth = m_param->maxCUSize; + int ctuHeight = m_param->maxCUSize; uint32_t lpelx = cu->m_cuPelX; uint32_t tpely = cu->m_cuPelY; const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice; @@ -1553,14 +1553,17 @@ } // Estimate Best Position - int64_t bestRDCostBO = MAX_INT64; int32_t bestClassBO = 0; + int64_t currentRDCost = costClasses[0]; + currentRDCost += costClasses[1]; + currentRDCost += costClasses[2]; + currentRDCost += costClasses[3]; + 
int64_t bestRDCostBO = currentRDCost; - for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++) + for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++) { - int64_t currentRDCost = 0; - for (int j = i; j < i + SAO_NUM_OFFSET; j++) - currentRDCost += costClasses[j]; + currentRDCost -= costClasses[i - 1]; + currentRDCost += costClasses[i + 3]; if (currentRDCost < bestRDCostBO) {
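Reviewer note: the last sao.cpp hunk replaces a nested loop, which re-added `SAO_NUM_OFFSET` class costs for every candidate start band, with a running sum that drops one class and adds one class per step. A generic sketch of that sliding-window search, assuming the caller passes the window length explicitly; `bestBandStart` is an illustrative name, not an x265 function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the start index of the `window` consecutive classes with the smallest total cost.
// Ties keep the earliest start because the comparison is strictly less-than.
int bestBandStart(const std::vector<int64_t>& costClasses, int window)
{
    assert(window > 0 && costClasses.size() >= static_cast<std::size_t>(window));

    // Cost of the first window, computed once.
    int64_t current = 0;
    for (int j = 0; j < window; j++)
        current += costClasses[j];

    int64_t best = current;
    int bestStart = 0;
    const int last = static_cast<int>(costClasses.size()) - window;
    for (int i = 1; i <= last; i++)
    {
        current -= costClasses[i - 1];          // class leaving the window
        current += costClasses[i + window - 1]; // class entering the window
        if (current < best)
        {
            best = current;
            bestStart = i;
        }
    }
    return bestStart;
}
```

The result matches the nested-loop version while doing two additions per candidate instead of `window` additions.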
x265_2.4.tar.gz/source/encoder/search.cpp -> x265_2.5.tar.gz/source/encoder/search.cpp
Changed
@@ -120,8 +120,8 @@ CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2); m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL; m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC; - ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp); - ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp); + ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp); + ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp); } } else @@ -130,15 +130,15 @@ { CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL); m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL; - ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp); - ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp); + ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp); + ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp); } } /* the rest of these buffers are indexed per-depth */ - for (uint32_t i = 0; i <= g_maxCUDepth; i++) + for (uint32_t i = 0; i <= m_param->maxCUDepth; i++) { - int cuSize = g_maxCUSize >> i; + int cuSize = param.maxCUSize >> i; ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp); ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp); ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp); @@ -186,7 +186,7 @@ m_rqt[i].resiQtYuv.destroy(); } - for (uint32_t i = 0; i <= g_maxCUDepth; i++) + for (uint32_t i = 0; i <= m_param->maxCUDepth; i++) { m_rqt[i].tmpResiYuv.destroy(); m_rqt[i].tmpPredYuv.destroy(); @@ -2073,7 +2073,7 @@ int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref); MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx]; - if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */ + if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */ { MV lmv = getLowresMV(interMode.cu, pu, list, ref); if (lmv.notZero()) @@ -2082,7 +2082,7 @@ setSearchRange(interMode.cu, 
mvp, m_param->searchRange, mvmin, mvmax); - int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, + int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0); /* Get total cost of partition, but only include MV bit cost once */ @@ -2108,6 +2108,17 @@ } } +void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv) +{ + CUData& cu = interMode.cu; + const Slice *slice = m_slice; + MV mv = cu.m_mv[list][pu.puAbsPartIdx]; + cu.clipMv(mv); + MV mvmin, mvmax; + setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax); + m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv); +} + /* find the best inter prediction for each PU of specified mode */ void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2]) { @@ -2150,7 +2161,7 @@ cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours); /* Uni-directional prediction */ - if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1) + if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10) || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)) { for (int list = 0; list < numPredDir; list++) @@ -2180,7 +2191,7 @@ if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx) mvpIn = bestME[list].mv; - int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, + int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, m_param->bSourceReferenceEstimation ? 
m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0); /* Get total cost of partition, but only include MV bit cost once */ @@ -2286,7 +2297,7 @@ int mvpIdx = selectMVP(cu, pu, amvp, list, ref); MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx]; - if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */ + if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */ { MV lmv = getLowresMV(cu, pu, list, ref); if (lmv.notZero()) @@ -2300,7 +2311,7 @@ m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride; } setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax); - int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, + int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0); /* Get total cost of partition, but only include MV bit cost once */ @@ -2582,11 +2593,11 @@ cu.clipMv(mvmax); if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE && - cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol && + cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol && m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth) { int safeX, maxSafeMv; - safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3; + safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3; maxSafeMv = (safeX - cu.m_cuPelX) * 4; mvmax.x = X265_MIN(mvmax.x, maxSafeMv); mvmin.x = X265_MIN(mvmin.x, maxSafeMv);
x265_2.4.tar.gz/source/encoder/search.h -> x265_2.5.tar.gz/source/encoder/search.h
Changed
@@ -204,9 +204,9 @@
         memset(this, 0, sizeof(*this));
     }
-    void accumulate(CUStats& other)
+    void accumulate(CUStats& other, x265_param& param)
     {
-        for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+        for (uint32_t i = 0; i <= param.maxCUDepth; i++)
         {
             intraRDOElapsedTime[i] += other.intraRDOElapsedTime[i];
             interRDOElapsedTime[i] += other.interRDOElapsedTime[i];
@@ -311,6 +311,7 @@
     // estimation inter prediction (non-skip)
     void predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t masks[2]);
+    void searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv);
     // encode residual and compute rd-cost for inter mode
     void encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
     void encodeResAndCalcRdSkipCU(Mode& interMode);
x265_2.4.tar.gz/source/encoder/sei.cpp -> x265_2.5.tar.gz/source/encoder/sei.cpp
Changed
@@ -54,21 +54,23 @@
     }
     WRITE_CODE(type, 8, "payload_type");
     uint32_t payloadSize;
-    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
     {
         if (hrdTypes)
         {
             X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
             payloadSize = count.getNumberOfWrittenBits() >> 3;
         }
-        else
+        else if (m_payloadType == USER_DATA_UNREGISTERED)
             payloadSize = m_payloadSize + 16;
+        else
+            payloadSize = m_payloadSize;
         for (; payloadSize >= 0xff; payloadSize -= 0xff)
             WRITE_CODE(0xff, 8, "payload_size");
         WRITE_CODE(payloadSize, 8, "payload_size");
     }
-    else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+    else
         WRITE_CODE(m_payloadSize, 8, "payload_size");
     /* virtual writeSEI method, write to bs */
     writeSEI(sps);
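Reviewer note: with this hunk, `USER_DATA_REGISTERED_ITU_T_T35` payloads go through the same size-writing loop as the other types: one `0xff` byte per full 255 of size, followed by a final byte holding the remainder. A standalone sketch of that byte pattern and its inverse, assuming a plain byte vector in place of the bitstream writer; `encodePayloadSize` and `decodePayloadSize` are illustrative names, not x265 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emits the size bytes in the same order as the loop in the hunk:
// 0xff while at least 255 remains, then the remainder (0..254).
std::vector<uint8_t> encodePayloadSize(uint32_t payloadSize)
{
    std::vector<uint8_t> out;
    for (; payloadSize >= 0xff; payloadSize -= 0xff)
        out.push_back(0xff);
    out.push_back(static_cast<uint8_t>(payloadSize));
    return out;
}

// Inverse of the above: add 255 per leading 0xff byte, then add the final byte.
uint32_t decodePayloadSize(const std::vector<uint8_t>& bytes)
{
    assert(!bytes.empty());
    uint32_t size = 0;
    std::size_t i = 0;
    while (i < bytes.size() && bytes[i] == 0xff)
    {
        size += 0xff;
        i++;
    }
    if (i < bytes.size())
        size += bytes[i];
    return size;
}
```

A size that is an exact multiple of 255 still ends with a zero byte, so the decoder always knows where the size field stops.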
x265_2.4.tar.gz/source/encoder/sei.h -> x265_2.5.tar.gz/source/encoder/sei.h
Changed
@@ -276,27 +276,17 @@
         m_payloadSize = 0;
     }
-    uint8_t *cim;
+    uint8_t *m_payload;
     // daniel.vt@samsung.com :: for the Creative Intent Meta Data Encoding ( seongnam.oh@samsung.com )
     void writeSEI(const SPS&)
     {
-        if (!cim)
+        if (!m_payload)
             return;
-        int i = 0;
-        int payloadSize = m_payloadSize;
-        while (cim[i] == 0xFF)
-        {
-            i++;
-            payloadSize += cim[i];
-            WRITE_CODE(0xFF, 8, "payload_size");
-        }
-        WRITE_CODE(payloadSize, 8, "payload_size");
-        i++;
-        payloadSize += i;
-        for (; i < payloadSize; ++i)
-            WRITE_CODE(cim[i], 8, "creative_intent_metadata");
+        uint32_t i = 0;
+        for (; i < m_payloadSize; ++i)
+            WRITE_CODE(m_payload[i], 8, "creative_intent_metadata");
     }
 };
 }
x265_2.4.tar.gz/source/encoder/slicetype.cpp -> x265_2.5.tar.gz/source/encoder/slicetype.cpp
Changed
@@ -893,7 +893,7 @@ if (m_param->rc.cuTree && !m_param->rc.bStatRead) /* update row satds based on cutree offsets */ curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b); - else if (m_param->analysisMode != X265_ANALYSIS_LOAD) + else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor) { if (m_param->rc.aqMode) curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b]; @@ -907,7 +907,7 @@ curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b]; uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0; uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE); - uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize; + uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize; uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height; double *qp_offset = 0; /* Factor in qpoffsets based on Aq/Cutree in CU costs */ @@ -1638,6 +1638,13 @@ m_isSceneTransition = false; /* Signal end of scene transitioning */ } + if (m_param->csvLogLevel >= 2) + { + int64_t icost = frames[p1]->costEst[0][0]; + int64_t pcost = frames[p1]->costEst[p1 - p0][0]; + frames[p1]->ipCostRatio = (double)icost / pcost; + } + /* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false, the former for I decisions and the latter for P/B decisions. 
It's possible that the first analysis detected scenecuts which were later nulled due to scene transitioning, in which @@ -1812,7 +1819,8 @@ MV *mvs = frames[b]->lowresMvs[list][listDist[list]]; int32_t x = mvs[cuIndex].x; int32_t y = mvs[cuIndex].y; - displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2)); + // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits + displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y))); } else displacement += 0.0; @@ -2400,7 +2408,7 @@ /* ME will never return a cost larger than the cost @MVP, so we do not * have to check that ME cost is more than the estimated merge cost */ - fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV); + fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices); if (skipCost < 64 && skipCost < fencCost && bBidir) { fencCost = skipCost;
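Reviewer note: the displacement hunk in slicetype.cpp swaps `pow(abs(x), 2)` for an integer multiply that is converted to `double` afterwards. Per the added comment, the magnitudes fit in 15 bits, so each square stays below 2^30 and cannot overflow a 32-bit `int`. A minimal sketch of the same expression, with that range stated as an assertion; `mvMagnitude` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Euclidean length of a motion vector whose components fit in 15 bits of magnitude.
double mvMagnitude(int32_t x, int32_t y)
{
    // Assumption taken from the comment in the diff: |x| and |y| are below 2^15.
    assert(std::abs(x) < (1 << 15) && std::abs(y) < (1 << 15));
    return std::sqrt((double)(std::abs(x) * std::abs(x)) + (double)(std::abs(y) * std::abs(y)));
}
```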
x265_2.4.tar.gz/source/test/ipfilterharness.cpp -> x265_2.5.tar.gz/source/test/ipfilterharness.cpp
Changed
@@ -38,10 +38,8 @@
     {
         pixel_test_buff[0][i] = rand() & PIXEL_MAX;
         short_test_buff[0][i] = (rand() % (2 * SMAX)) - SMAX;
-
         pixel_test_buff[1][i] = PIXEL_MIN;
-        short_test_buff[1][i] = SMIN;
-
+        short_test_buff[1][i] = (int16_t)SMIN;
         pixel_test_buff[2][i] = PIXEL_MAX;
         short_test_buff[2][i] = SMAX;
     }
x265_2.4.tar.gz/source/test/ipfilterharness.h -> x265_2.5.tar.gz/source/test/ipfilterharness.h
Changed
@@ -39,8 +39,7 @@
     enum { ITERS = 100 };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
-
+    enum { SMIN = (unsigned)-1 << 12 };
     ALIGN_VAR_32(pixel, pixel_buff[TEST_BUF_SIZE]);
     int16_t short_buff[TEST_BUF_SIZE];
     int16_t IPF_vec_output_s[TEST_BUF_SIZE];
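Reviewer note: left-shifting a negative signed value, as in the old `-1 << 12`, is undefined behaviour in C++ before C++20, and compilers tend to warn about it. The new form shifts an unsigned value instead, and the use sites in the harness .cpp files add an `(int16_t)` cast to get the intended negative number back. A small sketch of what the cast yields, assuming a 32-bit `unsigned` and a two's-complement target; `sminAsInt16` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

enum { SMAX = 1 << 12 };
enum { SMIN = (unsigned)-1 << 12 };   // 0xFFFFF000 when unsigned is 32 bits wide

// Narrowing to int16_t keeps the low 16 bits (0xF000), which reads back as -4096.
int16_t sminAsInt16()
{
    // Assumption of this sketch: unsigned is 32 bits, so the enumerator is 0xFFFFF000.
    assert(static_cast<uint32_t>(SMIN) == 0xFFFFF000u);
    return (int16_t)SMIN;
}
```

Writing the constant as `-(1 << 12)` would also avoid shifting a negative value and would not need the cast; the diff keeps the shift form and casts where the value is stored.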
x265_2.4.tar.gz/source/test/pixelharness.cpp -> x265_2.5.tar.gz/source/test/pixelharness.cpp
Changed
@@ -44,9 +44,8 @@ uchar_test_buff[0][i] = rand() % ((1 << 8) - 1); residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only double_test_buff[0][i] = (double)(short_test_buff[0][i]) / 256.0; - pixel_test_buff[1][i] = PIXEL_MIN; - short_test_buff[1][i] = SMIN; + short_test_buff[1][i] = (int16_t)SMIN; short_test_buff1[1][i] = PIXEL_MIN; short_test_buff2[1][i] = -16384; int_test_buff[1][i] = SHORT_MIN; @@ -2003,6 +2002,76 @@ return true; } +bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt) +{ + intptr_t srcStep = 64; + int j = 0; + uint32_t dst_ref[BUFFSIZE] = { 0 }; + uint32_t dst_opt[BUFFSIZE] = { 0 }; + + for (int i = 0; i < 64; i++) + { + dst_ref[i] = pixel_test_buff[0][i]; + dst_opt[i] = pixel_test_buff[0][i]; + } + + for (int i = 0, k = 0; i < BUFFSIZE; i++) + { + if (i % 64 == 0) + k++; + dst_ref[i] = dst_ref[i % 64] + k; + dst_opt[i] = dst_opt[i % 64] + k; + } + + int padx = 4; + int pady = 4; + uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx; + uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx; + for (int i = 0; i < ITERS; i++) + { + ref(dst_ref_ptr, srcStep); + checked(opt, dst_opt_ptr, srcStep); + + if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE)) + return false; + + reportfail() + j += INCR; + } + return true; +} + +bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt) +{ + /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need + * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth + * where data movement in AVX2 is 8 elements at a time */ + intptr_t srcStep[2] = { 56, 64 }; + int j = 0; + uint32_t dst_ref[BUFFSIZE] = { 0 }; + uint32_t dst_opt[BUFFSIZE] = { 0 }; + + int padx = 4; + int pady = 4; + for (int l = 0; l < 2; l++) + { + uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx; + uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * 
pady + padx; + for (int k = 0; k < ITERS; k++) + { + ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]); + checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]); + + if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE)) + return false; + + reportfail() + j += INCR; + } + } + return true; +} + bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt) { if (opt.pu[part].satd) @@ -2688,6 +2757,64 @@ } } + for (int k = 0; k < NUM_INTEGRAL_SIZE; k++) + { + if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k])) + { + switch (k) + { + case 0: + printf("Integral4v failed!\n"); + break; + case 1: + printf("Integral8v failed!\n"); + break; + case 2: + printf("Integral12v failed!\n"); + break; + case 3: + printf("Integral16v failed!\n"); + break; + case 4: + printf("Integral24v failed!\n"); + break; + case 5: + printf("Integral32v failed!\n"); + break; + } + return false; + } + } + + + for (int k = 0; k < NUM_INTEGRAL_SIZE; k++) + { + if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k])) + { + switch (k) + { + case 0: + printf("Integral4h failed!\n"); + break; + case 1: + printf("Integral8h failed!\n"); + break; + case 2: + printf("Integral12h failed!\n"); + break; + case 3: + printf("Integral16h failed!\n"); + break; + case 4: + printf("Integral24h failed!\n"); + break; + case 5: + printf("Integral32h failed!\n"); + break; + } + return false; + } + } return true; } @@ -3210,4 +3337,67 @@ HEADER0("pelFilterChroma_Horizontal"); REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ); } + + for (int k = 0; k < NUM_INTEGRAL_SIZE; k++) + { + if (opt.integral_initv[k]) + { + switch (k) + { + case 0: + HEADER0("integral_init4v"); + break; + case 1: + HEADER0("integral_init8v"); + break; + case 2: + HEADER0("integral_init12v"); + break; + case 3: + HEADER0("integral_init16v"); + break; + case 4: + 
HEADER0("integral_init24v"); + break; + case 5: + HEADER0("integral_init32v"); + break; + default: + break; + } + REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE); + } + } + + for (int k = 0; k < NUM_INTEGRAL_SIZE; k++) + { + if (opt.integral_inith[k]) + { + uint32_t dst_buf[BUFFSIZE] = { 0 }; + switch (k) + { + case 0: + HEADER0("integral_init4h"); + break; + case 1:
x265_2.4.tar.gz/source/test/pixelharness.h -> x265_2.5.tar.gz/source/test/pixelharness.h
Changed
@@ -40,7 +40,7 @@
     enum { BUFFSIZE = STRIDE * (MAX_HEIGHT + PAD_ROWS) + INCR * ITERS };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
+    enum { SMIN = (unsigned)-1 << 12 };
     enum { RMAX = PIXEL_MAX - PIXEL_MIN }; //The maximum value obtained by subtracting pixel values (residual max)
     enum { RMIN = PIXEL_MIN - PIXEL_MAX }; //The minimum value obtained by subtracting pixel values (residual min)
@@ -126,6 +126,8 @@
     bool check_pelFilterLumaStrong_H(pelFilterLumaStrong_t ref, pelFilterLumaStrong_t opt);
     bool check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChroma_t opt);
     bool check_pelFilterChroma_H(pelFilterChroma_t ref, pelFilterChroma_t opt);
+    bool check_integral_initv(integralv_t ref, integralv_t opt);
+    bool check_integral_inith(integralh_t ref, integralh_t opt);
 public:
x265_2.4.tar.gz/source/test/regression-tests.txt -> x265_2.5.tar.gz/source/test/regression-tests.txt
Changed
@@ -17,17 +17,17 @@
 BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
 BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
 Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
 Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
 Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
 CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
@@ -51,7 +51,7 @@
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -68,8 +68,8 @@
 KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
@@ -123,7 +123,7 @@
 old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
 old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
 old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
 old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
 old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
 old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless
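The regression-test changes in this file are a mechanical rename of three command-line options (--analysis-mode, --analysis-file and --refine-level become --analysis-reuse-mode, --analysis-reuse-file and --analysis-reuse-level, as the x265cli.h hunk in this revision also shows). A minimal sketch of how a script holding 2.4-era command lines might be migrated follows. The helper name migrateCommandLine and the table kRenamedOptions are hypothetical and not part of x265; the sketch assumes options are separated by whitespace and does not try to parse quoted arguments.

```cpp
#include <map>
#include <sstream>
#include <string>

// Old x265 2.4 option names mapped to the 2.5 names seen in this diff.
// (Hypothetical table for illustration; x265 itself ships no such helper.)
static const std::map<std::string, std::string> kRenamedOptions = {
    {"--analysis-mode", "--analysis-reuse-mode"},
    {"--analysis-file", "--analysis-reuse-file"},
    {"--refine-level", "--analysis-reuse-level"},
};

// Rewrites each whitespace-separated token of a single command line.
// Handles both "--opt value" and "--opt=value" spellings; every other
// token is copied through unchanged.
std::string migrateCommandLine(const std::string& cmdline)
{
    std::istringstream in(cmdline);
    std::string token, out;
    while (in >> token)
    {
        std::string name = token, rest;
        std::string::size_type eq = token.find('=');
        if (eq != std::string::npos)
        {
            name = token.substr(0, eq); // option name before '='
            rest = token.substr(eq);    // "=value", kept as-is
        }
        std::map<std::string, std::string>::const_iterator it = kRenamedOptions.find(name);
        if (it != kRenamedOptions.end())
            name = it->second;
        if (!out.empty())
            out += ' ';
        out += name + rest;
    }
    return out;
}
```

Runs of whitespace collapse to single spaces in the result, which is harmless for the option strings used in regression-tests.txt.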
x265_2.4.tar.gz/source/x265-extras.cpp -> x265_2.5.tar.gz/source/x265-extras.cpp
Changed
@@ -25,7 +25,7 @@ #include "x265.h" #include "x265-extras.h" - +#include "param.h" #include "common.h" using namespace X265_NS; @@ -38,14 +38,8 @@ "B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), " "MaxCLL, MaxFALL, Version\n"; -FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level) +FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level) { - if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture) - { - fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n"); - return NULL; - } - FILE *csvfp = x265_fopen(fname, "r"); if (csvfp) { @@ -62,6 +56,8 @@ if (level) { fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, "); + if (level >= 2) + fprintf(csvfp, "I/P cost ratio, "); if (param.rc.rateControlMode == X265_RC_CRF) fprintf(csvfp, "RateFactor, "); if (param.rc.vbvBufferSize) @@ -73,7 +69,7 @@ fprintf(csvfp, "Latency, "); fprintf(csvfp, "List 0, List 1"); uint32_t size = param.maxCUSize; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) { fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size); size /= 2; @@ -82,7 +78,7 @@ size = param.maxCUSize; if (param.bEnableRectInter) { - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) { fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size); if (param.bEnableAMP) @@ -92,29 +88,56 @@ } else { - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) { fprintf(csvfp, ", Inter %dx%d", size, size); size /= 2; } } size = param.maxCUSize; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) { fprintf(csvfp, ", Skip %dx%d", size, 
size); size /= 2; } size = param.maxCUSize; - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) { fprintf(csvfp, ", Merge %dx%d", size, size); size /= 2; } - fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy"); - /* detailed performance statistics */ if (level >= 2) - fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks"); + { + fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy," + " Min Luma Level, Max Luma Level, Avg Luma Level"); + + if (param.internalCsp != X265_CSP_I400) + fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level"); + + /* PU statistics */ + size = param.maxCUSize; + for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++) + { + fprintf(csvfp, ", Intra %dx%d", size, size); + fprintf(csvfp, ", Skip %dx%d", size, size); + fprintf(csvfp, ", AMP %d", size); + fprintf(csvfp, ", Inter %dx%d", size, size); + fprintf(csvfp, ", Merge %dx%d", size, size); + fprintf(csvfp, ", Inter %dx%d", size, size / 2); + fprintf(csvfp, ", Merge %dx%d", size, size / 2); + fprintf(csvfp, ", Inter %dx%d", size / 2, size); + fprintf(csvfp, ", Merge %dx%d", size / 2, size); + size /= 2; + } + + if ((uint32_t)g_log2Size[param.minCUSize] == 3) + fprintf(csvfp, ", 4x4"); + + /* detailed performance statistics */ + fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms)," + "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks"); + } fprintf(csvfp, "\n"); } else @@ -131,7 +154,10 @@ return; const x265_frame_stats* frameStats = &pic.frameData; - fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, 
frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut); + fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, + frameStats->qp, (int)frameStats->bits, frameStats->bScenecut); + if (level >= 2) + fprintf(csvfp, "%.2f,", frameStats->ipCostRatio); if (param.rc.rateControlMode == X265_RC_CRF) fprintf(csvfp, "%.3lf,", frameStats->rateFactor); if (param.rc.vbvBufferSize) @@ -159,39 +185,76 @@ else fputs(" -,", csvfp); } - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) - fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]); - fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN); - if (param.bEnableRectInter) + + if (level) { - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) + fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], + frameStats->cuStats.percentIntraDistribution[depth][1], + frameStats->cuStats.percentIntraDistribution[depth][2]); + fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN); + if (param.bEnableRectInter) { - fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]); - if (param.bEnableAMP) - fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]); + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) + { + fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], + frameStats->cuStats.percentInterDistribution[depth][1]); + if (param.bEnableAMP) + fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]); + } } + else + { + for 
(uint32_t depth = 0; depth <= param.maxCUDepth; depth++) + fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]); + } + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) + fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]); + for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++) + fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]); } - else - { - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) - fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]); - } - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) - fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]); - for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++) - fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]); - fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy); if (level >= 2) { - fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime); + fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion, + frameStats->avgChromaDistortion, + frameStats->avgPsyEnergy, + frameStats->avgResEnergy); + + fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel); + + if (param.internalCsp != X265_CSP_I400) + { + fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel); + fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel); + } + + for (uint32_t i = 0; i < param.maxLog2CUSize - 
(uint32_t)g_log2Size[param.minCUSize] + 1; i++) + { + fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]); + fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]); + fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]); + for (uint32_t j = 0; j < 3; j++)
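The header-writing loops in x265_csvlog_open() in this diff share one pattern: start at the maximum CU size, emit one column group per depth from 0 through maxCUDepth, and halve the size each time. The sketch below isolates that pattern for the "Skip" columns. It is an illustration only: skipHeaderColumns is a hypothetical name, it builds a std::string instead of calling fprintf, and its two arguments stand in for the param.maxCUSize and param.maxCUDepth fields used in the diff.

```cpp
#include <cstdint>
#include <string>

// Builds the per-depth "Skip NxN" header columns the same way the loop in
// the diff does: one column per depth, with the CU size halved at each step.
std::string skipHeaderColumns(uint32_t maxCUSize, uint32_t maxCUDepth)
{
    std::string header;
    uint32_t size = maxCUSize;
    for (uint32_t depth = 0; depth <= maxCUDepth; depth++)
    {
        header += ", Skip " + std::to_string(size) + "x" + std::to_string(size);
        size /= 2;
    }
    return header;
}
```

For example, a 64x64 maximum CU with a maximum depth of 3 gives four columns, for 64x64, 32x32, 16x16 and 8x8. Reading the depth from the parameter set instead of the old g_maxCUDepth global is what lets two encoder instances with different CU settings write correctly shaped headers.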
x265_2.4.tar.gz/source/x265-extras.h -> x265_2.5.tar.gz/source/x265-extras.h
Changed
@@ -44,7 +44,7 @@
  * closed by the caller using fclose(). If level is 0, then no frame logging
  * header is written to the file. This function will return NULL if it is unable
  * to open the file for write or if it detects a structure size skew */
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
 
 /* Log frame statistics to the CSV file handle. level should have been non-zero
  * in the call to x265_csvlog_open() if this function is called. */
@@ -53,7 +53,7 @@
 /* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
  * intended to be command line arguments passed to the encoder. Encode
  * statistics should be queried from the encoder just prior to closing it. */
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
 
 /* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
  * the residual bits to dither each row. */
x265_2.4.tar.gz/source/x265.cpp -> x265_2.5.tar.gz/source/x265.cpp
Changed
@@ -73,15 +73,12 @@ ReconFile* recon; OutputFile* output; FILE* qpfile; - FILE* csvfpt; - const char* csvfn; const char* reconPlayCmd; const x265_api* api; x265_param* param; bool bProgress; bool bForceY4m; bool bDither; - int csvLogLevel; uint32_t seek; // number of frames to skip from the beginning uint32_t framesToBeEncoded; // number of frames to encode uint64_t totalbytes; @@ -97,8 +94,6 @@ recon = NULL; output = NULL; qpfile = NULL; - csvfpt = NULL; - csvfn = NULL; reconPlayCmd = NULL; api = NULL; param = NULL; @@ -109,7 +104,6 @@ startTime = x265_mdate(); prevUpdateTime = 0; bDither = false; - csvLogLevel = 0; } void destroy(); @@ -129,9 +123,6 @@ if (qpfile) fclose(qpfile); qpfile = NULL; - if (csvfpt) - fclose(csvfpt); - csvfpt = NULL; if (output) output->release(); output = NULL; @@ -292,8 +283,6 @@ if (0) ; OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError); OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError); - OPT("csv") this->csvfn = optarg; - OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError); OPT("no-progress") this->bProgress = false; OPT("output") outputfn = optarg; OPT("input") inputfn = optarg; @@ -530,8 +519,7 @@ * 1 - unable to parse command line * 2 - unable to open encoder * 3 - unable to generate stream headers - * 4 - encoder abort - * 5 - unable to open csv file */ + * 4 - encoder abort */ int main(int argc, char **argv) { @@ -586,28 +574,15 @@ /* get the encoder parameters post-initialization */ api->encoder_parameters(encoder, param); - if (cliopt.csvfn) - { - cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel); - if (!cliopt.csvfpt) - { - x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn); - cliopt.destroy(); - if (cliopt.api) - cliopt.api->param_free(cliopt.param); - exit(5); - } - } - - /* Control-C handler */ + /* Control-C handler */ if (signal(SIGINT, sigint_handler) == SIG_ERR) 
x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno)); x265_picture pic_orig, pic_out; x265_picture *pic_in = &pic_orig; - /* Allocate recon picture if analysisMode is enabled */ + /* Allocate recon picture if analysisReuseMode is enabled */ std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL; - x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL; + x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL; uint32_t inFrameCount = 0; uint32_t outFrameCount = 0; x265_nal *p_nal; @@ -698,8 +673,6 @@ } cliopt.printStatus(outFrameCount); - if (numEncoded && cliopt.csvLogLevel) - x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel); } /* Flush the encoder */ @@ -730,8 +703,6 @@ } cliopt.printStatus(outFrameCount); - if (numEncoded && cliopt.csvLogLevel) - x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel); if (!numEncoded) break; @@ -746,8 +717,8 @@ delete reconPlay; api->encoder_get_stats(encoder, &stats, sizeof(stats)); - if (cliopt.csvfpt && !b_ctrl_c) - x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv); + if (param->csvfn && !b_ctrl_c) + api->encoder_log(encoder, argc, argv); api->encoder_close(encoder); int64_t second_largest_pts = 0;
x265_2.4.tar.gz/source/x265.h -> x265_2.5.tar.gz/source/x265.h
Changed
@@ -24,10 +24,9 @@ #ifndef X265_H #define X265_H - #include <stdint.h> +#include <stdio.h> #include "x265_config.h" - #ifdef __cplusplus extern "C" { #endif @@ -98,6 +97,7 @@ uint32_t sliceType; uint32_t numCUsInFrame; uint32_t numPartitions; + uint32_t depthBytes; int bScenecut; void* wt; void* interData; @@ -117,6 +117,20 @@ } x265_cu_stats; +/* pu statistics */ +typedef struct x265_pu_stats +{ + double percentSkipPu[4]; // Percentage of skip cu in all depths + double percentIntraPu[4]; // Percentage of intra modes in all depths + double percentAmpPu[4]; // Percentage of amp modes in all depths + double percentInterPu[4][3]; // Percentage of inter 2nx2n, 2nxn and nx2n in all depths + double percentMergePu[4][3]; // Percentage of merge 2nx2n, 2nxn and nx2n in all depth + double percentNxN; + + /* All the above values will add up to 100%. */ +} x265_pu_stats; + + typedef struct x265_analysis_2Pass { uint32_t poc; @@ -154,13 +168,41 @@ int list0POC[16]; int list1POC[16]; uint16_t maxLumaLevel; + uint16_t minLumaLevel; + + uint16_t maxChromaULevel; + uint16_t minChromaULevel; + double avgChromaULevel; + + + uint16_t maxChromaVLevel; + uint16_t minChromaVLevel; + double avgChromaVLevel; + char sliceType; int bScenecut; + double ipCostRatio; int frameLatency; x265_cu_stats cuStats; + x265_pu_stats puStats; double totalFrameTime; } x265_frame_stats; +typedef struct x265_ctu_info_t +{ + int32_t ctuAddress; + int32_t ctuPartitions[64]; + void* ctuInfo; +} x265_ctu_info_t; + +typedef enum +{ + NO_CTU_INFO = 0, + HAS_CTU_INFO = 1, + CTU_INFO_CHANGE = 2, +}CTUInfo; + + /* Arbitrary User SEI * Payload size is in bytes and the payload pointer must be non-NULL. * Payload types and syntax can be found in Annex D of the H.265 Specification. 
@@ -258,15 +300,15 @@ * to allow the encoder to determine base QP */ int forceqp; - /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input + /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input * and output. Else the user must call x265_alloc_analysis_data() to * allocate analysis buffers for every picture passed to the encoder. * - * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData + * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData * member pointers are valid, the encoder will use the data stored here to * reduce encoder work. * - * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData + * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData * member pointers are valid, the encoder will write output analysis into * this data structure */ x265_analysis_data analysisData; @@ -612,7 +654,14 @@ * X265_LOG_FULL, default is X265_LOG_INFO */ int logLevel; - /* Filename of CSV log. Now deprecated */ + /* Level of csv logging. 0 is summary, 1 is frame level logging, + * 2 is frame level logging with performance statistics */ + int csvLogLevel; + + /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit + * per-slice statistics to this log file in encode order. Otherwise the + * encoder will emit per-stream statistics into the log file when + * x265_encoder_log is called (presumably at the end of the encode) */ const char* csvfn; /*== Internal Picture Specification ==*/ @@ -1057,10 +1106,10 @@ * buffers. if X265_ANALYSIS_LOAD, read analysis information into analysis * buffer and use this analysis information to reduce the amount of work * the encoder must perform. Default X265_ANALYSIS_OFF */ - int analysisMode; + int analysisReuseMode; - /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */ - const char* analysisFileName; + /* Filename for analysisReuseMode save/load. 
Default name is "x265_analysis.dat" */ + const char* analysisReuseFileName; /*== Rate Control ==*/ @@ -1194,6 +1243,9 @@ /* sets a hard lower limit on QP */ int qpMin; + + /* internally enable if tune grain is set */ + int bEnableConstVbv; } rc; /*== Video Usability Information ==*/ @@ -1376,9 +1428,9 @@ int bHDROpt; /* A value between 1 and 10 (both inclusive) determines the level of - * information stored/reused in save/load analysis-mode. Higher the refine - * level higher the informtion stored/reused. Default is 5 */ - int analysisRefineLevel; + * information stored/reused in save/load analysis-reuse-mode. Higher the refine + * level higher the information stored/reused. Default is 5 */ + int analysisReuseLevel; /* Limit Sample Adaptive Offset filter computation by early terminating SAO * process based on inter prediction mode, CTU spatial-domain correlations, @@ -1391,7 +1443,44 @@ /* Insert tone mapping information only for IDR frames and when the * tone mapping information changes. */ int bDhdr10opt; + + /* Determine how x265 react to the content information recieved through the API */ + int bCTUInfo; + + /* Use ratecontrol statistics from pic_in, if available*/ + int bUseRcStats; + + /* Factor by which input video is scaled down for analysis save mode. 
Default is 0 */ + int scaleFactor; + + /* Enable intra refinement in load mode*/ + int intraRefine; + + /* Enable inter refinement in load mode*/ + int interRefine; + + /* Enable motion vector refinement in load mode*/ + int mvRefine; + + /* Log of maximum CTU size */ + uint32_t maxLog2CUSize; + + /* Actual CU depth with respect to config depth */ + uint32_t maxCUDepth; + + /* CU depth with respect to maximum transform size */ + uint32_t unitSizeDepth; + + /* Number of 4x4 units in maximum CU size */ + uint32_t num4x4Partitions; + + /* Specify if analysis mode uses file for data reuse */ + int bUseAnalysisFile; + + /* File pointer for csv log */ + FILE* csvfpt; } x265_param; + /* x265_param_alloc: * Allocates an x265_param instance. The returned param structure is not * special in any way, but using this method together with x265_param_free()
x265_2.4.tar.gz/source/x265cli.h -> x265_2.5.tar.gz/source/x265cli.h
Changed
@@ -122,6 +122,7 @@ { "scenecut", required_argument, NULL, 0 }, { "no-scenecut", no_argument, NULL, 0 }, { "scenecut-bias", required_argument, NULL, 0 }, + { "ctu-info", required_argument, NULL, 0 }, { "intra-refresh", no_argument, NULL, 0 }, { "rc-lookahead", required_argument, NULL, 0 }, { "lookahead-slices", required_argument, NULL, 0 }, @@ -158,6 +159,8 @@ { "qpstep", required_argument, NULL, 0 }, { "qpmin", required_argument, NULL, 0 }, { "qpmax", required_argument, NULL, 0 }, + { "const-vbv", no_argument, NULL, 0 }, + { "no-const-vbv", no_argument, NULL, 0 }, { "ratetol", required_argument, NULL, 0 }, { "cplxblur", required_argument, NULL, 0 }, { "qblur", required_argument, NULL, 0 }, @@ -247,9 +250,13 @@ { "no-slow-firstpass", no_argument, NULL, 0 }, { "multi-pass-opt-rps", no_argument, NULL, 0 }, { "no-multi-pass-opt-rps", no_argument, NULL, 0 }, - { "analysis-mode", required_argument, NULL, 0 }, - { "analysis-file", required_argument, NULL, 0 }, - { "refine-level", required_argument, NULL, 0 }, + { "analysis-reuse-mode", required_argument, NULL, 0 }, + { "analysis-reuse-file", required_argument, NULL, 0 }, + { "analysis-reuse-level", required_argument, NULL, 0 }, + { "scale-factor", required_argument, NULL, 0 }, + { "refine-intra", required_argument, NULL, 0 }, + { "refine-inter", no_argument, NULL, 0 }, + { "no-refine-inter",no_argument, NULL, 0 }, { "strict-cbr", no_argument, NULL, 0 }, { "temporal-layers", no_argument, NULL, 0 }, { "no-temporal-layers", no_argument, NULL, 0 }, @@ -271,6 +278,8 @@ { "dhdr10-info", required_argument, NULL, 0 }, { "dhdr10-opt", no_argument, NULL, 0}, { "no-dhdr10-opt", no_argument, NULL, 0}, + { "refine-mv", no_argument, NULL, 0 }, + { "no-refine-mv", no_argument, NULL, 0 }, { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, @@ -316,9 +325,9 @@ H1(" 1 - i420 (4:2:0 default)\n"); H1(" 2 - i422 (4:2:2)\n"); H1(" 3 - i444 (4:4:4)\n"); -#if ENABLE_DYNAMIC_HDR10 - H0(" --dhdr10-info <filename> JSON file containing the Creative 
Intent Metadata to be encoded as Dynamic Tone Mapping \n"); - H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled"); +#if ENABLE_HDR10_PLUS + H0(" --dhdr10-info <filename> JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n"); + H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n"); #endif H0("-f/--frames <integer> Maximum number of frames to encode. Default all\n"); H0(" --seek <integer> First frame to encode\n"); @@ -367,6 +376,11 @@ H1(" --[no-]tskip-fast Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast)); H1(" --nr-intra <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n"); H1(" --nr-inter <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n"); + H0(" --ctu-info <integer> Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n" + " - 1: force the partitions if CTU information is present\n" + " - 2: functionality of (1) and reduce qp if CTU information has changed\n" + " - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n" + " Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n"); H0("\nCoding tools:\n"); H0("-w/--[no-]weightp Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred)); H0(" --[no-]weightb Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred)); @@ -431,9 +445,13 @@ H0(" --[no-]analyze-src-pics Motion estimation uses source frame planes. Default disable\n"); H0(" --[no-]slow-firstpass Enable a slow first pass in a multipass rate control mode. 
Default %s\n", OPT(param->rc.bEnableSlowFirstPass)); H0(" --[no-]strict-cbr Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr)); - H0(" --analysis-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode); - H0(" --analysis-file <filename> Specify file name used for either dumping or reading analysis data.\n"); - H0(" --refine-level <1..10> Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel); + H0(" --analysis-reuse-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode); + H0(" --analysis-reuse-file <filename> Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n"); + H0(" --analysis-reuse-level <1..10> Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel); + H0(" --scale-factor <int> Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor); + H0(" --refine-intra <int> Enable intra refinement for load mode. Default %d\n", param->intraRefine); + H0(" --[no-]refine-inter Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine)); + H0(" --[no-]refine-mv Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine)); H0(" --aq-mode <integer> Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode); H0(" --aq-strength <float> Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength); H0(" --[no-]aq-motion Adaptive Quantization based on the relative motion of each CU w.r.t., frame. 
Default %s\n", OPT(param->bOptCUDeltaQP)); @@ -446,6 +464,7 @@ H1(" --qpstep <integer> The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep); H1(" --qpmin <integer> sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin); H1(" --qpmax <integer> sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax); + H0(" --[no-]const-vbv Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv)); H1(" --cbqpoffs <integer> Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset); H1(" --crqpoffs <integer> Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset); H1(" --scaling-list <string> Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");