(19) World Intellectual Property Organization
International Bureau

(10) International Publication Number: WO 2017/123980 A1

(43) International Publication Date: 20 July 2017 (20.07.2017)

International Patent Classification:
H04N 19/105 (2014.01), H04N 19/119 (2014.01), H04N 19/50 (2014.01), H04N 19/172 (2014.01), H04N 19/176 (2014.01), H04N 19/136 (2014.01)

(21) International Application Number: PCT/US2017/013485

(22) International Filing Date: 13 January 2017 (13.01.2017)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 62/279,233, 15 January 2016 (15.01.2016), US; 15/404,634, 12 January 2017 (12.01.2017), US

(71) Applicant: QUALCOMM INCORPORATED [US/US]; ATTN: International IP Administration, 5775 Morehouse Drive, San Diego, California 92121-1714 (US).

(72) Inventors: LI, Xiang; 5775 Morehouse Drive, San Diego, California 92121-1714 (US). ZHANG, Li; 5775 Morehouse Drive, San Diego, California 92121-1714 (US). CHIEN, Wei-Jung; 5775 Morehouse Drive, San Diego, California 92121-1714 (US). CHEN, Jianle; 5775 Morehouse Drive, San Diego, California 92121-1714 (US). ZHAO, Xin; 5775 Morehouse Drive, San Diego, California 92121-1714 (US). KARCZEWICZ, Marta; 5775 Morehouse Drive, San Diego, California 92121-1714 (US).

(74) Agent: VREDEVELD, Albert W.; Shumaker & Sieffert, P.A., 1625 Radio Drive, Suite 100, Woodbury, Minnesota 55125 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).

Published: with international search report (Art. 21(3))
`
(54) Title: MULTI-TYPE-TREE FRAMEWORK FOR VIDEO CODING

[FIG. 8, reproduced on the cover: block diagram of an example video encoder, showing video data memory (101), prediction processing unit (100) with inter-prediction processing unit (120) and intra-prediction processing unit (126), transform processing unit (104), quantization unit (106), inverse quantization unit (108), inverse transform processing unit, filter unit (114), decoded picture buffer (116), entropy encoding unit, syntax elements, and the output bitstream.]
`
(57) Abstract: A method of decoding video data including receiving a bitstream that includes a sequence of bits that forms a representation of a coded picture of the video data, partitioning the coded picture of the video data into a plurality of blocks using three or more different partition structures, and reconstructing the plurality of blocks of the coded picture of the video data. Partitioning the coded picture of the video data may include partitioning the coded picture of the video data into the plurality of blocks using the three or more different partition structures, wherein at least three of the three or more different partition structures may be used at each depth of a tree structure that represents how a particular block of the coded picture of the video data is partitioned.
`
`
`
WO 2017/123980
PCT/US2017/013485
`
MULTI-TYPE-TREE FRAMEWORK FOR VIDEO CODING

[0001] This application claims the benefit of U.S. Provisional Application No. 62/279,233, filed January 15, 2016, the entire content of which is incorporated by reference herein.

TECHNICAL FIELD

[0002] This disclosure relates to video encoding and video decoding.

BACKGROUND
`
[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.
`
[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture/frame or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Pictures may be referred to as frames. Reference pictures may be referred to as reference frames.
`
[0005] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. Entropy coding may be applied to achieve even more compression.
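The prediction/residual/quantization pipeline described above can be sketched numerically. The following is a minimal toy example, not the codec's actual transform or quantizer: it uses a single 4-sample row, skips the transform stage, and assumes a uniform quantization step size purely for illustration.

```python
# Hypothetical sketch (not the patent's method): block-based residual coding
# round-trip. The predictive block is subtracted from the original block, the
# residual is quantized with a uniform step size (lossy), and the decoder side
# adds the dequantized residual back onto the prediction.

def residual(original, prediction):
    """Pixel differences between the original block and the predictive block."""
    return [o - p for o, p in zip(original, prediction)]

def quantize(values, step):
    """Uniform scalar quantization: round each value to a multiple of step."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    return [lv * step for lv in levels]

def reconstruct(prediction, levels, step):
    """Decoder-side reconstruction: prediction + dequantized residual."""
    return [p + r for p, r in zip(prediction, dequantize(levels, step))]

if __name__ == "__main__":
    original   = [104, 101, 99, 98]    # one row of a toy block
    prediction = [100, 100, 100, 100]  # predictive block from intra/inter prediction
    levels = quantize(residual(original, prediction), step=2)
    print(levels)                                   # [2, 0, 0, -1]
    print(reconstruct(prediction, levels, step=2))  # [104, 100, 100, 98]
```

Note that the reconstruction differs slightly from the original: quantization is the lossy step, which is why a coarser step compresses more at the cost of fidelity.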
`
`
`
SUMMARY
`
[0006] This disclosure describes techniques for partitioning blocks of video data using a multi-type-tree (MTT) framework. The techniques of this disclosure include determining one of a plurality of partitioning techniques at various nodes of a tree structure. Examples of the plurality of partitioning techniques may include partitioning techniques that symmetrically split a block through the center of the block, as well as partitioning techniques that split a block, either symmetrically or asymmetrically, such that the center of the block is not split. In this way, the partitioning of video blocks can be performed in a manner that leads to more efficient coding, including partitioning that better captures objects in the video data that are in the center of blocks.
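The split geometries just described can be sketched as functions that map a block to its sub-blocks. This is an illustrative sketch only: the 1/4-1/2-1/4 ratio assumed for the center-side (triple) splits is a common choice in the triple-tree literature, not a value fixed by this passage.

```python
# Hypothetical sketch of the partition structures discussed above. A block is
# (x, y, w, h); each split returns the list of resulting sub-blocks.
# The quarter/half/quarter ratio in the center-side splits is an assumption
# for illustration.

def quad_split(x, y, w, h):
    """Symmetric split through the center in both directions (4 sub-blocks)."""
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def binary_split_vertical(x, y, w, h):
    """Symmetric vertical split through the center (2 sub-blocks)."""
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def binary_split_horizontal(x, y, w, h):
    """Symmetric horizontal split through the center (2 sub-blocks)."""
    return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]

def center_side_vertical(x, y, w, h):
    """Triple split that leaves the center of the block unsplit (3 sub-blocks)."""
    q = w // 4
    return [(x, y, q, h), (x + q, y, w - 2 * q, h), (x + w - q, y, q, h)]

def center_side_horizontal(x, y, w, h):
    q = h // 4
    return [(x, y, w, q), (x, y + q, w, h - 2 * q), (x, y + h - q, w, q)]
```

A center-side split keeps a single sub-block spanning the middle of the parent, which is what allows an object centered in the block to stay in one partition.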
`
[0007] This disclosure further describes techniques for signaling syntax elements that indicate how a particular picture of video data is partitioned. Block partitioning generally describes how a picture of video data is divided, and sub-divided, into blocks of various sizes. A video decoder may use such syntax elements to reconstruct the block partitioning. Other examples of the disclosure are directed to performing transforms on blocks of video data that were partitioned using the MTT partitioning techniques of this disclosure.
`
[0008] In one example of the disclosure, a method of decoding video data comprises receiving a bitstream that includes a sequence of bits that forms a representation of a coded picture of the video data, determining a partitioning of the coded picture of the video data into a plurality of blocks using three or more different partition structures, and reconstructing the plurality of blocks of the frame of video data.
`
[0009] In another example of the disclosure, a method of encoding video data comprises receiving a picture of the video data, partitioning the picture of the video data into a plurality of blocks using three or more different partition structures, and encoding the plurality of blocks of the picture of the video data.
`
[0010] In another example of the disclosure, an apparatus configured to decode video data comprises a memory configured to store the video data, and video decoding circuitry configured to receive a bitstream that includes a sequence of bits that forms a representation of the picture of the video data, determine a partitioning of the coded picture of the video data into a plurality of blocks using three or more different partition structures, and reconstruct the plurality of blocks of the frame of video data.
`
`
[0011] In another example of the disclosure, an apparatus configured to decode video data comprises means for receiving a bitstream that includes a sequence of bits that forms a coded picture of the video data, means for determining a partitioning of the coded picture of the video data into a plurality of blocks using three or more different partition structures, and means for reconstructing the plurality of blocks of the frame of video data.
`
[0012] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
`
BRIEF DESCRIPTION OF DRAWINGS

[0013] FIG. 1 is a block diagram illustrating an example video encoding and decoding system configured to implement techniques of the disclosure.

[0014] FIG. 2 is a conceptual diagram illustrating coding unit (CU) structure in High Efficiency Video Coding (HEVC).

[0015] FIG. 3 is a conceptual diagram illustrating example partition types for an inter prediction mode.

[0016] FIG. 4A is a conceptual diagram illustrating an example of block partitioning using a quad-tree-binary-tree (QTBT) structure.

[0017] FIG. 4B is a conceptual diagram illustrating an example tree structure corresponding to the block partitioning using the QTBT structure of FIG. 4A.

[0018] FIG. 5A is a conceptual diagram illustrating example horizontal triple-tree partition types.
`
[0019] FIG. 5B is a conceptual diagram illustrating example vertical triple-tree partition types.
`
[0020] FIG. 6A is a conceptual diagram illustrating quad-tree partitioning.

[0021] FIG. 6B is a conceptual diagram illustrating vertical binary-tree partitioning.

[0022] FIG. 6C is a conceptual diagram illustrating horizontal binary-tree partitioning.

[0023] FIG. 6D is a conceptual diagram illustrating vertical center-side tree partitioning.

[0024] FIG. 6E is a conceptual diagram illustrating horizontal center-side tree partitioning.

[0025] FIG. 7 is a conceptual diagram illustrating an example of coding tree unit (CTU) partitioning according to the techniques of this disclosure.
`
[0026] FIG. 8 is a block diagram illustrating an example of a video encoder.

[0027] FIG. 9 is a block diagram illustrating an example of a video decoder.
`
[0028] FIG. 10A is a flowchart illustrating an example operation of a video encoder, in accordance with a technique of this disclosure.

[0029] FIG. 10B is a flowchart illustrating an example operation of a video decoder, in accordance with a technique of this disclosure.

[0030] FIG. 11 is a flowchart illustrating an example operation of a video encoder, in accordance with another example technique of this disclosure.

[0031] FIG. 12 is a flowchart illustrating an example operation of a video decoder, in accordance with another example technique of this disclosure.
`
`DETAILED DESCRIPTION
`
[0032] This disclosure is related to the partitioning and/or organization of blocks of video data (e.g., coding units) in block-based video coding. The techniques of this disclosure may be applied in video coding standards. In various examples described below, the techniques of this disclosure include partitioning blocks of video data using three or more different partitioning structures. In some examples, three or more different partition structures may be used at each depth of a coding tree structure. Such partitioning techniques may be referred to as multi-type-tree (MTT) partitioning. By using MTT partitioning, video data may be more flexibly partitioned, thus allowing for greater coding efficiency.
`
[0033] FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques of this disclosure for partitioning blocks of video data, signaling and parsing partition types, and applying transforms and further transform partitions. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, tablet computers, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices. Source device 12 is an example video encoding device (i.e., a device for encoding video data). Destination device 14 is an example video decoding device (e.g., a device or apparatus for decoding video data).
`
[0034] In the example of FIG. 1, source device 12 includes a video source 18, a storage media 20 configured to store video data, a video encoder 22, and an output interface 24. Destination device 14 includes an input interface 26, a storage medium 28 configured to store encoded video data, a video decoder 30, and display device 32. In other examples, source device 12 and destination device 14 include other components or arrangements. For example, source device 12 may receive video data from an external video source, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
`
[0035] The illustrated system 10 of FIG. 1 is merely one example. Techniques for processing video data may be performed by any digital video encoding and/or decoding device or apparatus. Although generally the techniques of this disclosure are performed by a video encoding device and a video decoding device, the techniques may also be performed by a combined video encoder/decoder, typically referred to as a “CODEC.” Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates encoded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 operate in a substantially symmetrical manner such that each of source device 12 and destination device 14 include video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between source device 12 and destination device 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.
`
[0036] Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video data from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. Source device 12 may comprise one or more data storage media (e.g., storage media 20) configured to store the video data. The techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 22. Output interface 24 may output the encoded video information to computer-readable medium 16.
`
`
[0037] Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In some examples, computer-readable medium 16 comprises a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14. Destination device 14 may comprise one or more data storage media configured to store encoded video data and decoded video data.
`
[0038] In some examples, encoded data (e.g., encoded video data) may be output from output interface 24 to a storage device. Similarly, encoded data may be accessed from the storage device by input interface 26. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
`
[0039] The techniques of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
`
[0040] Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.
`
[0041] Input interface 26 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 22, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., groups of pictures (GOPs). Storage media 28 may store encoded video data received by input interface 26. Display device 32 displays the decoded video data to a user. Display device 32 may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
`
[0042] Video encoder 22 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 22 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
`
[0043] In some examples, video encoder 22 and video decoder 30 may operate according to a video coding standard. Example video coding standards include, but are not limited to, ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-View Video Coding (MVC) extensions. The video coding standard High Efficiency Video Coding (HEVC) or ITU-T H.265, including its range and screen content coding extensions, 3D video coding (3D-HEVC) and multiview extensions (MV-HEVC) and scalable extension (SHVC), has been developed by the Joint Collaboration Team on Video Coding (JCT-VC) of ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG).
`
[0044] In HEVC and other video coding specifications, a video sequence typically includes a series of pictures. Pictures may also be referred to as “frames.” A picture may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as “chroma” samples. In other instances, a picture may be monochrome and may only include an array of luma samples.
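The three sample arrays can be sketched as follows. The 4:2:0 chroma subsampling assumed here (chroma arrays at half the luma resolution) is a common configuration, not something this passage fixes; array names mirror SL, SCb, and SCr above.

```python
# Illustrative sketch of a picture's sample arrays: SL holds luma samples at
# full resolution; SCb and SCr hold the Cb and Cr chroma samples. The 4:2:0
# subsampling (half-resolution chroma) is an assumption for illustration.
# A monochrome picture carries only the luma array.

def make_picture(width, height, monochrome=False):
    SL = [[0] * width for _ in range(height)]  # luma, full resolution
    if monochrome:
        return {"SL": SL}
    SCb = [[0] * (width // 2) for _ in range(height // 2)]  # Cb chroma
    SCr = [[0] * (width // 2) for _ in range(height // 2)]  # Cr chroma
    return {"SL": SL, "SCb": SCb, "SCr": SCr}
```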
`
[0045] Furthermore, in HEVC and other video coding specifications, to generate an encoded representation of a picture, video encoder 22 may generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In monochrome pictures or pictures having three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block may be an NxN block of samples. A CTU may also be referred to as a “tree block” or a “largest coding unit” (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in a raster scan order.
`
[0046] If operating according to HEVC, to generate a coded CTU, video encoder 22 may recursively perform quad-tree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name “coding tree units.” A coding block is an NxN block of samples. A CU may comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array, and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In monochrome pictures or pictures having three separate color planes, a CU may comprise a single coding block and syntax structures used to code the samples of the coding block.
`
[0047] Syntax data within a bitstream may also define a size for the CTU. A slice includes a number of consecutive CTUs in coding order. A video frame or picture may be partitioned into one or more slices. As mentioned above, each tree block may be split into coding units (CUs) according to a quad-tree. In general, a quad-tree data structure includes one node per CU, with a root node corresponding to the treeblock. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.
`
[0048] Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag, indicating whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively, and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf-CU. If a block of CU is split further, it may be generally referred to as a non-leaf-CU. In some examples of this disclosure, four sub-CUs of a leaf-CU may be referred to as leaf-CUs even if there is no explicit splitting of the original leaf-CU. For example, if a CU at 16x16 size is not split further, the four 8x8 sub-CUs may also be referred to as leaf-CUs although the 16x16 CU was never split.
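The recursive split-flag scheme described in the two paragraphs above can be sketched as a small decoder. This is an illustrative sketch, not HEVC bitstream syntax: the depth-first flag order, the 64x64 treeblock size, and the 8x8 minimum CU size are assumptions for illustration.

```python
# Hypothetical sketch of quad-tree CU derivation from split flags. Flags are
# consumed in depth-first order; a CU whose flag is 0, or whose size has
# reached the assumed minimum, becomes a leaf-CU (x, y, size).

def decode_leaf_cus(flags, x=0, y=0, size=64, min_size=8):
    if size > min_size and next(flags):
        half = size // 2
        leaves = []
        # Four sub-CUs, each of which may itself carry a split flag.
        for sx, sy in [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]:
            leaves += decode_leaf_cus(flags, sx, sy, half, min_size)
        return leaves
    return [(x, y, size)]

# One split of the 64x64 treeblock, then a split of its first 32x32 sub-CU:
flags = iter([1, 1, 0, 0, 0, 0, 0, 0, 0])
print(decode_leaf_cus(flags))  # 7 leaf-CUs covering the 64x64 treeblock
```

Because the flags are defined recursively, the same bit pattern always reproduces the same partitioning, which is how a decoder reconstructs the encoder's block layout.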
`
[0049] A CU has a similar purpose as a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a tree block may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf-CU. Syntax data associated with a coded bitstream may define a maximum number of times a tree block may be split, referred to as a maximum CU depth, and may also define a minimum size of the coding nodes. Accordingly, a bitstream may also define a smallest coding unit (SCU). This disclosure uses the term “block” to refer to any of a CU, PU, or TU, in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
`
[0050] A CU includes a coding node as well as prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU corresponds to a size of the coding node and may be, in some examples, square in shape. In the example of HEVC, the size of the CU may range from 8x8 pixels up to the size of the tree block with a maximum of 64x64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square (e.g., rectangular) in shape.
`
[0051] The HEVC standard allows for transformations according to TUs. The TUs may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quad-tree structure, sometimes called a “residual quad tree” (RQT). The leaf nodes of the RQT may be referred to as TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
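The "transform, then quantize" step at the end of the paragraph above can be illustrated with a simple stand-in transform. HEVC itself uses DCT/DST-like integer transforms; the 4-point Hadamard used here is only a self-contained example of how a transform compacts a TU's pixel-difference energy into few coefficients before quantization.

```python
# Illustrative stand-in for the TU transform stage: a 4-point Hadamard
# transform applied separably to the rows and columns of a 4x4 residual
# (pixel-difference) block, followed by uniform quantization. The Hadamard
# matrix and step size are assumptions for illustration, not HEVC's design.

H4 = [[1,  1,  1,  1],
      [1,  1, -1, -1],
      [1, -1, -1,  1],
      [1, -1,  1, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_and_quantize(residual_block, step):
    coeffs = matmul(matmul(H4, residual_block), H4)  # 2-D separable transform
    return [[round(c / step) for c in row] for row in coeffs]

flat = [[5, 5, 5, 5]] * 4  # a flat residual block
print(transform_and_quantize(flat, step=16))
# [[5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

A flat residual produces a single nonzero coefficient: after quantization most coefficients are zero, which is exactly what makes the subsequent entropy coding effective.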
`
[0052] A leaf-CU may include one or more PUs. In general, a PU represents a spatial area corresponding to all or a portion of the corresponding CU, and may include data for retrieving a reference sample for the PU. Moreover, a PU includes data related to prediction. For example, when the PU is intra-mode encoded, data for the PU may be included in a RQT, which may include data describing an intra-prediction mode for a TU corresponding to the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining one or more motion vectors for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.
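The items of inter-prediction motion data enumerated above can be grouped into a simple data structure. The field and class names here are hypothetical, chosen only to mirror the list in the paragraph; they are not syntax element names from any standard.

```python
# Sketch (hypothetical names) of the motion data a PU may carry for inter
# prediction: horizontal and vertical motion-vector components, the vector's
# resolution, the reference picture the vector points to, and the reference
# picture list it comes from.

from dataclasses import dataclass

@dataclass
class MotionVector:
    horizontal: int   # horizontal component, in sub-pixel units
    vertical: int     # vertical component, in sub-pixel units
    resolution: str   # e.g., "quarter-pel" or "eighth-pel" precision

@dataclass
class InterPredictionData:
    mv: MotionVector
    ref_pic_index: int  # reference picture to which the motion vector points
    ref_pic_list: str   # "List 0", "List 1", or "List C"

pu_data = InterPredictionData(
    mv=MotionVector(horizontal=6, vertical=-2, resolution="quarter-pel"),
    ref_pic_index=0,
    ref_pic_list="List 0",
)
print(pu_data.mv.horizontal)  # 6
```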
`
[0053] A leaf-CU having one or more PUs may also include one or more TUs. The TUs may be specified using an RQT (also referred to as a TU quad-tree structure), as discussed above. For example, a split flag may indicate whether a leaf-CU is split into four transform units. In some examples, each transform unit may be split further into further sub-TUs. When a TU is not split further, it may be referred to as a leaf-TU. Generally, for intra coding, all the leaf-TUs belonging to a leaf-CU contain residual data produced from the same intra prediction mode. That is, the same intra-prediction mode is generally applied to calculate predicted values that will be transformed in all TUs of a leaf-CU. For intra coding, video encoder 22 may calculate a residual value for each leaf-TU using the intra prediction mode, as a difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than a PU. For intra coding, a PU may be collocated with a corresponding leaf-TU for the same CU. In some examples, the maximum size of a leaf-TU may correspond to the size of the corresponding leaf-CU.
`
[0054] Moreover, TUs of leaf-CUs may also be associated with respective RQT structures. That is, a leaf-CU may include a quadtree indicating how the leaf-CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf-CU, while the root node of a CU quadtree generally corresponds to a treeblock (or LCU).
`
[0055] As discussed above, video encoder 22 may partition a coding block of a CU into one or more prediction blocks. A prediction block is a rectangular (i.e., square or non-square) block of samples on which the same prediction is applied. A PU of a CU may comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and sy