throbber
Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 1 of 18 PageID #: 733
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 1 of 18 Page|D #: 733
`
`Ex. 1
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 2 of 18 PageID #: 734
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 2 of 18 Page|D #: 734
`
`ITU - Telecommunications Standardization Sector
`STUDY GROUP 16 Question 6
`Video Coding Experts Group (VCEG)
`
`Document VCEG-M1§
`Filename: VCEG-M.16.doc
`Generated: 27 March ’01
`
`Thirteenth Meeting: Austin, Texas, USA, 2-4 April, 2001
`
`Question: Q.6/SG16(VCEG)
`
`Source:
`
`Jie Liang (j|iang@fastvdo.com),
`Trac Tran (tran@fastvdo.com),
`Pankaj Topiwala (pnt@fastvdo.com)
`
`Tel:
`Fax:
`Email:
`
`410-730-9191
`410-730-2957
`pnt@fastvdo.com
`
`5565 Sterrett Pl., #322
`Columbia, MD 21044 USA
`
`Title:
`
`A 16-bit architecture for H.26L, treating DCT transforms and quantization
`
`Purpose:
`
`Proposal.
`
`Abstract
`
`For a variety of applications, especially for use in wireless devices with limited power, memory, and CPU
`capacity, it is desirable to have a fixed, low bit-depth integer codec implementation.
`Integer-only codecs
`are also needed if a fully lossless mode is required. A key obstacle to achieving these goals for
`transform-based video codecs is the nature of the transform; all others aspects of video coding can
`generally be maintained to satisfy a fixed low bit-depth (e.g., 16 bits). Most transforms require floating
`point operations, e.g., the true discrete cosine transform, DCT, is a floating-point transform. A variety of
`integer approximations to the DCT have been proposed in the literature, and are used in various
`formulations of existing standards (e.g., JPEG onwards). These approximations differ in the complexity of
`implementation, accuracy of approximation to the true DCT, and compression performance.
`In this
`document, we propose a new approximation to the DCT, named binDCT, which is fully invertible for
`lossless coding, requires no multiplications (only bit-shifts and adds), yet offers operating points within the
`accuracy, complexity, and performance trade space which were previously unavailable. Furthermore, a
`family of increasingly sophisticated solutions is available which offers performance/complexity scalability.
`Our DCT solution maintains 13 bits for 9-bit input, and thus offers the opportunity to build the entire H.26L
`codec in a 16-bit architecture. The quantization of transform coefficients is also treated in this document.
`
`This document mainly addresses the 4x4 transform that is at the heart of the existing H.26L codec.
`However, methods such as adaptive block transforms call for other block transform sizes such as 8x8 and
`16x16. We also have binDCT solutions for both of these sizes. The 8x8 binDCT preserves 15 bits for a
`9-bit input, and the 16x16 binDCT preserves 17 bits for a 9-bit input. The latter can also be made to
`preserve 16 bits by applying a variety of available methods such as scaling, statistical sampling, and
`approximation. Our solutions have complexity, accuracy and performance advantages over other
`approximate DCT solutions recently proposed for these applications [6 - 8], which will be explored
`elsewhere. For now, we briefly present our 8x8 and 16x16 solutions diagrammatically without further
`analysis.
`
`Introduction
`
`A lifting-based fast approximation of the 4-point DCT, named the binDCT, is proposed for the H.26L
`standard, which can be implemented with only addition and right-shift operations. The binDCT can be well
`fitted into a 16-bit architecture and has the same performance as that of the integer transform used in the
`current H.26L test model. Most of the quantization can also be implemented with 16-bit operations,
`except the RD-optimized quantization. The proposed binDCT is implemented within TML 5.2, and its
`performance is demonstrated.
`
`1. Review of the integer DCT in the current H.26L test model
`
`File:VCEG—M16—Fast:VDO—4.01-KEY
`
`Page:
`
`1
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 3 of 18 PageID #: 735
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 3 of 18 Page|D #: 735
`
`1.1. Integer DCT in TML 5.2
`This section summarizes the integer transform and the corresponding quantization strategy used in the
`H.26L test model 5.2 [1].
`
`The forward integer transform used in the current H.26L test model is as follows:
`
`A=13a+13b+13c+13d,
`B=17a+ 7b- 7c-17d,
`C=13a-13b-13c+13d,
`D= 7a-17b+17c- 7d,
`
`where a, b, c. and d are 4 input pixels. The inverse transformation of transform coefficients A,B,C,D into 4
`pixels a'.b‘,c'.d‘ is defined by:
`
`a'=13A+17B+13C+ 7D,
`b'=13A+ 7B-13C-17D,
`c'=13A- 7B-13C+17D,
`d'=13A-17B+13C- 7D.
`
`This method is a scaled version of the Chen-Wang factorization of the DCT matrix [2] [3]. The relation
`between a and a' is: a' = 676a. Besides, after 2-D transform, the integer transform coefficients are about
`676 times the true DCT coefficients. This scaling is then compensated in various quantization and
`dequantization steps.
`
`1.2 Quantization in TML 5.2
`
`There are 32 different QP values used in the current H.26L. The total range of step size from smallest to
`largest is about the same as for H.263. The QP signalled in the bitstream applies for luma quantization /
`dequantization. This is called QP.u,,,a. For chroma quantization / dequantization a different value -
`QPchmma - is used. The relationship between the two is:
`
`QP.um.,, 012345678910111213141516171819202122232425262728293031
`
`QPch,om,,012345678910111213141516171718192020212222232324242525
`
`When QP is used in the following it means QP..,ma or QPch,oma depending on what is appropriate.
`
`Two arrays of numbers are used for quantization/dequantization2
`
`A(QP=0,..,31) =
`(620,553,492,439,391,348,310,276,246.219,195,174,155,138,123.110,98,87,78,69,62,55,49,44,39,35,
`31 ,27,24,22,19,17},
`
`B(QP=0,..,31) =
`
`{3881,4351.4890,5481,6154.6914.7761,8718,9781,10987,12339,13828,15523.17435,19561.21873,2455
`2,27656,30847,34870,38807.43747,49103,54683,61694,68745.77615,89113,100253,109366,126635,14
`1533}.
`
`The relation between A() and B() is: A(QP)xB(QP)x6762 = 24°.
`
`Given this notation, a transform coefficient K is quantized in the following way:
`
`LEVEL = (KxA(QP) + fx22°)/22°;
`
`|f| is 1/3 for intra and 1/6 for inter blocks and f has the same sign as K.
`
`Given the quantized result LEVEL, the dequantization is performed by:
`
`K’ = LEVELxB(QP)
`
`In summary, let Y be the 4x4 input pixels. and K0 be one of its true 2-D DCT coefficients, then the
`relationship between K0 and the corresponding 2-D integertransform coefficient K is approximately:
`
`K 1: 676 K0
`
`After quantization:
`
`File:\/CEG-M16-FastVDO-4.01-KEY
`
`Page:
`
`2
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 4 of 18 PageID #: 736
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 4 of 18 Page|D #: 736
`
`LEVEL 4 K x A(QP) / 22°: 676 x K0 x A(QP) / 22°-
`
`After dequantization, the reconstructed signal for K is:
`
`K‘ .9 LEVEL x B(QP) .9 676 x K0 x A(QP) x B(QP) / 22°-
`
`The 2-D inverse transform introduces another factor of 676, therefore the reconstructed pixel is:
`
`Y" =1 6762 x Y‘ x A(QP) x B(QP) / 22°-
`
`Where Y‘ is the reconstructed pixel by the true DCT. Since the arrays A and B are designed such that
`A(QP) x B(QP) x 6762 = 24°‘ we therefore have
`
`Y" == Y‘ x 22°-
`
`Thus a down-shift of 20 bits (with rounding) is needed.
`
`It can be seen from the above that the equivalent quantization stepsizes for the true DCT coefficients are:
`
`22° / (676 x A(QP)).
`
`When QP changes from O to 31, the stepsizes are:
`
`DCTQ(QP=0,..,31) =
`
`[2.5o19,
`
`2.8050,
`
`3.1527,
`
`3.5334,
`
`3.9671,
`
`4.4573,
`
`5.0037,
`
`5.6201,
`
`6.3055,
`
`7.0829,
`
`7.9546,
`
`8.9146,
`
`10.0074, 11.2402, 12.6110, 14.1013,
`
`15.8280,
`
`17.8293, 19.8865, 22.4804, 25.0185, 28.2027, 31.6561, 35.2534,
`
`39.7730, 44.3185, 50.0370, 57.4499, 64.6312, 70.5067, 81.6394, 91.2440];
`
`There is an increase of about 12% from one stepsize to the next one.
`
`2. The binDCT
`
`2.1 Chen-Wang Algorithm of the DCT and the Lifting Scheme
`The proposed binDCT is also based on the well-known Chen-Wang plane rotation-based factorization of
`the DCT matrix [2] [3]. For 4-point DCT, their factorization is shown in Fig. 1.
`
`Xt0I
`
`.
`
`T 99‘
`\
`3/
`X111
`._
`,«_____;D
`3/
`\>/'
`\-.
`IA
`,/419
`\
`
`M21
`mi.
`
`-
`
`_
`
`‘-.;-1
`
`3121..
`
`,
`
`L
`*3 F3 ,.
`)1‘ "$1.1
`_/__@f_ _ Y[2]
`-C1:/4
`L
`Y[1]
`c3..67:_1@_
`xg S31:/8
`_ .§;§3"’3 ti.-*1 .
`
`C3rttE%
`
`Fig 1. Plane rotation-based factorization of the 4-point DCT.
`
`where c0 and s6 mean cos(6) and sin(6), respectively. Note that a uniform scaling factor of 1/sqrt(2) is
`required at the end of the signal flow to get the true DCT coefficient.
`
`File:VCE‘.G-M16-FastVDO—4.0l—KEY
`
`Page:
`
`3
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 5 of 18 PageID #: 737
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 5 of 18 Page|D #: 737
`
`The plane rotations in this factorization are the main difficulty for the 16-bit implementation of the 4-point
`DCT. In this proposal, the lifting scheme [4] will be used to obtain a fast approximation of the DCT. Fig. 2
`illustrates the decomposition of the plane rotations in Fig. ‘l by three lifting steps, as shown in [4]:
`
`
`
`(b)
`(a)
`Fig 2. (a) Representation of a plane rotation by 3 lifting steps. (b) Inverse transform.
`
`Each lifting step is a biorthogonal transform, and its inverse also has a simple lifting structure, as shown
`in Fig. 2(b). This means that to inverse a lifting step, we simply need to subtract out what was added in at
`the forward transform. Hence the original signal can still be perfectly reconstructed even if the floating-
`point multiplication results in the lifting steps are rounded to integers, as long as the same procedure is
`applied to both the forward and inverse transform. This is the basis for many lifting-based lossless
`transforms.
`
`To obtain fast implementation, we can further approximate the floating-point lifting coefficients by
`hardware-friendly dyadic values (i.e., rational numbers in the format of k/2"‘, where k and m are integers),
`which can be implemented by only shift and addition operations. This also enables the transform to be
`implemented with a narrower data bus than the original method.
`
`2.2 The scaled Lifting Structure
`To further reduce the complexity of the lifting-based fast DCT, we use a scaled lifting structure to
`represent the plane rotation, as proposed in [5]. The general structure is given in Fig. 3 (a) and (b), where
`a general butterfly (not necessarily an orthogonal plane rotation) is represented by 2 lifting steps and 2
`scaling factors. The two scaling factors can be absorbed in the quantization stage, thus only 2 lifting steps
`are left in the transform, making it more efficient than the representation in Fig. 2.
`
`The parameters in Fig. 3(b) are given by:
`
`P = r12 / T11.
`U = F11 X F21/(V11 X F22 — F21 X T12).
`K1=F11.
`K2 = (F11 X |‘22- F21 X F12) / V11.
`
`If the parameters in Fig. 3(a) are orthogonal plane rotation, as shown in Fig. 3(c), its scaled lifting
`structure can be represented as in Fig. 3(d). As noted in [5], for rotation angles close to (k+1/2)n, this
`scaled lifting structure is sensitive to finite-length approximations, and it also increases the dynamic range
`of the intermediate results. However, it can be shown that in this case, a permuted version of the scaled
`lifting structure will lead to a more robust result, and the dynamic range is also minimized. The permuted
`rotation and its corresponding scaled lifting structure are illustrated in Fig. 3(e) and (f).
`
`E‘ile:VCEG-M16-E‘astVDO—4.01-KEY
`
`Page: 4
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 6 of 18 PageID #: 738
`Case 1:11—cv—00797—RGA Document 113-1
`Filed 01/24/12 Page 6 of 18 Page|D #: 738
`
`"E19
`
`l "El“' Y1
`
`Lg ..Ei_. Y2
`
`.
`
`-69 —L
`g3
`3;
`.
`‘\ / sina
`tuna. —uu(rx:om
`.
`/\
`_
`.
`5|
`x‘-£11111
`X2 4 —ta— Y2 x2—A~ee . pl. —~Y2
`
`cow
`V
`mm
`
`com ]—- Y1
`
`(c)
`—sina
`X1 T-\\
`‘@ "' Y2
`fa
`com
`I.
`x_\ com
`_® ____ Y1
`
`X2
`
`..v’
`
`.
`
`/5‘
`
`(d)
`
`X1 W -SiI'l(1|
`_
`L sinorcosucl
`rand
`..
`V .
`
`1
`flfét
`
`
`
`X2,
`
`* Y2
`
`0-Y1
`
`(E)
`(e)
`(c) An orthogonal plane rotation; (d)
`Fig. 3. (a) A general rotation; (b) The scaled lifting structure for (a);
`The scaled lifting structure for (c); (e) The permuted version of (c); (f) The scaled lifting structure for (e).
`
`2.3 General Structure of the binDCT
`
`From the above results. we can obtain the general structure for the lifting-based 4-point DCT, denoted by
`binDCT. The forward and inverse transforms are given in Fig. 4. The rotations of 7!:/4 and 3::/8 are both
`represented by 2 lifting steps. Note that the permuted scaled lifting structure is used for 37:/8. and
`because of the butterflies. a scaling factor of 2 is introduced after the inverse transform, which can be
`compensated by a right-shift.
`
`x[0]H
`E
`
`‘
`
`£9
`-’
`
`-+J
`in?
`
`r — - — ---
`-r.
`%___I
`
`.——~-~
`- Y[0] —~:__2__;
`
`
`
`2x[0]
`
`i_
`
`I
`
`2x[1]
`
`V
`
`,
`
`1421
`
`\>\;,
`"
`
`.
`_
`'_ _ _ _
`,_'_s_ir1—31_t/t;"_,
`
`_ _
`
`_
`
`4
`
`'\\
`.39
`
`"R
`
`__
`
`{X91
`
`2x.[3]
`
`(a)
`
`(b)
`
`Fig. 4 (a) Forward transform of the 4-point binDCT; (b) Inverse transform of the 4-point binDCT.
`
`File:VCE1G—M16-E‘astVDO-4.01-KEY
`
`Page:
`
`5
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 7 of 18 PageID #: 739
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 7 of 18 Page|D #: 739
`
`
`
`
`3/8
`
`3/8
`
`5
`
`1/2
`
`3/8
`
`4
`
`1/2
`
`1/2
`
`3
`
`10
`
`9
`
`8
`
`3/8
`
`5
`
`10
`
`
`
`
`
`Table 1 tabulates the floating-point values of the parameters p and u in Fig. 4, as well as some of their
`dyadic approximations, which allow for fast implementations with only shift and addition operations. The
`coding gains of the floating-point 4-point DCT and various binDCT configurations are also presented in
`Table 1. showing that the binDCT is very close to the true DCT in terms of the energy compaction
`capability.
`
`The first configuration in Table 1 with p = 7/16 and u = 3/8 will be used in the following tests. which
`requires 5 shifts and 10 additions. As a comparison, the integer transform in the current TML 5.2 needs 6
`fixed-point multiplications and 8 additions. Note that 7/16x can be implemented as 1/2x — 1/16x, and 3/8x
`can be obtained by 1/2x — 1/8x. That is, only right-shifts are involved, which minimizes the dynamic
`ranges of the intermediate results. The forward and inverse transforms for this configuration are (without
`considering the scaling factors):
`
`Forward transform=
`
`1
`
`1
`
`1
`
`1
`
`107/128
`
`3/8
`
`-3/8 -107/128
`
`1/2
`
`-1/2 -1/2
`
`1/2
`
`I.
`
`7/16
`
`-1
`
`1
`
`-7/16
`
`.
`
`Inverse transform =
`
`1/2
`
`1/2
`
`1
`
`1
`
`3/8
`
`7/16
`
`-1 -107/128
`
`1/2 -7/16 -1
`
`107/128
`
`'
`
`1/2
`
`-1
`
`1
`
`-3/8
`
`2.4. Scaling Factors and Quantization
`
`The scaling factors in Fig. 4 are incorporated into the quantization. The 1-D scaling vector is:
`
`81 = [0.50000O0000O0O0 0.765366864723018 1.00000000000000 0.653281482413819],
`
`and the corresponding 2-D binDCT scaling matrix is
`
`s2 = S1Tx s1 =
`0.25
`
`0.3826834323 6509
`
`0.5
`
`0.326640741219119"
`
`0.383268343236509 0.58578643762690 0.76536686473018
`0.5
`0.76536686473018
`1.0
`
`0.5
`0.653281482438|9 ‘
`
`0.32664074121909
`
`0.5
`
`0.65328148243819
`
`(}.426'/7669529664
`
`where S1T is the transpose of S1 .
`
`File:VCEG—Ml6—FastVDO-4.0l—KEY
`
`Page:
`
`6
`
`Date Printed: 1/13/12
`
`binDCT Confi
`
`.
`
`P
`
`U
`
`0.41421356237310
`
`7/16
`
`0.135355339059327
`
`Number of Shifts
`
`-
`
`-
`
`Numberof Adds
`
`
`
`
`
`
`7.5485 Codirl Gain dB 7.5701 true DCT 7.5697 7.5566 7.5493
`Table 1. Some configurations of the binDCT and their performances.
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 8 of 18 PageID #: 740
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 8 of 18 Page|D #: 740
`
`It can be verified that if X is a given 4x4 image block, and Y is its true 2-D DCT transform, the relationship
`between the binDCT transform coefficient Yb and Y is:
`
`where 69 represents entry-wise multiplications between two matrices.
`
`Y = Yb (9 82,
`
`In the current H.26L test model, the integer transform has a uniform scaling factor for all transform
`coefficients, which is no longer true in the binDCT, because of the 2-D scaling matrix S2. Hence, to
`maintain the compatibility of the binDCT and the true DCT, a quantization matrix Qb should be applied to
`the binDCT results, whose elements are defined as:
`
`Qb(i, j) = DCTQ(QP) / S2(i, j),
`
`i,j = 1 to 4.
`
`where DCTQ is the equivalent quantization stepsize for the true DCT coefficients, as derived in the first
`section.
`
`Note that the quantization matrix is determined by both the binDCT scaling matrix and the QP value.
`Since the first frame could have different QP value from other frames, and QP.um.,, and QPch,oma might be
`different, two quantization matrices should be defined for each frame --- Qbchmma for the chroma
`quantization and Qbluma for the luma part. These two matrices should be updated after the processing of
`the first frame, if the QP of the remaining frames is different from the first frame.
`
`The vector DCTQ and the binDCT scaling matrix 82 have floating-point entries. In implementation, two
`methods can be used to obtain integer quantization matrix. In the first method, floating-point operations
`can be used to obtain the floating-point values of Qb from DCTQ and S2, these results can then be
`rounded into integers.
`
`In order to generate the quantization matrix with 16-bit fixed-point operations, we can scale DCTQ by a
`factor of 8, and 1 /S2(i,j) by a factor of 16, for i,j = 1
`4. The results are then rounded to integer and
`stored as look-up tables. Denote the integer results of the scaled DCTQ and 1/S2 by SDCTQ and S82,
`the integer binDCT quantization table can be computed by 16-bit operations as:
`Qb(i, j) = (SDCTQ(QP) * SS2(i, j) + 64)/ 128,
`i,j = 1 to 4.
`Both Qbchmma and Qbluma can be performed in this way.
`
`This scaling approach introduces some round—off errors for the binDCT quantization table, compared with
`the quantization stepsizes used by the current TML, especially for smaller QP. However, experiments
`show that the effect is negligible for QP >= 8.
`
`After the quantization matrix is obtained, the quantization can be performed in a similar way to the current
`integer transform. That is,
`
`LEVEL(i, j) = ( Yb(i, j) + fx Qb(i, j) )/ Qb(i, j);
`
`4 can be calculated in advance, after the
`To speed up the quantization, the fx Qb(i, j), i, j = 1
`quantization matrix Qb is available. In the current H.26L test model, f can be one of 3 values: 1/3 for lntra
`blocks, 1/6 for inter blocks, and 0.45 for RD-optimized quantizations. As a result, 8 4x4 look-up tables can
`be defined for the quantization of each frame, storing Qbch,°ma_ 1/3Qbch,oma, 1/6Qbch,oma, 0.45Qbch,°ma,
`Qb.uma,1/3Qb.uma, 1/6Qb.u,,,a, and 0.45Qb.uma, respectively, where the 0.45Qb can be obtained by Qb/2 —
`Qb/20.
`
`Since B-frames are introduced in the latest TML 5.9, more quantization tables are needed to take care
`the QP value for B frames.
`
`The dequantization is simply performed by
`
`Yb‘(i, j) = LEVEL(i, j) * Qb(i, j).
`
`E‘ile:\/CEG-M16-l:‘astVDO-4.0l—KEY
`
`Page:
`
`7
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 9 of 18 PageID #: 741
`Case 1:11-cv-00797-RGA Document 113-1
`Filed 01/24/12 Page 9 of 18 Page|D #: 741
`
`2.5 Dynamic Range Analysis
`The elimination of the floating-point and fixed-point multiplications enables the binDCT to be implemented
`with narrower data bus than other fast algorithms.
`
`After the prediction, the input data to the binDCT are 9-bit signed integers, ranging from -255 to 255. To
`check the dynamic range of the binDCT, we examine the signs of the binDCT coefficients and find out the
`set of input data which would generate the maximum or minimum outputs in different binDCT subbands.
`For example, the second row of the forward binDCT used in this proposal is:
`
`[107/128
`
`3/8
`
`-3/8
`
`-107/128],
`
`thus the input x = [255. 255, -255, -255] will generate the maximum output for the second binDCT
`coefficient, and x = [-255, -255, 255, 255] will give its minimum.
`
`it may appear that intermediate calculations, which amount to multiplying numbers such as 107x255,
`would lead us outside a desired low-bit range (e.g., beyond 16-bits). However. given our method of
`calculating these transforms in terms of cascaded lifting steps, these multiplications never take place. We
`are able to verify that the entire calculation, beginning to end. stays within 13 bits, as shown in figure 5. In
`general, as all lifting parameters in the binDCT are less than 1, they can be implemented with addition
`and right-shift operations, which can minimize the intermediate dynamic range. Furthermore, since the
`first row has the maximum absolute sum, the dynamic range of the binDCT is therefore determined by the
`DC subband.
`
`Xw]
`
`[—255,255]
`[—255,255]
`
`(9)_
`(9)
`
`\
`\
`1
`
`[—255,255]
`[ 255 255]
`—
`
`(9)
`(9)
`
`I
`
`x[2]
`
`Km
`
`1
`
`>1
`1'
`
`I
`1/ la
`1
`J/\\ /_'_‘é_)
`E9
`_
`
`[—510_ 510] (10)
`[—510,510]
`(10)
`
`[—S10,S10]
`
`(10)
`
`I-='>10,5101<10>
`
`[-1020, 1020] (11)
`[—510,510] (11)
`
`I
`
`.
`]1/2]
`
`[—10201020] (111 W0]
`[—510, 510]
`(_11_; Ym
`
`__
`3
`-7/1'6"‘
`I
`
`[—733,733] (11)
`
`[—733,733] (I I)
`
`-
`
`1
`l3/=81
`1-5«1<x5il1>..(.{.,+I-61zL%1U12Y[,]
`
`Figure 5. Dynamic range of intermediate calculations for our 1-D 4-point binDCT. The numbers in
`brackets give the worst case range, and numbers in parentheses give the bit depth. The 1D transform
`thus maps 9-bit integers to 11-bit integers. The 2D transform takes 9-bits to 13-bits.
`
`It can be verified that given the input range of [-255, 255], the 1-D binDCT outputs are within [-1020,
`1020], which requires 11 bits to represent. The range of the 2-D binDCT output is [-4080, 4080], which
`needs 13 bits. If all the lifting parameters are implemented with only right shifts, the intermediate results
`are less than the final results in the worst cases. Therefore, the binDCT transform can be well fitted into
`16-bit architecture. The implementation of figure 5 in C is shown below.
`
`transform */
`/* Horizontal
`for
`(i=O;i<4;i++)
`
`{ M
`
`5[0]= M0[i1[ii][0][JJ] + M0[1][ii][3][jj],
`M5[3]= M0[i][ii1[0l[jJ] - M0[11[ii][3]ljj],
`M5[l]= M0[i][ii][1llJJ] + M0[1][ii][2][jj],
`M5[21= M0[i][ii1[1lijj] - M0[i][ii1[21[jj1,
`M0[i1[ii][0][jj] = M510]
`+ M5[l1;
`— M5111;
`M0[i][ii][21[jj] =
`(M0[i1[ii1[0][jj] >> 1)
`MO[i][ii][3][jj] = ((M5[3] >> 1)
`-
`(M5[3] >> 4))
`- M5[2];
`M0[i][ii][11[jj] = M513]
`- ((M0[i][ii][3][jj] >> 1)
`- (M0[i][ii][3][jj] >>
`3));
`
`File:VCEG—M16—Fast:VDO—4.01-KEY
`
`Page:
`
`8
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 10 of 18 PageID #: 742
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 10 of 18 Page|D #: 742
`
`2.6 Rate Distortion constrained quantization (RDQ)
`
`The current RD-constrained quantization function needs some modifications to be used in the binDCT
`framework. The main reason is that in the current integer transform, the quantization stepsize is uniform
`for all coefficients; therefore the RDQ function doesn't need to know the coordinates of each coefficient
`after the zigzag scanning. However, to use the binDCT, we have to pass the scanning pattern information
`to the RDQ functions, so that it can apply the correct quantization stepsizes for different binDCT
`coefficients. This can be done by simply passing to the RDQ function the pointer to the scanning pattern
`table, which could be single scan or double scan, depends on the current modes. For the quantization of
`the 2x2 chroma DC, this pointer is not used.
`
`Within the RDQ function, the program needs to check the intra or inter mode to decide whether to use
`Qbmma or Qbchmma (including 1/3Qb or 1/6Qb). Besides, the pointer to the Qb table should be returned to
`the binDCT routine for the dequantization processing.
`
`The quantization decisions in the new RDQ functions are similar to the one in the current test model. The
`only difference is in the calculation of the RD_gain. When the program scans through all coefficients and
`accumulates the 'RD_gain' by representing a coefficient by a nonzero reconstruction, a factor of 64 is
`introduced so that integer calculations may be used. The formula is:
`
`RD_gain = 64x(Bits + Ax((K‘ — Level)2 — K'2)) = 64x(Bits + }»x(Leve|2 — 2xK'xLevel) )
`
`= 64xBits + }»xLevelx(64xLevel — 128xK') = 64xBits + >»xLevelx(64xLeve|-KxA(QP)/213)
`
`where K is a transform coefficient, and K’ is a floating-point representation of the quantized result of K.
`
`in binDCT, this is simply
`
`64xBits + 7»xLevelx(64xLevel — (128 x K) / bquant)
`
`where bquant is the quantization stepsize corresponding to K, obtained from the current Qb table.
`
`This is the only part that couldn't be implemented with 16-bit operations. If RD-constrained quantization is
`not desired, the whole DCT and quantization can be implemented with 16-bit architecture.
`
`2. 7 Alternative binDCT structures:
`
`There are some other alternatives to the aforementioned binDCT structures. In Fig. 6 (a), the rotation
`
`angle of 3n/8 is implemented as 3 lifting steps. Compared with Fig. 4 (a), this structure needs a little more
`operations, but has nicer scaling factors. Some configurations of this kind of binDCT are shown in Table
`2.
`
`In Fig. 6 (b), we replace the butterflies by lifting steps, which allow easier integer to integer mapping. The
`inverse transform is simplified, because in the previous structures, the inverse transform has to take care
`of the downscaling caused by the butterflies.
`
`xlll
`
`x12]
`
`x[0l\
`
`1"
`'~__;
`‘\\f._
`‘Wu’
`A
`5
`..=’
`‘
`'\
`1-*9
`/at
`,
`3 /
`‘U- __@_
`
`Wt!
`
`_
`
`—
`“ti 1
`I
`P1_l|U| T'3_
`4;, 1-55
`
`(a)
`
`v_: %.::_.Y[0]
`
`xmI,Q
`
`__: ","‘._¥I21
`._ _
`I
`.
`'
`--. Ytll
`
`x[_11
`
`.i2i
`
`_@ _
`I
`I
`
`3}
`
`_
`
`-.
`
`YB,
`L2 2.
`_ _
`4
`
`.
`
`_
`
`_
`-‘I-*_l._.__.@.
`
`YL01
`
`'_®J‘
`
`l"?
`_@
`: Y[2]
`r"|'
`--
`'-
`__
`_
`_
`;----- -~, Y[3]
`17 "L"?‘_"“_31’?:"
`lL_’___ll_‘.‘_
`Y,
`i._..(,5: _.; w_v;?m;
`.'-.l
`.
`_ _ _ __
`
`([3)
`
`Fig. 6. Two alternative structures for the binDCT.
`
`File:VCEG—M16-Fast:VDO-4.01-KEY
`
`Page:
`
`9
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 11 of 18 PageID #: 743
`Case 1:11—cv—00797—RGA Document 113-1
`Filed 01/24/12 Page 11 of 18 Page|D #: 743
`
`-5/8
`-O66817863791930
`
`0.923879353251129
`
`
`P2
`-0.668178637919130
`-21/32
`-5/8
`-5/8
`-3/4
`
`Number of Shifts
`
`-
`
`6
`
`5
`
`6
`
`5
`
`
`
`Numberof Adds

`12
`11
`12
`11
`7.5030 7.5685 7.5649
`7.5560
`
`Codin Gain (dB
`Table 2. Some configurations of the binDCT in Fig. 4 (a) and their performances.
`
`3. BinDCT for sizes 8x8 and 16x16
`
`We have fast, multiplierless solutions for the sizes 8x8 and 16x16 as well, which are briefly summarized
`in the following figures 7 and 8. As mentioned, these can be used if DCTs of these sizes are required.
`
`x[0]_
`
`.
`
`xm
`
`\
`
`«ea
`
`x[2] . _- ._,_
`
`X13] >1
`
`x[4]_._
`
`.. __
`
`x[5] -
`
`X16]
`
`x[7]
`
`'
`
`_-
`
`Figure 7. An 8x8 binDCT example, from [9]. All special constants are dyadic rationals.
`
`File:VCEG—Ml6—Fast:VDO-4.01-KEY
`
`Page:
`
`10
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 12 of 18 PageID #: 744
`Case 1:11—cv—00797—RGA Document 113-1
`Filed 01/24/12 Page 12 of 18 Page|D #: 744
`
`x[O]
`‘l
`1
`""1 V
`,
`'-.
`x[2]
`.\I-'.|I‘||z"f@
`l\||\
`I
`mi —--1--fix
`/
`1
`X141
`\ fl
`‘[51
`~\‘\'-.-
`x[6]
`\
`‘
`».[7]_--
`X181 —"-397::
`
`'1
`
`*
`
`.
`
`\.
`
`_E
`411121!/4 :_, km]
`~.’._:?>
`W.‘ * xm
`.''I
`-‘A
`If!
`‘.I\
`._."(B
`XLG]
`«,1.
`mini]
`.“
`I,»
`-~7-__.®~’— --ea Ll —é>" — e X121
`"I.-1&3
`-:
`__
`xm
`
`‘=1.
`Xi“
`15:19.1 l&*fli1»'4l
`/
`X[5]
`J)
`..F
`-2"
`xm
`'59
`X1131
`
`,_
`
`.__ _:
`.._
`N
`fa‘;
`.
`
`_
`
`_:§/3?1!?"+"E'11i«‘_5'+'1
`
`ii __,_
`-319. __1..
`4%,)
`
`_
`
`/
`
`'.\
`.39
`
`..
`.4;
`
`-
`
`I
`“'“§‘i
`
`-9 _
`
`ff)
`/'
`
`\
`'\
`
`V
`
`_+
`
`["37
`49
`
`I
`
`_
`
`'
`
`'
`
`‘
`
`',~.'\
`
`
`
`fr/"1
`
`11*
`
`'11-
`
`-'1
`
`1.
`
`=
`
`‘_—rI'a"?:i;l:_i|'—'.\%II§§\'§‘-t3
`"_,’E,'/1.'“1-1:-“G3
`x[1U|
`xl111'—',"./1 "‘ 111‘-‘\‘*"'|"'
`~:[:21—
`‘ 1'
`r}:
`xnsi
`/7 ‘
`1?
`1
`.
`x[14|
`"
`‘H3
`'1
`I
`l
`x[15:
`-— —‘
`
`-
`_
`41.3
`
`11/32.1.5-‘£5
`
`.1_1/32. 1.9-8; M W32 .2/_-12.15/16 2/_321.m=:.
`
`.1__:.2‘-?ra.=
`
`.‘-:»*'‘-.‘~
`
`1/
`
`'
`
`'-
`
`I‘.
`
`_-'i""j.--
`
`1
`
`'d‘\._
`
`;-
`_
`-6.53
`
`--
`
`-
`
`'
`—-——
`
`—
`
`.
`'{‘)_
`L?‘ _ _ —
`'
`_
`:"_-."\_‘_\_J_;;@ "\\
`‘(ll
`_ __—
`'/-13%:
`"r_$1'?_
`
`
`I‘ J _| ‘Tl.
`_ _'_?—,‘t3——|i J_. "5"T'1i="—'-15-_:—-?f**¢*Tt_._|
`l
`ll
`1 \
`9 .,_1i:’1l‘,;€er
`1
`«:3
`
`J7
`_.v;?\:‘'-_
`1 _ '-
`" "$9" "'1'
`119
`\
`J;
`1
`-_ --ta--—' ‘ --1&3
`
`I
`
`1&1
`
`«laJn3"f_31J'“ X[3]
`_-~ mi
`*;
`1".-.“ X115]
`-:__y2__; »x[1]
`I- xm
`I. —i——~‘
`‘-5233? I
`-X151
`1
`;
`4./__
`. _
`|i/15 3'15 |
`1‘ -é:"- ~;,,s,,,;,,,,;.'- ~ x[11]
`
`F“; ‘.
`
`-
`
`,
`
`_
`
`-- — — ‘W
`
`Figure 9. A 16x16 binDCT example. from [9].
`
`4. Experimental Results
`
`The attached results are obtained with 100 frames of various test sequences, running on a PC with
`Pentium III 550MHz CPU and Windows 2000 OS.
`
`In the implementation of the inverse binDCT transform, if all the AC coefficients are zero, the inverse
`transform is bypassed and all the outputs are set to one half of the DC coefficients. This can make the
`inverse transform a slightly faster.
`
`In the following tests, we encode and decode a sequence with two different methods, and measure their
`difference in term of PSNR Y and the compressed bit rate.
`
`First, we compare the two ways of obtaining the quantization table for the binDCT. The first method use
`floating-point operations to get the floating values of the entries in the quantization table, then round them
`to integer. In the second method. we start from the scaled true DCT quantization stepsize DCTQ and the
`binDCT scaling matrix 82, and compute the quantization matrix with 16-bit operations.
`
`The result with the Foreman sequence shows that for Q>=8, the maximum difference is 0.212% for PSNR
`and 2.213% for compressed bit rate, which are quite small. Therefore. the 16-bit implementation of the
`quantization matrix is used for the following tests.
`
`File:VCEG—Ml6—FastVDO-4.0l—KEY
`
`Page:
`
`11
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 13 of 18 PageID #: 745
`Case 1:11—cv—00797-RGA Document 113-1
`Filed 01/24/12 Page 13 of 18 Page|D #: 745
`
`The comparison between binDCT and the integer transform for 4 sequences show that for QP >= 8, the
`maximum difference between the two methods is 0.35% for PSNR and 3.6% in bit rate. When QP >= 16,
`most of the bit rate changes are less than 1.5%. Therefore they have very similar performance. This
`agrees well with our design target of providing compatible outputs after quantization.
`
`4. Conclusion
`
`A lifting-based fast approximation of the 4-point DCT, named the binDCT, is proposed, which can be
`implemented with only addition and right-shift operations. The binDCT can be well fitted into 16-bit
`architecture. Most of the quantization can also be implemented with 16-bit operations, except the RD-
`optimized quantization. Experimental results show that the binDCT has the same performance as the
`integer transform used in TML 5.2.
`
`References
`
`[4]
`
`[1] H.26L Document Q15-K-59: H.26L Test Model Long Term Number 5 (TML-5) Draft 0.
`[2] W. Chen, C. Harrison, and S. Fralick, A fast computational algorithm for the discrete cosine transform,
`IEEE Trans. Com., Vol. COM-25 (9), pp. 1004-1011, Sept. 1977.
`[3] Z. Wang, Fast algorithm for the discrete W transform and for the discrete Fourier transform, IEEE
`Trans. ASSP, Vol. ASSP-32 (4), pp. 803-816, Aug. 1984.
`l. Daubechies and W. Sweldens, Factoring wavelet transforms into lifting steps, Journal of Fourier
`Anal. Appl., Vol. 4 (3), pp. 247-269, 1998.
`[5] J. Liang and T. D. Tran, Fast multiplierless approximations of the DCT with the lifting scheme,
`submitted to IEEE Trans. on Signal Processing. Website for download: http://thanglong.ece.'hu.edu
`[6] M. Wien et al, "|nteger Transforms for H.26L using Adaptive Block Transforms," lTU VCEG
`q15K24.doc, Portland, OR, USA, Aug, 2000.
`[7] L. Kerofsky and S. Sun, “Results of H.26L Core Experiment on Adaptive Block Transforms," ITU VCEG-
`L11, Eibsee, Germany, January, 2001.
`[8] M. Wien and S. Sun, "ICT Comparison for Adaptive Block Transforms," ITU VCEG-L12, Eibsee,
`Germany, January, 2001.
`[9] J. Liang and T. Tran, "Fast Multiplierless Approximations of the DCT with the Lifting Scheme," SPIE,
`Appl. Dig. lm. Proc. XXll|, vol. 4115, Aug., 2000, pp. 384-395.
`
`File:VCEG—Ml6-Fasl:VDO-4.01-KEY
`
`Page:
`
`12
`
`Date Printed: 1/13/12
`
`

`
`Case 1:11-cv-00797-RGA Document 113-1 Filed 01/24/12 Page 14 of 18 PageID #: 746
`Case 1:11—cv—00797—RGA Document 113-1
`Filed 01/24/12 Page 14 of 18 Page|D #: 746
`
`Experimental Result 1. Comparison of two methods of obtaining
`quantization table for binDCT
`
`Bit Rate
`(bpp)
`Foreman.qcif QP PSNR Y PSNR U PSNR V
`BinDCT_intQ
`8
`42.55
`44.32
`46.29 569439.31
`16
`36.69
`39.88
`41.77 156044.41
`20
`34.05
`38.57
`40.17
`89662.50
`24
`31.42
`37.37
`38.50
`55437.60
`28
`28.44
`36.56
`37.34
`38102.10
`
`Change of Change of
`PSNR Y (%) Bit Rate (%)
`-
`-
`-
`-
`-
`-
`-
`—
`-
`—
`
`BinDCT__f|tQ
`
`8
`16
`20
`24
`28
`
`42.46
`36.72
`34.03
`31.39
`28.48
`
`44.24
`39.91
`38.58
`37.35
`36.51
`
`46.27 557352.63
`41.80 158942.70
`40.18
`88812.00
`38.47
`55594.50
`37.31
`3865

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket