(19) World Intellectual Property Organization
     International Bureau

(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(43) International Publication Date: 6 December 2012 (06.12.2012)

(10) International Publication Number: WO 2012/166382 A2
`
(51) International Patent Classification:
     H04N 1/60 (2006.01)    H04N 9/64 (2006.01)

(21) International Application Number: PCT/US2012/038448
`
(22) International Filing Date: 17 May 2012 (17.05.2012)

(25) Filing Language: English

(26) Publication Language: English
(30) Priority Data:
     61/494,014          27 May 2011 (27.05.2011)          US

(71) Applicant (for all designated States except US): DOLBY LABORATORIES LICENSING CORPORATION [US/US]; 100 Potrero Avenue, San Francisco, California 94103-4813 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): MESSMER, Neil M. [CA/CA]; …, British Columbia (CA). ATKINS, Robin [CA/US]; c/o Dolby Laboratories, Inc., 432 Lakeside Drive, Sunnyvale, California 94085 (US). MARGERM, Steve [CA/CA]; …, New Westminster, British Columbia (CA). LONGHURST, Peter W. [GB/CA]; …, British Columbia V5L 1P6 (CA).

(74) Agents: DOLBY LABORATORIES, INC. et al.; Intellectual Property Group, 999 Brannan Street, San Francisco, California 94103-4938 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Declarations under Rule 4.17:
– as to the identity of the inventor (Rule 4.17(i))
– as to the applicant's entitlement to apply for and be granted a patent (Rule 4.17(ii))

(54) Title: SCALABLE SYSTEMS FOR CONTROLLING COLOR MANAGEMENT COMPRISING VARYING LEVELS OF METADATA

[Continued on next page]
`
`FIG. 1A
`
`
(57) Abstract: Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
`
`
`
`
`
Published:
– without international search report and to be republished upon receipt of the report (Rule 48.2(g))
`
`
`
`
SCALABLE SYSTEMS FOR CONTROLLING COLOR MANAGEMENT COMPRISING VARYING LEVELS OF METADATA
`
`CROSS-REFERENCE TO RELATED APPLICATIONS
`
This application claims priority to United States Provisional Patent Application No. 61/494,014, filed 27 May 2011, which is hereby incorporated by reference in its entirety.
`
`TECHNICAL FIELD
`
The present invention relates to image processing and, more particularly, to the encoding and decoding of image and video signals employing metadata and, more particularly, metadata arranged in various layers.
`
`BACKGROUND
`
`Known scalable video encoding and decoding techniques allow for the
`expansion or contraction of video quality, depending on the capabilities of the
`target video display and the quality of the source video data.
`
Improvements in image and/or video rendering and the experience of viewers may be made, however, in the use and application of image metadata, in either a single level or in various levels of metadata.
`
`SUMMARY
`
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
`
In one embodiment, a method for processing and rendering image data on a target display through a set of levels of metadata is disclosed, wherein the metadata is associated with the image content. The method comprises: inputting the image data; ascertaining the set of levels of metadata associated with the image data; if no metadata is associated with the image data, performing at least one of a group of image processing steps, said group comprising: switching to default values and adaptively calculating parameter values; and if metadata is associated with the image data, calculating color management algorithm parameters according to the set of levels of metadata associated with the image data.
`
`
In yet another embodiment, a system for decoding and rendering image data on a target display through a set of levels of metadata is disclosed. The system comprises: a video decoder, said video decoder receiving input image data and outputting intermediate image data; a metadata decoder, said metadata decoder receiving input image data, wherein said metadata decoder is capable of detecting a set of levels of metadata associated with said input image data and outputting intermediate metadata; a color management module, said color management module receiving intermediate metadata from said metadata decoder, receiving intermediate image data from said video decoder, and performing image processing upon intermediate image data based upon said intermediate metadata; and a target display, said target display receiving and displaying the image data from said color management module.
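By way of orientation only, a minimal sketch of how such blocks might be wired together is given below; the names DecodedFrame, cm_module and target_display are assumptions of the sketch, not elements recited in the disclosure.

    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class DecodedFrame:
        image: Any                                               # intermediate image data
        metadata: Dict[str, Any] = field(default_factory=dict)   # intermediate metadata

    def render(frame: DecodedFrame, cm_module, target_display):
        # The color management module consumes both intermediate streams;
        # the processed image is then handed to the target display.
        processed = cm_module.process(frame.image, frame.metadata)
        target_display.show(processed)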
`
Other features and advantages of the present system are presented below in the Detailed Description when read in connection with the drawings presented within this application.
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Exemplary embodiments are illustrated in referenced figures of the drawings. It is intended that the embodiments and figures disclosed herein are to be considered illustrative rather than restrictive.

Figures 1A, 1B and 1C show one embodiment of a current video pipeline from creation, to distribution, to consumption of a video signal.

Figure 2A depicts one embodiment of a video pipeline that comprises a metadata pipeline in accordance with the teachings of the present application.

Figure 2B depicts one embodiment of a metadata prediction block.

Figure 3 shows one embodiment of a sigmoidal curve that employs Level 1 metadata.
`
`
`
`
Figure 4 shows one embodiment of a sigmoidal curve that employs Level 2 metadata.

Figure 5 shows one embodiment of a histogram plot based on image/scene analysis that may be used to adjust the image/video mapping onto a target display.

Figure 6 shows one embodiment of an adjusted image/video mapping based on Level 3 metadata that includes a second reference display grading of the image/video data.

Figure 7 shows one embodiment of a linear mapping that might occur if the target display is a substantially good match to the second reference display used to color grade the image/video data.

Figure 8 is one embodiment of a video/metadata pipeline, made in accordance with the principles of the present application.
`
`DETAILED DESCRIPTION
`
Throughout the following description, specific details are set forth in order to provide a more thorough understanding to persons skilled in the art. However, well known elements may not have been shown or described in detail to avoid unnecessarily obscuring the disclosure. Accordingly, the description and drawings are to be regarded in an illustrative, rather than a restrictive, sense.
`
`Overview
`
`
One aspect of video quality concerns itself with having images or video rendered on a target display with the same or substantially the same fidelity as was intended by the creator of the images or video. It is desirable to have a Color Management (CM) scheme that tries to maintain the original appearance of video content on displays with differing capabilities. In order to accomplish this task, it might be desirable that such a CM algorithm be able to predict how the video appeared to viewers in the post-production environment where it was finalized.
`
`
`To illustrate the issues germane to the present application and system,
`Figures 1A, 1B and 1C depict one embodiment of a current video pipeline 100
`
`
`that follows a video signal from creation, to distribution and to consumption of
`that video signal.
`
Creation 102 of the video signal may occur with the video signal being color graded 104 by a color grader 106 who may grade the signal for various image characteristics – e.g. luminance, contrast, color rendering of an input video signal. Color grader 106 may grade the signal to produce image/video mapping 108, and such grading may be done to a reference display device 110 that may have, for example, a gamma response curve 112.
`
Once the signal has been graded, the video signal may be sent through a distribution 114 – where such distribution should be properly conceived of broadly. For example, distribution could be via the internet, DVD, movie theatre showings and the like. In the present case, the distribution is shown in Figure 1A as taking the signal to a target display 120 of maximum luminance of 100 nits and having gamma response curve 124. Assuming that the reference display 110 had substantially the same maximum luminance as the target display and substantially the same response curve, then the mapping applied to the video signal may be as simple as a 1:1 mapping 122 – and made in accordance with, for example, the Rec. 709 STD for color management 118. Holding all other factors equal (like, for example, ambient light conditions at the target display), then what one might see at the reference display is substantially what one would see at the target display.
`
This situation may change, for example, as shown in Figure 1B, wherein the target display 130 differs from the reference display 110 in several aspects – e.g. maximum luminance (500 nits, as opposed to 100 nits for the reference display). In this case, the mapping 132 might be a 1:5 mapping to render on the target display. In such a case, the mapping is a linear stretch through the Rec. 709 CM block. Any potential distortion from reference display viewing to target display viewing may or may not be objectionable to the viewer, depending on levels of individual discrimination. For example, the darks and mid-tones are stretched but might be acceptable. In addition, it may make MPEG blocking artifacts more significant.
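A minimal sketch of the linear stretch just described, assuming the 100 nit reference and 500 nit target of this example (the function name is illustrative):

    def linear_stretch(luminance_nits, reference_max=100.0, target_max=500.0):
        # 1:5 linear stretch: darks and mid-tones are scaled by the same
        # ratio of display maxima as the highlights.
        return luminance_nits * (target_max / reference_max)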
`
Figure 1C shows a more extreme example. Here the target display 140 may have more significant differences from the reference display. For example, target display 140 has a maximum luminance of 1000 nits – as opposed to 100 nits for the reference display. If the same linear stretch mapping 142 were to be applied to the video signal going to the target display, then much more noticeable, and objectionable, distortions may be present for the viewer. For example, the video content may be displayed at a significantly higher luminance level (a 1:10 ratio). The darks and mid-tones may be stretched to a point where camera noise of the original capture is noticeable, and banding in the dark areas of the image becomes more significant. In addition, the MPEG blocking artifacts may be more significant.
`
Without exploring exhaustively all possible examples of how objectionable artifacts may appear to the viewer, it may be instructive to discuss a few more. For example, suppose that the reference display had a larger maximum luminance (say, 600 nits) than the target display (say, 100 nits). In this case, if the mapping is again a 6:1 linear stretch, then the content may be displayed at an overall lower luminance level, the image may appear to be dark, and the dark detail of the image may have a noticeable crush.
`
In yet another example, suppose the reference display has a difference in maximum luminance (say, 600 nits) to the target display (say, 2000 nits). Applying a linear stretch, even though there may be only a small ratio difference (that is, close to 1:2), the magnitude difference in maximum luminance is potentially large and objectionable. Due to the magnitude difference, the image may be far too bright and might be uncomfortable to watch. The mid-tones may be stretched unnaturally and might appear to be washed out. In addition, both camera noise and compression noise may be noticeable and objectionable. In yet another example, suppose the reference display has a color gamut equal to P3 and the target display has a gamut that is smaller than Rec. 709. Assume the content was color graded on the reference display but the rendered content has a gamut equivalent to the target display. In this case, mapping the content from the reference display gamut to the target gamut might unnecessarily compress the content and desaturate the appearance.
`
Without some sort of intelligent (or at least more accurate) model of image rendering on a target display, it is likely that some distortion or objectionable artifacts will be apparent to the viewer of the images/video. In fact, it is likely that what the viewer experiences is not what was intended by the creator of the images/video. While the discussion has focused on luminance, it would be appreciated that the same concerns would also apply to color. In fact, if there is a difference between the source display's color space and the target display's color space and that difference is not properly accounted for, then color distortion would be a noticeable artifact as well. The same concept holds for any differences in the ambient environment between the source display and the target display.
`
`
`Use of Metadata
`
As these examples set out, it may be desirable to have an understanding as to the nature and capabilities of the reference display, target display and source content in order to create as high a fidelity to the originally intended video as possible. There are other data – data that describe aspects of, and convey information about, the raw image data – called "metadata" that are useful in such faithful renderings.
`
While tone and gamut mappers generally perform adequately for roughly 80–95% of the images processed for a particular display, there are issues using such generic solutions to process the images. Typically, these methods do not guarantee that the image displayed on the screen matches the intent of the director or initial creator. It has also been noted that different tone or gamut mappers may work better with different types of images or better preserve the mood of the images. In addition, it is also noted that different tone and gamut mappers may cause clipping and loss of detail, or a shift in color or hue.
`
When tone-mapping a color-graded image sequence, the color-grading parameters, such as the content's minimal black level and maximum white level, may be desirable parameters to drive the tone-mapping of color-graded content onto a particular display. The color grader has already made the content (on a per-image, as well as a temporal, basis) look the way he/she prefers. When translating it to a different display, it may be desired to preserve the perceived viewing experience of the image sequence. It should be appreciated that with increasing levels of metadata, it may be possible to improve such preservation of the appearance.
`
For example, assume that a sunrise sequence has been filmed and color-graded by a professional on a 1000 nit reference display. In this example, the content is to be mapped for display on a 200 nit display. The images before the sun rises may not be using the whole range of the reference display (e.g. 200 nits max). As soon as the sun rises, the image sequence could use the whole 1000 nit range, which is the maximum of the content. Without metadata, many tone-mappers use the maximum value (such as luminance) as a guideline for how to map content. Thus, the tone curves applied to the pre-sunrise images (a 1:1 mapping) may be different from the tone curves applied to the post-sunrise images (a 5x tone compression). The resulting images shown on the target display may have the same peak luminance before and after the sunrise, which is a distortion of the creative intent. The artist intended for the image to be darker before the sunrise and brighter during, as it was produced on the reference display. In this scenario, metadata may be defined that fully describes the dynamic range of the scene; and the use of that metadata may ensure that the artistic effect is maintained. It may also be used to minimize luminance temporal issues from scene to scene.
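A minimal sketch of that distinction, assuming the 200 nit target display of this example (names and structure are illustrative):

    def tone_compression_ratio(frame_max_nits, display_max_nits=200.0,
                               content_max_nits=None):
        # Without metadata, only the frame's own maximum is visible: a 200 nit
        # pre-sunrise frame maps 1:1 while a 1000 nit post-sunrise frame is
        # compressed 5x, so both reach the same peak on the target display.
        # With metadata carrying the content maximum, one ratio is applied
        # throughout and the pre-sunrise frames stay darker, as graded.
        reference = content_max_nits if content_max_nits is not None else frame_max_nits
        return display_max_nits / reference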
`
For yet another example, consider the reverse of the above-given situation. Assume that Scene 1 is graded for 350 nits and that Scene 1 is filmed in outdoor natural light. If Scene 2 is filmed in a darkened room, and shown in the same range, then Scene 2 would appear to be too dark. The use of metadata in this case could serve to define the proper tone curve and ensure that Scene 2 is appropriately visible. In yet another example, suppose the reference display has a color gamut equal to P3 and the target display has a gamut that is smaller than Rec. 709. Assume the content was color graded on the reference display but the rendered content has a gamut equivalent to the target display. The use of metadata that defines the gamut of the content and the gamut of the source display may enable the mapping to make an intelligent decision and map the content gamut 1:1. This may ensure that the content color saturation remains intact.
`
In certain embodiments of the present system, tone and gamut need not be treated as separate entities or conditions of a set of images/video.
`
`
"Memory colors" are colors in an image that, even though a viewer may not be aware of the initial intent, will look wrong if adjusted incorrectly. Skin tones, sky, and grass are good examples of memory colors whose hue, when tone mapped, might be changed to look wrong. In one embodiment, the gamut mapper has knowledge of a protected color (as metadata) in an image to ensure its hue is maintained during the tone mapping process. The use of this metadata may define and highlight protected colors in the image to ensure correct handling of memory colors. The ability to define localized tone and gamut mapper parameters is an example of metadata that is not necessarily a mere product of the reference and/or target display parameters.
`
One Embodiment of a Robust Color Management
`
In several embodiments of the present application, systems and methods for providing a robust color management scheme are disclosed, whereby several sources of metadata are employed to provide better image/video fidelity that matches the original intent of the content creator. In one embodiment, various sources of metadata may be added to the processing, according to the availability of certain metadata, as will be discussed in greater detail herein.
`
As merely one exemplary, Figure 2A shows a high level block diagram of an image/video pipeline 200 that employs metadata. Image creation and post-production may take place in block 202. Video source 208 is input into video encoder 210. Metadata 204, captured along with the video source, is input into a metadata encoder 206. Examples of metadata 204 have been previously discussed, but may include such items as gamut boundaries and other parameters of the source and/or reference display, the environment of the reference display and other encoding parameters. In one embodiment, the metadata accompanies the video signals, as a subset of metadata might be co-located temporally and spatially with the video signals that are intended to be rendered at a given time. Together, metadata encoder 206 and video encoder 210 may be considered the source image encoder.
`
Video signal and metadata are then distributed via distribution 212 – in any suitable manner – e.g. multiplexed, serial, parallel or by some other known scheme. It should be appreciated that distribution 212 should be conceived of broadly for the purposes of the present application. Suitable distribution schemes might include: internet, DVD, cable, satellite, wireless, wired or the like.
`
Video signals and metadata, thus distributed, are input into a target display environment 220. Metadata and video decoders, 222 and 224 respectively, receive their respective data streams and provide decoding appropriate for the characteristics of the target display, among other factors. Metadata at this point might preferably be sent to either a third party Color Management (CM) block and/or to one of the embodiments of a CM module 228 of the present application. In the case that the video and metadata are processed by CM block 228, CM parameter generator 232 may take as inputs metadata from metadata decoder 222 as well as metadata prediction block 230.
`
Metadata prediction block 230 may make certain predictions of a higher fidelity rendering based upon knowledge of previous images or video scenes. The metadata prediction block gathers statistics from the incoming video stream in order to estimate metadata parameters. One possible embodiment of a metadata prediction block 230 is shown in Figure 2B. In this embodiment, a histogram 262 of the log of the image luminance may be calculated for each frame. An optional low pass filter 260 may precede the histogram in order to (a) reduce sensitivity of the histogram to noise and/or (b) partially account for natural blur in the human vision system (e.g. humans perceive a dither pattern as a solid color patch). From that, the minimum 266 and maximum 274 are captured. The toe 268 and shoulder 272 points can also be captured based on percentile settings (like 5% and 95%). The geometric mean 270 (log average) can also be calculated and used as the mid point. These values may be temporally filtered so that, e.g., they do not jerk around too quickly. These values may also be reset during a scene change, if desired. Scene changes may be detected from black frame insertion, extreme radical jumps in the histogram, or any other such technique. It will be appreciated that the scene change detector 264 could detect scene changes from either histogram data, as shown, or from the video data directly.
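A minimal sketch of such per-frame statistics gathering, assuming numpy and illustrative names (the optional low pass filter, temporal filtering and scene change detection are omitted):

    import numpy as np

    def predict_frame_metadata(luminance, toe_pct=5.0, shoulder_pct=95.0):
        # Statistics of the log of image luminance, as in Figure 2B.
        log_lum = np.log(np.maximum(luminance, 1e-6))   # guard against log(0)
        return {
            "min":      float(np.exp(log_lum.min())),
            "toe":      float(np.exp(np.percentile(log_lum, toe_pct))),
            "mid":      float(np.exp(log_lum.mean())),  # geometric mean (log average)
            "shoulder": float(np.exp(np.percentile(log_lum, shoulder_pct))),
            "max":      float(np.exp(log_lum.max())),
        }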
`
In yet another embodiment, the system might compute the mean of the image intensity values (luminance). Image intensity may then be scaled by a perceptual weighting, such as log, a power function, or a LUT. The system might then estimate the highlight and shadow regions (e.g. headroom and footroom on Figure 5) from pre-determined percentiles of the image histogram (for example, 10% and 90%). Alternatively, the system may estimate the highlight and shadow regions from where the slope of the histogram is above or below a certain threshold. Many variations are possible – for example, the system may calculate the maximum and minimum values of the input image, or take them from pre-defined percentiles (for example, 1% and 99%).
`
In other embodiments, the values may be stabilized over time (e.g. frame to frame), such as with a fixed rise and fall rate. Sudden changes may be indicative of a scene change, so the values might be exempt from time-stabilization. For example, if the change is below a certain threshold, the system might limit the rate of change; otherwise, it might go with the new value. Alternatively, the system may reject certain values from influencing the shape of the histogram (such as letterbox, or zero values).
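A minimal sketch of such time-stabilization, with illustrative threshold values:

    def stabilize(previous, new, max_step=0.05, scene_cut_delta=0.5):
        # Slew small frame-to-frame changes at a fixed rise/fall rate; a jump
        # beyond scene_cut_delta suggests a scene change, so the new value is
        # adopted immediately rather than rate-limited.
        delta = new - previous
        if abs(delta) > scene_cut_delta:
            return new
        return previous + max(-max_step, min(max_step, delta))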
`
In addition, CM parameter generator 232 could take other metadata (i.e. not necessarily based on content creation), such as display parameters, the ambient display environment and user preferences, to factor into the color management of the images/video data. It will be appreciated that display parameters could be made available to CM parameter generator 232 by standard interfaces, e.g. EDID or the like, via interfaces (such as DDC serial interfaces, HDMI, DVI or the like). In addition, ambient display environment data may be supplied by ambient light sensors (not shown) that measure the ambient light conditions or the reflectance of such from the target display.
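A minimal sketch of folding such optional sources into one parameter set (the names and dictionary structure are assumptions of the sketch):

    def generate_cm_parameters(content_metadata, display_metadata=None,
                               ambient_lux=None, user_preferences=None):
        # Every source beyond the content metadata may be absent.
        params = dict(content_metadata)
        if display_metadata:
            params.update(display_metadata)      # e.g. EDID-reported capabilities
        if ambient_lux is not None:
            params["ambient_lux"] = ambient_lux  # from an ambient light sensor
        if user_preferences:
            params.update(user_preferences)
        return params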
`
Having received any appropriate metadata, CM parameter generator 232 may set parameters in a downstream CM algorithm 234, which may concern itself with the final mapping of image/video data upon the target display 236. It should be appreciated that there does not need to be a bifurcation of functions as shown between CM parameter generator 232 and CM algorithm 234. In fact, in some embodiments, these features may be combined in one block.
`
Likewise, it will be appreciated that the various blocks forming Figures 2A and 2B are optional from the standpoint of the present embodiment and that it is possible to design many other embodiments – with or without these recited blocks – that are within the scope of the present application. In addition, CM processing may take place at different points in the image pipeline 200, and not necessarily as depicted in Figure 2A. For example, CM of the target display may be placed and contained within the target display itself, or such processing may be performed in a set top box. Alternatively, depending on what level of metadata processing is available or deemed appropriate, CM of the target display could take place in the distribution or at the point of post-production.
`
`
`Scalable Color Management Using Varying Levels of Metadata
`
In several embodiments of the present application, systems and methods for providing a scalable color management scheme are disclosed, whereby the several sources of metadata may be arranged in a set of varying levels of metadata to provide an even higher level of image/video fidelity to the original intent of the content creator. In one embodiment, various levels of metadata may be added to the processing, according to the availability of certain metadata, as will be discussed in greater detail herein.
`
In many embodiments of the present system, suitable metadata algorithms may consider a plethora of information, such as, for example:

(1) the encoded video content,

(2) a method for converting the encoded content into linear light,

(3) the gamut boundaries (both luminance and chromaticity) of the source content, and

(4) information on the post-production environment.
`
The method for converting to linear light may be desirable so that the appearance (luminance, color gamut, etc.) of the actual image observed by the content creators can be calculated. The gamut boundaries aid in specifying in advance what the outer-most colors may be, so that such outer-most colors may be mapped into the target display without clipping or leaving too much overhead. The information on the post-production environment may be desirable so that any external factors that could influence the appearance of the display might be modeled.
`
`
In current video distribution mechanisms, only the encoded video content is provided to a target display. It is assumed that the content has been produced in a reference studio environment using reference displays compliant with Rec. 601/709 and various SMPTE standards. The target display system is typically assumed to comply with Rec. 601/709 – and the target display environment is largely ignored. Because of the underlying assumption that the post-production display and target display will both comply with Rec. 601/709, neither of the displays may be upgraded without introducing some level of image distortion. In fact, as Rec. 601 and Rec. 709 differ slightly in their choice of primaries, some distortion may have already been introduced.
`
`One embodiment of a scalable system of metadata levels is disclosed
`herein that enables the use of reference and target displays with a wider and
`enhanced range of capabilities. The various metadata levels enable a CM
`algorithm to tailor source content for a given target display with increasing
`levels of accuracy. The following sections describe the levels of metadata
`proposed:
`
`
Level 0
`
Level 0 metadata is the default case and essentially means zero metadata. Metadata may be absent for a number of reasons, including:

(1) Content creators did not include it (or it was lost at some point in the post-production pipeline);

(2) Display switches between content (i.e. channel surfing or commercial break);

(3) Data corruption or loss.

In one embodiment, it may be desirable that CM processing handle Level 0 (i.e. where no metadata is present) either by estimating it based on video analysis or by assuming default values.
`
`
In such an embodiment, Color Management algorithms may be able to operate in the absence of metadata in at least two different ways:

Switch to default values

In this case, a display would operate much like today's distribution system, where the characteristics of the post-production reference display are assumed. Depending on the video encoding format, the assumed reference display could potentially be different. For example, a Rec. 601/709 display could be assumed for 8 bit RGB data. If color graded on a professional monitor (such as a ProMonitor) in 600 nit mode, a P3 or Rec. 709 gamut could be assumed for higher bit depth RGB data or LogYuv encoded data. This might work well if there is only one standard, or a de facto standard, for higher dynamic range content. However, if the higher dynamic range content is created under custom conditions, the results may not be greatly improved and may be poor.
`
`Adaptively calculate parameter values
`
In this case, the CM algorithm might start with some default assumptions and refine these assumptions based on information gained from analyzing the source content. Typically, this might involve analyzing the histogram of the video frames to determine how to best adjust the luminance of the incoming source, possibly by calculating parameter values for a CM algorithm. In doing so, there may be a risk that it produces an 'auto exposure' type of look to the video, where each scene or frame is balanced to the same luminance level. In addition, some formats may present other challenges – for example, there is currently no automated way to determine the color gamut if the source content is in RGB format.
`
In another embodiment, it is possible to implement a combination of the two approaches. For example, gamut and encoding parameters (like gamma) could be assumed to be a standardized default value, and a histogram could be used to adjust the luminance levels.
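A minimal sketch of such a combination, with illustrative default values and reusing the frame statistics sketched earlier:

    def level0_parameters(histogram_stats):
        # Assume standardized defaults for gamut and gamma; adapt only the
        # luminance range from analysis of the incoming frames.
        return {
            "gamma": 2.4,                    # assumed default response curve
            "gamut": "Rec. 709",             # assumed default gamut
            "source_min_nits": histogram_stats["min"],
            "source_max_nits": histogram_stats["max"],
        }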
`
`
Level 1
`
In the present embodiment, Level 1 metadata provides information describing how the source content was created and packaged. This data may allow CM processing to predict how the video content actually appeared to the content producers. The Level 1 metadata parameters may be grouped into four areas:

(1) video encoding parameters,

(2) source display parameters,

(3) source content gamut parameters, and

(4) environmental parameters.
`
Video Encoding Parameters

As most Color Management algorithms work at least partially in a linear light space, it may be desirable to have a method to convert the encoded video to a linear (but relative) (X, Y, Z) representation – either inherent in the encoding scheme or provided as metadata itself. For example, encoding schemes such as LogYuv, OpenEXR, LogYxy or LogLuv TIFF inherently contain the information necessary to convert to a linear light format. However, for many RGB or YCbCr formats, additional information such as gamma and color primaries may be desired. As an example, to process YCbCr or RGB input, the following pieces of information may be supplied:

(1) the coordinates of the primaries and white point used for encoding the source content. This may be used to generate the RGB to XYZ color space transform matrix: (x, y) coordinates for each of red, green, blue, and white.

(2) the minimum and maximum code values (e.g. 'standard' or 'full' range). This may be used to convert code values into normalized input values.

(3) the global or per-channel response curve for each primary (e.g. 'gamma'). This may be used to linearize the intensity values by undoing any non-linear response that may have been applied by the interface or the reference display.
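A minimal sketch of applying items (1) through (3) above, assuming numpy and a simple global power-law response curve (function names are illustrative):

    import numpy as np

    def rgb_to_xyz_matrix(primaries, white):
        # primaries: {"r": (x, y), "g": (x, y), "b": (x, y)}; white: (x, y).
        def column(x, y):
            return np.array([x / y, 1.0, (1.0 - x - y) / y])  # XYZ with Y = 1
        m = np.column_stack([column(*primaries[c]) for c in ("r", "g", "b")])
        gains = np.linalg.solve(m, column(*white))  # scale channels to hit white
        return m * gains

    def decode_to_linear_xyz(code, code_min, code_max, gamma, matrix):
        # (2) normalize code values, (3) undo the response curve, then
        # (1) rotate the linear RGB into the relative XYZ representation.
        norm = (np.asarray(code, dtype=float) - code_min) / (code_max - code_min)
        linear_rgb = np.clip(norm, 0.0, 1.0) ** gamma
        return linear_rgb @ matrix.T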