In particular there was a mention of CAM16 and a use case to “match color appearance” across devices when device capabilities or viewing environments differ (e.g. dynamic range, primaries, surround, display max luminance, etc.). It was mentioned that this is only one use case and another might be to match the absolute colorimetry.
The CAM16 paper can be found here for those interested in reading up on the model.
We’ve always said that the RRT is intended to account for the perceptual factors required to convert the scene-referred ACES images to an output-referred encoding associated with a very high dynamic range, extremely wide color gamut device in a dark surround.
The ODTs are intended to account for differences in display capabilities and viewing environment between that theoretical device associated with the RRT and any real device.
Thanks for the answer. I am aware of the definition of the RRT and OCES, but I am struggling to understand it.
Let’s examine the definition in more detail:
So the “perceptual factors” sound very much like “colour appearance phenomena” to me: “factors” we need to account for in order to make the image appear as it would to an observer perceiving the scene directly.
The “scene” is a very broad concept which, in my opinion, needs more definition. In my own image appearance matching tests, we found that the definition of the scene was one of the biggest factors when deriving display rendering transforms (more below).
The definition of the Reference Display is not clear to me either (I know the words… but what do they mean?):
"RRT (Reference Rendering Transform) – Converts the scene-referred ACES2065-1 colors into colorimetry for an idealized cinema projector with no dynamic range or gamut limitations."
"…Although it is possible to use the Output Color Encoding Specification (OCES) encoding for image storage, it is primarily a conceptual encoding, a common ‘jumping-off point’ for Output Device Transforms (ODTs)…"
Hmm… this does not clarify things for me in practice.
Is the reference display an ideal three-primary display?
Here, “extremely wide gamut” is a bit ambiguous.
Why is a cinema-like environment coded into the Reference Viewing Condition? This might not be ideal if I want to build a perfect pipeline for VR, for example.
(For me, an ideal display would have many more primaries, and all “three-component” approaches would be obsolete anyway.)
Also, how do you verify the isolated performance of the RRT if no such display is available today?
“… Required to convert…” does not really specify the intent of the RRT. Is its goal to match appearance, to satisfy a handful of expert viewers, or something else? Is its performance measure objective or subjective?
Assuming you put Colour appearance matching in the ODTs:
I have doubts that you can derive and visually verify “colour appearance models” if the source space is abstract (or ambiguously defined). If you encounter discrepancies in colour matching, it is hard to debug: where does the error come from, the RRT or the CAM in the ODT?
For a CAM, “the scene” could be just another viewing condition. So the problem of transforming from a specific scene-referred image state to a specific display-referred image state could (and can) be solved by a single suitable model, I suppose.
I write “specific scene” because there is strong evidence that such a model would produce the best results if it altered its parameters based on the specific scene and the specific display viewing condition. A scene with less dynamic range (given the configuration of the lighting on set) might need a different set of tone mapping and gamut mapping parameters than a scene with a lot of dynamic range.
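To make the point about scene-dependent parameters concrete, here is a minimal sketch that uses the extended Reinhard global operator as a stand-in for a scene-adaptive tone mapping stage (a swapped-in technique chosen for brevity, not one proposed in this thread). The operator maps a chosen scene "white" to display 1.0, so the same relative luminance renders differently in a low and a high dynamic range scene. The helper `scene_white` and its 99.9th-percentile rule are hypothetical choices for illustration.

```python
def scene_white(luminances, percentile=0.999):
    """Hypothetical helper: take a high percentile of relative scene luminance
    as the scene 'white', so a few specular pixels do not dominate."""
    ordered = sorted(luminances)
    index = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[index]


def extended_reinhard(L, L_white):
    """Extended Reinhard global operator: 0 -> 0 and L_white -> 1.0."""
    return L * (1.0 + L / (L_white * L_white)) / (1.0 + L)


# Two toy scenes expressed as relative luminance (mid grey = 0.18).
low_dr_scene = [0.02, 0.18, 0.5, 1.0, 2.0, 4.0]
high_dr_scene = [0.001, 0.18, 1.0, 8.0, 32.0, 64.0]

for name, scene in (("low DR", low_dr_scene), ("high DR", high_dr_scene)):
    white = scene_white(scene)
    print(name, "white:", white, "2.0 ->", round(extended_reinhard(2.0, white), 3))
```

With the same operator, a relative luminance of 2.0 lands at 0.75 in the low dynamic range scene but at about 0.667 in the high dynamic range one; a single static transform has to pick one compromise for both.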
In motion pictures, the situation is relaxed because we have a skilled colourist who can alter the image state with additional operations (this is one aspect of colour grading), but there might be other use cases.
So one fundamental question might come down to “what is the ‘golden’ reference scene” for a static colour management framework?
To me, we really should open the discussion about the RRT if we are considering increasing complexity in the ODTs.
In many related disciplines, complexity comes with a great penalty in computation cost, manufacturing cost, etc., so complexity should always be avoided. I think such a mindset would be good for the development of the “next ACES”. Every line of code in the image processing pipeline really needs to show a clear benefit. I doubt we will see real-time implementations of the actual code of the RRT+ODT if we increase complexity rather than decrease it.
Removing or redesigning the RRT really is a good opportunity to reduce complexity.
I pretty much agree with everything you’ve said and completely understand why you have the questions you have. Let me see if I can provide some of the historical background that might help answer some of your questions …
I think this is generally true, but there may be a bit more to it. The intention wasn’t for the RRT to be a color appearance model per se, but rather a transformation that yields a “good reproduction” for a variety of scenes. We know that in order to create a good reproduction, certain color appearance phenomena need to be accounted for, and the RRT has characteristics similar to those of a color appearance model.
The definition of the scene is part of the SMPTE ST 2065-1 ACES encoding specification. It conforms to ISO 22028.
You’re correct that this has never been codified in a document; however, as you also point out, it is needed to build the RRT, so we can actually infer it from the RRT code. I agree this is not ideal.
Personally, I’m not sure I agree with the definition in the appendix. The ACES 1.0 RRT does have a dynamic range. Huge, yes … unlimited, no. Gamut limitations are a little harder to define in this case. The primaries are virtual, so I guess that means there are no gamut limitations. But then again, dynamic range is a component of gamut, so maybe there are.
I think we might be better served to make it less conceptual and more practical. More on this below.
Yes, that was the intention … again, not codified anywhere other than in the RRT CTL code.
The RRT CTL code defines the luminance dynamic range of the RRT as 0.0001 to 10,000 nits. The primaries of the OCES encoding are the AP0 primaries. This, by inference, means that the OCES reference display has a dynamic range of 100 million to 1 and can reproduce any color inside the spectrum locus.
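The arithmetic behind "100 million to 1" and the AP0 gamut can be checked in a few lines. This sketch computes the contrast ratio and its equivalent in photographic stops, then uses a standard point-in-triangle sign test on CIE 1931 xy chromaticities to confirm that some familiar colors (the sRGB primaries and D65 white) fall inside the AP0 triangle. It does not verify the full spectrum-locus claim, which would need the tabulated CIE spectral locus data.

```python
import math

# Luminance range of the ACES 1.0 RRT/OCES as stated above (cd/m^2).
OCES_MIN_NITS = 0.0001
OCES_MAX_NITS = 10000.0

contrast_ratio = OCES_MAX_NITS / OCES_MIN_NITS  # 1e8, i.e. 100 million to 1
stops = math.log2(contrast_ratio)               # about 26.6 stops

# AP0 primaries in CIE 1931 xy (SMPTE ST 2065-1): red, green, blue.
AP0 = ((0.7347, 0.2653), (0.0, 1.0), (0.0001, -0.0770))


def inside_triangle(point, triangle):
    """Sign test: the point is inside (or on an edge) if it lies on the
    same side of all three edges of the triangle."""
    px, py = point

    def edge(a, b):
        return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

    signs = [edge(triangle[i], triangle[(i + 1) % 3]) for i in range(3)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


SRGB_PRIMARIES = ((0.64, 0.33), (0.30, 0.60), (0.15, 0.06))
D65 = (0.3127, 0.3290)

print(f"{contrast_ratio:.3g}:1, {stops:.1f} stops")
print(all(inside_triangle(p, AP0) for p in SRGB_PRIMARIES + (D65,)))  # True
```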
A viewing environment needed to be established for OCES. A cinema-like environment seemed like the natural choice due to the system’s primary use case. The expectation is that if the final viewing environment associated with any particular output-referred encoding is not a cinema-like environment, then the difference between the two viewing environments will be accounted for in the ODT …
… likewise defining the OCES viewing environment as one for VR might not be ideal for cinema. BTW, I’m not exactly sure what a VR viewing environment would be, but that’s beside the point.
Bottom line, viewing environment differences are relatively easy to compensate for in the ODT.
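For reference, the ACES 1.x ODTs handle the dark-to-dim surround case with a simple luminance-only adjustment: convert to xyY, raise Y to a power slightly below 1, and convert back, leaving chromaticity untouched. The sketch below assumes XYZ values normalized so that display white has Y = 1, and the exponent 0.9811 is quoted from memory of the ACES 1.x CTL (`DIM_SURROUND_GAMMA`); check it against the reference code before relying on it.

```python
DIM_SURROUND_GAMMA = 0.9811  # from memory of the ACES 1.x ODT CTL; verify before use


def xyz_to_xyY(X, Y, Z):
    total = X + Y + Z
    if total == 0.0:
        # Black has no chromaticity; fall back to D65 white (an assumption).
        return 0.3127, 0.3290, 0.0
    return X / total, Y / total, Y


def xyY_to_xyz(x, y, Y):
    if y == 0.0:
        return 0.0, 0.0, 0.0
    return x * Y / y, Y, (1.0 - x - y) * Y / y


def dark_to_dim_surround(X, Y, Z, gamma=DIM_SURROUND_GAMMA):
    """Adjust luminance only; chromaticity (x, y) is preserved."""
    x, y, lum = xyz_to_xyY(X, Y, Z)
    lum = max(lum, 0.0) ** gamma
    return xyY_to_xyz(x, y, lum)


# Mid grey (D65 chromaticity, Y = 0.18) gets slightly lighter for a dim surround.
grey = xyY_to_xyz(0.3127, 0.3290, 0.18)
print(dark_to_dim_surround(*grey)[1])  # Y rises from 0.18 to about 0.186
```

Because the exponent is below 1, values under display white get slightly lighter and overall contrast drops a little, consistent with a dim surround calling for a lower system gamma than a dark one.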
This is something we’ve always understood and struggled with. Personally, I think this is why it might be useful to define OCES relative to a realizable device, but the counter to that argument is that once you’ve done that it’s surely going to be surpassed by other devices.
Agree … intent is important. Given the way the RRT evolved, I think it was more subjective than objective. I tried to bring up this question of intent by referencing Digital Color Management: Encoding Solutions. Ed talks a lot in the unified paradigm section about interpretation options.
Agreed, having a colorist is a huge help. It’s worth noting that putting in more dynamic adjustments based on scene content tends to drive colorists a bit nuts.
I think you’ve started that conversation.
In all seriousness, I think we all agree that unnecessary complications should be avoided. I would like to see all the code be only as complex as necessary to achieve the stated objective of each individual transform.
Per @Thomas_Mansencal’s comment, I think all of these comments are in line with the feedback from the group that wrote the RAE paper.
Thank you, Thomas and Alex, for these amazing replies.
Good that the discussion is picking up. I think there is a unique chance in the development of ACES to question everything that is not 100% clear.
So to reformulate the issue I am seeing:
We have a complex transform (RRT) that transforms into an ambiguously defined (and abstract) space with a highly subjective goal (preferred colour reproduction). This cannot be a good starting point for colour appearance matching.
I agree that such a transformation is mandatory in a motion picture pipeline to elicit a pleasing image rendition. I do not agree that one specific transformation must be mandatory in the overall framework.
All the abstraction needed to fulfil the task of preferred colour reproduction could be internal to the transformation, so there would be no need to define an abstract space (a display gamut that is achievable but at the same time not limiting).
Someone could argue that the RRT is conceptually a very complex LMT, and maybe we should treat it like this.
About OCES and the Reference Display:
You could flip the problem space by declaring OCES as scene-referred, so you would specify the output of the RRT (or any other LMT) to be larger than any present camera. This would remove the need to specify a display we can (and cannot) look at. We had this definition problem with camera negative too. Many people already treat “camera negative space” as display-referred, but to me it is “more” scene-referred, as it is closer to the scene than to the display. When we build looks, we try to stay conceptually on one side (scene or display).
Agreed. I think it’s critical to make this 100% unambiguous if we continue to use a two part rendering transform.
A primary driver of a reference rendering transform has been the archival use case. One of the goals of the system has been to put a defined digital source master into the archive and to make it unambiguous how to view that digital source master. In the film world, this meant archiving the negative and keeping the print material relatively static in its design. This is what allows us to take negatives from 50 years ago out of the archive and make viewable prints. One could argue this could be achieved with proper metadata, but that puts a lot of faith in the “metadata keys” not getting lost.
This is a bit tricky and a bit philosophical. At Kodak we always treated Printing Density (i.e. the negative) as a third space. In general, I think it gets a bit dangerous to think of OCES as scene-referred. I’ll roll this one over a little more …
Since the current ODTs are intimately tied to the current RRT (the original ODTs, which contained no tone mapping element, would not work with the current RRT, would they?), one could argue that accurate reproduction at a later date requires that the entire set of ACES transforms current at the time of archiving be stored with it. In that case, is an ACES 2065-1 archive not adequate, since a post-RRT version is just a known transform from it?
It ties back into the conversations we had regarding ACESclip and the missing ACES version in the ACES container: it is currently impossible to know which version of ACES a given show has been using, which essentially breaks the archival promise.
I did not want to open the “archival Pandora’s box”, but I guess it is related to the RRT discussion:
In practice, if you have a negative from the 1960s, you have no chance of unambiguously viewing the material as it was viewed at the time. Ten years from now, it will be even harder, as it will become more and more difficult to pull a print, and the only thing you can do is scan the neg, which comes with its own caveats in terms of colour reproduction. What I am trying to say is that there is no real long-term archive that stores the intended look with 100% accuracy (not in the past and not today), and I believe this will remain true in the future. Also, a graded master is not really a long-term archive either, as the material was altered for a particular viewing condition, so all baked-in decisions are indirectly output-referred.
So a lot of effort was put into the initial ACES design to fulfil a somewhat “romantic” requirement. And the penalty for this is complexity and a “baked-in” look, in my opinion.
You could argue that the RRT (or any other HDR-aware look) is nothing more than a complex grade and could be baked into the archive, as ideally it should not limit the gamut.
To boil all of this down, someone (me) could argue:
"Put CAM into ODTs and make the RRT an LMT… "
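A minimal sketch of the "RRT as an LMT" idea, assuming every stage is a plain function on RGB triples: the pleasing rendering becomes one optional scene-referred stage in a chain, and the output transform is the only place where display and viewing conditions enter. All stage functions here are hypothetical toys, and `display_output` is a simple Reinhard compression standing in for a CAM-based output transform, not an implementation of one.

```python
from typing import Callable, Tuple

RGB = Tuple[float, float, float]
Transform = Callable[[RGB], RGB]


def compose(*transforms: Transform) -> Transform:
    """Chain transforms left to right into a single callable."""
    def apply(rgb: RGB) -> RGB:
        for transform in transforms:
            rgb = transform(rgb)
        return rgb
    return apply


# --- Hypothetical toy stages; none of these are real ACES transforms. ---
def creative_grade(rgb: RGB) -> RGB:
    """Toy scene-referred grade: a small exposure increase."""
    return tuple(1.1 * c for c in rgb)


def rrt_as_lmt(rgb: RGB) -> RGB:
    """Toy 'pleasing rendering' kept scene-referred: a contrast boost pivoting
    on mid grey (0.18) that stays unbounded, so no reference display is implied."""
    return tuple(0.18 * (max(c, 0.0) / 0.18) ** 1.1 for c in rgb)


def display_output(rgb: RGB) -> RGB:
    """Stand-in for an appearance-matching output transform (NOT a CAM):
    a simple Reinhard compression into [0, 1) display-relative values."""
    return tuple(max(c, 0.0) / (1.0 + max(c, 0.0)) for c in rgb)


with_look = compose(creative_grade, rrt_as_lmt, display_output)
without_look = compose(creative_grade, display_output)

grey = (0.18, 0.18, 0.18)
print(with_look(grey), without_look(grey))
```

The design point is that dropping or swapping the look only changes the list passed to `compose`; the output stage does not need to know which look, if any, preceded it.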
Agreed … knowing the ACES version is key. The intention is to have this in the ACESclip metadata sidecar file. We debated putting it in the ACES container file in the past, but the group felt it wasn’t frame-based metadata, so it was best placed in ACESclip.
More history: the ACES system version was intentionally omitted from ST 2065-4 files because of the unacceptable overhead that comes with opening every single image file in a production to access that piece of essential metadata. ACESclip was (and is) designed to contain the essential metadata for a collection of individual image files (shot, scene, reel or entire show - take your pick), such as ACES system version number, intended ODT, etc. That’s why ACESclip support is an ACES Logo Program “must have”.
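To illustrate why collection-level metadata in a sidecar avoids opening every frame, here is a sketch built on a made-up XML layout. The element names (`clip`, `acesSystemVersion`, `outputTransform`, `frames`), the attribute names and the file pattern are all hypothetical and are not the actual ACESclip schema; the point is only that one parse yields the system version for every file in the collection.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sidecar; NOT the real ACESclip schema.
SIDECAR_XML = """
<clip>
  <acesSystemVersion>1.0.3</acesSystemVersion>
  <outputTransform>example-output-transform-id</outputTransform>
  <frames first="1001" last="1003" pattern="shot_010.%07d.exr"/>
</clip>
"""


def load_clip_metadata(xml_text):
    """Parse the sidecar once and attach its metadata to every frame it covers."""
    root = ET.fromstring(xml_text.strip())
    frames = root.find("frames")
    first, last = int(frames.get("first")), int(frames.get("last"))
    pattern = frames.get("pattern")
    shared = {
        "aces_version": root.findtext("acesSystemVersion"),
        "output_transform": root.findtext("outputTransform"),
    }
    # One parse, N files: no need to open each image file to learn the version.
    return {pattern % n: shared for n in range(first, last + 1)}


metadata = load_clip_metadata(SIDECAR_XML)
for name, info in metadata.items():
    print(name, info["aces_version"])
```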
The archival promise is still alive. Fulfilling it is dependent on ACESclip support going into the tools and then being used by everyone in production. Or some other pipeline-wide metadata carrier that we can all agree on and standardize.
[Forgive me if I’m misusing the “Reply” button and breaking up threads, but if it’s here, I’m clicking it!]
Good point about the film system - the target print stock is considered when shooting negative, and even though you might have the paper tape, it doesn’t really help you if the print stock is no longer available. But you do effectively have a bounded problem in that you know what the intended display medium and environment is, and that knowledge has served film archivists well for a long time.
I think those archivists might argue that they have to have the graded master in order to fulfill their mission, i.e., preserve the filmmaker’s intent. Film or digital, the final corrections are part of the movie, and that’s what goes into the archive.
The photochemical film generations before us, I’ll guess, didn’t think about repurposing or remastering their film negatives for future display technologies. Happy accident or by design, their work is protected for future generations by the inherent long-term archivability of film negative.
ACES, as a system, intends to cover both use cases: preserving filmmaker intent for a particular set of display conditions and preserving the ability to remaster for other display conditions. ACESclip’s optional metadata enables referencing camera native files, associated Input Transforms, ASC-CDL files, LMTs and Common LUT Files. So you can go as unbaked as you like.
However the ACES architecture is extended or enhanced to address where the various processing steps take place (I’m not a color scientist, so I won’t embarrass myself with an opinion here), essential metadata that eliminates ambiguity is a fundamental requirement for long-term archiving. And it helps a whole lot in production, too.
To close the Pandora’s box again, I think the existence (or non-existence) of the RRT does not stand in the way of the archival promise. It is just a matter of definition.
About CAMs: there are very complex CAMs out there, and it seems that CAM16, although simpler than CIECAM02, is still quite complex.
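To give a sense of what treating "the scene" as just another viewing condition involves, here is only the viewing-condition stage of CAM16, which it shares with CIECAM02: the adapting luminance L_A, the background Y_b and the surround determine F_L, D, n, z, N_bb, c and N_c. This is a sketch from the published equations, not a complete CAM16 (no CAT16 adaptation, post-adaptation compression or appearance correlates), and the L_A values in the usage lines are illustrative assumptions.

```python
import math
from dataclasses import dataclass

# Surround parameters (F, c, N_c) shared by CIECAM02 and CAM16.
SURROUNDS = {
    "average": (1.0, 0.69, 1.0),
    "dim": (0.9, 0.59, 0.9),
    "dark": (0.8, 0.525, 0.8),
}


@dataclass
class ViewingConditions:
    F_L: float   # luminance-level adaptation factor
    D: float     # degree of chromatic adaptation
    n: float     # background relative luminance Y_b / Y_w
    z: float     # base exponential nonlinearity
    N_bb: float  # background induction factor (N_bb = N_cb)
    c: float     # impact of surround
    N_c: float   # chromatic induction factor


def cam16_viewing_conditions(L_A, Y_b, Y_w=100.0, surround="average",
                             discount_illuminant=False):
    """Viewing-condition-dependent parameters of CAM16 (first stage only)."""
    F, c, N_c = SURROUNDS[surround]
    k = 1.0 / (5.0 * L_A + 1.0)
    k4 = k ** 4
    F_L = 0.2 * k4 * (5.0 * L_A) + 0.1 * (1.0 - k4) ** 2 * (5.0 * L_A) ** (1.0 / 3.0)
    if discount_illuminant:
        D = 1.0
    else:
        D = F * (1.0 - (1.0 / 3.6) * math.exp((-L_A - 42.0) / 92.0))
        D = min(max(D, 0.0), 1.0)
    n = Y_b / Y_w
    z = 1.48 + math.sqrt(n)
    N_bb = 0.725 * (1.0 / n) ** 0.2
    return ViewingConditions(F_L, D, n, z, N_bb, c, N_c)


# A bright, average-surround condition versus a dark, cinema-like one.
# The adapting luminances are illustrative assumptions, not measured values.
bright = cam16_viewing_conditions(L_A=318.31, Y_b=20.0, surround="average")
cinema = cam16_viewing_conditions(L_A=9.6, Y_b=20.0, surround="dark")
print(bright)
print(cinema)
```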
Also, existing CAMs for unrelated stimuli can only be used to a limited extent.
Most important: take any attempt at pleasing colour reproduction out of the equation.
It is a very nice discussion this one, and I want to thank everybody for their great support and initiative!