Best of existing LUT formats

This is a space to post information on features of existing LUT formats we think are popular or worth investigating.

Info on the popular .csp LUT format (part of the cineSpace manual).
CSP Format.pdf (184.1 KB)

These are online references to a few outstanding formats I can think of:

I couldn’t find a reference to Foundry NUKE’s own .cube format and R&S CLIPSTER’s own XML format.

Iridas / Adobe .cube format. Broadly similar to Resolve .cube, but with different domain tags:

It would be ideal if CLF were able to encapsulate a LUT from any other LUT format without modification, but I think that is unfortunately not possible. The issue that springs immediately to mind is .csp, with its ability to have arbitrarily spaced input entries in its preLUTs. This can be represented as an IndexMap in the current CLF format, but since IndexMaps with length >2 are not required under the current spec (and are therefore not supported in either the Resolve or Autodesk implementations), .csp LUTs with non-uniform input domains cannot be reliably represented as CLF using that approach.

My work-in-progress CLF code in Colour Science for Python can load any of the valid sample Cinespace LUTs from here, and save them as CLF LUTs which will work in Colour. However, @hpduiker's Python sample implementation is the only other place I have found where they work properly. Only some work in Resolve and Flame, due to the “un-required” elements of CLF, and we are looking at removing such components from the spec.

So the question is how to accurately replicate the transform from a .csp with a non-uniform domain when translating it to CLF. The options I can see are:

  • IndexMap
  • halfDomain

The issue with the use of an IndexMap is that it is difficult to implement in a real-time system, which I gather is why it is currently not a required component of CLF. The benefit is that it could be translated back exactly to the original .csp preLUT.

A possible concern with halfDomain is that it is not implemented in any other mainstream LUT format, so if a transform is ‘translated’ from another LUT format to halfDomain it cannot be easily automatically translated back. The benefits are that it can efficiently implement a 1D transform on any value representable in half-float, with no interpolation required if the output is also to be half-float.

So I guess the question is, how many people are using the non-uniform input domain capabilities of Cinespace LUTs? Would it be an issue if these could not be directly replicated in CLF?

Two points I’d add to the discussion:

  1. OCIO also effectively supports non-uniformly spaced shapers as 1D LUTs can be run in ‘inverse’ mode. This is how shapers are implemented in the ACES OCIO config.
    https://github.com/hpd/OpenColorIO-Configs/blob/master/aces_1.0.3/config.ocio#L1502
    How has Autodesk efficiently supported that functionality?

  2. If the IndexMap can be specified clearly, implementors could evaluate the map across all half-float values and then cache a half-domain LUT version of it in memory, giving the expected speed boost with no loss in accuracy. It is slow in Python to compute this cached half-domain representation:
    https://github.com/hpd/CLF/blob/master/python/aces/clf/IndexMap.py#L131
    but unless you’re processing an image that’s smaller than 256x256, it is still faster than computing the IndexMap result each time. The time to compute that cache should be a non-factor in a C implementation.
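
To make the caching idea in point 2 concrete, here is a minimal NumPy sketch. It is not the code from the linked IndexMap.py: the function names and the knot values are hypothetical, the shaper is assumed to be piecewise linear, and inputs outside the knot range are assumed to clamp to the end values.

```python
import numpy as np

def all_half_values():
    """All 65536 half-float code values, indexed by their 16-bit pattern."""
    return np.arange(65536, dtype=np.uint16).view(np.float16)

def bake_half_domain(knots_in, knots_out):
    """Evaluate a piecewise-linear shaper once at every half-float value."""
    x = all_half_values().astype(np.float64)
    nan_mask = np.isnan(x)
    # np.interp clamps inputs outside the knot range (including +/-inf)
    # to the first/last output value.
    lut = np.interp(np.where(nan_mask, 0.0, x), knots_in, knots_out)
    lut[nan_mask] = np.nan  # assumption: NaN inputs stay NaN
    return lut

def apply_half_domain(lut, pixels):
    """Look up half-float pixels by bit pattern: no search, no interpolation."""
    idx = np.atleast_1d(np.asarray(pixels, dtype=np.float16)).view(np.uint16)
    return lut[idx]

# Hypothetical non-uniform shaper knots, for illustration only.
knots_in = np.array([0.0, 0.01, 0.18, 1.0, 16.0])
knots_out = np.array([0.0, 0.2, 0.5, 0.8, 1.0])

lut = bake_half_domain(knots_in, knots_out)       # computed once, then cached
pixels = np.array([0.0, 0.18, 0.5, 4.0], dtype=np.float16)
shaped = apply_half_domain(lut, pixels)           # per-pixel cost is one index
```

For half-float pixels the cached table returns the same values as evaluating the shaper directly, because every possible input was evaluated when the table was baked.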

HP, yes, that is a very useful feature of OCIO, but it does have some limitations worth noting. First, the inverse Lut1D evaluator is too slow for realtime processing, so in Autodesk apps we convert it to a half-domain forward LUT (similar to your second point). The other limitation of the technique is that the shaper must be uniformly spaced in one of the two directions. So it is not possible to losslessly represent an arbitrary IndexMap using that technique.

Because of this, when OCIO reads a .csp transform it resamples the shaper (“prelut”) into a 65536 entry forward Lut1D. Unfortunately it is a uniformly spaced (i.e., not half-domain) LUT and therefore may lose accuracy (for the same reasons as Nick illustrated in a recent post).
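
One way to see the difference in sample spacing is to compare a uniform 65536-entry table with the half-float code values themselves. This is only a sketch: it assumes the uniform table spans [0, 1], which depends on the actual prelut range, and it says nothing about how OCIO performs the resampling internally.

```python
import numpy as np

# Every half-float code value, indexed by bit pattern, widened to float64.
halfs = np.arange(65536, dtype=np.uint16).view(np.float16).astype(np.float64)

# Step of a uniform 65536-entry LUT over an assumed [0, 1] domain.
uniform_step = 1.0 / 65535.0

# Positive half-float values that fall inside the first uniform interval.
in_first_interval = np.count_nonzero((halfs > 0.0) & (halfs < uniform_step))

# Half-float step just above 1.0 (0x3C00 is the bit pattern of 1.0).
half_step_at_one = halfs[0x3C01] - halfs[0x3C00]

print("half values inside first uniform interval:", in_first_interval)
print("uniform step:", uniform_step, "half step at 1.0:", half_step_at_one)
```

Under that assumption the uniform table is much sparser than half-float near zero, where shapers tend to change quickly, and denser than half-float near 1.0.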

Doug

Just a quick note to say it is not necessarily that slow. The implementation I have been working on leverages the parallelization of NumPy, and can convert an IndexMap to a halfDomain LUT in a couple of microseconds.

Can someone prove that converting an arbitrarily spaced IndexMap to a halfDomain 1D LUT is going to be precise in most cases? This is probably the only option for realtime GPU usage.

Also, this sort of implementation trick should probably be communicated in the spec document if arbitrarily spaced IndexMaps stay around.

I’m not sure about proving it, but a halfDomain lookup is effectively a preset IndexMap with samples at every possible half-float code value. If you consider the precision of half-float acceptable, then by definition its code values are closely enough spaced samples. And for higher precision input, you can still interpolate between the samples.
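
As a rough sketch of the "interpolate between the samples" case for higher precision input, the following NumPy code brackets each value between its two neighbouring half-float code values and blends the table entries. The function name and the example curve are hypothetical; inputs are assumed to be finite and are clamped to the half-float range.

```python
import numpy as np

HALF_MAX_BITS = 0x7BFF  # bit pattern of 65504.0, the largest finite half

def apply_half_domain_interp(lut, values):
    """Linearly interpolate a 65536-entry half-domain LUT for float inputs."""
    x = np.atleast_1d(np.asarray(values, dtype=np.float64))
    mag = np.clip(np.abs(x), 0.0, 65504.0)      # assumption: clamp to half range
    sign = np.where(np.signbit(x), 0x8000, 0)   # half-float sign bit
    near = mag.astype(np.float16)               # nearest half to |x|
    bits = near.view(np.uint16).astype(np.int64)
    # If rounding went up, step down one code value so lo <= |x| <= hi.
    lo = np.where(near.astype(np.float64) > mag, bits - 1, bits)
    hi = np.minimum(lo + 1, HALF_MAX_BITS)
    lo_val = lo.astype(np.uint16).view(np.float16).astype(np.float64)
    hi_val = hi.astype(np.uint16).view(np.float16).astype(np.float64)
    span = hi_val - lo_val
    # Guard the division where lo and hi coincide (top of the range).
    t = np.where(span > 0.0, (mag - lo_val) / np.where(span > 0.0, span, 1.0), 0.0)
    return (1.0 - t) * lut[lo | sign] + t * lut[hi | sign]

# Hypothetical half-domain table: a square-root curve sampled at every half value.
halfs = np.arange(65536, dtype=np.uint16).view(np.float16).astype(np.float64)
lut = np.sqrt(np.abs(halfs))

result = apply_half_domain_interp(lut, [0.5, 0.1234567, 3.14159265])
```

Inputs that are exactly representable in half-float return the stored table entry unchanged; anything in between is a linear blend of its two neighbours.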

A lot of people consider 16-bit float to be precise enough for image processing. When writing a renderer I usually opt for 16-bit float when realtime performance is more important than anything else, and 32-bit float for precision when GPU load isn’t a factor. Might be overkill.