Framing metadata

Hi all,

We decided to include framing metadata as an optional element in AMF, but its form still needs to be decided upon. Thanks to Josh Pines and Walter A for both providing proposals on how this could look.

For discussion on the call tomorrow, here is an example framing setup from ARRI Frame Line Tool (Alexa LF Open Gate 4.5K, framing for UHD center crop) and proposed representations of this in XML form.

Proposal #1 - ‘coordinates’:

<framing>
    <inputFrame>0 0 4447 3095</inputFrame>
    <extractionArea>304 468 4144 2628</extractionArea> 
</framing>

Proposal #2 - ‘size with center offset’:

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <centerOffset>0 0</centerOffset>
</framing>

Proposal #3 - ‘size with coordinate offset’:

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing>

I think I did that right.
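(Cross-checking the numbers: if the second pair in #1's <extractionArea> is read as origin plus size, then 304 + 3840 = 4144 and 468 + 2160 = 2628, matching #3 exactly; <inputFrame> in #1 instead lists the last pixel coordinates, 4448 - 1 = 4447 and 3096 - 1 = 3095.)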

Generally I am a fan of #2 or #3. The benefit of #2 is that it is straightforward when it is a center crop, which in my experience is most often the case. The benefit of #3 over #2 is that you would know exactly where to crop in the event that the width is an odd number and the ‘center’ falls in the middle of a pixel; the explicit ‘origin’ of #3 removes that ambiguity.
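To illustrate with a hypothetical case (not the Frame Line Tool export above): if the input were 4447x3095, a centered 3840x2160 extraction would need an origin of (4447 - 3840) / 2 = 303.5, which a center offset cannot express in whole pixels, whereas #3 can simply state the chosen integer origin:

<framing>
    <inputFrame>4447 3095</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>303 467</originTopLeft>
</framing>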

#1 is the most absolute, and closest to the EXR header attributes, but is less human readable (to me) and requires 8 numbers rather than 6.

ARRI and RED seem to store frame line info similar to #3
(@joseph and @Graeme_Nattress, please correct me if I’m wrong)

PDF, EXR, and Nuke seem to use something closer to #1
(proposed originally by @walter.arrighetti, please confirm I represented it correctly)

(@peterpostma, @rohitg, @brunomunger, any opinions from dailies / color software side?)

Any of these could be made proportional, rather than absolute,
so #3 above would turn into:

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>0.86330935 0.69767442</extractionArea>
    <originTopLeft>0.06834532 0.15116279</originTopLeft>
</framing>
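For reference, the proportional values above are just the absolute values from the earlier example divided by the input frame dimensions:

    3840 / 4448 = 0.86330935    2160 / 3096 = 0.69767442
     304 / 4448 = 0.06834532     468 / 3096 = 0.15116279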

I think we would also want to include the ‘final frame’ or ‘final resolution’, e.g. 1920x1080 for dailies - but first I wanted to get opinions on preference among the above proposals.

Please let us know, or if you can make the call tomorrow at 9am, speak to you then.

Thanks,
Chris

cc: @Alexander_Forsythe

Thanks Chris,
I assume proposal #3 came from me, so maybe it goes without saying that I’m in favor of that one :slight_smile: But I personally like it because it covers all the scenarios I have come across, regardless of how likely they are to come up. The other thing I would wonder is whether it’s possible to actually label it:

<framing1>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing1>
<framing2>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing2>

That way we can track multiple frames. It is very common that we have two different frames we’re delivering during the dailies process, or two different frames we’re having VFX track: the intended frame versus the extended area they deliver back for finishing.

As an example, maybe the frame is 2.39, but we want to deliver 1.78 to editorial during the dailies pass, but PIX gets 2.39. Editorial then adds their own matte in the Avid.
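Purely as a hypothetical sketch of another way to label them (an attribute on a repeated <framing> element rather than numbered element names; not a schema proposal, and reusing the placeholder values above):

<framing label="frame1">
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing>
<framing label="frame2">
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing>

Either form carries the same information; the attribute version just keeps the element name stable for a schema.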

The ultimate goal in my mind for this would be to:

  • Create frame lines in camera
  • Within Colorfront, Daylight, etc., just click a button for Frame 1 and it would apply that framing; click a button for Frame 2, and it applies that framing. You could then create your software presets based on these, or just use the in-camera frame lines as your framing option.
  • Then when we render EXRs for VFX pulls, we place this information (somewhere?) in the EXR extended attributes for the VFX vendors to be able to then click a button in Nuke to re-apply said framing.
  • Same would go for stereo pulls for features

There is a lot to think about with this, but I think it’s important to not forget why we’re asking for this. And that reason, to me, is so that post-production departments no longer all create slightly different framing presets because of non-pixel-accurate framing charts shot on set. But I assume I’m preaching to the choir on that one!

The other thing we may want to think about documenting for software vendors is what to do when the extraction area is wider than the frame of your project. Do they auto-matte it? I would think yes, but it’s something that would need to be defined for them.

Hello,
for what it’s worth, I definitely also agree on #3 as the best approach.
Also, I’m with Jesse on this one: I think we should be able to include multiple frame-extraction pipelines in the AMF (although this is a conversation that goes along with having multiple color pipelines inside an AMF, and, if the decision is not to proceed with multiple color pipelines, then probably the idea of having one AMF per intended pipeline should also apply to the framing extractions).
I also agree with the goals Jesse is aiming for: ultimately the idea is that an AMF can be used to set up pipelines automatically, avoiding the need to always double- and triple-check the turnover of each vendor involved in a project. And that’s why it is so important that not only color is tracked, but also frame extractions.

As Chris said, I also second that the extraction process should include a scaling node (and blanking?), so one can scale both H and V by the same amount to upscale or downscale a picture while keeping the aspect ratio consistent with the original source material, or use a different ratio to apply a desqueeze (of any kind) to the image.
For instance, for a 1080p dailies process with 2.39:1 blanking:

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
    <scaling>1920 1080</scaling>
    <blanking>0 138 0 138</blanking>
</framing>
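(For reference: the two 138 values are the top and bottom blanking bands, leaving 1920 x (1080 - 2 x 138) = 1920x804, i.e. roughly a 2.39:1 active area.)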

Or a 2x anamorphic desqueeze from a 2.8K 4:3 Alexa sequence for a 2K DCI DI timeline would look like:

<framing>
    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <scaling>2048 858</scaling>
    <blanking>0 0 0 0</blanking>
</framing>
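(To spell out the example: the anisotropic <scaling> itself performs the 2x desqueeze, since the horizontal factor 2048/2570 ≈ 0.797 is about twice the vertical factor 858/2160 ≈ 0.397; equivalently, the 2570x2160 crop desqueezed 2x horizontally gives roughly 5140x2160, which then scales uniformly down to the 2048x858 Scope raster.)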

I think the order of the elements in the XML should also represent the order in which each process happens (the way I originally thought of it was to do something like the LMT stack position, but maybe just the order of the nodes will do fine). Obviously the order of operations is essential to obtain the right results.
I also thought we could include a comment on what scaling algorithm is recommended, although I would not try to make that mandatory or design it in a way that actually produces any effect (it’s more an instruction between DI houses and VFX to say things like "we’re gonna do this with a Catmull-Rom algorithm, see if you can do the same!").
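Purely as an illustration of such a non-normative hint (the algorithm attribute here is hypothetical, not a schema proposal), it could be as small as:

<scaling algorithm="Catmull-Rom">1920 1080</scaling>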

Hope it makes sense.

Hi Chris.
Yes, proposal #1 is exactly mine; and yes, it’s how PDF, EXR and ARRIRAW v2 behave.

Personally I think almost any choice is good; I’m really agnostic as to which solution is picked in the end. All in all:

  • my proposal (#1) really includes all frame-mapping cases: i.e. scaling (+blanking) and H/V reflections (which are relevant to Stereo-3D and a few VFX shots), although it’s less human-readable;
  • proposals #2 and #3 may be easily extended to support reflections if negative values are allowed – but still no scaling;

What I would tend to discourage, instead, are:

  • adding scaling and blanking in separate XML elements adds complexity and ambiguity: the same framing can be described by several metadata combinations;
  • the solution with float numbers, because that may lead to ambiguous renders at frame edges.

As regards adding multiple framing info, I believe we all agree that it follows the same road as having multiple color pipelines: it’s reasonable that each pipeline also bears framing info along with it.

This is not the right topic, but allow me to say that a wise and smart dosage of multi-pipelines and history elements (even if they are optional) allows the imaging intent of the DoP to be rebuilt: with history, one also retrieves original formats, color encodings and framing – all of which help in recreating the wider perspective on a piece of footage.

As others have said, any would do as they are all a simple mathematical transform from one another. I would pick #2 if it were up to me, simply because it is instantly human readable in terms of the size of the extracted frame and “is it centred?”

Thank you all for your input.

On the call, we decided to proceed with #3. It has the readability of #2 but solves for the rare case when an odd number of pixels is involved. We also decided to stay with absolute pixel counts, rather than proportional / ratio values, to reduce the chance of rounding errors. We also decided to not add additional elements for output scaling / blanking, since this adds unnecessary complexity outside of the most important part: “what is the active image area to view?”.

We will be updating the schema before the next meeting, so if anyone has any strong objections please voice them now!

Thanks,
Chris

One remaining question: should we have a desqueeze ratio (e.g. 2.0 for anamorphic)? I’m thinking that, since we are not adding output scaling / blanking, the desqueeze ratio would be necessary for an anamorphic extraction to be fully automated. Open to thoughts here!

Thanks,
Chris

Hey Chris,
Desqueeze ratio is definitely useful, but I think it should go along with scaling.
If you decide to include it without scaling (I do appreciate that there is a use for it even if scaling is not involved), please make sure the number has three decimal places, as there are certain squeeze factors that are not really conventional and sometimes more precision is required.

Hope it helps
F

I agree with Francesco on both points.
Squeezing is important, but it should be there only if scaling is: I hope no one will ever hand-type an AMF.

As for the three decimal places, I agree and am all-in on that: let’s define them as a ratio, as is already done for non-integer frame rates – numerator and denominator. That way no “rounding” or higher-level code processing of it is required.

So 1.66 really is 5 : 3.
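A minimal sketch of what that rational form might look like (element and attribute names purely hypothetical, not a schema proposal):

<desqueezeRatio numerator="2" denominator="1"/>   <!-- standard 2x anamorphic -->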

Hey Walter and Francesco,
I am curious why it is important to keep a scaling attribute alongside the de-squeeze. I have never had to think about what I am scaling to when de-squeezing an image, so I assume I’m missing something.

As an example, when receiving 2:1 anamorphic media from set, in the lab I apply the 2:1 or 1.33:1 de-squeeze; but regardless of whether I am rendering out HD files for editorial, 720p files for PIX, even smaller files for a different screener deliverable, or a random resolution for VFX, the de-squeeze never changes.

And the same would go for my team in finishing. When we conform, these two values are very independent of one another.

I guess I’ve just never heard of a de-squeeze changing in any way due to the scaling you’re applying to your output. Maybe I’m missing something though? If not, I would wonder how often we will actually get this de-squeeze value from set. The camera may not always know there is an anamorphic lens on it and we may not always have smart lenses, so I think we need to assume this field will sometimes be blank. I agree that it would be good to have it in there though.

Therefore: (Sorry, I can’t remember how you wrote in the frame ID #'s during the meeting, so those are missing below, but essentially:)

<framing>
    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <deSqueeze>2.0</deSqueeze>
</framing>

Hi Jesse.
Here’s my three cents.

  • First point: the anamorphic factor, as far as it is proposed to be represented here (i.e. as a single number rather than the detailed optical parameters of a cylindrical lens), is nothing more than a vertical scaling (squeezing) factor; so why neglect horizontal scaling?
  • Second point: generally speaking, all you might ever be concerned with in transporting framing metadata along with AMF is squeezing. However, in some advanced image-evaluation workflows you might want to preserve, along with the AMF history, the output resolution of what was exactly viewed. For example, from a compositing or QC perspective, you want to preserve not only that a Scope central extraction of a specific area was made, but also that this 2.39:1 frame was viewed on a sandboxed HD monitor (in Rec.709), rather than on something else (e.g. an 8K Rec.2020 reference monitor, where pixels were viewed 1-by-1 without scaling). In such a case, the full (i.e. both horizontal and vertical) scaling factor may be relevant in AMF.
  • Third point: neither the anamorphic factor nor scaling are, strictly speaking, color-related metadata; as they may both be considered framing parameters, they should be either both in or both out of AMF.

What I’m saying is that the squeezing factor is of course fundamental (mostly for what you said at the end: it’s not always included in raw camera footage metadata because of the lack of smart lenses).
From an imaging-science perspective, I propose to call the squeezing factor <verticalScaling> and have both it and <horizontalScaling> as an independently optional pair of parameters. That way squeezing-only or full scaling can both be used, according to different needs.
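As a minimal sketch of the squeeze-only case under that naming (reusing the 2x anamorphic example from earlier in the thread; exact semantics still to be defined):

<framing>
    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <verticalScaling>2.0</verticalScaling>
    <!-- <horizontalScaling> would only be added when a full H+V scale to a
         specific output raster also needs to be recorded -->
</framing>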

I question whether a single-stage scale + offset is sufficient.

The format choice is fine, and exact integers are important for QC as well.

But in the general model of processing, including film scans (which are still relevant in restoration), there is the Image Capture, the Extraction of the Working Image (perhaps with extra margin, for Bayer pattern requirements), possible desqueezing as noted above, and then extraction of the final work-product frame.

Sometimes there are extractions of TWO different output ratios, and they may be shifted (top-line or center-line, for example): both a 1.85 and a 1.78 (TV) are sometimes extracted when a larger canvas is available. VFX would have been instructed to work in the larger ‘protection format’ that the DP shot (protect for 1.78 with a 1.85 cinema release).

If these are not carried in a chain in an IMF, then point-to-point files for each of these steps are needed and have to be managed for the history of the transforms applied. Some of these transforms are unique to a small number of frames, so you will still get some splitting of the AMF over time. I think part of the goal is to recreate, from the ‘camera’ original, the image transforms that get you to the current result, both in color and size? True?

In general, it is far better to have exact IN and OUT pixel integers than to apply an approximate ratio that is never really correct. Sometimes ratios are cheated, because the ratio is not what’s important; the output deliverable is.

All,

I’ve updated the XSD based on the discussion so far.

And the example

As another reason not to use ‘scaling factors’ but go right to the deliverable raster…
the aspect ratio for DCI Cinema is NOT 2.39, it is 2048x858 (i.e. 2.386946…:1).
Exact integer line placement is important.

Also for TV, it is not 1.78:1 but rather 1.77777777777…:1 (1920x1080)

All the ratios you have heard about are non-deterministic, and sometimes choices have to be made about fitting that are not standard… easy with integers, hard with ratios. Another example: some old films have to be shown at 2.4, which was the actual projection ratio; at 2.39 a splice might appear.

So for the same reason we should not use float positions, we should not use float scalers.

BTW, I am not talking about the centering or offsetting to get a frame in the middle for letterboxing or pillarboxing, or for odd formats, or even for pan and scan; I am just referring to creating the working active image. The origin is ALWAYS going to be relative to the inputFrame size.

I think that we should always be able to describe what the flat image size is, even if you are working on an anamorphic working frame.

Jim

Hi @JesseKorosi ,

sorry for the late reply, it has been a busy week.

I think both @walter.arrighetti and @jim_houston are describing and clarifying my points better than I did. I like the <verticalScaling> and <horizontalScaling> idea a lot; I think it solves both unusual squeezing problems and pixel-accurate scaling.

However, I still don’t think it’s enough. Please don’t hate me if I try once more (and one last time, I promise) to express my doubts.

@jim_houston mentioning “active image area” in his last email gives me the chance to argue again for the two additional (optional) nodes I would add to the framing tree.

My argument is mainly based on trying to clarify (mostly to myself) what the aim of this framing metadata is and what we refer to when we talk about extraction. I might be stating the obvious here, and I’m sure you will all have considered the following points and I’m just overdoing this topic, but for the sake of clarifying what I have in mind, I’ll go further.

I think it’s very important to distinguish between how an extraction guide is used on set and how it is used in post.

I think we all agree that extraction guide lines designed for on-set purposes should specify what the operator has to frame for on set and nothing more, especially if we aim to be able to translate those numbers into a camera-compatible frame-line file. In 99.99% of cases operators only want to see what they need to frame for; too many lines in the viewfinder make their life impossible and distract them.

The post-production framing, on the other hand, in my personal experience, specifies what the working image area should be, or, in other words, how a given frame should be processed to fit the current post-production stage. Most of the time (I would say 80% of the time) the extraction isn’t the target frame, but rather how the image needs to be cropped and adjusted; the target area is then obtained afterwards, within it. In other words, in post production we never crop to the target area, but rather to a larger working area.

I know that different AMFs will be generated for different stages of production, so the concept of extraction can vary from set to post (from the target area to what needs to be pulled for VFX or DI), but I still think that a single instruction to define a frame and a working area isn’t enough. I strongly believe there is a need for a multi-layer system.

If this framing metadata is to automate post workflows, I think it needs to account for these instructions:

  • INPUT FRAME
  • EXTRACTION (CROP)
  • SCALING (V+H)
  • TARGET AREA (V+H)
  • ACTIVE AREA (V+H)

We’ve got the first three nailed down; allow me to argue that we need the fourth one to make the whole thing work, and possibly the fifth one to account for every scenario I have ever had to deal with.

  • TARGET AREA: I previously referred to this as the “blanking” instruction, but after reading Jim and Walter’s posts I think we could refer to it as “target area”. Conceptually they are two different approaches to the same result: once an image has been cropped and scaled, we use this instruction to tell the software what portion of the frame matches what was framed on set. Implementations will then leave the software/user to decide what to do with it (blank it, draw a reference line, create a different canvas). This is also what will be used on set to calculate the frame lines.

  • ACTIVE AREA: this would mostly be used for VFX, when the workflow requires the vendor to receive, and deliver back to post, an image different from (most of the time bigger than) the rendered CG area. To elaborate: what happens 99% of the time on my projects is that we have to account for a 5–10% extra area outside the target area to allow post production to stabilise, reframe, make a 3D release, make an IMAX release and so on. For these reasons, VFX CG needs to be rendered outside the main target frame, so that once VFX pulls go back to DI, VFX shots have the same extra room as drama shots and the CG has been rendered to account for all needs, like different releases (e.g. IMAX) or further post-production stages (e.g. 3D post-conversion). You don’t want to have to render twice, right?

I reckon that by adding those two extra nodes we could really account for every need both on set and in post.
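To make this concrete, a hypothetical block combining all five instructions could look like the following (the <targetArea> and <activeArea> element names and values are illustrative only, not schema proposals; the rest reuses the 1080p dailies example from earlier in the thread):

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
    <scaling>1920 1080</scaling>
    <targetArea>1920 804</targetArea>
    <activeArea>1920 1080</activeArea>
</framing>

Here <targetArea> would be the 2.39:1 area that was framed for on set (the same 1920x804 the earlier blanking example leaves), and <activeArea> the larger raster kept for stabilisation, reframing and other downstream needs.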

I have a bunch of projects I can mention and provide documentation for, if required. But I’m sure you all know what I’m talking about here…

Sorry for the long email.

F

Thank you all for your comments.

I’m in favor of adding Vertical & Horizontal scale factors to indicate anamorphic squeezing (most of what I have seen seems to use a float, e.g. 2.0 for traditional anamorphic, 1.3 for Hawk lenses, 1.0 for spherical).

And while I agree that Target Area would make this a more complete snapshot of what resize was done, I am not in favor of adding Target Area, for a couple of reasons:

  • You will have so many different target areas depending on the specific use case and software configuration. To name a few: HD 1920x1080 for dailies, UHD 3840x2160 for reviewing on a consumer TV, DCI 2K or 4K for a projector, etc. The software in which you do these resizes all has different methodologies for where you set your output resolution, the order of operations, etc. Even if we specify a target area, it would most likely get overridden by ‘whatever output res the user is set to’ in the specific scenario.

  • What these all have in common, though, is the extraction area, which is what we are trying to communicate with this metadata. It should be possible to pass a single AMF between all of these places and get the same active image fit within the target area of choice in the software.

So I guess my question is: with target area, are we trying to solve something outside the scope of ‘this is the area of the image to view’?