Framing metadata

I agree with Francesco on both points.
Squeezing is important, but it should be there only if scaling is too: I hope no one will ever hand-type an AMF.

As for the decimal points, I agree and am all-in on that: let's define them as a ratio, as is already done for non-integer frame rates – numerator and denominator. So no "rounding" or higher-level code processing is required.

So 1.66 really is 5 : 6.
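The numerator/denominator idea can be sketched like this (a Python illustration only, not part of any proposal; the specific factors below are my own examples):

```python
from fractions import Fraction

# Carrying the squeeze factor as an exact numerator/denominator pair,
# the same way non-integer frame rates are handled (e.g. 24000/1001),
# avoids any rounding or float comparison downstream.
anamorphic_2x = Fraction(2, 1)
hawk_1_33 = Fraction(4, 3)   # exactly 4:3, not the rounded float 1.33

# Exact arithmetic: desqueezing a 2578-px-wide extraction by 2:1
desqueezed_width = 2578 * anamorphic_2x   # still an exact Fraction
```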

Hey Walter and Francesco,
I am curious why it is important to keep a scaling attribute alongside the de-squeeze? I have never had to think about what I am scaling to when de-squeezing an image, so I assume I'm missing something.

As an example, when receiving 2:1 anamorphic media from set, in the lab I apply the 2:1 (or 1.33:1), but regardless of whether I am rendering out HD files for editorial, 720 files for PIX, smaller files for a different screener deliverable, or a random resolution for VFX, the de-squeeze never changes.

And the same would go for my team in finishing. When we conform, these two values are very independent of one another.

I guess I’ve just never heard of a de-squeeze changing in any way due to the scaling you’re applying to your output. Maybe I’m missing something, though? If not, I would wonder how often we will actually get this de-squeeze value from set. The camera may not always know there is an anamorphic lens on it, and we may not always have smart lenses, so I think we need to assume this field will sometimes be blank. I agree that it would be good to have it in there, though.

Therefore (sorry, I can’t remember how you wrote the frame IDs during the meeting, so those are missing below, but essentially):

    <inputFrame>2880 2160</inputFrame>
    <extractionArea>2570 2160</extractionArea>
    <originTopLeft>155 0</originTopLeft>
    <deSqueeze>2048 858</deSqueeze>

Hi Jesse.
Here’s my three cents.

  • First point: the anamorphic factor, as far as it is proposed to be represented here (i.e. as a single number rather than detailed optical parameters of a cylindrical lens), is nothing more than a vertical scaling (squeezing) factor; so why neglect horizontal scaling?
  • Second point: generally speaking, all you might ever be concerned with in transporting framing metadata along with AMF is squeezing. However, in some advanced image-evaluation workflows, you might want to preserve, along with the AMF history, the output resolution of what was exactly viewed. For example, from a compositing or QC perspective, you want to preserve that a Scope central extraction of a specific area was made, but also that this 2.39:1 frame was viewed on a sandboxed HD monitor (in Rec.709) rather than on something else (e.g. an 8K Rec.2020 reference monitor, where pixels were viewed 1-to-1 without scaling). In such a case, the full (i.e. both horizontal and vertical) scaling factor may be relevant in AMF.
  • Third point: neither anamorphic factor nor scaling are, strictly speaking, color-related metadata; as they may both be considered framing parameters, they should be either both-in or both-out of AMF.

What I’m saying is that the squeezing factor is of course fundamental (mostly for what you said at the end: it’s not always included in raw camera footage metadata because of the lack of smart lenses).
From an imaging-science perspective, I propose to call the squeezing factor <verticalScaling> and to have both it and <horizontalScaling> as an independently optional pair of parameters. So squeezing-only or full scaling can both be used, according to different needs.

I question whether a single-stage scale+offset is sufficient.

The format choice is fine, and exact integers are important for QC as well.

But in the general model of processing, including film scans (which are still relevant in restoration), there is the image capture; the extraction of the working image, perhaps with extra margin (Bayer-pattern requirements); possible desqueezing as noted above; and then extraction of the final work-product frame.

Sometimes there are extractions of TWO different output ratios, and they may be shifted (top-line or center-line, for example): both a 1.85 and a 1.78 (TV) are sometimes extracted when a larger canvas is available. VFX would have been instructed to work in the larger ‘protection format’ that the DP shot (protect for 1.78 with a 1.85 cinema release).

If these are not carried in a chain in an AMF, then point-to-point files for each of these steps are needed and have to be managed for the history of the transforms applied. Some of these transforms are unique to a small number of frames, so you will still get some splitting of the AMF over time. I think part of the goal is to recreate, from the ‘camera’ original, the image transforms that get you to the current result, both in color and in size? True?

In general, it is far better to have exact IN and OUT pixel integers than to apply an approximate ratio that is never really correct. Sometimes they are cheated, because the ratio is not important; the output deliverable is.


I’ve updated the XSD based on the discussion so far.

And the example

As another reason not to use ‘scaling factors’ but to go right to the deliverable raster:
the aspect ratio for DCI Cinema is NOT 2.39, it is 2048x858 (or 2.386946…:1).
Exact integer line placement is important.

Also for TV, it is not 1.78:1 but rather 1.77777777777…:1 (1920x1080)

All ratios you have heard about are non-deterministic, and sometimes choices have to be made about fitting that are not standard… easy with integers, hard with ratios. Another example: some old films have to be shown at 2.4, which was the actual projection ratio; at 2.39 a splice might appear.

So for the same reason we should not use float positions, we should not use float scalers.
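A quick arithmetic check (Python, just to illustrate the point being made here) shows why the integers are the ground truth and the rounded ratios are not:

```python
# The deliverable rasters are exact; the quoted ratios are rounded.
dci_scope = (2048, 858)
tv = (1920, 1080)

scope_ratio = dci_scope[0] / dci_scope[1]   # 2.3869..., not 2.39
tv_ratio = tv[0] / tv[1]                    # 1.7777..., not 1.78

# Working backwards from the rounded ratio even misses the raster:
height_from_ratio = round(2048 / 2.39)      # 857 – one line off from 858
```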

BTW, I am not talking about the centering or offsetting to get a frame in the middle for letterboxing or pillarboxing, or for odd formats, or even for pan-and-scan; I am just referring to creating the working active image. The origin is ALWAYS going to be relative to the inputFrame size.

I think that we should always be able to describe what the flat image size is, even if you are working on an anamorphic working frame.


Hi @JesseKorosi ,

sorry for the late reply, it has been a busy week.

I think both @walter.arrighetti and @jim_houston are describing and clarifying my points better than I can. I like the <verticalScaling> and <horizontalScaling> idea a lot; I think it solves both unusual squeezing problems and pixel-accurate scaling.

However, I still don’t think it’s enough. Please don’t hate me if I try once more (and one last time, I promise) to express my doubts.

@jim_houston mentioning “active image area” in his last email gives me the chance to argue again for the need of two additional (optional) nodes in the framing tree.

My argument is mainly based on trying to clarify (mostly to myself) what the aim of this framing metadata is, and what we refer to when we talk about extraction. I might be stating the obvious here, and I’m sure you have all considered the following points and I’m just overworking this topic, but for the sake of clarifying what I have in mind, I’ll go further.

I think it’s very important to make a difference between how an extraction guide is used on set and how it is used in post.

I think we all agree that extraction guide lines designed for on-set purposes want to specify what the operator has to frame for on set and no more, especially if we aim to be able to translate those numbers into a camera-compatible frame-line file. In 99.99% of cases, operators only want to see what they need to frame for. Too many lines in the viewfinder make their life impossible; it distracts them.

The post-production framing, on the other hand, in my personal experience, specifies what the working image area should be or, in other words, how a given frame should be processed to fit the current post-production stage. Most of the time (I would say 80% of the time) the extraction isn’t the target frame, but rather how the image needs to be cropped and adjusted, from which the target area is obtained afterwards and within it. In other words, in post-production we never crop to the target area, but rather to a larger working area.

I know that different AMFs will be generated for different stages of production, so the concept of extraction can vary from set to post (from what the target area is, to what needs to be pulled for VFX or DI), but I still think that a single instruction to define a frame and a working area isn’t enough. I strongly believe there is a need for a multi-layer system.

If this framing metadata wants to automate post-workflows I think it needs to account for these instructions:






We got the first three nailed down; allow me to argue that we need the fourth one to make the whole thing work, and possibly the fifth one to account for every scenario I have ever had to deal with.

  • TARGET AREA: I previously referred to it as the “blanking” instruction, but after reading Jim and Walter’s posts I think we could refer to it as “target area”. Conceptually they are two different approaches to the same result: once an image has been cropped and scaled, we use this instruction to tell the software what portion of the frame matches what was framed on set. Implementations will then leave the software/user to decide what to do with it (blank it, draw a reference line, create a different canvas). This is what will be used on set to calculate the frame lines as well.

  • ACTIVE AREA would mostly be used for VFX, when the workflow requires the vendor to receive, and deliver back to post, an image different from (most of the time bigger than) the rendered CG area. To elaborate: what happens 99% of the time on my projects is that we have to account for a 5–10% extra area outside the target area to allow post-production to stabilise, reframe, make a 3D release, make an IMAX release, and so on. For these reasons, VFX CG needs to be rendered outside the main target frame, so that once VFX pulls go back to DI, VFX shots will have the same extra room as drama shots, and the CG has been rendered to account for all needs, like different releases (i.e. IMAX) or further post-production stages (i.e. 3D post-conversion). You don’t want to have to render twice, right?

I reckon that by adding those two extra nodes we could really account for every need, both on set and in post.

I have a bunch of projects I can mention and provide documentation for, if required. But I’m sure you all know what I’m talking about here…

Sorry for the long email.


Thank you all for your comments.

I’m in favor of adding Vertical & Horizontal scale factors to indicate anamorphic squeezing (most implementations that I have seen use a float, i.e. 2.0 for traditional anamorphic, 1.3 for Hawk lenses, 1.0 for spherical).

And while I agree that Target Area would make this a more complete snapshot of what resize was done, I am not in favor of adding Target Area, for a couple of reasons:

  • You will have so many different target areas depending on the specific use case and software configuration. To name a few: HD 1920x1080 for dailies, UHD 3840x2160 for reviewing on a consumer TV, DCI 2K or 4K for a projector, etc. The software in which you do these resizes all have different methodologies for where you set your output resolution, the order of operations, etc. Even if we specify a target area, most likely it would get overridden by ‘what output res the user is set to’ in the specific scenario.

  • What these all have in common, though, is the extraction area, which is what we are trying to communicate with this metadata. It should be possible to pass a single AMF between all of these places and get the same active image fit within the target area of choice within the software.

So I guess my question is: with target area, are we trying to solve something outside the scope of ‘this is the area of the image to view’?

Hello Chris,
fair enough if we don’t want to over complicate the framing metadata node.

I just don’t quite understand what the expectations are for the extraction node, as to me it seems to be quite a hybrid concept now (especially after adding the V+H scaling): are we expecting it to allow software to pre-process a source frame so it can be transformed to meet specific pipeline requirements, or simply to provide an instruction of some sort that tells users to visualise a specific region of the source frame?

The latter only requires the extraction node (one could argue that not even the scaling is needed), as its function will basically match that of the camera framelines on set. If so, then I understand your initial point and I do agree: we’ll be fine with extraction+offset.

If, on the other hand, we hope that this set of metadata will automate some steps in post, a bit like the metadata in the Alexa Mini .mxf files that allows software to crop files recorded with a 4:3 2.8K flag to the right extraction (2880x2160) instead of leaving them Open Gate (which is what the camera really records), or like what happens with all cameras when shooting with an “anamorphic” flag (Arri, Red and Sony all do the same), which instructs the software to automatically desqueeze the input frame to the right ratio: if that’s the aim, then I have to reiterate my points and say that I believe we need the four (maybe five) nodes I was trying to explain in my post above, to make sure that things are instructed properly for each stage of production and post.

To go back to your points: the way I see it, the Target Area is not going to be affected by the specific use cases, as it communicates exclusively the area framed on set by the camera operators. In fact, the way I see it, nothing apart from the output scaling (the output resize, which we are currently not even considering here) drastically changes from use case to use case, except when there are specific needs in place, especially at the far end of post-production (like the ones quoted by Walter and Jim, i.e. center crop, pan and scan, QC, etc.).


  • INPUT FRAME and EXTRACTION as mandatory fields, related to how the source frame is meant to be extracted (cropped).
  • SCALING as the optional third element of the extraction pipeline, which is relative to the extraction output and tells how the cropped frame needs to be up/down scaled.
  • TARGET and ACTIVE as optional instructions aimed at the software that needs to work on the extracted and scaled images, to work out the canvas size (ACTIVE AREA) and the on-set intended framed area (TARGET AREA). The results of those two instructions won’t directly affect/transform the image, but only how the software will show it to the user, or what it will allow the user to work on.
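The pipeline in the list above might be modeled like this (a hypothetical Python sketch; the field names, defaults, and crop-then-scale order are my own assumptions based on this thread, not an agreed schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Area:
    name: str
    size: Tuple[int, int]     # width, height in pixels
    origin: Tuple[int, int]   # top-left offset, relative to the parent frame

@dataclass
class Framing:
    input_frame: Tuple[int, int]                       # mandatory
    extraction: Optional[Area] = None                  # crop; defaults to full frame
    scaling: Tuple[float, float] = (1.0, 1.0)          # (horizontal, vertical)
    active: List[Area] = field(default_factory=list)   # e.g. oversized VFX area
    targets: List[Area] = field(default_factory=list)  # on-set framed areas

    def working_size(self) -> Tuple[int, int]:
        """Crop first, then scale: the image handed to the next stage."""
        w, h = self.extraction.size if self.extraction else self.input_frame
        return round(w * self.scaling[0]), round(h * self.scaling[1])
```

For example, `Framing((2880, 2160), scaling=(1.0, 0.5)).working_size()` would give `(2880, 1080)`, a vertically squeezed working image; TARGET and ACTIVE entries would then be annotations on top of that, never transforms.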

I will try to make some examples of how the framing metadata section would look for me, using some real-life examples of extraction guidelines designed on different shows over the years.
To put things in context, however, I would also like to pick a show and share all the framelines we had to design for Aladdin (Disney, 2019), which was a multi-camera, multi-lens, multi-format show with complex needs. You can download them from here (Dropbox link):

Let me consider two, quite standard, use cases:

  1. Multiple target frames, with extraction crop due to lens coverage and extra room for a VFX area (from LIFE – Sony).

Put into context: shooting Open Gate, the lenses chosen by the DP wouldn’t cover it (no surprises there, right?), hence all target frames needed to be calculated from an area that the lenses would cover. We would normally just scale the input source to the desired VFX resolution, without cropping, but there was a problem: Sony required a 4K master, VFX costs demanded keeping their bits in 2K, and DI (Tech) and the DP wanted to keep the highest possible resolution. We all managed to agree on a 3.2K pipeline, as it matched a full 35mm film gate (so the lenses would cover) and it was the maximum allowed without incurring extra budget for the VFX renders. This way both drama and VFX could be done at the same resolution, and DI would be able to scale to 4K or 2K with better results.
Also, because the project required a 3D post conversion and an IMAX release, there was need of some extra room to allow all that fun too.

This would be the AMF:

<framing extractionName="AM_OG">
	<inputFrame>3424 2202</inputFrame>
	<extractionArea>3280 1844</extractionArea>
	<extractionOriginTopLeft>72 179</extractionOriginTopLeft>
	<activeArea activeAreaName="VFX">
		<activeAreaSize>3280 1728</activeAreaSize>
		<activeAreaOriginTopLeft>0 58</activeAreaOriginTopLeft>
	</activeArea>
	<target targetName="IMAX">
		<targetArea>3116 1642</targetArea>
		<targetOriginTopLeft>82 101</targetOriginTopLeft>
	</target>
	<target targetName="2-39">
		<targetArea>3116 1306</targetArea>
		<targetOriginTopLeft>82 269</targetOriginTopLeft>
	</target>
</framing>

The same AMF would work for VFX pulls and dramas, but not for dailies, since for AVID we normally crop the east–west edges of the frame to the target frame and then let the north–south fill the 1.78:1 container, when possible. So the dailies version of the above AMF would look like:

<framing extractionName="AM_OG_dailies">
	<inputFrame>3424 2202</inputFrame>
	<extractionArea>3116 1752</extractionArea>
	<extractionOriginTopLeft>154 225</extractionOriginTopLeft>
	<target targetName="IMAX">
		<targetArea>3116 1642</targetArea>
		<targetOriginTopLeft>0 55</targetOriginTopLeft>
	</target>
	<target targetName="2-39">
		<targetArea>3116 1306</targetArea>
		<targetOriginTopLeft>0 223</targetOriginTopLeft>
	</target>
</framing>

This should cover most of the needs for this project.
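One nice property of carrying pure integers like this is that a reader can validate them mechanically. A sketch of a consistency check (Python; it assumes, as the examples above suggest, that each origin is given relative to its parent frame):

```python
# Every child area must fit inside its parent frame.
def fits(parent_size, child_size, child_origin):
    (pw, ph), (cw, ch), (ox, oy) = parent_size, child_size, child_origin
    return ox >= 0 and oy >= 0 and ox + cw <= pw and oy + ch <= ph

# Values from the "AM_OG" framing above:
assert fits((3424, 2202), (3280, 1844), (72, 179))   # extraction in inputFrame
assert fits((3280, 1844), (3116, 1306), (82, 269))   # "2-39" target in extraction
```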

  2. Anamorphic 2x squeeze, with extra room for a VFX area (from Aladdin)

The AMF would be:

<framing extractionName="ASXT_4-3_2x">
	<inputFrame>2880 2160</inputFrame>
	<extractionArea>2880 2160</extractionArea>
	<extractionOriginTopLeft>0 0</extractionOriginTopLeft>
	<activeArea activeAreaName="VFX">
		<activeAreaSize>2578 2160</activeAreaSize>
		<activeAreaOriginTopLeft>151 0</activeAreaOriginTopLeft>
	</activeArea>
	<target targetName="2-39-desqueezed">
		<targetArea>2450 2052</targetArea>
		<targetOriginTopLeft>215 54</targetOriginTopLeft>
	</target>
</framing>

This time VFX gets the full gate, as does DI, so no crops are required for post. Once again, though, we need to extract differently for dailies:

<framing extractionName="ASXT_4-3_2x_dailies">
	<inputFrame>2880 2160</inputFrame>
	<extractionArea>2450 2160</extractionArea>
	<extractionOriginTopLeft>215 0</extractionOriginTopLeft>
	<target targetName="2-39-desqueezed">
		<targetArea>2450 2052</targetArea>
		<targetOriginTopLeft>0 54</targetOriginTopLeft>
	</target>
</framing>

This would cover most of the needs for this show.

The general idea is that the frame gets pre-processed by the software following the extraction node; the active and target areas could then be only a visual reference or, in some software implementations, become useful to set up the project/timeline, or to double-check that the existing timeline/canvas size matches what is required. I’m guessing, for example, that AVID would be able to turn the target-frame instructions into a blanking filter for the timeline, which the editors currently set up manually. DaVinci could do the same (with the blanking).

I understand and now agree that we don’t want to over-complicate things with output scaling resolutions, and that we should leave the software to adapt these framing numbers to the desired resolution using its own methods; but if we add those additional nodes, we could at least allow every useful piece of information to be carried through and properly communicated to each vendor.

As I’m writing this, I’m also realising that the target area should really be written down as a relative number, like ARRI does in its XML framelines, so that the pixel count can be calculated by the software after the internal scaling (if you scale a source frame of 2880 px to 1080 px, then the target-area numbers won’t mean much unless they get scaled as well; maybe it’s easier if they are written down as relative instructions in the first place).
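The relative-numbers idea could work roughly like this (hypothetical Python helpers, not an agreed AMF mechanism; the 2450x2052-in-2880x2160 values are taken from the Aladdin example above):

```python
# Store the target area as fractions of its frame so it survives any
# later resize, then convert back to pixels at whatever size is in use.
def to_relative(size, origin, frame):
    fw, fh = frame
    return (size[0] / fw, size[1] / fh, origin[0] / fw, origin[1] / fh)

def to_pixels(rel, frame):
    fw, fh = frame
    rw, rh, rx, ry = rel
    return (round(rw * fw), round(rh * fh)), (round(rx * fw), round(ry * fh))

# The 2450x2052 target in a 2880x2160 source, re-applied after a
# downscale to 1440x1080:
rel = to_relative((2450, 2052), (215, 54), (2880, 2160))
size, origin = to_pixels(rel, (1440, 1080))   # size -> (1225, 1026)
```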

I know I’m insisting a lot here, but I just wanted to make sure that my points were clear.
I’m not going to bring this up again if you guys think I’m overthinking it.

As usual, sorry for the thousands words.


Thanks for the detailed message here! I think really getting down to real-world scenarios is what will set this up for success, so this was great.

For the dailies example, you have the extraction area. So this gets you down into the 1.78 frame they want delivered for editorial. Therefore in Colorfront, as an example, this is what we’d actually choose as our framing/render preset. But then is the idea that the ‘target’ is just metadata that is there should they now want to apply that 2.39 matte back on in Avid? I think if we’re really trying to pare things down, this could be something out of scope. Definitely nice, but not necessarily a must in my opinion.

What if the AMF looked like this:

With this example, there will be a lot of jobs that just have 1 or 2 framing options within, but some, like your job, would have many more. But at this point, it’s all in one self-contained file, and still pretty easily manageable.

And then for our ASC Advanced Data Management Committee, looking to also get framing data for non-ACES jobs, we put these into columns (like the CDL):
Width: 4448
Height: 3096
extraction001: (3280 1844)(72 179)(1.0 1.0)
extraction002: (3280 1728)(0 58)(1.0 1.0)
extraction003: (3116 1642)(82 101)(1.0 1.0)
extraction004: (3116 1306)(82 269)(1.0 1.0)
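For what it’s worth, that column form is trivially machine-readable; a sketch of a parser (Python; the `(width height)(x y)(hScale vScale)` field ordering is my reading of the lines above, not a defined spec):

```python
import re

# Hypothetical parser for the column form:
#   extractionNNN: (width height)(x y)(hScale vScale)
LINE = re.compile(r"extraction(\d+):\s*"
                  r"\((\d+) (\d+)\)\((\d+) (\d+)\)\(([\d.]+) ([\d.]+)\)")

def parse_extraction(line):
    m = LINE.match(line)
    idx, w, h, x, y = (int(m.group(i)) for i in range(1, 6))
    return idx, (w, h), (x, y), (float(m.group(6)), float(m.group(7)))

idx, size, origin, scale = parse_extraction(
    "extraction001: (3280 1844)(72 179)(1.0 1.0)")
```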

Hey Jesse,
thanks for the feedback!

For the dailies example, you have the extraction area. So this gets you down into the 1.78 frame they want delivered for editorial. Therefore in Colorfront, as an example, this is what we’d actually choose as our framing/render preset. But then is the idea that the ‘target’ is just metadata that is there should they now want to apply that 2.39 matte back on in Avid? I think if we’re really trying to pare things down, this could be something out of scope. Definitely nice, but not necessarily a must in my opinion.

Yes, that’s exactly the idea.
I’m sorry if it sounds out of scope; as I said, I’m struggling to understand the exact scope of the whole framing metadata, if it is not to try to automate these things.

Your proposed structure would work too for me, absolutely.

I have, however, some comments/concerns on it:

  1. The problems I see with leaving the Extraction node only (without the Target and Active Area nodes) are:
  • If the Extraction metadata only needs to express what has been framed on set, then it cannot be used to express an extraction (crop). If you don’t instruct an extraction (crop), then you leave the hardest part of the work to the users: cropping and scaling a frame to specs… see two points below for what I think about this.
  • If the Extraction metadata could express both “what has been framed on set” and “the crop” required to adjust an input frame to a specific need, then the scope/intent of that metadata will change from use case to use case, even within the same workflow, making the whole thing a bit confusing. For instance: on set, it would communicate what you are meant to frame; for dailies or post, how the frame needs to be cropped before processing; for VFX, it would go back to expressing what was framed on set; etc. Back to my question: what are we trying to achieve here?
  • If the Extraction only needs to express how to prepare/crop/adjust an input frame (as I propose), then by not adding the Target Area you rely on the user/vendor to do so correctly, as per the status quo. Hence we are back in the same old conundrum of having to trust, check, and verify everything, every time.

I’m not saying that my proposed structure would fix that problem once and for all, but it would definitely be a step towards semi-complete framing-extraction automation, by actually carrying all the required data in one package. I’m not really too worried about the AVID blanking (although I am, to a point); I’m more worried about VFX vendors (on the last job we had 14 of them, from around the world, and I’m sure you know what a PAINFUL job it is having to check pixel-accurate turnovers for each delivery coming from each vendor, every time).

  2. My proposal can be condensed into a single AMF as well, just by adding multiple Framing trees; that’s why I added an “extractionName=” attribute, so they can be distinguished from each other and co-exist.
  3. Your proposal will be hard to translate into a camera-compatible frameline instruction (let’s use the Alexa XMLs as an example) when multiple framing targets are required (all the time, for me). I mean, you can, but the user will theoretically have to append multiple extractions together manually. In my proposal, because multiple Target Areas are allowed within a single Extraction node, these could be translated into multi-target camera-compatible framelines (this would work for Red too, using their “absolute values”).

Hope this makes sense.



Currently, I have to create pixel-accurate framing charts for any of the high-end jobs we do at Sim. As you know, the cameras’ framing charts are never pixel accurate. They are a good reference, but that’s it. We do more TV than features here, so it’s often less about many framing options per camera and more about the sheer number of cameras to deal with. So for one of the new Netflix shows I’m on right now as an example, which has actually been pretty light on various cameras, I’ve created a chart for the:

  • AlexaMini_OpenGate_3424x2202
  • Phantom4K
  • SonyVenice4K

I create these diagrams, people create their framing presets knowing my chart is pixel accurate, then they apply said framing preset to the camera-recorded chart, and we see how it lines up. It’s never bang on, because again… camera charts never are. But it’s a great reference!

So to me, the goal here would be to stop having humans manually choose how much to zoom in, pan/tilt, crop, matte, etc. Have the software read the framing preset and apply it. And then you could even save that as a preset in your software.

Because not every job has people like you and me making these pixel-accurate charts, dailies may frame one way, and VFX then re-creates the framing trying to match the QT reference. But maybe dailies f-ed up the framing and now VFX is just matching it, and no one hears about it because they are both wrong. haha. Anyhow;

I think we’re on the same page for the goals :slight_smile: But what I’m curious to hear is why the example I gave would not allow software to make the crop/zoom, etc., to match the frame from on set. Where I wrote extraction frame, what this really meant is: this is the area that is intended to be active. If this active area happens to be 2.39 and you throw it into a 1.78 window, the software would know to matte it.

My thought had been that in camera, you set up your frame lines the same way you currently do. Each of these is then an extraction-name set of parameters from the saved XML. But in camera you could choose to use any of these 4 optional frame lines. I have never actually created these frame lines, though, and am not sure how they currently correlate with what gets saved into an XML. So if you’re saying this way is not do-able, copy that. I don’t think I fully get it yet, though.

Maybe the disconnect is that I don’t understand the difference between your point about the framing on set vs. the crop? As an example, is this people framing on set for 1.78, while the crop would bring it down into a 2.39 as a different frame? If so, wouldn’t the 1.78 be one frame line you set in camera, and the 2.39 be another? Then we have both?

I assume I’m missing something, but thought I’d also offer up the idea of chatting over the phone versus continuing in this thread?

P.S. I totally hear your point about lots of vendors. We have 28 right now on Watchmen… It’s nuts!


Hey Jesse,
yes - let’s talk. I will PM you my details. Shall we say Monday? So we can discuss this prior to our next ACES meeting?

Whoever wants to join please let us know.

To answer your points.

Yes, I’m totally in line with you here. And that’s why we need to move away from that approach and have something that could be used from set to AVID, from Dailies to DI, from VFX Pull to finishing.

That’s exactly my goal too: avoid having people to do these things manually.

I’m not saying that your last proposal won’t work. On the contrary, I think it will be sufficient, at least to start with. However, it still leaves the user/software to work out some of the steps required to map a target extraction to a specific container. I simply think that one single instruction isn’t enough to reach a fully automated pipeline, and that the more information you can add to avoid mistakes, the better. If one can be super explicit about how a frame is meant to be pulled and scaled, why not do it? Especially if these nodes can be left as optional rather than mandatory.

Most of the features I worked on had to account for multi-viewing-format deliveries and 3D post-conversions. Hence, on set the framelines had to allow viewing those extra areas all at the same time, so in the viewfinder and on the on-set monitors you end up with a main target frame and reference frame lines for the other areas that need to be protected (i.e. left clean of booms, flags, lights, people, etc.). To use a real-world example again, this was the operator framing guideline I designed for one of my last shows. I usually send this document to the camera operators and the DITs of the other units, to help them understand what they have to frame for (this chart represents exactly what they would be looking at in the viewfinder/monitors):

The Arri XML for these framelines can be downloaded from here:
ASXTOGE_0_0_0_v1.xml (10.1 KB)

It’s one file that contains them all. Using your proposal, to generate an ALEXA XML like that, one would have to manually choose between multiple Extraction nodes and fuse them into a single ARRI XML. In my proposal, whoever designs the AMF can have multiple Target Area nodes within a single Extraction, hence one could design the AMF file, send it on set and say: “use the Extraction called ASXTOGE and it will automatically populate everything in one go”. Or, vice versa, the ARRI XML could easily be converted to a multi-layered AMF. Red works in a similar way (not with XMLs, but it has up to three simultaneous frameline options). Sony, I hear, will design something similar for the Venice.

Anyway, let’s discuss it over the phone!
Have a good weekend…



Hey all,
Just a heads-up: we’ll be meeting to chat at 9:30 AM PDT on Monday. If anyone else wants to join us for some early Monday-morning framing dialog… haha. Feel free to use this link!

ACES/ADM Framing
Mon, Sep 30, 2019 9:30 AM - 10:15 AM PDT

Hey @CClark,
I just thought I would let you know, after a few of us jumped on the call today, we all agreed on @Francesco’s latest XML format!

To break this down, so we’re all on the same page for how this works:

<framing extractionName="ASXT_4-3_2x">
	<inputFrame>2880 2160</inputFrame> (this is the original captured resolution)
	<extractionArea>2880 2160</extractionArea> (this is the initial crop, if there is one)
	<extractionOriginTopLeft>0 0</extractionOriginTopLeft> (the order of these is width, then height)
	<verticalScaling>0.5</verticalScaling> (this would indicate a 2x lens was used and we need to squeeze it vertically by 2x)
	<activeArea activeAreaName="VFX">
		<activeAreaSize>2578 2160</activeAreaSize> (in this example, this is the resolution being turned over for VFX, which is a smaller resolution than the native res, but it’s not the intended frame (“Active”) that we will be delivering the job at; VFX now has room for stabilization, stereo work, etc.)
		<activeAreaOriginTopLeft>151 0</activeAreaOriginTopLeft>
	</activeArea>
	<target targetName="2-39-desqueezed"> (this is the name Colorfront, Baselight, Resolve, etc., will show when giving the user the choice of which frame to apply)
		<targetArea>2450 2052</targetArea>
		<targetOriginTopLeft>215 54</targetOriginTopLeft>
	</target>
</framing>

Another note is that all of the resolutions and scaling have been based on the native resolution/aspect ratio of the neg. Therefore, looking at this from an order-of-operations view, the de-squeeze comes last.
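That order of operations can be sanity-checked numerically; a small Python sketch using the numbers from the example above (my own arithmetic, just to illustrate why "desqueeze last" works out):

```python
# Areas are laid out on the native (still-squeezed) 2880x2160 raster;
# the vertical scaling of 0.5 is applied last to desqueeze.
target = (2450, 2052)          # targetArea, on the squeezed raster
vertical_scaling = 0.5

display = (target[0], round(target[1] * vertical_scaling))
aspect = display[0] / display[1]   # 2450/1026 = 2.388..., i.e. ~2.39
```

So the "2-39-desqueezed" target really does land on (roughly) 2.39:1 once the squeeze is removed, which echoes the earlier point that the exact integers, not the rounded ratio, are what is carried.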

I will just ping a few people here, but I’ve also emailed them to see if they can join us tomorrow.

@Graeme_Nattress @ptr @Brian_Gaffney @JamesEggleton @JamesMilne @daniele


I think it would be useful to have lens squeeze factor (1.0, 1.25, 1.3, 1.33, 1.5, 1.65, 1.8, 2.0, …) and the pixel aspect (usually 1.0 these days) as properties of the source and target rasters. In the absence of lens squeeze and pixel aspect you can assume default values of 1.0 .

Hey James! Thanks for joining in. We ended up incorporating the squeeze factor into <verticalScaling> and <horizontalScaling>, as it’s more a matter of how you need to transform the frame “to make it right” for that specific piece of the pipeline than an inherent feature of the source frame that always needs to be applied (i.e. one can decide NOT to desqueeze for VFX pulls).
But I agree on adding something like <pixelAspect> to the input-frame properties as an optional tag that should be assumed to be “1” if not present.
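The optional-with-default behavior is cheap to implement on the reading side; a sketch (Python's standard XML parser; the element names are the ones proposed in this thread, nothing official):

```python
import xml.etree.ElementTree as ET

# A framing fragment with the optional tags absent:
frag = ET.fromstring("<framing><inputFrame>2880 2160</inputFrame></framing>")

def get_float(elem, tag, default):
    """Read an optional numeric child element, falling back to a default."""
    child = elem.find(tag)
    return float(child.text) if child is not None else default

pixel_aspect = get_float(frag, "pixelAspect", 1.0)      # absent -> 1.0
v_scale = get_float(frag, "verticalScaling", 1.0)       # absent -> 1.0
```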

Thanks, and a few others- @rohitg @peterpostma @brunomunger

Guys, I see lots of beautiful things came up from this. I also totally agree with giving tagged names to active and target areas.
My suggestions to improve all of this further:

  • Simplify things a little bit for applications with fewer, simpler requirements (monitors, LUT boxes, video converters, etc.) by making most of these elements optional. For example, scaling (both vertical and horizontal – read below), if not present, defaults to 1.0. The same goes for extraction, active, and target: if they are not present, they default to the original image size. In other words, the defaults are extraction area = active area = target area = whole original frame.
  • For scaling, two numbers in one optional scaling element; if the element is present, the first number (mandatory) is the vertical scaling, whereas the second number (optional) is the horizontal scaling.
  • Homogenize the names of XML elements and attributes: therefore active, extraction, and target at the top level; size, origin, and scaling as children elements common to all of them. I also suggest one common W3C-compatible name attribute for all three “area” metadata.
  • I also suggest not putting “…TopLeft” in a name: once it is agreed that coordinates are top-left based, that’s explicitly defined in the spec and that’s it.

This way, simpler framings can still be written in very compact, very concise ways, like

        <size>1920 1080</size>

which reads: just do an unsqueezed, 1-to-1 Full-HD central extraction from the original frame (whatever resolution it is).
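Under that default rule, the origin falls out of the two sizes; a one-function Python sketch of the centering (my own illustration of the proposed default, not spec text):

```python
# A bare <size> means a centered, unscaled extraction, so the origin
# is derived from the parent and child sizes.
def centered_origin(parent, child):
    return ((parent[0] - child[0]) // 2, (parent[1] - child[1]) // 2)

# Full-HD central extraction from, say, a 2880x2160 original:
origin = centered_origin((2880, 2160), (1920, 1080))   # (480, 540)
```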

The previous example by @JesseKorosi would be simplified, syntax-only, down to:

<framing name="ASXT_4-3_2x">
	<input>2880 2160</input>
	<scaling>1.0 0.5</scaling>
	<active name="VFX">
		<size>2578 2160</size>
		<origin>151 0</origin>
	</active>
	<target name="2-39-desqueezed">
		<size>2450 2052</size>
		<origin>215 54</origin>
	</target>
</framing>

At the same time, more complicated, multipurpose framing options can be included in the same spec.

As a bottom line: I still find we are truly over-regulating things here, as AMF should be an XML dialect to describe color metadata only, whereas imaging stuff (incl. framing/scaling parameters) should be regulated by separate XML namespaces that may either extend AMF or be extended by AMF. But if this framing metadata needs to be specced here, I find this thread is converging on a very effective, generic framework.