Framing metadata

Hey Jesse,
yes - let’s talk. I will PM you my details. Should we say Monday, so we can discuss this prior to our next ACES meeting?

Whoever wants to join please let us know.

To answer your points.

Yes, I’m totally in line with you here. And that’s why we need to move away from that approach and have something that could be used from set to AVID, from Dailies to DI, from VFX Pull to finishing.

That’s exactly my goal too: avoid having people do these things manually.

I’m not saying that your last proposal won’t work. On the contrary, I think it will be sufficient, at least to start with. However, it still leaves it to the user/software to work out some of the steps required to map a target extraction to a specific container. I simply think that one single instruction isn’t enough to reach a fully automated pipeline, and that the more information you can add to avoid mistakes, the better. If one can be super explicit about how a frame is meant to be pulled and scaled, why not do it? Especially if these nodes can be left not as mandatory elements but as options.

Most of the features I worked on had to account for multi-viewing-format deliveries and 3D post conversions. Hence, on set the framelines had to allow viewing those extra areas all at the same time, so in the viewfinder and on the on-set monitors you end up with a main target frame plus reference frame lines for the other areas that need to be protected (i.e. left clean of booms, flags, lights, people, etc.). To use a real-world example again, this was the operator framing guideline I designed for one of my last shows. I usually send this document to the camera operators and the DITs of the other units to help them understand what they have to frame for (this chart represents exactly what they would be looking at in the viewfinder/monitors):

The Arri XML for these framelines can be downloaded from here:
ASXTOGE_0_0_0_v1.xml (10.1 KB)

It’s one file that contains them all. Using your proposal, to generate an ALEXA XML like that, one would have to manually choose between multiple Extraction nodes and fuse them into a single ARRI XML. In my proposal, whoever designs the AMF can have multiple Target Area nodes within a single Extraction, hence one could design the AMF file, send it on set and say: “use the Extraction called ASXTOGE and it will automatically populate everything in one go”. Or, vice versa, the ARRI XML could easily be converted to a multi-layered AMF. RED works in a similar way (not with XMLs, but it has up to three simultaneous frameline options). Sony, I hear, will design something similar for the Venice.
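To make that concrete, here is a rough, hypothetical sketch (in Python, just to show the shape; element names and rectangle values are illustrative, not taken from the actual ASXTOGE file or any agreed AMF schema) of one extraction carrying several named target areas:

# Hypothetical only: names and numbers are illustrative, not an agreed schema.
# The point is simply that one extraction can carry several named targets,
# so a single file can drive all the framelines at once.
import xml.etree.ElementTree as ET

framing = ET.Element("framing", extractionName="ASXTOGE")
ET.SubElement(framing, "inputFrame").text = "2880 2160"

for name, size, origin in [
    ("2-39-main",    "2578 1080", "151 540"),   # made-up rectangles
    ("1-78-protect", "2578 1448", "151 356"),
    ("full-sensor",  "2880 2160", "0 0"),
]:
    target = ET.SubElement(framing, "target", targetName=name)
    ET.SubElement(target, "targetArea").text = size
    ET.SubElement(target, "targetOriginTopLeft").text = origin

print(ET.tostring(framing, encoding="unicode"))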

Anyway, let’s discuss it over the phone!
Have a good weekend…

F


Hey all,
Just a heads up: we’ll be meeting to chat at 9:30 AM PDT on Monday. If anyone else wants to join us for some early Monday morning framing dialog… haha. Feel free to use this link!

ACES/ADM Framing
Mon, Sep 30, 2019 9:30 AM - 10:15 AM PDT

https://global.gotomeeting.com/join/319715109

Hey @CClark,
I just thought I would let you know, after a few of us jumped on the call today, we all agreed on @Francesco’s latest XML format!

To break this down, so we’re all on the same page for how this works:

<framing extractionName="ASXT_4-3_2x">
	<inputFrame>2880 2160</inputFrame> (this is the original captured resolution)
	<extraction>
		<extractionArea>2880 2160</extractionArea> (This is the initial crop, if there is one)
		<extractionOriginTopLeft>0 0</extractionOriginTopLeft> (the order of these values is horizontal, then vertical)
		<verticalScaling>0.5</verticalScaling> (This would indicate a 2x lens was used and we need to squeeze it vertically x2)
		<horizontalScaling>1.0</horizontalScaling>
	</extraction>
	<activeArea activeAreaName="VFX">
		<activeAreaSize>2578 2160</activeAreaSize> (In this example, this is the resolution being turned over for VFX, which is a smaller resolution than the native res, but it's not the intended frame ("Active") that we will be delivering the job at. But now VFX has room for stabilization, stereo work, etc.)
		<activeAreaOriginTopLeft>151 0</activeAreaOriginTopLeft>
	</activeArea>	
	<target targetName="2-39-desqueezed"> (This is the name Colorfront, Baselight, Resolve, etc, will show when giving the user the choice of which frame to apply)
		<targetArea>2450 2052</targetArea>  
		<targetOriginTopLeft>215 54</targetOriginTopLeft>
	</target>
</framing>

Another note is that all of the resolutions and scaling have been based on the native resolution/aspect ratio of the neg. Therefore, looking at this from an order-of-operations view, the de-squeeze comes last.
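To spell that out with the numbers from the example above (an illustrative sketch only, nothing normative): the target rectangle is read against the native 2880x2160 frame, and the scaling is applied as the final step.

# Illustrative only: apply the framing in the agreed order.
def resolve_target(target_size, v_scale, h_scale):
    """Target size after the final de-squeeze step (scaling factors are
    expressed against the native, still-squeezed frame)."""
    w, h = target_size
    return round(w * h_scale), round(h * v_scale)

# "2-39-desqueezed" target from the XML above:
print(resolve_target((2450, 2052), v_scale=0.5, h_scale=1.0))
# (2450, 1026) -> roughly 2.39:1 once the 2x anamorphic squeeze is compensated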

Hey,
I will just ping a few people here, but I’ve also emailed them to see if they can join us tomorrow.

@Graeme_Nattress @ptr @Brian_Gaffney @JamesEggleton @JamesMilne @daniele

F.

I think it would be useful to have the lens squeeze factor (1.0, 1.25, 1.3, 1.33, 1.5, 1.65, 1.8, 2.0, …) and the pixel aspect (usually 1.0 these days) as properties of the source and target rasters. In the absence of lens squeeze and pixel aspect, you can assume default values of 1.0.

Hey James! Thanks for joining in. We ended up incorporating the squeeze factor into <verticalScaling> and <horizontalScaling>, as it’s more a matter of how you need to transform the frame “to make it right” for that specific piece of the pipeline than an inherent feature of the source frame that always needs to be applied (i.e. one can decide NOT to de-squeeze for VFX pulls).
But I agree on adding something like <pixelAspect> to the input frame properties as an optional tag that should be assumed to be “1” if not present.
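Just to illustrate what I mean (a hypothetical sketch, not an agreed schema), a reader would simply fall back to 1.0 when the tag is missing:

# Hypothetical: <pixelAspect> sitting alongside <inputFrame> inside <framing>,
# defaulting to 1.0 when absent. Tag name and placement are illustrative only.
import xml.etree.ElementTree as ET

def pixel_aspect(framing: ET.Element) -> float:
    node = framing.find("pixelAspect")
    return float(node.text) if node is not None else 1.0

print(pixel_aspect(ET.fromstring("<framing><inputFrame>2880 2160</inputFrame></framing>")))  # 1.0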

Thanks, and a few others- @rohitg @peterpostma @brunomunger

Guys, I see lots of beautiful things came up from this. I also totally agree with giving tagged names to active and target areas.
My suggestions to improve all of this further:

  • Simplify things a little bit for applications with fewer, simpler requirements (monitors, LUT boxes, video converters, etc.) by making most of these elements optional. For example, scaling (both vertical and horizontal – read below), if not present, defaults to 1.0. The same goes for extraction, active, and target: if they are not present, they default to the original image size. In other words, the defaults are extraction area = active area = target area = whole original frame.
  • For scaling, two numbers in one optional scaling element, where, if present, the first number is mandatory and is the “vertical scaling”, whereas the second, optional number is the “horizontal scaling”.
  • Homogenize the names of XML elements and attributes: therefore active, extraction, target at the top level; size, origin and scaling as child elements common to all of them. I also suggest one common W3C-compatible name attribute for all three “area” elements.
  • I also suggest not putting “...TopLeft” in a name: once it is agreed that coordinates are top-left based, that is explicitly defined in the spec and that’s it.

This way, simpler framings can still be written in a very compact, very concise way, like:

<framing>
    <extraction>
        <size>1920 1080</size>
    </extraction>
</framing>

which reads: just do an unsqueezed, 1-to-1 Full-HD central extraction from the original frame (whatever resolution it is).

The previous example by @JesseKorosi would be simplified (syntax only) down to:

<framing name="ASXT_4-3_2x">
	<input>2880 2160</input>
	<extraction>
		<scaling>0.5 1.0</scaling>
	</extraction>
	<active name="VFX">
		<size>2578 2160</size>
		<origin>151 0</origin>
	</active>
	<target name="2-39-desqueezed">
		<size>2450 2052</size>  
		<origin>215 54</origin>
	</target>
</framing>
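For what it’s worth, here is a rough, non-normative sketch of how a reader could apply those defaults (element names as in the simplified examples above; a missing area falls back to the whole input frame, a missing origin centres the area, and scaling defaults to 1.0):

# Non-normative sketch of the proposed defaulting behaviour.
import xml.etree.ElementTree as ET

def resolve_area(root, tag, input_wh):
    el = root.find(tag)
    iw, ih = input_wh
    if el is None:
        return {"size": (iw, ih), "origin": (0.0, 0.0), "scaling": (1.0, 1.0)}
    size_el, origin_el, scale_el = el.find("size"), el.find("origin"), el.find("scaling")
    w, h = map(float, size_el.text.split()) if size_el is not None else (iw, ih)
    if origin_el is not None:
        x, y = map(float, origin_el.text.split())
    else:
        x, y = (iw - w) / 2, (ih - h) / 2            # centre by default
    if scale_el is not None:
        nums = [float(n) for n in scale_el.text.split()]
        v, hs = nums[0], (nums[1] if len(nums) > 1 else 1.0)
    else:
        v, hs = 1.0, 1.0                             # vertical, horizontal
    return {"size": (w, h), "origin": (x, y), "scaling": (v, hs)}

root = ET.fromstring("<framing><extraction><size>1920 1080</size></extraction></framing>")
print(resolve_area(root, "extraction", input_wh=(2880.0, 2160.0)))
# {'size': (1920.0, 1080.0), 'origin': (480.0, 540.0), 'scaling': (1.0, 1.0)}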

At the same time, more complicated, multipurpose framing options can be included in the same spec.

As a bottom line: I still find we are truly over-regulating things here, as AMF should be an XML dialect to describe color metadata only, whereas imaging matters (incl. framing/scaling parameters) should be regulated by separate XML namespaces that may either extend AMF, or be extended by AMF. But if this framing metadata needs to be specced here, I find this thread is converging on a very effective, generic framework.

Hey Walter!
Thank you for the feedback.

I agree

I agree

This might depend on how we decide to logically make things work: if the active and target areas depend on, and are relative to, the output of the extraction, then no, I don’t agree: they should be children of extraction. If we stick to what we roughly agreed yesterday (the coordinates for active, extraction, and target are all relative to the input), then I agree with you on this point too.

@JesseKorosi with this simplified approach one could also attempt to find a string model that would reflect the XML structure to be fitted into ALEs or EDLs. I suggest we take a look at how Codex handles it in their Production Suite software.

Codex is likely to change its approach to framing in future applications, so please don’t dig too deep into the current behaviour w.r.t. ALE/EDL serialisation!

Production Suite requires that the user associates a framing rectangle with each source clip.
The canvas rectangle is defined by [width, height, pixelAspect]
An inner (framing) rectangle is defined by [width, height, xOffset, yOffset], where offsets are relative to the top-left corner of the canvas.

The geometry of output deliverables is similarly defined by a canvas and inner rectangle, along with fitting rules that define how a source inner rectangle should be fitted inside the target inner rectangle. All fitting is performed at sub-pixel precision.

As an example, a 2.39 framed region of an Alexa65 clip might be described as canvas=[6560,3100,1.0], inner=[6144,2574,208,263]. If mapping to a target image with canvas=[1920,1080,1.0], inner=[1824,1080,48,0] and fitting rule FitWidth, then the image would be rescaled so that the source 6144x2574 framed region aligned with the 1824-wide inner region of the target. We provide various fitting rules: FitX/FitY/FitXY.
Note that the source and target pixelAspects are used to constrain the fitting rules.

Note that we have another two parameters, biasX and biasY, that allow the user to decide how to bias the position for cases where the source and target inner rectangles are not the same aspect, e.g. biasY=0 would result in top-justify, biasY=1 would result in bottom-justify.
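To illustrate what that means in practice (rough Python, not our actual implementation), FitX with a bias on the free axis behaves roughly like this:

# Rough sketch only, not Codex production code.
def fit_x(src_inner, tgt_inner, bias_y=0.5):
    """src_inner / tgt_inner: (width, height, xOffset, yOffset).
    Returns (scale, x, y): where the scaled source inner rectangle lands,
    relative to the target canvas top-left, at sub-pixel precision."""
    sw, sh, _, _ = src_inner
    tw, th, tx, ty = tgt_inner
    scale = tw / sw                        # constrain the width (FitX)
    slack = th - sh * scale                # leftover room on the free axis
    return scale, tx, ty + bias_y * slack  # bias 0 = top, 1 = bottom

# Numbers from the Alexa65 -> HD example above (square pixels on both sides):
print(fit_x((6144, 2574, 208, 263), (1824, 1080, 48, 0), bias_y=0.5))
# (0.296875, 48, 157.92...)  i.e. the 2.39 region letterboxed inside the HD inner area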

I’m afraid I can’t make today’s call, but will be happy to engage in future conversations.

Evening metadata aficionados,

Jumping in to stoke the fire a bit…

My understanding is that framing is now being considered as an optional set of attributes because AMF is one sneeze away from a “complete” viewing recipe.

Considering AMF is being created to supplement the success and accuracy of a color management system, I’m actually inclined to suggest that framing metadata not be included in the AMF specification. I currently see it as far enough removed from color management to be considered an out-of-scope problem.

I also can’t help but doubt the effectiveness of optional metadata in any system. It would be a bit lame if filmmakers on set decide to depend on AMF to preserve their framing choices, but some platform downstream has opted out of interpreting that metadata because it was optional. Additionally, we currently don’t see wide adoption of attributes that are declared as required in specifications like ST 2065-4, so what chance does optional metadata have?

My opinion above aside, I’ve read through the discussion so far in this thread to gather an understanding of the goal with specifying framing metadata. Referencing Chris’s earlier statement:

It should be possible to pass a single AMF between all of these places and get the same active image to be fit within the target area of choice within the software.

This looks like the most straightforward goal to me, and a feasible one as well. There is no ambiguity in the original specification that aims to achieve that goal:

<framing>
    <inputFrame>4448 3096</inputFrame>
    <extractionArea>3840 2160</extractionArea>
    <originTopLeft>304 468</originTopLeft>
</framing>

(I can’t help suggesting different names for the above. Why call one Frame and the other Area? I’d suggest inputArea and framingArea. To some, extraction implies performing a hard crop.)

Although the above representation is simple, I would describe support for utilizing the extractionArea as a major feature to implement in a given software platform. AMF is per-clip, whereas viewing area / blanking is typically set at the timeline/viewing level in software, and re-mapping different areas per shot may require sophisticated definitions within the software’s configuration (if the application even supports that to begin with).

Some conflicts I would worry about:

  1. Say you bring in two shots from two different cameras, and one of them has a different aspect ratio for the extractionArea than the other shot. This is an awkward conflict that a Dailies operator may want to solve by choosing to stop the AMF framing metadata from driving their viewer, because they know what the aspect ratio should be. But that also means the software needs a button to disable reading the AMF’s framing metadata (even per shot!).
  2. What if shot A has AMF framing metadata, and shot B has no AMF framing metadata to be read? How does the user even know that has happened, unless shot B’s native aspect ratio is clearly different from the framing of the other shot? The dailies op likely won’t know whether the content they’re looking at is framed correctly, unless all of this automated mapping of the framing area is presented by the software in a very transparent way. Possibly the user needs to determine which shots have AMF framing metadata, not touch those, and then manually intervene on the shots that don’t have that metadata.
  3. Another feature this may need to support is choosing between multiple framing options. As Francesco has shown through example, you could have more than one frame line live on the camera. This is a pretty sophisticated feature, I think.

I’m also not clear how the AMF is updated for transcodes that alter the resolution. Let’s say I start with this:

<framing>
    <inputFrame>5674 3192</inputFrame>
    <extractionArea>5390 2695</extractionArea>
    <originTopLeft>142 248</originTopLeft>
</framing>

Then I transcode it to EXR and resize the entire image down to 3840x2160 (no cropping). Does the AMF update framing metadata to this?

<framing>
    <inputFrame>3840 2160</inputFrame>
    <extractionArea>3648 1824</extractionArea>
    <originTopLeft>96 168</originTopLeft>
</framing>
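(For what it’s worth, the numbers above are just the original values multiplied by the resize factors and rounded; a quick, purely illustrative check:)

# Scale the original framing by the resize factors (3840/5674, 2160/3192):
sx, sy = 3840 / 5674, 2160 / 3192
for w, h in [(5674, 3192), (5390, 2695), (142, 248)]:
    print(round(w * sx), round(h * sy))
# 3840 2160
# 3648 1824
# 96 168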

The addition of the scaling factors verticalScaling and horizontalScaling, purely for the sake of recognizing images captured with anamorphic lenses, feels a bit awkward to me. Two reasons:

  1. These pre-determine whether the footage should be “stretched” (increase the width by a scaling factor) or “squashed” (decrease the height by a scaling factor). There is value in leaving this flexibility open. You may stretch to avoid having to upsample a de-squeezed image into a larger delivery container. You may squash an image to produce something with a smaller pixel area, in order to save on compute and/or file size. (See the quick numbers after this list.)

  2. Images captured with anamorphic glass are typically described with a non-unity pixel aspect ratio. pixelAspectRatio is actually a mandatory OpenEXR metadata attribute in ST 2065-4. Is this not enough to communicate non-square pixels? Will optional framing metadata in an AMF be more successful than required metadata in 2065-4? Obviously 2065-4 doesn’t cover the source camera files themselves, but original camera files should really be carrying that sort of info in their own metadata.
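To put quick numbers on the stretch-vs-squash point in item 1 (a hypothetical 2880x2160 frame shot with a 2x anamorphic, purely illustrative):

# Stretch vs squash for a hypothetical 2880x2160 frame shot with a 2x lens:
w, h, squeeze = 2880, 2160, 2.0
print(w * squeeze, h)   # 5760.0 2160 -> stretched: ~2.67:1, 4x the pixel count of the squashed version
print(w, h / squeeze)   # 2880 1080.0 -> squashed: same ~2.67:1 aspect, smaller pixel area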

Including the concepts behind activeArea and targetArea supports a very different goal. Instead of maintaining the same active image at all stages, this looks to automate some part of the image workflow in Post. This is exponentially more complex than the previously stated goal by Chris.

Some immediate concerns:

  1. Order of operations now matters. Each given software platform has its own under-the-hood implementation of transforming images (resizing, mapping areas, blanking, and so on). Translating the intended order of the operations successfully from an AMF to each of these systems is a sizable task, especially when we expect that translation to be perfectly consistent across all software platforms.
  2. This assumes operators know what their activeArea will be in pre-production. While we would love this to be the case, in reality I reckon that most of the time this is not decided. Who would be responsible for including it after an AMF is initially generated on-set? What software would do that? Is this any easier than doing whatever people are already doing to extract these images for VFX pulls, or is it just moving the task from one mechanism to another?
  3. This makes AMF implementation exponentially more difficult for vendors to support, and therefore much more likely to fail adoption. I would imagine these are major features to add support for.

I hope these concerns don’t come across as too pessimistic. Ultimately I think achieving a successful unified color management system is already changing the world, so I am just worried about weighing it down with more objectives that don’t directly benefit color management. If everyone is looking at this as a major selling factor to describe ACES as a system that can preserve the complete viewing recipe, then I’d suggest making it required metadata.


Thank you @jmcc and @JamesEggleton for the posts.
You both make very interesting points. On some of them I have an opinion and ideas, on others less so. I hope I will have the chance to discuss them in person soon enough.
As you all have noticed, this topic is now growing too large and risks distracting everyone from the main aim of this adventure: color management.
The working group decided today to exclude the Framing metadata from the AMF for now.

However, we all agreed that this is a very hot topic, and we all hope that we will find the time and the (more appropriate) space to discuss it deeper and further, and to hopefully find a solution together.

Let’s keep up the good work!
