Film Trailer Generation via Task Decomposition: Results: Ablation Studies

7 Jun 2024

Authors:

(1) Pinelopi Papalampidi, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(3) Mirella Lapata, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

C. Results: Ablation Studies

D. Task Decomposition Analysis

How Narrative Structure Connects with Trailers. According to screenwriting theory [22], the five TPs segment movies into six thematic units, namely “Setup”, “New Situation”, “Progress”, “Complications and Higher Stakes”, “Final Push”, and “Aftermath”. To examine which parts of the movie are most prevalent in a trailer, we compute the distribution of shots per thematic unit in gold trailers (using the extended development set of TRIPOD). As shown in Figure 4, trailers on average contain shots from all sections of a movie, even from the last two, which might reveal the ending. Moreover, the largest share of trailer shots (30.33%) comes from the middle of the movie (i.e., “Progress”), and a substantial share comes from the beginning (i.e., 16.62% and 25.45% for “Setup” and “New Situation”, respectively). These empirical observations corroborate industry principles for trailer creation.[10]
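The per-unit distribution described above can be sketched as follows. This is a minimal illustration for a single movie, not the authors' code: the function name, the input format (five TP positions given as shot indices, plus the movie-shot indices that the trailer shots map to), and the convention that a shot lying exactly on a TP belongs to the following unit are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter

UNITS = [
    "Setup",
    "New Situation",
    "Progress",
    "Complications and Higher Stakes",
    "Final Push",
    "Aftermath",
]


def unit_distribution(tp_positions, trailer_shot_positions):
    """Percentage of trailer shots that fall in each thematic unit.

    tp_positions: five increasing movie-shot indices for TP1..TP5
        (hypothetical input format).
    trailer_shot_positions: movie-shot indices the trailer shots map to.
    """
    if len(tp_positions) != 5 or list(tp_positions) != sorted(tp_positions):
        raise ValueError("expected five TP positions in increasing order")
    # bisect_right returns how many TPs lie at or before the shot, which is
    # the index (0..5) of the thematic unit containing it (assumed convention).
    counts = Counter(bisect_right(tp_positions, s) for s in trailer_shot_positions)
    total = len(trailer_shot_positions)
    return {
        unit: (100.0 * counts[i] / total if total else 0.0)
        for i, unit in enumerate(UNITS)
    }
```

Averaging these per-movie percentages over all development-set trailers would give a distribution comparable in form to the one reported in Figure 4.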

Next, we examine how often trailers include the different types of key events denoted by TPs. Table 7 presents the percentage of trailers (on the development set) that include at least one shot per TP. As can be seen, more than half of the trailers (i.e., 52.63% and 55.26%) include shots related to the first two TPs, whereas only 34.21% of trailers have any information about the two final ones. This is expected, since the first TPs are introductory to the story and hence more important for making trailers, whereas the last two may contain spoilers and are often avoided.
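A coverage statistic of this kind could be computed roughly as below. The sketch assumes hypothetical inputs: for each movie, the set of movie-shot indices that appear in its trailer, and a mapping from each TP label to the set of shots annotated as that TP. Neither the data layout nor the function name comes from the paper.

```python
TPS = ["TP1", "TP2", "TP3", "TP4", "TP5"]


def tp_coverage(trailer_shots, tp_shots):
    """Percentage of trailers that contain at least one shot of each TP.

    trailer_shots: dict movie -> set of movie-shot indices used in the trailer.
    tp_shots: dict movie -> dict TP label -> set of shot indices for that TP.
    Both structures are illustrative assumptions.
    """
    movies = list(trailer_shots)
    coverage = {}
    for tp in TPS:
        # A trailer "covers" a TP if the two shot sets intersect.
        hits = sum(
            1 for m in movies if trailer_shots[m] & tp_shots[m].get(tp, set())
        )
        coverage[tp] = 100.0 * hits / len(movies) if movies else 0.0
    return coverage
```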

How Sentiment Connects with Trailers. Empirical rules for trailer making[11] suggest that a trailer should start with shots of medium intensity to captivate the viewers, then decrease the sentiment intensity in order to deliver key information about the movie, and finally build up the tension until it reaches a climax.

Here, we analyze the sentiment flow in real trailers from our development set based on predicted sentiment scores (see Sections 3.5 and 4). Specifically, we compute the absolute sentiment intensity (i.e., regardless of positive/negative polarity) per shot in the (true) trailers. In accordance with our experimental setup, we again map trailer shots to movie shots based on visual similarity and consider the corresponding sentiment scores predicted by our network. We then segment the trailer into three equal sections and compute the average absolute sentiment intensity per section. Table 8 presents the results. As expected, on average, the second part is the least intense, whereas the third has the highest sentiment intensity. Finally, when we again segment each trailer into three equal sections and measure the sentiment flow from one section to the next, we find that 46.67% of the trailers follow a “V” shape, similar to our sentiment condition for generating proposal trailers with GRAPHTRAILER.
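The section-level analysis can be illustrated with a short sketch. It assumes a list of signed per-shot sentiment scores in trailer order. The test for a “V” shape used here (the middle section is less intense than both the first and the last) is one plausible reading of the description, not a definition taken from the paper, and the function names are hypothetical.

```python
def section_intensities(scores, n_sections=3):
    """Mean absolute sentiment intensity for each of n (near-)equal sections."""
    n = len(scores)
    if n < n_sections:
        raise ValueError("need at least one shot per section")
    # Integer boundaries that split the shot sequence as evenly as possible.
    bounds = [i * n // n_sections for i in range(n_sections + 1)]
    return [
        sum(abs(s) for s in scores[a:b]) / (b - a)
        for a, b in zip(bounds, bounds[1:])
    ]


def is_v_shaped(scores):
    """Assumed criterion: intensity drops in the middle, then rises again."""
    first, middle, last = section_intensities(scores)
    return middle < first and middle < last
```

Applying `is_v_shaped` to every trailer and taking the fraction of `True` results would yield a statistic of the same form as the 46.67% reported above.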

Examples of Walks in GRAPHTRAILER. We present in Figures 5 and 6 a real example of how GRAPHTRAILER operates over a sparse (shot) graph for the movie “The Shining”. Here, we show the algorithm’s inner workings on a further pruned graph for better visualization (Step 1; Figure 5), while in reality we use the full graph as input to GRAPHTRAILER.

Figure 5. Run of the GRAPHTRAILER algorithm for the movie “The Shining”. Step 1 illustrates the shot-level graph (pruned for better visualization) with colored nodes representing the different types of TPs predicted in the movie (i.e., TP1, TP2, TP3, TP4, TP5). Our algorithm starts by sampling a shot identified as TP1 by VIDEOGRAPH (Step 1). For each next step, we only consider the immediate neighborhood of the current shot (i.e., 6–12 neighbors) and select the next shot based on the following criteria: (1) semantic similarity, (2) time proximity, (3) narrative structure, and (4) sentiment intensity (Steps 2–4). Our algorithm continues in Figure 6.

We begin with shots that have been identified as TP1 (i.e., “Opportunity”; introductory event for the story). We sample a shot (i.e., bright green nodes in graph) and initialize our path. For the next steps (2–7; in reality, we execute up to 10 steps, but we excluded a few for brevity), we only examine the immediate neighborhood of the current node and select the next shot to be included in the path based on the following criteria: (1) semantic coherence, (2) time proximity, (3) narrative structure, and (4) sentiment intensity. We give more details about how we formalize and combine these criteria in Section 3.1.
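The walk described above can be approximated by a greedy traversal. The sketch below is only illustrative: the actual formalization and combination of the four criteria is given in Section 3.1 and is not reproduced here, so the `score` callable is a placeholder that the caller must supply. The adjacency-dict graph format, the rule against revisiting shots, and the name `greedy_walk` are likewise assumptions.

```python
def greedy_walk(neighbors, start, score, max_shots=10):
    """Build a path by repeatedly picking the best-scoring unvisited neighbor.

    neighbors: dict shot -> iterable of adjacent shots (assumed graph format).
    start: initial shot, e.g. one sampled among the shots predicted as TP1.
    score: callable (path, candidate) -> float; a stand-in for the combined
        coherence / time proximity / narrative / sentiment criteria.
    max_shots: upper bound on path length (assumed to mirror "up to 10 steps").
    """
    path = [start]
    visited = {start}
    while len(path) < max_shots:
        # Only the immediate neighborhood of the current shot is considered.
        candidates = [n for n in neighbors.get(path[-1], ()) if n not in visited]
        if not candidates:
            break
        best = max(candidates, key=lambda n: score(path, n))
        path.append(best)
        visited.add(best)
    return path
```

As a toy usage example, a score based on time proximity alone could be `lambda path, n: -abs(n - path[-1])` when shots are identified by their index in the movie. Swapping in a different `score`, or letting a person choose among `candidates`, corresponds to the human-in-the-loop use of the approach.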

We observe that our algorithm manages to stay close to important events (colored nodes) while creating the path, which means that we reduce the probability of selecting random shots that are irrelevant to the main story. Finally, in Step 8, Figure 6, we assemble the proposal trailer by concatenating all shots in the retrieved path. We also illustrate the path in the graph (i.e., red line).

An advantage of our approach is that it is interpretable and can be easily used as a tool with a human in the loop. Specifically, given the immediate neighborhood at each step, one could select shots based on different automatic criteria or even manually. Our approach drastically reduces the number of shots that need to be reviewed to create a trailer sequence, to around 10% of the movie. Moreover, our criteria allow users to explore different sections of the movie and create diverse trailers.

Figure 6. We continue to build the trailer path by selecting the next shot from the immediate neighborhood based on interpretable criteria (Steps 5–7). Finally, we assemble the proposal trailer by concatenating the shots in the path. Our algorithm allows users to review candidate shots at each step and manually select the best one while taking into account our criteria. GRAPHTRAILER allows users to create trailers by only reviewing around 10% of the movie based on recommendations which are interpretable (e.g., coherence with previous shot, relevance to story or intensity).

This paper is available on arXiv under the CC BY-SA 4.0 DEED license.


[10] https://archive.nytimes.com/www.nytimes.com/interactive/2013/02/19/movies/awardsseason/oscar-trailers.html?_r=0

[11] https://www.derek-lieu.com/blog/2017/9/10/the-matrix-is-a-trailer-editors-dream