21206mp4

The input is a human-language query like "Find the new building" or "Highlight the moved chair."

The output is a visual "heatmap" or mask overlaying the video, showing that the AI successfully located the change requested in the text.

Technical Significance

This file is used to demonstrate that the architecture can:

Use text tokens to focus only on specific changes rather than every pixel difference (like shadows or lighting).

Precisely outline the changed object using an MLP segmentation head.
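
The two capabilities described here, text-guided focus and an MLP segmentation head, can be illustrated with a minimal pure-Python sketch. Nothing below comes from the actual architecture: the cosine-similarity gating, the two-layer per-pixel MLP, the function names (`text_gate`, `mlp_head`, `segment`), the 2-dimensional features, and the hand-picked weights are all assumptions chosen only to make the idea concrete.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def text_gate(change_feats, text_token):
    # Assumed gating scheme: scale each pixel's change feature by its
    # non-negative cosine similarity to the text token, so changes
    # unrelated to the query are suppressed.
    return [
        [[max(0.0, cosine(f, text_token)) * v for v in f] for f in row]
        for row in change_feats
    ]

def mlp_head(feat, w1, b1, w2, b2):
    # Hypothetical two-layer per-pixel MLP: ReLU hidden layer, then a
    # sigmoid that yields a mask probability for this pixel.
    hidden = [max(0.0, dot(w, feat) + b) for w, b in zip(w1, b1)]
    return sigmoid(dot(w2, hidden) + b2)

def segment(change_feats, text_token, w1, b1, w2, b2, threshold=0.5):
    gated = text_gate(change_feats, text_token)
    heat = [[mlp_head(f, w1, b1, w2, b2) for f in row] for row in gated]
    mask = [[1 if p > threshold else 0 for p in row] for row in heat]
    return heat, mask

# Toy 2x2 frame. Feature axis 0 stands for "object change" and axis 1
# for "lighting change"; this encoding is purely illustrative.
change_feats = [
    [[1.0, 0.0], [0.0, 1.0]],   # moved object | shadow
    [[0.0, 0.0], [0.9, 0.1]],   # no change    | mostly object
]
text_token = [1.0, 0.0]         # stand-in for an encoded query

# Hand-picked weights: the MLP reacts to any change magnitude, so the
# text gating alone decides which changes survive.
w1, b1 = [[4.0, 4.0]], [0.0]
w2, b2 = [1.0], -2.0

heat, mask = segment(change_feats, text_token, w1, b1, w2, b2)
print(mask)  # [[1, 0], [0, 1]]
```

In this toy setup the shadow pixel would pass the MLP head if it were fed in ungated, since the head responds to change magnitude alone. The text gating is what removes it from the mask, which is the behavior the list above attributes to the text tokens.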