SAM-PT

Segment Anything Meets Point Tracking (arXiv:2307)

TL;DR

This paper introduces SAM-PT, a point-centric interactive video segmentation method that combines SAM with long-term point tracking.
However, its performance still lags behind existing SOTA methods, and the pipeline is somewhat complicated and heuristic.

Motivation

samptfig1

The authors presumably wanted to bring an image segmentation foundation model, namely SAM, to the video segmentation domain. To do this, a point tracking method such as CoTracker is used.

Method

SAM-PT

samptfig2

SAM-PT works in four steps:

1. Select query points on the first frame.
2. Propagate the points to every frame of the video using the tracker module.
3. Generate segmentation masks with SAM for each frame independently.
4. Optionally, reinitialize the process by sampling new query points from the predicted masks.
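
The four steps above can be sketched as a small driver loop. This is only an illustration under stated assumptions, not the authors' implementation: `track`, `segment`, and `resample` are hypothetical callables standing in for the point tracker, SAM, and the point sampler, and the choice to let consecutive windows share their boundary frame is an assumption made here.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y) pixel coordinates
Mask = List[List[int]]        # binary mask as nested lists, 1 = object


def sam_pt_sketch(
    frames: Sequence[object],
    query_points: List[Point],
    track: Callable[[Sequence[object], List[Point]], List[List[Point]]],
    segment: Callable[[object, List[Point]], Mask],
    resample: Callable[[Mask], List[Point]],
    horizon: int = 8,
    reinit: bool = True,
) -> List[Mask]:
    """Return one mask per frame. All three callables are hypothetical stand-ins."""
    points = query_points                          # step 1: query points on frame 0
    masks = [segment(frames[0], points)]
    t = 0
    while t < len(frames) - 1:
        # The window holds the current query frame plus up to `horizon` new frames.
        window = frames[t:t + horizon + 1]
        # Step 2: `track` returns one point list per frame in the window,
        # including the query frame itself at index 0.
        trajectories = track(window, points)
        # Step 3: segment every new frame independently from its tracked points.
        for frame, frame_points in zip(window[1:], trajectories[1:]):
            masks.append(segment(frame, frame_points))
        # Step 4 (optional): drop the old points and sample new ones from the last mask.
        points = resample(masks[-1]) if reinit else trajectories[-1]
        t += len(window) - 1
    return masks
```

Passing the three components in as callables keeps the sketch independent of any particular tracker or SAM wrapper.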

Query Points Selection

The first step is to generate multiple query points. Since the paper focuses on interactive VOS and semi-supervised VOS, there are two ways to generate the initial points.

samptfig3

For semi-supervised VOS, multiple points are sampled from the first-frame mask using various methods (see Fig. 3). For interactive point-based VOS, the user-provided points are used directly.
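
As a rough illustration of the semi-supervised case, the sketch below draws positive query points uniformly at random from the foreground pixels of a mask. Uniform sampling is used here only as the simplest stand-in; it is not claimed to match any specific strategy in Fig. 3, and the function name and mask representation are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def sample_points_from_mask(
    mask: Sequence[Sequence[int]], k: int, seed: int = 0
) -> List[Tuple[int, int]]:
    """Pick up to k distinct (x, y) points uniformly from pixels where mask is non-zero."""
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(foreground, min(k, len(foreground)))


# Toy 3x4 mask with three foreground pixels.
toy_mask = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
points = sample_points_from_mask(toy_mask, k=2)
```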

Point Tracking

The points are then propagated across all frames of the video using an off-the-shelf point tracker such as PIPS or CoTracker.

Segmentation

samptfig4

To prompt SAM, the positive and negative points are combined and fed to it. The model actually works in two passes: the first pass is the standard point-prompted prediction, and the second pass takes its points from the last step of the first pass. In the second pass, negative points provide a more nuanced distinction between the object and the background. The second pass can be repeated a variable number of times to refine the mask.
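
To make the combined prompt concrete, the sketch below merges positive and negative points into one coordinate list and one label list, using SAM's convention that label 1 marks a foreground point and label 0 a background point. The helper name is an assumption made for illustration; in the `segment_anything` package, lists like these would be converted to arrays and passed as `point_coords` and `point_labels` to `SamPredictor.predict`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def build_point_prompt(
    positive: Sequence[Point], negative: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """Combine positive and negative points into (coords, labels) for a point prompt."""
    coords = [list(p) for p in positive] + [list(p) for p in negative]
    labels = [1] * len(positive) + [0] * len(negative)  # 1 = object, 0 = background
    return coords, labels


coords, labels = build_point_prompt(
    positive=[(10.0, 20.0), (12.0, 24.0)],
    negative=[(50.0, 60.0)],
)
```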

Point Tracking Reinitialization

Reinitialization is optional. When enabled, it is executed once predictions for a horizon of $h=8$ frames are done: all previous points are discarded, and new points are generated from the last segmentation mask.
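
To show when reinitialization would fire, the sketch below lists the frame windows for a video of a given length. It assumes that consecutive windows share their boundary frame, since the points sampled from the last mask serve as queries on that frame; this detail and the function name are assumptions, not something stated above.

```python
from typing import List, Tuple


def reinit_windows(num_frames: int, horizon: int = 8) -> List[Tuple[int, int]]:
    """Return (query_frame, last_frame) index pairs, one per tracking window."""
    windows = []
    start = 0
    while start < num_frames - 1:
        end = min(start + horizon, num_frames - 1)
        windows.append((start, end))
        start = end  # points are re-sampled on the last mask of the window
    return windows


schedule = reinit_windows(num_frames=20, horizon=8)
# schedule == [(0, 8), (8, 16), (16, 19)]
```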

SAM-PT vs. Object-centric Mask Propagation

sampttab1

The comparison between SAM-PT and previous methods is reported in Tab. 1.

iDeA: Why should we use this method despite its inferior performance compared to previous methods?

Experiments

This section is intentionally left blank.

Discussion

The method is interesting and sounds promising. However, it remains unpersuasive why we need this multi-step, complicated, and somewhat heuristic approach.
