You can find the korean version of this post in here

Video Object Segmentation (VOS)

Semi-Supervised Video Object Segmentation (Semi-supervised VOS)

Also known as one-shot video object segmentation, it uses either the first frame mask or the bounding box of the target object as a reference. The goal is then to track the object along the entire frame and generate segmentation masks of the object.

You can refer to AOT, STCN, RDE, SWEM, XMem and DeAOT for the mask initialization scheme, and to SimMask, FTMU for box-initialization scheme.

Interactive Video Object Segmentation (Interactive VOS)

It follows the setting of the interactive track of the DAVIS 2019 Challenge. The goal is to track the object given by the scribbles in the first frame for the target object.

Unsupervised Video Object Segmentation (Unsupervised VOS)

Also known as zero-shot video object segmentation, this setting does not takes any supervisions and just segment the primary objects in a video automatically.

 

 


Video Instance Segmentation (VIS)

First proposed in Video instance segmentation (ICCV 2019), this task is to segment all objects in the predefined category, maintaining instance label number even if the object disappeared and reappeared. It can be seen as an extension of image instance segmentation task to video domain.

visfig1