You can find the korean version of this post in here

Video Object Segmentation (VOS)

Semi-Supervised Video Object Segmentation (Semi-supervised VOS)

ssvosfig1 Also known as one-shot video object segmentation, it uses either the first frame mask or the bounding box of the target object as a reference. This task is introduced in this CVPR 2017 paper. The goal is then to track the object along the entire frame and generate segmentation masks of the object.

You can refer to AOT, STCN, RDE, SWEM, XMem and DeAOT for the mask initialization scheme, and to SimMask, FTMU for box-initialization scheme.

 

 

Interactive Video Object Segmentation (Interactive VOS)

It follows the setting of the interactive track of the DAVIS 2019 Challenge. The goal is to track the object given by the scribbles in the first frame for the target object.

 

 

Unsupervised Video Object Segmentation (Unsupervised VOS)

Also known as zero-shot video object segmentation, this setting does not takes any supervisions and just segment the primary objects in a video automatically.

 

 

Weakly Supervised VOS

refer to this ECCV 2012 paper

 

 

Referring Video Object Segmentatino (RVOS)

Referring Video Object Segmentation (RVOS) is the process of segmenting and tracking objects in a video based on provided textual descriptions. For more details, please refer to this post.

 

 


Video Instance Segmentation (VIS)

First proposed in Video instance segmentation (ICCV 2019), this task is to segment all objects in the predefined category, maintaining instance label number even if the object disappeared and reappeared. It can be seen as an extension of image instance segmentation task to video domain.

visfig1


etc

RLE Encoding

Refer to this video.