Tags

Segmentation
Semantic
Superpixel
Vision Transformer
ViT
Open-Vocabulary
Retrieval
CLIP
Pre-training