SAM-CLIP: Merging Vision Foundation Models Towards Semantic and Spatial Understanding
Published in CVPR eLVM Workshop, 2024