SAM-CLIP: Merging Vision Foundation Models Towards Semantic and Spatial Understanding

Published in CVPR eLVM Workshop, 2024
