Fine-tuning CLIP in spectral space for multimodal sentiment analysis