Similar Items: A multimodal speech-based pipeline with joint emotion analysis for Vietnamese service quality assessment