A multimodal speech-based pipeline with joint emotion analysis for Vietnamese service quality assessment