A Clinical-Contextual Framework for Multimodal Tuberculosis Detection From Asynchronous CXR and Lung Sound Data Using Lesion-Guided Pairing and Decision Fusion