>>936632807
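'caption' is most likely a STRING output, so just connect it to the 'text' input of the CLIP Text Encode node. On older ComfyUI versions you may first have to right-click that node and convert the 'text' widget to an input.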
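If you drive ComfyUI through its API-format JSON instead of the graph editor, the same hookup is just a link in the CLIP Text Encode node's inputs, written as `[source_node_id, output_index]`. Here's a minimal sketch. It assumes 'caption' is the captioning node's first output (index 0). The node ids, the checkpoint filename and the `MyCaptionNode` class name are made up for illustration:

```python
import json

# Node ids, the checkpoint filename and "MyCaptionNode" are hypothetical.
# Only CheckpointLoaderSimple and CLIPTextEncode are stock ComfyUI nodes.
prompt = {
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "model.safetensors"},  # placeholder filename
    },
    "10": {
        # Hypothetical: stands in for whatever node outputs 'caption'.
        "class_type": "MyCaptionNode",
        "inputs": {},
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            # Link format is [source_node_id, output_index]; this assumes
            # 'caption' is output 0 of node "10" and feeds it into 'text'.
            "text": ["10", 0],
            # CheckpointLoaderSimple outputs MODEL (0), CLIP (1), VAE (2).
            "clip": ["4", 1],
        },
    },
}

# Show just the CLIP Text Encode node so the link is easy to see.
print(json.dumps(prompt["6"], indent=2))
```

If 'caption' isn't the first output on your node, change the `0` to match its position.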