OKAY...I need help.

I want subtitles to display in sync with the audio, so that each word shows up at the moment it is being said.

My thought so far is to split the text into words and add them to a label one at a time, spaced out across the length of the audio itself, but the timing hasn't been quite right. What would you all recommend? I was debating triggering words based on when the audio picks up instead, like a minimum decibel level or something.
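To make the first idea concrete, here is a rough sketch of the logic, written in Python rather than GDScript just to show the math. The function names are made up for illustration, and weighting each word by its character count is my own assumption (a guess that longer words take longer to say), not anything Godot provides.

```python
def word_timings(text, audio_length):
    """Give each word a start time, assuming speaking time is
    proportional to the word's character count (an assumption)."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    timings = []
    elapsed = 0.0
    for w in words:
        timings.append((w, elapsed))
        # Each word takes a share of the clip proportional to its length.
        elapsed += audio_length * len(w) / total_chars
    return timings


def visible_text(timings, playback_position):
    """Return the words whose start time has already been reached."""
    return " ".join(w for w, start in timings if start <= playback_position)
```

In Godot the clip length and current position would presumably come from something like `AudioStream.get_length()` and `AudioStreamPlayer.get_playback_position()`, with the label text updated every frame. This cannot account for pauses or uneven pacing in the recording, which may be why the timing drifts.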

This is in Godot btw.