>>514838784
>It knows words, but there is no understanding behind those words
I don't think that's the issue; rather, it's caught in the limbo between the meaning the user might imply and the meaning that is most common in the training data.
It then ignores "to the brim" because "full" means "restaurant-full" more often than not.
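A toy way to picture it (made-up numbers, obviously not how any actual model works internally): if the reading of "full" is scored purely by how often it shows up in the data, the literal one loses no matter what the prompt says.
[code]
# Toy sketch, not a real model: score each reading of "full" purely by
# how common it is in the training data, ignoring the "to the brim"
# modifier entirely. Numbers are invented for illustration.
readings = {
    "restaurant-full (mostly full, still usable)": 0.92,  # common sense of "full"
    "full to the brim (liquid at the rim)": 0.08,         # rare literal sense
}

def pick_reading(scores):
    # Frequency alone decides; the modifier never enters the scoring.
    return max(scores, key=scores.get)

print(pick_reading(readings))  # -> restaurant-full wins, "to the brim" is ignored
[/code]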
In other words, it doesn't (can't) imagine what the user might mean; it just assumes a probable answer based on the training data. But in the "cup" problem it's not entirely that. There, it assumes the user identifies top and bottom correctly and goes straight to the few rare explanations (like a novelty item) under which the user's premise might hold true. That looks like it can't imagine a cup, but I'd say it's really a failure to clue in on the user's lax use of "top" and "bottom" there. It's basically a different layer of language comprehension breaking down in each case.