7/25/2025, 10:21:53 PM
>>106024124
>Transformers are just optimized Markov processes (added memory feature)
they aren't Markov models
if they were, you could model them as a Markov process, and that is not the case: the next-token distribution depends on the whole context window, not just on the previous token.
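A toy sketch of the point above (hypothetical two-sequence corpus, purely illustrative): a first-order Markov model predicts from the last token alone, while a model that conditions on the entire prefix, as a transformer does within its context window, can distinguish contexts that end in the same token.

```python
from collections import defaultdict

# Toy corpus: "a b" is followed by "c", but "x b" is followed by "d".
corpus = [["a", "b", "c"], ["x", "b", "d"]]

# First-order Markov (bigram) model: P(next | previous token only).
bigram = defaultdict(lambda: defaultdict(int))
for seq in corpus:
    for prev, nxt in zip(seq, seq[1:]):
        bigram[prev][nxt] += 1

def markov_predict(context):
    """Prediction depends only on the last token (the Markov property)."""
    counts = bigram[context[-1]]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Full-context model: P(next | entire prefix), the way a transformer
# conditions on its whole window.  Here, a trivial lookup over prefixes.
full = defaultdict(lambda: defaultdict(int))
for seq in corpus:
    for i in range(1, len(seq)):
        full[tuple(seq[:i])][seq[i]] += 1

def full_context_predict(context):
    counts = full[tuple(context)]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# Both contexts end in "b", so the Markov model cannot tell them apart...
print(markov_predict(["a", "b"]))   # {'c': 0.5, 'd': 0.5}
print(markov_predict(["x", "b"]))   # {'c': 0.5, 'd': 0.5}
# ...while the full-context model can.
print(full_context_predict(["a", "b"]))  # {'c': 1.0}
print(full_context_predict(["x", "b"]))  # {'d': 1.0}
```

(Strictly speaking you could call the entire window a "state" and recover a Markov chain, but over an exponentially large state space, which is exactly why modeling them that way fails in practice.)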
>but that they are probabilistic models with big multidimensional matrixes as inputs.
the fact that they are probabilistic models doesn't mean they are tractable under classical probability frameworks; if you try to do the same with pure probability distributions you will fail miserably.
You use classical probability and statistics when you actually understand the relations between your variables and know which probability distribution might fit your data. When you have no idea what the distribution is, you use deep learning, at the cost of not even knowing the resulting probability functions.
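A minimal sketch of the contrast (synthetic data, just to illustrate): when you can justify a distributional assumption, classical statistics gives you a closed-form answer; the deep-learning route applies when no such assumption holds, and then you no longer get a named density.

```python
import random
import statistics

random.seed(0)

# If you can assume the data is Gaussian, the maximum-likelihood fit is
# simply the sample mean and the (1/n) standard deviation, in closed form.
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)  # MLE uses the population (1/n) form
print(round(mu_hat, 2), round(sigma_hat, 2))  # close to 5.0 and 2.0

# When no known family fits, a learned density model can still match the
# data, but it hands you samples and scores, not an explicit closed-form
# distribution you can reason about analytically.
```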
And don't say "muh it's just Gaussian mixtures", because that's more bullshit: internally the model could be fitting kinds of distributions you don't even know.