Thread 16714614 - /sci/ [Archived: 427 hours ago]

Anonymous
7/3/2025, 12:22:45 PM No.16714614
How does error correction work?

Let's say I have a machine that calculates y = mx, where x is the real number input and m is a real parameter stored in the machine, and y is then the real number output.

Suppose the machine comes from the factory with m=2 installed. Now I have some peculiar process which I wish to capture that gives me y=3 from x=2. The machine with factory settings would instead output y=4. I need to correct it. So I set it in correction mode and feed it (2, 3) to correct the error.

What are the numerous ways it could correct itself?
Replies: >>16714629
Anonymous
7/3/2025, 12:42:31 PM No.16714629
>>16714614 (OP)
define a loss function L = (y - m*x)^2
take the derivative of L with respect to m
dL/dm = -2x*(y - m*x)
then
m_1 = m_0 - η*dL/dm (gradient descent)
then loop
or
+standard linear regression
+recursive least squares
+linear quadratic gaussian control (kalman filter)
etc.
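
Rough sketch of that loop in Python (the pair (2, 3), the learning rate and the step count are just illustrative choices, nothing canonical):

x, y = 2.0, 3.0   # one observed pair from the process OP wants to capture
m = 2.0           # factory setting
eta = 0.01        # learning rate, chosen small enough to converge

for step in range(200):
    grad = -2.0 * x * (y - m * x)   # dL/dm for L = (y - m*x)^2
    m = m - eta * grad              # gradient descent update

print(m)   # converges toward y/x = 1.5, where the loss is zero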
Replies: >>16714630
Anonymous
7/3/2025, 12:46:44 PM No.16714630
>>16714629
Where does this notion of a "loss function" L(m; y, x) and trying to minimize it come from?
And more specifically, how did they figure out that L should be the square of the difference at each point?
Replies: >>16715017 >>16715051
Anonymous
7/3/2025, 6:16:13 PM No.16715017
>>16714630
>Where does this notion of a "loss function" L(m; y, x) and trying to minimize it come from?
I believe from statistical decision theory, but it became standard in optimization generally, and these days kids learn the term because it’s now considered part of the basics of machine learning. The word “loss” doesn’t really make sense in all contexts, so people sometimes call it a “cost function” instead: a big error is expensive!

The choice of sum-of-squares (“squared L2 norm”) is, I think, mostly convention; it has some nice basic features, e.g. it’s nonnegative, smooth, and easy for a computer to compute. But these guys use other ones too; look up “elastic net” for a kinda strange one. Generally, asking these guys for a theoretical reason is a non-starter, as they are basically Prompt Engineers tweaking parameters, running tests, and picking the ones that seem to work.
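
Minimal sketch of a few of those choices side by side (Python/numpy; the residuals, parameters and penalty weights are made-up numbers, and this is the textbook elastic-net penalty, not any particular library's exact parameterization):

import numpy as np

residuals = np.array([0.5, -1.2, 0.3])      # y - m*x over a few points (made up)
w = np.array([2.0, -0.5])                   # some parameter vector (made up)

squared_loss  = np.sum(residuals**2)        # sum of squares ("squared L2 norm")
absolute_loss = np.sum(np.abs(residuals))   # L1 alternative

# elastic net = squared loss + a mix of L1 and squared-L2 penalties on the parameters
lam1, lam2 = 0.1, 0.1                       # penalty weights (made up)
elastic_net = squared_loss + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w**2)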
Replies: >>16715751
Anonymous
7/3/2025, 6:53:12 PM No.16715051
>>16714630
If you want to approximate or estimate something, you want to minimize the error of your procedure. Now, how to define the error is a choice; there are infinitely many choices. Informally, your error should be some measure of distance or dissimilarity between the estimate and the true vector.

A loss function is, loosely speaking, some function that aggregates your errors across the many predictions/approximations/estimations you have made and spits out a single number. Naturally, the loss function should be increasing in the errors and consequently you want to minimize the loss function in your predictions to minimize your errors across the sample.

Choosing the square of the difference as the error arises from Euclidean geometry. In Euclidean geometry, for example the standard Cartesian xy plane, the distance between two points is the length of the straight line joining them. Using the Pythagorean theorem, the distance between two points, e.g. between a:=(3,-1) and b:=(4,5), is then easily seen to be the square root of the sum of squared differences, i.e. d(a,b) = ( (3-4)^2 + (-1-5)^2 )^(1/2) = sqrt(37).

So if we want to minimize the error between our prediction vectors and the actual vectors in the sample, a natural choice for the distance that measures the error is the straight-line distance between points, thus arriving at the squared differences. The square root is almost never applied to the error because the square root is monotone increasing, so the predictions that minimize the Euclidean distance also minimize that same distance squared. Applying the square root would just add another layer of computation without changing the minimizer, so it can be omitted.

Also, quadratic functions work nicely with optimization: they yield a differentiable problem, which would not be the case if you used absolute values of the differences as errors (the absolute value is not differentiable at zero).
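
Concrete sketch of all that for OP's y = m*x model (Python/numpy; the data points are made up): minimizing the sum of squared differences has a closed form, and taking the square root of the loss first wouldn't change which m wins.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])          # made-up inputs
y = np.array([1.4, 3.1, 4.4, 6.2])          # made-up noisy outputs of some process

# minimize sum((y - m*x)^2) over m: set the derivative to zero => m = sum(x*y) / sum(x*x)
m_hat = np.sum(x * y) / np.sum(x * x)

def sse(m):                                 # sum of squared errors
    return np.sum((y - m * x) ** 2)

# sqrt is monotone increasing, so the argmin of sqrt(sse) is the same m
ms = np.linspace(0.0, 3.0, 3001)
best_sse  = ms[np.argmin([sse(m) for m in ms])]
best_dist = ms[np.argmin([np.sqrt(sse(m)) for m in ms])]
print(m_hat, best_sse, best_dist)           # all three agree (up to the grid spacing)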
Replies: >>16715751
Anonymous
7/4/2025, 2:42:06 PM No.16715751
>>16715051
>>16715017
Isn't it actually true that the square is chosen because it is optimal in some sense under certain statistical assumptions?
Replies: >>16715893
Anonymous
7/4/2025, 6:07:33 PM No.16715893
>>16715751
It plays a role, but it is not the only reason and the assumptions are not always met. You are thinking of https://en.wikipedia.org/wiki/Gauss-Markov_theorem I assume. Or maybe you're thinking of the fact that MLE and OLS coincide in a linear model when the errors are assumed Gaussian.
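
(The MLE/OLS coincidence is quick to check: if y_i = m*x_i + e_i with e_i ~ N(0, σ^2), the negative log-likelihood is (n/2)*log(2*π*σ^2) + (1/(2σ^2)) * Σ_i (y_i - m*x_i)^2, and the only part that depends on m is the sum of squares, so maximizing the likelihood over m is exactly minimizing the squared errors.)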

Classically, statisticians cared a lot about unbiased estimators, but modern statisticians care more about minimizing MSE or some other criterion rather than minimizing variance within a class of unbiased estimators.

Many modern statistical problems have a high dimensional flavor where classical estimators are suboptimal or ill-defined and so the Gauss-Markov theorem is not relevant. In these high dimensional problems one often wants to exploit some sort of low dimensional structure in the data, and that's where regularization such as the elastic net the other guy was talking about comes in.
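
Toy illustration of that (Python/numpy, made-up dimensions and data): with more parameters than observations, plain least squares is ill-defined because X'X is singular, while adding a ridge penalty (the squared-L2 half of the elastic net) makes the problem well-posed again.

import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                              # fewer observations than parameters (made up)
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]           # low dimensional structure: only 3 coefficients matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# OLS normal equations: X'X is p x p but has rank at most n < p, so it's singular
print(np.linalg.matrix_rank(X.T @ X))      # 20, not 50 -> no unique OLS solution

# ridge: minimize ||y - X b||^2 + lam * ||b||^2, which has a unique closed form
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge[:5])                      # the first few coefficients dominate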
Replies: >>16715905
Anonymous
7/4/2025, 6:41:28 PM No.16715905
>>16715893
Is this actually even true though?
Aren't the high dimensional estimators still just using squares and roots?
Whereas some high degree polynomial requires structure. If I am thinking of decomposition of signals, the processing is actually using structures. Whereas market and AI approaches are using squares/roots. Even the elastic net is hinging on squares.
And isn't this because of the relation of a population to its averages and derivatives and the generalized pythagorean theorem describing a simple dimensional relation, of course. Doesn't the use of some further analysis require an inherent geometry - that is, in some sense the resultant would be an assumption?
Replies: >>16715942
Anonymous
7/4/2025, 7:52:03 PM No.16715942
>>16715905
>Is this actually even true though?
Is what true?

>Aren't the high dimensional estimators still just using squares and roots?
An estimator is not automatically classical if it uses squares or roots. Those are two of the basic mathematical operations, many estimators in many different settings will use these operations.

>Whereas some high degree polynomial requires structure.
I don't know what you mean by that. If anything it's the reverse: if you have a number of random points with little structure, you can fit a polynomial arbitrarily well to those points provided you use a high enough degree polynomial. At that point you're just overfitting, but yeah.
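
E.g. (Python/numpy, made-up random points): a degree n-1 polynomial passes exactly through any n points with distinct x values, structure or not.

import numpy as np

rng = np.random.default_rng(1)
n = 8
x = np.linspace(0.0, 1.0, n)
y = rng.standard_normal(n)                          # pure noise, no structure at all

coeffs = np.polyfit(x, y, deg=n - 1)                # degree n-1 polynomial through n points
print(np.max(np.abs(np.polyval(coeffs, x) - y)))    # ~0: fits the noise exactly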

>If I am thinking of decomposition of signals, the processing is actually using structures. Whereas market and AI approaches are using squares/roots. Even the elastic net is hinging on squares.

What? Are you talking about signal processing? And that signal processing somehow doesn't use squares or roots but "structures"? You need to be clearer and define things or use common terminology. A lot of signal processing is done in L^2 or l^2, and there you have an infinite-dimensional analog of Euclidean distance that uses squares again, with integrals or with infinite sums.

>And isn't this because of the relation of a population to its averages and derivatives and the generalized pythagorean theorem describing a simple dimensional relation, of course. Doesn't the use of some further analysis require an inherent geometry - that is, in some sense the resultant would be an assumption?

I honestly don't understand much of this. Interpreting the 2nd sentence charitably, you are talking about the fact that if you're "minimizing things with squares", things have to be square-integrable/summable, so you are most likely in a Hilbert space such as L^2 or l^2, where the inner product gives rise to a geometry?
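
Rough sketch of what that geometry buys you (Python/numpy, made-up discretized signals): the squared l^2 norm is just an inner product of a signal with itself, and for orthogonal pieces the squared norms add, which is the Pythagorean-theorem connection being gestured at.

import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
f = np.sin(2 * np.pi * t)                   # two orthogonal "signals" (made up)
g = np.sin(4 * np.pi * t)

def inner(u, v):                            # discretized l^2 inner product
    return np.sum(u * v) / len(u)

print(inner(f, g))                                      # ~0: orthogonal
print(inner(f + g, f + g), inner(f, f) + inner(g, g))   # squared norms add up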
Anonymous
7/5/2025, 1:17:42 AM No.16716173
Under what conditions is the loss function being quadratic in the errors optimal? What does it even mean, optimal?
Anonymous
7/5/2025, 4:00:58 AM No.16716239
[image: the missile]
>The missile knows where it is at all times. It knows this because it knows where it isn't. By subtracting where it is from where it isn't, or where it isn't from where it is (whichever is greater), it obtains a difference, or deviation. The guidance subsystem uses deviations to generate corrective commands to drive the missile from a position where it is to a position where it isn't, and arriving at a position where it wasn't, it now is. Consequently, the position where it is, is now the position that it wasn't, and it follows that the position that it was, is now the position that it isn't. In the event that the position that it is in is not the position that it wasn't, the system has acquired a variation, the variation being the difference between where the missile is, and where it wasn't. If variation is considered to be a significant factor, it too may be corrected by the GEA. However, the missile must also know where it was. The missile guidance computer scenario works as follows. Because a variation has modified some of the information the missile has obtained, it is not sure just where it is. However, it is sure where it isn't, within reason, and it knows where it was. It now subtracts where it should be from where it wasn't, or vice-versa, and by differentiating this from the algebraic sum of where it shouldn't be, and where it was, it is able to obtain the deviation and its variation, which is called error.
Replies: >>16716939
Anonymous
7/6/2025, 1:53:39 AM No.16716939
>>16716239
wew
Replies: >>16718627
Anonymous
7/7/2025, 5:20:27 PM No.16718627
>>16716939
kek