
lexx


Total Posts: 64 
Joined: Apr 2006 


Hi! was wondering if anyone knows a solution to the following problem : Given two vectors X* and Y* I want to create approximations to both of them, minimizing mean square errors, such that they are orthogonal to each other at the same time:
Min (X*X)^2 + (Y*Y)^2
X'*Y=0
Solution definitely exists, since I can regress one against the other and use residual, but it's not exactly the same as above 




kuebiko


Total Posts: 41 
Joined: May 2018 


I think the problem is not convex.
You are jointly solving for x and y... concatenate them into a single solution vector [x;y]. Without the constraint, you have two uncoupled trivial quadratic programs:
min_{x,y} 0.5*[x;y]’*[I, 0; 0, I]*[x;y]  [x*;y*]’*[x;y]
(in pseudo matlab notation)
But the orthogonalization constraint is a quadratic
[x;y]’*[0, I; I, 0]*[x;y] <= 0 and so you have a quadraticallyconstrained quadratic program.
For this to be convex the matrix appearing in the quadratic constraint needs to be positive definite. In that case you can use e.g. conic programming.




lexx


Total Posts: 64 
Joined: Apr 2006 


Thanks a lot!  interesting take. Didn't think about it this way. Will simplify then the problem, by adding orthogonality constraint with penalty 




nikol


Total Posts: 1172 
Joined: Jun 2005 


(X,Y) is defined as orthogonal basis in the same plane as (X*,Y*).
Define matrixC: Cxx = (X*, X)/X^2 Cxy = (X*, Y)/Y^2 Cyx = (Y*, X)/X^2 Cyy = (Y*, Y)/Y^2
(.,.)  scalar multiplication
It must be that X* = Cxx.X + Cxy.Y Y* = Cyx.X + Cyy.Y ADDED === or Z* = C.Z, Z = (X,Y) ===
You can redefine (X, Y) in terms of (X*, Y*), which gives matrix M ADDED === X = Mxx.X* + Mxy.Y* Y = Myx.X* + Myy.Y* or Z = inv(C).Z* = M.Z* ===
Hence, you want to find min Q(M), where Q(M) = ((Mxx1).X* + Mxy.Y* )^2 + (Myx.X* + (Myy1).Y*)^2
So, it is just about finding 4 numbers from 4equations:
dQ/dMij = 0, where i and j ={x,y}
The key is that it is 2dim problem in (x,y)space. If (X*,Y*)=0 (already orthogonal), then (X,Y) = (X*,Y*) For colinear vectors, X* = k.Y*, there is no solution.




kuebiko


Total Posts: 41 
Joined: May 2018 


Nikol, that’s a really nice way of looking at it, but I think there’s still something undetermined about the system. If you write out your four derivative equations, you’ll find that only two are unique. The orthogonality requirement gives you a third equation, but I think we’re still missing a fourth. Unless I’m misunderstanding something (very possible)... 




nikol


Total Posts: 1172 
Joined: Jun 2005 


> you’ll find that only two are unique
Nope, I don't share this conclusion. The set of dQ/dM is not degenerate. Repeat your calculations again. I did it on the back of envelope, just lazy to type it in here.
(above, added some more lines) 



kuebiko


Total Posts: 41 
Joined: May 2018 


You're right, I made a mistake and didn't say what I meant to say. The system is indeed full rank. But what I mean is, I think they are trivial. Have you tried solving the system for arbitrary vectors x* and y*? You get the identity solution for M unless you incorporate the orthogonality constraint.
Either way, the orthogonality condition has to be incorporated somehow, which it is not if we only consider dQ/dM (again, unless I'm missing something).
And we can see the orthogonality condition is nonlinear. You write
Z = M.Z*
So if we require Z'Z = 0 (inner product vanishes), then we need M to satisfy
Z*'M'MZ* = 0





nikol


Total Posts: 1172 
Joined: Jun 2005 


I admit, my notation in ADDED is a bit misleading. It was to illustrate the solution path.
From your Z'Z = 0 (with my Z=(X,Y)) you have (X,Y)'(X,Y) = X^2  Y^2 = 0, where X^2=Y^2, which is not what I mean.
By definition, it must be that X.Y = 0 and if X = Mxx.X* + Mxy.Y* Y = Myx.X* + Myy.Y* then (Mxx.X* + Mxy.Y*)'.(Myx.X* + Myy.Y*) = 0
> Z*'M'MZ* = 0
Honestly, I don't see where you try to lead the solution. It has simple intuitive picture:
Any two non colinear vectors (X*,Y*) define 2Dplane, which can have any basis (X,Y) (upto rotation). Then, the question was to find such X,Y, that the sum of distances between (X,X*) and (Y,Y*) are minimal. What I did is assumed arbitrary solution (X,Y), projected (X*,Y*) to that basis with C, then inverted it into M and expressed (X,Y) as a function of (X*,Y*), or (X(X*,Y*), Y(X*,Y*)). After that I am left with M only. 



kuebiko


Total Posts: 41 
Joined: May 2018 


My point in writing Z*'M'MZ* = 0 is that this isn't a purely linear system for the entries in M, because there is a nonlinear constraint on M. When jointly solving for X and Y, the problem is nonconvex (see my first reply). I understand the geometric intuition but I don't see how you can solve this with your approach. Can you give your numerical solution for some simple 2D example? What if X*=[1,0] and Y*=[1,1], so they are not orthogonal. What do you get for the entries in M? 




nikol


Total Posts: 1172 
Joined: Jun 2005 


Yep, you're right. I get Mxx=Myy=1, Mxy=Myx=0 and, therefore, Z=Z* :(
ADDED: Understood all you said about linear system. Indeed, all my writing about Cmatrix was left out from discussion and, therefore, orthogonality condition is not included. 



lexx


Total Posts: 64 
Joined: Apr 2006 


nikol and kuebiko  many thanks for the interesting discussion, now see the problem from different perspective 




ronin


Total Posts: 594 
Joined: May 2006 


> Solution definitely exists, since I can regress one against the other and use residual, > but it's not exactly the same as above
I think it is the same as above. Wlog set X*=0.
Then your problem is min X^2 + (Y*Y)^2 subject to X.Y=0. It's just an orthogonal projection.

"There is a SIX am?"  Arthur 


frolloos


Total Posts: 124 
Joined: Dec 2007 


Why can't you use GramSchmidt orthogonalisation? I am likely missing something here, but if the hard constraint is that they are orthogonal then you can use GramSchmidt, can't you? Whether the mean square error is minimised I don't know, and I am not sure you can do both at once.
https://en.wikipedia.org/wiki/Gram–Schmidt_process 
No vanna, no cry 



frolloos


Total Posts: 124 
Joined: Dec 2007 


dbl post. 
No vanna, no cry 


nikol


Total Posts: 1172 
Joined: Jun 2005 


the problem with GS is that u1=v1, i.e. first (arbitrary) vector, v1, always goes alongside with u1 generated orthogonal set and, therefore, u1v1 = 0.
as far as I understood the initial problem, lexx wants {u}'s to be equidistant from {v}'s. 




kuebiko


Total Posts: 41 
Joined: May 2018 


So it’s definitely biconvex, meaning you can fix y and solve for x, then fix x and solve for y, and repeat in alternating fashion until convergence. But in general biconvexity doesn’t guarantee a global optimum. So you’d have to play around with it and see if it’s sensitive to your choice of initialization. This is a simple problem so somebody who knows more about this may know whether or not there’s a global solution here. 



ETwode


Total Posts: 6 
Joined: May 2017 


I could be wrong but I think it isn't quite as simple as orthogonal projection. Consider the case in 2d where X* and Y* are the standard basis vectors rotated inward toward each other by the same angle. We expect that the solution is to symmetrically push them both back outward (projecting onto the axes), and can show that eg. regressing X* on Y* and taking X to be the residual gives larger error than this. Every case where X* = Y* reduces to this situation.
Manipulating the KKT conditions suggests that the general solution is: X = (1u^2)^1 [X*  uY*], Y = (1u^2)^1 [Y*  uX*], with u = (s  sqrt(s^2  4r^2)) / 2r, s = X*^2 + Y*^2, r = X*'Y* which passes some quick/basic sanity checks, but I haven't convinced myself the reasoning is 100% correct. 




lexx


Total Posts: 64 
Joined: Apr 2006 


Wow! thanks for the discussion! Didn't expect it
Wanted to share initial motivation for the problem :
I have 2 factors  equity characteristics, say value and quality. And want to combine them together, based on how they perform. They often do become highly correlated, so better results I could get if to orthogonalize them  which I can do by regressing one against the other and using residuals instead of the original factor. But results depend on which one is regressed and which is kept as is (tried different factors). Residualized factor is dominated by the one kept asis (ETwode's point regarding errors) So the posed problem would avoid that, and be "fair" to both factors  nikol's point of equidistance. GS procedure I believe creates orthogonal basis consisting of linear combinations of the vectors, if i'm not mistaken. I would like to avoid that, since interpretation becomes difficult. Min their respective errors, somewhat preserves the "nature" (to some degree) of approximating factors. 







