
filthy


Total Posts: 1257 
Joined: Jun 2004 


given one normally distributed random number, i can generate another with a given correlation. but can this be easily extended so i can generate N series with a given correlation matrix? if only i could remember where i learned the first trick i might be able to generalize it... 
"Game's the same, just got more fierce" 
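The "first trick" in the question is most likely the standard two-variable construction. Here is a minimal sketch in standard-library Python; the thread's own snippet is MATLAB, and the function name `correlated_pair` is hypothetical. Draw two independent standard normals z1 and e, then set z2 = rho*z1 + sqrt(1 - rho^2)*e. This is exactly the 2 x 2 case of the Cholesky construction, with lower factor [[1, 0], [rho, sqrt(1 - rho^2)]].

```python
import math
import random


def correlated_pair(rho, rng=random):
    """Return (z1, z2): standard normals with correlation rho (-1 <= rho <= 1)."""
    z1 = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)  # independent of z1
    # z2 has unit variance: rho^2 + (1 - rho^2) = 1, and cov(z1, z2) = rho
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * e
    return z1, z2
```

Over many draws the sample correlation of the pairs approaches rho.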



IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


If you want to generate N series of M samples each with a given correlation, generate an M x N matrix X of independent standard normal random variables. Take your N x N correlation matrix C and perform a Cholesky (or some other) decomposition so that you have
C = A^t A.
Set
Y = X A.
Y is an M x N matrix whose columns have the desired correlation.
(pretty sure I got that right)
Good luck 
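Here is a minimal standard-library Python sketch of this recipe. No linear-algebra package is assumed, and the function names are hypothetical. It builds the upper-triangular A with A^t A = C, then forms each row of Y = X A from a row of independent N(0,1) draws. In practice one would call a library routine instead, such as MATLAB's chol or numpy.linalg.cholesky (which returns the lower factor).

```python
import math
import random


def cholesky_upper(c):
    """Return upper-triangular A with A^T A = C, for symmetric positive definite C."""
    n = len(c)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal: C[i][i] minus the squares already placed above it in column i
        s = c[i][i] - sum(a[k][i] ** 2 for k in range(i))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            a[i][j] = (c[i][j] - sum(a[k][i] * a[k][j] for k in range(i))) / a[i][i]
    return a


def correlated_normals(m, corr, rng=random):
    """Return m rows of Y = X A, where X has i.i.d. N(0,1) entries."""
    a = cholesky_upper(corr)
    n = len(corr)
    ys = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Row vector times upper-triangular A: y_j = sum_{k <= j} x_k * A[k][j]
        ys.append([sum(x[k] * a[k][j] for k in range(j + 1)) for j in range(n)])
    return ys
```

For example, `correlated_normals(5000, [[1.0, 0.5], [0.5, 1.0]], random.Random(1))` gives 5000 rows whose two columns have sample correlation close to 0.5.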



Patrik

Founding Member

Total Posts: 1333 
Joined: Mar 2004 


I think this was brought up a couple of days ago. If you search the forum for "cholesky" I'm sure you'll find that thread (and a couple of other threads on the subject), and using google you'll find more than a couple of papers. Good luck.
edit: crossed posts with IAE 
Capital Structure Demolition LLC 



mj


Total Posts: 1049 
Joined: Jun 2004 


most people write C = A A^t (with A lower triangular)
and set Y = A X, where X is N x M, so each column of Y is one correlated draw
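In this convention the factor is lower triangular, and it is the transpose of the upper factor in the A^t A convention. The sketch below uses standard-library Python and hypothetical helper names. It builds L with L L^t = C and draws one correlated vector y = L x. The rows of L are the triangular coefficients that appear later in the thread (Y1 = C11*X1, Y2 = C21*X1 + C22*X2, ...).

```python
import math
import random


def cholesky_lower(c):
    """Return lower-triangular L with L L^T = C, for symmetric positive definite C."""
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = c[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def correlated_vector(l, rng=random):
    """One draw y = L x, where x is a column of i.i.d. N(0,1) variates."""
    x = [rng.gauss(0.0, 1.0) for _ in range(len(l))]
    return [sum(l[i][k] * x[k] for k in range(i + 1)) for i in range(len(l))]
```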




IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


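The built-in MATLAB function chol factors C according to
C = A^t A,
returning the upper-triangular A. The MATLAB code is
function series = nfactor(nsamp, corrmat)
R = chol(corrmat);                  % upper triangular, R'*R = corrmat
x = randn(nsamp, length(corrmat));  % nsamp x N independent N(0,1) draws
series = x*R;                       % columns of series have correlation corrmat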
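To check the output of a routine like nfactor, compare the sample correlation of its columns with corrmat. Below is a standard-library Python sketch of the Pearson sample correlation matrix; the function name is hypothetical, rows are samples and columns are series. For normal data the sampling error of each entry is roughly (1 - rho^2)/sqrt(M) with M samples. A few thousand samples should therefore match corrmat only to within a few hundredths.

```python
import math


def sample_corr(rows):
    """Pearson correlation matrix of the columns of an M x N list of rows."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    dev = [[r[j] - means[j] for j in range(n)] for r in rows]
    # Unbiased sample covariance; the 1/(m - 1) factor cancels in the ratio anyway
    cov = [[sum(d[i] * d[j] for d in dev) / (m - 1) for j in range(n)] for i in range(n)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]
```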





filthy


Total Posts: 1257 
Joined: Jun 2004 


thanks guys. next time i will search harder. 
"Game's the same, just got more fierce" 


SARS


Total Posts: 69 
Joined: Feb 2006 


Just to clarify (and a quick Google search seems to confirm): presumably the Cholesky decomposition works for any elliptical distribution, not just the multivariate normal (which seems to be what generally gets asked here)...
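Some elliptical distributions are scale mixtures of normals, such as the multivariate Student t. For these the Cholesky step is unchanged: build correlated normals as usual, then multiply the whole vector by one common random scale sqrt(nu/W), with W ~ chi-square(nu). The two-variable sketch below uses a hypothetical function name. It relies on random.gammavariate, since chi-square(nu) is a Gamma distribution with shape nu/2 and scale 2. For nu > 2 the output correlation equals rho. However, the components are dependent even when rho = 0, and that is the main practical difference from the normal case.

```python
import math
import random


def bivariate_t(rho, nu, rng=random):
    """One draw from a bivariate Student t with correlation rho and nu degrees of freedom."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)  # correlated normals
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    s = math.sqrt(nu / w)                # one scale shared by both components
    return z1 * s, z2 * s
```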





aaron


Total Posts: 746 
Joined: Mar 2006 


Although this is the common approach, it's easy to see that it makes little sense. Cholesky gives you a series of coefficients, C11, C21, C22, C31, C32, C33,. . .
You have independent Normal variates X1, X2, X3. . . and you want correlated variates Y1, Y2, Y3. . .
Y1 = C11*X1
Y2 = C21*X1 + C22*X2
Y3 = C31*X1 + C32*X2 + C33*X3
and so on. The instability of this process should be obvious. Multiplying and adding long series of numbers exacerbates rounding error. If you get a deviant value in X1, it affects every Y. If your Normal generating routine is even slightly bad, you'll get wrong correlations. When I also mention that the process for computing the C's is subject to exactly the same instabilities, you'll see that squares the problem. On top of all this, the process is computation-intensive and destroyed by small errors in the covariance matrix, and we know covariance matrix estimation is not robust in the first place.
Cholesky works on the blackboard for symbol manipulation, but any time someone shows it to you in code, you know they've done the problem wrong.
Block diagonal decompositions, with correlations between blocks but not directly between individual members of different blocks, make a lot more sense. These give formulae for the Y's in which each Y is expressed as a weighted sum of, say, 10 quantities (some of which are also sums of 10 quantities and so forth), rather than one Y expressed as a sum of one quantity and another Y expressed as a sum of N. The whole process is much more robust and easier to code and maintain. 
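aaron gives no code or reference, so the following only illustrates the kind of structure he describes. The function names are made up, and the setup is deliberately tiny, with just two blocks. Each member is a loading times its block factor plus independent noise. Cross-block dependence enters only through the correlation between the two block factors. The implied correlation between distinct members i and j is beta_i*beta_j within a block and beta_i*beta_j*rho_ab across blocks. Every simulated value is therefore a short weighted sum rather than a sum over all N inputs.

```python
import math
import random


def simulate_two_block(betas_a, betas_b, rho_ab, rng=random):
    """One draw from a hypothetical two-block factor model.

    Block factors F_a, F_b are standard normals with correlation rho_ab.
    A member with loading beta (|beta| <= 1) is beta * F + sqrt(1 - beta^2) * eps.
    """
    f_a = rng.gauss(0.0, 1.0)
    f_b = rho_ab * f_a + math.sqrt(1.0 - rho_ab ** 2) * rng.gauss(0.0, 1.0)

    def members(factor, betas):
        # Each member: block-factor part plus its own independent noise
        return [b * factor + math.sqrt(1.0 - b * b) * rng.gauss(0.0, 1.0) for b in betas]

    return members(f_a, betas_a), members(f_b, betas_b)


def implied_corr(beta_i, beta_j, same_block, rho_ab):
    """Correlation the model implies between two distinct members."""
    return beta_i * beta_j * (1.0 if same_block else rho_ab)
```

The model reproduces only beta_i*beta_j-type correlations, not an arbitrary N x N matrix, which is the trade-off aaron argues is worth making.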



Henrik


Total Posts: 803 
Joined: Nov 2004 


Block diagonal decompositions.....
Aaron, do you have any reference to this?
Cheers,
Henrik 
Friendly ghost 



IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


True. I'd also add that anyone who wants to do this for large matrices has more problems than just rounding error because the whole concept is flawed to begin with (I have no faith in large correlation matrices). If you do need to perform a large dimensional Monte Carlo, it makes sense to decompose the problem hierarchically as you suggested. At the end of the day, you will still need to simulate the smaller blocks though. Doing the smaller blocks via Cholesky (or similarity transforms and eigenanalysis) should be fine I would think.
Cheers 



sanyasi


Total Posts: 52 
Joined: Jun 2005 


Attached File: NALXLW.zip
I'm in the process of converting some libraries for use in Excel. One of the functions available is a Cholesky decomposition. See attached spreadsheet and addin.
Perhaps will be of use to somebody.
In addition to the problems already pointed out about this decomposition to generate correlated random numbers, there is also the issue of this method not being all that suitable for terminal correlation. 




mj


Total Posts: 1049 
Joined: Jun 2004 


you can use it for terminal correlation provided you use the terminal correlation matrix 
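mj's point can be made explicit. Assume driftless Gaussian dynamics dX_i = sigma_i(t) dW_i with instantaneous correlation d<W_1, W_2> = rho(t) dt (this notation is introduced here, not taken from the posts). The terminal correlation of X_1(T) and X_2(T) is then given by the formula below. It is the matrix to feed to the Cholesky step if terminal values are drawn in a single step. It equals a constant instantaneous rho only when the two volatility functions are proportional, for example when both are constant.

```latex
\rho_{12}(T) = \frac{\int_0^T \rho(t)\,\sigma_1(t)\,\sigma_2(t)\,dt}
                    {\sqrt{\int_0^T \sigma_1(t)^2\,dt}\;\sqrt{\int_0^T \sigma_2(t)^2\,dt}}
```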



aaron


Total Posts: 746 
Joined: Mar 2006 


I don't have a reference handy. People who publish about this generally assume you have to identify the blocks statistically. This is the hard problem. In finance, we always have a lot of structure to begin with. If you're simulating security price changes, you know you have sectors like equities, interest rates, currencies, softs, hards and energy. You can subdivide these further as appropriate.
Once you get to a granular level, typically with six to twenty prices, you can rotate into principal components. It's usually enough to take the first principal components of these blocks (although you can take two or more of some if they're important), and put them in a new covariance matrix with 1/6 to 1/20 the number of rows and columns of the original. If that's still too big, you do it again for another level of aggregation.
When you're done, to reconstruct a specific equity price you'll have something like:
P = C0*Global_Equity_Factor + C1*US_Equity_Factor + C2*Manufacturing_Industry_Factor + C3*Electrical_Supplies_Factor + C4*Idiosyncratic_Factor
This reconstruction will not match all N*(N+1)/2 covariances of the overall covariance matrix, but you can't estimate those reliably anyway. It will produce simulated prices that are statistically indistinguishable from the historical series used for covariance estimation, in a robust, meaningful and numerically stable way. The correlation between, say, the sixth principal component of electrical supplies stocks and the eighth principal component of South Asian currency inflation rates is safe to ignore; it's not going to be stable anyway. 
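Here is a standard-library sketch of the "first principal component of a block" step. It uses plain power iteration, an assumed choice made for brevity; a library symmetric eigensolver such as numpy.linalg.eigh is the usual tool. Given a block's correlation matrix, it returns the largest eigenvalue and its unit eigenvector. The block factor is then the projection of the block's standardized returns onto that vector. The eigenvalue divided by the block size is the share of the block's variance that factor explains. Note that power iteration converges slowly when the top two eigenvalues are close.

```python
import math


def first_principal_component(c, iters=500, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n  # start from the equal-weight direction
    lam = 0.0
    for _ in range(iters):
        w = [sum(c[i][k] * v[k] for k in range(n)) for i in range(n)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient w^T C w estimates the leading eigenvalue
        new_lam = sum(w[i] * sum(c[i][k] * w[k] for k in range(n)) for i in range(n))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam, v
```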




SARS


Total Posts: 69 
Joined: Feb 2006 


I'm with Eric. Although I'm no expert on data and methods within the banking world, I can't believe that some of the data issues that face insurers are completely unique. Rounding errors will be the least of your problems with a large matrix, so I personally wouldn't get too hung up about it. 



aaron


Total Posts: 746 
Joined: Mar 2006 


I would say rounding errors underlie many of the problems with large matrices. High dimensions have a way of exaggerating tiny problems, but rounding error is often the grain of sand needed for the pearl.
Also, I have an engineer's sense that there's a right way and a wrong way to do things. When you see code that exaggerates rounding errors, it's usually wrong for other reasons as well. When someone writes:
Y = C0 + C1*X + C2*X^2 + . . .
instead of:
Y = C0 + X*(C1 + X*(C2 +. . .
I know the answer will be wrong (and probably the question as well), just as I know that emails filled with misspellings and bad grammar are probably not sound business proposals, or that a device that makes a lot of noise and heat and bad smells is not the right technology for the job. 
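The nested form is Horner's rule. Below is a small Python sketch of both evaluations (the function names are illustrative). Horner uses one multiplication and one addition per coefficient and never forms explicit powers. It is generally cheaper than summing c_k * x**k and usually at least as accurate.

```python
def poly_naive(coeffs, x):
    """c0 + c1*x + c2*x**2 + ... with explicit powers."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def poly_horner(coeffs, x):
    """Same polynomial by Horner's rule: c0 + x*(c1 + x*(c2 + ...))."""
    result = 0.0
    for c in reversed(coeffs):  # start from the highest-order coefficient
        result = result * x + c
    return result
```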




IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


Just curious, when you say "large", how large are you talking about? 100 x 100? 1000 x 1000? 10^6 x 10^6? I'm still trying to wrap my head around where rounding error comes into the picture (not disagreeing). If I were in a situation where I was worried about rounding errors in a Cholesky factorization of a correlation matrix, I would actually be more worried about the very concept of correlation than about rounding error. If someone handed me code with either way of computing your Y, but the Ci went out to something like i = 10^6, I'd be more worried about their faith in statistics than about their engineering skills.
Anyway, this really gets down to the heart of one of my biggest worries lately, i.e. the risks of risk management. When I talk to other people in risk management, I always ask them what they think about the risks of risk management. Everyone, without fail, says that they are confident in their system, but they think other systems are scary (bordering on useless). One of the many aspects that scare me is the fact that you have teams of PhDs dedicated to computing humongous correlation matrices. Why? I am probably being extremely naive and too simplistic, but I think a large function of risk management is to give people a false sense of security. I have overheard conversations (not at my current place, btw) along the lines of, "Hey, your VaR numbers are not allowing me to allocate what I want to this sector. Go back and modify the correlation matrix until we get the numbers I want." Then a team of PhDs goes back and plays around until they get the correlation matrix they want. There is just so much ambiguity that risk management scares me. Especially when I talk to others who are not mathematically inclined and they try to tell me how I shouldn't worry about stuff because people generally have good risk management systems in place. If they only knew!
Others (smarter than me) have said that the risks of risk management are not that great because people generally do not allocate assets according to VaR (unless they are forced to). I find it hard to believe that anyone who goes through the trouble of "measuring" VaR would not start to panic when VaR numbers double, triple, or quadruple, which they could easily do with volatilities so low across the board.
Anyway, I have only been in this business for 1.25 years and admit to being totally naive. Just throwing some of my thoughts out there and hopefully will learn from my mistakes.
Cheers Eric 



aaron


Total Posts: 746 
Joined: Mar 2006 


Large is not a precise term. A 10x10 correlation matrix looks large when you consider that it measures only the pairwise correlations, not the 1,000 third-order effects, the 10,000 fourth-order ones and so on. But if you're working with analytic values and modern packages, one hundred rows and columns shouldn't cause much of a problem, with one thousand you have to be careful, and ten thousand means the answer is likely meaningless even if you are careful.
Just think of the last simulated random variable in a 10,000x10,000 Cholesky decomposition. It is a sum of 10,000 coefficients, each one multiplied by a N(0,1) factor and summed. If the coefficients were all about the same size, then the error in the result would be about 10,000^0.5 = 100 times the average coefficient error. But when some coefficients are much larger than others, the error grows much faster.
Each of the coefficients is similarly determined by 10,000 multiplications and sums, so in the best possible case, the output error is 10,000 times the input precision. If you have 16 digits of precision in your input correlations (which obviously means analytic correlations, as empirical ones probably have only one significant digit, if that), you can't have more than 11 digits of precision in your output. That's probably acceptable, but with differences in size among the correlations, you might find you have 3 or 1 or 0 significant digits in your output, even with exact, analytic inputs.
As you say, there are also problems with measuring correlations, problems with defining correlation, and more problems with association beyond pairwise correlation. Obviously these are all much larger than the 0.25*10^-16 average input rounding error, and when they are multiplied by 10,000, the noise can easily swamp the signal.
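Here is a quick way to see rounding error build up over a long sum, independent of any Cholesky code. Add the same value 10,000 times with ordinary floating-point addition, then compare with math.fsum, which tracks the low-order bits that plain addition discards. This is a generic illustration (the helper name is hypothetical), not a measurement of the error in a Cholesky factor.

```python
import math


def summation_error(values):
    """Difference between plain left-to-right float summation and math.fsum."""
    naive = 0.0
    for v in values:
        naive += v  # each addition rounds to the nearest double
    return naive - math.fsum(values)  # fsum returns the correctly rounded total
```

With `summation_error([0.1] * 10_000)` the difference is tiny but not zero, and it grows with the number of terms.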
However, things aren't as scary as all that. You don't need to get the correlation matrix exactly right to have useful risk management. In fact, a risk management decision that depended on precise correlations would be highly risky, hence not a good risk management decision. The point of the correlation matrix assumptions is to capture some gross effect of variables moving together, not to make precise VaR measurements. People do tend to report precise VaR, rather than giving only one significant digit or a broad confidence interval, because the number is needed for decision making.
If you report a confidence interval for VaR, or a number with one significant digit, you get a lot of arguments. If you report essentially a random number drawn from the confidence interval, you can make a yes/no decision about whether a desk is over limit. That means a desk close to limit might get ruled under or over depending on the luck of the draw, which isn't a bad outcome since it's impossible to tell whether it's a little under or a little over. There's nothing magic about the limit choice.
Some people like precision, and are horrified by this process. But the point isn't to measure risk to the penny, it's to change behavior. Better to have a somewhat random outcome that everyone accepts and gets back to work than an ambiguous result that provokes arguments. Referees award either zero or one goal, not 0.89 when they're 89% sure a goal was scored. That leads to fewer arguments and better games and people who don't like the randomness should go home and play a game of chess instead of football. Risk managers say over or under limit, not under limit with 89% confidence. That leads to fewer arguments and better trading. People who don't like the randomness should go work on abstract math instead of trading. 




sfca


Total Posts: 904 
Joined: May 2004 


So how do you do these correlated RVs across time? For example, I have 50 equities and there is a payoff in the future based upon how many times they individually hit a barrier. To get a first approximation I want to model 50 correlated series of lognormal equity prices. First, I estimate the correlation of each to the S&P 500 and use that correlation at each weekly timestep to link them in a one-factor type model like RV(equity_i) = sqrt(rho)*sys_e + sqrt(1-rho)*non_sys_e, where sys_e is the systematic risk and non_sys_e is the non-systematic risk. These RVs feed into the growth rates, and each period's equity price is a function of the previous one through S(t) = S(t-1)*exp((u - vol^2/2)*dt + vol*sqrt(dt)*RV).
Two issues. One, is that while for each time period the input correlation is just fine, the output is a nonlinear transformation so that the output correlation (at each single time period taken by itself) may be significantly different. What does one do about the output correlation being different than the input correlation due to the transformation?
Second, when calculating the correlations of the resulting series across time, the correlations may diverge from the inputs because the series are cumulative. Does one do a huge Cholesky for this with each week a new column or what?
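Here is a standard-library sketch of the setup described above, useful for experimenting with both questions. The function name, argument layout and weekly dt = 1/52 are assumptions made for illustration. In this construction each RV_i has unit variance, corr(RV_i, sys_e) = sqrt(rho_i), and corr(RV_i, RV_j) = sqrt(rho_i*rho_j). So if rho_i is measured as the correlation to the S&P 500 itself, a loading of rho_i (with sqrt(1 - rho_i^2) on the idiosyncratic term) would reproduce it. That is an observation about the model as written, not something stated in the thread. Each step's log return is an affine function of RV_i with constant vol, so per-step log returns carry exactly the input correlation. Summing independent weekly steps leaves the correlation of cumulative log returns unchanged. It is the exp that moves the correlation of price levels and simple returns.

```python
import math
import random


def simulate_one_factor(s0, mu, vol, rho, n_steps, dt=1.0 / 52, rng=random):
    """Weekly lognormal paths driven by a one-factor shock model (hypothetical helper).

    s0, mu, vol, rho are per-equity lists; rho[i] enters as in the post:
    RV_i = sqrt(rho_i)*sys_e + sqrt(1 - rho_i)*non_sys_e_i, with 0 <= rho_i <= 1.
    """
    n = len(s0)
    paths = [[s] for s in s0]
    for _ in range(n_steps):
        sys_e = rng.gauss(0.0, 1.0)  # common shock, shared by all equities this step
        for i in range(n):
            rv = math.sqrt(rho[i]) * sys_e + math.sqrt(1.0 - rho[i]) * rng.gauss(0.0, 1.0)
            step = (mu[i] - 0.5 * vol[i] ** 2) * dt + vol[i] * math.sqrt(dt) * rv
            paths[i].append(paths[i][-1] * math.exp(step))  # S(t) = S(t-1) * exp(...)
    return paths
```

Barrier counts can then be read off each path. Because weekly monitoring only sees the discrete points, it will miss some intra-week crossings.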




IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


Hi Aaron,
Thanks a lot for your post. This is great stuff.
What you say here makes a lot of sense:
"If you report a confidence interval for VaR, or a number with one significant digit, you get a lot of arguments. If you report essentially a random number drawn from the confidence interval, you can make a yes/no decision about whether a desk is over limit. That means a desk close to limit might get ruled under or over depending on the luck of the draw, which isn't a bad outcome since it's impossible to tell whether it's a little under or a little over. There's nothing magic about the limit choice."
If I understand correctly, you are saying that when all is said and done, there is a declaration made regarding whether a particular desk is over or under their VaR limit. It is pretty much binary.
What happens when a desk is over their VaR limit?
Something must happen. Things probably start out with a warning, but eventually assets are going to be allocated. The thing that worries me (and it doesn't seem to worry you as much, which is actually reassuring) is the situation where EVERY desk is suddenly SIGNIFICANTLY over their VaR limits. It's not a matter of gradation of some fine numbers. What if absolutely every single desk suddenly comes up in a report as being over their VaR limits? Am I wrong to think that some high level managers will start to freak out? Is it inconceivable that if (when?) vols snap up and all VaR numbers suddenly quadruple that this would cause some serious reallocation of assets? The very process of which will exacerbate volatilities?
For the record, I'm not caught up with precision. On the contrary, the thing that worries me is that a lot of people I talk to have a lot of faith in people like you. It might not be as obvious to them as it is to you that those numbers are really accurate to no more than 1 digit even though 3 or 4 are reported. They think that since there are so many smart people working in risk management, the risk is probably actually managed pretty well. With over $17 trillion in CDS out there, and no one having a clue how to model the risk, you tell me how confident they should be in risk management.
Anyway, part of the point (and to try to tie it back to the original subject of this thread) is this: if you told some non-mathematically inclined manager that you had a group of 10 PhDs constructing a huge correlation matrix, you and I would know that the whole exercise is pretty much pointless and that the resulting empirical correlation matrix contains almost no information, but it is easy to imagine how that manager might be impressed and think things are under control, since so much firepower has been thrown at the problem. I think a large part of the finance industry does not know that a 1000 x 1000 or 10000 x 10000 correlation matrix is pretty useless. A lot of those people are in control of large sums of money. Those are the people who worry me: the ones who will see spikes in VaR numbers across the board (which I view as an almost certainty) and not know how to respond. Not everyone in a decision-making position is as smart as you, unfortunately.
We'll see. Like I said, I have only been working in finance for less than 1.5 years and haven't seen a market crisis yet. When LTCM blew up, I was more interested in hanging out with John Baez and learning about quantum gravity and representations of Lie algebras. I have a lot of history to live before I'll even begin to have a clue (if ever).
Cheers Eric 



tristanreid


I think the most powerful thing constraining this type of model feedback is the existence of other constraints. Many managers can only turn over so much of their portfolio in a limited time without going out of some compliance. I think they generally take those limits a lot more seriously than VAR.
If all managers were constrained by VAR, and there were no other constraints besides VAR, there would still be different reactions to spiking volatility. If one manager is underweight some asset class and the other is overweight in that asset class, conditions of excess volatility will cause both of them to provide each other liquidity as they revert to the index. I don't think we can make too many assumptions about that kind of negative correlation, and that only applies to managers tied to benchmarks, but I think it's safe to say they won't all definitely act 100% in the same direction.
Also, volatility is measured over some time period, so there would have to be a pretty big spike to cause a panic over a short period, even if all models used the same resources to measure volatility. I'll bet there are some systems out there that only update market volatilities once every year or two, or even less often. Are they wrong? IAE, I know you used to work on VAR systems at a previous job, so I'm not asking rhetorically: is there a guideline or rule about how often vols get updated?
t. 
"the only reason it would be easier to program in C is that you can't easily express complex problems in C, so you don't." (comp.lang.lisp) 


Graeme


Total Posts: 1629 
Joined: Jun 2004 


According to the BIS/Basle 1996 VaR rules, one must have strict evaluation procedures, with parameter updates at least quarterly and parameter estimation based on a minimum of a year of historical data, and a sufficiently rich set of risk factors. These factors in particular must capture the volatility risk of all positions and at all maturities.
That may or may not be verbatim, I'm taking it from some old notes of mine. I haven't had the enviable joy of going through much of the Basle II documentation, but given that market VaR is basically unchanged there, I would guess that this rule still stands.
Bit of a joke really. Risk can and should update their parameters every day. It then all depends who risk de facto report to: to management, or to some big report vault in the sky. If the former, daily; if the latter, then quarterly. A lot of politics gets attached to risk management. 
Graeme West 



Graeme


Total Posts: 1629 
Joined: Jun 2004 


I'm coming in quite late on this correlated normal random variable stuff but I would venture to suggest that any matrix bigger than you can print and read on a page (with my shitty eyesight) is too big to put any value on individual entries, and you need to go to some kind of factor model, or PCA. 
Graeme West 



"We'll see. Like I said, I have only been working in finance for less than 1.5 years and haven't seen a market crisis yet. When LTCM blew up, I was more interested in hanging out with John Baez and learning about quantum gravity and representations of Lie algebras. I have a lot of history to live before I'll even begin to have a clue (if ever)."
What's nice about fixed income is that something horrible happens, somewhere, about once every 2-3 years at most, so that everyone gets to see a crisis fairly early in their career. 




IAmEric

Phorgy Phynance Banned

Total Posts: 2961 
Joined: Oct 2004 


I love this topic, but there is already a thread dedicated to it, so I suggest moving the discussion back here, where I plan to (attempt to) address some of tristanreid's questions regarding parameter updates and compliance. 



sfca


Total Posts: 904 
Joined: May 2004 


So I guess my question got ignored. 








