Forums  > Pricing & Modelling  > simulation of correlated random variables  
     

filthy


Total Posts: 1257
Joined: Jun 2004
 
Posted: 2006-04-20 00:09

given one normally distributed random number, i can generate another with a given correlation. but can this be easily extended so i can generate N series with a given correlation matrix? if only i could remember where i learned the first trick i might be able to generalize it...


"Game's the same, just got more fierce"

IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-04-20 00:25
If you want to generate N series consisting of M samples each with a given correlation, generate an M x N matrix X of uncorrelated normally distributed random variables. Take your N x N correlation matrix C and perform a Cholesky (or some other) decomposition so that you have

C = A^t A.

Set

Y = X A.

Y is an M x N matrix whose columns have the desired correlation.

(pretty sure I got that right)

Good luck Beer

Patrik
Founding Member

Total Posts: 1336
Joined: Mar 2004
 
Posted: 2006-04-20 00:28

I think this was brought up a couple of days ago. If you search the forum for "cholesky" I'm sure you'll find that thread (and a couple of other threads on the subject), and using google you'll find more than a couple of papers. Good luck.

edit: crossed posts with IAE


Capital Structure Demolition LLC Radiation

mj


Total Posts: 1049
Joined: Jun 2004
 
Posted: 2006-04-20 01:42
most people do C = A A^t

and set Y = A X
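
e.g., in Matlab, a minimal sketch of that convention (assumes a chol that accepts the 'lower' flag; otherwise just transpose the upper factor):

C = [1 0.5; 0.5 1];       % target correlation matrix
A = chol(C, 'lower');     % lower triangular, so C = A*A'
X = randn(2, 100000);     % uncorrelated N(0,1), one series per row
Y = A*X;                  % rows of Y now have correlation C
corrcoef(Y')              % sample check, should be close to C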



IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-04-20 02:17
The built-in Matlab function chol.m returns the upper-triangular factor A such that

C = A^t A.

The Matlab code is

function series = nfactor(nsamp, corrmat)
% NFACTOR  Generate nsamp draws of correlated standard normal variables;
% corrmat is the desired N x N correlation matrix, series is nsamp x N.
ncorr = length(corrmat);
R = chol(corrmat);          % upper triangular, R'*R = corrmat
x = randn(nsamp, ncorr);    % uncorrelated N(0,1) samples
series = x*R;               % columns of series now have correlation corrmat
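
A quick sanity check of the function above (a sketch; corrmat here is just an arbitrary valid correlation matrix):

corrmat = [1.0 0.6 0.3; 0.6 1.0 0.2; 0.3 0.2 1.0];
series  = nfactor(100000, corrmat);
corrcoef(series)     % sample correlations should land close to corrmat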


Smiley

filthy


Total Posts: 1257
Joined: Jun 2004
 
Posted: 2006-04-20 17:05
thanks guys. next time i will search harder. Blush

"Game's the same, just got more fierce"

SARS


Total Posts: 69
Joined: Feb 2006
 
Posted: 2006-04-20 17:45

Just to clarify (a quick google seems to confirm this): presumably the Cholesky approach works for any elliptical distribution, not just the multivariate Normal that usually gets asked about here....
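
For example, correlated Student-t variates (an elliptical case) can be built from the same Cholesky factor. A sketch using only base Matlab functions (the chi-square draws are built from squared normals, so no toolbox is needed):

C  = [1 0.7; 0.7 1];                    % target (shape) correlation matrix
nu = 5;                                 % degrees of freedom
n  = 100000;
R  = chol(C);                           % upper triangular, R'*R = C
Z  = randn(n, 2)*R;                     % correlated normals
W  = sum(randn(n, nu).^2, 2);           % chi-square(nu) draws
T  = Z .* repmat(sqrt(nu ./ W), 1, 2);  % multivariate t with shape matrix C
corrcoef(T)                             % close to C (correlation exists for nu > 2)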

 


aaron


Total Posts: 746
Joined: Mar 2006
 
Posted: 2006-04-27 00:26

Although this is the common approach, it's easy to see that it makes little sense. Cholesky gives you a series of coefficients, C11, C21, C22, C31, C32, C33,. . .

You have independent Normal variates X1, X2, X3. . . and you want correlated variates Y1, Y2, Y3. . .

Y1 = C11*X1

Y2 = C21*X1 + C22*X2

Y3 = C31*X1 + C32*X2 + C33*X3

and so on. The instability of this process should be obvious. Multiplying and adding long series of numbers exacerbates rounding error. If you get a deviant value in X1, it affects every Y. If your Normal generating routine is even slightly bad, you'll get wrong correlations. When I also mention that the process for computing the C's is subject to exactly the same instabilities, you'll see that squares the problem. On top of all this, the process is computation-intensive and destroyed by small errors in the covariance matrix, and we know covariance matrix estimation is not robust in the first place.

Cholesky works on the blackboard for symbol manipulation, but any time someone shows it to you in code, you know they've done the problem wrong.

Block diagonal decompositions, with correlations between blocks but not directly between individual members of different blocks, make a lot more sense. These give formulae for the Y's in which each Y is expressed as a weighted sum of, say, 10 quantities (some of which are also sums of 10 quantities and so forth), rather than one Y expressed as a sum of one quantity and one Y expressed as a sum of N. The whole process is much more robust and easier to code and maintain.


Henrik


Total Posts: 803
Joined: Nov 2004
 
Posted: 2006-04-27 00:41

Block diagonal decompositions.....

Aaron, do you have any reference to this?

Cheers,

Henrik


Friendly ghost

IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-04-27 00:41
True. I'd also add that anyone who wants to do this for large matrices has more problems than just rounding error because the whole concept is flawed to begin with (I have no faith in large correlation matrices). If you do need to perform a large dimensional Monte Carlo, it makes sense to decompose the problem hierarchically as you suggested. At the end of the day, you will still need to simulate the smaller blocks though. Doing the smaller blocks via Cholesky (or similarity transforms and eigenanalysis) should be fine I would think.

Cheers Beer

sanyasi


Total Posts: 52
Joined: Jun 2005
 
Posted: 2006-04-27 05:30

Attached File: NALXLW.zip

I'm in the process of converting some libraries for use in Excel. One of the functions available is a Cholesky decomposition. See attached spreadsheet and add-in.

Perhaps it will be of use to somebody.

In addition to the problems already pointed out about this decomposition to generate correlated random numbers, there is also the issue of this method not being all that suitable for terminal correlation.  


mj


Total Posts: 1049
Joined: Jun 2004
 
Posted: 2006-04-27 08:37
you can use it for terminal correlation provided you use the terminal correlation matrix

aaron


Total Posts: 746
Joined: Mar 2006
 
Posted: 2006-04-27 14:52

I don't have a reference handy. People who publish about this generally assume you have to identify the blocks statistically. This is the hard problem. In finance, we always have a lot of structure to begin with. If you're simulating security price changes, you know you have sectors like equities, interest rates, currencies, softs, hards and energy. You can subdivide these further as appropriate.

Once you get to a granular level, typically with six to twenty prices, you can rotate into principal components. It's usually enough to take the first principal component of each block (although you can take two or more from some if they're important), and put them in a new covariance matrix with 1/6 to 1/20 the number of rows and columns of the original. If that's still too big, you do it again for another level of aggregation.

When you're done, to reconstruct a specific equity price you'll have something like:

P = C0*Global_Equity_Factor + C1*US_Equity_Factor + C2*Manufacturing_Industry_Factor + C3*Electrical_Supplies_Factor + C4*Idiosyncratic_Factor

This reconstruction will not match all N*(N+1)/2 covariances of the overall covariance matrix, but you can't estimate those reliably anyway. It will produce simulated prices that are statistically indistinguishable from the historical series used for covariance estimation, in a robust, meaningful and numerically stable way. The correlation between, say, the sixth principal component of electrical supplies stocks with the eighth principal component of south asian currency inflation rates is safe to ignore; it's not going to be stable anyway.
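
To make the reconstruction concrete, here is a toy two-level sketch in Matlab (all names, loadings and block sizes are made up for illustration): a global factor, sector factors loading on it, and individual names loading on their own sector factor plus idiosyncratic noise.

nsim  = 100000;                        % number of scenarios
g     = randn(nsim, 1);                % global factor

b_sec = [0.8; 0.6];                    % made-up sector loadings on the global factor
sec   = g*b_sec' + randn(nsim, 2)*diag(sqrt(1 - b_sec.^2));   % unit-variance sector factors

b     = [0.9 0.7 0.5 0.8 0.6 0.4];     % made-up loadings of each name on its sector factor
blk   = [1 1 1 2 2 2];                 % sector membership of the six names
Y     = sec(:, blk) .* repmat(b, nsim, 1) + ...
        randn(nsim, 6) .* repmat(sqrt(1 - b.^2), nsim, 1);    % unit-variance names
corrcoef(Y)                            % the implied (block-structured) correlation matrix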


SARS


Total Posts: 69
Joined: Feb 2006
 
Posted: 2006-04-27 17:03

I'm with Eric. Although I'm no expert on data and methods within the banking world, I can't believe that some of the data issues that face insurers are completely unique. Rounding errors will be the least of your problems with a large matrix - so I personally wouldn't get too hung up about it.


aaron


Total Posts: 746
Joined: Mar 2006
 
Posted: 2006-04-28 15:34

I would say rounding errors underlie many of the problems with large matrices. High dimensions have a way of exaggerating tiny problems, but rounding error is often the grain of sand needed for the pearl.

Also, I have an engineer's sense that there's a right way and a wrong way to do things. When you see code that exaggerates rounding errors, it's usually wrong for other reasons as well. When someone writes:

Y = C0 + C1*X + C2*X^2 + . . .

instead of:

Y = C0 + X*(C1 + X*(C2 +. . .

I know the answer will be wrong (and probably the question as well), just as I know that emails filled with misspellings and bad grammar are probably not sound business proposals, or that a device that makes a lot of noise and heat and bad smells is not the right technology for the job.
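
(In Matlab, the nested form is essentially what polyval does. A trivial sketch, with the coefficients listed highest order first as polyval expects:)

c = [3 2 1];              % 3*x^2 + 2*x + 1
x = 1.5;
y = 0;
for k = 1:length(c)
    y = y*x + c(k);       % Horner: ((3)*x + 2)*x + 1
end
% y agrees with polyval(c, x)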


IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-04-28 16:12
Just curious, when you say "large", how large are you talking about? 100 x 100? 1000 x 1000? 10^6 x 10^6? I'm still trying to wrap my head around where rounding error comes into the picture (not disagreeing). If you were in a situation where you were worried about rounding errors for a Cholesky factorization of a correlation matrix, I would actually be more worried about the very concept of correlation before I would be worried about rounding error. If someone handed me code with either of the ways to compute your Y, but with Ci going out to, like, i = 10^6, I'd be more worried about their faith in statistics than their engineering skills.

Anyway, this really gets down to the heart of one of my biggest worries lately, i.e. the risks of risk management. When I talk to other people in risk management, I always ask them what they think about the risks of risk management. Everyone, without fail, says that they are confident in their system, but they think other systems are scary (bordering on useless). One of the many aspects that scare me is the fact that you have teams of PhDs dedicated to computing humongous correlation matrices. Why? I am probably being extremely naive and too simplistic, but I think a large function of risk management is to give people a false sense of security. I have overheard conversations (not at my current place, btw) along the lines of, "Hey, your VaR numbers are not allowing me to allocate what I want to this sector. Go back and modify the correlation matrix until we get the numbers I want." Then a team of PhDs goes back and plays around until they get the correlation matrix they want. There is just so much ambiguity that risk management scares me. Especially when I talk to others who are not mathematically inclined and they try to tell me how I shouldn't worry about stuff because people generally have good risk management systems in place. If they only knew!

Others (smarter than me) have said that the risks of risk management are not that great because people generally do not allocate assets according to VaR (unless they are forced to). I find it hard to believe that anyone who goes to the trouble of "measuring" VaR would not start to panic when VaR numbers double, triple, or quadruple, which they could easily do with volatilities so low across the board.

Anyway, I have only been in this business for 1.25 years and admit to being totally naive. Just throwing some of my thoughts out there and hopefully will learn from my mistakes.

Cheers Beer
Eric

aaron


Total Posts: 746
Joined: Mar 2006
 
Posted: 2006-05-04 15:55

Large is not a precise term. A 10x10 correlation matrix looks large when you remember that it measures only the pairwise correlations, not the 1,000 third-order effects, the 10,000 fourth-order ones and so on. But if you're working with analytic values and modern packages, one hundred rows and columns shouldn't cause too much trouble, one thousand you have to be careful about, and ten thousand means the answer is likely meaningless even if you are careful.

Just think of the last simulated random variable in a 10,000 x 10,000 Cholesky decomposition. It is a sum of 10,000 terms, each one a coefficient multiplied by an N(0,1) factor. If the coefficients were all about the same size, then the error in the result would be about 10,000^0.5 = 100 times the average coefficient error. But when some coefficients are much larger than others, the error grows much faster.

Each of the coefficients is similarly determined by 10,000 multiplications and sums, so in the best possible case, the output error is 10,000 times the input precision. If you have 16 digits of precision in your input correlations (which obviously means analytic correlations, as empirical ones probably have only one significant digit, if that), you can't have more than 11 digits of precision in your output. That's probably acceptable, but with differences in size among the correlations, you might find you have 3 or 1 or 0 significant digits in your output, even with exact, analytic inputs.
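
One crude way to gauge this in practice is to factor a matrix, rebuild it from the factor, and look at the worst reconstruction error. A sketch (the equicorrelation matrix is just a stand-in; how bad it gets depends on the size and conditioning of your actual matrix):

n   = 1000;
rho = 0.95;                                % high common correlation, poorly conditioned
C   = rho*ones(n) + (1 - rho)*eye(n);      % equicorrelation matrix
R   = chol(C);                             % R'*R = C in exact arithmetic
err = max(max(abs(R'*R - C)));             % worst reconstruction error in floating point
disp(err)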

As you say, there are also problems with measuring correlations, problems with defining correlation, and more problems with association beyond pairwise correlation. Obviously these are all much larger than the 0.25*10^-16 average input rounding error, and when they are multiplied by 10,000, the noise can easily swamp the signal.

However, things aren't as scary as all that. You don't need to get the correlation matrix exactly right to have useful risk management. In fact, a risk management decision that depended on precise correlations would be highly risky, hence not a good risk management decision. The point of the correlation matrix assumptions is to capture some gross effect of variables moving together, not to make precise VaR measurements. People do tend to report precise VaR, rather than giving only one significant digit or a broad confidence interval, because the number is needed for decision making.

If you report a confidence interval for VaR, or a number with one significant digit, you get a lot of arguments. If you report essentially a random number drawn from the confidence interval, you can make a yes/no decision about whether a desk is over limit. That means a desk close to limit might get ruled under or over depending on the luck of the draw, which isn't a bad outcome since it's impossible to tell whether it's a little under or a little over. There's nothing magic about the limit choice.

Some people like precision, and are horrified by this process. But the point isn't to measure risk to the penny, it's to change behavior. Better to have a somewhat random outcome that everyone accepts and gets back to work than an ambiguous result that provokes arguments. Referees award either zero or one goal, not 0.89 when they're 89% sure a goal was scored. That leads to fewer arguments and better games and people who don't like the randomness should go home and play a game of chess instead of football. Risk managers say over or under limit, not under limit with 89% confidence. That leads to fewer arguments and better trading. People who don't like the randomness should go work on abstract math instead of trading.


sfca


Total Posts: 904
Joined: May 2004
 
Posted: 2006-05-04 17:49

So how do you do these correlated RVs across time? For example, I have 50 equities and there is a payoff in the future based upon how many times they individually hit a barrier. To get a first approximation I want to model 50 correlated series of lognormal equity prices. First, I estimate the correlation of each to the S&P 500 and use that correlation at each weekly timestep to link them in a one-factor type model like RV(equity_i) = sqrt(rho)*sys_e + sqrt(1-rho)*non_sys_e, where sys_e is the systematic risk and non_sys_e is the nonsystematic risk. These RVs feed into the growth rates, and each period's equity price is a function of the previous one through S(t) = S(t-1)*exp((u - vol^2/2)*dt + vol*sqrt(dt)*RV).
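
In code, each weekly step looks roughly like this (a sketch; parameter values made up for illustration):

nsim   = 20000;  nweeks = 52;  dt = 1/52;  nstk = 50;
rho    = 0.4 + 0.2*rand(1, nstk);          % made-up index correlations per stock
u      = 0.08;  vol = 0.25;  S0 = 100;
S      = S0*ones(nsim, nstk);
for t = 1:nweeks
    sys  = randn(nsim, 1);                                   % systematic shock
    idio = randn(nsim, nstk);                                % idiosyncratic shocks
    RV   = sys*sqrt(rho) + idio .* repmat(sqrt(1 - rho), nsim, 1);
    S    = S .* exp((u - vol^2/2)*dt + vol*sqrt(dt)*RV);
end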

Two issues. One is that while for each time period the input correlation is just fine, the output is a non-linear transformation, so the output correlation (at each single time period taken by itself) may be significantly different. What does one do about the output correlation being different from the input correlation due to the transformation?

Second, when calculating the correlations of the resulting series across time, the correlations may diverge from the inputs because the series are cumulative. Does one do a huge Cholesky for this, with each week a new column, or what?

   


IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-05-04 22:18
Hi Aaron,

Thanks a lot for your post. This is great stuff.

What you say here makes a lot of sense
If you report a confidence interval for VaR, or a number with one significant digit, you get a lot of arguments. If you report essentially a random number drawn from the confidence interval, you can make a yes/no decision about whether a desk is over limit. That means a desk close to limit might get ruled under or over depending on the luck of the draw, which isn't a bad outcome since it's impossible to tell whether it's a little under or a little over. There's nothing magic about the limit choice.

If I understand correctly, you are saying that when all is said and done, there is a declaration made regarding whether a particular desk is over or under their VaR limit. It is pretty much binary.

What happens when a desk is over their VaR limit?

Something must happen. Things probably start out with a warning, but eventually assets are going to be reallocated. The thing that worries me (and it doesn't seem to worry you as much, which is actually reassuring) is the situation where EVERY desk is suddenly SIGNIFICANTLY over its VaR limit. It's not a matter of gradations of some fine numbers. What if absolutely every single desk suddenly comes up in a report as being over its VaR limit? Am I wrong to think that some high-level managers will start to freak out? Is it inconceivable that if (when?) vols snap up and all VaR numbers suddenly quadruple, this would cause some serious reallocation of assets? The very process of which would exacerbate volatilities?

For the record, I'm not caught up with precision. On the contrary, the thing that worries me is that a lot of people I talk to have a lot of faith in people like you. It might not be as obvious to them as it is to you that those numbers are really accurate to no more than 1 digit even though 3 or 4 are reported. They think that since there are so many smart people working in risk management, the risk is probably actually managed pretty well. With over $17 trillion in CDS out there and no one having a clue how to model the risk, you tell me how confident they should be in risk management.

Anyway, kind of the point (and to try to tie it back to the original subject of this thread) is that if you told some non-mathematically inclined manager that you had a group of 10 PhDs constructing a huge correlation matrix, you and I would know that the whole exercise is pretty much pointless and that the resulting empirical correlation matrix contains almost no information, but it is easy to imagine how that manager may be impressed and think things are under control since so much firepower has been thrown at the problem. I think a large part of the finance industry does not know that a 1000 x 1000 or 10000 x 10000 correlation matrix is pretty useless. A lot of those people are in control of large sums of money. Those are the people that worry me. The ones who will see spikes in VaR numbers across the board (which I view as an almost certainty) and not know how to respond. Not everyone in a decision making position is as smart as you, unfortunately Smiley

We'll see. Like I said, I have only been working in finance for less than 1.5 years and haven't seen a market crisis yet. When LTCM blew up, I was more interested in hanging out with John Baez and learning about quantum gravity and representations of Lie algebras. I have a lot of history to live through before I'll even begin to have a clue (if ever).

Cheers Beer
Eric

tristanreid


Total Posts: 1677
Joined: Aug 2005
 
Posted: 2006-05-05 00:00

I think the most powerful thing constraining this type of model feedback is the existence of other constraints. Many managers can only turn over so much of their portfolio in a limited time without falling out of compliance with something, and I think they generally take those limits a lot more seriously than VAR.

If all managers were constrained by VAR, and there were no other constraints besides VAR, there would still be different reactions to spiking volatility.  If one manager is underweight some asset class and the other is overweight in that asset class, conditions of excess volatility will cause both of them to provide each other liquidity as they revert to the index.  I don't think we can make too many assumptions about that kind of negative correlation, and that only applies to managers tied to benchmarks, but I think it's safe to say they won't all definitely act 100% in the same direction. 

Also, volatility is measured over some time period, so there would have to be a pretty big spike to cause a panic over a short period, even if all models used the same resources to measure volatility. I'll bet there are some systems out there that only update market volatilities once every year or two, or even less often. Are they wrong? IAE, I know you used to work on VAR systems at a previous job, so I'm not asking that rhetorically: is there a guideline or rule about how often vols get updated?

-t.


the only reason it would be easier to program in C is that you can't easily express complex problems in C, so you don't. -comp.lang.lisp

Graeme


Total Posts: 1629
Joined: Jun 2004
 
Posted: 2006-05-05 09:27

According to BIS/Basle 1996 VaR rules, one must have

strict evaluation procedures with parameter updates at least quarterly, and parameter estimation based on a minimum of a year of historical data; a sufficiently rich set of risk factors. These factors in particular must capture volatility risk of all positions and at all maturities.

That may or may not be verbatim; I'm taking it from some old notes of mine. I haven't had the enviable joy Dead of going through much of the Basle II documentation, but given that market VaR is basically unchanged there, I would guess that this rule still stands.

Bit of a joke really. Risk can and should update their parameters every day. It then all depends on whom risk de facto reports to: to management, or to some big report vault in the sky. If the former, daily; if the latter, then quarterly. A lot of politics gets attached to risk management.


Graeme West

Graeme


Total Posts: 1629
Joined: Jun 2004
 
Posted: 2006-05-05 09:33

I'm coming in quite late on this correlated normal random variable stuff but I would venture to suggest that any matrix bigger than you can print and read on a page (with my shitty eyesight) is too big to put any value on individual entries, and you need to go to some kind of factor model, or PCA.


Graeme West

doctorwes


Total Posts: 576
Joined: May 2005
 
Posted: 2006-05-05 11:04

We'll see. Like I said, I have only been working in finance for less than 1.5 years and haven't seen a market crisis yet. When LTCM blew up, I was more interested in hanging out with John Baez and learning about quantum gravity and representations of Lie algebras. I have a lot of history to live through before I'll even begin to have a clue (if ever).

What's nice about fixed income is that something horrible happens, somewhere, about once every 2-3 years at most, so that everyone gets to see a crisis fairly early in their career.




IAmEric
Phorgy Phynance
Banned
Total Posts: 2961
Joined: Oct 2004
 
Posted: 2006-05-05 18:59
I love this topic, but there is already a thread dedicated to it, so I suggest moving the discussion back here, where I plan to (attempt to) address some of tristanreid's questions re parameter updates and compliance.

sfca


Total Posts: 904
Joined: May 2004
 
Posted: 2006-05-05 19:11
So I guess my question got ignored. 