Cholesky Decomposition

Morten was visiting Dyalog clients and forwarded a request: Can we have the Cholesky decomposition?

If A is a Hermitian, positive-definite matrix, its Cholesky decomposition [0] is a lower-triangular matrix L such that A ≡ L +.× +⍉L. The matrix L is a sort of “square root” of the matrix A.

For example:

   ⎕io←0 ⋄ ⎕rl←7*5 ⋄ ⎕pp←6

   A←t+.×⍉t←¯10+?5 5⍴20
   A
231   42  ¯63  16  26
 42  199 ¯127 ¯68  53
¯63 ¯127  245  66 ¯59
 16  ¯68   66 112 ¯75
 26   53  ¯59 ¯75  75

   L←Cholesky A
   L
15.1987   0        0        0       0
 2.7634  13.8334   0        0       0
¯4.1451  ¯8.35263 12.5719   0       0
 1.05272 ¯5.12592  2.1913   8.93392 0
 1.71067  3.48957 ¯1.81055 ¯6.15028 4.33502

   A ≡ L +.× +⍉L
1

For real matrices, “Hermitian” reduces to symmetric and the conjugate transpose +⍉ to the transpose ⍉. Such symmetric positive-definite matrices arise, for example, in solving least-squares problems.

Some writers asserted that an algorithm for the Cholesky decomposition “cannot be expressed without a loop” [1] and that “a Pascal program is a natural way of expressing the essentially iterative algorithm” [2]. You can judge for yourself whether the algorithm presented here belies these assertions.

The Algorithm [3]

A recursive solution for the Cholesky decomposition obtains by considering A as a 2-by-2 matrix of matrices. It is algorithmically interesting but not necessarily the best with respect to numerical stability.

Cholesky←{
 ⍝ Cholesky decomposition of a Hermitian positive-definite matrix
    1≥n←≢⍵:⍵*0.5
    p←⌈n÷2
    q←⌊n÷2
    X←(p,p)↑⍵ ⊣ Y←(p,-q)↑⍵ ⊣ Z←(-q,-q)↑⍵
    L0←∇ X
    L1←∇ Z-T+.×Y ⊣ T←(+⍉Y)+.×⌹X
    ((p,n)↑L0)⍪(T+.×L0),L1
}
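For readers who prefer a conventional notation, here is a pure-Python sketch of the same recursion for real matrices (an illustration, not Dyalog’s implementation; the helper names are mine). One deliberate change: instead of the explicit inverse ⌹X, the bottom-left block T+.×L0 is obtained directly by forward substitution, using the fact that B←T+.×L0 satisfies B+.×+⍉L0 ≡ +⍉Y.

```python
import math

def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a*b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve_rows(L, Y):
    """Solve B @ transpose(L) = Y for B, with L lower triangular,
    by forward substitution on each row of Y."""
    B = []
    for y in Y:
        b = []
        for j in range(len(y)):
            s = sum(b[k] * L[j][k] for k in range(j))
            b.append((y[j] - s) / L[j][j])
        B.append(b)
    return B

def cholesky(A):
    """Recursive block Cholesky of a real symmetric positive-definite
    matrix, mirroring the APL function above."""
    n = len(A)
    if n == 1:                          # 1≥n←≢⍵ : ⍵*0.5
        return [[math.sqrt(A[0][0])]]
    p = -(-n // 2)                      # p←⌈n÷2
    X = [row[:p] for row in A[:p]]      # (p,p)↑⍵
    Y = [row[p:] for row in A[:p]]      # (p,-q)↑⍵
    Z = [row[p:] for row in A[p:]]      # (-q,-q)↑⍵
    L0 = cholesky(X)                    # L0←∇ X
    # B equals T+.×L0: it satisfies B @ L0ᵀ = Yᵀ, so no explicit ⌹X is needed
    B = solve_rows(L0, [list(col) for col in zip(*Y)])
    BBt = matmul(B, [list(col) for col in zip(*B)])
    S = [[Z[i][j] - BBt[i][j] for j in range(n - p)] for i in range(n - p)]
    L1 = cholesky(S)                    # L1←∇ Z-T+.×Y
    top = [row + [0.0] * (n - p) for row in L0]
    bot = [B[i] + L1[i] for i in range(n - p)]
    return top + bot                    # ((p,n)↑L0)⍪(T+.×L0),L1
```

For instance, the classic example cholesky([[25,15,¯5],[15,18,0],[¯5,0,11]]) (APL high-minus written as Python negatives) yields the factor [[5,0,0],[3,3,0],[-1,1,3]].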

The recursive block matrix technique can be used for triangular matrix inversion [4], LU decomposition [5], and QR decomposition [6].

Proof of Correctness

The algorithm can be stated as a block matrix equation:

  ┌───┬───┐          ┌──────────────┬──────────────┐
  │ X │ Y │          │   L0 ← ∇ X   │       0      │
∇ ├───┼───┤  ←→  L ← ├──────────────┼──────────────┤ 
  │+⍉Y│ Z │          │    T+.×L0    │L1 ← ∇ Z-T+.×Y│
  └───┴───┘          └──────────────┴──────────────┘

where T←(+⍉Y)+.×⌹X. To verify that the result is correct, we need to show that A≡L+.×+⍉L and that L is lower triangular. For the first, we need to show:

┌───┬───┐     ┌──────┬───────┐     ┌────────┬────────┐
│ X │ Y │     │  L0  │   0   │     │  +⍉L0  │+⍉T+.×L0│
├───┼───┤  ≡  ├──────┼───────┤ +.× ├────────┼────────┤
│+⍉Y│ Z │     │T+.×L0│   L1  │     │    0   │  +⍉L1  │
└───┴───┘     └──────┴───────┘     └────────┴────────┘

that is:

(a)  X     ≡ L0 +.× +⍉L0
(b)  Y     ≡ L0 +.× +⍉ T+.×L0
(c)  (+⍉Y) ≡ (T+.×L0) +.× +⍉L0
(d)  Z     ≡ ((T+.×L0) +.× (+⍉T+.×L0)) + (L1+.×+⍉L1)

(a) holds because L0 is the Cholesky decomposition of X.

(b) is seen to be true as follows:
L0 +.× +⍉ T+.×L0
L0 +.× +⍉ ((+⍉Y)+.×⌹X)+.×L0    ⍝ definition of T
L0 +.× (+⍉L0)+.×(+⍉⌹X)+.×Y     ⍝ +⍉A+.×B ←→ (+⍉B)+.×+⍉A and +⍉+⍉Y ←→ Y
(L0+.×+⍉L0)+.×(+⍉⌹X)+.×Y       ⍝ +.× is associative
X+.×(+⍉⌹X)+.×Y                 ⍝ (a)
X+.×(⌹X)+.×Y                   ⍝ X and hence ⌹X are Hermitian
I+.×Y                          ⍝ associativity; matrix inverse
Y                              ⍝ identity matrix

(c) follows from (b) by application of +⍉ to both sides of the equation.

(d) turns on that L1 is the Cholesky decomposition of Z-T+.×Y:

((T+.×L0)+.×(+⍉T+.×L0)) + (L1+.×+⍉L1)
((T+.×L0)+.×(+⍉T+.×L0)) + Z-T+.×Y    ⍝ L1 is the Cholesky decomposition of Z-T+.×Y
((T+.×L0)+.×(+⍉L0)+.×+⍉T) + Z-T+.×Y  ⍝ +⍉A+.×B ←→ (+⍉B)+.×+⍉A
(T+.×X+.×+⍉T) + Z-T+.×Y              ⍝ associativity; (a)
(T+.×X+.×+⍉(+⍉Y)+.×⌹X) + Z-T+.×Y     ⍝ definition of T
(T+.×X+.×(+⍉⌹X)+.×Y) + Z-T+.×Y       ⍝ +⍉A+.×B ←→ (+⍉B)+.×+⍉A
(T+.×X+.×(⌹X)+.×Y) + Z-T+.×Y         ⍝ X and hence ⌹X are Hermitian
(T+.×I+.×Y) + Z-T+.×Y                ⍝ associativity; matrix inverse
(T+.×Y) + Z-T+.×Y                    ⍝ identity matrix
Z

Finally, L is lower triangular if L0 and L1 are lower triangular, and they are by induction.

A Complex Example

   ⎕io←0 ⋄ ⎕rl←7*5

   A←t+.×+⍉t←(¯10+?5 5⍴20)+0j1ׯ10+?5 5⍴20
   A
 382        17J131   ¯91J¯124  ¯43J107    20J35
  17J¯131  314      ¯107J5    ¯60J¯154   26J¯137
 ¯91J124  ¯107J¯5   379        49J34     20J137
 ¯43J¯107  ¯60J154   49J¯34   272        35J103
  20J¯35    26J137   20J¯137   35J¯103  324

   L←Cholesky A

   A ≡ L +.× +⍉L
1
   0≠L
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
1 1 1 1 0
1 1 1 1 1

A Personal Note

This way of computing the Cholesky decomposition was one of the topics of [7] and was the connection (through Professor Shlomo Moran) by which I acquired an Erdős number of 2.

References

  0. Wikipedia, Cholesky decomposition, 2014-11-25.
  1. Thomson, Norman, J-ottings 7, The Education Vector, Volume 12, Number 2, 1995, pp. 21-25.
  2. Muller, Antje, Tineke van Woudenberg, and Alister Young, Two Numerical Algorithms in J, The Education Vector, Volume 12, Number 2, 1995, pp. 26-30.
  3. Hui, Roger, Cholesky Decomposition, J Wiki Essay, 2005-10-14.
  4. Hui, Roger, Triangular Matrix Inverse, J Wiki Essay, 2005-10-27.
  5. Hui, Roger, LU Decomposition, J Wiki Essay, 2005-10-31.
  6. Hui, Roger, QR Decomposition, J Wiki Essay, 2005-10-30.
  7. Ibarra, Oscar, Shlomo Moran, and Roger Hui, A Generalization of the Fast LUP Matrix Decomposition Algorithm and Applications, Journal of Algorithms 3, 1982, pp. 45-56.

Three-and-a-bit

The most obvious expression for computing π in APL is ○1. But what if you can’t remember how ○ works, or your ○ key is broken, or you feel like taking the road less travelled? With thanks to Wikipedia’s excellent list of Approximations of π, here are some short sweet APL expressions for three-and-a-bit:

      3                                 ⍝ very short
3
      4                                 ⍝ not so sweet
4
      s←*∘0.5                           ⍝ let's allow ourselves some square roots
      +/s 2 3
3.14626436994197234232913506571557
      31*÷3
3.141380652391393004493075896462748
      +/1.8*1 0.5                       ⍝ Ramanujan
3.141640786499873817845504201238766
      s 7+s 6+s 5
3.141632544503617840472137945142766
      ÷/7 4*7 9
3.141567230224609375
      9⍟995
3.141573605337628094187009177086444
      355÷113
3.141592920353982300884955752212389
      s s 2143÷22                       ⍝ Ramanujan again
3.141592652582646125206037179644022
      +∘÷/3 7 15 1 292                  ⍝ continued fraction
3.14159265301190260407226149477373
      ÷/63 25×17 7+15×s 5
3.14159265380568820189839000630151
      (1E100÷11222.11122)*÷193
3.141592653643822210363178893440074
      (⍟744+640320*3)÷s 163             ⍝ Ramanujan yet again
3.141592653589793238462643383279727

This last one is accurate to more places than I ever learned in my youth!

Technical note: to get plenty of precision, these examples were evaluated with 128-bit decimal floating-point, by setting ⎕FR←1287 and ⎕PP←34.

For more on continued fractions, see cfract in the dfns workspace.
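The continued-fraction entry above can also be checked exactly with rationals. Here is a small Python sketch (my illustration, not part of the original list), evaluating a simple continued fraction right to left in the manner of +∘÷/:

```python
from fractions import Fraction

def cf(terms):
    """Evaluate a simple continued fraction a0 + 1/(a1 + 1/(...)),
    folding from the right like +∘÷/ in APL."""
    acc = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        acc = a + 1 / acc
    return acc

print(cf([3, 7, 15, 1, 292]))   # Fraction(103993, 33102)
```

The convergent 103993÷33102 agrees with π to the nine places shown in the session above.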

What’s Your Favourite Beautiful Squiggle?

Roger’s post speculating on Ken Iverson’s favourite APL expression reminded me that one of the delegates at Dyalog ’14 conducted a quick survey to find the most popular primitive (thanks to Alex Weiner for taking the initiative here!). His findings are reproduced below:

9 votes:
8 votes:
6 votes:
4 votes: ⍠ ⍟ * ⎕
3 votes: ⌽ ¨ ⍎
2 votes: ⍺ ∇ ≢ ← ⊃ ⊢ ⍬
1 vote: ⍉ , ∊ ⍋ ∘ ∧ ⍲ ⊥ ⌈

Unfortunately no reasons were given… Is it because it’s a shape that’s pleasing to the eye, a really nifty piece of functionality or something more esoteric?

As for me, it’s easy – my favourite is the Log glyph (⍟). Not for a technical reason, although it performs a very useful function, nor due to its rather pleasing visual symmetry, but rather because of the way I was introduced to it. An APL virgin when I joined Dyalog 20 months ago, my first exercise was to familiarise myself with APL’s “beautiful squiggles”. When it came to the Log glyph I asked one of my colleagues a question and they dictated a line of APL to me to experiment with. As soon as they referred to ⍟ by its informal name of “splat” that was it, I was entranced. Any language that is so powerful, so concise and yet can make adults have passionate discussions involving the word “splat” has got me for life.

Quicksort in APL

Quicksort is a classic sorting algorithm invented by C.A.R. Hoare in 1961 [0, 1]. It has been known for some time that quicksort has a terse rendition in APL [2]. To get right to it, here is the code:

Q←{1≥≢⍵:⍵ ⋄ S←{⍺⌿⍨⍺ ⍺⍺ ⍵} ⋄ ⍵((∇<S)⍪=S⍪(∇>S))⍵⌷⍨?≢⍵}

The “pivot” ⍵⌷⍨?≢⍵ is randomly chosen. ((∇<S)⍪=S⍪(∇>S)) is a fork, selecting the parts of ⍵ which are less than the pivot, equal to the pivot, and greater than the pivot. The function is recursively applied to the first and last of these three parts.
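The same three-way partition reads as follows in Python (a sketch for comparison; the names are mine):

```python
import random

def quicksort(xs):
    """Three-way quicksort with a random pivot, mirroring Q."""
    if len(xs) <= 1:                # 1≥≢⍵ : ⍵
        return list(xs)
    pivot = random.choice(xs)       # ⍵⌷⍨?≢⍵
    return (quicksort([x for x in xs if x < pivot])     # ∇<S
            + [x for x in xs if x == pivot]             # =S
            + quicksort([x for x in xs if x > pivot]))  # ∇>S
```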

      ⎕io←0 ⋄ ⎕rl←7*5

      ⎕←x←?13⍴20
3 2 19 16 11 4 18 17 9 17 7 3 1
      Q x
1 2 3 3 4 7 9 11 16 17 17 18 19

The variant Q1 obtains by enclosing each of the three parts. Its result exhibits an interesting structure. The middle item of each triplet is the value of the pivot at each recursion. Since the pivot is randomly chosen, the result of Q1 can be different on the same argument, as illustrated below:

      Q1←{1≥≢⍵:⍵ ⋄ S←{⍺⌿⍨⍺ ⍺⍺ ⍵} ⋄ ⍵((⊂∘∇<S)⍪(⊂=S)⍪(⊂∘∇>S))⍵⌷⍨?≢⍵}

      Q1 x
┌──────┬───┬────────────────────────────────────┐
│┌─┬─┬┐│3 3│┌─┬─┬──────────────────────────────┐│
││1│2│││   ││4│7│┌┬─┬─────────────────────────┐││
│└─┴─┴┘│   ││ │ │││9│┌──┬──┬─────────────────┐│││
│      │   ││ │ │││ ││11│16│┌─────────┬──┬──┐││││
│      │   ││ │ │││ ││  │  ││┌┬─────┬┐│18│19│││││
│      │   ││ │ │││ ││  │  ││││17 17│││  │  │││││
│      │   ││ │ │││ ││  │  ││└┴─────┴┘│  │  │││││
│      │   ││ │ │││ ││  │  │└─────────┴──┴──┘││││
│      │   ││ │ │││ │└──┴──┴─────────────────┘│││
│      │   ││ │ │└┴─┴─────────────────────────┘││
│      │   │└─┴─┴──────────────────────────────┘│
└──────┴───┴────────────────────────────────────┘

      Q1 x
┌───────────────────────┬─┬─────────────────────────┐
│┌──────────────────┬─┬┐│9│┌──┬──┬─────────────────┐│
││┌┬─┬─────────────┐│7│││ ││11│16│┌───────────┬──┬┐││
││││1│┌┬─┬────────┐││ │││ ││  │  ││┌┬─────┬──┐│19││││
││││ │││2│┌┬───┬─┐│││ │││ ││  │  ││││17 17│18││  ││││
││││ │││ │││3 3│4││││ │││ ││  │  ││└┴─────┴──┘│  ││││
││││ │││ │└┴───┴─┘│││ │││ ││  │  │└───────────┴──┴┘││
││││ │└┴─┴────────┘││ │││ │└──┴──┴─────────────────┘│
││└┴─┴─────────────┘│ │││ │                         │
│└──────────────────┴─┴┘│ │                         │
└───────────────────────┴─┴─────────────────────────┘

The enlist of the result of Q1 x is the same as Q x, the sort of x:

      ∊Q1 x
1 2 3 3 4 7 9 11 16 17 17 18 19
      Q x
1 2 3 3 4 7 9 11 16 17 17 18 19

This note is meant to explore the workings of a classical algorithm. To actually sort data in Dyalog, it is more convenient and more efficient to use {⍵[⍋⍵]}. Earlier versions of this text appeared in [3, 4].

References

  0. Hoare, C.A.R., Algorithm 63: Partition, Communications of the ACM, Volume 4, Number 7, 1961-07.
  1. Hoare, C.A.R., Algorithm 64: Quicksort, Communications of the ACM, Volume 4, Number 7, 1961-07.
  2. Hui, Roger K.W., and Kenneth E. Iverson, J Introduction and Dictionary, 1991-2014; if. entry.
  3. Hui, Roger K.W., Quicksort, J Wiki Essay, 2005-09-28.
  4. Hui, Roger K.W., Sixteen APL Amuse-Bouches, 2014-11-02.

Ken Iverson’s Favourite APL Expression?

What was Ken Iverson’s favourite APL expression? I don’t know that he had one and if he had I don’t know what it was, but if I have to guess …

From Sixteen APL Amuse-Bouches:

The expression (0,x)+(x,0) or its commute, which generates the next set of binomial coefficients, is present in the document that introduced APL\360 in 1967 [20, Fig.1] and the one that introduced J in 1990 [21, Gc&Gd]; in Elementary Functions: An Algorithmic Treatment in 1966 [22, p.69], in APL\360 User’s Manual in 1968 [23, A.5], in Algebra: An Algorithmic Treatment in 1972 [24, p.141], in Introducing APL to Teachers in 1972 [25, p.22], in An Introduction to APL for Scientists and Engineers in 1973 [26, p.19], in Elementary Analysis in 1976 [27, ex.1.68], in Programming Style in APL in 1978 [28, §6], in Notation as a Tool of Thought in 1980 [29, A.3], in A Dictionary of APL in 1987 [30, m∇n], and probably others.

The expression in action:

   ⎕←x←,1
1
   ⎕←x←(0,x)+(x,0)
1 1
   ⎕←x←(0,x)+(x,0)
1 2 1
   ⎕←x←(0,x)+(x,0)
1 3 3 1
   ⎕←x←(0,x)+(x,0)
1 4 6 4 1
   ⎕←x←(0,x)+(x,0)
1 5 10 10 5 1
   ⎕←x←(0,x)+(x,0)
1 6 15 20 15 6 1

It is easily seen from the expression that the n-th vector of binomial coefficients is a palindrome and that its sum is 2*n.
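In Python terms (my rendering, for readers without an APL keyboard), (0,x)+(x,0) is a pairwise sum of the row padded with a zero at each end:

```python
def next_row(x):
    """(0,x)+(x,0): pad with a zero at each end and add pairwise,
    producing the next row of binomial coefficients."""
    return [a + b for a, b in zip([0] + x, x + [0])]

row = [1]
for _ in range(6):
    row = next_row(row)

print(row)                 # [1, 6, 15, 20, 15, 6, 1]
assert row == row[::-1]    # each row is a palindrome
assert sum(row) == 2 ** 6  # and sums to 2*n
```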

Musings on Reduction

In one man’s humble opinion, reduction (/) is the Queen of Operators.

Each (¨) comes a close second, but doesn’t get the cigar because each can be written in terms of reduction.

Two special cases are of interest: reduction along axes of length 1 (or reduction of a scalar) and reduction along axes of length 0.

With a length-1 axis (or scalar), the operand function is not applied (+⌿'A' → 'A'). This can be useful as an implicit no-op – see DFS video on YouTube.

With a length-0 axis, a primitive operand returns its right identity item – but only if one is defined (⌊⌿⍬). Otherwise: DOMAIN ERROR.

Another way to think about the 0-length axis case is that a right identity item (if there is one) is catenated to the argument prior to the reduction. Functional Programming languages tend to define reduction in this way by supplying an explicit initial value (ival) to the reduction:

    fold fn ival [] = ival
    fold fn ival (x:xs) = fn x (fold fn ival xs)
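The same definition transcribes directly into Python (an illustrative sketch, folding from the right):

```python
def fold(fn, ival, xs):
    """Right fold with an explicit initial value:
    fold fn ival []     = ival
    fold fn ival (x:xs) = fn x (fold fn ival xs)"""
    acc = ival
    for x in reversed(xs):
        acc = fn(x, acc)
    return acc

print(fold(lambda a, w: a * w, 1, [2, 3, 4]))  # 24
print(fold(lambda a, w: a * w, 1, []))         # 1: the initial value
```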

We can write such a variant of ⌿ in APL, supplying the initial value as right operand ⍵⍵:

      fold ← {⍺⍺⌿⍵⍪⍵⍵}        ⍝ right operand ⍵⍵ is initial value

      × fold 1 ⊢2 3 4         ⍝ same as regular ×⌿
24

      {⍺×⍵}fold 1 ⊢2 3 4      ⍝ non-primitive operand function
24

      {⍺×⍵}fold 1 ⊢⍬          ⍝ initial value returned for empty argument
1

Whilst it doesn’t provide the no-op trick for length-1 axes, fold gives us better control for null cases than does primitive reduction, which relies on the single prototypical item of its argument array:

      ⊢mat ← 2 3 ∘.+ 0(0 0)
┌─┬───┐
│2│2 2│
├─┼───┤
│3│3 3│
└─┴───┘

Notice the discontinuity in the depth of the result with regular +⌿ as the number of rows reaches 0:

      +⌿ 2↑mat
┌─┬───┐
│5│5 5│
└─┴───┘

      +⌿ 1↑mat
┌─┬───┐
│2│2 2│
└─┴───┘

      +⌿ 0↑mat              ⍝ Eh?
0 0

Supplying the variant with a prototypical row produces a more uniform convergence:

      +fold 0(0 0) ⊢2↑mat
┌─┬───┐
│5│5 5│
└─┴───┘

      +fold 0(0 0) ⊢1↑mat
┌─┬───┐
│2│2 2│
└─┴───┘

      +fold 0(0 0) ⊢0↑mat   ⍝ Ah!
┌─┬───┐
│0│0 0│
└─┴───┘

A similar discontinuity can be seen even for axes of length 1, with non-scalar primitive operand functions:

      ⊢mat ← 3 3⍴⍳9
1 2 3
4 5 6
7 8 9

Now:

      ,⌿ 3↑mat              ⍝ join reduction
┌─────┬─────┬─────┐
│1 4 7│2 5 8│3 6 9│
└─────┴─────┴─────┘

      ,⌿ 2↑mat
┌───┬───┬───┐
│1 4│2 5│3 6│
└───┴───┴───┘

      ,⌿ 1↑mat              ⍝ Tsk!
1 2 3

      ,⌿ 0↑mat              ⍝ Bah!
DOMAIN ERROR

But:

      ,fold(⊂⍬) ⊢3↑mat
┌─────┬─────┬─────┐
│1 4 7│2 5 8│3 6 9│
└─────┴─────┴─────┘

      ,fold(⊂⍬) ⊢2↑mat
┌───┬───┬───┐
│1 4│2 5│3 6│
└───┴───┴───┘

      ,fold(⊂⍬) ⊢1↑mat      ⍝ Ooh!
┌─┬─┬─┐
│1│2│3│
└─┴─┴─┘

      ,fold(⊂⍬) ⊢0↑mat      ⍝ Aah!
┌┬┬┐
││││
└┴┴┘

Although we have no specific plans to do so, it is conceivable that this definition of fold could be introduced as a variant of primitive reduction:

        nums ← ⍠(⊂⍬)        ⍝ possible variant for numeric reduction

      ,⌿nums 1↑mat          ⍝ Ooh!
┌─┬─┬─┐
│1│2│3│
└─┴─┴─┘
      ,⌿nums 0↑mat          ⍝ Aah!
┌┬┬┐
││││
└┴┴┘
      ,/nums 3 1↑mat        ⍝ Mmm! reduction along last axis
┌─┬─┬─┐
│1│4│7│
└─┴─┴─┘

Notice that the left and right arguments of a reduction’s operand function need not be of the same kind. Using an informal type notation:

      ⌿ :: (⍺ ∇ ⍵ → ⍵) ∇∇ [⍺]⍪⍵ → ⊂⍵

which, given an argument of uniform kind, collapses to:

      ⌿ :: (⍺ ∇ ⍺ → ⍺) ∇∇ [⍺] → ⊂⍺

I hope to say more about this style of polymorphic type notation in a future posting. In the meantime, the significant point is only that, in the general case, the operand function is of “kind” (⍺ ∇ ⍵ → ⍵), which means that the kind of its left argument may differ from that of its right argument and result. See more discussion on this idea in the notes for foldl in dfns.dws.
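The point about mixed kinds can be made concrete in Python (my example, not from the original): the operand’s left argument need only match the items of the argument, while its right argument and result match the accumulator.

```python
def fold(fn, ival, xs):
    """Right fold with an explicit initial value, as before."""
    acc = ival
    for x in reversed(xs):
        acc = fn(x, acc)
    return acc

# left argument of the operand: a number (the kind of the items);
# right argument and result: a string (the kind of the accumulator)
s = fold(lambda x, acc: str(x) + "," + acc, "end", [1, 2, 3])
print(s)   # 1,2,3,end
```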