Making Lasagne
Participants in the SA2 Performance Tuning workshop at the Dyalog ’18 User Meeting were encouraged to bring their own problems for the group to work on. Diane Hymas of ExxonMobil brought a good one. The one-liner computation is as follows:
lasagne0 ← {groups {+⌿⍵}⌸ amts ×[⎕io] spices[inds;]}
where
n ← 8e5
spices ← ?6000 44⍴0
groups ← +\(16↑1 2)[?n⍴16]
inds ← ?n⍴≢spices
amts ← ?n⍴0
Applying lasagne0 to this dataset:
⍴ lasagne0 ⍬
100015 44
≢ ∪ groups
100015
)copy dfns wsreq cmpx
wsreq 'lasagne0 ⍬'
844799820
cmpx 'lasagne0 ⍬'
2.12E¯1
The problem with lasagne0 is space rather than time. The 845 MB required for this dataset may be acceptable, but we can be called upon to cook up large batches of lasagne in a smallish workspace, on a machine with limited RAM. (Large n and large ≢∪groups.)
All benchmarks in this document were run in Dyalog APL version 17.0, in a 2 GB workspace, on a machine with generous RAM.
Solutions
Marshall Lochbaum solved the problem. The alternative solutions are as follows:
lasagne0 ← {groups {+⌿⍵}⌸ amts ×[⎕io] spices[inds;]}
lasagne1 ← {↑ (groups{⊂⍵}⌸amts) {+⌿⍺×[⎕io]spices[⍵;]}¨ groups{⊂⍵}⌸inds}
lasagne2 ← {↑ (groups{⊂⍵}⌸amts) {⍺+.×spices[⍵;]}¨ groups{⊂⍵}⌸inds}
lasagne3 ← {↑ {amts[⍵]+.×spices[inds[⍵];]}¨ {⊂⍵}⌸groups}
lasagne0 is the original expression; lasagne1 and lasagne2 were derived by Marshall during the workshop; lasagne3 was suggested by a participant in the workshop. The four functions produce matching results. Comparing the space and time:
| space (MB) | time | ||
lasagne0 |
845 |
2.29e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ |
|
lasagne1 |
74 |
3.60e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ |
|
lasagne2 |
74 |
2.39e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ |
|
lasagne3 |
74 |
2.93e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ |
lasagne0 v lasagne1
Nearly all of the space required to evaluate lasagne0 is accounted for by the space for computing the right argument to key:
wsreq 'lasagne0 ⍬'
844799820
wsreq 'amts ×[⎕io] spices[inds;]'
844799548
In fact, the array spices[inds;], all by itself, is already very large. It has shape (⍴inds),1↓⍴spices (≡ 8e5 44), each item requiring 8 bytes.
wsreq 'spices[inds;]'
281599556
⍴ spices[inds;]
800000 44
8 × ×/ ⍴ spices[inds;]
281600000
qsize←{⎕size '⍵'} ⍝ # bytes for array ⍵
qsize spices[inds;]
281600040
lasagne1 avoids creating these large intermediate results, by first partitioning the arguments (groups{⊂⍵}⌸amts and groups{⊂⍵}⌸inds) and then applying a computation to each partition. In that computation, the operand function {+⌿⍺×[⎕io]spices[⍵;]}, ⍺ is a partition of amts and ⍵ is the corresponding partition of inds.
lasagne1, lasagne2 and lasagne3 require the same amount of space to run, so the comparison among them is on time.
lasagne1 v lasagne2
The only change is from +⌿⍺×[⎕io]spices[⍵;] to ⍺+.×spices[⍵;], which are equivalent when ⍺ is a vector. The interpreter can compute +.× in one go rather than doing +⌿ separately after doing ×[⎕io]; in such computation the interpreter can and does exploit the additional information afforded by +.× and is faster by a factor of 1.5 (= 2.39 ÷⍨ 3.60).
lasagne2 v lasagne3
The idea in lasagne3 is doing one key operation rather than the two in lasagne2. Therefore, the changes between lasagne2 v lasagne3 are:
lasagne2 |
groups{⊂⍵}⌸amts |
⍺ |
spices[⍵;] |
lasagne3 |
{⊂⍵}⌸groups |
amts[⍵] |
spices[inds[⍵];] |
All three key operations involve {⊂⍵}⌸ with groups as the key, and are roughly equally fast, each taking up no more than 7% of the total time.
cmpx 'groups{⊂⍵}⌸amts' 'groups{⊂⍵}⌸inds' '{⊂⍵}⌸groups'
groups{⊂⍵}⌸amts → 1.69E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* groups{⊂⍵}⌸inds → 1.39E¯2 | -18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* {⊂⍵}⌸groups → 1.36E¯2 | -20% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
lasagne3 is doing one less key operation than lasagne2 in exchange for doing, for each of the ≢∪groups (= 100015) executions of the operand function, amt[⍵] v ⍺ and spices[ind[⍵];] v spices[⍵;]. Indexing is by no means slow, but it’s not as fast as references to ⍺ and ⍵. Therefore, lasagne2 is faster.
The trade-off may differ for different values of groups. In this case groups are small-range integers so operations using it as the key value are fast.

Follow