Andy reported that in the shuffle QA some functions take a long time:
m9249
“4½ days so far”
rankop
21.5 hours
m12466
26.3 hours
points
7.2 hours
Background: Shuffle QA is an intensification of the normal QA. The suite of over 1800 QA functions is run under shuffle, whereby every getspace
(memory allocation) is followed by every APL array in the workspace being shifted (“shuffled”) and by a simple workspace integrity check. The purpose is to uncover memory reads/writes which are rendered unsafe by events which in normal operations are rare.
m9249
m9249
tests +\[a]
, ×\[a]
, ⌈\[a]
, and ⌊\[a]
. It ran in under 3 seconds in normal operations; for it to take more than 4½ days under shuffle, getspace
is clearly the dominant resource and hereafter we focus on that.
m9249
used expressions like:
assert (+⍀x) ≡ {⍺+⍵}⍀x
Since {⍺+⍵}
is not recognized by the scan operator ⍀
and therefore not supported by special code, {⍺+⍵}⍀x
is done by the general O(n*2)
algorithm for scans, AND on each scalar the operand {⍺+⍵}
is invoked and a getspace
is done. The improved, faster QA does this:
F ← {1≥≢⍵:⍵ ⋄ x ⍪ (¯1↑⍵) ⍺⍺⍨ ¯1↑x←⍺⍺ ∇∇ ¯1↓⍵}
assert (+⍀x) ≡ +F x
With such a change, the faster m9249
runs in 22 minutes under shuffle instead of over 4½ days, without any reduction in coverage.
The number of getspace
are as follows: +⍀x
takes O(1)
. The second expression {⍺+⍵}⍀x
does ×\1↓⍴x
separate scans, and each one takes O((≢x)*2) getspace
; the total is therefore O(((≢x)*2)××/1↓⍴x) ←→ O((≢x)××/⍴x)
. The third expression +F
is linear in ≢x
and is therefore O(≢x)
in the number of getspace
. In m9249
the largest x
has size 33 5 7
. The following table shows the orders of complexity and what they imply for this largest x
:
+⍀x |
O(1) |
1 |
{⍺+⍵}⍀x |
O((≢x)××/⍴x) |
38115 = 33 × ×/ 33 5 7 |
+F x |
O(≢x) |
33 |
The ratio between 38115 and 33 goes a long way towards explaining why the time to run m9249
under shuffle was over 4½ days and is now 22 minutes. (Why does m9249
still take 22 minutes? It tests for the four functions + × ⌈ ⌊
, for various axes, for different datatypes, and for different axis lengths.)
rankop
Another shuffle sluggard rankop
took 47 hours. rankop
tests expressions of the form f⍤r⊢y
and x f⍤r⊢y
for various functions f
. The techniques used for reducing the number of getspace
differ from function to function. We look at x↑⍤r⊢y
as an illustration.
assert (4↑⍤1⊢y) ≡ 4{⍺↑⍵}⍤1⊢y
⍴y
was 2 7 1 8 2 8
. The expression took 7.6e¯5
seconds normally and 7.5
seconds under shuffle. The left hand side requires O(1) getspace
, the right hand side O(×/¯1↓⍴x)
. The expression was changed to:
assert (4↑⍤1⊢y) ≡ 2 7 1 8 2 4↑y
Feels like cheating, doesn’t it? 🙂 The new expression takes 0.115 seconds under shuffle.
m12466
m12466
tests n⌈/[a]x
(and n⌊/[a]x
). Previously, it did this by comparing n⌈/[a]x
against n{⍺⌈⍵}/[a]x
for x
with shape 7 17 13 11
, for each of the four axes, for two different values of n
, and for the 1-, 2-, 4-, and 8-byte numeric datatypes. It required 5238426 getspace
and 26.3 hours to run under shuffle.
F←{p⍉⍺⍺⌿(⊂(⎕io-⍨⍳|⍺)∘.+⍳(1-|⍺)+(⍴⍵)[⍵⍵])⌷(⍋p←⍒⍵⍵=⍳⍴⍴⍵)⍉⍵}
F
is a dyadic operator. The left operand ⍺⍺
is ⌈
or ⌊
; the right operand ⍵⍵
is the axis; and n(⌈ F a)x
is the same as n⌈/[a]x
. n(⌈ F a)x
mainly does n⌈⌿x
; other axes are handled by transposing the axis of interest to be the first, then doing n⌈⌿x
, then transposing the axes back into the required order. n(⌈ F a)x
is much more complicated than n{⍺⌈⍵}/[a]x
but has the advantage of using only O(1) getspace
.
The revamped m12466
function uses 13717 getspace
and 2 minutes to run under shuffle.
points
points
is a function to test monadic format (⍕
) for different values of ⎕pp
.
points←{⍺←17 ⋄ ⎕pp←⍺
tt←{(⌽∨\⌽⍵≠⍺)/⍵} ⍝ trim trailing ⍺s.
~0∊∘.{ ⍝ all pairs (⍳⍵)
num←'0'tt↑,/⍕¨⍺'.'⍵ ⍝ eg '9.3'
num≡⍕⍎num ⍝ check round trip
}⍨⍳⍵
}
The expression is 14 15 16 17 points¨ 100
, meaning that for each of the 100 numbers num
as text,
1.1 1.2 1.3 1.4 1.5 1.100
2.1 2.2 2.3 2.4 2.5 2.100
3.1 3.2 3.3 3.4 3.5 ... 3.100
...
99.1 99.2 99.3 99.4 99.5 99.100
100.1 100.2 100.3 100.4 100.5 100.100
a “round-trip” num≡⍕⍎num
is evaluated. The expression requires 1.3 million getspace
and 7.2 hours to run under shuffle.
A much more efficient version with respect to getspace
is possible. Let x
be a vector. ⍕¨x
requires O(≢x) getspace
; ⍕x
requires O(1) getspace
and, like ⍕¨x
, formats each individual element of x
independently.
points1←{
⎕io←⎕ml←1
tt←{(⌽∨\⌽⍵≠⍺)/⍵} ⍝ trim trailing ⍺s
x←1↓∊(' ',¨s,¨'.')∘.,'0'tt¨s←⍕¨⍳⍵ ⍝ numbers as text
y←⍎x
{⎕pp←⍵ ⋄ x≡⍕y}¨⍺ ⍝ check round trip
}
14 15 16 17 points1 100
requires 11050 getspace
and less than 2 minutes to run under shuffle.
What to do?
Andy says that the shuffle QA takes over 2000 hours (!). The shuffle sluggards will be tackled as done here by using more parsimonious APL expressions to compare against the aspect being QA-ed. Going forward, a new QA function will be run under shuffle as it is being developed. The developers will be unlikely to tolerate a running time of 4½ days.