The first e-mail of the work week came from Nicolas Delcros. He wondered whether anything clever can be done with ∘.≡
on enclosed character strings. I immediately thought of using “magic functions“, an implementation technique whereby interpreter facilities are coded as dfns. I thought of magic functions because the APL expressions involved in this case are particularly terse:
t ←' zero one two three four five six seven eight nine'
t,←' zéro un deux trois quatre cinq six sept huit neuf'
t,←' zero eins zwei drei vier fünf sechs sieben acht neun'
b←⌽a←1↓¨(' '=t)⊂t
cmpx 'a∘.≡a' '∘.=⍨⍳⍨a'
a∘.≡a → 9.69E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
∘.=⍨⍳⍨a → 8.21E¯6 | -92% ⎕⎕
cmpx 'a∘.≡b' '(a⍳a)∘.=(a⍳b)'
a∘.≡b → 9.70E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(a⍳a)∘.=(a⍳b) → 1.43E¯5 | -86% ⎕⎕⎕⎕
y←⌽x←300⍴a
cmpx 'x∘.≡x' '∘.=⍨⍳⍨x'
x∘.≡x → 9.53E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
∘.=⍨⍳⍨x → 1.52E¯4 | -99% ⎕
cmpx 'x∘.≡y' '(x⍳x)∘.=(x⍳y)'
x∘.≡y → 9.55E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(x⍳x)∘.=(x⍳y) → 1.95E¯4 | -98% ⎕
The advantage will be even greater if/when we speed up ⍳
. (And, obviously, the idea is also applicable to ∘.≢
: just replace ≡
with ≢
and =
with ≠
.)
Jay Foad objected that the comparisons above aren’t quite fair as the more verbose expressions should check that either ⎕ct←0
or that the arguments do not contain any elements subject to tolerant comparison. I countered that the checking in C would not greatly affect the benchmarks as the time to do the checking is O(m+n)
but the benefits are O(m×n)
.
Since the factors are so promising and the coding relatively easy, I went ahead and did the work, with the following results:
14.1 | 14.0 | ratio | cost of checking | |
a∘.≡a |
9.16e¯6 |
9.69e¯5 |
10.58 |
1.09 |
a∘.≡b |
1.53e¯5 |
9.70e¯5 |
6.34 |
1.08 |
x∘.≡x |
1.48e¯4 |
9.53e¯3 |
64.39 |
1.00 |
x∘.≡y |
1.94e¯4 |
9.55e¯3 |
49.23 |
1.01 |
The last column (the cost of checking that tolerant comparison is not involved) is under 10% and decreases as the argument sizes increase.
This work is another illustration of the ubiquity and practical usefulness of the “selfie” concept — in the new (14.1) implementation, x∘.≡y
is faster when the left and right arguments are the same than when they are not. In particular, the selfie x⍳x
or ⍳⍨x
occurs twice, and bolsters the point made in item 3 of Sixteen APL Amuse-Bouches:
x⍳x
are like ID numbers; questions of identity onx
can often be answered more efficiently onx⍳x
than onx
itself.
Finally, after all that, I recommend that one should consider using x⍳y
before using x∘.≡y
. The latter requires O(m×n)
space and O(m×n)
time, and is inherently inefficient.