(⎕io←0
and timings are done in Dyalog version 15.0.)
Table Look-Up
Some functions can be computed faster by table look-up than a more traditional and more conventional calculation. For example:
b←?1e6⍴2
cmpx '*b' '(*0 1)[b]'
*b → 1.32E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(*0 1)[b] → 1.23E¯3 | -91% ⎕⎕⎕
Some observations about this benchmark:
(a) The advantage of table look-up depends on the time to calculate the function directly, versus the time to do indexing, ⍺[⍵]
or ⍵⌷⍺
, plus a small amount of time to calculate the function on a (much) smaller representative set.
Indexing is particularly efficient when the indices are boolean:
bb←b,1
bi←b,2
cmpx '2 3[bb]' '2 3 3[bi]'
2 3[bb] → 1.12E¯4 | 0% ⎕⎕⎕⎕⎕
2 3 3[bi] → 7.39E¯4 | +558% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
bi
is the same as bb
except that the last element of 2
forces it to be 1-byte integers (for bi
) instead of 1-bit booleans (for bb
).
(b) Although the faster expression works best with 0-origin, the timing for 1-origin is similar.
cmpx '*b' '{⎕io←0 ⋄ (*0 1)[⍵]}b'
*b → 1.32E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
{⎕io←0 ⋄ (*0 1)[⍵]}b → 1.22E¯3 | -91% ⎕⎕⎕
That is, if the current value of ⎕io
is 1, the implementation can use a local ⎕io
setting of 0.
(c) The fact that *b
is currently slower than (*0 1)[b]
represents a judgment by the implementers that *b
is not a sufficiently common computation to warrant writing faster code for it. Other similar functions already embody the table look-up technique:
cmpx '⍕b' '¯1↓,(2 2⍴''0 1 '')[b;]'
⍕b → 6.95E¯4 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
¯1↓,(2 2⍴'0 1 ')[b;] → 6.92E¯4 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
Small Domains
Table look-up works best for booleans, but it also works for other small domains such as 1-byte integers:
i1←?1e6⍴128 ⍝ 1-byte integers
cmpx '*i1' '(*⍳128)[i1]'
*i1 → 1.48E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(*⍳128)[i1] → 9.30E¯4 | -94% ⎕⎕
cmpx '○i1' '(○⍳128)[i1]'
○i1 → 2.78E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(○⍳128)[i1] → 9.25E¯4 | -67% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
In general, a table for 1-byte integers needs to have values for ¯128+⍳256
. Here ⍳128
is used to simulate that in C indexing can be done on 1-byte indices without extra computation on the indices. In APL, general 1-byte indices are used as table[i1+128]
; in C, the expression is (table+offset)[i]
.
Domains considered “small” get bigger all the time as machines come equipped with ever larger memories.
Larger Domains
Table look-up is applicable when the argument is not a substantial subset of a large domain and there is a fast way to detect that it is not.
i4s←1e7+?1e6⍴1e5 ⍝ small-range 4-byte integers
G←{(⊂u⍳⍵)⌷⍺⍺ u←∪⍵}
cmpx '⍟i4s' '⍟G i4s'
⍟i4s → 1.44E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
⍟G i4s → 9.59E¯3 | -34% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
The definition of G
implements that the argument is not used directly as indices (as in (⊂⍵)⌷⍺⍺ u
) but needed to be looked up, the u⍳⍵
in (⊂u⍳⍵)⌷⍺⍺ u
. Thus both index-of and indexing are key enablers for making table look-up competitive.
§4.3 of Notation as a Tool of Thought presents a different way to distribute the results of a calculation on the “nub” (unique items). But it takes more time and space and is limited to numeric scalar results.
H←{(⍺⍺ u)+.×(u←∪⍵)∘.=⍵}
i4a←1e7+?1e4⍴1e3 ⍝ small-range 4-byte integers
cmpx '⍟i4a' '⍟G i4a' '⍟H i4a'
⍟i4a → 1.56E¯4 | 0%
⍟G i4a → 6.25E¯5 | -60%
⍟H i4a → 1.78E¯1 | +113500% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
The benchmark is done on the smaller i4a
as a benchmark on i4s
founders on the ≢∪⍵
by ≢⍵
boolean matrix (for i4s
, 12.5 GB = ×/ 0.125 1e5 1e6
) created by H
.
Mathematical Background
“New Math” teaches that a function is a set of ordered pairs whose first items are unique:
*⍵: {(⍵,*⍵)|⍵∊C}
The set of all (⍵,*⍵)
where ⍵
is a complex number
⍟⍵: {(⍵,⍟⍵)|⍵∊C~{0}}
The set of all (⍵,⍟⍵)
where ⍵
is a non-zero complex number
○⍵: {(⍵,○⍵)|⍵∊C}
The set of all (⍵,○⍵)
where ⍵
is a complex number
⍕⍵: {(⍵,⍕⍵)|0=⍴⍴⍵}
The set of all (⍵,⍕⍵)
where ⍵
is a scalar
When ⍵
is restricted (for example) to the boolean domain, the functions, the sets of ordered pairs, are more simply represented by enumerating all the possibilities:
*⍵: {(0,1), (1,2.71828)}
⍟⍵: {(0,Err), (1,0)}
○⍵: {(0,0), (1,3.14159)}
⍕⍵: {(0,'0'), (1,'1')}
Realized and Potential Performance Improvements
realized (available no later than 15.0)
boolean 1-byte integer |
⍕ ∘.f ⍕ a∘⍕ |
potential (non-exhaustive list)
boolean 1-byte integer 2-byte integer small-range 4-byte integer |
* ⍟ | - ○ ! ⍳ a∘f f.g * ⍟ | - ○ × ! ⍳ a∘f f.g ∘.f * ⍟ | - ○ × ⍳ a∘f * ⍟ | × |
a
is a scalar; f
and g
are primitive scalar dyadic functions.