Special Code in APL Primitives
Roger Hui
0. Introduction
Many APL primitives contain special code for some arguments to effect time
and/or space savings not available to general arguments.
Moreover, some phrases are “recognized” and are supported by special code.
For example, the fork x(1 ⍳⍨ >)y stops on the first pair of corresponding elements
for which the condition is true,
and in so doing saves both time and space:
)copy dfns cmpx wsreq
x←?1e6⍴2e6
cmpx '1e6(1⍳⍨>)x' '(1e6>x)⍳1' ⍝ early exit
1e6(1⍳⍨>)x → 4.88E¯7 | 0%
(1e6>x)⍳1 → 1.10E¯3 | +225700% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
wsreq '1e6(1⍳⍨>)x'
852
wsreq '(1e6>x)⍳1'
124572
cmpx '¯5(1⍳⍨>)x' '(¯5>x)⍳1' ⍝ compare to bitter end
¯5(1⍳⍨>)x → 6.76E¯4 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(¯5>x)⍳1 → 1.09E¯3 | +61% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
wsreq '¯5(1⍳⍨>)x'
932
wsreq '(¯5>x)⍳1'
124572
|
1. Special Code List
Instances of special code are listed below.
With a few exceptions idioms are not included.
← +-×÷*⍟⌹○!? |⌈⌊⊥⊤⊣⊢ =≠≤<>≥≡≢ ∨∧⍲⍱ ↑↓⊂⊃⌷⍋⍒ ⍳⍷∪∩∊~
/\⌿⍀ ,⍪⍴⌽⊖⍉ ¨⍨⍣.∘⍤ ⍞⎕⍠⌸⌶⍕⍎ ⋄⍝→⍵⍺∇& ¯⍬ []
|
? | monad | 14.0 |
result in smallest datatype; in particular ?n⍴2
has datatype 11 when 0=⎕io |
n|y | dyad | 13.2 |
for scalar n which is ¯1, 0, 1, or a power of 2 and integer array y |
⌈y | monad | 14.0 |
uses SSE 4.1 instructions; also ⌊y |
|
2⊥b | dyad | 13.1 |
for y with datatype 11 |
x⊥y | dyad | 15.0 |
for scalar or vector x |
b⊥b | dyad | 15.0 |
for boolean vector b |
x⊤y | dyad | 15.0 |
for integer vector x and integer array y |
(n⍴2)⊤y | dyad | 13.2 |
for integer array y ; result has datatype 11 |
|
x≡⍤1⊢v | dyad | 14.1 |
for v with 1,2,4, or 8 bytes; also v≡⍤1⊢x |
|
↑ | monad | 15.0 |
for elements with the same rank, and same shape except for leading axis |
↑[0.5] | monad | 14.0 |
|
⊂⍤r | monad | 15.0 |
same as ⊂[(-r)↑⍳⍴⍴x]x |
⊂[1] | dyad | 14.1 |
for leading axis |
(≢¨⊂) | dyad | 15.0 |
(f/¨⊂) | dyad | 15.0 |
for f one of + × ⌈ ⌊ ∧ ∨ and numeric vector right arguments |
|
(⊂x)⌷y | dyad | 15.0 |
also y[x;
;] |
|
⍋ | monad | 12.1 |
for 1-bit and 1-, 2-, 4-, and 8-byte items; for small range 4-byte items;
also ⍒ instead of ⍋ |
{⍵[⍋⍵]} | idiom | 12.1 |
for 1-bit and 1-, 2-, 4-, and 8-byte items; for small range 4-byte items; also ⍒
instead of ⍋ ;
faster than ⍋⍵ alone |
|
⍳ | dyad | 14.0 |
for 1- and 2-byte items |
⍳ | dyad | 14.1 |
for 4-byte simple items; for 4-byte small range items |
⍳ | dyad | 14.1 |
for k-byte simple items, with 0=⎕ct if floating point or complex;
uses CRC32 instruction |
⍳ | dyad | 13.0 |
for 8-byte float and 16-byte complex vectors |
⍳ | dyad | 14.1 |
for relational matrices; uses CRC32 instruction |
x⍳x | dyad | 14.1 |
for left and right argument being the same C pointer |
b⍳1 | dyad | 14.0 |
also 0 instead of 1 |
⍳⍤1 0 | dyad | 15.0 |
(⍳∘1 c) | dyad | 14.0 |
see 1⍳⍨c |
(1⍳⍨c) | dyad | 14.0 |
c is one of = ≠ ≤ < > ≥ ;
also (⍳∘1 c) ;
also 0 instead of 1 |
|
f/[a] | monad | |
for f a scalar dyadic function |
f/[a]b | monad | |
for f one of + ∧ ∨ = ≠ ;
faster if stride 1 or a multiple of 8 |
+/[a]b | monad | 14.0 |
exploits POPCNT instruction |
⌈/y | monad | 14.1 |
uses vector instructions for integer arguments; also ⌊ instead of ⌈ |
(f/ c) | dyad | 14.0 |
for f one of + ∨ ∧ ; c is one of = ≠ ≤ < > ≥ |
(f/¨⊂) | dyad | 15.0 |
for f one of + × ⌈ ⌊ ∨ ∧ and numeric vector right arguments |
|
2 f/y | dyad | 15.0 |
for f one of + - × ⌈ ⌊ = ≠ ≤ < > ≥ ∨ ∧ ⍲ ⍱ and vector y |
n f/[a]y | dyad | 15.0 |
for f one of + ⌈ ⌊ = ≠ , |
|
/⍳ | idiom | 14.0 |
also /⍳⍴ ; result in smallest datatype |
|
f\[a]y | monad | |
for f one of + × ⌈ ⌊ |
f\[a]y | monad | 15.0 |
for f one of - ÷ |
+\[a]b | monad | 14.0 |
for stride a multiple of 8 |
f\[a]b | monad | 12.1 |
for f one of = ≠ ≤ < ∨ ∧ ;
for stride 1 or a multiple of 8 |
f\[a]b | monad | 14.0 |
for f one of > ≥ ⍲ ⍱ and stride 1 and ¯1↑⍴b is a multiple of 8 |
|
0 0⍴y | dyad | 14.0 |
for large y |
|
⍉b | dyad | 14.1 |
for boolean matrices where 0=8|⍴b ; exploits BMI2 instructions |
|
. | dyad | |
+.× |
| dyad | 14.1 |
+.= | also +.≠ |
| dyad | 14.1 |
+.≡ | for nested arguments; also +.≢ |
| dyad | 14.0 |
+.c | for c one of < ≤ = ≠ ≥ > ; also ∧.c and ∨.c |
| dyad | 14.1 |
x∧.=v | for v with 1,2,4, or 8 bytes |
| dyad | 14.1 |
∧.≡ | for nested arguments; also ∨.≢ |
| dyad | 15.0 |
∨.∨ | also ∧.∧ |
|
∘.f | dyad | | ∘.f |
for f a scalar dyadic function |
| dyad | 14.1 | ∘.≡ |
for nested arguments; also ∘.≢ |
| dyad | 15.0 | ∘., |
| dyad | 15.0 | b∘.f y |
for f a scalar dyadic function (except ÷ ⍟) |
|
⍤ | monad | 14.0 |
| ⌹ ⊂ , ⍪ ⊖ ⍉ {⍵[⍋⍵]} {⍵[⍒⍵]} |
| dyad | 14.0 |
| ⌹ ≡ ≢ ↑ ↓ / ⌿ , ⍪ ⍉ {⍺⍵} |
| dyad | 14.0 |
f⍤r | for f a primitive scalar dyadic function |
| dyad | 15.0 |
⌷⍤0 255 |
| dyad | 15.0 |
⍳⍤1 0 |
| monad | 14.0 |
f⌿⍤r | for f a primitive scalar dyadic function |
| monad | 14.0 |
f⍀⍤r | for f a primitive scalar dyadic function |
|
⌸ | | 14.0 |
{f⌿⍵} |
for f one of + ⌈ ⌊ or (for boolean right arguments) one of ∧ ∨ = ≠ ;
also / instead of ⌿ for vector right arguments |
| | 15.0 |
{⍺(f⌿⍵)} |
for f as above |
| | 15.0 |
{⍺,f⌿⍵} |
for f as above and numeric left arguments |
| | 14.0 |
{≢⍵} |
| | 14.1 |
{⍺(≢⍵)} |
|
| | 14.1 |
{⍺,≢⍵} |
for numeric left argument |
| | 14.1 |
{≢∪⍵} |
|
| | 14.1 |
{⍺(≢∪⍵)} |
| | 14.1 |
{⍺,≢∪⍵} |
for numeric left argument |
| | 14.0 |
{⊂⍵} |
| | 14.1 |
{⍺⍵} | |
| | 14.1 |
{⍺} | see ⊣⌸ |
| | 14.1 |
⊣ | ⊣⌸x ←→ ∪x if ∪ were extended like ⍳ |
|
8⌶ | dyad | 14.0 |
exploits CRC32 instruction; also left and right argument being the same C pointer |
8⌶ | dyad | 14.1 |
for all columns being numeric vectors |
|
⍕b | monad | 15.0 |
⍕y | monad | 15.0 |
for y with datatype 83 (1-byte integer) |
x⍕b | dyad | 15.0 |
for x with 1 or 2 elements |
x⍕y | dyad | 15.0 |
for y with datatype 83 (1-byte integer) and x with 2 elements, the first of which is non-zero |
|
x[y;
;] | dyad | 15.0 |
also (⊂y)⌷x |
x[b;
;] | dyad | 13.2 |
for 0=⎕io ; also (⊂b)⌷x |
b[;
;y] | dyad | 15.0 |
|
(≢¨⊂) | dyad | 15.0 |
(f/¨⊂) | dyad | 15.0 |
for f one of + × ⌈ ⌊ ∨ ∧ and numeric vector right arguments |
(≢∪) | monad | 14.1 |
for 1-bit and 1-, 2-, and 4-byte items; for small range 4-byte items |
(⍳∘1 c) | dyad | 14.0 |
see 1⍳⍨c |
(f/ c) | dyad | 14.0 |
for f one of + ∨ ∧ ; c is one of = ≠ ≤ < > ≥ |
|
(1⍳⍨c) | dyad | 14.0 |
c is one of = ≠ ≤ < > ≥ ;
also (⍳∘1 c) ;
also 0 instead of 1 |
← +-×÷*⍟⌹○!? |⌈⌊⊥⊤⊣⊢ =≠≤<>≥≡≢ ∨∧⍲⍱ ↑↓⊂⊃⌷⍋⍒ ⍳⍷∪∩∊~
/\⌿⍀ ,⍪⍴⌽⊖⍉ ¨⍨⍣.∘⍤ ⍞⎕⍠⌸⌶⍕⍎ ⋄⍝→⍵⍺∇& ¯⍬ []
|
2. Glossary and Notes
|
b | | boolean array |
a | | axis specification |
x y | | array |
r | | rank specification |
v | | vector |
n | | positive integer scalar |
f | | function |
c | | comparison function |
|
| The text uses 1-origin indexing
unless stated otherwise;
special code is provided for both index origins unless stated otherwise.
|
| Integer array x is small range
if its range is small relative to its length;
i.e. (k×≢,x)≥1+(⌈/,x)-⌊/,x for small k (2 or less, say).
|
| Datatype is one of the internal datatypes (as in ⎕dr):
| 11 | 1-bit boolean | 80 | 1-byte character |
| 83 | 1-byte integer | 160 | 2-byte character |
| 163 | 2-byte integer | 320 | 4-byte character |
| 323 | 4-byte integer |
| 645 | 8-byte floating point | 326 | pointer (4- or 8-byte) |
| 1289 | 16-byte complex |
| 1287 | 128-bit decimal floating point |
|
| Type is numeric, character, or pointer;
thus for example 1 0 and ¯4 5 have the same type but different datatypes (11 and 83, respectively).
|
| A numeric result is in the smallest datatype
if it is created in that smallest datatype rather than created in a more general datatype and then squeezed down.
|
| The left and right argument being the same C pointer is a special case
of the left and right arguments being the same.
They would be the same C pointer in x f x or f⍨x , but not (for example)
in x f (⍴x)⍴x .
Their being the same C pointer is efficiently detectable; their having the same value
is expensive to ascertain
and the interpreter does not detect it for purposes of special code.
|
| Stride in a boolean array is the distance from one bit to the next.
For example, if b←1=?10 56⍴2 , then in +/b
the stride is 1 and in +⌿b the stride is 56.
|
| The version numbers (14.0, 13.2, etc.) indicate when
the special code was first implement.
A blank version number indicates that the code pre-dates version 12.1 (circa 2008).
|
| The special code list contains
a few duplicate entries
because some expressions can be classified in more than one way.
|
3. The Fine Print
The special code list is not a guarantee,
probably will never be complete,
may change at any time,
and the performance characteristics may change as CPUs change.
Bibliography
[0] |
Hui, R.K.W., and K.E. Iverson,
J Introduction and Dictionary,
1991-2015; Appendix B. Special Code.
|
[1] |
Hui, R.K.W.
Performance
Tips, Dyalog ’12, Elsinore, Denmark, 2012-10-14.
| [2] |
Hui, R.K.W.,
Index-Of, A 30-Year Quest,
J Conference 2014, 2014-07-25.
|
[3] |
Hui, R.K.W.,
A Speed-Up Story,
Dyalog Blog Post, 2014-11-05.
|
[4] |
Hui, R.K.W.,
In Praise of Magic Functions,
Dyalog Blog Post, 2015-06-22.
|
Presented at Dyalog ’15, Naxos Beach, Sicily, 2015-09-10.
created: | 2014-04-24 16:50 |
updated: | |
|