Rank & Friends
Roger Hui
Definitions
≢y ←→ ⍬⍴(⍴y),1 |
|
f⍤r ⊢y |
←→ |
f
on the rank-r cells of y |
x f⍤(rx,ry) ⊢y |
←→ |
f on the
rank-rx cells of x
and the corresponding
rank-ry cells of y
|
|
x f⌸ y |
←→ |
Major cells of x specify keys for corresponding major cells of y ,
and f is applied to each unique key in x
and the major cells of y having that key. |
f⌸ y |
←→ |
y f⌸ ⍳≢y |
|
|
x⍳y ←→ (⊂⍤¯1⊢x) ⍳ ⊂⍤(¯1+⍴⍴x)⊢y |
I won’t go over these definitions in detail,
because it’s best to learn them by examples
that are meaningful to you.
One thing to note: major cells play a key role.
The major cells of an array are the subarrays
of rank r-1 on the leading axis of the array.
Major cells are also called ¯1-cells.
The major cells of a vector are its scalars;
the major cells of a matrix are the rows;
the major cells of a 3-d array are matrices;
and so on. A scalar has one major cell, itself.
Tally y (≢y) is the number of major cells
of y . In the extended
index-of, x⍳y
looks for major cells of x .
Tally
≢ 'chthonic'
8
≢ 2 3⍴⍳6
2
≢ 3.14159
1
≢¨ (1 2 3) 'apple' (○1)
3 5 1
⍴¨ (1 2 3) 'apple' (○1)
┌─┬─┬┐
│3│5││
└─┴─┴┘
Tally is the leading element of the shape, or 1 for a scalar.
The fact that tally is a scalar is quite handy.
Rank I
Q ⊂⍤2 ⊢Q
John ┌──────┐
Mary │John │
Monika │Mary │
Min │Monika│
│Min │
└──────┘
⊂⍤1 ⊢Q
┌──────┬──────┬──────┬──────┐
│John │Mary │Monika│Min │
└──────┴──────┴──────┴──────┘
⊂⍤¯1 ⊢Q
┌──────┬──────┬──────┬──────┐
│John │Mary │Monika│Min │
└──────┴──────┴──────┴──────┘
I won’t say very much on the rank operator,
except that using enclose (⊂) as the operand function
clarifies how rank works.
Rank II
Q Q {⍺⍵}⍤1 ⍳4
John ┌──────┬───────┐
Mary │John │0 1 2 3│
Monika ├──────┼───────┤
Min │Mary │0 1 2 3│
├──────┼───────┤
│Monika│0 1 2 3│
├──────┼───────┤
│Min │0 1 2 3│
└──────┴───────┘
Q {⍺⍵}⍤1 0 ⍳4 Q {⍺⍵}⍤¯1 ⍳4
┌──────┬─┐ ┌──────┬─┐
│John │0│ │John │0│
├──────┼─┤ ├──────┼─┤
│Mary │1│ │Mary │1│
├──────┼─┤ ├──────┼─┤
│Monika│2│ │Monika│2│
├──────┼─┤ ├──────┼─┤
│Min │3│ │Min │3│
└──────┴─┘ └──────┴─┘
Likewise, in the dyadic case using
link ({⍺⍵})
as the operand function clarifies how rank works.
Key: # of Occurrences I
# of occurrences of unique elements of x←?1e6⍴1000
¯1+{≢⍵}⌸(⍳1000),x
1009 1007 1060 981 1007 1037 1065 1042 1043
a. | ¯1+{≢⍵}⌸(⍳1000),x |
b. | c ⊣ c[x]+←1 ⊣ c←1000⍴0 |
c. | -2-/¯1,(s≠1↓s,¯1)/⍳⍴s←{⍵[⍋⍵]}x |
d. | +/(⍳1000)∘.=x |
This is a problem encountered in July in a tuning project with a client.
The solution using key is faster and shorter than
the practical alternatives not using key.
The outer product solution is shorter,
but is not practical because its performance is abysmal.
Key: # of Occurrences II
?0
0.549364
?2 3⍴0
0.384692 0.331927 0.026112
0.773553 0.784359 0.0516709
{⍺(≢⍵)}⌸ ⌊5×?1e6⍴0
3 199876
4 200555
1 199937
2 199660
0 199972
In v14.0, ?0 returns a uniform random number between 0 and 1.
Tally with key is useful for testing this new facility:
If r is a unform random number between 0 and 1,
then ⌊5×r should be uniformly distributed
among ⍳5 .
And so it is, for this sample.
In this use of key,
the left argument ⍺ is a unique key
and the right argument ⍵ are the indices having that key.
Extended ⍳
a b a ⍳ b
John Min 3 1 0 2 5 5
Mary Mary
Monika John
Min Monika
Max Mesut
Mesut
Index-of ⍳ has been extended
to look for major cells of the left argument:
scalars in a vector,
vectors in a matrix,
matrices in a 3-d array, and so on.
a⍳b is an efficient encoding of
the relationship between a and b ,
and can be used to answer questions of identity
about a and b .
An example from the real world are ID numbers, Social Security Numbers in the US
and National Insurance Numbers in the UK.
We will illustrate the workings of the extended index-of
with an extended example,
encountered this past September in working with a client.
Table
x y x ⍳ y
┌──────┬─┬───┬──┐ ┌──────┬─┬───┬──┐ 3 1 5 2 5 5
│John │M│USA│26│ │Min │F│CN │17│
├──────┼─┼───┼──┤ ├──────┼─┼───┼──┤ x ⍳ x
│Mary │F│UK │24│ │Mary │F│UK │24│ 0 1 2 3 4
├──────┼─┼───┼──┤ ├──────┼─┼───┼──┤
│Monika│F│DE │31│ │John │M│UK │26│ y ⍳ y
├──────┼─┼───┼──┤ ├──────┼─┼───┼──┤ 0 1 2 3 4 4
│Min │F│CN │17│ │Monika│F│DE │31│
├──────┼─┼───┼──┤ ├──────┼─┼───┼──┤
│Max │M│IT │29│ │Mesut │M│DE │24│
└──────┴─┴───┴──┘ ├──────┼─┼───┼──┤
│Mesut │M│DE │24│
└──────┴─┴───┴──┘
A table is a set of values organized into rows and columns.
The rows are records.
Values in a column have the same type and shape.
A table has a specified number of columns, but can have any number of rows.
The extended index-of on tables finds record indices.
Inverted Table
tx x
┌──────┬─┬───┬──┐ ┌──────┬─────┬───┬──────────────┐
│John │M│USA│26│ │John │MFFFM│USA│26 24 31 17 29│
├──────┼─┼───┼──┤ │Mary │ │UK │ │
│Mary │F│UK │24│ │Monika│ │DE │ │
├──────┼─┼───┼──┤ │Min │ │CN │ │
│Monika│F│DE │31│ │Max │ │IT │ │
├──────┼─┼───┼──┤ └──────┴─────┴───┴──────────────┘
│Min │F│CN │17│ ≢¨x ⍪¨x
├──────┼─┼───┼──┤ 5 5 5 5 ┌──────┬─┬───┬──┐
│Max │M│IT │29│ │John │M│USA│26│
└──────┴─┴───┴──┘ │Mary │F│UK │24│
│Monika│F│DE │31│
invert ← {↑¨↓⍉⍵} │Min │F│CN │17│
vert ← {⍉↑⊂⍤¯1¨⍵} │Max │M│IT │29│
└──────┴─┴───┴──┘
x ≡ invert tx
1
tx ≡ vert x
1
An inverted table is a table with the values of a column collected together.
Comma-bar each (⍪¨) applied to an inverted table
makes it “look” more like a table.
And of course the columns have the same tally.
A table can be readily inverted and vice versa.
Notice the use of rank in the function vert .
Inverted Table: Space
⎕size 'tx' 'x'
920 272
ii ← invert tt ← 1000⌿tx
⎕size 'tt' 'ii'
880040 55208
÷/ ⎕size 'tt' 'ii'
15.9404
A table has array overhead per element.
An inverted table has array overhead per column.
The difference that this makes becomes apparent
when you have a sufficiently large number of rows.
Inverted Table: Time
tt←?1e6 2⍴100
tt[;0]←1e6⍴'abc'
ii←invert tt
cmpx '50>1⊃ii' '50>tt[;1]'
50>1⊃ii → 9.61E¯4 | 0% ⎕⎕
50>tt[;1] → 1.42E¯2 | +1376% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
1.42e¯2 ÷ 9.61e¯4
14.7763
⎕size 'tt' 'ii'
80000040 2000112
÷/ ⎕size 'tt' 'ii'
39.9978
The other advantage of an inverted table is that
column access is much faster.
In this example, faster by a factor of 15.
Inverted Table: Index-Of I
⍪¨x
┌──────┬─┬───┬──┐
│John │M│USA│26│
│Mary │F│UK │24│ x ⍳ y
│Monika│F│DE │31│ 4 4 4 4
│Min │F│CN │17│
│Max │M│IT │29│ (vert x) ⍳ (vert y)
└──────┴─┴───┴──┘ 3 1 5 2 5 5
⍪¨y
┌──────┬─┬───┬──┐
│Min │F│CN │17│
│Mary │F│UK │24│
│John │M│UK │26│
│Monika│F│DE │31│
│Mesut │M│DE │24│
│Mesut │M│DE │24│
└──────┴─┴───┴──┘
An important computation is x index-of y
where x and y are compatible inverted tables.
Obviously, it can not be just x⍳y .
The computation obtains by first “verting”
the arguments (un-inverting the tables)
and then applying ⍳ ,
but often there is not enough space for that.
Inverted Table: Index-Of II
(vert x) ⍳ (vert y)
({⍉↑⊂⍤¯1¨⍵}x) ⍳ ({⍉↑⊂⍤¯1¨⍵}y)
(⍉↑⊂⍤¯1¨x) ⍳ (⍉↑⊂⍤¯1¨y)
(⍉↑x⍳¨x) ⍳ (⍉↑x⍳¨y)
⊂⍤¯1⊢x0←0⊃x
┌──────┬──────┬──────┬──────┬──────┐
│John │Mary │Monika│Min │Max │
└──────┴──────┴──────┴──────┴──────┘
x0 ⍳ x0
0 1 2 3 4
⊂⍤¯1⊢y0←0⊃y
┌──────┬──────┬──────┬──────┬──────┬──────┐
│Min │Mary │John │Monika│Mesut │Mesut │
└──────┴──────┴──────┴──────┴──────┴──────┘
x0 ⍳ y0
3 1 0 2 5 5
We derive a more efficient computation
of index-of on inverted tables.
The indices obtain by first uninverting the tables,
that is, by first applying vert .
Replacing vert by its definition,
we see that ⊂⍤¯1 plays a major
role. ⊂⍤¯1 encloses, or alternatively
computes a scalar representation.
For purposes of index-of x⍳¨x and x⍳¨y are
much more efficient representations (small integers)
than ⊂⍤¯1¨x and ⊂⍤¯1¨y (the data itself).
The point is illustrated on column 0:
For purposes of index-of, the integers
have the same information as the boxed data.
Inverted Table: Index-Of III
⍪¨x ⍉↑x⍳¨x
┌──────┬─┬───┬──┐
│John │M│USA│26│ 0 0 0 0
│Mary │F│UK │24│ 1 1 1 1
│Monika│F│DE │31│ 2 1 2 2
│Min │F│CN │17│ 3 1 3 3
│Max │M│IT │29│ 4 0 4 4
└──────┴─┴───┴──┘
⍪¨y ⍉↑x⍳¨y
┌──────┬─┬───┬──┐
│Min │F│CN │17│ 3 1 3 3
│Mary │F│UK │24│ 1 1 1 1
│John │M│UK │26│ 0 0 1 0
│Monika│F│DE │31│ 2 1 2 2
│Mesut │M│DE │24│ 5 0 2 1
│Mesut │M│DE │24│ 5 0 2 1
└──────┴─┴───┴──┘
(vert x)⍳(vert y) (⍉↑x⍳¨x) ⍳ (⍉↑x⍳¨y)
3 1 5 2 5 5 3 1 5 2 5 5
What results from ⍉↑x⍳¨x and ⍉↑x⍳¨y
are integer matrices;
index-of on these matrices produces the
required record indices.
The expression in red computes index-of on inverted tables.
Extended ⍳ : Tool of Thought
{(⍉↑⍺⍳¨⍺)⍳(⍉↑⍺⍳¨⍵)}
MatIota←{(↓⍺)⍳↓⍵}
{(⍉↑⍺ MatIota¨⍺) MatIota (⍉↑⍺ MatIota¨⍵)}
{(⍉↑⍺{(↓⍺)⍳↓⍵}¨⍺) {(↓⍺)⍳↓⍵} (⍉↑⍺{(↓⍺)⍳↓⍵}¨⍵)}
The short Dfn replaced a 28-line tradfn in the client application,
and is 20% faster.
We propose to make it an idiom.
Without the extension to ⍳
the algorithm would have been harder to see.
The extension to index-of makes it a better tool of thought.
We need a similar extension for sort: you know,
extended ⍳ , find anything;
extended sort, sort anything.
The Primitive Performance talk this afternoon will
have more examples of using the new language facilities
in v14.0.
created: | 2013-10-06 10:00 |
updated: | 2013-10-22 07:25 |
|