Rank & Friends
Roger Hui



Definitions

≢y ←→ ⍬⍴(⍴y),1
 
  f⍤r ⊢y ←→  f on the rank-r cells of y
x f⍤(rx,ry) ⊢y ←→ f on the rank-rx cells of x and the corresponding
rank-ry cells of y
 
x f⌸ y   ←→  Major cells of x specify keys for corresponding major cells of y , and f is applied to each unique key in x and the major cells of y having that key.
  f⌸ y  ←→  y f⌸ ⍳≢y
 
x⍳y ←→ (⊂⍤¯1⊢x) ⍳ ⊂⍤(¯1+⍴⍴x)⊢y

I won’t go over these definitions in detail, because it’s best to learn them by examples that are meaningful to you.

One thing to note: major cells play a key role. The major cells of an array are the subarrays of rank r-1 on the leading axis of the array. Major cells are also called ¯1-cells. The major cells of a vector are its scalars; the major cells of a matrix are the rows; the major cells of a 3-d array are matrices; and so on. A scalar has one major cell, itself.

Tally y (≢y) is the number of major cells of y . In the extended index-of, x⍳y looks for major cells of x .



Tally

   ≢ 'chthonic'
8
   ≢ 2 3⍴⍳6
2
   ≢ 3.14159
1

   ≢¨ (1 2 3) 'apple' (○1)
3 5 1
   ⍴¨ (1 2 3) 'apple' (○1)
┌─┬─┬┐
│3│5││
└─┴─┴┘

Tally is the leading element of the shape, or 1 for a scalar. The fact that tally is a scalar is quite handy.



Rank I

   Q            ⊂⍤2 ⊢Q
John         ┌──────┐
Mary         │John  │
Monika       │Mary  │
Min          │Monika│
             │Min   │
             └──────┘
   ⊂⍤1 ⊢Q
┌──────┬──────┬──────┬──────┐
│John  │Mary  │Monika│Min   │
└──────┴──────┴──────┴──────┘
   ⊂⍤¯1 ⊢Q
┌──────┬──────┬──────┬──────┐
│John  │Mary  │Monika│Min   │
└──────┴──────┴──────┴──────┘

I won’t say very much on the rank operator, except that using enclose () as the operand function clarifies how rank works.



Rank II

   Q                      Q {⍺⍵}⍤1 ⍳4
John                  ┌──────┬───────┐
Mary                  │John  │0 1 2 3│
Monika                ├──────┼───────┤
Min                   │Mary  │0 1 2 3│
                      ├──────┼───────┤
                      │Monika│0 1 2 3│
                      ├──────┼───────┤
                      │Min   │0 1 2 3│
                      └──────┴───────┘

   Q {⍺⍵}⍤1 0 ⍳4         Q {⍺⍵}⍤¯1 ⍳4
┌──────┬─┐            ┌──────┬─┐
│John  │0│            │John  │0│
├──────┼─┤            ├──────┼─┤
│Mary  │1│            │Mary  │1│
├──────┼─┤            ├──────┼─┤
│Monika│2│            │Monika│2│
├──────┼─┤            ├──────┼─┤
│Min   │3│            │Min   │3│
└──────┴─┘            └──────┴─┘

Likewise, in the dyadic case using link ({⍺⍵}) as the operand function clarifies how rank works.



Key: # of Occurrences I

# of occurrences of unique elements of x←?1e6⍴1000

   ¯1+{≢⍵}⌸(⍳1000),x
1009 1007 1060 981 1007 1037 1065 1042 1043 …
a.    ¯1+{≢⍵}⌸(⍳1000),x
b.c ⊣ c[x]+←1 ⊣ c←1000⍴0
c.-2-/¯1,(s≠1↓s,¯1)/⍳⍴s←{⍵[⍋⍵]}x
d.+/(⍳1000)∘.=x

This is a problem encountered in July in a tuning project with a client. The solution using key is faster and shorter than the practical alternatives not using key. The outer product solution is shorter, but is not practical because its performance is abysmal.



Key: # of Occurrences II

   ?0
0.549364

   ?2 3⍴0
0.384692 0.331927 0.026112 
0.773553 0.784359 0.0516709

   {⍺(≢⍵)}⌸ ⌊5×?1e6⍴0
3 199876
4 200555
1 199937
2 199660
0 199972

In v14.0, ?0 returns a uniform random number between 0 and 1. Tally with key is useful for testing this new facility: If r is a unform random number between 0 and 1, then ⌊5×r should be uniformly distributed among ⍳5 . And so it is, for this sample.

In this use of key, the left argumentis a unique key and the right argumentare the indices having that key.



Extended

   a           b             a ⍳ b
John        Min           3 1 0 2 5 5
Mary        Mary  
Monika      John  
Min         Monika
Max         Mesut 
            Mesut 

Index-ofhas been extended to look for major cells of the left argument: scalars in a vector, vectors in a matrix, matrices in a 3-d array, and so on.

a⍳b is an efficient encoding of the relationship between a and b , and can be used to answer questions of identity about a and b . An example from the real world are ID numbers, Social Security Numbers in the US and National Insurance Numbers in the UK.

We will illustrate the workings of the extended index-of with an extended example, encountered this past September in working with a client.



Table

   x                  y                  x ⍳ y
┌──────┬─┬───┬──┐  ┌──────┬─┬───┬──┐   3 1 5 2 5 5
│John  │M│USA│26│  │Min   │F│CN │17│
├──────┼─┼───┼──┤  ├──────┼─┼───┼──┤      x ⍳ x
│Mary  │F│UK │24│  │Mary  │F│UK │24│   0 1 2 3 4
├──────┼─┼───┼──┤  ├──────┼─┼───┼──┤
│Monika│F│DE │31│  │John  │M│UK │26│      y ⍳ y
├──────┼─┼───┼──┤  ├──────┼─┼───┼──┤   0 1 2 3 4 4
│Min   │F│CN │17│  │Monika│F│DE │31│
├──────┼─┼───┼──┤  ├──────┼─┼───┼──┤
│Max   │M│IT │29│  │Mesut │M│DE │24│
└──────┴─┴───┴──┘  ├──────┼─┼───┼──┤
                   │Mesut │M│DE │24│
                   └──────┴─┴───┴──┘

A table is a set of values organized into rows and columns. The rows are records. Values in a column have the same type and shape. A table has a specified number of columns, but can have any number of rows.

The extended index-of on tables finds record indices.



Inverted Table

   tx                    x
┌──────┬─┬───┬──┐     ┌──────┬─────┬───┬──────────────┐
│John  │M│USA│26│     │John  │MFFFM│USA│26 24 31 17 29│
├──────┼─┼───┼──┤     │Mary  │     │UK │              │
│Mary  │F│UK │24│     │Monika│     │DE │              │
├──────┼─┼───┼──┤     │Min   │     │CN │              │
│Monika│F│DE │31│     │Max   │     │IT │              │
├──────┼─┼───┼──┤     └──────┴─────┴───┴──────────────┘
│Min   │F│CN │17│        ≢¨x         ⍪¨x                
├──────┼─┼───┼──┤     5 5 5 5     ┌──────┬─┬───┬──┐     
│Max   │M│IT │29│                 │John  │M│USA│26│
└──────┴─┴───┴──┘                 │Mary  │F│UK │24│
                                  │Monika│F│DE │31│
   invert ← {↑¨↓⍉⍵}               │Min   │F│CN │17│
   vert   ← {⍉↑⊂⍤¯1¨⍵}            │Max   │M│IT │29│
                                  └──────┴─┴───┴──┘
   x  ≡ invert tx
1
   tx ≡ vert x
1

An inverted table is a table with the values of a column collected together. Comma-bar each (⍪¨) applied to an inverted table makes it “look” more like a table. And of course the columns have the same tally.

A table can be readily inverted and vice versa. Notice the use of rank in the function vert .



Inverted Table: Space

   ⎕size 'tx' 'x'
920 272

   ii ← invert tt ← 1000⌿tx

   ⎕size 'tt' 'ii'
880040 55208
   ÷/ ⎕size 'tt' 'ii'
15.9404

A table has array overhead per element. An inverted table has array overhead per column. The difference that this makes becomes apparent when you have a sufficiently large number of rows.



Inverted Table: Time

   tt←?1e6 2⍴100
   tt[;0]←1e6⍴'abc'
   ii←invert tt

   cmpx '50>1⊃ii' '50>tt[;1]'
50>1⊃ii   → 9.61E¯4 |     0% ⎕⎕
50>tt[;1] → 1.42E¯2 | +1376% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   1.42e¯2 ÷ 9.61e¯4
14.7763

   ⎕size 'tt' 'ii'
80000040 2000112
   ÷/ ⎕size 'tt' 'ii'
39.9978

The other advantage of an inverted table is that column access is much faster. In this example, faster by a factor of 15.



Inverted Table: Index-Of  I

      ⍪¨x
┌──────┬─┬───┬──┐
│John  │M│USA│26│
│Mary  │F│UK │24│        x ⍳ y
│Monika│F│DE │31│     4 4 4 4
│Min   │F│CN │17│
│Max   │M│IT │29│        (vert x) ⍳ (vert y)
└──────┴─┴───┴──┘     3 1 5 2 5 5
      ⍪¨y
┌──────┬─┬───┬──┐
│Min   │F│CN │17│
│Mary  │F│UK │24│
│John  │M│UK │26│
│Monika│F│DE │31│
│Mesut │M│DE │24│
│Mesut │M│DE │24│
└──────┴─┴───┴──┘

An important computation is x index-of y where x and y are compatible inverted tables. Obviously, it can not be just x⍳y . The computation obtains by first “verting” the arguments (un-inverting the tables) and then applying , but often there is not enough space for that.



Inverted Table: Index-Of  II

(vert x)      ⍳ (vert y)
({⍉↑⊂⍤¯1¨⍵}x) ⍳ ({⍉↑⊂⍤¯1¨⍵}y)
(⍉↑⊂⍤¯1¨x)    ⍳ (⍉↑⊂⍤¯1¨y)
(⍉↑x⍳¨x)      ⍳ (⍉↑x⍳¨y)

   ⊂⍤¯1⊢x0←0⊃x
┌──────┬──────┬──────┬──────┬──────┐
│John  │Mary  │Monika│Min   │Max   │
└──────┴──────┴──────┴──────┴──────┘
   x0 ⍳ x0
0 1 2 3 4

   ⊂⍤¯1⊢y0←0⊃y
┌──────┬──────┬──────┬──────┬──────┬──────┐
│Min   │Mary  │John  │Monika│Mesut │Mesut │
└──────┴──────┴──────┴──────┴──────┴──────┘
   x0 ⍳ y0
3 1 0 2 5 5

We derive a more efficient computation of index-of on inverted tables. The indices obtain by first uninverting the tables, that is, by first applying vert . Replacing vert by its definition, we see that ⊂⍤¯1 plays a major role. ⊂⍤¯1 encloses, or alternatively computes a scalar representation. For purposes of index-of x⍳¨x and x⍳¨y are much more efficient representations (small integers) than ⊂⍤¯1¨x and ⊂⍤¯1¨y (the data itself).

The point is illustrated on column 0: For purposes of index-of, the integers have the same information as the boxed data.



Inverted Table: Index-Of  III

   ⍪¨x                       ⍉↑x⍳¨x
┌──────┬─┬───┬──┐
│John  │M│USA│26│         0 0 0 0
│Mary  │F│UK │24│         1 1 1 1
│Monika│F│DE │31│         2 1 2 2
│Min   │F│CN │17│         3 1 3 3
│Max   │M│IT │29│         4 0 4 4
└──────┴─┴───┴──┘
   ⍪¨y                       ⍉↑x⍳¨y
┌──────┬─┬───┬──┐
│Min   │F│CN │17│         3 1 3 3
│Mary  │F│UK │24│         1 1 1 1
│John  │M│UK │26│         0 0 1 0
│Monika│F│DE │31│         2 1 2 2
│Mesut │M│DE │24│         5 0 2 1
│Mesut │M│DE │24│         5 0 2 1
└──────┴─┴───┴──┘
   (vert x)⍳(vert y)         (⍉↑x⍳¨x) ⍳ (⍉↑x⍳¨y)
3 1 5 2 5 5               3 1 5 2 5 5

What results from ⍉↑x⍳¨x and ⍉↑x⍳¨y are integer matrices; index-of on these matrices produces the required record indices. The expression in red computes index-of on inverted tables.



Extended : Tool of Thought

{(⍉↑⍺⍳¨⍺)⍳(⍉↑⍺⍳¨⍵)}

MatIota←{(↓⍺)⍳↓⍵}
{(⍉↑⍺ MatIota¨⍺) MatIota (⍉↑⍺ MatIota¨⍵)}

{(⍉↑⍺{(↓⍺)⍳↓⍵}¨⍺) {(↓⍺)⍳↓⍵} (⍉↑⍺{(↓⍺)⍳↓⍵}¨⍵)}

The short Dfn replaced a 28-line tradfn in the client application, and is 20% faster. We propose to make it an idiom.

Without the extension to  the algorithm would have been harder to see. The extension to index-of makes it a better tool of thought. We need a similar extension for sort: you know, extended , find anything; extended sort, sort anything.

The Primitive Performance talk this afternoon will have more examples of using the new language facilities in v14.0.



created:  2013-10-06 10:00
updated:2013-10-22 07:25