Progressive Index-Of

⎕io=0 is assumed throughout.

A recent Forum post motivated investigations into the progressive index-of functions in the FinnAPL Idiom Library:

pix  ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)}   ⍝ FinnAPL Idiom 1
pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)}   ⍝ FinnAPL Idiom 5

In this note, we:

explain what is progressive index-of
explain why the two functions work
investigate the performance of the two functions
provide a more general solution

Progressive index-of is like index-of (⍳) except that each find “uses up” the target of that find. There are no duplicates in the result with the possible exception of ≢⍺ (for “not found”). Thus:

      x←'mississippi'
      y←'dismiss'

      x pix y
11 1 2 0 4 3 5

The following chart illustrates a step-by-step derivation of each progressive index:

0 1 2 3 4 5 6 7 8 9 10

m i s s i s s i p p  i      d i s m i s s
                           11
m i s s i s s i p p  i      d i s m i s s
                           11 1
m i s s i s s i p p  i      d i s m i s s
                           11 1 2
m i s s i s s i p p  i      d i s m i s s
                           11 1 2 0
m i s s i s s i p p  i      d i s m i s s
                           11 1 2 0 4
m i s s i s s i p p  i      d i s m i s s
                           11 1 2 0 4 3
m i s s i s s i p p  i      d i s m i s s
                           11 1 2 0 4 3 5

It is possible to compute the progressive index without looping or recursion, as the two FinnAPL functions demonstrate.

Why It Works

The basic idea of ⍺ pix ⍵ is to substitute for each item of ⍺ and ⍵ an equivalent representative, collectively c and d, whence the result obtains as c⍳d. The equivalent representative used here is ranking, specifically the ranking of the indices in ⍺.

The ranking of an array ⍵ is a permutation of order ≢⍵. The smallest major cell is assigned 0; the next smallest is assigned 1; and so on. Ties are resolved by favoring the earlier-occurring cell. The ranking can be computed by ⍋⍋⍵. For example:

      x ⍪ ⍉⍪ ⍋⍋x
m i s s i s  s i p p i
4 0 7 8 1 9 10 2 5 6 3

      y ⍪ ⍉⍪ ⍋⍋y
d i s m i s s
0 1 4 3 2 5 6

⍺ pix ⍵ works on two different rankings of indices in ⍺:

⍋⍋⍺⍳⍺,⍵ rankings of indices in ⍺ of ⍺ and ⍵, favoring ⍺
⍋⍋⍺⍳⍵,⍺ rankings of indices in ⍺ of ⍵ and ⍺, favoring ⍵

The first ⍴⍺ items of the former are those for ⍺ and the first ⍴⍵ of the latter are those for ⍵, and we get

pix ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)}

The second version depends on the following properties of permutations. Let p be a permutation. Then p[⍋p] ←→ ⍳≢p, the identity permutation, and therefore ⍋p is the inverse of p. Furthermore, p[p⍳⍳≢p] ←→ ⍳≢p and so p⍳⍳≢p is also the inverse of p. The inverse is unique (that’s why it’s called the inverse), therefore ⍋p ←→ p⍳⍳≢p.

      p←97?97         ⍝ a random permutation

      p[⍋p]    ≡ ⍳≢p
1
      p[p⍳⍳≢p] ≡ ⍳≢p
1
      (⍋p)     ≡ p⍳⍳≢p
1

The two rankings are permutations (because the leftmost functions are ⍋) and we just need the first ⍴⍺ items of the former and the first ⍴⍵ items of the latter. Thus:

pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)}

Performance

We note that both versions of pix contain the expressions ⍺⍳⍺,⍵ and ⍺⍳⍵,⍺, but the latter is just a rotation of the former. Thus:

pixb ← {i←⍺⍳⍺,⍵ ⋄ ((⍴⍺)⍴⍋⍋i) ⍳ ((⍴⍵)⍴⍋⍋(⍴⍺)⌽i)}
pixc ← {i←⍺⍳⍺,⍵ ⋄ ((⍋i)⍳⍳⍴⍺) ⍳ ((⍋(⍴⍺)⌽i)⍳⍳⍴⍵)}

Which is faster? The answer may surprise.

      x←?1e6⍴3e5
      y←?2e5⍴3e5

      cmpx 'x pixb y' 'x pixc y'
  x pixb y → 9.15E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  x pixc y → 9.21E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

A few factors about the Dyalog APL interpreter are relevant to this performance:

Computing ⍺⍳⍵,⍺ as a rotation of an already computed i←⍺⍳⍺,⍵ produces a worthwhile speed-up, although only on a relatively small part of the overall computation.

      i←x⍳x,y
      cmpx '(⍴x)⌽i' 'x⍳y,x'
  (⍴x)⌽i → 5.00E¯4 |     0% ⎕⎕                            
  x⍳y,x  → 7.19E¯3 | +1337% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Both ⍳ and ⍋ have special code for small range data.

      s←?1e6⍴5e5           ⍝ small range
      t←s ⋄ t[t⍳⌈/t]←2e9   ⍝ large range

      cmpx 's⍳s' 't⍳t'
  s⍳s → 5.87E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕                     
  t⍳t → 2.00E¯2 | +240% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      cmpx '⍋s' '⍋t'
  ⍋s → 3.25E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕     
  ⍋t → 3.84E¯2 | +18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

⍋⍵ has special code when ⍵ is a permutation.

      p←1e6?1e6           ⍝ p is a permutation
      q←p ⋄ q[999999]←⊃q  ⍝ q is not; both are small-range

      cmpx '⍋p' '⍋q'
  ⍋p → 5.81E¯3 |    0% ⎕⎕⎕                           
* ⍋q → 5.71E¯2 | +882% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

We saw previously that if p is a permutation then ⍋p ←→ p⍳⍳⍴p. The special code for ⍋p makes the two expressions run at roughly the same speed. The slight advantage for ⍋⍋x versus (⍋x)⍳⍳⍴x would increase if and when ⍋⍋ is recognized as an idiom.

      cmpx '⍋p' 'p⍳⍳⍴p'
  ⍋p    → 6.02E¯3 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕   
  p⍳⍳⍴p → 6.57E¯3 | +9% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      cmpx '⍋⍋x' '(⍋x)⍳⍳⍴x'
  ⍋⍋x      → 3.16E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  (⍋x)⍳⍳⍴x → 3.25E¯2 | +2% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

A General Solution

Index-of works on cells rather than just scalars. Likewise, progressive index-of can also be extended to work on cells. The core algorithm remains the same. The generalization obtains by first reshaping ⍵ to have the same rank as ⍺ (having major cells with the same shape), applying the core algorithm, and then reshaping its result to have the same leading shape as the original ⍵. Thus:

pixd←{
  m←≢⍺
  r←0⌊1-⍴⍴⍺
  n←×/r↓⍴⍵
  i←⍺⍳⍺⍪(n,1↓⍴⍺)⍴⍵
  (r↓⍴⍵) ⍴ ((⍋i)⍳⍳m) ⍳ ((⍋m⌽i)⍳⍳n)
}

   xx              yy
mmmm            dddd
iiii            iiii
ssss            ssss
ssss            mmmm
iiii            iiii
ssss            ssss
ssss            ssss
iiii     
pppp     
pppp                                  x
iiii                               mississippi

   ⍴xx             ⍴yy                y
11 4            7 4                dismiss

   xx pixd yy                         x pixd y
11 1 2 0 4 3 5                     11 1 2 0 4 3 5

   xx pixd 3 5 4⍴yy                   x pixd 3 5⍴y
11  1  2  0  4                     11  1  2  0  4
 3  5 11  7  6                      3  5 11  7  6
11 10 11 11 11                     11 10 11 11 11

Postscript
After having written the above, I discovered an alternative exposition on progressive index-of by Bob Smith entitled Anatomy of an Idiom. Adám Brudzewsky has produced a Stack Exchange lesson and a Jupyter Notebook based on Smith’s text.

There is also an exposition in J on the same topic, with a more verbose but easier-to-understand derivation.

2 thoughts on “Progressive Index-Of”

Henk Barkhof on October 23, 2018 at 08:52 said:

Very interesting.

1 remark. y ⍪ ⍉ ⍪ ⍋⍋y give the same as ⍋⍋y

Do I overlook something here ?
Roger Hui on October 23, 2018 at 09:11 said:

I don’t understand your comment.
y←'dismiss'
y ⍪ ⍉ ⍪ ⍋⍋y d i s m i s s 0 1 4 3 2 5 6
⍋⍋y 0 1 4 3 2 5 6

The two expressions don’t give the same results. (The first expression labels each item of ⍋⍋y for easier interpretation.)

The tool of thought for

software solutions

Contact Us

Progressive Index-Of