Changes of Heart

Karen Shaw started the ball rolling (hearts afluttering?) by asking Jay Foad to come up with a one-liner for St. Valentine’s Day; he then solicited contributions from the language development group. Nick Nickolov responded with the following, with no explanation other than that there is room for improvement:

⎕io←0 ⋄ (⊢,⌽)' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨0↓n-⍳(2×n),n←20]

           XXXXX        XXXXX           
         XXXXXXXXXX  XXXXXXXXXX         
        XXXXXXXXXXXXXXXXXXXXXXXX        
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
      XXXXXXXXXXXXXXXXXXXXXXXXXXXX      
      XXXXXXXXXXXXXXXXXXXXXXXXXXXX      
      XXXXXXXXXXXXXXXXXXXXXXXXXXXX      
      XXXXXXXXXXXXXXXXXXXXXXXXXXXX      
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
       XXXXXXXXXXXXXXXXXXXXXXXXXX       
        XXXXXXXXXXXXXXXXXXXXXXXX        
        XXXXXXXXXXXXXXXXXXXXXXXX        
        XXXXXXXXXXXXXXXXXXXXXXXX        
         XXXXXXXXXXXXXXXXXXXXXX         
         XXXXXXXXXXXXXXXXXXXXXX         
          XXXXXXXXXXXXXXXXXXXX          
           XXXXXXXXXXXXXXXXXX           
           XXXXXXXXXXXXXXXXXX           
            XXXXXXXXXXXXXXXX            
             XXXXXXXXXXXXXX             
             XXXXXXXXXXXXXX             
              XXXXXXXXXXXX              
               XXXXXXXXXX               
                XXXXXXXX                
                 XXXXXX                 
                   XX                   

Subsequently, Nick described how he derived the expression:

Between two meetings I saw Jay’s request for ideas. I sent him a link to the Heart Curve on MathWorld, but he thought we’d need graphics for that. To prove that ASCII art is good enough, I tried to plot the set of points (x,y) where (x2 + y2 – 1)3x2 y3 ≈ 0 but the interpreter crashed with a syserror for some reason. (I don’t recall what I last did and can’t reproduce it.) Since I had to type the whole thing again anyway and I hadn’t been too successful in visualising the solutions of the above equation, I decided to write it in a less beautiful but simpler way: draw half a circle, tilt it, and concatenate a mirror image to form a heart. The moment I got something that resembled a heart, I sent the email and went downstairs for coffee.

Meanwhile, I changed Nick’s expression in steps in the process of trying to understand it. ⎕io←0 throughout.

a.  (⊢,⌽)' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨0↓n-⍳(2×n),n←20]
Nick’s original expression.

b. ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨0↓n-⍳(2×n),n←20]
The fork (⊢,⌽) computes x,⌽x without use of a temporary variable. It is equivalent to ,∘⌽⍨.

c. ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨n-⍳(2×n),n←20]
The 0↓ is a no-op and is likely scaffolding left from the development process.

d. ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨n-⍳2 1×n←20]
(2×n),n←20 is equivalent to 2 1×n←20.

e. ,∘⌽⍨' X'[(.5×n*2)>{+/(⍺-.6×⍵)⍵*2}/¨n-⍳2 1×n←20]

The comparison with .5×n*2 is on the result of the {...} and is likely more efficient “outside” rather than explicitly applied to each scalar “inside”. More importantly, moving it outside avoids the use of a lexical global, an aesthetic infelicity (your opinion may differ).

At this step, it dawned on me that is applied to a 2-element vector, resulting in a 2×n by n matrix of 2-element integer vectors:

   ⍳2 1×n←20
┌───┬───┬───┬───┬───┬
│0 0│0 1│0 2│0 3│0 4│
├───┼───┼───┼───┼───┼
│1 0│1 1│1 2│1 3│1 4│ ...
├───┼───┼───┼───┼───┼
│2 0│2 1│2 2│2 3│2 4│
├───┼───┼───┼───┼───┼
│3 0│3 1│3 2│3 3│3 4│
├───┼───┼───┼───┼───┼
     ...

f. ,∘⌽⍨' X'[(.5×n*2)>⊃∘.{+/(⍺-.6×⍵)⍵*2}/n-⍳¨2 1×n←20]
The reduction on each 2-element vector is equivalent to an outer product; that is, f/¨⍳a,b is equivalent to ⊃∘.f/⍳¨a,b. Not much of a step forward but facilitates what comes next.

g. ,∘⌽⍨' X'[(.5×n*2)>(n-⍳2×n)∘.{+/(⍺-.6×⍵)⍵*2}n-⍳n←20]
Applying twice removes the reduction and makes the outer product more straightforward.

h. ,∘⌽⍨' X'[(n×*⍨.5)>(n-⍳2×n)∘.{|⍵+0j1×⍺-.6×⍵}n-⍳n←20]
The function {+/(⍺-.6×⍵)⍵*2} computes the sum-of-squares on ⍺-.6×⍵ and . It is equally the square of the magnitude of a complex number whose real and imaginary parts are the two expressions. The subsequent comparison to .5×n*2 has the same outcome if instead we compare the square root of both sides, that is, compare n×*⍨.5 and the magitude (not the square of the magnitude) of the complex number.

i. (⎕ucs 32 9829)[(n×*⍨.5)>i∘.{|⍵+0j1×⍺-.6×⍵}|i←n-⍳2×n←20]

A couple of ideas here: (0) Instead of using X in the picture, use the heart symbol, Unicode codepoint 9829. (1) Instead of creating half a heart and then stitching the two halves together with ,∘⌽⍨, the whole heart can be created using an appropriate right argument to the outer product.

This result differs slightly from the previous ones, but makes a prettier picture.

j. ' ♥'[(n×*⍨.5)>∘.(|1 ¯.6j1+.×,)∘|⍨n-⍳2×n←20]

I remembered that Unicode codepoints can be included directly in an APL expression, so ⎕ucs 32 9829 can be replaced by ' ♥'. (Much to my relief. Talk about lack of aesthetics!)

The arguments to the outer product are i and |i, and i can be eliminated altogether by doing f∘|⍨ instead of i f |i.

The function {|⍵+0j1×⍺-.6×⍵}, which computes the magnitude of a complex number, remains the same if the real and imaginary parts are switched, and in this case it is advantageous to switch them for brevity. Moreover, the computation can be coded as a train involving an inner product.

k. ' ♥'[(n×*⍨.5)>|∘.{⍺-⍵×.6j1}∘|⍨n-⍳2×n←20]
Sometimes, trains are better coded as dfns. The magnitude | is moved “outside”, and is unchanged if the complex coefficient is replaced by the negation of its conjugate and the operation changed from + to -.

l. ' ♥'[(n÷2*÷2)>|i∘.-.6j1×|i←n-⍳2×n←20]
The expression can be shortened by using the variable i after all (last seen in step i). Replace n×*⍨.5 by n÷2*÷2 and voilà, a retro expression without dfns, trains, or any of them new-fangled operators.

In summary

a.     (⊢,⌽)' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨0↓n-⍳(2×n),n←20]
b.     ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨0↓n-⍳(2×n),n←20]
c.     ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨n-⍳(2×n),n←20]
d.     ,∘⌽⍨' X'[{(.5×n*2)>+/(⍺-.6×⍵)⍵*2}/¨n-⍳2 1×n←20]
e.     ,∘⌽⍨' X'[(.5×n*2)>{+/(⍺-.6×⍵)⍵*2}/¨n-⍳2 1×n←20]
f.     ,∘⌽⍨' X'[(.5×n*2)>⊃∘.{+/(⍺-.6×⍵)⍵*2}/n-⍳¨2 1×n←20]
g.     ,∘⌽⍨' X'[(.5×n*2)>(n-⍳2×n)∘.{+/(⍺-.6×⍵)⍵*2}n-⍳n←20]
h.     ,∘⌽⍨' X'[(n×*⍨.5)>(n-⍳2×n)∘.{|⍵+0j1×⍺-.6×⍵}n-⍳n←20]
i.     (⎕ucs 32 9829)[(n×*⍨.5)>i∘.{|⍵+0j1×⍺-.6×⍵}|i←n-⍳2×n←20]
j.     ' ♥'[(n×*⍨.5)>∘.(|1 ¯.6j1+.×,)∘|⍨n-⍳2×n←20]
k.     ' ♥'[(n×*⍨.5)>|∘.{⍺-⍵×.6j1}∘|⍨n-⍳2×n←20]
l.     ' ♥'[(n÷2*÷2)>|i∘.-.6j1×|i←n-⍳2×n←20]

Zero-length Regular Expression Matches Considered Harmful

I was asked by a colleague why ⎕S reports two matches in the following example:

      ('\d*'⎕S 0 1)'321'
┌───┬───┐
│0 3│3 0│
└───┴───┘

Here we are asking for the position and length of sequences of zero or more digits in an input document containing three numeric characters. Intuitively there is just one match of all three characters from the start of the sequence, but ⎕S reports an additional match of zero characters at the end.

The short explanation is this: when there is a match in the input, the matching characters are “consumed” and searching continues from after them. In this example the first match consumes all three characters in the input, leaving it empty. Searching then resumes and, as the pattern matches the zero remaining characters, there is a second match.

If it seems odd that we search when the input is empty then consider the case where the input is empty to start with:

      ('\d*'⎕S 0 1)''
┌───┐
│0 0│
└───┘

In this case we should not be surprised there is a match – the pattern explicitly allows it and, indeed, we would have coded the search pattern differently, perhaps as '\d+', if that was not wanted. It is therefore correct and consistent that the first example matches zero characters as well.

My colleague was not yet convinced. In this case, he asked, why are there not an infinite number of matches at the end?

It’s a good question. Zero-length matches have to be treated specially by the search engine to prevent exactly that, wherever they appear. The rule used by the PCRE search engine, which Dyalog uses, is that any pattern that results in a zero-length match is prohibited from generating another zero-length match until more characters are consumed from the input. PCRE is widely used so this behaviour is commonplace, but other search engine implementations take a different view and simply consume a single character following a zero-length match to prevent the repetition.

That difference in the rules would not have affected our example but consider a slightly more complex search pattern containing alternatives – here, zero or more digits, or one word character:

      ('\d*|\w'⎕S 0 1)'x321'
┌───┬───┬───┬───┐
│0 0│0 1│1 3│4 0│
└───┴───┴───┴───┘

In this example the first match is of zero digits at the start of the input – that is, at offset zero and zero characters long. PCRE then resumes searching at offset zero but without allowing further zero-length matches. Consequently the first alternative pattern cannot match this time but the second can. The next match is therefore a single word character (the 'x'), also at the start of the input. The 'x' is consumed and searching resumes with '321' as the input where, as explained above, there are two further matches making four in total.

Search engines which simply consume a character after a zero-length match would give a different result. After the same initial match of zero characters at the start of the input they would consume the 'x' and resume searching at '321' to give two further matches and a total of only three.

So, patterns which allow zero-length matches appear to be counter-intuitive and the cause of incompatibilities with other search engines. Should we avoid '*' and other qualifiers which allow zero repetitions of a pattern? Of course not – used carefully they can be very effective.

Firstly, a pattern which allows a sequence of zero characters within it can still be constructed so that the overall match can never have a length of zero. For example, 'colou?r' allows zero or one 'u' characters after the second 'o' so that it matches both the British English spelling 'colour' and the American English spelling 'color' – but it never results in a zero-length match, and should never surprise anyone in the way it works.

Secondly, anchors can often be used to prevent those unintuitive extra matches. Anchors in search patterns don’t match actual characters – rather, they allow the match to succeed only at specific locations in the input. '^' and '$' mean the start and end of a line respectively which, used in our very first example, prevent anything from preceding or following a match within a line and exclude the zero-length sequence because it does not satisfy that requirement:

      ('^\d*$'⎕S 0 1)'321'
┌───┐
│0 3│
└───┘

Thirdly – it often doesn’t matter that zero-length matches occur. In this example the search pattern matches any character except newline zero or more times and we are asking for it to be folded to upper-case:

      ('.*' ⎕R '\u&') 'Hello there'
HELLO THERE

The first match is of all 11 characters which are folded to upper-case and replaced. Then is there a second match of zero characters at the end and this too is folded and replaced, having no further effect. You can see this is so by using a different replacement pattern:

      ('.*' ⎕R '[\u&]') 'Hello there'
[HELLO THERE][]

As an aside, a ⎕R search pattern constructed so that it has no visible effect at all can be very useful for doing a quick and simple read of a text file into the workspace. In this example:

      tieno←⎕NTIE 'input.txt'
      ''⎕R''⊢tieno

text is read from the file without modification into a vector of character vectors, one line per element. There are many search and replace patterns that could be used to do this but this is the shortest, and it makes use of zero-length matches and replacements.

Perhaps, after all, the title of this blog should have been “zero-length regex matches considered helpful“?

Further Reading

Further information about the Dyalog search and replace operators is available in the following documents:

Footnote

Edsger W. Dijkstra wrote a paper in 1968 for Communications of the ACM entitled a “Case against the GO TO Statement” but it was published as a letter under the title “The goto statement considered harmful”. “Considered harmful” became a popular phrase in articles written in response to this and in other unrelated papers. There is an article about the phrase in Wikipedia.

Dijkstra also wrote a 1975 paper in which he criticised a number of computer languages he had used. Of APL he said, “APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums”.

This author is not afraid to use goto statements or APL.

Solving the 2014 APL Problem Solving Competition – Cryptography Problem 3

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition. In this post we’ll conclude looking at the cryptography problems from Phase II that we started looking at in a previous blog post and continued in a further blog post.

Cryptography Problem 3 – Playfair Cipher

Task 1 – Squaring Off

The first task is to convert a string into a 5×5 Playfair table. The solution makes straightforward use of APL primitives:

  • Unique () to remove duplicate characters from the string.
  • Without (~) to find the rest of the alphabetic characters that are not mentioned in the string, and again to remove the character J.
PlayfairTable←{
    k←∪⍵
    5 5⍴(k,⎕A~k)~'J'
}

Here it is in action:

      ⊢table←PlayfairTable'KENNETHEIVERSON'
KENTH
IVRSO
ABCDF
GLMPQ
UWXYZ

Task 2 – Encryption

To encrypt a message we need to take two characters at a time, find their coordinates in the 5×5 Playfair table, swap their column coordinates and look up the letters at the new coordinates. There are lots of tricks in the code below of which we’ll describe just a few:

  • To process the message two characters at a time, appending to the result as we go, we use a tail-recursive dfn whose left argument accumulates the result. For an introduction to this technique see this implementation of the Fibonacci function.
  • To find letters in the Playfair table we first look them up in the ravel (i.e. the linearised form) of the table, and then use Encode () to convert this linear index into a pair of coordinates. Decode () does the opposite, converting a pair of coordinates back into a linear index.
  • Mix () and Split () are used to convert between two different representations of the coordinates of the two characters we’re encoding: either as a flat 2×2 matrix, or as a nested 2-vector of 2-vectors. The choice of representation is largely a matter of taste, and it might be fun to play with this part of the code. You could tweak it to work entirely with the flat representation, or entirely with the nested representation, rather than converting back and forth between them.
PlayfairEncrypt←{
    ⎕IO←0                       ⍝ To aid arithmetic modulo 5.
    t←⍺                         ⍝ The Playfair table.
    m←⍵,(2|≢⍵)/'Z'              ⍝ Add a Z if the message length is odd.
    ((m='J')/m)←'I'             ⍝ Convert J to I in the message.
    ''{                         ⍝ Start with an empty accumulator.
        0=≢⍵:⍺                  ⍝ Finished: return the accumulated encrypt.
        1=≢⍵:⍺ ∇ ⍵,'X'          ⍝ Only one character left: add an X.
        p←2↑⍵
        =/p:⍺ ∇(⊃⍵),'X',1↓⍵     ⍝ Duplicate character: insert an X.
        c←(⍴t)⊤(,t)⍳p           ⍝ Coords of each letter in the table.
        c←{
            =/1↑⍵:↑(⍴t)|0 1+↓⍵  ⍝ Same row: move to the right.
            =/1↓⍵:↑(⍴t)|1 0+↓⍵  ⍝ Same column: move down.
            0 1⌽⍵               ⍝ Else swap column coords.
        }c
        p←(,t)[(⍴t)⊥c]          ⍝ Look up new coords in table.
        (⍺,p)∇ 2↓⍵              ⍝ Tail recursion.
    }m
}

Here it is in action:

      ⊢cipher←table PlayfairEncrypt'HELLOWORLD'
KNMWQVZVVMCY

(Note that this result is slightly different from that given in the original problem description, because of some confusion about the rules for handling duplicate letters and odd message lengths, which were clarified later in the student competition forums.)

Task 3 – Decryption

Decryption is very similar to encryption. The differences are:

  • Letters found in the same row of the table need to move left instead of right, and letters in the same column need to move up instead of down.
  • We don’t need to worry about the input having an odd length, or containing the letter J.

Hence the code for PlayfairDecrypt is shorter than but very similar to PlayfairEncrypt:

PlayfairDecrypt←{
    ⎕IO←0                       ⍝ To aid arithmetic modulo 5.
    t←⍺                         ⍝ The Playfair table.
    ''{                         ⍝ Start with an empty accumulator.
        0=≢⍵:⍺                  ⍝ Finished: return the accumulated encrypt.
        p←2↑⍵
        c←(⍴t)⊤(,t)⍳p           ⍝ Coords of each letter in the table.
        c←{
            =/1↑⍵:↑(⍴t)|0 ¯1+↓⍵ ⍝ Same row: move to the left.
            =/1↓⍵:↑(⍴t)|¯1 0+↓⍵ ⍝ Same column: move up.
            0 1⌽⍵               ⍝ Else swap column coords.
        }c
        p←(,t)[(⍴t)⊥c]          ⍝ Look up new coords in table.
        (⍺,p)∇ 2↓⍵              ⍝ Tail recursion.
    }⍵
}

Here it is in action:

      table PlayfairDecrypt cipher
HELXLOWORLDX

Solving the 2014 APL Problem Solving Competition – Cryptography Problem 2

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition. In this post we’ll continue looking at the cryptography problems from Phase II that we started looking at in a previous blog post.

Cryptography Problem 2 – Book Cipher Variation

Task 1 – Let’s Get Normal

The first task is to normalise some text by weeding out non-alphabetic characters, collapsing consecutive spaces and converting to upper case. It’s possible to do this all by hand with APL, but it’s much easier to use the ⎕R operator to search for and replace regular expressions.

Taking the transformations in turn, here’s how to convert non-alphabetic characters in message to spaces:

('[^[:alpha:]]'⎕R' ')⍵

Here’s how to convert multiple consecutive spaces to a single space:

(' +'⎕R' ')⍵

And here’s how to convert every alphabetic character to upper case:

('.'⎕R'\u&')⍵

We can combine the first two of these, by converting any sequence of one or more non-alphabetic characters to a single space, giving the following implementation:

Normalise←{
    text←⎕SE.UnicodeFile.ReadText ⍵
    ('[^[:alpha:]]+' '.'⎕R' ' '\u&'⍠'Mode' 'D')text
}

The option 'Mode' 'D' tells ⎕R to operate in Document mode, which processes the whole file at once instead of line by line, as we are not interested in preserving the original line breaks. Here it is in action:

      70↑bor←Normalise'/home/jay/Desktop/BillOfRights.txt'
THE PREAMBLE TO THE BILL OF RIGHTS CONGRESS OF THE UNITED STATES BEGUN

Task 2 – Encryption

In this cipher there are lots of different ways of encoding each character of the message, and we are free to pick any of them. In order to try to “minimise the number of duplicated pairs in the result”, we simply pick randomly whenever we have a free choice. The function pickone helps with this. Given a boolean vector, it first uses {⍵/⍳⍴⍵} to get a vector of the indices of all the 1 bits, and then uses {⍵[?≢⍵]} to choose one of these indices at random.

In this coding of BookEncrypt, the anonymous inner dfn encodes a single character of the message into a (word offset) pair. These pairs are joined together with ⊃,/, a common pattern for catenating strings. The Disclose is required because, in Dyalog, reduction always reduces the rank of its argument, so ,/ on a vector of strings returns a scalar: the enclose of the catenated strings.

BookEncrypt←{
    pickone←{⍵[?≢⍵]}∘{⍵/⍳⍴⍵}
    b←' ',⍺                     ⍝ b has a space wherever a word starts in the key.
    s←{⍵/⍳⍴⍵}b=' '              ⍝ Get the indices of all the word starts.
    ⊃,/{
         p←pickone b=⍵          ⍝ Choose a random occurrence of letter ⍵
         p-←s                   ⍝ and get its offset within each word.
         w←pickone{(0<⍵)∧⍵≤20}p ⍝ Choose a word with a reasonable offset
         w(p[w])                ⍝ and return the (word offset) pair.
    }¨⍵                         ⍝ ... for each letter in the message.
}

Here it is in action:

      ⊢cipher←bor BookEncrypt 'MYSECRETMESSAGE'
480 11 523 11 440 6 115 5 78 16 579 18 696 20 330 16 544 4 658 17 400 9 661 11
      246 18 186 4 482 13

Task 3 – Decryption

Decryption is simpler then encryption, because there is no need to make random choices. All we have to do is:

  • Find the index of the start of each word in the key, as before.
  • Split the input into pairs of numbers.
  • For each pair, find the character in the key at the specified offset from the start of the specified word.

There are various ways to split the input into pairs of numbers. Here, we do it with the Rank operator (). Encrypting an N-character message gives a vector of 2×N numbers. To split it into pairs we first reshape it into a matrix with N rows and 2 columns; and then use f⍤1 to apply f to the rank-1 subarrays of this matrix, which are its row vectors.

Here’s the code:

BookDecrypt←{
    b←' ',¯1↓⍺                  ⍝ b has a space wherever a word starts in the key.
    s←{⍵/⍳⍴⍵}b=' '              ⍝ Get the indices of all the word starts.
    {
        (w o)←⍵                 ⍝ Get word number and offset
        b[s[w]+o]               ⍝ and find the character at that position
    }⍤1⊢(0.5×≢⍵)2⍴⍵             ⍝ ... for each pair of numbers in the input.
}

And here it is in action:

      bor BookDecrypt cipher
MYSECRETMESSAGE

To be continued…

Solving the 2014 APL Problem Solving Competition – Cryptography Problem 1

This post is a continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition. I’ll start by looking at the cryptography problems from Phase II.

Cryptography Problem 1 – Vigenère Cipher

The cipher is described using a large table of letters, but you don’t need to create this table in APL. Instead, you can convert each letter of the key and the corresponding letter of the message to numbers; add them together; and then convert the result back to a letter. Some subtleties:

  • We want the result of the addition to “wrap around” at the end of the alphabet, so 25 + 1 = 26 (Z) but 25 + 2 = 27 which should wrap round to 1 (A). This is called modular arithmetic.
  • We can implement modular arithmetic easily using APL’s Residue primitive, but this works most naturally when the alphabet is numbered from 0 to 25, rather than from 1 to 26. To use 0-based numbering, I’ll set ⎕IO←0 in the code below.
  • An idiomatic way of converting letters to numbers is to look them up in the ⎕A, the upper case alphabet: ⎕A⍳⍵. To convert the other way we can simply index into ⎕A: ⎕A[⍵]. (For more comprehensive conversions between characters and numbers see the ⎕UCS system function.)

The code is then straightforward:

VigEncrypt←{
    ⎕IO←0                       ⍝ use 0-based numbering
    key←(⍴⍵)⍴⎕A⍳⍺               ⍝ convert key to numbers and expand to length of message
    ⎕A[26|(⎕A⍳⍵)+key]           ⍝ add key to message and convert back to characters
}

To encrypt the message we added the key (modulo 26). To decrypt, we need to subtract the key (modulo 26) instead, so we can go from VigEncrypt to VigDecrypt just by changing one character in the code. Note that the result of the subtraction might be negative, but Residue (unlike the remainder or modulo operations in some other programming languages you may have used) will still give us the correct result in the range 0 to 25. For example, 26|¯3 is 23.

VigDecrypt←{
    ⎕IO←0                       ⍝ use 0-based numbering
    key←(⍴⍵)⍴⎕A⍳⍺               ⍝ convert key to numbers and expand to length of message
    ⎕A[26|(⎕A⍳⍵)-key]           ⍝ subtract key from message and convert back to characters
}

Here are the functions in action:

      key←'KEIVERSON'
      key VigEncrypt'APLISFUNTOUSE'
KTTDWWMBGYYAZ
      key VigDecrypt key VigEncrypt'APLISFUNTOUSE'
APLISFUNTOUSE

To be continued…

The Diamond Kata

Acknowledgments

Morten Kromberg is the other co-author of this text but the blogging software prevents his being listed as such.

We are indebted to Jay Foad, Nick Nikolov, John Scholes, and Fiona Smith for comments on successive drafts of the MS.

The Problem

The diamond kata is a programming exercise used in the agile development, XP and TDD communities. [0, 1, 2, 3]. It first came to our attention in 2012 in discussions with Gianfranco Alongi of Ericcson AB, Sweden, and again more recently in [3]. The following problem description is from [2] (the description varies slightly in each of the above references).

We’re going to write a program that builds a diamond picture like this, for some set of letters from A to whatever:

--A--
-B-B-
C---C
-B-B-
--A--

Solutions in Dyalog APL

We are actually going to solve a slightly different problem: the function will have two arguments. The left argument is a scalar of the background element; the right argument is a vector of the “letters” to be used. The result described in the opening section can be produced as:

   '-' dia 'ABCD'
---A---
--B-B--
-C---C-
D-----D
-C---C-
--B-B--
---A---

(In an APL session, what you enter is indented and the resultant output is aligned at the left margin. As well, in this text, we sometimes present the indented-input/output pairs across the page.)

Making the argument the actual letters rather than the number of letters or the last letter, facilitates working with “alphabets” other than A-Z. For example:

   0 dia 3 1 4 2
0 0 0 3 0 0 0
0 0 1 0 1 0 0
0 4 0 0 0 4 0
2 0 0 0 0 0 2
0 4 0 0 0 4 0
0 0 1 0 1 0 0
0 0 0 3 0 0 0

The diamond result is readily seen to be composed of reflections of one of the quadrants. For example, '-' dia 'ABCD' (shown above) is composed of reflections of

A---
-B--
--C-
---D

The functions (reverse last axis) and (reverse first axis) provide the requisite reflections. If q is the quadrant shown above, then:

   ⌽q       q
---A     A---
--B-     -B--
-C--     --C-
D---     ---D

   ⌽⊖q      ⊖q
D---     ---D
-C--     --C-
--B-     -B--
---A     A---

There are various ways to stitch the quadrants together (and to drop the common middle row or column) to get the required result. The following is a terse way:

   f←{⍉⍵⍪1↓⊖⍵}

   f ⌽q          f f ⌽q        f⍣2 ⌽q
---D---       ---A---       ---A---
--C-C--       --B-B--       --B-B--
-B---B-       -C---C-       -C---C-
A-----A       D-----D       D-----D
              -C---C-       -C---C-
              --B-B--       --B-B--
              ---A---       ---A---

The same pattern (metakata?) of f⍣2 involving can be used to compute the minors of a matrix [4].

It remains to produce the quadrant q. We note that it is a diagonal matrix and resembles the identity matrix for which creation many techniques are known. We use two of the 34 different methods in [5].

The first method: For a vector with n letters, if each vector is followed by n copies of the background element, then the 2⍴n reshape of that rotates each row an additional step to the right, and the result is the required diagonal matrix.

   n←≢v←'ABCD'

   v,'-'⍴⍨s←2⍴n           s ⍴ v,'-'⍴⍨s←2⍴n
A----                  A---
B----                  -B--
C----                  --C-
D----                  ---D

The second method: for the proposed merge operator [6] ⍺ f merge ⍵ is an array like with at positions selected by f. As such, it is a functional version of selective assignment. Merge with 1 1∘⍉ as the operand does the trick.

   4 4⍴⍳16                1 1∘⍉ 4 4⍴⍳16
 1  2  3  4            1 6 11 16
 5  6  7  8
 9 10 11 12
13 14 15 16

   '-'⍴⍨2⍴n              'ABCD' (1 1∘⍉merge) '-'⍴⍨2⍴n
----                   A---
----                   -B--
----                   --C-
----                   ---D

Putting it together:

dia  ← {{⍉⍵⍪1↓⊖⍵}⍣2⌽s⍴⍵,⍺⍴⍨s←2⍴≢⍵}
diam ← {{⍉⍵⍪1↓⊖⍵}⍣2⌽⍵(1 1∘⍉merge)⍺⍴⍨2⍴≢⍵}


   '-' dia 'ABCD'         0 dia 3 1 4 2
---A---                0 0 0 3 0 0 0
--B-B--                0 0 1 0 1 0 0
-C---C-                0 4 0 0 0 4 0
D-----D                2 0 0 0 0 0 2
-C---C-                0 4 0 0 0 4 0
--B-B--                0 0 1 0 1 0 0
---A---                0 0 0 3 0 0 0

   '-' diam 'ABCD'        0 diam 3 1 4 2
---A---                0 0 0 3 0 0 0
--B-B--                0 0 1 0 1 0 0
-C---C-                0 4 0 0 0 4 0
D-----D                2 0 0 0 0 0 2
-C---C-                0 4 0 0 0 4 0
--B-B--                0 0 1 0 1 0 0
---A---                0 0 0 3 0 0 0

Further Examples

   ↑∘⎕a¨ 0 1 5 3           ⍝ four prefixes of the alphabet
┌┬─┬─────┬───┐
││A│ABCDE│ABC│
└┴─┴─────┴───┘
   '.' dia¨ ↑∘⎕a¨ 0 1 5 3  ⍝ apply dia to each prefix
┌┬─┬─────────┬─────┐
││A│....A....│..A..│
││ │...B.B...│.B.B.│
││ │..C...C..│C...C│
││ │.D.....D.│.B.B.│
││ │E.......E│..A..│
││ │.D.....D.│     │
││ │..C...C..│     │
││ │...B.B...│     │
││ │....A....│     │
└┴─┴─────────┴─────┘
   '.' {{⍉⍵⍪1↓⊖⍵}⍣2⌽s⍴⍵,⍺⍴⍨s←2⍴≢⍵}¨ ↑∘⎕a¨ 0 1 5 3
┌┬─┬─────────┬─────┐
││A│....A....│..A..│
││ │...B.B...│.B.B.│
││ │..C...C..│C...C│
││ │.D.....D.│.B.B.│
││ │E.......E│..A..│
││ │.D.....D.│     │
││ │..C...C..│     │
││ │...B.B...│     │
││ │....A....│     │
└┴─┴─────────┴─────┘

The last input expression is standalone (replacing dia in the penultimate input expression with its definition) and you can try it in a http://tryapl.org session.

Testing

Tests are often descriptive rather than prescriptive, asserting what a result should be rather than how to compute it. For that reason they tend to be more robust and easier to develop than the actual function. For example, it is much easier to test that r is a root of a polynomial than to compute it, or to check that S is a solution to an NP-complete problem than to derive it.

In this instance, we did not write the tests before writing the diamond functions, but we easily could have. The same expressive power in APL that enabled terse solutions of the diamond kata also enabled their concise and comprehensive testing.

testd d checks that d is a correct result of a diamond function, and returns a 1 if it is. Like the diamond functions, testd treats d in its totality, as an object (matrix) rather than as individual lines. This treatment allows straightforward statement of properties that would be more laborious otherwise. (For example, d≡⊖d and d≡⌽d assert that d is symmetric in the horizontal and vertical axes.)

assert←{⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕signal 8 ⋄ shy←0}

testd←{                   ⍝ test the result of a diamond function
 assert (2=⍴⍴⍵)∧=/⍴⍵:     ⍝ square matrix
 assert (0=≢⍵)∨1=2|≢⍵:    ⍝ dimensions are 0 or odd
 assert ⍵≡⌽⍵:             ⍝ symmetric in vertical   axis
 assert ⍵≡⊖⍵:             ⍝ symmetric in horizontal axis
 n←⌈2÷⍨≢⍵                 ⍝ number of input "letters"
 q←(n,-n)↑⍵               ⍝ upper right quadrant
 assert (q=⍬⍴⍵)∨∘.=⍨⍳n:   ⍝ background or diagonal
 1                        ⍝ 1 iff everything OK
}

   testd '-' dia 'ABCD'
1

   testd 0 dia 3 1 4 2
1

If a check fails (or if there is an outright programming error), execution suspends at the first line with the error. While suspended an interactive debugging environment is provided for investigating the problem.

   testd 'ABCD'
assertion failure
testd[1] assert(2=⍴⍴⍵)∧=/⍴⍵: ⍝ square matrix
     ∧
   ⍵
ABCD
   ⍴⍵
4
   ⍴⍴⍵
1

testd is not a complete test, because it does not “know” what the specified background element or what the “letters” were. The operator T provides a complete test:

T←{                       ⍝ test a diamond function ⍺⍺
 d←⍺ ⍺⍺ ⍵                 ⍝ result of diamond function
 assert testd d:          ⍝ checks not knowing arguments
 n←≢⍵                     ⍝ number of input "letters"
 assert (⍴d)≡0 0⌈¯1+2×n:  ⍝ dimensions are 0 or ¯1+2×n
 assert (1≥n)∨⍺=⍬⍴d:      ⍝ background element
 assert ⍵≡1 1⍉(n,-n)↑d:   ⍝ "letters" in diagonal of quadrant
 1                        ⍝ 1 iff everything OK
}

⍺ dia T ⍵ checks that ⍺ dia ⍵ produces a correct result, and returns a 1 if it does (and suspends at the first line of error if it does not).

   '-' dia T 'ABCD'         '-' diam T 'ABCD'
1                        1

   0 dia T 3 1 4 2          0 diam T 3 1 4 1 5 9
1                        1

The following expression tests '-' dia x where x is a prefix of the alphabet ⎕a, from 0 to 26 letters:

   '-' (dia T)∘(⍴∘⎕a)¨ ¯1+3 9⍴⍳27
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1

The following expression tests 0 dia n?n for n from 0 to 99. The “letters” are n?n, a random permutation of order n:

   0 (dia T)∘(?⍨)¨ 10 10⍴¯1+⍳100
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

References

  1. Rose, Seb, Recycling Tests in TDD, Claysnow Blog, 2014-11-23.
  2. Cockburn, Alistair, Thinking Before Programming, Alistair Cockburn Blog, 2014-11-29.
  3. Jeffries, Ron, TDD on the Diamond Problem, RonJeffries.com Blog, 2014-11-29.
  4. Rusk, John, An Experiment in Think-First Development, AgileWiki Blog, 2014-12-01.
  5. Hui, Roger, and Kenneth E. Iverson, J Introduction and Dictionary,1990-2014; \. entry
  6. Hui, Roger, Identity Matrix, Jwiki Essay, 2005-11-18.
  7. Scholes, John, Merge, Dfns, 2012-03-30.
  8. Legrand, Bernard, Mastering Dyalog APL, 2009-11.

NOTE: A full description and tutorial of the language can be found in [7].