This post continues the series in which we examine some of the problems selected for the 2014 APL Problem Solving Competition. Here we return to the Phase II cryptography problems that we started on in a previous blog post.
Cryptography Problem 2 – Book Cipher Variation
Task 1 – Let’s Get Normal
The first task is to normalise some text by replacing non-alphabetic characters with spaces, collapsing consecutive spaces and converting to upper case. It’s possible to do all of this by hand in APL, but it’s much easier to use the ⎕R operator to search for regular expressions and replace the matches.
Taking the transformations in turn, here’s how to convert non-alphabetic characters in message ⍵ to spaces:
('[^[:alpha:]]'⎕R' ')⍵
Here’s how to convert multiple consecutive spaces to a single space:
(' +'⎕R' ')⍵
And here’s how to convert every character to upper case (anything that is not a lower-case letter is left unchanged):
('.'⎕R'\u&')⍵
We can combine the first two of these, by converting any sequence of one or more non-alphabetic characters to a single space, giving the following implementation:
Normalise←{
text←⎕SE.UnicodeFile.ReadText ⍵
('[^[:alpha:]]+' '.'⎕R' ' '\u&'⍠'Mode' 'D')text
}
The option 'Mode' 'D' tells ⎕R to operate in Document mode, which processes the whole file at once instead of line by line. We are not interested in preserving the original line breaks, and in this mode a run of non-alphabetic characters that spans a line break collapses into a single space. Here it is in action:
70↑bor←Normalise'/home/jay/Desktop/BillOfRights.txt'
THE PREAMBLE TO THE BILL OF RIGHTS CONGRESS OF THE UNITED STATES BEGUN
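To experiment with the regular expressions without a file on disk, the same ⎕R call can be applied directly to a character vector. In this minimal sketch, NormaliseText is a hypothetical name that is not part of the solution above, and the sample text is made up:

```apl
      ⍝ Same search and replace as in Normalise, minus the file read.
      NormaliseText←{('[^[:alpha:]]+' '.'⎕R' ' '\u&'⍠'Mode' 'D')⍵}
      NormaliseText 'Amendment I: Congress shall   make no law'
AMENDMENT I CONGRESS SHALL MAKE NO LAW
```

The colon and the run of spaces each become a single space, and every remaining character is folded to upper case.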
Task 2 – Encryption
In this cipher there are lots of different ways of encoding each character of the message, and we are free to pick any of them. In order to try to “minimise the number of duplicated pairs in the result”, we simply pick randomly whenever we have a free choice. The function pickone helps with this. Given a boolean vector, it first uses {⍵/⍳⍴⍵} to get a vector of the indices of all the 1 bits, and then uses {⍵[?≢⍵]} to choose one of these indices at random.
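The two halves of pickone can be tried separately in the session. This is only a sketch: it assumes the default ⎕IO←1, and where is an illustrative name for the first dfn rather than something the solution defines:

```apl
      where←{⍵/⍳⍴⍵}              ⍝ indices of the 1 bits
      where 0 1 0 1 1
2 4 5
      pickone←{⍵[?≢⍵]}∘where     ⍝ then pick one of those indices
      pickone 0 1 0 1 1          ⍝ returns 2, 4 or 5, chosen at random
```

Because ? is random, repeated calls to pickone on the same argument will generally give different answers.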
In this coding of BookEncrypt, the anonymous inner dfn encodes a single character of the message into a (word offset) pair. These pairs are joined together with ⊃,/, a common pattern for catenating strings. The Disclose is required because, in Dyalog, reduction always reduces the rank of its argument, so ,/ on a vector of strings returns a scalar: the enclose of the catenated strings.
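The effect of the reduction on rank can be checked on a small nested vector. The pairs below are simply the first three from a sample run and serve only as illustration:

```apl
      pairs←(480 11)(523 11)(440 6)
      ⍴⍴,/pairs       ⍝ rank of the reduction: 0, so it is a scalar
0
      ≡,/pairs        ⍝ depth 2: the scalar encloses a simple vector
2
      ⊃,/pairs        ⍝ Disclose recovers the flat vector
480 11 523 11 440 6
```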
BookEncrypt←{
pickone←{⍵[?≢⍵]}∘{⍵/⍳⍴⍵}
b←' ',⍺ ⍝ b has a space wherever a word starts in the key.
s←{⍵/⍳⍴⍵}b=' ' ⍝ Get the indices of all the word starts.
⊃,/{
p←pickone b=⍵ ⍝ Choose a random occurrence of letter ⍵
p-←s ⍝ and get its offset within each word.
w←pickone{(0<⍵)∧⍵≤20}p ⍝ Choose a word with a reasonable offset
w(p[w]) ⍝ and return the (word offset) pair.
}¨⍵ ⍝ ... for each letter in the message.
}
Here it is in action:
⊢cipher←bor BookEncrypt 'MYSECRETMESSAGE'
480 11 523 11 440 6 115 5 78 16 579 18 696 20 330 16 544 4 658 17 400 9 661 11
246 18 186 4 482 13
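To see where the free choice of word comes from, it may help to trace the offset arithmetic on a tiny key. This sketch uses a made-up key and assumes ⎕IO←1. It follows the same steps as the inner dfn, except that the position p is fixed by hand instead of being chosen at random:

```apl
      b←' ','THE CAT SAT'        ⍝ hypothetical key with a leading space
      s←{⍵/⍳⍴⍵}b=' '            ⍝ position of the space before each word
      s
1 5 9
      p←11                       ⍝ position in b of the A in SAT
      p-s                        ⍝ offset of that A from each word start
10 6 2
      {⍵/⍳⍴⍵}{(0<⍵)∧⍵≤20}p-s    ⍝ words whose offset is usable
1 2 3
```

With this key the same A could be encoded as (1 10), (2 6) or (3 2), since in this implementation an offset is allowed to run past the end of its word as long as it stays between 1 and 20.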
Task 3 – Decryption
Decryption is simpler than encryption, because there is no need to make random choices. All we have to do is:
- Find the index of the start of each word in the key, as before.
- Split the input into pairs of numbers.
- For each pair, find the character in the key at the specified offset from the start of the specified word.
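The lookup in the last step can be sketched by hand on a small key. The key here is made up, ⎕IO←1 is assumed, and the padding mirrors the leading space used in BookEncrypt:

```apl
      b←' ','THE CAT SAT'        ⍝ hypothetical key with a leading space
      s←{⍵/⍳⍴⍵}b=' '            ⍝ position of the space before each word
      b[s[2]+1]                  ⍝ word 2, offset 1
C
      b[s[3]+2]                  ⍝ word 3, offset 2
A
```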
There are various ways to split the input into pairs of numbers. Here, we do it with the Rank operator (⍤). Encrypting an N-character message gives a vector of 2×N numbers. To split it into pairs we first reshape it into a matrix with N rows and 2 columns, and then use f⍤1 to apply f to the rank-1 subarrays of this matrix, which are its row vectors.
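The reshape and the row-wise application can be tried on a short vector. In this sketch the operand {+/⍵} is only a stand-in to show that the function sees one pair at a time; it is not what the decryption itself uses:

```apl
      cipher←480 11 523 11 440 6
      ⊢m←(0.5×≢cipher)2⍴cipher  ⍝ N rows, 2 columns
480 11
523 11
440  6
      {+/⍵}⍤1⊢m                 ⍝ applied to each row in turn
491 534 446
```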
Here’s the code:
BookDecrypt←{
b←' ',⍺ ⍝ b has a space wherever a word starts in the key (same as in BookEncrypt).
s←{⍵/⍳⍴⍵}b=' ' ⍝ Get the indices of all the word starts.
{
(w o)←⍵ ⍝ Get word number and offset
b[s[w]+o] ⍝ and find the character at that position
}⍤1⊢(0.5×≢⍵)2⍴⍵ ⍝ ... for each pair of numbers in the input.
}
And here it is in action:
bor BookDecrypt cipher
MYSECRETMESSAGE
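Because the encryption makes random choices, two runs on the same message will usually produce different pairs, so a round trip is a cheap sanity check. This sketch assumes that bor, BookEncrypt and BookDecrypt are defined as above:

```apl
      msg←'MYSECRETMESSAGE'
      ⍝ Encrypt, decrypt again, and compare with the original message.
      msg≡bor BookDecrypt bor BookEncrypt msg
```

A result of 1 indicates that decryption recovered the original message for that particular random encoding.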
To be continued…