This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition.
The problems presented in Phase 1 of the competition were selected because they could be solved succinctly, generally in a single line of APL code. This makes them well suited for experimentation on TryAPL.org.
Problem 2 of Phase 1, entitled “How tweet it is” reads
“Twitter messages have a 140 character limit – what if the limit was even shorter? One way to shorten the message yet retain most readability is to remove interior vowels from its words. Write a dfn which takes a character vector and removes the interior vowels from each word.”
Test cases:
{your_solution} 'if you can read this, it worked!'
if yu cn rd ths, it wrkd!
{your_solution} 'APL is REALLY cool'
APL is RLLY cl
{your_solution} '' ⍝ an empty vector argument should return an empty vector
{your_solution} 'a' ⍝ your solution should work with a single character message
a
We’ll examine a couple of approaches to this problem – one that’s more “traditional APL” code, and another that makes use of a really helpful Dyalog feature.
This problem could be restated as “find and remove the vowels that aren’t at the beginning or end of a word”. To start with, we need to determine where the words are and where the vowels are. A word is a contiguous set of letters; multiple words are separated by spaces or punctuation. For simplicity’s sake, we’ll ignore contractions and possessives.
The “Traditional APL” Approach
This approach employs a technique that is not commonly found outside of APL and its brethren – using a Boolean vector to determine which elements to remove or keep. First, let’s find where all the vowels are:
string←'If you can read this, it worked!'
vowels←{⍵∊'aeiouAEIOU'}
vowels string
1 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0
To help illustrate what’s happening, I’ll write a little operator called “show” to compactly display the string, the Boolean vector, and the elements that would be selected by applying the Boolean to the string.
show←{⍵⍪⍵{↑(1 0⍕ ⍵)(⍵\⍵/⍺)}⍺⍺ ⍵}
vowels show string
If you can read this, it worked!
10001100100011000010001000100100
I ou a ea i i o e
Next we want to remove vowels that aren’t at either end of a word. First, find where the words are by finding where the letters are. There are several ways to do this; the most obvious may be to use a character vector constant:
letters←{⍵∊'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}
Long character constants seem a bit awkward to me. So, another technique uses the Unicode Conversion system function to return the 26 characters starting at the code points for each of ‘a’ and ‘A’:
letters←{⍵∊⎕UCS (⍳26)∘.+¯1+⎕UCS'aA'}
Yet another way might be to use the code point values directly and do numerical operations:
letters←{{((⍵≥65)∧⍵≤90)∨(⍵≥97)∧⍵≤122}⎕UCS ⍵}
Which technique you choose is largely a matter of taste and style. All three return the same result and have comparable performance. My personal preference is the second one – it has fewer characters for me to mistype 🙂
letters show string
If you can read this, it worked!
11011101110111101111001101111110
If you can read this it worked
So now let’s mark the interior letters of the words. This employs a technique known as shift and compare that I learned in the early 1980s when I was privileged to work with Bob Smith. Among Bob’s many contributions to the APL world was a book on Boolean Functions and Techniques. To mark the interior letters, we’ll do both a right and left shift:
interior←{⍵∧(¯1↓0,⍵)∧1↓⍵,0}
{interior letters ⍵} show string
If you can read this, it worked!
00001000100011000110000000111100
o a ea hi orke
The last step is to find interior vowels and negate:
{(vowels ⍵)∧interior letters ⍵} show string
If you can read this, it worked!
00001000100011000010000000100100
o a ea i o e
{(vowels ⍵)⍲interior letters ⍵} show string
If you can read this, it worked!
11110111011100111101111111011011
If y u c n r d th s, it w rk d!
Putting it all together…
tweet←{⍵/⍨~(⍵∊'aeiouAEIOU')∧{(1↓⍵,0)∧¯1↓0,⍵}⍵∊⎕UCS(⍳26)∘.+¯1+⎕UCS'aA'}
tweet string
If yu cn rd ths, it wrkd!
The Other Technique – Using Regular Expressions
In version 13.0, Dyalog introduced the system functions ⎕S
and ⎕R
as interfaces to the PCRE (Perl Compatible Regular Expression) library. Like APL, regular expressions may seem a bit alien at first, but in the years since their incorporation into Dyalog, I’ve grown to appreciate their power and flexibility – they can frequently accomplish complex string manipulations more succinctly than their APL equivalents thus furthering Dyalog’s power as a tool of thought, notation and execution.
tweet←{('\B[AEIOU]\B' ⎕R '' ⍠ 1) ⍵}
tweet string
If yu cn rd ths, it wrkd!
The expression above replaces any vowel (⍠ 1
means case-insensitive) that is not at the beginning or end of a word with the empty vector, effectively removing the interior vowels. A blog post is not enough space to give an adequate overview of regular expressions. But I hope the expression above piques your interest and encourages you to experiment with ⎕S
and ⎕R
on TryAPL.org or with a Dyalog system of your own.