Mind Boggling Performance

Or is it Minding Boggle Performance?

Better late than never? This was a blog post I started to write during COVID-19 and now I’ve finally gotten around to finishing it.

In the 2019 APL Problem Solving Competition, we presented a problem to solve the Boggle game. In Boggle, a player tries to make as many words as possible from contiguous letters in a 4×4 grid, with the stipulation that no position on the board may be reused.

Rich Park’s webinar from 17 October 2019 presents, among other things, a very good discussion and comparison of two interesting solutions submitted by Rasmus Précenth and Torsten Grust. As part of that discussion, Rich explores the performance of their solutions. After seeing that webinar, I was curious about how my solution might perform.

Disclaimer

Please take note that performance was not mentioned as one of the criteria for this problem other than the implicit expectation that code completes in a reasonable amount of time. As such, this post is in no way intended to criticize anyone’s solutions – in fact, in many cases I’m impressed by the elegance of the solutions and their application of array-oriented thinking. I have no doubt that had we made performance a primary judging criterion, people would have taken it into consideration and possibly produced somewhat different code.

Goals

I started writing APL in 1975 at the age of 14 and “grew up” in the days of mainframe APL when CPU cycles and memory were precious commodities. This made me develop an eye towards writing efficient code. In developing my solution and writing this post, I had a few goals in mind:

  • Use straightforward algorithmic optimizations, without leveraging or avoiding any specific features of the interpreter. Having a bit of understanding of how APL stores its data helps, though.
  • Illustrate some approaches to optimization that may be generally applicable.
  • Encourage discussion and your participation. I don’t present my solution as the paragon of performance. I’m sure there are further optimizations that can be made and hope you’ll (gently) suggest some.

The task was to write a function called FindWords that has the syntax:

      found←words FindWords board

where:

  • words is a vector of words. We used Collins Scrabble Words, a ≈280,000-word list used by tournament Scrabble™ players. We store this in a variable called AllWords. Note that single-letter words like “a” and “I” are not legitimate Scrabble words.
  • board is a matrix where each cell contains one or more letters. A standard Boggle board is 4×4.
  • the result, found, is a vector that is a subset of words containing the words that can be made from board without revisiting any cells.

Although the actual Boggle game uses only words of 3 letters or more, for this problem we permit words of 2 or more letters.

Here’s an example of a 2×2 board:

     AllWords FindWords ⎕← b2← 2 2⍴'th' 'r' 'ou' 'gh'
┌──┬──┐
│th│r │
├──┼──┤
│ou│gh│
└──┴──┘
┌──┬───┬────┬─────┬─────┬──────┬───────┐
│ou│our│thou│rough│routh│though│through│
└──┴───┴────┴─────┴─────┴──────┴───────┘

First, let’s define some variables that we’ll use in our exploration:

      b4← 4 4⍴ 't' 'p' 'qu' 'a' 's' 'l' 'g' 'i' 'r' 'u' 't' 'e' 'i' 'i' 'n' 'a' ⍝ 4×4 board
      b6← 6 6⍴'jbcdcmvueglxriybgeiganuylvonxkfeoqld' ⍝ 6×6 board

If you’re using Dyalog v20.0 or later, you can represent this using array notation:

⍝ using array notation with single-line input:
      b4←['t' 'p' 'qu' 'a' ⋄ 's' 'l' 'g' 'i' ⋄ 'r' 'u' 't' 'e' ⋄ 'i' 'i' 'n' 'a']
      b6←['jbcdcm' ⋄ 'vueglx' ⋄ 'riybge' ⋄ 'iganuy' ⋄ 'lvonxk' ⋄ 'feoqld']

⍝ or, using array notation with multi-line input:
      b4←['t' 'p' 'qu' 'a'
          'slgi'
          'rute'
          'iina']

      b6←['jbcdcm'
          'vueglx'
          'riybge'
          'iganuy'
          'lvonxk'
          'feoqld']

The representation does not affect the performance or the result:

      b4 b6
┌──────────┬──────┐
│┌─┬─┬──┬─┐│jbcdcm│
││t│p│qu│a││vueglx│
│├─┼─┼──┼─┤│riybge│
││s│l│g │i││iganuy│
│├─┼─┼──┼─┤│lvonxk│
││r│u│t │e││feoqld│
│├─┼─┼──┼─┤│      │
││i│i│n │a││      │
│└─┴─┴──┴─┘│      │
└──────────┴──────┘

There were 9 correct solutions submitted for this problem. We’ll call them f1 through f9 – my solution is f0. Now let’s run some comparative timings using cmpx from the dfns workspace. cmpx will note whether any of the latter expressions returns a result different from that of the first expression. We take the tally (≢) of the resulting word lists to make sure the expressions all return the same number of words; we assume that the sets of words are the same if the counts are the same. These timings were done using Dyalog v20.0 with a maximum workspace (MAXWS) of 1GB running under Windows 11 Pro. To keep the expressions brief, I bound AllWords as the left argument to each of the solution functions:

      f0←≢AllWords∘#.Brian.Problems.FindWords

To make it easier to run timings, I wrote a simple function to call cmpx with the solutions of my choosing (the default is all solutions).

      )copy dfns cmpx
      time←{⍺←¯1+⍳10 ⋄ cmpx('f',⍕,' ',⍵⍨)¨⍺}

This allows me to compare any two or more solutions – or, by default, all of them – on a given board variable name.

      time 'b4' ⍝ try a "standard" 4×4 Boggle board
  f0 b4 → 1.2E¯2 |      0%
  f1 b4 → 2.3E¯1 |  +1791% ⎕⎕
  f2 b4 → 4.3E¯1 |  +3491% ⎕⎕⎕
  f3 b4 → 7.3E¯1 |  +5941% ⎕⎕⎕⎕⎕                                    
  f4 b4 → 5.6E0  | +46600% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b4 → 1.1E0  |  +8758% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f6 b4 → 8.9E¯1 |  +7325% ⎕⎕⎕⎕⎕⎕                                   
  f7 b4 → 1.1E0  |  +9275% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f8 b4 → 4.2E0  | +34750% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕           
  f9 b4 → 2.0E0  | +16291% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                     

If we try to run on the 6×6 sample:

      0 1 2 3 4 5 6 7 9 time'b6'
  f0 b6 → 2.2E¯2 |       0%                                          
  f1 b6 → 3.7E¯1 |   +1577%                                          
  f2 b6 → 1.0E0  |   +4581%                                          
  f3 b6 → 1.5E0  |   +6872%                                          
  f4 b6 → 1.6E2  | +705950% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b6 → 2.3E0  |  +10409% ⎕                                        
  f6 b6 → 1.9E0  |   +8463%                                          
  f7 b6 → 2.0E0  |   +9131% ⎕
  f9 b6 → 2.4E0  |  +10822% ⎕

f8 is excluded as it would cause a WS FULL in my 1GB workspace.

Why is f0 about 16-18 times faster than the next fastest solution, f1? I didn’t set out to make FindWords fast, it just turned out that way. Let’s take a look at the code…

     ∇ r←words FindWords board;inds;neighbors;paths;stubs;nextcells;mwords;mask;next;found;lens;n;m;map
[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell
[3]    paths←⊂¨,inds                                     ⍝ initial paths
[4]    stubs←,¨,board                                    ⍝ initial stubs of words
[5]    nextcells←neighbors∘{⊂¨(⊃⍺[¯1↑⍵])~⍵}              ⍝ unvisited neighbors of the last cell in a path
[6]    mwords←⍉↑words                                    ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=
[7]    mask←mwords[1;]∊⊃¨,board                          ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board              ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                                      ⍝ keep those words
[10]   mwords/⍨←mask                                     ⍝ keep them in the matrix form as well
[11]   r←words∩stubs                                     ⍝ seed result with any words that may already be formed from single cell
[12]   :While (0∊⍴paths)⍱0∊⍴words                        ⍝ while we have both paths to follow and words to look at
[13]       next←nextcells¨paths                          ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                     ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next             ⍝ append the next letters to each stub
[16]       r,←words∩stubs                                ⍝ add any matching words
[17]       mask←(≢words)⍴0                               ⍝ build a mask to remove word beginnings that don't match any stubs
[18]       found←(≢stubs)⍴0                              ⍝ build a mask to remove stubs that no words begin with
[19]       lens←≢¨stubs                                  ⍝ length of each stub
[20]       :For n :In ∪lens                              ⍝ for each unique stub length
[21]           m←n=lens                                  ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords                 ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                               ⍝ words that match
[24]           found[(∨/map)/⍸m]←1                       ⍝ stubs that match
[25]       :EndFor
[26]       paths/⍨←found                                 ⍝ keep paths that match
[27]       stubs/⍨←found                                 ⍝ keep stubs that match
[28]       words/⍨←mask                                  ⍝ keep words that may yet match
[29]       mwords/⍨←mask                                 ⍝ keep matrix words that may yet match
[30]   :EndWhile
[31]   r←∪r
     ∇

Attacking the Problem

Intuitively, this felt like an iterative problem. A mostly-array-oriented solution might be to generate character vectors made up from the contents of all paths in board and then do a set intersection with words. But that would be horrifically inefficient – there are over 12 million paths in a 4×4 matrix and, in the case of b4, there are only 188 valid words. What about a recursive solution (many of the submissions used recursion)? I tend to avoid recursion unless there are clear advantages to using it, and in this case I didn’t see any advantages, clear or otherwise. So, iteration it was…

I decided to use two parallel structures to keep track of progress:

  • paths – the paths traversed through the board
  • stubs – the word “stubs” built from the contents of the cells in paths

paths is initialized to the indices of the board, and stubs is initialized to the contents of each cell. Then iterate:

  1. Keep any stubs that are in words
  2. Append the contents of each path’s unvisited neighboring cells to the paths and stubs, producing a new, longer set of candidates
  3. Repeat until there’s nothing left to look at

Setup

First, I need to find the adjacent cells for each cell in board.

[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell

You might recognize line [2] as a stencil-like (⌺) operation. Why, then, didn’t I use stencil? To be honest, it didn’t occur to me at the time – I knew how to code what I needed without using stencil. As it turns out, for this application, stencil is slower. The stencil expression is shorter, more “elegant”, and possibly more readable (assuming you know how stencil works), but it takes more than twice the time. Granted, this line only runs once per invocation, so the performance improvement from not using it is minimal.

      inds←⍳4 4
      ]RunTime -c '(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0' '{⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds'
                                                                                              
  (,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0 → 2.2E¯5 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                       
  {⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds       → 5.0E¯5 | +127% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 

I wrote a helper function nextcells which, given a path, returns the unvisited cells adjacent to the last cell in the path. For example, if we have a path that starts at board[1;1] and continues to board[2;2], then the next unvisited cells for this path are given by:

      nextcells (1 1)(2 2)
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│
││1 2│││1 3│││2 1│││2 3│││3 1│││3 2│││3 3││
│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│
└─────┴─────┴─────┴─────┴─────┴─────┴─────┘

The next piece of setup is a significant contributor to the improved performance: a parallel, transposed, matrix copy of words.

[6]    mwords←⍉↑words ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=

Why create another version of words and why is it transposed?

  • In general, it’s faster to operate on simple arrays.
  • Simple arrays – arrays containing only flat, primitive data without any nested elements – are stored in a single, contiguous block of memory. Elements are laid out in row-major order, meaning the last dimension changes fastest: for a matrix, the first row is stored left to right, then the second row, and so on. Transposing the word matrix makes the prefix searching we do, as we look for candidates that could become valid words, much more efficient. Consider a matrix consisting of the words “THE” “BIG” “DOG”. If stored one word per row, the interpreter has to “skip” through memory to find the first letter of each word. In a column-oriented matrix, however, the first letters are next to one another and likely to be in cache, making them much quicker to access.
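As a miniature illustration (my own, not part of the original function), compare the row-oriented and column-oriented forms of those three words; with a columnar matrix, ↑ gives all the prefixes of a given length in one cheap operation:

      ⎕←m←↑'the' 'big' 'dog' ⍝ one word per row
the
big
dog
      ⎕←mwords←⍉m            ⍝ one word per column; the first letters are now contiguous in memory
tbd
hio
egg
      2↑mwords               ⍝ the 2-letter prefixes of all the words, one per column
tbd
hio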

Things Run Faster If You Do Less Work

Smaller searches are generally faster than larger ones. If we pare down words and stubs as we progress, we will perform smaller searches. The first pass at minimizing the data to be searched is done during setup – we remove any words that don’t begin with the first letter of one of board’s cells, as well as words that contain letters not found in board:

[7]    mask←mwords[1;]∊⊃¨,board               ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board   ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                           ⍝ keep those words
[10]   mwords/⍨←mask                          ⍝ keep them in the matrix form as well

For board b4, this reduces the number of words to be searched from 267,752 to 16,247 – a ~94% reduction. Then we iterate, appending each path’s next unvisited cells and creating new stubs from the updated paths:

[13]       next←nextcells¨paths                      ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                 ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next         ⍝ append the next letters to each stub

Append any stubs that are in words to the result:

[16]       r,←words∩stubs                                ⍝ add any matching words

Because a cell can have more than one letter, we might have stubs of different lengths, so we need to iterate over each unique length:

[19]       lens←≢¨stubs           ⍝ length of each stub
[20]       :For n :In ∪lens       ⍝ for each unique stub length

Because we’re doing prefix searching, the inner product ∧.= can tell us which stubs match prefixes of which words. Now we can see the reason for creating mwords: because it stores one word per column, n↑mwords quickly returns a matrix whose columns are the n-letter prefixes of all the words:

[21]           m←n=lens                    ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords   ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                 ⍝ words that match
[24]           found[(∨/map)/⍸m]←1         ⍝ stubs that match
[25]       :EndFor
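Here’s the inner product in miniature, with made-up data – three surviving words and two 2-letter stubs:

      mwords←⍉↑'the' 'rob' 'dog'
      (↑'th' 'ro')∧.=2↑mwords ⍝ rows: stubs; columns: words
1 0 0
0 1 0

An ∨⌿ down the columns marks the words that can still match ('the' and 'rob'), while ∨/ across the rows marks the stubs that at least one word begins with (here, both).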

We then use our two Boolean arrays, found and mask, to pare down paths/stubs and words/mwords respectively. If we look at the number of words and stubs at each step, we can see that the biggest performance gain is realized by doing less work:

┌──────────────────┬───────┬──────┐
│Phase             │≢words │≢stubs│
├──────────────────┼───────┼──────┤
│Initial List      │267,752│     0│
├──────────────────┼───────┼──────┤
│After Initial Cull│ 16,247│    16│
├──────────────────┼───────┼──────┤
│After 2-cell Cull │  7,997│    56│
├──────────────────┼───────┼──────┤
│After 3-cell Cull │  2,736│   152│
├──────────────────┼───────┼──────┤
│After 4-cell Cull │  1,159│   178│
├──────────────────┼───────┼──────┤
│After 5-cell Cull │    371│   119│
├──────────────────┼───────┼──────┤
│After 6-cell Cull │     87│    42│
├──────────────────┼───────┼──────┤
│After 7-cell Cull │     16│    10│
├──────────────────┼───────┼──────┤
│After 8-cell Cull │      2│     1│
├──────────────────┼───────┼──────┤
│All Done          │      0│     0│
└──────────────────┴───────┴──────┘

Does mwords Make Much of a Difference?

As an experiment, I decided to write a version, f10, that does not use the transposed word matrix mwords (it still does the words and stubs culling). I compared it to my original version, f0, and the fastest submitted version, f1:

      0 1 10 time 'b4'
  f0  b4 → 1.4E¯2 |     0% ⎕⎕                                       
  f1  b4 → 2.2E¯1 | +1551% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  f10 b4 → 2.1E¯1 | +1422% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 

Interestingly, f10 performed remarkably close to f1. When I looked at the code for f1, I saw that the author had implemented a similar culling approach and had commented in several places that the construct was there to improve performance. Good job! But this does demonstrate that maintaining a parallel, simple copy of words makes the solution run about 15× faster.

Takeaways

There are a couple of other optimizations I could have implemented:

  • In the setup, I could have filtered out all words that were longer than ≢∊board.
  • If FindWords were used frequently, and I could be fairly certain that words was static (unchanging), then I could create mwords outside of FindWords. The line that creates mwords consumes about half of the total time of the function.
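The first of these would be a one-line addition to the setup. A sketch (applying the same mask to both forms, as lines [9–10] of the function do):

      mask←(≢∊board)≥≢¨words ⍝ words longer than the board's total letter count can never be formed
      words/⍨←mask ⋄ mwords/⍨←mask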

When thinking about performance and optimization:

  • Unless there’s an overwhelming reason to do so – don’t sacrifice code clarity for performance. If you implement non-obvious performance improvements, note them in comments or documentation.
  • Optimize effectively – even infinitely speeding up a piece of code that accounts for 1% of an application’s CPU consumption can improve overall performance by at most 1%.
  • Consider how your data is structured and how that might affect performance. In this case, representing words as a vector of words is convenient, readable, and there aren’t those extra spaces that might occur in a matrix format. But as we saw, it performs poorly compared to a simple matrix format. Don’t be afraid to make the data conform to a more performant organization.
  • Along similar lines, consider how simple arrays are stored in contiguous memory and whether you can take advantage of that.

In case you were wondering, the two solutions Rich Park looked at in his webinar were f4 and f7 in the timings above. The fastest submission, f1, was submitted by Julian Witte. Please remember that we did not specify performance as a criterion for the problem, so this is in no way a criticism of any of the submissions.

If you’re curious to look at the code for the submissions, this .zip file includes namespaces f0–f9, each of which contains a FindWords function and any needed subordinate functions (in addition to the solution namespaces, the .zip file also includes AllWords, the time function, and 4 sample boards – b2, b3, b4, and b6). You can extract and use the code as follows:

  1. Unzip Submissions.zip to a directory of your choosing.
  2. In your APL session, enter:
    ]Link.Import # {the directory you chose}/Submissions
    This step might take several seconds while Link.Import brings in AllWords.

You can then examine the code, run your own timings, and so on. One interesting thing to explore is which submissions properly handle 1×1 and 0×0 boards.

Postscript

When I started to write the explanation of my code, it occurred to me: “This is 2026 and we have LLMs that might be able to explain the code. Let’s give them a try…”

So, I asked each of Anthropic’s Claude Opus 4.6 Extended, Google’s Gemini Pro, and Microsoft’s Copilot Think Deeper the following:

Explain the attached code. Note that a cell in board can have multiple letters like “qu” or “ough”. Also note that the words list is the official scrabble words list and has no single letter words.

The results were interesting and, in several places, a more concise and coherent explanation than I might produce. But how accurate and useful were their explanations? Stay tuned for a blog post about how well different LLMs explain APL code!

Hacking with APL

Vassar Hackathon Poster

Thanks to our dear friend Dr. Ray Polivka, Dan Baronet and I had the opportunity to travel to Vassar College to participate in their Community Hackathon held on 5-6 February 2016.

“What’s a hackathon?” you ask?
Well, we did too, as we’d never participated in one before.  🙂
According to the Hackathon’s announcement:

“CommunityHack is a way to bridge the gap between Vassar CS Majors and Vassar students in other departments as well as local high school and community college students in order to provide them with the opportunity to explore innovative applications of Computer Science. Dance, music, art, video games? CS can be incorporated into all of that and more!”

Students from Vassar as well as nearby colleges and high schools were invited to attend. In other words, it was a great opportunity to introduce APL to a new generation. As this was our first Hackathon, we had no idea what to expect. Laura, the Hackathon’s organizer, did a wonderful job organizing the event and making us feel welcome. We were invited to give an hour-long presentation and Dyalog was listed as an event sponsor.

The Hackathon was a 24-hour event where students were encouraged to split up into groups and pick a problem to solve. During the course of the event, presentations were made on a variety of subjects including “Autonomous Robots Test Ideas About the Evolution of Brains”, “How to make games quick!”, “Virtual Reality”, and of course “Hacking with APL”. Friday evening started with introductions and ice-breakers. During our introduction, I was able to talk a bit about APL and the presentation we would be making on Saturday. Apparently this generated some interest, as a number of students came up to Dan, Ray, Jon McGrew and me to ask about APL. We spent several hours showing them APL, to which they seemed eagerly receptive.

I had the pleasure of working with Grace, a CS sophomore, to implement APL (A Puppy Language) in APL. Her project idea was to write an application for potential puppy owners to use so they could get an idea of the responsibility of owning and caring for a puppy. We worked into the wee hours of the night and wound up implementing a multi-threaded domain-specific language (DSL) where the “puppy”, running in a separate thread, would react to commands typed into the APL session. Negative actions and ignoring the puppy would cause the puppy’s happiness points (PHPs) to decrease, whereas positive actions would increase PHPs. Grace seemed to really enjoy working with APL and returned to the hackathon twice on Saturday as her schedule permitted to continue work on her project.

Saturday, I was slightly concerned that following a talk on virtual reality, APL might not seem all that “cool”, but my fears were allayed for, as I was waiting before my presentation, several students asked the person at the registration desk specifically about the APL presentation.


The presentation went rather well. Watching the jaw-dropping “Wow!” expressions on the faces of many of the students as we showed the power of APL notation made me reminisce about the time when I first learned APL and was amazed at what I could do with a computer. It also reminded me how blessed I’ve been to have used APL throughout my career.

Our participation in the Hackathon was a great experience. We were able to show Dyalog to close to 100 students, promote the upcoming APL Problem Solving Competition, and encourage people to download and try Dyalog – we had 18 student downloads over the Hackathon weekend. This may have been our first Hackathon, but I’m certain it won’t be our last.

On a personal note, after Dan and I drove up to Montreal to spend the upcoming week working with the APL Tools Team, I received a very nice email from Grace where she wrote “I just wanted to thank you so much for taking the time to work with me on puppy.dws — it is currently my favorite thing that I have ever made.” and “It was really fun working in APL, and I will definitely check out the Dyalog competition.”


Solving the 2014 APL Problem Solving Competition – How Tweet It Is

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition.

The problems presented in Phase 1 of the competition were selected because they could be solved succinctly, generally in a single line of APL code. This makes them well suited for experimentation on TryAPL.org.

Problem 2 of Phase 1, entitled “How tweet it is”, reads:

“Twitter messages have a 140 character limit – what if the limit was even shorter? One way to shorten the message yet retain most readability is to remove interior vowels from its words. Write a dfn which takes a character vector and removes the interior vowels from each word.”

Test cases:
      {your_solution} 'if you can read this, it worked!'
if yu cn rd ths, it wrkd!
      {your_solution} 'APL is REALLY cool'
APL is RLLY cl
      {your_solution} '' ⍝ an empty vector argument should return an empty vector

      {your_solution} 'a' ⍝ your solution should work with a single character message
a

We’ll examine a couple of approaches to this problem – one that’s more “traditional APL” code, and another that makes use of a really helpful Dyalog feature.

This problem could be restated as “find and remove the vowels that aren’t at the beginning or end of a word”. To start with, we need to determine where the words are and where the vowels are. A word is a contiguous set of letters; multiple words are separated by spaces or punctuation. For simplicity’s sake, we’ll ignore contractions and possessives.

The “Traditional APL” Approach

This approach employs a technique that is not commonly found outside of APL and its brethren – using a Boolean vector to determine which elements to remove or keep. First, let’s find where all the vowels are:

      string←'If you can read this, it worked!'
      vowels←{⍵∊'aeiouAEIOU'}
      vowels string
1 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0

To help illustrate what’s happening, I’ll write a little operator called “show” to compactly display the string, the Boolean vector, and the elements that would be selected by applying the Boolean to the string.

      show←{⍵⍪⍵{↑(1 0⍕ ⍵)(⍵\⍵/⍺)}⍺⍺ ⍵}
      vowels show string
If you can read this, it worked!
10001100100011000010001000100100
I   ou  a   ea    i   i   o  e

Next we want to remove vowels that aren’t at either end of a word. First, find where the words are by finding where the letters are.  There are several ways to do this; the most obvious may be to use a character vector constant:

      letters←{⍵∊'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}

Long character constants seem a bit awkward to me.  So, another technique uses the Unicode Conversion system function to return the 26 characters starting at the code points for each of ‘a’ and ‘A’:

      letters←{⍵∊⎕UCS (⍳26)∘.+¯1+⎕UCS'aA'}

Yet another way might be to use the code point values directly and do numerical operations:

      letters←{{((⍵≥65)∧⍵≤90)∨(⍵≥97)∧⍵≤122}⎕UCS ⍵}

Which technique you choose is largely a matter of taste and style. All three return the same result and have comparable performance. My personal preference is the second one – it has fewer characters for me to mistype 🙂

      letters show string
If you can read this, it worked!
11011101110111101111001101111110
If you can read this  it worked 

So now let’s mark the interior letters of the words. This employs a technique known as shift and compare that I learned in the early 1980s when I was privileged to work with Bob Smith. Among Bob’s many contributions to the APL world was a book on Boolean Functions and Techniques. To mark the interior letters, we’ll do both a right and left shift:

      interior←{⍵∧(¯1↓0,⍵)∧1↓⍵,0}
      {interior letters ⍵} show string
If you can read this, it worked!
00001000100011000110000000111100
    o   a   ea   hi       orke  

The last step is to find interior vowels and negate:

      {(vowels ⍵)∧interior letters ⍵} show string
If you can read this, it worked!
00001000100011000010000000100100
    o   a   ea    i       o  e  

      {(vowels ⍵)⍲interior letters ⍵} show string
If you can read this, it worked!
11110111011100111101111111011011
If y u c n r  d th s, it w rk d!

Putting it all together…

      tweet←{⍵/⍨~(⍵∊'aeiouAEIOU')∧{(1↓⍵,0)∧¯1↓0,⍵}⍵∊⎕UCS(⍳26)∘.+¯1+⎕UCS'aA'}
      tweet string
If yu cn rd ths, it wrkd!

The Other Technique – Using Regular Expressions

In version 13.0, Dyalog introduced the system functions ⎕S and ⎕R as interfaces to the PCRE (Perl Compatible Regular Expression) library. Like APL, regular expressions may seem a bit alien at first, but in the years since their incorporation into Dyalog, I’ve grown to appreciate their power and flexibility – they can frequently accomplish complex string manipulations more succinctly than their APL equivalents thus furthering Dyalog’s power as a tool of thought, notation and execution.

      tweet←{('\B[AEIOU]\B' ⎕R '' ⍠ 1) ⍵}
      tweet string
If yu cn rd ths, it wrkd!

The expression above replaces any vowel (⍠ 1 means case-insensitive) that is not at the beginning or end of a word with the empty vector, effectively removing the interior vowels. A blog post is not enough space to give an adequate overview of regular expressions, but I hope the expression above piques your interest and encourages you to experiment with ⎕S and ⎕R on TryAPL.org or with a Dyalog system of your own.
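If you’d like to see exactly which characters the pattern targets, the companion search function ⎕S can return the matched text (in a transformation string, & stands for the entire match):

      ('\B[AEIOU]\B'⎕S'&'⍠1)'If you can read this, it worked!'
┌─┬─┬─┬─┬─┬─┬─┐
│o│a│e│a│i│o│e│
└─┴─┴─┴─┴─┴─┴─┘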

Simply A-maze-ing


One of many things I like about APL is that it’s fun to use for recreational computing. I will frequently happen upon an interesting problem, puzzle, or piece of code and consider how I might implement it in APL.

I was thinking about how to generate mazes for an idea I have about a game to help kids learn APL (that may be a topic for a future blog entry). Anyhow, I found an interesting web page about maze-generation algorithms where the author found one, the “Growing Tree Algorithm”, to be of particular interest. His page included roughly 100 lines of Ruby code to implement the algorithm. The algorithm can be boiled down to:

  • Seed a list of visited cells with a cell selected at random
  • While there are unvisited cells
    • If the current cell has any unvisited neighboring cells
      • Select one at random
      • Remove the wall between the cells
      • Add the new cell to the list of visited cells
    • Otherwise backtrack (drop from the visited cell list) until you find a cell with an unvisited neighbor

Here’s a clip of the algorithm implemented in APL building a 10×10 maze.
Notice how whenever it hits a “dead end” it backtracks to the last cell that hasn’t been visited.

What might an APL approach to this algorithm look like? How to represent the maze? My first thought was to separate the maze generation from the drawing. After an hour (or so 😉 ) of tinkering, I’d come up with something that seemed to work pretty well and took about a dozen lines of code.

When I originally thought about writing this blog entry, I was going to launch into a discussion of the code and I realized that it might get lengthy and (egad!) boring. So instead, I’ll highlight a couple of the clever bits, show you the core maze generation code, and point you at the complete namespace for your own amusement and experimentation.

First the clever bits (at least I hope they’re clever)…

  • Represent the cells of the maze as an integer matrix where each element is an encoding of the walls surrounding each cell.  Use powers of 2 for the encoding.
  • Precalculate the indices of the neighboring cells around each cell so the core loop only has to use indexing and no on-the-fly computation.
  • Write a function to remove the common wall between two cells. I originally named the function “Reagan” (after President Reagan’s 1987 exhortation “Mr. Gorbachev, tear down this wall”), but in the spirit of political mindfulness, I renamed it “dewall”.

The core code for the maze generation looks like this:

∇ cells←maze size;valid;neighbors;dewall;visited;current;n;choices;next
 ⍝ Maze - modeled from http://weblog.jamisbuck.org/2011/1/27/maze-generation-growing-tree-algorithm
 ⍝ BPB  - Dec2014
 ⍝ size  - size of the maze in rows/cols
 ⍝ cells   - (2⍴size) integer matrix describing the walls around each cell using powers of 2
 ⍝       1
 ⍝     ┌───┐         ┌───┐
 ⍝   8 │   │ 2       │   │ = 11 = 1 + 2 + 8
 ⍝     └───┘
 ⍝       4
  size←2⍴size  ⍝ set size to square if singleton supplied as argument     

  valid←{{⍵/⍨⍵≢¨⊂⍬}size∘{(0∊⍵)⍱⍵∨.>⍺:⍵ ⋄ ⍬}¨⍵} ⍝ determine if a maze coordinate is valid
  neighbors←valid¨↓(⍳size)∘.+,1 ¯1∘.×1 0⌽¨⊂1 0 ⍝ calculate neighbors for each cell
     
  dewall←{{⍵[2]=0:{(1=⍵)⌽4 1}⍵[1] ⋄ {(1=⍵)⌽2 8}⍵[2]}⍺-⍵}  ⍝ remove wall between cells
     
  cells←size⍴15 ⍝ all cells start with 4 walls
     
  visited←,⊂?size ⍝ random starting cell
     
  :While 15∊cells       ⍝ while we still have cells to examine
      current←1↑visited   ⍝ pop the most recent cell
      :If 0=n←⍴choices←cells{⍵/⍨15=⍺[⍵]}current⊃neighbors ⍝ does it have any unvisited neighbors?
          visited↓⍨←1       ⍝ if none, backtrack
      :Else
          visited,⍨←next←choices[?n] ⍝ otherwise, add the new cell to the front of the list
          cells[next current]-←⊃next⊃.dewall current ⍝ update cell values for which wall was removed
      :EndIf
  :EndWhile
∇

You can get all the code in my maze generating namespace from GitHub. Save a local copy and use the SALT Load command to load it into your workspace, or just cut and paste it into your own namespace script with the editor. The maze namespace contains the following functions of interest:

 

      intmat←{animate} maze size
  • animate is an optional Boolean to indicate whether to animate the maze generation
  • size is the size (rows,cols) of the maze to generate; a single number generates a square maze
  • intmat is the integer matrix representation of the maze

For example: mat←1 maze 10 animates and generates a 10×10 maze

 

      z←{line} draw intmat
  • line is an optional Boolean that indicates:
    • 1 = use line-drawing characters
    • 0 = use ASCII characters
  • intmat is an integer matrix representation of a maze
  • z is the drawing of the maze in ASCII or line-drawing characters

For example: pic←1 draw mat produces a line-drawing of the maze generated in the example above

 

      z←html intmat
  • intmat is an integer matrix representation of a maze
  • z is the HTML necessary to render the maze in a web browser. Save it to a native file and open the file in your browser.

For example: h←html mat produces an HTML representation of the maze generated in the example above

 

Solving the 2014 APL Problem Solving Competition – It’s All Right

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition.

The problems presented in Phase 1 of the competition were selected because they could be solved succinctly, generally in a single line of APL code. This makes them well suited for experimentation on TryAPL.org.

Problem 1 of Phase 1, entitled “It’s all right”, reads:

“Write a dfn that takes the length of the legs of a triangle as its left argument, and the length of the hypotenuse as its right argument and returns 1 if the triangle is a right triangle, 0 otherwise.”

Test cases:
      3 4 {your_solution} 5
1
      2 3 {your_solution} 4
0

This uses the Pythagorean theorem – A² + B² = C². It’s trivial to implement an almost direct translation of this in APL – in a dfn, using ⍺[1] for A, ⍺[2] for B and ⍵ for C yields:

right←{((⍺[1]*2)+(⍺[2]*2))=⍵*2}

This seems rather clunky though… what if we consider the problem as “Are the sums of the squares of each argument equal?” To get the sum of the squares, first we can use the composition *∘2 (composing the power function * with the constant 2) to mean “square” and +/ to mean “sum”, and combine them in a 2-item function train (also known as an “atop”): ((+/)*∘2)

then apply this to each argument:   ((+/)*∘2)¨⍺ ⍵

and compare the results for equality, resulting in the dfn:

right1←{=/((+/)*∘2)¨⍺ ⍵}

Still this seems clunky to me. Let’s see… squaring a number is just multiplying the number by itself, so if we use the commute operator ⍨ with multiplication,   ×⍨
we get the equivalent of squaring. Then we can use that function in an inner product with addition to also get “sum of the squares”:   +.×⍨

The rest is essentially the same as in right1 above:

right2←{=/+.×⍨¨⍺ ⍵}

All three of these solutions solve the contest problem. The first one, right, is not commutative though – the legs of the triangle must be supplied as the left argument. However, right1 and right2 are commutative and the order of arguments is not significant.
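For comparison, the first and last shapes of solution translate readily to Python (the function names are mine):

```python
def right(legs, hyp):
    """Direct translation of the Pythagorean test: a² + b² = c².
    Not commutative - the legs must come first."""
    a, b = legs
    return a * a + b * b == hyp * hyp

def right1(x, y):
    """Commutative form: are the sums of squares of each argument equal?"""
    sumsq = lambda v: sum(n * n for n in v)
    return sumsq(x) == sumsq(y)

print(right((3, 4), 5))        # True
print(right((2, 3), 4))        # False
print(right1((5,), (3, 4)))    # True - argument order doesn't matter
```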

Solving the 2014 APL Problem Solving Competition – it’s as easy as 1 1 2 3…

The winners of the 2014 APL Problem Solving Competition were recognized at the Dyalog ’14 user meeting and had a chance to share their competition experiences. Emil Bremer Orloff won the student competition and received $2500 USD and an expenses-paid trip to Dyalog ’14, while Iryna Pashenkovska took first place among the non-student entries and received a complimentary registration to Dyalog ’14. Our congratulations to them for their fine efforts! This post is the first of a series where we will examine some of the problems selected for this year’s competition. The problem we’ll discuss first is Problem 3 from Phase 1 dealing with the Fibonacci sequence.

Problem 3 – Tell a Fib

Write a dfn that takes an integer right argument and returns that number of terms in the Fibonacci sequence. Test cases:

      {your_solution} 10
 1 1 2 3 5 8 13 21 34 55
      {your_solution} 1
 1
      {your_solution} 0   ⍝ should return an empty vector

Essentially, you start with the sequence 1 1 and concatenate the sum of the last 2 numbers to the end and repeat until the sequence has the correct number of terms. In Python, a solution might look something like this:

def fib(n): # return the first n elements of the Fibonacci sequence
    result = []
    a, b = 0, 1
    while 0 < n:
        result.append(b)
        a, b = b, a+b
        n -= 1
    return result

You can write nearly the same code in APL:

∇ r←fibLooping n;a;b
  r←⍬
  (a b)←0 1
  :While 0<n
      r,←b
      (a b)←b,a+b
      n-←1
  :EndWhile
∇

While it’s possible to write APL code that looks like Python (or many other languages), one of the judging criteria for the competition is the effective use of APL syntax – in this case, by avoiding the explicit loop. Two ways to do this are 1) using recursion and 2) using the power operator (⍣).

Recursive Solution

      fibRecursive←{⍵<3:⍵⍴1 ⋄ {⍵,+/¯2↑⍵}∇⍵-1}

The neat thing about recursion is that the function keeps calling itself with a “simpler” argument until it reaches a base case and then passes the results back up the stack. Here the base case occurs when the argument (⍵) is less than 3 and the function returns ⍵⍴1 – either an empty vector, 1, or 1 1 for values of ⍵ of 0, 1 and 2 respectively. It’s easy to illustrate the recursive process by adding some code (⎕←) to display information at each level of recursion.

      fibRecursive←{⍵<3:⎕←⍵⍴1 ⋄ {⎕←⍵,+/¯2↑⍵}∇⎕←⍵-1}
      fibRecursive 10
9
8
7
6
5
4
3
2
1 1
1 1 2
1 1 2 3
1 1 2 3 5
1 1 2 3 5 8
1 1 2 3 5 8 13
1 1 2 3 5 8 13 21
1 1 2 3 5 8 13 21 34
1 1 2 3 5 8 13 21 34 55

The recursive solution above is termed “stack recursive” in that it creates a new level on the function call stack for each recursive call. We can modify the code slightly to implement a “tail recursive” solution which Dyalog can detect and optimize by avoiding creating those additional function call stack levels. You can see the effect of this as each level computes its result immediately:

      fibTail←{⍺←0 1 ⋄ ⍵=0:⎕←1↓¯1↓⍺ ⋄ (⎕←⍺,+/¯2↑⍺)∇ ⎕←⍵-1}
      fibTail 10
9
0 1 1
8
0 1 1 2
7
0 1 1 2 3
6
0 1 1 2 3 5
5
0 1 1 2 3 5 8
4
0 1 1 2 3 5 8 13
3
0 1 1 2 3 5 8 13 21
2
0 1 1 2 3 5 8 13 21 34
1
0 1 1 2 3 5 8 13 21 34 55
0
0 1 1 2 3 5 8 13 21 34 55 89
1 1 2 3 5 8 13 21 34 55
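The accumulator-passing idea behind fibTail carries over to Python almost verbatim – though CPython never eliminates tail calls, so this sketch (the name fib_tail is mine) still grows the call stack:

```python
def fib_tail(n, acc=(0, 1)):
    """Accumulator-passing analogue of fibTail: the series built so far
    rides along as acc, starting from the seed (0, 1)."""
    if n == 0:
        return list(acc[1:-1])   # like 1↓¯1↓⍺: drop the seed 0 and the extra term
    return fib_tail(n - 1, acc + (acc[-1] + acc[-2],))

print(fib_tail(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```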

Power Operator Solution

      fibPower←{⍵⍴({⍵,+/¯2↑⍵}⍣(0⌈⍵-1))1}

The power operator is defined as {R}←{X}(f⍣g)Y. When the right operand g is an integer scalar – in this case (0⌈⍵-1) – the power operator applies its left operand function f, here {⍵,+/¯2↑⍵}, cumulatively g times to its argument Y, which in this case is 1. Similar to the recursive solution, we can add ⎕← to see what happens at each application:

      fibPower←{⍵⍴({⎕←⍵,+/¯2↑⍵}⍣(0⌈⍵-1))1}
      fibPower 10
1 1
1 1 2
1 1 2 3
1 1 2 3 5
1 1 2 3 5 8
1 1 2 3 5 8 13
1 1 2 3 5 8 13 21
1 1 2 3 5 8 13 21 34
1 1 2 3 5 8 13 21 34 55
1 1 2 3 5 8 13 21 34 55
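If you're more at home in Python, the effect of ⍣ with an integer right operand can be mimicked by a small helper that applies a function a given number of times (a sketch; power and fib_power are my names, not a standard library facility):

```python
def power(f, g):
    """Rough analogue of APL's f⍣g for a non-negative integer g:
    return a function that applies f cumulatively g times."""
    def applied(y):
        for _ in range(g):
            y = f(y)
        return y
    return applied

def fib_power(n):
    # step mirrors {⍵,+/¯2↑⍵}: append the sum of the last two items
    # (¯2↑ pads a one-item vector with 0, hence the len check)
    def step(v):
        return v + [v[-1] + (v[-2] if len(v) > 1 else 0)]
    return power(step, max(0, n - 1))([1])[:n]   # [:n] plays the role of ⍵⍴

print(fib_power(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```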

In discussing this blog post with my fellow Dyalogers, Roger Hui showed me a rather neat construct:

      fibRoger←{1∧+∘÷\⍵⍴1}     ⍝ Roger's original version
      fibBrian←{1∧÷(+∘÷)\⍵⍴1}  ⍝ tweaked to make result match contest examples
      fibRoger 10
1 2 3 5 8 13 21 34 55 89
      fibBrian 10
1 1 2 3 5 8 13 21 34 55

I’ve added the parentheses around +∘÷ to make it a bit clearer what the left operand to the scan operator \ is. Let’s break the code down…

      ⍵⍴1 ⍝ produces a vector of ⍵ 1's 
1 1 1 1 1 ⍝ for ⍵=5

Then we apply scan with the function +∘÷, which in this case has the effect of adding 1 to the reciprocal of the previous result:

      (+∘÷)\1 1 1 1 1
1 2 1.5 1.666666667 1.6

Note that the denominator of each term is the corresponding element of the Fibonacci series…

1 ÷ 1 = 1
2 ÷ 1 = 2
3 ÷ 2 = 1.5
5 ÷ 3 = 1.666666667
8 ÷ 5 = 1.6

To extract them, take the reciprocal and then the LCM (lowest common multiple) with 1.

      1∧÷1 2 1.5 1.666666667 1.6
1 1 2 3 5
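You can replay the same trick with exact rational arithmetic using Python's standard fractions module – a sketch of the idea, not of Dyalog's implementation:

```python
from fractions import Fraction

# Mimic (+∘÷)\ with exact rationals: each term is 1 + 1/previous, so the
# convergents are ratios of consecutive Fibonacci numbers.
terms, x = [], Fraction(1)
for _ in range(10):
    terms.append(x)
    x = 1 + 1 / x

print([t.denominator for t in terms])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print([t.numerator for t in terms])    # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

The denominators reproduce fibBrian's result and the numerators fibRoger's – the same extraction that 1∧÷ performs in the APL versions.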

What happens if you want to accurately compute larger Fibonacci numbers? There’s a limit to the precision based on the internal number representations. Because fibRoger and fibBrian delve into floating point representation, they’re the most limited. (Roger points out that the J language has native support for extended precision and does not suffer from this limitation.)

Dyalog has a system variable, ⎕FR, that controls how floating-point operations are performed: using either IEEE 754 64-bit binary floating-point or IEEE 754-2008 128-bit decimal floating-point. For most applications, 64-bit operations are perfectly acceptable, but applications that demand a very high degree of precision can use 128-bit operations, albeit at some cost in performance. Using 64-bit operations, fibBrian loses accuracy after the 35th term, while using 128 bits extends the accuracy to 68 terms.

Even setting ⎕FR to use 128-bit operations and the print precision, ⎕PP, to its maximum, we can only compute up to the 164th element – the 34-digit 8404037832974134882743767626780173 – before being forced into scientific notation.

Can we go further easily? Yes we can! Using Dyalog for Microsoft Windows’ .NET integration, we can seamlessly make use of the BigInteger class in the System.Numerics .NET namespace.
First we need to specify the namespace and where to look for it…

      ⎕USING←'System.Numerics,system.numerics.dll'

Then we make a trivial change to our code to use BigInteger instead of native data types…

      fibRecursive←{⍵<3:⍵⍴⎕NEW BigInteger 1 ⋄ {⍵,+/¯2↑⍵}∇ ⍵-1}

And we’re done!

So the next time someone walks up to you and asks “What can you tell me about the 1,000th element of the Fibonacci series?”, you can confidently reply that it has 209 digits and a value of
43466557686937456435688527675040625802564660517371780402481729
08953655541794905189040387984007925516929592259308032263477520
96896232398733224711616429964409065331879382989696499285160037
04476137795166849228875.
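Python's built-in integers are also arbitrary precision, so this claim is easy to cross-check with a loop like the fib function earlier in this post (renamed here to avoid confusion):

```python
def fib_nth(n):
    """Return the nth element of the Fibonacci sequence (fib_nth(10) == 55),
    using Python's arbitrary-precision integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

f1000 = fib_nth(1000)
print(len(str(f1000)))     # 209
print(str(f1000)[:5])      # 43466
```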

For other APL Fibonacci implementations, check out this page.