A Dialog on APL

A discussion between Nicolas Delcros and Roger Hui

Nicolas, Prologue: From a language point of view, thanks to Ken Iverson, it is obvious that you want grade rather than sort as a primitive. Yet from a performance point of view, sort is currently faster than grade.

Can one be “more fundamental” than the other? If so, who’s wrong…APL or our CPUs? In any case, what does “fundamental” mean?

Roger: Sorting is faster than grading due to reasons presented in the J Wiki essay Sorting versus Grading, in the paragraph which begins “Is sorting necessarily faster than grading?” I can not prove it in the mathematical sense, but I believe that to be the case on any CPU when the items to be sorted/graded are machine units.

Nicolas, Formulation 1: Now, parsing ⍵[⍋⍵] (and not scanning the idiom) begs the question of how deeply an APL intepreter can “understand” what it’s doing to arrays.

How would an APL compiler resolve this conjunction in the parse tree? Do you simply have a bunch of state pointers such as “is the grade of” or “is sorted” or “is squozen” or “axis ordering” etc. walking along the tree? If so, do we have an idea of the number of such state pointers required to exhaustively describe what the APL language can do to arrays? If not, is there something more clever out there?

Roger: I don’t know of any general principles that can tell you what things can be faster. I do have two lists, one for J and another for Dyalog. A big part of the lists consists of compositions of functions, composition in the mathematical sense, that can be faster than doing the functions one after the other if you “recognize” what the composed function is doing and write a native implementation of it. Sort vs. grade is one example (sort is indexing composed with grade). Another one is (⍳∘1 >) or (1 ⍳⍨ >). The function is “find the first 1” composed with >. These compositions have native implementations and:

      x←?1e6⍴1e6

      cmpx '5e5(1 ⍳⍨ >)x' '(5e5>x)⍳1' 
5e5(1 ⍳⍨ >)x → 0.00E0  |       0%
(5e5>x)⍳1    → 1.06E¯2 | +272100% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      cmpx '¯1(1 ⍳⍨ >)x' '(¯1>x)⍳1' 
¯1(1 ⍳⍨ >)x → 2.41E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(¯1>x)⍳1    → 4.15E¯3 | +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

If you get a “hit” near the beginning, as would be the case with 5e5, you win big. Even if you have to go to the end (as with ¯1), you still save the cost of explicitly generating the Boolean vector and then scanning it to the end.

Another one, introduced in 14.1, is:

      cmpx '(≢∪)x' '≢∪x'
(≢∪)x → 4.43E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
≢∪x   → 1.14E¯2 | +157% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

This is the tally nub composition, used often in a customer application. If you “know” that you just want the tally of the nub (uniques), you don’t actually have to materialise the array for the nub.

I am not conversant with compiler technology so I don’t know what all an APL compiler can do. I do know that there’s a thing call “loop fusion” where, for example, in a+b×c÷2, it doesn’t have to go through a,b,c in separate loops, but can instead do

    :for i :in ⍳≢a ⋄ z[i]←a[i]+b[i]×c[i]÷2 ⋄ :endif

saving on the array temp on every step. You win some with this, but I think the function composition approach wins bigger. On the other hand, I don’t know that there is a general technique for function composition. I mean, what general statements can you make about what things can be faster (AKA algebraically simpler)?

Nicholas: I sort of see…so jot is a “direct” conjunction. An indirect conjunction could be ⍵[⍺⍴⍋⍵] where the intermediate grade is reshaped. We “know” that grade and shape are “orthogonal” and can rewrite the formula to ⍴⍵[⍋⍵].

So if we can establish a list of flags, and establish how primitives touch these flags, and how these flags affect each other, then we can extend to any path of the parse tree, provided that the intermediate nodes don’t destroy the relationship between the two operations (here grade + indexing provides sort, independently of the reshape).

Of course we can spend our lives finding such tricks. Or we could try and systemise it.

Roger: What you are saying above is that reshape and indexing commute (i.e. reshape indexing ←→ indexing reshape). More generally, compositions of the so-called structural functions and perhaps of the selection functions are amenable to optimisations. This would especially be the case if arrays use the “strided representation” described in Nick Nickolov’s paper Compiling APL to JavaScript in Vector. I used strided representation implicitly to code a terse model of ⍺⍉⍵ in 1987.

Nicolas, Formulation 2: On which grounds did the guys in the ’50s manage to estimate the minimal list of operations that you needed to express data processing?

Roger: APL developed after Ken Iverson struggled with using conventional mathematical notation to teach various topics in data processing. You can get an idea of the process from the following papers:

In our own humble way, we go through a similar process: We talk to customers to find out what problems they are faced with, what things are still awkward, and think about what if anything we can do to the language or the implementation to make things better. Sometimes we come up with a winner, for example . You know, the idea for (grade) is that often you don’t just use ⍋x to order x (sort) but you use ⍋x to order something else. Similarly with , you often don’t want just x⍳y but you use it to apply a function to items with like indices. The J Wiki essay Key describes how the idea arose in applications, and then connected with something I read about, the Connection Machine, a machine with 64K processors (this was in the 1980s).

Nicolas, Formulation 3: Do we have something with a wider spectrum than “Turing complete or not” to categorise the “usefulness and/or efficiency” of a language?

Roger: Still no general principles, but I can say this:

  • Study the languages of designers whom you respect, and “borrow” their primitives, or at least pay attention to the idea. For example, =x is a grouping primitive in k. Symbols are Arthur Whitney’s most precious resource and for him to use up a symbol is significant.

    =x ←→ {⊂⍵}⌸x   old k definition

    =x ←→ {⍺⍵}⌸x   current k definition

    Both {⊂⍵}⌸x (14.0) and {⍺⍵}⌸x (14.1) are supported by special code.

  • Study the design of machines. For a machine to make something “primitive” is significant. For example, the Connection Machine has an instruction “generalised beta” which is basically {f⌿⍵}⌸. Around the time the key operator was introduced, we (Ken Iverson and I) realised that it’s better not to build reduction into a partitioning, so that if you actually want to have a sum you have to say +⌿⌸ rather than just +⌸. The efficiency can be recovered by recognising {f⌿⍵}⌸ and doing a native implementation for it. (Which Dyalog APL does, in 14.0.)

Roger, Epilogue: To repeat Nicolas’ question in the opening section, what does “fundamental” mean? (Is grade more fundamental than sort?)

In computability, we can use Turing completeness as the metric. But here the requirement is not computability but expressiveness. I don’t know that there is a definitive answer to the question. One way to judge the goodness of a notation, or of an addition to the existing notation such as dfns or key or rank, is to use it on problems from diverse domains and see how well it satisfies the important characteristics of notation set out in Ken Iverson’s Turing lecture:

  • ease of expressing constructs arising in problems
  • suggestivity
  • subordination of detail
  • economy
  • amenability to formal proofs

I would add to these an additional characteristic: beauty.

Do Functions Know Their Own Names?

Going back a long way when John Scholes and I were writing version 0 of Dyalog there was a big discussion about whether functions knew their own names. This discussion still surfaces, with John taking the side that they don’t and me taking the side that they do.

Essentially, John would argue that after A←2, the “2” does not know that it is called “A”. So after (in modern parlance):

      add←{
          ⍺+⍵
      }

the part in {} does not know that it is called “add”.

The real question here can be put in different terms: Is the symbol + representing the addition function itself or is it one of the possible names of the addition function.

From an APL perspective, does this matter? Most of the time it makes no difference. However, when you view your SI stack it does. Consider:

      add←{
          ⍺+⍵
      }
      times←{
          ⍺×⍵
      }
      inner←{
          ⍺ ⍺⍺.⍵⍵ ⍵
      }

Now if we trace into

      1 2 add inner times 3 4

and stop on inner[1] what do we want to see when we type ⍺⍺ in the session. There are two possibilities:

Either you see:

{
    ⍺+⍵
}

or you see:

∇add

Which of these is more useful?

Being more provocative, try putting the functions in a capsule:

[0] foo
[1] 1 2{
[2]     ⍺+⍵
[3] }{
[4]     ⍺ ⍺⍺.⍵⍵ ⍵
[5] }{
[6]     ⍺×⍵
[7] }3 4

and repeatedly trace until [6]:

      )SI
#.foo[6]*
.
#.foo[4]
#.foo[1]

Compare this with the following:

[0] goo
[1] add←{
[2]     ⍺+⍵
[3] }
[4] inner←{
[5]     ⍺ ⍺⍺.⍵⍵ ⍵
[6] }
[7] times←{
[8]     ⍺×⍵
[9] }
[10] 1 2 add inner times 3 4
      )SI
#.times[1]*
.
#.inner[1]
#.goo[10]

In my view, the latter is much more communicative in a debugging environment.

Going back to the version 0 discussion: We didn’t have dfns or dops, so everything was traditional. The discussion was centred around:

∇r←a add b
[1] r←a+b
∇

∇r←a times b
[1] r←a×b
∇

∇ r←a (f inner g) b
[1] r←a f.g b
∇

Now trace this:

      1 2 add inner times 3 4

until at times[1]

The key question at the time was whether )SI should show this:

      )SI
#.times[1]*
.
#.inner[1]

or this:

      )SI
#.g[1]*
.
#.inner[1]

We choose the first of these options as more informative.

So naming things is good and using those names when reporting state information is also good. When the issue was disputed, David Crossley (who was managing the development of Dyalog) resolved it using the argument about the )SI output.

These days it might not be so obvious. In those days we were essentially thinking in terms of a scrolling paper terminal. It pre-dates the full screen experience that even the tty version gives you. We had to wait for Adam Curtis to join the team before we got that. With the context display whilst tracing there is a stronger argument that the eyes using the debugging information do not need the names. Whilst I admit to the weakening I don’t think it actually changes the balance of the case.

We use a lot of C macros in the interpreter. On Linux, gdb gives us access to those macros when we debug the C code – lldb on MAC, dbx on AIX and Visual Studio on Windows all do not have that information and are, therefore, far less helpful.

Spring 2015 Dyalog Travelogue (An American Tale: Dyalog Goes West)

Concluding Morten and Gitte’s whistlestop tour round Europe before heading to the US (see parts 1 and 2)

Saturday afternoon feels like Déjà Vu all over again...

Saturday afternoon – Déjà Vu all over again…

On Saturday afternoon we passed through London’s Heathrow airport for the third time in 5 days, this time continuing west to JFK, bringing the total for the week to 8,943km plus 300-odd in 3 different cars, 220 by rail, 25 by bus and 5 on the ferry 🙂 .

Sunday was spent with some of Dyalog’s North American contingent, co-ordinating and putting the final polish on the coming week’s presentations.

On Monday morning we were ready to start the first Dyalog North America user meeting – DYNA’15. The Princeton Crowne Plaza was our venue – making this our third time there after Dyalog ’07 and Dyalog ’09.

Back at the site of Dyalog ’07 and Dyalog ’09 – the Crowne Plaza in Princeton, New Jersey

Back at the site of Dyalog ’07 and Dyalog ’09 – the Crowne Plaza in Princeton, New Jersey

 

We had 37 visitors on Monday and 25 on Tuesday – a total of 45 different delegates representing about 15 different clients turned up to listen to updated road maps, demonstrations of new tools and four half-day workshops on Recent Language Enhancements, Parallel Programming using Futures and Isolates, Web Application Development and Modern APL Application Architectures. While there were no user presentations this year (the potential presenters seem to be keeping their powder dry for Dyalog ’15 in Sicily in September), we nonetheless had a full schedule.

Woodley Butler of Automatonics, Inc had been due to make a presentation, but had a scheduling problem and was unable to come. Gitte presented his exciting new hosting solution for Dyalog APL: APLCloud.com, during the opening session on Monday.

Monday’s dinner may have been the highlight of the event. We traveled (longer than expected due to the shuttle driver getting lost) to Mimino’s Restaurant for an authentic Georgian meal. The word meal does not do the experience justice – it was a culinary extravaganza – with plate after plate of incredible Georgian food. Imagine everyone’s surprise when we were told that it was now time for the entrees! Good friends, good food and good drink made it a truly special night.

After a meeting in Princeton on Wednesday morning we hit the road, again, this time to the Poconos to spend a few days relaxing and working with the North American Dyaloggers on a variety of projects before wrapping this tour up with visits to clients next Monday and Tuesday.

Spring 2015 Dyalog Travelogue (The Saga Continues)

Continuing Morten and Gitte’s whistlestop tour round Europe before heading to the US for DYNA15

From left to right this time…

From left to right this time…

FH Bingen – must have good student parties?

FH Bingen – must have good student parties?

Day 3: Bramley-Bingen

Job interviews done by lunch-time, and we hopped in the car (without lunch) to Heathrow, flew to Frankfurt and finally arrived at Bingen at sunset, just in time for some Weissbier and Flammkücken with the other early arrivals. Bingen is located west of Frankfurt, where the Rhine leaves the plains and bends north through hills, heading for Köln, Düsseldorf and the North Sea. Separated from the river by a vineyard-covered hill lies Fachhochschule Bingen, where our, host Dieter Kilsch, uses APL with MatLab and other tools to teach students about quality control and other subjects.

Days 4 & 5: APL Germany Spring Meeting in Bingen

We spent the next two days in the company of about 25 German APL enthusiasts. This time we were first up with a 2.5 hour workshop on Futures and Isolates, and we were very pleased to see that all the delegates who had gone to the trouble of installing Dyalog APL were able to perform all the exercises. We’ll have to make them a little harder next time (Tuesday in Princeton) 🙂 . The afternoon was focused on IBM: News from the Z-series hardware front, and the IBM GSE requirements process, where APL2 users get together and vote on priorities for requests for enhancements. Wouldn’t it be great if our Dyalog users would gang up on us like that and help us to set priorities as a group?

The watch tower at the top of the hill south of Bingen

The watch tower at the top of the hill south of Bingen

The first session on Friday was an APL Germany “business session” which we were allowed to skip. Morten discovered that the Tourist Information office next to our hotel rented bikes. They were a little shocked that he was willing to spend €13 for on hour on a bike, but he felt that he really needed to burn some carbon off the spark plugs. Seems he can’t see a hill without feeling it is necessary to make an attempt to get to the top of it.

With Morten energised after an hour on the bike, it was time to return to the meeting. A couple of presentations were particularly interesting: before lunch, Jürgen Sauermann spoke about GNU APL, which is now 2 years old and up to version 1.5. It is very encouraging to see open source APL systems thriving and promoting the use of APL.

GNU APL

GNU APL

One of the presentations after lunch can only be described as mind-blowing! Jörg Hudelmaier presented a development environment for JavaScript applications, written in Dyalog APL. Jörg was able to prototype JavaScript applications in Dyalog APL, using the WinForms WebBrowser control to render the User Interface, processing callbacks in APL – using a set of APL functions that emulated enough JavaScript DOM support to make his application work. Once he had finished development, he wrote a translator that generated a stand-alone JavaScript application which could run in a browser.

We rounded the two days off with a presentation on our strategy and selected demos of features from versions 14.0 and 14.1: Key, the new experimental JSON parser/generator and the Compiler, threw ourselves in the rental car and headed to Denmark for 15 hours at home before we set off for JFK and Princeton. The story continues next week, on the other side of the Atlantic!

To be concluded…

Spring 2015 Dyalog Travelogue (Ferries, Trains, Planes and Automobiles)

The journey begins: Kronborg Castle, Helsingør

The journey begins: Kronborg Castle, Helsingør

Göteborg: Läppstiften and Barken

Göteborg: Läppstiften and Barken

One of my favourite times of the year is the period in March/April where we traditionally start with a visit to the FinnAPL Forest Seminar followed by the APL Germany Spring Meeting. This year there are two new stops to make: the Swedish APL Group has started holding meetings twice a year too – and we are running the first Dyalog North America meeting in Princeton.

The FinnAPL meeting was a few weeks ago; this week we are wrapping up the rest in a whirlwind tour of no less than five one-way trips to get us to Princeton late on Saturday.

Day 1: Helsingør to Göteborg

We were off to a beautifully easy start on Monday evening, with the 20-minute ferry ride from Helsingør, where Gitte Christensen and I live, to Helsingborg in Sweden. From here it was 1 hour and 45 minutes by snabbtåg (train) to Göteborg, where we arrived just in time for a run at sundown, over the bridge across the Göta river, from which the second image was taken. Our hotel, the beautiful ship Barken, is on the right, and the conference venue, Läppstiften (the Lipstick), a very short walk away to the left.

Day 2: Göteborg to London

Gitte catching some, er, fresh air outside the Hotel Barken

Gitte catching some, er, fresh air outside the Hotel Barken

After a good night’s rest we were welcomed on the 21st floor of the Lipstick by Lars Wenztel of Aplensia, our host for the day. Lars opened the meeting with a talk about the efficiency of using arrays to do product configuration and production planning at Volvo Car Corporation, which is also located just across the river from the meeting site. He mentioned how his team had been working to improve performance, and I would like to take that opportunity to remind you all to send us benchmarks that are representative of your most performance-critical application components so that we (Dyalog) can help you to speed things up.

I was up next with an updated Road Map presentation and demos of the new JSON parser and external workspaces (spoiler alert: Germans and North Americans will see something very similar later this week 🙂 ). The afternoon was full of interesting presentations on the use of APL, with Gert Møller from Genokey in Denmark as the last presenter. He brought the day to a close with his presentation of ongoing work on the application of array-based logic to the problem of reducing the cost, increasing the effectiveness of – and reducing the side-effects that patients are likely to experience when using – new drugs. And guess what…he mentioned that performance was important, too.

Front row: Gitte Christensen, Conrad Helgesson, Joakim Hårsman, Jens Andersson, Tina Leijding, Lars Wentzel. Second: Gilgamesh Athoraya, Ewa Svensson, Radha Jayabalan, Jonas Stensiö. Third: Peter Simonsson, Keerthi Thadikamalla, Per Ericsson, Erik Häggblom and feet of Ole Johansen. Off screen: Paul Grosvenor and Morten Kromberg

Tuesday was a wonderful day in the company of some very lively Swedish APL Users. It is wonderful to see how they have put APL to use in important production planning applications at some of Sweden’s largest manufacturing operations, and the rapidly-growing bio-informatics sector was also well represented. We ended the day deciding to meet again on November 11th, with a target focus of web servers and services. Nearly everyone present had either experience of or plans to implement web-based solutions in APL; there will be code reviews and perhaps even have some hands-on coding sessions next time!

spring2015pt1_6The original plan was to take the train and ferry again and have a night in our own beds, but an opportunity to interview two potential candidates for our current job opening had appeared and, since it would be 2-3 weeks before we were back in the UK, we decide to grasp it. One of the benefits of having a large collection of air miles is that you can spend them on last-minute one-way trips that would otherwise cost a fortune. So instead of hopping on the train, we found ourselves on the airport bus and then a BA flight to Heathrow.

To be continued…

Exploring Key

In ⍺ f⌸ ⍵, major cells of specify keys for the corresponding major cells of , and f applies to each unique major cell of and the major cells of having that key. The monadic case f⌸⍵ is equivalent to ⍵ f⌸ ⍳≢⍵. Key is similar to the GROUP BY clause in SQL.

For example:

   ⎕io←0

   (⍳11) ,[¯0.5] 'Mississippi'
0 1 2 3 4 5 6 7 8 9 10
M i s s i s s i p p  i

   {⍺}⌸'Mississippi'                 {⍺⍵}⌸'Mississippi'
Misp                              ┌─┬────────┐
                                  │M│0       │
   {⊂⍵}⌸'Mississippi'             ├─┼────────┤
┌─┬────────┬───────┬───┐          │i│1 4 7 10│
│0│1 4 7 10│2 3 5 6│8 9│          ├─┼────────┤
└─┴────────┴───────┴───┘          │s│2 3 5 6 │
                                  ├─┼────────┤
   {≢⍵}⌸'Mississippi'             │p│8 9     │
1 4 4 2                           └─┴────────┘

We illustrate key by using it to model some APL primitives and to motivate and solve a programming puzzle.

Index-Of and Friends

Since key is defined in terms of unique and indices, we expect to be able to effect members of the index-of family. And so we can.

The specifications say that f applies to each unique major cell of . Therefore, {⍺}⌸ and equivalently ⊣⌸ computes unique. Since the specifications are for unique major cells, these are what ∪⍵ would be if it were extended to non-vector arguments, analagous to the way was extended to non-vector left arguments.

   x←↑'Aspen' 'John' 'Susan' 'Roger' 'Opal' 'John' 'Aspen'
   x
Aspen
John 
Susan
Roger
Opal 
John 
Aspen

   {⍺}⌸x                  ⊣⌸x
Aspen                  Aspen
John                   John 
Susan                  Susan
Roger                  Roger
Opal                   Opal 

A little surprisingly, key can also be used to model extended index-of and extended membership. For reasons which will become clear, we model (AKA ∊⍨) instead of . Both models use the merge operator, whimsically denoted here by .

ix  ← {(≢⍺) ↓ (0⍴⍨(≢⍺)+≢⍵) ((∊i)→)⍨ (≢¨i)/(≢⍺)⌊⊃¨i←{⊂⍵}⌸⍺⍪⍵}
has ← {(≢⍺) ↓ (0⍴⍨(≢⍺)+≢⍵) ((∊i)→)⍨ (≢¨i)/(≢⍺)>⊃¨i←{⊂⍵}⌸⍺⍪⍵}

   y←↑'China' 'Susan' 'John' 'John' 'Anne' 'Roger'
   y
China
Susan
John 
John 
Anne 
Roger

   x ix y
7 2 1 1 7 3

   x has y
0 1 1 1 0 1

To focus on essentials, ix and has do not directly handle the case where the right argument has higher rank than the left argument (the principal argument). Such higher rank arguments can be handled by pre-application of {↑,⊂⍤(¯1+⍴⍴⍺)⊢⍵} and post-application of ((1-⍴⍴⍺)↓⍴⍵)⍴.

Partition

Since key can effect non-contiguous partitions, we expect to be able to use it to effect contiguous partitions, a special case of the former. And so we can:

   part ← {(+\b/⍺){⊂⍵}⌸b⌿⍵⊣b←∨\⍺}

   x
Aspen
John 
Susan
Roger
Opal 
John 
Aspen
   0 1 0 1 0 0 1 part x
┌─────┬─────┬─────┐
│John │Roger│Aspen│
│Susan│Opal │     │
│     │John │     │
└─────┴─────┴─────┘
   0 1 0 1 0 0 1 (⊂[0] ≡ part) x
1
   (7⍴0) (⊂[0] ≡ part) x
1

Sort and Grade

sort  ← {⍺⌿⍨¯1+{≢⍵}⌸⍺⍪⍵}
grade ← {(≢⍺)-⍨∊1↓¨{⊂⍵}⌸⍺⍪⍵}

If is an array from a small “alphabet” , then ⍺ sort ⍵ sorts and ⍺ grade ⍵ grades it. For example:

   x←?1e6⍴256                        a←⎕a[?1e6⍴26]

   (⍋x)  ≡ (⍳256) grade x            (⍋a)  ≡ ⎕a grade a
1                                 1
   x[⍋x] ≡ (⍳256) sort  x            a[⍋a] ≡ ⎕a sort  a
1                                 1

The Maximum of a Vector

In Dyalog ’13 session D11, I included a slide giving relative timings on the then new key operator:

x←?1e6⍴1000

a. ¯1+{≢⍵}⌸(⍳1000),x     1.00
b. ¯1+{≢⍵}⌸(⍳1+⌈/x),x    1.62
c. ...

The timings show that key with tally takes less than a factor of 2 over finding the maximum. This is fast; unbelievably fast even. The following puzzle arose from investigations into this timing: Find the maximum of a vector of unsigned 1-byte integers without using multicore, vector instructions, loop unrolling, etc. Can you do it faster in C than the following?

max=*x++; 
for(i=1;i<n;++i){if(max<*x)max=*x; ++x;}

I posed the puzzle to some expert C programmers, and they failed to solve it. An “obvious” solution obtains by considering an obvious implementation of {≢⍵}⌸x in C:

M=256;
memset(c,0,M*4);              // initialize table of counts
for(i=0;i<n;++i)c[*x++]++;    // analoguous to c[x]+←1 in APL

Therefore, one way to find the maximum of x is to have an M-element (byte) Boolean table b, and:

memset(b,0,M);
for(i=0;i<n;++i)b[*x++]=1;    // analoguous to b[x]←1 in APL
c=b+M;                        // scan from the back
for(i=0;i<M;++i)if(*--c){max=c-b; break;}

Timing show that the table method is faster than the comparison method by a factor of 1.5 for 1-byte integers and 1.3 for 2-byte integers. The table method requires more code, but the main n-loop, the time for which dominates the overall time, is simpler for the table method.

Jay Foad countered that finding the maximum (and the minimum) is much faster still with the vector instructions available in modern CPUs. Dyalog has now (version 14.1) implemented such finding, with performance benefits for key, the index-of family, grade, and sort.