Name Colouring for Dfns

APL is sometimes criticised because expressions that include names cannot, in general, be parsed without knowing whether the names represent functions or variables. For example, the name thing in the expression thing⍳3 could reference an array (in which case the ⍳ is dyadic) or it could reference a function (making the ⍳ monadic).

An APL expression becomes completely parsable if we distinguish each name with one of four colours, depending on the “kind” of its referent: array, function, monadic operator, dyadic operator. Now, with a bit of practice, we can at least parse thing⍳3 with thing coloured as an array vs thing⍳3 with thing coloured as a function, without recourse to the definition of thing. Notice how kind differs slightly from APL’s name-class, which does not distinguish monadic from dyadic operators.
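
For example, APL’s ⎕NC reports the same name-class for both kinds of operator, so kind is strictly finer-grained (a quick session sketch; the names are invented):

      A←1 2 3 ⋄ F←{⍵} ⋄ M←{⍺⍺ ⍵} ⋄ D←{⍺⍺ ⍵⍵ ⍵}
      ⎕NC¨'A' 'F' 'M' 'D'    ⍝ array=2, function=3, but both operators=4
2 3 4 4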

Names whose definitions are external to the code under consideration and so cannot be classified would be given a distinct colour, say red, which would at least draw attention to them. Colouring a number of interdependent functions at the same time should help with such issues.

Name-colouring can co-exist with the more superficial token-colouring (green for comments and so forth), though we would probably want to configure the two schemes separately.

There’s a related argument for colouring parentheses to show the kind of value they contain: (~∘(⊂'')). This would mean that we could always determine the valency of a function call, or the kind of a hybrid token, such as /, by looking at the colour of the single token immediately to its left. Finally, we should probably kind-colour curly braces: {}, {⍺⍺ ⍵}, {⍵⍵ ⍵}.
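
For example, the hybrid / is the reduction operator when the token to its left denotes a function, but the replicate function when that token denotes an array (my illustration, not from the original):

      +/1 2 3        ⍝ + is a function, so / is the reduce operator
6
      1 0 1/1 2 3    ⍝ 1 ends an array, so / is the replicate function
1 3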

Yeah but how?

In the following, where appropriate, “function” is a shorthand for “function or operator”.

Most generally, we want to process a number of definitions at the same time, so that inter-function references can be coloured. Such a set of functions may be considered as internal definitions in an anonymous outer function so, without loss of generality, we need consider only the case of a single multi-line nested function.

A function can be viewed as a tree, with nested subfunctions as subtrees. The definitions at each lexical level in the function comprise the information stored at each node of the tree.

The colouring process traverses the tree, accumulating a dictionary of (name kind) pairs at each level. At a particular level, definitions are processed in the same order as they would be executed, so that the (name kind) entries from earlier definitions are available to inform later expressions. Visiting each node before traversing its subtrees ensures that outer dictionary entries are available to lexically inner functions. Prefixing dictionary entries for inner functions ensures that a left-to-right search (à la dyadic iota) finds those names first, thus modelling lexical name-shadowing.
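
A sketch of that search, as a hypothetical model rather than the actual implementation: with the entries for an inner x prefixed before those of an outer x,

      names←'x' 'f' 'x'     ⍝ inner x, then f, then outer x
      kinds←'array' 'function' 'function'
      (names⍳⊂'x')⊃kinds    ⍝ dyadic iota finds the inner, shadowing, x first
array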

Each assignment expression adds one or more names to the dictionary. The kind of the assignment is inferred by looking at the defining expression to the right of the assignment arrow. This sounds heavy, but if we examine the expression from right-to-left then we can often stop after examining very few tokens. For example, an arbitrarily long expression ending in (… function array) must, if it is syntactically correct, reduce to an array value, while (… function function) must be a function (train). A sequence in braces {…} resolves to a function, monadic operator or dyadic operator, depending only on whether there are occurrences of ⍺⍺ or ⍵⍵ at its outermost level.
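
Some invented definitions showing how few trailing tokens are needed:

      tot←+/2 3 4      ⍝ ends function array    → tot is an array
      avg←+⌿÷≢         ⍝ ends function function → avg is a function (a train)
      each←{⍺⍺¨⍵}      ⍝ ⍺⍺ at outermost level  → each is a monadic operator
      swap←{⍵ ⍵⍵ ⍺}    ⍝ ⍵⍵ at outermost level  → swap is a dyadic operator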

This approach, while relatively simple, is not perfect. It takes no account of the order in which names are defined relative to their being referenced. The assumption is that all definitions at an outer level are available to its inner levels. The following small function illustrates the problem:

    {
        M←{A}        ⍝ this A correctly coloured function
        N←{A}⍵       ⍝ this A should be coloured unclassified
        A←÷          ⍝ A defined as a function
        M N          ⍝ application of function to array
    }

It remains to be seen whether this would be a problem in practice or whether, if name-colouring proved popular, people might adjust their code to avoid such anomalies. The problem does not occur if we avoid re-using the same name for items of different kinds at the same lexical level – which IMHO is not a bad rule, anyway.

Implementation Status

I would like to experiment with name-kind colouring for the dfns code samples on the Dyalog website.

This project is under way, but has rather a low priority and so keeps stalling. In addition, I keep wavering between processing the tokens as a list with tail calling and the dictionary as an accumulating left argument, or in more classic APL style as a vector, developing parallel vectors for lexical depth and masks for assignment arrow positions and expression boundaries, etc.

In the meantime, here is an artist’s impression of what name- and bracket-coloured code might look like, with colours: array, function, dyadic-operator.

    eval←{
        stk I(op ops)←⍵
        op in⊃⍺:⍺ prt stk I((dref op)cat ops)
        f(fr rr)←f rstk
        c←{op≡⍵~' '}
        c'dip ':⍺('*⌷'rgs{prt rr I(f cat fr ops)})
        ...

Notice how functions stand out from their arguments. For example, at the start of the third line: op in⊃⍺:, it is clear that in is a function taking ⊃⍺ (first of alpha) as right argument and op as left argument, rather than a monadic function op taking in⊃⍺ (in pick of alpha) as right argument. Other interpretations of the uncoloured expression op in⊃⍺: include:

    op in⊃⍺:    op in is a vector (array), so ⊃ is pick.
    op in⊃⍺:    both op and in are monadic function calls, so ⊃ is first.
    op in⊃⍺:    in is a monadic operator with array operand op.
    op in⊃⍺:    in is a dyadic operator with function operands op and ⊃.

I’m keen to hear any feedback and suggestions about name-colouring for dfns. What d’ya think?

A Dialog on APL

A discussion between Nicolas Delcros and Roger Hui

Nicolas, Prologue: From a language point of view, thanks to Ken Iverson, it is obvious that you want grade rather than sort as a primitive. Yet from a performance point of view, sort is currently faster than grade.

Can one be “more fundamental” than the other? If so, who’s wrong…APL or our CPUs? In any case, what does “fundamental” mean?

Roger: Sorting is faster than grading for reasons presented in the J Wiki essay Sorting versus Grading, in the paragraph which begins “Is sorting necessarily faster than grading?” I cannot prove it in the mathematical sense, but I believe that to be the case on any CPU when the items to be sorted/graded are machine units.

Nicolas, Formulation 1: Now, parsing ⍵[⍋⍵] (and not scanning the idiom) begs the question of how deeply an APL interpreter can “understand” what it’s doing to arrays.

How would an APL compiler resolve this conjunction in the parse tree? Do you simply have a bunch of state pointers such as “is the grade of” or “is sorted” or “is squozen” or “axis ordering” etc. walking along the tree? If so, do we have an idea of the number of such state pointers required to exhaustively describe what the APL language can do to arrays? If not, is there something more clever out there?

Roger: I don’t know of any general principles that can tell you what things can be faster. I do have two lists, one for J and another for Dyalog. A big part of the lists consists of compositions of functions, composition in the mathematical sense, that can be faster than doing the functions one after the other if you “recognize” what the composed function is doing and write a native implementation of it. Sort vs. grade is one example (sort is indexing composed with grade). Another one is (⍳∘1 >) or (1 ⍳⍨ >). The function is “find the first 1” composed with >. These compositions have native implementations and:

      x←?1e6⍴1e6

      cmpx '5e5(1 ⍳⍨ >)x' '(5e5>x)⍳1' 
5e5(1 ⍳⍨ >)x → 0.00E0  |       0%
(5e5>x)⍳1    → 1.06E¯2 | +272100% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      cmpx '¯1(1 ⍳⍨ >)x' '(¯1>x)⍳1' 
¯1(1 ⍳⍨ >)x → 2.41E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(¯1>x)⍳1    → 4.15E¯3 | +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

If you get a “hit” near the beginning, as would be the case with 5e5, you win big. Even if you have to go to the end (as with ¯1), you still save the cost of explicitly generating the Boolean vector and then scanning it to the end.

Another one, introduced in 14.1, is:

      cmpx '(≢∪)x' '≢∪x'
(≢∪)x → 4.43E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
≢∪x   → 1.14E¯2 | +157% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

This is the tally nub composition, used often in a customer application. If you “know” that you just want the tally of the nub (uniques), you don’t actually have to materialise the array for the nub.

I am not conversant with compiler technology so I don’t know what all an APL compiler can do. I do know that there’s a thing called “loop fusion” where, for example, in a+b×c÷2, it doesn’t have to go through a,b,c in separate loops, but can instead do

    :For i :In ⍳≢a ⋄ z[i]←a[i]+b[i]×c[i]÷2 ⋄ :EndFor

saving on the array temp on every step. You win some with this, but I think the function composition approach wins bigger. On the other hand, I don’t know that there is a general technique for function composition. I mean, what general statements can you make about what things can be faster (AKA algebraically simpler)?

Nicolas: I sort of see…so jot is a “direct” conjunction. An indirect conjunction could be ⍵[⍺⍴⍋⍵], where the intermediate grade is reshaped. We “know” that grade and shape are “orthogonal” and can rewrite the formula as ⍺⍴⍵[⍋⍵].

So if we can establish a list of flags, and establish how primitives touch these flags, and how these flags affect each other, then we can extend to any path of the parse tree, provided that the intermediate nodes don’t destroy the relationship between the two operations (here grade + indexing provides sort, independently of the reshape).

Of course we can spend our lives finding such tricks. Or we could try and systemise it.

Roger: What you are saying above is that reshape and indexing commute (i.e. reshape indexing ←→ indexing reshape). More generally, compositions of the so-called structural functions and perhaps of the selection functions are amenable to optimisations. This would especially be the case if arrays use the “strided representation” described in Nick Nickolov’s paper Compiling APL to JavaScript in Vector. I used strided representation implicitly to code a terse model of ⍺⍉⍵ in 1987.
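
For example (my illustration of the commuting claim):

      x←3 1 2
      (x[2 3⍴⍋x]) ≡ 2 3⍴x[⍋x]    ⍝ indexing by a reshaped grade ≡ reshaping the sort
1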

Nicolas, Formulation 2: On which grounds did the guys in the ’50s manage to estimate the minimal list of operations that you needed to express data processing?

Roger: APL developed after Ken Iverson struggled with using conventional mathematical notation to teach various topics in data processing. You can get an idea of the process from his early papers.

In our own humble way, we go through a similar process: We talk to customers to find out what problems they are faced with, what things are still awkward, and think about what if anything we can do to the language or the implementation to make things better. Sometimes we come up with a winner, for example ⌸ (key). You know, the idea for ⍋ (grade) is that often you don’t just use ⍋x to order x (sort) but you use ⍋x to order something else. Similarly with ⌸, you often don’t want just x⍳y but you use it to apply a function to items with like indices. The J Wiki essay Key describes how the idea arose in applications, and then connected with something I read about, the Connection Machine, a machine with 64K processors (this was in the 1980s).
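
Two small examples of those usage patterns (mine, for illustration):

      ages←32 19 27 ⋄ names←'Pam' 'Sam' 'Kim'
      names[⍋ages]             ⍝ grade ages to order names
 Sam  Kim  Pam
      {⍺(≢⍵)}⌸'mississippi'    ⍝ key: count the items with like keys
 m 1
 i 4
 s 4
 p 2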

Nicolas, Formulation 3: Do we have something with a wider spectrum than “Turing complete or not” to categorise the “usefulness and/or efficiency” of a language?

Roger: Still no general principles, but I can say this:

  • Study the languages of designers whom you respect, and “borrow” their primitives, or at least pay attention to the idea. For example, =x is a grouping primitive in k. Symbols are Arthur Whitney’s most precious resource and for him to use up a symbol is significant.

    =x ←→ {⊂⍵}⌸x   old k definition

    =x ←→ {⍺⍵}⌸x   current k definition

    Both {⊂⍵}⌸x (14.0) and {⍺⍵}⌸x (14.1) are supported by special code.

  • Study the design of machines. For a machine to make something “primitive” is significant. For example, the Connection Machine has an instruction “generalised beta” which is basically {f⌿⍵}⌸. Around the time the key operator was introduced, we (Ken Iverson and I) realised that it’s better not to build reduction into a partitioning, so that if you actually want to have a sum you have to say +⌿⌸ rather than just +⌸. The efficiency can be recovered by recognising {f⌿⍵}⌸ and doing a native implementation for it. (Which Dyalog APL does, in 14.0.)
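
For example (my illustration), the sum-of-each-group spelling that is an instance of the recognised {f⌿⍵}⌸ pattern:

      1 1 2 2 {+⌿⍵}⌸ 10 20 30 40    ⍝ sum the items that share a key
30 70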

Roger, Epilogue: To repeat Nicolas’ question in the opening section, what does “fundamental” mean? (Is grade more fundamental than sort?)

In computability, we can use Turing completeness as the metric. But here the requirement is not computability but expressiveness. I don’t know that there is a definitive answer to the question. One way to judge the goodness of a notation, or of an addition to the existing notation such as dfns or key or rank, is to use it on problems from diverse domains and see how well it satisfies the important characteristics of notation set out in Ken Iverson’s Turing lecture:

  • ease of expressing constructs arising in problems
  • suggestivity
  • subordination of detail
  • economy
  • amenability to formal proofs

I would add to these an additional characteristic: beauty.

Do Functions Know Their Own Names?

Going back a long way, when John Scholes and I were writing version 0 of Dyalog, there was a big discussion about whether functions knew their own names. This discussion still surfaces, with John taking the side that they don’t and me taking the side that they do.

Essentially, John would argue that after A←2, the “2” does not know that it is called “A”. So after (in modern parlance):

      add←{
          ⍺+⍵
      }

the part in {} does not know that it is called “add”.

The real question here can be put in different terms: is the symbol + representing the addition function itself, or is it one of the possible names of the addition function?

From an APL perspective, does this matter? Most of the time it makes no difference. However, when you view your SI stack it does. Consider:

      add←{
          ⍺+⍵
      }
      times←{
          ⍺×⍵
      }
      inner←{
          ⍺ ⍺⍺.⍵⍵ ⍵
      }

Now if we trace into

      1 2 add inner times 3 4

and stop on inner[1], what do we want to see when we type ⍺⍺ in the session? There are two possibilities:

Either you see:

{
    ⍺+⍵
}

or you see:

∇add

Which of these is more useful?

Being more provocative, try putting the functions in a capsule:

[0] foo
[1] 1 2{
[2]     ⍺+⍵
[3] }{
[4]     ⍺ ⍺⍺.⍵⍵ ⍵
[5] }{
[6]     ⍺×⍵
[7] }3 4

and repeatedly trace until [6]:

      )SI
#.foo[6]*
.
#.foo[4]
#.foo[1]

Compare this with the following:

[0] goo
[1] add←{
[2]     ⍺+⍵
[3] }
[4] inner←{
[5]     ⍺ ⍺⍺.⍵⍵ ⍵
[6] }
[7] times←{
[8]     ⍺×⍵
[9] }
[10] 1 2 add inner times 3 4
      )SI
#.times[1]*
.
#.inner[1]
#.goo[10]

In my view, the latter is much more communicative in a debugging environment.

Going back to the version 0 discussion: We didn’t have dfns or dops, so everything was traditional. The discussion was centred around:

∇r←a add b
[1] r←a+b
∇

∇r←a times b
[1] r←a×b
∇

∇ r←a (f inner g) b
[1] r←a f.g b
∇

Now trace this:

      1 2 add inner times 3 4

until we are at times[1].

The key question at the time was whether )SI should show this:

      )SI
#.times[1]*
.
#.inner[1]

or this:

      )SI
#.g[1]*
.
#.inner[1]

We chose the first of these options as more informative.

So naming things is good and using those names when reporting state information is also good. When the issue was disputed, David Crossley (who was managing the development of Dyalog) resolved it using the argument about the )SI output.

These days it might not be so obvious. In those days we were essentially thinking in terms of a scrolling paper terminal; it pre-dates the full-screen experience that even the tty version gives you (we had to wait for Adam Curtis to join the team before we got that). With the context display whilst tracing, there is a stronger argument that the eyes using the debugging information do not need the names. Whilst I admit that this weakens the case, I don’t think it actually changes the balance of it.

We use a lot of C macros in the interpreter. On Linux, gdb gives us access to those macros when we debug the C code; lldb on the Mac, dbx on AIX and Visual Studio on Windows do not have that information and are, therefore, far less helpful.

Spring 2015 Dyalog Travelogue (An American Tale: Dyalog Goes West)

Concluding Morten and Gitte’s whistlestop tour round Europe before heading to the US (see parts 1 and 2)

Saturday afternoon – Déjà Vu all over again…

On Saturday afternoon we passed through London’s Heathrow airport for the third time in 5 days, this time continuing west to JFK, bringing the total for the week to 8,943km, plus 300-odd km in 3 different cars, 220 by rail, 25 by bus and 5 on the ferry 🙂 .

Sunday was spent with some of Dyalog’s North American contingent, co-ordinating and putting the final polish on the coming week’s presentations.

On Monday morning we were ready to start the first Dyalog North America user meeting – DYNA’15. The Princeton Crowne Plaza was our venue – making this our third time there after Dyalog ’07 and Dyalog ’09.

Back at the site of Dyalog ’07 and Dyalog ’09 – the Crowne Plaza in Princeton, New Jersey

We had 37 visitors on Monday and 25 on Tuesday – a total of 45 different delegates representing about 15 different clients turned up to listen to updated road maps, demonstrations of new tools and four half-day workshops on Recent Language Enhancements, Parallel Programming using Futures and Isolates, Web Application Development and Modern APL Application Architectures. While there were no user presentations this year (the potential presenters seem to be keeping their powder dry for Dyalog ’15 in Sicily in September), we nonetheless had a full schedule.

Woodley Butler of Automatonics, Inc had been due to present his exciting new hosting solution for Dyalog APL, APLCloud.com, but a scheduling problem meant that he was unable to come; Gitte presented it on his behalf during the opening session on Monday.

Monday’s dinner may have been the highlight of the event. We traveled to Mimino’s Restaurant (a longer ride than expected, thanks to the shuttle driver getting lost) for an authentic Georgian meal. The word meal does not do the experience justice – it was a culinary extravaganza, with plate after plate of incredible Georgian food. Imagine everyone’s surprise when we were told that it was now time for the entrées! Good friends, good food and good drink made it a truly special night.

After a meeting in Princeton on Wednesday morning we hit the road, again, this time to the Poconos to spend a few days relaxing and working with the North American Dyaloggers on a variety of projects before wrapping this tour up with visits to clients next Monday and Tuesday.

Spring 2015 Dyalog Travelogue (The Saga Continues)

Continuing Morten and Gitte’s whistlestop tour round Europe before heading to the US for DYNA15

From left to right this time…

FH Bingen – must have good student parties?

Day 3: Bramley-Bingen

Job interviews done by lunch-time, we hopped in the car (without lunch) to Heathrow, flew to Frankfurt and finally arrived in Bingen at sunset, just in time for some Weissbier and Flammkuchen with the other early arrivals. Bingen is located west of Frankfurt, where the Rhine leaves the plains and bends north through hills, heading for Köln, Düsseldorf and the North Sea. Separated from the river by a vineyard-covered hill lies Fachhochschule Bingen, where our host, Dieter Kilsch, uses APL alongside MatLab and other tools to teach students about quality control and other subjects.

Days 4 & 5: APL Germany Spring Meeting in Bingen

We spent the next two days in the company of about 25 German APL enthusiasts. This time we were first up with a 2.5 hour workshop on Futures and Isolates, and we were very pleased to see that all the delegates who had gone to the trouble of installing Dyalog APL were able to perform all the exercises. We’ll have to make them a little harder next time (Tuesday in Princeton) 🙂 . The afternoon was focused on IBM: News from the Z-series hardware front, and the IBM GSE requirements process, where APL2 users get together and vote on priorities for requests for enhancements. Wouldn’t it be great if our Dyalog users would gang up on us like that and help us to set priorities as a group?

The watch tower at the top of the hill south of Bingen

The first session on Friday was an APL Germany “business session”, which we were allowed to skip. Morten discovered that the Tourist Information office next to our hotel rented bikes. They were a little shocked that he was willing to spend €13 for an hour on a bike, but he felt that he really needed to burn some carbon off the spark plugs. It seems he can’t see a hill without feeling it necessary to attempt to get to the top of it.

With Morten energised after an hour on the bike, it was time to return to the meeting. A couple of presentations were particularly interesting: before lunch, Jürgen Sauermann spoke about GNU APL, which is now 2 years old and up to version 1.5. It is very encouraging to see open source APL systems thriving and promoting the use of APL.

GNU APL

One of the presentations after lunch can only be described as mind-blowing! Jörg Hudelmaier presented a development environment for JavaScript applications, written in Dyalog APL. Jörg was able to prototype JavaScript applications in Dyalog APL, using the WinForms WebBrowser control to render the User Interface, processing callbacks in APL – using a set of APL functions that emulated enough JavaScript DOM support to make his application work. Once he had finished development, he wrote a translator that generated a stand-alone JavaScript application which could run in a browser.

We rounded the two days off with a presentation on our strategy and selected demos of features from versions 14.0 and 14.1 – Key, the new experimental JSON parser/generator and the Compiler – then threw ourselves into the rental car and headed for Denmark and 15 hours at home before setting off for JFK and Princeton. The story continues next week, on the other side of the Atlantic!

To be concluded…

Spring 2015 Dyalog Travelogue (Ferries, Trains, Planes and Automobiles)

The journey begins: Kronborg Castle, Helsingør

Göteborg: Läppstiften and Barken

One of my favourite times of the year is the period in March/April when we traditionally start with a visit to the FinnAPL Forest Seminar followed by the APL Germany Spring Meeting. This year there are two new stops to make: the Swedish APL Group has started holding meetings twice a year too – and we are running the first Dyalog North America meeting in Princeton.

The FinnAPL meeting was a few weeks ago; this week we are wrapping up the rest in a whirlwind tour of no less than five one-way trips to get us to Princeton late on Saturday.

Day 1: Helsingør to Göteborg

We were off to a beautifully easy start on Monday evening, with the 20-minute ferry ride from Helsingør, where Gitte Christensen and I live, to Helsingborg in Sweden. From here it was 1 hour and 45 minutes by snabbtåg (express train) to Göteborg, where we arrived just in time for a run at sundown, over the bridge across the Göta river, from which the second image was taken. Our hotel, the beautiful ship Barken, is on the right, and the conference venue, Läppstiften (the Lipstick), a very short walk away to the left.

Day 2: Göteborg to London

Gitte catching some, er, fresh air outside the Hotel Barken

After a good night’s rest we were welcomed on the 21st floor of the Lipstick by Lars Wentzel of Aplensia, our host for the day. Lars opened the meeting with a talk about the efficiency of using arrays to do product configuration and production planning at Volvo Car Corporation, which is also located just across the river from the meeting site. He mentioned how his team had been working to improve performance, and I would like to take this opportunity to remind you all to send us benchmarks that are representative of your most performance-critical application components so that we (Dyalog) can help you to speed things up.

I was up next with an updated Road Map presentation and demos of the new JSON parser and external workspaces (spoiler alert: Germans and North Americans will see something very similar later this week 🙂 ). The afternoon was full of interesting presentations on the use of APL, with Gert Møller from Genokey in Denmark as the last presenter. He brought the day to a close with his presentation of ongoing work on the application of array-based logic to the problem of reducing the cost, increasing the effectiveness of – and reducing the side-effects that patients are likely to experience when using – new drugs. And guess what…he mentioned that performance was important, too.

Front row: Gitte Christensen, Conrad Helgesson, Joakim Hårsman, Jens Andersson, Tina Leijding, Lars Wentzel. Second: Gilgamesh Athoraya, Ewa Svensson, Radha Jayabalan, Jonas Stensiö. Third: Peter Simonsson, Keerthi Thadikamalla, Per Ericsson, Erik Häggblom and feet of Ole Johansen. Off screen: Paul Grosvenor and Morten Kromberg

Tuesday was a wonderful day in the company of some very lively Swedish APL users. It is wonderful to see how they have put APL to use in important production planning applications at some of Sweden’s largest manufacturing operations, and the rapidly-growing bio-informatics sector was also well represented. We ended the day deciding to meet again on November 11th, with a target focus of web servers and services. Nearly everyone present had either experience of or plans to implement web-based solutions in APL; there will be code reviews and perhaps even some hands-on coding sessions next time!

The original plan was to take the train and ferry again and have a night in our own beds, but an opportunity to interview two potential candidates for our current job opening had appeared and, since it would be 2-3 weeks before we were back in the UK, we decided to grasp it. One of the benefits of having a large collection of air miles is that you can spend them on last-minute one-way trips that would otherwise cost a fortune. So instead of hopping on the train, we found ourselves on the airport bus and then a BA flight to Heathrow.

To be continued…