Rencontres Dyalog APL 2016 – Paris, France

It is really good to see APL events come back to life! In April 2014, we witnessed the rebirth of SWEDAPL, which had been dormant for some time. It now meets twice a year and is perhaps the most vibrant APL meeting on our circuit, with many young developers building new products and features in APL. The last SWEDAPL meeting drew a substantial international crowd, including a number of Danes – and the next one, scheduled for April 1st, is making a guest appearance in Copenhagen, hosted by Simcorp A/S – a bit like the Tour de France 🙂

This year we are really happy to be back in France, which has also had a relatively dormant APL community for the last decade or so – at least in terms of holding meetings. In this case we decided to arrange a meeting with the help of our French distributor, Quantys. Although the invitation was to a “Dyalog User Meeting”, about half of the attendees were users of APL systems other than Dyalog APL – with a little luck, this meeting will turn out to have sown the seeds of a rebirth of an independent French APL group. Fingers crossed!

Everyone please note: If you want to organise a local APL event, and you invite a speaker from Dyalog, we will do everything we can to send one or two delegates to your meeting. The current “circuit” includes Finland, Germany and Sweden twice a year, France and the East Coast USA. We have been known to show up at the Bay Area Users’ Group from time to time, in Toronto, at J and kx meetings – and recently also at FunctionalConf in Bangalore.

After welcoming remarks from Marc Righetti of Quantys, Gitte talked about Dyalog’s commitment to ensuring that APL is well integrated with modern computing platforms and infrastructure, which are always in the throes of another revolution. The current movement towards cloud computing and the need for platform independence are no exception. The good news is that Dyalog is growing to meet the challenge; we expect to add another couple of heads this year and grow the company by another ten percent.

Dan Baronet is a native of Montreal, Canada. As one of our French-speaking team members, he ended up doing most of the heavy lifting, with presentations on the upcoming v15.0 release and a recap of the recent language enhancements in version 14.0 – in particular, the rank and key operators and function trains. Nicolas Delcros also spoke in French on the subject of his most recent work on integrating the publishing capabilities of Adrian Smith’s NewLeaf tool into SharpPlot, under the name SharpLeaf. I was only allowed to interrupt the flow of French twice: first with a road map presentation and, in the afternoon, with a brief introduction to Futures and Isolates.

At the end of an action-packed day, Quantys treated us all to Champagne and snacks – many thanks to Marc for running the show and taking good care of us. A single day was much too short a time to do justice to the last decade of Dyalog achievements – so we will have to be back more regularly!

FinnAPL Forest Seminar 2016

The view from the Sauna. Some people actually went in!


Finns probably have more reason than most of us to look forward to spring: not only does it get much easier to keep that hole in the ice open, it is also time for the annual FinnAPL Forest Seminar!

This year, just under 20 of us gathered for two days (Thursday March 10th and Friday March 11th) at Hirvihaara Manor, about an hour north of Helsinki, to update each other on what we have been getting up to recently.

Thursday

After a warm welcome from Jouko Kangasniemi, Chairman of FinnAPL, Veli-Matti Jantunen from Statistics Finland kicked off the proceedings with a talk titled “The long way of an APL2 bigot to Dyalog world”, in which he reviewed features of recent versions of Dyalog APL, awarding some of them varying numbers of thumbs up and declaring others to be irrelevant. A few were found to be flawed… We are hoping to talk him into a repeat at Dyalog ’16, as this was a valuable and thought-provoking review!

I was on next, with the Spring 2016 version of the Dyalog Road Map. As the slides should confirm, there is no big change in direction. We are planning to increase headcount by another 10% this year and to continue investing in the core interpreter technology, APL compilers, and tools to help you build applications on a growing number of platforms.

Ants on the left. Ray Cannon on the right


After lunch, Ray Cannon showed us how to “Build a better ant brain”, producing wonderful coloured animations of ants crawling all over the big screen, using MiServer 3.0 and a bit of JavaScript – running under Dyalog APL on a Raspberry Pi!

My technical keynote in the morning had included a demo of a very early prototype of a Python interface, which will allow APL users to tap into Python libraries. I was therefore very interested in the next presentation, by Esa Pursiheimo from the VTT Technical Research Centre of Finland, which gave us all an introduction to the Python language. There is no question that the Python community has built libraries that could be very useful to APLers (although I cannot say the language itself impressed me much 🙂).

The last presentation of the day, titled “Data Driven Documents” (aka “D3”), was also about using libraries written in other programming languages to extend APL applications; in this case the language was JavaScript. Jouko Kangasniemi from the Confederation of Finnish Industries (EK) showed how he generates JavaScript to call the popular D3 graphics library and publish charts that are relevant to economic planners in Finland. A collection of animated charts created using this technology can be found at http://ek.fi/materiaalipankki/tietografiikka/talous/viikon-graafit/.

Since we were in Finland, the afternoon ended with a visit to a traditional “smoke sauna”, before we all scrubbed up for the banquet.

Cheers! From left to right (more or less): Antero Ranne, Gitte Christensen, Esa Lippu, Miika Rämä, Simo Kilponen, Jouko Kangasniemi, Heikki Viitamäki, Esa Pursiheimo, Olli Paavola, Kaarlo Reipas, Göran Koreneff, Morten Kromberg, Kimmo Kekäläinen, Veli-Matti Jantunen, Timo Korpela, Ray Cannon (Missing: Anssi Seppälä)


Friday

The first talk on Friday morning was perhaps the most interesting from my point of view: Antero Ranne of the Ilmarinen Mutual Pension Insurance Company showed how he was able to speed up financial simulations by a factor of approximately 3 on his Intel i7-based laptop, using Futures and Isolates in Dyalog version 14.0. It is really good to see domain experts wielding this tool!

After coffee, Gitte presented the work that she had done to put APL on the map as an invited speaker at a recent conference on the history of information technology in the Nordic region. She also reminded us all that we will be celebrating the 50th anniversary of the first running APL system on November 27 (http://silvermapleweb.com/first-cleanspace/). At Dyalog ’16 on October 9th-13th in Glasgow, Scotland, Dyalog will set time aside to celebrate this anniversary in collaboration with the British APL Association. If you have a good story about ground-breaking work done in APL in the early days, please get in touch to discuss how you might contribute to the celebrations!

Once again, I found myself standing between the audience and lunch; fortunately, there are enough juicy language features and interfaces coming in versions 15.0 and 16.0 that nobody walked out before I was done. I even had time to talk about a workspace that we added several years ago, after discovering that several members of the audience were unaware of it: the “loaddata” workspace, which contains functions to read and write Excel spreadsheets, CSV files, XML and ODBC data sources. If you have not seen it yet, try loading it and taking a look.

After another excellent lunch, Anssi Seppälä of Enease Oy wrapped up the formal part of the programme with a talk on JD, an inverted vectorial database implemented in the J programming language. JD makes it straightforward to manage large time series containing records of power usage and the quality of electricity delivered to consumers, to perform analyses and to generate visualisations of the data.

Several of us continued discussing programming challenges, while drinking (STRONG!!!) Finnish coffee and eating the wonderful cakes that were provided all day by Hirvihaara Manor, before heading back home after another successful FinnAPL Forest Seminar – we look forward to the 2017 edition!

Gitte and I managed to get about 40 hours at home before boarding the next plane, heading for Paris for the first French Dyalog User Meeting in recent history. More about that coming soon!

PQA

PQA is an acronym for Performance Quality Assurance. We developed PQA to answer two questions: which APL primitives have slowed down from one build of the Dyalog interpreter to the next, and by how much? Currently, PQA consists of 13,659 benchmarks divided into 136 groups.






The Graphs

The graph above (and all other graphs in this article) plots the timing ratios between version 14.1 and the build of version 15.0 of 2016-01-22, for the Windows 64-bit Unicode version. The data are the group geometric means. The horizontal axis shows the groups sorted by their means; the vertical axis shows the logarithms of the means. The percentage at the top (-17.1%) indicates that, over all 136 groups, version 15.0 is 17.1% faster than version 14.1. The bottom of the graph shows how many of the groups are faster (blue, 69.9%) and how many are more than 2% faster (89/136), and how many are slower (red, 30.1%) and how many are more than 2% slower (27/136).
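To make the summary figures concrete, here is a minimal Python sketch of how per-benchmark timing ratios might be reduced to the numbers shown on such a graph. It assumes that a ratio is the new build's time divided by the old build's time (so a value below 1 is blue), that the overall percentage is the geometric mean of the group means, and that "more than 2% faster" means a group mean below 0.98. The article does not spell out PQA's exact formulas, and the function name `summarise` is hypothetical.

```python
import math
from statistics import geometric_mean  # Python 3.8+


def summarise(group_ratios, threshold=0.02):
    """Summarise timing ratios (new time / old time) per benchmark group.

    group_ratios maps a group name to a list of ratios; a ratio below 1
    means the new build is faster. The layout of the result is an
    assumption for illustration, not PQA's actual code.
    """
    # Geometric mean of the ratios within each group
    means = {g: geometric_mean(r) for g, r in group_ratios.items()}
    # Groups sorted by their mean: the horizontal axis of the graph
    ordered = sorted(means, key=means.get)
    vals = list(means.values())
    n = len(vals)
    # Overall change: geometric mean of the group means, as a percentage
    overall_pct = 100 * (geometric_mean(vals) - 1)
    faster = sum(m < 1 for m in vals)
    slower = sum(m > 1 for m in vals)
    return {
        "order": ordered,
        # Vertical axis: logarithm of each group's mean
        "log_means": [math.log(means[g]) for g in ordered],
        "overall_pct": overall_pct,
        "faster_pct": 100 * faster / n,
        "slower_pct": 100 * slower / n,
        # (groups beyond the threshold, total groups), e.g. 89/136
        "faster_beyond": (sum(m < 1 - threshold for m in vals), n),
        "slower_beyond": (sum(m > 1 + threshold for m in vals), n),
    }


report = summarise({"a": [0.5, 0.5], "b": [2.0, 2.0], "c": [0.9]})
print(report["order"])  # ['a', 'c', 'b']
```

The geometric mean suits ratios: a 2× speed-up and a 2× slow-down cancel out exactly, whereas an arithmetic mean of 0.5 and 2.0 would report a 25% slow-down.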

In the graph above, the amount of blue is gratifying but the red is worrying. From past experience, the more blue the better (of course), but any significant red is problematic: speed-ups will not excuse slow-downs.

PQA is run nightly when a new build of version 15.0 is available. Each PQA run takes about 12.5 hours on a dedicated machine. The graphs from the past month’s runs are presented on the strip on the right; the first of them is a shrunken version of the big graph at the beginning of the article.

Challenges

About two years ago, investigations into a user report of an interpreter slow-down underscored for us a “feature” of modern computer systems: an APL primitive can run slower even when the C source has changed only in seemingly inconsequential ways, or has not changed at all. Moreover, the variability in measurements frequently exceeded the difference in timings between the two builds, and it was very difficult to establish any kind of pattern. It got to the point where even colleagues were doubting us: how hard can it be to run a few benchmarks?

  • The system is very noisy.
  • Cache is king.
  • The CPU acts like an interpreter.
  • Branching (in machine code) is slow.
  • Instruction counts don’t tell all.
  • The number of combinations of APL primitives and argument types is enormous.

What can be done about it?

  • Developed PQA.
  • Run PQA on a dedicated machine.
  • Make the system as quiet as possible, then make it quieter than that. 🙂
  • Set the priority of the APL task to be High or Realtime.
  • The ⎕ai or ⎕ts clocks have insufficient resolution. Use QueryPerformanceCounter or equivalent. The QPC resolution varies but is typically around 0.5e¯6 seconds.
  • Run each benchmark expression enough times to consume a significant amount of time (50e¯3 seconds, say).
  • Repeat the previous step enough times to get a significant sample.

Despite all these measures, timing differences of less than 5% are not reliably reproducible.
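The measurement steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not PQA's code. `time.perf_counter` stands in for QueryPerformanceCounter (on Windows, CPython implements it with that counter). The repeat count is found by doubling. Comparing medians against a 5% noise band is our own choice, motivated by the observation that smaller differences are not reliably reproducible. The function names `time_per_call` and `compare` are hypothetical, and raising process priority or quietening the machine is outside the scope of the sketch.

```python
import statistics
import time


def time_per_call(fn, min_batch=50e-3, samples=10):
    """Estimate the time of one call of fn (a sketch of the steps above).

    1. Find a repeat count n such that n calls take at least min_batch seconds.
    2. Time `samples` batches of n calls with a high-resolution clock.
    3. Return the per-call time of each batch.
    """
    n = 1
    while True:
        start = time.perf_counter()
        for _ in range(n):
            fn()
        if time.perf_counter() - start >= min_batch:
            break
        n *= 2  # double the repeat count until one batch is long enough
    per_call = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(n):
            fn()
        per_call.append((time.perf_counter() - start) / n)
    return per_call


def compare(old_fn, new_fn, noise=0.05, **kw):
    """Return (new/old ratio of median times, whether it exceeds the noise band)."""
    old = statistics.median(time_per_call(old_fn, **kw))
    new = statistics.median(time_per_call(new_fn, **kw))
    ratio = new / old
    return ratio, abs(ratio - 1) > noise
```

The median is used rather than the mean so that an occasional interruption by the operating system does not dominate a sample.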

Silver Bullets

Given the difficulties in obtaining accurate benchmarks and given the amount of time and energy required to investigate (phantom) slow-downs, there evolved among us the idea of “silver bullets”: We will actively work on speed-ups instead of waiting for slow-downs to bite.

The speed-ups to be worked on are chosen based on:

  • decades of experience in APL application development
  • decades of experience in APL design and implementation
  • APLMON snapshots provided by users
  • talking to users
  • interactions on the Dyalog Forums
  • running benchmarks on user applications
  • user presentations at Dyalog User Meetings
  • ease of implementation, while remaining mindful of Einstein’s admonition against drilling lots of holes where the drilling is easy

We were gratified to receive reports from multiple users that their applications, without changes, sped up by more than 20% from 13.2 to 14.0. A user reported that using the key operator in 14.0 sped up a key part of their application by a factor of 17. Evidently some of the silver bullets found worthwhile targets.

We urge you, gentle reader, to send us APLMON snapshots of your applications, to improve the chance that future speed-ups will actually speed up your applications. APLMON is a profiling facility that indicates which APL primitives are used, how much they are used, the size and datatype of the arguments, and the proportion of the overall time of such usage. An APLMON snapshot preserves the secrecy of the application code.

We will talk about the new silver bullets for version 15.0 in another blog post. Watch this space.

Coverage

We look at benchmarks involving + as a dyadic function to give some idea of what’s being timed. There are 72 such expressions. Variables with the following names are involved:

      'xy'∘.,'bsildz'∘.,'0124'
β”Œβ”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”
β”‚xb0β”‚xb1β”‚xb2β”‚xb4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚xs0β”‚xs1β”‚xs2β”‚xs4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚xi0β”‚xi1β”‚xi2β”‚xi4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚xl0β”‚xl1β”‚xl2β”‚xl4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚xd0β”‚xd1β”‚xd2β”‚xd4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚xz0β”‚xz1β”‚xz2β”‚xz4β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”
β”‚yb0β”‚yb1β”‚yb2β”‚yb4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚ys0β”‚ys1β”‚ys2β”‚ys4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚yi0β”‚yi1β”‚yi2β”‚yi4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚yl0β”‚yl1β”‚yl2β”‚yl4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚yd0β”‚yd1β”‚yd2β”‚yd4β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚yz0β”‚yz1β”‚yz2β”‚yz4β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”˜

x and y denote the left and right argument respectively; b, s, i, l, d and z denote the datatypes:

b   boolean          1 bit
s   short integer    1 byte
i   integer          2 bytes
l   long integer     4 bytes
d   double           8 bytes
z   complex         16 bytes

0124 denote the base-10 log of the vector lengths. (The lengths are 1, 10, 100, and 10,000.)

Each x variable of type b, s, i or l is added to each y variable of type b, s, i or l with the same digit, thus:

xb0+yb0  xb0+ys0  xb0+yi0  xb0+yl0
xs0+yb0  xs0+ys0  xs0+yi0  xs0+yl0
xi0+yb0  xi0+ys0  xi0+yi0  xi0+yl0
xl0+yb0  xl0+ys0  xl0+yi0  xl0+yl0

xb1+yb1  xb1+ys1  xb1+yi1  xb1+yl1
xs1+yb1  xs1+ys1  xs1+yi1  xs1+yl1
...
xi4+yb4  xi4+ys4  xi4+yi4  xi4+yl4
xl4+yb4  xl4+ys4  xl4+yi4  xl4+yl4

The following 8 expressions complete the list:

xd0+yd0  xd1+yd1  xd2+yd2  xd4+yd4
xz0+yz0  xz1+yz1  xz2+yz2  xz4+yz4

The idea is to measure the performance on small as well as large arguments, and on arguments with different datatypes, because (for all we know) different code can be involved in implementing the different combinations.
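The naming scheme and the 72 combinations can be generated mechanically. Below is a minimal Python sketch. It assumes only the rules stated above: the integer-like types b, s, i and l mix freely at equal length, while d and z are paired only with themselves. The function names are hypothetical, and this is not how PQA itself builds its benchmarks.

```python
from itertools import product

TYPES = "bsildz"   # boolean, short integer, integer, long integer, double, complex
INT_LIKE = "bsil"  # types that are mixed freely with each other
DIGITS = "0124"    # base-10 log of the vector length


def variable_names():
    """The 48 names of the outer product shown above, in ravel order."""
    return [a + t + d for a in "xy" for t in TYPES for d in DIGITS]


def vector_length(name):
    """Length of the vector a name stands for, e.g. 'yd2' -> 100."""
    return 10 ** int(name[-1])


def plus_benchmarks():
    exprs = []
    for d in DIGITS:
        # Each integer-like x with each integer-like y of the same length: 4 x 16
        for tx, ty in product(INT_LIKE, repeat=2):
            exprs.append(f"x{tx}{d}+y{ty}{d}")
    # Double and complex arguments are only added to their own type: 8 more
    for t in "dz":
        for d in DIGITS:
            exprs.append(f"x{t}{d}+y{t}{d}")
    return exprs


print(len(plus_benchmarks()))  # 72
```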

After looking at the expressions involving +, the wonder is not why there are as many as 13,600 benchmarks but why there are so few. If the same combinations were applied to other primitives, then for inner product alone there would be 38,088 benchmarks (23×23×72: any of 23 primitive scalar dyadic functions on each side of the inner product, and 72 argument combinations). The sharp-eyed reader may have noticed that the coverage for + should already include more combinations:

  xb0+yb1  xb0+yb2  xb0+yb4
  xb0+ys1  xb0+ys2  xb0+ys4
  xb0+yi1  xb0+yi2  xb0+yi4
  xb0+yl1  xb0+yl2  xb0+yl4
  ...
  xl0+yi1  xl0+yi2  xl0+yi4
  xl0+yl1  xl0+yl2  xl0+yl4 

We will be looking to fill in these gaps.
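The counting argument above can be checked in a few lines. The following Python sketch takes the article's figures of 23 scalar dyadic functions and 72 argument combinations as given. It generates only the mixed-length pattern listed above, where x has length 1 and y is longer; any further gaps are not enumerated here.

```python
from itertools import product

SCALAR_DYADIC = 23  # primitive scalar dyadic functions (figure from the text)
ARG_COMBOS = 72     # argument combinations used for +

# An inner product f.g takes a scalar dyadic function on each side
inner_product_benchmarks = SCALAR_DYADIC * SCALAR_DYADIC * ARG_COMBOS
print(inner_product_benchmarks)  # 38088

# Missing + benchmarks of the form listed: x of length 1 (digit 0),
# y of length 10, 100 or 10,000 (digits 1, 2, 4), integer-like types only
gap = [f"x{tx}0+y{ty}{d}" for tx, ty in product("bsil", repeat=2) for d in "124"]
print(len(gap))  # 48
print(gap[:3])   # ['xb0+yb1', 'xb0+yb2', 'xb0+yb4']
```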

More on the Graphs

You should know that the PQA graphs shown here are atypical. The “blue mountain” is higher and wider than we are used to seeing, and it is unprecedented for a graph (the one for 2016-02-26) to have no red at all.

For version 15.0, the list of new silver bullets is somewhat longer than usual, but silver bullets usually explain only the blue peak. (Usually, speed-ups are implemented only if they offer an improvement of a factor of 2 or more, typically achieved only on larger arguments.) What of the part from the “inflection point” around 30, and thence rightward into the red pit?

We believe the differences are due mainly to using a later release of the C compiler, Visual Studio 2015 in 15.0 vs. Visual Studio 2005 in 14.1. The new C compiler better exploits vector and parallel instructions, and that can make a dramatic difference in the performance of some APL primitives without any change to the source code. For example:

      x←?600 800⍴0
      y←?800 700⍴0
      cmpx 'x+.×y' ⍝ 15.0
1.54E¯1
      cmpx 'x+.×y' ⍝ 14.1
5.11E¯1

(Not all codings of inner product would benefit from these compiler techniques. The way Dyalog coded it does, though.)

Together with improvements for specific primitives such as +.×, the new C compiler also provides a measurable general speed-up. Unfortunately, along with optimizations, the new C compiler also comes with pessimizations. (Remember: speed-ups will not excuse slow-downs.) Some commonly and widely used C facilities and utilities did not perform as well as before. Your faithful Dyalog development team has been addressing these shortcomings and successfully working around them. Having a tool like PQA has been invaluable in the process.

It has been an … interesting month for PQA from 2016-01-22 to 2016-02-27. Now it’s onward to ameliorating the red under Linux and AIX.