Performance
Caveat: Factors specified on this page are obtained from micro-benchmarks performed on specific primitive functions; in real applications factors will depend on a mix of primitives.
All benchmark tests were performed on 64-bit interpreters on Linux/Microsoft Windows operating systems.
Internal Benchmarks
Internal benchmarking was performed on the initial release of Dyalog version 16.0 and the results compared with the initial release of Dyalog version 15.0.
The benchmarking process comprises over 13,000 benchmarks in more than 130 groups; the group geometric mean timing ratios are measured and plotted against the groups sorted by their means. The vertical axis of the graph shows the ratios as a percentage change; negative values are shown in blue and indicate a performance enhancement, and positive values are shown in red and indicate a deterioration in performance.
Results showed that core interpreter performance in Dyalog version 16.0 has an average improvement of 6% over Dyalog version 15.0.
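As a rough illustration of how a group's geometric mean timing ratio translates into a percentage change, the following sketch uses invented timings. The names `t15`, `t16`, and `gmean` are hypothetical and are not part of the benchmarking framework described above.

```apl
      t15←1.20 0.80 2.50 0.40       ⍝ hypothetical timings for one group under version 15.0
      t16←1.00 0.80 2.00 0.44       ⍝ hypothetical timings for the same benchmarks under version 16.0
      gmean←{*(+/⍟⍵)÷≢⍵}            ⍝ geometric mean, computed via logarithms
      100ׯ1+gmean t16÷t15          ⍝ percentage change; a negative value means 16.0 is faster
```

A geometric mean is the natural choice for averaging ratios because a benchmark that halves in time and one that doubles cancel out exactly.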
Specific Speed-Ups in Dyalog Version 16.0
The following table lists speed-ups to specific primitive functions made in Dyalog version 16.0. The improvement factors given in this table are usually obtained on large arguments (thousands or millions of items) measured by cmpx on version 16.0 compared with version 15.0.
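For readers who want to take similar measurements, a minimal sketch of calling `cmpx` follows. It assumes the `dfns` workspace supplied with Dyalog is available for copying; the variable `b` and the array size are illustrative only. Because `cmpx` compares expressions within a single session, reproducing a factor from the table would mean timing the same expression in each version and dividing the results.

```apl
      )COPY dfns cmpx
      b←1=?1000 1000⍴2              ⍝ random Boolean matrix (independent of ⎕IO)
      cmpx '⍉b' '⍉⍤2⊢b'             ⍝ time two expressions that return the same result
```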
Expression | Improvement Factor | Notes |
---|---|---|
Transpose (monadic ⍉) | 5-20 | for Boolean arrays |
Reshape (dyadic ⍴) | unlimited* | for arrays that are not shared and only when there is no rank increase |
Reshape (dyadic ⍴) | ≈1.5 | when the left argument is ⍬ |
Catenate (dyadic ,) | unlimited* | when appending a few elements to a large array that is not shared |
Catenate (dyadic ,) | up to 5 | when laminating a non-Boolean array along the last axis |
Catenate (dyadic ,) | 5-10 | when laminating a Boolean array along the last axis and the last axis length of the resultant array is less than 64 |
Catenate First (dyadic ⍪) | unlimited* | when appending a few elements to a large array that is not shared |
Take (dyadic ↑) | 2-10 | when performing an overtake on the last axis of a Boolean array such that the last axis length of the resultant array is less than 64 |
Enlist (monadic ∊) | ≈2 | for any nested array comprising small simple arrays (not mixed type) |
Unique (monadic ∪), Membership (dyadic ∊), Find (dyadic ⍷), Without (dyadic ~), Union (dyadic ∪), Intersection (dyadic ∩) | up to 2 | for any nested array comprising small simple arrays (not mixed type) |
Expand (dyadic \), Expand First (dyadic ⍀) | up to 20 | when the left argument is a Boolean array |
Expand (dyadic \), Expand First (dyadic ⍀) | up to 5 | when the left argument is a non-Boolean array |
Encode (dyadic ⊤) | up to 4 | when converting to base-2 |
Encode (dyadic ⊤) | up to 6 | general case |
Decode (dyadic ⊥) | up to 4 | when converting from base-2 |
Decode (dyadic ⊥) | up to 2.5 | general case |
Index of (dyadic ⍳) | 2-18 | when left and right arguments are different numeric data types |
dyadic ⊣¨ and ⊢¨ | 500-1000 | general case |
dyadic ⊣¨ and ⊢¨ | unlimited* | when the right argument has a single element, making them equivalent to ⊣ and ⊢ |
monadic ⊂¨ and ⊃¨ | unlimited* | for simple array right arguments (no-ops) |
Rotate (dyadic ⌽) | up to 12 | operations most improved are: |
Replicate (dyadic /), Replicate First (dyadic ⌿) | 4-5 | when the left argument is a Boolean array and the processors support the BMI2 instruction set (for Intel this starts with Haswell in 2013) |
Replicate (dyadic /), Replicate First (dyadic ⌿) | up to 5 | when the left argument is a non-Boolean array |
monadic =\ and ≠\ | 4-7 | for a Boolean array that has a last axis length greater than 32 |
Reverse (monadic ⌽) | up to 2 | for arrays that are not shared |
Reverse (monadic ⌽) | up to 4 | for a Boolean array that has a last axis length less than 64 |
* The speed-up depends on the size of the arguments and increases as the argument size increases.
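To make some of the conditions in the Notes column concrete, here is a small sketch of expressions that appear to fall into the listed cases. All variable names and sizes are illustrative assumptions; whether a particular array counts as "not shared" depends on interpreter internals and is not something this sketch can demonstrate.

```apl
      b←1=?1000 50⍴2                ⍝ Boolean matrix with last axis length 50
      t←1000 60↑b                   ⍝ overtake on the last axis; the result's last axis (60) is below 64
      m←1 0 1 1 0                   ⍝ Boolean left argument for Expand
      e←m\'abc'                     ⍝ Expand with a Boolean left argument
      big←⍳1000000                  ⍝ large simple numeric vector
      big,←1 2 3                    ⍝ append a few elements to a large array
      v←⍳1000
      v≡⊂¨v                         ⍝ 1: ⊂¨ on a simple array returns it unchanged
      v≡⊃¨v                         ⍝ 1: likewise for ⊃¨
```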
In addition:
- ⍉⍤2 now uses the same algorithm as transpose with a rank-two argument, so it is at least as fast to batch matrices and then transpose them as it is to transpose them individually.
- Transposing row and column vectors is now much faster.
- Operations that copy Boolean data between arrays, for example, reshape and catenate, are up to 6 times faster.
- The :For control structure is up to 1.5 times faster.
- On the Microsoft Windows operating system, Execute (monadic ⍎) is approximately 25 times faster on very long strings containing lots of numbers. This means that the ]IN user command is much faster at loading workspace files containing lots of numeric data.
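Two of the points above can be sketched in a few lines. The first pair of expressions shows batched transposition with ⍉⍤2 giving the same result as transposing each matrix individually; the last two build a long character vector of numbers and execute it. The names `m`, `s`, and `n` and the sizes are hypothetical, chosen only for illustration.

```apl
      m←?1000 20 30⍴100             ⍝ a batch of 1000 matrices, each of shape 20 30
      (⍉⍤2⊢m)≡↑⍉¨⊂⍤2⊢m              ⍝ 1: batched transpose matches the individual transposes
      s←⍕?100000⍴1000               ⍝ long character vector containing many numbers
      n←⍎s                          ⍝ Execute converts it back to a numeric vector
```

The individual form here also pays for splitting the array with ⊂⍤2 and reassembling it with ↑, so any timing comparison between the two lines would include that overhead.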