Sharpening your APL knife
Assumptions
Assumptions -1-
- ∇ r←DropRubbish string
- [1] r←¯4↓string∇
- No meaningful function name
- No comment about what the function is supposed to do
- The name of the right argument tells less than it should
Assumptions -2-
- ∇ r←DropRubbish string
- [1] r←¯4↓string ∇
- With proper data we might be able to work out what DropRubbish is supposed to do
- If not, we are really in trouble
- Since there is a good chance that a crash is caused by faulty data...
The better way
- ∇ r←DropExtension fullPathName
- [1] r←¯4↓fullPathName ∇
- The names chosen for the function and the argument tell the story
- Therefore, no comment is needed
- That is always good, because nobody debugs comments
- Still, there is room for improvement
Even better!
- ∇ r←DropExtension fullPathName
- [1] r←{⍵↓⍨-'.'⍳⍨⌽⍵}fullPathName ∇
- The assumption that an extension always has a length of 3 has been removed
- Note that the assumption might have been true (!) when the function was written
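A quick check of the dfn above with extensions of different lengths. This is only a sketch: it assumes ⎕IO←1 (the Dyalog default, which the expression relies on), and the name drop and the file names are made up for illustration.

```apl
⎕IO←1                        ⍝ assumption: index origin 1
drop←{⍵↓⍨-'.'⍳⍨⌽⍵}           ⍝ the dfn from line [1], given a name for convenience
drop 'C:\data\report.txt'    ⍝ → C:\data\report
drop 'C:\data\index.html'    ⍝ → C:\data\index
drop 'C:\data\script.py'     ⍝ → C:\data\script
```

The dot is located from the right-hand end, so the number of characters dropped follows the actual length of the extension.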
Almost perfect
- ∇ r←DropExtension fullPathName
- [1] r←{'.'∊⍵:⍵↓⍨-'.'⍳⍨⌽⍵ ⋄ ⍵}fullPathName ∇
- Now the function works with and without an extension equally well
- Still, the code comes with the assumption that paths do not contain dots
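The remaining assumption can be seen with a folder name that contains a dot. The sketch below first shows the failure and then one possible way around it: look for the dot only in the part after the last path separator. The names drop and drop2, the sample path, and the choice of '/' and '\' as separators are assumptions made for illustration.

```apl
⎕IO←1                                      ⍝ assumption: index origin 1
drop←{'.'∊⍵:⍵↓⍨-'.'⍳⍨⌽⍵ ⋄ ⍵}              ⍝ the dfn from the slide
drop 'C:\my.folder\readme'                 ⍝ → C:\my   (the folder's dot is taken for an extension)

⍝ Hypothetical variant: name is the part after the last '/' or '\'
drop2←{name←⍵↑⍨-+/∧\~(⌽⍵)∊'/\' ⋄ '.'∊name:⍵↓⍨-'.'⍳⍨⌽name ⋄ ⍵}
drop2 'C:\my.folder\readme'                ⍝ → C:\my.folder\readme
drop2 'C:\my.folder\readme.txt'            ⍝ → C:\my.folder\readme
```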
Make it shout
- ∇ r←DropExtension fullPathName;errMsg
- [1] errMsg←'Invalid extension'
- [2] errMsg ⎕signal 11/⍨'.'≠1⊃¯4↑fullPathName
- [3] r←¯4↓fullPathName∇
- If an assumption is essential, do not silently rely on it
- Check it, and shout if the result is not fine
- Again, this is self-documenting and therefore much better than a comment
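How the shouting looks to a caller can be sketched as follows. DropExt is a hypothetical dfn rendering of the function above (same check, same error number), and the file names are invented; ⎕IO←1 is assumed.

```apl
⎕IO←1                                            ⍝ assumption: index origin 1
DropExt←{
    '.'≠1⊃¯4↑⍵:'Invalid extension'⎕SIGNAL 11     ⍝ shout if the assumption does not hold
    ¯4↓⍵                                         ⍝ otherwise drop the extension
}
DropExt 'C:\data\report.txt'                     ⍝ → C:\data\report

⍝ A caller that wants to carry on can trap error 11 with an error guard
{11::'caught: invalid extension' ⋄ DropExt ⍵}'C:\data\index.html'   ⍝ → caught: invalid extension
```

Without the error guard the second call stops with error 11 and the message 'Invalid extension', which is exactly the point: the broken assumption cannot go unnoticed.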
Types of assumptions
There are all kinds of assumptions in our way:
- About the size of data to be processed
- About internal structure
- About rank, shape and type of data
- It is difficult to be aware of assumptions...
- ...but they must be documented somehow
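Assumptions about rank, type and shape can be documented in the same shouting style. The following is a minimal sketch, not a general recipe: CheckCharMatrix and the three conditions it insists on (a matrix, simple character data, at least one row) are hypothetical and would have to be replaced by whatever the real function relies on.

```apl
CheckCharMatrix←{
    2≠≢⍴⍵:'Matrix expected'⎕SIGNAL 4                      ⍝ rank: must be 2
    ~(10|⎕DR ⍵)∊0 2:'Character data expected'⎕SIGNAL 11   ⍝ type: must be simple character
    0=≢⍵:'At least one row expected'⎕SIGNAL 5             ⍝ shape: must not be empty
    ⍵                                                     ⍝ all assumptions hold: pass the data on
}
CheckCharMatrix 2 3⍴'abcdef'     ⍝ returns the matrix unchanged
CheckCharMatrix 'abc'            ⍝ signals error 4 (RANK ERROR): Matrix expected
```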
Size -1-
- Assumptions may not only have an impact on readability
- Performance can be an issue as well
- Imagine a character matrix in which each row holds the digits of one number
- If the programmer, for whatever reason, gets the impression that the matrix will have 1 to 50 rows...
- ...it seems fine to convert the digits to numbers by saying:
Size -2-
- The matrix turns out to have up to 100,000 rows
- 2⊃∘⎕vfi¨↓matrix
- takes 190 ms on my machine
- 2⊃⎕vfi,' ',matrix
- would have taken less than 50% of that instead
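The two approaches can be compared on a small example. This is a sketch with made-up data: the first expression calls ⎕VFI once per row, the second prepends a blank column so that the rows stay separated when the matrix is ravelled and then calls ⎕VFI just once.

```apl
matrix←↑'  12' ' 345' '6789'       ⍝ hypothetical data: the digits of one number per row
a←2⊃∘⎕VFI¨↓matrix                  ⍝ one ⎕VFI call per row: a vector of one-element vectors
b←2⊃⎕VFI,' ',matrix                ⍝ a single ⎕VFI call: a simple vector 12 345 6789
b≡∊a                               ⍝ → 1   (same numbers once the first result is flattened)
```

With a handful of rows the difference does not matter; it only shows once the assumption about the number of rows turns out to be wrong.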
Assumptions in strategies -1-
- When I started to work on mainframes in 1983, reading relatively large sequential files turned out to be a problem
- The question was: why was this, and how could it be made faster?
Assumptions in strategies -2-
Note that the workspace size was restricted to 4 MB at the time
The files contained from 0 to some 10,000 records
This is what I found:
Assumptions in strategies -3-
Obviously that could be improved a lot:

The problem is: restrictions can be lifted
Assumptions in strategies -4-
- More than 10 years later a new version of the OS was introduced.
- The workspace size was no longer restricted to 4 MB but to 1 GB.
- The code got slow because now a real monster matrix got initialised:
- 12,500,000 ←→ 1E9÷80
- Now the initialisation took much longer than reading the file
End