Geoff Streeter Dyalog Ltd As a credit to John Scholes (1948-2019) I thought I would do this in pared back text. John and I first met in 1975 when we both worked for Atkins. We worked on Dyalog APL together from 1981. I can't claim to have John's presentation ability. That is part of the reason I am not usually seen on these podia. I think this is only my 4th presentation in 42 years. Some brains react better to visual text - mine included - than verbal text. So you get the both. Back in 2013 we had a request from a customer. The customer was suffering from slow startup for his application. The application had several hundred functions in his workspace. His request was for a facility to start with a bare workspace and load functions as their names were referenced. This had echoes of my (and Dave Crossley's) paper in APL 1979 but I had the germ of an idea. I raised the idea which received no enthusiasm and an expressed desire to focus on what was requested. However, I am a natural, non-conformist, dissenter. I still thought it was worth pursuing and we have a policy of a "Friday Afternoon project" for non management approved experiments. It actually took me a day to produce a first cut. That first cut - with a demonstrable prototype - made the others realise that this idea might be worth a bit more effort. So what had I done: Our workspace is divided into "pockets". This is our unit of memory allocation. A pocket has a length, a refcount, a descriptor and some content. ┌────────┬────────┬────────┬────────────────────┐ │LENGTH │REFCOUNT│ZONES │ any old stuff │ └────────┴────────┴────────┴────────────────────┘ The ZONES are bunch of fields which say what to expect in the "any old stuff" - let's call that AOS for short. The AOS can contain pointers to other pockets and the REFCOUNT keeps a track of how many pointers point to this pocket. For a lot of pockets when the REFCOUNT reaches zero the pocket can be discarded and the memory used for another pocket. So for a traditional (not Dfn) function we have several pockets: ┌────────┬────────┬────────┬────────┬────────┬────────┐ │LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ The stars mark fields that are pointers to other pockets. The LST (Local Symbol Table) is a pocket of pointers to other pockets: symbols: the names the function uses - think ⎕REFS constants: The explicit constants the function uses 1, 2, 'HELP' ... comments: The comments the function has. The FNBODY is a pocket that does not point elsewhere and defines how the function is split into lines and how it has been tokenised. The next concept you need is of a symbol table. For all versions of Dyalog to date these are binary trees. ┌────────┬────────┬────────┬────────┬────────┬────────┬ ┬─────────┐ │LENGTH │REFCOUNT│SYMBOL │LEFT │RIGHT │VALUE │...│name...\0│ └────────┴────────┴────────┴───*────┴───*────┴───*────┴ ┴─────────┘ Each namespace has its own symbol table including # and ⎕SE. So that gives you some insight into a workspace. However ... Dyalog APL has always supported several workspaces. Originally, this came from how we support )COPY. To do )COPY we allocate memory for another workspace. Load the required workspace into the extra memory and then copy information from that to the main workspace. (In fact originally these two workspaces were in different processes - because the Zilog Z8000 was a 16 bit machine we and only had 64KiB of data per process. We used pipes to transfer the information.) So we can now come to the germ of my idea: If I save the users workspace in a form where it always loads at a known address. I can set up pointers to the addresses from my main workspace to that workspace. Consider our SYMBOL pocket. ┌────────┬ ┬────────┐ ... │VALUE │ ... │foo │ └───*────┴ └────────┘ │ ============┼========= workspace boundary ======================= │ ┌───┴────┬────────┬────────┬────────┬────────┬────────┐ │LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ That has not gained us much. We still have to read the users workspace. Just into a different location. But suppose ... We map the workspace into that location. For those not familiar with mapped files: a file is mapped into the address space of the process and portions (pages) of that file are only read on reference. So now we can have that user's workspace loaded and nothing has been read. That is fast. Clearly, we do have to have some adjustment to the main workspace or no code will see the change. All we have to do is to create the symbols in the main workspace and point them at the appropriate pockets in the mapped workspace. Only when those symbols are used will the mapped space be referenced. Before, in a normal loaded workspace: ┌────────┬────────┬────────┬────────┬────────┬────────┬ ┬─────────┐ │LENGTH │REFCOUNT│SYMBOL │LEFT │RIGHT │VALUE │...│name...\0│ └────────┴────────┴────────┴───*────┴───*────┴───┬────┴ ┴─────────┘ ┌────────────────────────────────────────────┘ ┌───┴────┬────────┬────────┬────────┬────────┬────────┐ │LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ After, in a workspace with shared code: ┌────────┬────────┬────────┬────────┬────────┬────────┬ ┬─────────┐ │LENGTH │REFCOUNT│SYMBOL │LEFT │RIGHT │VALUE │...│name...\0│ └────────┴────────┴────────┴───*────┴───*────┴───┬────┴ ┴─────────┘ ┌──────────────────────────────────────────────────┘ │ main workspace ======│=============== workplace boundary ======================= │ shared workspace │ ┌────────┬────────┬────────┬────────┬────────┬────────┬ ┬─────────┐ │ │LENGTH │REFCOUNT│SYMBOL │LEFT │RIGHT │VALUE │...│name...\0│ │ └────────┴────────┴────────┴───*────┴───*────┴───┬────┴ ┴─────────┘ │ ┌────────────────────────────────────────────┘ │ ┌───┴────┬────────┬────────┬────────┬────────┬────────┐ └─┤LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ So we come back to my lashed up experiment. I took a copy of the code that supports 0 ⎕SAVE. I used some of the code we use for fixing up workspaces as we load them and used that code to create the workspace at a known fixed address and saved it to file. I could then map that workspace (I actually used dfns.dws) at that known address. I could walk the symbol table and copy the appropriate symbols into the main workspace. Lo and behold I have working dfns.dws which only gets paged in on demand. This worked excellently ... on Linux. On Windows it worked but not as fast as I hoped. The customer is committed to Windows - poor benighted soul. Why was this? Windows does not page as well as Linux. It also does not create processes as well. The two aspects may be linked as process code is mapped at kernel level. There are things I can do to mitigate that. I have lots of time as we prepare the workspace (at save time) so I chose to sort the pockets. The first idea was to sort all the symbol pockets together so that they would be grouped in as few pages as possible. Since I was sorting pockets I could also sort data pockets by length so that all those small scalar pockets are packed into only a few pages. That customer has lots of these. That was enough to have a working system that met the customer's desire for fast startup on his system. Then we started to realise that we had other benefits that were not originally envisaged. The customers system looks like: ┌──────────┐ │ SERVER │ │ FOR CODE │ └──────────┘ │ piece of wet string - slow connection │ ┌──────────┐ │ LOCAL │ │ SERVER │ │ ROOM │ └────┬─────┘ │ fast connection - on the same machine │ ┌────────────────┴──────────────┐ ┌───┴────┐ ┌───┴────┐ │ USER │ │ USER │ │ PROCESS│ ... │ PROCESS│ │ 0 │ │ N │ └────────┘ └────────┘ So now user process 0 starts up and pages are dragged across the piece of wet string to the local server. These pages are cached there and when the next user process starts they are already accessible. So user 1 gets even faster startup than user 0. The cache is maintained by the operating system - I do not have to do anything. Let's revisit the way we allocate memory in the workspaces - note that plural. ┌────────┬────────┬────────┬────────────────────┐ │LENGTH │REFCOUNT│ZONES │ any old stuff │ └────────┴────────┴────────┴────────────────────┘ With our new memory mapped approach. ┬────────┬ ┬────────┐ ... │VALUE │ ... │foo │ ┴───*────┴ ┴────────┘ │ ============┼========= workspace boundary ======================= │ ┌───┴────┬────────┬────────┬────────┬────────┬────────┐ │LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ That pointer from VALUE to the function pocket has caused the REFCOUNT to increase. However, this is shared memory so you may think that all users of that memory would see the same increased value of the REFCOUNT and worse that we may have a race condition as multiple users increase/decrease the REFCOUNT. But ... as a brilliant part of paging systems we have COPY ON WRITE (COW). This means that if a mapped page is modified then it does not change the original but instead a local page in the users swap space is allocated instead. So each process that modifies a page gets its own copy. I tried to find the history (and inventor) of COW but it seems to have had a slow gestation. Hardware support seems to come from DEC VAX. Developed software from BSD4.4. Today it is pretty universal. For Intel it goes back to the 386. The driving force for the development was the Unix fork() system call that is fundamental to Unix (and Linux) process creation. fork() creates a new process identical (nearly - lots of subtleties) to the current process. It is nearly always followed by an exec() call to replace the current program with a new, different, program. Replicating all of the current process's pages just to dispose of them was too expensive. I use COW both for aspects of ⎕MAP and for the shared code file implementation presented here. The use of COW means that all of our normal code just works. REFCOUNTs get adjusted correctly. The penalty, of course, is that the memory footprint of each task increases as pages are copied. There are of course some specific adjustments needed. That function pocket: ┌────────┬────────┬────────┬────────┬────────┬────────┐ │LENGTH │REFCOUNT│DEFUNCT │LST │FNBODY │... │ └────────┴────────┴────────┴───*────┴───*────┴────────┘ The FNBODY is not a problem. It points to a pocket that does not change and can remain in shared (read only) memory. Every user sees the same pocket (apart from its REFCOUNT and that does not change until the function executes). The FNBODY contains the tokenised code and the definition of the lines in the function. The LST (Local Symbol Table) is more of an issue. It contains pointers to SYMBOLS and CONSTANTS. For this purpose COMMENTS are CONSTANTS. However each SYMBOL pocket has a pointer to its namespace (# is a namespace, ⎕SE is a namespace, anything created with ⎕NS is a namespace, ...). If your function creates a name, A←1, then the A is a SYMBOL but when it is accessed it belongs to a namespace. In a shared code file the SYMBOL points to a namespace in the shared code file but when the function executes it needs to access the name in the appropriate namespace in the main workspace. We already have code that copes with this because it can already happen that code is created in one namespace but executes in another. So we walk the LST and make all of the symbols point to the correct namespace. That is OK but expensive. It is most likely that the next invocation of the function will be in the same namespace so we write the new LST pointer back into the function pocket. This would be OK in our shared code file (COW would cope) but in order to avoid doing things to that workspace that would involve accessing all of the pockets - compaction, garbage collection - we avoid creating pointers in the shared code file workspace that point back into the main workspace. OK: ┬────────┬ ... │pointer │ ... ┴───┬────┴ main workspace ============┼========= workspace boundary ======================= ↓ ┌───┴────┬────────┬────────┬ shared workspace │LENGTH │REFCOUNT│ZONES │ ... └────────┴────────┴────────┴ Not OK: ┌────────┬────────┬────────┬ │LENGTH │REFCOUNT│ZONES │ ... └───┬────┴────────┴────────┴ main workspace ============┼========= workspace boundary ======================= ↑ ┬───┴────┬ shared workspace ... │pointer │ ... ┴────────┴ So ... we pay the cost on function invocation of checking if the function pocket is in a shared code file and, if so, create a copy in the main workspace. Then the replacement of the LST is not a problem. Back to the original request. The customer wanted to load FUNCTIONS on demand. We have also delivered loading DATA on demand. Another bonus. However, never satisfied, the customer wanted namespaces (actually classes) but for me that is the same thing. Not so easy. For instance, similar to symbols, the namespace has a pointer to its parent (##). That parent is in the shared code file not the user's workspace. So while we walk the global symbols to put them in the main workspace we also copy across VALUEs that are namespaces. That copy is not complete some things in the space can remain shared - FUNCTIONs and DATA - but not sub namespaces. Although their FUNCTIONs and DATA can remain shared. We also, at shared code file creation time, can mark SYMBOL values that contain namespaces at a lower level. So the SYMBOL may point to a nested array that contains namespaces. These can also be pulled across as the shared code file is attached. Again we do not have to drag FUNCTIONs and DATA across. Morten now realised that this could give him benefits for isolates. Isolates are separate processes with which APL communicates. They may require a lot of supporting APL code. If each isolate loads a shared code file then it gets the benefit of not accessing stuff it does not use and not impacting the system memory use for stuff it does not use. I don't think the original customer uses isolates - yet. However, really good ideas feed through into strange and unplanned places. I first learned this when I. P. Sharp introduced "packages" in the late 1970s. I am sure the developers did not expect the deluge of applications that bent their use. We have seen what you can do. Let's examine what you can't do. At this point I had wanted to wave around a copy of our book "Shared Code Files User Guide" but LuLu (who do our print on demand) will not print less than 50 pages so I will have to rely on using the screen to show the pdf. Credit Richard Smith and Fiona Smith. In it we list the current restrictions. Some of those may be temporary. Some are just us nudging you in the right direction (no classic). Some are for good, hard, reasons. 64 bit - mapped memory does not go well with a 32 bit address space. We allow the use of 8 shared workspaces concurently. The 8 slots allocated are at 1TiB intervals. I wanted to leave some space for workspaces to grow over the next year or two. No ⎕WC (or .Net) objects is likely to be the one restriction you may notice. Was the customer happy with the solution he had not asked for? Happy enough to be pushing for some futher extensions - so I guess the answer is yes.