Muq Source Code - Transparent Distributed Operation

Go to the first, previous, next, last section, table of contents.

Transparent Distributed Operation

This section contains preliminary design notes on a facility not yet implemented.

One of my prime ambitions for Muq is eventual close-coupled operation of Muq servers and clients over wide area networks, with near-complete network transparency to code running on the server and client: Ideally, you should have to use a special "remote?" predicate to tell if a given object is local or remote.

That objective remains particularly ambitious because we must in practice assume that any Muq clients involved have been maliciously modified to corrupt or cripple the system in any possible way.

In this section we make a start by considering the easier problem of transparent operation spread over multiple servers, where we assume that the servers are relatively small in number (say, 3-12 rather than hundreds or thousands), stable, and trusted: We won't attempt to defend against maliciously modified servers, and we will assume other servers are as entitled to private data &tc as our own.

As a general model, I assume that each server maintains an open reliable-stream socket connection directly to each other server, and that the elementary unit of interaction via these connections is a packet of the general form

LENGTH-IN-BYTES (4 bytes) OPCODE (4 bytes) data

where I'll assume the packets are short enough to be reasonably read and written indivisibly -- maximum size something like 4K to 64K -- and that the server won't have to worry about checksumming, partial packets and such.

I envision each server checking for and processing I/O from other servers about as often as it does so with the client connections.

The first problem to solve is the representation of a reference on one server to an object on another server.

The most elegant solution would be something like adding a 'server' bitfield to our Vm_Obj format, so that any pointer could indifferently point to any value on any machine.

Alas, with a 32-bit Vm_Obj value, we are already hard up against the wall for bits, and adding even another 2-bit field seems quite out of the question, whereas hardwiring even a 6-bit field seems a quite dubious proposition... one would hate to hit a hard design wall when a server network reached 64 servers, and the option of letting client programs all over Internet dynamically join the transparent-networking pool would be completely forclosed if we took this approach.

The only other category of design solution which comes to mind is that of proxy objects: We can create objects in the db which, like symbolic links in the unix file system, contain the address of the real value rather than the value itself. Since Muq is a value-typed virtual machine, it can relatively easily intercept all operations on such proxies and handle them specially via the interserver sockets.

I would envision these proxy objects as containing:

A designation of the remote server, perhaps as a :sl.tcp.com/9876 style keyword giving the domain name and port for the remote server (numbering servers and indirecting through a table might simplify handling server relocations, but might be unwieldy in the extention to clients)
The remote server's Vm_Obj designation for the object.
A 32-bit object ID allowing detection of stale references -- hanging pointers. I envision the object ID as being an integer maintained in each server, incremented each time vm_Malloc() is called, and stored in each addressable object. This adds 32 bits of overhead to every addressable object to support networking, which I believe is reasonable.
Possibly some invariant information about the object, such as the class of object, merely to reduce traffic to the server owning the object.

I'm assuming garbage collection is done locally on each server, ignoring possible remote pointers in other servers. Doing otherwise seems likely to introduce unacceptable performance hits. This does of course diminish network transparency, since a new class of "object vanished!" errors becomes possible for remote objects. I expect this to prove a practical compromise, possibly with a little help from Muq programmers learning to avoid references to transitory foriegn objects and/or server tweaks to remember pointers which have been recently exported.

With the proxy concept established, we can specify the lowlevel interserver packet transmission/translation process in a little more detail.

We will regard the lowlevel interserver packet transmission/ translation problem as solved if we have a satisfactory way of communicating any addressable db component, where in general readOnly values (ints, chars, readOnly strings...) are sent by value, and others by constructing proxy references.

We note that the Muq server has sufficient information to classify any given Vm_Obj value as one of the following:

An immediate value: char, int, short-float...
A pointer to a sequence of 1-byte binary values
A pointer to a sequence of 2-byte binary values
A pointer to a sequence of 4-byte binary values
A pointer to a sequence of 8-byte binary values
A pointer to a sequence of (4-byte) Vm_Obj values

(We oversimplify only slightly; All but the first actually have an associated Vm_Obj value giving the owner, and in this proposal will also have a 32-bit object ID. These pose no new problems, so we ignore them here.)

The first case may be handled by simply byteswapping to network order and transmitting.

When readOnly, the binary cases may be handled by byteswapping the internal fields to network order and transmitting, creating a new object with the matching contents at the far end. When not readOnly, a proxy pointer should be constructed at the far end.

The final case will normally not be readOnly, so it will need to be handled by constructing a proxy pointer at the far end. Read-only values of this sort could in principle be handled by constructing a corresponding value at the far end, and handling the contents recursively.

A problem can occur in the readOnly cases in that values can be duplicated: A single readOnly string on one host might wind up being instantiated several times on a remote host, once for each reference to it. This might break some code using lispStyle 'eq' compares. The most conservative solution is to always use proxies except for immediate values; This avoids the problem completely, but may result in an unacceptable amount of network traffic. It might be the best first-cut implementation, however.

Given such a picture, we need to convince ourselves that all prims implemented in the server can be extended to work correctly when handed proxy arguments.

We tackle the prims by classes:

Prims which internally gaily traverse obj/prop chains on the assumption that this is reasonably fast may need to be recoded: Any time we hit a reference to a remote object, we may need to stop for a good long while waiting for a response, and we cannot freeze the entire Muq server that long. Restarting a complex instruction midway means either rerunning it from the beginning, or icky hacks to save and restore the half-completed state. Any prims requiring the latter probably should be just recoded in-db.

Prims which get/set/etc properties on an object can be delegated to the server holding that object. The key comparison logic obj_Neql() needs to be extended to handle proxy objects, but that is fairly straightforward for values compared by pointer. Strings are currently the only types compared by value; It might help to pass them by value and let the 'eq' people suffer.

Prims which only admit of one potential proxy value can always delegate the operation to the home server. The trickiest operation in this class is probably a function call on a remote function; a stub 'job' will need to be created on the far end, and 'throw()' will need to be able to handle a job in which the bottom of the loop stack is remote in this fashion. A fair amount of busywork, but not obviously a major problem

Type-indifferent prims which merely shuffle pointers without carrying about their meaning, such as those that rotate stack blocks and copy values into and out of variables, clearly don't need any recoding to work with proxies.

Conceptually, the most difficult cases are prims which seriously need to deal with the contents of two objects; If one is remote, and one local, at least a temporary copy of the remote contents may be needed. I do not believe there are any prims which modify the contents of two objects at once, so it should be possible to always delegate these prims to the home server of the object being modified, and treat this as a problem of making a temporary copy of the remaining object(s) for readOnly purposes.

The central, canonical implementation difficulty seems to be: We're in the middle of prim X, we do vm_Loc() (directly or indirectly via something like VEC_P()), and the value in question is a proxy. What should we do? We need to

Issue some request to the home server of the object;
Block the job
Arrange somehow for processing to pick up where it left off.

If prims always argchecked all the objects they intended to vm_Loc() in the prologue before beginning work, we could perhaps block at that point, replace the proxy reference by a local copy, and continue. This would require being sure that the pointer to the local copy wasn't going to get stored into the db, and that the local copy wasn't going to be modified. Would it be realistic to require all prims to obey this protocol?

If speed is the main objection to the above, it might be possible to code speed-critical fns in C, but have a muf-coded version available, and have the C version retreat to calling the MUF one if proxies are encountered.

The icky alternative is to have prims which can block on access to a remote object be explicitly coded at each such spot to save their state in the db, and then have provisions for restoring it. Prims coded using recursive C calls and saving lots of state on the C stack would obviously be poor candidates for this treatment.

It sounds as though a careful survey and perhaps rewrite of the prim instruction set is called for before proceeding further with the design.

Go to the first, previous, next, last section, table of contents.