I have begun my formal study of distributed systems. Here are a few initial thoughts; just brainwaves that occurred while reading. The reference text is A. Tanenbaum and M. Van Steen, Distributed Systems: Principles and Paradigms.
A distributed system is a collection of independent, logically connected computational devices that behaves as a single coherent system.
Important characteristics are:
- Scalability
- Transparency
- Openness (or portability)
Scalability doesn’t just refer to the amount of data. Systems can also be scaled geographically, by placing new nodes in other parts of the office, city, region, country, world or universe! Reasons for doing this might include moving resources physically closer to users to combat network latency. Systems can also scale administratively, meaning the system remains manageable as it spans a growing number of independent administrative domains and users. Administrative scaling is apparently the hardest form to get right.
The most important thing when designing algorithms for a distributed system is to make them decentralised:
- No machine participating in the algorithm should know everything about the system state.
- Each machine makes decisions based solely on its own local data.
- Failure of one node shouldn’t ruin the algorithm.
- No assumption of an implicit internal clock.
The last one, although it sounds hard, just means that no part of the algorithm should depend on a schedule. Never assume that any two nodes are set to the same time, no matter how great your synchronisation is. There’s a difference between elapsed time and clock time.
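To make the elapsed-versus-clock-time distinction concrete, here’s a minimal Python sketch (my own illustration, not from the book): wall-clock time can jump if the clock is adjusted, while a monotonic clock only ever moves forward, so timeouts and elapsed-time measurements should use the latter. Neither is comparable across machines.

```python
import time

def measure_elapsed(work):
    """Measure elapsed time with the monotonic clock, which is immune
    to wall-clock adjustments (NTP steps, DST, manual changes)."""
    start = time.monotonic()
    work()
    return time.monotonic() - start

# Wall-clock time: only meaningful if nobody resets the clock,
# and never safe to compare between two machines.
wall = time.time()

# Elapsed time: safe for timeouts and ordering within one process.
elapsed = measure_elapsed(lambda: sum(range(100_000)))
print(f"wall clock: {wall:.0f}s since the epoch, work took {elapsed:.4f}s")
```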
When scaling geographically or administratively, you need to hide communication latency as best you can. Use caching and/or replication.
Speaking of those, remember that they’re two different concepts. Caching is demand-driven and therefore not deterministic; what ends up cached depends on the requests that actually arrive. Replication is usually planned in advance based on the expected use of the system.
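A minimal sketch of the demand-driven side of that distinction (my own illustration; the remote-fetch function and its latency are hypothetical): the cache fills itself lazily as requests arrive, whereas a replica would be populated up front according to a plan.

```python
import time

def fetch_from_remote(key):
    """Stand-in for a slow call to a distant node (hypothetical)."""
    time.sleep(0.01)  # simulate network latency
    return f"value-for-{key}"

cache = {}

def get(key):
    """Demand-driven caching: only keys that are actually requested
    ever get stored locally, so the cache contents can't be predicted
    in advance."""
    if key not in cache:
        cache[key] = fetch_from_remote(key)  # slow path, once per key
    return cache[key]                        # fast path afterwards

get("user:42")        # first access pays the latency
get("user:42")        # second access is served locally
print(sorted(cache))  # only the keys that were requested
```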
The authors mention security issues with scalability. Perhaps some technical issues have social solutions?
It seems that there is more of an emphasis on hiding the fact that a system is distributed from the user than I would have thought necessary. While it’s not necessarily obvious to a user that their request is being processed by more than one machine, I didn’t think it was so important to hide this fact from them. Conversely, pervasive networks (like smart homes or dynamic environments consisting of mobile devices) need to make their distributed nature clear to afford greater usability. Some transparency is definitely needed, but ironically the aspect most worth making transparent, failure, is one of the hardest to hide. Other features, like concurrency, aren’t always possible to hide either. Suppose two users clicked “save” on the same object at the same time, and one of the transactions needed to be aborted. Do you retry automatically, even though the object might now be in a state where the second save no longer makes sense?
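One common way to surface (rather than hide) this kind of conflict is optimistic concurrency with version numbers, sketched below (my own illustration, not from the book): each save must name the version it was based on, and a save against a stale version is rejected so the user can decide what to do next.

```python
class VersionedObject:
    """Optimistic concurrency control: a save succeeds only if the
    caller read the current version; otherwise the conflict is
    reported instead of silently overwriting."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def save(self, new_value, based_on_version):
        if based_on_version != self.version:
            return False  # conflict: the object changed underneath us
        self.value = new_value
        self.version += 1
        return True

obj = VersionedObject("draft")

# Both users read version 0, then both try to save.
assert obj.save("Alice's edit", based_on_version=0)    # first save wins
assert not obj.save("Bob's edit", based_on_version=0)  # second is aborted
print(obj.value, obj.version)  # Bob must re-read and decide whether to retry
```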
Openness refers to generality, and leads to a further form of scalability: the functional kind. A system’s interface needs to be generic enough to allow any protocol to be implemented on it. Usually an Interface Definition Language (IDL) is employed. A language which describes the interface but not the implementation is useful because it allows two computers with different implementations of the same interface to participate in the system; clients need not know. That said, it seems to me that languages are getting to the point where they could define both interface and semantic implementation; the compiler/interpreter would be responsible for the actual implementation. This ruins language portability, though.
An IDL should describe the whole system interface with no ambiguity and no room for implementation-specific behaviour (in an ideal world), while avoiding any description of the implementation itself. Neutrality is usually the easier property; if you find yourself getting too specific, just abstract further. Completeness is more difficult: specifications sometimes devolve into formal language that is not easily parseable. We need better IDLs. Existing declarative constructs (think C++ headers or C# interfaces) describe a function’s name, parameters and return type but say nothing about behaviour. Ada’s pre/post conditions are a step up.
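As a loose analogy in Python (my own sketch, not a real IDL): an abstract base class pins down names, parameters and return types while saying nothing about how they’re computed, so two different implementations are interchangeable from a client’s point of view. Notice that behaviour still only lives in a docstring, which is exactly the completeness gap above.

```python
from abc import ABC, abstractmethod

class KeyValueStore(ABC):
    """Interface only: names, parameters, return types -- no implementation."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> str:
        """Must return the value most recently put under key.
        (Behaviour lives in prose, not in the type signature.)"""

class InMemoryStore(KeyValueStore):
    """One of many possible implementations behind the same interface."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

def client_code(store: KeyValueStore):
    """Written against the interface; works with any implementation."""
    store.put("greeting", "hello")
    return store.get("greeting")

print(client_code(InMemoryStore()))  # prints "hello"
```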
So, it seems like each of the three characteristics influences the others; scalability requires transparency, which needs openness to achieve. You can’t have one without the others. These three words, Scalability, Transparency, Openness, form a definition of “Distributed System” on their own.
I love transaction processing. Difficult to see how to do it well without a single monitoring application. This kind of violates the decentralisation of algorithms. More research required! Asynchronous operations make TP difficult, because of competition for resources (usually trying to access a DB object at the same time). Much time spent waiting.
Peer-to-peer achieves horizontal scaling. Structured architectures, where resources are shared according to some algorithm, are more pervasive. What happens if there’s failure? Is there a good, general way to cope with a node (which might hold the only copy of some resource) dropping out? I guess you just shouldn’t use P2P for critical systems! Unstructured architectures are rare.
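A structured architecture in miniature (my own sketch; real systems use richer schemes such as DHTs): consistent hashing decides algorithmically which node owns which key, so any peer can locate a resource without global knowledge, and losing one node only disturbs the keys in that node’s arc of the ring.

```python
import bisect
import hashlib

def h(s):
    """Hash a string to a point on the ring."""
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hashing: each key belongs to the first node clockwise
    from the key's hash on the ring."""
    def __init__(self, nodes):
        self._points = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        hashes = [p for p, _ in self._points]
        i = bisect.bisect_right(hashes, h(key)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
# Every peer computes the same answer from purely local information.
print(ring.owner("some-resource"))
```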
One example is the smart home, with many small devices (perhaps embedded or portable) defining the system. Another is the Body Area Network, where many sensors and devices are attached to the body to collect health information. Challenges include: how and when to store data, adapting to context changes. It looks like a lot of this needs to be very generic; users will want tight control over behaviour.
Sensor networks are an important example. Usually a homogeneous (in terms of capability, at least) network of small wireless devices that record data at particular locations. Two options for operation: all nodes send their data to a central server, or every node stores its own data and responds to network-wide queries. Both options are suboptimal. Need to research better ways to do this. Probably need to sacrifice transparency in order to identify sensors based on location etc.
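A middle ground between the two options above is in-network aggregation (the code here is purely my own illustration, not from the book): each subtree of nodes combines its children’s partial results, so only small aggregates travel over the network instead of raw readings.

```python
class SensorNode:
    """A node that stores its own readings and answers aggregate
    queries by combining local data with its children's partial
    answers."""
    def __init__(self, readings, children=()):
        self.readings = list(readings)
        self.children = list(children)

    def query_average(self):
        """Return (sum, count) for this node's subtree. Only two
        numbers cross each link, however many raw readings exist."""
        total, count = sum(self.readings), len(self.readings)
        for child in self.children:
            s, c = child.query_average()
            total, count = total + s, count + c
        return total, count

leaf1 = SensorNode([20.0, 21.0])
leaf2 = SensorNode([19.0])
root = SensorNode([22.0], children=[leaf1, leaf2])

s, c = root.query_average()
print(f"network-wide average temperature: {s / c:.1f}")  # 20.5
```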