Tandem K2000 computer family
In 1976, Tandem Computers Inc. released its first Tandem NonStop (TNS) system, which in today's terminology would be called fault-tolerant. It consisted of a varying number of independent small computers connected by a network (not always a local one), so that after a hardware or software failure the fault-free state could be restored automatically, without data loss.
Although no such system operated in Hungary because of embargo restrictions, the structure of the first TNS system is described below (picture), with reference to the NonStop Himalaya K-series released in 1993.
Use
TNS (fault-tolerant) systems were developed mainly for banking applications and calculations: the continuous, reliable, error-free recording of customer data and transactions (purchases, ATM withdrawals, stock market transactions, etc.).
Structure
A fundamental characteristic of the architecture was that there were no shared resources.
The Tandem/16 (T/16, later NonStop I) system consisted of 2-16 central units; each had its
- own control microprocessor,
- own, unshared main memory and
- own I/O control coprocessor
which were connected by the unit's own internal bus.
The I/O controllers of the central units were connected to one another over dual paths through a common bus system (DynaBus) implemented on the backplane.
The central unit pairs were connected to two independent power supplies.
The system's magnetic disks were organized in mirrored pairs, each pair reached over separate lines through two independent disk controllers.
For safety reasons, the shared bus systems were designed with some redundancy, which increased costs somewhat; essential components, however, were not duplicated merely for fault detection.
Operation
The dual connections allowed the system to remain operational even if a central unit or power supply failed. If a fault occurred in one of the central units or its bus system, its undamaged counterpart continued processing.
If one disk failed, work continued with the undamaged data on its mirror counterpart.
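The mirrored-disk behaviour described above can be illustrated with a minimal Python sketch. It is not Tandem's actual disk software: the class names (Disk, MirroredPair, DiskFailure) and the in-memory block store are assumptions made purely for illustration.

```python
class DiskFailure(Exception):
    """Raised when a simulated disk can no longer be read or written."""


class Disk:
    """A toy in-memory disk (hypothetical); `failed` simulates a hardware fault."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block_no, data):
        if self.failed:
            raise DiskFailure(self.name)
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failed:
            raise DiskFailure(self.name)
        return self.blocks[block_no]


class MirroredPair:
    """Writes go to both disks; reads fall back to the surviving copy."""

    def __init__(self, primary, mirror):
        self.disks = [primary, mirror]

    def write(self, block_no, data):
        written = 0
        for disk in self.disks:
            try:
                disk.write(block_no, data)
                written += 1
            except DiskFailure:
                continue  # carry on with the other disk of the pair
        if written == 0:
            raise DiskFailure("both disks of the mirror pair failed")

    def read(self, block_no):
        for disk in self.disks:
            try:
                return disk.read(block_no)
            except DiskFailure:
                continue  # try the mirror counterpart
        raise DiskFailure("both disks of the mirror pair failed")


pair = MirroredPair(Disk("disk-a"), Disk("disk-b"))
pair.write(0, b"balance=100")
pair.disks[0].failed = True        # simulate the loss of one disk
survivor_data = pair.read(0)       # served from the mirror counterpart
pair.write(1, b"balance=90")       # work continues on the surviving disk
```

In this sketch a single disk failure is invisible to the caller; only the loss of both disks of a pair raises an error.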
The system was designed to detect as many errors as possible, as early as possible (the "fail-fast" principle), and to filter out corrupted data before it was permanently written to the database. To this end, program processes communicated with each other through "master/slave" message exchange, without sharing memory, so they worked reliably on systems with different configurations.
The file manager and transaction manager processes could exchange message pairs independently of each other.
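The combination of early error detection and message exchange without shared memory can be sketched roughly as follows. This is a simplified illustration, not Guardian's actual mechanism: the Process class, the `faulty` flag used to simulate a fault, and the checkpoint-by-message scheme between a primary and a backup process are all assumptions made for the example.

```python
from collections import deque


class FailFast(Exception):
    """Raised as soon as a process detects a result it cannot trust."""


class Process:
    """A toy process (hypothetical) with private state and an inbox."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.balance = 0       # private state, never touched by other processes
        self.alive = True
        self.faulty = False    # set to True to simulate a hardware/software fault

    def handle_next(self):
        message = self.inbox.popleft()
        new_balance = self.balance + message["amount"]
        if self.faulty:
            new_balance = None  # simulated corruption of the computed result
        # Fail fast: halt on the first inconsistent result instead of storing it.
        if not isinstance(new_balance, int):
            self.alive = False
            raise FailFast(f"{self.name} halted on message {message['id']}")
        self.balance = new_balance
        return {"reply_to": message["id"], "balance": self.balance}


def send(receiver, message):
    # The message is copied, so sender and receiver share no memory.
    receiver.inbox.append(dict(message))


def serve(primary, backup, request):
    """Send a request to the primary; retry on the backup if the primary halts."""
    if primary.alive:
        send(primary, request)
        try:
            reply = primary.handle_next()
        except FailFast:
            pass  # the primary halted before replying or checkpointing
        else:
            send(backup, request)   # checkpoint the update as a message
            backup.handle_next()
            return reply
    send(backup, request)
    return backup.handle_next()


primary, backup = Process("primary"), Process("backup")
first = serve(primary, backup, {"id": 1, "amount": 100})
primary.faulty = True                                    # inject a fault
second = serve(primary, backup, {"id": 2, "amount": -30})
third = serve(primary, backup, {"id": 3, "amount": 5})
```

Because the faulty primary halts before it stores or reports the corrupted value, the backup continues from the last checkpointed state and the caller still receives a correct reply.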
Program set
- The Guardian operating system, commonly used in TNS systems, was written in the machine-independent TAL (Transaction Application Language), which was developed from the HP 3000 SPL (System Programming Language). Although its syntax resembled C, it was essentially based on Burroughs' Algol language; it could control message exchange between running and sleeping processes, which made it an effective aid in detecting both hardware and software errors.
- The Cobol 74, Fortran, and MUMPS compilers were also developed in TAL.
Historical curiosities
While the mean time between failures (MTBF) of traditional mainframe systems of the time was 4-5 days, TNS systems achieved about 100 times that, with continuous operation measured in years. They also remained competitive in price: a configuration with two central units cost about twice as much as a traditional single-machine system, while other fault-tolerant systems of similar performance cost more than four times as much as single-machine ones.
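As a quick check of the MTBF figures quoted above, the arithmetic can be written out in a few lines of Python. The variable names are illustrative, and the factor of 100 is taken at face value from the text rather than from measured data.

```python
DAYS_PER_YEAR = 365.25

mainframe_mtbf_days = (4, 5)   # range quoted for traditional mainframes
improvement_factor = 100       # "100 times", as stated in the text

# Scale both ends of the range, then convert days to years.
tns_mtbf_days = tuple(d * improvement_factor for d in mainframe_mtbf_days)
tns_mtbf_years = tuple(round(d / DAYS_PER_YEAR, 2) for d in tns_mtbf_days)

print(tns_mtbf_days, tns_mtbf_years)
```

On these figures the MTBF comes to 400-500 days, which is a little over one year.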
The NonStop name has survived to this day: it is used in a separate product line of Hewlett-Packard, the company that the founders originally left to create Tandem.
Over the course of more than two decades, from the 1970s to the mid-1990s, Tandem shifted from manufacturing custom hardware devices to designing processors. Compaq eventually acquired the company in 1997 to expand its own line of reliable servers.
Resources
Detailed description: Tandem Computers
Created: 2017.01.21. 22:50
Last modified: 2020.07.24. 12:04
