Introduction and Issues

What is a distributed system?

“A collection of autonomous computers linked by a network with software designed to produce an integrated facility”

“A collection of independent computers that appear to the users of the system as a single computer”

Examples

Distributed systems

  • Department computing cluster
  • Corporate systems
  • Cloud systems (e.g. Google, Microsoft, etc.)

Application examples

  • Email
  • News
  • Multimedia information systems - video conferencing
  • Airline reservation system
  • Banking system
  • File downloads (BitTorrent)
  • Messaging

Illustration

_images/issues-3.jpg

A Distributed System

_images/issues-4.jpg

A Centralized Multi-User System

_images/issues-5.jpg

An Application Example

Advantages of Distributed Systems vs. Centralized

Economics

  • Commodity microprocessors have better price/performance than mainframes

Speed

  • Collective power of large number of systems is potentially infinite

Geographic and Responsibility Distribution

  • Can provide better speed in geographic regions by not going over slower transoceanic cables

Reliability

  • One machine’s failure need not bring down the system

Extensibility

  • Computers and software can be added incrementally

Advantages of Distributed Systems vs. Standalones

Data Sharing

  • Multiple users can access common database, data files

Device/Resource Sharing

  • e.g., printers, servers, other CPUs…

Communication

  • Communication with other systems…

Flexibility

  • Spread workload to different & most appropriate systems

Extensibility

  • Add resources and software as needed

Disadvantages of Distributed Systems

Software

  • Little software exists compared to PCs (for example) but the situation is improving with the cloud.

Networking

  • Still slow and can cause other problems (e.g., when disconnected)

Security

  • Data may be accessed by unauthorized users through network interfaces

Privacy

  • Data may be accessed securely but without the owner’s consent (significant issue in modern systems)

Key Characteristics

  • Support for resource sharing
  • Openness
  • Concurrency
  • Scalability
  • Fault Tolerance (Reliability)
  • Transparency

Resource Sharing

Share hardware, software, data and information

Hardware Devices

  • printers, disks, memory, sensors

Software Sharing

  • compilers, libraries, toolkits, computational kernels

Data

  • databases, files

Resources Must be Managed

_images/issues-11.jpg

Resources Must Be Managed

_images/issues-12.jpg

Client-Server Model for Resource Sharing

Openness

Determines whether the system can be extended in various ways without disrupting existing system and services

Hardware extensions (adding peripherals, memory, communication interfaces..)

Software extensions
  • Operating System features
  • Communication protocols
Mainly achieved using published interfaces, standardization

Open Distributed Systems

Are characterized by the fact that their key interfaces are published

Based on the provision of a uniform interprocess communication mechanism and published interfaces for access to shared resources

Can be constructed from heterogeneous hardware and software.

Concurrency

  • In a single system several processes are interleaved
  • In distributed systems - there are many systems with one or more processors
  • Many users simultaneously invoke commands or applications (e.g., Netscape..)
  • Many server processes run concurrently, each responding to different client request, e.g., File Server
_images/issues-16.jpg

Opportunities for Concurrency

Scalability

Scale of system

  • Few PCs servers –> Dept level systems –> Local area network –> Internetworked systems —>Wide area network…
  • Ideally - system and applications software should not (need to) change as systems scales

Scalability depends on all aspects

  • Hardware
  • Software
  • Networks
  • Storage

Fault Tolerance

  • Ability to operate under failure(s) - possibly at a degraded performance level
  • Two Approaches - Hardware redundancy - use of redundant components - Software Recovery - design of programs to recover
  • In distributed systems - servers can be replicated - databases may be replicated - software recovery involves the design so that state of permanent data can be recovered
  • Distributed systems, in general, provide a high(er) degree of availability

Transparency

Transparency “is the concealment from the user of the separation of components of a distributed system so that the system is perceived as a whole”.

Examples

  • Access Transparency - enables local and remote objects to be accessed using identical operations (e g., read file..)
  • Location transparency - location of resources is hidden
  • Migration transparency - resources can move without changing names
  • Replication Transparency - users cannot tell how many copies exist
  • Concurrency Transparency - multiple users can hare resources automatically
  • Parallelism Transparency - activities can happen in parallel without user knowing about it
  • Failure Transparency - concealment of faults

Design Issues

  • Openness
  • Resource Sharing
  • Concurrency
  • Scalability
  • Fault-Tolerance
  • Transparency
  • High-Performance

Issues arising from Distributed Systems

  • Naming - How to uniquely identify resources
  • Communication - How to exchange data and information reliably with good performance
  • Software Structure - How to make software open, extensible, scalable, with high-performance
  • Workload Allocation - Where to perform computations and various services
  • Consistency Maintenance - How to keep consistency at a reasonable cost

Naming

  • A resource must have a name (or identifier) for access
  • Name: Can be interpreted by user, e.g., a file name
  • Identifier - Interpreted by programs, e.g., port number

Naming - Name Resolution

  • “resolved” when it is translated into a form to be used to invoke an action on the resource
  • Usually a communication identified PLUS other attributes
  • E.g., Internet communication id
    • host id:port no
    • also known as “IP address:port no”
    • 192:130.228.6:8000
  • Name resolution may involve several translation steps

Naming - Design Considerations

  • Name space for each type of resource
    • e.g., files, ports, printers, etc.
  • Must be resolvable to communication Ids
    • typically achieved by names and their translation in a “name service”
    • You must have come across “DNS” when using the WWW!!
  • Frequently accessed resources, e.g., files are resolved by resource manager for efficiency
  • Hierarchical Name Space - each part is resolved relative to current context, e.g., file names in UNIX

Communication

Communication is an essential part of distributed systems - e.g., clients and servers must communicate for request and response

Communication normally involved - transfer of data from sender to receiver - synchronization among processes

Communication accomplished by message passing

Synchronous or blocking - sender waits for receiver to execute a receive operation

Asynchronous or non-blocking

Types of Communication

  • Client-Server
  • Group Multicast
  • Function Shipping
  • Performance of distributed systems depends critically on communication performance
  • We will study the software components involved in communication

Client-Server Communication

  • Client sends request to server process
  • Server executes the request
  • Server transmits a reply and data, e.g., file servers, web server...
_images/issues-29.jpg

Client-Server Communication

Client-Server Communication

  • Message Passing Operations
    • send
    • receive
  • Remote Procedure Call (RPC)
    • hides communication behind procedure call abstraction
    • e.g., read(fp,buffer,….)
    • Files reside with the server, thus there will be communication between client and server to satisfy this request

Group Multicast

  • A very important primitive for distributed systems
  • Target of a message is a group of processes
    • e.g., chat room, I sending a message to class list, video conference
  • Where is multicast useful?
    • Locating objects - client multicasts a message to many servers; server that can satisfy request responds
    • Fault-tolerance - more than one server does a job; even if one fails, results still available
    • Multiple updates
  • Hardware support may or may not be available
    • if no hardware support, each recipient is sent a message
_images/issues-32.jpg

Group Multicast

Software Structure

  • In a centralized system, O/S manages resources and provides essential services
  • Basic resource management
    • memory allocation and protection
    • process creation and processor scheduling
    • peripheral device handling
  • User and application services
    • user authentication and access control (e.g., login)
    • file management and access facilities
    • clock facilities

Distributed System Software Structure

  • It must be easy to add new services (flexibility, extensibility, openness requirements)
  • Kernel is normally restricted to
    • memory allocation
    • process creation and scheduling
    • interposes communication
    • peripheral device handling
  • E.g., Microkernels - represent light weight O/S, most services provided as applications on top of microkernels
_images/issues-35.jpg

Distributed System Software Structure

Consistency Management

  • When do consistency problems arise?
    • concurrency
    • sharing data
    • caching
  • Why cache data?
    • for performance, scalability
  • How?
    • Subsequent requests (many of them) need not go over the NETWORK to SERVERS
    • better utilized servers, network and better response
  • Caching is normally transparent, but creates consistency problems

Caching

  • Suppose your program (pseudocode) adds numbers stored in a file as follows (assume each number is 4 bytes:

    for I= 1, 1000
           tmp = read next number from file
           sum = sum + tmp
    end for
    
  • With no caching, each read will go over the network, which will send a new 4 byte number. Assuming 1 millisecond (ms) to get a number, requres a total of 1s to get all of the numbers.

  • With caching, assuming 1000 byte pages, 249 of the 250 reads will be local requests (from the cache).

Consistency

  • Update consistency
    • when multiple processes access and update data concurrently
    • effect should be such that all processes sharing data see the same values (consistent image)
    • E.g., sharing data in a database
  • Replication consistency
    • when data replicated and once process updates it
    • All other processes should see the updated data immediately
    • e.g., replicated files, electronic bulletin board
  • Cache consistency
    • When data (normally at different levels of granularity, such as pages, disk blocks, files…) is cached and updates by one process, it must be invalidated or updated by others
    • When and how depends on the consistency models used

Workload Allocation

  • In distributed systems many resources (e.g., other workstations, servers etc.) may be available for “computing”
  • Capacity and size of memory of a workstation or server may determine what applications may are able to run
  • Parts of applications may be run on different workstations for parallelism (e.g., compiling different files of the same program)
  • Some workstations or servers may have special hardware to do certain types of applications fast (e.g., video compression)
  • Idle workstations may be utilized for better performance and utilization

Processor Pool Model

In a processor pool model, processes are allocated to processors for their lifetime (e.g the Amoeba research O/S supports this concept).

_images/issues-41.jpg

Processor Pool Model

Quality-of-Service

Quality of Service (a.k.a. QoS) refers to performance and other service expectations of a client or an application.

  • Performance
  • Reliability and availability
  • security

Examples where this is important.

  • Voice over IP (VOIP) and telephony
  • Video (e.g. Netflix and friends)