Preface
About the Authors
PART 1 SYSTEMS MODELING, CLUSTERING, AND VIRTUALIZATION
CHAPTER 1 Distributed System Models and Enabling Technologies
Summary
1.1 Scalable Computing over the Internet
1.1.1 The Age of Internet Computing
1.1.2 Scalable Computing Trends and New Paradigms
1.1.3 The Internet of Things and Cyber-Physical Systems
1.2 Technologies for Network-Based Systems
1.2.1 Multicore CPUs and Multithreading Technologies
1.2.2 GPU Computing to Exascale and Beyond
1.2.3 Memory, Storage, and Wide-Area Networking
1.2.4 Virtual Machines and Virtualization Middleware
1.2.5 Data Center Virtualization for Cloud Computing
1.3 System Models for Distributed and Cloud Computing
1.3.1 Clusters of Cooperative Computers
1.3.2 Grid Computing Infrastructures
1.3.3 Peer-to-Peer Network Families
1.3.4 Cloud Computing over the Internet
1.4 Software Environments for Distributed Systems and Clouds
1.4.1 Service-Oriented Architecture (SOA)
1.4.2 Trends toward Distributed Operating Systems
1.4.3 Parallel and Distributed Programming Models
1.5 Performance, Security, and Energy Efficiency
1.5.1 Performance Metrics and Scalability Analysis
1.5.2 Fault Tolerance and System Availability
1.5.3 Network Threats and Data Integrity
1.5.4 Energy Efficiency in Distributed Computing
1.6 Bibliographic Notes and Homework Problems
Acknowledgments
References
Homework Problems
Foreword
CHAPTER 2 Computer Clusters for Scalable Parallel Computing
Summary
2.1 Clustering for Massive Parallelism
2.1.1 Cluster Development Trends
2.1.2 Design Objectives of Computer Clusters
2.1.3 Fundamental Cluster Design Issues
2.1.4 Analysis of the Top Supercomputers
2.2 Computer Clusters and MPP Architectures
2.2.1 Cluster Organization and Resource Sharing
2.2.2 Node Architectures and MPP Packaging
2.2.3 Cluster System Interconnects
2.2.4 Hardware, Software, and Middleware Support
2.2.5 GPU Clusters for Massive Parallelism
2.3 Design Principles of Computer Clusters
2.3.1 Single-System Image Features
2.3.2 High Availability through Redundancy
2.3.3 Fault-Tolerant Cluster Configurations
2.3.4 Checkpointing and Recovery Techniques
2.4 Cluster Job and Resource Management
2.4.1 Cluster Job Scheduling Methods
2.4.2 Cluster Job Management Systems
2.4.3 Load Sharing Facility (LSF) for Cluster Computing
2.4.4 MOSIX: An OS for Linux Clusters and Clouds
2.5 Case Studies of Top Supercomputer Systems
2.5.1 Tianhe-1A: The World’s Fastest Supercomputer in 2010
2.5.2 Cray XT5 Jaguar: The Top Supercomputer in 2009
2.5.3 IBM Roadrunner: The Top Supercomputer in 2008
2.6 Bibliographic Notes and Homework Problems
Acknowledgments
References
Homework Problems
CHAPTER 3 Virtual Machines and Virtualization of Clusters and Data Centers
Summary
3.1 Implementation Levels of Virtualization
3.1.1 Levels of Virtualization Implementation
3.1.2 VMM Design Requirements and Providers
3.1.3 Virtualization Support at the OS Level
3.1.4 Middleware Support for Virtualization
3.2 Virtualization Structures/Tools and Mechanisms
3.2.1 Hypervisor and Xen Architecture
3.2.2 Binary Translation with Full Virtualization
3.2.3 Para-Virtualization with Compiler Support
xii Contents
3.3 Virtualization of CPU, Memory, and I/O Devices
3.3.1 Hardware Support for Virtualization
3.3.2 CPU Virtualization
3.3.3 Memory Virtualization
3.3.4 I/O Virtualization
3.3.5 Virtualization in Multi-Core Processors
3.4 Virtual Clusters and Resource Management
3.4.1 Physical versus Virtual Clusters
3.4.2 Live VM Migration Steps and Performance Effects
3.4.3 Migration of Memory, Files, and Network Resources
3.4.4 Dynamic Deployment of Virtual Clusters
3.5 Virtualization for Data-Center Automation
3.5.1 Server Consolidation in Data Centers
3.5.2 Virtual Storage Management
3.5.3 Cloud OS for Virtualized Data Centers
3.5.4 Trust Management in Virtualized Data Centers
3.6 Bibliographic Notes and Homework Problems
Acknowledgments
References
Homework Problems
PART 2 COMPUTING CLOUDS, SERVICE-ORIENTED ARCHITECTURE, AND PROGRAMMING
CHAPTER 4 Cloud Platform Architecture over Virtualized Data Centers
Summary
4.1 Cloud Computing and Service Models
4.1.1 Public, Private, and Hybrid Clouds
4.1.2 Cloud Ecosystem and Enabling Technologies
4.1.3 Infrastructure-as-a-Service (IaaS)
4.1.4 Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS)
4.2 Data-Center Design and Interconnection Networks
4.2.1 Warehouse-Scale Data-Center Design
4.2.2 Data-Center Interconnection Networks
4.2.3 Modular Data Center in Shipping Containers
4.2.4 Interconnection of Modular Data Centers
4.2.5 Data-Center Management Issues
4.3 Architectural Design of Compute and Storage Clouds
4.3.1 A Generic Cloud Architecture Design
4.3.2 Layered Cloud Architectural Development
4.3.3 Virtualization Support and Disaster Recovery
4.3.4 Architectural Design Challenges
4.4 Public Cloud Platforms: GAE, AWS, and Azure
4.4.1 Public Clouds and Service Offerings
4.4.2 Google App Engine (GAE)
4.4.3 Amazon Web Services (AWS)
4.4.4 Microsoft Windows Azure
4.5 Inter-cloud Resource Management
4.5.1 Extended Cloud Computing Services
4.5.2 Resource Provisioning and Platform Deployment
4.5.3 Virtual Machine Creation and Management
4.5.4 Global Exchange of Cloud Resources
4.6 Cloud Security and Trust Management
4.6.1 Cloud Security Defense Strategies
4.6.2 Distributed Intrusion/Anomaly Detection
4.6.3 Data and Software Protection Techniques
4.6.4 Reputation-Guided Protection of Data Centers
4.7 Bibliographic Notes and Homework Problems
Acknowledgements
References
Homework Problems
CHAPTER 5 Service-Oriented Architectures for Distributed Computing
Summary
5.1 Services and Service-Oriented Architecture
5.1.1 REST and Systems of Systems
5.1.2 Services and Web Services
5.1.3 Enterprise Multitier Architecture
5.1.4 Grid Services and OGSA
5.1.5 Other Service-Oriented Architectures and Systems
5.2 Message-Oriented Middleware
5.2.1 Enterprise Bus
5.2.2 Publish-Subscribe Model and Notification
5.2.3 Queuing and Messaging Systems
5.2.4 Cloud or Grid Middleware Applications
5.3 Portals and Science Gateways
5.3.1 Science Gateway Exemplars
5.3.2 HUBzero Platform for Scientific Collaboration
5.3.3 Open Gateway Computing Environments (OGCE)
5.4 Discovery, Registries, Metadata, and Databases
5.4.1 UDDI and Service Registries
5.4.2 Databases and Publish-Subscribe
5.4.3 Metadata Catalogs
5.4.4 Semantic Web and Grid
5.4.5 Job Execution Environments and Monitoring
5.5 Workflow in Service-Oriented Architectures
5.5.1 Basic Workflow Concepts
5.5.2 Workflow Standards
5.5.3 Workflow Architecture and Specification
5.5.4 Workflow Execution Engine
5.5.5 Scripting Workflow System Swift
5.6 Bibliographic Notes and Homework Problems
Acknowledgements
References
Homework Problems
CHAPTER 6 Cloud Programming and Software Environments
Summary
6.1 Features of Cloud and Grid Platforms
6.1.1 Cloud Capabilities and Platform Features
6.1.2 Traditional Features Common to Grids and Clouds
6.1.3 Data Features and Databases
6.1.4 Programming and Runtime Support
6.2 Parallel and Distributed Programming Paradigms
6.2.1 Parallel Computing and Programming Paradigms
6.2.2 MapReduce, Twister, and Iterative MapReduce
6.2.3 Hadoop Library from Apache
6.2.4 Dryad and DryadLINQ from Microsoft
6.2.5 Sawzall and Pig Latin High-Level Languages
6.2.6 Mapping Applications to Parallel and Distributed Systems
6.3 Programming Support of Google App Engine
6.3.1 Programming the Google App Engine
6.3.2 Google File System (GFS)
6.3.3 BigTable, Google’s NOSQL System
6.3.4 Chubby, Google’s Distributed Lock Service
6.4 Programming on Amazon AWS and Microsoft Azure
6.4.1 Programming on Amazon EC2
6.4.2 Amazon Simple Storage Service (S3)
6.4.3 Amazon Elastic Block Store (EBS) and SimpleDB
6.4.4 Microsoft Azure Programming Support
6.5 Emerging Cloud Software Environments
6.5.1 Open Source Eucalyptus and Nimbus
6.5.2 OpenNebula, Sector/Sphere, and OpenStack
6.5.3 Manjrasoft Aneka Cloud and Appliances
6.6 Bibliographic Notes and Homework Problems
Acknowledgement
References
Homework Problems
PART 3 GRIDS, P2P, AND THE FUTURE INTERNET
CHAPTER 7 Grid Computing Systems and Resource Management
Summary
7.1 Grid Architecture and Service Modeling
7.1.1 Grid History and Service Families
7.1.2 CPU Scavenging and Virtual Supercomputers
7.1.3 Open Grid Services Architecture (OGSA)
7.1.4 Data-Intensive Grid Service Models
7.2 Grid Projects and Grid Systems Built
7.2.1 National Grids and International Projects
7.2.2 NSF TeraGrid in the United States
7.2.3 DataGrid in the European Union
7.2.4 The ChinaGrid Design Experiences
7.3 Grid Resource Management and Brokering
7.3.1 Resource Management and Job Scheduling
7.3.2 Grid Resource Monitoring with CGSP
7.3.3 Service Accounting and Economy Model
7.3.4 Resource Brokering with Gridbus
7.4 Software and Middleware for Grid Computing
7.4.1 Open Source Grid Middleware Packages
7.4.2 The Globus Toolkit Architecture (GT4)
7.4.3 Containers and Resources/Data Management
7.4.4 The ChinaGrid Support Platform (CGSP)
7.5 Grid Application Trends and Security Measures
7.5.1 Grid Applications and Technology Fusion
7.5.2 Grid Workload and Performance Prediction
7.5.3 Trust Models for Grid Security Enforcement
7.5.4 Authentication and Authorization Methods
7.5.5 Grid Security Infrastructure (GSI)
7.6 Bibliographic Notes and Homework Problems
Acknowledgments
References
Homework Problems
CHAPTER 8 Peer-to-Peer Computing and Overlay Networks
Summary
8.1 Peer-to-Peer Computing Systems
8.1.1 Basic Concepts of P2P Computing Systems
8.1.2 Fundamental Challenges in P2P Computing
8.1.3 Taxonomy of P2P Network Systems
8.2 P2P Overlay Networks and Properties
8.2.1 Unstructured P2P Overlay Networks
8.2.2 Distributed Hash Tables (DHTs)
8.2.3 Structured P2P Overlay Networks
8.2.4 Hierarchically Structured Overlay Networks
8.3 Routing, Proximity, and Fault Tolerance
8.3.1 Routing in P2P Overlay Networks
8.3.2 Network Proximity in P2P Overlays
8.3.3 Fault Tolerance and Failure Recovery
8.3.4 Churn Resilience against Failures
8.4 Trust, Reputation, and Security Management
8.4.1 Peer Trust and Reputation Systems
8.4.2 Trust Overlay and DHT Implementation
8.4.3 PowerTrust: A Scalable Reputation System
8.4.4 Securing Overlays to Prevent DDoS Attacks
8.5 P2P File Sharing and Copyright Protection
8.5.1 Fast Search, Replica, and Consistency
8.5.2 P2P Content Delivery Networks
8.5.3 Copyright Protection Issues and Solutions
8.5.4 Collusive Piracy Prevention in P2P Networks
8.6 Bibliographic Notes and Homework Problems
Acknowledgements
References
Homework Problems
CHAPTER 9 Ubiquitous Clouds and the Internet of Things
Summary
9.1 Cloud Trends in Supporting Ubiquitous Computing
9.1.1 Use of Clouds for HPC/HTC and Ubiquitous Computing
9.1.2 Large-Scale Private Clouds at NASA and CERN
9.1.3 Cloud Mashups for Agility and Scalability
9.1.4 Cloudlets for Mobile Cloud Computing
9.2 Performance of Distributed Systems and the Cloud
9.2.1 Review of Science and Research Clouds
9.2.2 Data-Intensive Scalable Computing (DISC)
9.2.3 Performance Metrics for HPC/HTC Systems
9.2.4 Quality of Service in Cloud Computing
9.2.5 Benchmarking MPI, Azure, EC2, MapReduce, and Hadoop
9.3 Enabling Technologies for the Internet of Things
9.3.1 The Internet of Things for Ubiquitous Computing
9.3.2 Radio-Frequency Identification (RFID)
9.3.3 Sensor Networks and ZigBee Technology
9.3.4 Global Positioning System (GPS)
9.4 Innovative Applications of the Internet of Things
9.4.1 Applications of the Internet of Things
9.4.2 Retailing and Supply-Chain Management
9.4.3 Smart Power Grid and Smart Buildings
9.4.4 Cyber-Physical System (CPS)
9.5 Online Social and Professional Networking
9.5.1 Online Social Networking Characteristics
9.5.2 Graph-Theoretic Analysis of Social Networks
9.5.3 Communities and Applications of Social Networks
9.5.4 Facebook: The World’s Largest Social Network
9.5.5 Twitter for Microblogging, News, and Alert Services
9.6 Bibliographic Notes and Homework Problems
Acknowledgements
References
Homework Problems
Index