Friday, June 25, 2010

Shared-Memory Parallel Programming


Excerpted from "The Art of Parallel Programming", 2nd Edition, by Bruce Lester

1. Data Parallelism
1.1 Process Creation
1.2 Process Granularity
1.3 Optimal Group Size
1.4 The Fork Operator
1.4.1 Process Termination
1.4.2 The Join Statement
1.4.3 Parallel List Processing
1.5 Amdahl's Law
1.5.1 Effects of Sequential Code on Speedup
1.5.2 Overcoming Initialization Overhead
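
As a taste of Section 1.4's fork and join operations, here is a minimal fork/join sketch in C with POSIX threads (my own illustration; the book presents these constructs in its own notation, not pthreads). Each forked worker squares one slice of a shared array, and the parent joins all workers before reading the results:

/* Fork/join sketch with POSIX threads; illustrative only. */
#include <pthread.h>
#include <stdio.h>

#define N 8               /* array size */
#define NPROCS 4          /* number of forked workers */

static int a[N];

/* Each worker squares its contiguous slice of the array. */
static void *worker(void *arg) {
    long id = (long)arg;
    int chunk = N / NPROCS;
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        a[i] = a[i] * a[i];
    return NULL;
}

int main(void) {
    pthread_t t[NPROCS];
    for (int i = 0; i < N; i++) a[i] = i;

    for (long i = 0; i < NPROCS; i++)    /* "fork" the workers */
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NPROCS; i++)     /* "join": wait for all */
        pthread_join(t[i], NULL);

    for (int i = 0; i < N; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}

Amdahl's Law (Section 1.5) bounds what such a program can gain: if a fraction f of the running time is sequential, the speedup on n processors is at most 1 / (f + (1 - f)/n), so even a little sequential code limits the achievable speedup sharply.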

2. Multiprocessor Architecture
2.1 Bus-Oriented Systems
2.2 Cache Memory
2.3 Processor-Memory Interconnection Networks

3. Process Communication
3.1 Process Communication Streams
3.2 Pipeline Parallelism
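
Chapter 3's communication streams behave like first-in, first-out channels between processes, and pipeline parallelism chains such streams through successive stages. Below is a hedged sketch of a two-stage pipeline over a bounded buffer, using a pthread mutex and condition variables (my own minimal rendering, not the book's stream construct; the names stream_put and stream_get are mine):

/* A tiny bounded stream between two threads; illustrative only.
 * Producer sends squares downstream, consumer prints them. */
#include <pthread.h>
#include <stdio.h>

#define CAP 4

static int buf[CAP];
static int head, tail, count;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void stream_put(int v) {
    pthread_mutex_lock(&m);
    while (count == CAP) pthread_cond_wait(&not_full, &m);
    buf[tail] = v; tail = (tail + 1) % CAP; count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

static int stream_get(void) {
    pthread_mutex_lock(&m);
    while (count == 0) pthread_cond_wait(&not_empty, &m);
    int v = buf[head]; head = (head + 1) % CAP; count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return v;
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 10; i++) stream_put(i * i);
    stream_put(-1);                /* end-of-stream marker */
    return NULL;
}

int main(void) {
    pthread_t p;
    pthread_create(&p, NULL, producer, NULL);
    for (int v; (v = stream_get()) != -1; )   /* consumer stage */
        printf("%d\n", v);
    pthread_join(p, NULL);
    return 0;
}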

4. Data Sharing
4.1 Atomic Operations
4.2 Spinlocks
4.3 Contention for Shared Data
4.4 Comparing Spinlocks and Streams
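
Section 4.2's spinlocks rest on an atomic test-and-set operation: a process loops until it succeeds in setting a flag that was previously clear. Here is a sketch built on C11's atomic_flag (again my own rendering, not the book's primitives):

/* Spinlock sketch built on C11 atomic_flag; illustrative only. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;

static void spin_lock(void) {
    /* Spin until the previous flag value was clear (test-and-set). */
    while (atomic_flag_test_and_set(&lock))
        ;   /* busy-wait */
}

static void spin_unlock(void) {
    atomic_flag_clear(&lock);
}

static void *adder(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        shared_counter++;          /* protected critical section */
        spin_unlock();
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, adder, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", shared_counter);   /* expect 400000 */
    return 0;
}

The busy-waiting loop is exactly the source of the contention for shared data that Sections 4.3 and 4.4 analyze.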

5. Synchronous Parallelism
5.1 Solving a Differential Equation
5.2 Parallel Jacobi Relaxation
5.2.1 Synchronization by Process Termination
5.2.2 Barrier Synchronization
5.3 Linear Barrier Implementation
5.4 Binary Tree Implementation of Barriers
5.4.1 Tournament Technique
5.4.2 Tree Creation Algorithm
5.4.3 Performance
5.5 Local Synchronization
5.6 Broadcasting and Aggregation
5.6.1 Convergence Testing
5.6.2 Implementing Parallel Aggregation
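
Section 5.3's linear barrier can be built from a single shared counter: the last process to arrive resets the counter and releases everyone else. Below is a sketch using a mutex, a condition variable, and a phase flag so the barrier is safely reusable across iterations (a common textbook rendering; the book's own code may differ):

/* Linear (counter-based) barrier sketch; illustrative only. */
#include <pthread.h>
#include <stdio.h>

#define NPROCS 4

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_here = PTHREAD_COND_INITIALIZER;
static int arrived;
static int phase;                  /* flips each time the barrier opens */

static void barrier(void) {
    pthread_mutex_lock(&m);
    int my_phase = phase;
    if (++arrived == NPROCS) {     /* last arrival opens the barrier */
        arrived = 0;
        phase = 1 - phase;
        pthread_cond_broadcast(&all_here);
    } else {
        while (phase == my_phase)  /* earlier arrivals wait */
            pthread_cond_wait(&all_here, &m);
    }
    pthread_mutex_unlock(&m);
}

static void *worker(void *arg) {
    long id = (long)arg;
    for (int iter = 0; iter < 3; iter++) {
        printf("process %ld finished iteration %d\n", id, iter);
        barrier();                 /* no process starts iter+1 early */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NPROCS];
    for (long i = 0; i < NPROCS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NPROCS; i++)
        pthread_join(t[i], NULL);
    return 0;
}

Because every arrival updates one shared counter, the cost grows linearly with the number of processes, which is what motivates the tree-structured barriers of Section 5.4.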

6. Replicated Workers
6.1 Work Pools
6.2 Implementation of Work Pools
6.3 Eliminating Contention
6.3.1 Load Balancing
6.3.2 Termination Algorithm
6.3.3 Performance
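
Finally, Chapter 6's replicated workers all run the same code and repeatedly pull tasks from a shared work pool until it is empty. A minimal sketch with a mutex-protected task index follows (my own simplification; as the chapter discusses, real work pools must also handle workers that generate new tasks, which makes termination detection subtle):

/* Replicated-workers sketch: a shared pool of independent tasks,
 * drained by identical worker threads. Illustrative only. */
#include <pthread.h>
#include <stdio.h>

#define NTASKS 20
#define NWORKERS 4

static int results[NTASKS];
static int next_task;              /* index of next undone task */
static pthread_mutex_t pool = PTHREAD_MUTEX_INITIALIZER;

/* Grab one task from the pool; return -1 when the pool is empty. */
static int get_task(void) {
    pthread_mutex_lock(&pool);
    int t = (next_task < NTASKS) ? next_task++ : -1;
    pthread_mutex_unlock(&pool);
    return t;
}

static void *worker(void *arg) {
    (void)arg;
    for (int t; (t = get_task()) != -1; )
        results[t] = t * t;        /* the "task": square its index */
    return NULL;
}

int main(void) {
    pthread_t w[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&w[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(w[i], NULL);
    for (int i = 0; i < NTASKS; i++) printf("%d ", results[i]);
    printf("\n");
    return 0;
}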