This section contains documents that could not be made accessible to screen reader software. A "#" symbol is used to denote such documents.
Selected Lecture Notes from Spring 2003 Class
The course notes below are a work in progress. The lecture numbers do not correspond to the class session numbers. The notes are available as a single file (PDF - 4.3 MB)# or as separate chapters below.
1.1. The Machines
1.2. The Software
1.3. The Reality of High Performance Computing
1.4. Modern Algorithms
1.5. Compilers
1.6. Scientific Algorithms
1.7. History, State-of-Art, and Perspective
- 1.7.1. Things that are not Traditional Supercomputers
1.8. Analyzing the Top500 List Using Excel
- 1.8.1. Importing the XML File
- 1.8.2. Filtering
- 1.8.3. Pivot Tables
1.9. Parallel Computing: An Example
1.10. Exercises
Chapter 2: MPI, OpenMP, MATLAB®*P (
PDF)
2.1. Programming Style
2.2. Message Passing
- 2.2.1. Who am I?
- 2.2.2. Sending and Receiving
- 2.2.3. Tags and Communicators
- 2.2.4. Performance, and Tolerance
- 2.2.5. Who's got the floor?
2.3. More on Message Passing
- 2.3.1. Nomenclature
- 2.3.2. The Development of Message Passing
- 2.3.3. Machine Characteristics
- 2.3.4. Active Messages
2.4. OpenMP for Shared Memory Parallel Programming
2.5. STARP
Chapter 3: Parallel Prefix (
PDF)
3.1. Parallel Prefix
3.2. The "Myth" of lg n
3.3. Applications of Parallel Prefix
- 3.3.1. Segmented Scan
- 3.3.2. Csanky's Matrix Inversion
- 3.3.3. Babbage and Carry Look-Ahead Addition
3.4. Parallel Prefix in MPI
Chapter 4: Dense Linear Algebra (
PDF)
4.1. Dense Matrices
4.2. Applications
- 4.2.1. Uncovering the Structure from Seemingly Unstructured Problems
4.3. Records
4.4. Algorithms, and Mapping Matrices to Processors
4.5. The Memory Hierarchy
4.6. Single Processor Condiderations for Dense Linear Algebra
- 4.6.1. LAPACK and the BLAS
- 4.6.2. Reinventing Dense Linear Algebra Optimization
4.7. Parallel Computing Considerations for Dense Linear Algebra
4.8. Better Load Balancing
Chapter 5: Sparse Linear Algebra (
PDF)
5.1. Cyclic Reduction for Structured Sparse Linear Systems
5.2. Sparse Direct Methods
- 5.2.1. LU Decomposition and Gaussian Elimination
- 5.2.2. Parallel Factorization: The Multifrontal Algorithm
5.3. Basic Iterative Methods
- 5.3.1. SuperLU-dist
- 5.3.2. Jacobi Method
- 5.3.3. Gauss-Seidel Method
- 5.3.4. Splitting Matrix Method
- 5.3.5. Weighted Splitting Matrix Method
5.4. Red-Black Ordering for Parallel Implementation
5.5. Conjugate Gradient Method
- 5.5.1. Parallel Conjugate Gradient
5.6. Preconditioning
5.7. Symmetric Supernodes
- 5.7.1. Unsymmetric Supernodes
- 5.7.2. The Column Elimination Tree
- 5.7.3. Relaxed Supernodes
- 5.7.4. Supernodal Numeric Factorization
5.8. Efficient Sparse Matrix Algorithms
- 5.8.1. Scalable Algorithms
- 5.8.2. Cholesky Factorization
- 5.8.3. Distributed Sparse Cholesky and the Model Problem
- 5.8.4. Parallel Block-Oriented Sparse Cholesky Factorization
5.9. Load Balance with Cyclic Mapping
5.10. Heuristic Remapping
5.11. Scheduling Local Computations
Chapter 6: Parallel Machines (
PDF)
#- 6.0.1. More on Private Versus Shared Addressing 
- 6.0.2. Programming Model 
- 6.0.3. Machine Topology 
- 6.0.4. Homogeneous and Heterogeneous Machines 
- 6.0.5. Distributed Computing on the Internet and Akamai Network 
7.1. FFT
- 7.1.1. Data Motion
- 7.1.2. FFT on Parallel Machines
- 7.1.3. Exercises
7.2. Matrix Multiplication
7.3. Basic Data Communication Operations
Chapter 8: Domain Decomposition (
PDF)
#8.1. Geometric Issues
- 8.1.1. Overlapping vs. Non-overlapping Regions
- 8.2.2. Geometric Discretization
8.2. Algorithmic Issues
- 8.2.1. Classical Iterations and their Block Equivalents
- 8.2.2. Schwarz Approaches: Additive vs. Multiplicative
- 8.2.3. Substructuring Approaches
- 8.2.4. Accellerants
8.3. Theoretical Issues
8.4. A Domain Decomposition Assignment: Decomposing MIT
Chapter 9: Particle Methods (
PDF)
#9.1. Reduce and Broadcast: A Function Viewpoint
9.2. Particle Methods: An Application
9.3. Outline
9.4. What is N-Body Simulation?
9.5. Examples
9.6. The Basic Algorithm
- 9.6.1. Finite Difference and the Euler Method
9.7. Methods for Force Calculation
- 9.7.1. Direct Force Calculation
- 9.7.2. Potential Based Calculation
- 9.7.3. Poisson Methods
- 9.7.4. Hierarchical Methods
9.8. Quadtree (2D) and Octtree (3D) : Data Structures for Canonical Clustering
9.9. Barnes-hut Method (1986)
- 9.9.1. Approximating Potentials
9.10. Outline
9.11. Introduction
9.12. Multipole Algorithm: An Overview
9.13. Multipole Expansion
9.14. Taylor Expansion
9.15. Operation No.1 - SHIFT
9.16. Operation No.2 - FLIP
9.17. Application on Quad Tree
9.18. Expansion from 2-D to 3-D
9.19. Parallel Implementation
10.1. Motivation from the Parallel Sparse Matrix Vector Multiplication
10.2. Separators
10.3. Spectral Partitioning - One way to Slice a Problem in Half
- 10.3.1. Electrical Networks
- 10.3.2. Laplacian of a Graph
- 10.3.3. Spectral Partitioning
10.4. Geometric Methods
- 10.4.1. Geometric Graphs
- 10.4.2. Geometric Partitioning: Algorithm and Geometric Modeling
- 10.4.3. Other Graphs with Small Separators
- 10.4.4. Other Geometric Methods
- 10.4.5. Partitioning Software
10.5. Load-Balancing N-body Simulation for Non-uniform Particles
- 10.5.1. Hierarchical Methods of Non-uniformly Distributed Particles
- 10.5.2. The Communication Graph for N-body Simulations
- 10.5.3. Near-field Graphs
- 10.5.4. N-body Communication Graphs
- 10.5.5. Geometric Modeling of N-body Graphs
Chapter 11: Mesh Generation (
PDF)
11.1. How to Describe a Domain?
11.2. Types of Meshes
11.3. Refinement Methods
- 11.3.1. Hierarchical Refinement
- 11.3.2. Delaunay Triangulation
- 11.3.2. Delaunay Refinement
11.4. Working With Meshes
11.5. Unsolved Problems
Chapter 12: Support Vector Machines and Singular Value Decomposition (
PDF )
12.1. Support Vector Machines
- 12.1.1. Learning Models
- 12.1.2. Developing SVMs
- 12.1.3. Applications
12.2. Singular Value Decomposition
Selected Lecture Notes from Spring 2003 Class
Course lecture notes.| LEC # | TOPICS | 
|---|
| 1 | Introduction (PDF - 1.3 MB) | 
| 2 | MPI, MATLAB®*P (TXT) | 
| 4 | Parallel Prefix (PDF) | 
| 5 | Parallel Computer Architecture I (PDF) | 
| 6 | Parallel Computer Architecture II (PDF) | 
| 7 | Dense Linear Algebra I 
 (PDF 1) (Courtesy of James Demmel. Used with permission.)
 (PDF 2)
 (PDF 3 - 1.1 MB)
 (PDF 4) (Courtesy of Jack Dongarra, University of Tennessee. Used with permission.)
 | 
| 9 | FFT (PDF) | 
| 11 | Graph Partitioning (PDF) | 
| 14 | N-Body Problem - Barnes Hut (PDF)# | 
| 18 | Domain Decomposition (PDF)# | 
| 23 | Support Vector Machines (PDF 1) (PDF 2)# (Courtesy of Tong Wen. Used with permission.) |