Lectures are designed for synchronous delivery; do not expect the recorded version to be an adequate substitute for attending.

Recorded lectures are available on Panopto through Blackboard; log in to Blackboard to find them. Lectures are named LecXX.[Topic].[Date].mp4. Although Panopto is inconvenient, it is the only way to restrict access to lectures to enrolled students.

Jupyter notebooks for lectures are available on GitHub (https://github.com/randalburns/pplectures2021). You are encouraged to clone this repo, pull before each lecture, and run the notebooks.


Assignments

  • Project 1: OpenMP Filter
  • Homework 1.5: All Possible Regressions in Python (due October 1, 2021 5:00 pm EDT)
    • complete the notebook in pplectures2021/homework/HW1.5.ipynb
    • completing it in the repo ensures that you have the dataset and the environment
  • Project 2: Java BlockingQueue (due October 15, 2021 5:00 pm EDT)
  • Project 3: Dask notebooks (due November 2, 2021 5:00 pm EDT)
  • Project 4: k-means in Spark
  • Project 5: Ray Deadlock (due December 6, 2021 5:00 pm EST)
    • submissions may be turned in as late as December 8, 2021, 5:00 pm with no late penalty


Midterm

  • Due October 22, 2021, 5:00 pm EDT, as a PDF.
    • The midterm is an open-Internet, take-home exam. It will be distributed on Wednesday, October 20, 2021 at 5:30 pm.
    • Any updates, errata, and corrections will be posted on this webpage. I will announce changes on Piazza.


Final Exam

  • The final exam was released on December 13, 2021 at 2:00 pm.
    • Due Tuesday, December 21, 2021, 11:59 am.
    • Early submissions are strongly encouraged.

Late Hours

A total of 48 late hours are permitted per semester, to be used as needed on projects. Late hours are rounded up to the nearest hour; e.g., a project submitted 2.5 hours late uses 3 late hours.

Course Schedule

(30 August) Introduction to Parallel Programming

Syllabus review. Academic ethics. Parallelism in modern computer architectures. Performance beyond computational complexity. An introduction to course tools for lectures and homework: conda/python/github/jupyter.

  • Reading:
    • Mattson, Patterns for Parallel Programming, Chapter 1.

(30 August) A First Parallel Program

Parallelization with joblib in Python. The Global Interpreter Lock. Python packages for data science. Performance timing.

  • Reading:
    • Matloff, Parallel Computing for Data Science, Chapter 1.
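
The joblib pattern from this lecture fits in a few lines. A minimal sketch (the function and parameters are illustrative, not from the course notebooks):

    from joblib import Parallel, delayed

    def slow_square(x):
        return x * x   # stand-in for an expensive, independent computation

    # run 8 independent tasks on 4 worker processes; separate processes avoid the GIL
    results = Parallel(n_jobs=4)(delayed(slow_square)(i) for i in range(8))
    print(results)     # [0, 1, 4, 9, 16, 25, 36, 49]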

(6 September) Amdahl’s Law, Strong Scaling, and Parallel Efficiency

Amdahl’s law is the fundamental principle behind strong scaling in parallel computing. Strong scaling is the process of solving a problem of fixed size faster with parallel resources.

  • Reading:
    • Mattson, Patterns for Parallel Programming, Ch. 2.4-2.6
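
In symbols: with parallelizable fraction f of the work and p processors, the speedup bound and its limit are

    S(p) = \frac{1}{(1 - f) + f/p}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{1 - f}

so f = 0.9 caps speedup at 10x no matter how many processors are added; parallel efficiency is E(p) = S(p)/p.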

(8 September) OpenMP

Lecture 4: An introduction to parallelizing serial programs with compiler directives, including serial equivalence and loop-parallel constructs.

(15 September) Cache Hierarchy

Lecture 5: Memory hierarchy and latency. Caching concepts: size, lines, associativity, and inclusion/exclusion. Caching microbenchmarks.
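
A microbenchmark in the spirit of this lecture, sketched with numpy (array size and stride are illustrative; absolute numbers depend on your machine's cache hierarchy):

    import time
    import numpy as np

    a = np.arange(32 * 1024 * 1024, dtype=np.int64)  # 256 MB, much larger than cache

    def ns_per_element(x):
        t0 = time.perf_counter()
        x.sum()
        return (time.perf_counter() - t0) / x.size * 1e9

    seq = ns_per_element(a)           # unit stride: every byte of each cache line is used
    strided = ns_per_element(a[::8])  # stride 8: one int64 used per 64-byte line
    print(f"sequential: {seq:.2f} ns/element, strided: {strided:.2f} ns/element")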

(20 September) Loop Optimization

Lecture 6: Loop Optimizations

(22 September) Moore’s Law and Factors Against Parallelism

Lecture 7: Startup, interference, and skew.

  • Reading:
    • Mattson, Patterns for Parallel Programming, Chapter 2.

(27 September) Vector Processing and Processor Intrinsics

Guest lecturer: Brian Wheatman

(29 September) JIT Compilation, Moore’s Law, Parallel Efficiency

Lecture 9: A potpourri of stuff that I have not gotten to.

(4 October) Processes, Threads, and Java Threads

Lecture 10a: Java Threads

(6 October) Java Concurrency Control

Asynchrony, waiting on threads, volatile variables, and synchronized functions.

  • Reading:
    • Mattson, Patterns for Parallel Programming, Appendix C.

(11 October) Mutual Exclusion

Lecture 12: Critical sections and fast mutual exclusion.

  • Reading:
    • Herlihy and Shavit, The Art of Multiprocessor Programming, Chapter 1 and Chapter 2 through Section 2.6.
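
The critical-section problem from this lecture can be demonstrated with Python threads (a minimal sketch; the lecture develops mutual exclusion far more carefully):

    import threading

    counter = 0
    lock = threading.Lock()

    def increment(n, use_lock):
        global counter
        for _ in range(n):
            if use_lock:
                with lock:        # critical section: one thread at a time
                    counter += 1
            else:
                counter += 1      # unprotected read-modify-write: a race

    threads = [threading.Thread(target=increment, args=(100_000, False)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # often less than 400000; with use_lock=True it is exactly 400000
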
NO MATERIAL past this point is on the midterm

(13 October) Dask

Lecture 13: Dask Arrays. Data parallel and declarative programming. Execution graphs and lazy evaluation.
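
The lazy, declarative style from this lecture in a sketch (shapes and chunking are illustrative):

    import dask.array as da

    x = da.random.random((10000, 10000), chunks=(1000, 1000))  # a 10x10 grid of chunks
    y = (x + x.T).mean(axis=0)  # declarative: builds a task graph, computes nothing
    print(y)                    # shows a lazy dask array, not values
    print(y.compute()[:5])      # .compute() executes the graph in parallel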

(18 October) Dask Dataframes

Lecture 14: Parallel Pandas. Slicing and Aggregation. Indexing.
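
The dataframe API follows the same lazy pattern; a sketch (the file glob and column names are hypothetical):

    import dask.dataframe as dd

    df = dd.read_csv("data/records-*.csv")       # one logical frame, many partitions
    subset = df[df.value > 0]                    # slicing builds graph nodes, lazily
    result = subset.groupby("key").value.mean()  # aggregation across partitions
    print(result.compute())                      # triggers the parallel execution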

(20 October) Introduction to Map/Reduce

Lecture 15: The Google Parallel computing environment, functional programming concepts applied to large-scale parallelism, text processing.
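
The functional core of the model fits in a few lines of Python; a toy word count (not the Hadoop version, which comes later):

    from functools import reduce
    from collections import Counter

    docs = ["the quick brown fox", "the lazy dog", "the fox"]

    mapped = map(Counter, (doc.split() for doc in docs))   # map: doc -> word counts
    total = reduce(lambda a, b: a + b, mapped, Counter())  # reduce: merge the counts
    print(total.most_common(2))  # [('the', 3), ('fox', 2)]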

(25 October) Hadoop!

Lecture 16: Hadoop! programming, the WordCount tutorial, and the Hadoop! toolchain.
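
The lecture's WordCount is in Java; the mapper logic, rewritten for Hadoop Streaming in Python, is a useful mental model (a sketch assuming the Streaming interface, not the native Java API):

    import sys

    # mapper.py: emit one (word, 1) pair per line; Hadoop shuffles and sorts by key
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

A matching reducer sums the 1s for each key, relying on the sorting guarantee discussed in the triangle-counting lecture.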

(27 October) Triangle Counting in Hadoop!

Lecture 17: Friends-of-friends running example. The M/R sorting guarantee and combiners.

(1 November) Introduction to Spark

Lecture 18: Spark and Resilient Distributed Datasets.
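
The RDD interface in PySpark, sketched as a word count (illustrative; the lecture's examples may differ):

    from pyspark import SparkContext

    sc = SparkContext("local[4]", "wordcount")          # 4 local workers
    lines = sc.parallelize(["the quick fox", "the lazy dog"])
    counts = (lines.flatMap(lambda line: line.split())  # transformations are lazy
                   .map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.collect())                             # action: runs the job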

(3 November) Roofline

Lecture 19: The roofline performance model and off-chip bandwidth.
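
The model's core identity: with arithmetic intensity I (flops per byte of off-chip traffic), peak compute P_peak, and memory bandwidth B, attainable performance is

    P(I) = \min\bigl(P_{\mathrm{peak}},\; B \cdot I\bigr)

With illustrative numbers B = 100 GB/s and I = 0.25 flops/byte, a kernel is memory-bound at 25 GFLOP/s regardless of peak compute.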

(8 November) Ray: Task Programming with Remote Functions

Lecture 20: Remote functions, distributed objects, and distributed memory management.
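
The core idiom from this lecture in a few lines (a sketch; the function and values are illustrative):

    import ray

    ray.init()

    @ray.remote
    def square(x):
        return x * x                 # executes in a Ray worker process

    refs = [square.remote(i) for i in range(4)]  # returns ObjectRefs immediately
    print(ray.get(refs))             # [0, 1, 4, 9]; blocks until results arrive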

(15 November) BSP, Barriers, and Ray Actors

Lecture 21: Bulk synchronous parallel, barrier synchronization, stateful distributed objects, service centers, ray.get() as a synchronization primitive.
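
A stateful distributed object as a Ray actor, with ray.get() as the synchronization point (a minimal sketch):

    import ray

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            self.n = 0               # state lives inside the actor process

        def incr(self):
            self.n += 1
            return self.n

    c = Counter.remote()
    refs = [c.incr.remote() for _ in range(5)]  # calls run serially on the actor
    print(ray.get(refs))             # [1, 2, 3, 4, 5]; ray.get is the barrier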

(17 November) MPI, Deadlock, and Flynn’s Taxonomy

Lecture 22a/b/c

  • Reading:
    • MPI Tutorial, Lawrence Livermore National Lab
    • Mattson, Patterns for Parallel Programming, Appendix B.
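
The send-send deadlock from this lecture, reproduced with mpi4py (a sketch; message size matters because small sends may complete through eager buffering):

    # deadlock.py -- run with: mpiexec -n 2 python deadlock.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.rank
    other = 1 - rank

    data = [rank] * 1_000_000        # large enough to defeat eager buffering
    comm.send(data, dest=other)      # both ranks block in send: a cyclic wait
    msg = comm.recv(source=other)    # never reached once both sends block
    print(rank, len(msg))

Reversing the send/recv order on one rank, or using comm.sendrecv(), breaks the cycle.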

(29 November) GPU architecture

Lecture 23: The evolution of GPU computing, from graphics pipeline to GPGPU to CUDA. GPU hardware.

(1 December) The Google TPU

Lecture 24.

  • Reading:
    • Jouppi et al., In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA 2017.

(6 December) Top 500 Supercomputers

Lecture 25: slides