Lectures
Lectures are designed for synchronous delivery. The recorded version is not expected to be an adequate substitute for attending.
Recorded lectures are available on Panopto through Blackboard. Log in to Blackboard; lectures are named LecXX.[Topic].[Date].mp4. Although Panopto is inconvenient, it is the only way to restrict access to lectures to enrolled students.
Jupyter notebooks for lectures are available on GitHub (https://github.com/randalburns/pplectures2021). You are encouraged to clone this repo, pull before each lecture, and run the notebooks.
Projects
- Project 1: OpenMP Filter
- Homework 1.5: All Possible Regressions in Python (due October 1, 2021 5:00 pm EDT)
- complete the notebook in pplectures2021/homework/HW1.5.ipynb
- in this way you will have the dataset and the environment
- Project 2: Java BlockingQueue (due October 15, 2021 5:00 pm EDT)
- Project 3: dask notebooks (due November 2, 2021 5:00 pm EDT)
- Project 4: k-means in Spark
- Project 5: Ray Deadlock (due December 6, 2021 5:00 pm EDT)
- submissions may be turned in as late as December 8, 2021, 5:00 pm with no late penalty
Midterm
- Due October 22, 2021, 5:00 pm EDT. Submit as a PDF.
- The midterm is an open-Internet, take-home exam. It will be distributed on Wednesday, October 20, 2021 at 5:30 pm.
- Any updates, errata, or corrections will be posted on this webpage. I will announce changes on Piazza.
Final
- The Final Exam was released on December 13, 2021 at 2:00 pm.
- DUE Tuesday, December 21, 2021, 11:59 am.
- Early submissions are strongly encouraged.
Late Hours
A total of 48 late hours are permitted per semester to use as needed for projects. Late hours will be rounded up to the nearest hour, e.g. a project submitted 2.5 hours late will count as using 3 late hours.
Course Schedule
(30 August) Introduction to Parallel Programming
Syllabus review. Academic ethics. Parallelism in modern computer architectures. Performance beyond computational complexity. An introduction to course tools for lectures and homework: conda/python/github/jupyter.
- Reading:
- Mattson, Patterns for Parallel Programming, Chapter 1.
(30 August) A First Parallel Program
Parallelization with joblib in Python. The Global Interpreter Lock. Python packages for data science. Performance timing.
- Reading:
- Matloff, Parallel Computing for Data Science, Chapter 1.
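For reference, a minimal sketch of the kind of joblib parallelism covered in this lecture (not taken from the lecture notebooks): process-based workers sidestep the Global Interpreter Lock for CPU-bound Python work, and simple wall-clock timing shows the speedup. The function and task sizes are made up for illustration.

```python
# A minimal joblib sketch: run an embarrassingly parallel loop serially and
# then with 4 worker processes, timing both.
import math
import time
from joblib import Parallel, delayed

def slow_sqrt_sum(n):
    # CPU-bound pure-Python work; threads would be serialized by the GIL,
    # so joblib's default process-based (loky) backend is used instead.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    tasks = [2_000_000] * 8

    start = time.perf_counter()
    serial = [slow_sqrt_sum(n) for n in tasks]
    print(f"serial:   {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    parallel = Parallel(n_jobs=4)(delayed(slow_sqrt_sum)(n) for n in tasks)
    print(f"parallel: {time.perf_counter() - start:.2f} s")
```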
(6 September) Amdahl’s Law, Strong Scaling, and Parallel Efficiency
Amdahl’s law is the fundamental principle behind strong scaling in parallel computing. Strong scaling is the process of solving a problem of fixed size faster with parallel resources.
- Reading:
- Mattson, Patterns for Parallel Programming, Ch. 2.4-2.6
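A small worked example of Amdahl's law (not course-provided code): with serial fraction f, the speedup on p processors is bounded by S(p) = 1 / (f + (1 - f)/p). The 5% serial fraction below is an arbitrary illustration.

```python
# Amdahl's law: upper bound on speedup for serial fraction f on p processors.
def amdahl_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 4, 8, 16, 1024):
    s = amdahl_speedup(0.05, p)          # assume 5% of the work is serial
    print(f"p={p:5d}  speedup={s:6.2f}  parallel efficiency={s / p:.2%}")
# Even with unlimited processors the speedup approaches 1/f = 20x,
# and parallel efficiency falls as processors are added.
```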
(8 September) OpenMP
Lecture 4: An introduction to parallelizing serial programs with compiler directives. Serial equivalence and loop-parallel constructs.
- Reading:
- Mattson, Patterns for Parallel Programming, Appendix A.
- Reference Materials:
- LLNL Tutorial (ignore Fortran stuff): https://computing.llnl.gov/tutorials/openMP/
- Specification (it’s actually really useful): http://www.openmp.org/mp-documents/spec30.pdf
(15 September) Cache Hierarchy
Lecture 5: Memory hierarchy and latency. Caching concepts: size, lines, associativity, and inclusion/exclusion. Caching microbenchmarks.
- Reading:
- Dongarra et al. Accurate Cache and TLB Characterization Using Hardware Counters. https://link.springer.com/content/pdf/10.1007/978-3-540-24688-6_57.pdf
- Please read to comprehend Figures 1, 2, and 3. Figures 4, 5, and 6 and the second microbenchmark (papi cacheBlock) are less important and difficult to understand.
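For a rough feel for the microbenchmarking idea, here is a small, hedged Python/NumPy sketch (the lecture's microbenchmarks use hardware counters and compiled code, so this only approximates the effect): reading a fixed number of elements at growing strides makes each access touch a new cache line once the stride exceeds the 64-byte line size.

```python
# Stride microbenchmark sketch: same number of element reads at each stride,
# so rising time per element reflects cache-line and memory behavior.
import time
import numpy as np

N_ACCESSES = 1 << 20                       # elements read at every stride
for stride in (1, 2, 4, 8, 16, 32):
    x = np.ones(N_ACCESSES * stride, dtype=np.float64)
    x[::stride].sum()                      # warm-up pass
    t0 = time.perf_counter()
    x[::stride].sum()                      # strided view, no copy
    dt = time.perf_counter() - t0
    print(f"stride {stride:3d} ({stride * 8:4d} bytes): "
          f"{dt / N_ACCESSES * 1e9:6.2f} ns/element")
```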
(20 September) Loop Optimization
Lecture 6: Loop Optimizations
- Reading:
- Performance tutorial (this is good!): http://www.akira.ruc.dk/~keld/teaching/IPDC_f10/Slides/pdf4x/4_Performance.4x.pdf
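A minimal illustration of why loop order matters (a NumPy sketch, not from the lecture materials): traversing a row-major 2D array along rows gives unit-stride, cache-friendly access, while traversing it along columns strides across the whole row length on every element. Array sizes are arbitrary.

```python
# Loop-interchange sketch: row-major vs. column-major traversal of a C-ordered array.
import time
import numpy as np

a = np.random.rand(4000, 4000)             # C (row-major) order: rows are contiguous

t0 = time.perf_counter()
total = 0.0
for i in range(a.shape[0]):
    total += a[i, :].sum()                 # unit-stride access within each row
row_time = time.perf_counter() - t0

t0 = time.perf_counter()
total = 0.0
for j in range(a.shape[1]):
    total += a[:, j].sum()                 # strided access: one element per cache line
col_time = time.perf_counter() - t0

print(f"row-major traversal:    {row_time:.3f} s")
print(f"column-major traversal: {col_time:.3f} s")
```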
(22 September) Moore’s Law and Factors Against Parallelism
Lecture 7: Startup, interference, and skew.
- Reading:
- Chapter 2, Patterns for Parallel Programming
(27 September) Vector Processing and Processor Intrinsics
Guest lecturer: Brian Wheatman
(29 September) JIT Compilation, Moore’s Law, Parallel Efficiency
Lecture 9: A potpourri of stuff that I have not gotten to.
(4 October) Processes, Threads, and Java Threads
Lecture 10a: Java Threads
(6 October) Java Concurrency Control
Asynchrony, waiting on threads, volatile variables, and synchronized functions.
- Reading:
- Appendix C: Patterns for Parallel Programming
(11 October) Mutual Exclusion
Lecture 12: Critical sections and fast mutual exclusion.
- Reading:
- Chapters 1 and 2-2.6: Herlihy and Shavit
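The lecture's examples are in Java, but the critical-section idea carries over directly; a minimal Python analogue (illustrative only) guards a shared counter with a lock so that only one thread at a time executes the read-modify-write.

```python
# Critical-section sketch in Python: without the lock, the read-modify-write
# on `counter` can interleave across threads and the final count is not guaranteed.
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:                 # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                     # 400000 with the lock in place
```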
NO MATERIAL past this point is on the midterm
(13 October) Dask
Lecture 13: Dask Arrays. Data parallel and declarative programming. Execution graphs and lazy evaluation.
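A minimal dask.array sketch of the lazy, declarative style discussed here (not from the lecture notebooks; array sizes and chunking are arbitrary): each expression extends a task graph, and nothing executes until .compute().

```python
# Dask array sketch: chunked array, lazy expression graph, explicit compute().
import dask.array as da

x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))   # 10x10 chunks
y = (x + x.T).mean(axis=0)     # still lazy: a graph of per-chunk tasks
print(y)                        # prints the symbolic array, not values
result = y.compute()            # executes the graph in parallel
print(result.shape)             # (20000,)
```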
(18 October) Dask Dataframes
Lecture 14: Parallel Pandas. Slicing and Aggregation. Indexing.
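A minimal dask.dataframe sketch of parallel pandas-style slicing and aggregation (the CSV path and column names below are hypothetical): operations are lazy and run one task per partition when computed.

```python
# Dask dataframe sketch: pandas-like filtering, groupby, and aggregation
# evaluated lazily across partitions.
import dask.dataframe as dd

df = dd.read_csv("data/trips-*.csv")          # hypothetical files; one partition per file
fast = df[df.duration < 600]                  # filtering is lazy
by_day = fast.groupby("day").duration.mean()  # aggregation is lazy too
print(by_day.compute())                       # triggers the parallel execution
```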
(20 October) Introduction to Map/Reduce
Lecture 15: The Google Parallel computing environment, functional programming concepts applied to large-scale parallelism, text processing.
- Reading:
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004
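To make the map/shuffle/reduce phases concrete, here is a toy word count in plain Python (illustrative only; real MapReduce distributes each phase across a cluster and sorts by key during the shuffle).

```python
# Toy word count showing the three MapReduce phases in one process.
from collections import defaultdict

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Map: emit (word, 1) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group intermediate values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine the values for each key.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)   # {'the': 3, 'quick': 2, 'brown': 1, ...}
```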
(25 October) Hadoop!
Lecture 16: Hadoop! programming, the WordCount tutorial, and the Hadoop! toolchain.
(11 November) Triangle Counting in Hadoop!
Lecture 17: Friends-of-friends running example. The M/R sorting guarantee and combiners.
(1 November) Introduction to Spark
Lecture 18: Spark and Resilient Distributed Datasets.
- Reading:
- Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012
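A minimal PySpark sketch of the RDD model (the input file name is a placeholder): transformations are lazy and build a lineage graph, so lost partitions can be recomputed; an action such as take() triggers the job.

```python
# PySpark RDD word count: lazy transformations, one action at the end.
from pyspark import SparkContext

sc = SparkContext("local[4]", "wordcount")      # 4 local worker threads
counts = (sc.textFile("input.txt")              # placeholder input path
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))   # still lazy: no job has run yet
print(counts.take(10))                          # take() is an action; it runs the job
sc.stop()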
(3 November) Roofline
Lecture 19: The roofline performance model and off-chip bandwidth.
- Reading: Understand operational intensity and the memory-limited and processing-limited portions of the chart. This will be on the final, as described in class!
- Williams et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, CACM, 52(4), 2009.
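A back-of-the-envelope roofline calculation (the peak compute and bandwidth numbers below are made-up machine parameters): attainable performance is the minimum of peak compute and bandwidth times operational intensity.

```python
# Roofline sketch: attainable GFLOP/s as a function of operational intensity.
PEAK_GFLOPS = 500.0      # assumed peak floating-point rate
PEAK_GBS = 50.0          # assumed peak off-chip bandwidth (GB/s)

def attainable_gflops(oi):
    """oi = operational intensity, in flops per byte moved from DRAM."""
    return min(PEAK_GFLOPS, PEAK_GBS * oi)

for oi in (0.25, 1.0, 4.0, 10.0, 16.0):
    print(f"OI = {oi:5.2f} flop/byte -> {attainable_gflops(oi):7.1f} GFLOP/s")
# The ridge point is PEAK_GFLOPS / PEAK_GBS = 10 flop/byte: kernels below it
# are memory-bound, kernels above it are compute-bound.
```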
(8 November) Ray: Task Programming with Remote Functions
Lecture 20: Remote functions, distributed objects, distributed memory management
- Reading: P. Moritz et al. Ray: A Distributed Framework for Emerging AI Applications. OSDI, 2018.
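A minimal Ray sketch of remote functions (illustrative, assuming a local Ray installation): @ray.remote turns a function into a task that runs on a worker and immediately returns an object reference, and ray.get() blocks to retrieve the results.

```python
# Ray remote-function sketch: launch tasks asynchronously, then gather results.
import ray

ray.init()                       # starts a local Ray cluster

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(8)]   # 8 tasks launched, none blocking
print(ray.get(futures))          # blocks until all results arrive: [0, 1, 4, ...]
ray.shutdown()
```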
(15 November) BSP, Barriers, and Ray Actors
Lecture 21: Bulk synchronous parallel, barrier synchronization, stateful distributed objects, service centers, and ray.get() as a synchronization primitive.
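A minimal Ray actor sketch (illustrative only): an actor holds state across method calls, and calling ray.get() on a batch of futures acts as a synchronization point, much like the barrier at the end of a BSP superstep.

```python
# Ray actor sketch: stateful remote objects plus ray.get() as a barrier.
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0
    def add(self, x):
        self.value += x
        return self.value

counters = [Counter.remote() for _ in range(4)]
# One "superstep": every actor does its work; ray.get() waits for all of them.
results = ray.get([c.add.remote(10) for c in counters])
print(results)                   # [10, 10, 10, 10]
ray.shutdown()
```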
(17 November) MPI, Deadlock, and Flynn's Taxonomy
Lecture 22a/b/c
- Reading:
- MPI Tutorial, Lawrence Livermore National Lab
- Mattson, Appendix B, Patterns for Parallel Programming.
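The classic MPI deadlock pattern can be sketched with mpi4py (illustrative only; run with something like mpirun -n 2): if both ranks post a blocking send before any receive and the messages are too large to buffer, each rank waits on the other forever. Ordering the calls so one rank receives first breaks the cycle.

```python
# mpi4py sketch of a safe two-rank exchange; the deadlocking variant is noted below.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
other = 1 - rank
data = np.ones(1_000_000, dtype=np.float64)
recv = np.empty_like(data)

if rank == 0:
    comm.Send(data, dest=other)      # safe ordering: rank 0 sends first...
    comm.Recv(recv, source=other)
else:
    comm.Recv(recv, source=other)    # ...while rank 1 receives first
    comm.Send(data, dest=other)
# Deadlock variant: both ranks call Send() then Recv() with messages too large
# for the MPI implementation to buffer internally.
```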
(29 November) GPU architecture
Lecture 23: The evolution of GPU computing, from graphics pipeline to GPGPU to CUDA. GPU hardware.
- Cool blog post about GPUs in deep learning. https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
(1 December) The Google TPU
Lecture 24
- Reading:
- Jouppi et al. In-Datacenter Performance Analysis of a Tensor Processing Unit. ISCA, 2017.
(6 December) Top 500 Supercomputers
Lecture 25: slides