- Project 0: AWS Web Server, release 11 February 2018.
- Project 1: OpenMP Filter, release 17 February. Study guide 22 February. Solutions available at Project 1 solutions.
- Project 2: OpenMPI, release 11 March. Solutions available at Project 2 solutions.
- Project 3: MapReduce, release 4 April. Solutions will be available next Monday, 15 April.
For more experience with thread-level parallelism, see the old Project 2 on Java threading, which is good practice.
| Name | Email or ID | Location | Office hours |
|---|---|---|---|
| Daniel Baker | email@example.com | Malone 263/222 | Monday/Wednesday 6-7 |
| Randal Burns | firstname.lastname@example.org | Malone 227 | Thursday 3-4:30 |
| Rohan Tilva | email@example.com | Malone 1 ugrad lab | Tuesday 5-6 |
| Sofya Freyman | firstname.lastname@example.org | Malone 1 ugrad lab | Tuesday 5-6 |
| Will David | wdavid2 | Malone 1 ugrad lab | Tuesday 5-6 |
| James Lubowsky | email@example.com | Malone 1 ugrad lab | Monday 6-7:30 |
| Jason Zhang | firstname.lastname@example.org | Malone 1 ugrad lab | Monday 6-7:30 |
| Joseph Naness | rnaness1 | Malone 1 ugrad lab | Monday 1:30-2:30 |
- Note: All times are P.M.
Schedule, Lectures, and Assignments
The Jupyter notebooks for lectures
(28 and 30 January) Introduction, Moore’s Law, Amdahl’s Law
The impact of multi-core and pipelines on processing power. Moore’s law reloaded. Overview of modern parallel architectures. Comparing concurrency and parallelism.
- Chapter 1, Patterns for Parallel Programming
- Lectures 1-3
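As a rough illustration of Amdahl’s Law from this unit, the sketch below computes the upper bound on speedup when only a fraction of a program can run in parallel. The function name `amdahl_speedup` and the sample fractions are assumptions made for this example, not course material.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel.

    parallel_fraction: share of the serial run time that can be parallelized.
    processors: number of processors applied to the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial share is unchanged; the parallel share shrinks by 1/processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # With 90% parallel code, speedup approaches but never reaches 10x.
    for n in (1, 2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction sets a hard ceiling of 1 / (1 - parallel_fraction), no matter how many processors are added.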
(4 February) Metrics for Parallelism
Factors against parallelism, scaleup and parallel efficiency.
- Chapter 2, Patterns for Parallel Programming
- Lectures 4-5
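The metrics named here can be sketched from measured run times. The definitions below are common textbook forms and are assumed for illustration; the lectures may use slightly different variants, and the function names are hypothetical.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial run time to parallel run time on the same problem."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Speedup per processor; 1.0 means perfectly linear speedup."""
    return speedup(t_serial, t_parallel) / processors


def scaleup(t_base_on_one: float, t_scaled_on_p: float) -> float:
    """Time for a size-N problem on 1 processor divided by the time for a
    size-(p*N) problem on p processors; 1.0 means perfect scaleup."""
    return t_base_on_one / t_scaled_on_p


if __name__ == "__main__":
    # Made-up timings: 100 s serially, 25 s on 8 processors.
    print(speedup(100.0, 25.0))                 # 4.0
    print(parallel_efficiency(100.0, 25.0, 8))  # 0.5
```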
(6 February) OpenMP
An introduction to parallelizing serial programs with compiler directives. Also covers serial equivalence, loop-parallel constructs, and independent loops.
- Appendix A, Patterns for Parallel Programming
- Reference Materials:
- Wikipedia: http://en.wikipedia.org/wiki/OpenMP
- LLNL Tutorial (ignore Fortran stuff): https://computing.llnl.gov/tutorials/openMP/
- Specification (it’s actually really useful): http://www.openmp.org/mp-documents/spec30.pdf
- Performance tutorial (this is good!): http://www.akira.ruc.dk/~keld/teaching/IPDC_f10/Slides/pdf4x/4_Performance.4x.pdf
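OpenMP itself is used from C, C++, or Fortran, so the following is only an analogy in Python, not OpenMP code. It sketches the two ideas named above: a loop whose iterations are independent can be split across workers, and serial equivalence means the parallel version returns the same result as the serial one. The function names and the `workers` parameter are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x: float) -> float:
    """Loop body: depends only on its own input, so iterations are independent."""
    return 2.0 * x + 1.0


def serial_loop(values):
    """Reference serial version of the loop."""
    return [scale(v) for v in values]


def parallel_loop(values, workers: int = 4):
    """Same loop with iterations handed to a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, which keeps the
        # output identical to the serial loop.
        return list(pool.map(scale, values))


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    print(parallel_loop(data) == serial_loop(data))
```

In CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock; the sketch shows the structure of a loop-parallel computation rather than its performance.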
(11 February) Class cancelled (snow)
(13 February) OpenMP Loop Optimization (Lecture 8 and openmp/omp_c/stencil.c)
(18 February) Loop Dependencies and Cache Hierarchy
- Lectures 9 and 10
(20 February) Parallel Architectures and OS Constructs
Flynn’s taxonomy. The differences between message passing and shared memory machines. SPMD. Clusters. Hybrid architectures. The parallel memory model and cache hierarchy.
- Lecture 11
Operating system process abstraction: contexts and context switching, virtual memory, program organization in memory. Operating system threads, thread context, thread memory model.
- Lecture 12
(25 February) Java Threads and Java Synchronize
Asynchrony, waiting on threads, volatile variables, and synchronized functions.
- Lectures 13 and 14
(27 February) Assessment #1 and Introduction to Synchronization
- Lectures 14 and 15
(6 March) Top500 and MPI Introduction
Supercomputers of the world and trends in architecture. MPI Overview, the MPI Runtime Environment.
- MPI Tutorial, Lawrence Livermore National Lab
- Project 2 released.
- Lectures 15, 16, 17.
(11 March) MPI Messaging and Barriers
MPI Synchronous Messaging, Asynchronous I/O, and Barriers.
- MPI Tutorial, Lawrence Livermore National Lab
- Appendix B, Patterns for Parallel Programming
- Code Examples:
- Lectures 18, 19
(13 March) Top500, Priority Inversion, and Deadlock
(18 and 20 March) Spring Break
(25 March) More MPI I/O. Lecture 21.
- Slides:
- Lecture 21 MPI Collective Operations
(27 March) HPC Checkpoints and I/O Performance. Lecture 22.
- Slides:
(1 April) Assessment #2
(3 April) Introduction to Map/Reduce and Hadoop!
The Google Parallel computing environment, functional programming concepts applied to large-scale parallelism, text processing.
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004
- Lectures: 24 and 25
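The functional style behind Map/Reduce can be sketched as a word count in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop code; `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are names chosen for this example.

```python
from collections import defaultdict


def map_fn(line: str):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word: str, counts):
    """Reduce: fold the list of values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map over every line, group by word, then reduce each group."""
    pairs = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat", "The dog the"]))
```

Because `map_fn` looks at one line at a time and `reduce_fn` looks at one key at a time, both steps could be spread across many machines without changing the result.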
(8 April) Map Reduce Systems and Semantics!
Map/Reduce runtime, scalability and reliability, the Google File System, HDFS. Sorting, partitions, and combiners.
- Lectures: 26 and 27
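Partitions and combiners can be illustrated with a small single-process simulation. It assumes each input split is a list of words, uses `zlib.crc32` as a stand-in for a hash partitioner, and names the helpers `combine`, `partition`, and `run_job` for this example only; none of this is the Hadoop API.

```python
import zlib
from collections import Counter


def combine(words):
    """Combiner: pre-aggregate counts on the map side so fewer pairs are shuffled."""
    return Counter(words)


def partition(key: str, num_reducers: int) -> int:
    """Partitioner: send every occurrence of a key to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers: int = 2):
    """Simulate map-side combining, partitioning, sorting, and reducing."""
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        # One (word, partial_count) pair per distinct word in the split.
        for word, count in combine(split).items():
            buckets[partition(word, num_reducers)].append((word, count))
    results = []
    for bucket in buckets:
        totals = Counter()
        # Each reducer sees its pairs sorted by key, as after the shuffle.
        for word, count in sorted(bucket):
            totals[word] += count
        results.append(dict(sorted(totals.items())))
    return results


if __name__ == "__main__":
    print(run_job([["a", "b", "a"], ["b", "a", "c"]]))
```

The combiner is safe here because addition is associative and commutative, so summing partial counts gives the same totals as summing individual ones.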
(10 April) Spark
Spark and RDDs.
- Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012
- Lectures: 28
(15 April) Parallel and Distributed Computing in Python with Dask
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. This talk will provide an overview of how Dask works, then demonstrate Dask in action, showing how it can be used to transparently scale from a single thread of execution on a laptop to many parallel processes running across a cluster. Dask has been used on production systems with over 1000 cores, and works well for computational units in the millisecond range.
Lecturer: Ian Stokes-Rees
Ian Stokes-Rees is a Principal Analytics Consultant at BCG. He is a professional software engineer and computational scientist. Prior to joining BCG, Ian spent 5 years at Anaconda, where he worked as a product manager, engineer, and evangelist for open source Python data science. Ian has a PhD in Particle Physics from Oxford, where he worked on the CERN LHCb computing team to develop the experiment’s distributed computing framework. This is where Ian first started using Python, and he’s been in love with it ever since. Ian hails from Canada and has undergraduate and Master’s degrees in Electrical Engineering from the University of Waterloo.
(17 April) Introduction to CUDA and GPU
(22 April) Assessment #3: Hadoop! and Spark
(24 April) Roofline
- Reading: Understand operational intensity and the memory-limited and processing-limited portions of the chart. This will be on the final as described in class!
- Williams et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, CACM, 52(4), 2009.
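The two regions of the roofline chart can be sketched numerically. In the basic model, attainable performance is the smaller of peak compute throughput and memory bandwidth times operational intensity. The machine numbers below (100 GFLOP/s peak, 20 GB/s bandwidth) are made up for illustration, and the function names are assumptions for this example.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * operational intensity).

    intensity is in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Operational intensity where the sloped and flat parts of the roof meet."""
    return peak_gflops / bandwidth_gbs


def limiting_factor(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Report which part of the chart a kernel falls in."""
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory"
    return "compute"


if __name__ == "__main__":
    peak, bandwidth = 100.0, 20.0  # hypothetical machine
    for oi in (0.5, 5.0, 8.0):
        print(oi, attainable_gflops(oi, peak, bandwidth), limiting_factor(oi, peak, bandwidth))
```

A kernel to the left of the ridge point is memory-limited, so raising its operational intensity (for example, through better cache reuse) raises its bound; a kernel to the right is processing-limited and already capped by peak compute.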