Assignment #2: Measuring Parallel Performance

Due date: Thursday 8 March 2018, 11:59:00 pm

In this assignment, you will write parallel programs using threads in Java. Follow the guidelines for creating concrete Runnable objects in Appendix C of Patterns for Parallel Programming. There will be few choices in the design of an appropriate parallel algorithm. The point of this exercise is to explore scaling and speedup on multi-core processors.

The majority of your grade (~50%) will be derived from your answers to the questions. These should be concise, thoughtful, and specific. You will turn in both a PDF document that answers the questions below and your source code via Gradescope.

Part 0: What You Need to Know

Amazon EC2

You will run your programs on Amazon EC2 using a c5.4xlarge, High-CPU Extra Large Instance. These machines have 16 virtual cores and will allow you to explore speedup to that scale in multicore systems.

You should develop your code on your personal computer (or a t2.micro free tier instance) and only use EC2 for speedup testing.

vCPUs: See this excellent blog post for a description of vCPUs. The following quote is particularly revealing…

It turns out that a vCPU does not guarantee to you anything more than a single CPU thread.

What this means is a vCPU could be an entire CPU core… or you could be getting a single thread on a hyper-threaded core that you’re sharing with someone else (or, probably much more likely, with yourself, e.g. one Xeon core with HT enabled counts as two vCPUs). Depending on the workload you’re running, you might not see the performance you would expect from doubling the number of vCPUs.


Submit code to the coding assignment on Gradescope and the writeup to the writeup assignment there.

PDF Document: A PDF document that answers the analysis questions listed below. Include your name, the date, and your section number. This document must be a PDF. Google Docs is typically quite good at converting other file formats to PDF.

Source Code: Upload two Java files to Gradescope. Your coin flipper should be named, and your brute force DES key cracking program should be named

Compilation: The easiest way to compile your code is to use javac:

java CoinFlip

Part 1: Parallel Coin Flipping

Write a java program that flips coins in parallel. The program can be invoked from the shell (command line) and should take two arguments: the number of threads to start and the number of coin flips. The program should output the number of heads (or tails), the number of total coin flips, and the elapsed (wall clock) time to run the program. For example, my implementation does the following:

<randal@rio:/Users/randal/class/420> java CoinFlip
Usage: CoinFlip #threads #iterations
<randal@rio:/Users/randal/class/420> java CoinFlip 1 10000000
5000728 heads in 10000000 coin tosses
Elapsed time: 525ms
<randal@rio:/Users/randal/class/420> java CoinFlip 2 10000000
5001027 heads in 10000000 coin tosses
Elapsed time: 267ms

For this experiment you will need to use some java functions. Specifically, System.currentTimeMillis() and Random.generate() and Random.nextInt(). The program I give you in part 2 has examples. You will also need to join() all your child threads from the parent thread. This is the most basic form of synchronization.

Random(): It’s very important to not share your Random() number generator between threads. Instead, create Random() number generators in the constructor of your object. You will have to work out why in first question below…

Also, if you create a Random number generator (Random rand = new Random();) in your run method, you will get strange behavior. Make sure you have a single Random object for each thread, and make sure that object is created somewhere besides the run method (like the constructor).

Part 1: Coin Flipping Analysis

For reasonable parameters, measure the speed-up of this program from 1 thread to twice as many threads as cores (vCPUs in AWS). Your experiment should take seconds (so that it’s a significant number of trials) but not hours. For example, for 16 cores, you might run the program for 1,2,4,…,16,32 threads at 1000000000 coin tosses for speedup. (Don’t worry about +/- rounding errors for number of flips per thread.)

  1. Factors Against Parallelism

    a. Below we have provided the source for a common implementation of the next method of the Random object. The next method is always used to generate random objects, regardless of type (float, int, bool, etc). The nextInt method, for example, calls next with a bit value of 32.

     protected synchronized int next(int bits)
         seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
         return (int) (seed >>> (48 - bits));

    Given the source code above, imagine that you created a single Random object shared across each of your threads.

    a. What type of speedup (e.g. linear, sub-linear, constant) would you expect to see?

    b. What factor against parallelism does this demonstrate?

  2. Scaling concepts (please clearly label all axes and include units and scale)

    a. Produce (i) a speedup plot and (ii) characterize it in terms of linearity.

    b. Produce a (i) parallel efficiency plot and (ii) characterize it in terms of linearity.

Part 2: Brute Force a DES Key

In this step, you will implement a parallel/threaded Java program that attempts to brute force a DES key. Because we do not have reasonable computing power to attack 56 bit keys, we are going to attack smaller keys. I also wanted to keep it simple. So, rather than giving you an already encrypted message, we are going to have your program generate the ciphertext and then brute force the decryption.

The attack is a combination brute force + known plaintext attack. If you the attacker know part of the message, you can attempt to decrypt with all possible keys and check to see if the decrypted output matches the know plaintext.

The assignment starts with my serial implementation of a brute force attack:

Parallelize this program so that it will search for a specified number of keys using a specified number of threads. Your program must also accept a file argument from which text should be read, encyrpted then cracked. In this example we use input.txt. My implementation does the following:

<randal@rio:/Users/randal/class/420> java BruteForceDES 2 20 input.txt
Usage: java BruteForceDES #threads key_size_in_bits <filename>
<randal@rio:/Users/randal/class/420> java BruteForceDES 2 20 input.txt
Generated secret key 798711
Thread 0 Searched key number 0 at 0 milliseconds.
Thread 1 Searched key number 600000 at 1803 milliseconds.
Thread 0 Searched key number 100000 at 2204 milliseconds.
Thread 1 Searched key number 700000 at 3384 milliseconds.
Thread 0 Searched key number 200000 at 3783 milliseconds.
Thread 1 found decrypt key 798711 producing message: Johns Hopkins afraid of the big bad wolf?
Thread 1 Searched key number 800000 at 4891 milliseconds.
Thread 0 Searched key number 300000 at 5292 milliseconds.
Thread 1 Searched key number 900000 at 6426 milliseconds.
Thread 0 Searched key number 400000 at 6801 milliseconds.
Thread 1 Searched key number 1000000 at 7925 milliseconds.
Thread 0 Searched key number 500000 at 8309 milliseconds.
Completed search of 1048576 keys at 14614 milliseconds.

You should make sure that your threads have no shared state, variables, or objects aside from a shared SealedObject containing the encrypted String. For example, make sure to create a separate SealedDES object for decryption in each thread. Do not use a single encryption/decryption object shared among all threads.

You should only encrypt the plaintext once and distribute the encrypted object to all your threads.. This is similar to the building of the filter / data arrays in project 1 (which were never included in the timing for your experiments).

Autograder Code Assignment

The autograder is a preliminary evaluation step. Every submission will be looked at by a course assistant. Note that if your code fails the autograder you will receive 0 points without exception.

Create a compressed arvix out of your code named project2.tar.gz. This can be done as follows for example:

tar cfvz project2.tar.gz


  • The autograder is not “dummy-proof” please follow instructions carefully (triple-check before asking on piazza).
  • If you don’t provide a Makefile one is created that assumes your tar includes and If you would like to compile your code in a custom way provide a Makefile.
  • Code is compiled using the command make
  • If you tar a directory containing code rather than the code alone the autograder will fail to find your code. Follow the example above.
  • Cleanup debugging output to stdout/stderr i.e., System.*.print* outputs and only include what is requested.

Scrict output format should be only the following, given an example command of java CoinFlip 2 10000000


5000728 heads in 10000000 coin tosses
Elapsed time: 525ms

Scrict output format should be only the following, given an example command of java BruteForceDES 2 20 input.txt


Decrypted string: "Johns Hopkins afraid of the big bad wolf?"
Keys searched: 1048576 in 1690ms