Activity 5: k-means in Spark

A notebook containing the assignment is available here.

Make a copy of the notebook in your Google Drive, then complete the assignment by writing code wherever a notebook cell contains a TODO or ... placeholder.

The original notebook (read-only sharing link) includes sample output from its cells, which you can use as a reference. Additionally, I have provided a Python-only (no Spark) implementation of the same functionality for your reference here.
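As a quick refresher on what the algorithm does, the following is a minimal plain-Python sketch of k-means (Lloyd's iterations). It is not the reference implementation linked above, nor the notebook solution. The function names `closest` and `kmeans`, the choice of the first k points as initial centroids, and the toy data are all assumptions made for illustration.

```python
def closest(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=10):
    """Run a fixed number of assign/update rounds and return the final centroids."""
    # Assumption: initialise with the first k points (real code often samples randomly).
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = kmeans(points, k=2)
print(centroids)
```

In the Spark version, the assignment step maps naturally onto a per-point transformation and the update step onto a per-cluster aggregation, but how the notebook structures this is up to its own TODO instructions.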

Turning it in:

Submit a PDF of the notebook, including your answers to the questions, to Gradescope. Also create a sharing link for your notebook and include that link in the PDF.

Due date: Wednesday, November 30, 2022, 11:59:59 pm EST.