Installing the Required Software Packages
- Oracle VirtualBox
- Vagrant (automatic VM configuration)
Note: If you already have either software package installed, make sure that the versions are VirtualBox 4.3.28 (or later) and Vagrant 1.7.2 (or later).
Vagrant: managing the VM
Vagrant is a tool for managing an environment built from virtual machines. In CS190, it is used to build the environment for learning Spark.
The Vagrantfile is a configuration file, written in Ruby, that tells Vagrant how to build the VM. The first time you use it, Vagrant downloads the sparkvm from the internet to your disk and builds the development environment from it.
Using pyspark in the IPython notebook
Once the Spark development environment is built, open a browser (Safari, for example) and go to http://localhost:8001
Click the ‘upload’ button and upload ML_lab1_review_student.ipynb; you can then answer the problems in this notebook. The lab1 notebook reviews what CS190 needs: linear algebra (the mathematical foundation), numpy, and Python lambda expressions.
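As a rough illustration of the kind of material the review covers, here is a minimal sketch using plain numpy and a couple of lambda expressions. The variable names and values are made up for this example and are not taken from the lab notebook.

```python
import numpy as np

# Two small example vectors (values chosen only for illustration).
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

scaled = 2 * u          # scalar multiplication: each entry is doubled
elementwise = u * v     # element-wise product, not a dot product
dot = u.dot(v)          # dot product: 1*4 + 2*5 + 3*6

# A lambda is an anonymous one-expression function.
add_one = lambda x: x + 1

# Lambdas are often passed to functions such as filter.
evens = list(filter(lambda x: x % 2 == 0, range(10)))

print(scaled, elementwise, dot)
print(add_one(1), evens)
```

The same lambda style is what you later pass to Spark transformations, which is presumably why the lab reviews it.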
I already knew the basics of numpy and Python lambda expressions, so the key new piece for me was DenseVector. It is a class in pyspark.mllib.linalg that stores arrays of values for use in pyspark. Note that DenseVector stores all values as np.float64.
That completes the content for week 1. Next, we will study programming in Spark using pyspark. Good luck!