Blog: AWS Glue PySpark Locally

Posted on Sat 16 May 2020 in blogs

Download and install Maven

  1. Download Maven from https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz
  2. Untar the archive and move its contents to a folder of your choice, for example: mv apache-maven-3.6.0 {HOME}/Documents/opt/apache-maven
  3. Add mvn to your PATH:

    echo 'export PATH=$PATH:/Users/bhavintandel/Documents/opt/apache-maven/bin' >> ~/.profile

  4. Restart your shell session so the updated PATH takes effect
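
Running the echo command from step 3 more than once leaves duplicate export lines in ~/.profile. A minimal sketch of one way to avoid that is shown here; add_to_profile_path is a hypothetical helper name introduced for illustration, and the default profile location is an assumption about your shell setup.

```shell
# Append "export PATH=$PATH:<bin_dir>" to a profile file, but only if that
# exact line is not already present.
# add_to_profile_path is a hypothetical helper, not part of Maven or AWS Glue.
add_to_profile_path() {
    local bin_dir="$1"
    local profile="${2:-$HOME/.profile}"   # assumed default profile location
    local line="export PATH=\$PATH:${bin_dir}"

    # Create the profile if it does not exist so grep has a file to read.
    touch "$profile"

    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example call (commented out so sourcing this snippet has no side effects):
# add_to_profile_path "$HOME/Documents/opt/apache-maven/bin"
```

After restarting the session, `mvn -v` should print the Maven version if the PATH entry was picked up.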

Download the Spark distribution

At the moment, AWS provides two Glue Spark executables.

Download the aws-glue-libs

AWS has two versions of aws-glue-libs:

  • 0.9 -> supports Python 2 -> git@github.com:awslabs/aws-glue-libs.git
  • 1.0 -> supports Python 3 -> git@github.com:awslabs/aws-glue-libs.git

  • Clone the aws-glue-libs repo. To check out a specific branch: git clone -b {branch-name} git@github.com:awslabs/aws-glue-libs.git

  • Run gluepyspark

    cd aws-glue-libs
    ./bin/gluepyspark
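
Before launching gluepyspark, it can help to confirm that the tools installed in the earlier steps are actually on your PATH. The following is a rough sketch of such a check; require_cmds is a hypothetical helper name, and the list of commands to check (mvn and git here) is an assumption based on the steps in this post rather than an official list of requirements.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 if any is missing.
# require_cmds is a hypothetical helper, not part of aws-glue-libs.
require_cmds() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (commented out so sourcing this snippet has no side effects):
# require_cmds mvn git && cd aws-glue-libs && ./bin/gluepyspark
```

If the check reports a missing command, revisit the corresponding install step and restart the session before retrying.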

Configure PyCharm for PySpark development

  1. Install pyspark as a Python package