
Pulled 10GB of Yelp business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed with PySpark in a Jupyter Notebook running on an Amazon Elastic MapReduce (EMR) cluster.
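A minimal sketch of the download-and-upload steps, written in Python so the commands are easy to script and review. It only assembles the shell commands. The dataset slug `yelp-dataset/yelp-dataset`, the local directory `data/yelp`, and the bucket `my-yelp-bucket` are assumptions for illustration, not values taken from this repository. It also assumes the Kaggle CLI and the AWS CLI are installed and authenticated; check `kaggle datasets download -h` for the flags your CLI version accepts.

```python
import shlex
from pathlib import Path

# Hypothetical values for illustration; replace with your own dataset, path, and bucket
DATASET_SLUG = "yelp-dataset/yelp-dataset"
LOCAL_DIR = Path("data/yelp")
S3_URI = "s3://my-yelp-bucket/raw/"


def ingest_commands(slug: str, local_dir: Path, s3_uri: str) -> list[list[str]]:
    """Return the download and upload commands, in the order they should run."""
    return [
        # Kaggle CLI: download the dataset archive into local_dir and unzip it there
        ["kaggle", "datasets", "download", "-d", slug, "-p", str(local_dir), "--unzip"],
        # AWS CLI: copy every extracted file under local_dir into the S3 prefix
        ["aws", "s3", "cp", str(local_dir), s3_uri, "--recursive"],
    ]


if __name__ == "__main__":
    for cmd in ingest_commands(DATASET_SLUG, LOCAL_DIR, S3_URI):
        # Print instead of executing so the commands can be reviewed first
        print(shlex.join(cmd))
```

Replacing the `print` with `subprocess.run(cmd, check=True)` would execute the commands and stop at the first one that fails, so the upload never runs against a partial download.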


BenitaDiop/FullStackBigData-with-SPARK


Analyzing 10GB of Yelp Data on AWS EMR

Leveraging PySpark, Python, Spark, SQL, SparkR, R, and Bash



AWS Cluster Configuration

(Screenshot: cluster configuration)

AWS Notebook Configuration

(Screenshot: notebook configuration)
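A hedged sketch of the kind of PySpark analysis a notebook on this cluster could run; it is illustrative, not the original notebook's code. It assumes the business file was uploaded as newline-delimited JSON to the hypothetical S3 path below, and that its records include `state` and `stars` fields, as in the public Yelp Open Dataset business file. `spark` is the SparkSession that the EMR notebook's PySpark kernel provides, so nothing from PySpark is imported here.

```python
# Hypothetical S3 location; point this at wherever the Kaggle files were uploaded
BUSINESS_PATH = "s3://my-yelp-bucket/raw/yelp_academic_dataset_business.json"


def top_states_query(view: str, limit: int = 10) -> str:
    """Build Spark SQL ranking states by business count, with mean star rating.

    Assumes the view has `state` and `stars` columns (Yelp business schema).
    """
    return (
        "SELECT state, COUNT(*) AS n_businesses, ROUND(AVG(stars), 2) AS avg_stars "
        f"FROM {view} GROUP BY state ORDER BY n_businesses DESC LIMIT {limit}"
    )


def analyze(spark, path: str = BUSINESS_PATH, limit: int = 10):
    """Load the business records from S3 and run the ranking query.

    `spark` is the SparkSession exposed by the EMR notebook's PySpark kernel.
    Returns a Spark DataFrame; the aggregation runs when an action such as
    .show() is called on it.
    """
    # spark.read.json expects one JSON object per line by default
    businesses = spark.read.json(path)
    # Register a temporary view so the data can be queried with Spark SQL
    businesses.createOrReplaceTempView("business")
    return spark.sql(top_states_query("business", limit))
```

In a notebook cell, `analyze(spark).show()` would print the top ten states. Expressing the aggregation as Spark SQL over a temporary view matches the SQL entry in the tool list; the same result could be built with `groupBy("state").agg(...)` on the DataFrame API.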
