Analyzing 10GB of Yelp Data on AWS EMR

Leveraging PySpark, Python, Spark, SQL, SparkR, R and Bash

Pulled 10GB of Yelp Business data through the terminal via the Kaggle API. The data was then pushed to an AWS S3 bucket for storage and analyzed on an Elastic MapReduce (EMR) cluster in a Jupyter Notebook using PySpark.
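The pull, push, and analyze steps described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the dataset slug, bucket name, S3 prefix, local directory, and JSON file name are all assumptions chosen for the example, and the helper functions are hypothetical. The commands are only printed unless `dry_run` is turned off.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumptions for illustration only (none of these names come from the repository).
KAGGLE_DATASET = "yelp-dataset/yelp-dataset"  # assumed Kaggle dataset slug
S3_BUCKET = "my-yelp-bucket"                  # hypothetical bucket name
LOCAL_DIR = Path("yelp_data")                 # hypothetical local download directory


def kaggle_download_cmd(dataset: str, dest: Path) -> List[str]:
    """Build the Kaggle CLI command that downloads and unzips a dataset."""
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", str(dest), "--unzip"]


def s3_upload_cmd(src: Path, bucket: str, prefix: str = "raw") -> List[str]:
    """Build the AWS CLI command that copies a local directory to an S3 prefix."""
    return ["aws", "s3", "cp", str(src), f"s3://{bucket}/{prefix}/", "--recursive"]


def load_business(spark, bucket: str, prefix: str = "raw"):
    """Read the business JSON file from S3 into a Spark DataFrame.

    Meant to be called from the EMR notebook; the file name is an assumption.
    """
    path = f"s3://{bucket}/{prefix}/yelp_academic_dataset_business.json"
    return spark.read.json(path)


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(
        [
            kaggle_download_cmd(KAGGLE_DATASET, LOCAL_DIR),
            s3_upload_cmd(LOCAL_DIR, S3_BUCKET),
        ],
        dry_run=True,
    )
```

In a notebook attached to the cluster with a PySpark kernel, a `spark` session is normally already available, so something like `load_business(spark, S3_BUCKET).printSchema()` would be a reasonable first check that the upload is readable.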

assets
Pyspark10GB.ipynb
README.md
_config.yml

AWS Cluster Configuration

AWS Notebook Configuration

BenitaDiop/FullStackBigData-with-SPARK
