Skip to content

This project is a powerful Streamlit application designed to provide users with seamless access and analysis of data from multiple YouTube channels. This intuitive tool leverages the Google API to retrieve a comprehensive range of information, including channel details, video statistics, and viewer engagement metrics.

Notifications You must be signed in to change notification settings

pillowTree3/YouTube-Data-Harvesting-and-Warehousing-using-SQL-MongoDB-and-Streamlit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

YouTube-Data-Harvesting-and-Warehousing-using-SQL-MongoDB-and-Streamlit

Problem Statement

The task is to build a Streamlit app that permits users to analyze data from multiple YouTube channels. Users can input a YouTube channel ID to access data like channel information, video details, and user engagement. The app should facilitate storing the data in a MongoDB database and allow users to collect data from up to 10 different channels. Additionally, it should offer the capability to migrate selected channel data from the data lake to a SQL database for further analysis. The app should enable searching and retrieval of data from the SQL database, including advanced options like joining tables for comprehensive channel information.

Technology Stack Used

  1. Python
  2. MySQL
  3. MongoDB
  4. Google Client Library

Approach

  1. Start by setting up a Streamlit application using the python library "streamlit", which provides an easy-to-use interface for users to enter a YouTube channel ID, view channel details, and select channels to migrate.
  2. Establish a connection to the YouTube API V3, which allows me to retrieve channel and video data by utilizing the Google API client library for Python.
  3. Store the retrieved data in a MongoDB data lake, as MongoDB is a suitable choice for handling unstructured and semi-structured data. This is done by firstly writing a method to retrieve the previously called api call and storing the same data in the database in 3 different collections.
  4. Transferring the collected data from multiple channels namely the channels,videos and comments to a SQL data warehouse, utilizing a SQL database like MySQL or PostgreSQL for this purpose.
  5. Utilize SQL queries to join tables within the SQL data warehouse and retrieve specific channel data based on user input. For that the SQL table previously made has to be properly given the the foreign and the primary key.
  6. The retrieved data is displayed within the Streamlit application, leveraging Streamlit's data visualization capabilities to create charts and graphs for users to analyze the data.

About

This project is a powerful Streamlit application designed to provide users with seamless access and analysis of data from multiple YouTube channels. This intuitive tool leverages the Google API to retrieve a comprehensive range of information, including channel details, video statistics, and viewer engagement metrics.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

-