"Por Favor No Se Enoje" is a popular Guatemalan radio program known for its engaging discussions on political issues in Guatemala and neighboring countries. This project aims to analyze the sentiment expressed in YouTube comments related to this program, providing insights into public opinion. To achieve this, we will utilize the powerful Gemini model, a large language model developed by Google AI.
This project is part of the 5-Day Gen AI Intensive Course with Google.
Dataset and EDA
The dataset for this project was collected using the YouTube Data API. The process involved identifying relevant videos, retrieving comments, and storing them in a structured format. You can find a detailed explanation of the data extraction process in the accompanying Jupyter Notebook. The resulting dataset is publicly available on Kaggle datasets.
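As a rough illustration of the "structured format" step, the sketch below flattens one page of a YouTube Data API `commentThreads.list` response into flat rows. The function name `flatten_comment_threads`, the output column names other than `text`, and the sample page are assumptions for illustration; the accompanying notebook may organize this differently.

```python
from typing import Any


def flatten_comment_threads(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one commentThreads.list response page into flat rows."""
    rows = []
    for item in response.get("items", []):
        thread = item["snippet"]
        comment = thread["topLevelComment"]["snippet"]
        rows.append(
            {
                "video_id": thread["videoId"],
                "text": comment["textDisplay"],
                "like_count": comment.get("likeCount", 0),
                "published_at": comment["publishedAt"],
            }
        )
    return rows


# Hand-written sample page (not real data) that mimics the API response shape.
sample_page = {
    "items": [
        {
            "snippet": {
                "videoId": "VIDEO_ID_1",
                "topLevelComment": {
                    "snippet": {
                        "textDisplay": "Muy buen programa",
                        "likeCount": 4,
                        "publishedAt": "2025-03-01T10:00:00Z",
                    }
                },
            }
        }
    ]
}

rows = flatten_comment_threads(sample_page)
print(rows)
```

Keeping the flattening step separate from the network call makes it easy to test without an API key and to reuse across paginated responses.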
Before diving into sentiment analysis, I performed an Exploratory Data Analysis (EDA) to gain an understanding of the data. Here are some key insights:
- Top Videos with Most Comments: A bar chart visualization revealed the top 15 videos with the most comments. This information can help identify specific episodes or topics that generated significant audience engagement.
- Number of Comments Published by Date: A line plot illustrated the number of comments published over time. This visualization highlighted trends and potential spikes in comment activity.
- Distribution of Like Counts: A histogram displayed the distribution of like counts for comments. This shows how much endorsement individual comments received from other viewers; like counts measure engagement with a comment rather than the sentiment of the comment itself.
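The three aggregations behind these charts can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code (which presumably uses pandas and a plotting library); the helper names, the row fields `video_id`, `published_at`, and `like_count`, and the sample rows are all assumptions.

```python
from collections import Counter

# Made-up sample rows; the real dataset has many more comments.
comments = [
    {"video_id": "vid_a", "published_at": "2025-03-01T10:00:00Z", "like_count": 0},
    {"video_id": "vid_a", "published_at": "2025-03-01T18:30:00Z", "like_count": 2},
    {"video_id": "vid_a", "published_at": "2025-03-02T09:15:00Z", "like_count": 7},
    {"video_id": "vid_b", "published_at": "2025-03-02T11:45:00Z", "like_count": 12},
]


def top_videos(rows, n=15):
    """Videos with the most comments, as (video_id, comment_count) pairs."""
    return Counter(r["video_id"] for r in rows).most_common(n)


def comments_per_day(rows):
    """Comment counts keyed by the date part of the ISO 8601 timestamp."""
    counts = Counter(r["published_at"][:10] for r in rows)
    return sorted(counts.items())


def like_count_histogram(rows, bin_width=5):
    """Comment counts per like-count bin, keyed by each bin's lower edge."""
    bins = Counter((r["like_count"] // bin_width) * bin_width for r in rows)
    return sorted(bins.items())


print(top_videos(comments, n=1))
print(comments_per_day(comments))
print(like_count_histogram(comments))
```

Each helper returns plain lists of pairs, which can be passed directly to a bar chart, a line plot, or a histogram-style bar chart.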
Sentiment Analysis using Gemini
We leveraged the Gemini model through Vertex AI to perform sentiment analysis on the YouTube comments. The process involved the following steps:
- Setting up the Gemini API Client: We installed the necessary libraries, authenticated using an API key, and configured a retry mechanism for error handling.
- Defining the Sentiment Analysis Function (sentiment): This function takes a comment as input and constructs a zero-shot prompt for the Gemini model. The prompt instructs the model to classify the comment's sentiment as "POSITIVO", "NEGATIVO", or "NEUTRAL", and to return only the classification. Google provides a prompt gallery; depending on your task, it is worth checking whether a suitable prompt already exists there.
- Making Predictions: The generate_content method of the Gemini model is called with the constructed prompt to obtain the sentiment prediction for each comment. Error handling is included to manage potential issues during API calls.
- Applying Sentiment Analysis to the Dataset: The sentiment function is then applied to the "text" column of our DataFrame, creating a new "sentiment" column containing the predicted sentiment for each comment.
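The steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the model call is abstracted as a `generate` callable (in the notebook it would wrap the model's generate_content method), and the prompt wording, the `build_prompt` helper, the retry parameters, and the "UNKNOWN" and "ERROR" fallback values are hypothetical rather than the author's exact implementation.

```python
import time
from typing import Callable

LABELS = ("POSITIVO", "NEGATIVO", "NEUTRAL")


def build_prompt(comment: str) -> str:
    """Build a zero-shot classification prompt (wording is an assumption)."""
    return (
        "Classify the sentiment of the following YouTube comment as "
        "POSITIVO, NEGATIVO, or NEUTRAL. Return only the classification.\n"
        f"Comment: {comment}\n"
        "Sentiment:"
    )


def sentiment(
    comment: str,
    generate: Callable[[str], str],
    retries: int = 3,
    backoff: float = 1.0,
) -> str:
    """Classify one comment, retrying failed calls with exponential backoff."""
    prompt = build_prompt(comment)
    for attempt in range(retries):
        try:
            reply = generate(prompt).strip().upper()
        except Exception:
            if attempt == retries - 1:
                return "ERROR"  # every attempt failed
            time.sleep(backoff * 2**attempt)
            continue
        # Reject anything that is not one of the three expected labels.
        return reply if reply in LABELS else "UNKNOWN"
    return "ERROR"


def fake_generate(prompt: str) -> str:
    """Stand-in for the real model call so the sketch runs offline."""
    return " positivo\n"


print(sentiment("Muy bonito e ilustrativo programa", fake_generate))
```

With a pandas DataFrame, the function could then be applied with something like `df["sentiment"] = df["text"].apply(lambda t: sentiment(t, generate))`, where `generate` is the wrapper around the real model call.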
For example, the comment "Grand exponente. Lo mejor de Guate. Muy bonito e ilustrativo programa" ("Great exponent. The best of Guate. A very nice and illustrative program") is classified as: POSITIVO.
A pie chart visualization was created to display the distribution of sentiments across all comments. This chart provides a clear overview of the overall sentiment towards the "Por Favor No Se Enoje" program.
SQLite and Natural Language Querying with Gemini
To enable more flexible data exploration, we saved the processed data into a local SQLite database. This allowed us to use Gemini's function calling capabilities to perform natural language querying. We defined interaction functions like describe_table, list_tables, and execute_query to interface with the database. Users can then ask questions in natural language, and Gemini automatically translates them into SQL queries to retrieve the desired information.
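A minimal sketch of what these three interaction functions might look like with Python's built-in sqlite3 module is shown below. The function names come from the text; the explicit `conn` parameter, the `comments` table, its columns, and the sample rows are assumptions for illustration, and the notebook may rely on a shared connection instead.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database."""
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in cur.fetchall()]


def describe_table(conn: sqlite3.Connection, table_name: str) -> list[tuple[str, str]]:
    """Return (column_name, column_type) pairs for one table."""
    # PRAGMA statements cannot use bound parameters, so check the name first.
    if table_name not in list_tables(conn):
        raise ValueError(f"unknown table: {table_name}")
    cur = conn.execute(f'PRAGMA table_info("{table_name}")')
    return [(row[1], row[2]) for row in cur.fetchall()]


def execute_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run a SQL statement and return all resulting rows."""
    return conn.execute(sql).fetchall()


# In-memory database with made-up rows, standing in for the saved dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (video_id TEXT, text TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        ("vid_a", "Muy buen programa", "POSITIVO"),
        ("vid_a", "Excelente", "POSITIVO"),
        ("vid_b", "No me gusta", "NEGATIVO"),
    ],
)

print(list_tables(conn))
print(describe_table(conn, "comments"))
print(
    execute_query(
        conn,
        "SELECT sentiment, COUNT(*) FROM comments "
        "GROUP BY sentiment ORDER BY sentiment",
    )
)
```

Exposing schema discovery (list_tables, describe_table) separately from execute_query lets the model inspect the database before it writes SQL. Because the model generates the SQL, opening the database read-only is a sensible precaution in practice.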
Google Search
To provide further context, we utilized Gemini's Google Search tool to retrieve information about the "Por Favor No Se Enoje" program and its founders. The model generated a Markdown summary and presented HTML content from the Google Search results, offering insights into the program's background and key figures.
This tool has considerable potential: search results could supply examples for one-shot prompts, serve as a retrieval source for retrieval-augmented generation (RAG), and support many other uses.
Conclusion
This project demonstrates the power of the Gemini model for sentiment analysis and natural language querying. By analyzing YouTube comments, we gained valuable insights into public opinion towards the "Por Favor No Se Enoje" program. The findings can be used to understand audience engagement, identify potential areas for improvement, and track changes in sentiment over time. Furthermore, the integration with SQLite and Google Search enhances the flexibility and richness of the analysis, providing a comprehensive understanding of the topic.