Tutorial to Create a Data Science Agent: A Code Implementation Using the gemini-2.0-flash-lite Model through the Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis

In this tutorial, we demonstrate how to integrate Pandas, Python's robust data processing library, with Google's generative AI capabilities through the google.generativeai package and a Gemini model (gemini-2.0-flash-lite in the code below). By setting up the environment with the necessary libraries, configuring the Google API key, and making use of the IPython display functions, the code provides a step-by-step approach to building a data science agent that analyzes a sample sales dataset. The example shows how to convert a DataFrame into markdown format and then use natural language queries to generate insights about the data, highlighting the potential of combining traditional data analysis tools with modern AI-driven methods.

!pip install pandas google-generativeai --quiet

First, we quietly install the pandas and google-generativeai libraries, preparing the environment for data processing and AI-driven analysis.

import pandas as pd
import google.generativeai as genai
from IPython.display import Markdown

We import Pandas for data processing, google.generativeai to access Google's generative AI capabilities, and Markdown from IPython.display to render formatted output.

GOOGLE_API_KEY = "Use Your API Key Here"
genai.configure(api_key=GOOGLE_API_KEY)


model = genai.GenerativeModel('gemini-2.0-flash-lite')

We set the API key, configure the google.generativeai client with it, and initialize the GenerativeModel "gemini-2.0-flash-lite" to generate content.
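
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key has been exported in an environment variable named GOOGLE_API_KEY, it could be read at runtime instead. The helper name load_api_key is hypothetical and not part of google.generativeai.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# In the notebook, the result would replace the hardcoded string:
# genai.configure(api_key=load_api_key())
```

The call to genai.configure is left as a comment so the sketch runs without the library or a real key.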

data = {'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
        'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
        'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
        'Units Sold': [150, 200, 180, 120, 90, 250],
        'Price': [1200, 25, 75, 300, 50, 100]}
sales_df = pd.DataFrame(data)


print("Sample Sales Data:")
print(sales_df)
print("-" * 30)

Here, we create a Pandas DataFrame named sales_df that contains sample sales data for several products, then print the DataFrame followed by a separator line to visually set off the output.

def ask_gemini_about_data(dataframe, query):
    """
    Asks the Gemini model a question about the given Pandas DataFrame.

    Args:
        dataframe: The Pandas DataFrame to analyze.
        query: The natural language question about the DataFrame.

    Returns:
        The response from the Gemini model as a string.
    """
    prompt = f"""You are a data analysis agent. Analyze the following pandas DataFrame and answer the question.


    DataFrame:
    ```
    {dataframe.to_markdown(index=False)}
    ```


    Question: {query}


    Answer:
    """
    response = model.generate_content(prompt)
    return response.text

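Here, we build a markdown-formatted prompt from the Pandas DataFrame and the natural language query, then use the Gemini model to generate the analysis and return it as text. Note that DataFrame.to_markdown depends on the optional tabulate package, which must be available in the environment.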
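
To see exactly what text the model receives, the prompt construction can be separated from the API call. The following is a rough sketch under stated assumptions: build_prompt is a hypothetical helper, and the fallback to to_string when tabulate is missing is an addition for illustration, not part of the original code.

```python
import pandas as pd

FENCE = "`" * 3  # triple backticks that wrap the table in the prompt


def build_prompt(dataframe: pd.DataFrame, query: str) -> str:
    """Assemble the prompt text without calling the model (hypothetical helper)."""
    try:
        # to_markdown relies on the optional `tabulate` package.
        table = dataframe.to_markdown(index=False)
    except ImportError:
        # Assumed fallback, not in the original: a plain-text table.
        table = dataframe.to_string(index=False)
    return (
        "You are a data analysis agent. Analyze the following pandas "
        "DataFrame and answer the question.\n\n"
        f"DataFrame:\n{FENCE}\n{table}\n{FENCE}\n\n"
        f"Question: {query}\n\nAnswer:\n"
    )


demo_df = pd.DataFrame({"Product": ["Laptop", "Mouse"], "Units Sold": [150, 200]})
demo_prompt = build_prompt(demo_df, "Which product sold more units?")
print(demo_prompt)
```

Printing the prompt before sending it is a cheap way to confirm that the whole table is serialized and that the question appears where expected.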

# Query 1: What is the total number of units sold across all products?
query1 = "What is the total number of units sold across all products?"
response1 = ask_gemini_about_data(sales_df, query1)
print(f"Question 1: {query1}")
print(f"Answer 1:\n{response1}")
print("-" * 30)

Query 1 output

# Query 2: Which product had the highest number of units sold?
query2 = "Which product had the highest number of units sold?"
response2 = ask_gemini_about_data(sales_df, query2)
print(f"Question 2: {query2}")
print(f"Answer 2:\n{response2}")
print("-" * 30)

Query 2 output

# Query 3: What is the average price of the products?
query3 = "What is the average price of the products?"
response3 = ask_gemini_about_data(sales_df, query3)
print(f"Question 3: {query3}")
print(f"Answer 3:\n{response3}")
print("-" * 30)

Query 3 output

# Query 4: Show me the products sold in the 'North' region.
query4 = "Show me the products sold in the 'North' region."
response4 = ask_gemini_about_data(sales_df, query4)
print(f"Question 4: {query4}")
print(f"Answer 4:\n{response4}")
print("-" * 30)

Query 4 output

# Query 5. More complex query: Calculate the total revenue for each product.
query5 = "Calculate the total revenue (Units Sold * Price) for each product and present it in a table."
response5 = ask_gemini_about_data(sales_df, query5)
print(f"Question 5: {query5}")
print(f"Answer 5:\n{response5}")
print("-" * 30)

Query 5 output
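
Because the model answers from the text of the table rather than by executing code, its arithmetic is worth cross-checking. As a minimal sketch (not part of the original notebook), the same five questions can be answered directly with Pandas on a copy of the sample data and compared against the model's responses:

```python
import pandas as pd

# Copy of the sample sales data from the tutorial.
sales_df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam", "Headphones"],
    "Category": ["Electronics"] * 6,
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Units Sold": [150, 200, 180, 120, 90, 250],
    "Price": [1200, 25, 75, 300, 50, 100],
})

# Query 1: total units sold across all products.
total_units = int(sales_df["Units Sold"].sum())
# Query 2: product with the highest number of units sold.
top_product = sales_df.loc[sales_df["Units Sold"].idxmax(), "Product"]
# Query 3: average price of the products.
average_price = float(sales_df["Price"].mean())
# Query 4: products sold in the North region.
north_products = sales_df.loc[sales_df["Region"] == "North", "Product"].tolist()
# Query 5: revenue (Units Sold * Price) per product, in row order.
revenue = (sales_df["Units Sold"] * sales_df["Price"]).tolist()

print(total_units)              # 990
print(top_product)              # Headphones
print(round(average_price, 2))  # 291.67
print(north_products)           # ['Laptop', 'Webcam']
print(revenue)                  # [180000, 5000, 13500, 36000, 4500, 25000]
```

Any mismatch between these values and a model response points to an error in the generated answer rather than in the data.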

In conclusion, the tutorial shows how combining Pandas, the google.generativeai package, and a Gemini model can turn data analysis tasks into a more interactive and insightful process. The approach simplifies querying and interpreting data and opens the way to more advanced use cases such as data cleaning, feature engineering, and exploratory data analysis. By using these modern tools within the familiar Python ecosystem, data scientists can improve their productivity and more easily extract meaningful insights from complex datasets.


Here is the Colab Notebook.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform receives more than 2 million monthly views, which shows its popularity among readers.

2025-03-28 16:10:00
