Working through realistic problems is one of the most effective ways to learn Bud, a tool for data analysis and visualization. This article presents five practice problems, each with a complete solution, covering filtering, bar charts, time series, missing data, and merging.
Problem 1: Basic Data Filtering
Suppose you have a dataset containing sales data with columns for Region, Sales, and Date. Your task is to filter the dataset to include only sales from the “North” region.
import pandas as pd
import bud
# Sample data
data = {
    'Region': ['North', 'South', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 300, 250],
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])
}
df = pd.DataFrame(data)
# Load data into Bud
chart = bud.Bud(data=df)
# Filter for North region
north_sales = chart.filter(lambda row: row['Region'] == 'North')
# Display filtered data
print(north_sales.data)
**Solution Explanation:** The filter method applies the lambda function to each row and keeps only the rows where Region equals “North”. In the sample data that leaves two rows, with sales of 100 and 300.
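To check the filtered result independently of Bud, the same selection can be sketched in plain pandas with a boolean mask. This is not Bud's API; the names `north_mask` and `north_sales_df` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 300, 250],
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05']),
})

# Boolean mask: True for rows whose Region is 'North' (illustrative name)
north_mask = df['Region'] == 'North'

# Indexing with the mask keeps only the matching rows
north_sales_df = df[north_mask]
print(north_sales_df)
```

A vectorized mask compares the whole column at once, which is usually faster on large tables than calling a function once per row.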
Problem 2: Creating a Bar Chart
Create a bar chart to visualize total sales per region from the dataset.
import pandas as pd
import bud
# Sample data
data = {
    'Region': ['North', 'South', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 300, 250]
}
df = pd.DataFrame(data)
# Load data into Bud
chart = bud.Bud(data=df)
# Aggregate sales by region
sales_by_region = chart.groupby('Region').sum('Sales')
# Create bar chart
chart.bar(x='Region', y='Sales', data=sales_by_region.data)
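**Solution Explanation:** The groupby method collects the rows for each region and sum adds up their Sales values, so North's two rows combine into a single total of 400. The bar method then draws one bar per region from those totals.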
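Before trusting a chart, it helps to confirm the totals it should display. The sketch below computes them with plain pandas rather than Bud; the variable name `totals` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 300, 250],
})

# One row per region with summed Sales; as_index=False keeps Region as a column
totals = df.groupby('Region', as_index=False)['Sales'].sum()
print(totals)
```

Each bar height in the chart should match the corresponding value in `totals`.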
Problem 3: Time Series Analysis
Analyze sales trends over time by plotting total sales per month.
import pandas as pd
import bud
# Sample data
data = {
    'Date': pd.to_datetime([
        '2023-01-01', '2023-01-15', '2023-02-01', '2023-02-15', '2023-03-01'
    ]),
    'Sales': [100, 150, 200, 250, 300]
}
df = pd.DataFrame(data)
# Load into Bud
chart = bud.Bud(data=df)
# Extract month and year
chart.data['Month_Year'] = chart.data['Date'].dt.to_period('M')
# Group by month and sum sales
monthly_sales = chart.groupby('Month_Year').sum('Sales')
# Plot line chart
chart.line(x='Month_Year', y='Sales', data=monthly_sales.data)
**Solution Explanation:** The call to dt.to_period('M') converts each date to its month-year period. Grouping on that column sums sales within each month, and the line chart plots those monthly totals to show the trend over time.
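The monthly aggregation can be checked in plain pandas as well. This sketch does not use Bud. Converting the period labels to strings is an optional extra step assumed here because some plotting tools do not accept pandas Period values on an axis; whether Bud needs it is not stated in this article.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-15', '2023-02-01',
                            '2023-02-15', '2023-03-01']),
    'Sales': [100, 150, 200, 250, 300],
})

# Map each date to its calendar month, e.g. 2023-01-15 -> Period('2023-01')
df['Month_Year'] = df['Date'].dt.to_period('M')

# Sum sales within each month
monthly = df.groupby('Month_Year', as_index=False)['Sales'].sum()

# Plain-string labels such as '2023-01' (assumed convenience for plotting)
monthly['Month_Year'] = monthly['Month_Year'].astype(str)
print(monthly)
```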
Problem 4: Handling Missing Data
Replace the missing values in the Sales column with the mean of the recorded sales.
import pandas as pd
import bud
# Sample data with missing values
data = {
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100, None, 150, None]
}
df = pd.DataFrame(data)
# Load into Bud
chart = bud.Bud(data=df)
# Fill missing Sales with mean
mean_sales = chart.data['Sales'].mean()
chart.data['Sales'] = chart.data['Sales'].fillna(mean_sales)
# Verify
print(chart.data)
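**Solution Explanation:** The pandas mean method skips missing values by default, so the mean is computed from the two recorded sales (100 and 150) and equals 125. The fillna method then replaces each missing value with 125, leaving no gaps in the Sales column.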
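Once values are filled in, they are indistinguishable from real measurements. A small variation, sketched here in plain pandas, records which rows were imputed before overwriting them. The `Imputed` column and the `was_missing` name are illustrative additions, not part of the original solution.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Sales': [100, None, 150, None],
})

# Remember which rows were missing before anything is overwritten
was_missing = df['Sales'].isna()

# Series.mean() skips NaN by default, so this averages 100 and 150 only
mean_sales = df['Sales'].mean()

# Build a new frame: filled Sales plus a flag column marking imputed rows
filled = df.assign(Sales=df['Sales'].fillna(mean_sales), Imputed=was_missing)
print(filled)
```

Because `assign` returns a new DataFrame, the original `df` keeps its missing values and can still be inspected afterwards.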
Problem 5: Combining Multiple Dataframes
Combine two datasets, one with sales data and one with customer data, by joining them on their shared CustomerID column.
import pandas as pd
import bud
# Sales data
sales_data = {
    'OrderID': [1, 2, 3],
    'CustomerID': [101, 102, 103],
    'Sales': [250, 450, 300]
}
# Customer data
customer_data = {
    'CustomerID': [101, 102, 104],
    'CustomerName': ['Alice', 'Bob', 'Charlie']
}
df_sales = pd.DataFrame(sales_data)
df_customers = pd.DataFrame(customer_data)
# Load into Bud
sales_chart = bud.Bud(data=df_sales)
customers_chart = bud.Bud(data=df_customers)
# Merge datasets on CustomerID
merged_data = pd.merge(sales_chart.data, customers_chart.data, on='CustomerID', how='inner')
# Display merged data
print(merged_data)
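**Solution Explanation:** The pd.merge function joins the two datasets on CustomerID. With how='inner', only customers present in both tables are kept, so the result contains the orders for customers 101 and 102. Order 3 (customer 103) and Charlie (customer 104) have no match and are dropped.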
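To see which rows an inner join discards, pandas can run an outer join with `indicator=True`, which adds a `_merge` column saying where each row came from. The sketch below works directly on the two DataFrames; the names `inner` and `audit` are illustrative.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'OrderID': [1, 2, 3],
    'CustomerID': [101, 102, 103],
    'Sales': [250, 450, 300],
})
df_customers = pd.DataFrame({
    'CustomerID': [101, 102, 104],
    'CustomerName': ['Alice', 'Bob', 'Charlie'],
})

# Inner join: only CustomerID values found in both tables
inner = pd.merge(df_sales, df_customers, on='CustomerID', how='inner')

# Outer join with an indicator column to audit unmatched rows
audit = pd.merge(df_sales, df_customers, on='CustomerID',
                 how='outer', indicator=True)

print(inner)
print(audit[['CustomerID', '_merge']])
```

Rows marked `left_only` are sales with no customer record, and rows marked `right_only` are customers with no sales. Switching to `how='left'` would keep every sale and leave CustomerName empty where no match exists.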
Conclusion
These practice problems cover five core skills in Bud: data filtering, visualization, time series analysis, handling missing data, and merging datasets. Repeating them with your own data, and checking each result against the numbers you expect, is a practical way to build speed and accuracy.