Bellabeat wants to understand how users of other smart devices engage with activity, sleep, and wellness features so they can identify which behaviors are most relevant to Bellabeat customers. This analysis will help Bellabeat decide which user habits to highlight, support, or target in their marketing strategy for the Leaf.
The data utilized is the FitBit Fitness Tracker Data from Kaggle. It is important to note the limitations: the dataset originates from 2016 and includes only 30 participants who consented to the study.
Since the Leaf product tracks activity, sleep, and stress, this case study focuses on daily movement and sleep patterns.
To ensure a reproducible and scalable analysis, I am using R for data transformation and visualization.
This case study follows a transparent workflow. All code is shown so recruiters, reviewers, or collaborators can understand the decisions made at each stage of the analysis.
I will begin by loading the tidyverse for data manipulation, lubridate for handling date-time objects, and janitor for data cleaning.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
# The data sets that I will be using
activity <- read_csv("Case Study/Fitbase_Data_160412-160512/dailyActivity_merged.csv") %>% clean_names()
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep <- read_csv("Case Study/Fitbase_Data_160412-160512/sleepDay_merged.csv") %>% clean_names()
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
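To inspect the structure and column types of each dataset, I use glimpse(). Both date columns are stored as character strings, so they need to be parsed before the tables can be joined.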
glimpse(activity)
## Rows: 940
## Columns: 15
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ activity_date <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/1…
## $ total_steps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
glimpse(sleep)
## Rows: 413
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1…
## $ sleep_day <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",…
## $ total_sleep_records <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ total_minutes_asleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430,…
## $ total_time_in_bed <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449,…
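# Parse the character date columns into Date objects so both tables share a join key
activity <- activity %>%
  mutate(activity_date = mdy(activity_date))
sleep <- sleep %>%
  mutate(sleep_day = as_date(mdy_hms(sleep_day)))
# Keep only the user-days that appear in both tables
merged <- activity %>%
  inner_join(sleep, by = c("id", "activity_date" = "sleep_day"))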
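The merged dataset combines daily activity and sleep records per user per date, giving a unified view of daily wellness behavior. Because this is an inner join, only days with both an activity record and a sleep record are kept, so every statistic below describes days on which sleep was tracked.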
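To see how many activity days an inner join leaves out, an anti-join on the same keys can be used. The following is a minimal sketch on invented toy data; `activity_demo`, `sleep_demo`, `matched`, and `unmatched` are hypothetical names, not objects from this analysis.

```r
library(dplyr)

# Toy stand-ins for the cleaned `activity` and `sleep` tables (values are made up)
activity_demo <- tibble(
  id = c(1, 1, 2),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  total_steps = c(13162, 10735, 8000)
)
sleep_demo <- tibble(
  id = c(1, 2),
  sleep_day = as.Date(c("2016-04-12", "2016-04-12")),
  total_minutes_asleep = c(327, 400)
)

join_keys <- c("id", "activity_date" = "sleep_day")

# Days with both an activity record and a sleep record
matched <- inner_join(activity_demo, sleep_demo, by = join_keys)

# Activity days without a sleep record, which the inner join drops
unmatched <- anti_join(activity_demo, sleep_demo, by = join_keys)

nrow(matched)    # 2
nrow(unmatched)  # 1
```

Running the same two calls on the real tables would show how much of the activity data is excluded from the sleep-related findings.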
head(merged)
## # A tibble: 6 × 18
## id activity_date total_steps total_distance tracker_distance
## <dbl> <date> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 13162 8.5 8.5
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-15 9762 6.28 6.28
## 4 1503960366 2016-04-16 12669 8.16 8.16
## 5 1503960366 2016-04-17 9705 6.48 6.48
## 6 1503960366 2016-04-19 15506 9.88 9.88
## # ℹ 13 more variables: logged_activities_distance <dbl>,
## # very_active_distance <dbl>, moderately_active_distance <dbl>,
## # light_active_distance <dbl>, sedentary_active_distance <dbl>,
## # very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## # lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>,
## # total_sleep_records <dbl>, total_minutes_asleep <dbl>,
## # total_time_in_bed <dbl>
merged %>% get_dupes(id, activity_date)
## # A tibble: 6 × 19
## id activity_date dupe_count total_steps total_distance tracker_distance
## <dbl> <date> <int> <dbl> <dbl> <dbl>
## 1 4.39e9 2016-05-05 2 9603 7.38 7.38
## 2 4.39e9 2016-05-05 2 9603 7.38 7.38
## 3 4.70e9 2016-05-07 2 14370 11.6 11.6
## 4 4.70e9 2016-05-07 2 14370 11.6 11.6
## 5 8.38e9 2016-04-25 2 12405 9.84 9.84
## 6 8.38e9 2016-04-25 2 12405 9.84 9.84
## # ℹ 13 more variables: logged_activities_distance <dbl>,
## # very_active_distance <dbl>, moderately_active_distance <dbl>,
## # light_active_distance <dbl>, sedentary_active_distance <dbl>,
## # very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## # lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>,
## # total_sleep_records <dbl>, total_minutes_asleep <dbl>,
## # total_time_in_bed <dbl>
merged <- merged %>% distinct()
merged %>% get_dupes(id, activity_date)
## No duplicate combinations found of: id, activity_date
## # A tibble: 0 × 19
## # ℹ 19 variables: id <dbl>, activity_date <date>, dupe_count <int>,
## # total_steps <dbl>, total_distance <dbl>, tracker_distance <dbl>,
## # logged_activities_distance <dbl>, very_active_distance <dbl>,
## # moderately_active_distance <dbl>, light_active_distance <dbl>,
## # sedentary_active_distance <dbl>, very_active_minutes <dbl>,
## # fairly_active_minutes <dbl>, lightly_active_minutes <dbl>,
## # sedentary_minutes <dbl>, calories <dbl>, total_sleep_records <dbl>, …
summary(merged)
## id activity_date total_steps total_distance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 17 Min. : 0.010
## 1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.: 5189 1st Qu.: 3.592
## Median :4.703e+09 Median :2016-04-27 Median : 8913 Median : 6.270
## Mean :4.995e+09 Mean :2016-04-26 Mean : 8515 Mean : 6.012
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:11370 3rd Qu.: 8.005
## Max. :8.792e+09 Max. :2016-05-12 Max. :22770 Max. :17.540
## tracker_distance logged_activities_distance very_active_distance
## Min. : 0.010 Min. :0.0000 Min. : 0.000
## 1st Qu.: 3.592 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 6.270 Median :0.0000 Median : 0.570
## Mean : 6.007 Mean :0.1089 Mean : 1.446
## 3rd Qu.: 7.950 3rd Qu.:0.0000 3rd Qu.: 2.360
## Max. :17.540 Max. :4.0817 Max. :12.540
## moderately_active_distance light_active_distance sedentary_active_distance
## Min. :0.0000 Min. :0.010 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:2.540 1st Qu.:0.0000000
## Median :0.4200 Median :3.665 Median :0.0000000
## Mean :0.7439 Mean :3.791 Mean :0.0009268
## 3rd Qu.:1.0375 3rd Qu.:4.918 3rd Qu.:0.0000000
## Max. :6.4800 Max. :9.480 Max. :0.1100000
## very_active_minutes fairly_active_minutes lightly_active_minutes
## Min. : 0.00 Min. : 0.00 Min. : 2.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:158.0
## Median : 9.00 Median : 11.00 Median :208.0
## Mean : 25.05 Mean : 17.92 Mean :216.5
## 3rd Qu.: 38.00 3rd Qu.: 26.75 3rd Qu.:263.0
## Max. :210.00 Max. :143.00 Max. :518.0
## sedentary_minutes calories total_sleep_records total_minutes_asleep
## Min. : 0.0 Min. : 257 Min. :1.00 Min. : 58.0
## 1st Qu.: 631.2 1st Qu.:1841 1st Qu.:1.00 1st Qu.:361.0
## Median : 717.0 Median :2207 Median :1.00 Median :432.5
## Mean : 712.1 Mean :2389 Mean :1.12 Mean :419.2
## 3rd Qu.: 782.8 3rd Qu.:2920 3rd Qu.:1.00 3rd Qu.:490.0
## Max. :1265.0 Max. :4900 Max. :3.00 Max. :796.0
## total_time_in_bed
## Min. : 61.0
## 1st Qu.:403.8
## Median :463.0
## Mean :458.5
## 3rd Qu.:526.0
## Max. :961.0
merged %>%
summarise(sd_light = sd(lightly_active_minutes, na.rm = TRUE))
## # A tibble: 1 × 1
## sd_light
## <dbl>
## 1 86.7
Based on this summary, these are the insights I’ve gathered:
Daily Activity Insights
Users average ≈ 8,500 steps per day, but 25% of days fall under 5,200 steps, which shows inconsistent movement habits. Bellabeat can position Leaf as a tool for building daily consistency rather than chasing high step goals.
Users log a high amount of light activity, with a median of ≈ 208 minutes (≈ 3.5 hours) per day. The standard deviation is ≈ 87 minutes, about 40% of the mean (a coefficient of variation of ≈ 0.40), which indicates moderate variability. This suggests that users’ light movement fluctuates with schedule or lifestyle factors.
Moderate and intense activity are low: only about 25% of days include more than 27 minutes of moderate activity. Bellabeat can target women who prefer low-impact routines, positioning Leaf as ideal for walking, household chores, and casual movement.
Users spend around 12 hours per day sedentary (mean ≈ 712 minutes). This presents a strong opportunity for Bellabeat to emphasize movement reminders, posture nudges, and inactivity alerts as key Leaf features.
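The variability figure quoted for light activity is the standard deviation divided by the mean. As a quick check, the arithmetic can be reproduced from the two values printed in the summary output above (`mean_light` and `sd_light` are just illustrative variable names):

```r
# Values copied from the summary output above
mean_light <- 216.5  # mean of lightly_active_minutes
sd_light   <- 86.7   # standard deviation of lightly_active_minutes

# Coefficient of variation: spread relative to the typical value
cv_light <- sd_light / mean_light
round(cv_light, 2)  # 0.4, i.e. about 40% of the mean
```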
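Sleep Insights
Users sleep a median of ≈ 7.2 hours (432.5 minutes) per recorded night, but 25% of nights last about 6 hours or less (first quartile: 361 minutes). On average, users spend ≈ 39 minutes in bed without sleeping (458.5 minutes in bed vs. 419.2 minutes asleep).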
Overall, users show moderate activity with high time spent sedentary and inconsistent sleep duration. Light activity dominates daily movement, suggesting lifestyle-driven wellness rather than workout-focused routines. These behavior patterns reveal opportunities for Bellabeat to support consistent movement and better sleep through targeted reminders and wellness-focused product messaging.
How does activity intensity relate to how long users sleep?
Are users consistent in their daily activity levels?
How often do users meet recommended sleep duration (7–9 hours)?
What movement patterns appear across the week?
# Relationship between each activity intensity and sleep duration
daily_activity_sleep <- merged %>%
  mutate(SleepHours = total_minutes_asleep / 60)
# Reshape to long format: one row per day and activity type
activity_sleep_long <- daily_activity_sleep %>%
  select(
    SleepHours,
    sedentary_minutes,
    lightly_active_minutes,
    fairly_active_minutes,
    very_active_minutes
  ) %>%
  pivot_longer(
    cols = -SleepHours,
    names_to = "activity_type",
    values_to = "minutes"
  )
ggplot(activity_sleep_long, aes(x = minutes, y = SleepHours)) +
  geom_point(alpha = 0.3, color = "#e5664a") +
  geom_smooth(method = "lm", se = FALSE, color = "#343434") +
  facet_wrap(~ activity_type, scales = "free_x") +
  labs(
    title = "Relationship Between Activity Intensity and Sleep Duration",
    subtitle = "Each panel shows how a type of activity relates to hours of sleep",
    x = "Minutes of Activity",
    y = "Hours of Sleep"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 10),
    axis.text = element_text(size = 9),
    strip.text = element_text(face = "bold"),
    # Panel, plot, and facet-strip backgrounds
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    strip.background = element_rect(fill = "#f6f3e8", color = NA)
  )
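The fitted lines suggest that sleep duration is associated more with lifestyle balance and time spent inactive than with exercise intensity. This is an association across days, not evidence that reducing inactivity causes longer sleep.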
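Putting a number on each panel makes the comparison less dependent on reading slopes by eye. Below is a minimal sketch that computes a Pearson correlation per activity type. The data in `demo` is invented for illustration and `sleep_cor` is a hypothetical name; the real calculation would start from `merged`.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for `merged` (values are invented for illustration)
demo <- tibble(
  total_minutes_asleep   = c(420, 380, 450, 300, 500, 410),
  sedentary_minutes      = c(700, 760, 650, 900, 600, 720),
  lightly_active_minutes = c(200, 180, 240, 150, 260, 210),
  very_active_minutes    = c(10, 0, 30, 5, 20, 15)
)

# One correlation coefficient per activity type, sorted from most negative
sleep_cor <- demo %>%
  pivot_longer(
    cols = -total_minutes_asleep,
    names_to = "activity_type",
    values_to = "minutes"
  ) %>%
  group_by(activity_type) %>%
  summarise(r = cor(minutes, total_minutes_asleep), .groups = "drop") %>%
  arrange(r)

sleep_cor
```

A coefficient near zero would mean the fitted line in that panel carries little information, whatever its slope looks like.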
# Measuring consistency with the coefficient of variation (CV = sd / mean)
activity_stats <- merged %>%
  summarise(
    mean_steps = mean(total_steps, na.rm = TRUE),
    sd_steps = sd(total_steps, na.rm = TRUE),
    cv_steps = sd_steps / mean_steps,
    mean_light = mean(lightly_active_minutes, na.rm = TRUE),
    sd_light = sd(lightly_active_minutes, na.rm = TRUE),
    cv_light = sd_light / mean_light,
    mean_fair = mean(fairly_active_minutes, na.rm = TRUE),
    sd_fair = sd(fairly_active_minutes, na.rm = TRUE),
    cv_fair = sd_fair / mean_fair,
    mean_very = mean(very_active_minutes, na.rm = TRUE),
    sd_very = sd(very_active_minutes, na.rm = TRUE),
    cv_very = sd_very / mean_very,
    mean_sedentary = mean(sedentary_minutes, na.rm = TRUE),
    sd_sedentary = sd(sedentary_minutes, na.rm = TRUE),
    cv_sedentary = sd_sedentary / mean_sedentary
  ) %>%
  # Split names such as "cv_steps" into a metric ("cv") and a category ("steps")
  pivot_longer(
    cols = everything(),
    names_to = c("metric", "category"),
    names_sep = "_",
    values_to = "value"
  ) %>%
  # One row per category with mean, sd, and cv as columns
  pivot_wider(
    names_from = metric,
    values_from = value
  )
activity_stats
## # A tibble: 5 × 4
## category mean sd cv
## <chr> <dbl> <dbl> <dbl>
## 1 steps 8515. 4157. 0.488
## 2 light 217. 86.7 0.400
## 3 fair 17.9 22.4 1.25
## 4 very 25.0 36.2 1.45
## 5 sedentary 712. 166. 0.233
The coefficient of variation reveals that users are highly inconsistent in moderate and vigorous activity (CV = 1.25–1.45), indicating that structured exercise is not a regular habit. Light activity and steps show moderate variability (CV = 0.40–0.49), reflecting lifestyle-driven movement rather than planned workouts. Sedentary time is the most consistent behavior (CV = 0.23), suggesting prolonged inactivity is a stable part of users’ daily routines.
These patterns highlight a strong opportunity for Bellabeat to focus on small-movement encouragement, sedentary reduction, and holistic wellness rather than high-intensity fitness content.
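One caveat about these figures: they pool every user's days together, so differences between users count as variability alongside day-to-day changes within a user. The sketch below, on invented toy data with hypothetical names (`demo`, `pooled_cv`, `user_cv`), shows how a per-user CV could separate the two. It is an optional refinement, not part of the analysis above.

```r
library(dplyr)

# Two invented users, each steady from day to day but at very different levels
demo <- tibble(
  id = rep(c(1, 2), each = 4),
  total_steps = c(10000, 10500, 9500, 10000,  # steady, high-step user
                  3000, 3500, 2500, 3000)     # steady, low-step user
)

# Pooled CV: between-user differences inflate the estimate
pooled_cv <- sd(demo$total_steps) / mean(demo$total_steps)

# Per-user CV: day-to-day variability within each person
user_cv <- demo %>%
  group_by(id) %>%
  summarise(cv_steps = sd(total_steps) / mean(total_steps), .groups = "drop")

pooled_cv
user_cv
```

In this toy case the pooled CV is much larger than either user's own CV, even though both users are individually consistent.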
Bar chart of Coefficient of Variation (CV)
ggplot(activity_stats, aes(x = category, y = cv)) +
  geom_col(fill = "#e5664a") +
  labs(
    title = "Consistency of Daily Behaviors",
    subtitle = "Higher values mean less consistent habits",
    x = "Activity Type",
    y = "Coefficient of Variation (CV)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text = element_text(size = 10),
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    strip.background = element_rect(fill = "#f6f3e8", color = NA)
  )
# Percentage of recorded nights within the recommended 7–9 hours (420–540 minutes)
merged %>%
  summarise(
    pct_meet_sleep = mean(total_minutes_asleep >= 420 & total_minutes_asleep <= 540) * 100
  )
## # A tibble: 1 × 1
## pct_meet_sleep
## <dbl>
## 1 46.3
Only about 46% of recorded nights fall within the recommended 7–9 hours. This may suggest that:
users struggle to keep regular routines
sleep is influenced by stress, work hours, or an inconsistent lifestyle
bedtime consistency is weak
recovery habits are unstable
Since recommended sleep is achieved less than half the time, Bellabeat can create campaigns around “small steps toward better sleep,” highlighting micro-habits (earlier wind-down, reduced screen time) supported by the Leaf.
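The 46% figure groups short nights and long nights together as "not meeting" the recommendation. A minimal sketch of how the nights could be split into three bands is shown below. The values in `demo` are invented, and `sleep_bands` is a hypothetical name; the same pipeline could be applied to `merged`.

```r
library(dplyr)

# Invented nightly sleep totals, in minutes
demo <- tibble(total_minutes_asleep = c(327, 384, 412, 430, 480, 540, 700, 360))

sleep_bands <- demo %>%
  mutate(
    band = case_when(
      total_minutes_asleep < 420  ~ "Under 7 h",
      total_minutes_asleep <= 540 ~ "7 to 9 h",
      TRUE                        ~ "Over 9 h"
    )
  ) %>%
  count(band) %>%
  mutate(pct = n / sum(n) * 100)

sleep_bands
```

Knowing whether the missed nights are mostly too short or too long would change which sleep message is most relevant.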
How Often Users Meet Recommended Sleep
# Compute the share from the data instead of hard-coding it
pct_meet <- mean(between(merged$total_minutes_asleep, 420, 540)) * 100
sleep_pct <- data.frame(
  status = c("Meets 7–9h Sleep", "Does Not Meet"),
  percent = c(pct_meet, 100 - pct_meet)
)
ggplot(sleep_pct, aes(x = status, y = percent, fill = status)) +
  geom_col(width = 0.6) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_fill_manual(values = c(
    "Meets 7–9h Sleep" = "#e5664a",
    "Does Not Meet" = "#cbb8a6"
  )) +
  labs(
    title = "How Often Users Meet Recommended Sleep",
    x = "",
    y = "Percentage of Days"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    text = element_text(color = "#343434"),
    axis.text = element_text(color = "#343434"),
    axis.title = element_text(color = "#343434"),
    plot.title = element_text(face = "bold")
  )
# Activity patterns across the days of the week
merged <- merged %>%
  mutate(weekday = wday(activity_date, label = TRUE, abbr = TRUE))
weekday_activity <- merged %>%
  group_by(weekday) %>%
  summarise(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_sedentary = mean(sedentary_minutes, na.rm = TRUE),
    avg_light = mean(lightly_active_minutes, na.rm = TRUE),
    avg_fair = mean(fairly_active_minutes, na.rm = TRUE),
    avg_very = mean(very_active_minutes, na.rm = TRUE)
  )
Bar chart of Average Daily Steps
ggplot(weekday_activity, aes(x = weekday, y = avg_steps)) +
  geom_col(fill = "#e5664a", width = 0.7) +
  labs(
    title = "Average Daily Steps",
    subtitle = "User activity peaks on Saturday and early in the workweek",
    x = "Day of the Week",
    y = "Average Steps"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    panel.grid.major = element_line(color = "#e0ddd4"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(
      face = "bold",
      size = 14,
      color = "#343434"
    ),
    plot.subtitle = element_text(
      size = 10,
      color = "#343434"
    ),
    axis.title = element_text(size = 10, color = "#343434"),
    axis.text = element_text(size = 9, color = "#343434")
  )
Users are most active on Saturday, which shows the highest average step count of the week. Activity is also high at the start of the workweek (Mon–Tue), then gradually declines before reaching its lowest point on Sunday. The weekend is therefore split: Saturday is the peak and Sunday the trough. This suggests that user movement is driven by Saturday routines and early-week motivation.
Bellabeat can leverage this pattern by pushing weekend wellness content, Monday goal-setting, and Sunday recovery prompts.
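Sorting the weekday averages is a quick way to confirm the peak and the trough without reading bar heights. The sketch below uses invented step counts in `demo` and a hypothetical result name `weekday_rank`; with the real data, the same `arrange()` call could be applied to `weekday_activity`.

```r
library(dplyr)
library(lubridate)

# Two invented weeks of daily steps; 2016-04-10 was a Sunday
demo <- tibble(
  activity_date = as.Date("2016-04-10") + 0:13,
  total_steps = c(6000, 9500, 9200, 8600, 8300, 8000, 10500,
                  6400, 9300, 9000, 8800, 8100, 7900, 10100)
)

# Average steps per weekday, highest first
weekday_rank <- demo %>%
  mutate(weekday = wday(activity_date, label = TRUE, abbr = TRUE)) %>%
  group_by(weekday) %>%
  summarise(avg_steps = mean(total_steps), .groups = "drop") %>%
  arrange(desc(avg_steps))

weekday_rank
```

In this toy data the first row is Saturday and the last is Sunday, mirroring the pattern described above.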
# Histogram of sleep duration in hours
ggplot(merged, aes(x = total_minutes_asleep / 60)) +
  geom_histogram(
    binwidth = 0.5,
    fill = "#e5664a",
    color = "#f6f3e8"
  ) +
  scale_x_continuous(
    breaks = seq(3, 10, by = 1),
    expand = c(0.01, 0)
  ) +
  # Zoom to 3–10 hours without discarding out-of-range nights before binning
  coord_cartesian(xlim = c(3, 10)) +
  labs(
    title = "Distribution of Sleep Duration",
    x = "Hours Asleep",
    y = "Frequency"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    plot.title = element_text(
      face = "bold",
      size = 14,
      color = "#343434"
    ),
    axis.title = element_text(
      size = 10,
      color = "#343434"
    ),
    axis.text = element_text(
      size = 10,
      color = "#343434"
    ),
    panel.grid.major = element_line(color = "#e0ddd5"),
    panel.grid.minor = element_blank()
  )
# Preparing data
intensity_breakdown <- merged %>%
  select(lightly_active_minutes, fairly_active_minutes, very_active_minutes) %>%
  summarise(
    light = mean(lightly_active_minutes, na.rm = TRUE),
    fair = mean(fairly_active_minutes, na.rm = TRUE),
    very = mean(very_active_minutes, na.rm = TRUE)
  ) %>%
  pivot_longer(cols = everything(), names_to = "intensity", values_to = "minutes")
# Bar Graph
ggplot(intensity_breakdown, aes(x = intensity, y = minutes)) +
  geom_col(fill = "#e5664a", width = 0.7) +
  labs(
    title = "Average Daily Activity Intensity Breakdown",
    x = "Intensity Level",
    y = "Average Minutes"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#f6f3e8", color = NA),
    panel.background = element_rect(fill = "#f6f3e8", color = NA),
    plot.title = element_text(
      face = "bold",
      size = 14,
      color = "#343434"
    ),
    axis.title = element_text(
      size = 10,
      color = "#343434"
    ),
    axis.text = element_text(
      size = 10,
      color = "#343434"
    ),
    panel.grid.major = element_line(color = "#e0ddd5"),
    panel.grid.minor = element_blank()
  )
This supports the earlier finding that daily movement is dominated by low-effort, lifestyle-based activity.
Users are not engaging in regular, structured moderate exercise.
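To express the breakdown as shares rather than raw minutes, the three means can be divided by their total. The sketch below reuses the mean values printed in the summary output earlier in this report; `intensity_share` and `share_pct` are illustrative names.

```r
library(dplyr)

# Mean minutes per day, copied from the summary output earlier in this report
intensity_share <- tibble(
  intensity = c("light", "fair", "very"),
  minutes   = c(216.5, 17.92, 25.05)
) %>%
  mutate(share_pct = round(minutes / sum(minutes) * 100, 1))

intensity_share
```

On these figures, light activity accounts for roughly 83% of all active minutes, with fairly active and very active minutes making up about 7% and 10%.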
1. Promote Consistency Over High Performance: Users are inconsistent with steps and light activity. Bellabeat can position Leaf as a daily habit-builder.
2. Introduce Sedentary Interrupt Features: Users sit ≈ 12 hours/day. Promote posture alerts, stretch reminders, and “stand and reset” nudges.
3. Support Sleep Routine Stabilization: Sleep compliance is only 46%. Create a “Leaf Sleep Guide”: bedtime reminders, screen-time reduction cues.
4. Design Weekend + Monday Motivation Campaigns: Steps peak on Saturday and Monday. Use these days to push challenges, streaks, and motivational prompts.
5. Highlight Low-Impact Wellness: Most users do NOT engage in moderate or intense exercise. Bellabeat should emphasize walking, light movement, and lifestyle activity.
Limitations
Fitbit users may not represent Bellabeat’s core audience
The sample size is small and self-selected
The data is old (2016)
We can see correlation, not causation
The data suggests Bellabeat’s biggest opportunity is not intense workouts but helping users reduce prolonged sitting, build light daily movement, and improve sleep consistency, especially since only 46% of days meet sleep goals.
However, these findings are based on limited and short-term data, and don’t capture factors like stress, work schedules, or long-term habit changes. To make stronger product decisions, Bellabeat should collect deeper, longer-term behavioral data and explore why users struggle with sleep and consistency, not just how often.
This creates a clear next step: use Leaf not just to track behavior, but to learn what truly drives it.