STAT 39000: Project 7 — Spring 2021
Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! As you probably noticed in the previous project, matplotlib can be finicky: certain types of plots are really easy to create, while others are not. For example, you would think changing the color of a boxplot would be easy to do in matplotlib; perhaps we just need to add an option to the function call. As it turns out, this isn't so straightforward (as illustrated at the end of the matplotlib section). Occasionally this will happen, and that is when packages like seaborn or plotnine (both of which are built on top of matplotlib) can be helpful. In this project we will explore this a little bit, and learn about some useful pandas functions that help shape your data into the format a given package requires.
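To make the boxplot example concrete, here is a minimal sketch of one common matplotlib workaround. It uses synthetic data, and the group names and colors are made up. The key detail is that `ax.boxplot` only draws filled boxes when `patch_artist=True`, and each box's face color is then set individually on the returned artists. seaborn's `boxplot`, by contrast, accepts `color` or `palette` arguments directly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic groups (hypothetical data, for illustration only)
data = [rng.normal(loc, 1, 100) for loc in (0, 2, 4)]

fig, ax = plt.subplots()
# patch_artist=True draws each box as a filled Patch; without it the boxes
# are plain lines and have no face color to change.
bp = ax.boxplot(data, patch_artist=True)
for patch, color in zip(bp["boxes"], ["#1b9e77", "#d95f02", "#7570b3"]):
    patch.set_facecolor(color)  # color each box separately
ax.set_xticklabels(["A", "B", "C"])
ax.set_title("Coloring boxplots in matplotlib")
```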
Context: In the next project, we will continue to learn about and become comfortable using matplotlib, seaborn, and plotnine.
Scope: python, visualizing data
Dataset
The following questions will use the dataset found in Scholar:
/class/datamine/data/apple/health/watch_dump.xml
Questions
Question 1
In an earlier project we explored some XML data in the form of an Apple Watch data dump. Most health-related apps give you some sort of graph or set of graphs as an output. Use any package you want to parse the XML data. There are a lot of Records in this dataset. Each Record has an attribute called creationDate. Create a barplot of the number of Records per day. Make sure your plot is polished, containing proper labels and good colors.
You could start by parsing out the required data into a |
The |
- Python code used to solve the problem.
- Output from running your code (including the graphic).
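One possible approach is sketched below, using the standard library's `xml.etree.ElementTree` and pandas. It runs on a tiny, made-up excerpt (`SAMPLE_XML`) instead of the real file. It assumes each Record is a `<Record>` element whose `creationDate` looks like `"2017-09-01 08:15:00 -0500"`, so check that against the actual data. Only the local wall-clock part of the timestamp is kept; this sidesteps mixed UTC offsets around daylight-saving changes. For a file this large, `ET.iterparse` may be easier on memory than `ET.parse`.

```python
import xml.etree.ElementTree as ET

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of watch_dump.xml (structure is an assumption)
SAMPLE_XML = """
<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" creationDate="2017-09-01 08:15:00 -0500"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" creationDate="2017-09-01 21:40:00 -0500"/>
  <Record type="HKQuantityTypeIdentifierStepCount" creationDate="2017-09-03 07:05:00 -0500"/>
</HealthData>
"""

def creation_dates(root: ET.Element) -> pd.Series:
    """Return each Record's creationDate as local wall-clock datetimes."""
    raw = pd.Series([rec.get("creationDate") for rec in root.iter("Record")])
    # Keep "YYYY-MM-DD HH:MM:SS" and drop the trailing UTC offset
    return pd.to_datetime(raw.str.slice(0, 19), format="%Y-%m-%d %H:%M:%S")

def records_per_day(dates: pd.Series) -> pd.Series:
    """Count Records per calendar day, ordered by date."""
    return dates.dt.date.value_counts().sort_index()

root = ET.fromstring(SAMPLE_XML)  # for the real file: ET.parse(path).getroot()
counts = records_per_day(creation_dates(root))

fig, ax = plt.subplots(figsize=(12, 4))
ax.bar(counts.index, counts.values, color="#2b8cbe")
ax.set_xlabel("Date")
ax.set_ylabel("Number of Records")
ax.set_title("Apple Watch Records per day")
fig.autofmt_xdate()  # tilt date labels so they do not overlap
```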
Question 2
The plot in question 1 should look bimodal. Let’s focus only on the first apparent group of readings. Create a new dataframe containing only the readings for the time period from 9/1/2017 to 5/31/2019. How many Records are there in that time period?
- Python code used to solve the problem.
- Output from running your code.
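A minimal sketch of the date filter, assuming the parsed timestamps sit in a datetime column named `creationDate`; the timestamps below are made up. Normalizing to midnight before comparing keeps every reading on 5/31/2019, not only one taken at exactly 00:00. `Series.between` includes both endpoints by default.

```python
import pandas as pd

# Hypothetical timestamps standing in for the parsed creationDate values
df = pd.DataFrame({"creationDate": pd.to_datetime([
    "2017-08-31 23:59:00",  # just before the window
    "2017-09-01 00:00:00",  # first day of the window
    "2018-06-15 12:00:00",
    "2019-05-31 23:30:00",  # last day of the window, late evening
    "2019-06-01 00:00:00",  # just after the window
])})

# Compare on calendar days so late readings on the end date are kept
day = df["creationDate"].dt.normalize()
mask = day.between(pd.Timestamp("2017-09-01"), pd.Timestamp("2019-05-31"))
first_group = df.loc[mask]
print(len(first_group))  # number of Records in the window
```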
Question 3
It is hard to discern weekly patterns (if any) based on the graphics created so far. For the period of time in question 2, create a labeled bar plot of the count of Records by day of the week. What discernible patterns, if any, are there? Make sure to include the labels provided below:
labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
- Python code used to solve the problem.
- Output from running your code (including the graphic).
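A short sketch of counting by weekday with the labels above, using made-up timestamps. pandas' `dt.dayofweek` returns 0 for Monday through 6 for Sunday, so its values line up with the order of `labels`. The `reindex` call keeps weekdays with no Records as zero-height bars rather than dropping them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical timestamps; 2018-01-01 was a Monday
dates = pd.Series(pd.to_datetime([
    "2018-01-01 09:00", "2018-01-01 18:00",  # Mon
    "2018-01-03 12:00",                      # Wed
    "2018-01-07 10:00",                      # Sun
]))

# 0 = Monday ... 6 = Sunday; fill missing weekdays with 0
by_day = dates.dt.dayofweek.value_counts().reindex(range(7), fill_value=0)

fig, ax = plt.subplots()
ax.bar(labels, by_day.values, color="#31a354")
ax.set_xlabel("Day of week")
ax.set_ylabel("Number of Records")
ax.set_title("Records by day of the week")
```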
Question 4
Create a pandas dataframe containing the following data from watch_dump.xml:
- A column called bpm with the bpm (beats per minute) of the InstantaneousBeatsPerMinute.
- A column called time with the time of each individual bpm reading in InstantaneousBeatsPerMinute.
- A column called date with the date.
- A column called dayofweek with the day of the week.
You may want to use |
This is one way to convert the numbers 0-6 to days of the week:
|
- Python code used to solve the problem.
- Output from running your code.
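Below is one possible way to build this dataframe; it is a sketch, not necessarily the intended solution. The XML layout it assumes should be checked against the real file: InstantaneousBeatsPerMinute elements carry `bpm` and `time` attributes and sit nested inside a `<Record>` whose `startDate` supplies the date. `SAMPLE_XML` and the helper `bpm_frame` are hypothetical. The weekday names come from mapping `dt.dayofweek` (0 to 6) through a dictionary built from the labels list.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical excerpt; the nesting and attribute names are assumptions
SAMPLE_XML = """
<HealthData>
  <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN"
          startDate="2018-01-01 21:10:00 -0500">
    <HeartRateVariabilityMetadataList>
      <InstantaneousBeatsPerMinute bpm="62" time="9:10:01.50 PM"/>
      <InstantaneousBeatsPerMinute bpm="64" time="9:10:02.45 PM"/>
    </HeartRateVariabilityMetadataList>
  </Record>
  <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN"
          startDate="2018-01-03 07:30:00 -0500">
    <HeartRateVariabilityMetadataList>
      <InstantaneousBeatsPerMinute bpm="58" time="7:30:05.10 AM"/>
    </HeartRateVariabilityMetadataList>
  </Record>
</HealthData>
"""

LABELS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def bpm_frame(root: ET.Element) -> pd.DataFrame:
    """One row per InstantaneousBeatsPerMinute reading (hypothetical helper)."""
    rows = []
    for record in root.iter("Record"):
        date = record.get("startDate", "")[:10]  # keep "YYYY-MM-DD"
        for beat in record.iter("InstantaneousBeatsPerMinute"):
            rows.append({"bpm": beat.get("bpm"), "time": beat.get("time"), "date": date})
    df = pd.DataFrame(rows, columns=["bpm", "time", "date"])
    df["bpm"] = pd.to_numeric(df["bpm"])  # attribute values arrive as strings
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    # Map 0 (Monday) .. 6 (Sunday) onto the short day names
    df["dayofweek"] = df["date"].dt.dayofweek.map(dict(enumerate(LABELS)))
    return df

root = ET.fromstring(SAMPLE_XML)  # for the real file: ET.parse(path).getroot()
df = bpm_frame(root)
print(df)
```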
Question 5
Create a heatmap using seaborn, where the y-axis shows the day of the week ("Mon" through "Sun"), the x-axis shows the hour, and the values on the interior of the plot are the average bpm for each hour and day of the week.
- Python code used to solve the problem.
- Output from running your code (including the graphic).
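A sketch of the reshaping step that seaborn's `heatmap` needs: a wide table with one row per day, one column per hour, and mean bpm in each cell. `pivot_table` produces it. The small frame below is made up and only mimics the Question 4 columns. The time strings are assumed to be 12-hour values such as `"9:10:01.50 PM"`, so adjust the `format` if the real data differ. `reindex` fixes the row order to Mon through Sun and includes all 24 hours; cells with no readings stay NaN and are left blank by `sns.heatmap`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical frame shaped like the Question 4 output
df = pd.DataFrame({
    "bpm": [62, 64, 58, 70],
    "time": ["9:10:01.50 PM", "9:40:02.45 PM", "7:30:05.10 AM", "7:05:00.00 AM"],
    "dayofweek": ["Mon", "Mon", "Wed", "Mon"],
})

# Assumes 12-hour "H:MM:SS.ff AM/PM" strings; extract the 0-23 hour
df["hour"] = pd.to_datetime(df["time"], format="%I:%M:%S.%f %p").dt.hour

# Rows: day of week, columns: hour, cells: mean bpm
table = df.pivot_table(index="dayofweek", columns="hour", values="bpm", aggfunc="mean")
table = table.reindex(index=labels, columns=range(24))

fig, ax = plt.subplots(figsize=(12, 4))
sns.heatmap(table, cmap="Reds", ax=ax, cbar_kws={"label": "Mean bpm"})
ax.set_xlabel("Hour of day")
ax.set_ylabel("Day of week")
ax.set_title("Average heart rate by hour and day of week")
```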