TDM 40200: Project 5 — 2023
Motivation: Dashboards are everywhere — many of our corporate partners' projects are to build dashboards (or dashboard variants)! Dashboards are used to interactively visualize some set of data. Dashboards can be used to display, add, remove, filter, or complete some customized operation to data. Ultimately, a dashboard is really a website focused on displaying data. Dashboards are so popular, there are entire frameworks designed around making them with less effort, faster. Two of the more popular examples of such frameworks are shiny
(in R) and dash
(in Python). While these tools are incredibly useful, it can be very beneficial to take a step back and build a dashboard (or website) from scratch (we are going to utilize many powerfuly packages and tools that make this far from "scratch", but it will still be more from scratch than those dashboard frameworks).
Context: This is the fourth in a series of projects focused around slowly building a dashboard. Students will have the opportunity to: create a backend (API) using fastapi
, connect the backend to a database using aiosql
, use the jinja2
templating engine to create a frontend, use htmx
to add "reactivity" to the frontend, create and use forms to insert data into the database, containerize the application so it can be deployed anywhere, and deploy the application to a cloud provider. Each week the project will build on the previous week, however, each week will be self-contained. This means that you can complete the project in any order, and if you miss a week, you can still complete the following project using the provided starting point.
Scope: Python, dashboards
Questions
Interested in being a TA? Please apply: purdue.ca1.qualtrics.com/jfe/form/SV_08IIpwh19umLvbE |
Question 1
In the previous projects, our functions would typically return a dict
, which, if we paired it with a JSONResponse
, would return a clean JSON object, displayed in the browser.
However, crafting every response in this manner is not a great idea. API’s need to be consistent and predictable. It is very easy to make a mistake and return a value that is the wrong type, or a value that is not expected. This is where pydantic
can greatly help us out. fastapi
is structured specifically to work with pydantic
models. In addition, one of the most critical parts of any application is the data model. One should take a lot of time considering how data is structured and flows through your application. Working with pydantic
and fastapi
will help you to do this.
In both the
Here, the
This would work using our |
Use the code below as a starting point. Create a pydantic
model to handle titles like we would have in our titles
table from the previous project. Unpack the following set of data into the pydantic
model. What happens when you try to load it into a Title
object? Modify your pydantic
Title
model to accept the data.
For this project, you can use Jupyter Lab. No need to use our VS Code setup. Please make sure to run all cells so the results are displayed.
# create pydantic model for titles here.
def main():
# load data into pydantic model here.
if __name__ == "__main__":
main()
first = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": False, "premiered": 2023, "ended": None, "runtime_minutes": 60, "genres": "Action,Adventure,Drama"}
Great! There are a lot of ways you can craft your pydantic
models. You can make certain fields "optional" where the value can either be some type or None
. You can use Unions to specify multiple valid types. You can even specify good default values!
Hint hint: Here is a link to the docs for |
Try loading the following set of data into your Title
type. Pay close attention to the is_adult
field before and after you load the data into the Title
type. Same for the premiered
field. Do your best to explain what is happening.
second = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": 0, "premiered": "2023", "ended": None, "runtime_minutes": 60, "genres": "Action,Adventure,Drama"}
Finally, pydantic
models validate your data — this means that you’ll get a very nice description of why your data is incorrect, if it is incorrect. Try loading the following set of data into your Title
type. Does it give you an easy to understand error message?
third = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": 0, "premiered": "2023", "ended": None, "runtime_minutes": "60 minutes", "genres": "Action,Adventure,Drama"}
The very first code example here will demonstrate how to take a |
-
Code used to solve this problem.
-
Output from running the code.
Question 2
As you may have gathered after experimenting with pydantic
in the previous question, pydantic
will try to convert to the desired, correct type, if possible. Otherwise you will "fail fast" and receive a nice, detailed error message. If you didn’t use a tool like pydantic
, a customer using your API may receive some very unexpected behavior. For example, if your API would normally return an integer, but for some reason it returned a string instead, your customer’s code, which could be written in a completely different programming language, could break. This is why it is important to validate your data.
Take the following set of data containing the title info for "The Last of Us".
first = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": False, "premiered": 2023, "ended": None, "runtime_minutes": 60, "genres": "Action,Adventure,Drama"}
While you built a pydantic
model to handle this data, your model is likely not ideal, yet. Take a look at the genres
. In our example it is: "Action,Adventure,Drama". However, the way our data is stored it could also be "Drama,Adventure,Action" or "Action,Romance", or any combination of a variety of different genres. genres
is really a list, not a string. Why don’t we build up our data model to handle this?
Modify your Title
model so that genres
is a list of str
. Take the first
dict
above, and make any modifications that are needed so the data is loaded into the Title
model correctly. Once you have done this, print out the Title
object to show that it is working correctly.
-
Code used to solve this problem.
-
Output from running the code.
Question 3
So far so good. While this project may be underwhelming in terms of a "wow" factor — we are just messing around with data and types — it is very important, and a good habit to practice. Using tools that validate your data will save you a lot of time and headaches in the future.
Well, our plan is to utilize pydantic
as a part of our backend, right? Well, where will our data come from? Our database! What are we using to get data from our database? aiosql
! The next task is to use aiosql
to load data from our database, and then use pydantic
to convert that data into a Title
object.
Start by establishing a connection to the database, and making a query.
%%bash
cp /anvil/projects/tdm/data/movies_and_tv/imdb.db $SCRATCH
-- name: get-title-by-id -- Given a title id, return the matching title. SELECT * FROM titles WHERE title_id=:title_id;
import aiosql
import sqlite3
queries = aiosql.from_path("queries.sql", "sqlite3")
conn = sqlite3.connect("/anvil/scratch/x-kamstut/imdb.db") # replace x-kamstut with your username
results = queries.get_title_by_id(conn, title_id="tt0108778")
Now, take results
and convert it to a Title
pydantic
model. Print out the Title
object to show that it is working correctly.
First, you will want to end up creating a
|
Don’t forget to convert the |
-
Code used to solve this problem.
-
Output from running the code.
Question 4
pydantic
makes it easy to export your data to a variety of useful formats. Take your resulting Title
object from the previous question, and demonstrate converting the model to a dict
, a json
string, and finally, demonstrate saving the model using the pickle
package. Be sure to print out the results of each conversion.
There is a whole page about this functionality in the documentation. |
-
Code used to solve this problem.
-
Output from running the code.
Question 5
Finally, one other useful feature of pydantic
, is the ability to write custom validators for your data. For example, if you wanted to make sure that the premiered
date was before the ended
date, you could write a custom validator to do this. In fact, this is exactly what we are going to do!
Read this page in the documentation. Update your Title
model to include a custom validator called sane_dates
that will check that the premiered
date is before the ended
date. Test out your validator by attempting to load the following two sets of data into a Title
object. The first one should fail with a clear message, and the last one should succeed. Be sure to include the output in your notebook cells.
failure = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": False, "premiered": 2023, "ended": 2000, "runtime_minutes": 60, "genres": "Action,Adventure,Drama".split(",")}
success = {"title_id": "tt3581920", "type": "tvseries", "primary_title": "The Last of Us", "original_title": "The Last of Us", "is_adult": False, "premiered": 2023, "ended": 2030, "runtime_minutes": 60, "genres": "Action,Adventure,Drama".split(",")}
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |