Lab 01: Urbana Weather
Scenario: You are the manager for the Market at the Square, the local Urbana Farmer’s Market. Each year, sometime in autumn, the market moves from outdoors to indoors. You’d like to be able to reliably predict when to make the move well in advance, to give vendors certainty about when the change will take place, as not all vendors make the switch to indoors. Because the market opens early in the morning, and vendors arrive even earlier, you hope to find a model for the minimum daily temperature so that you can predict when it will be too cold to hold the market outdoors.
Goal
The goal of this lab is to create a regression model that predicts the minimum daily temperature in Urbana, IL for a particular day of the year.
Data
This lab will use data collected from Open-Meteo.
Response
temperature_2m_min
Features
year
day_of_year
Data in Python
To load the data in Python, use:
import pandas as pd

# Each file is indexed by its "date" column, parsed as datetimes.
weather_train = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-train.csv",
    index_col="date",
    parse_dates=True
)
weather_vtrain = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-vtrain.csv",
    index_col="date",
    parse_dates=True
)
weather_validation = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-validation.csv",
    index_col="date",
    parse_dates=True
)
To create the X and y data for the various datasets, use:
X_train = weather_train[["year", "day_of_year"]]
y_train = weather_train["temperature_2m_min"]

X_vtrain = weather_vtrain[["year", "day_of_year"]]
y_vtrain = weather_vtrain["temperature_2m_min"]

X_validation = weather_validation[["year", "day_of_year"]]
y_validation = weather_validation["temperature_2m_min"]
Sample Statistics
Before modeling, be sure to look at the data. Calculate the summary statistics requested on PrairieLearn and create a visualization for your report.
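As a rough illustration of what looking at the data might involve, the sketch below computes a few summary statistics with pandas. It does not download the lab data: `weather_demo` is a hypothetical, synthetic stand-in that only mimics the column names of `weather_train`, and the statistics chosen here are assumptions, not the ones requested on PrairieLearn.

```python
import math

import pandas as pd

# Hypothetical stand-in for weather_train: two years of synthetic daily minimums
# following a smooth seasonal curve. These are NOT the real lab data.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
weather_demo = pd.DataFrame(
    {
        "year": dates.year,
        "day_of_year": dates.dayofyear,
        "temperature_2m_min": [
            5.0 - 15.0 * math.cos(2 * math.pi * d / 365.25) for d in dates.dayofyear
        ],
    },
    index=dates,
)
weather_demo.index.name = "date"

# Overall summary of the response variable.
overall = weather_demo["temperature_2m_min"].agg(["count", "mean", "std", "min", "max"])

# Mean minimum temperature for each calendar month (1 = January, ..., 12 = December).
monthly_mean = weather_demo.groupby(weather_demo.index.month)["temperature_2m_min"].mean()

print(overall)
print(monthly_mean)
```

With the real data, the same calls could be applied to `weather_train` directly. If matplotlib is installed, something like `weather_train["temperature_2m_min"].plot()` is one possible starting point for the report's visualization.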
Models
For this lab you will select one model to submit to the autograder. You may use any modeling techniques you’d like. The only rules are:
- Models must start from the given training data, unmodified.
  - Importantly, the type and shape of X_train and y_train should not be changed.
  - In the autograder, we will call mod.predict(X_test) on your model, where your model is loaded as mod and X_test has a compatible shape with and the same variable names as X_train.
- Your model must have a fit method.
- Your model must have a predict method.
- Your serialized model must be less than 5MB.
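To make the interface in these rules concrete, here is a minimal sketch of an object with `fit` and `predict` methods that accepts a data frame with `year` and `day_of_year` columns. The class `DayOfYearMeanRegressor` and the tiny demo data are hypothetical and purely illustrative. Whether the autograder accepts a hand-written class like this is an assumption, and an estimator from a library such as scikit-learn may be the safer choice.

```python
import pandas as pd


class DayOfYearMeanRegressor:
    """Hypothetical baseline: predict the mean training response for each day of the year."""

    def fit(self, X, y):
        # Pair each response value with its day_of_year, then average within each day.
        by_day = pd.Series(y.to_numpy(), index=X["day_of_year"].to_numpy())
        self.day_means_ = by_day.groupby(level=0).mean()
        # Fallback for days of the year never seen during training.
        self.global_mean_ = float(by_day.mean())
        return self

    def predict(self, X):
        # Look up the stored mean for each requested day; unseen days become NaN.
        preds = X["day_of_year"].map(self.day_means_)
        return preds.fillna(self.global_mean_).to_numpy()


# Tiny made-up data with the same column names as X_train and y_train.
X_demo = pd.DataFrame({"year": [2020, 2021, 2020, 2021], "day_of_year": [1, 1, 200, 200]})
y_demo = pd.Series([-8.0, -6.0, 18.0, 20.0], name="temperature_2m_min")

mod = DayOfYearMeanRegressor().fit(X_demo, y_demo)

X_new = pd.DataFrame({"year": [2022, 2022, 2022], "day_of_year": [1, 200, 300]})
preds = mod.predict(X_new)
print(preds)  # day 300 was not in the demo data, so it falls back to the global mean
```

Note that this baseline ignores `year` entirely, which is a deliberate simplification for the sketch and not a recommendation.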
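To obtain the maximum points via the autograder, your model performance must meet or exceed the following threshold (lower RMSE is better, so the test RMSE should be no higher than the value shown):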
Test RMSE: 5.0
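For reference, RMSE (root mean squared error) is the square root of the average squared difference between observed and predicted values. The sketch below computes it in plain Python on made-up numbers. The helper `rmse` is a hypothetical name, and the autograder's exact implementation is not specified here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed and predicted minimum temperatures.
demo_rmse = rmse([10.0, 12.0, 8.0], [11.0, 10.0, 8.0])
print(demo_rmse)
```

In practice, the same quantity could be computed on the validation data by comparing `y_validation` with the model's predictions for `X_validation`.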
Model Persistence
To save your model for submission to the autograder, use the dump function from the joblib library. Check PrairieLearn for the filename that the autograder expects.
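A minimal sketch of the save step follows, with a size check against the 5MB limit. The object `model_stub` is a placeholder for a fitted model, the filename `model-demo.joblib` is hypothetical (use the one given on PrairieLearn), and reading "5MB" as 5,000,000 bytes is an assumption.

```python
import os
import tempfile

from joblib import dump, load

# Placeholder standing in for a fitted model object.
model_stub = {"note": "stand-in for a fitted model"}

# Write to a temporary directory so this sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model-demo.joblib")  # hypothetical filename
    dump(model_stub, path)

    # Check the serialized size (assumes "5MB" means 5,000,000 bytes).
    size_bytes = os.path.getsize(path)
    print(f"Serialized size: {size_bytes} bytes")

    # Reload to confirm the file round-trips.
    restored = load(path)
```

For an actual submission, the fitted model would take the place of `model_stub`, and the file would be written to a normal location, not a temporary directory.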
Discussion
As always, be sure to state a conclusion, that is, whether or not you would use the model you trained and selected for the real world scenario described at the start of the lab! If you are asked to train multiple models, first make clear which model you selected and are considering for use in practice. Discuss any limitations or potential improvements.
Additional discussion topics:
- Does the overall strategy here seem appropriate? Do you have any general weather knowledge that suggests an obvious flaw here?
- Be sure you have read the data background, paying attention to how the data was collected and split.
- Assuming you used KNN, does distance make sense here? What is the distance between two dates in time? Does this actually make sense?
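As one way to start thinking about the KNN question, the sketch below computes the Euclidean distance between a few `(year, day_of_year)` pairs. Euclidean distance on the raw, unscaled features is assumed here. The distance your model actually uses depends on its settings and any preprocessing.

```python
import math

# Each point is (year, day_of_year). 2020 was a leap year, so December 31 is day 366.
dec_31_2020 = (2020, 366)
jan_01_2021 = (2021, 1)
jan_01_2020 = (2020, 1)

# Consecutive calendar days, but far apart in the raw feature space.
d_adjacent_days = math.dist(dec_31_2020, jan_01_2021)

# Dates a full year apart, but very close in the raw feature space.
d_one_year_apart = math.dist(jan_01_2020, jan_01_2021)

print(d_adjacent_days)
print(d_one_year_apart)
```

Comparing the two printed values against how far apart the dates are on a calendar is one way to ground the discussion.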
When answering discussion prompts: Do not simply answer the prompt! Answer the prompt, but write as if the prompt did not exist. Write your report as if the person reading it did not have access to this document!
Template Notebook
Submission
On Canvas, be sure to submit both your source .ipynb file and a rendered .html version of the report.