Urbana Weather

This page presents information about the Urbana Weather dataset, which is used as part of Lab 01 in CS 307.

It’s not so much the heat, it’s the humidity that’ll kill you.

– John Candy as Irving “Irv” Blitzer

Source

The Urbana Weather data was collected using the Open-Meteo API. Specifically, the Historical Weather API was used.

  • Zippenfenig, P. (2023). Open-Meteo.com Weather API [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.7970649

The Historical Weather API is based on reanalysis datasets and uses a combination of weather station, aircraft, buoy, radar, and satellite observations to create a comprehensive record of past weather conditions. These datasets are able to fill in gaps by using mathematical models to estimate the values of various weather variables. As a result, reanalysis datasets are able to provide detailed historical weather information for locations that may not have had weather stations nearby, such as rural areas or the open ocean.

Additional citations specific to the weather models used by the API can be found on the Open-Meteo website.

Urbana Weather Data

The Urbana Weather data was accessed using:

  • Latitude: 40.1106
  • Longitude: -88.2073

On a map, this places the location almost exactly at Lincoln Square, which is home to both Common Ground Food Co-Operative and the Market at the Square, Urbana’s farmers’ market. If you’ve never been, we highly recommend it!

Open-Meteo provides excellent documentation on their APIs.

The documentation linked above gives detailed information about how to use the API for the Urbana location. It will even automatically generate Python code that makes a request to the API and collects the results as a pandas data frame!
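As a rough illustration of what such a request looks like, the sketch below only builds a request URL for the coordinates above; it does not contact the API. The endpoint, the parameter names, the date range, and the `timezone` value are assumptions based on Open-Meteo's public documentation, and `build_archive_url` is a hypothetical helper, so check the current documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Historical Weather API (verify in the Open-Meteo docs).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date):
    """Build a request URL for daily minimum temperature (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO 8601 date, e.g. "2016-01-01"
        "end_date": end_date,
        "daily": "temperature_2m_min",
        "timezone": "America/Chicago",  # assumption: local time for Urbana, IL
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


# Coordinates from this page; the date range is an assumption for illustration.
urbana_url = build_archive_url(40.1106, -88.2073, "2016-01-01", "2022-12-31")
print(urbana_url)
```

The Python code generated by the Open-Meteo documentation may use a dedicated client library instead of a hand-built URL; this sketch only shows which pieces of information a request needs.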

Data Dictionary

The Urbana Weather dataset used in CS 307 will include additional preprocessing for use in Lab 01. We document that specific data here:

temperature_2m_min

  • [float64] the minimum air temperature at 2 meters above ground for the day

year

  • [int64] year, such as 2020

month

  • [int64] month, such as 10 for October

day

  • [int64] day of the month, for example 20 for January 20

day_of_year

  • [int64] day of the year, for example 100, which in non-leap years is April 10

The index of the data frame is the full date using the ISO 8601 standard. For example, 2020-07-04.
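The calendar columns can all be derived from the date index. The following is a minimal sketch of one way to do so with pandas, using a few made-up dates; it is not necessarily how the course data was preprocessed, and the `example` frame is purely illustrative.

```python
import pandas as pd

# A small daily DatetimeIndex, named "date" like the index of the course data.
dates = pd.date_range("2021-04-08", "2021-04-11", freq="D", name="date")
example = pd.DataFrame(index=dates)

# Derive the calendar features from the index and store them as int64,
# matching the types listed in the data dictionary.
example["year"] = example.index.year.astype("int64")
example["month"] = example.index.month.astype("int64")
example["day"] = example.index.day.astype("int64")
example["day_of_year"] = example.index.dayofyear.astype("int64")

print(example)
```

Because `day_of_year` counts from January 1, the same calendar date maps to a different value after February in leap years than in non-leap years.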

Data for Machine Learning

For the CS 307 lab, we provide pre-split train, validation-train, and validation datasets, stored as CSV files and accessible via the web.

Note: The data here is not split randomly. The different datasets are split according to time.

  • Train: 2016 - 2022
  • Validation-Train: 2016 - 2020
  • Validation: 2021 - 2022
  • Test: 2023 (Only available within the CS 307 autograder.)
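To make the time-based split concrete, here is a minimal sketch that reproduces the year boundaries listed above on a synthetic data frame. The temperature values are random placeholders, not real weather, and the variable names are illustrative; the provided CSV files are already split, so this is only meant to show the idea.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for daily data from 2016 through 2022 (random values).
dates = pd.date_range("2016-01-01", "2022-12-31", freq="D", name="date")
rng = np.random.default_rng(307)
full = pd.DataFrame(
    {"temperature_2m_min": rng.normal(loc=5, scale=10, size=len(dates))},
    index=dates,
)

# Split by year rather than at random, so validation data is always
# later in time than the data used to fit a model.
train = full[full.index.year <= 2022]
vtrain = full[full.index.year <= 2020]
validation = full[(full.index.year >= 2021) & (full.index.year <= 2022)]

print(len(train), len(vtrain), len(validation))
```

Splitting by time mimics how a model would actually be used: it is fit on past observations and evaluated on later ones. Note that the validation-train and validation sets together make up the train set.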

Loading the Data

import pandas as pd

# Read each split from the course website, using the parsed "date" column as the index
weather_train = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-train.csv",
    index_col="date",
    parse_dates=True
)
weather_vtrain = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-vtrain.csv",
    index_col="date",
    parse_dates=True
)
weather_validation = pd.read_csv(
    "https://cs307.org/lab-01/data/weather-validation.csv",
    index_col="date",
    parse_dates=True
)
weather_train
            temperature_2m_min  year  month  day  day_of_year
date
2016-01-01             -4.2715  2016      1    1            1
2016-01-02             -3.8715  2016      1    2            2
2016-01-03             -4.4715  2016      1    3            3
2016-01-04             -3.0215  2016      1    4            4
2016-01-05             -5.7715  2016      1    5            5
...                        ...   ...    ...  ...          ...
2022-12-27            -11.0520  2022     12   27          361
2022-12-28             -5.9020  2022     12   28          362
2022-12-29              5.0980  2022     12   29          363
2022-12-30              3.0480  2022     12   30          364
2022-12-31             -1.8020  2022     12   31          365

2557 rows × 5 columns

Visualization

import matplotlib.pyplot as plt

# Create a figure and axis object with a wider width than height
fig, ax = plt.subplots(figsize=(10, 4))

# Scatter 'temperature_2m_min' on the y-axis with a specific point size
ax.scatter(weather_train.index, weather_train["temperature_2m_min"], s=10, c="dodgerblue")

# Calculate the centered rolling mean with a window of 14 days and plot it
weather_train["temperature_2m_min"].rolling(window=14, center=True).mean().plot(
    ax=ax, color="darkorange", linewidth=2
)

# Set the title and labels
ax.set_title("Urbana, IL: Temperature Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Daily Temperature Minimum (Celsius)")

# Rotate x-axis labels
plt.xticks(rotation=45)

# Add buffer around the plot by adjusting the x-axis limits
buffer = pd.Timedelta(days=60)  # Adjust the buffer size as needed
ax.set_xlim([weather_train.index.min() - buffer, weather_train.index.max() + buffer])

# add grid
ax.grid(True, linestyle="--", color="lightgrey")
ax.set_axisbelow(True)  # put grid behind the points

# Show the plot
plt.show()
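The smoothed line in the plot comes from a centered rolling mean with `window=14`. The sketch below, which uses a short synthetic series rather than the weather data, shows one consequence of that choice: a position only gets a value when a full 14-day window is available, so the smoothed series is missing values near both ends.

```python
import numpy as np
import pandas as pd

# A short synthetic daily series (values 0, 1, ..., 29), for illustration only.
dates = pd.date_range("2022-01-01", periods=30, freq="D")
series = pd.Series(np.arange(30, dtype=float), index=dates)

# Same call as in the plot: 14-day window, centered on each date.
smoothed = series.rolling(window=14, center=True).mean()

# Count how many positions lack a full window and are therefore NaN.
n_missing = int(smoothed.isna().sum())
print(n_missing, "of", len(smoothed), "values are missing")
print(smoothed.dropna().head())
```

This is why the smoothed line starts slightly after the first point and ends slightly before the last. Passing `min_periods` to `rolling` would allow partial windows at the edges, at the cost of averaging over fewer days there.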