Lab Policy

There will be a total of ten labs in CS 307. Each lab will consist of two separate but related assignments:

  • Lab Model
  • Lab Report

Each lab will involve fitting machine learning models to real-world data. The models you develop will be submitted to PrairieLearn for automated testing and grading. The report you write will be submitted to Canvas for human grading.

Lab Model

The model portion of the lab will consist of two questions on PrairieLearn. It will be graded out of 10 points. Because it is autograded on PrairieLearn, it will allow for buffer points.

The Summary Statistics question will ask you to calculate several numeric summaries of the training data to get you familiar with the lab data.

The Models question will autograde the model or models that you are asked to train as a part of the lab.

Model Submission

To save your models for submission to the autograder, use the dump function from the joblib library. This process is called serialization.

from joblib import dump
dump(model_object, "filename.joblib")

For each lab, the autograder will only accept a particular filename. This filename will always be given on PrairieLearn.

  • Note: Models submitted to the autograder must be less than 5MB on disk.
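Before submitting, it can help to confirm that the serialized file loads back correctly and fits under the size limit. The following is a minimal sketch, not an official check: the dictionary is a stand-in for a fitted model, the temporary directory is used only to keep the example self-contained, and reading "5MB" as 5,000,000 bytes is an assumption (the more conservative of the two common readings).

```python
import os
import tempfile

from joblib import dump, load

# Assumption: "5MB" is read as 5,000,000 bytes, the stricter interpretation.
SIZE_LIMIT_BYTES = 5 * 1000 * 1000

# Stand-in for a fitted model; any picklable Python object is serialized the same way.
model_object = {"coefficients": [0.5, -1.2], "intercept": 3.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filename.joblib")

    # Serialize the object to disk.
    dump(model_object, path)

    # Check the size on disk against the assumed limit.
    size_in_bytes = os.path.getsize(path)
    under_limit = size_in_bytes < SIZE_LIMIT_BYTES

    # Deserialize to confirm the file round-trips.
    restored = load(path)

print(f"Size on disk: {size_in_bytes} bytes, under limit: {under_limit}")
```

In a real lab you would write to the filename given on PrairieLearn in your working directory rather than to a temporary directory.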

Lab Report

In addition to simply training models, you will also write a lab report using the IMRAD structure. A template Jupyter notebook will be provided.

IMRAD Format

While we are requiring the IMRAD format, this does not imply that you need to write a full academic paper. Stick to the template provided and generally try to be concise. You are authorized to plagiarize from the lab document that describes each lab as well as the document on the course website describing the data for the lab.

In general, when writing your report, write as if the lab prompt did not exist and as if the reader were wholly unfamiliar with CS 307.

Introduction

The introduction section should largely state the purpose of the report. That is, it should explain the why and the goal of the report. It should briefly mention what data and models will be used.

Methods

The methods section should describe what you did.

Data

The data section should do three things:

  • Describe the available data
  • Calculate any relevant summary statistics
  • Include at least one relevant visualization
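As an illustration of a visualization that meets the minimum graphic requirements in the grading rubric (a Title Case title, manually labeled axes with units, and a legend when plotting multiple categories), here is a minimal matplotlib sketch. The data, variable names, and labels are invented for the example and are not taken from any lab.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration.
hours = [1, 2, 3, 4, 5]
scores_section_a = [52, 60, 71, 78, 88]
scores_section_b = [48, 57, 65, 75, 80]

fig, ax = plt.subplots()

# Two categories are plotted, so each gets a label for the legend.
ax.scatter(hours, scores_section_a, label="Section A")
ax.scatter(hours, scores_section_b, label="Section B")

# Title and axis labels are set manually, in Title Case, with units.
ax.set_title("Exam Score by Study Time")
ax.set_xlabel("Study Time (Hours)")
ax.set_ylabel("Exam Score (Points)")
ax.legend(title="Section")
```

In a notebook, the figure is displayed when the cell finishes running.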

Models

The models section should describe the modeling that was performed. When writing, you should not simply state what each line of your Python code does. Instead, you should describe the modeling as if you were describing it to another person.

This section will also collect the code used to train your models.

Results

The results section should plainly state the results, which will often be test metrics that evaluate the performance of your models, but you may certainly consider other statistics or visualizations.

Discussion

Be sure to state a conclusion, that is, whether or not you would use the model you trained and selected for the real world scenario described at the start of the lab! Discuss any limitations or potential improvements. Additionally, include responses to any discussion prompts stated in the lab document.

Report Submission

After you complete the lab notebook, do the following:

  1. Clear all output.
  2. Restart the kernel.
  3. Run all cells.
  4. Preview (render) the notebook.

Note that step 4 is preview, not export. To preview, we will use the Quarto CLI and the Quarto VSCode Extension. Installing these will allow you to render your Jupyter Notebook to a .html file using Quarto. This has a number of advantages over using Jupyter's export.

Following these steps will ensure that once you have submitted, we will very, very likely be able to reproduce your work.

Then, to submit, head to the relevant lab on Canvas. You are required to submit two files:

  1. lab-xx.ipynb
  2. lab-xx.html

Here xx should be the two-digit lab number. For example, for Lab 01 you will submit:

  1. lab-01.ipynb
  2. lab-01.html
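If you want to double-check the naming convention before uploading, the following sketch builds the two expected filenames from a lab number and reports any that are missing from a directory. The function names are hypothetical helpers written for this example, not part of any course tooling.

```python
from pathlib import Path


def expected_report_files(lab_number):
    """Return the two report filenames for a lab, using a two-digit lab number."""
    return [f"lab-{lab_number:02d}.ipynb", f"lab-{lab_number:02d}.html"]


def missing_report_files(lab_number, directory="."):
    """Return the expected filenames that do not exist in the given directory."""
    return [
        name
        for name in expected_report_files(lab_number)
        if not (Path(directory) / name).is_file()
    ]


print(expected_report_files(1))
```

Calling missing_report_files(1) from the folder that holds your work returns an empty list when both files are present.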

Late Submissions

Unlike other course activities, lab reports are human graded, so no buffer points will be available. Instead, reports may be submitted late, with a 10% reduction per day.
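To make the arithmetic concrete, here is a small sketch of one possible reading of the late reduction. It assumes the 10% is not compounding, is applied to the score the report would otherwise earn, is counted in whole days, and cannot push a score below zero. None of these details are stated above, so treat this as an illustration rather than the grading formula.

```python
def late_adjusted_score(raw_score, days_late):
    """Apply an assumed flat 10% reduction per whole day late, floored at zero."""
    # Work in whole percentage points to avoid floating-point drift.
    penalty_percent = min(100, 10 * max(0, days_late))
    return raw_score * (100 - penalty_percent) / 100


# Under these assumptions, a report that would earn 10 points, submitted two days late:
print(late_adjusted_score(10, 2))  # 8.0
```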

Report submission will allow for unlimited attempts. However, be aware that the grader will grade whichever version was most recently submitted at the time they choose to grade. Importantly, if you submit one version before the deadline and another after the deadline, they will grade the late version.

Once a grader has graded a report, you may not submit again, even if there are late days remaining.

Grading Rubric

Lab reports will be graded on Canvas out of a possible 10 points. Each of the 10 points will have its own rubric item. Each rubric item will be assigned a value of 1, 0.5, or 0, corresponding to:

  • No issues (1)
  • Minor issues (0.5)
  • Major issues (0)

The ten rubric items are:

  1. Is a rendered .html file submitted?
  2. Is the source .ipynb file submitted?
  3. Is your document well-formatted?
    • Is markdown used correctly?
    • Does the markdown render as expected?
  4. Is your code well formatted and easy to read?
    • Does it follow PEP 8? While we do not expect students to be code style experts, there are some basics we would like you to follow:
      • No blank lines at the start of cells. No more than one blank line at the end of a cell.
      • Spaces around binary operators, but no spaces around = when passing keyword arguments to functions.
  5. Does the introduction reasonably introduce the scenario?
  6. Does the methods section reasonably describe the data and methods used?
  7. Is a well-formatted visualization included in the data subsection of the methods section? (If a graphic seems unreasonable, you may instead provide useful summary statistics.) At minimum every graphic should include:
    • A title that uses Title Case.
    • A manually labeled x-axis using Title Case and including units if necessary.
    • A manually labeled y-axis using Title Case and including units if necessary.
    • A legend if plotting multiple categories.
  8. Does the results section provide a reasonable, probably numeric, summary of the model performance?
  9. When responding to the discussion prompts, are course concepts used appropriately?
  10. Is a conclusion stated that takes into account the scenario?
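As a short illustration of the code style basics listed under rubric item 4, the sketch below shows a cell with no leading blank lines, spaces around binary operators, and no spaces around = for default values and keyword arguments. The function and its values are invented for the example.

```python
def scaled_total(values, scale=1.0, offset=0.0):
    """Return the sum of values multiplied by scale, plus offset."""
    total = sum(values)
    # Binary operators (* and +) are surrounded by single spaces.
    return scale * total + offset


# Keyword arguments are passed without spaces around the equals sign.
result = scaled_total([1, 2, 3], scale=2.0, offset=0.5)
print(result)  # 12.5
```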