Project Policy

Models Meet Data, Like Labs

Introduction

The “final” project will have a similar format to that of the lab assignments in CS 307.

We put “final” in quotation marks to hint that, while this is technically a final project, it does not carry the same weight as most course final projects. Recall from the syllabus that the “final” project is only 4% of your total course grade. This minimal weight (hopefully) serves two purposes:

  • Less stress during final exam week.
  • An opportunity to creatively apply course concepts without significant risk to your course grade.

As in labs, you will be given a prompt and data, develop a model, and then write and submit a report. A similar, but slightly modified, rubric will be used.

Unlike labs,

  • data will not be pre-split,
  • you will not be given metrics to outperform,
  • and you will not make a submission to an auto-grader.

Importantly, you will need to determine:

  • what is acceptable performance,
  • what metrics you will use to measure that performance,
  • and how to split the data to appropriately estimate those metrics.
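Because the data will not be pre-split, one common approach is to set aside a held-out test set before any tuning. The following is a minimal, standard-library sketch of a shuffled holdout split; the function name, the 20% test fraction, and the seed are illustrative assumptions, not project requirements.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) for a shuffled holdout split.

    Hypothetical helper for illustration; the fraction and seed are
    assumptions, not values prescribed by the project.
    """
    indices = list(range(n_rows))
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    # The first n_test shuffled rows form the test set, the rest train.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(n_rows=100)
```

In practice, scikit-learn's `train_test_split` provides the same kind of split. Whichever tool you use, the test rows should only be touched once, when estimating the performance of the selected model.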

Additionally, rather than a single prompt and dataset, you will be allowed to choose from a selection of prompts.

The project prompts and data will be released on Monday, December 1.

The project report will be due on Tuesday, December 16.

Rubric Items

This rubric is extremely similar to the rubric you have seen during labs. See the results item for the one small modification.

  1. Is the source .ipynb notebook submitted?
  2. Is a rendered .html report submitted?
  3. Is the .html file properly rendered via Quarto?
    • No points will be granted if the file is rendered via Jupyter.
  4. Are both the source notebook and rendered report, including the code contained in them, well-formatted?
    • Is markdown used correctly?
    • Does the markdown render as expected?
    • Are all warnings and messages suppressed from the rendered report?
    • Is code mostly hidden from the rendered report, except where truly useful for narrative or explanation?
    • Does code follow PEP 8? While we do not expect students to be code style experts, there are some basics we would like you to follow:
      • No blank lines at the start of cells. No more than one blank line at the end of a cell.
      • Spaces around binary operators, except around = when passing keyword arguments to a function.
  5. Does the report have a title?
    • Does the title use (a reasonable variant of) Title Case?
  6. Does the introduction reasonably introduce the scenario?
    • Can a reader unfamiliar with CS 307 and the specific project understand why a model is being developed?
  7. Does the methods section reasonably describe the data used?
    • Is a data dictionary, describing the target and each feature, included?
  8. Does the methods section reasonably describe model development?
    • Include information on models considered, parameters considered, tuning and selection procedures, and any other methods used during model development.
  9. Is a well-formatted exploratory visualization included in the data subsection of the methods section?
    • Does the visualization provide some useful insight that informs modeling or interpretation?
    • At minimum, a well-formatted visualization should include:
      • A manually labeled \(x\)-axis using Title Case, including units if necessary.
      • A manually labeled \(y\)-axis using Title Case, including units if necessary.
      • A legend if plotting multiple categories of things.
      • A figure caption created using Quarto that describes the visualization.
  10. Does the results section provide a reasonable summary of the selected model’s performance?
    • Was the data appropriately split to provide unbiased estimates of generalization?
    • Were appropriate metrics used given the goal of the model?
  11. Is a well-formatted summary figure included in the results (or discussion) section?
    • Does the figure provide some insight into the performance or usability of the model?
    • At minimum, a well-formatted visualization should include:
      • A manually labeled \(x\)-axis using Title Case, including units if necessary.
      • A manually labeled \(y\)-axis using Title Case, including units if necessary.
      • A legend if plotting multiple categories of things.
      • A figure caption created using Quarto that describes the visualization.
  12. Is a conclusion stated in the discussion section?
    • Specifically, you must explicitly state whether or not you would use the model in practice.
  13. Does the conclusion have a reasonable justification?
    • Do the conclusion and justification consider the project scenario?
    • Answer as if your job depends on it. In the future, that might be the case!
    • Using a single numeric metric is wholly insufficient, most importantly because it lacks context. You should give serious consideration to what errors can be made by your model, and what the consequences of those errors could be.
  14. Are the specifics of the conclusion included in the discussion?
    • Are the benefits and limitations discussed if you choose to use the model?
    • Are the risks and improvements discussed if you choose to not use the model?
  15. Throughout the discussion section, are course concepts used correctly and appropriately?
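
The two PEP 8 basics named in rubric item 4 can be illustrated with a short sketch. The function and variable names here are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    total = 0.0
    for a, p in zip(actual, predicted):
        # Spaces around binary operators such as = (assignment), + and -.
        total = total + abs(a - p)
    return total / len(actual)


# No spaces around = when passing keyword arguments to a function.
mae = mean_absolute_error(actual=[3.0, 5.0], predicted=[2.0, 7.0])
```

Note also that the cell begins with code on its first line and ends without extra trailing blank lines.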