Project Policy
Models Meet Data, Like Labs
Introduction
The “final” project will have a similar format to that of the lab assignments in CS 307.
As with labs, you will be given a prompt and data, develop a model, and then write and submit a report. A similar but slightly modified rubric will be used.
Unlike labs,
- data will not be pre-split,
- you will not be given metrics to outperform,
- and you will not make a submission to an autograder.
Importantly, you will need to determine:
- what is acceptable performance,
- what metrics you will use to measure that performance,
- and how to split the data to appropriately estimate those metrics.
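The last two decisions can be sketched in a few lines. What follows is a minimal illustration using only the standard library, not a prescribed procedure: the helper names `split_indices` and `precision_recall`, the 80/20 split, the seed, and the toy labels are all assumptions made for the example. In practice, scikit-learn's `train_test_split` and its metric functions cover the same ground.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=307):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hold out a test set once, before any model development.
train_idx, test_idx = split_indices(n_rows=100)

# Made-up labels and predictions, only to show the metric calculation.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Reporting precision and recall separately, rather than accuracy alone, makes the two kinds of error visible. Which one matters more depends on the scenario in the prompt you choose.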
Additionally, rather than a single prompt and dataset, you will be allowed to choose from a selection of prompts.
The project prompts and data will be released on Monday, December 1.
The project report will be due on Tuesday, December 16.
Rubric Items
- Is the source .ipynb notebook submitted?
- Is a rendered .html report submitted?
- Is the .html file properly rendered via Quarto?
- No points will be granted if the file is rendered via Jupyter.
- Are both the source notebook and rendered report, including the code contained in them, well-formatted?
- Is markdown used correctly?
- Does the markdown render as expected?
- Are all warnings and messages suppressed from the rendered report?
- Is code mostly hidden from the rendered report, except where truly useful for narrative or explanation?
- Does code follow PEP 8? While we do not expect students to be code style experts, there are some basics we would like you to follow:
- No blank lines at the start of cells. No more than one blank line at the end of a cell.
- Spaces around binary operators, except around = when passing keyword arguments to functions.
- Does the report have a title?
- Does the title use (a reasonable variant of) Title Case?
- Does the introduction reasonably introduce the scenario?
- Can a reader unfamiliar with CS 307 and the specific project understand why a model is being developed?
- Does the methods section reasonably describe the data used?
- Is a data dictionary, describing the target and each feature, included?
- Does the methods section reasonably describe model development?
- Include information on models considered, parameters considered, tuning and selection procedures, and any other methods used during model development.
- Is a well-formatted exploratory visualization included in the data subsection of the methods section?
- Does the visualization provide some useful insight that informs modeling or interpretation?
- At minimum, a well-formatted visualization should include:
- A manually labeled \(x\)-axis using Title Case, including units if necessary.
- A manually labeled \(y\)-axis using Title Case, including units if necessary.
- A legend if plotting multiple categories of things.
- A figure caption created using Quarto that describes the visualization.
- Does the results section provide a reasonable summary of the selected model’s performance?
- Was the data appropriately split to provide unbiased estimates of generalization?
- Were appropriate metrics used given the goal of the model?
- Is a well-formatted summary figure included in the results (or discussion) section?
- Does the figure provide some insight into the performance or usability of the model?
- At minimum, a well-formatted visualization should include:
- A manually labeled \(x\)-axis using Title Case, including units if necessary.
- A manually labeled \(y\)-axis using Title Case, including units if necessary.
- A legend if plotting multiple categories of things.
- A figure caption created using Quarto that describes the visualization.
- Is a conclusion stated in the discussion section?
- Specifically, you must explicitly state whether or not you would use the model in practice.
- Does the conclusion have a reasonable justification?
- Do the conclusion and justification consider the project scenario?
- Answer as if your job depends on it. In the future, that might be the case!
- Using a single numeric metric is wholly insufficient, most importantly because it lacks context. You should give serious consideration to what errors can be made by your model, and what the consequences of those errors could be.
- Are the specifics of the conclusion included in the discussion?
- Are the benefits and limitations discussed if you choose to use the model?
- Are the risks and improvements discussed if you choose not to use the model?
- Throughout the discussion section, are course concepts used correctly and appropriately?
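As an illustration of the formatting items in the rubric above, here is a sketch of what a single notebook cell might look like. The `#|` lines are Quarto cell options that hide the code and suppress warnings in the rendered report. The `rmse` function and the numbers passed to it are hypothetical and exist only to show the spacing conventions.

```python
#| echo: false
#| warning: false
import math


def rmse(actual, predicted):
    """Return the root mean squared error of two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Spaces surround binary operators such as - and /, but there are
# no spaces around = when passing keyword arguments.
error = rmse(actual=[3.0, 5.0, 7.0], predicted=[2.0, 5.0, 9.0])
rounded_error = round(error, ndigits=2)
```

The cell starts without a blank line and ends without trailing blank lines, matching the basics listed under the PEP 8 item.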