syllabus

Capstone in Data Science

DS190, Fall 2025

Jo Hardin, 2351 Estella, jo.hardin@pomona.edu

Class: Tuesdays, 1:15-2:30pm, Estella 2099

Office Hours: (Estella 2351)

Monday: 2:30-5:00pm
Tuesday: 9-11am
Thursday: 1:15-3pm

The Course.

Capstone in Data Science exists as a way to support the data science minor capstone courses. The capstone project is outside of statistics, mathematics, and computer science, emphasizing work in aligned disciplines. The students will bring together the work they have done in data science with problems in other disciplines, and they will consider the ethical implications of their questions and their work. Prerequisite: all DS minor core courses including statistics, linear algebra, computer science, data science, and ethics in data science.

Anonymous Feedback.

As someone who is constantly learning and growing in many ways, I welcome your feedback about the course, the classroom dynamics, or anything else you’d like me to know. There is a link to an anonymous feedback form on the landing page of our Canvas webpage. Please provide me with feedback at any time!

Student Learning Outcomes.

  • Statistical & Computational Proficiency: Students will develop expertise and apply their knowledge in statistical analysis, probability, and computational concepts relevant to data science, including efficiency and data structures.
  • Application of Computational Methods: Students will demonstrate the ability to apply algorithmic, mathematical, and scientific reasoning to computational problems using a programming language.
  • Ethical Decision-Making: Students will understand and practice ethical philosophies and frameworks for data-driven decision-making and evaluating scientific claims.
  • Effective Communication: Students will clearly and persuasively communicate data-driven analyses using literate programming and collaborate effectively with stakeholders across disciplines.

The Data Science capstone project should:

  • carry out a study and communicate results from an extensive data-driven project that is related to domain-specific challenges;
  • demonstrate competency in applying at least one type of advanced data-analytic method such as (but not limited to):
    • modeling a process (e.g., generalized linear models, Bayesian analysis, advanced probability theory and stochastic processes, non-linear models, machine learning, big data analysis, econometrics, or statistical computing)
    • advanced study design (e.g., creating a computational online study with sophisticated design to deal with non-independence)
    • advanced data visualization (e.g., creating a dashboard)
    • advanced computational data curation (e.g., scraping multiple websites and using regular expressions);
  • be written with scripting code (i.e., not pull-down menus) using literate programming (data + code + results + narrative) and version control (e.g., GitHub);
  • include a discussion of ethical issues that came up along with any solutions you used to address the ethical issues; and
  • focus on a question originating from or responding to a domain outside of statistics, mathematics, and computer science.

Diversity and Inclusion Statement.

(adapted from Monica Linden, Brown University):

In an ideal world, science would be objective. However, much of science is subjective and is historically built on a small subset of privileged voices. In this class, we will make an effort to recognize how science (and statistics!) has played a role in both understanding diversity as well as in promoting systems of power and privilege. I acknowledge that it is possible that there may be both overt and covert biases in the material due to the lens with which it was written, even though the material is primarily of a scientific nature. Integrating a diverse set of experiences is important for a more comprehensive understanding of science. I would like to discuss issues of diversity in statistics as part of the course from time to time.

Please contact me if you have any suggestions to improve the quality of the course materials.

Furthermore, I would like to create a learning environment for my students that supports a diversity of thoughts, perspectives and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:

  • If you have a name and/or set of pronouns that differ from those that appear in your official records, please let me know!
  • If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. You can also relay information to me via your mentors. I want to be a resource for you. If you prefer to speak with someone outside of the course, the math liaisons, Dean of Students, or QSC staff are all excellent resources.
  • I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.
  • As a participant in course discussions, you should also strive to honor the diversity of your classmates.

Daily.

In class: please no phones or computers.

In-class time will consist of two types of sessions. Some sessions will cover the logistics surrounding your projects: some days we will talk about the projects themselves, some days we will talk about how to turn in assignments, and some days you will be presenting. During five of our class sessions, we will talk about ethics in data science through the lens of a discipline (history, linguistics, philosophy, psychology, and economics).

There is an expectation that you attend every class meeting and stay engaged in our discussions. On the ethical data science discussion days, you will have reading that you should do before coming to class.

Writing.

The final data science capstone project is a written paper. During the semester, you will have parts of the paper due along the way. Ideally, when you get to the end of the semester, the written part will come naturally by putting together the parts from earlier in the semester. You should not leave all the writing to the end.

Reading.

There will be expected reading on each of the five days when we discuss ethical data science. The readings will be posted on the front page of the course webpage. Some of the readings are hyperlinked. Others are available at the Claremont Colleges library, and you will need to log in to the library to access the reading.

Grades.

Your final grade will be calculated using the points system below. There are 120 total points; 90% of 120 is 108 points.

topic: 5pts
annotated bibliography: 5pts
1st outline: 5pts
section draft: 5pts
introduction or 2nd section: 5pts
ethics component: 5pts
project draft: 10pts
final write-up: 25pts

1st presentation: 10pts
2nd presentation: 10pts
final presentation: 25pts

active participation in ethics conversations: 10pts
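
As a quick check of the arithmetic above, the point values can be summed in a few lines of Python. This is only an illustrative sketch: the names `POINTS`, `TOTAL`, and `percent_of_total` are made up for this example and are not part of any course tooling, and the syllabus does not say which letter grade the 90% mark corresponds to.

```python
# Point values copied from the grading list above.
# POINTS, TOTAL, and percent_of_total are hypothetical names used only for illustration.
POINTS = {
    "topic": 5,
    "annotated bibliography": 5,
    "1st outline": 5,
    "section draft": 5,
    "introduction or 2nd section": 5,
    "ethics component": 5,
    "project draft": 10,
    "final write-up": 25,
    "1st presentation": 10,
    "2nd presentation": 10,
    "final presentation": 25,
    "active participation in ethics conversations": 10,
}

# Total points available across all graded components.
TOTAL = sum(POINTS.values())


def percent_of_total(earned, total=TOTAL):
    """Return earned points as a percentage of the total available."""
    return 100 * earned / total


if __name__ == "__main__":
    print(TOTAL)                  # 120
    print(percent_of_total(108))  # 90.0
```

For example, a student who has earned 108 points so far would call `percent_of_total(108)` to see where they stand relative to the 120 available.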

Reuse

CC-BY-SA-4.0