Student and Faculty Guidelines

for the DS 190 Data Science Capstone

Welcome to the Data Science Capstone Guidelines! This document answers common questions about the capstone course. Please reach out to Jo Hardin (jo.hardin@pomona.edu) with any further questions.

DS 190 Catalog description:

A required capstone project seminar in which senior data science minors focus their disciplinary curricular backgrounds (outside of statistics, mathematics, or computer science) on a sophisticated data science project. Professional skills developed include: ethics, project management, collaborative software development, documentation and consulting. Regular meetings, weekly progress reports, interim and final reports, and multiple presentations are required. Open only to Data Science minors. Prerequisites: DS002R (or equivalent), CSCI051 (or equivalent), MATH060, MATH058 (or equivalent), and data science ethics.

Prerequisites:

The 5 core courses must be completed by the end of the junior year to be eligible for the capstone course, which is offered only in the fall and is taken in the fall of the senior year. The capstone course is PERM only. You must formally PERM the course (via course registration) and also fill out the PERM form, which asks for details of the courses you have taken as well as the structures in place to get the capstone up and running in September of the senior year.

Capstone learning goals:

  • Statistical & Computational Proficiency: Students will develop expertise and apply their knowledge in statistical analysis, probability, and computational concepts relevant to data science, including efficiency and data structures.
  • Application of Computational Methods: Students will demonstrate the ability to apply algorithmic, mathematical, and scientific reasoning to computational problems using a programming language.
  • Ethical Decision-Making: Students will understand and practice ethical philosophies and frameworks for data-driven decision-making and evaluating scientific claims.
  • Effective Communication: Students will clearly and persuasively communicate data-driven analyses using literate programming and collaborate effectively with stakeholders across disciplines.

The Data Science capstone project should:

  • carry out a study and communicate results from an extensive data-driven project that is related to domain-specific challenges;
  • demonstrate competency in applying at least one type of advanced data-analytic method, such as (but not limited to):
    • modeling a process (e.g., generalized linear models, Bayesian analysis, advanced probability theory and stochastic processes, non-linear models, machine learning, big data analysis, econometrics, or statistical computing)
    • advanced study design (e.g., creating a computational online study with a sophisticated design to deal with non-independence)
    • advanced data visualization (e.g., creating a dashboard)
    • advanced computational data curation (e.g., scraping multiple websites and using regular expressions);
  • be written with scripting code (i.e., not pull-down menus) using literate programming (data + code + results + narrative) and version control (e.g., GitHub);
  • include a discussion of the ethical issues that arose, along with any solutions you used to address them; and
  • focus on a question originating from or responding to a domain outside of statistics, mathematics, and computer science.

You might be curious to look through summaries of previous capstones (click on “DS Minor Capstones”).

Capstone structure:

  • Offered in fall semesters
  • Taken in the student’s senior year
  • Includes close work with a faculty member, with weekly meetings to discuss the project
  • ~5 hours of work per week, outside of the course meeting time (consistent with Pomona’s course credit definition)
  • Grades will be assigned based on a combination of domain-specific criteria and timely submission of assignments

Student dates:

  • Spring of junior year – complete the 5 core requirements and PERM into the capstone class for the following fall semester.
  • Fall of senior year – complete the DS capstone project / course.
  • By the end of the senior year – complete the additional elective course.

Student responsibilities:

  • Find a faculty member who can support the project and provide domain expertise. The faculty member need not already be involved with Pomona’s Data Science minor, but you might be interested in which faculty members are affiliated with the minor.
    • Pro tip: ask the faculty mentor if you can help facilitate a project of theirs.
    • A potential mentor might be in a non-academic office (e.g., sustainability, IIE, athletics, or the student health center).
  • Meet with your faculty mentor regularly. Identify resources on campus that can provide technical support.
  • Perform the computational work required in the project.
  • Communicate the results of the project in writing and in a formal presentation.

Faculty project mentor responsibilities:

  • Meet with the student regularly.
  • Meet with other DS capstone participants periodically.
  • Faculty mentors are not responsible for support outside of their expertise.

Pre-approved upper division courses

The following courses have been pre-approved as the upper division elective course required for the minor at Pomona.

Approved upper division electives with programming

  • Data Science For Conservation Biology - BIOL108 PO
  • Intro Computational Neuroscience - BIOL133L KS
  • Population Genomics - BIOL136 KS
  • Genomics and Bioinformatics - BIOL156L KS
  • Genomics & Transcriptomics w/Lab - BIOL170 PO
  • Genomics & Bioinformatics w/Lab - BIOL173 PO or BIOL173B PO
  • Data Analysis for Life Sciences - BIOL174 PO
  • Computer Science for Insight - CSCI035 HM
  • Computer Systems - CSCI105 PO
  • Programming Languages - CSCI131 PO
  • Database Systems - CSCI133 PO
  • Operating Systems Principles - CSCI134 PO
  • Applied Algorithms - CSCI143 PO
  • Artificial Intelligence - CSCI151 PO
  • Neural Networks - CSCI152 PO
  • Machine Learning - CSCI158 PO
  • Natural Language Processing - CSCI159 PO
  • Real-Time Graphics and Game Engine Programming - CSCI181G PO
  • Advanced Functional Programming - CSCI181N PO
  • Computer Organization and Design - CSCI181OR PO
  • Graph Algorithm and Application - CSCI181Q PO
  • System Security - CSCI181S PO
  • Managing Complex Systems Lab - CSCI181SL PO
  • Principles of Programming Languages: Object-Oriented - CSCI181V PO
  • Data Analytics & Visualization - CSCI181AP HM
  • Applied Econometrics - ECON107 PO
  • Economics of Sports - ECON118
  • Economics of Crime - ECON120
  • Data Science & Stats Learning - ECON122 CM
  • Poverty and Income Distribution - ECON122 PO
  • Econometrics I - ECON125 CM
  • Empirical Methods of Industrial Organization - ECON132
  • Labor Economics - ECON150 PO
  • Econometrics - ECON167 PO
  • Advanced Econometrics - ECON169 PO
  • Corpus Linguistics - LGCS124 PO
  • Computational Linguistics / NLP - LGCS129 PO
  • Topics in Quantitative Linguistics - LGCS181 PO
  • Computation and Experimentation in Mathematics - MATH061 PO
  • Methods in Biostatistics - MATH150 PO
  • Bayesian Statistics - MATH153 PO / HM
  • Computational Statistics - MATH154 PO
  • Time Series - MATH155 PO / HM
  • Statistical Linear Models - MATH158 PO / HM
  • Mathematical Modeling - MATH183 PO / SC and MATH185 SC
  • Introduction to Computer Music - MUS088 HM
  • Neuroimaging with fMRI w/ Lab - NEUR118 PO
  • Introduction to Computational Neuroscience - NEUR133L KS
  • Selected Topics in Computational Neuroscience - NEUR155L KS
  • Machine Learning w Neural Signal - NEUR182 SC / PSYCH182 SC
  • Computational Phys/Engineering - PHYS100 KS
  • Programming for Science+Engineer - PHYS108 KS
  • Advanced Statistics: Psychometrics and Multivariate Methods - PSYCH137 CM
  • Emotion & Motivation w/ Lab - PSYC163 PO

Approved upper division electives without programming

If your introductory statistics course included a substantial component of coding/programming with data or coding/programming algorithms, then the upper division elective can be fulfilled with a course that does not include programming.

  • Usable Security and Privacy - CSCI181W PO
  • Chinese Language in Society - CHIN150 PO
  • Chinese Language and Gender - CHIN153 PO
  • GIS for Geologists - GEOL189G PO
  • Introduction to Digital Humanities, Women and Politics in Latin America - HIST101M PO
  • Probability Theory - MATH151
  • Statistical Theory - MATH152
  • Advanced Linear Algebra - MATH173 PO

Reuse

CC-BY-SA-4.0