Fundamentals of Data Science for Materials Scientists


Course Overview

Instructor: Prof. Cynthia Rudin

This is an overview course: introductory in scope, but taught at an advanced level. It covers standard techniques, such as the perceptron algorithm, decision trees, random forests, boosting, support vector machines and reproducing kernel Hilbert spaces, regression, K-means, Gaussian mixture models and EM, neural networks, and multi-armed bandits.

General topics include:

  • Basic machine learning evaluation techniques, including ROC curves and cross validation
  • Top 10 algorithms in data mining (including optimization and ensemble methods)
  • Statistical learning theory
  • Introductory online learning - mistake bounds, multi-armed bandits
  • Bayesian Methods in ML (Gaussian mixture models, Bayesian and frequentist interpretations)


Prerequisites: fluency with basic skills such as linear algebra, analysis (including proofs), advanced probability, and programming.

Data Science and Machine Learning for Applied Science and Engineering (ME 555-09/CEE 690)

Course Overview

Instructors: Prof. Jonathan Holt & Prof. David Carlson

The volume of data generated in nearly all scientific domains grows every year, driven by advances in technology and by data collection efforts, and making sense of this deluge is increasingly challenging.

In this special topics course, you will learn techniques for making sense of these large datasets and interpreting the results, with the aim of building scientific understanding and supporting effective engineering. You will be introduced to concepts from data science and machine learning that apply to a wide range of problems, with specific examples drawn from materials science, environmental, and health data. The primary focus will be on machine learning techniques and how to incorporate them into statistical data analysis. The treatment of these concepts will emphasize interpretation and evaluation rather than mathematical proof techniques. There will also be discussions of survey data and applications to time-series processes.


This course is designed for STEM graduate students with some experience in introductory probability and statistics, linear algebra, and basic coding in a scripting language (e.g., Python, R, MATLAB).

The course materials will overlap significantly with other machine learning and data science courses; if you have previously taken a graduate-level course on these topics, this course may be redundant for you.