Machine Learning for Data Analysis

Coursera (CC)
Provider rating: Coursera (CC) has an average rating of 7.2 (out of 6 reviews)

Description

When you enroll in courses through Coursera, you can choose between a paid plan and a free plan:

  • Free plan: No certification and/or audit only. You will have access to all course materials except graded items.
  • Paid plan: Commit to earning a Certificate, a trusted, shareable way to showcase your new skills.

About this course: Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

Created by: Wesleyan University
  • Taught by: Jen Rose, Research Professor, Psychology
  • Taught by: Lisa Dierker, Professor, Psychology

Basic Info

  • Course: 4 of 5 in the Data Analysis and Interpretation Specialization
  • Language: English
  • How To Pass: Pass all graded assignments to complete the course.
  • User Ratings: 4.0 stars (average user rating)

Coursework

Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.

Help from your peers

Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.

Certificates

Earn official recognition for your work, and share your success with friends, colleagues, and employers.

Wesleyan University

At Wesleyan, distinguished scholar-teachers work closely with students, taking advantage of fluidity among disciplines to explore the world with a variety of tools. The university seeks to build a diverse, energetic community of students, faculty, and staff who think critically and creatively and who value independence of mind and generosity of spirit.

Syllabus


WEEK 1


Decision Trees



In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, the variables and interactions that are most important in predicting the target (response) variable. Decision trees segment the data into subgroups by applying a series of simple rules or criteria over and over again, each time choosing the variable constellations that best predict the target variable.
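As a rough illustration of one such "simple rule", the sketch below searches for the single binary split that best separates a target variable. It is not the course's SAS or Python code. It assumes Gini impurity as the splitting criterion (one common choice), and the toy data and names (`gini`, `best_split`, the age and exercise columns) are hypothetical. A full tree would repeat this search inside each resulting subgroup.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    Each candidate rule has the form "feature <= threshold"; the rule with the
    lowest size-weighted impurity of the two resulting subgroups wins.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a rule that sends everyone one way is not a split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical toy data: [age, weekly_hours_of_exercise] -> smoker (1) or not (0)
rows = [[15, 1], [16, 5], [17, 2], [18, 6], [19, 1], [20, 7]]
labels = [1, 0, 1, 0, 1, 0]

split = best_split(rows, labels)
print(split)  # feature 1 (exercise) with threshold 2 separates the classes
```

In this toy data, age alternates between the two classes, so the search settles on the exercise variable instead; that is the sense in which a tree "selects" important variables.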


7 videos, 15 readings


  1. Reading: Some Guidance for Learners New to the Specialization
  2. Reading: SAS or Python - Which to Choose?
  3. Reading: Getting Started with SAS
  4. Reading: Getting Started with Python
  5. Reading: Course Codebooks
  6. Reading: Course Data Sets
  7. Reading: Uploading Your Own Data to SAS
  8. Reading: Data Set for Decision Tree Videos (tree_addhealth.csv)
  9. Video: What Is Machine Learning?
  10. Video: Machine Learning and the Bias Variance Trade-Off
  11. Video: What Is a Decision Tree?
  12. Video: What is the Process of Growing a Decision Tree?
  13. Reading: SAS Code: Decision Trees
  14. Reading: CART Paper - Prevention Science
  15. Video: Building a Decision Tree with SAS
  16. Video: Strengths and Weaknesses of Decision Trees in SAS
  17. Reading: Python Code: Decision Trees
  18. Video: Building a Decision Tree with Python
  19. Reading: Installing Graphviz and pydotplus
  20. Reading: Getting Set up for Assignments
  21. Reading: Tumblr Instructions
  22. Reading: Assignment Example

Graded: Running a Classification Tree

WEEK 2


Random Forests



In this session, you will learn about random forests, a type of data mining algorithm that can select, from among a large number of variables, those that are most important in determining the target (response) variable. Unlike the results of a single decision tree, the results of random forests generalize well to new data.
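The idea behind "growing" a forest can be sketched in a few lines: fit many small trees, each on a bootstrap sample of the observations and a randomly chosen variable, then let them vote. The code below is a deliberately simplified illustration, not the course's code. It assumes one-split trees (stumps) and one random variable per tree, whereas real random forests grow deeper trees and draw a random subset of variables at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on one feature: (feature, threshold, left, right)."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random variable choice
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical toy data with two well-separated classes
rows = [[1, 1], [2, 2], [3, 1], [2, 3], [8, 9], [9, 8], [7, 9], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, [0, 0]), predict(forest, [10, 10]))
```

Because each tree sees a different resample of the data, no single unusual observation dominates the vote, which is the intuition behind the better generalization claimed above.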


4 videos, 4 readings


  1. Video: What Is A Random Forest and How Is It "Grown"?
  2. Reading: SAS code: Random Forests
  3. Reading: The HPForest Procedure in SAS
  4. Video: Building a Random Forest with SAS
  5. Reading: Python Code: Random Forests
  6. Video: Building a Random Forest with Python
  7. Video: Validation and Cross-Validation
  8. Reading: Assignment Example

Graded: Running a Random Forest

WEEK 3


Lasso Regression



Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes regression coefficients for some variables to shrink toward zero. Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both.

In this session, you will apply and interpret a lasso regression analysis. You will also develop experience using k-fold cross-validation to select the best-fitting model and obtain a more accurate estimate of your model’s test error rate.

To test a lasso regression model, you will need to identify a quantitative response variable from your data set if you haven’t already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important.

Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets. The cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations.
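The shrinkage behavior described above can be seen in a small sketch. The code below is not the course's code. It assumes the penalized form of the lasso, minimizing (1/(2n)) · Σ(y − Xb)² + alpha · Σ|b|, solved by coordinate descent with soft-thresholding, and it assumes centered data with no intercept. The names (`soft_threshold`, `lasso_coordinate_descent`, `alpha`) and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Estimate lasso coefficients by cycling through one predictor at a time.

    Assumes centered data (no intercept) and no all-zero predictor column.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlate predictor j with the residual that ignores predictor j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coef[j] = soft_threshold(rho, alpha) / scale
    return coef


# Hypothetical toy data: y depends strongly on the first predictor (slope 2.0)
# and only weakly on the second (slope 0.1). The two columns are uncorrelated.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]

coef = lasso_coordinate_descent(X, y, alpha=0.5)
print(coef)  # first coefficient shrinks from 2.0 to about 1.5; second is 0.0
```

The weak predictor's coefficient is set exactly to zero, which is what excludes it from the model, while the strong predictor is kept but shrunk. In practice the penalty `alpha` would be chosen by k-fold cross-validation rather than fixed by hand.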


5 videos, 3 readings


  1. Video: What is Lasso Regression?
  2. Reading: SAS Code: Lasso Regression
  3. Video: Testing a Lasso Regression with SAS
  4. Video: Data Management for Lasso Regression in Python
  5. Video: Testing a Lasso Regression Model in Python
  6. Reading: Python Code: Lasso Regression
  7. Video: Lasso Regression Limitations
  8. Reading: Assignment Example

Graded: Running a Lasso Regression Analysis

WEEK 4


K-Means Cluster Analysis



Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included.

In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience in interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.

You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether your clusters differ significantly on this response variable using statistical methods, such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools).

Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
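The partitioning described above can be sketched with the standard k-means iteration: assign each observation to its nearest cluster center, recompute each center as the mean of its members, and repeat until the assignments stop changing. The code below is a minimal illustration, not the course's code. It assumes squared Euclidean distance and, for reproducibility, starts from the first k observations, whereas practical implementations typically use random or smarter starting points. The names and toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Return (labels, centroids) after iterating assignment and update steps."""
    centroids = [list(p) for p in points[:k]]  # simplistic start: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two clustering variables, two visibly separate groups
points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9.5], [8.5, 9]]

labels, centroids = k_means(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # per-cluster means of the clustering variables
```

The returned centroids are the clustering variable means for each cluster, which is the information used to describe cluster profiles. Validation against a held-out response variable (for example with analysis of variance) would be a separate step.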


6 videos, 3 readings


  1. Video: What Is a k-Means Cluster Analysis?
  2. Video: Running a k-Means Cluster Analysis in SAS, pt. 1
  3. Video: Running a k-Means Cluster Analysis in SAS, pt. 2
  4. Reading: SAS Code: k-Means Cluster Analysis
  5. Reading: Python Code: k-Means Cluster Analysis
  6. Video: Running a k-Means Cluster Analysis in Python, pt. 1
  7. Video: Running a k-Means Cluster Analysis in Python, pt. 2
  8. Video: k-Means Cluster Analysis Limitations
  9. Reading: Assignment Example

Graded: Running a k-means Cluster Analysis