Business Analytics

1.     A data analyst working in your group reports to you that he is developing a k-NN model to predict the yearly sales to potential new customers, and that he is currently using a k-fold cross-validation procedure to settle on the best value for k. Explain to your boss, who has a marketing background rather than a technical one, how a k-NN model works, what cross-validation is, and why the analyst shouldn’t just pick an arbitrary value for k.

2.     How do we prevent Euclidean distance from being biased by attributes with wide ranges of values?

3.     How does a weighted (“similarity-moderated”) k-NN model differ from a model in which the predictions are classifications based on majority vote or predictions based on the average of target values of the k nearest neighbors?

4.     A data analyst working for you reports that she has decided to use a standard OLS linear regression model instead of a weighted k-NN regression model, even though the k-NN regression model seems to be providing slightly higher accuracy on the holdout test set, because she believes that you will have to explain the model in great detail in order to get support from important stakeholders within the organization. What does she mean? Do you agree with her?

5.     Why would we want to perform some type of feature selection before building a k-NN model? Name one feature selection method for categorical target variables and one for numeric target variables.

6.     What is the difference between a characteristic description (of a cluster) and a differential description?

7.     How can a tree model be used to develop good differential cluster descriptions?

8.     What is the difference between the R functions lapply() and vapply()? Structure your answer around the advantages and disadvantages of each.

9.     Which R function would you use to randomly draw some numbers from the standard normal distribution?

10.     Explain what each line of the following R code is doing, and describe a situation where we might want to do this.

churn_data <- read.csv('churn.csv')

idx <- sample(1:nrow(churn_data), 0.80*nrow(churn_data))

churn_train <- churn_data[idx,]

churn_test <- churn_data[-idx,]
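
The file churn.csv is not supplied with the question, so the snippet above cannot be run as it stands. One way to experiment with it is the minimal sketch below, which swaps in a small made-up data frame. The column names `tenure` and `churned`, the 100-row size, and the seed value are all assumptions for illustration, not part of the original exercise. It also adds `set.seed()` so that the random draw is repeatable, and uses `floor()` to make the sample size an explicit whole number.

```r
# Hypothetical stand-in for churn.csv: 100 rows of made-up data.
# The columns 'tenure' and 'churned' are invented for this sketch.
set.seed(42)  # fix the random seed so the same rows are drawn on every run
churn_data <- data.frame(
  tenure  = rpois(100, lambda = 24),
  churned = sample(c("yes", "no"), 100, replace = TRUE)
)

# Number of rows to draw, as an explicit integer
n_train <- floor(0.80 * nrow(churn_data))

# Row numbers drawn at random, without replacement
idx <- sample(seq_len(nrow(churn_data)), n_train)

churn_train <- churn_data[idx, ]   # the rows that were drawn
churn_test  <- churn_data[-idx, ]  # every row that was not drawn

# Check the sizes of the two pieces
nrow(churn_train)
nrow(churn_test)
```

Because `sample()` draws without replacement by default, no row number appears twice in `idx`, so the two data frames do not overlap and together account for every row of `churn_data`.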
