At the invitation of Dr. Mirjana Kljajić, we participated in the 32nd Bled eConference. The conference is one of the most important in the region on trends and technologies for electronic communication, and this year its programme included a short workshop on Data Science with Orange.
We covered many topics, including data loading, visualization, constructing a predictive modeling workflow, exploring decision trees, overfitting, model scoring and, finally, using a cross-validated model to make predictions on a new data set. A classic exercise for such courses uses the Attrition - Train and Attrition - Predict data sets from the Datasets widget and asks the following questions:
- Find one or two interesting variables in a Box Plot and describe what they show.
- Use all known models in Test and Score. Which one has the highest classification accuracy?
- Which three variables are the most important predictors for Logistic Regression?
- On a new data set, identify the person who is most likely to quit.
- What would be your recommendations to the HR department, considering the Nomogram?
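
For readers who prefer scripting, the modeling questions above can be roughly sketched in Python. This is only an illustration, not the workshop's workflow: it assumes scikit-learn instead of Orange widgets, uses synthetic data as a stand-in for the Attrition data sets, and treats class 1 as "quits". The feature names and the held-out `X_new` rows are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the attrition data (NOT the real data set).
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a few rows to play the role of the "new" data to predict on.
X_train, X_new, y_train, _ = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Rough analogue of Test and Score: 10-fold cross-validated accuracy.
accuracy = {
    name: cross_val_score(model, X_train, y_train,
                          cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}
best_name = max(accuracy, key=accuracy.get)

# Three most important predictors for logistic regression, ranked by the
# absolute value of coefficients fitted on standardized features.
logreg = models["Logistic Regression"].fit(X_train, y_train)
coefs = logreg.named_steps["logisticregression"].coef_[0]
top3 = [feature_names[i] for i in np.argsort(-np.abs(coefs))[:3]]

# Row in the new data with the highest predicted probability of class 1
# (assumed here to mean "quits").
proba_quit = logreg.predict_proba(X_new)[:, 1]
most_likely = int(np.argmax(proba_quit))

print("Accuracy:", accuracy)
print("Best model:", best_name)
print("Top three predictors:", top3)
print("Most likely to quit: row", most_likely)
```

Ranking by coefficient size is only meaningful here because the features are standardized first; in Orange, the Nomogram widget offers a visual way to read the same kind of information.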