We will predict Employee Attrition using Artificial Neural Networks.
Table of Contents
- Data Preprocessing
- Create ANN
- Make predictions
- Evaluate - Improve - Tune ANN
Part 1: Data Preprocessing
# Get the data from GitHub and save it in the current working directory
!wget https://raw.githubusercontent.com/dhruvpratapsingh/Deep-Learning/master/SupervisedDL/ANN/employee_attrition/Employee-Attrition.csv
If you click on the “Files” tab in the left panel, you should see the .csv file.
Intuition for Cleaning the data
1. Remove columns that have the same value in every row
StandardHours
Over18
2. Remove the columns with PII (personally identifiable information)
EmployeeID
3. Move y to the first or the last column to make slicing easier (optional)
Remember that Python index slicing includes the start index and excludes the end index.
Use the first index, i.e. 0, as y (dependent variable) and 1-32 as X (independent variables).
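A minimal sketch of the three cleaning steps, assuming pandas. The small DataFrame is invented stand-in data so the snippet runs without the CSV; only the column names StandardHours, Over18 and EmployeeID come from the text above, and Attrition as the target comes from the encoding list further down. With the real file you would start from pd.read_csv("Employee-Attrition.csv") instead.

```python
import pandas as pd

# Invented stand-in rows; the real data comes from Employee-Attrition.csv.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Attrition": ["Yes", "No", "Yes", "No"],
    "EmployeeID": [1, 2, 4, 5],
    "Over18": ["Y", "Y", "Y", "Y"],
    "StandardHours": [80, 80, 80, 80],
    "MonthlyIncome": [5993, 5130, 2090, 2909],
})

# 1. Drop columns that hold a single value in every row.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=constant_cols)

# 2. Drop the identifier column (PII).
df = df.drop(columns=["EmployeeID"])

# 3. Move the target to the first column so slicing is simple.
df = df[["Attrition"] + [c for c in df.columns if c != "Attrition"]]

# Slicing includes the start index and excludes the end index.
y = df.iloc[:, 0].values   # column 0 only
X = df.iloc[:, 1:].values  # column 1 through the last column

print(X.shape, y.shape)
```

Detecting constant columns with nunique() is one option; naming them explicitly in drop() works just as well when you already know which ones they are.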
SOME TRICKS
- ‘print(y)’ to see the y vector.
- ‘X.shape’ to see the shape of the matrix.
- We use uppercase X because it is a matrix and lowercase y because it is a vector.
Encoding categorical data
Categorical columns have to be encoded because the algorithms work best with numbers.
Set the display max columns to see all columns.
Columns that need to be encoded:
- Attrition y[0]
- BusinessTravel X[1]
- Department X[3]
- EducationField X[6]
- Gender X[9]
- JobRole X[13]
- MaritalStatus X[15]
- OverTime X[19]
The first 5 rows look something like this; the rest of the columns are not in the image.
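The encoding step can be sketched as follows. This is one possible approach using pandas only, not necessarily the encoder the original code used. The rows are invented, and only three of the categorical columns listed above are included to keep the example short.

```python
import pandas as pd

# Show every column when printing a DataFrame.
pd.set_option("display.max_columns", None)

# Invented stand-in rows with a few of the categorical columns.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "Yes", "No"],
    "Age": [41, 49, 37, 33],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently",
                       "Travel_Rarely", "Non-Travel"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "OverTime": ["Yes", "No", "Yes", "Yes"],
})

# Target: map Yes/No to 1/0.
y = (df["Attrition"] == "Yes").astype(int).values

# Features: one-hot encode each categorical column.
# drop_first=True removes one dummy per column to avoid redundant columns.
categorical = ["BusinessTravel", "Gender", "OverTime"]
X_df = pd.get_dummies(
    df.drop(columns=["Attrition"]),
    columns=categorical,
    drop_first=True,
    dtype=int,
)
X = X_df.values

print(X_df.head())
```

One-hot encoding avoids implying an order between categories such as departments or job roles, which a plain integer label would do.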
Split data into training and test sets
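A minimal sketch of the split, assuming scikit-learn. The 80/20 ratio, the random seed and the random stand-in data are assumptions for illustration; the text does not state which values were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data: 100 rows, 5 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the rows for testing (assumed ratio).
# stratify=y keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```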
Feature Scaling
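A minimal sketch of feature scaling with scikit-learn's StandardScaler, on invented numbers. The point it illustrates is that the scaler is fitted on the training set only and then reused on the test set, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented values: two features on very different scales.
X_train = np.array([[30.0, 2000.0],
                    [40.0, 4000.0],
                    [50.0, 6000.0]])
X_test = np.array([[45.0, 5000.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = sc.transform(X_test)        # reuse the same mean/std

print(X_train_scaled)
print(X_test_scaled)
```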
Part 2: Create Artificial Neural Network
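A sketch of what a small network for this binary classification task could look like. It assumes Keras (via TensorFlow), which the text does not name explicitly; the layer sizes, the helper name build_classifier and the random stand-in data are all hypothetical. The accuracy figures reported below come from the original run on the real dataset, not from this sketch.

```python
import numpy as np
from tensorflow import keras

# Random stand-in for the scaled training data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10)).astype("float32")
y_train = rng.integers(0, 2, size=200)

def build_classifier(n_features, optimizer="adam"):
    """Hypothetical helper: two hidden layers and a sigmoid output."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # probability of attrition
    ])
    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

classifier = build_classifier(X_train.shape[1])
classifier.fit(X_train, y_train, batch_size=25, epochs=5, verbose=0)

# Make predictions: probabilities first, then a 0.5 threshold for the class.
y_prob = classifier.predict(X_train[:5], verbose=0)
y_pred = (y_prob > 0.5).astype(int).ravel()
print(y_pred)
```

The sigmoid output paired with binary cross-entropy is the usual choice when the target has two classes, as Attrition does here.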
The result: we got about 92.4% accuracy at best.
0.9237288095183291
0.8596697045390578
0.030821566189208408
Tune parameters using GridSearchCV (Cross Validation)
To find the best parameters, we give grid_search a set of values to test and it finds the best combination.
Resulting best parameters:
{'batch_size': 25, 'epochs': 100, 'optimizer': 'adam'}
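The search described above can be sketched with scikit-learn's GridSearchCV. To keep the example self-contained, it swaps in scikit-learn's MLPClassifier for the network rather than the model used in the original run, so the parameter names differ slightly: batch_size maps directly, max_iter stands in for epochs, and solver stands in for optimizer. The candidate values and the stand-in data are assumptions.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Short training runs may not converge; that is fine for a sketch.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Random stand-in data with a learnable pattern.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Every combination of these values is tried (2 x 2 x 2 = 8 candidates).
param_grid = {
    "batch_size": [25, 32],
    "max_iter": [50, 100],      # stands in for epochs
    "solver": ["adam", "sgd"],  # stands in for optimizer
}

grid_search = GridSearchCV(
    estimator=MLPClassifier(hidden_layer_sizes=(16, 16), random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,  # 5-fold cross validation for each candidate
)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(grid_search.best_score_)
```

Each candidate is scored by cross validation, so the grid grows expensive quickly; keeping the lists short is a practical way to bound the run time.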
You can find the complete Python code here.
Please let me know if you have any questions or suggestions in the comments section below. Thanks.