## The datatable function from the DT package creates an HTML widget that displays the dataset
## Install the DT package if it is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |>
DT::datatable()
Instructions
Welcome to the midterm project exercise! In this exercise, you will work on a project that involves analyzing employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition and provide recommendations for HR interventions that could help reduce attrition and improve overall employee satisfaction and performance.
Download the project files using this link: Midterm project files. Be sure to download the entire folder midterm-econ148-project-exercise. The project files include the dataset and a template Quarto markdown file that you will use to complete the exercise.
First, rename the lastname-firstname.qmd file to your actual last name and first name. For example, if your name is John Doe, the file should be named doe-john.qmd. Afterward, open the R project 20241007-econ148-project-exercise-1.Rproj by double-clicking the file and render the Quarto markdown file to generate the HTML file.
Project overview
In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition.
The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.
Data wrangling and management
Libraries
Before we start working on the dataset, we need to load the libraries that will be used for data wrangling, analysis, and visualization. Load all of them here. To install a missing package, use the install.packages function. Some packages are introduced later in this project, so install them as needed and load them here.
# load all your libraries here
Data importation
Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.
Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more information about the left_join function here.
Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from the DT package.
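As a rough illustration of what left_join does, here is a minimal sketch on two invented toy tables. The names employee_toy, rating_toy, and merged_toy and all values are hypothetical; only the EmployeeID key and the left_join call come from the instructions above.

```r
library(dplyr)

# Toy stand-ins for the two files (hypothetical values, far fewer columns)
employee_toy <- data.frame(
  EmployeeID = c("E1", "E2", "E3"),
  Department = c("Sales", "Technology", "Sales")
)
rating_toy <- data.frame(
  EmployeeID = c("E1", "E1", "E3"),
  SelfRating = c(4, 3, 5)
)

# left_join keeps every row of the left table. E2 has no rating, so its
# SelfRating is NA; E1 has two ratings, so it appears twice.
merged_toy <- left_join(employee_toy, rating_toy, by = "EmployeeID")
merged_toy
```

If the rating file holds several rows per employee, the merged dataset will have more rows than the employee file, which is worth keeping in mind when computing rates later.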
## import the two data here
## merge employee_dta and perf_rating_dta using left_join function.
## save the merged dataset as hr_perf_dta
## Use the datatable from DT package to display the merged dataset
Data management
Using the clean_names function from the janitor package, standardize the variable names by using the recommended naming of variables.
Save the renamed variables as hr_perf_dta to update the dataset.
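To see the effect of clean_names before applying it to the full dataset, a small sketch on a hypothetical two-column data frame may help. The object toy_names and its values are invented; the snake_case results match the variable names used later in this exercise.

```r
library(janitor)

# Hypothetical data frame with CamelCase column names
toy_names <- data.frame(
  JobSatisfaction = c(3, 4),
  WorkLifeBalance = c(2, 5)
)

# clean_names converts column names to snake_case by default
toy_names <- clean_names(toy_names)
names(toy_names)
```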
## clean names using the janitor package and save as hr_perf_dta
## display the renamed hr_perf_dta using datatable function
Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.
Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.
Create new variables cat_work_life_balance, cat_self_rating, and cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.
Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0 and Yes = 1.
Save all the changes in hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.
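The recoding pattern can be sketched on a toy data frame as follows. The object toy_recode and its values are hypothetical, and using if_else for the binary variable is only one possible choice; case_when would work as well.

```r
library(dplyr)

# Hypothetical rows standing in for hr_perf_dta
toy_recode <- data.frame(
  education = c(1, 3, 5),
  attrition = c("No", "Yes", "No")
)

toy_recode <- toy_recode |>
  mutate(
    # each condition on the left maps to the label on the right
    cat_education = case_when(
      education == 1 ~ "No formal education",
      education == 2 ~ "High school",
      education == 3 ~ "Bachelor",
      education == 4 ~ "Masters",
      education == 5 ~ "Doctorate"
    ),
    # numeric indicator: Yes becomes 1, anything else becomes 0
    bi_attrition = if_else(attrition == "Yes", 1, 0)
  )
toy_recode
```

Values that match none of the conditions in case_when become NA, which makes unexpected codes easy to spot.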
## create cat_education
## create cat_envi_sat, cat_job_sat, and cat_relation_sat
## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
## create bi_attrition
## print the updated hr_perf_dta using datatable function
Exploratory data analysis
Descriptive statistics of employee attrition
Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.
Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate, group the dataset by job role. Afterward, you can use the count function to get the frequency of attrition for each job role and then divide it by the total number of observations. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.
The plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!
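The group, count, divide, and ungroup steps can be sketched on invented data as below. The object names toy_roles and rate_toy and all values are hypothetical. This sketch divides by the number of employees within each job role, which is one reading of "total number of observations"; dividing by the overall row count would give a different figure.

```r
library(dplyr)

# Hypothetical employees: 4 analysts (1 left), 2 engineers (1 left)
toy_roles <- data.frame(
  job_role  = c("Analyst", "Analyst", "Analyst", "Analyst", "Engineer", "Engineer"),
  attrition = c("Yes", "No", "No", "No", "Yes", "No")
)

rate_toy <- toy_roles |>
  group_by(job_role) |>
  count(attrition) |>                       # frequency of Yes/No per job role
  mutate(pct_attrition = n / sum(n)) |>     # share within each job role
  ungroup()                                 # drop grouping before storing
rate_toy
```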
## selecting attrition key variables and save as `attrition_key_var_dta`
## compute the attrition rate across job_role and save as attrition_rate_job_role
## print attrition_rate_job_role
## Plot the attrition rate
Analysis of compensation and turnover
Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.
Install the report package and use the report function to generate a report of the t-test results.
Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.
Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.
Provide recommendations on whether revising compensation policies could be an effective retention strategy.
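For orientation, a two-sample t-test with the formula interface might look like the sketch below. The data are simulated, so the numbers say nothing about the real dataset; toy_salary and toy_ttest are hypothetical names. Note that t.test runs Welch's unequal-variance test by default.

```r
set.seed(148)

# Simulated salaries: 30 stayers (0) and 30 leavers (1), invented means
toy_salary <- data.frame(
  bi_attrition = rep(c(0, 1), each = 30),
  salary = c(rnorm(30, mean = 60000, sd = 8000),
             rnorm(30, mean = 52000, sd = 8000))
)

# outcome ~ group: compares mean salary between the two attrition groups
toy_ttest <- t.test(salary ~ bi_attrition, data = toy_salary)

toy_ttest$estimate  # group means
toy_ttest$p.value   # p-value of the difference
```

The stored result is an object of class htest, which is the kind of object the report function accepts.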
## compare the average monthly income of employees who left and those who stayed
## print the results of the t-test
## install the report package and use the report function to generate a report of the t-test results
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed
# create histogram and frequency polygon of salary for employees who left and those who stayed
Provide your discussion here.
Employee satisfaction and performance analysis
Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and count functions to calculate the average performance ratings for each group.
Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.
Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.
Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.
Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.
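One way to obtain a per-group average is sketched below on invented data. It uses summarise with mean, which is an assumption on my part: count returns frequencies, so a mean needs an aggregating function. The names toy_ratings and avg_ratings are hypothetical, and the rating columns use the cleaned snake_case names.

```r
library(dplyr)

# Hypothetical ratings for two stayers (0) and two leavers (1)
toy_ratings <- data.frame(
  bi_attrition   = c(0, 0, 1, 1),
  self_rating    = c(4, 5, 3, 2),
  manager_rating = c(4, 4, 3, 3)
)

avg_ratings <- toy_ratings |>
  group_by(bi_attrition) |>
  summarise(
    avg_self_rating    = mean(self_rating),
    avg_manager_rating = mean(manager_rating)
  )
avg_ratings
```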
# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.
# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.
# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
Provide your discussion here.
Work-life balance and retention strategies
At this point, you are already familiar with the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:
Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.
Use visualizations to show the differences.
Assess whether employees with poor work-life balance are more likely to leave.
You are free to choose how to accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.
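As one possible starting point, the share of each work-life balance rating within the two attrition groups can be tabulated with base R. The data below are invented, toy_wlb and the other object names are hypothetical, and treating ratings of 2 or lower as "poor" is an assumption, not a definition from the dataset.

```r
# Hypothetical ratings for four stayers (0) and four leavers (1)
toy_wlb <- data.frame(
  bi_attrition      = c(0, 0, 0, 0, 1, 1, 1, 1),
  work_life_balance = c(4, 3, 4, 5, 1, 2, 2, 4)
)

# Column-wise proportions: each attrition group sums to 1
wlb_share <- prop.table(
  table(toy_wlb$work_life_balance, toy_wlb$bi_attrition),
  margin = 2
)
wlb_share

# Share of leavers and stayers with a rating of 2 or lower (assumed cutoff)
poor_share_left   <- mean(toy_wlb$work_life_balance[toy_wlb$bi_attrition == 1] <= 2)
poor_share_stayed <- mean(toy_wlb$work_life_balance[toy_wlb$bi_attrition == 0] <= 2)
```

Comparing the two shares gives a first descriptive answer; a plot or a formal test would be needed to support a recommendation.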