📚 Importing Libraries

📥 Load Dataset

📝 About Dataset:

Context

This dataset contains 119,390 observations for a City Hotel and a Resort Hotel. Each observation represents a hotel booking made between 1 July 2015 and 31 August 2017, including bookings that actually arrived and bookings that were canceled.

Content

Since this is real hotel data, all data elements pertaining to hotel or customer identification were deleted. Four columns, 'name', 'email', 'phone number' and 'credit_card', have been artificially created and added to the dataset.

Acknowledgements

The data is originally from the article Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

| No. | Column name | Meaning |
|---|---|---|
| 1 | hotel | The dataset contains booking information for two hotels: one is a resort hotel and the other is a city hotel. |
| 2 | is_canceled | Value indicating whether the booking was canceled (1) or not (0). |
| 3 | country | Country of origin. |
| 4 | market_segment | Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”. |
| 5 | adr | Average Daily Rate, calculated by dividing the sum of all lodging transactions by the total number of staying nights. |
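The ADR definition above is a single division. A minimal worked sketch, assuming made-up nightly charges; the helper name `average_daily_rate` is hypothetical and not part of the dataset or the original notebook:

```python
def average_daily_rate(lodging_transactions, staying_nights):
    """ADR = sum of all lodging transactions / total number of staying nights."""
    # Hypothetical guard: the division is undefined for a zero-night booking.
    if staying_nights == 0:
        raise ValueError("ADR is undefined for a booking with zero staying nights")
    return sum(lodging_transactions) / staying_nights


# Hypothetical 3-night stay with nightly charges of 90, 100 and 110.
print(average_daily_rate([90.0, 100.0, 110.0], 3))  # 100.0
```

Raising on zero nights is one design choice; the real dataset may encode such bookings differently, so check before applying this to it.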

Basic info about the dataset

🧹️ Data cleaning:

Removed unwanted columns:

Our dataset contains the features name, email, phone-number and credit_card. A company would never share its customers' personal information, and these columns were artificially generated anyway, so we delete them.
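Dropping the personal-information columns is a single `DataFrame.drop` call. A minimal sketch, assuming a tiny hypothetical sample in place of the real bookings file and the column names exactly as written in the text:

```python
import pandas as pd

# Hypothetical two-row sample standing in for the real bookings DataFrame.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1],
    "name": ["A. Smith", "B. Jones"],
    "email": ["a@example.com", "b@example.com"],
    "phone-number": ["555-0100", "555-0101"],
    "credit_card": ["************1234", "************5678"],
})

# Drop the artificially generated personal-information columns.
pii_columns = ["name", "email", "phone-number", "credit_card"]
df = df.drop(columns=pii_columns)

print(df.columns.tolist())  # ['hotel', 'is_canceled']
```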

Changed DataType:

reservation_status_date was stored as object (string) type, so it was converted to datetime.
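The conversion can be done with `pd.to_datetime`. A minimal sketch, assuming the dates are stored as ISO-style strings (the sample values are hypothetical; pass an explicit `format=` if the real file uses another layout):

```python
import pandas as pd

# Hypothetical string dates standing in for the real column.
df = pd.DataFrame({"reservation_status_date": ["2015-07-01", "2016-02-29", "2017-08-31"]})
print(df["reservation_status_date"].dtype)  # a string/object dtype before conversion

# Parse the strings into pandas datetimes; pandas infers the format here.
df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"])
print(df["reservation_status_date"].dt.year.tolist())  # [2015, 2016, 2017]
```

Once converted, the `.dt` accessor (for example `.dt.month`) becomes available for time-based grouping.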

Handling Missing Values:

➥ The agent and company columns have a very large share of missing values that cannot be reliably filled, so I have deleted those columns.
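One way to check how much is missing before deciding to drop a column is `isna().mean()`, which gives the missing share per column. A minimal sketch on a hypothetical sample (the values are made up; only the column names come from the text):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with heavy gaps in agent and company.
df = pd.DataFrame({
    "agent": [9.0, np.nan, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "country": ["PRT", "GBR", np.nan, "PRT"],
    "children": [0.0, 1.0, np.nan, 0.0],
})

# Fraction of missing values per column (0.0 = complete, 1.0 = all missing).
missing_share = df.isna().mean()
print(missing_share)

# Drop the two columns that are mostly empty.
df = df.drop(columns=["agent", "company"])
```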

➥ As you can see, the most frequent country is PRT, so I'm going to fill the missing country values with PRT, and fill the missing children values with the median.
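Mode imputation for a categorical column and median imputation for a numeric one can be sketched with `fillna`. This assumes a small hypothetical sample; on the real data the mode should come out as PRT, as stated above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing country and one missing children value.
df = pd.DataFrame({
    "country": ["PRT", "GBR", np.nan, "PRT", "ESP"],
    "children": [0.0, 2.0, np.nan, 0.0, 1.0],
})

# Most frequent country (mode() returns a Series; take its first entry).
most_common_country = df["country"].mode()[0]
df["country"] = df["country"].fillna(most_common_country)

# Median of the non-missing children values: median of [0, 2, 0, 1] = 0.5.
df["children"] = df["children"].fillna(df["children"].median())
```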

Handling Outliers:

➥ As you can see, the data is right-skewed and a large outlier is present.
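The text does not say which rule was used to remove the outlier. As one common option (a swapped-in technique, not necessarily the author's), here is the 1.5 × IQR rule, assuming the skewed column is adr and using hypothetical values:

```python
import pandas as pd

# Hypothetical ADR values with one extreme outlier.
df = pd.DataFrame({"adr": [50.0, 80.0, 95.0, 110.0, 120.0, 5400.0]})

# A positive skewness value indicates a right-skewed distribution.
print(df["adr"].skew())

# Interquartile range rule: keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["adr"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_no_outliers = df[df["adr"].between(lower, upper)]

print(df_no_outliers["adr"].tolist())  # [50.0, 80.0, 95.0, 110.0, 120.0]
```

On strongly skewed data the IQR rule can also trim legitimate high rates, so a fixed domain threshold may be preferable; inspect what gets removed.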

⏳ Data Transformation:

➥ As you can see, the is_canceled column has two categories: 0 means Not Cancelled and 1 means Cancelled.

➥ The data was transformed by mapping 0 to Not_Cancelled and 1 to Cancelled.
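The 0/1 to label transformation is a dictionary lookup with `Series.map`. A minimal sketch on hypothetical values, using the label strings given above:

```python
import pandas as pd

# Hypothetical 0/1 cancellation flags.
df = pd.DataFrame({"is_canceled": [0, 1, 0, 1, 1]})

# Map each code to a readable label; any unexpected code would become NaN.
status_labels = {0: "Not_Cancelled", 1: "Cancelled"}
df["is_canceled"] = df["is_canceled"].map(status_labels)

print(df["is_canceled"].tolist())
```

A quick `df["is_canceled"].isna().sum()` after mapping confirms that no codes outside {0, 1} slipped through.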

📊 Data Analysis and Visualization

➢ Reservation Status:
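The overall reservation status split is the normalized count of each label. A minimal sketch, assuming is_canceled already holds the Not_Cancelled/Cancelled labels and using a hypothetical four-row sample:

```python
import pandas as pd

# Hypothetical sample with labels already mapped.
df = pd.DataFrame({"is_canceled": ["Not_Cancelled", "Cancelled", "Not_Cancelled", "Not_Cancelled"]})

# Share of each status (fractions sum to 1).
status_share = df["is_canceled"].value_counts(normalize=True)
print(status_share)
```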

➢ Reservation Status in Hotels:
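Comparing the status split between the two hotels is a cross-tabulation normalized per hotel row. A minimal sketch with `pd.crosstab`, assuming a hypothetical sample and the mapped labels:

```python
import pandas as pd

# Hypothetical bookings for each hotel type.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": ["Cancelled", "Not_Cancelled", "Not_Cancelled", "Not_Cancelled"],
})

# Rows = hotel, columns = status; normalize="index" makes each row sum to 1.
status_by_hotel = pd.crosstab(df["hotel"], df["is_canceled"], normalize="index")
print(status_by_hotel)
```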

➢ Average Daily Rate in City and Resort Hotels:
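The average ADR per hotel type is a grouped mean. A minimal sketch on hypothetical rates:

```python
import pandas as pd

# Hypothetical ADR values per hotel.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "adr": [100.0, 120.0, 80.0, 60.0],
})

# Mean ADR for each hotel type.
adr_by_hotel = df.groupby("hotel")["adr"].mean()
print(adr_by_hotel)
```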

➢ Reservations per Month:
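Counting reservations per month needs a month number. A minimal sketch, assuming the month is taken from the datetime-converted reservation_status_date (the `month` column name and the dates are hypothetical):

```python
import pandas as pd

# Hypothetical reservation status dates, already parsed as datetimes.
df = pd.DataFrame({
    "reservation_status_date": pd.to_datetime(
        ["2016-01-15", "2016-01-20", "2016-03-02", "2017-03-30", "2017-03-31"]
    )
})

# Extract the month number (1-12) and count reservations in each month.
df["month"] = df["reservation_status_date"].dt.month
per_month = df["month"].value_counts().sort_index()
print(per_month)
```

Note that this pools the same month across different years; group on `dt.to_period("M")` instead if year-month resolution is needed.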

➢ Average Daily Rate per Month when Cancelled:
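This view filters to cancelled bookings first and then averages ADR by month. A minimal sketch, assuming a hypothetical `month` column has already been derived from the booking dates and the labels are mapped:

```python
import pandas as pd

# Hypothetical bookings with a precomputed month number.
df = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "is_canceled": ["Cancelled", "Cancelled", "Cancelled", "Not_Cancelled"],
    "adr": [100.0, 120.0, 90.0, 200.0],
})

# Keep only cancelled bookings, then take the mean ADR within each month.
cancelled = df[df["is_canceled"] == "Cancelled"]
adr_per_month = cancelled.groupby("month")["adr"].mean()
print(adr_per_month)
```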

➢ Top 10 Countries with the Most Cancellations:
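Ranking countries by cancellations can be sketched as a filter followed by `value_counts().head(10)`. This assumes "most cancellations" means the raw count (not the cancellation rate) and uses a hypothetical sample:

```python
import pandas as pd

# Hypothetical bookings by country.
df = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "ESP", "PRT"],
    "is_canceled": ["Cancelled", "Cancelled", "Cancelled", "Not_Cancelled", "Cancelled"],
})

# Count cancelled bookings per country and keep the 10 largest counts.
top_cancel_countries = (
    df.loc[df["is_canceled"] == "Cancelled", "country"]
    .value_counts()
    .head(10)
)
print(top_cancel_countries)
```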

➢ Top Reservation Channels:
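Reservation channels correspond to the market_segment column, so their share is a normalized count. A minimal sketch on hypothetical segment values:

```python
import pandas as pd

# Hypothetical market segments ("TA" = Travel Agents, "TO" = Tour Operators).
df = pd.DataFrame({"market_segment": ["Online TA", "Online TA", "Offline TA/TO", "Groups"]})

# Share of bookings per channel, largest first.
channel_share = df["market_segment"].value_counts(normalize=True)
print(channel_share)
```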

➢ Average Daily Rate for Cancelled and Not Cancelled Reservations:
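Comparing ADR between cancelled and kept reservations is a grouped mean over the status label. A minimal sketch on hypothetical rates, assuming the labels are already mapped:

```python
import pandas as pd

# Hypothetical ADR values with cancellation labels.
df = pd.DataFrame({
    "is_canceled": ["Cancelled", "Cancelled", "Not_Cancelled", "Not_Cancelled"],
    "adr": [120.0, 100.0, 80.0, 90.0],
})

# Mean ADR for each reservation status.
adr_by_status = df.groupby("is_canceled")["adr"].mean()
print(adr_by_status)
```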