British Airways

Analysis and Predictions on customers' satisfaction

This project is part of the British Airways Virtual Experience, hosted on The Forage. This repository contains the notebooks and findings.


The Experience

During this program, we were asked to do the work of a data scientist. First we built a dataset of reviews and analysed it; then we ran a predictive analysis on a booking dataset to understand which features contribute most to customers’ choices.


Steps

  • To analyse customers’ sentiment after a flight with British Airways, we first had to build a dataset. We were asked to scrape data from the SkyTrax website:
    1. After inspecting the website, we started our notebook using ‘requests’ and ‘BeautifulSoup’ to fetch and parse the review data from the HTML pages.
    2. We then built two datasets: the first with the review text, the second with customers’ impressions of the flight (web-scraping).
    3. At the end of the task, we ran a quick analysis of these two datasets: we looked for the most frequently used words in negative reviews (data cleaning and first analysis) and assessed customers' sentiment (analysis on sentiment).
  • The second task focussed on predictive analysis: we were asked to train a model on a dataset of 50,000 booking records, to understand which features matter most in a customer's decision to purchase a flight (predictive analysis):
    1. The dataset was provided by British Airways and had no missing data, but many features were stored as strings, so we needed to encode them. After checking the dataset info, we decided to encode these features using Pandas categories.
    2. After this pre-processing, we analysed the dataset to understand how the data was distributed, especially the labels. We noticed that the label distribution was extremely skewed.
    3. We tried to build and train a ‘RandomForestClassifier’ on the data as it was but, as we expected, the estimator was very inaccurate and prone to bias.
    4. To address this problem we tried oversampling to balance the number of positive and negative samples, and after this operation our model performed quite well on our dataset.
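
The oversampling step in the second task can be sketched as follows. This is a minimal, standard-library illustration of random oversampling, not the code from the notebooks: the function name `oversample_minority`, the toy rows and the fixed seed are all assumptions made for the example, and the actual notebooks may rely on a dedicated library instead.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class is as large as the biggest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append the duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for i in rng.choices(idx, k=target - n):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data invented for illustration: 8 negative and 2 positive samples.
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

One design point worth keeping in mind: oversampling is usually applied to the training split only, because duplicating rows before splitting can place copies of the same sample in both the training and the test set and make the scores look better than they are.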

What we found

General Sentiment

During the first part of the project we found that most reviews are negative, as shown in the graph.

  • 58% of customers would not recommend British Airways
  • 42% of customers would recommend British Airways
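
As a rough illustration of how figures like these, and the most frequent words in negative reviews, could be derived from the scraped data, here is a small standard-library sketch. The record layout (`text` and `recommended` keys), the stop-word list and the four sample reviews are assumptions invented for the example; they are not taken from the real dataset and do not reproduce the percentages above.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "was", "were", "to", "of",
              "in", "on", "for", "is", "it", "my"}


def recommendation_shares(reviews):
    """Return (percent recommending, percent not recommending)."""
    total = len(reviews)
    yes = sum(1 for r in reviews if r["recommended"])
    return 100 * yes / total, 100 * (total - yes) / total


def top_words_in_negative(reviews, n=3):
    """Count words in non-recommending reviews, skipping stop words."""
    words = Counter()
    for r in reviews:
        if not r["recommended"]:
            tokens = re.findall(r"[a-z']+", r["text"].lower())
            words.update(t for t in tokens if t not in STOP_WORDS)
    return words.most_common(n)


# Invented sample reviews, for illustration only.
reviews = [
    {"text": "The flight was delayed and the seat was broken", "recommended": False},
    {"text": "Delayed flight, rude staff", "recommended": False},
    {"text": "Flight delayed again and my luggage was lost", "recommended": False},
    {"text": "Friendly crew and a comfortable seat", "recommended": True},
]

shares = recommendation_shares(reviews)
top_words = top_words_in_negative(reviews, n=2)
print(shares, top_words)
```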