
Putting ChatGPT to the test as a Data Analyst

Aron Saläng, Data Consultant, Solita

Published 04 Jun 2024

Reading time 4 min

I wanted to try out ChatGPT’s data analysis features so I decided to use tasks from one of our Tableau training modules. The tasks are analysis questions to be answered using a given dataset.

The data analysis features are located within the Data Analyst GPT that can be accessed if you have a ChatGPT Plus subscription ($25/month).

By dropping a file onto the chat area, you can upload the data to be analysed.

The file I uploaded contains sales data for a fictitious retail company for the years 2014-2018. Every row in the data represents a receipt row. The first response I got after uploading the file was an overview of the data available in the file.

First, I wanted to prime the GPT with some instructions about how I would like responses displayed.

Okay, let’s get going. There are seven questions in total, and they require increasingly complex reasoning to solve. Curious how the Data Analyst GPT did? Keep reading!

Question 1

Looks pretty straightforward, but the answer is unfortunately incorrect. The tricky part of this question is that there’s no field in the data for Cost. There are, however, fields for Sales Gross and Profit, so Cost can be calculated as Sales Gross – Profit. The GPT got a little lazy and just showed Sales Gross and called it Total Costs.
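As a minimal sketch of the calculation, assuming receipt rows with the Sales Gross and Profit fields mentioned above (the values are invented for illustration and are not from the training dataset):

```python
# Hypothetical receipt rows; only the field names come from the post.
rows = [
    {"Sales Gross": 120.0, "Profit": 30.0},
    {"Sales Gross": 80.0, "Profit": 10.0},
    {"Sales Gross": 50.0, "Profit": -5.0},  # a row sold at a loss
]


def total_cost(rows):
    """Sum Cost over all rows, where Cost = Sales Gross - Profit."""
    return sum(r["Sales Gross"] - r["Profit"] for r in rows)


total_sales = sum(r["Sales Gross"] for r in rows)

print(total_sales)       # 250.0, Sales Gross on its own
print(total_cost(rows))  # 215.0, the derived Cost
```

The gap between the two printed numbers is the kind of difference you get when Sales Gross is reported under a Cost label.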

After inquiring about the formula used for Cost, the GPT realized the mistake and offered a correction.

We got to the right answer but as a grading teacher I have to fail ChatGPT on the task.

Question 2

This requires some string manipulation and an understanding of how to count distinct products. Depending on whether you count products by ProductID or by Product Name, you get 53 or 51. The answer here was right but the visualization was a bit cluttered. I asked for a different version.

Now that was nicer. Anyway, the answer was right so I’ll consider this task passed.
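Why the two counts can differ is easy to show with a small sketch. The rows below are invented; only the field names ProductID and Product Name come from the data described here, and which field gives which count in the real dataset is not something this example tries to reproduce.

```python
# Hypothetical rows: two product IDs share one product name.
rows = [
    {"ProductID": "P-001", "Product Name": "Desk Lamp"},
    {"ProductID": "P-002", "Product Name": "Desk Lamp"},  # same name, new ID
    {"ProductID": "P-003", "Product Name": "Stapler"},
    {"ProductID": "P-003", "Product Name": "Stapler"},    # repeated receipt row
]


def distinct_count(rows, field):
    """Count the distinct values of one field across all rows."""
    return len({r[field] for r in rows})


print(distinct_count(rows, "ProductID"))     # 3
print(distinct_count(rows, "Product Name"))  # 2
```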

Question 3

Another hit! Here you had to calculate the discount amount at receipt row level and then aggregate the results. Saphhira Shipley is indeed the discountiest customer. Passed!
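The row-level-then-aggregate pattern can be sketched as follows. This assumes each row carries a discount rate in a field called Discount and that the discount amount is Sales Gross times that rate; both the field name and the formula are assumptions, since the post does not show the actual columns, and the customers and values are invented.

```python
from collections import defaultdict

# Hypothetical receipt rows; "Discount" as a rate is an assumed field.
rows = [
    {"Customer Name": "Customer A", "Sales Gross": 200.0, "Discount": 0.25},
    {"Customer Name": "Customer B", "Sales Gross": 100.0, "Discount": 0.5},
    {"Customer Name": "Customer A", "Sales Gross": 400.0, "Discount": 0.25},
]


def discount_by_customer(rows):
    """Compute the discount amount per row, then sum it per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["Customer Name"]] += r["Sales Gross"] * r["Discount"]
    return dict(totals)


totals = discount_by_customer(rows)
top_customer = max(totals, key=totals.get)

print(totals)        # {'Customer A': 150.0, 'Customer B': 50.0}
print(top_customer)  # Customer A
```

Note that Customer B has the highest discount rate but not the highest discount amount, which is why the calculation has to happen per row before aggregating.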

Question 4

All good! ChatGPT managed to compare one date to another and calculate the difference between them in hours. Good job. Passed!
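A difference between two dates in hours can be sketched like this. The timestamps and their format are made up for the example; the post does not say which date fields were compared.

```python
from datetime import datetime


def hours_between(start, end, fmt="%Y-%m-%d %H:%M"):
    """Return the number of hours from start to end (both given as strings)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


# Hypothetical order and shipping timestamps.
print(hours_between("2018-03-01 09:30", "2018-03-02 15:00"))  # 29.5
```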

Question 5

Now this one is interesting because the answer highlighted that we had actually asked for the wrong thing. The answer is correct for Percent of Target. We expected the answer to be -87% for the Same Day Ship Mode, but that is actually Percent Difference vs Target. When asked for that, ChatGPT returned what we expected.

Thank you ChatGPT for surfacing that error. Passed!
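The two measures are easy to mix up, so here is a small sketch of the difference. The actual and target values are hypothetical, chosen only so that the difference comes out at the -87% mentioned above.

```python
def percent_of_target(actual, target):
    """Share of the target that was reached."""
    return actual / target


def percent_difference_vs_target(actual, target):
    """How far the actual value is above or below the target."""
    return (actual - target) / target


actual, target = 13.0, 100.0  # invented values

print(f"{percent_of_target(actual, target):.0%}")             # 13%
print(f"{percent_difference_vs_target(actual, target):.0%}")  # -87%
```

The second measure is always the first one minus 100%, so a question that names one and expects the other will look wrong even when the calculation is right.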

Question 6

This must have been hard because all of a sudden the response was not visualised anymore. I had to ask for that explicitly.

This task requires you to find all orders that have more than one Customer ID and then do a distinct count of those orders. Passed!
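One way to sketch that logic, using invented rows with the Order ID and Customer ID fields named above:

```python
from collections import defaultdict

# Hypothetical receipt rows.
rows = [
    {"Order ID": "O-1", "Customer ID": "C-1"},
    {"Order ID": "O-1", "Customer ID": "C-2"},  # second customer on O-1
    {"Order ID": "O-2", "Customer ID": "C-3"},
    {"Order ID": "O-2", "Customer ID": "C-3"},  # same customer, extra row
    {"Order ID": "O-3", "Customer ID": "C-1"},
    {"Order ID": "O-3", "Customer ID": "C-4"},  # second customer on O-3
]


def orders_with_multiple_customers(rows):
    """Return the set of Order IDs that appear with more than one Customer ID."""
    customers_per_order = defaultdict(set)
    for r in rows:
        customers_per_order[r["Order ID"]].add(r["Customer ID"])
    return {order for order, custs in customers_per_order.items() if len(custs) > 1}


shared_orders = orders_with_multiple_customers(rows)
print(len(shared_orders))  # 2
```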

Question 7

Again the answer was not visualised.

Here the challenge was to count orders using not just Order ID but the combination of Order ID and Customer Name as what makes an order unique. ChatGPT passed!
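Counting on a combined key can be sketched as below. The rows are invented; only the field names Order ID and Customer Name come from the task description.

```python
# Hypothetical receipt rows: order O-1 is shared by two customers.
rows = [
    {"Order ID": "O-1", "Customer Name": "Customer A"},
    {"Order ID": "O-1", "Customer Name": "Customer A"},  # extra row, same order
    {"Order ID": "O-1", "Customer Name": "Customer B"},
    {"Order ID": "O-2", "Customer Name": "Customer C"},
]


def count_orders(rows, key_fields):
    """Count distinct combinations of the given fields."""
    return len({tuple(r[f] for f in key_fields) for r in rows})


print(count_orders(rows, ["Order ID"]))                   # 2
print(count_orders(rows, ["Order ID", "Customer Name"]))  # 3
```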

All in all, I was quite impressed by ChatGPT’s analytical capabilities. It answered 6 of the 7 questions correctly. Responses were clear and concise, and the added bonus of getting the answers visualised was also nice.

My instructions about output and formatting were somewhat remembered. I got graphs for all but the last two questions. After a while, amounts were shown with two decimals instead of as integers. I never got any labels for the bars.

As for effectiveness, these tasks usually take participants about 90 minutes to complete. A seasoned analyst such as myself could usually solve them in 20-30 minutes. Posing the questions and getting answers from Data Analyst GPT takes only a few minutes, so there’s a lot of time to save here. Even if you add time for verifying the answers, you would still get to the answers faster.

This test worked well since I had a sample dataset. There are uncertainties around what OpenAI do with files uploaded to their servers and how long they keep them. Thus, I would not upload any sensitive data to OpenAI at this time. Since this is a common concern, I would expect OpenAI to address it in the future. Until then, finding ways to anonymise data or exclude sensitive parts would offer a way to still use the Data Analyst GPT. You could also provide a description of the data without uploading it and ask more generalised questions about how to solve certain analysis questions. That way you can use the analytic capabilities without sharing any sensitive data.
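As a rough sketch of what excluding or masking sensitive parts could look like before an upload: the field names, the key and the choice of a keyed hash are all assumptions made for this example. Replacing names with tokens like this is pseudonymisation rather than full anonymisation, so treat it as a starting point and not as a guarantee.

```python
import hashlib
import hmac

# Hypothetical field lists; adapt to the dataset at hand.
MASKED_FIELDS = {"Customer Name"}  # replaced by a stable token
DROPPED_FIELDS = {"Email"}         # removed entirely


def pseudonymise(rows, key, masked=MASKED_FIELDS, dropped=DROPPED_FIELDS):
    """Return copies of the rows with dropped fields removed and masked
    fields replaced by a keyed SHA-256 token (same input, same token)."""
    cleaned = []
    for r in rows:
        out = {}
        for field, value in r.items():
            if field in dropped:
                continue
            if field in masked:
                digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
                out[field] = digest.hexdigest()[:12]
            else:
                out[field] = value
        cleaned.append(out)
    return cleaned


rows = [
    {"Customer Name": "Customer A", "Email": "a@example.com", "Sales Gross": 10.0},
    {"Customer Name": "Customer A", "Email": "a@example.com", "Sales Gross": 5.0},
    {"Customer Name": "Customer B", "Email": "b@example.com", "Sales Gross": 7.0},
]

safe_rows = pseudonymise(rows, key=b"keep-this-secret-and-local")
print(safe_rows[0])
```

Because the same name always maps to the same token, per-customer questions can still be answered on the masked file, while the key that links tokens back to names never leaves your machine.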

After the summer, when it has been released, I will do a similar analysis using Tableau Einstein Copilot. That’s the new feature in Tableau’s analytics suite that allows for similar analytics support using generative AI. Watch out for that.
