The ChatGPT Code Interpreter is surreal: It creates and executes Python code for whatever task you describe, debugs its own runtime errors, displays charts, handles file uploads and downloads, and suggests sensible next steps all along the way.
Whether you write code yourself today or not, you can take advantage of GPT-4's stellar natural-language input/output capabilities to interact with the Code Interpreter. The mind-blowing experience is equivalent to having an expert data analyst, data scientist or software developer with you to instantaneously respond to your questions or requests.
As an example of these jaw-dropping capabilities (and given the data science-focused theme of my show), I use today's episode to demonstrate the ChatGPT Code Interpreter's full automation of data analysis and machine learning. If you watch the episode on YouTube, you can even see the Code Interpreter in action while I interact with it solely through natural language.
Over the course of today's episode/video, the Code Interpreter:
1. Receives a sample data file that I provide it.
2. Uses natural language to describe all of the variables that are in the file.
3. Performs a four-step Exploratory Data Analysis (EDA), including histograms, scatterplots comparing key variables, and key summary statistics (all explained in natural language).
4. Preprocesses all of my variables for machine learning.
5. Selects an appropriate baseline ML model, trains it and quantitatively evaluates its performance.
6. Suggests alternative models and approaches (e.g., grid search) to get even better performance and then automatically carries these out.
7. Optionally provides Python code every step of the way and is delighted to answer any questions I have about the code.
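For readers curious what the code behind steps 3 to 6 might look like, here is a minimal sketch in plain Python. It is not the code the Code Interpreter produced in the episode: the toy dataset, the column names (x1, x2, y), the mean-predictor baseline and the k-nearest-neighbours model are all assumptions chosen for illustration, and a real session would likely lean on libraries such as pandas and scikit-learn rather than the standard library.

```python
import math
import random
import statistics

# Hypothetical toy data standing in for an uploaded file (invented for
# illustration): two numeric features and a noisy numeric target.
random.seed(0)
rows = []
for _ in range(300):
    x1 = random.uniform(0, 10)
    x2 = random.uniform(0, 100)
    rows.append((x1, x2, 3.0 * x1 + 0.5 * x2 + random.gauss(0, 1)))

# EDA: key summary statistics for every column.
names = ["x1", "x2", "y"]
summary = {
    name: {"mean": statistics.mean(col), "stdev": statistics.stdev(col),
           "min": min(col), "max": max(col)}
    for name, col in zip(names, zip(*rows))
}

# Preprocessing: split the data, then standardize the features using
# statistics computed on the training split only.
train, valid, test = rows[:180], rows[180:240], rows[240:]
scaler = [(statistics.mean(col), statistics.stdev(col))
          for col in list(zip(*train))[:2]]

def prepare(data):
    """Return (scaled_features, target) pairs."""
    return [([(row[i] - m) / s for i, (m, s) in enumerate(scaler)], row[2])
            for row in data]

train_p, valid_p, test_p = prepare(train), prepare(valid), prepare(test)

def rmse(preds, targets):
    """Root-mean-squared error between predictions and targets."""
    return math.sqrt(statistics.mean((p - t) ** 2 for p, t in zip(preds, targets)))

# Baseline: always predict the training-set mean of the target.
train_mean = statistics.mean(t for _, t in train_p)
test_targets = [t for _, t in test_p]
baseline_rmse = rmse([train_mean] * len(test_p), test_targets)

def knn_predict(features, k, reference):
    """Average the targets of the k nearest reference points (Euclidean distance)."""
    nearest = sorted(reference, key=lambda item: math.dist(features, item[0]))[:k]
    return statistics.mean(t for _, t in nearest)

# Grid search: keep the k with the lowest RMSE on the validation split.
K_GRID = [1, 3, 5, 9, 15]
valid_targets = [t for _, t in valid_p]
scores = {
    k: rmse([knn_predict(f, k, train_p) for f, _ in valid_p], valid_targets)
    for k in K_GRID
}
best_k = min(scores, key=scores.get)

# Final evaluation of the tuned model on the held-out test split.
knn_rmse = rmse([knn_predict(f, best_k, train_p) for f, _ in test_p], test_targets)
print(f"baseline RMSE={baseline_rmse:.2f}, k-NN (k={best_k}) RMSE={knn_rmse:.2f}")
```

The shape is the point here rather than the specific models: summarize the data, preprocess without leaking test information, score a simple baseline, then search over a hyperparameter and compare against that baseline on held-out data.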
The whole process is a ton of fun and, again, requires no coding abilities to use (the "Code Interpreter" moniker could be misleadingly intimidating to non-coding folks). Even as an experienced data scientist, however, I would estimate that in many everyday situations use of the Code Interpreter could decrease my development time by a crazy 90% or more.
The big caveat with all of this is whether you're comfortable sharing your code with OpenAI. I wouldn't provide proprietary company code to it without clearing that with your firm first and, if you do use proprietary code with it, I'd turn "Chat history & training" off in your ChatGPT Plus settings. To circumvent the data-privacy issue entirely, you could alternatively run Meta's newly released "Code Llama - Instruct 34B" Large Language Model on your own infrastructure. Code Llama won't, however, be as good as the Code Interpreter in many circumstances and will require some technical savvy to get up and running.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.