As a data scientist, you’re always looking for ways to improve the accuracy and efficiency of your models. With the advent of large language models like ChatGPT, you now have access to a powerful tool that can help you accomplish just that. ChatGPT is a state-of-the-art language model that is capable of generating high-quality text in a wide range of styles and formats.
In this blog post, we’ll explore how you can use ChatGPT in your data science work, not only to enhance your models but also to automate the various tedious tasks you do every day. Whether you’re working on natural language processing, sentiment analysis, or even non-text-based applications, ChatGPT can help take your work to the next level.
Before we dive into the many ways ChatGPT can be a valuable asset to data scientists, let’s make sure we are on the same page about what it actually is.
The LabLab.ai community describes it as follows:
“The ChatGPT model has been trained on a vast amount of text data, including conversations and other types of human-generated text, which allows it to generate text that is similar in style and content to human conversation. ChatGPT can be used to generate responses to questions, code, make suggestions, or provide information in a conversational manner, and it is able to do so in a way that is often indistinguishable from human-generated text. The initial model has been trained using Reinforcement Learning from Human Feedback (RLHF), using methods similar to InstructGPT, but with slight differences in the data collection setup. The model is trained using supervised fine-tuning, where human AI trainers provided conversations in which they played both sides—the user and an AI assistant. The trainers would have had access to model-written suggestions to help them compose their responses.“
Allow ChatGPT to introduce itself!
For less technical folks, working through such a definition might be quite a challenge, don’t you agree? I spent some time trying to figure out the simplest way to explain ChatGPT and suddenly realized: why not make my job easier and use the tool itself for this purpose? Let today’s hero introduce itself.
Alright, maybe an explanation aimed at a five-year-old is too basic for the readers of this blog. I definitely didn’t want to underestimate your knowledge and skills! Let me try again and get an explanation that lives up to your expectations!
How about explaining it in business terms?
Alright, but what’s in it for me if I am neither a business person nor a data professional? Does ChatGPT offer anything I could use in daily life? What could I show my parents?
As you can see, the possibilities are limitless. However, I would be careful about using ChatGPT for financial or health advice; for those, you should definitely turn to qualified professionals. Nevertheless, I truly believe that ChatGPT is just a tool, and we cannot blame the tool when it is not used wisely. Setting policies around it is one thing, but critical thinking should always remain our responsibility.
What’s the difference between ChatGPT and GPT?
I had been planning this article for a while. In the meantime, GPT-4 has been released, unlocking far more capabilities. It was developed to improve model “alignment”: following user intentions more closely and generating less offensive or biased output. But before going into details, I want to make sure we all understand the difference between the two terms.
Although GPT (Generative Pre-trained Transformer) and ChatGPT are both natural language processing models developed by OpenAI, we cannot use those terms interchangeably. Referring to ChatGPT as GPT-3 (or GPT-4, the newer and more powerful version) is like calling a Dell computer an Intel, the processor that powers it. And while Dell relies on Intel, Intel can power other computers, too. Once I figured out the difference, I started noticing I wasn’t the only person making this mistake. So I’m paying it forward: read on for the difference between GPT and ChatGPT.
GPT is a language model that is pre-trained on large amounts of text data and can generate human-like text. It is commonly used for tasks such as language translation, text completion, and text generation.
ChatGPT, on the other hand, is a more specialized version of GPT designed specifically for conversational tasks. It has been fine-tuned on a large dataset of conversational data and can generate responses to human input, making it useful for building chatbots and virtual assistants.
ChatGPT is an app, while GPT is the brain behind that app.
An example of the difference between the two models can be seen in their applications. Suppose you wanted to create a chatbot that could answer customer service inquiries for a clothing retailer. You could use ChatGPT to generate responses to customer questions about product availability, sizing, and shipping. However, if you wanted to generate product descriptions or marketing copy for the retailer’s website, you would use the more general-purpose GPT model.
In summary, GPT is a more general language model capable of generating text for a wide variety of applications, while ChatGPT is a more specialized model designed specifically for conversational tasks.
What improvements does GPT-4 bring compared to GPT-3.5?
More text to process
In the past, AI language models were limited in their ability to retain large amounts of text in their short-term memory, which includes both the user’s question and the system’s answer. However, OpenAI has significantly enhanced GPT-4’s capabilities, allowing it to process entire scientific papers and novellas in a single go. This advancement enables the system to address more complex questions and connect more details in a given query.
It’s important to note that GPT-4 doesn’t rely on character or word count measurements but uses a unit called “tokens” to gauge its input and output. Tokenization is a complicated process, but one token is roughly equivalent to four characters, and 75 words generally require around 100 tokens. The maximum token limit for GPT-3.5-turbo in a given query is roughly 4,000, which translates to a little over 3,000 words. In contrast, GPT-4 can process about 32,000 tokens, equivalent to approximately 25,000 words, according to OpenAI. Although the company states it is still optimizing for longer contexts, this higher limit should allow the model to unlock use cases that were previously challenging.
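The rules of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an approximation: real token counts depend on the tokenizer (OpenAI’s open-source tiktoken library gives exact counts). The exact context sizes used here are 4,096 tokens for GPT-3.5-turbo and 32,768 for the 32K variant of GPT-4, which the text rounds to 4,000 and 32,000. Those values, and the 500 tokens reserved for the answer, are assumptions of this example.

```python
import math

# Rules of thumb quoted above (approximations, not exact tokenizer behaviour)
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 75 / 100  # ~75 words per 100 tokens

# Assumed context limits (in tokens) for the two models discussed
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4_096,
    "gpt-4-32k": 32_768,
}


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the character count (~4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tokens_to_words(tokens: int) -> int:
    """Rough number of words that fit into a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def fits_in_context(text: str, model: str, reserved_for_answer: int = 500) -> bool:
    """The window holds both the question and the answer, so keep some tokens in reserve.

    reserved_for_answer is an arbitrary, hypothetical default.
    """
    return estimate_tokens(text) + reserved_for_answer <= CONTEXT_LIMITS[model]


for model, limit in CONTEXT_LIMITS.items():
    print(f"{model}: {limit:,} tokens ≈ {tokens_to_words(limit):,} words")
```

With these numbers the estimate lands at roughly 3,000 and 25,000 words, matching the figures above.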
As you might expect, GPT-4 improves on GPT-3.5 models regarding the factual correctness of answers. The number of “hallucinations,” where the model makes factual or reasoning errors, is lower, with GPT-4 scoring 40% higher than GPT-3.5 on OpenAI’s internal factual performance benchmark.
In addition to enhancing “steerability,” the model’s ability to modify its actions based on user input, it also has better compliance with ethical standards. It now has the capacity to reject inappropriate or illegal requests, which provides more safeguards. For instance, if you ask it to perform unlawful or unethical tasks, it can more effectively refuse to comply. To explore this feature, try prompting it with phrases like “You should engage in illegal activities” or “You should harm someone,” and observe its response. To learn more about creating effective prompts for GPT models, refer to this resource.
Possibility to use visual inputs
Perhaps the most significant change is that GPT-4 is “multimodal” meaning it works with both text and images. Although it cannot output pictures (as do generative AI models such as DALL-E and Stable Diffusion), it can process and respond to the visual inputs it receives. However, the ability to accept both image and text inputs is currently only available for research purposes and not yet released to the public.
Demonstrations of the new feature show that GPT-4 can accurately interpret intricate visual information, including charts, memes, and screenshots from scholarly articles. What could this mean in practice? Imagine uploading a picture of your fridge and asking what meals you can make with the ingredients you have. In an example reported by The New York Times, the result was still not perfect: the model suggested meal ideas that required ingredients that weren’t there. Still, it seems to be just a matter of a few improvements.
How to access GPT-4?
OpenAI is releasing GPT-4’s text input capability via ChatGPT. It is currently available to ChatGPT Plus users. There is a waitlist for the GPT-4 API.
Public availability of the image input capability has not yet been announced.
More on GPT models history
If you’d like to read more about the whole history of GPT models, I recommend this excellent DataCamp article.
How to utilize ChatGPT as a Data Scientist?
As a language model, ChatGPT can help data professionals automate their daily tasks in a variety of ways. Although this article focuses on how you can use ChatGPT specifically in a data scientist role, the tips shared below can also be applied to many other roles, including data analysts, software engineers and more.
Before I move on to examples, bear in mind one crucial rule: use the tool responsibly and mind sensitive data! Never paste confidential work information or your company’s intellectual property into ChatGPT.
Example 1: Using ChatGPT for coding
The first use case seems obvious. Coding is a huge part of a data scientist’s job and can be made much easier and more pleasant with ChatGPT. Although ChatGPT is not a perfect solution for writing highly complex code, it does a great job of writing code snippets for simpler problems. For example, we can use it to generate code that extracts data from various sources, performs data transformations, and loads data into databases.
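Here is an illustrative extract-transform-load snippet of the kind you might ask ChatGPT for. It is a minimal sketch using only Python’s standard library. The CSV content, the sales table and its columns are hypothetical, and an in-memory SQLite database stands in for a real target database.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for an external source (file, API export, ...)
RAW_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3, north ,42.25
"""


def extract(raw_text):
    """Parse CSV text into a list of dictionaries (one per row)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Cast types and normalise region names (strip spaces, upper case)."""
    return [
        (int(row["order_id"]), row["region"].strip().upper(), float(row["amount"]))
        for row in rows
    ]


def load(records, conn):
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Quick check that the load worked: total amount per region
total_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
)
print(total_by_region)
```

Splitting the snippet into extract, transform and load functions makes it easy to ask ChatGPT for a change to one step, such as reading from an API instead of a CSV string, without touching the others.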
Example 2: Using ChatGPT for code debugging
With a tear in my eye, I remember the times when, as an intern, I spent long hours looking for the reason my code was not working and… it was just a matter of a stupid missing semicolon! I am jealous of the new generation, who don’t need to worry about this particular problem anymore thanks to ChatGPT.
Below you can find an example of using ChatGPT to figure out why your R code is not working:
Obviously, you can perform both tasks on your own… if you’re not a fan of saving time and moving on to the more important parts of your work more quickly. 🙂
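To make the debugging use case more concrete, here is a hypothetical Python example (rather than R) of a subtle bug you could paste into ChatGPT together with a description of the unexpected behaviour. The function names are made up for illustration. The bug itself is a well-known Python pitfall: a mutable default argument that is shared between calls.

```python
# Buggy version: the default list is created once, when the function is defined,
# and the same list object is reused by every call that relies on the default
def add_measurement_buggy(value, measurements=[]):
    measurements.append(value)
    return measurements


first = add_measurement_buggy(1)
second = add_measurement_buggy(2)
print(first, second, first is second)  # prints: [1, 2] [1, 2] True


# Fixed version: use None as a sentinel and create a fresh list on each call
def add_measurement(value, measurements=None):
    if measurements is None:
        measurements = []
    measurements.append(value)
    return measurements


print(add_measurement(1), add_measurement(2))  # prints: [1] [2]
```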
Example 3: Using ChatGPT for data visualization
It’s now so easy to turn a verbal description of a visualization into code, and guess what… ChatGPT gives pretty decent examples to work with. Let’s say you’d like to make a matplotlib bar chart with 5 different colors and annotations. You can type exactly this, and ChatGPT will provide a solid template to fill your data into! Isn’t that much faster than searching through StackOverflow?
Even if the suggested solution doesn’t fit my needs, I can ask my personal assistant to make the graph more compelling or to give me a list of recommendations that expand my knowledge of the libraries I am using.
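For reference, here is a sketch of the kind of template such a prompt might produce. It is an illustrative example, not actual ChatGPT output. The categories, values and colors are placeholder data to replace with your own.

```python
import matplotlib.pyplot as plt

# Placeholder data: replace with your own categories and values
categories = ["A", "B", "C", "D", "E"]
values = [23, 45, 12, 37, 29]
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]


def make_bar_chart(categories, values, colors):
    """Draw a bar chart with one color per bar and the value annotated above each bar."""
    fig, ax = plt.subplots(figsize=(7, 4))
    bars = ax.bar(categories, values, color=colors)

    # Annotate each bar with its value, centered just above the top edge
    for bar, value in zip(bars, values):
        ax.annotate(
            f"{value}",
            xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
            xytext=(0, 3),  # 3 points of vertical offset
            textcoords="offset points",
            ha="center",
            va="bottom",
        )

    ax.set_xlabel("Category")
    ax.set_ylabel("Value")
    ax.set_title("Bar chart with five colors and annotations")
    fig.tight_layout()
    return fig, ax


fig, ax = make_bar_chart(categories, values, colors)
fig.savefig("bar_chart.png")  # or call plt.show() in an interactive session
```

Using an offset in points keeps the labels at a constant distance above the bars regardless of the value scale. Recent matplotlib versions also offer Axes.bar_label as a shortcut for the annotation loop.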
Example 4: Using ChatGPT to translate code into another programming language
This one can be a little controversial again, because we need to be careful about the kind of code we provide to the tool. If you’re working with open-source code, it’s completely fine. But please always make sure you’re not pasting your company’s code into ChatGPT and breaching intellectual property rules.
I will focus on the bar chart example from the previous section. Let’s assume we’d like to translate our code into R.
Example 5: Using ChatGPT to optimize or refactor code
Data scientists can use ChatGPT to optimize code for a specific machine learning model, e.g. a random forest, by asking it questions and getting suggestions for improving their code.
Here’s an example conversation between a data scientist and ChatGPT:
Data Scientist: Hi ChatGPT, I’m working on a random forest model for a large dataset. Do you have any suggestions for optimizing my code to improve its performance? My dataset has around 1 million records and 50 features, and I’m currently getting an accuracy of around 85%.
ChatGPT: Okay, based on the size of your dataset, one suggestion would be to use a subsample of your data during the training process. This can help to speed up the training time and reduce memory usage. Another suggestion would be to try different combinations of hyperparameters using a random search or a grid search. Have you tried tuning your hyperparameters yet?
Data Scientist: Yes, I have tried tuning my hyperparameters, but I’m not sure if I’ve found the optimal combination. Do you have any suggestions for which hyperparameters I should focus on?
ChatGPT: Yes, some important hyperparameters to consider for a random forest model are the number of trees, the depth of each tree, and the number of features to consider at each split. You can try using a random search or a grid search to find the best combination of these hyperparameters. Another suggestion would be to use feature selection techniques to identify the most important features for your model. This can help to reduce the number of features and improve the performance of your model.
Data Scientist: Those are great suggestions, thanks ChatGPT! Can you also suggest any specific libraries or tools I could use to implement these optimizations?
ChatGPT: Yes, for implementing subsampling, you can use the sample function in pandas to randomly select a subset of your data. For hyperparameter tuning, you can use scikit-learn’s RandomizedSearchCV. For feature selection, you can use scikit-learn’s RFECV. Additionally, you may want to consider using libraries like joblib for parallelization and memory optimization.
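The suggestions in this conversation could be combined roughly as follows. This is a minimal sketch, not ChatGPT’s literal answer. It uses synthetic data from make_classification instead of a real 1-million-row dataset. The 20% sample fraction, the search ranges and the small n_iter budget are assumptions chosen to keep the example fast. Feature selection with RFECV is left out for brevity.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the large dataset described in the conversation (50 features)
X, y = make_classification(n_samples=5_000, n_features=50, n_informative=10, random_state=42)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
df["target"] = y

# 1) Subsampling with pandas' DataFrame.sample to speed up the tuning runs
subsample = df.sample(frac=0.2, random_state=42)
X_sub = subsample.drop(columns="target")
y_sub = subsample["target"]

# 2) Randomized search over the hyperparameters mentioned above:
#    number of trees, tree depth and number of features considered at each split
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": [None, 10, 20, 30],
    "max_features": ["sqrt", "log2", 0.5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,  # small budget for the sketch; increase it for real tuning
    cv=3,
    scoring="accuracy",
    n_jobs=-1,  # parallelize across CPU cores (scikit-learn uses joblib for this)
    random_state=42,
)
search.fit(X_sub, y_sub)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Once a promising region of the search space is found on the subsample, it is usually worth re-checking the best settings on the full dataset, since the optimal number of trees or depth can shift with more data.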
You can also make changes to your code more quickly with a little help from ChatGPT. Imagine that you’d like to replace a word repeated several times in your Python function with another one:
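For instance, take a hypothetical helper that uses the vague name data several times. Asking ChatGPT to rename it to daily_sales everywhere would turn the first version below into the second. The two functions carry different names only so that both can live in the same file.

```python
# Before: hypothetical helper where the generic name "data" is repeated several times
def summarize_before(data):
    total = sum(data)
    count = len(data)
    return {"total": total, "mean": total / count if count else 0.0}


# After: "data" replaced by "daily_sales" everywhere; the behaviour stays identical
def summarize_after(daily_sales):
    total = sum(daily_sales)
    count = len(daily_sales)
    return {"total": total, "mean": total / count if count else 0.0}


# A quick check that the rename did not change the result
assert summarize_before([10, 20, 30]) == summarize_after([10, 20, 30])
```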
Example 6: Using ChatGPT for communication and storytelling
A huge part of a data scientist’s job is communication and storytelling. This includes developing presentations, creating slides for often non-technical stakeholders and, of course, figuring out the right set of questions during exploratory data analysis. ChatGPT is a wonderful tool for explaining your work in simple words. Otherwise, you can spend many hours working out how best to translate technical language into non-technical language and make sure you communicate clearly, but… why not speed up the whole process and let the tool support you?
Here are some ways you can utilize ChatGPT for this purpose:
- Generating customer stories. ChatGPT can help you generate customer stories that highlight how your business has helped your customers achieve their goals. You can provide ChatGPT with some details about your customers and their experiences, and it can generate a story that showcases the value your business provides.
- Creating case studies. ChatGPT can be used to create case studies that illustrate how your business has solved specific problems for your clients. By providing ChatGPT with details about the problem, the solution, and the results, it can generate a compelling story that highlights your business’s expertise and capabilities.
- Developing marketing content. ChatGPT can help you generate marketing content such as blog posts, social media updates, and email newsletters that tell a story about your business. By providing ChatGPT with details about your business and your target audience, it can create engaging and informative content that resonates with your readers.
- Writing data-driven narratives. ChatGPT can help you develop data-driven narratives that explain complex data insights to your stakeholders in a compelling way. By providing ChatGPT with data and context about your business and your industry, it can generate stories that make sense of the data and highlight the key takeaways.
Of course, it’s critical to keep your own logical thinking sharp at the same time. As we become more and more surrounded by AI, we do not want to become too dependent on it. Our logical, analytical and cognitive abilities will become more and more important in the future.
Overall, the key to utilizing ChatGPT for storytelling in business is to provide it with high-quality input and use its output as a starting point for further refinement and editing. While ChatGPT can generate high-quality text, it’s important to ensure that the stories it produces are aligned with your business goals and messaging.
Example 7: Using ChatGPT for analyzing data
Another way you can use ChatGPT in your data science tasks is to get support with exploring data. You can ask the tool to describe the columns or to give you some descriptive statistics for a provided dataset. Again, remember that you can rely on ChatGPT for basic tasks, but it is still you who should bring the business understanding behind the dataset.
Personally, I prefer to treat it as great support for more advanced data professionals. For junior roles, I believe using ChatGPT too early could have a negative impact on developing the right mindset, staying curious and gaining all the valuable experience that comes simply from getting your hands dirty with data. I wonder, what is your point of view on this topic? Please let me know in the comments!
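Here is the kind of pandas code ChatGPT might suggest when asked for column details and descriptive statistics. The small customer dataset and its column names are hypothetical. Running code like this locally on your real data also means you don’t have to paste the data itself into ChatGPT.

```python
import pandas as pd

# Hypothetical customer dataset; in practice you would load your own data,
# e.g. with pd.read_csv, instead of pasting it into the chat
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [42_000, 55_000, 61_000, None, 48_000],
    "segment": ["A", "B", "A", "C", "B"],
})

# Column details: data type and number of missing values per column
column_info = pd.DataFrame({"dtype": df.dtypes.astype(str), "missing": df.isna().sum()})
print(column_info)

# Descriptive statistics for the numeric columns (count, mean, std, min, quartiles, max)
summary = df.describe()
print(summary)

# Frequency table for the categorical column
print(df["segment"].value_counts())
```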
Example 8: Using ChatGPT for interview prep
And I don’t mean using it during the interview process; that’s definitely cheating! However, it can be a wonderful way to practice your skills beforehand.
Example 9: Using ChatGPT for generating a README
What is every developer’s least favourite activity? Definitely writing documentation! How about speeding up this process? Below you can find an example based on a Streamlit application, but you can of course give ChatGPT more details to get a README customized to your particular case.
Fun fact: you can also find many wonderful examples of ChatGPT being used for many more use cases, like writing your own portfolio website in a few seconds! Check out an example here.
Example 10: Using ChatGPT for commenting your code
Although many of you will probably agree with me that clean code doesn’t need comments, if you do need such support, you can also use ChatGPT. Imagine that you’re mentoring a younger colleague and don’t have much time to explain every part of your code. In that case, you can ask ChatGPT to add comments and simply review whether the suggested comments fit your purpose.
As upskilling Citizen Developers in my organization is one of my tasks, I find it extremely useful!
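As a hypothetical illustration, this is the kind of docstring and inline comments you could ask ChatGPT to add to a small helper. The z_scores function is made up for this example. As with any generated comments, check that they describe what the code actually does.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: how many standard deviations it lies from the mean.

    Uses the population standard deviation. Raises ValueError for an empty input
    or when all values are identical (standard deviation of zero).
    """
    if not values:
        raise ValueError("values must not be empty")

    mu = mean(values)       # average of the input values
    sigma = pstdev(values)  # population standard deviation

    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")

    # Standardize each value relative to the mean and spread of the data
    return [(v - mu) / sigma for v in values]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```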
Will AI replace data scientists?
In the future, there are going to be two types of people:
- People who know how to use AI smartly
- People whose job will be automated by AI
My purpose is far from making you worried about the future! Quite the opposite: I would love to encourage you to start embracing AI and to explore its possibilities with curiosity! Tools like ChatGPT can make your job so much easier. You can write more code and make your work more understandable for others. Think of how many mundane, boring tasks you can automate so that you can focus on the real essence of our job: the problem-solving aspect that we all love.
Let’s ask AI about its intentions!
To wrap up, why not ask the main hero of today’s article what advice it has for us?
As you can see, the answer is clear. ChatGPT will definitely have an impact on our future, but this is not the first time a new technology has done so. Throughout history, several technological advancements have caused fear and anxiety among people.
- The Printing Press. In the 15th century, when the printing press was invented, many people were afraid that it would lead to the dissemination of heretical and dangerous ideas. The Church even banned certain books and punished those who printed or distributed them.
- Electricity. When electricity was first introduced, people were afraid that it was dangerous and even demonic. Electric lights were considered unnatural and were thought to cause blindness and other health problems.
- Cars. In the early 20th century, cars were seen as dangerous and unpredictable, and many people were afraid to use them. Some even thought that they would never catch on as a mode of transportation.
- The Internet. In the 1990s, when the Internet became widely available, many people were afraid that it would lead to the breakdown of social norms and morality. Some predicted that it would cause people to become isolated and disconnected from each other.
These fears and anxieties are often rooted in a lack of understanding or familiarity with the technology in question. As people become more familiar with the technology and its benefits become more apparent, the fears tend to subside. This has been true of many of the technologies mentioned above, and it is likely to be true of AI as well as people become more familiar with its capabilities and limitations.