How to reason with ChatGPT4
ChatGPT4 is the latest version of the ChatGPT application from OpenAI. It is not only considerably larger than the previous 3.5 model but also seems to display subtle traits of “general” intelligence.
AGI stands for Artificial General Intelligence: the ability of a computer or agent to understand and carry out intellectual tasks like a human. In a March 2023 paper titled ‘Sparks of Artificial General Intelligence’,1 researchers at Microsoft describe experiments with a non-aligned (unrestricted) version of ChatGPT4, in which they discovered a number of emerging behaviours indicating that the model may be showing the first signs of AGI.
OpenAI’s ChatGPT4 is only available via a subscription at this stage, but Microsoft’s Edge browser with Bing Chat uses ChatGPT4 as its base model for free. Though you may need to sign up for a waiting list, you will be able to use the prompts I give you here for free on Bing.
I’m going to offer some prompts used in the research, and others I’ve found, that elicit “some” reasoning from ChatGPT. These prompts should also give you some pointers on future directions for your own prompts, particularly around reasoning and the future capabilities of these models.
Common Sense Grounding
As humans, we take a lot about the world around us for granted, and that grounding lets us make reasonable assumptions and inferences. It is something an AI system may not inherently have. The simplest way to test this with AI is to present it with puzzles.
Try this prompt below (all prompts are sourced from the Microsoft research cited at endnote 1, unless otherwise stated):
A hunter walks one kilometre east, another kilometre south and then another west and he ends up back where he started. He sees a bear and shoots it. What colour is the bear?
GPT3 incorrectly answered this puzzle, stating that it was unable to determine the colour of the bear. GPT4, however, correctly answered white, as the hunter’s walk is only possible at the North Pole, where the bear would be a polar bear. So, does this show a subtle first sign of AGI? The model appears to have made an inference from what it knows about the world, not just from what it was told.
Try this prompt:
If you throw a small iron ball from the top of a 20-storey building, what will happen to the ball?
GPT3 answers incorrectly, saying the ball will shatter. GPT4, however, says it will only be slightly damaged, showing a better understanding of the properties of an iron ball. As a side note, the answer from the aligned GPT4 version on OpenAI came with a warning about the safety of anyone at the bottom of the building, whereas the non-aligned (unrestricted) version the researchers used did not.
Chain of Thought Prompting
A 2022 paper from Google researchers suggested that “chain of thought prompting” can significantly improve the output of an LLM.2
It can really bolster ChatGPT’s reasoning capability, and the best example is to tell ChatGPT4 to do its task “step by step”. If there is any part of any prompt I have come across that I have found the most useful, it would be these three words: “step by step”.
If you were to type the following prompt into GPT4 or GPT3:
7*4+8*8 =
You may just get the answer 88, which is wrong (the correct answer is 92). Sometimes you may get the right answer, and sometimes you may even get an explanation. To give yourself a better chance of a correct answer, create a prompt like this:
Find the value of the following expression: 7*4 + 8*8. Think of the solution step by step and write down all the intermediate steps before your final answer.
As usual, the more explicit you are in your instructions, the better the outcome.
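For reference, the intermediate steps the prompt asks the model to write out can be checked directly; a quick sketch of the arithmetic itself:

```python
# Evaluate 7*4 + 8*8 step by step, mirroring the chain-of-thought prompt.
step1 = 7 * 4           # 28
step2 = 8 * 8           # 64
result = step1 + step2  # 92, not the 88 the model sometimes returns
print(result)           # → 92
```

Breaking the expression into named intermediate steps is exactly what “step by step” asks the model to do in words.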
“Few Shot” Prompting
“Few shot” prompting involves giving ChatGPT4 a few examples of the kind of answer you require. With this type of prompting you don’t need to give specific instructions (though they will help); you just provide example answers and see if the model can predict the pattern.
Try this:
Add 12 + 12: 24
Add 16 + 13: 29
Add 8 + 20:
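If you call a model through an API rather than the chat window, the same few-shot pattern can be assembled programmatically. This is a minimal sketch that only builds the prompt string; the function and variable names are my own, not from any particular library:

```python
# Build a few-shot prompt from (question, answer) example pairs,
# leaving the final answer blank for the model to complete.
def few_shot_prompt(examples, query):
    lines = [f"Add {q}: {a}" for q, a in examples]
    lines.append(f"Add {query}:")  # the model fills in this answer
    return "\n".join(lines)

prompt = few_shot_prompt([("12 + 12", "24"), ("16 + 13", "29")], "8 + 20")
print(prompt)
# → Add 12 + 12: 24
#   Add 16 + 13: 29
#   Add 8 + 20:
```

The resulting string matches the prompt above, and new examples can be added to the list without rewriting anything.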
Combining Prompts
Combining the techniques above into a single prompt could result in even more reliable answers. In a previous article, I mentioned that giving ChatGPT a role helps the AI identify the context of your prompt and give you a better answer. Bearing that in mind, a good prompt could include the following components:
A role and a chain of thought prompt
Answer example 1
Answer example 2
For example, look at or type in the following prompt:
Twitter is a social media platform where users can post short messages called "tweets".
Tweets can be positive or negative, and we would like to be able to classify tweets as
positive or negative. Here are some examples of positive and negative tweets. Make sure
to classify the last tweet correctly.
-------------------------------------
Q: Tweet: "What a beautiful day!"
Is this tweet positive or negative?
A: positive
-------------------------------------
Q: Tweet: "I hate this class"
Is this tweet positive or negative?
A: negative
-------------------------------------
Q: Tweet: "I love pockets on jeans"
A:
(prompt source: https://learnprompting.org/docs/basics/combining_techniques)
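The same combined structure (role, chain-of-thought instruction, then worked examples) can also be templated in code. A sketch, with all names my own invention:

```python
# Assemble a combined prompt: role + instruction + few-shot examples + query.
SEPARATOR = "-" * 37

def combined_prompt(role, instruction, examples, query):
    parts = [f"{role} {instruction}"]
    for tweet, label in examples:
        parts.append(SEPARATOR)
        parts.append(
            f'Q: Tweet: "{tweet}"\n'
            f"Is this tweet positive or negative?\n"
            f"A: {label}"
        )
    parts.append(SEPARATOR)
    parts.append(f'Q: Tweet: "{query}"\nA:')  # left blank for the model
    return "\n".join(parts)

prompt = combined_prompt(
    "You are a sentiment classifier.",
    "Classify each tweet as positive or negative, reasoning step by step.",
    [("What a beautiful day!", "positive"), ("I hate this class", "negative")],
    "I love pockets on jeans",
)
print(prompt)
```

Keeping the role, instruction and examples as separate arguments makes it easy to reuse the template for a different classification task.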
-------------------------------------------------------
Until recently, AGI was not considered a possibility for AI for at least a decade, even with the current crop of Large Language Models such as ChatGPT4.6 Though one cannot be entirely sure whether ChatGPT4 really does show subtle hints of AGI, its capability for what appears to be some sort of reasoning seems to be moving in an AGI direction much more quickly than previously thought.
Creating prompts in the ways described here should allow you to get more out of ChatGPT and future AI systems in general. Prompt Engineering (the skill of writing prompts) is a new skillset that will undoubtedly be required in most occupations in the future. If you would like more information and techniques on prompting, take a look at learnprompting.org.
Reminder: do not input any student or employee personal information, or commercially sensitive information, into ChatGPT.
1. Bubeck, Chandrasekaran, Eldan et al., ‘Sparks of Artificial General Intelligence: Early experiments with GPT-4’ (2023) https://arxiv.org/abs/2303.12712
2. Wei et al., ‘Chain-of-Thought Prompting Elicits Reasoning in Large Language Models’ (2022) https://arxiv.org/abs/2201.11903
3. The current GPT models are trained using a process called Reinforcement Learning from Human Feedback (RLHF). This process is used to rein in the model from producing harmful, untruthful or biased content.
4. https://neurips.cc/Conferences/2022/ScheduleMultitrack?event=54087
5. https://learnprompting.org/docs/basics/combining_techniques
6. https://longevity.technology/news/will-we-reach-the-singularity-by-2035