Evaluate Pluto's Dialogue

Evaluate Plato's Dialogue

Background

The following prompt tests an LLM's ability to perform evaluation on the outputs of two different models as if it was a teacher.

First, two models (e.g., ChatGPT & GPT-4) are prompted to using the following prompt:

prompt.txt
Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?

Then, those outputs are evaluated using the evaluation prompt below.

Prompt

prompt.txt
Can you compare the two outputs below as if you were a teacher?

Output from ChatGPT: {output 1}

Output from GPT-4: {output 2}

Code / API

file.py
from openai import OpenAI
client = OpenAI()
 
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
        "role": "user",
        "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}"
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

Reference