
Laura J. Greenberg, General Counsel

What are Five Tips to Improve the Reliability of ChatGPT Responses?

Disclaimer: Do not include confidential, personal, proprietary, or privileged information in prompts for generative AI products. I am using ChatGPT-4.

KEY TAKEAWAYS:

Reliability is the biggest concern amongst contracts professionals contemplating use of ChatGPT.
ChatGPT breaks down its own reliability using multiple factors (see its responses below).
There are things we can do to improve our input that will help improve reliability of the output.

The rise of AI tools like ChatGPT in the legal industry is evident: prominent law firms, like Orrick, Herrington & Sutcliffe, have begun offering courses in prompt engineering to their summer associates. Those of us who are not summer associates at Orrick can learn by using generative AI tools and by sharing what we learn with one another.

In a poll during the latest Contract Nerds webinar, “Using ChatGPT to Draft and Negotiate Contracts”, 44% of the respondents identified “reliability” as their biggest concern when considering ChatGPT for their contracting needs. Reliability is a concern of mine as well.

Learn More: Contract Nerds on LinkedIn: Real data about contracts professionals and ChatGPT

Before looking at ChatGPT-4’s reliability, it’s important to understand how ChatGPT works.

What is ChatGPT?

ChatGPT is a probability-based generative AI model. This means it predicts the next sequence of text based on the prompt or input it receives from the user. While the intricate technical details and the entirety of its training dataset remain proprietary to OpenAI, a few key points about ChatGPT are publicly known:

it was not specifically fine-tuned for legal matters;
its knowledge is based on data up to September 2021, meaning it is unaware of developments after that date; and
it cannot retrieve real-time data, such as search results from Bing or Bard.

How Does ChatGPT Generate Responses?

ChatGPT is a digital conversationalist that crafts responses based on patterns discerned from its training data and fine-tuning by human engineers. When ChatGPT “hallucinates” case law or facts, it is working as intended. These answers are not arbitrary; they are generated because they align with patterns it observed in its training. This is why it is crucial to understand how to prompt ChatGPT properly, and why firms like Orrick have invested in teaching their associates this skill.
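To make the idea of pattern-based generation concrete, here is a toy sketch in Python. It is not how ChatGPT is built; it is a tiny word-pair ("bigram") model over a made-up three-sentence corpus, and the names `CORPUS`, `train_bigrams`, and `generate` are my own illustrations. It shows how a model that only follows observed patterns can assemble a fluent sentence that never appeared in its training text.

```python
import random
from collections import defaultdict

# Hypothetical miniature "training data"; real models learn from vastly more text.
CORPUS = (
    "the court held that the contract was valid . "
    "the court held that the clause was unenforceable . "
    "the parties agreed that the contract was binding ."
).split()


def train_bigrams(tokens):
    """Record which words were observed to follow each word."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, max_words=12, seed=0):
    """Extend `start` one word at a time by sampling an observed follower."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words and words[-1] in followers:
        nxt = rng.choice(followers[words[-1]])
        words.append(nxt)
        if nxt == ".":  # stop at the end of a sentence
            break
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate(model, "the"))
```

Every step follows a pattern seen in training, yet the finished sentence may combine pieces of different source sentences. Nothing in the procedure checks whether the result is true, which is the sense in which a hallucination is the model doing what it was designed to do.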

How Does Input Influence ChatGPT’s Output?

At this point, I hope we can all agree that any output from ChatGPT needs to be vetted and verified. But how else can we ensure that what we input into ChatGPT yields reliable results?

To demonstrate how input impacts ChatGPT-4’s answers, I posed a simple question to ChatGPT about its reliability in three distinct ways:

How reliable are you?
Can I rely on your responses?
Rate the reliability of your answers.

The answers clearly demonstrate that input impacts the output. Yet, while the answers are varied in their presentation, they have overlapping information and themes. By breaking down the content (with the help of ChatGPT), we can identify the patterns that offer insights into ChatGPT’s self-perception and the factors influencing its responses (see the chat here).

Learn More: How to Use ChatGPT to Create Contract Negotiation Tables

Here are ChatGPT’s responses:

Prompt 1: “How reliable are you?”

The reliability of my responses depends on several factors:

Training Data: I was trained on a vast amount of data from the internet up to September 2021. The quality and accuracy of my responses largely depend on the quality of this data.
Question Ambiguity: Clear and specific questions are more likely to receive accurate answers. Vague or ambiguous questions may lead to answers that are less precise or off-target.
Knowledge Cutoff: My knowledge is current only up to September 2021. I’m unaware of events, developments, or publications after that date.
Complexity: For well-understood topics or straightforward factual questions, I’m generally quite reliable. For more nuanced or debated topics, the answer might be more generalized or based on prevailing views up to 2021.
Bias and Neutrality: While I aim to provide neutral and unbiased information, I reflect the biases present in the data I was trained on. It’s always a good idea to consult multiple sources when making important decisions based on the information provided.
Iterative Interaction: Sometimes, it might be beneficial to ask follow-up questions or rephrase your initial query to get the most accurate answer.

It’s essential to approach the information I provide with a critical mind, corroborate it with other trusted sources when necessary, and be cautious when making decisions based solely on my responses.

Prompt 2: “Can I rely on your responses?”

That’s a good question. Here are a few things to consider:

Accuracy: I’ve been trained on a wide range of sources and can provide accurate information on a variety of topics up to my last training cut-off in September 2021. However, I’m not infallible. Always verify critical information from trusted and authoritative sources.
Objectivity: While I strive to be objective, I’m a product of the data I’ve been trained on. This means I can reflect the biases present in that data. It’s essential to approach any answer with a critical mindset.
Limitations: I don’t have real-time or post-September 2021 data. For the most current events, news, or recent scientific discoveries, you’ll need to consult up-to-date sources.
Interpretation: Some questions, especially those of a subjective nature, can have multiple valid answers. I provide answers based on patterns in the data I’ve been trained on, but there’s always room for interpretation.
Safety and Ethics: While I try to provide safe and ethical answers, it’s essential to use common sense and discretion, especially in sensitive situations.

In summary, I’m a useful tool for general information, brainstorming, and many other tasks. However, for critical decisions or up-to-date information, always consult additional sources and use your judgment.

Prompt 3: “Rate the reliability of your answers.”

Rating the reliability of my answers depends on the context:

Factual Information: For factual information up to my last training cut-off in September 2021, I’m generally quite reliable. I’ve been trained on a diverse range of sources, and I can recall facts with high accuracy. Examples include scientific concepts, historical events, and general knowledge.
Opinions and Subjective Topics: When asked about opinions, personal preferences, or subjective topics, I don’t have personal beliefs or emotions, but I can provide an overview based on the data I’ve been trained on. The reliability here is more about how well I can summarize or represent the diverse viewpoints in my training data.
Predictions and Future Events: I can’t predict the future. Any statement about events after 2021 or speculations about future occurrences are based on patterns and knowledge up to 2021 and should be taken with caution.
Problem Solving and Computations: For computational tasks, programming, math problems, etc., I’m generally very reliable, assuming the problem is well-defined and within the scope of my capabilities.
Advice: While I can provide information or suggestions based on my training data, any advice should be taken as one of many possible perspectives. It’s always good to consult with human experts or consider multiple sources before making important decisions.
Recent Information: I won’t have data on events, publications, or developments after September 2021, so I can’t be relied upon for post-2021 updates unless provided by the user.

In all cases, it’s crucial for users to critically evaluate the information I provide, consider the context, and verify from trusted external sources when necessary.

Five Tips to Increase Reliability of ChatGPT Responses

From the themes highlighted by ChatGPT combined with my use of the tool, I’ve created these five tips to keep top of mind when drafting prompts to increase the reliability of ChatGPT’s answers:

Include Background Information: ChatGPT can recognize and define legal terms that were widely used in its training data. But this understanding is based on patterns in the data rather than a deep, human-like comprehension. While it can regurgitate definitions and context, it doesn’t “understand” in the way that a trained legal professional does. When drafting a prompt, include additional background information to provide greater context.

Indicate Jurisdiction Preferences: The meaning, history, and context of legal terms can vary significantly between jurisdictions. For example, a term in U.S. law might have a different interpretation in UK law. ChatGPT’s responses might be influenced by the predominant usage in its training data, which likely skews towards certain jurisdictions (like the U.S.). When drafting a prompt, indicate the jurisdiction and provide additional context so the answer reflects that jurisdiction.

Provide the Definition: Some legal terms might have both a “layperson” meaning and a specific legal definition. The context in which a question is asked can influence which definition ChatGPT selects. Currently, we don’t know which prompts will yield which definition. When drafting a prompt, indicate that this is a legal matter and provide the definition to ensure the right one is used.

Cross-Check: Be aware that ChatGPT might reflect biases present in its training data, which could inadvertently introduce bias into its answer. Always approach ChatGPT answers with a critical mindset and cross-check with other sources to counteract potential biases.

Simplify Questions: ChatGPT is reliable for straightforward factual questions. However, complex, debated, or nuanced topics might yield more generalized answers. For accurate answers, ask clear and specific questions. Vague or ambiguous questions tend to result in more generalized, less accurate responses. Keep in mind that ChatGPT’s answers are pattern-based, and subjective questions might have multiple valid interpretations. Break down complex questions into simple and straightforward prompts. Then have ChatGPT help you reassemble the individual analyses.
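For readers who draft prompts programmatically, the tips can be sketched as a small Python helper. This is a minimal illustration under my own assumptions, not a prescribed template: the function names `build_prompt` and `split_questions`, the label wording, and the sample facts are all hypothetical. It states that the matter is legal, names the jurisdiction, supplies background and definitions, and turns one complex issue into several simple prompts. Cross-checking (the fourth tip) still has to be done by a person against trusted sources.

```python
def build_prompt(question, background="", jurisdiction="", definitions=None):
    """Assemble a prompt that states its context up front (tips 1 to 3)."""
    parts = ["This is a legal matter."]
    if jurisdiction:
        parts.append(f"Jurisdiction: {jurisdiction}.")
    if background:
        parts.append(f"Background: {background}")
    for term, meaning in (definitions or {}).items():
        parts.append(f'Definition: "{term}" means {meaning}')
    parts.append(f"Question: {question}")
    return "\n".join(parts)


def split_questions(sub_questions, **context):
    """Turn one complex issue into several simple prompts (tip 5)."""
    return [build_prompt(q, **context) for q in sub_questions]


# Illustrative, non-confidential sample inputs.
prompts = split_questions(
    [
        "What does a limitation of liability clause typically cover?",
        "Which carve-outs from a liability cap are common?",
    ],
    background="We are a SaaS vendor reviewing a customer's contract template.",
    jurisdiction="California, United States",
    definitions={
        "Confidential Information": "non-public information marked as confidential."
    },
)

for prompt in prompts:
    print(prompt, end="\n\n")
```

Each resulting prompt carries the same context, so the answers can be compared and then reassembled into a single analysis. Consistent with the disclaimer at the top of this article, the sample inputs contain no confidential or privileged details.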

Learn More: Biggest concern with using ChatGPT with contracting tasks

So, How Reliable is ChatGPT?

It’s crucial to approach ChatGPT with an informed perspective. Although ChatGPT can produce reliable and factual information, it is neither a search engine nor a legal research tool. While it can provide valuable insights, suggestions, and information on a wide range of topics, its reliability (especially in niche areas like law) should be evaluated each time it is used. Use the five tips above to improve ChatGPT’s reliability.

ChatGPT has immense potential for a wide range of tasks, including contract drafting. But it is a complement to, not a replacement for, the expertise and judgment of legal professionals.

For more expert tips about using ChatGPT with contracts:

Read more articles in the author’s guest column Chatting Contracts with GPT
Follow Laura on LinkedIn and reach out to her if you want customized trainings for your legal team
Subscribe to the Contract Nerds newsletter for weekly tips about contracts straight to your inbox

Laura J. Greenberg, General Counsel

As a California-licensed attorney and American living in Copenhagen, I’ve found a niche helping Danish startups and scale-ups expand into the American market. Currently, I serve as the interim General Counsel for Worksome, which is an external workforce management platform. On LinkedIn, I share and discuss strategies to optimize the performance of legal counsel, legal teams, and legal operations.