- As contract professionals, it is important to be aware of some of these key data usage issues since they will feed into the wider contractual negotiation on price and expected service levels.
- The quality of the Training Data determines the quality of the AI product, so AI Vendors want Training Data from their customers.
- It is important to distinguish Training Data from Customer Data or Personal Data.
A new wave of contractual terms is surfacing: terms governing data that a customer provides to a service provider and that the service provider wants to use to train its AI, known as Training Data. As contract professionals, it is important to be aware of some of these key data usage issues since they will feed into the wider contractual negotiation on price and expected service levels. Further, understanding the importance of data and how it is used to develop products is key to protecting your company's data from gratis usage and, at the extreme end, from fines caused by privacy violations.
This article explains what Training Data is, how to distinguish it from Personal Data and Customer Data, and how to draft and negotiate terms related to these types of data in agreements with AI Vendors.
Definitions of Different Data Types
First, let us define a few key terms:
“Training Data” is the data used to train an algorithm or machine learning model to predict the outcomes the model is designed to produce.
“Customer Data” is the data provided by customers while interacting with your service(s).
“Personal Data” is any information that identifies an individual.
Training Data Usage
The quality of the Training Data determines the quality of the AI product. So, what do artificial intelligence service providers (“AI Vendors”) want more than anything right now? They want quality Training Data.
Training Data can be used to: 1) improve an AI Vendor’s products and tools; and 2) train the AI Vendor’s products and services so that it can offer enhanced versions of them to customers.
In addition, Training Data could be in the form of dummy data or real data. Customers should have a clear understanding of what type of data their business considers to be Training Data. For example, a business might be okay with testing the product using dummy data. In this case, the exercise would not involve Customer Data. But if a business requires that the AI Vendor use real data, then the customer’s confidential information, sensitive information, and Personal Data may be considered Training Data.
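Where a business is comfortable testing with dummy data, the trial never needs to touch Customer Data at all. As a minimal sketch of what that means in practice, the snippet below generates fully synthetic records using only Python's standard library; the field names (`user_id`, `email`, `signup_year`) are illustrative assumptions, not a real schema.

```python
import random
import string

random.seed(0)  # reproducible synthetic data for a repeatable trial


def dummy_record():
    """Generate a synthetic record containing no real Customer Data."""
    name = "".join(random.choices(string.ascii_lowercase, k=8))
    return {
        "user_id": random.randint(100000, 999999),
        # .invalid is a reserved TLD, so these addresses can never be real.
        "email": f"{name}@example.invalid",
        "signup_year": random.randint(2015, 2024),
    }


# A small dummy dataset a vendor could use for product testing.
dummy_dataset = [dummy_record() for _ in range(5)]
```

Because every value is generated rather than collected, a dataset like this falls outside both Customer Data and Personal Data, which is exactly why the dummy-versus-real distinction matters when scoping what the AI Vendor may use.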
Contract professionals should scrutinize the contractual terms that accompany the use of AI tools to ensure that: 1) Training Data does not include the customer’s Personal Data without explicit permission from the customer; and 2) ownership rights and permissions to Customer Data are clearly stated and identified. The contractual terms that address the use and treatment of Customer Data are often found in the trial agreement, master agreement, or data processing agreement.
The two main issues that arise from using data for training purposes are: 1) how the customer’s Personal Data is treated upon onward transfer to service providers; and 2) who owns the Training Data.
Personal Data vs. Training Data
With data privacy laws proliferating around the globe, businesses that determine the means and purposes of data processing (controllers) are increasingly being held accountable for how their service providers handle their Personal Data.
As a result of increased privacy regulation, Data Processing Agreements (“DPAs”), which outline how Personal Data will be handled by the parties, have become a standard part of negotiating the purchase of software.
In addition to ensuring that DPAs address the requirements imposed by applicable privacy laws, contract professionals must ensure that the customer’s Personal Data is not used to train large language models (LLMs), because of the complexities introduced when data subjects seek to exercise their rights. For example, if a user of a service or product eventually wanted their Personal Data deleted from the model, this would be difficult to operationalize, because Personal Data, once used as input, cannot normally be unlearned. Further, use of Personal Data to train LLMs could also lead to security breaches, such as unintended leakages of Personal Data.
When reviewing and negotiating agreements with AI Vendors, it is important to distinguish between Customer Data and a customer’s Personal Data. Customer Data encompasses all the data provided by the customer on the platform and includes a customer’s Personal Data. A customer’s Personal Data is Customer Data that identifies a natural person, such as users of the service or product.
Contract professionals should ensure that service contracts have terms that either: 1) require the AI Vendor to apply anonymization or masking techniques to prevent the customer’s Personal Data from being used to train AI Vendor LLMs; or 2) prevent AI Vendors from using a customer’s Personal Data to train their LLMs. Data masking can be achieved by ensuring that the contract has language that requires the AI Vendor to aggregate, de-identify, or permanently anonymize the data used by the AI Vendor to train the LLM in such a way that your Customer Data cannot ever again be used to identify an individual.
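To make the contractual language concrete, here is a minimal sketch of the kind of masking and aggregation techniques such a clause might require, using only Python's standard library. The record fields and salt are illustrative assumptions, and note an important caveat: salted hashing as shown is pseudonymization, which regulators may still treat as Personal Data, so a clause demanding permanent anonymization sets a higher bar than this sketch.

```python
import hashlib
import statistics

SALT = "example-salt"  # illustrative; real deployments manage salts as secrets


def mask_record(record):
    """Replace direct identifiers with truncated salted hashes (pseudonymization)."""
    masked = dict(record)
    for field in ("name", "email"):
        if field in masked:
            digest = hashlib.sha256((SALT + str(masked[field])).encode()).hexdigest()
            masked[field] = digest[:12]
    return masked


records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "purchase_total": 120.0},
    {"name": "Alan Turing", "email": "alan@example.com", "purchase_total": 80.0},
]

# Masking: identifiers are irreversibly transformed before any vendor use.
masked = [mask_record(r) for r in records]

# Aggregation: only a summary statistic, not row-level data, leaves the
# customer environment.
average_spend = statistics.mean(r["purchase_total"] for r in records)
```

A contract clause along these lines would typically require the AI Vendor, not the customer, to apply such transformations before any training use, and to warrant that the transformed data cannot be re-identified.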
Ownership and Rights to Training Data
Usually, the customer owns Customer Data. The customer should therefore ensure that the contract addresses its ownership of the data by obtaining the AI Vendor’s acknowledgement that the data provided under the agreement is the customer’s sole and exclusive property.
Rights holders can carve out their data rights in contracts by adopting either a broad or a narrow definition of Customer Data.
By broadening the definition of Customer Data to capture all the data the AI Vendor collects or receives directly or indirectly from the customer (including its derivatives), contract professionals can ensure that there is no ambiguity about how Customer Data is to be used. Customers should specify that: 1) the AI Vendor can only use Customer Data to provide the services to the customer; and 2) the AI Vendor promises that it will not use or attempt to use Customer Data for any other purpose, including the development of LLMs and other similar products.
The customer could also opt to exert more control over the AI Vendor’s data usage by: 1) ensuring that the definition of licensed data is narrow; and 2) reserving the right to charge the AI Vendor additional fees for any additional uses of Customer Data, including the development of LLMs and other similar products. Such agreements aim to ensure that the AI Vendor cannot generate any derivative data without paying the customer for it.
Tips for Reviewing AI Vendor Agreements
Here are the top three contract review tips to consider when working with an AI Vendor that wants Training Data from the customer:
- Customer Data should be clearly defined with the aim of protecting the customer’s data from gratis usage.
- Ensure Training Data does not contain the customer’s Personal Data. Specifically, ensure that the customer’s Personal Data is not used to train the AI Vendor’s LLMs.
- Clarify the terms of the agreement regarding the use and ownership of customer input data (data fed into the AI by users) and customer output data (data produced using the AI in response to inputs or prompts by users).
As more and more AI Vendors seek out free Training Data from their customers, contract professionals representing the customer should understand the difference between Training Data and other types of data and ensure appropriate contractual protections.