De-Identified Data in SaaS Agreements

Edited for Contract Nerds from Foster’s Newsletter, “Mastering Commercial Contracts.”


KEY TAKEAWAYS:

  • The right of a SaaS provider to “de-identify” or “anonymize” customer data in its SaaS solution is growing in significance.
  • You can save time and avoid confusion by understanding whether your SaaS agreement needs data to be de-identified or anonymized.
  • The placement of a de-identified data clause can change the strength and significance of the rights to the data.

De-Identified Data in SaaS Agreements by Foster Sayers

Data rights: there is so much to unpack when you think about them in the context of a SaaS Agreement. SaaS providers and customers alike need to understand them because of the regulatory environment that has emerged over the years in the European Union and various jurisdictions across the United States.

Whether you are the Controller or Processor, there are issues to be aware of in data privacy agreements and how they are changing. One data right taking on increasing significance with recent technology advancements is the right of a SaaS provider to “de-identify” or “anonymize” customer data in its SaaS solution.

SaaS companies have a lot of valuable data in their systems. The data can be used by SaaS companies in applications for benchmarking and analytics, and now to develop LLMs for purpose-built generative AI applications.

However, using existing customer data raises concerns about confidentiality of information and privacy. To address those concerns, SaaS companies de-identify customer data, or in some cases anonymize the data (more on the distinctions later). But they first must have the right to do so in their SaaS Agreements.

Distinctions between De-identified Data and Anonymized Data

The biggest risk in a SaaS provider using de-identified customer data is re-identification. Assessing that risk requires understanding both the use case and the process a SaaS provider follows to remove identifying information from the data. How risky the use case is will be the lens for analyzing the de-identification process. If the use case is high risk and you want to guard against re-identification, then you can discuss employing techniques such as k-anonymity, which groups records so that no individual stands out, and differential privacy, which adds statistical noise to results.
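For readers who want to see what those two techniques look like in practice, here is a minimal sketch. It is an illustration under stated assumptions, not any particular provider's process: the field names (`zip`, `age_band`), the sample rows, and the function names `k_anonymity` and `noisy_count` are all hypothetical. The first function reports the size of the smallest group of records that share the same indirect identifiers, and the second adds Laplace noise to a simple count, which is one common way to apply differential privacy.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(records: List[Dict[str, str]],
                quasi_identifiers: Sequence[str]) -> int:
    """Return k: the size of the smallest group of records that share
    the same values for the given indirect (quasi) identifiers."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity assumed to be 1).

    A smaller epsilon means more noise and stronger privacy.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Made-up sample rows, for illustration only.
rows = [
    {"zip": "606**", "age_band": "30-39"},
    {"zip": "606**", "age_band": "30-39"},
    {"zip": "606**", "age_band": "40-49"},
]

print(k_anonymity(rows, ["zip", "age_band"]))  # smallest group size
print(noisy_count(len(rows), epsilon=1.0, rng=random.Random()))
```

A low k means at least one small group of records stands out, which is the kind of finding that should prompt a conversation about further generalization before the data is used.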

Before reviewing contract language examples and best practices, let’s spend some time clarifying the underlying concepts around de-identified data and its cousin that it is often confused with, anonymized data.

De-Identified Data

De-identification involves removing (or sometimes altering) personally identifiable information (“PII”) from a dataset so that it is not readily traced back to an individual. The de-identification process typically involves removing obvious identifiers such as names, addresses, and Social Security Numbers. However, certain indirect identifiers, like ZIP codes or birth dates, might still be present. In fact, as de-identified data is often used for benchmarking purposes, ZIP codes and birth dates are likely to remain in the dataset (for example, benchmarking “X” by a population of a certain age in a certain ZIP code).
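To make the process concrete, here is a minimal sketch of the kind of step described above. The field names, the sample record, and the `de_identify` function are assumptions made for illustration, not a description of how any specific SaaS provider does it. Note that ZIP code and birth date survive the process, which is exactly why the output is de-identified rather than anonymized.

```python
from typing import Dict, List

# Hypothetical field names for the "obvious" identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}


def de_identify(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the records with direct identifiers removed.

    Indirect identifiers such as ZIP code and birth date are left in
    place on purpose, so they remain available for benchmarking.
    """
    return [
        {field: value for field, value in record.items()
         if field not in DIRECT_IDENTIFIERS}
        for record in records
    ]


# Made-up customer record, for illustration only.
customers = [
    {"name": "Ana Diaz", "address": "1 Main St", "ssn": "000-00-0000",
     "zip": "60614", "birth_date": "1984-03-09", "monthly_spend": "120"},
]

print(de_identify(customers))
```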

De-identified data may not be readily traced back to an individual, but it might still carry some risk of re-identification if it is combined with other available data sources. This is a critical point to understand and where the discussion around de-identified data gets tricky.
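A short sketch shows how that combination can work. Everything here is hypothetical: the two datasets, the field names, and the `link_records` function are invented to illustrate the idea. The de-identified rows carry no names, but matching them against a second, named dataset on ZIP code and birth date can single out an individual whenever the combination is unique.

```python
from typing import Dict, List, Sequence, Tuple


def link_records(de_identified: List[Dict[str, str]],
                 public: List[Dict[str, str]],
                 keys: Sequence[str] = ("zip", "birth_date")
                 ) -> List[Dict[str, str]]:
    """Attach a name to each de-identified row that matches exactly one
    named public row on the given indirect identifiers."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in de_identified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append({**row, "re_identified_as": candidates[0]})
    return matches


# Made-up data, for illustration only.
de_identified_rows = [
    {"zip": "60614", "birth_date": "1984-03-09", "monthly_spend": "120"},
    {"zip": "60614", "birth_date": "1990-11-02", "monthly_spend": "75"},
]
public_rows = [
    {"name": "Ana Diaz", "zip": "60614", "birth_date": "1984-03-09"},
    {"name": "Sam Lee", "zip": "60614", "birth_date": "1990-11-02"},
    {"name": "Kim Roe", "zip": "60614", "birth_date": "1990-11-02"},
]

print(link_records(de_identified_rows, public_rows))
```

In this made-up example, the first row is re-identified because only one person shares its ZIP code and birth date, while the second row is not because two people do.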

When applicable, “De-identified Data” should be a defined term in the SaaS Agreement. Make sure the definition matches the concept. Here is an example definition:

“De-Identified Data” means information that has been compiled and modified by Company so that it does not include (i) any personally identifiable information of any individual; or (ii) the identity of any other entity.

Before landing your definition, make sure you know what underlying data is being de-identified and if it is regulated. For example, in the case of de-identified data that is sourced from protected health information (“PHI”), HIPAA Rules set forth what identifying data must be removed from the data set for it to be considered de-identified. Be sure any definition and actual de-identification practice conforms to the applicable laws.

Anonymized Data

De-identified data is different from anonymized data. Anonymization makes it practically impossible to re-identify individuals within the dataset, even in combination with external data sources. Anonymization methods might involve generalization, suppression, or data masking to ensure that the data cannot be linked back to specific individuals. When data has been properly anonymized, the privacy of the individuals in it should be protected.

While de-identified data might still carry some risk of re-identification, properly anonymized data should have minimal to no such risk.

When the risk of re-identification is great and the parties agree that the SaaS provider will anonymize the customer data, “Anonymized Data” should be a defined term in the SaaS Agreement. Here is an example definition:

“Anonymized Data” means information that has been compiled and modified by Company in such a way that a data subject is not or no longer identifiable and cannot be re-identified when combined with other data.

Make sure the definition matches the concept. To do that, you’ll need to be familiar with anonymization methods. Make sure the data will be generalized to the extent that it is no longer a data point about an individual, that identifying information will be suppressed so it is not pulled into the data set, or that the data will be masked by a process such as associating it with a unique identifier before it is included in the data set.
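Here is a minimal sketch of what those three methods can look like when applied to a single record. The field names, the sample record, and the helper functions are all hypothetical, and the specific choices (a three-digit ZIP prefix, a birth decade, a random token) are assumptions made for illustration. Whether the result actually meets a contract's definition of “Anonymized Data” still depends on how much re-identification risk remains.

```python
import secrets
from typing import Dict

# Hypothetical fields that are suppressed (never pulled into the data set).
SUPPRESSED_FIELDS = {"name", "ssn"}

# Token table for masking; in practice this would be stored securely
# and kept separate from the output data set.
_tokens: Dict[str, str] = {}


def mask(value: str) -> str:
    """Replace a value with a random token, reusing the same token
    whenever the same value appears again."""
    if value not in _tokens:
        _tokens[value] = secrets.token_hex(8)
    return _tokens[value]


def generalize_zip(zip_code: str) -> str:
    """Keep only the first three digits of a ZIP code."""
    return zip_code[:3] + "**"


def generalize_birth_date(birth_date: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to its decade, e.g. '1980s'."""
    return birth_date[:3] + "0s"


def transform(record: Dict[str, str]) -> Dict[str, str]:
    """Apply suppression, masking, and generalization to one record."""
    out: Dict[str, str] = {}
    for field, value in record.items():
        if field in SUPPRESSED_FIELDS:
            continue                                   # suppression
        if field == "customer_id":
            out[field] = mask(value)                   # masking
        elif field == "zip":
            out[field] = generalize_zip(value)         # generalization
        elif field == "birth_date":
            out["birth_decade"] = generalize_birth_date(value)
        else:
            out[field] = value
    return out


# Made-up record, for illustration only.
record = {"customer_id": "C-1001", "name": "Ana Diaz", "ssn": "000-00-0000",
          "zip": "60614", "birth_date": "1984-03-09", "monthly_spend": "120"}

print(transform(record))
```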

Best Practices

SaaS Agreements are inconsistent in how they address de-identified data rights because the concept varies greatly in its importance and application among SaaS offerings. It can be addressed in a section on Data Rights, and it can appear as a carve-out in the section on Confidentiality, but I prefer the practice of addressing it as a subsection in the section on Intellectual Property.

Best Practice Example

X. Intellectual Property

X.3 De-Identified Data. Company shall have the right to access, compile, and aggregate information supplied by Customer, including Customer Data, into De-Identified Data. Company shall own all De-Identified Data. Company may use or distribute such De-Identified Data for any lawful purpose, including, without limitation, analytics, benchmarking, and research purposes.

A. Definitions

A.Y “De-Identified Data” means Information that has been compiled and modified by Company so that it does not include (i) any personally identifiable information of any individual; or (ii) the identity of any company, trade group, or any other entity.

Best Practice Example Breakdown

Addressing de-identified data in the Intellectual Property section is the best practice because it affords the SaaS company the most rights to the de-identified data. While a perpetual license is not the worst fallback position, in that case make sure the license is transferable. The justification for SaaS company ownership is that the customer’s data has been transformed into a new dataset using proprietary methods.

The example definition for “De-Identified Data” is a best practice as it clearly states what identifiable information is going to be excluded from the de-identified data. It addresses identifiable information from both the standpoint of individuals and corporate entities. This is best practice in a business-to-business context as there will likely be concerns about both privacy issues and proprietary information.

Whether you represent a SaaS company or a SaaS customer, be certain you understand the differences between de-identified data and anonymized data. If concerns about re-identification are high, then push for anonymized data both in definition and in process.

Our world is only becoming more data-driven, so SaaS companies need to be thinking about how they use data in their systems. If you represent a SaaS company and do not currently address de-identified data in your agreement, then it’s time to have a conversation with your internal business partners. If they are not currently leveraging de-identified customer data, they are likely planning to one day. Legal cannot be the last to know as the terms to permit such activities need to be in place with customers before any such initiatives can begin.


© 2022 Contract Nerds United, LLC. All rights reserved.
The opinions expressed throughout this website are not intended to provide legal advice or create an attorney-client relationship.
