Entertainment and Media Guide to AI: Geopolitics of AI- Text and … – Lexology

Copyright is a territorial beast and not all countries are equal in how they have decided to approach the TDM debate.

The U.S. approaches TDM through its doctrine of “fair use,” which permits limited use of copyright protected material without first acquiring permission from the copyright holder – in particular where the contemplated use is deemed “transformative.” Japan enjoys a flexible copyright exception for “non-enjoyment” purposes. Other countries across the globe, such as Singapore, South Korea, Malaysia, Israel and Taiwan, have adopted similar rules, with the firm intention of removing uncertainties for their tech industries and positioning themselves, unencumbered, in the AI race.

The EU followed suit, albeit with a much shallower version of the exception as far as businesses are concerned. The Directive on Copyright in the Digital Single Market, adopted in 2019 (the Copyright Directive), introduced two mandatory exceptions under EU copyright law: (i) one for research and cultural organizations to conduct research; and (ii) another available to any type of beneficiary for any type of use, but with a significant caveat – it may be overridden by an “opt-out,” a concession to rightsholders that was introduced during the very last stage of the Copyright Directive’s adoption process and is fraught with practical difficulties.

The UK did not transpose the Copyright Directive and may find itself unable to legislate on TDM for some time as a result of the charged atmosphere which seems to have permeated this issue, at home and elsewhere. Meanwhile, the UK could find itself in a Catch-22 situation, with a desire to encourage its AI sector, yet with strict copyright rules that leave very limited scope for data extraction without a license. That is, of course, unless the UK, now free from its EU shackles, decides to reinvent the meaning of its “fair dealing” exception…

Key takeaways

  • While certain jurisdictions, such as the European Union, work on developing a regulatory landscape around AI, other jurisdictions, such as the United States, rely on industry-specific guidelines (at both the federal and state level) in the absence of comprehensive legislation.
  • Certain jurisdictions, such as Singapore, quickly adapted to generative AI by creating an exception permitting AI companies to reproduce copyrighted works for training purposes.
  • In the United States, in the absence of a TDM exception, AI companies contend that the inclusion of copyrighted materials in training sets constitutes fair use, i.e., not copyright infringement, a position which remains to be evaluated by the courts.

In the following articles, we look at how various jurisdictions have approached the text and data mining debate.

The United States

Copying copyright protected content for TDM purposes

In the United States, the reproduction right is reserved for the copyright owner of a work or its licensees under section 106 of the U.S. Copyright Act of 1976. While there is no express TDM exception in U.S. copyright law, section 107 of the Copyright Act authorizes the fair use of a copyright protected work, “including by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching […], scholarship, or research.” Copying copyright protected works for the sole purpose of text and data mining has traditionally been considered a case of fair use by the technology sector. The creative sector disagrees, and the launch of generative AI solutions capable of producing photos, paintings and music at the push of a button has seen copyright holders rally behind the “unfair use” banner to condemn the use of their content by AI businesses.

What is fair use?

To determine whether the use of a copyright protected work without the consent of the copyright owner constitutes non-infringing fair use, courts balance the following four factors in a case-by-case, highly fact-specific inquiry:

(1) The purpose and character of the use, including whether the use is of a commercial nature or is for non-profit educational purposes;

(2) The nature of the copyright protected work;

(3) The amount and substantiality of the portion used in relation to the copyright protected work as a whole;

(4) The effect of the use upon the potential market for or value of the copyright protected work.

The first factor. The first factor, also known as the “transformative use factor,” is generally the most heavily weighted by the courts. The question is whether the use merely supersedes the existing work or, to the contrary, adds something new, with a further purpose or different character, altering the first work with new expression, meaning or message; only the latter is transformative. Even if a work is copied and stored in substantially the same form as the original without meaningful alteration, this does not preclude the use from being considered transformative in nature, so long as the use by the would-be copier serves a materially different function than the original work.

Some examples where courts have found a use to be transformative include making digital copies of student papers for use in anti-plagiarism software (where the defendant’s use of the works was unrelated to such works’ expressive content), or scanning books to create a full-text searchable database and public search function (in a manner that did not allow users to read the texts). While educational and non-commercial uses are generally more likely to be found to be fair use, courts will not necessarily find a commercial use to be unfair and will instead balance the purpose and character of the use against other factors.

Copies of original works made for TDM purposes appear to have a purely functional purpose, namely, to teach an AI model about the underlying characteristics of a work through pattern recognition. Copies of original works made for TDM purposes are never released or made available to the public, hence it would appear that their transformative nature is on par with existing case law.

The second fair use factor examines whether the reproduced content is factual in nature, in which case it is entitled to a lower level of protection in an attempt to encourage the spread of scientific or educational works for the public’s benefit. While the reproduction of nonfactual, creative works such as images or sound recordings is less likely to satisfy this factor, the second factor has been considered by courts to hold little weight in the fair use balance and is rarely found to be determinative.

The third fair use factor assesses whether the quantity and significance of the portion of the work reproduced are justifiable, considering the intent of the copying. Even though using an entire image, sound recording or other creative work may seem contradictory to fair use, it does not necessarily preclude the possibility of such a ruling. Importantly, the factors should not be scrutinized in isolation but should be weighed collectively. In this regard, the fourth factor, which along with the first factor is generally given the greatest weight by the courts, could tip the balance towards a fair use finding.

The fourth factor examines whether the copy brings to the marketplace a competing substitute for the original work or diminishes the original work’s value by serving as an alternative that potential buyers might prefer. More generally, in order to be deemed fair, the use should not negatively impact the market (or the potential market) for, or the value of, the original copyright protected work by serving as a viable substitute. As copyright is a commercial right intended to protect the ability of authors to profit from their work, this factor is often influential in a fair use analysis. The interrelation between the fourth and first factors is crucial: the more the new work serves a different purpose than the original work (the first factor), the less likely it is that the second work will serve as a market substitute for the original work (the fourth factor).

Should the long-term purpose of the TDM operations be considered by the courts when assessing the fairness of the practice? Should the court’s fair use analysis differ based on the type of AI model being trained (generative or not)? These questions are highly topical and, in large part, they hinge on the U.S. courts’ response to a small number of highly visible lawsuits which the drafters of this guide will watch closely.

European Union

Text and data mining in the EU – a tale of two exceptions

Directive (EU) 2019/790 of 17 April 2019 on copyright and related rights in the Digital Single Market (Copyright Directive) created two TDM exceptions: one for research and another for everyone else.

TDM for research

The machine reading problem and its copyright law implications are not new. Studies dating back to 2012 and 2014 had long identified the potential copyright headache that might stem from it, and the world of research, in particular, had been calling for lawmakers to tackle the issue EU-wide.

The initial version of the Copyright Directive, published by the European Commission in 2016, answered this call by setting out one exception, under article 3, for TDM by research organizations and cultural heritage institutions. The exception was, however, limited to the purposes of scientific research, making it largely inapplicable for commercial purposes.

TDM for any purpose

The second TDM exception, extending the scope of article 3 to everyone else, was added just a few weeks before the text’s adoption date, under the pressure of the tech sector whose future was hanging in the balance, absent the provision. A compromise, however, had to be reached with the rightsholders and it materialized in the form of a significant caveat: the ability for copyright holders to opt out of that exception and expressly reserve such use of works to themselves.

Challenges ahead

The “TDM for any purpose” exception is in principle quite broad, but it is subject to the ability of rights owners to opt out and expressly reserve such use of works to themselves. This caveat is significant and could place a considerable burden on businesses, which would arguably need to verify, each time a training set needs to be copied, whether owners of the underlying copyright-protected material have opted out. Otherwise, businesses could inadvertently be infringing copyright.

How can a rights owner exercise its opt-out? On this topic, the Copyright Directive is somewhat unclear. It provides that a rights owner may only reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service, and should be able to apply measures (e.g., technical measures) to ensure that its reservations in this regard are respected. This raises many questions, including: (i) the exact manner in which the opt-out must be expressed; (ii) at what point the TDM user needs to check whether the opt-out has been exercised (e.g., at the time when it first accesses the data, or on a continual basis?); (iii) who bears the burden of proof as between the rights owner and the user (bearing in mind the difficulty a user will have in “proving a negative,” i.e., that the opt-out right has not been exercised); and (iv) how to determine the period of permitted retention, among other things.
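To make the “machine-readable means” point concrete, the sketch below shows how a TDM user might check one possible signal, a website’s robots.txt file, before copying a page. This is an illustration only: the Copyright Directive does not name robots.txt, whether such a file amounts to a valid reservation remains an open question, and the crawler name `ExampleTDMBot`, the sample file content and the `tdm_reserved` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch this file
# from the website it intends to mine.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def tdm_reserved(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt disallows this user agent from fetching the URL.

    Assumption: a Disallow rule is treated here as a signal that the rights
    owner may have reserved TDM use. This is not a legal determination.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(tdm_reserved(SAMPLE_ROBOTS_TXT, "ExampleTDMBot", page))
    print(tdm_reserved(SAMPLE_ROBOTS_TXT, "OtherBot", page))
```

A check of this kind captures only the state of the file at the moment it is read, which is precisely the timing difficulty raised in question (ii) above, and it says nothing about reservations expressed in metadata or in terms and conditions.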

The European TDM exceptions are likely to provide a contrasting level of protection to businesses, depending on the type of data they use. If the data being used is likely to belong to the most traditional areas of the content industry, then these exceptions may provide little support for use in commercial AI applications. The geopolitical context thereby created is one in which other jurisdictions have positioned themselves favorably in the race to become global centers for TDM and AI development through their more developed copyright exceptions.

United Kingdom

The UK was among the first countries to introduce a TDM exception, in 2014. This remains in force in section 29A of the Copyright, Designs and Patents Act 1988. It permits the making of a copy of a work in order to carry out “computational analysis,” but is limited to the sole purpose of research for a non-commercial purpose. Nonetheless, it is worth noting that the exception does not distinguish between public and private research, and the exception is mandatory and cannot be limited by contract.

In 2021, the UK published its National AI Strategy, based on three key pillars: “(i) investing in the long-term needs of the AI ecosystem; (ii) ensuring AI benefits all sectors and regions; and (iii) governing AI effectively.” Part of this strategy involved launching a consultation on copyright and specifically the extent to which measures should be implemented to facilitate the use of “copyright protected material in AI development.”

Shortly thereafter, the Intellectual Property Office (IPO) launched its AI and Copyright consultation, which included a number of questions focusing on “licensing or exceptions to copyright for text and data mining, which is often significant in AI use and development.” The outcome of the consultation was published by the IPO in June 2022 and made headlines for the strong signal it sent to the AI sector by championing the creation of a copyright exception to permit the extraction, without a license, of non-protected facts and data from lawfully accessed content protected by copyright.

But the UK creative sector had other ideas. Publishers, visual artists and the music sector voiced their concerns that they may need to “exit the UK market or apply paywalls where access to content is currently free,” because the proposed exception would prevent rights holders from licensing or receiving payment for the use of their data and content. The government was quick to listen, and in February 2023, it confirmed that it “will not be proceeding with the proposals.”

No further plan was announced, and no mention was made of exploring alternative options, leaving the UK isolated on the AI geopolitical map and, paradoxically, with one of the strictest legal frameworks for AI across the globe.

Singapore

The Singapore Copyright Act (SCA) provides for a fair use exception modelled after the fair use provisions in U.S. copyright law. Under the fair use exception, whether the use of a copyright-protected work qualifies as a non-infringing fair use is assessed according to a number of factors, including the purpose and character of the use and the effect on the market for the work.

This fair use exception, although helpful, was nevertheless deemed too unpredictable, and a specific exception was called for to provide more certainty to the sector, in light of the increasing importance of TDM as a research and training tool for the AI sector.

Following a public consultation which began in 2016, the same year the Copyright Directive was published in the EU, proposed amendments for a new exception to copyright for use of works for text and data mining (TDM) were adopted by Parliament in 2021 and enacted in November 2021.

Under section 243 of the Copyright Act, “computational data analysis,” in relation to a work or a recording of a protected performance, includes:

“(a) using a computer program to identify, extract and analyze information or data from the work or recording; and

(b) using the work or recording as an example of a type of information or data to improve the functioning of a computer program in relation to that type of information or data.”

This computational data analysis exception extends to communicating the work to the public and publication of the work.

The following five conditions, set forth in section 244(2) of the Copyright Act, apply:

“(a) the copy is made for the purpose of computational data analysis; or preparing the work or recording for computational data analysis;

(b) the party does not use the copy for any other purpose;

(c) the party does not supply (whether by communication or otherwise) the copy to any person other than for the purpose of (i) verifying the results of the computational data analysis carried out by him; or (ii) collaborative research or study relating to the purpose of the computational data analysis carried out by him;

(d) he has lawful access to the material (the first copy) from which the copy is made; and

(e) one of the following conditions is met:

(i) the first copy is not an infringing copy;

(ii) the first copy is an infringing copy but –

(A) the party does not know this; and

(B) if the first copy is obtained from a flagrantly infringing online location (whether or not the location is subject to an access disabling order) – the party does not know and could not reasonably have known that;

(iii) the first copy is an infringing copy but –

(A) the use of infringing copies is necessary for a prescribed purpose; and

(B) the party does not use the copy to carry out computational data analysis for any other purpose.”

The Singapore courts have not decided whether works solely produced by generative AI (as opposed to generative AI assisting human creators) will receive copyright protection. However, it is likely that the courts will not grant copyright protection to such works. Denying protection would discourage “stockpiling,” where generative AI is exploited to create works in bad faith and profit from inflated sales to creators.
