      Balancing Data Scraping and IP Rights in the Age of AI

      Published on 10 Dec 2024 | 5 minute read
      AI's rapid growth sparks global IP debates on data scraping, copyright, competition, and enforceable safeguards.

      The rapid development of artificial intelligence (AI) and its reliance on vast datasets has raised critical questions about the balance between innovation and intellectual property (IP) rights. As AI-generated content (AIGC) grows, concerns over data scraping—often essential for AI training—are leading to debates about copyright infringement, unfair competition (in countries such as China), enforceability of website terms of use, and technological safeguards.

      So, what are the issues at stake and what can content owners and AI developers do to manage the risk surrounding them?

       

      Scraping and Copyright Infringement

      At its core, data scraping involves extracting large volumes of information from websites, often through automated bots. While browsing by humans inherently implies a license to view and reproduce content on their devices, this implied license does not extend to bots performing large-scale scraping. This distinction often forms the basis for alleging copyright infringement against unauthorized scrapers.

      This raises the question of whether such scraping can be defended under the doctrine of fair use or fair dealing. These defences are limited or absent in many jurisdictions, however, so the issue remains unresolved.

       

      Other Rights and Interests Related to Data

      Apart from copyright, other types of rights or interests may attach to data. Taking China as an example, if a dataset is collected and produced to bring monetary benefits to its owner, others who simply scrape the dataset would unfairly harm the owner's interests without any justifiable grounds. The scraping and follow-up use of the dataset may also be deemed unfair competition under the Anti-Unfair Competition Law.

       

      Enforceability of Website Terms of Use

      Website owners frequently use express terms of use to regulate access, including prohibitions on scraping. These terms, when legally enforceable, can form the foundation of contractual claims against scrapers. For example, a European case involving airline Ryanair upheld terms of use as an enforceable contract, leading to a ruling against a price comparison platform that violated these terms.

      However, the effectiveness of such enforcement remains limited. Quantifying damages caused by scraping is challenging, and pursuing litigation across jurisdictions is resource-intensive. Strengthening the prominence and clarity of website terms of use may improve enforceability and provide a stronger deterrent.

       

      The Role of Technological Protection Measures

      Technological Protection Measures (TPMs) and Digital Rights Management (DRM) systems serve as safeguards against unauthorized data access and tampering. These measures include anti-crawling mechanisms, such as systems that differentiate human browsing from bot activity. For example, Getty Images successfully traced copyright infringement in a case involving Stable Diffusion by relying on watermarked content embedded in its dataset.
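
      To make the idea of differentiating human browsing from bot activity concrete, here is a minimal sketch of one common heuristic: counting requests per client inside a sliding time window. The class name, the thresholds, and the use of a plain client identifier are all assumptions made for illustration. They are not drawn from any system discussed in this article, and real anti-crawling products typically combine many more signals.

```python
from collections import defaultdict, deque


class RateBasedBotDetector:
    """Hypothetical helper: flags clients that exceed a request-rate threshold."""

    def __init__(self, max_requests=30, window_seconds=10.0):
        # Illustrative defaults only; suitable values depend on the site.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> recent request timestamps

    def record(self, client_id, timestamp):
        """Record one request and return True if the client now looks automated."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard requests that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

      With `max_requests=5` and `window_seconds=1.0`, a visitor who loads a page every two seconds is never flagged, while a client sending a request every 50 milliseconds is flagged from its sixth request onward. A rate threshold of this kind is easy to evade by slowing down or rotating addresses, so it is at most a first filter.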

      Yet, these measures are not foolproof. Techniques like data cleaning, often used during AI training, can remove watermarks or other identifiers, making it harder to trace or prove infringement. Moreover, identifying the individuals or entities responsible for scraping often requires court-ordered discovery actions, which can be hindered by legal and jurisdictional challenges.

       

      Policy and Legal Frameworks Across Jurisdictions

      Legal certainty varies widely across countries, influencing the balance of power between content owners, data centres, and AI developers. Singapore, for instance, offers legal clarity that facilitates enforcement against data centres hosting scraping activities. Conversely, jurisdictions like Indonesia, which lack fair use defences and recognition of clickwrap agreements, present challenges in proving and addressing copyright infringement.

       

      Data Centres/Cloud Services Liability Impact on AI Developers

      The new reporting requirements from the US Department of Commerce for AI developers aim to enhance oversight and national security by mandating detailed disclosures about AI model development, cybersecurity measures, and testing outcomes. This could lead to increased operational costs as companies invest in compliance resources and modify processes to meet reporting standards.

      In another developing area, a data centre has been sued for enabling copyright infringement. This is probably a tactic used when the data centre user cannot be identified.

       

      Recommendations for Stakeholders

      1. For Content Owners:
        • Conduct audits of website terms of use, TPMs, and DRM systems.
        • Ensure these measures are prominently displayed to provide early notice of restrictions.
        • Monitor for scraping activities and act promptly to mitigate damage.
      2. For AI Developers:
        • Assess the legal risks associated with the data used for training.
        • Review contracts with cloud service providers to address data retention and liability concerns.
        • Evaluate the location of data centres and cloud service providers, considering how the local legal framework affects local compliance, liability for enabling infringement, and litigation disclosure rules.
        • Develop internal policies to mitigate reliance on contentious data sources.
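
      As one small, practical input to the risk assessment recommended for AI developers above, a crawler can check a site's published crawling rules before collecting anything. The sketch below uses Python's standard `urllib.robotparser` module on a robots.txt file. Note the assumptions: robots.txt is not discussed in this article, the user agent `ExampleTrainingBot` and the example rules are hypothetical, and passing this check says nothing about a site's terms of use or about copyright.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

      Under these example rules, `ExampleTrainingBot` is refused everywhere on the site, while other crawlers are refused only under `/private/`. Recording the result of such a check alongside each collected item is one way an internal data-sourcing policy could document what notice the developer had at collection time.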

       

      Conclusion

      The tension between protecting IP and fostering AI development highlights the need for clearer legal frameworks and proactive measures. While no country outright opposes AI innovation, the degree of legal certainty they offer significantly impacts stakeholders. Striking a balance between innovation and rights protection is key to ensuring the sustainable growth of AI technologies.

      Principal, Partner at Lusheng Law Firm
      +86 10 8632 4100
      Principal
      +62 21 769 7333