I'm a little late coming to the party, but there has been a great deal of discussion on the net about the recent lawsuit involving the Internet Archive. The suit accuses the Wayback Machine of allowing a law firm to access archived copies of a website that shouldn't have been available under Archive.org's policy of voluntary compliance with robots.txt files. The Archive's policy is to not allow access to archived webpages if the site has an exclusionary robots.txt file. This policy is applied retroactively: even if a site didn't have a robots.txt file when it was archived, access to the archived pages will be blocked if the current version of the site carries the exclusionary request.
Reportedly, a law firm interested in the archived version of pages was able to get around the Archive's policy through the simple expedient of repeatedly requesting the documents until the system, for whatever reason, spit them up.
What has gotten many commentators interested is that the anti-circumvention provisions of the Digital Millennium Copyright Act have been invoked against the law firm that accessed the files by repeatedly requesting them from Archive.org.
Read the (massive download) complaint here: Healthcare Advocates v. Harding Complaint [PDF].
Some of the commentary here:
The first important thing to note about the case is that the Wayback Machine is not being sued for violating the DMCA itself, but only on a variety of contract-based claims, for not taking care to ensure that its voluntary robots.txt compliance actually did what its stated policy claimed.
The only parties alleged to have violated the anti-circumvention provisions of 17 USC 1201(a)(1) are the lawyers who actually accessed the archived pages. The key element of this charge is here:
44. The denial text string in the robots.txt file on the computer server hosting the www.healthcareadvocates.com web site effectively controlled access to the archived historical content of the www.healthcareadvocates.com web site through the Wayback Machine at www.archive.org.
Nope. According to Wikipedia, robots.txt:
is purely advisory, and relies on the cooperation of the web robot, so that marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However the file is necessarily publicly available and is easily checked by anyone with a web browser.
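To make the advisory point concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt contents, the "ExampleBot" user agent, the example.com URL, and the helper names are all made up for illustration; none of this is the Internet Archive's actual code. It only shows that the file has an effect when the requesting program chooses to consult it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every robot to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical page and crawler name, used only for this illustration.
URL = "http://www.example.com/private/page.html"
USER_AGENT = "ExampleBot"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what the robots.txt text *requests* for this agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def cooperative_decision(robots_txt: str, user_agent: str, url: str) -> str:
    """A well-behaved robot checks the file and honors the request."""
    return "fetch" if is_allowed(robots_txt, user_agent, url) else "skip"


def uncooperative_decision(robots_txt: str, user_agent: str, url: str) -> str:
    """A robot that never looks at the file; nothing in the file stops it."""
    return "fetch"


if __name__ == "__main__":
    print("cooperative:", cooperative_decision(ROBOTS_TXT, USER_AGENT, URL))
    print("uncooperative:", uncooperative_decision(ROBOTS_TXT, USER_AGENT, URL))
```

The only difference between the two robots is whether they bother to call is_allowed; the file itself does nothing to block a request.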
Pretty open and shut case. Robots.txt is not a technological measure under the DMCA. If it were, then all you'd have to do is slap a little text into any file saying something along the lines of "don't copy this" and suddenly it would be a technological control measure. This would also pretty clearly eviscerate 17 USC 1201(c)(3), the no technological mandates provision:
Nothing in this section shall require that the design of, or design and selection of parts and components for, a consumer electronics, telecommunications, or computing product provide for a response to any particular technological measure, so long as such part or component, or the product in which such part or component is integrated, does not otherwise fall within the prohibitions of subsection (a)(2) or (b)(1).
The fact that the Internet Archive voluntarily agreed (perhaps even contracted) to comply with the robots.txt file doesn't change a thing. Either something is a technological measure under the DMCA or it is not. It can't be one thing for one party and something else for another. Either it meets the definition under 1201(a) or 1201(b), and is applicable to all parties, or it does not. Otherwise, you could see collusion between two parties to make "don't copy this file" into a technological measure under the DMCA, at least for their purposes.
You can't have a DMCA violation if you don't have a technological control measure. So, if the plaintiffs want to maintain a DMCA violation they're going to have to base it on something other than a robots.txt file.
1. Seth Finkelstein on July 19, 2005 01:28 PM writes...
"It can't be one thing for one party and something else for another."
Sure it can, why not?
It can be an *advisory* for a webcrawler, but a *permission mechanism* for the Internet Archives. That's already true.
The charge doesn't hold, in my view, because the permission mechanism does not meet the standards of DMCA section 1201, as I argue in my post. But it certainly is used as a permission mechanism by the Internet Archives.
2. Seth Finkelstein on July 19, 2005 01:31 PM writes...
Clarification: The clause "... but a *permission mechanism* for the Internet Archives." should be parsed as "The Internet Archives can themselves choose to employ the file for their own purposes as a permission mechanism".
3. Fred von Lohmann on July 20, 2005 06:02 PM writes...
What Seth said. I said much the same thing in a comment on Patry's blog, here.