I'm a little late coming to the party, but there has been a great deal of discussion on the net about the recent lawsuit against the Internet Archive, in particular the accusation that the Wayback Machine allowed a law firm to access archived copies of a website that should not have been available under Archive.org's policy of voluntary compliance with robots.txt files. The Archive's policy is to deny access to archived webpages if the site hosting them has an exclusionary robots.txt file. The policy is applied retroactively: even if a page had no robots.txt file when it was archived, access to the archived copy will be blocked if the current version of the site carries the exclusionary request.
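For those who haven't run into it, an exclusionary robots.txt is nothing more than a short text file served from the root of a site. A minimal example asking the Archive's crawler to stay away from everything would look something like this (ia_archiver is the user-agent the Wayback Machine honors; the domain is hypothetical):

    # Served at http://www.example.com/robots.txt
    User-agent: ia_archiver
    Disallow: /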
Reportedly, a law firm interested in the archived versions of the pages was able to get around the Archive's policy through the simple expedient of repeatedly requesting the documents until the system, for whatever reason, spit them up.
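We don't know exactly how the firm went about it, but "repeatedly requesting the documents" requires nothing fancier than a retry loop. A minimal sketch in Python, assuming an ordinary HTTP client and a Wayback-style URL (the exact URL here is hypothetical):

    import time
    import urllib.error
    import urllib.request

    # Hypothetical archived URL, purely for illustration.
    URL = "http://web.archive.org/web/2003/http://www.healthcareadvocates.com/"

    def fetch_with_retries(url, attempts=10, delay=1.0):
        """Keep requesting the page until the server hands it over."""
        for _ in range(attempts):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except urllib.error.HTTPError:
                time.sleep(delay)  # refused this time around; ask again
        return None

Note what is absent here: no password cracking, no decryption, no exploit. It is the same GET request any browser sends, just issued more than once, which matters for the circumvention analysis below.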
What has gotten many commentators interested is that the anti-circumvention provisions of the Digital Millennium Copyright Act have been invoked against the law firm that accessed the files by repeatedly requesting them from Archive.org.
Read the (massive download) complaint here: Healthcare Advocates v. Harding Complaint [PDF].
The first important thing to note about the case is that the Wayback Machine is not being sued for violating the DMCA itself, but only on a variety of contract-based claims for failing to ensure that its voluntary robots.txt compliance actually worked the way its stated policy said it would.
The only parties alleged to have violated the anti-circumvention provisions of 17 USC 1201(a)(1) are the lawyers who actually accessed the archived pages. The key element of this charge is here:
44. The denial text string in the robots.txt file on the computer server hosting the www.healthcareadvocates.com web site effectively controlled access to the archived historical content of the www.healthcareadvocates.com web site through the Wayback Machine at www.archive.org.
Nope. According to Wikipedia, the robots.txt protocol
is purely advisory, and relies on the cooperation of the web robot, so that marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However the file is necessarily publicly available and is easily checked by anyone with a web browser.
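That advisory character is easy to see in code. Python ships a robots.txt parser in its standard library, and all it does is tell a client what the file asks; whether the client obeys is entirely up to the client. A short sketch, using a hypothetical domain:

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")  # hypothetical site
    rp.read()  # the file is public: this is an ordinary HTTP fetch

    # A polite crawler asks permission before fetching a page...
    print(rp.can_fetch("ia_archiver", "http://www.example.com/private/"))

    # ...but nothing stops an impolite client from fetching the page
    # anyway; the only "control" is the client's own good manners.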
Pretty open and shut case. Robots.txt is not a technological measure under the DMCA. If it were, then all you'd have to do is slap a little text into any file saying something along the lines of "don't copy this" and suddenly it would be a technological control measure. This would also pretty clearly eviscerate 17 USC 1201(c)(3), the "no technological mandates" provision:
Nothing in this section shall require that the design of, or design and selection of parts and components for, a consumer electronics, telecommunications, or computing product provide for a response to any particular technological measure, so long as such part or component, or the product in which such part or component is integrated, does not otherwise fall within the prohibitions of subsection (a)(2) or (b)(1).
The fact that the Internet Archive voluntarily agreed (perhaps even contracted) to comply with the robots.txt file doesn't change a thing. Either something is a technological measure under the DMCA or it is not; it can't be one thing for one party and something else for another. Either it meets the definition under 1201(a) or 1201(b) and applies to all parties, or it does not. Otherwise, two parties could collude to make "don't copy this file" into a technological measure under the DMCA, at least for their purposes.
You can't have a DMCA violation if you don't have a technological control measure. So, if the plaintiffs want to maintain a DMCA violation, they're going to have to base it on something other than a robots.txt file.