This discussion is heavily biased and prioritizes people's emotional need to be credited and/or paid for their work over a discussion of the legal and ethical concerns at play here. It disregards the comments of an expert in the field and focuses instead on demands that may well be unsupported by copyright law.
For example, GitHub license section D.4 specifically grants GitHub the right to display your content, analyze your content, and reproduce it in full to other users of the service. Yet no one seems particularly interested in discussing that here today, because it isn't compatible with the outrage that people are prioritizing on HN when discussing Copilot.
I would have expected HN to be better than Reddit in this regard, but I'm not seeing it yet. I don't know if the expert is right or wrong here, but nothing in today's comments suggests anything new or curious that hasn't already been ranted about in every prior thread about this topic. I specifically care about copyright law and it's disappointing to see HN having a group tantrum instead of a discussion.
The legal commentary I'm seeing from people who really know this stuff is pretty much unanimously in favor of this being legal in at least most of the world based on case law, while acknowledging why some might have ethical concerns.
I'm actually sort of curious as to the vigor of the backlash. Because Microsoft? Because of concerns about perceived further undermining of the GPL in particular? Because of people anxious to get their credit? Because...?
> I'm actually sort of curious as to the vigor of the backlash. Because Microsoft? Because of concerns about perceived further undermining of the GPL in particular? Because of people anxious to get their credit? Because...?
Because this really goes against the understanding of what was possible for copyrighted works. So, now that this is possible for anyone, copyright will start to get examined and hopefully updated to be useful in today's environment.
There are about a million problems with this.
This can even be used to intentionally launder source code from a competitor. Apparently, all it would take is to steal the code (or just fork it), then create more than 10 copies on GitHub. At that point, Copilot will start to emit the code during use. With all the legal commentary saying this isn't infringement, imagine how companies will be able to use this product.
Similarly, the training set can be intentionally polluted, so your competitor finds the output of Copilot worthless.
Because they’re not getting a share of GitHub’s future revenues from their works or from derivations of their work.
(Why do they care so much about revenue? Open source coders and ‘starving artists’, not to mention Covid economic wreckage, the US approach to medical insurance, and the total absence of Universal Basic Income in virtually all countries permitted to access GitHub.)
> For example, GitHub license section D.4 specifically grants GitHub the right to display your content, analyze your content, and reproduce it in full to other users of the service. Yet no one seems particularly interested in discussing that here today, because it isn't compatible with the outrage that people are prioritizing on HN when discussing Copilot.
Well, Copilot isn't really an analysis and display of the source code within the original meaning that people understood. That clause was meant more for running CodeQL, GitHub Actions, and other analysis while presenting the results in a repository to people. People never anticipated that GitHub would strip their licenses from files and present their source code inside of VSCode for people to use freely. It may be legal, but what we are seeing now is an abuse of the sentences you just quoted that goes outside what they were originally understood to mean.
Is it fair use to remix two musical albums into a new derivative work, that cannot plausibly be judged to replace demand for either original work?
Is it fair use to autogenerate GIFs from movies, perhaps the most protected digital works on the Internet today, in order to use them as reaction memes on Imgur?
Is it fair use to autoextract code fragments from a code base, in order to use them as suggestions on GitHub?
The Internet, and I imagine HN, was in an uproar when the music industry attempted to kill the Grey Album, because it infringed on their freedom to remix and derive.
The Internet, and I imagine HN, was in an uproar when MLB attempted to kill unauthorized baseball GIFs and replace them with official curated ones, because it infringed on their freedom to remix and derive.
How, precisely, is remixing and deriving from code ‘abusive’, in contrast to the past ten or twenty years of pressure on the Internet to the contrary when remixing and deriving from music or movies?
This is a core point of the original post linked above, where the author is shocked by our demands for more prohibitive copyright interpretations, and I want to call this out more bluntly and less politely than they did:
Fair use of a work is almost always perceived as abusive and unfair by the creator of a work. Creators ignore the cognitive dissonance between their demand to have fair use rights granted more easily to the protected works of others, and their demand to have fair use rights granted less easily to their own protected works.
I see that dissonance go unaddressed in every top-level comment in today’s discussion. I see that desire to deny fair use rights driving hundreds of emotional me-too posts, without considering the framing of whether it is fair use in alignment with every prior copyright outrage we’ve discussed over the years.
My theory is that permitting discussion of fair use would weaken their efforts to groundswell a pitchfork mob, and no one wants to confront their own biases or emotional investment or inability to profit from their code.
Whatever the motivations, HN deserves better than this.
The Grey Album is the only example comparable to what Copilot is doing. And even this is tenuous. It was a one-off and even though the copyright holder EMI did not give permission, the creators of the content remixed were happy with the re-use. Moreover, Danger Mouse could have sought a statutory licence that only applies to music under US law, whereas no such thing exists for code.
None of the other examples match up because the GIFs are not being compiled into films. The remixed works are in a different field of endeavour.
If Copilot were being used to show snippets that scroll across the screen in hacker films, or used by musicians to rap a few lines of Rust code, that would be palatable to the copyright holders. It is transformative and very likely to be fair use.
If Danger Mouse, who created the Grey Album, had instead started a business selling access to a tool that splices in copyrighted music and video based on a clip that the user provided, facilitating widespread, systematic infringement, creators would have been far less sympathetic, and EMI far more persistent in their legal attempts to shut it down and collect damages.
You ask a very insightful question. Let me see where I end up running out the analogy in a certain direction.
If Danger Mouse sold a remixing tool that enabled widespread remixing of any/all albums, would DM be profiting illegally from the content of others?
In each individual case, the remix album produced would have to pass the fair use tests, and if the user produced a sufficiently close replica, they could be restrained from distributing it. But that wouldn’t implicitly be the remixing tool’s fault, unless it mechanically reproduced a complete protected work with the user completely unaware it was doing so. A dedicated user can make any tool produce a protected work, so we have to aim for the narrow window of user-oblivious in order to fault the tool.
Translating back to Copilot, this then becomes the question: can Copilot regurgitate an entire protected work for a user who then sells that work, with the user being fully unaware that they have reproduced a protected work without meeting fair use terms, such that Copilot is responsible?
Copilot requires user prompting to emit code, and seems to draw the line at around the single function boundary, so reproducing an entire codebase becomes exponentially less likely as the number of functions increases.
So if there were a weakness in Copilot’s defense, it would be in small single-function programs, at which point the parallel to another music case comes to mind: the person who generated and copyrighted every single musical phrase in Western major/minor, to prove that the law as written is not applicable when the total size and complexity of a given work falls below a certain threshold. I thus assume that Copilot is essentially protected in the single function case - it doesn’t matter if you have a protected work for (‘four’ (2 2 +) func), because that’s so simplistic that any human might reproduce it at any time unaided, and so claiming against them would fall flat when a judge applies the common sense threshold. It’s a high bar to expect a judge to recognize this analogy and understand code well enough, but I think the combination of user intention to break fair use being required for complex multi-function systems, and the protection of snippets being essentially impossible to enforce in music terms, would absolutely shield Copilot from being judged liable and owing damages.
(General disclaimer applies: I am not your lawyer, please seek legal counsel before making use of my opinion, etc.)
You wrote 11 paragraphs about how HN deserves better than what everyone except you wrote. Yet in your reply you didn't address the comment to which you replied.
If it helps you parse my reply, consider that fair use explicitly intends to allow unpredictable derivatives that might otherwise be rejected by the copyright owner of a work, and so most of my reply anchors directly to this final paragraph of yours:
> People never anticipated that GitHub would strip their licenses from files and present their source code inside of VSCode for people to use freely.
I can’t offer you a more detailed mapping of my reply onto exclusively your talking points, as I didn’t consider that a viable constraint. My original point at top of thread remains clear in my mind as I try to provide an example - using my own questions and concerns, after objections in the past that I wasn’t! - of what a better, more reasoned, more curious, more worthwhile conversation looks like on this topic.
I do accept that not everyone desires to see the change in tone I’m trying to represent here, and no doubt I have been imperfect in my efforts to represent it. I’m sad that this isn’t connecting for you, even though I accept each time I try this that understanding and agreement are never universal. Thank you for your effort in trying to understand all the same.
> D.4 specifically grants GitHub the right to display your content, analyze your content, and reproduce it in full to other users of the service
If you read the section carefully, this covers the right of GitHub to do those things to your content "as necessary to provide the Service". "It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service".
So, does "Service" only cover the type of GitHub's service at the time of the agreement, or does it allow GitHub to invent all kinds of unrelated services and use the code as such? If GitHub can provide a "Copilot" service that arguably "learns" the code, can it also provide a service that blatantly "copies" large pieces of source code for the user (without complying with OSS license terms)?
It's not very clear what the answer would be, but if what I described is allowed, a term this broad would imply that if you're not the copyright owner of code you uploaded to GitHub, you've probably violated some OSS license by agreeing to GitHub's terms.
Which OSS licenses are potentially incompatible with GitHub? Are they also incompatible with GitLab? How can one or the other be judged to have exceeded the bounds of what is permissible as a user-generated content provider, and/or fair use rights, in the legal jurisdiction of each?
> For example, GitHub license section D.4 specifically grants GitHub the right to display your content, analyze your content, and reproduce it in full to other users of the service. Yet no one seems particularly interested in discussing that here today, because it isn't compatible with the outrage that people are prioritizing on HN when discussing Copilot.
How applicable is the GitHub license when a lot of code on GitHub (e.g., the Linux kernel) was posted there by people other than the individual copyright holders? I'd assume they can only rely on the open source license of the code in question, and not really on additional license terms. As far as I can tell, GitHub claims fair use rather than citing their license.
That's perhaps the most important question of this entire debate, and it's the one that no one is considering seriously here in the comments. I personally think that it's because no one at HN is both competent enough at copyright and licensing law to debate it and willing to spend time debating it with Internet commenters for a $0/hour wage.
If an oversight is all the excuse needed to dismiss considering anything I’ve said that you find unpalatable, then I can save you trouble and instruct you to dismiss everything I ever say, now and in the future, as I am merely human and will continue to be imperfect forever. I don’t generally intend to post this disclaimer on every comment I make, as this is a standard human condition defect in all of us, but I hope this one-time exception allows you to reject my opinions and move on to other discussions with a clear conscience.
There is no logical fallacy since HN refuses to even have a logical discussion about Copilot.