User:Moonriddengirl/Milhist

Copyright violation and why it's a problem, by Moonriddengirl

I am not a member of the Military History WikiProject, though I've written a few articles that fall under your umbrella, but I am an admirer. I don't remember precisely when I first encountered your group, but I know the general context: it was a copyright problem. That's the general context in which I encounter many Wikipedians. For several years now, I've devoted most of my administrative time to addressing text-based copyright concerns on Wikipedia. Needless to say, I am often the harbinger of doom for an article, and I am not always warmly welcomed. My first experience with your group impressed me deeply: I found you collegial, focused, organized and ready to help.

The way we handle copyright problems has changed since then. That was before the Copyright Cleanup WikiProject, before the "contributor copyright investigation" noticeboard, before I personally had any idea how widespread a problem copyright might be. But my experience of your group has not altered. Accordingly, when I was asked if I might write something for your newsletter about copyright work on Wikipedia, I was more than happy to comply.

I've focused on text rather than images, because text is what I know.^[1] From the very basic (a definition of copyright violation for Wikipedia's purposes) to the more complex (instructions on how to help with a CCI), I've tried to give an overview of copyright cleanup on Wikipedia. But there is far more to be said on the subject than I have said here (Scout's Honor); if I have not covered something here you'd like to discuss, I am always open to questions and conversation. Come by my talk page, any time.

What "copyright violation" is...and isn't

The copyright laws described here are those of the United States, as these laws govern the U.S.-based Wikipedia. Copyright laws do vary internationally. This is a brief overview of a complicated situation, and it is by no means legal advice.

"Copyright violation" is not the same thing as "plagiarism"

What we're talking about here is not plagiarism, although plagiarism may be involved. Plagiarism is the unattributed or inadequately attributed use of the work (including uncommon information or ideas) of others. Wikipedia has a guideline on it, and it's generally easily repaired with credit. Copyright infringement, by contrast, can occur even with full attribution.^[2] Copyright law does not protect information or ideas (not even if compiling them represents significant labor), but rather all elements of creative expression, which may include language, structure, and organization.

Any substantial unauthorized use of copyrighted content that exceeds "fair use" may be found to infringe. Copyright infringement can occur whether content is taken verbatim or simply follows too closely on an unfree source.^[3] There is no such thing as a safe word count;^[4]^[5] there is no such thing as a safe percentage to change or retain.^[6] Although the U.S. courts have made efforts to come up with fair and reasonable tests for copyright infringement, generally it comes down to "I know it when I see it".^[7]

"Copyright violation" is not even the same thing as "copyright violation"

No, really, it's not.

Our article on copyright infringement says, "Copyright infringement (or copyright violation) is the unauthorized or prohibited use of works covered by copyright law, in a way that violates one of the copyright owner's exclusive rights, such as the right to reproduce or perform the copyrighted work, or to make derivative works."^[8] Legally speaking, a copyright violation exists when a court of law confirms that it exists. The courts are the ones who determine if the use violates exclusive rights.

On Wikipedia, "copyright violations" are, literally, violations of Wikipedia:Copyrights. This policy, which incorporates by reference our non-free content policy, has been conservatively devised to remain well within U.S. copyright law.^[9] Copyright laws are complex, but we've done our best to make our copyright policy as simple as possible. When it comes to text, we need to remember the following:

If it's obviously free (for instance if it is obviously public domain by age or origin), we can copy it if we attribute it (per Wikipedia:Plagiarism). A word of caution: a lack of copyright notice does not make it obviously free.
If it's free, but not obviously, we have to prove it, by linking to a disclaimer or otherwise verifying permission. (And, yes, we have to attribute it. More than citing it, that means we have to acknowledge that it is copied. See Category:Attribution templates.)
If we can't prove that it's free, even if we think it might be, we can only copy a little bit of it, clearly marked as a quotation and attributed, and only for good reasons such as those set out at Wikipedia:NFC#Text. Aside from brief quotations, all other information we take from sources we can't prove to be free needs to be completely rewritten in our own words, which includes altering language, structure and organization as much as necessary to create a new work. (Tips on doing that can be found in this essay and in this signpost dispatch, beginning under "Avoiding plagiarism".)^[10]

Any text imported to Wikipedia that does not accord with these three points is a copyright violation (even if not a copyright violation) and needs to be dealt with accordingly.

Why we should care

Opinions vary widely on copyright. Some people see the misuse of copyrighted content as akin to theft, and to them there's a moral dimension to stopping it. Others see copyright laws (at least as currently written) as draconian attempts to control what should be free information,^[11] and to some of these there's a Robin Hood dimension to liberating material. A lot of people just don't see it as that big a deal. Working copyright on Wikipedia over the last couple of years, I've more than once been told that the educational nature of our website and its non-profit status should make any use we make of copyrighted material "fair use", or, at least, a small matter, easier to apologize for than to ask permission.

As far as I'm concerned, it doesn't really matter whether we support or don't support the concept of copyright; we all (presumably) support Wikipedia, and copyright issues can seriously impact our project and its goals. Focusing only on that impact, misused copyrighted content can lead to (a) legal complications for the project, its contributors and its re-users, (b) developmental set-backs to articles when the removal of older copyright problems disrupts later edits, and (c) reluctance on the part of others (even other Wikimedia projects) to reuse our content.^[12]

Notes

^ There are tips for image clean-up at Wikipedia:WikiProject Copyright Cleanup/How to clean copyright infringements.
^ "Copyright: Fair Use". www.copyright.gov. U.S. Copyright Office. May 2009. Retrieved 31 August 2010. Acknowledging the source of the copyrighted material does not substitute for obtaining permission.
^ Osterberg, Eric C. (2003). Substantial similarity in copyright law. Practising Law Institute. p. §1:1, 1-2. ISBN 1402403410. With respect to the copying of individual elements, a defendant need not copy the entirety of the plaintiff's copyrighted work to infringe, and he need not copy verbatim.
^ Templeton, Brad (1994 (October 2008 rev.)). "10 Big Myths about copyright explained". Retrieved 31 August 2010.
^ Rich, Lloyd L. (1996). "Fair Use: Interpretations and Guidelines - The Fair Use Doctrine Part II". The Publishing Law Center. Retrieved 31 August 2010.
^ Taylor, Terry (2004). Altered Art: Techniques for Creating Altered Books, Boxes, Cards & More. Lark Books. p. 20. ISBN 9781579905507.
^ The courts may appoint either experts or lay persons to evaluate the level of taking. Bruce P. Keller; Jeffrey P. Cunard (2001). Copyright law: a practitioner's guide. Practising Law Institute. p. §11–31. ISBN 9781402400506.
^ As of 1 September 2010; bolding removed.
^ Why conservatively? Well, first, our content is reused in some nations that have stricter laws about fair use than the U.S. does; setting them up to violate copyright does them no favors. Second, as I've just said, copyright is based on the perception of whatever court-appointed viewer evaluates the material. We don't want to push the boundaries here, because it would be bad news to reach that point and be judged wrong.
^ Yes, I know the latter is about plagiarism, but in this case the advice can help avoid copyright infringement, too.
^ cf. Kinsella, N. Stephan (Spring 2001). "Against Intellectual Property" (PDF). Journal of Libertarian Studies. 15 (2): 1–53.
^ Reuse is something we encourage. The Wikimedia Foundation's mission is "to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally."

What we can do about copyright violation, by Moonriddengirl

To minimize damage to the project, we should do everything we can to prevent copyright problems in the first place, but when they do creep in we need to get them out of here as quickly as possible. Doing so not only protects copyright holders and reusers, but is also a great courtesy to other Wikipedia contributors. My heart sinks when I'm cleaning copyright problems and encounter an article that shows considerable time and effort from other contributors. When they are building on the base of a copyright infringement, their work must frequently be eliminated, too, as a derivative work of the original. Not only are those immediate man hours lost, but I can't but worry that the contributor might be lost as well. How many times does this have to happen before our post-copyright contributor begins to wonder if it's worth it?

How to recognize it

I have personally found copyright violations in almost every level of article, from stub to good article. Copyright violations have made it to the main page, in DYKs. They've been placed by IPs and administrators. Accordingly, there is no profile on where they're going to appear or who is going to have placed them.

For your project, the most likely red flag to look for will be the addition of a lengthy stretch of professional-level text, especially if it lacks sources. This isn't exhaustive: some copyright infringers lift the sources directly from their copyrighted content, while some add copyrighted content incrementally, especially if copying from multiple sources. Too, sometimes we do get extensive contributions directly from professional-level writers.^[1] But this is one indicator that further investigation may be necessary. Likewise, content that has a tone that is all wrong for Wikipedia might have been copied from someplace where the tone was right: an editorial, perhaps, or a less formal resource.

When content raises a red flag, a quick run through a few well-selected search engines can often help confirm if issues exist.^[2] Especially if the content has been present for a while, be careful of our reusers. A list of some of these can be found at Wikipedia:Mirrors and forks. If you aren't sure if the content was published here first, you can do a quick gut-check by (a) looking at the contributor's talk page and (b) looking at the article's history. Has the contributor had copyright warnings before? Another red flag. Was the content placed largely in one edit? Another red flag. A couple of those are worth at least further investigation.^[3]
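As a rough illustration only, the manual search for "apt phrases" described in the notes (runs of four or five words that don't seem likely to be common) could be sketched in Python along these lines. The function name, the tiny list of common words and the thresholds are all assumptions made for this sketch; none of it is an existing Wikipedia tool.

```python
import re

# Illustrative only: a tiny hand-picked list of very common words (an assumption
# of this sketch, not a real stopword list).
COMMON_WORDS = {
    "a", "an", "and", "as", "at", "by", "for", "had", "in", "is", "no",
    "not", "of", "on", "or", "that", "the", "to", "was", "were", "with",
}


def candidate_phrases(text, run_length=5, max_common=2):
    """Return quoted search strings for word runs that look distinctive.

    A run is kept when at most `max_common` of its words are in COMMON_WORDS.
    Runs never cross sentence-ending punctuation.
    """
    phrases = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        for i in range(len(words) - run_length + 1):
            run = words[i:i + run_length]
            common = sum(1 for word in run if word.lower() in COMMON_WORDS)
            if common <= max_common:
                # Quotation marks ask the search engine for a precise match.
                phrases.append('"' + " ".join(run) + '"')
    return phrases


for query in candidate_phrases("completed the isolation of the major Japanese base"):
    print(query)
```

A word-list filter like this is a crude proxy. It cannot tell that a phrase such as "no signs of enemy activity" turns up thousands of times elsewhere, so the hit counts from the search engine remain the real test.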

How to handle it

So, you've found copied or very closely paraphrased content and you've identified what may be the source—or at least a source that seems to predate us. It doesn't matter if you've found the first publisher. All that really matters is that we know that we didn't publish it first. It was copied from somewhere.
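The "who published first" question boils down to a date comparison, which might be sketched as below. The `(date, text)` revision pairs are a hypothetical stand-in for whatever you read off the article's history, and the function names are invented for the example. A publication date on an external site can also be wrong or missing, so treat the result as a hint, not as proof.

```python
from datetime import date


def first_appearance(revisions, phrase):
    """Return the earliest revision date whose text contains `phrase`.

    `revisions` is an iterable of (date, text) pairs in any order.
    Returns None if the phrase never appears.
    """
    dates = [rev_date for rev_date, text in revisions if phrase in text]
    return min(dates) if dates else None


def source_predates_us(revisions, phrase, source_published):
    """True if the source was published before the phrase first appeared here.

    Returns None when the phrase is not in the history at all.
    """
    appeared = first_appearance(revisions, phrase)
    if appeared is None:
        return None
    return source_published < appeared


# Hypothetical history: the phrase enters the article in May 2008.
history = [
    (date(2007, 3, 1), "An early stub."),
    (date(2008, 5, 1), "An early stub. The Japanese had not anticipated an assault."),
]
print(source_predates_us(history, "had not anticipated an assault", date(2006, 1, 1)))
```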

How to handle it, honestly, depends on how much time you have. It would be great if you devoted the full time to investigate, but if you can't, it's better to do the minimum than nothing. At the very least, consider noting your concerns at Wikipedia talk:Copyright problems so that others can investigate. (But, please, try not to do the very least as a matter of habit, as that talk page is not heavily watched itself.) If you have time, you can more thoroughly evaluate by following these steps:

1. Check to see if there is an OTRS permission tag or a {{backwardscopy}} tag referencing the suspected source at the top of the article's talk page. If there is, either permission has already been verified or somebody has already investigated and found that the content was here first. (We hope with good reason. If you have reason to doubt, feel free to investigate further yourself! You can ask an OTRS volunteer to check the ticket, or ask the person who placed the backwards copy template to explain their actions.) If they don't reference the specific source, there may still be problems, and you should at least seek feedback. If these tags are present and seem in order, you're done.
2. If not, check to see how long the content has been here and if there is any sign that the contributor who placed it here had permission. If it's been there forever, and there's no sign of permission, it will probably need to be speedily deleted. At this writing, articles with unsalvageably corrupt history where there is "no credible assertion of permission, public domain, fair use, or a free license, where there is no non-infringing content on the page worth saving" should be tagged {{db-g12|url=name your source}}. The tag generates a notice for you to give to the contributor. Please do. (See next section for why.)
3. If the content hasn't been there forever or the article is not unsalvageably corrupt (say somebody pasted something into one or two sections, but others are all ours), you can revert to the last clean version or remove/rewrite the copyright problem. (What if you aren't sure where it was added or how to extricate it? Skip this step; we're coming to this situation soon.) Please tag the article's talk page {{subst:cclean|url=name your source}}. This can help prevent the content being inadvertently restored. (If it's advertently restored, see step 5.) What you say to the contributor here depends on whether they indicated permission. If they did not, the standard notice is {{subst:uw-copyright|Article}}. If they did, the standard notice is.... Well, we don't have one. I usually copy the {{cclean}} notice I've just placed at the article's talk and tweak it a bit. It has all the information they need to verify.
4. If it's been there forever and permission is indicated (or some credible assertion of public domain, fair use or free license), replace the article with {{subst:copyvio|url=name your source}}. The tag you generate will tell you what to do next, providing the notice to place at Wikipedia:Copyright problems and on the contributor's talk page.
5. Suppose the copyright problem is not foundational but the copyrighted content is terribly intertwined with the article...or you are afraid based on the article's history that there may be other sources involved...or somebody is edit-warring with your efforts to remove the copyvio. In these cases, too, you should replace the article with {{subst:copyvio|url=name your source}}. Follow the directions the tag generates, and an administrator or copyright volunteer will take over in due course.
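The steps above amount to a small decision procedure. Here is a minimal sketch of it in Python, assuming you have already answered the yes/no questions by hand. The `Finding` class, its field names and the `recommend` function are hypothetical, introduced only for illustration. The templates returned are the ones named in the steps, with the source URL filled in.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Answers gathered by hand; every field name here is hypothetical."""
    source_url: str
    tag_covers_source: bool = False     # OTRS or {{backwardscopy}} tag names this source and seems in order
    foundational: bool = False          # content has "been there forever"; nothing clean worth saving
    permission_indicated: bool = False  # credible assertion of permission, public domain, fair use or free license
    hard_to_extricate: bool = False     # intertwined text, other sources feared, or edit-warring


def recommend(finding):
    """Return the template the steps point to, or "done" if nothing is needed."""
    url = finding.source_url
    copyvio = "{{subst:copyvio|url=" + url + "}}"
    if finding.tag_covers_source:
        return "done"
    if finding.hard_to_extricate:
        # Blank the article and let an administrator or copyright volunteer take over.
        return copyvio
    if finding.foundational:
        if finding.permission_indicated:
            return copyvio
        # No sign of permission: speedy deletion, and notify the contributor.
        return "{{db-g12|url=" + url + "}}"
    # Revert or rewrite first, then place this on the article's talk page.
    return "{{subst:cclean|url=" + url + "}}"


print(recommend(Finding("http://example.com/source", foundational=True)))
```

The sketch checks for the "hard to extricate" cases before anything else because the third step says to skip ahead when you aren't sure where the content was added or how to remove it.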

Why we notify contributors

Before I started working basically full time at copyright problems, I had already noted the interesting (to me) fact that the copyvio speedy deletion criterion is the only one under which taggers are required to notify the page's creator. With all other criteria, notice is an optional courtesy.

There are good reasons for this. First, obviously, creators may learn that they can't paste content, and that's a win-win. The best case outcome here is a contributor who goes on to a happy, productive Wikipedia life without ever violating our copyright policy again. But even more importantly, a consistent practice of notification may itself protect the project against prosecution. Without bogging down into law, the Online Copyright Infringement Liability Limitation Act doesn't just require that we remove copyright problems when we receive notice of them. Among other requirements, we must also inform people about our policy and their risk of account termination. As the full-on legalese goes, a service provider who wants protection must have "adopted and reasonably implemented, and inform[ed] subscribers and account holders of the service provider’s system or network of, a policy that provides for the termination in appropriate circumstances of subscribers and account holders of the service provider’s system or network who are repeat infringers".^[4]

The notice clearly serves to inform; it also helps to implement. We don't keep a central record of infringers. These notices serve as red flags. When I'm cleaning copyright, I will usually look at a contributor's talk page history to see if he or she has received multiple notices in the past. If the user has and has persisted (sometimes the infringement I'm cleaning will predate their notices), it's time to consider terminating the account, at least temporarily. And it may be time to launch a contributor copyright investigation.
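The check described here (multiple notices, and infringement that continued after them) might be sketched as follows. The function name and its inputs are hypothetical, and the choice to compare against the earliest notice is an assumption of the sketch; in practice this is a judgment call made by reading the talk page history.

```python
from datetime import date


def persisted_after_notice(notice_dates, infringement_dates, min_notices=2):
    """True if the user has several notices and infringed after the first one.

    Infringements that predate every notice do not count as persistence.
    """
    if len(notice_dates) < min_notices:
        return False
    first_notice = min(notice_dates)
    return any(added > first_notice for added in infringement_dates)


# Hypothetical example: two notices in 2009, one infringing edit in 2010.
notices = [date(2009, 2, 1), date(2009, 6, 15)]
print(persisted_after_notice(notices, [date(2008, 11, 3), date(2010, 1, 20)]))
```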

When to seek admin assistance

Admin assistance is necessary when a contributor has multiple warnings but has persisted in violating copyright. The Administrators' noticeboard/Incidents is one place to go, but you can also simply approach an administrator whom you know routinely works copyright.

Admin assistance is also required if contributors are obstructing copyright cleanup of their own or others' work. If your cleanup is reverted or if the copyright problem is replaced with an unusably close paraphrase, I recommend replacing the content with the {{subst:copyvio}} first, as the template instructs that it is only to be removed by an administrator or OTRS agent.^[5] It may prevent the publication of a copyright problem while the matter is straightened out. If the person obstructing you is working in good faith, a friendly note clarifying the problem is also in order at this point. Sometimes bystander dismay leads to knee-jerk reversions of copyright cleanup, as people fear content is being removed without good reason. Sometimes people are attempting to replace a copyright problem with usable content, but do not understand the extent to which material must be revised. (A pointer to Wikipedia:Close paraphrasing can help.) If the obstruction continues, it's time to head to WP:ANI or to the talk page of an admin who works copyright. We can't publish non-free content for which we don't have permission. Sometimes contributors need to be blocked to prevent the behavior. Sometimes, pages need to be protected.

Replacing copyright violations

Beyond locating copyright concerns, it will often be a great help if you can assist in rewriting them. When a copyright problem is foundational, we may lose the entire article. I have myself rewritten hundreds of these, but I'm sorry to say it's a paltry percentage of what I've had to delete. The Copyright Problems board has daily listings of a handful to even dozens of articles that have been blanked for evaluation; the template blanking them includes a link to a temporary page in which clean content can be proposed. You are very welcome to fill it, even if you are the person who originally blanked the text.

One thing you do need to remember is that rewriting of copyright problems must be done from scratch. Works based upon other creative works are "derivative works".^[6] This includes translations, annotations, abridgments, condensations, elaborations or modifications. If the original content is copyrighted, only the copyright holder has the right to prepare derivative works.^[7] For this reason, incrementally modifying a copyright violation on Wikipedia is not likely to be helpful. If it isn't written from scratch, the rewrite may not be usable itself.

Again, if we can't prove that content is free, we can only use brief, clearly marked quotations for good reasons; all other information should be written in our own language, structure and organization.^[8]

A word about CCIs

Beyond helping to address copyright problems when you trip over them (and even seeking them out), probably the biggest assistance members of your project can offer to copyright cleanup on Wikipedia is helping with CCIs.

A CCI is an in-depth evaluation of one contributor's edits; they are launched (usually with great reluctance) only after we have verified copyright infringement in multiple articles or images. Usually, we will provide notice to a project if a CCI is opened that heavily impacts articles under their purview, but not always, particularly because contributors often work in multiple areas. However, all active CCIs are listed at the top of Wikipedia:Contributor copyright investigations, along with one or two areas in which they work. A spot check can help clarify if any of them have worked on articles of interest to you.

Any contributor who does not have a history of copyright problems is welcome to help out in cleaning up CCIs. (Downright celebrated, even.) We have literally dozens of these, with thousands of articles awaiting review. The longer copyrighted content remains in an article, the more damage it may do to reusers, copyright holders, and to Wikipedia's contributors, who waste their time polishing something we can't retain.

Each CCI has instructions at the top of its individual listings page. In general, these are the same: content can be removed, if copying is found, or presumptively removed, if copying seems likely. (Remember, in these cases, we know the contributor of the content has violated copyright and are simply trying to figure out where. Given our knowledge, we err on the side of protecting the project, by exercising a reasonable duty of care.) There are special templates that can be placed on article talk pages to help avoid inadvertent return and also to help reduce bystander dismay.

Notes

^ Even if it is their content, it may still be a copyright problem; in one contributor copyright investigation that affected your project, an author copied content from his own books without permission from his publishers. Cleaning up after him was pretty painful, as it cost us a lot of very well-written content, even if it was largely unsourced.
^ I frequently use Google and Google books, but sometimes also Google news and Google scholar. The mechanical detectors I use are generally not sensitive and only detect larger amounts of copying, but User:The Earwig has made a nice tool that can start a search here. I search manually for "apt phrases" or runs of four or five words that don't seem likely to be common. Looking at one of your featured articles, for example, Admiralty Islands campaign, I would not search for the phrase "no signs of enemy activity." With quotation marks (which I do not always use, in case a few words have been minimally changed), I get 6,000 precise matches; without them, I have almost 5.5 million! I'd be more likely to look for "the Japanese had not anticipated an assault" or "completed the isolation of the major Japanese base". For some examples of "apt phrasing", see Department of Political Science. "III. How Not to Plagiarize B. Types and Examples of Plagiarism Type 4: The "Apt Phrase"". Resources on Avoiding Plagiarism. Concordia University. Retrieved 3 September 2010.
^ Speaking of those phrases from Admiralty Islands campaign, for the first I get 7 results, at this moment, all to us and our mirrors; the second brought me to this pdf: not a reuser I'd encountered before. The wonderful WikiBlame tool, available under the "history" tab of every article as "Revision history search", helped me quickly determine that they are indeed a reuser: see where the bulk of the content entered here. Some of the content was already present, having been placed almost a full year before. Such signs of "natural evolution" in any article are a strong indicator that we're being reused.
^ 17 U.S.C. § 512(i)(1)(A)
^ Sometimes these templates are removed by other contributors acting in good faith to clean copyright problems; as long as the copyright is properly cleaned and it causes no other issues, I'm not bothered by this. To me, it's a valid "ignore all rules" situation.
^ 17 U.S.C. § 101
^ 17 U.S.C. § 106
^ See Wikipedia:Close paraphrasing and Wikipedia:Wikipedia Signpost/2009-04-13/Dispatches for some tips.

