Copyleft has no impact on project activity?!


Recently, I collected some data from Sourceforge, hoping to find evidence for the importance of copyleft. But I found something surprising: although there's plenty of evidence that many developers believe in the power of copyleft, the one measure I could derive of how much copyleft actually works showed that copyleft made no difference whatsoever! If true, this means a lot of free software's social theory is wrong and many things will have to be re-thought.

One of the fundamental ideas in free software circles is that copyleft—the requirement that improvements to free software remain free to the community—is essential (or at least very important) for ensuring that the public good of free software production isn't co-opted and privatized ("hijacking" I like to call it, in response to those who dub copyright infringement "piracy").

However, while this sounds good in theory, it's a challenge to prove it quantitatively.

Fortunately, there is Sourceforge. All Sourceforge projects are tagged with metadata for things like licensing, maturity level, category of application, category of target users, and so on. Sourceforge also collects some objective statistics, via traffic analysis of the projects' web services. By doing careful searches using the Trove Software Map and filters, it's possible to tease out some useful statistics.

Since this is a magazine column and not a scientific paper, I'll start with a summary of the results and my conclusions. At the end, I'll summarize my methods for those who are interested.

Copyleft is certainly popular

The easiest thing to test is "How popular are copyleft licenses?" After all, if many developers use them, that is some suggestion that they are a good thing. It means, basically, that many developers (or at least the people who start free software projects) like copyleft licenses.

Figure: Copyleft is very popular with Sourceforge developers: the lowest fraction occurred for "Mature" projects (projects which are still maintained, but are no longer being actively developed or improved), while the highest occurred for software targeted for "End Users/Desktop"

Approximately 80% of all the projects on Sourceforge are under "Free/Copyleft" licenses. You can cut this down in many different ways according to maturity and expected audience. Perhaps surprisingly, there isn't a lot of variation: the lowest copyleft-rates were for "projects aimed at developers" (about 75.9% are copylefted) and "mature projects" (74.3%) while the highest were "projects aimed at desktop end-users" (85.5%) and "maturity level '4-Beta'" (81.1%). "Projects with downloadable files" were about 81.0% copylefted. So "about 80%" is a very good description of the whole picture.

Figure: Nearly two-thirds of the projects at Sourceforge use a GPL license and most of the others are "GPL compatible"

Of the total, almost two-thirds (64.7%) were licensed under the one most popular copyleft license, the GNU General Public License, and another 11.1% were under its companion the GNU Lesser General Public License (often used for libraries, because it allows for linking with proprietary programs). All told, about 92% of the licenses were "GPL Compatible", which means that, while they weren't necessarily GPL themselves, it would be legal to include them into a GPL project. That number includes quite a few non-copyleft licenses (such as the most popular non-copyleft licenses: BSD (7.1%), Apache (3.7%), and MIT/X11 (2.4%)).

So it's fair to say that many, many developers trust copyleft when they start a project.

But what does it actually buy you?

If you ask most developers, they'll say they want a copyleft, because they want to avoid their work being co-opted or exploited (or even "hijacked"). That implies an assumption that a copyleft license will encourage more people to feel more comfortable about contributing their work to the project.

It has been suggested, for example, that the greater size and activity of the GNU/Linux community, versus the much smaller BSD Unix community, is explained by this.

I wanted a way to test this.

So, I started out with a basic hypothesis: "Copyleft encourages contributions (because it eliminates fear of co-option of one's work)".

That led me to a working hypothesis that: "Copyleft projects will be more active than non-copyleft projects", and finally to the much more specific hypothesis "Copyleft projects on Sourceforge will have higher median activity scores than Non-Copyleft projects".

And that hypothesis is testable with the data I have available. Sourceforge tracks traffic data for projects, and assigns them an "activity" score based on it. This score is normalized to a "percentile", based on the average activity level of all Sourceforge projects. I haven't made an extensive study of the formula that goes into this activity level calculation, but I have included it in the references (the formula is published). It doesn't appear very surprising: things such as home page visits, CVS commits, and so on are collected.

Using the tools on the site, I was able to calculate the median and quartile "activity levels" for my four license categories: "Free/Non-Copyleft", "Free/Copyleft", "Public Domain", and "Other". The results took me a little by surprise:

Figure: Surprisingly, the choice of copyleft or non-copyleft licenses seems to have no appreciable effect on project activity levels at Sourceforge (there is a small but significant difference when compared to "Public Domain" and "Other" licenses)

The same data is tabulated below:

ACTIVITY PERCENTILES   Lower Quartile   Median   Upper Quartile
Copyleft               47.2%            65.6%    83.1%
Non-Copyleft           45.7%            66.0%    84.3%
Public Domain          46.3%            57.6%    77.7%
Other                  55.1%            73.3%    90.0%

For all intents and purposes, the "Copyleft" and "Non-Copyleft" projects are the same! Not only are the medians the same, but the quartiles (which measure the spread of the activity curve) are also the same.
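To put rough numbers on "the same", here is a minimal Python sketch that uses only the figures tabulated above. The helper names (`summarize`, `median_gap`) are made up for illustration, and the interquartile range is just one convenient way to express spread; nothing here tests statistical significance.

```python
# Activity percentiles copied from the table above:
# (lower quartile, median, upper quartile)
activity = {
    "Copyleft":      (47.2, 65.6, 83.1),
    "Non-Copyleft":  (45.7, 66.0, 84.3),
    "Public Domain": (46.3, 57.6, 77.7),
    "Other":         (55.1, 73.3, 90.0),
}

def summarize(name):
    """Return the median and interquartile range for one license category."""
    q1, median, q3 = activity[name]
    return {"median": median, "iqr": round(q3 - q1, 1)}

def median_gap(a, b):
    """Absolute difference between the medians of two categories."""
    return round(abs(activity[a][1] - activity[b][1]), 1)

for name in activity:
    print(name, summarize(name))

print("Copyleft vs Non-Copyleft:", median_gap("Copyleft", "Non-Copyleft"))
print("Copyleft vs Public Domain:", median_gap("Copyleft", "Public Domain"))
print("Copyleft vs Other:", median_gap("Copyleft", "Other"))
```

The copyleft and non-copyleft medians sit 0.4 points apart, while copyleft differs from "Public Domain" by 8.0 points and from "Other" by 7.7 points, which is the contrast described in the text.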

There's a greater difference between either and the "Public Domain" and "Other" categories. I'm not really able to account for that either. One possibility for the "Other" category is that many of the projects so labeled consist of projects with multiple licenses (some of these are games, with Creative Commons licenses applying to the "content" while another free license, often the GPL, applies to the "engine"). This might suggest that more thought went into their categorization, which would select for more "serious" projects. Or it might mean that multimedia projects are easier to contribute to.

So far, I haven't found anything wrong with my technique, so I am inclined to accept that these values are genuine. So what does that mean?

Copyleft has no practical advantage?

Well, there are basically three possibilities:

1. Copyleft doesn't work

If the data truly supports the idea that non-copyleft projects have just as much activity as copyleft projects do, then it suggests that copyleft isn't really giving us any material benefit. Despite the myth, people don't care about co-option or exploitation of their contributions, and will contribute anyway. That being the case, it suggests that we should re-think whether copyleft is a good idea (after all, it does carry some burdens).

It would also mean that we have to find an alternate explanation for the relative success of copylefted projects like GNU/Linux versus their non-copylefted counterparts, like BSD Unix.

Maybe it's all just the effect of marketing? Maybe GNU/Linux is simply more popular because people talk about it more? Perhaps the Free Software Movement, with its insistence on copylefting software is basically just a cult religion, and people are drawn to it by charisma and faith, regardless of the actual evidence?

2. Copyleft does work, but somehow the effect doesn't show up in Sourceforge activity

The second possibility is that the activity figures are indeed correct, but (somehow) this doesn't reflect the actual impact of copyleft. Perhaps copyleft only kicks in for extremely popular products, with very high market value and lots of developers? Or perhaps copyleft affects contributory behaviors that Sourceforge's activity index doesn't measure? The question here is over the chain of hypotheses that I listed at the beginning of this article: does each really imply the next, or is there some disconnect along the way?

Most projects hosted on Sourceforge are small with only one or two developers. Though this is certainly my impression from experience, it is much harder to prove, since Sourceforge's search engine doesn't let you filter based on number of developers.

However, they do include the data on the main project pages, and it's possible to do a Google search of the site. Google's index of the site is very incomplete, however, only showing about 200 projects out of over 120,000 (i.e. less than 0.2%). That might be enough data if the sample were random, though it's not clear that it is (if, for example, Google cataloged popular pages, then it would probably be skewed towards larger projects). However, it should give some rough idea of the distribution, as shown in the histogram below.

Figure: A rough histogram of Sourceforge project sizes, based on Google searches of the site. This shows what you'd expect: most Sourceforge projects are small, with some 30% being single-developer projects. However, there's a pretty good showing for larger projects, too.

Now, this does show that small projects dominate, but not to the extreme that you'd expect everything else to be lost in the noise. Single developer projects make up only about 30% according to this chart. Still, it might be enough to muddy the impact of copyleft on participation, if only the largest projects were appreciably affected.

3. Something is wrong with the experimental technique

Lastly, of course, there is always the lurking gremlin of bad experimental design. I could've done something really dumb, so that the numbers I measured don't actually mean what I think they mean.

I do believe that the formula for calculating the activity score is normalized against the entire body of Sourceforge projects, and is therefore effectively constant for the purposes of this study. It would be prohibitively computationally expensive for Sourceforge to do it any other way.

The activity scoring system is somewhat documented. See:

for more information on how the activity indexes are calculated. These articles lead me to believe that the numbers are absolute enough to be compared between different selection sets, as I have done here.

What do you think?

I'm putting this out here for you to think about: either it means something rather unexpected about copyleft (or should we have expected it?), or it means something weird and difficult to explain is going on. Please leave a comment if you want to suggest a new hypothesis, offer an explanation for the results, or point out some way in which my methods are flawed. I'm genuinely interested in refining the results. This was a quick ad hoc study, and I don't pretend that I've dotted every 'i' and crossed every 't'; there's certainly a possibility of error on my part.

Observational Methods

As promised, here's how I did the research...

Sorting out the licenses

Sourceforge's Trove system breaks licenses down by type into "OSI Approved License", "Public Domain", and "Other/Proprietary". The OSI category, unfortunately, contains both copyleft and non-copyleft licenses. Therefore, the only way to make the distinction was to look up each of the listed licenses (all 72 of the OSI approved licenses!), and decide which category it belonged to. Then, each appropriate license can be "excluded" or "required", using the Trove system. I got these results by excluding all the licenses that didn't match my criteria (the numbers for "Copyleft", for example, are the results when all of the "Non-Copyleft" licenses, the "Public Domain", and the "Other/Proprietary" licenses have been excluded).
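The exclusion step can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the Trove interface itself: the `CATEGORY` mapping is a hypothetical, abbreviated stand-in for the hand classification of all 72 licenses, and the project list is made up.

```python
# Hypothetical, abbreviated category assignments. The study classified all
# 72 OSI-approved licenses by hand; only a few well-known ones appear here.
CATEGORY = {
    "GPL": "Copyleft",
    "LGPL": "Copyleft",
    "BSD": "Non-Copyleft",
    "Apache": "Non-Copyleft",
    "MIT/X11": "Non-Copyleft",
    "Public Domain": "Public Domain",
    "Other/Proprietary": "Other",
}

def licenses_to_exclude(wanted):
    """Every license whose category differs from the one being measured."""
    return sorted(lic for lic, cat in CATEGORY.items() if cat != wanted)

def matches(project_licenses, wanted):
    """A project survives the filter if it carries none of the excluded licenses."""
    excluded = set(licenses_to_exclude(wanted))
    return not (set(project_licenses) & excluded)

# Made-up projects, for illustration only.
projects = {
    "alpha": ["GPL"],
    "beta": ["BSD"],
    "gamma": ["Public Domain"],
    "delta": ["LGPL", "GPL"],
}

copyleft = [name for name, lics in projects.items() if matches(lics, "Copyleft")]
print(licenses_to_exclude("Copyleft"))
print(copyleft)
```

Under this pure-exclusion reading, a project that mixes categories drops out of every category whose filter excludes one of its licenses, so how multi-license projects are tagged affects where they land.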

Technically, of course, there is not a one-to-one correspondence between projects and licenses. Sourceforge allows having multiple licenses on a single project. However, only a very small fraction of projects are affected. The biggest exception to this is the figures for "Other/Proprietary": many of those projects are actually dual-licensed projects, where one license (such as the GPL) applies to "code" while a different sort of license (such as Creative Commons By-SA) applies to the "content".

Activity Levels

The main result here is the difference (or lack of difference) in median activity levels between copyleft and non-copyleft licenses. The search engine doesn't directly deliver this information. However, using the advanced search, it is possible to restrict searches to ranges of activity levels, using a search term like this: percentile:[000.0 TO 050.0] to limit to activities between 0% and 50%. The resulting search will tell how many projects fall into this range.

If you record the number of projects returned without the activity range limitation, then divide by two, you get the number of projects that a search limited to activities below the median will return. This is true by definition, because the median is the "50th percentile": the median activity level is higher than that of half the projects and lower than that of the other half. The quartiles are defined similarly, but at the 25th and 75th percentiles.

So, all that's necessary is to do a binary search: split the range, see if you're low or high, then split the appropriate remaining range, and keep going until you reach the level of precision you want. I determined my figures to 0.1% in this way—more than enough precision for this kind of study.
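For readers who want to reproduce the idea, the binary search can be sketched in Python as follows. The `count_below` callback is a hypothetical stand-in for running one search restricted to an activity range and reading off the number of hits; here it is backed by synthetic scores, not by real Sourceforge data.

```python
from typing import Callable

def find_percentile_cutoff(
    count_below: Callable[[float], int],
    total: int,
    fraction: float = 0.5,
    precision: float = 0.1,
) -> float:
    """Binary-search the activity level below which `fraction` of projects fall.

    count_below(x) is assumed to return how many projects have an activity
    score between 0 and x, like one range-restricted search would.
    """
    target = total * fraction
    low, high = 0.0, 100.0
    while high - low > precision:
        mid = (low + high) / 2
        if count_below(mid) < target:
            low = mid    # too few projects below mid: the cutoff is higher
        else:
            high = mid   # enough projects below mid: the cutoff is at or below mid
    return round((low + high) / 2, 1)

# Toy stand-in for the search engine: evenly spread synthetic scores.
scores = [i / 10 for i in range(1000)]  # 0.0, 0.1, ..., 99.9

def count_below(x: float) -> int:
    return sum(1 for s in scores if s <= x)

median = find_percentile_cutoff(count_below, total=len(scores))
lower_quartile = find_percentile_cutoff(count_below, total=len(scores), fraction=0.25)
upper_quartile = find_percentile_cutoff(count_below, total=len(scores), fraction=0.75)
print(lower_quartile, median, upper_quartile)
```

Each halving of the 0 to 100 range costs one search, so reaching 0.1% precision takes about ten queries per figure, which is what makes the approach practical by hand.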

Remaining Questions

One thing I would have liked to do is compare projects with large numbers of developers, excluding the smaller projects. There's no filter or search method provided for that, and the free text search fields that Sourceforge provides only compare against the project name and description fields.

Certainly Sourceforge does have this information in their database and it's not a secret: every project has the number of developers as public metadata for the project. But there's no easy way to search for it without crawling the site (which is considered very bad form, because it would tax the servers, which are already pretty busy). I may be able to get access to bulk information from which this kind of result can be derived (I've already asked, but it's too soon to expect a response).

Giving up on copyleft?

So, is copyleft a bad idea? Well, once the cat's out of the bag, you can't put it back in: if you release without a copyleft, you can't easily add one to your work later. Also, copyleft can be viewed as a kind of insurance against a possibly unlikely, but unpleasant occurrence. Maybe that's all it is. Then again, maybe that's enough.

Also, one line of evidence does not tear down an entire theory. There is a fair amount of indirect evidence that copyleft is useful, and the FSF does handle a few cases of GPL infringement (mostly settled out of court). So copyleft is definitely being used. The question here is whether it has a practical benefit of encouraging people to contribute to development projects.

For my own projects, I'm still favoring copyleft. But copyleft does have a cost: conflicts between different copyleft licenses, certain kinds of activities you can't (or can't easily) do with it (sometimes being incorporated into a proprietary project is good for your project). So, this information is certainly forcing me to re-think some of my assumptions.

Comments

ncjones's picture
Submitted by ncjones on

It is no surprise that the license used in sourceforge projects does not appear to have any effect on the levels of activity. Simply by being on sourceforge all these projects already have everything they need to build an active development community. Using a copyleft license doesn't affect the development process used in community-driven projects so we shouldn't expect them to be more active.

Since the goal of the GPL is not to make sourceforge projects more active, judging its effectiveness by measuring the activity of projects using this license seems rather pointless. Its intention is to give freedom to the users of software and make it easier to create derivative works or branches, so perhaps a better measure for the effectiveness of copyleft would be to count the number of successful derivative projects.

Terry Hancock's picture

You don't need a copyleft to be free to make derivatives, just a free license. All of the projects being considered here are "free" in that sense.

The distinction is between projects which legally enforce derivatives to be free-licensed (copyleft) and those that don't (non-copyleft).

Copyleft doesn't make any one person "more free". It can be argued that it makes the project "stay free", but that's a different matter: each individual user has certain freedoms curtailed (i.e. the right to make proprietary derivatives), when compared to non-copyleft free projects.

It is a good suggestion that SF provides so much support for developers that it is possible that contribution is so easy as to make a legally-enforced copyleft unnecessary.

But if so, then why continue to use copyleft licenses?

They make users less "free" in the sense you mean (they restrict what you can do with derivatives), so they do carry a burden, even if it is slight. So, they need to be doing something helpful to make their use "economically rational".

Of course people could use them irrationally -- out of spite, for example, intending to harm would-be proprietary users. But if there are no positive results, then such negative effects are not rational or ethical, are they?

Paul Gaskin's picture

I would not want my favorite GPL-licensed free software projects to be confused with proprietary derivatives.

The source code of free software projects are protected by some parts of the copyleft license, and the identity and integrity of the project are protected by the stipulation which prevents proprietary derivatives.

It's not resentment, although I'll confess to feeling some resentment against various proprietary software companies.

I resent it when recruiters ask for my resume as a Word .doc or Adobe .pdf. I resent when employers request Photoshop or Dreamweaver as if you're not a graphic or web designer without using specific proprietary software.

I resent all the various proprietary media codecs which make content inaccessible.

John Edwards's picture

Interesting. So how are we now to explain the popularity of Linux relative to BSD? The BSD projects are much more centralized, and they were not originally intended for desktops, but I am just guessing.

As you said, copyleft is still useful anyway. It keeps code from becoming non-free, which is what it was intended to do. It would be interesting to compare things like number of forks, integration with other projects, code "hijacking" for copyleft and non-copyleft projects.

Of course, we already know copyleft is not essential for free software projects- the BSD tradition has survived for over 30 years.

Mauro Bieg's picture
Submitted by Mauro Bieg on

it is possible that contribution [to SF] is so easy as to make a legally-enforced copyleft unnecessary.
But if so, then why continue to use copyleft licenses?

The point of using copyleft is not attracting more users to participate in a project, but (as you said) rather to keep all the derived code free.

Let's be honest, in most cases copyleft absolutely doesn't matter; about 95% of all code under a copyleft license never is in danger of being assimilated and made proprietary, simply because no one has an interest in using that code in a proprietary product. But in the rare case that this danger becomes real, and someone wants to take, use and sell your code without sharing his improvements back with you, in that rare case copyleft does matter and becomes very important indeed. Then copyleft is the only way to prevent your code from being assimilated and if your project hasn't been under a copyleft license from the very beginning, there is nothing you can do about it now.

Because in most cases copyleft doesn't matter, people don't think about it as a criterion when joining or contributing to a SF project, no surprise. (Or if you have a patch for a program, have you ever considered not submitting it because the program is under a non-copyleft-license?) Still, in rare situations copyleft can become crucial and we thus should keep using it as much as we can.

Terry Hancock's picture

Yes, that's one way to look at it: it's like insurance. Odds are you never need it, but if you do, it's important.

The only thing that concerns me about this analysis, is "What is it really protecting?"

It obviously is attacking proprietarization (if that's a word) of a free software project.

But, if many developers will continue to contribute to the free project, then why does it matter if there is a proprietary competitor? (Or rather does it really matter?)

I have to admit that instinctively, I agree with you that this is important, but I have trouble rationalizing it (and it's easy for me to see how this could be something that people -- including me -- will react irrationally towards).

Mauro Bieg's picture
Submitted by Mauro Bieg on

why attacking proprietarization?

  • strategic goal: well, if your goal is to spread free software and lower the market share of proprietary software you certainly don't want free software to become proprietary. here the whole free software rationale comes in, about preserving users freedoms. u dont want somebody to take the firefox code, make it proprietary and implement a small but cool killer feature so everybody will use this version and most users are deprived of their freedom (the new firefox might start playing dirty; like IE did against netscape, not adhering standards etc).
  • personal motivation: as a free software developer, if you decide to not give away your software for free but share it within a community, you can limit the community to only include people that are willing to share back their improvements. you'll only share within the free software community, not with just everyone (like proprietary software developers)
  • and closely related to the point above, you can look at it as a moral question: it is unjust if you decide to share your work for free and someone else takes it, doesn't share his improvements and makes a profit from that on top of it.

As I said, in almost all software projects these aspect never come to play. and because it's not a very probable danger, people don't think about it when contributing to a project. but in the rare case that proprietarization does happen, you just wish you'd have had copyleft.

Ryan Cartwright's picture

Having viewed the "SourceForge Stats Demystified" posting I noted that the activity percentile for a project is "a combination of three different classes of activity: traffic, development, and communication."

Only one of those really represents developer activity, the other two are mostly about usage. The development activity is measured by the "number of CVS or SVN commits, the number of days since the last file release, and the number of days since the most recent project admin login". So with a project that is more stable or has a single developer this would bring the development activity percentile down.

To give you an example of how activity != development activity, I have two active projects on SF and I am the sole developer on each. I have never made a CVS/SVN commit to either and don't log in as admin that often. The last release for the one (RyansIOB) was in January 2008. For the other (RyansEDS) it was in April 2007. Both have development status of 5-Production/Stable and get some but not huge amounts of web traffic or downloads. With that in mind you would expect the activity percentiles to be quite low for both but RyansIOB has an activity of 96% and RyansEDS, 93%. Both have copyleft licences. It seems that development activity does make up a large part of what constitutes project activity.

On the other hand a third SF project of mine which is in 2-Pre-Alpha stage and has had no activity of any kind other than to test out an SVN commit which was then wiped, has an activity percentile of 83%!

On that note it may be fair to say that SF activity percentiles are only really useful to find trends within a single project?

cheers
Ryan

Terry Hancock's picture

Well, your examples show how off-target the activity numbers can be, I suppose. It would be better to measure pure CVS activity (and possibly it would be better to do those in some absolute way, such as "net SLOC altered per month" or something).

I'm still pretty surprised that copyleft makes no difference to these numbers, though. I would've expected it to have some impact, even with this kind of consideration.

It could be, of course, that "activity" is so dominated by page views or some other measure that has little to do with development activity that no development effects show up. But that seems a little unlikely, given that the numbers are supposed to measure activity rather than popularity.

igli's picture
Submitted by igli on

> Well, your examples show how off-target the activity numbers can be, I suppose.

Which kinda invalidates the whole thing, statistically speaking. Honestly, this article is really flimsy, and the "conclusions" both erroneous and sensationalist. I'm sorry your gut instinct that there would be some difference was not fulfilled. Any programmer will tell you (wrt profiling): gut instincts are useless, numbers matter. You've simply gone for the wrong ones.

pachelbel's picture
Submitted by pachelbel on

Terry, good post. The stance about the licenses is a never ending one. Your statistics show in effect that many developers like to publish work, more than insist on keeping the fruits in the free world. One criterion I have not seen in your analysis, though, is the complexity of the applications - a single-man app is probably simple.

The reason I heard for Linux being more popular than *BSD is the consequence of the legal action in 1993. As a result, a whole bunch of the developers reacted by moving to a project that would keep clear from further actions - and they tend to be more "missionaries" than those who joined the free software at a later stage. What I see is that Linux, and GPL software in general, is more visible and therefore ends up more in the statistics. BSD software is sometimes reused discretely in commercial products (where it is extremely difficult to measure its impact) or in niche applications (like the public DNS backbones, for instance) and these extensions tend to disappear from the figures. The BSD's tend to work more on technologies than products and use the 3 major distributions as a nest for development and proof of concept and the systems are usually high profile, high quality environments. To be honest, I like Linux but often feel "tight on the edges" when I compare it to FreeBSD - but that's a personal impression.

Now, as a developer, if I had to base a product on other applications, my choice for the license would depend on the target: for a free application, I would not care much but for a commercial one, I would prefer BSD, MIT, Apache or the likes. If you consider the big players in the free software, like Apache or PostgreSQL, they often do too - or apply a dual license like MySQL.

Terry Hancock's picture

Thanks!

One criterion I have not seen in your analysis, though, is the complexity of the applications - a single-man app is probably simple.

Yep, that was my thought. The best I could do without privileged access (or crawling the site) was to look at the Google searches, to get some idea of how developer count is distributed. Unfortunately, that number can't really be trusted because it's such a tiny sample and not necessarily unbiased.

Also, even if it were definitive, it doesn't really answer the question, it just shows the potential for masking of effects on large projects by the overwhelming majority of small ones (the interesting thing, actually, is that it isn't as peaky as I'd have expected -- but that could be caused by bias in the Google crawler).

I've already placed a support request with SF to see if they'd be willing to give me something I can use to do that kind of analysis. I think it could be interesting.

mattflaschen's picture

You're missing the point so badly it's like you weren't even aiming. I don't use copyleft licenses for any of your reasons, and I don't think most others do either.

I use copyleft for one reason:

I want people to use my software freely, and I don't want them to take that freedom away from future users if they make and distribute modifications.

You can not prove that philosophy invalid through any amount of misused statistical data.

The reason you suggest for copyleft is totally inapplicable to me:

"Copyleft encourages contributions (because it eliminates fear of co-option of one's work)".

It would be nice if people contributed code back to my project, but I don't care if they don't. What I do care about is that any derived code they release is free.

Acting like copyleft is some unproven theory ("Also, one line of evidence does not tear down an entire theory. There are a number of indirect evidences that copyleft is useful") is absurd. Do you think it's a coincidence or an accident that there are dozens of proprietary forks of BSD but not a single (legal) proprietary fork of the Linux kernel? No...the Linux devs decided they didn't want proprietary forks, they used the GPL, and surprise, there are none. It certainly seems like it worked out for them and thousands of other developers.

"The question here is whether it has a practical benefit of encouraging people to contribute to development projects."

No, that is not and has never been the question. The question is whether all future users of my software will have the 4 freedoms.

admin's picture
Submitted by admin on

Hi,

I need to jump in because this is important:

"You're missing the point so badly it's like you weren't even aiming."

What kind of state of mind do you expect Terry to have when he replies to you?

Please.

Merc.

Terry Hancock's picture

Thanks for your comment. However, we agree on the goal of wanting users to continue to have the "four freedoms" in their software.

"I want people to use my software freely, and I don't want them to take that freedom away from future users if they make and distribute modifications."

The choice of whether to use a copyleft or not has no bearing on whether users are free to use your software.

Proprietary derivatives cannot possibly "take that freedom away" from future users (by definition, an extra alternative cannot reduce freedom). It can only give them a non-free alternative (under the assumption that this alternative is worthless, this has no impact on user freedom, because they are still just as free to use the free version). Only free derivatives can give them freedoms (and if there are no derivatives, there are no freedoms to be had in them).

Thus the only mechanism by which allowing proprietary derivatives can "take that freedom away" from users is if the existence of the proprietary alternative negatively affects the production of the free alternative (for example, if the same contributions would've been made under a free license had the license required it, but weren't because it didn't).

This may or may not be true. But it is a factual question, subject to empirical investigation.

If non-copyleft projects were threatened by the proprietary competitors that spring up from them, then you would expect to be able to measure the impact of that threat if you look at a large number of copyleft and non-copyleft projects and compare their success rates.
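As a rough illustration of that kind of comparison, here is a minimal sketch of a two-proportion z-test on the share of "active" projects in each license group. The function name `compare_activity_rates` and all of the counts are made up for illustration; they are not the Sourceforge figures, and this is not necessarily the method used in the article.

```python
import math


def compare_activity_rates(active_a, total_a, active_b, total_b):
    """Two-proportion z-test; returns (rate_a, rate_b, z, two_sided_p)."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each group needs at least one project")
    rate_a = active_a / total_a
    rate_b = active_b / total_b
    # Pooled rate under the null hypothesis that licensing makes no difference.
    pooled = (active_a + active_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("no variation: every project has the same outcome")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts (active projects, total projects), for illustration only.
copyleft = (1300, 8000)
non_copyleft = (300, 2000)
rate_c, rate_n, z, p = compare_activity_rates(*copyleft, *non_copyleft)
print(f"copyleft {rate_c:.3f}  non-copyleft {rate_n:.3f}  z={z:.2f}  p={p:.3f}")
```

A large p-value here would only mean that no difference was detected at this sample size, not that none exists.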

In fact, based on the degree of faith that the free software community puts into copyleft, I expected to easily measure this effect. In all honesty, I was very surprised to not find an effect, even in this rather crude bulk study.

Yet, thus far, I can't. That suggests that (possibly) no such threat really exists (or at least, it is a much smaller threat than I assumed).

Thus, this research (while not conclusive) challenges the assumption that a copyleft is needed to protect the four freedoms for users of your work or future free derivatives of it.

Personally, I still suspect that there is an impact from the licensing choice, but that it is simply a more subtle or narrow effect than would show up in this kind of study. However, even that is fairly interesting.

mattflaschen's picture

"Proprietary derivatives cannot possibly “take that freedom away” from future users (by definition, an extra alternative cannot reduce freedom). It can only give them a non-free alternative (under the assumption that this alternative is worthless, this has no impact on user freedom, because they are still just as free to use the free version)."

I know they're still free to use my original free version. I never said otherwise. The issue is that I want users to be able to modify and redistribute any and all derivatives in addition. Copyleft is the only way I can accomplish that.

"if the existence of the proprietary alternative negatively affects the production of the free alternative (for example, if the same contributions would’ve been made under a free license had the license required it, but weren’t because it didn’t)."

I've already said that I don't care whether people work on my free version. I'm more than happy if someone decides, hey, I don't like the way the original developer works and I'd rather fork. This will show up as the SF activity on my project dropping to near zero. However, if they distribute a modified version, it will still be free.

Terry Hancock's picture

Well, you may not care whether using a copyleft actually protects (or provides) freedom to users, and you may be perfectly content merely to remove proprietary options. No doubt, there are many who agree with you.

However, I do care about freedom, and I don't think I'm alone.

I am therefore very interested in whether it succeeds at achieving that goal: does eliminating proprietary derivatives contribute to the provision of free options to the user by promoting free derivatives?

That requires answering empirical questions. And that is what I've begun to explore here.

skOre's picture
Submitted by skOre on

The general misconception in this article is this:

  1. I want to look at numbers to find something out about a social phenomenon - so I look at a popular site that has numbers
  2. I set up a hypothesis which is my own opinion
  3. I take the numbers I find and try to apply them to my hypothesis

This is a logical fallacy in many ways. I will try to give you a good list of why it is wrong:

Sourceforge == Open Source <= Free Software
Sourceforge is by no means representative, as it is OSI-related, which is more akin to the apolitical Torvalds type of programmer (your stats seem to be OSI-centric as well). Take GNU Savannah if you want to find out something about Free Software.

Numbers about Software
The numbers you look at (activity, communication) say something about people and only people. The reason why you have ambiguous results is because people are people no matter what license they program under.

Numbers are complete?
I know a good many projects that use SF only for free SVN hosting. That there is less communication might be because the project has its own forum or chatroom. Activity might be somewhere else entirely as well: not all software is made equal, and it surely isn't developed equally.

Copyleft == Contributions?
Copyleft is about freedom and freedom only. It never was about making cooperation possible in some special way. It IS about factoring out one possibility - that software is made non-free, but that does not have to imply that it encourages contributions.

Copyleft == Protecting Freedom
You are mistaken in your understanding of what the Free Software movement means by "free". Of course it also means that people can share development of software, but we very much stress the importance that this freedom has to be protected. We say that we want to be able to take a certain level of freedom for granted and THEN we can work on software together.

Maybe some examples on where you went wrong would be handy, so let me go through your train of thought:

"...the one measure I could derive of how much copyleft actually works showed that copyleft made no difference whatsoever! If true, this means a lot of free software’s social theory is wrong and many things will have to be re-thought."

You fail to state what this social theory is (it would have helped to show how much you know about it) and why it has to be re-thought.

"However, while this sounds good in theory, it’s a challenge to prove it quantitatively. Fortunately, there is Sourceforge."

Because the only way you can prove something is by quantity? Oh, and it certainly does NOT sound good in theory. Weasel words?

"The easiest thing to test is “How popular are copyleft licenses?”"

Oh yes, because that is a number we have somehow completely failed to check.

"If you ask most developers, they’ll say they want a copyleft, because they want to avoid their work being co-opted or exploited (or even “hijacked”). That implies an assumption that a copyleft license will encourage more people to feel more comfortable about contributing their work to the project."

No, it does not imply that. You are projecting an assumption. The only thing it implies is that the GPL discourages any EEE strategy.

"That led me to a working hypothesis that: “Copyleft projects will be more active than non-copyleft projects”"

Which is where you went wrong. How does the potential to cooperate imply that the gold standard to measure success is the overall activity? But I know why you chose that hypothesis:

"And that hypothesis is testable with the data I have available."

Well, how terribly convenient.

Then your numbers show that all activity is very similar despite license difference and that leads you to the conclusion:

"Copyleft has no practical advantage?"

Very cleverly put into a question. Well - you assumed that Copyleft was created to have a practical advantage in a certain field and your data is inconclusive. That, to you, means that your hypothesis is proven.

Conclusion:
You found out that people have somewhat different levels of activity.

The "Open Source" world is certainly very similar throughout, and thus you get much the same results between groups within it. Still, all that has nothing to do with copyleft, and especially not with Free Software as a social concept. You might do better to check what you are testing for next time.

I'm really surprised that this article showed up on a site centered around Free Software. It has a terrible taste of FUD and confusion.

Terry Hancock's picture

"Copyleft == Protecting Freedom [...] we very much stress the importance that this freedom has to be protected [...]"

Exactly.

That is the assumption that is being tested here: IS copyleft necessary to protect the freedom of free licensed software?

So long as the free software projects are unaffected, the existence or non-existence of proprietary forks is a red herring. For those who only use free software, they are irrelevant.

Thus copyleft only promotes freedom for users if it promotes the production of free software to serve them. If it solely eliminates proprietary software (but does nothing to create free replacements for it), then it is a net loss for society (how much of a loss is debatable -- some free software advocates would say very little, but the point is that it is not a gain).

It is only a gain if copyleft succeeds in protecting free software development. Or to put it another way, proprietary software can only have a negative impact on user freedoms if it somehow takes away free options, not if it merely provides non-free options. By definition, additional options can never reduce your freedom.

Your arguments are actually based on a miscalculation of "opportunity cost":

It is exactly the same fallacy as occurs when proprietary media companies attempt to compute "lost sales" from "illegally downloaded" media by multiplying the downloads by the per-copy sticker price. They are of course assuming that every person who downloads for free would, if that ability were taken away, buy a copy for whatever sale price the company offers. Which is a ludicrous assumption, of course.

Similarly however, copyleft advocates often make the mistake of assuming that each proprietary derivative of a non-copyleft work represents a lost free derivative if only the work were copylefted. But of course, in some cases, the derivative in question would never have been created if it weren't allowed to be proprietary (hence nothing was actually lost by allowing the proprietary derivative).
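To make the arithmetic of that fallacy concrete, here is a minimal sketch. The function names and every number below are hypothetical, chosen only to show how the naive estimate and the counterfactual estimate diverge; nothing here is measured data.

```python
def naive_loss(count, value_each=1.0):
    """Assume every proprietary derivative (or download) is one lost unit."""
    return count * value_each


def counterfactual_loss(count, fraction_converted, value_each=1.0):
    """Count only the units that would still exist under the stricter rule."""
    if not 0.0 <= fraction_converted <= 1.0:
        raise ValueError("fraction_converted must be between 0 and 1")
    return count * fraction_converted * value_each


# Hypothetical: 40 proprietary derivatives of a non-copyleft work, of which
# only a quarter would have been written at all under a copyleft license.
proprietary_derivatives = 40
would_have_been_free = 0.25

print("naive estimate of lost free derivatives:",
      naive_loss(proprietary_derivatives))
print("counterfactual estimate:",
      counterfactual_loss(proprietary_derivatives, would_have_been_free))
```

The whole disagreement is about the value of `fraction_converted`, which is an empirical quantity rather than something either side can assume.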

Meanwhile, free derivatives may continue to be created, with or without copyleft. They, of course, are unaffected by the licensing.

Now of course, just because it's unproven doesn't mean it's untrue: it could be that proprietary forks do have a negative impact on the creation of free software and therefore on the "user freedoms" that result from it. But again, that's a factual question. And it's the one I was interested in answering.

As for choosing Sourceforge, it is simply one of the largest collections of free software development projects (where would you have gone to study this?).

The ideological considerations are essentially irrelevant to this particular point (the effect in question is objective, so it really doesn't matter what the participants' politics are). However, since 80% of the projects are under copyleft licenses that would seem to refute your claim that Sourceforge is biased against them somehow: clearly most SF developers agree with your insistence on the need for copyleft.

skOre's picture
Submitted by skOre on

Your aim was to prove a relationship between productivity and the license used.

Now that I have shown you that it is impossible to "prove" something like that in the first place, especially with the means you employed, you try to move the argument to whether copyleft actually protects development.

This is not (and never has been) in question. There are numerous examples where the GPL helped protect Free Software projects from proprietary derivatives.

The major difference between the GPL and non-copyleft licenses is that the GPL side says "hey, if you think it's good to share, why don't you share as well?" whereas the MIT/BSD/etc. folks say "everything is fine with me".

Reality has shown that a lot of proprietary vendors are just waiting for a weak spot to use for their own gain (TiVo, anyone?). After all, it's a logical consequence: most of the big players in software are solely profit-oriented, and you can never make more profit than by stealing something and selling it as your own.

And that's what we don't want, and that is why we protect ourselves from this possibility.

There is nothing to argue about on this point. It is used, it works. You seem to be more attracted to a wishy-washy "let's all share and be happy". But that doesn't work with humans.

I'm not sure why you bring up opportunity costs - as nobody has talked about this - so I guess you're trying to make a point instead of addressing something I've said.

So let me answer that with a straight reply as well: We don't care about opportunity cost. You address a dead point.

We don't care about people making software proprietary because we have excluded ourselves from this possibility. And we especially don't care about people who wouldn't write software if they cannot make it proprietary.

What I live in is an ecosystem in which participants have agreed on a common ground for sharing. The only difference from the non-copyleft folks, and this might be my opinion, is that we are a little more aware of the rest of the world and don't want to allow other people to be fools. It is a very complex and expensive understanding of fairness, but I'm convinced that its prevalence shows how successful it is.

(as for Sourceforge - I did mention Savannah - excluding that from your calculation means that you left out large chunks of GNU)

(you also still seem to be confused about the difference between Free Software and Open Source. Please read about it.)

pachelbel's picture
Submitted by pachelbel on

Terry, I've been watching the popularity of your article and the fluctuations clearly show you hit the point right in the center :-)

In the comments I see advocates of both the GPL and "most of the others", meaning MIT/BSD/Apache-like licenses. I don't think there is anything wrong with either of the two approaches. BSD has survived for 30 years despite a big hurricane in the 90's, and the GPL is being challenged in court now to protect its followers. This all sounds fine to me: a developer can perfectly well decide what he/she wants to do with his/her own code, and even make it proprietary. Geeks may traditionally live on junk food, but they still have to pay for it :-)

To answer John Edwards briefly, the popularity of Linux is partly due to the legal fight that BSD had to face. There was another project picking up at that time, and the main developers jumped on it en masse, thinking "legal fights never again", and the GPL was the right approach to avoid them. So the Linux crowd should give a warm thank-you to the people they usually hate, i.e. a proprietary vendor (AT&T in this case) :-)

Finally, to follow up on the "project complexity" point I raised, we would need a metric like "function points" to estimate the real extent of a project (but then, do you take into account libraries, code artificially inflated with useless functions...) and its degree of redundancy (I could start 10 text editor projects on Sourceforge, but I'm not sure it would be a major contribution to the OpenSource community). In the end, whatever metric you use to give a ranking, you will always find people fighting your conclusions... What a pity.

Author information

Terry Hancock's picture

Biography

Terry Hancock is co-owner and technical officer of Anansi Spaceworks. Currently he is working on a free-culture animated series project about space development, called Lunatics, as well as helping out with the Morevna Project.