The Research Works Act (RWA): why Scientific Publishing needs FOSS Methods

I'm sure I don't need to explain SOPA or ACTA to regular readers of Free Software Magazine. They're toxic. End of. But RWA? It stands for Research Works Act. It's not the big sexy beast of the other two but it is, in its way, just as insidious and as harmful to the freedom of scientific publishing as SOPA and ACTA are to internet freedom and it's all interconnected. Here's why.

Traditionally, scientists do research and publish the results in journals. In my day, that meant dead-tree publications. In the age of the internet, the journals have gone digital but, paper or electronic, access to them was and is problematic. One massive hurdle, of course, is cost; the other is the ability to access and use them freely and openly (with attribution). The central issue is similar to the choice between proprietary software from Microsoft and Apple and software based on GNU/Linux: one is closed but the other is open and free under the GPL. Of course, Microsoft and Apple are private companies but those scientists working in universities and research institutes funded by government, that is to say, the taxpayer, are effectively public employees.

In its wisdom, therefore, the US government decreed that the results of all scientific research paid for by the American taxpayer (except for matters reserved under national security) must be made publicly available. That is why citizens of that country have the Public Library of Science (PLOS). I can do no better to sum up their mission than to quote verbatim from the site:

We are a non-profit publisher and advocacy organization. Our mission is to accelerate progress in science and medicine by leading a transformation in research communication. Everything that we publish is open-access - freely available online for anyone to use. Sharing research encourages progress, from protecting the biodiversity of our planet to finding more effective treatments for diseases such as cancer.

Further, they apply the Creative Commons Attribution License (CC-BY) to all published research and, recognizing that science is international, make all content available to everyone, everywhere, including the ability to have research from all nationalities hosted on the site. This happens because the Federal government decreed it should, through the National Institutes of Health (NIH). (However, this deals only with medical and biological sciences.) This contrasts well with the situation in the UK, my stamping ground.

The UK is another country. We do things differently

In the UK, by contrast, there is a tendency to double charge for information: once for research paid for by the British taxpayer and a second time for the self-same taxpayers to access that research and apply it in new directions. An example of this is the Ordnance Survey, which charges for information already funded by the taxpayer. That prompted the launch of OpenStreetMap. However, it's not all bad news. CERN's model was and is scrupulously open, both in terms of the software it uses to search for the Higgs boson and public access to research. DIY biology has taken a few leaves out of the book of FOSS too. So far, so British. They do, or rather, did things differently in America.

Meanwhile, in the USA

The USA has always been known as a place of freedom and innovation (which is why it, and not the UK, has produced global brands like Apple, Microsoft, Facebook and Google) but big business, with its corporate lobbying power, loathes such openness and is constantly coming up with new threats to internet freedom and the information it disseminates. Thanks to fellow FSM writer Terry Hancock, I replicate below a list of legislative threats from his review of an excellent (free) book by the US Pirate Party:

To that depressing list we must now add the Research Works Act (RWA). Or H.R. 3699, as it is known. That sounds innocuous enough but dig beyond the anodyne title and some nasty stuff begins to emerge. The bill has two sponsors, Congress members Carolyn Maloney and Darrell Issa (both of whom received funds from Elsevier). The basic thrust and intent of the bill is to prevent publicly-funded research being made available for free. In short, to restrict access behind a hideously expensive paywall called bundled subscription.

Those paywalls are operated by all the big names in academic publishing: Springer, Wiley, Pearson and above all, Elsevier. For many in the Open Access community Elsevier has become a Microsoft-like bogeyman. With good reason. It extracts eye-watering amounts of money for subscriptions and access to papers. That's why it (and the others) returned profits of $1.9 billion on revenues of $5.3 billion based on 2,637 journals. Their profit margin is 42%, a margin that would bring tears of joy to many British and American high-street retailers trying to get by on margins in low single figures.
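
As a quick sanity check on the arithmetic, here is a minimal sketch that takes the profit and revenue figures quoted above at face value (the function name is my own, purely for illustration). On those combined totals the margin comes out at roughly 36%, so the 42% figure presumably rests on a different set of accounts; that is an assumption on my part, as the breakdown is not given here.

```python
def profit_margin(profit: float, revenue: float) -> float:
    """Return profit as a percentage of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * profit / revenue


# Figures quoted in the text, in billions of US dollars.
combined_margin = profit_margin(1.9, 5.3)
print(f"Margin on the quoted totals: {combined_margin:.1f}%")
```

Either way, it is a long way from the low single figures of the high street.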

Elsevier, SOPA and PIPA. A duplicitous unanimity

That's bad enough but Elsevier and other academic publishers really stuck the knife in when they decided to back SOPA and PIPA. For many it was the final straw. Scientists started organizing boycotts. Tim Gowers, a British mathematician (and Fields medalist), is a key name here. His boycott of Elsevier may become an iconic text. If all of that hasn't annoyed you enough, you can add publishing fake journals to protect a pharmaceutical company and possibly endangering lives in the process.

These profits would make Goldman Sachs green with envy and yet these academic publishers have relatively low overheads against which to earn such profits. The much-vaunted peer review process is provided free of charge by academics and they submit their papers free too. The only value added by Elsevier and others is secretarial and administrative. Talk about easy money or, as we say, appropriately, in this neck of the woods, money from America. The American government and taxpayer are clearly getting bad value for money. Let's be blunt about this: these academic publishers are parasitic and, as Doctor Perry Cox would say, here's the inside scoop. Academic publishers need academics but academics don't need them.

Since the advent of the internet, they never did. The solution has always been in their own hands. Don't boycott. Get keystroking. In the words of Tim Gowers: "the basic question remains: why do we allow ourselves to be messed about to this extraordinary extent, when one would have thought that nothing would be easier than to do without them?" So, if the publishers are the problem, what's the solution? Open Access (OA), a concept that borrowed heavily from the FOSS way of doing things. The internet has changed everything and that, plus the open source way of doing things, will, in Gowers's words, be the downfall of Elsevier.

However, before I outline the various Open Access methods for science publishing I need to deal with one disingenuous claim made by academic publishers like Elsevier. Jobs and profits. Long before the Research Works Act surfaced, these publishers were making representations to Congress that the OA model would cost jobs, but it was pointed out that the NIH OA mandate had been in existence since 2008 without any noticeable effect. Besides, it is not the business of science to worry about Elsevier's profits, any more than users of GNU/Linux should bite their fingernails to the quick worrying that using free and open software (and/or open hardware) will deprive Microsoft and Apple of even bigger profits or force them to employ fewer people. Oh, the humanity of it. Do you think Elsevier executives spend sleepless nights worrying about the way they fleece academic and research institutes of large sums of money for a restrictive and poor-value product which, as I will demonstrate, can be duplicated and vastly improved at hugely lower cost and with a degree of control which may improve the rate of innovation?

Open Access (OA)

So, what exactly is Open Access (OA)? Simply put, it is not a method of doing science so much as a way to publish and disseminate scientific research. Historically, it has evolved through three definitions, known collectively as the BBB definition: the Budapest, Bethesda and Berlin definitions. The common definition can be broadly summed up as copyright holders consenting to permit users to "copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship". If this irresistibly suggests variations on the Creative Commons ethos, you wouldn't be too far wrong. This effectively removes the permission barriers and, depending on how things are set up, the price barriers too. (For example, PLOS, for all its merits, isn't exactly cheap. It can charge up to $2,900 per paper in publication fees.)

With these barriers removed or substantially reduced one other major barrier is reduced or removed altogether. Time. It is not uncommon for scientists to have to wait for anything up to a year before a submitted article sees the light of day in a peer-reviewed journal. The rise of OA can reduce that delay hugely and, with the right publishing model, abolish it altogether. Better still, the saving to be made can be ploughed back into the revenue and capital costs of more research rather than subsidising an inefficient, outdated way of doing things.

Just as there are many open source licenses in software (GPL, MPL, LGPL, BSD, etc.) and many packaging systems (apt-get, yum, etc.) spread across innumerable distros too, there are many variants in OA. The main division is between Gold OA and Green OA. In the former, researchers submit papers to OA journals and in the latter they publish in traditional non-OA journals but also self-archive their peer-reviewed draft papers in their own OA institutional repositories. EPrints is a good example of these kinds of repositories. If you follow that link you will land on a page which will warm the cockles of every user of GNU/Linux: software downloads for Debian/Ubuntu and Red Hat/Fedora as well as the inevitable Windows. There's even a live CD if you want to take it out for a spin. And as it's based on Ubuntu it can be ported over to run on a USB stick via software like UNetbootin or Startup Disk Creator, which comes bundled with Ubuntu. You can self-host (just like WordPress) or use EPrints as a service hosted by EPrints itself. Many British universities use it and it has been adopted by Novartis, the Swiss pharmaceuticals company. The Open Access Directory (OAD) has a list of all the available free and open source repository software.

Of course, one of the obvious potential problems with OA is the danger of "vanity" publishing. Elsevier will tell you, conveniently, that they are the ultimate guarantors of peer reviewed quality science. Of course this can be a threat to quality science but the problem is known and has been identified. You can find a list of these vanity OA publications at Posterous. So, scientists know what to avoid and the sites listed above indicate how to contribute to genuine OA journals with rigorous standards and proper peer review.

A better peer review model

Outside of the scientific community peer review was not widely known. The scandals at the IPCC and the Climatic Research Unit of the University of East Anglia brought it into the mainstream. With the advent of the enabling power of the internet and OA it has assumed a critical importance. Scientists are aware of the dangers of adopting OA models and letting peer review standards decline, so they make special efforts to duplicate the broad methodology of the traditional, closed journals like those published by Elsevier. A subset of peer review, though, is the issue of impact factor and citation. These are critical for the reputations of scientists, their publications and their careers too. OA addresses these too. For example, take a look at Scholastica, which describes itself as "academic publishing done the right way". It provides a one-stop shop which ticks all the boxes: fast publication, peer review, integration with arXiv (via their open-sourced arXiv Ruby gem) and many other features promoting the efficient and more democratic publication of research. Scholastica has committed to never charging users more than $5. Compare that to Elsevier, and even PLOS. The savings are immense.

I mentioned arXiv. It has been around for some time and predates the emergence of the latest tranche of OA self-publishing tools. Essentially, it can be described as an electronic archive for preprints of scientific papers in a wide variety of fields. ArXiv is considered the concept that precipitated the rise of OA. One of its most famous "scalps" was the publication of an outline of a proof of Thurston's geometrization conjecture by Grigori Perelman. It only ever appeared as an e-print. It never appeared in any "reputable" peer-reviewed journal and Perelman himself said, "If anybody is interested in my way of solving the problem, it's all there, let them go and read about it." On issues of copyright, arXiv gives authors a number of options, from public domain to Creative Commons licenses. The Science Code Manifesto goes further and insists on open data, source code and citation. It is based on the Panton Principles, which state boldly what true OA should be:

By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain

That's the kind of stuff that keeps Elsevier executives up at night. No profit here folks, just free and open data and research results. All data is to be released under the Public Domain Dedication and Licence (PDDL) or the Creative Commons Zero Waiver (CCZero). The latter is incredibly permissive, though, and scientists might want to think twice about using it as commercial undertakings could exploit it (although it should be emphasized that the license relates to databases and that CC0 is compatible with the GPL). Nick Barnes, the author of the Science Code Manifesto, absolutely gets the free, open model for doing science, especially coding, and wrote an op-ed piece in Nature on coding which could have been written by Richard Stallman. Barnes admits that it is not original but an extension of the core principle of science: publication. Combine open source code with OA publishing on the internet in personal archives and/or institutional repositories and you have the perfect storm to sink the most robust commercial scientific journal.

Altmetrics. Peer review for a digital age

If you want to see what that looks like in action take a look at Figshare. It ticks all those boxes in the last paragraph. In addition to the usual measures like impact factor and citation, Figshare (and others) are deploying altmetrics which, among other things, compensate for the bottleneck of the peer review process and make impact factor and citation more transparent. Altmetrics effectively crowdsource peer review. That's very important because these metrics have a very real bearing on reputation, scientific careers and grant applications, and this links back to what is wrong with commercial academic publishing.

JIF (journal impact factor) data is sold by Thomson Scientific to journals that wish to change the factors involved in JIF for commercial advantage, since scientists will wish to publish in journals with high metrics. More disturbingly, the method by which Thomson actually determines these factors is a trade secret. Altmetrics may be a more effective (and honest?) and dynamic way of doing peer review and there is already a JavaScript bookmarklet for that which works with Chrome, Firefox and Safari (IE and Opera to follow).
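
To see why the secrecy matters, it helps to look at the headline calculation. The two-year impact factor is conventionally described as citations received this year to items published in the previous two years, divided by the number of "citable items" published in those two years. The sketch below uses entirely made-up numbers and a function name of my own; it only illustrates how much the result moves when the count of citable items is changed.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year impact factor: citations this year to items from the
    previous two years, divided by citable items from those two years."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical journal: 600 citations to 200 citable items.
print(impact_factor(600, 200))

# Same citations, but 50 items reclassified as "not citable"
# (editorials, letters and so on): the factor jumps by a third.
print(impact_factor(600, 150))
```

The citations do not change; only the denominator does, and that is precisely the part outsiders cannot check.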

In a sense though, all of these commendable alternatives to the outmoded, expensive and restrictive models of science publishing exemplified by publishers like Elsevier are good as far as they go but they are largely static clones of the analogue era of doing things. It cannot be said often enough that the internet has changed everything. It's as big, if not bigger, than the industrial revolution in impact. So, it's time that scientists made the transition from static imitations to more dynamic models. In short, science needs to do it the GitHub way.

More does not necessarily mean worse

Every software developer using GNU/Linux probably needs no introduction to GitHub. It is built on Git, the version control system which superseded CVS and Subversion and is the tool of choice for the kernel developers. It, and the client GUIs designed to interface with it, is a main vehicle for storing, sharing and disseminating software. It improves code, debugs it ("given enough eyeballs, all bugs are shallow"), innovates and accelerates. Unlike traditional journals, results are not bottlenecked by peer review, costs or committees and boards. Indeed, in addition to concerns about quality control, peer review also arose as a consequence of space limitations in traditional print journals. In the internet era that limit has effectively disappeared. Space and distribution costs are negligible. What this means, effectively, is that peer review has not been abolished but shifted. Where? To post-publication.

"More means worse" is a well-worn mantra in response to the expansion of British universities in the 1960s but this is not true of concepts based on GitHub. Software and coding stand or fall by how good they are. If they're good, they're popular. If they're popular they'll be downloaded, forked, added to, improved and debugged if necessary. It's really a version of Google's page ranking except that the credit and prestige accrues to the individual scientist, not the website. That creates what scientists need, what has been called "prestige metrics". This rewards citation and can add value to a CV (curriculum vitae) and assist with appointments, grant applications and tenure. That would made Elsevier redundant at a stroke. All the tools to do this already exist. It's a matter of organization and will to make it happen. The only substantial barrier to adopting the GitHub way of doing things is that, to state the obvious, code is not raw scientific data. Code can be copied but duplicating the experiment to verify the data is a different matter but this would be without prejudice to publishing models.

It's only a small step from dynamic websites like GitHub to bespoke GUI web apps on the networked desktop in order to leverage the API capacity for functionality mashups. If the API is free and open scientists can use it to not only develop better publishing interfaces but also better ways to manipulate raw data and innovate and accelerate progress–in the same way that biologists recognized that mapping the human genome was a matter of computing power. Publication and data converge. MyExperiment seems to be going in that direction, combining all those elements. If GitHub did science and software publishing the Elsevier way, it might however look something like this.

David can slay Goliath

Students of British history will well remember the conflict generated by the enclosures dating back to the thirteenth century, a political and agricultural process by which land was progressively enclosed to provide land for sheep grazing. This led to the increasing privatizing of land, concentrated in the hands of a political elite, often ably assisted by corrupt enclosure commissioners. The toxic troika of power, money and land was born. You don't need to be a genius to see parallels in the modern, digital age with the symbiotic relationship between big business lobbying, hardware/software vendors and central government. That's the effective subtext of SOPA, PIPA and ACTA. Now we can add RWA to that ever lengthening list.

With the recent (temporary?) victory over SOPA it may be that we've discovered a methodology to untangle some of the modern enclosures that threaten the digital commons. The term, "the tragedy of the commons" caught on a long time ago and became a term of pejorative disparagement. The real tragedy is that knowledge has been caught in the duplicitous pincer movement of big business, money, lobbying and big government but with open-source software, open-source methodologies and the ability to leverage them and apply them to scientific publishing it is possible for David to slay Goliath. Elsevier is yesterday's solution to tomorrow's non-existent problem.

Postscript

Just as I was about to put this piece to bed I picked up on some good news, news that the RWA is being challenged by a legislative counter proposal called the Federal Research Public Access Act (FRPAA). It's not perfect, it doesn't go far enough but it might block the RWA. As the name suggests, it seeks to keep publicly-funded science in the public domain, earlier and free. It's a start.