
31c3opensciencemeeting

Intro

This is a rough protocol of a meeting of Open Science supporters that took place on 2014-12-27 as a session of the 31st Chaos Communication Congress (31C3), Hamburg, Germany. We were around 20 people with different scientific backgrounds (math, astrophysics, psychology, life sciences and others). The event was initially planned for networking but was then used to discuss different aspects of open science and to point to different resources.

Participants


Resources and connections to the community




Open Access
                                                                                                                                                       
Literature: Publishers used to hold the copyright for your work and sold your work for profit. The taxpayer pays twice (first for your salary, then for the libraries to buy the publications back).
                                                                                                                                                       
Open Access defines licenses and allows more or less free access to your  works (e.g., Creative Commons - https://creativecommons.org/ )  
                                                                                                                                                       
- Gold: Author pays publisher.
- Green: Self-archiving, grassroots initiatives.
                                                                                                                                                       
Examples of OA journals: PLOS, eLife, F1000Research, PeerJ

Also: Preprint servers. In physics and neighbouring fields, http://arXiv.org has made a major impact. More recently: http://biorxiv.org/ for the life sciences.
                                                                                                                                                       
epiSciences/epiMath (http://www.episciences.org/ ) - building a publishing system around repositories
http://www.nature.com/news/mathematicians-aim-to-take-publishers-out-of-publishing-1.12243


Open Data                                                                                                                                          
                                                                                                                                                       
Publications are only the last step in science; Open Data is about previous steps. 
                                                                                                                                                       
Reusability (Licenses!), reproducibility.
                                                                                                                                                       
Panton Principles: Use the freest license possible (e.g., CC-0) to make mixing of datasets possible. http://pantonprinciples.org/
                                                                                                                                                       
Common worry with open data: Will people outpublish me if I make my data open? But see http://datapub.cdlib.org/closed-data-excuses-excuses/ for common excuses to keep data closed, including good arguments against them.


Open Source                                                                                                                                        
                                                                                                                                                       
Source code and data are intertwined in many ways, so it's hard to separate Open Data (OD) from Open Source (OS).
                                                                                                                                                       
Next step: Workflow engines to facilitate reproducible science in a world of evolving software.
                                                                                                                                                       
Main problem at least in some disciplines: Code by scientists tends to be really bad, so they're embarrassed to publish it even if they're sympathetic to the idea. Question of culture and education to accept and improve this.


Do we need publishers as filters?
                                                                                                                                                       
Outrage fact: Elsevier's profit margin ~ 40%. 
                                                                                                                                                       
Publishers are gatekeepers to publication and jobs, which is how they keep their stranglehold on science. In an age where every child can put out cat videos, they're certainly not needed as distributors.
                                                                                                                                                       
Decouple peer review and publication?  Post-publication peer review might actually improve quality control. 
                                                                                                                                                       
New style journals with manuscripts public from day one and an open review process: "Faculty 1000 Research"  
  
                                                                                                                                                       
Reproducible workflows

Examples: Taverna, IPython notebooks. But: In the life sciences, reproducibility has major aspects rooted in the physical world (the example mentioned was the impact of mouse houses).

Literate programming with IPython Notebook or knitr.
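
One small building block of a reproducible workflow, sketched here in Python, is to store a provenance record next to each result: a checksum of the input data, the parameters used, and the software version. This is only an illustration using the standard library; the function and field names are made up for this example and do not come from Taverna, IPython, or any standard.

```python
import hashlib
import json
import platform
import sys


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(input_data: bytes, parameters: dict) -> dict:
    """Collect the facts needed to re-run an analysis step later.

    The field names are illustrative assumptions, not a standard.
    """
    return {
        "input_sha256": sha256_of_bytes(input_data),
        "parameters": parameters,
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


if __name__ == "__main__":
    # Hypothetical input data and parameter, for demonstration only.
    record = provenance_record(b"1,2,3\n4,5,6\n", {"threshold": 0.5})
    print(json.dumps(record, indent=2, sort_keys=True))
```

If the checksum of the data or the recorded version differs on a later run, that is a hint as to why results may have changed. It does not capture the physical-world factors mentioned above.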

Politics

Incentives for making data open: DataCite (https://www.datacite.org/ )?
                                                                                                                                                       
Interaction of credit and CC-0: Will people attribute the original authors if data is in the public domain? Well, there's a difference between legal and scientific conduct. People stealing data will commit scientific suicide, much like they do now when they steal ideas (and get caught).
                                                                                                                                                       
We have nearly everything we need to practice Open Science. What remains is mainly a cultural issue: lack of incentives, fear of getting scooped, etc. How to improve open science? "Open science oath": As a reviewer, I'll bitch about non-open data and missing reproducibility. See the Open Science Peer Review Oath, a publication currently under review (http://f1000research.com/articles/3-271/v1 )

At the last Cold Spring Harbor meeting, people who didn't put up their code got mildly harassed.
                                                                                                                                                       
Also: objective problems like privacy issues with patient data.  But then that's just a small fraction of the data we're talking about here.                                                                                
                                                                                                                                                       
Infrastructure issues: what happens if you move from one institute to the next?  What if you have a few terabytes of data and your new institute doesn't give you the online space for that? How is that then preserved?                                                                                                                                             
                                                                                                                                                       
Self-publication vs. repositories: some people feel professionally managed repositories are absolutely needed for credible data publishing, suitable for use in publications that might be in use for centuries.


Identifiers
                                                                                                                                                       
Citing data, and allowing data to be located, requires identifiers. Some already exist and work across disciplines.

ORCID: Author identifier.
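
As an illustration of what such an identifier looks like in practice: an ORCID iD is written as four groups of four characters, and the last character is a check character computed with ISO 7064 MOD 11-2. The following Python sketch validates that check character. The function names are chosen for this example; the sample iD is the fictitious example researcher used in ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape 0000-0000-0000-000X and the final check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A passing check only means the iD is well-formed; it does not show that the iD is registered or belongs to a particular author.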
                                                                                                                                                       
DOI: Digital object identifier, for papers but also for individual data sets. Experiences with them haven't been great so far, because minting DOIs isn't universally easy, and money is involved. The difficulties don't appear to be in DOI's design, though.
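
To make the idea concrete, here is a minimal Python sketch that turns a bare DOI into a link through the public resolver at https://doi.org/. The function name and the DOI `10.1234/example` are hypothetical and used only for illustration; the plausibility check is deliberately loose and is not a full DOI syntax validator.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI such as '10.1234/example' into a resolver URL."""
    doi = doi.strip()
    # Accept the common "doi:" prefix as well.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Loose plausibility check: a "10." prefix and a "/" before the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1234/example"))
```

The same pattern works whether the DOI points at a paper or at a data set, which is what makes it usable for citing data.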
                                                                                                                                                       
purl.org: a formalised URL shortener. It's used for quite a few important things, and it would at least solve the problem of data moving with its self-publishing author. Of course, maintenance of the purl might still hinge on an individual's willingness to keep it up to date.
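
The indirection behind a persistent URL can be sketched as a lookup table that someone has to keep current. The Python class below is a toy in-memory model, not purl.org's actual software or API; all names and URLs are hypothetical.

```python
class PersistentLinkRegistry:
    """Toy in-memory model of a persistent-URL service (hypothetical)."""

    def __init__(self):
        self._targets = {}

    def register(self, name, target_url):
        """Create a persistent name that points at the data's current home."""
        if name in self._targets:
            raise ValueError(f"name already registered: {name}")
        self._targets[name] = target_url

    def move(self, name, new_target_url):
        """Update the target after the data has moved; the name is unchanged."""
        if name not in self._targets:
            raise KeyError(name)
        self._targets[name] = new_target_url

    def resolve(self, name):
        """Return the current location for a persistent name."""
        return self._targets[name]


if __name__ == "__main__":
    registry = PersistentLinkRegistry()
    registry.register("survey-2014", "https://institute-a.example/data/survey.csv")
    # The author changes institutes; citations keep using "survey-2014".
    registry.move("survey-2014", "https://institute-b.example/data/survey.csv")
    print(registry.resolve("survey-2014"))
```

The citation stays stable only as long as someone calls the equivalent of `move` when the data changes location, which is the maintenance concern raised above.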


Software Carpentry

NGO that supports the teaching of computational skills to scientists.

http://software-carpentry.org/