anonymization problems/privacy concerns

missing tools for anonymizing data from social sciences

Panton Principles?

* make an explicit and robust statement of your wishes (reviews, questions,...)
* use a recognized waiver or license that is appropriate (explain, give guidance)
* no restrictive clauses like non-commercial - Open Definition
* explicit dedication to public/open domain

application to humanities?

* data is too narrow for humanities -> "a work" can be more
* description of re-usability of elements
* possibility of adaptation is important
* Open Content Definition
* recommendation of explicit declaration of copyright status (cc by, cc by-sa)

Goals of the session:

* A set of clear principles
* perspective of humanities on open (data)

- accessibility of works
- stuck at certain types (books)
- referencing problem: annotations
- no clear way to refer to place in text - nilf(?) annotation format
- problem with other media types (video, audio): privacy
- re-usability only of results (annotations,...), re-using original data very tricky
- unintended consequences / use-cases
- risks to e.g. survey participants, interview partners not addressed
- distinguish between data types in humanities: different principles needed
- not "what are open humanities" but "what are Open Data in the humanities"

- rephrasing of work for data valid? distinguish data from works
- problem of crediting annotations

- underlying principle: permissive and/or prohibitive?
- to which disciplines can these principles be applied?
- where is the link between Open and Digital Humanities? Digital: use/explore how computers can be used in the humanities - open as in Open; other link: publicly funded efforts are in the public domain (museums, ...)

group discussions about problems & solutions:

privacy concerns:
- tools that help making data compliant
- risk management: publish guidance on how to handle different data formats (manuscripts, paintings, sculptures) (museums, archives, libraries)

- what are data in the humanities?
- research materials instead of data
- talk with researchers in the field
- works are not data; blurry definitions

- attribution
- foreign use of assembled datasets and possible loss of research results

- clear definition of what are data in the humanities is needed badly
- problem: this definition could be context-dependent


(level 0 problem)
data in humanities = in books
books are not digitised
use pirate editions?
        Library Genesis

level 1 problem
no ways of REFERENCING moments in books
= no URI for books !!!!!!!
        not unlike
technically possible, but not done!
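
A minimal sketch of what "technically possible" could look like, assuming each edition of a text has a stable identifier and a fixed plain-text transcription. The names `TextRef`, `make_ref`, `to_uri` and `resolve`, and the example.org address, are hypothetical. The `#char=start,end` fragment loosely resembles the character scheme of RFC 5147 (fragment identifiers for text/plain); it is an illustration, not an agreed standard for books.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRef:
    """A pointer to a passage inside one fixed edition of a text."""
    work_uri: str  # hypothetical stable identifier of the edition
    start: int     # 0-based offset of the first character of the passage
    end: int       # offset just past the last character of the passage
    quote: str     # the passage itself, kept to detect a wrong edition


def make_ref(work_uri: str, text: str, passage: str) -> TextRef:
    """Locate the first occurrence of `passage` in `text`."""
    start = text.find(passage)
    if start < 0:
        raise ValueError("passage not found in this edition")
    return TextRef(work_uri, start, start + len(passage), passage)


def to_uri(ref: TextRef) -> str:
    """Serialise the reference as a URI with a character-range fragment."""
    return f"{ref.work_uri}#char={ref.start},{ref.end}"


def resolve(uri: str, text: str) -> str:
    """Return the passage that a `#char=start,end` URI points to."""
    _, _, fragment = uri.partition("#")
    if not fragment.startswith("char="):
        raise ValueError("unsupported fragment: " + fragment)
    start_s, _, end_s = fragment[len("char="):].partition(",")
    return text[int(start_s):int(end_s)]
```

Such offsets only hold for one exact transcription. Keeping the quoted passage next to the offsets makes it possible to notice when a URI is resolved against a different edition.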

"How to put existing materials into the public domain?"


Is principle 1 permissive or prohibitive?

"panton principles are not scripture"
rethink and make a crowd effort

in humanities:
        but there are also other types of "objects of study" that appear in humanities

OKFN has a working group for OPEN HUMANITIES
Join the Mailing list!
Some projects:
. OpenPhilosophy
. OpenShakespeare
. Textus project
. OpenAnnotation

Open Humanities vs Digital Humanities ?
        "Digital" doesn't care about licenses, openness, etc
A lot, though, is publicly funded, so it ends up in the public domain
Access problem:
Public funding pays for projects
books cost money to print
digital release not done because then "no one would buy books"

annotated / "improved" editions do not maintain the openness of the public domain works that they are based on
solutions ???
        idea: a diff for humanities
        micro attributions
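
A rough sketch of the "diff for humanities" idea, using Python's standard `difflib`: compare a public domain base text with an annotated edition word by word and credit only the changed words to the editor. The function name `attribute_changes` and the labels are assumptions made for illustration; real editions would need better tokenisation and a decision on how to credit deletions.

```python
import difflib


def attribute_changes(base: str, edition: str, editor: str):
    """Split `edition` into runs of words, each labelled with its source.

    Words unchanged from `base` are labelled "public domain"; words that
    were inserted or replaced are labelled with `editor`.
    """
    base_words = base.split()
    edition_words = edition.split()
    matcher = difflib.SequenceMatcher(
        a=base_words, b=edition_words, autojunk=False
    )
    attributed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        run = " ".join(edition_words[j1:j2])
        if tag == "equal":
            attributed.append((run, "public domain"))
        elif tag in ("insert", "replace"):
            attributed.append((run, editor))
        # "delete": the editor removed words, so nothing remains to label
    return attributed
```

Each labelled run could then carry its own micro attribution, so that the unchanged parts stay visibly in the public domain while the editor's additions are credited separately.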

positive example:
TOPOI humanities publishing
in Berlin
they are committed to digital and open publishing
        2 years after release it becomes PD

open humanities vs open data for humanities (?)
        this is what binds to the panton principles
participatory research ..?

----------------------> group work

1. privacy concerns
        publish guidance on data in different formats
        considerations for legal compliance / risk management
2. problem of attribution
        use research materials instead of data
        motivation for research is attribution/citation
        ... so DOES THIS collide with specific levels of openness (like CC-Zero)

3. "fears of getting scooped"
        you assemble a dataset
        then "someone comes and scoops/publishes the best stuff"
        so where is the incentive to publish primary dataset?
4. a work is not the same as data
        forget about "data" in humanities
        humanities is not science
        "Panton was about legal protection and public domain"
        OpenBibliography : open data for catalogs
        "We need to give a list of things we think are data" !!!
        assemble a list/examples of open data in humanities?
                "list of poets"
                "Map of Parnassus": this is somewhat "para-humanities"?
        de-contextualized data

