
Notes for Quantifying and Analysing Scholarly Communication on the Web (#ASCW15)
June 30, 2015, Oxford

-- please add and share what you think is important or what you would like to see discussed

Aim of the workshop: to have more discussion on quantifying scholarly communication on the Web among all kinds of researchers from different disciplines

First talk: Alan Dix

REF: the UK's research assessment exercise, run roughly every five years and organised into four main panels; for each institution it measures
* outputs (mostly papers)
* impacts
* environments

--> the best grade is 4*, which is the one that matters
--> no individual grades are kept; the list of outputs is in the public domain, but the grades are not
--> each output was given an ACM code, which made it possible to build profiles on a subject basis: e.g. HCI: 10% rated 4*
--> found that theoretical research is valued higher than applied research

found positive correlation between citations and REF scores (star ratings)
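To make the claim concrete: a rank correlation like this can be checked in a few lines. The numbers below are invented for illustration only, not the citation or REF data from the talk.

```python
# Hypothetical sketch: rank correlation between citation counts and REF star
# ratings. All data here is made up; the talk reported the correlation itself.
from scipy.stats import spearmanr

citations   = [3, 12, 45, 7, 80, 22, 5, 60]   # citations per output (made up)
ref_ratings = [1, 2, 4, 2, 4, 3, 1, 3]        # REF stars 1*-4* (made up)

rho, p_value = spearmanr(citations, ref_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```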

Key point: this is data, not information - however, it already affects decisions (e.g. hiring): if an area/discipline is believed not to produce good work, people from it won't get hired

Findings:
1) the best applied work scores weakly (including web science, etc.)
2) long tail: weaker researchers choose applied areas
3) latent bias, despite efforts to be fair

Question: Can bibliometrics untangle this?

Metrics and Assessment

Citation metrics are known to be good indicators in aggregate, but for individuals and small groups there is a danger of gaming and policy distortion

The strong areas are again the theoretical ones, the weak ones the applied ones

extremes of distributions are unstable estimates - that's basic maths

on average, a web paper needs to be in the top 0.6% worldwide to get a 4* ranking
a logic paper just needs to be in the top 6% worldwide
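A small simulation can illustrate both points: thresholds that sit far out in the tail of a skewed distribution jump around much more between samples than less extreme ones. The distribution and its parameters below are made up, not the actual REF or citation data.

```python
# Illustrative sketch (invented parameters): compare how stable the top-0.6%
# citation cut-off is versus the top-6% cut-off across repeated samples.
import numpy as np

rng = np.random.default_rng(0)

def tail_threshold(sample, top_fraction):
    """Citation count needed to be in the given top fraction of the sample."""
    return np.quantile(sample, 1 - top_fraction)

web_cutoffs, logic_cutoffs = [], []
for _ in range(1000):
    citations = rng.lognormal(mean=1.5, sigma=1.2, size=5000)  # skewed, made up
    web_cutoffs.append(tail_threshold(citations, 0.006))   # top 0.6% (web paper)
    logic_cutoffs.append(tail_threshold(citations, 0.06))  # top 6% (logic paper)

print(f"top 0.6% cut-off: mean {np.mean(web_cutoffs):.1f}, std {np.std(web_cutoffs):.1f}")
print(f"top 6%   cut-off: mean {np.mean(logic_cutoffs):.1f}, std {np.std(logic_cutoffs):.1f}")
```

With these made-up parameters the top-0.6% cut-off fluctuates several times more across resamples than the top-6% one - the "extremes are unstable" point in note form.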

Gender context: there are more women in applied and human-centred areas!

Discussion:
what's correct?
*is web science weak science (globally)? --> only if you believe that applied areas really are 10 times weaker than theoretical areas
*latent bias?

future for research assessment?
*pure metrics give a better picture -- aggregations are bad
*just metrics? 
*metrics as one part alongside other outputs?
*metrics as under-girding (burden of proof)

Second talk: Robert Jäschke --> Response to Alan

Three levels of observation

1) macro (universities)
2) meso (departments and ACM categories)
3) micro (papers)

Robert found several explanations: 
    * differences in citation behavior between sub-areas?
    * Halo effects when assessing papers from "good" institutions
    * inter-area bias
    * citation counts not a good measure?
    
Questions:
    * Can the results from REF 2014 be trusted?
    * How can the differences to citations be explained?
    * What needs to be done - is the REF helpful?
    * How can we handle scholarly communication on the Web?

Discussion
* how can we assess the bias of visibility (e.g. when more popular users tweet)?
* are papers praised for different characteristics? if there are more factors, can we ever expect a correlation?
* bias is always present...every channel is probably equally biased: fame of organisations is correlated with quality
* which methods are transparent?
* what is the controlling factor?
* expertise of the crowd? REF process: to increase the overlap of evaluations, papers were assigned to one expert and two non-experts --> a solution? the NIPS experiment?

Third talk: Peter Kraker & Elisabeth Lex

back in 2009, ResearchGate (RG) was more about managing projects and enabling collaborations
Many workshop participants have an RG score, but only very few use it actively

Openness is key for altmetrics/bibliometrics data --> only with open data can flaws be seen and evaluated --> the Leiden Manifesto for Research Metrics is mentioned here (Hicks et al., 2015) - a result of STI 2014 (http://www.sti2015.usi.ch/)

The Journal Impact Factor (JIF) is part of the RG score - the JIF is problematic since the distribution of citations is highly skewed, it is not field-normalised, and it is only available for journals
It is also bad practice to use the JIF to grade researchers --> acknowledged recently in the San Francisco Declaration on Research Assessment (DORA)
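For reference, a sketch of how the standard two-year JIF is computed (the numbers are invented). The skew problem is that the numerator is a sum typically dominated by a few highly cited papers, so the resulting mean says little about a typical article or author.

```python
# Sketch of the standard two-year Journal Impact Factor (invented numbers).
def journal_impact_factor(citations_in_year, items_prev_year_1, items_prev_year_2):
    """Citations received in year Y to items from Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_in_year / (items_prev_year_1 + items_prev_year_2)

# Made-up journal: 120 citable items over two years, 480 citations in the JIF
# year, most of them going to a handful of papers.
print(journal_impact_factor(480, 60, 60))  # -> 4.0
```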

Fourth talk: Katy Jordan --> Response to Peter & Liz

investigated people who have just one paper and an RG score
--> evidence from this analysis showed a correlation between impact points and RG score - but via a natural log (ln) transformation

based on a multi-paper analysis, a model was developed to predict the RG score
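A minimal sketch of the kind of model described here, assuming (as in the single-paper analysis) that the RG score is roughly linear in ln(impact points). The data and fitted coefficients below are invented; this is not ResearchGate's actual algorithm.

```python
# Hypothetical sketch: fit RG score as a linear function of ln(impact points).
import numpy as np

impact_points = np.array([0.5, 1.2, 2.8, 4.1, 7.3, 11.0])  # made-up JIF sums
rg_score      = np.array([1.1, 2.0, 3.1, 3.6, 4.4, 4.9])   # made-up RG scores

# Ordinary least squares on the log-transformed predictor:
slope, intercept = np.polyfit(np.log(impact_points), rg_score, deg=1)
predicted = intercept + slope * np.log(impact_points)

unexplained = np.var(rg_score - predicted) / np.var(rg_score)
print(f"RG score ~ {intercept:.2f} + {slope:.2f} * ln(impact points); "
      f"unexplained variance ~ {unexplained:.1%}")
```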

Why is the RG score algorithm not open?
* to prevent gaming?
* commercial value?

--> found that the RG score can be reproduced to an extent - 4.9% of the variation within the data remains unexplained
--> FYI: RG people once said in a talk that they also take a PageRank-like algorithm into account, i.e. answers/questions from more influential people lead to a higher RG score
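Since the real algorithm is not public, the following is a purely speculative sketch of what a PageRank-style weighting of Q&A interactions could look like; the user names and the interaction graph are hypothetical.

```python
# Speculative sketch only - not ResearchGate's algorithm. Nodes are users; an
# edge u -> v means u's question was answered by v, so v gains influence from u.
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each user to the list of users they 'endorse'."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for u, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank

interactions = {  # hypothetical users and who answered whose questions
    "alice": ["bob"], "bob": ["carol"], "carol": ["bob"], "dave": ["bob", "carol"],
}
print(pagerank(interactions))  # bob and carol end up with the highest weight
```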

Is anybody using RG metrics at all?
-- Katy found that most researchers use RG as an online CV/business card --> Twitter is used more for discussion

Discussion
* question about (lack of) openness: in order to prevent people from gaming the system
* RG doesn't check if uploaded input is correct or actually yours?
* the community can help - but how should the community handle academic misbehaviour within its own ranks?
* RG uses its score to engage users 
* RG score doesn't correlate with any other metric (altmetrics or citations)
* how to improve the RG score or any other score?
* can we use any of our indexes? they are all crude...
* open metrics are needed and a bit of context to evaluate what the metrics can tell
* is it ethically correct to use the RG score?
* observation: people with high scores do not interact much with other users, but people from developing countries are quite active and try to get in touch with other communities --> research need: who is really using RG, and for what purposes?
* on RG you don't always get endorsed by real people; RG infers endorsements from people's networks (Isabella's assumption :-))


Fifth talk: Ryan Whalen

citations are not all equal and carry different meanings, but citation analysis only considers the presence or absence of a citation
aim: calculate the distance between paper topics to better inform the meaning of citations
study based on keywords (from authors and editors)

highly distant citations are better predictors of whether a paper reaches high impact
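One simple way to operationalise the "distance" of a citation - an assumption for illustration, not necessarily Ryan's exact measure - is the Jaccard distance between the keyword sets of the citing and the cited paper.

```python
# Illustrative keyword-based distance between a citing and a cited paper.
def keyword_distance(keywords_a, keywords_b):
    """Jaccard distance: 0 = identical keyword sets, 1 = no overlap."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

citing = {"altmetrics", "citation analysis", "web science"}
cited  = {"patent law", "citation analysis", "network analysis"}
print(keyword_distance(citing, cited))  # 0.8 -> a fairly distant citation
```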

Sixth talk: Brett Buttliere --> Response to Ryan

* what's the relationship between citations and other characteristics (e.g. authors, institutions, etc.) besides words?
* the notion of "temporality" is missing from the approach
* which metric to choose depends on what we want to measure

what is good science and impact?

Brett suggests using insights from psychology and social media to answer these questions, e.g. building a system in which people want to catch the cheaters (Skinner's Walden Two) --> embrace the good rather than punishing the bad
- also look into the history/philosophy of science, e.g. Platt's strong inference, Popper's falsification, progress made through conflict...
- and the research of Kuhn, and on meaning making (Festinger)

let's use the content of social media not only network information!

Discussion
* there are not that many negative citations
* exploit semantic web technology? the authors tried topic modelling and hierarchical clustering --> it was a time sink! :-) they also tried lists of synonyms
* context of citations? that was the initial idea, because they had citation information paragraph-wise
* CiTO (the Citation Typing Ontology) provides around 54 ways of citing something (see the sketch after this list)
* typos in the terms which occur only once? yes, also weird punctuation etc.
* keyword evolution: impact on method?  
* concept hierarchy in keywords? 
* clustering of resources based on similarity tried? not yet
* approach doesn't solve the problems of citations
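Related to the CiTO bullet above: a rough sketch of how a typed citation could be recorded with rdflib. The DOIs are placeholders and the choice of cito:supports is purely illustrative.

```python
# Rough sketch: expressing a typed citation with CiTO via rdflib.
from rdflib import Graph, Namespace, URIRef

CITO = Namespace("http://purl.org/spar/cito/")

g = Graph()
g.bind("cito", CITO)

citing_paper = URIRef("https://doi.org/10.1234/citing-paper")  # placeholder DOI
cited_paper  = URIRef("https://doi.org/10.1234/cited-paper")   # placeholder DOI

# Instead of a bare "cites", record *why* the paper is cited:
g.add((citing_paper, CITO.supports, cited_paper))

print(g.serialize(format="turtle"))
```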