Welcome to Etherpad for the #fail2016 workshop at #icwsm!
Please feel free to join in the discussion on "Things that didn't work out in social media research - and what we can learn from them"
--------------------------------------------------------------------
#fail experiences from the audience
--------------------------------------------------------------------
Please feel free to share any practical examples from your own work here.
--------------------------------------------------------------------
Categories: what can go wrong in social media research?
--------------------------------------------------------------------
Please add / modify / comment!
Notes from 3rd workshop (ICWSM-16)
Keynote Munmun de Choudhury:
- studying mental wellbeing, which often happens "offline": how to combine different types of data to learn about online and offline behaviour?
- there are many social media platforms where you do not have to identify yourself. This is important for people with mental illnesses, as they often do not want to be identified.
- how to figure out whether an observation (e.g. language change among new mothers) is specific to that cohort, or something that you could generalize?
- LESSON LEARNED: you need ground truth data. You cannot study this by only looking at online data.
- you have to involve the users: but if you start talking to the people you are studying, how do you scale?
- demographics: how do you measure the prevalence of depression? If you work with Twitter data, you have to keep in mind that Twitter is not used equally across the US (see the weighting sketch after this block).
- interventions in the social media environment: e.g. "thighgap" is blocked as a search term on Instagram.
- ethical challenges: what happens if people realize that they are being monitored, even if the researcher's intention is to help them?
- does the duty to act apply if the judgement that someone is at risk of suicide is made by an algorithm (e.g. based on social media data)?
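To make the demographics point concrete, here is a minimal post-stratification sketch in Python (not from the talk; all shares and rates below are invented for illustration): per-group estimates from a skewed Twitter sample are reweighted toward population shares.

# All shares and rates below are invented for illustration.
census_share = {"18-29": 0.21, "30-49": 0.34, "50+": 0.45}  # population shares
sample_share = {"18-29": 0.45, "30-49": 0.40, "50+": 0.15}  # Twitter sample shares
group_rate   = {"18-29": 0.12, "30-49": 0.08, "50+": 0.05}  # estimated prevalence per group

# Naive estimate: each group weighted by its share of the (skewed) sample.
naive = sum(sample_share[g] * group_rate[g] for g in group_rate)

# Post-stratified estimate: each group weighted by its share of the population.
adjusted = sum(census_share[g] * group_rate[g] for g in group_rate)

print(f"naive estimate:    {naive:.3f}")     # inflated by overrepresented groups
print(f"adjusted estimate: {adjusted:.3f}")

With these made-up numbers the naive estimate overshoots, because young users are overrepresented in the sample; reweighting only corrects for the demographics you model, of course.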
Yenn Lee:
- Hyperconnectivity, hybridity and fluidity.
- Anonymity / pseudonymity.
- "bamboo grove" accounts, which display their password on the website so that everyone can use them.
- the phenomenon peaked around 2012. Is research on it still relevant if you only start studying it one, two or three years later?
- psychological toll on researchers.
- time: what effect does the time at which you do a study have on your results?
- platforms: cross-platform studies.
- participation: how much should I involve myself in the phenomena I study (e.g. when studying misogynistic online platforms)?
--> boundary drawing.
- decision not to pursue any interactions with individuals (because these people were really
Comment from audience: "You always create social responsibility. You don't just start a new project, you get involved with people. It will also affect you personally."
Isabella Peters:
- the good thing about classic bibliometrics with Scopus etc.: you know the gold standard, you know how much information is in there. This is different in altmetrics.
- do-it-yourself tools are popular in altmetrics, because the approach is democratic.
- we already know that altmetrics results depend heavily on the tools and aggregators you use.
- disciplinary differences in how researchers use social media such as Twitter.
- compare different tools that are used in data collection.
- bugs in data collection systems
- different entry barriers for tools: some tools work only on Windows, some have to be paid for
- no tool is looking at "all" social media platforms, so you always get a certain bias.
LESSON LEARNED: different altmetric tools all perform differently
LESSON LEARNED: setting up a data collection logic / search query is already difficult (e.g. collecting DOIs to look them up in an altmetrics tool; see the sketch after this block).
LESSON LEARNED: we are likely to underestimate real numbers
- question: how do you increase comparability? E.g. collect all data at the same time.
- challenge: pick the right time span. And be aware that social media usage may change across seasons (e.g. Christmas vs. exam time).
- visualization is another challenge.
- scalability. Some tools are not equipped for big datasets.
- different concepts, e.g. "like" and "share", work differently across platforms.
Approach: the NISO Altmetrics Data Quality Code of Conduct - a description by providers of how they treat their data.
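To illustrate the query-setup lesson above, a small sketch that looks up a list of DOIs in one aggregator, Altmetric.com's public v1 endpoint (a real service, but check its current terms and rate limits before use; the DOIs below are placeholders, and the field names should be verified against the current API docs):

import json
import time
import urllib.error
import urllib.request

DOIS = [  # placeholder DOIs; substitute the sample you actually collected
    "10.1000/example.doi.1",
    "10.1000/example.doi.2",
]

def altmetric_counts(doi):
    url = f"https://api.altmetric.com/v1/doi/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
        return {"score": data.get("score"),
                "tweets": data.get("cited_by_tweeters_count", 0)}
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Unknown to this aggregator -- not the same as "no attention
            # anywhere", which is one reason real numbers get underestimated.
            return None
        raise

for doi in DOIS:
    print(doi, altmetric_counts(doi))
    time.sleep(1)  # be gentle with the free endpoint

Running the same DOI list against a second aggregator and comparing the counts is exactly the kind of tool comparison discussed above.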
--------------------------------------------------------------------
Potential venues for future workshops?
--------------------------------------------------------------------
--------------------------------------------------------------------
Things to do differently at next workshops?
--------------------------------------------------------------------
Tips for logistics?
--------------------------------------------------------------------
Key aspects for workshop summary
--------------------------------------------------------------------
Which part of today's discussions should we report at future workshops?
--------------------------------------------------------------------
Notes from previous workshops:
--------------------------------------------------------------------
From first workshop (#websci15):
- users:
  - how to involve users in social media research?
  - lesson learnt: participants may be creative in their use of technology - flexibility is needed
  - lesson learnt: users may perceive social media differently than researchers; both may have different definitions, e.g. for "social network"
- tools:
  - How to assess the quality of specific tools?
  - How to share/document best practices for working with specific tools?
  - How to understand whether other studies have been affected by problems inherent to specific tools?
  - suggestion from audience: always get in touch with the tool developers when encountering problems
- content:
  - how to capture dynamic content from social media?
  - data collection: it is difficult to find the balance between too much data and losing interesting data --> lesson learnt: try out different periods for data collection; shortening the collection period may help to reduce data volume while maintaining all data properties
  - lesson learnt: download and document all data that is used for content analysis; some researchers also use screenshots as proof (see the archiving sketch after this list)
- methods and algorithms:
  - lesson learnt: even "standard" procedures have to be questioned, example: the p-value
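On the "download and document everything" lesson: a minimal sketch, assuming placeholder URLs, that saves each retrieved page with a retrieval timestamp and a SHA-256 checksum, so the content that was analyzed can be verified later:

import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

URLS = [  # placeholders; substitute the pages in your content-analysis sample
    "https://example.org/",
]

out_dir = Path("archive")
out_dir.mkdir(exist_ok=True)

manifest = []
for i, url in enumerate(URLS):
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    path = out_dir / f"item_{i:04d}.html"
    path.write_bytes(body)  # keep the raw bytes exactly as retrieved
    manifest.append({
        "url": url,
        "file": path.name,
        "retrieved": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),  # proves content unchanged
    })

# The manifest documents what was collected, when, and in which exact form.
(out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))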
Notes from second workshop (#ir16)
- Archiving:
  - URLs may vanish (question: is there a linear rate of decay? see the liveness-check sketch at the end of these notes)
  - images go missing
  - platforms change (a moving target!) - not just the interface!
- Visualization of results:
  - word clouds (compare histograms)
- Tools:
  - sentiment140, Internet Archive, GNIP
- Methods:
  - Content analysis:
    - replicability? validation?
    - context for social media contents (e.g. surrounding tweets)
    - LIWC, General Inquirer
  - Sampling
  - Combining data analysis and qualitative approaches
  - Predictions
  - "Data Science"
- Data Quality:
  - can we still cite/use data and research published in 2007/2008?
  - baseline? (how to define one for a moving target)
- Theory / Epistemology:
  - can we only do descriptive work for single platforms?
  - start from the theory instead of from the data?
- Meta:
  - a systematic review of the existing literature is needed
  - interdisciplinary approaches are needed
  - conceptual triangulation
  - documentation
  - timeframe generalization:
    - document time, cultures?
    - how long will my results be valid?
    - have a general base for comparison
  - data sharing?
- Ethics:
  - guidelines?
  - ethics approval processes
  - consent? how to get consent before starting data collection (e.g. if you have to identify the users you want to get in touch with based on their high activity on a social media platform)
  - "latent metadata" - it's not just public data, it gets aggregated
  - data sharing rules?
  - the chance of re-identification grows with data size
- Users:
  - user motivations vs. collected data / data analysis
  - what are users' expectations regarding ethics?
  - how do users perceive a platform?
  - what if users disagree with the research results?
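On the archiving question (rate of decay): a rough sketch that re-checks a saved list of URLs and counts how many still resolve; run it periodically and the counts over time trace a crude decay curve (the URLs below are placeholders):

import urllib.error
import urllib.request
from datetime import datetime, timezone

URLS = [  # placeholder list; substitute the URLs collected in your study
    "https://example.org/",
    "https://example.org/some-vanished-page",
]

def is_alive(url):
    # HEAD request: we only care whether the URL still resolves, not its body.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False  # DNS failure, 4xx/5xx, or timeout: count as decayed

today = datetime.now(timezone.utc).date().isoformat()
alive = sum(is_alive(u) for u in URLS)
print(f"{today}: {alive}/{len(URLS)} URLs still resolve")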