Welcome to Etherpad for the #fail2016 workshop at #icwsm!
Please feel free to join in the discussion on "Things that didn't work out in social media research - and what we can learn from them"
--------------------------------------------------------------------
#fail experiences from the audience
--------------------------------------------------------------------
Please feel free to share any practical examples from your own work here.
--------------------------------------------------------------------
Categories: what can go wrong in social media research?
--------------------------------------------------------------------
Please add / modify / comment!
Notes from 3rd workshop (ICWSM-16)
Keynote Munmun de Choudhury:
- studying mental wellbeing, which often happens "offline": how to combine different types of data to learn about online and offline behaviour?
- there are many social media platforms where you do not have to identify yourself. This is important for people with mental illnesses, as they often do not want to be identified.
- how to figure out whether an observation (e.g. language change among new mothers) is specific to that cohort, or something that you could generalize?
- LESSON LEARNED: you need ground truth data. You cannot study this by only looking at online data.
- you have to involve the users: but if you start talking to the people you are studying, how do you scale?
- demographics: how do you measure the prevalence of depression? If you work with Twitter data, you have to keep in mind that Twitter is not used equally across the US (see the weighting sketch after this block).
- interventions in the social media environment: e.g. "thighgap" is blocked as a search term on Instagram.
- ethical challenges: what happens if people realize that they are being monitored, even if the researcher's intention is to help them?
- does the duty to act apply if the judgement that someone is at risk of suicide is made by an algorithm (e.g. based on social media data)?
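To make the demographics point concrete, here is a minimal post-stratification sketch in Python (not from the talk; all shares and rates below are invented for illustration): per-group estimates from a skewed Twitter sample are reweighted toward population shares.

# All shares and rates below are invented for illustration.
census_share = {"18-29": 0.21, "30-49": 0.34, "50+": 0.45}  # population shares
sample_share = {"18-29": 0.45, "30-49": 0.40, "50+": 0.15}  # Twitter sample shares
group_rate   = {"18-29": 0.12, "30-49": 0.08, "50+": 0.05}  # estimated prevalence per group

# Naive estimate: each group weighted by its share of the (skewed) sample.
naive = sum(sample_share[g] * group_rate[g] for g in group_rate)

# Post-stratified estimate: each group weighted by its share of the population.
adjusted = sum(census_share[g] * group_rate[g] for g in group_rate)

print(f"naive estimate:    {naive:.3f}")     # inflated by overrepresented groups
print(f"adjusted estimate: {adjusted:.3f}")

With these made-up numbers the naive estimate overshoots, because young users are overrepresented in the sample; reweighting only corrects for the demographics you model, of course.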
Yenn Lee:
- Hyperconnectivity, hybridity and fluidity.
- Anonymity / pseudonymity.
- "bamboo grove" accounts, which display their password on the website so that everyone can use them.
- the phenomenon peaked around 2012. Is research on it still relevant if you only start studying it one, two or three years later?
- psychological toll on researchers.
- time: what effect does the time at which you do a study have on your results?
- platforms: cross-platform studies.
- participation: how much should I involve myself in the phenomena I study (e.g. when studying misogynistic online platforms)?
--> boundary drawing.
- decision not to pursue any interactions with individuals (because these people were really
Comment from audience: "You always create social responsibility. You don't just start a new project, you get involved with people. It will also affect you personally."
Isabella Peters:
- the good thing about classic bibliometrics with Scopus etc.: you know the gold standard, you know how much information is in there. This is different in altmetrics.
- do-it-yourself tools are popular in altmetrics, because the approach is democratic.
- we already know that altmetrics results depend heavily on the tools and aggregators you use.
- disciplinary differences in how researchers use social media such as Twitter.
- compare different tools that are used in data collection.
- bugs in data collection systems
- different entry barriers for tools: some tools work only on Windows, some have to be paid for
- no tool is looking at "all" social media platforms, so you always get a certain bias.
LESSON LEARNED: different altmetric tools all perform differently
LESSON LEARNED: setting up a data collection logic / search query is already difficult (e.g. collecting DOIs to look them up in an altmetrics tool; see the sketch after this block).
LESSON LEARNED: we are likely to underestimate real numbers
- question: how do you increase comparability? E.g. collect all data at the same time.
- challenge: pick the right time span. And be aware that social media usage may change across seasons (e.g. Christmas vs. exam time).
- visualization is another challenge.
- scalability. Some tools are not equipped for big datasets.
- different concepts, e.g. "like" and "share", work differently across platforms.
Approach: the NISO Altmetrics Data Quality Code of Conduct - a description by providers of how they treat their data.
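To illustrate the query-setup lesson above, a small sketch that looks up a list of DOIs in one aggregator, Altmetric.com's public v1 endpoint (a real service, but check its current terms and rate limits before use; the DOIs below are placeholders, and the field names should be verified against the current API docs):

import json
import time
import urllib.error
import urllib.request

DOIS = [  # placeholder DOIs; substitute the sample you actually collected
    "10.1000/example.doi.1",
    "10.1000/example.doi.2",
]

def altmetric_counts(doi):
    url = f"https://api.altmetric.com/v1/doi/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
        return {"score": data.get("score"),
                "tweets": data.get("cited_by_tweeters_count", 0)}
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Unknown to this aggregator -- not the same as "no attention
            # anywhere", which is one reason real numbers get underestimated.
            return None
        raise

for doi in DOIS:
    print(doi, altmetric_counts(doi))
    time.sleep(1)  # be gentle with the free endpoint

Running the same DOI list against a second aggregator and comparing the counts is exactly the kind of tool comparison discussed above.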
--------------------------------------------------------------------
Potential venues for future workshops?
--------------------------------------------------------------------
--------------------------------------------------------------------
Things to do differently at next workshops?
--------------------------------------------------------------------
Tips for logistics?
--------------------------------------------------------------------
Key aspects for workshop summary
--------------------------------------------------------------------
Which part of today's discussions should we report at future workshops?
--------------------------------------------------------------------
Notes from previous workshops:
--------------------------------------------------------------------
From first workshop (#websci15):
- users:
  - how to involve users in social media research?
  - lesson learnt: participants may be creative in their use of technology - flexibility is needed
  - lesson learnt: users may perceive social media differently than researchers; both may have different definitions, e.g. for "social network"
- tools:
  - How to assess the quality of specific tools?
  - How to share/document best practices for working with specific tools?
  - How to understand whether other studies have been affected by problems inherent to specific tools?
  - suggestion from audience: always get in touch with the tool developers when encountering problems
- content:
  - how to capture dynamic content from social media?
  - data collection: it is difficult to find the balance between too much data and losing interesting data --> lesson learnt: try out different periods for data collection; shortening the collection period may help to reduce data volume while maintaining all data properties
  - lesson learnt: download and document all data that is used for content analysis; some researchers also use screenshots as proof (see the archiving sketch after this list)
- methods and algorithms:
  - lesson learnt: even "standard" procedures have to be questioned, example: the p-value
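On the "download and document everything" lesson: a minimal sketch, assuming placeholder URLs, that saves each retrieved page with a retrieval timestamp and a SHA-256 checksum, so the content that was analyzed can be verified later:

import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

URLS = [  # placeholders; substitute the pages in your content-analysis sample
    "https://example.org/",
]

out_dir = Path("archive")
out_dir.mkdir(exist_ok=True)

manifest = []
for i, url in enumerate(URLS):
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    path = out_dir / f"item_{i:04d}.html"
    path.write_bytes(body)  # keep the raw bytes exactly as retrieved
    manifest.append({
        "url": url,
        "file": path.name,
        "retrieved": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),  # proves content unchanged
    })

# The manifest documents what was collected, when, and in which exact form.
(out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))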
Notes from second workshop (#ir16)
- Archiving:
  - URLs may vanish (question: is there a linear rate of decay? see the liveness-check sketch at the end of these notes)
  - images go missing
  - platforms change (a moving target!) - not just the interface!
- Visualization of results:
  - word clouds (compare histograms)
- Tools:
  - sentiment140, Internet Archive, GNIP
- Methods:
  - Content analysis:
    - replicability? validation?
    - context for social media contents (e.g. surrounding tweets)
    - LIWC, General Inquirer
  - Sampling
  - Combining data analysis and qualitative approaches
  - Predictions
  - "Data Science"
- Data Quality:
  - can we still cite/use data and research published in 2007/2008?
  - baseline? (how to define one for a moving target)
- Theory / Epistemology:
  - can we only do descriptive work for single platforms?
  - start from the theory instead of from the data?
- Meta:
  - a systematic review of the existing literature is needed
  - interdisciplinary approaches are needed
  - conceptual triangulation
  - documentation
  - timeframe generalization:
    - document time, cultures?
    - how long will my results be valid?
    - have a general base for comparison
  - data sharing?
- Ethics:
  - guidelines?
  - ethics approval processes
  - consent? how to get consent before starting data collection (e.g. if you have to identify the users you want to get in touch with based on their high activity on a social media platform)
  - "latent metadata" - it's not just public data, it gets aggregated
  - data sharing rules?
  - the chance of re-identification grows with data size
- Users:
  - user motivations vs. collected data / data analysis
  - what are users' expectations regarding ethics?
  - how do users perceive a platform?
  - what if users disagree with the research results?
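On the archiving question (rate of decay): a rough sketch that re-checks a saved list of URLs and counts how many still resolve; run it periodically and the counts over time trace a crude decay curve (the URLs below are placeholders):

import urllib.error
import urllib.request
from datetime import datetime, timezone

URLS = [  # placeholder list; substitute the URLs collected in your study
    "https://example.org/",
    "https://example.org/some-vanished-page",
]

def is_alive(url):
    # HEAD request: we only care whether the URL still resolves, not its body.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False  # DNS failure, 4xx/5xx, or timeout: count as decayed

today = datetime.now(timezone.utc).date().isoformat()
alive = sum(is_alive(u) for u in URLS)
print(f"{today}: {alive}/{len(URLS)} URLs still resolve")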