
content-mining-workshop

Purpose of document:
    
    To plan practical exercises for content mining to teach relevant software tools and explore possibilities
    a) For the Oxford Open Science meeting on 27 Nov 2013 http://science.okfn.org/community/local-groups/oxford-open-science/
    b) More generally for a package of content mining training material that can be used online and offline
    
Contributors:
    Jenny Molloy
    

Meetings (add name to list if you can make that date/time):
    
Meeting #1
Timeline

5 mins -   Content mining: Intro and what you are allowed to do
15 mins - Iain Emsley presenting on text mining with Python - conference tweets and books
10 mins - The power of content mining (mining graphs)
80 mins - Hands-on
10 mins - Demos and wrap-up

Hands-on Sessions

Twitter mining and visualisation with Iain
https://github.com/austgate/openscience

Systematic mining of the literature using AMI
https://bitbucket.org/petermr/ami/wiki/Oxford_Launch

Pre-installation:
Documentation

Moving On



### Possible exercises/problems:


PMR's Idea:
    
    Quick overviews:
         * http://chemicaltagger.ch.cam.ac.uk/
         
* use BMC (BioMed Central) as the corpus (primarily HTML) and choose a bioscience topic where everyone can feel comfortable (e.g. species)
* get people to preload simple tools (we'll use wget, grep, etc.). Linux has these already. Windows will need Cygwin or, better, Enthought Canopy or Git Bash (GitHub). I don't grok Mac, but people have managed it. BTW, people are thinking of having a Software Carpentry in Oxford, so there could be some useful contacts there.
* use wget to download several papers
* use grep to extract italic sections with species names in them
...
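
For anyone who can't get wget and grep installed, the two steps above could be roughed out in Python instead. This is only a sketch under assumptions, not a tested workshop script: it assumes species names sit inside `<i>` or `<em>` tags in the HTML, and the "looks like a binomial" pattern is a crude guess that will miss some names and let some non-species through. The function names and the sample HTML are made up for illustration.

```python
import re
from urllib.request import urlopen

# Contents of <i>...</i> or <em>...</em>, i.e. what grep would pull out as italic markup.
ITALIC_RE = re.compile(r"<(i|em)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
# Assumed heuristic for a Latin binomial: "Genus species" or "G. species".
BINOMIAL_RE = re.compile(r"^[A-Z](?:[a-z]+|\.) [a-z]{3,}$")
# Any remaining tag nested inside an italic section.
TAG_RE = re.compile(r"<[^>]+>")


def fetch_html(url):
    """Download one page, roughly what `wget URL` does (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def italic_sections(html):
    """Return the text of every italic section, tags stripped, whitespace collapsed."""
    sections = []
    for match in ITALIC_RE.finditer(html):
        text = TAG_RE.sub("", match.group(2))
        text = " ".join(text.split())
        if text:
            sections.append(text)
    return sections


def candidate_species(html):
    """Keep italic sections that look like a binomial name, de-duplicated in order."""
    seen = []
    for text in italic_sections(html):
        if BINOMIAL_RE.match(text) and text not in seen:
            seen.append(text)
    return seen


if __name__ == "__main__":
    # Made-up HTML fragment for illustration; not taken from a real paper.
    sample = (
        "<p>We sampled <i>Drosophila melanogaster</i> and <em>E. coli</em> "
        "<i>in vitro</i>; see <i>Drosophila melanogaster</i> above.</p>"
    )
    print(italic_sections(sample))
    print(candidate_species(sample))
```

To run it on real papers, pass the result of `fetch_html(url)` to `candidate_species` for each article URL chosen from the corpus. Italic phrases such as "in vitro" are dropped only because they start with a lower-case letter, so expect to hand-check the output.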
Then move on to advanced studies
* Tabula - I am working with these people. It's a nice tool for analyzing PDF tables
* AMI - Ross and I will provide. We'll do phylo trees, choose 2-3 which work, and then get people to find others

### Content Mining Problems - what would you use it for?

Mat Todd's OSM Idea:
    https://github.com/OpenSourceMalaria/OSM_To_Do_List/issues/99
IPCC Report stuff - how many references are open access?