This is a read-only archive of pad.okfn.org. See the shutdown announcement for details.
labs-hangouts
@OKFNLabs Hangout #15 - Thursday, April 21, 2016
Agenda
- TCP-EEBO texts and related quality, freely-licensed historical text collections (tfmorris can recap EMOP & Git-Lit from Hangout #13 if he attends)
- OpenTrials with @vitorbaptista
- Open Product Data recently rebuilt and looking for a maintainer!
- C4Corpus - open licensed multi-lingual text corpus extracted from CommonCrawl
@OKFNLabs Hangout #14 - Thursday, March 17, 2016
Agenda
- New communication channels:
- https://gitter.im/openspending/chat
- https://gitter.im/opentrials/chat
- https://gitter.im/frictionlessdata/chat
- https://gitter.im/okfn/chat
- Frictionless Data Transport in Python: http://okfnlabs.org/blog/2016/03/11/frictionless-data-transport-in-python.html
- Newsletter submissions! http://okfnlabs.org/blog/2016/02/18/submissions.html
- Latest OpenSpending developments
- Other Events?
@OKFNLabs Hangout #13 - Thursday, February 25, 2016
Agenda
AOB:
https://github.com/okfn/ideas/issues/41
(From tfmorris) Example of Google search which finds an image-only PDF:
https://www.google.com/search?q=mupak+site%3Abitsavers.org
also cached version contains first 20 pages of text OCR'd from images in doc:
http://webcache.googleusercontent.com/search?q=cache:rsAU3aq64YQJ:www.bitsavers.org/pdf/dec/pdp15/DEC-15-GXZC-D_MUMPS_Apr72.pdf+&cd=1&hl=en&ct=clnk&gl=us
(not sure who authored the line below - Etherpad seems to be, shall we say "lacking", in the provenance department)
ContentMine (whom I am associated with) are involved in software development http://contentmine.org/software/
Present
@OKFNLabs Hangout #12 - Monday, November 23, 2015
Agenda
- Introduction
- Who is here, what they are doing
- Questions/Discussion
- Floor for projects
- Labs Outcomes and Outputs: https://discuss.okfn.org/t/open-knowledge-labs-outcomes-and-outputs/1594
- What people are working on?
- Diane: In Montreal we are concerned about all the irritants associated with the open data available.
- We are planning an important activity on this theme during OKFestMTL, 12 April 2016, a co-located event of WWW2016.ca
- So we are very interested in contributing to the "frictionless data ecosystem" (mainly about corruption-related data)
- Oleg: I've recently published an open source tool for running hackathons http://dribd.at and would be happy to answer any questions. Meanwhile I'm helping with preparations for the relaunch of the Swiss OGD portal, in particular by putting together learning materials based on the open data handbook. Bringing the School of Data to Switzerland http://make.opendata.ch/school is another one of my goals for next year, and running a workshop around #okfnlabs tools has been on my mind, would be happy to hear suggestions and any experiences/work you are doing re: data literacy.
- Share use cases?
- Working together
- Scheduling December Hangout
- Shared links:
Participants
Please add your name and email/twitter/website here
Notes
* Daniel indicated he was interested
* David indicated he was interested in contributing
Types of activities:
*
@OKFNLabs Hangout #11 - Thursday, July 16, 2015
- Communication, collaboration, inspiration.
- Goals
- Run at least 4 Hangouts in 2015
- Document the process of organising the Hangouts
- Notify the lists
- Set up the recording
@OKFNLabs Hangout #10 - Thursday, May 14, 2015
Agenda
Part 1: 30m - projects and updates show and tell - add your latest project news here!
Part 2: 20m - current issues and operational updates from Open Knowledge Labs
Participants
Please add your name and email/twitter/website here
Notes:
Anders Pedersen, NRGI (Natural Resource Governance Institute)
http://www.resourcegovernance.org/draft-eiti-data-visualization
- Great to present as work at NRGI connects with OpenSpending and Data Packages
- Extractives data space - income and spending re mining, oil, gas etc
- Increasing amount of data but not very standardized - hard to compare and understand
- What do we need to make meaningful stories at the national level - looking at Data Packages as a lightweight solution here to help out
Potential datasets to develop as data packages:
- World Bank governance indicators straight in so they could be compared to e.g. extractives situation
- Also commodity prices [ed: cf https://github.com/datasets/registry/issues/5 - commodity prices]
- Blog series on falling commodity prices: http://www.resourcegovernance.org/issues/commodity-prices
Paul Walsh
- OpenSpending - running since 2010
- Discussion about re-architecting the technical platform and formalising a work plan for next year focused on an MVP
- Looking for stories, use cases, especially things you cannot do today with OpenSpending (see forum link above)
- Seeing exponential growth,
- Standardising on the Open Spending Data Package and improving ETL & API are goals in the new architecture
- Maximizing the speed with which users go from uploading data to a working visualisation is a priority
- Comparative analysis between datasets/countries is the holy grail, relies on quality of data, see https://discuss.okfn.org/t/what-is-openspending/265
180 datasets in Japan, several dozen in DE, ...
Sneak previews SpenDB:
What was the point about Texas? (Steve Adcock) Steve: Textus; we'll table it until next time.
Oleg
- Running an Open Research Data hackathon in Switzerland on June 5 & 6 http://make.opendata.ch
- Part of this will be a data literacy / data quality workshop with a focus on science data, follow @sodacamper for info
- Any ideas for sciency datasets or Labs tools we should try to use? I can provide some of these from the oil and gas industry. One example (3D viz and data analysis): www.opendtect.org
Matthew
- Working on http://givemetext.okfnlabs.org/
Roadmap
- Vision of Labs is not just about a set of projects:
https://docs.google.com/a/okfn.org/document/d/1vZQaEQ6Rm8eFk54XBN9KhBGQE1KomesFJg3kRalO4AU/edit?usp=drive_web
- What technology we think should exist
- Why we want it
- What, specifically, Open Knowledge (Labs) should focus on
- Database, wrangling tooling, ability to collaborate around data
- A common outlook even if we are working on individual projects
- We are a loose collective, people can come and work on what they like
- "Not the architecture of a house - more like the layout of a park"
- Not producing an ecosystem, but helping to shape one - an artist collective, with shared principles
Community charter
- How to get the community to participate more (newsletter, local leads, ...)
- People should feel bold about sharing with the list
- Tools to help with their own data wrangling projects
- What would be really exciting and fun for people getting involved with us?
- Interface with the OpenDataHandbook.org
- Set up mentorships between members, and with wider community
A couple of ways we might approach this (I can't figure out how to verbally connect on the hangout):
- tiered giveaways (online stickers, bumper stickers, t-shirts) for various levels of participation
- commercial backing for developers - the ability to sell feature add-ons to the open source software Labs builds
- our business model at dGB (www.dgbes.com) attempts to merge this open source + commercial approach
@OKFNLabs Hangout #9 - Thursday, April 16, 2015
Agenda
- Part 1: 30m - projects and updates show and tell
- Part II: Labs development and management discussion
- Labs sysadmin - how we manage
- Provide routes in for contribution
- Reaching out to other lists and connecting with Open Knowledge local groups
- Africa Open Data and IBM led possibilities
Participants
- Please add your name and email/twitter/website here
- Oleg Lavrovsky / @loleg / http://make.opendata.ch/?lang=en
- Rufus Pollock / @rufuspollock / @rgrp - http://okfnlabs.org/members/rgrp/
- Matt Fullerton / @mattfullerton
- Daniel Fowler / @danfowler
- Miroslav Schlossberg / mschlos@gmail.com / @darvvon / @schlos
- Alex Peek / alexpeek1@gmail.com
- Kristin Antin / kristin@theengineroom.org / @kjantin
- Mike Chelen / michael.chelen@gmail.com / @mikechelen
- Tom Morris / @tfmorris
- Om Goeckermann / om.imap@gmail.com / @3kv
- Rodrigo Parra / rodpar07@gmail.com / @rparrapy
Notes
- Part 1: 30m - projects and updates show and tell
- Please add your latest project news here!
- Dan: latest developments on labs website, re-rebooting product-open-data
- Labs Website
- Product Open Data
- Have dockerized and now back online and moved to s110
- Next steps: sorting the cache
- Rufus: Frictionless Data and Core Datasets project
- Oleg: tools for making hackdays impactful
- How can Labs help make hackdays / hackathons / datathons really spectacular? Are there projects we could especially feature? E.g. I'm looking for ideas for the upcoming Swiss event for Open Research Data http://make.opendata.ch/wiki/event:2015-06
- What's lacking is concise and clear guidance - use CKAN here, ReclineJS here, Data Package there
- They are great opportunities to find and connect new community champions
- I'll start a thread on the forum with some clearer use cases
- if you have any feedback from using Labs projects at your own hackathons, please share the story
- Alex: to announce the launch of http://econfactbook.org
- Statistical representation of the world economy, aggregates macroeconomic data scattered around the web
- Currently the data is maintained by hand in HTML: suggestion to look at Core Datasets (http://data.okfn.org/data ) and look at better ways of publishing the data
- Tom: 30 second OED (Oxford English Dictionary - 1st edition - 1888) update - https://github.com/tfmorris/oed
- Oxford English Dictionary: text resource as data, an initiative to scan and parse and format the open archive
- Awesome live demo - Tom has progressed a lot since last year
- Tom also doing some proper analysis of various OCR techniques
- eMOP - Early Modern OCR - http://emop.tamu.edu/ - 40 million pages in 457,000 volumes published 1475-1800
- Aside: Project GITenberg - http://gitenberg.github.io/
- Rufus: OpenSpending new roadmap
- Kristin: Responsible Data
- Part II: Labs development and management discussion
- Labs sysadmin - how we manage
- Reaching out to other lists and connecting with Open Knowledge local groups
- Can we find a Labs contact person in each Local Group
- Open Data for Africa, Steven Adler, Meetup.com - Members of the meetings are substantially placed to promote open data conceptually but lack the feet on the ground to make useful tools. Steve is a member of the W3C and helped create their open data recommendations - Om
- What is Labs about - should we have a charter
- http://okfnlabs.org/about/ - existing doc
- The about page does list all the facets of okfn; there is a cognitive gap, however. Since the value and utility of open data isn't well understood by lay people, a couple of powerful examples might help. Yet there is an even bigger conceptual question. I think analogy might be useful, along with well-designed visual reinforcement. Is anyone interested in brainstorming this topic with me? - Om
For next time
- Provide routes in for contribution
- Africa Open Data and IBM led possibilities
- Open Data Handbook update
- Data Patterns - editor wanted!
Next Hangout: mid May - thanks for joining & see you!!
Hangout #8 - Thursday 15th May 2014
Agenda:
- Please add your latest project news here!
- Open Data Maker Night London feedback
Participants
- Please add your name and email/twitter on here
- Andy Lulham / @andylolz / a.lulham@gmail.com
Hangout #7 - Thursday 20th March 2014
Hangout URL: https://plus.google.com/hangouts/_/7ecpjeapbheesbnvvgd4uj9dnc?authuser=1&hl=en-GB
Agenda:
- Project presentations (max 5 min each)
- ... please add your name and topic here if you want to present
- Current status/directions Data Package Protocols/API?
- Identification of implicit unique keys in spreadsheets (Tom Levine)
- OED first edition (Tom Morris) - https://github.com/tfmorris/oed
- Static site, using open IATI data:
Participants
- Please add your name and email/twitter on here
- Mark Brough / @mark_brough
- Stefan Urbanek / @Stiivi
- Andy Lulham / @andylolz
- Tom Morris / @tfmorris
- Thomas McNally/ @lorcanmcnath
- Tommy Levine _@thomaslevine.com / @thomaslevine
- Rufus Pollock /
- Matt Fullerton / @mattfullerton
- Neil Ashton /
- Olivia Gill /
Notes on presentations:
* Tom Morris: OED
- * has been working on OED parsing
- * also project lead on Open Refine:
- * interest from OCLC people in doing (BF?) reconciliation service
- * has been working on taking Internet Archive's raw OCR XML of 1st edition OED
- * structural work: recognizing headwords, splitting/recombining correct chunks
- * the idea: apply additional heuristics that apply knowledge of entry structure
- * then break the work down into something that proofreaders could turn into an online database
* Tom Levine (multiple Toms)
- * everyone complains about the lack of good metadata
- * one issue: without having a Data Package, we don't know what columns are unique identifiers
- * special_snowflake takes a CSV, checks whether columns have duplicate values anywhere in the CSV; if not, it's a unique key
- * also for combinations of columns
- * facilitates dataset combination and comparison
- * this has potential!
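[ed: a minimal sketch of the uniqueness check described above, in Python. This is not special_snowflake's actual code; the function name `find_unique_keys`, the `max_width` parameter and the sample data are made up for illustration.]

```python
import csv
import io
from itertools import combinations


def find_unique_keys(csv_text, max_width=2):
    """Return column combinations whose values never repeat across rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    columns = list(rows[0].keys())
    keys = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip combinations that merely extend a smaller key already found.
            if any(set(key) <= set(combo) for key in keys):
                continue
            seen = {tuple(row[col] for col in combo) for row in rows}
            if len(seen) == len(rows):
                keys.append(combo)
    return keys


# Hypothetical sample: no single column is unique, but (country, year) is.
SAMPLE = """country,year,value
FR,2012,10
FR,2013,10
ML,2012,10
ML,2013,12
"""

unique_keys = find_unique_keys(SAMPLE)
```

[ed: a combination that happens to be unique in one file is only a candidate key; a larger extract of the same dataset may contain duplicates.]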
* Mark: IATI datastore
- * IATI registry: the repository of IATI-formatted aid data
- * it's a CKAN instance
- * decentralization has advantages, but gathering together data can be a challenge
- * Mark has created the IATI Datastore to address this
- * has been working on Mali aid data viewer
- * result: static site using IATI data
- * includes both French and World Bank data
- * see GH for the source code
Open discussion:
- * Tom M: cf. data-tamer.com (inre: Tom L's work)
- * based in Cambridge (MA)
- * their work on tabular data analysis is comparable
- * also cf. latest work on Data Wrangler
- * Matt Fullerton: hope to do work on Google Sheets importer and exporter
- * on tooling around Data Packages (esp. how to pool data transformation effort into an API, which does not do too much of that sort of thing yet)
Hangout #5 - Thursday 23rd January 2014
Hangout URL: https://plus.google.com/hangouts/_/7acpj82g77bhgt70u1ttpcb220?hl=en-GB
Agenda:
- Project presentations (max 5 min each)
- Part II - general discussion
- Improved project pages
- "Join in" process - what can we improve
Participants
- Please add your name and email/twitter on here
- Andy Lulham / @andylolz a.lulham@gmail.com
- Rufus Pollock / @rufuspollock
- Peter Murray-Rust / @petermurrayrust peter.murray.rust@googlemail.com (line may be slow)
- Neil Ashton / neil.ashton@okfn.org
- Daniel Lombraña González / teleyinex@gmail.com
- Dorotea Dorotea@accorda.org
- Lane Rasberry lane@bluerasberry.com
- Tom Morris @tfmorris tfmorris@gmail.com
- Tom Steinberg director@mysociety.org - @steiny
Notes on presentations:
- SayIt (Tom S) - http://sayit.mysociety.org
- a publication platform for speeches & transcripts
- i.e. data in which someone (politicians, parliamentarians, interviewees) is saying stuff
- as a non-reusable tool, transcripts have been very popular on They Work For You
- but if anyone else wants to make transcripts available in a "nice" way (e.g. decent API, links for speeches), you'd have to do it from scratch
- thus: most transcript data is in terrible PDFs
- SayIt is first about publishing, next about authoring
- authoring part: we need more tools like this!
- to allow data publishers to produce data in a nice form
- goal: lead to good initial publications of transcript data
- secondary pedagogical tool: this is what nice data publication tools look like
- a collaboration with Latin American group SCI
- questions from audience:
- Peter asks about the PDF => SayIt pipeline
- Andy: crowdsourcing audio transcription with PyBossa?
- Daniel responds: there's a Crowdcrafting template for transcription: http://crowdcrafting.org/app/soundcloud/
- the PyBossa template can be used with Soundcloud.JS library or Popcorn.JS from Mozilla
- Crowdcrafting provides redundancy, so you can do statistical analysis on the transcriptions easily
- You can analyze the results with Enki, and publish the results on the web as an IPython notebook (see: http://daniellombrana.es/blog/2013/12/16/pybossa-enki.html)
- There is another platform built for crowdsourcing video captions, Amara.org, but I don't know whether it can be used with audio-only files.
- Tom: "BBC has done some interesting work in segmenting recordings into separate speaker tracks; Also speaker ID across multiple recordings"
- The open data standard that SayIt uses is Akoma Ntoso - here's the subset we use - http://sayit.mysociety.org/about/developers
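[ed: a minimal sketch of the kind of analysis that redundancy allows: several volunteers answer the same task, and the answers are compared. The record layout (`task_id`, `answer`) is a hypothetical simplification, not the PyBossa export format, and majority voting is just one possible aggregation.]

```python
from collections import Counter


def aggregate_answers(task_runs):
    """Group answers by task; return {task_id: (majority answer, agreement)}."""
    by_task = {}
    for run in task_runs:
        by_task.setdefault(run["task_id"], []).append(run["answer"])
    summary = {}
    for task_id, answers in by_task.items():
        answer, votes = Counter(answers).most_common(1)[0]
        # Agreement is the share of volunteers who gave the majority answer.
        summary[task_id] = (answer, votes / len(answers))
    return summary
```

[ed: tasks with low agreement are natural candidates for a second round of transcription or for manual review.]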
- Open Literature Sprint this Saturday (Rufus)
- PyBossa updates (Daniel)
- latest news: today, 100% full coverage of source code with unit tests! https://coveralls.io/r/PyBossa/pybossa
- expecting to get more code contributions
- last week: two additions for integration of PyBossa into Facebook
- pull requests must be discussed; the two requests are very similar (but independent)
- OpenLibrary Internet Archives (Tom M)
- a client project that a library system in Cali got a grant for
- goal (seems simple, but actually complex):
- add e-pubs from Internet Archive and OpenLibrary into catalogue
- make available to patrons for download
- start with a set of classics; compare with existing usage; then add completely new titles
- OCR quality generally sucks and is highly variable (from unreadable to pretty good)
- dozens of editions; each edition might have several scans with varying OCR quality
- created parser for ABBYY Finereader files to sort scans by OCR quality
- sorted list used to pick edition to use in catalogue
- about to start the user study; just about to go live
- Rufus asks: is this relevant to OED stuff they were doing earlier?
- work is similar in that it needs to parse the AF format; different focus, though
- there aren't multiple scans for the OED (or are very few)
- info that was needed from OED: spatial cues for parsing out dictionary entries
- Tom S asks: what's the problem the project solves? what's the use case?
- presumably, the project's genesis is: libraries are increasingly marginalized
- they're trying to re-engage their user population
- their clientele is into e-pubs, so they want to find e-pub sources that are free but of good quality
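[ed: a minimal sketch of sorting scans by OCR quality. The notes do not say which quality measure Tom's parser uses; scoring each scan by its mean per-character confidence is an assumption, as are the function name and the input layout (scan id mapped to a list of confidence values already extracted from the OCR output).]

```python
from statistics import mean


def rank_scans(scans):
    """Return scan ids ordered from best to worst mean character confidence."""
    # Scans with no recognised characters cannot be scored, so they are dropped.
    scored = {scan_id: mean(conf) for scan_id, conf in scans.items() if conf}
    return sorted(scored, key=scored.get, reverse=True)
```

[ed: the first id in the returned list would be the edition/scan to put in the catalogue.]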
- Frictionless Data (Rufus)
- major developments on F.D. front
- chatted with Max Ogden, Sebastian B(??)
- now have a roadmap that shows the goals and how to get involved
- the simple pitch:
- it's hard to get data and ship it around
- we can package up data a bit better and integrate it into existing tools
- R, Excel, Google Docs, ...
- ways to easily publish and consume data
- solution (per roadmap):
- simple standards for common data types (tabular, geodata)
- super-specific use case, requested by Tom S:
- say you're interested in house prices
- you want real house prices, not nominal ones
- you dig up data, find some shitty Excel file, join it with some other sheet...
- if you have the data packaged up, you should be able to do all this **without writing a bespoke app**
- there's also the related push to produce nice versions of some core datasets
- COFOG, postal codes, ...
- reference / indicator datasets
- Data Package spec has advanced
- foreign keys
- data package schema
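[ed: a worked example of the house price use case above: turning nominal prices into real ones is a join against a price index followed by one division. All numbers are invented, and the function and variable names are illustrative only.]

```python
def to_real_prices(nominal, price_index, base_year):
    """Deflate nominal prices into base-year money using a price index."""
    base = price_index[base_year]
    return {
        year: price * base / price_index[year]
        for year, price in nominal.items()
        if year in price_index  # years without an index value are skipped
    }


# Made-up figures, for illustration only.
nominal_prices = {2000: 100_000, 2010: 180_000}
cpi = {2000: 80.0, 2010: 100.0}
real_prices = to_real_prices(nominal_prices, cpi, base_year=2010)
```

[ed: the point of the pitch is that with both tables packaged up and described, this join is a few lines in R, Excel or Python rather than a bespoke app.]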
Notes on "part 2" (general discussion):
- projects page updates
- revamped largely through Oleg's efforts
- things we were keen to add:
- filters
- tab for "help wanted"
- geared around "user journeys":
- Rufus says:
- more work to do, but looks great so far
- goal: a better way to see what projects are going on (whether for participants or users)
- quite a few projects in Issues to add
- would be good to flesh out "help wanted" with docs on what you can do
- "join in" process
- we had lots of discussion last time:
- not super friendly or obvious how to join in in a gradual way
- we could have more frequent, informal hangouts to give an intro to would-be participants
- Neil is helping to partition the mailing list:
- some subscribers might just want the newsletter
- vs. those interested in the minutiae of the discussion list
- another idea: make it easier to run local Open Data Maker Nights
- also to report that you're running one
- people also seem to want to ask permission
- "of course!" is the right answer, but this isn't obvious
- how can we make it clearer?
- Andy: advantage to this is that you then know that they're doing it...
- Tom M: the "help wanted" tab is a good start
- the level of exposition on the Frictionless Data page shows a lot of effort on behalf of newcomers
- you might want to add: reflect the fact that people do have different roles (translator, artist, programmer, ...)
- link to GDG Pulse: https://developers.google.com/groups/pulse/
- cf. Meetup API; one could pull stats
- Peter's comments:
- this hangout was very useful!
Random links:
https://github.com/okfn/ideas/issues/50
http://timemapper.okfnlabs.org/okfn/open-knowledge-foundation-history
########################################################################################
########################################################################################
Hangout #4 - 17th December 2013
Hangout URL: https://plus.google.com/hangouts/_/72cpi7e0sgova91dat5gg9paqg?hl=en-GB
Agenda:
- Project presentations (max 5 min each)
- Part II
- Projects page hack session
- Open Data Maker meetup maker
- PDF Wrangling?
Participants
- Please add your name and email/twitter on here
- Mark Brough / @markbrough
- Justin York / @justincyork / justincyork@gmail.com
- Rufus Pollock / @rufuspollock rufus.pollock@okfn.org
- Mike Chelen / @mikechelen
- Steve Adcock / gsahasdata@gmail.com
- Daniel Lombraña / @teleyinex teleyinex@gmail.com
- Karen Brzezinska kfbrzezinska@gmail.com
- Andy Lulham / @andylolz
- Thomas Levine / @thomaslevine / _@thomaslevine.com
- John Levin / @anterotesis
- Neil Ashton / neil.ashton@okfn.org
## Notes
Scribe: Andy, Rufus, Neil ...
* Daniel on Enki
- * allows you to analyze Crowdcrafting / PyBossa applications
- * although data is downloadable in JSON & CSV, you have to roll your own tool to work with it
- * see blog post for more details
- * Enki uses Pandas and IPython
- * CC results loaded into Pandas dataframes: nice table view, graphs, ...
* Rufus: Data Package spec & Data Explorer - http://explorer.okfnlabs.org/
* Steve: project management proposal
- * idea: to get a better handle on OKF Labs itself
- * question: what is the workflow like?
- * Mark responds: no, it's a loose and informal group to share work and seek advice / support
- * very loose, not an OKFN core thing
- * the way projects are presented is too complex for newcomers
- * one way organizational efficiency can be improved:
- * do a better job of managing both volunteers and projects (Yes)
- * suggestion: adopt the ubiquitous model of the "stage gate process"
- * (the doc walks through the stage gate process)
- * a stage gate-style system could be (semi-)automated
- * Mark: this relates to our reorganization of the projects page... re-open discussion in part II (I agree)
### Part II - Labs Coordination
* finding a time after Xmas for people to get together and hack on the projects page
- Projects page: https://github.com/okfn/okfn.github.com/issues/46
- Join in process: https://github.com/okfn/okfn.github.com/issues/68
Steve: Search by keyword and category needed (ref back to need for project management data model).
A short list at the beginning of all categories for projects would be useful as well as the prominent search bar. This would look good as a bubble map of projects and that should work well with the project data model approach.
A clear indication of coding methods and software is needed.
Category: Functionality is needed i.e. what sort of things do the labs tools allow me to do (as a hacker)
Graphic workflows (maps of processes) with click-throughs to the oklab tool(s) suited to that problem (as per Kati's suggestion)
(per Kati) having a tracked history of your path of search is very useful and I've seen excellent ideas of simple highlighting of a site tree for this purpose.
Excellent data model questions!! As far as display, we can always just pull what we want from the data model. Basic project attributes (i.e. attributes of category "projects"): Data originated, originator, Life cycles attributes (idea, startup, ..., abandoned, on hold, etc.), volunteers attached to project, etc.
Step one in this process would normally be to ask each client group what their questions were and then incorporate those into the data model. We have more knowledge of this though and should be able to build the straw model very quickly. I'll create a spreadsheet with a column for each of the project management data attributes and post it on the Google Drive this week.
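[ed: a minimal sketch of the project data model discussed above, with the keyword and category search Steve asks for. Field names are assumptions; only the life-cycle stages actually named in the notes are listed, and the stages elided with "..." are left out.]

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Only the stages named in the notes; others were elided there.
    IDEA = "idea"
    STARTUP = "startup"
    ON_HOLD = "on hold"
    ABANDONED = "abandoned"


@dataclass
class Project:
    name: str
    originator: str
    stage: Stage = Stage.IDEA
    categories: list = field(default_factory=list)
    volunteers: list = field(default_factory=list)


def search(projects, keyword=None, category=None):
    """Filter projects by a keyword in the name and/or by category."""
    results = []
    for project in projects:
        if keyword and keyword.lower() not in project.name.lower():
            continue
        if category and category not in project.categories:
            continue
        results.append(project)
    return results
```

[ed: a spreadsheet with one column per attribute, as proposed above, maps directly onto this structure.]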
Easy:
* Make it clear how to add a project
Kati: I want to learn
Example: http://devtracker.dfid.gov.uk/countries/BD/projects/ (project layout)
=======================================================================
Hangout #3 - 19th November 2013
- When: Tuesday 19th November. 1700-1800 GMT (1200-1300 EST, 1800-1900 CET)
- Where: Google Hangout - we’ll announce link on the day on the list and IRC (#okfn)
Hangout url: https://plus.google.com/hangouts/_/72cpilc1shmpsr2q3u8lus8ikc?hl=en-GB
Agenda
- 1700-1730: Project and Idea presentations (max 5m each)
- please add your name and topic here if you want to present
- Andy Lulham - datapipes update
- Daniel Lombraña - Crowdcrafting/PyBossa
- Oleg Lavrovsky - Open Access/Data Button
- Vítor Baptista - QueremosSaber/AdoteUmPedido
- Mark Brough - Philippines project browser
- Rufus Pollock - data.okfn.org
- Alex Peek - Global Economic Map
- 1730-1800: general discussion including next steps for labs website
Participants
- Add yourself here - name plus email and/or twitter and/or irc nick (on #okfn)
- Rufus Pollock rufus.pollock / @rufuspollock / @rgrp
- Oleg Lavrovsky / @loleg
- Mark Brough / @mark_brough
- Tarek Amr / @gr33ndata
- Andy Lulham / @andylolz
- Daniel Lombraña / @teleyinex
- Neil Ashton / neil.ashton@okfn.org / nmashton
- Vítor Baptista / @vitorbaptista
- Tod Robbins / @todrobbins
- Alessio Dragoni / @groundrace alessio.dragoni@gmail.com - i need your email to add you :( didn't make it.. (hope next time)
- Friedrich Lindenberg / @pudo
- Tom Morris / @tfmorris
- Anders
- Alex Peek - alexpeek1@gmail.com
- Günther Burow - @guntherburow - gunther.burow@gmail.com
## Scribing
### general discussion
* Rufus throws out an idea for improving the site:
- - an automatic way to boot an open data maker night
* Oleg: summary of projects conversation
- - there are projects going on all around the world;
- - some are central to the OKF mission (e.g. CKAN, OpenSpending)
- - but Labs really exists to promote and kickstart the little guys, right?
- - has been thinking about how other OSS communities achieve balance between upkeep on old projects and promotion of new ones
* Rufus adds:
* Tod suggests: categories
- - Rufus: what categories?
- - Tod: what about a classifieds type page on Labs where we can list needs for collaboration like, "I want to do x but I need help with y"
- you know, like the cork boards where people list that they need a drummer for their band
- * Mark: multiple select / filter for languages, technologies
- * Oleg: there are fundamentally different types of projects
- - those that enable other projects vs. those that are specific and local
- - how do we flag projects that do or do not need contributors? (e.g. "needs assistance" category)
- * Andy: Oleg was looking at the Github API, and you can draw out interesting stuff about how active a project is
- - you can probably automatically answer some of these questions (e.g. a heuristic for "needs help")
- - stars and forks only go in one direction - I agree they're not useful measures for activity
- PRs in the last week or something like that is a much better way to tell
- * Mark: even a nice thing on activity of different repos tracked by Labs would be useful
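[ed: a minimal sketch of the "PRs in the last week" heuristic. It assumes the pull requests have already been fetched from the GitHub API as a list of dicts carrying an ISO 8601 `created_at` timestamp; the function name and the 7-day default are illustrative, and no network call is made here.]

```python
from datetime import datetime, timedelta, timezone


def recent_pr_count(pull_requests, now, days=7):
    """Count pull requests opened within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    count = 0
    for pr in pull_requests:
        # Timestamps are expected in the form 2013-11-18T10:00:00Z (UTC).
        created = datetime.strptime(
            pr["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created >= cutoff:
            count += 1
    return count
```

[ed: unlike stars and forks, this number can go down again, so it says something about current activity rather than accumulated interest.]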
* Rufus: are there any simple wins, just looking at the projects page?
- - what do we want people to do when they come to the page?
- * Oleg: projects need to be made more actionable; right now it's just a showcase
- - the choice to contribute must be put right in front of users
- - fundamentally, # of stars and forks is a good basic metric for activity
- - automated tools would be interesting, but project owners should be able to declare the status of projects
* Oleg asks: how many are working on Labs projects professionally?
- * Daniel is
- * Rufus says: in general, Labs projects are prototypes or experiments rather than commercially developed software
- * Oleg: would be good to nail down what's Labs and what's OKF
- * Mark: the lines are blurred (e.g. the Philippines project is a mix of both)
- * Rufus: things are listed on the Projects page because they're done by Labs members *and* have a Data-related theme
- - not generally projects that OKF central is paying anyone to work on
- - criterion: community projects, not being paid for by OKF
* Oleg: where OKF Labs can shine, even at the local level
- - by showing that the community can provide a more robust developer base for projects
- - people outside OSS ask: what happens when the developer goes away?
- - OKF Labs activity shows: other community members can step in
* Mark: major action items
- - projects page
- - categories will help work out how things fit in
- - maker night automation
* Rufus and Oleg on how maker night applies to local chapters
* Oleg asks: is Labs a working group?
- - is a maker night different from an OpenSpending hack night?
- * Rufus: similar to a working group in that it's an informal gathering & focused on making
- * Mark highlights focus on making stuff
* Rufus: is anyone up for taking on making the Projects page better?
- - Oleg has volunteered to take the lead, but other collaborators are welcome
- - jump in, fork the repo, and do stuff!
* Tom:
- - would be useful to step outside the organization and look at it as an outsider
- - didn't realize until recently that OKFN != OKF (thought "FN" == "Foundation"; didn't realize "N" == "Network")
- - subtle internal distinctions are obscure from outside
- * Rufus: would that matter to most people coming to the website? do they need to know?
- * Tom: assumption was that Labs was like Google Labs: experimental OKF projects
- - didn't realize it wasn't necessarily affiliated with paid OKF staff
### individual presentations / show-n-tell
* Andy on Data Pipes
- - http://datapipes.okfnlabs.org/
- - Main task: replaced higher order functions with node.js transforms
- - parse and render are now streaming operations
- - mocha tests
- - various bug fixes
- - strip, tail, replace operators
- - optimist option parser; lots of new options added e.g. grep (matching command line / csvkit)
- - variable substitution
- - CLI
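[ed: a minimal sketch of what "streaming operations" means here, using Python generators. Data Pipes itself is written for node.js; this is not its code, and the behaviour given to each operator below is a guess based on its name.]

```python
import re
from collections import deque


def strip(lines):
    """Remove surrounding whitespace from every line."""
    for line in lines:
        yield line.strip()


def grep(lines, pattern):
    """Keep only lines matching a regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def replace(lines, old, new):
    """Replace a literal substring in every line."""
    for line in lines:
        yield line.replace(old, new)


def tail(lines, n):
    """Keep the last n lines; only n lines are buffered at any time."""
    yield from deque(lines, maxlen=n)


def run(lines, *steps):
    """Chain (operator, *args) steps lazily and collect the result."""
    for operator, *args in steps:
        lines = operator(lines, *args)
    return list(lines)
```

[ed: each step pulls lines from the previous one on demand, so nothing except `tail`'s small buffer holds more than one line at a time.]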
* Daniel: Crowdcrafting
- - http://crowdcrafting.org/app/MM_TweetClicker/ - last week, Patrick Meier used CC to classify tweets related to the Philippines typhoon disaster
- - http://crowdcrafting.org/app/frackfinder_tadpole/ : map of history of fracking in the US
- - # of registered users is up to 3,500 (but doesn't mean too much in terms of participation; research is being done on this)
- - now more answers than tasks!
- - Daniel has been in contact with another group - epicollect - about trash in the streets in Spain
- - rgrp shouts out Daniel for becoming a Shuttleworth fellow
* Oleg: OA / data button
* Vítor: QueremosSaber
- - a site to create FOI requests; made a couple years ago http://www.queremossaber.org.br/
- - were having problems with it: federal govt are not accepting FOI requests through e-mail anymore
- - they've made their own system and are pushing it for local govts
- - QS is not that useful because there are many places where you simply can't use it (they're not accepting requests)
- - idea: create an API for the govt system
- - instead of a REST API, an **email API**
- - whenever an email is received at a certain address (e.g. ministryofhealthcare@queremossaber.org.br), a crawler would grab the appropriate data and create the FOI request in their system
- - still under development
- - solves some interesting problems:
- - in Brazil, FOI requests can't be created anonymously
- - some public servants don't want their names to be associated with FOI requests (they fear the consequences)
- - so: this API can be used to create an anonymization layer (if the service promises not to keep logs on who uses the system)
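[ed: a minimal sketch of the receiving side of such an email API, using Python's standard `email` package. The project was still under development, so this is not its code: the function name and record layout are assumptions, and the step where a crawler files the request in the government system is not shown.]

```python
from email.message import EmailMessage
from email.utils import parseaddr

SERVICE_DOMAIN = "queremossaber.org.br"  # domain from the example address above


def build_request(message):
    """Turn an inbound email into an FOI request record without the sender."""
    _, address = parseaddr(str(message["To"]))
    local_part, _, domain = address.partition("@")
    if domain.lower() != SERVICE_DOMAIN:
        raise ValueError("not addressed to the FOI email API")
    return {
        # The local part of the address selects the target agency.
        "agency": local_part.lower(),
        "subject": str(message["Subject"]),
        "body": message.get_content().strip(),
        # The From header is deliberately not copied: this is the
        # anonymisation layer, provided the service also keeps no logs.
    }
```

[ed: anonymity here depends entirely on the operator's promise not to log, as noted above; the code only shows that the sender need not travel with the request.]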
* Mark: Philippines project
- - the idea for http://markbrough.github.io/philippines/ :
- - create a small static site based on IATI data from the IATI datastore
- - goal: see what data is available on Philippines projects
- - shows what's available and what we'd like to see from other donors
- - rgrp brings up Tanzania scraping...
- - idea: get a current budget for Tanzania (or any developing country) and put it together with aid data
- - point: to highlight the problem with aid data not being published in a way that's useful for constructing budgets
- - sort of like the old project on Uganda, but *not* based on a static dataset like that site was
- - two more datasets to wrangle for this one
* Alex: Global Economic Map
- - this is a collection of standardized datasets from govt publications, SEC filings
- - goal: fully integrate project with Wikidata
- - major parts of project will be:
- - GDP by industry
- - employment by industry
- - top 10 corporations w/statistics on each
- - fiscal budget (similar to OpenSpending): revenues as % total, expenses as % total; monetary budget
- - point: centralizing statistics to make them easier to access
* Rufus:
- data.okfn.org: the idea of "frictionless data", making it as simple as possible to get the data you want into the tool of your choice (see the about page)
- a small amount of high-quality data will be hosted here
- - also tooling and simple data standards
- - standards: simplifying the pipeline of converters (etc.)
- - talking with Max Ogden about Dat and connection with frictionless data ecosystem
- - what we want: data package management;
- - recent idea: running on **npm**
- - indeed, data packages are based on Node packages
- - in progress: dpm
- - works at a very basic level; is basically an import of npm
- - some tooling around data packages now exists
- - data packages can be published:
- - put up a data package on github, then use the data.okfn.org Community area to get a nice view (which can be shared, etc.)
- - publish them to S3, wherever...
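[ed: a minimal sketch of a data package descriptor, the `datapackage.json` file that plays the role `package.json` plays for an npm package. The package name, file path and fields are invented, and the layout follows the Data Package spec as the editor understands it; check the current spec before relying on it.]

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a descriptor for a package holding one tabular CSV resource."""
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [
                        {"name": field_name, "type": field_type}
                        for field_name, field_type in fields
                    ]
                },
            }
        ],
    }


# Hypothetical package, for illustration only.
descriptor = make_descriptor(
    "house-prices",
    "data/prices.csv",
    [("year", "integer"), ("price", "number")],
)
descriptor_json = json.dumps(descriptor, indent=2)
```

[ed: writing `descriptor_json` to `datapackage.json` next to the CSV in a GitHub repo is the kind of package the Community area mentioned above is meant to display.]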