This is a read only archive of pad.okfn.org. See the shutdown announcement for details.

CKANCon: Topics for technical working group discussions
---------------------------------------------------------------

Attendees:

Summary by Ian:
    
1. Multilingual metadata
Various options were discussed including serving all languages together (open.canada.ca approach), serving English plus a set of translations (upcoming EC portal approach) and serving a single language with an optional global table of translations (ckanext-multilingual approach). Some strongly held positions were taken but no clear solution emerged.
2. ETL Tools
Joel presented a number of tools and soon-to-be-released extensions for using Drake, OpenRefine, Pentaho Kettle and FME with CKAN resources. Joel also suggested building a community around ETL scripts for CKAN; for example, Accela has created connectors for loading business permits into CKAN.
3. Deploying CKAN, packaging extensions and themes
Installing CKAN from source is a very long, manual process. Many users have built systems to simplify it, including Puppet, Azure and Amazon images, ckan-docker and custom tools such as datacats and ckan-multisite. Making CKAN easier to install, and making development environments easier to share, should be a priority to help grow our community.
4. Visualizations and Resource Views
A number of JavaScript libraries and integrations with third-party tools were discussed, including Plotly, Tableau and the ODI web view. Some participants felt strongly that we should leverage existing tools and services instead of building our own. Concerns were expressed about directing CKAN users to third-party services that track users and may even require a separate log-in.
5. Creative ways to accelerate/fund high-priority ideas-and-roadmap issues
US Open Data has funded some open source projects: ckan-exporter (now part of ckanapi) and ckan-multisite. Setting bounties on issues was discussed, as well as a Gittip or Kickstarter model.
6. Scheming extension
Ian spoke more about the future of ckanext-scheming. Integration with the Solr search index is a priority. Group and organization support is planned. Creating a registry of schemas was discussed, as well as sharing schemas at a field-by-field level. Adding versions to schemas is planned to support schemas changing over time.

We ran short of time to discuss the other topics raised. David suggested having an architectural meeting every two months in place of one developer meeting to encourage more input on these topics.



Please add a +1 if you're interested, or add your name if there is a topic whose discussion you would like to help lead.

# Discussed

1. Multilingual metadata +1 +1 +1 +1 +1

2. Extract, Transform, Load (ETL) tools +2 +1 +1 +1

3. Deploying CKAN, packaging extensions and themes +1 +1 +1 +1

4. Visualizations and Resource Views +1 +1 +1

5. Creative ways to accelerate/fund high-priority ideas-and-roadmap issues +1 +1 +1

6. Scheming +1 +1 +1

7. Core CKAN development directions +2 +1

# Outstanding


Multilingual Metadata
* Ian has rules in the scheming extension that replace language-specific fields with a JSON blob such as {"en": "River data", "fr": "Fleuve data"} (BCP 47 language tags as keys).
  * In CKAN core, package_show could take an option indicating that the client is happy to receive these multilingual fields.
* Another approach (pan-EU data portal): 'en' is the default, and other languages are stored in an extra field or a separate table.
The problem is how the dataset is presented through the API: if the title string is simply replaced with a JSON blob, clients that don't understand it will break.
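One way to reconcile the JSON-blob storage with existing API clients is to collapse each blob to a single string unless the client opts in. The sketch below assumes a hypothetical `accept_multilingual` flag standing in for the package_show option floated above (it is not an existing CKAN parameter), and a hypothetical list of multilingual fields. The language fallback (exact tag, then primary subtag, then 'en', then any value) is a simplification of BCP 47 matching.

```python
from typing import Any, Dict

# Hypothetical set of fields stored as multilingual JSON blobs,
# e.g. {"en": "River data", "fr": "Fleuve data"}, keyed by BCP 47 tags.
MULTILINGUAL_FIELDS = {"title", "notes"}


def pick_language(blob: Dict[str, str], lang: str, default: str = "en") -> str:
    """Return the best single-language value from a multilingual blob."""
    if lang in blob:
        return blob[lang]
    # Simplified fallback: regional tag such as "fr-CA" -> primary subtag "fr".
    primary = lang.split("-")[0]
    if primary in blob:
        return blob[primary]
    if default in blob:
        return blob[default]
    # Last resort: any available translation (empty string if none).
    return next(iter(blob.values()), "")


def present_dataset(dataset: Dict[str, Any], accept_multilingual: bool,
                    lang: str = "en") -> Dict[str, Any]:
    """Shape a dataset dict for an API client.

    Clients that opt in (hypothetical flag) receive the raw blobs; all others
    receive plain strings, so existing clients keep working unchanged.
    """
    if accept_multilingual:
        return dict(dataset)
    shaped: Dict[str, Any] = {}
    for key, value in dataset.items():
        if key in MULTILINGUAL_FIELDS and isinstance(value, dict):
            shaped[key] = pick_language(value, lang)
        else:
            shaped[key] = value
    return shaped


stored = {"name": "river-data", "title": {"en": "River data", "fr": "Fleuve data"}}
print(present_dataset(stored, accept_multilingual=False, lang="fr-CA"))
# {'name': 'river-data', 'title': 'Fleuve data'}
```

The design keeps the stored form multilingual and makes the single-language view the default, which is the backward-compatible choice for clients that expect `title` to be a string.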

ETL Tools
* ? existing tool, open source
* Also using Drake: a script that can run curl, apply transforms and OpenRefine recipes, and upload to CKAN
* Has also written connectors for Pentaho (an open source counterpart to FME); these are not open source yet, with a target of open sourcing them in summer 2015
* It would be good to build a community around ETL scripts, especially for standard enterprise solutions widely used in government. As a client, government could perhaps require vendors to provide CKAN open data connectors.
* e.g. Accela created connectors to extract business permit data (among other things) and load it into CKAN
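The connectors above all end with the same load step. As a rough, tool-agnostic illustration (not Drake, Pentaho or Accela code), the sketch below extracts rows from CSV, normalises them, and builds a data dict for CKAN's `datastore_create` action, which is POSTed to `/api/3/action/` with the API key in the `Authorization` header. The sample permit data, column names and resource id are hypothetical.

```python
import csv
import io
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical extracted input; a real pipeline might fetch this with curl.
RAW_CSV = """permit_id,issued,type
P-001,2015-01-05, Building
P-002,2015-02-11,electrical
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Transform: trim whitespace and lower-case the permit type."""
    return [
        {
            "permit_id": r["permit_id"].strip(),
            "issued": r["issued"].strip(),
            "type": r["type"].strip().lower(),
        }
        for r in rows
    ]


def build_datastore_payload(resource_id: str,
                            records: List[Dict[str, str]]) -> Dict[str, Any]:
    """Load, step 1: data dict for the datastore_create action."""
    return {
        "resource_id": resource_id,
        "primary_key": "permit_id",
        "fields": [
            {"id": "permit_id", "type": "text"},
            {"id": "issued", "type": "date"},
            {"id": "type", "type": "text"},
        ],
        "records": records,
    }


def post_action(site_url: str, action: str, payload: Dict[str, Any],
                api_key: str) -> Dict[str, Any]:
    """Load, step 2: POST the payload to the CKAN action API."""
    req = urllib.request.Request(
        f"{site_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be along the lines of `post_action("https://ckan.example.org", "datastore_create", build_datastore_payload("<resource id>", transform(extract(RAW_CSV))), "<api key>")`, with the site URL, resource id and key supplied by the portal.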

Deploying CKAN, packaging extensions and themes
* David described installing CKAN, extensions and their dependencies from source: very manual and long-winded
* David talked about a Puppet install (Data.gov.uk To Go). Puppet scripts are hard and slow to write.
* Ian: ckan-multisite lets you distribute a source directory containing the extensions and other components that go with a site, plus its .ini file, so you can run exactly what you want
   * uses Git submodules
* CKAN Azure image (although it is only CKAN 2.1)
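Part of what multisite-style tooling distributes is the per-site .ini file. The sketch below renders a minimal `[app:main]` section with Python's `configparser`; it is not how ckan-multisite itself works, and a real CKAN config carries many more settings (server, logging, storage). The helper name and example values are hypothetical; the keys shown (`use`, `ckan.site_id`, `ckan.site_url`, `sqlalchemy.url`, `solr_url`, `ckan.plugins`) are standard CKAN options.

```python
import configparser
import io
from typing import List


def render_site_ini(site_id: str, site_url: str, db_url: str,
                    solr_url: str, plugins: List[str]) -> str:
    """Render a minimal, hypothetical per-site CKAN .ini as text."""
    # interpolation=None so values containing '%' (e.g. passwords) are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config["app:main"] = {
        "use": "egg:ckan",
        "ckan.site_id": site_id,
        "ckan.site_url": site_url,
        "sqlalchemy.url": db_url,
        "solr_url": solr_url,
        # Space-separated plugin list, as CKAN expects.
        "ckan.plugins": " ".join(plugins),
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(render_site_ini(
    "site1",
    "https://data.example.org",
    "postgresql://ckan:pass@localhost/ckan_site1",
    "http://127.0.0.1:8983/solr/ckan",
    ["stats", "text_view", "scheming_datasets"],
))
```

Generating the file from a small per-site description keeps the site's plugin list next to the extension source it depends on, which is the pairing multisite distributes.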

Visualizations and Resource Views
* Plotly
* David: rather than Plotly/Tableau, how about D3 in CKAN, which is open source? Ian wonders what happened to the Vega extensions (from a year ago).
* Google Chart API
* Joel encourages piggybacking on existing tools such as Tableau. There is an ODI connector (Gavin and Joel created a fork); it uses a web view. Free up to 1M rows.
* Issues with using external sites: implied endorsement; registration required.
* Showcase the best resource views on the home page? ckanext-featuredviews, released last week, does exactly this.
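Piggybacking on an external tool usually means embedding its published page as a resource view. The sketch builds a data dict for CKAN's `resource_view_create` action using the `webpage_view` type, whose `page_url` field is assumed here from the bundled webpage view plugin. To address the endorsement and tracking concerns, it refuses anything outside a hypothetical allowlist of hosts the portal has chosen to embed; the allowlist and helper name are illustrative, not part of CKAN.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Hypothetical allowlist: external hosts the portal operator has agreed to embed.
APPROVED_HOSTS = {"public.tableau.com", "plot.ly"}


def webpage_view_payload(resource_id: str, title: str, page_url: str,
                         description: Optional[str] = None) -> Dict[str, Any]:
    """Build a resource_view_create data dict embedding an approved external page."""
    parsed = urlparse(page_url)
    if parsed.scheme != "https" or (parsed.hostname or "") not in APPROVED_HOSTS:
        raise ValueError(f"refusing to embed {page_url!r}")
    payload: Dict[str, Any] = {
        "resource_id": resource_id,
        "view_type": "webpage_view",
        "title": title,
        # View-specific field (assumed from the webpage_view plugin's schema).
        "page_url": page_url,
    }
    if description:
        payload["description"] = description
    return payload
```

Keeping the allowlist in the portal's own code makes the choice of which third-party services to endorse an explicit, reviewable decision.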

Creative ways to accelerate/fund high-priority ideas-and-roadmap issues
* US Open Data (USODI) has funded some projects, e.g. ckan-exporter (now part of ckanapi) and ckan-multisite
* Bounties on issues
* Gittip: not ideal
* Kickstarter

Scheming
* Extending it to groups etc. as well as packages (datasets). The required IGroupForm PR has just been merged.
* Schema registry: who's using each schema/field, and mappings between schemas for moving data between CKAN instances. Schema versions?
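To make the field-by-field mapping idea concrete, the sketch uses two abbreviated schemas in the style of ckanext-scheming (`scheming_version`, `dataset_type`, `dataset_fields` with `field_name` and `label`) and a hypothetical registry entry that maps fields from one schema to the other. The field choices and mapping are illustrative only; note that `scheming_version` describes the schema file format, not the content versioning discussed above.

```python
from typing import Any, Dict, List

# Abbreviated schemas in the style of ckanext-scheming; contents are illustrative.
SCHEMA_A: Dict[str, Any] = {
    "scheming_version": 1,
    "dataset_type": "dataset",
    "dataset_fields": [
        {"field_name": "title", "label": {"en": "Title", "fr": "Titre"}},
        {"field_name": "maintainer_email", "label": "Contact email"},
    ],
}
SCHEMA_B: Dict[str, Any] = {
    "scheming_version": 1,
    "dataset_type": "dataset",
    "dataset_fields": [
        {"field_name": "title", "label": "Title"},
        {"field_name": "contact_point", "label": "Contact point"},
    ],
}

# Hypothetical registry entry: field-level mapping from schema A to schema B.
FIELD_MAP_A_TO_B = {"title": "title", "maintainer_email": "contact_point"}


def field_names(schema: Dict[str, Any]) -> List[str]:
    """List the field names a schema defines."""
    return [f["field_name"] for f in schema["dataset_fields"]]


def map_dataset(dataset: Dict[str, Any], field_map: Dict[str, str],
                target_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Copy values across via the mapping, keeping only fields the target defines."""
    allowed = set(field_names(target_schema))
    return {dst: dataset[src] for src, dst in field_map.items()
            if src in dataset and dst in allowed}


source = {"title": "River data", "maintainer_email": "data@example.org"}
print(map_dataset(source, FIELD_MAP_A_TO_B, SCHEMA_B))
# {'title': 'River data', 'contact_point': 'data@example.org'}
```

Storing mappings per field, rather than per whole schema, lets two CKAN instances share the fields they have in common even when their schemas otherwise differ.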