
DBpediaReproducibility

Currently, the exact state of the main DBpedia endpoint is not easily reproducible by external parties. Such reproducibility would be (at least) nice to have for scientific evaluations that are based on DBpedia. It would also allow us to reduce the overall load on DBpedia: heavy hitters could be blocked more aggressively, as we could refer them to their own DBpedia clone if they need to issue too many or too complicated queries.



Requirements:
-------------
- The state of each of the future releases should be reproducible
- Necessary steps should be openly documented (revisioned in case of future changes)
- A clone should behave reasonably close to the online DBpedia endpoint of that time, including:
  - The SPARQL endpoint: http://dbpedia.org/sparql
  - Content negotiation: http://dbpedia.org/resource/...
- It should be easy to set up a clone
- A clone should be easy to isolate (Docker)
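
One way to check the two behaviours listed above (SPARQL endpoint and content negotiation) is to send the same requests to a clone and to the public endpoint and compare the answers. A minimal sketch in Python, standard library only. The helper names, the local address http://localhost:8890 and the idea of comparing per-resource triple counts are assumptions for illustration, not part of any existing tooling; whether a clone serves /resource/... under its own host depends on how it was set up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUBLIC = "http://dbpedia.org"
CLONE = "http://localhost:8890"  # assumption: where a local clone listens


def sparql_request(base, query):
    """Build a SPARQL protocol GET request against <base>/sparql."""
    url = base + "/sparql?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def resource_request(base, name, mime="text/turtle"):
    """Build a content-negotiated request for <base>/resource/<name>."""
    return Request(base + "/resource/" + name, headers={"Accept": mime})


def count_outgoing(base, resource_uri):
    """Ask an endpoint how many triples have resource_uri as subject."""
    query = "SELECT (COUNT(*) AS ?n) WHERE { <%s> ?p ?o }" % resource_uri
    with urlopen(sparql_request(base, query), timeout=60) as resp:
        data = json.load(resp)
    return int(data["results"]["bindings"][0]["n"]["value"])


def same_count(resource_uri, a=PUBLIC, b=CLONE):
    """True if both endpoints report the same number of outgoing triples."""
    return count_outgoing(a, resource_uri) == count_outgoing(b, resource_uri)
```

Calling same_count("http://dbpedia.org/resource/Berlin") would issue one small query per endpoint. A per-resource count is deliberately cheap; it is a spot check, not proof that the two stores are identical.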


Next steps:
-----------
Given these requirements, we will investigate, document and discuss:
1. The steps necessary to reproduce the online endpoint from dumps as closely as is reasonable
2. Automating these steps (Docker)
3. The means of distribution (build it yourself vs. easy-to-install Docker images)
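
As an illustration of step 2, the load commands could be generated rather than typed by hand. A minimal sketch, assuming the clone runs on Virtuoso and uses its bulk loader (ld_dir, rdf_loader_run, checkpoint). The function name, the dump path and the file mask are hypothetical, and which compressed formats the loader accepts depends on the Virtuoso version, so the mask would need checking.

```python
def bulk_load_script(dump_dir, graph="http://dbpedia.org", file_mask="*.ttl"):
    """Return isql commands that register and load all dump files.

    dump_dir must be readable by the Virtuoso server (it has to be
    listed under DirsAllowed in virtuoso.ini).
    """
    for value in (dump_dir, graph, file_mask):
        # Keep the sketch simple: refuse values that would break quoting.
        if "'" in value:
            raise ValueError("single quotes are not supported: %r" % value)
    lines = [
        # Register every matching file for loading into the target graph.
        "ld_dir('%s', '%s', '%s');" % (dump_dir, file_mask, graph),
        # Load everything that has been registered.
        "rdf_loader_run();",
        # Persist the loaded data to disk.
        "checkpoint;",
    ]
    return "\n".join(lines) + "\n"
```

The returned text would then be fed to Virtuoso's isql client inside the container. Keeping the generated script under version control alongside the documentation would give each release a record of exactly what was loaded.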


Existing work:
--------------
There are a couple of existing approaches already:
- [1]: https://joernhees.de/blog/2015/11/23/setting-up-a-linked-data-mirror-from-rdf-dumps-dbpedia-2015-04-freebase-wikidata-linkedgeodata-with-virtuoso-7-2-1-and-docker-optional/
  - started out as a step-by-step guide on how to set up a local Linked Data endpoint from (amongst others) DBpedia dumps with Virtuoso
  - in its latest version, also includes some docker-fu to build and/or isolate the Virtuoso DB
  - builds Virtuoso from source, including the dbpedia.vad file
  - with respect to our requirements:
    - mostly covers documentation (step 1)
- https://dockerizing.github.io
- https://github.com/dbpedia/Dockerized-DBpedia