
distributed_music_ideas
This is NOT about file sharing or even torrent files. What is needed is a distributed system for storing the information that would be visible in a user interface. It would contain records for artists, releases, uploads, forums, collages, and (maybe) users.
It would likely lack some features. Enforcing ratios would be extremely difficult, but hopefully not actually too much of a problem. 
There would likely be a large group of semi-trusted people running data nodes of some sort, but the system should not depend on every node operator being honest, since that is impossible to guarantee.
 
Problem: what.cd had a great database of music metadata, along with collages, releases, forums, etc. The data was not distributed redundantly, so there was a single point of failure: no copies of the data were saved, there was only one interface to the data, and only one authoritative source of truth. This made it possible to shut the site down.
Therefore, a system with a distributed database of metadata, divorced from interfaces or file hosting of any sort, should be harder to take down, whether by legal or illegal means. It could also be expected to sit further toward the legal side of the gray area in which what.cd operated. It would be highly redundant, and different systems of access control and interface could be hooked up to the different nodes in the metadata replication network.
Naturally there would be trade-offs. Quality will not be as good as what.cd's, period. Deal with it. Ratio enforcement will likely be impossible. These things may not matter - TPB does work pretty well, even if you scoff at its poor quality, bad tags, and malware. The trade-off in quality of content could be combated by a voting system and a powerful search function with filters.


Meteor-gazelle design notes made by devs prior to shutdown:
    https://github.com/meteor-gazelle/meteor-gazelle/tree/master/doc/specs
    Has some valuable ideas, user flows, and requirements for a successor to what.cd/gazelle. Good read.

Idea: database-level distributed consensus, such as with pg_paxos, a Paxos extension for PostgreSQL - https://github.com/citusdata/pg_paxos#pg_paxos
Multiple independent PostgreSQL nodes could be set up with their own systems of access control (public, or VPN, or totally private, Tor, whatever).
When updates/inserts/deletes happen on a node, they are asynchronously replicated to the other nodes via the Paxos SQL statement log.
If there are conflicts, they are resolved by the consensus of a majority of the nodes.
Combine with pg_paxos maybe? https://github.com/begriffs/postgrest "REST API for any Postgres database" -- yes, PostgREST is awesome. Note that pg_paxos is a non-standard extension, so it can't be used on Heroku/RDS (do we care?).
Possible architecture: Dockerize Postgres with pg_paxos and replicated tables across a semi-private network of semi-trusted people (as long as a majority are not actively trying to shit in the pool, it should work out).
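A rough sketch of that node setup, as Go with embedded SQL. This is a sketch only: the pg_paxos function names (paxos_create_group, paxos_add_node, paxos_replicate_table) are as recalled from the project's README and should be verified against the repo, and the hostnames, database, and table names are placeholders.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // Postgres driver
    )

    func main() {
        // Connect to the first node of the semi-private cluster (placeholder DSN).
        db, err := sql.Open("postgres", "host=node1.example.org dbname=music sslmode=require")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // One-time setup: create a replication group, add peers, replicate tables.
        // Writes on any node then go through the Paxos statement log and are
        // applied on the others; conflicts are resolved by majority consensus.
        stmts := []string{
            `CREATE EXTENSION IF NOT EXISTS pg_paxos`,
            `SELECT paxos_create_group('music_meta', 'host=node1.example.org port=5432')`,
            `SELECT paxos_add_node('music_meta', 'host=node2.example.org port=5432')`,
            `SELECT paxos_replicate_table('music_meta', 'releases')`,
        }
        for _, s := range stmts {
            if _, err := db.Exec(s); err != nil {
                log.Fatalf("%s: %v", s, err)
            }
        }
    }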
Idea: Strong separation of metadata from 'content':
All metadata/documentation exists on some public web site with a good API, and any sharing service queries that site for its metadata needs. Imagine a website similar to Discogs, for instance, where the site serves a dual purpose: it documents a release, but also includes a magnet link to download the release. Or, divorce the magnet link from the site completely and store it elsewhere, so that the main site containing the metadata is not targeted for removal.
Maybe an interface so that when a user navigates to any Bandcamp, Discogs, Amazon, or iTunes album page, torrent links are provided?
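To make the "queries that site for its metadata needs" part concrete, a minimal Go sketch against MusicBrainz's existing web service (the /ws/2/release search endpoint and fmt=json parameter are real; the query and User-Agent strings are just examples - MusicBrainz asks clients to send a meaningful User-Agent):

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "net/url"
    )

    func main() {
        // Lucene-style search for a release; the result JSON contains release
        // MBIDs that magnet links could be attached to.
        q := url.QueryEscape(`artist:"Neutral Milk Hotel" AND release:"In the Aeroplane Over the Sea"`)
        req, err := http.NewRequest("GET", "https://musicbrainz.org/ws/2/release/?fmt=json&query="+q, nil)
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("User-Agent", "distributed-music-ideas-sketch/0.1 (example)")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(body))
    }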

(alternate idea: releases have IDs that are included in torrent metadata so that one can look up the torrent in some other closed system using the public id - the closed system could even mirror data from the public system in order to integrate the information better. Oh wait, this is the same idea as described below)

The site can use Discogs, MusicBrainz, etc. as an index, with torrents/files 'linked' to them. Non-Discogs releases can be uploaded in the same format and potentially submitted to Discogs. Symbiosis.
Discogs is not free or open source; its data is proprietary. MusicBrainz is free/public domain.
Pros:
    Requests can also be linked to Discogs releases; essentially, all Discogs releases will be unfilled 'requests'.
    Request bounty? Bounties can work as usual: by default they will have 0 votes and bounty, with a link to the appropriate Discogs page.
Question regarding Discogs: can we really just pull Discogs metadata willy-nilly? What restrictions are in place as far as their API? Would we just scrape it all, store it offline, and check regularly for changes, or what?
Could we pull album info from the MP3/FLAC metadata?
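A minimal sketch of what this 'linking' could look like at the database level. Every table and column name here is made up for illustration: unfilled rows double as requests with bounties, and votes cover the quality problem mentioned earlier.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // Postgres driver
    )

    // Hypothetical schema: external release IDs (MusicBrainz/Discogs) are the
    // index; magnet links, votes, and bounties hang off them.
    const schema = `
    CREATE TABLE IF NOT EXISTS release_links (
        id          serial PRIMARY KEY,
        mb_release  uuid,           -- MusicBrainz release MBID, if known
        discogs_id  integer,        -- Discogs release ID, if known
        magnet_uri  text,           -- NULL means this row is an unfilled request
        format      text,           -- e.g. 'FLAC', 'MP3 V0'
        votes       integer NOT NULL DEFAULT 0,
        bounty      bigint  NOT NULL DEFAULT 0
    );`

    func main() {
        db, err := sql.Open("postgres", "dbname=music sslmode=disable") // placeholder DSN
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()
        if _, err := db.Exec(schema); err != nil {
            log.Fatal(err)
        }
    }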

From the IRC:
    16:53 < glittershark> 100 trusted people get access to the database, they each host 100 UIs with stricter auth
    16:53 < glittershark> etc
essentially like a multi-layered thing

Backup Idea:
    Once a month or so, an encrypted (choose your poison) backup of the site is made and released as a freeleech torrent. Thousands download it, and therefore a backup is in the hands of users, not the tracker.
    Possible issues: needs to be secure and verifiable (nobody can read or tamper with it). Also: who has the keys, exactly? I'm not sure if it's worth it, because let's say what.cd had had this: we would all have the db, but what would we do with it?
    RE keys: still requires some single point(s) of trust. There are ceremonies for keeping a shared secret, but that is fairly OTT and difficult. (ZCash did this.)
    RE what to do with the database: it simply ensures that there is definitely a copy out there; in the event something like this happens again, we can recover (assuming the hardware/systems are in place).
    ~How many people would have access to the keys? (Estimate) I cannot accurately estimate the backup size (it depends on what data we want to back up), but let's say a few dozen GB. If it were freeleech, you should have hundreds of people with the space, at least. Only need to keep the most recent copy for this to work.
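    A minimal sketch of the encrypt step, assuming a dump file on disk and a pre-shared 32-byte key (file names and key handling are placeholders). AES-256-GCM gives both properties asked for above: without the key the dump is unreadable, and any tampering makes decryption fail for the key holders.

        package main

        import (
            "crypto/aes"
            "crypto/cipher"
            "crypto/rand"
            "log"
            "os"
        )

        func main() {
            dump, err := os.ReadFile("site-backup.sql") // e.g. output of pg_dump
            if err != nil {
                log.Fatal(err)
            }

            key := make([]byte, 32) // placeholder: in reality, the shared staff secret
            if _, err := rand.Read(key); err != nil {
                log.Fatal(err)
            }

            block, err := aes.NewCipher(key)
            if err != nil {
                log.Fatal(err)
            }
            gcm, err := cipher.NewGCM(block)
            if err != nil {
                log.Fatal(err)
            }

            nonce := make([]byte, gcm.NonceSize())
            if _, err := rand.Read(nonce); err != nil {
                log.Fatal(err)
            }

            // Prepend the nonce so the decryptor can find it; then release
            // backup.enc as the freeleech torrent.
            sealed := gcm.Seal(nonce, nonce, dump, nil)
            if err := os.WriteFile("backup.enc", sealed, 0600); err != nil {
                log.Fatal(err)
            }
        }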
 
 Idea: Freenet
 
 Pros:
     Anonymous
     Censorship Proof
     No need to be online to "seed"
     
Cons:

Idea: ZeroNet - using blockchain
https://zeronet.readthedocs.io/en/latest/using_zeronet/sample_sites/

quick description after having read about it:
    * ZeroNet stores name registrations in the blockchain
    * content.json is distributed over bittorrent, signed by the "wallet"/name registration
    * content.json links to the other files of the site so they too will be distributed over bittorrent
    * DHT or trackers bind addresses to IPs that host them (site address = magnet link)
    * Differs from torrents in that this "torrent" will be updated live, and the newest signed & valid update always wins
    * has a websocket api so your webpage can be notified when resources are updated
    * a normal user updates a site by signing the update with their own key and sending it to the site owner, who does whatever they want with that data. This can mean authorizing the user to update a certain file with their own key
    * this means that the site owner can store private data outside the network - but normal centralization problems apply to that
    * Observation: A site could easily be a repository of files (using the optional files feature)
    * What's the catch: if authorities actually found the IP that issued the updates for a site, the PC could be seized and the key used to take down or take over the site
See: https://docs.google.com/presentation/d/1_2qK1IuOKJ51pgBvllZ9Yu7Au2l551t3XBgyTSvilew/pub?start=false&loop=false&delayms=3000#slide=id.g9a1cce9ee_0_4

User accounts/identity:
Thoughts: federated user access? Something like Diaspora, but for file sharing, would be super cool.

Idea: IPFS - distributed filesystem https://ipfs.io
...
 
Idea: Tahoe-LAFS - similar to IPFS, but it's an encrypted distributed filesystem, so more suited to private content
 
More distributed dbs:
    https://github.com/haadcode/orbit-db "Distributed peer-to-peer database on IPFS" (JavaScript)
    https://github.com/amark/gun "A realtime, decentralized, offline-first, graph database engine. http://gun.js.org/" (JavaScript)
    https://github.com/bigchaindb/bigchaindb "BigchainDB is a scalable blockchain database https://www.bigchaindb.com/" (Python)
I could see a NoSQL backend working well for this kind of content - it scales well horizontally, is easily clustered in a distributed manner, and eventual consistency should work well for this kind of content. Maybe Couchbase: http://www.couchbase.com/
What about incorporating the Hadoop toolset for parsing the various data points? HDFS, MapReduce (or Spark), Sqoop, Pig, Avro, ZooKeeper, and Flume all seem very applicable here.


Decentralized Web
    https://github.com/cjb/GitTorrent "A decentralization of GitHub using BitTorrent and Bitcoin"
    https://github.com/blockstack/blockstack-core
    https://github.com/mediachain/mediachain
    https://github.com/datproject/dat
    https://github.com/HelloZeroNet/ZeroNet -- https://zeronet.io/
    http://www.mediachain.io/
    https://morph.is/v0.8/
    https://www.wikipediap2p.org/
 
Miscellaneous:
    http://telehash.org/ "encrypted mesh protocol"
    https://github.com/feross/webtorrent "Streaming torrent client for the web https://webtorrent.io"
    https://clickhouse.yandex/ "open-source column-oriented database management system"
    https://github.com/gitchain/gitchain "Decentralized, peer-to-peer Git repositories aka "Git meets Bitcoin""
    https://github.com/bitchan/bitmessage "Bitmessage is a P2P communications protocol" https://bitmessage.org/wiki/Main_Page [* worth checking out imo]
    https://github.com/cjdelisle/cjdns "Cjdns implements an encrypted IPv6 network using public-key cryptography for address allocation and a distributed hash table for routing. This provides near-zero-configuration networking, and prevents many of the security and scalability issues that plague existing networks."
    https://github.com/adiitya/p2pstream "P2P live streaming using centralized architecture http://adityaprakash.in/p2pstream"
    https://www.tribler.org/ "Tribler is an open source decentralized BitTorrent client which allows anonymous peer-to-peer by default."
 
 Encrypted email server by lavabit:
     https://github.com/lavabit/magma 
     
"Classic" BitTorrent:
    https://github.com/chihaya/chihaya "A customizable, multi-protocol BitTorrent Tracker" (Go)
    https://github.com/drbawb/babou "Babou is a combination web-framework/ torrent-tracker written in Go."
    https://github.com/mdlayher/goat "Goat: Go Awesome Tracker. BitTorrent tracker implementation, written in Go. MIT Licensed."
    https://github.com/leighmacdonald/mika "mika: Go based torrent tracker using redis for a backend and designed for private site use"
 
- Don't forget: it needs to be simple for end-users who just want to share content, and preferably not blocked on corporate firewalls or by ISPs
- Move away from old PHP systems for sure. Gazelle's great and all but PHP encourages laziness and sloppy code (and it's PHP) (+1)(+1)(-1)(+1)(+1)
- True, but: it's best if we can build a decentralized layer on top of existing tech (e.g. Gazelle) and then slowly replace it piecemeal. Rebuilding Gazelle from scratch should be the ultimate goal, but it is unlikely to happen; current momentum won't last long. Something like wikipediap2p would help make the transition smooth.
- - Hence the RDBMS-based solution: not ideal, but best for right now if we can, say, make Gazelle work with Postgres and then replicate some of the tables with pg_paxos. Then we get a distributed architecture but keep the old UI (for now).
- If a group of people is willing to work on it, I'm willing to help rebuild Gazelle from scratch, possibly in Go. hahaha We can argue all day about languages/frameworks, though I will suggest that Go is great but maybe a little more low-level than we desire.
- -100 to Go (-- = + hehe), Go makes me cry: http://yager.io/programming/go.html - without parametric polymorphism there are only so many libraries you can write before you run into the `interface{}` upcasting problem over and over and over again.
- - Oh whatever, we're talking about a database-backed web app. Even web apps have to build abstractions, or else you get completely tied up in boilerplate and cruft, writing the same thing over and over again. That can be done in almost any language, though, with some work. *with some work* - read the above link
- Honestly I would vote for Rails. The reason I suggest Go is that goroutines are well suited to a web application, and there are many frameworks that can simplify web development; you don't have to start at the HTTP-server coding part. Also pls not Rails, it doesn't scale well. There are plenty of perfectly good programming languages with green threads that also have a type system that isn't crap.
Also, "doesn't scale well" is a non-issue for a private tracker with 10,000 users.
- The language really doesn't matter much at this point. We gotta figure out a distributed architecture to share the data first. Then we can pick technologies. If we have a semi-distributed RDBMS then everyone can write their own UIs on top of their local synced copy of the data. Then you can make yours in Rails, someone else can use adapted Gazelle, someone else can do Go. The UI is the least important part. And I just realized I could find someone's blog post about how bad any language is. I give up. Let's just work on the arch.
fine by me
N.B. what.cd was on Gazelle (PHP) and we all liked it. https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/ - read that once, it's a good article.
- What if there was a way to have a "swarm" of self-replicating trackers? For argument's sake, let's say there are 10. When you go to download/upload using the website/UI, it randomly picks one of the 10 trackers; then the others replicate it, and vice versa (ideally they would be continuously replicating if possible, otherwise once a day, once a week, etc.). This way, if one goes down you either lose nothing or just a very small amount of data. There could still be a central authority moderating one of the trackers (which effectively moderates all of them via replication) - but if they get taken down, we would only lose governance, not all the data.

quick rant..
Public trackers have a pretty easy job - if all the information is public then there is no problem with scattering backups all over so that the next site can just pick it up and move on.
Private trackers have to deal with having private information, typically authentication. Information which must also be protected from unauthorized access.
So, say you have a distributed file system: who gets to make an update? If you forced all updates to be cryptographically signed, then you could grant keys to privileged servers. But if the police get a key, the police can hijack things, so there's a problem (see the sign/verify sketch below).
Even if all privileged servers went down, all the data would be safe for migration into a new network.
Similarly, private information could be signed and encrypted. For the end user to access this information, the user would have to talk with a privileged server.
I suppose this is just a less refined version of ZeroNet. I think the lesson I want to take from this is that the owner of a ZeroNet site could store information encrypted in the blockchain that only the owner (+ friends) would be able to decrypt.
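A minimal sketch of the "all updates must be cryptographically signed" scheme: a privileged server signs an update blob, and any node verifies it against the published public key before applying it. The update format below is made up for illustration.

    package main

    import (
        "crypto/ed25519"
        "crypto/rand"
        "fmt"
    )

    func main() {
        // Key pair held by a privileged server; pub is published to all nodes.
        pub, priv, err := ed25519.GenerateKey(rand.Reader)
        if err != nil {
            panic(err)
        }

        update := []byte(`{"table":"releases","op":"insert","mbid":"..."}`) // hypothetical format
        sig := ed25519.Sign(priv, update)

        // Any node, before applying the update:
        if ed25519.Verify(pub, update, sig) {
            fmt.Println("update verified, applying")
        } else {
            fmt.Println("rejecting tampered update")
        }
        // The flip side discussed above: whoever seizes priv can sign anything,
        // so key custody is the real problem, not the crypto.
    }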

STRUCTURE:
    001 PROBLEM OF ACQUIRING AND MAINTAINING TOP QUALITY METADATA
    1 requirement/wish: "best quality ever descriptions, tags, collages"
     1a. All legal; can live on MusicBrainz/MetaBrainz. Take backups of MusicBrainz (available) and share them via torrent.
     1b. As this is legal, we can use the MusicBrainz website with identity/OAuth/whatever to curate content and prevent spam.
     Reward is MISSING? *** Why would we do manual labor? Somehow we must be rewarded with internet points. TODO
     
    002 PROBLEM OF DISTRIBUTING CONTENT WITHOUT RISK TO USERS/ADMINS OR RISK OF HAVING THE DISTRIBUTION SYSTEM DESTROYED/DEGRADED
      2 Torrents over I2P work, and would be fast if there were more people there. IPFS doesn't provide anonymity; the other options above also don't provide anonymity.

   003 LINKING "ILLEGAL" CONTENT WITH MUSIC LOVERS
       Linking/associating a MusicBrainz ID with track/album content such as a magnet link.
       An IPFS or ZeroNet website can be created which lists these links and can be updated (hence ZeroNet may be better;
       last I heard IPFS didn't support updating a website well). A user submits their previous what.cd torrent to a tracker on I2P (public, DHT-like), and can then post
       the torrent's magnet link to the ZeroNet website together with the MusicBrainz ID it is supposed to match,
       for others to find and begin seeding/downloading. The ZeroNet website would be the index of torrent links on I2P + MusicBrainz IDs. The website must be free/open source,
       and others can run their own if they like, along with the database mapping torrent links on the I2P tracker to MusicBrainz IDs ...
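    A minimal sketch of one entry in that index - MusicBrainz ID mapped to a magnet link with an I2P tracker - with all field names made up for illustration:

        package main

        import (
            "encoding/json"
            "fmt"
        )

        // IndexEntry is one row of the hypothetical ZeroNet-hosted index.
        type IndexEntry struct {
            MBReleaseID string `json:"mb_release_id"` // MusicBrainz release MBID
            MagnetURI   string `json:"magnet_uri"`    // magnet link; I2P tracker in &tr=
            Format      string `json:"format"`        // e.g. "FLAC"
            Votes       int    `json:"votes"`         // quality voting, per the notes above
        }

        func main() {
            e := IndexEntry{
                MBReleaseID: "00000000-0000-0000-0000-000000000000", // placeholder MBID
                MagnetURI:   "magnet:?xt=urn:btih:...&tr=http://tracker.example.i2p/announce",
                Format:      "FLAC",
            }
            b, err := json.MarshalIndent(e, "", "  ")
            if err != nil {
                panic(err)
            }
            fmt.Println(string(b))
        }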
       
Idea/Concept: technical specification using the Tor network to configure an endpoint to another server.
1. Set up Tor exit and entry nodes
2. 
    
    
Want to make yourself useful?
I'd suggest, if anyone wants to make themselves useful: try setting up a Docker or Vagrant thing with pg_paxos in a replicated cluster
(or Kubernetes, or docker-compose)
get the gazelle/wcd schema, load it in, and replicate some tables with Paxos
share your code/results
figure out how much effort it'd be to port Gazelle to run on pg instead of MySQL
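For the "share your code/results" step, a minimal Go smoke test for such a cluster: write a row on node A, then poll node B until it shows up. It assumes two local containers mapped to ports 5433/5434 and a table already replicated with pg_paxos; the replication_smoke table and DSNs are made up, adjust to taste.

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // Postgres driver
    )

    func mustOpen(dsn string) *sql.DB {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            log.Fatal(err)
        }
        return db
    }

    func main() {
        a := mustOpen("host=localhost port=5433 dbname=gazelle sslmode=disable")
        b := mustOpen("host=localhost port=5434 dbname=gazelle sslmode=disable")

        // Write to node A; pg_paxos should log the statement and replicate it.
        if _, err := a.Exec(`INSERT INTO replication_smoke (note) VALUES ('hello from node A')`); err != nil {
            log.Fatal(err)
        }

        // Poll node B until the row appears or we give up.
        deadline := time.Now().Add(30 * time.Second)
        for time.Now().Before(deadline) {
            var n int
            err := b.QueryRow(`SELECT count(*) FROM replication_smoke WHERE note = 'hello from node A'`).Scan(&n)
            if err == nil && n > 0 {
                log.Println("row replicated to node B - cluster works")
                return
            }
            time.Sleep(time.Second)
        }
        log.Fatal("row never appeared on node B")
    }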