FOSS4GNA 2012 Day Two Plenary: URLs and Firehoses
Michael Byrne, Geographic Information Officer (GIO) of the Federal Communications Commission (FCC), started with this idea: The FCC doesn’t make paper maps. Why not? Because people want interactive maps. Which led to the title of his talk: "I believe in URLs."
Byrne and his team have two key tasks:
- make sure the policy team has data it can visualize for decision making
- make those data transparent to the public
The substance of his presentation was a list of 10 memes that enable his vision of data sharing/geo implementation at the Commission:
- one click to data - three clicks for Mike Byrne, but one on a dedicated URL for the requesting official
- if you don’t own the URL, you don’t own squat - the URL must mean something: human-readable, and human-editable so that changing it points to a different state
- know the resource you have - you can publish all the data and folks will use it
- unintended consequences are good - data used in weird ways is good
- think about geography first - spatial not special, it’s just a column in a database (much applause)
- domain is the worldwide web - no platform needed, just use the standards of the Web
- enterprise is any device - low barrier to publish, low barrier to consume
- know your [Web page] real estate - numbers can be more important than a giant map on a Web page
- publish multiple instances of the data - the marginal cost of offering WMS, downloadable shapefiles and a Mapbox tile set is negligible
- the issue is the issue - it’s not about geography, but geography is part of the issue
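The talk did not show code, but the "own the URL" meme can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the host `data.example.gov`, the `/dataset/state.format` path layout, and the function names `data_url` and `swap_state` are hypothetical and are not the FCC's actual scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host for illustration only; not an FCC URL.
BASE = "https://data.example.gov"


def data_url(dataset: str, state: str, fmt: str = "json") -> str:
    """Build a readable URL of the form BASE/<dataset>/<state>.<fmt>."""
    return f"{BASE}/{dataset.lower()}/{state.lower()}.{fmt.lower()}"


def swap_state(url: str, new_state: str) -> str:
    """Return the same URL with only the state segment changed."""
    parts = urlsplit(url)
    head, _, leaf = parts.path.rpartition("/")
    # Keep whatever extension the original URL had.
    _, dot, ext = leaf.partition(".")
    new_leaf = f"{new_state.lower()}{dot}{ext}"
    return urlunsplit(parts._replace(path=f"{head}/{new_leaf}"))


if __name__ == "__main__":
    virginia = data_url("Broadband", "VA")
    print(virginia)
    print(swap_state(virginia, "MD"))
```

Because the state is a plain path segment, a reader can do by hand in the address bar what `swap_state` does in code, which is the point of a human-editable URL.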
Josh Berkus, CEO of PostgreSQL Experts Inc., gave the second plenary of the day. It focused on the practical and technical issues related to the “firehose” of data. That firehose is defined by a high volume of data from automated devices and the need to process and aggregate it continuously for it to be useful.
In short, developers and implementation professionals need to solve four problems to build valuable apps for the firehose:
- volume - it grows over time and must be managed now and for the future
- flow is 24/7 - must continue operating, outages must be short (ETL does not work) and the data coming in can’t be out of order
- database size
- components fail - and yet the data must still come in and processing must pick up where it left off
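As a rough illustration of the last requirement (processing that picks up where it left off), here is a minimal Python sketch of a checkpointed consumer. It is not from the talk: the `(seq, value)` record shape, the JSON checkpoint file, and the function names are all assumptions, and a real firehose system would more likely keep this state in a database or message queue.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last committed state, or a fresh one if none exists."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"next_seq": 0, "total": 0}


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file


def consume(records, path):
    """Aggregate (seq, value) records, resuming from the checkpoint."""
    state = load_checkpoint(path)
    for seq, value in records:
        if seq < state["next_seq"]:
            continue  # already counted before a restart
        if seq > state["next_seq"]:
            raise ValueError(f"gap or out-of-order record at seq {seq}")
        state["total"] += value
        state["next_seq"] = seq + 1
        save_checkpoint(path, state)
    return state
```

If the process dies partway through, calling `consume` again with the same source replays the stream, skips everything below the saved `next_seq`, and continues the running total without double counting. Checkpointing after every record is the simplest choice for a sketch; batching the writes would trade some replay work for throughput.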
He gave two examples in which he’d participated:
Tips for dealing with the firehose:
- collection and processing must be continuous, parallel and fault tolerant
- every component should be able to fail
- don’t use cutting edge tech, don’t use untested hardware, don’t run components to capacity, don’t do hot patching
