Monday, June 8, 2009

Open data, free analysis?

One of the major headaches in helping organizations develop Adaptive Management plans is getting all the data relevant to the question. The bigger the organization, the harder this is. In some situations, transparent access to the data might be one way to build trust with stakeholders - if they know they can see the data for themselves and have someone else analyze it, they might be more willing to accept the analysis that has already been done. Especially once they find out how much it costs to hire a statistician. Maybe I'm dreaming.

For the analyst it creates a different kind of headache, because now you have to defend the myriad choices you made en route to an answer, and in many cases there are no right answers, only better and worse ones. I should have to defend those choices, but doing so increases the time required to finish an analysis.

I came across a post about open data publication in the world of genomics - interesting stuff. It's a world where academic labs of 20 people are "small". That's hard to imagine when I'm wrestling with the transition from a lab of 1 (me) to a lab of 4 (me, a postdoc, and two grad students). But the critical point made was that differences in incentives lead to differences in data publication, meaning submission of raw or analyzed sequences to public databases. Genome centers of 1000 people get funded based on genomes produced, so they "publish" their data to databases quickly. Academic labs get funding based on paper outputs, so they delay submitting data to public databases until they have the papers in press - getting scooped sucks. So we could fix that problem by changing the funding model for small labs as well as large, but, and it's a big but for me too - how do you get funding to do analysis?

I was frankly delighted to see that someone else also thinks analysis doesn't come for free. In my world, I regularly meet people who have data and think it's relevant to a management problem. But they don't have the expertise to turn that data into something useful for their problem. Unfortunately it is hard for me to help - I'm only one person, and already completely swamped. The classic natural resources model of "put a student on it" doesn't work well for analysis, because it takes YEARS to develop the necessary skills. Frankly, it took me decades. Grant milestones can't wait for that. Ideally, a student would start developing the skills without pressure to deliver, then practice on a real project once they are ready. So who pays for that early development? One solution is teaching assistantships - get them helping undergrads. Well - great, if your department works that way (mine doesn't).

So, what to do in an Adaptive Management world where data is guarded, analysts are scarce, and problems are immediate? The current solution is to make decisions without analyzing the data. Publishing data - online, available in raw form - would mean that many additional hands and minds could come to the task of working out what it means, and what the best way to get there is. The USGS publishes stream discharge data in real time. That's not realistic for Tern and Plover fledging success data, but it could be published annually. A central database with relevant data would make many things easier for my Missouri River work. Models should also be part of that database! An open source model for Adaptive Management.

Food for thought, but no answers today.
