[I was asked to move this discussion from the "Design" group. I've also added a few words in response to a comment.]

 

Dear colleagues:

I want to argue that data management, analysis, etc., have become sufficiently complex tasks that we need to be handing them off to expert third parties--much as many of us hand off our email to Google, rather than running lab email servers as used to be common practice. I suggest that this perspective has significant implications for how we conceive of, and implement, an eventual EarthCube architecture.

As supporting material, I attach a white paper prepared for a recent research data lifecycle management workshop, plus a recent paper on the Globus Online system that we are developing as a first foray into this space. The following are the first three paragraphs of the white paper:

Big increases in data generated within research laboratories and demands for more careful data management lead to increased pressure on investigators. Researchers need not data storage, but full-service data lifecycle management processes, encompassing data collection, storage, sharing, metadata, search, archiving, provenance, assignment of DOIs, security, etc. Establishing such processes would demand substantial time and resources that most researchers do not have, and cannot easily acquire.

We believe that the solution to this problem is not simply to define “best practices,” nor to provide researchers with software. Once defined, best practices must still be implemented. Software must still be installed, operated, and maintained. Those implementation, installation, and operations steps are precisely where many investigators run into problems.

Instead, we should aim to outsource the entire lifecycle management process to a third-party Research Data Lifecycle Management service. Ideally, this service will encompass discipline-specific practices and methods, so that individual researchers can connect their lab and then have many of their problems taken care of, much as many outsource their email to Google today.

In response to a comment on the previous version of this post: I don't mean to imply that we should expect such hosted services to organize people's metadata--but they can operate metadata services, and the processes needed to publish data and populate catalogs, on people's behalf.

Regards -- Ian

 

Tags: Cloud, Data, Economies, Federation, Metadata, Movement, SaaS, Scale, Security, WhitePaper


Replies to This Discussion

Great thoughts Ian, and thanks for starting this important thread! It seems that we are in an exciting new phase where we are seeing the rise of "data scientists," and these are scholars not only in computer science, but also in the so-called "domain" sciences of ecology, Earth science, ocean science, plant science, geographic information science, etc. To many of these scholars the lifecycle management process IS the core research; it is the core science where the most compelling questions and grand challenges are. So I'm wondering if there is a parallel solution here: researchers may indeed want to outsource to third parties, or they may instead work with the data scientists already in their discipline. My understanding is that this is what the iPlant Collaborative does, for example, for the plant science community (http://www.iplantcollaborative.org/). So EarthCube, as I believe Karen has commented, would encompass these domain-specific initiatives that are dealing with metadata, semantic interoperability, archiving, provenance, and the like. They may not always be viewed as "third party." Cheers....

This comment is more of an affirmation than a response. In short, I believe that Ian is right on point here. I can relate from anecdotal evidence that I have spent weeks and months myself and, more recently, had students who have spent too much time just downloading, managing, formatting and processing data -- in our case primarily global climate model simulation outputs and remote sensing data -- and installing and maintaining the necessary software. These tasks are only going to grow in importance and magnitude as datasets are increasing in size and complexity, e.g., higher spatial and temporal resolution of simulation models as we move from IPCC-AR4 to AR5, or increased resolution and fidelity of remotely sensed data.

The Gmail analogy is a fitting one, because outsourcing these activities would allow us to focus on our strengths, which happen to be in the development of data mining and machine learning techniques for the climate and environmental sciences. At the same time, there are also research challenges in creating this type of infrastructure, which someone else may find of great interest. I hope that EarthCube will initially provide a medium for bringing these (sometimes disparate) people together and ultimately result in a sustainable solution for scientific data lifecycle management.

Dear Karsten, thanks for the positive reply. I've been reading and thinking about "long tail science" (a term attributed to Jim Downing)--how do you support the very productive work done by the 95% (?) of scientists who don't have huge cyberinfrastructure

I had not heard this term before but it is very appropriate. Another project that is taking a step in that direction is the NASA Earth Exchange (NEX). I think in terms of the vision it is not too dissimilar from EarthCube (collaborative environment, capabilities for data storage and computation, ability to share results, workflows, etc.) but in some ways it may have an advantage by being centrally managed -- as opposed to the distributed model of EarthCube, where openness is also a stated goal but ultimately individual institutions have to agree on the design, contribute resources and infrastructure, and so on.


© 2013   Created by Dennis Carey.


Any opinions, findings, conclusions or recommendations presented in this material are only those of the presenter grantee/researcher, author, or agency employee; and do not necessarily reflect the views of the National Science Foundation.