Mapping the Earth and its Future with Big Data

Big data and analytics applications run the gamut of human activity, from doctors curing disease to athletes seeking competitive advantage to retailers predicting customer behavior. What all big data use cases have in common is that they are an attempt to understand the world around us.

But it’s possible that no big data project will get us closer to that literal goal than the Global Ecological Land Units (ELU) map. The map, a joint project of the U.S. Geological Survey and ESRI, is a groundbreaking effort that maps Earth’s ecosystems in unprecedented detail. The work was commissioned by the Group on Earth Observations (GEO), a United Nations-level intergovernmental consortium of 96 nations and the European Commission. The GEO’s mission is “to achieve comprehensive, coordinated, and sustained observations of the Earth … as a basis for sound decision-making for improving human welfare, encouraging innovation and growth, alleviating human suffering, including eradicating poverty, protecting the global environment, and advancing sustainable development.” The ELU map was a part of the GEO work plan because a globally comprehensive map of ecosystems at a management-suitable scale did not exist.

Data Informed spoke with Dr. Roger Sayre, senior scientist for ecosystems in the land change science program at the U.S. Geological Survey, and Charlie Frye, ESRI’s chief cartographer, about the map as a big data use case and as a tool to improve our understanding and stewardship of our planet.

Data Informed: Tell me about the global ecological land units map. What is it and why is it significant?

Sayre: The global ecological land units map is a new map of terrestrial ecosystems of the planet at a relatively fine spatial resolution for a globally comprehensive product. It is a concept wherein we define and then map ecosystems as unique combinations of the physical environment – that is, the bioclimate, the land forms, and the geology, as well as the vegetation that exists in response to that physical potential of the environment. So we essentially map ecosystems as unique physical environments with their vegetation, and then model them out across the planet using data for each of those input layers.

What was the reason behind developing the map?

Sayre: GEO, as a group of nations, has put together an intergovernmental protocol called GEOSS – the Global Earth Observation System of Systems. And GEOSS is essentially a big work plan with lots of observation-related tasks, and one of those tasks is to develop a comprehensive, robust, and practical map of global ecosystems for terrestrial as well as marine and freshwater environments. So there’s an official intergovernmental task about this. That’s the charge, that’s my commission. And I happen to be the task leader of that, and the U.S., as a member nation, is responsible for the work, and the USGS is the federal agency responsible for the work. So this commission appeared on the scene a number of years ago from the framers of the GEO and GEOSS constitution, and they were looking for a task leader and I had some experience in continental-scale ecosystem mapping. So they contacted me and I eventually became the GEO-designated task lead for this activity.

How does the map compare to previous or existing maps? Is it bigger? Is it more detailed? More complete?

Sayre: It’s definitely more detailed. All previously existing maps of, say, macro-scale ecosystems of the planet have been fairly coarse and also were derived through an expert-opinion process, whereas our map is derived from actual data. It’s scientifically rigorous in that way and it’s repeatable. And so its two distinguishing features are that it’s much finer in detail than anything that comes before and it’s also derived from data as opposed to expert opinion.

What’s the significance of that difference, that it’s based on data as opposed to expert opinion?

Sayre: It’s more quantitative work than what we call interpretive work. It does happen that experts get together and draw boundaries around ecological areas, and their concepts and their bounding of ecological areas have always been really good, because they know their disciplines and they have field experience. So expert opinion-derived maps have been the status quo but they haven’t always been repeatable, as it depends on which experts you have in the room when those lines are drawn. It was always good scientific practice to try to more objectively and quantitatively define these ecosystem boundaries. And that’s what we have done. That’s definitely new and more innovative. Charlie, would you agree with that assessment?

Frye: Absolutely. And I wanted to go back a little bit on why we at ESRI were working on this with Roger. I think something important happened when President Obama announced his climate action plan and, subsequently, the climate change initiative last November, which is that it gave some real incentive to being involved in producing and helping with the science behind global data. I look at this (project) from an ESRI perspective, and five years ago there really wasn’t much of a business case to provide global data, because people weren’t asking for it. And with President Obama announcing that and then following up on it, it became quite obvious that there were some very good reasons to be involved, both from the point of view that ESRI selfishly can get a little bit of attention from the standpoint that we provided this data and have been providing analytical tools for years, but also that most of us who are involved in this kind of work felt strongly that it should be done in the first place. Five years ago, we couldn’t get anyone to describe a business case that we could work with to produce global data. And now we are working on dozens of global layers and getting a lot of attention from the standpoint that it’s not only the U.S., but everyone in the world who has a stake in figuring out these problems.

What were some of the data sources that were used to produce the map? How much data went into the production of the map?

Sayre: A lot of data went in. These are global data sets and each one was, I think, 11 billion raster cells across the whole planet. And we had four essential inputs that we used to model ecosystems, and those are: land forms, and the source of data for the land-form model is a digital elevation model of the Earth, which is derived from satellite imagery. And the climate data that we used is a 50-year historical average of temperature and precipitation data from all of the reporting meteorological stations around the world. And, of course, there are some areas where there are no weather stations and so we don’t really have data from those areas and there had to be some interpolation in those data-poor environments. So that was the land forms and the bioclimate sources of data. And the lithology, or rock-substrate type layer, that’s important because substrate matters to living things. And that data source was the global lithology map, which is a recent arriver on the scene. And for some of those input sources, satellite imagery is used for the geology analysis. And, finally, our last input layer is land cover. And that tells you about the vegetation as well as the human-altered surfaces on the planet. And that is a classic satellite-image-derived product.

Ecological Land Units map.

All of (these data sets) are very big because our spatial resolution, 250 meters, is relatively fine for the whole planet.

And we take those four inputs – climate, land form, lithology, and land cover – and we combine them in a GIS (geographic information system). And so you are multiplying 11 billion cells per layer by four inputs, and we model the ecosystems from that GIS combination of the layers. Really, we have 60, 70 billion discrete values in the data set, which is why we are calling it a real-life example of a big data application.
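To make the idea of a GIS combination concrete, here is a minimal sketch of how four co-registered raster layers can be overlaid so that each cell carries one combination of class codes. It is an illustration only, not the USGS/ESRI workflow: the tiny 2x3 grids, the class codes, and the function name `combine_layers` are all assumptions made up for this example.

```python
from collections import Counter

# Toy 2x3 rasters, one per input layer. The class codes are invented
# for illustration and do not correspond to the real ELU classes.
bioclimate = [[1, 1, 2], [2, 2, 3]]
landform = [[1, 2, 2], [2, 2, 1]]
lithology = [[4, 4, 4], [5, 5, 5]]
land_cover = [[7, 7, 8], [8, 8, 7]]


def combine_layers(*layers):
    """Overlay same-sized rasters; each output cell is a tuple of class codes."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same grid")
    return [
        [tuple(layer[r][c] for layer in layers) for c in range(cols)]
        for r in range(rows)
    ]


combined = combine_layers(bioclimate, landform, lithology, land_cover)

# Each distinct tuple is one candidate land unit; count the cells in each.
unit_counts = Counter(cell for row in combined for cell in row)

for unit, n_cells in sorted(unit_counts.items()):
    print(unit, n_cells)
```

In this toy grid, six cells collapse into five distinct combinations, because two cells share the same four class codes. At global scale the same overlay would run over billions of cells per layer, which is why it is described here as a big data problem.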

Frye: I think it’s also worth noting that, on the bioclimate data, that precipitation data is based on monthly precipitation data for 50 years. So that means that there are 600 precipitation data sets behind that. And the same thing is true for temperature, there are 600 temperature data sets behind our growing degrees data set, which is how we express temperature. And so there’s a lot more data behind what we use than just the face-value data sets that Roger described.
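The arithmetic behind Frye's figure is 50 years times 12 months, or 600 monthly layers per variable. The sketch below shows that count and one plausible way a long-term monthly average could be computed for a single cell. The interview does not describe the actual averaging procedure, so the function names and the per-calendar-month averaging are assumptions for illustration.

```python
YEARS = 50
MONTHS_PER_YEAR = 12


def monthly_layer_count(years=YEARS):
    """Number of monthly data sets behind one variable (e.g. precipitation)."""
    return years * MONTHS_PER_YEAR


def long_term_monthly_mean(series, months_per_year=MONTHS_PER_YEAR):
    """Average a cell's monthly values by calendar month.

    `series` holds one value per month, oldest first, covering whole years.
    Returns one mean per calendar month (hypothetical approach, not the
    documented ELU method).
    """
    if not series or len(series) % months_per_year != 0:
        raise ValueError("series must cover a whole number of years")
    n_years = len(series) // months_per_year
    return [
        sum(series[month::months_per_year]) / n_years
        for month in range(months_per_year)
    ]


print(monthly_layer_count())  # 50 * 12 monthly layers

# Two made-up years of precipitation for one cell: 10 units, then 20 units.
toy_series = [10] * 12 + [20] * 12
print(long_term_monthly_mean(toy_series))
```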

Sayre: This is not the type of analysis that a normal person, even a very computer-inclined person, can do on their laptop or their desktop. The processing power needed to do this analysis is very substantial. I was very fortunate to have ESRI generously donating all that processing power, which is a lot of compute time and analyst time. It really was a big analysis and it was made possible by ESRI doing it.

So this is clearly a big data application. What are some of the applications that will come from this use case? What are we going to be using it for?

Sayre: One of the biggest applications is the emerging interest in the economic and social value of ecosystem goods and services. Ecosystems provide us, as humans, with these goods and services that are critical for our very survival. And there’s a recent interest in trying, both economically and noneconomically, to valuate those goods and services. And so there’s an emerging discipline of ecosystem services mapping. And it has struck me for a long time that probably the best spatial analytical units to use for that mapping are maps of ecosystems themselves. They are the service-provider unit. And you will find that these ecosystem service assessments are typically using watersheds, sometimes geopolitical units – something other than ecosystems as the spatial analytical unit. I think a very promising, powerful first application will be in these assessments of the economic and social value of ecosystem goods and services.

And then for climate change, there’s a lot of interest in knowing, in understanding, the impacts of climate change on ecosystems. And I would maintain that we need to know what the ecosystems are and where they are out on the landscape in order to be able to answer that question of how are they being impacted.

And then, for conservation applications, there’s a great utility here as well. It is a fundamental conservation tenet among the global conservation NGOs that, in addition to conserving rare and endangered species, it’s important to conserve representative ecosystems. Representative ecosystem conservation requires a knowledge of what those representative ecosystems are. So we have mapped them out such that conservationists now can study how well protected or represented these global ecosystems are in the set of global protected areas. So for biodiversity priority setting, there’s great utility, we believe.
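A representation analysis of the kind Sayre describes can be sketched as a simple overlay: for each ecosystem class, count how many of its cells fall inside a protected-area mask. The example below is a hypothetical illustration, not the method used by any conservation group. The grid, the class labels, and the function name `protected_share` are invented, and cell counts stand in for area, which only holds on an equal-area grid.

```python
from collections import Counter


def protected_share(ecosystem, protected):
    """Return the fraction of each ecosystem's cells that are protected.

    `ecosystem` is a grid of class labels; `protected` is a same-sized grid
    of 0/1 flags. Cell counts approximate area only on an equal-area grid.
    """
    total = Counter()
    inside = Counter()
    for eco_row, prot_row in zip(ecosystem, protected):
        for unit, is_protected in zip(eco_row, prot_row):
            total[unit] += 1
            if is_protected:
                inside[unit] += 1
    return {unit: inside[unit] / total[unit] for unit in total}


# Made-up 3x3 example grids.
ecosystem = [
    ["forest", "forest", "desert"],
    ["forest", "desert", "desert"],
    ["tundra", "tundra", "forest"],
]
protected = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

shares = protected_share(ecosystem, protected)
for unit, share in sorted(shares.items()):
    print(f"{unit}: {share:.0%} protected")
```

Classes with a share of zero, like the toy desert and tundra here, are the ones a priority-setting exercise would flag as unrepresented.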

And then, finally, there are resource management applications. If you have an ecosystem-based management mandate, then you need to know the ecosystems that are within your jurisdictions so you can properly manage them. And there’s an application around environmental security. Nations will go to war for access to these ecosystem goods and services. So I think it will help us to understand our distribution of ecosystems and the goods and services they are providing to better maintain global environmental security.

Those are a number of applications for which the data were intended to be useful. It is brand new and so we are exposing it as broadly as we can. We are asking the scientific community to work with us on both improving the concept going forward and providing critical early adopter use cases so that we can demonstrate that ecosystem data are useful for those intended applications.

You mentioned climate change and measuring the effect on ecosystems. Does the map tell us anything about recent climate trends?

Sayre: It does not. It only establishes the location of ecosystems on the ground. But we are in a position to model either forward or backward in time the distribution of ecosystems based on climate change. If we have climate change trend data, we can actually remodel ecosystem distributions in the past or predict them into the future as part of a potential impact of climate change on ecosystems, but the data themselves do not provide any information on trends in climate change. We need to get that information from the climate change trend modelers.

In the map’s foreword, Mark Schaefer, former Assistant Secretary of Commerce for Conservation and Management, National Oceanic and Atmospheric Administration (NOAA), emphasized the value of linking these detailed maps of ecosystem units globally at various scales. What are the advantages of linking these ecological land unit maps and data?

Sayre: One advantage is standardization. A global ecosystem map didn’t exist previously, certainly not at this finer spatial resolution and derived from data. Now that it does exist, it’s possible to do regional comparisons or cross-continent comparisons with a standardized reference data set. Similarly, the IUCN, the International Union for the Conservation of Nature, has a red-list concept for assessing the endangerment of species. Now they are developing red-list criteria for ecosystems. They are assessing ecosystems to determine their degree of endangerment and vulnerability. In order to do that, it helps to have a standardized set of ecosystems that everyone can use that were derived in the same way everywhere around the world. It’s the notion of a reference standard for the spatial currency of ecosystems. We feel it will be very powerful in that way.

How do you see the map being used by decision makers and policy makers with regard to environmental management and policy?

Sayre: I believe that if we can make it easy enough for them to access on a simplified, dashboard type of presentation – and ESRI has developed already some incredible value-added applications for browsing the data and for querying the data – if we can just make it easy enough for policy makers to access and understand, then it does allow policy makers to ask questions about where are ecosystems now and where might they be relocated to in the future given some scenario of climate change, and which ones are currently enjoying some protection in our national park system or the global protected areas. And so any management or policy-related question for which you need to know the distribution of ecosystems on the ground, these data should be very useful for that.

Frye: And we have also found that the map is a really easy, low point of engagement that almost any decision maker can at least look at. We put together one story map that, at least I as a geographer, underestimated the power of. I looked at it as being kind of trivial to look at lots of different places and see different pictures of those places. And that’s because I do that all the time as a geographer, but most of our decision makers don’t get to do that all the time. And this map, along with engaging photos of different types of ecosystems, gives them the ability to approach and understand ecosystems on their own terms as opposed to having it dictated to them.

It gives them the ability to visualize and understand the data without having someone have to explain it to them.

Frye: Exactly.

Sayre: That actually brings up another audience, which I feel is very important. This has an educational application as well. I think there has been a historical aversion to mapping ecosystems because the people who pioneered that concept began with the notion that ecosystems are scale-less, they can be as big as the planet or as small as a grain of sand. And because of that, people thought (ecosystems) can’t be mapped. And exactly what an ecosystem is has never been fixed within the ecology community. So one thing that we have done is that we have actually said that ecosystems can be scaled, and they can be defined, and they can be developed rigorously in a scientific way. And we did that and here they are. And we feel that will have a lot of utility just for the educational value. And Charlie, aren’t there some grad schools that are incorporating this data into their curriculum already?

Frye: We have already started working with Colorado State University and we have gotten additional interest from the University of Pennsylvania. And they are looking at this from the same kinds of ideas that Roger’s already explained, from the point of view that this is a globally consistent and comparable set of data that allows them to look at the impacts. And Penn, in particular, is looking at ecological hot spots relative to protected areas around the world. And their driver for that is to help the UN understand how to increase the percentage of the world’s habitats that are protected, from 13 percent to 17 percent.

The education applications raise an interesting question: What can the map teach us about our environment and what do you see as the overarching lesson that the map delivers?

Sayre: It teaches us a lot about our environment because it essentially characterizes ecosystems as a combination of four really important earth surface characteristics: the climate, the land form, the geology, and the vegetation. The map really helps us understand what ecosystems are in a physical geography sense, and once you grasp that, you can start asking lots of questions about the relationship of humans to the natural environment and, comparatively, how are similar ecosystems different across the continents and what causes particular species to be found where they are, sort of what are their ecosystem or environmental requirements? The map can help us begin to ask a lot of questions about our natural world and our place in it as human beings.