Global River Data

1. Introduction
2. Data
3. Yearly plots of land cover near rivers
- 3.1. Aside: saving a series of images
4. Statistics
- 4.1. The Delta's canals
5. References

Figure 1: WWF HydroSHEDS Free Flowing Rivers Network v1. Only rivers with RIV_ORD values between 1 and 4 inclusive are shown, drawn in increasingly dark shades of blue. Continents are not drawn, but they can easily be inferred from the beautiful structure of the river networks; likewise, arid regions are easily identified. The plotted lines do not take river width into account.

1. Introduction

For my second foray into Google Earth Engine, I decided to look at rivers, and in particular what I could do by combining the land cover data with rivers. Combining datasets opens up many new possibilities and the options quickly become overwhelming, so to keep things simple for now I aim to answer the following question: how have croplands changed near rivers over time? The question is perhaps too simple, because while it might give some indication of how productive land near rivers is, it cannot tell us whether the quality of the crops is better or worse, what kind of crops are being grown, or if the changes are compensated for elsewhere (e.g. near sources of groundwater). But it can give us an insight into changes in economic activity. If, for example, cropland near rivers disappears, what does that mean for the local farming population? What does it imply about the types of food being cultivated, or how that might affect markets? Since agriculture is often reliant on rivers, particularly in arid regions of the world, such as along the River Nile in Egypt, radical changes in land use could be an indicator of a changing economic, and thus likely social, situation. Of particular interest, if cropland is expanding upstream, what might that tell us about agriculture downstream, and in particular in neighbouring states?

The questions here are rather vague. We cannot hope to make economic forecasts, or geopolitical predictions, but since everything about human existence revolves around water, food and land, asking them might give insight into where we might expect to see changes as a consequence of how local economies respond to changes in both climate and neighbouring economies.

As before, this article is as much about using Google Earth Engine as the results, so I'll be discussing the technical aspects of the code as well as the data. I'll be producing maps and statistics, each product requiring its own approach.

2. Data

For the cropland I will continue using the MODIS data as before, which gives us 23 years of yearly estimates of land cover. Likewise, for the geopolitical boundaries I will continue to use the geoBoundaries data. For the rivers, I chose the WWF HydroSHEDS Free Flowing Rivers Network v1 dataset (Grill et al. 2019) mainly because of the permissive licence. The only downside when compared to some other datasets available in GEE is that it was derived from a single year's data, so it will not represent changes in flow rate over time. Indeed, rivers can also change position over time, but not enough to worry me for the rather coarse statistics I will be calculating. River flow rate is seasonal, but long term trends would be worth exploring as they could represent climatic change, or change in usage upstream. I will defer this to a future test.

The key feature of the HydroSHEDS dataset for my purposes is the RIV_ORD attribute. The rivers are mapped according to their "long term average discharge" in cubic metres per second, subject to a logarithmic progression where the lowest numbers refer to the highest flow. Thus a discharge of 100,000 m³ s^-1 or more is attributed the number 1, and a discharge of between 10,000 and 100,000 m³ s^-1 is attributed the number 2 etc, down to the number 10 for discharges of less than 0.001 m³ s^-1. A map of these rivers is shown in figure 1 for RIV_ORD values 1 through 4.
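The logarithmic progression can be written down in a few lines. The following is a minimal sketch in plain JavaScript, not something provided by the dataset or by GEE: the function name rivOrdFromDischarge is hypothetical, and treating each lower bound as inclusive is an assumption, since only the ranges are described above.

```javascript
// Lower bounds (m³ s^-1) of RIV_ORD classes 1 to 9; anything smaller is class 10.
// Assumption: a discharge exactly on a bound falls into the higher-flow class.
var LOWER_BOUNDS = [100000, 10000, 1000, 100, 10, 1, 0.1, 0.01, 0.001];

function rivOrdFromDischarge(discharge) {
  for (var i = 0; i < LOWER_BOUNDS.length; i++) {
    if (discharge >= LOWER_BOUNDS[i]) {
      return i + 1;
    }
  }
  return 10;
}

var exampleOrders = [250000, 50000, 500, 0.0005].map(rivOrdFromDischarge);
// exampleOrders is [1, 2, 4, 10]
```

A river discharging 500 m³ s^-1 therefore lands in class 4, the largest RIV_ORD value plotted in figure 1.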

3. Yearly plots of land cover near rivers

We already know how to use the land cover data, so the additional steps will involve filtering to keep only cropland and masking it when it is further than a given distance from a river. The river data is contained in a feature collection where each feature consists of a LineString (a connected set of vertices tracing the path of a segment of river) with a set of properties, including the RIV_ORD attribute which we will use to filter with. These data are therefore different to the geometries and images we have been using up to now.

Since not all rivers are equal, we will only consider those above a certain discharge rate. This choice of where to draw the line is a little arbitrary, but the river networks quickly become so dense that the question of distance to a river becomes irrelevant. I tried this, and apart from the unreasonable computational load, if all rivers are mapped then even the Sahara appears to be richly served by rivers. Including these usually dry river beds is clearly absurd. As a first approximation I decided to set the cutoff at a RIV_ORD value of 4, corresponding to rivers with an average discharge of between 100 and 1,000 m³ s^-1. My litmus test was that the Murray River should be included as it counts as a significant river in south-eastern Australia, serving as a source of water, a means of transport and a state border. Many other smaller rivers in the Murray-Darling basin, such as the Goulburn, count as its tributaries and are of fundamental importance in local agriculture, though are not big enough to be included at this cutoff value. A future refinement to this work could be to descend to the next level of discharge rates. The filtering is easy enough:

// Get rivers above a certain threshold
var bigrivers = rivers.filter(ee.Filter.lessThanOrEquals("RIV_ORD", 4))

To create the mask, I made use of the convenient distance function which applies to feature collections. It generates a double precision image where each pixel's value is a measure of its distance from the nearest point, line or polygon in the feature collection. We are not yet interested in every pixel's distance, but by supplying it a cutoff distance of 100,000 m, this will define our mask, since all values beyond the cutoff are masked off.

var river_distance = bigrivers.distance(100000)

Now we need to apply this image as a mask to the MODIS land cover data to isolate cropland within the cutoff. Because the MODIS data forms an image collection (consisting of one image per year) we need to create a function to map over each image. Here is the function definition and its invocation on the MODIS data, contained in the variable lcmode:

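var mask_dist = function(image){
  // Keep only land cover pixels within the cutoff distance of a big river
  var nearRiver = image.mask(river_distance)
  // Mask everything that is not cropland (LC_Type1 value 12)
  nearRiver = nearRiver.mask(nearRiver.where(nearRiver.neq(12), 0))
  // Return the distance image, masked to cropland near rivers
  return river_distance.mask(nearRiver)
}
var cropDistances = lcmode.map(mask_dist)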

The first line in the function mask_dist takes the image and masks it using the river_distance image, such that only land cover values within the cutoff distance are preserved. The next line further masks off all land cover values that are not cropland using the where method: anything not equal to 12, which is the value identifying cropland in the MODIS LC_Type1 data, gets set to zero and is therefore masked. Now we could simply return this value and recover only cropland within 100 km of "big" rivers, but I have added an extra step, applying the mask back onto the river_distance variable. What this gives us is an image showing all cropland within 100 km of a "big" river plus its distance from the nearest river, an additional piece of information which will come in handy later.
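To make the intent of the three masking steps concrete, here is a per-pixel sketch in plain JavaScript. It is only an illustration of the logic described above, not GEE code: the function cropDistancePerPixel and the use of null to stand for a masked pixel are assumptions made for this example, and it ignores the details of how GEE itself stores and combines masks.

```javascript
var CROPLAND = 12; // MODIS LC_Type1 value identifying cropland

// landCover and distance are parallel arrays with one entry per pixel.
// A distance of null stands for a pixel beyond the cutoff (masked off).
function cropDistancePerPixel(landCover, distance) {
  return landCover.map(function (lc, i) {
    var withinCutoff = distance[i] !== null;
    var isCropland = lc === CROPLAND;
    // Keep the distance only where both conditions hold
    return withinCutoff && isCropland ? distance[i] : null;
  });
}

var samplePixels = cropDistancePerPixel(
  [12, 12, 10, 16],            // cropland, cropland, grassland, barren
  [2500, null, 40000, 90000]   // distances in metres, null = too far
);
// samplePixels is [2500, null, null, null]
```

Only the first pixel survives: it is cropland and within the cutoff, so its distance from the river is what ends up in the output.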

The resultant image looks like figure 2 below.

Figure 2: Combining WWF HydroSHEDS Free Flowing Rivers Network v1 and the MODIS LC Type 1 land cover data, showing only cropland located within 100 km of a river. The colour darkens with distance from a river. Only rivers corresponding to RIV_ORD values of between 1 and 4 inclusive. The image shown here is for 2022.

Though this reflects the river network displayed in figure 1, let's take a moment to consider it more closely. In particular, notice that India (and indeed Europe) is quite densely covered by cropland, whereas the Nile's banks comprise only a thin sliver of cropland. Zooming in we find that for most of its length, there is cropland only up to a maximum distance of about 10 km either side of the river. Now comparing with India is not an apples-to-apples comparison, since India's river network is much denser and, importantly, India is also much wetter, benefitting from the South Asian monsoon. But that is the point: despite having arguably the most legendary river known to western culture, which helped propel a powerful civilisation for millennia, Egypt's food and water supply is precariously dependent on essentially this one body of fresh water, whose source is in a distant, foreign land.

3.1. Aside: saving a series of images

I have produced a collection of images, each like that of figure 2, with one image produced per year of MODIS data. There is no convenient way to save an image collection, so we have to loop through each image in the collection and export each image individually. Normally in GEE this would be done with a function mapping, but we actually have to do this on the client, as this is where the export function operates, so a regular JavaScript loop is necessary. To loop over the image collection we need to convert it into a list first:

var listOfImages = cropDistances.toList(cropDistances.size())

Then the loop is

var n = listOfImages.size().getInfo()
for (var i = 0; i < n;  i++) {
  var im = ee.Image(listOfImages.get(i))
  var datestr = im.get("system:index").getInfo()
  Export.image.toDrive({
    image: im,
    description: 'crop_distance_full_'+datestr,
    crs: "EPSG:4326",
    region: geometry,
    scale: 10000
  });
}

Note the use of .getInfo() to procure data from GEE into the client side, and the casting of each image from the list as an ee.Image object, the expected input to the Export.image.toDrive function. Loops like this are discouraged in GEE, but since this is a client-side operation there is no other option, nor does it pose any problem in terms of efficiency.

4. Statistics

The images produced in the last section can be converted into NetCDF files using, as I have, Xarray in Python, and exploited however one likes. While they might be interesting to look at, it is more useful to generate statistics for some quantitative analysis. This is best done within Google Earth Engine. To begin, we do much the same as in the script for the yearly maps, filtering the cropland within a threshold distance of the rivers with the distance method. Then, just as for the land cover statistics, we use reduceRegions to calculate statistics for a given region. This time I didn't just want a simple sum; I wanted to take advantage of the distance values attributed to the cropland pixels. It would therefore be nice to perform a sum for distinct distance intervals from the rivers. The ee.Reducer class includes histogram functions, of which fixedHistogram is the most appropriate for our needs. By predefining the intervals relative to the cutoff distance like so,

var minDistance = 0
var maxDistance = 100000
var sliceWidth = 10000
var slices = (maxDistance-minDistance)/sliceWidth

we can construct a histogram through reduceRegions:

var genHist = function(image){
  var histograms = image.reduceRegions({
    reducer: ee.Reducer.fixedHistogram({
      min: minDistance,
      max: maxDistance,
      steps: slices,
      cumulative: true
    }).splitWeights(),
    collection: geolabFiltered,
    scale: 1000
  })
  return histograms
}

The splitWeights() method is used to weight the sums by the pixel area, and it requires some explanation, partly because it was unnecessary in the land cover sums from before (they were simply sums of pixels multiplied by area), but especially since the GEE Reference manual's description of it is so unhelpful:

Returns a Reducer with the same outputs as the given Reducer, but with each weighted input replaced by two unweighted inputs. (Google for Developers n.d.--a)

The Guide is slightly better, stating that

A reducer modified by splitWeights() takes two inputs, where the second input is the weight (Google for Developers n.d.--b)

But how those inputs should be arranged might not be clear to someone not used to GEE's paradigm, though it is simple enough: the weights, in this case the area per pixel, should be the second band in the image. There is no way to specify the band explicitly as an argument to splitWeights so this requires us to manually place the weight in that second position, if it isn't (as in my case) already. This too is easy enough, one simply selects the bands in the desired order. The generation of the pixel areas as an extra band is as before

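// Despite its name, this adds a pixel-area band ("area") to each image;
// the area band comes first, followed by the image's own bands
var add_country_band = function(feature){
  return ee.Image.pixelArea().addBands(feature)
}
var riverDistArea = cropRiverDist.map(add_country_band)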

And we select the bands with "area" second with

riverDistArea = riverDistArea.select(["distance", "area"])

We can now map the genHist function defined above onto riverDistArea and splitWeights() will do its job,

var hist = riverDistArea.map(genHist)
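What the weighted histogram computes can be illustrated without GEE. The sketch below is plain JavaScript under stated assumptions: weightedHistogram is a hypothetical helper, it returns only the per-bucket sums rather than the table of bucket edges and sums that GEE produces, and it assumes that values outside the range [min, max) are ignored.

```javascript
// Sum the weights (pixel areas) of all values (distances) falling in each
// fixed-width bucket, optionally accumulating the sums from left to right.
function weightedHistogram(distances, areas, min, max, steps, cumulative) {
  var width = (max - min) / steps;
  var sums = [];
  for (var b = 0; b < steps; b++) {
    sums.push(0);
  }
  for (var i = 0; i < distances.length; i++) {
    var d = distances[i];
    if (d < min || d >= max) {
      continue; // outside the histogram range
    }
    sums[Math.floor((d - min) / width)] += areas[i];
  }
  if (cumulative) {
    for (var k = 1; k < steps; k++) {
      sums[k] += sums[k - 1];
    }
  }
  return sums;
}

// Five made-up pixels: distance from a river (m) and area (arbitrary units)
var demoDistances = [500, 9500, 12000, 35000, 120000];
var demoAreas = [1, 2, 4, 8, 16];
var perInterval = weightedHistogram(demoDistances, demoAreas, 0, 100000, 10, false);
var runningTotal = weightedHistogram(demoDistances, demoAreas, 0, 100000, 10, true);
// perInterval is  [3, 4, 0, 8, 0, 0, 0, 0, 0, 0]
// runningTotal is [3, 7, 7, 15, 15, 15, 15, 15, 15, 15]
```

The pixel at 120,000 m contributes nothing, which mirrors the cutoff: only area within 100 km of a river is counted.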

The end result is a histogram of cropland area according to distance intervals from rivers, per country. I chose to do cumulative sums, and set the scale to 1 km. Here is an extract of the table with the cumulative areas as columns for each country, for every year of the MODIS data, where I've only shown the columns for distances up to 50 km from a river. Column headings are distances in metres and areas are in square metres.

shapeGroup	shapeName	year	10000	20000	30000	40000	50000
IRQ	Iraq	2001	14539613522	25185212387	31951492772	36890334426	40757415808
IRL	Ireland	2001	118540933	314010840	607583612	991345647	1250515223
KAZ	Kazakhstan	2001	3160391855	7043131191	10811377877	13728018211	15736434625
KEN	Kenya	2001	1494156322	2366870586	3544441330	4781636392	5805470273
PRK	Korea, North	2001	4582730332	9303096326	13100694731	16642759101	20110070920
KOR	Korea, South	2001	3887985243	6787127149	9471448552	12012928405	14073946008
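Because these columns are cumulative, the area within each individual 10 km interval can be recovered by differencing neighbouring columns. The following plain JavaScript sketch does this for the Iraq row above and converts the result to square kilometres; the helper name cumulativeToIntervalsKm2 is hypothetical, and this is one possible way to post-process the table rather than a record of how it was actually done.

```javascript
// Turn cumulative areas (m²) into per-interval areas (km²)
function cumulativeToIntervalsKm2(cumulativeM2) {
  return cumulativeM2.map(function (value, i) {
    var previous = i === 0 ? 0 : cumulativeM2[i - 1];
    return (value - previous) / 1e6; // 1 km² = 1e6 m²
  });
}

// Iraq, 2001: cumulative cropland area up to 10, 20, 30, 40 and 50 km
var iraq2001 = [14539613522, 25185212387, 31951492772, 36890334426, 40757415808];
var iraqIntervals = cumulativeToIntervalsKm2(iraq2001).map(function (x) {
  return Math.round(x);
});
// iraqIntervals is [14540, 10646, 6766, 4939, 3867]
```

Roughly 14,500 km² of Iraq's cropland in 2001 lay within 10 km of a big river, with progressively less in each interval further out.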

Of course these statistics are most interesting when viewed as a function of time. Seeing as we've discussed it, let's look at Egypt's statistics for the full period.

year	0	10000	20000	30000	40000	50000	60000	70000	80000	90000	total	countrywide
2001	9186	3768	2300	2151	1935	1944	1631	1391	1179	976	26465	27143
2002	9229	3786	2305	2160	1950	1950	1641	1378	1196	981	26582	27271
2003	9220	3808	2307	2152	1922	1939	1629	1386	1206	988	26561	27287
2004	9187	3790	2302	2156	1913	1930	1623	1403	1207	988	26504	27229
2005	9157	3760	2294	2148	1911	1925	1618	1404	1214	997	26431	27204
2006	9158	3761	2298	2149	1907	1932	1625	1405	1216	1004	26460	27281
2007	9183	3789	2299	2135	1895	1933	1621	1390	1214	1005	26468	27296
2008	9223	3846	2313	2136	1901	1940	1621	1400	1218	1017	26620	27483
2009	9278	3899	2323	2150	1909	1939	1630	1412	1222	1026	26795	27705
2010	9274	3889	2318	2148	1906	1945	1636	1406	1234	1034	26792	27706
2011	9311	3907	2325	2153	1918	1942	1652	1408	1230	1046	26896	27821
2012	9336	3951	2357	2180	1961	1961	1678	1420	1231	1041	27120	28091
2013	9353	3987	2381	2194	1980	1972	1693	1457	1241	1047	27310	28334
2014	9346	4005	2380	2190	1961	1972	1707	1461	1246	1050	27322	28385
2015	9318	4011	2383	2179	1960	1973	1715	1460	1252	1052	27307	28301
2016	9327	4053	2380	2193	1988	1973	1714	1467	1268	1060	27428	28476
2017	9332	4089	2385	2195	1992	1974	1718	1472	1268	1059	27488	28560
2018	9335	4115	2401	2204	1994	1974	1714	1491	1275	1062	27571	28646
2019	9373	4170	2417	2226	2006	1994	1728	1510	1277	1069	27775	28850
2020	9430	4216	2442	2242	2020	2001	1733	1523	1296	1079	27986	29134
2021	9429	4228	2441	2249	2019	1992	1730	1512	1297	1083	27982	29119
2022	9417	4233	2433	2240	2015	1992	1735	1496	1295	1081	27941	29101
2023	9460	4268	2450	2239	2027	1998	1734	1526	1310	1096	28113	29320

I've converted the areas from square metres to square kilometres; the column headings remain distances in metres. Each column shows the surface area of cropland for the 10 km interval starting at the column's heading, so the first column shows the area up to 10 km from rivers, the second between 10 and 20 km and so on. The "total" column is the sum of all the interval columns (to within rounding), and the "countrywide" column shows the total cropland for all of Egypt, whose calculation was discussed in the previous article. We can more easily read this as a plot where we subtract the 2001 value from each column.

Figure 3: Egypt, change in surface area for each distance interval over the MODIS data period, where from each distance interval I have subtracted its value in 2001 to show its absolute increase.

This gives us an indication of each distance interval's total gain and rate of change. It shows that over the 22 years of the MODIS data, cropland increased in surface area in every one of those intervals, and that indeed most of the cropland is within 100 km of the Nile or its Delta. Some caution should be exercised in interpreting exactly how much lies outside the 100 km limit, since recall that we have only measured from large rivers, and it is entirely possible that many small streams are used for crop irrigation. It is also possible (and indeed a quick visual verification bears this out) that the cropland outside the 100 km limit is still close to it. By consulting a map, we find that most of the cropland is in the Delta. Either way, the proportion outside the limit (comparable to the last few intervals) is small enough that it won't have a major effect on the conclusions we derive. I'll come back to the Delta, because although much of it is criss-crossed with canals, these canals do not all show up in the river data.

The greatest increase in cropland is in the 10–20 km interval. This reflects two things: that land close to the river is more valuable cropland (easier to irrigate), but also that the land closest to the river is probably saturated. Indeed, from the table we see that the number of square kilometres of cropland in the interval closest to the river is well over double that of this second interval even at the end of the period, and that the second interval comprises almost a third of the total increase up to 100 km. It is interesting to find that some intervals actually lose substantial amounts of cropland at various periods. This could be because the land is temporarily devoid of crops or for some reason the signature is unclear and fools the data processing algorithm, or alternatively the land is put to use for something else and later on new cropland is opened up. Without looking at the satellite images in detail this is not possible to ascertain.
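These figures can be checked directly from the first and last rows of the table. The sketch below, in plain JavaScript, uses the 2001 and 2023 values copied from the table above; the variable names are chosen for this example only.

```javascript
// Cropland area (km²) per 10 km interval, copied from the Egypt table
var egypt2001 = [9186, 3768, 2300, 2151, 1935, 1944, 1631, 1391, 1179, 976];
var egypt2023 = [9460, 4268, 2450, 2239, 2027, 1998, 1734, 1526, 1310, 1096];

// Gain in each interval over the period
var gains = egypt2023.map(function (area, i) {
  return area - egypt2001[i];
});
var totalGain = gains.reduce(function (a, b) { return a + b; }, 0);

// Index of the interval with the largest gain (1 means 10 to 20 km)
var largestIndex = gains.indexOf(Math.max.apply(null, gains));
var shareOfLargest = gains[largestIndex] / totalGain;

// Ratio of the nearest interval to the second interval in 2023
var nearestToSecond = egypt2023[0] / egypt2023[1];
// gains is [274, 500, 150, 88, 92, 54, 103, 135, 131, 120]
```

The 10 to 20 km interval gains 500 km² out of 1,647 km² summed over the intervals, or about 30%, and the nearest interval still holds about 2.2 times as much cropland as the second in 2023. The summed gain differs by 1 km² from the difference of the "total" column because the table values are rounded.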

Let us leave aside for a moment the question of distance to major waterways, whose application is, admittedly, limited. There are many reasons to release more land for agriculture: economic incentives to produce more agricultural products, such as for export, or to compensate for possibly less reliable growing seasons in a changing climate. There are also several factors that may contribute to the feasibility of generating more cropland, such as improved water management, increased fresh water transport by the Nile, more access to labour or more efficient farming techniques. But next to these incentives we must give centre stage to Egypt's positively exploding population: from the beginning to the end of the MODIS data period it increased from around 70 million to approaching 120 million, and it is currently climbing by about 2 million people per year (World Population Review n.d.). The relatively modest increase in cropland, about 8% from 2001 to 2023, on its own would not be sufficient to feed these extra people, so either the farming would have to have become much more efficient, or the import–export relationship would have to have changed substantially; or perhaps a combination of the two. It seems unlikely to me that farming efficiency has increased that much in just 22 years, so the economic and social implications of this increase are dramatic. With the population set to increase still further, Egypt is certainly facing some challenges. But if the current landscape is anything to go by, with all the cropland (and population) thus far being concentrated so close to the Nile, and moreover dependent on it as the only major supply of fresh water, it would appear that expansion of food production will be among the greatest of these challenges.
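The mismatch between the two growth rates is easy to quantify. This is a rough sketch in plain JavaScript: the cropland figures are the "countrywide" values for 2001 and 2023 from the table, while the population figures are only the approximate ones quoted above (70 and 120 million, the latter being an upper bound), so the per-person numbers are indicative at best.

```javascript
function percentChange(before, after) {
  return (after - before) / before * 100;
}

// Countrywide cropland (km²) from the table
var cropland2001 = 27143;
var cropland2023 = 29320;
// Approximate populations quoted in the text (assumed round numbers)
var population2001 = 70e6;
var population2023 = 120e6;

var croplandChange = percentChange(cropland2001, cropland2023);
var populationChange = percentChange(population2001, population2023);

// Cropland per person in m² (1 km² = 1e6 m²)
var perPerson2001 = cropland2001 * 1e6 / population2001;
var perPerson2023 = cropland2023 * 1e6 / population2023;
```

With these inputs cropland grows by about 8% while the population grows by up to roughly 70%, so cropland per person falls from about 390 m² to about 240 m².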

4.1. The Delta's canals

Here's a nice reminder of why data should be treated with caution. When looking at the satellite images, we can see that there are canals carrying water throughout the Delta. Here is an example image taken from Google Maps of Abu Kabir, in the south-eastern part of the Delta, about 80 km from Cairo:

Figure 4: Abu Kabir, annotated screenshot from Google Maps.

I've annotated the map indicating some of the canals. Now look at the same area using our GEE river data, making sure to map all sizes of river.

Figure 5: Abu Kabir and surrounds with rivers plotted. Rivers of all sizes have been plotted.

Those canals are missing, and worse than that, the "river" to the north, just west of Ezbet El-Soultan Hasan, doesn't appear anywhere on the satellite image! One wonders what else is missing, displaced or erroneously present. Happily for me, this isn't a study. It's even part of the point of these exercises to see what the limits of the data are, and to think about how to exploit them without falling into a trap. Fortunately, our statistics shouldn't be too badly affected, since the density of canals is such that within the Delta we are more often than not within 10 km of a river.

5. References

Google for Developers. (n.d.--a). Google Earth Engine API Reference. Retrieved from https://developers.google.com/earth-engine/apidocs/ee-reducer-splitweights

Google for Developers. (n.d.--b). Google Earth Engine JavaScript and Python Guides. Retrieved from https://developers.google.com/earth-engine/guides/reducers_weighting

Grill, G., Lehner, B., Thieme, M., … Zarfl, C. (2019). Mapping the world’s free-flowing rivers: datasets and materials. Nature. https://doi.org/10.6084/m9.figshare.7688801

World Population Review. (n.d.). Egypt Population 2024 (Live). Retrieved from https://worldpopulationreview.com/countries/egypt