Using Google Earth to identify sites associated with slavery
Next week, I’ll be posting Earthbound’s first-ever guest article! It will be written by Dr. Andrew Feldman, who is a Postdoctoral Research Fellow at NASA Goddard Space Flight Center. He was a top 20 reviewer for Remote Sensing of Environment, and I asked him to provide some advice and guidance for those who are new (or relatively new) to the world of earth observation and/or publishing. This will be a great read if you’re early in your academic career or work in the earth observation space. Subscribe to get his guest article directly in your inbox!
This week, we look at a paper that uses earth observation data to help address Sustainable Development Goal 8.7: "Take immediate and effective measures to eradicate forced labour, end modern slavery and human trafficking and secure the prohibition and elimination of the worst forms of child labour, including recruitment and use of child soldiers, and by 2025 end child labour in all its forms."
This is the first time I’ve seen earth observation data being used for this purpose, and it highlights the wide-ranging applications of these types of data. If you’re interested in reading the full paper, you can find it here.
A few notes for context:
This paper focuses on an area called the “Brick Belt,” a region of Asia with a high concentration of brick kilns. These kilns are used to manufacture bricks, and the work is typically done by modern-day slaves.
The distribution of these kilns is uneven, and according to the authors of this paper, locating them can be a first step toward preventing slavery in the first place.
Brick kilns are one of several slavery-associated industries whose sites can be identified using earth observation data. Other examples include the following:
Fish farms in mangrove forests
Mining excavation
Currently, these kilns' locations can be crowd-sourced from aerial imagery. However, if we can apply an ML algorithm to detect their locations automatically, this could reduce the manpower needed to maintain this much-needed inventory on a continuous basis.
This paper identifies three technological advancements that made this work possible:
High-resolution satellite imagery
Crowd-sourced information for model validation
Advanced machine-learning image classifiers
Methods
They focused their modelling efforts on trying to classify “Bull’s trench” brick kilns; this is a type of brick kiln that has a characteristic oval shape.
The need for high-resolution (sub-metre) imagery is especially important given the size of the brick kiln sites: a single site would encompass just one cell of a Landsat satellite image, far too coarse to resolve its shape.
The images used in this classification study were RGB images from Google Earth
The model was trained using site images from 2003-2016 across a region of Rajasthan, India
The accuracy assessment (used to test the model’s performance) was conducted using 2017 imagery of 178 brick kilns
There were three steps to the analysis:
Images were classified using the Faster R-CNN machine learning model, a deep convolutional network for detecting a specified object. Here, it was used to identify whether or not an image contained a brick kiln.
Then, the outputs from step 1 were reclassified using a second machine learning model, a convolutional neural network (CNN). This secondary model refined the Faster R-CNN predictions even further. Basically, it reduced the commission errors from step 1 (i.e. non-kiln sites incorrectly identified as kiln sites).
An accuracy assessment was conducted using crowd-sourced ground-reference data from known kiln sites.
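The two-stage workflow above can be sketched as a simple filtering pipeline. This is a minimal illustration, not a reimplementation of the paper's models: the tiles and scores below are invented, and the real stages are a Faster R-CNN detector and a CNN re-classifier rather than these toy threshold functions.

```python
def stage1_detect(tiles, threshold):
    """Stage 1: keep tiles whose detector score exceeds the threshold
    (stands in for the Faster R-CNN kiln detector)."""
    return [t for t in tiles if t["rcnn_score"] > threshold]

def stage2_refine(candidates, threshold):
    """Stage 2: re-score the surviving candidates with a second classifier
    to prune commission errors (non-kilns flagged as kilns)."""
    return [c for c in candidates if c["cnn_score"] > threshold]

# Hypothetical image tiles, each with a made-up score from each stage.
tiles = [
    {"id": 1, "rcnn_score": 0.92, "cnn_score": 0.95},  # clear kiln
    {"id": 2, "rcnn_score": 0.61, "cnn_score": 0.20},  # false alarm, pruned in stage 2
    {"id": 3, "rcnn_score": 0.55, "cnn_score": 0.85},  # borderline kiln, kept
    {"id": 4, "rcnn_score": 0.30, "cnn_score": 0.99},  # rejected in stage 1
]

candidates = stage1_detect(tiles, threshold=0.5)   # ids 1, 2, 3 survive
kilns = stage2_refine(candidates, threshold=0.5)   # ids 1, 3 remain
print([k["id"] for k in kilns])  # [1, 3]
```

Note how tile 2 shows the value of the second stage (a false alarm gets pruned), while tile 4 shows its limitation: anything rejected in stage 1 never gets a second look.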
How did the model perform?
As someone who hasn’t conducted image classification before, the workflow described here was very interesting.
They set a probability threshold when running the Faster R-CNN, so only images whose probability of containing a brick kiln exceeded the threshold were classified as kilns. They describe two thresholds: 0.8 and 0.5.
With a threshold of 0.8, a large number of kiln sites were omitted. When the threshold was lowered to 0.5, all the known kilns in the area were correctly included, but many non-kiln sites were misclassified as kilns.
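The trade-off between the two thresholds can be made concrete with a few invented scores (these numbers are illustrative only, not from the paper):

```python
def confusion_counts(scores, labels, threshold):
    """Count omissions (true kilns scored at or below the threshold) and
    commissions (non-kilns scored above it)."""
    omissions = sum(1 for s, y in zip(scores, labels) if y and s <= threshold)
    commissions = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    return omissions, commissions

# Hypothetical detector scores; True = the site really is a kiln.
scores = [0.95, 0.70, 0.55, 0.60, 0.52, 0.45]
labels = [True, True, True, False, False, False]

print(confusion_counts(scores, labels, 0.8))  # (2, 0): strict, misses kilns
print(confusion_counts(scores, labels, 0.5))  # (0, 2): lenient, false kilns kept
```

Raising the threshold trades commission errors for omission errors, which is exactly the pattern the authors observed between 0.8 and 0.5.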
To reduce the number of non-kiln sites being included, they ran the secondary CNN model. This improved the model’s performance, reducing the number of non-kiln sites classified as ‘kilns’ to 9.
However, this secondary classification step resulted in some known kilns being misclassified as ‘non-kilns.’ This is more problematic than false-positive errors because it could lead to kilns being missed or ignored during any on-the-ground activity. This makes sense: it’s better to misclassify non-kiln sites as kilns than the reverse.
Therefore, a secondary classification algorithm should be used with caution: trading commission errors (including non-kilns) for omission errors (missing real kilns) would be deeply problematic given the context of this work.
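In accuracy-assessment terms, this trade-off shows up as producer's accuracy (recall, penalised by omissions) versus user's accuracy (precision, penalised by commissions). A minimal sketch, where the 178 reference kilns and 9 commissions come from the paper but the detected count of 170 is an invented placeholder:

```python
def assessment(true_kilns_found, true_kilns_total, predicted_kilns):
    """Producer's accuracy (recall) falls with omissions;
    user's accuracy (precision) falls with commissions."""
    producers = true_kilns_found / true_kilns_total
    users = true_kilns_found / predicted_kilns
    return producers, users

# Hypothetical: 170 of the 178 reference kilns detected, plus 9 commissions.
producers, users = assessment(170, 178, 170 + 9)
print(f"producer's accuracy: {producers:.2f}")  # 0.96
print(f"user's accuracy: {users:.2f}")          # 0.95
```

For this application, the argument above says producer's accuracy is the metric to protect: a missed kiln is a missed chance to intervene, while a false alarm merely costs verification effort.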
Although this is an older paper, I thought it was a great reminder of the power of EO data. I encourage you to read the full paper and let me know what you think!
Thanks for reading! If you found value in what you read here, feel free to share and subscribe to stay up to date with the latest earth observation research.