Estimating oil reserves from high-resolution satellite imagery has become rather fashionable in our budding geospatial-analytics-from-space industry. Oil is typically stored in tanks with floating roofs, therefore the fill of the oil tank can be estimated from the shadow cast on the inside of the tank as the lid sinks. A pretty neat idea.
How are oil tanks, filled to various degrees, detected? With sufficient training data, a neural network can probably learn to identify them. We have seen the power and flexibility of artificial intelligence algorithms in the past, when we successfully used the same neural network architecture to identify properties with pools in Australia and remote villages in Nigeria. However, training and deploying a well-performing model is expensive, both in time and money. Yes, computation on the cloud is relatively cheap, but the costs start to add up if you’re doing continental-scale feature detection.
Oil tanks are round, they look like bright disks when they are filled, and they are relatively big. Can we take into account these properties and come up with an image filtering algorithm to detect them? Enter Protogen. Protogen (for PROTOcol GENerator) is a geospatial image analysis and processing software suite developed at DigitalGlobe, which uses state of the art hierarchical image representation structures (so called ‘trees’) for the efficient organization, access and retrieval of the image information content. In the PROTOGEN max-tree, oil tanks appear as nodes which can be singled out based on attributes such as size and compactness.
Oil tank detection.
This is what filtering of a small image chip from Houston, TX, to detect oil tanks looks like in Protogen:
import protogen # Speficy protocol family and method p = protogen.Interface('max_tree', 'filter') # Configure method p.maxtree.filter.filtering_rule = 'subtractive' p.maxtree.filter.tree_type = 'max_tree' # Specify attributes p.athos.tree_type = 'max_tree' p.athos.area.usage = ['remove if outside'] p.athos.area.min = [100.0] p.athos.area.max = [3500.0] p.athos.compactness.usage = ['remove if less'] p.athos.compactness.min = [0.97] # Specify input image p.image = 'houston.tif' # Execute protogen with current configuration p.execute()
And this is the result:
Max-tree filtering at its finest.
This collection of Python statements:
- specifies the protocol family (
max_tree) and method (
- configures the
- specifies the set of attributes and their range; in this case, we are keeping the nodes of the max-tree with area between 100m2 and 3500m2, and with compactness larger than 0.97 (a compactness of 1.0 corresponding to a perfect disk);
- specifies the input image (
- and finally executes the protocol instance.
Note that these operations are based exclusively on the panchromatic (grayscale) image and, for an image of this size, are practically instantaneous. Here is the result of executing the same code on another chip from the same area:
The varying degrees of brightness and the ladders connecting the oil tanks affect the filter performance.
Note how we’re missing a number of the darker tanks. Moreover, the presence of ladders on and between tanks reduces the compactness of the corresponding nodes in the max-tree. This results in chunks of the tanks missing in the filtered image.
If we’re interested only in the larger oil tanks, all we need to do is increase the minimum value of the acceptable area. Here is the result if we set this value to 1000 (i.e., a radius of approximately 18m)
Just big oil tanks.
If we want to increase recall, we can decrease the compactness. For example, setting the minimum compactness to 0.8:
We have found more tanks but took a hit in precision.
We have detected more oil tanks but also picked up some noise due to objects which are relatively compact, but are not disks; the inevitable precision/recall tradeoff. We can remove this noise in a number of ways, one being feeding these results to our crowd to weed out the false positives. You can imagine this workflow at scale: Protogen detects oil tank candidates on an entire strip then the crowd cleans up the results. Much faster than having the crowd scan the entire strip; much more accurate than doing it just with Protogen. Another version of the crowd/machine combo we’ve deployed in the past.
Protogen also includes a vectorization module which derives a geojson with the bounding boxes of the detected oil tanks:
Oil tank bounding boxes.
Having vectors makes it easier to count. According to Protogen, there are 133 oil tanks (give or take!) in this image segment.
Deploying at scale
We can use Protogen to run fast experiments locally and pick the desired attribute values. How about deploying our code not just on an image chip but on a number of entire image strips? This requires creating a GBDX task, i.e., packaging our code up in a Docker container.
We’ve taken a slightly different approach here. We’ve created a generic protogen-runner task. This task takes as input an instance of the Protogen Interface class as a pickled string and simply executes it on the input image. This is what you need to do to run the max-tree on an entire WV2 strip over Cushing, Oklahoma (another major oil storage facility):
import gbdxtools from os.path import join import protogen import pickle import uuid # Create protogen interface as previously p = protogen.Interface('max_tree','filter') p.maxtree.filter.filtering_rule = 'subtractive' p.maxtree.filter.tree_type = 'max_tree' p.athos.tree_type = 'max_tree' p.athos.area.usage = ['remove if outside'] p.athos.area.min = [1000.0] p.athos.area.max = [3500.0] p.athos.compactness.usage = ['remove if less'] p.athos.compactness.min = [0.95] # Taskify it gbdx = gbdxtools.Interface() maxtree = gbdx.Task('protogen-runner') maxtree.inputs.pickle = pickle.dumps(p) # Specify input image maxtree.inputs.image = 's3://gbd-customer-data/32cbab7a-4307-40c8-bb31-e2de32f940c2/platform-stories/oil-tanks/image-cushing-pan' # Run gbdx workflow and save results under platform-stories/trial-runs/random_str wf = gbdx.Workflow([maxtree]) random_str = str(uuid.uuid4()) output_location = join('platform-stories/trial-runs', random_str) wf.savedata(maxtree.outputs.output, output_location) wf.execute()
The workflow takes about 15min on our default r3.2xlarge instance.
Max-tree filtering for oil tank detection on the entire strip.
The orderly spots to the north and south of the image center correspond to oil tanks, while the randomly scattered spots are noise. Here is a close-up:
Pretty good recall and precision in this scene.
And another one:
This is a tough scene; yet the accuracy is acceptable.
We have created a dedicated oil tank extraction module in Protogen that includes a number of preprocessing steps targeted towards improving the performance of the max-tree filter specifically for oil tank detection, e.g., morphological filtering to remove dark spots in the oil tanks which otherwise reduce the compactness. We have also created a Jupyter notebook that walks you through finding an image of interest over Houston, TX, downloading a chip for experimentation with Protogen and deploying your code over a large area using protogen-runner to extract oil tanks; you can run the notebook locally with our GBDX notebook server.
The results are shown below (click here for a full page view). The slippy map was created by uploading the Houston image and the oil tank bounding boxes to Mapbox, and using Mapbox GL JS to display the corresponding raster and vector tilesets.
Oil tank detections in Baytown, TX, shown in green.
How can we improve accuracy? There are many avenues. An immediate step is to use Protogen’s Land Use Land Cover family of methods on the multispectral image, in order to filter out compact features on soil and water and other irrelevant classes that can not possibly contain water tanks. A more futuristic approach is to combine Protogen with Machine Learning. Stay tuned for updates!