Regrid netCDF data using harp and python
A simple process to create a stack of easy-to-use netCDF files.
Google Earth Engine is great, but sometimes you want to download and manipulate remote sensing data outside of its cloud-computing framework (although I’m doing this less and less as GEE’s documentation continues to improve).
In any case, it’s nice to know how to regrid netCDF files so that you aren’t completely dependent on external software and so that you have the flexibility to use packages and functions unavailable in GEE.
The ‘harp’ package in python allows us to do this with a couple of lines of code. HARP is a set of command line tools that lets you easily process satellite remote sensing files, creating comparable Level 3 files across many different data types. The developers have also converted these tools into python and R packages that provide all of the functionality of the original command line tools.
Before you start, you’ll need to download the netCDF files. If you’re using NASA EarthData, I would recommend reviewing my post on that topic. I’ll be working with Sentinel-5P data.
Once the files have been downloaded, we can process them in python. If you are working with a large number of files (multiple months or years), I would highly recommend using a virtual machine or cloud computing platform, because it can take quite a while (and require a lot of memory) to run it on your own local machine.
We need to import four python libraries:
from glob import iglob  # pattern-matching over file names
import sys              # not actually used below; safe to omit
import harp             # HARP python bindings for regridding netCDF files
import os               # used later to delete the original Level 2 files
If this is your first time using harp, you’ll have to install it. Luckily, you can use conda to install it, just like any other python package:
$ conda install -c conda-forge harp
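If you want to confirm the install worked before going further, a quick import from the command line will do it (any error here means your environment isn’t picking up the new package):
$ python -c "import harp"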
Once the libraries have been installed, we want to make a list of all the files with an “.nc” extension (i.e. all the Sentinel-5P files). You’ll notice that I’ve also included the head of the netCDF file name - “S5P_OFFL_L2” - so that python doesn’t try to process any Level 3 files (i.e. files with “L3” instead of “L2” in their names) that I have floating around in my folder.
file_names = sorted(iglob("S5P_OFFL_L2*.nc*", recursive=True))
This should give you a list of all the Level 2 netCDF products.
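As an optional sanity check, you can print how many files were matched and peek at the first few names to make sure the pattern caught what you expected:
print(len(file_names), "Level 2 files found")
print(file_names[:3])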
The next step requires a string of harp operations, each separated by a semicolon. You will need to spend some time figuring out which products you want to keep, derive, and modify. An explanation of the harp package can be found here. There is also a really helpful breakdown of these operations in the RUS Copernicus YouTube tutorials, linked here, which is where I originally learned about this package.
The harp “bin_spatial()” command will be used to create a new grid. I’ve chosen a step length of 0.02 degrees. This slightly over-samples the original TROPOMI resolution (5.5 km x 3.5 km), but it’s suitable for my work. I’ve limited the latitude and longitude extents to my study area, and it will grid the Sentinel data onto an 850 x 1000 cell grid.
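For reference, bin_spatial() takes the latitude edge count, starting latitude, and latitude step, followed by the same three values for longitude. The argument names below are just my shorthand for readability, not official HARP parameter names:
# bin_spatial(lat_edges, lat_start, lat_step, lon_edges, lon_start, lon_step)
# bin_spatial(850, 40, 0.02, 1000, -95, 0.02)
#   latitude:  850 edges starting at 40 degrees N, in 0.02-degree steps
#   longitude: 1000 edges starting at -95 degrees E, in 0.02-degree steps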
The keep() function defines which variables you want to keep. I only want the NO2 column density, time, latitude, and longitude (plus the cell bounds). If you want to keep additional variables, you can include them here.
harp_op = ("tropospheric_NO2_column_number_density_validity>50; "
           "bin_spatial(850, 40, 0.02, 1000, -95, 0.02); "
           "derive(latitude {latitude}); derive(datetime_stop {time}); "
           "derive(datetime_start {time}); derive(longitude {longitude}); "
           "keep(tropospheric_NO2_column_number_density,datetime_stop,datetime_start,"
           "latitude,longitude,latitude_bounds,longitude_bounds)")
This is a text string that will be used in the harp.import_product() function to tell python how we want to regrid and reprocess the files.
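Before running everything, it can be worth testing the operations string on a single file; printing the imported product shows which variables survived the keep() step. This is just an optional check:
test_product = harp.import_product(file_names[0], operations=harp_op)
print(test_product)  # lists the retained variables and their dimensions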
Once the operations have been created, we can create a loop that will process all the Level 2 files. I included a print() line simply because I like to check in on the progress, but it’s optional. I want the new file names to replace “L2” with “L3”, and I want to export everything in netCDF format. NOTE: I’ve included an os.remove() function to delete the original Level 2 file. Remove this if you want to keep the original file.
for i in file_names:
    print(i)  # optional progress check
    # Regrid the Level 2 file using the operations defined above
    no2_l3 = harp.import_product(i, operations=harp_op)
    # Name the output by swapping "L2" for "L3" in the file name
    export_filename = i.replace('L2', 'L3')
    harp.export_product(no2_l3, export_filename, file_format='netcdf')
    # Delete the original Level 2 file -- remove this line to keep it
    os.remove(i)
And that’s it! Once your code is done running, you should have a series of regridded Level 3 netCDF files. If you removed most variables and cropped the area, your new files will be significantly smaller than the original Level 2 files, which makes further processing much, much easier.
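If you want to confirm the regridding worked, you can open one of the new Level 3 files and look at the grid. Here’s a minimal sketch assuming you have xarray installed (it isn’t required for the workflow above):
import xarray as xr

l3_files = sorted(iglob("S5P_OFFL_L3*.nc*", recursive=True))
ds = xr.open_dataset(l3_files[0])
print(ds)  # should show the 0.02-degree latitude/longitude grid and the NO2 variable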
If you have any questions, feel free to reach out!