
This post introduces the hdx-cli-toolkit, a tool written in Python to examine and update metadata in the Humanitarian Data Exchange (HDX), which is based on the CKAN data catalogue.
HDX is a project of the UN Office for the Coordination of Humanitarian Affairs (UNOCHA). It is a data catalogue focussed on sharing data relevant to relief efforts. The data come from a wide range of providers who can upload data and the related metadata either manually or using automated pipelines. A team at the Centre for Humanitarian Data manage HDX: approaching providers for new data, working on data quality, enhancing HDX, and writing pipelines to automatically add data. I was involved at the start of the HDX project in 2014.
The hdx-cli-toolkit grew out of my recent return to the HDX project as a consultant. The team I was in received numerous internal queries relating to metadata on HDX that couldn’t easily be answered using the HDX user interface or CKAN API but were amenable to solution with relatively small pieces of Python code. Rather than scatter small pieces of one-off code around, I decided to write a command-line tool to collect this code together and make it reusable by others.
I also use hdx-cli-toolkit as a point of reference for doing various operations in the Python language (configuring a Python project, command-line interfaces with the click library, tests with mocks, publishing to PyPI, configuring Visual Studio Code for Python development) and as a source of code snippets for interacting with HDX which may be used in data pipeline code.
As an aside, my post on understanding Python project setup is the most popular one I have written by a large margin.
Overview
hdx-cli-toolkit supports the following commands:
- configuration – Print configuration information to terminal
- download – Download dataset resources (files) from HDX
- get_organization_metadata – Get an organization id and other metadata
- get_user_metadata – Get user id and other metadata
- list – List datasets in HDX
- print – Print datasets in HDX to the terminal
- quickcharts – Upload QuickChart JSON description to HDX
- remove_extras_key – Remove extras key from a dataset
- scan – Scan all of HDX and perform an action
- showcase – Upload showcase to HDX
- update – Update datasets in HDX
- update_resource – Update a resource in HDX
Installation
hdx-cli-toolkit is a Python application published to the PyPI package repository. It can be installed easily with:
pip install hdx_cli_toolkit
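After a pip installation, one way to confirm that the package is visible to the current Python environment is to query its installed version. The sketch below uses only the standard library; the helper name installed_version is hypothetical, and the check will not see a pipx installation because pipx places the package in its own isolated environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used in the pip command above.
    found = installed_version("hdx_cli_toolkit")
    if found:
        print(f"hdx_cli_toolkit {found} is installed")
    else:
        print("hdx_cli_toolkit is not installed in this environment")
```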
Users may prefer to make a global, isolated installation using pipx which will make the hdx-cli-toolkit commands available across all of their projects:
pipx install hdx_cli_toolkit
hdx-cli-toolkit can then be updated with:
pipx install --force hdx_cli_toolkit
hdx-cli-toolkit uses the hdx-python-api library, which requires the following to be added to a file called .hdx_configuration.yaml in the user’s home directory:
hdx_key_stage: "[an HDX API token from the staging HDX site]"
hdx_key: "[an HDX API token from the prod HDX site]"
default_organization: "[your organization]"
The default_organization is required for the configuration command and can be supplied using the --organization= command-line parameter. If not defined it will default to hdx.
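A quick sanity check of the configuration file can save a confusing error later. The following is a minimal sketch, assuming the file is flat YAML with the three keys shown above; it uses a deliberately naive line parser rather than a YAML library, and the function names are hypothetical rather than part of hdx-cli-toolkit or hdx-python-api.

```python
from pathlib import Path

# Key names taken from the configuration shown above.
EXPECTED_KEYS = ("hdx_key_stage", "hdx_key", "default_organization")


def top_level_keys(yaml_text):
    """Collect top-level `key: value` names from simple, flat YAML text.

    This is a naive line parser for illustration, not a YAML implementation.
    """
    keys = set()
    for line in yaml_text.splitlines():
        # Skip blank lines, comments and indented (nested) lines.
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, separator, _ = line.partition(":")
        if separator:
            keys.add(key.strip())
    return keys


def missing_keys(yaml_text):
    """Return the expected keys that are absent, in a stable order."""
    found = top_level_keys(yaml_text)
    return [key for key in EXPECTED_KEYS if key not in found]


if __name__ == "__main__":
    config_path = Path.home() / ".hdx_configuration.yaml"
    if not config_path.exists():
        print(f"{config_path} not found")
    else:
        missing = missing_keys(config_path.read_text(encoding="utf-8"))
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

A missing default_organization is not necessarily an error, since it falls back to hdx as described above.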
A user agent (hdx_cli_toolkit_*) is specified in the ~/.useragents.yaml file with the * replaced with the user’s initials:
hdx-cli-toolkit:
    preprefix: [YOUR_ORGANIZATION]
    user_agent: hdx_cli_toolkit_ih
Usage
Details of the currently implemented commands can be revealed by running hdx-toolkit --help, and details of the arguments for a command can be found using hdx-toolkit [COMMAND] --help.
Here are a couple of simple invocations. The first prints out all the metadata for a dataset in a readable JSON format:
hdx-toolkit print --dataset_filter=geoboundaries-admin-boundaries-for-nepal --with_extras
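For comparison, a similar lookup can be sketched directly against the CKAN Action API that HDX is built on. This is not how hdx-cli-toolkit is implemented (it uses hdx-python-api). It is a standard-library illustration that assumes the public production site at https://data.humdata.org, the standard CKAN package_show action, and a made-up user agent string. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public production HDX site and the standard CKAN Action API path.
HDX_SITE = "https://data.humdata.org"


def package_show_url(dataset_name, site=HDX_SITE):
    """Build the CKAN `package_show` URL for a dataset name or id."""
    return f"{site}/api/3/action/package_show?{urlencode({'id': dataset_name})}"


def fetch_dataset_metadata(dataset_name, site=HDX_SITE):
    """Fetch a dataset's metadata; CKAN wraps the payload in a `result` field."""
    request = Request(
        package_show_url(dataset_name, site),
        # Hypothetical user agent for illustration; substitute your own.
        headers={"User-Agent": "hdx_cli_toolkit_example"},
    )
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN reported a failure for {dataset_name!r}")
    return payload["result"]
```

Calling print(json.dumps(fetch_dataset_metadata("geoboundaries-admin-boundaries-for-nepal"), indent=2)) would then pretty-print the metadata, although private datasets and the staging site are likely to need an API token that this sketch does not send.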
The second invocation shows the value of a particular metadata element for a set of datasets. The list command works in conjunction with the update command, in which case the --value option provides the value to update to:
hdx-toolkit list --organization=healthsites --dataset_filter=*al*-healthsites --hdx_site=stage --key=private --value=True
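The --dataset_filter value above uses shell-style wildcards. As a rough illustration of how such a pattern selects dataset names, the sketch below uses fnmatch from the Python standard library. Whether hdx-cli-toolkit matches names in exactly this way is an assumption, and the dataset names are invented for the example.

```python
from fnmatch import fnmatchcase


def filter_dataset_names(names, pattern):
    """Return the names matching a shell-style wildcard pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    candidates = [
        "nepal-healthsites",
        "mali-healthsites",
        "kenya-healthsites",
        "nepal-roads",
    ]
    for name in filter_dataset_names(candidates, "*al*-healthsites"):
        print(name)
```

Here the pattern keeps only names that end in -healthsites and contain "al" somewhere before that suffix, so nepal-healthsites and mali-healthsites are selected while the other two are not.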
A detailed guide including many example invocations can be found in the USERGUIDE.md file.
Get in touch!
Currently the hdx-cli-toolkit only works on HDX but with some modification it should work with any CKAN instance.
If you are interested in learning more about hdx-cli-toolkit or CKAN then please get in touch.