Mission

Introduction

The Mission subpackage is based on routines mainly adopted from HelioPy (https://heliopy.readthedocs.io/en/stable/) and SunPy packages (https://docs.sunpy.org/en/stable/) and further developed. It allows to select and download data from in situ, remote and ground open-access databases. In its present version, the subpackage allows to download data from NASA OMNI, ESA Cluster and NASA MMS in situ data. The tool includes the capability to verify the version of the data to download, in order to always have the latest version of the files. The main output of the subpackage are xarray to be further used by other subpackages of the AIDApy package.

The main features of the Mission subpackage are:

  • being able to handle various missions, such as the Cluster mission or the Magnetospheric Multiscale Mission. Additional missions could be added according to needs without any major change to the architecture, either by contributing to HelioPy or building our own downloader;

  • having precise control of the data. The query can ask specific time range, probes, coordinates, etc. The module will check the availability of the data;

  • managing time-varying multi-dimensional distributions. This feature is of great importance to the mission module as it is not available for other Python packages;

  • ensuring the use of a proper data container providing raw data but also time range, metadata, etc;

  • offering a uniform interface to other AIDApy’s packages in order to perform advanced analysis on the mission data. For instance, an aidapy timeseries object can be generated using data from the mission package, given access to statistical tools and data processing to the final user.

Inputs

The philosophy of AIDApy is to ease as much as possible the user experience. To this end, we decided to minimize the number of arguments entered by the user and the knowledge required to obtain the desired data. Below is a list of the parameters the code needs from the user:

  1. The first parameter to enter is the start and the end of the requested time interval as a Python datetime.datetime object written as: datetime(year, month, day, hour, minute, second) or an astropy.Time object allowing to handle multiple time format such as GPS, ISOT, or FITS.

  2. The second parameter specifies the mission from which to download the data, as a string: 'mission' = '<mission1>'. Missions that can be selected currently are: OMNIWeb, MMS, Cluster.

  3. The third parameter is a list of strings, containing the different probes of the multi spacecraft mission considered: 'probes' = ['<probe1>', '<probe2>', ...]. Probes that can be selected are probe1, probe2, probe3 and probe4 for MMS and Cluster and probe1 for OMNIWeb. Note that for the single spacecraft mission or if one wants data from all probes, one can omit the 'probes' parameter or set it to an empty string.

  4. The fourth parameter is a string specifying the coordinate system the user wants the data in, such as: 'coords' = '<coord_system>'. Note that for the sake of clarity, it has been decided that data can be retrieved in only one coordinate system at once. The coordinate system that is currently available is Geocentric Solar Ecliptic (GSE).

  5. The last parameter to enter is a list of strings containing the data products that the user wants to be loaded (for instance the magnetic field vector, the ion density, etc), written as: 'prod' = ['<product1>', '<product2>', ...]. As mentioned in the previous section, a catalog of all available products is embedded in the Mission subpackage, which will be updated on the fly as more and more space missions and datasets will be added. We note here that this catalog also includes Level 3 (L3) products that are not directly available through open-access databases but are processed in the AIDApy package. A preliminary catalog of detailed available products can be found below in the following table:

Data product

Level

Available for missions

Description

dc_mag

L2

  • OMNIWeb

  • MMS

  • Cluster

3-component (x, y, z) or 4-component (x, y, z, tot) vector of magnetic field

i_dens

L2

  • OMNIWeb

  • MMS

  • Cluster

Ion number density

e_dens

L2

  • OMNIWeb

  • MMS

Electron number density

i_dist

L2

  • MMS

  • Cluster

3D ion distribution function

e_dist

L2

  • MMS

3D electron distribution function

i_bulkv

L2

  • MMS

  • Cluster

Ion bulk velocity vector

e_bulkv

L2

  • MMS

  • Cluster

Electron bulk velocity vector

i_temppara

L2

  • MMS

  • Cluster

Ion temperature parallel to dc_mag

i_tempperp

L2

  • MMS

  • Cluster

Ion temperature perpendicular to dc_mag

i_temp

L2

  • Cluster

Total ion temperature

all

L2

  • OMNIWeb

All products available for OMNI data

sc_pos

L2

  • OMNIWeb

  • MMS

  • Cluster

Spacecraft location

sc_att

L2

  • MMS

Spacecraft attitude

dc_elec

L2

  • MMS

DC electric field

i_omniflux

L2

  • MMS

Omnidirectional ion energy spectrum

i_energy

L2

  • MMS

Ion energy channels table

i_aspoc

L2

  • MMS

ASPOC instrument ion current

i_prestens

L2

  • MMS

Ion pressure tensor

i_temptens

L2

  • MMS

Ion temperature tensor

i_heatq

L2

  • MMS

Ion heat flux

j_curl

L3

  • MMS

  • Cluster

Current density calculated from the Curlometer method

mag_elev_angle

L3

  • OMNIWeb

  • MMS

  • Cluster

Magnetic elevation angle

i_beta

L3

  • MMS

  • Cluster

Ion plasma beta

e_beta

L3

  • MMS

  • Cluster

Electron plasma beta

Availability of data products

As stated in the introduction, AIDApy is designed to be upgraded on the fly according to needs such that the availability of data products can vary depending on the considered space mission. In order to ease the burden of browsing through the above (preliminary) data products catalog to search for the keyword corresponding to the desired physical quantity, AIDApy provides the get_mission_info function as a part of the high-level load_data function (see below). This function is designed to give information on the data products available for each space mission, but also to deliver the data product keywords corresponding the queried physical quantity as well as other data product settings (e.g., available data rates or modes, probes, coordinates, etc).

Below is a very simple code snippet showing how to get all available data products from a particular mission:

get_mission_info(mission='<mission>')

To get the AIDApy data product keywords corresponding to a physical quantity, simply provide the desired quantity as a string (e.g., 'magnetic field'):

get_mission_info(mission='<mission>', product='<physical quantity>')

All supplemental product parameters are accessible by setting the full_product_catalog parameter to True:

get_mission_info(mission='<mission>', product='<physical quantity>', full_product_catalog=True)

High level interface

The function load_data is used as a high-level interface with the mission subpackage. Below is a code snippet showing how to define the inputs for the load_data function using a Python dictionary:

# Define the time interval
start_time = datetime(<year>, <month>, <day>, <hour>, <minute>, <second>)
end_time = datetime(<year>, <month>, <day>, <hour>, <minute>, <second>)

# Define the settings as a Python dictionary
settings = {'prod': ['<prod1>', '<prod2>'], 'probes': ['<probe1>', '<probe2>'],
'coords': '<coord_system>'}

Once the parameters are set up, we generate the Mission downloader and download the data. This is done by calling the load_data function, that tells the Mission subpackage to create a specific downloader for every mission and time interval requested, and download and load the desired data products for the specified probes and coordinate system:

data = load_data(mission='<mission>', start_time, end_time, **settings)

The data is then returned in the data object and can be printed using the print() Python command. The returned data is discussed in the following section.

Outputs

The DataArray and Dataset objects from the Python package xarray (http://xarray.pydata.org/en/stable/) have been chosen as outputs for the Mission subpackage, as they have been especially designed to handle time-series of multidimensional data. It is basically an N-dimensional array with labeled coordinates and dimensions, which also supports metadata aware operations (see details in the Metadata subsection). It also provides many functions to easily manipulate multidimensional data, such as indexing, reshaping, resampling, etc.

The following code snippet is an example of what is printed when downloading a 1D array (e.g., time series of values) of size (2000) and a 2D array (e.g., time series of vector components) of size (1000,3) (the first dimension is the time and the second dimension is the vector components):

>>> print(data)
<xarray.Dataset>
Dimensions:                        (<1D_array_dimension1>: 2000, <2D_array_dimension1>: 1000, <2D_array_dimension2>: 3)
Coordinates:
  * <1D_array_dimension1>          (<time1>) datetime64[ns] 2013-08-05T00:00:00.000007 ... 2013-08-05T00:04:59.000987
  * <2D_array_dimension1>          (<time2>) datetime64[ns] 2013-08-05T00:00:00.000006 ... 2013-08-05T00:05:00
  * <2D_array_dimension2>          (<vector_components>) <U1 'x' ... 'z'
Data variables:
        <prod1>                        (<time1>) float32 86.812 ... 29.063
        <prod2>                        (<time2>, <vector_components>) float32 144.979 ... 5.028
Attributes:
        mission:  <mission>

The data retrieved from the Mission subpackage is an xarray.Dataset object, which is basically a labeled dictionary of xarray.DataArray objects. This Dataset contains two xarray.DataArray objects, which are in their turn N-dimensional arrays with labeled coordinates and dimensions. This Dataset object has a total of 3 Dimensions, which represent the two Coordinates (labels) of the 2D DataArray (<vector_components>, the xyz components of the vector and time1, the timestamps) and the coordinate (label) of the 1D DataArray (time2, the timestamps). The dimensions of the arrays are summarized in the Data variables section: each DataArray has one Data variable that represent the queried data products. The dimension of each data product (array) is shown in front of each data variable (here (1000,3) and (2000)), as well as the data type and first and last value of each data product (i.e., DataArray). Metadata is embedded in the objects Attributes in the form of a basic Python dictionary. For instance, the mission is written in the Dataset Attributes and the units and spacecraft location can be found in the Attributes of each DataArray object (e.g., units can be retrieved by typing data['prod1'].attrs['Units']).

Plotting

The DataArray and Dataset objects provide a simplified plotting system (see http://xarray.pydata.org/en/stable/plotting.html) based on the popular matplotlib package (https://matplotlib.org). Thus, it is rather easy to plot the data that is returned by the Mission subpackage using either the xarray.Dataset.plot() or xarray.DataArray.plot() methods. For instance, the following code snippet shows how to line plot the 1D array from the above Dataset:

>>> data['<prod1>'].plot()

The following code snippet shows how to line plot the 2D array from the above Dataset:

>>> data['<prod2>'].plot.line(x='time2')

File handling

Because the core of the downloading process is based on the HelioPy package, we chose to borrow its file handling system.

  1. Configuration file:

    HelioPy comes with a sample source or configuration file (“heliopyrc”), that is located in ~/.heliopy/heliopyrc and can be customised by the user (the config parser will look in ~/.heliopy first). The default contents of the configuration file are:

  • The working directory is the parent directory in which all downloaded data will be stored. By default, it is set to: download_dir = ~/heliopy/data Note that this default value may be changed to ~/aidapy/data in the future. Inside this directory, the files are stored in a rather standardized folder tree with the following structure: ~/mission/probe/instrument/mode (if available, else the year).

  • The user can also choose whether to convert all downloaded data to a hdf store, enabling much faster file reading after the initial load, but requiring the additional h5py and py-tables dependencies. By default, this value is set to False: use_hdf = False

  • In this file is also stored the user’s personal Cluster cookie that is required to download data from the Cluster Science Archive (CSA). This personal cookie can be retrieved by registering at the following address: http://www.cosmos.esa.int/web/csa/register-now . By default, the cookie value is not set: cluster_cookie = none

  1. File version control:

    One key point of the AIDA project objectives is to always deal with up-to-date data files, that evolve on the fly from open-access sources due to regular reprocessing of available datasets. Fortunately, the HelioPy package fulfills this requirement by getting the latest version of data files at every execution. In other words, the program checks the latest available files from the open-access sources every time data is requested. If the available file is not already in the user’s database, then it is downloaded and processed. Thus, the up-to-date data is returned to the user every time. This behavior will also be generalized to the AIDApy in the future. Every time the user will analyse spacecraft data in the AIDApy, the program will check if the latest available files are in the user database, otherwise it will download and process them.

load_data function