Welcome to AIDApy’s documentation!

AIDApy

LicenseMIT LicenseCC Pipeline PyPi CoverageReport PylintScore DocSphinx Maintenance

The Python package aidapy centralizes and simplifies access to:

  • Spacecraft data from heliospheric missions

  • Space physics simulations

  • Advanced statistical tools

  • Machine Learning, Deep Learning algorithms, and applications

The aidapy package is part of the project AIDA (Artificial Intelligence Data Analysis) in Heliophysics funded from the European Unions Horizon 2020 research and innovation programme under grant agreement No 776262. It is distributed under the open-source MIT license.

Full documentation can be found here

Content

Getting Started

Motivations

A major advantage of our era is the easiness of gathering and recording data. Data can be found literary everywhere, from quantum scale to stars and galaxies. Great challenges are arising from this fact and the biggest of them all is how you can efficient manage all this information.

In particular for space data, technology has advanced so rapidly that every day, petabytes of useful information is recorded through satellites, other sensors around the globe, and simulations. However, a big part of this information is remain unused (or partly used) because the proper tools have not yet been created. New software packages can lead the way to win that challenge and fully (or partly) automate various processes that are necessary in order to transform them in a format that the scientists can understand and analyze.

This is the reason why we developed aidapy which is a high level Python package for the analysis of spacecraft data from heliospheric missions using modern techniques including data assimilation, machine learning (ML), artificial intelligence (AI), and advanced statistical models. The transition from proprietary software based on commercial languages to the new open-source community-developed packages in Python represent one of the main objective of aidapy. Our vision is to combine software tools that are already existing out there for different reasons, in a nice and efficient way. In our minds, this is a crucial and necessary task for the reducing of the knowledge “gap” that exist in several occasions between different scientific fields. aidapy has been created in order to fulfil the needs of a user with high programming skills as well as, for a user with basic programming knowledge.

Finally, we aim to build a new specific database for helping the ongoing trend in database standardization in space. Key heliophysics problems are selected to produce a database (AIDAdb) of new high-level data products that include catalogs of features and events detected by ML and AI algorithms from existing numerical simulations and observations. New simulations are also performed to enrich this database. Moreover, many of the AI methods developed in AIDA represent themselves higher-level data products, for instance in the form of trained models.

The transition to a big-data phase and the use of the big-data language of choice Python prepares the space community to the use of ML and AI. The latest AI developments and especially the methodology of deep neural networks are particularly suited for the identification of physical processes in space images and data. However, there is a barrier to overcome between the two very distinct communities of space scientists and experts in AI. Aidapy aims at bridging this gap linking the software developed in the AI community with heliospheric data.

We are developing aidapy to make it easier for a researcher in space science who does not have a background in computer science or in artificial intelligence to use some of the most advanced tools in this field and provide some concrete examples on how AI can help analyse data from simulations and observations, how AI can help discover physical processes hidden in the data and how it can make space weather forecasts.

Installation

The package aidapy has been tested only for Linux.

Using PyPi

aidapy is available for pip.

pip install aidapy
From sources

The sources are located on GitLab:

  1. Clone the GitLab repo:

Open a terminal and write the below command to clone in your PC the AIDApy repo:

git clone https://gitlab.com/aidaspace/aidapy.git
cd aidapy
  1. Create a virtual env

AIDApy needs a significant number of dependencies. The easiest way to get everything installed is to use a virtual environment.

  • Anaconda

You can create a virtual environment and install all the dependencies using conda with the following commands:

pip install conda
conda create -n aidapy
source activate aidapy
  • Virtual Env

Virtualenv can also be used:

pip install virtualenv
virtualenv AIDApy
source AIDApy/bin/activate
  1. Install the version you want via the commands:

For the “basic” version:

python setup.py install

For the version with the ML use cases:

pip install aidapy[ml]
  1. Test the installation in your PC by running. (Install both versions before running the tests)

python setup.py test

5) (Optional) Generate the docs: install the extra dependencies of doc and run the setup.py file:

pip install aidapy[doc]
python setup.py build_sphinx

Once installed, the doc can be generated with:

cd doc
make html
Dependencies

The required dependencies are:

Optional dependencies are:

Testing dependencies are:

Extra testing dependencies:

~

Reference manual

Mission

Introduction

The Mission subpackage is based on routines mainly adopted from HelioPy (https://heliopy.readthedocs.io/en/stable/) and SunPy packages (https://docs.sunpy.org/en/stable/) and further developed. It allows to select and download data from in situ, remote and ground open-access databases. In its present version, the subpackage allows to download data from NASA OMNI, ESA Cluster and NASA MMS in situ data. The tool includes the capability to verify the version of the data to download, in order to always have the latest version of the files. The main output of the subpackage are xarray to be further used by other subpackages of the AIDApy package.

The main features of the Mission subpackage are:

  • being able to handle various missions, such as the Cluster mission or the Magnetospheric Multiscale Mission. Additional missions could be added according to needs without any major change to the architecture, either by contributing to HelioPy or building our own downloader;

  • having precise control of the data. The query can ask specific time range, probes, coordinates, etc. The module will check the availability of the data;

  • managing time-varying multi-dimensional distributions. This feature is of great importance to the mission module as it is not available for other Python packages;

  • ensuring the use of a proper data container providing raw data but also time range, metadata, etc;

  • offering a uniform interface to other AIDApy’s packages in order to perform advanced analysis on the mission data. For instance, an aidapy timeseries object can be generated using data from the mission package, given access to statistical tools and data processing to the final user.

Inputs

The philosophy of AIDApy is to ease as much as possible the user experience. To this end, we decided to minimize the number of arguments entered by the user and the knowledge required to obtain the desired data. Below is a list of the parameters the code needs from the user:

  1. The first parameter to enter is the start and the end of the requested time interval as a Python datetime.datetime object written as: datetime(year, month, day, hour, minute, second) or an astropy.Time object allowing to handle multiple time format such as GPS, ISOT, or FITS.

  2. The second parameter specifies the mission from which to download the data, as a string: 'mission' = '<mission1>'.

  3. The third parameter is a list of strings, containing the different probes of the multi spacecraft mission considered: 'probes' = ['<probe1>', '<probe2>', ...]. Note that for the single spacecraft mission or if one wants data from all probes, one can omit the 'probes' parameter or set it to an empty string.

  4. The fourth parameter is a string specifying the coordinate system the user wants the data in, such as: 'coords' = '<coord_system>'. Note that for the sake of clarity, it has been decided that data can be retrieved in only one coordinate system at once. By default, the coordinate system is set to Geocentric Solar Ecliptic (GSE).

  5. The last parameter to enter is a list of strings containing the data products that the user wants to be loaded (for instance the magnetic field vector, the ion density, etc), written as: 'prod' = ['<product1>', '<product2>', ...]. As mentioned in the previous section, a catalog of all available products is embedded in the Mission subpackage, which will be updated on the fly as more and more space missions and datasets will be added. We note here that this catalog also includes Level 3 (L3) products that are not directly available through open-access databases but are processed in the AIDApy package. A preliminary catalog of detailed available products can be found below in the following table:

Data product

Level

Available for missions

Description

dc_mag

L2

  • OMNIWeb

  • MMS

  • Cluster

3-component (x, y, z) or 4-component (x, y, z, tot) vector of magnetic field

i_dens

L2

  • OMNIWeb

  • MMS

  • Cluster

Ion number density

e_dens

L2

  • OMNIWeb

  • MMS

Electron number density

i_dist

L2

  • MMS

  • Cluster

3D ion distribution function

e_dist

L2

  • MMS

3D electron distribution function

i_bulkv

L2

  • MMS

  • Cluster

Ion bulk velocity vector

e_bulkv

L2

  • MMS

  • Cluster

Electron bulk velocity vector

i_temppara

L2

  • MMS

  • Cluster

Ion temperature parallel to dc_mag

i_tempperp

L2

  • MMS

  • Cluster

Ion temperature perpendicular to dc_mag

i_temp

L2

  • Cluster

Total ion temperature

all

L2

  • OMNIWeb

All products available for OMNI data

sc_pos

L2

  • OMNIWeb

  • MMS

  • Cluster

Spacecraft location

sc_att

L2

  • MMS

Spacecraft attitude

dc_elec

L2

  • MMS

Direct-current electric field

dc_elec

L2

  • MMS

Direct-current electric field

i_omniflux

L2

  • MMS

Omnidirectional ion energy spectrum

i_energy

L2

  • MMS

Ion energy channels table

i_aspoc

L2

  • MMS

ASPOC instrument ion current

i_prestens

L2

  • MMS

Ion pressure tensor

i_temptens

L2

  • MMS

Ion temperature tensor

i_heatq

L2

  • MMS

Ion heat flux

j_curl

L3

  • MMS

  • Cluster

Current density calculated from the Curlometer method

mag_elev_angle

L3

  • OMNIWeb

  • MMS

  • Cluster

Magnetic elevation angle

i_beta

L3

  • MMS

  • Cluster

Ion plasma beta

e_beta

L3

  • MMS

  • Cluster

Electron plasma beta

Availability of data products

As stated in the introduction, AIDApy is designed to be upgraded on the fly according to needs such that the availability of data products can vary depending on the considered space mission. In order to ease the burden of browsing through the above (preliminary) data products catalog to search for the keyword corresponding to the desired physical quantity, AIDApy provides the get_mission_info function as a part of the high-level load_data function (see below). This function is designed to give information on the data products available for each space mission, but also to deliver the data product keywords corresponding the queried physical quantity as well as other data product settings (e.g., available data rates or modes, probes, coordinates, etc).

Below is a very simple code snippet showing how to get all available data products from a particular mission:

get_mission_info(mission='<mission>')

To get the AIDApy data product keywords corresponding to a physical quantity, simply provide the desired quantity as a string (e.g., 'magnetic field'):

get_mission_info(mission='<mission>', product='<physical quantity>')

All supplemental product parameters are accessible by setting the full_product_catalog parameter to True:

get_mission_info(mission='<mission>', product='<physical quantity>', full_product_catalog=True)
High level interface

The function load_data is used as a high-level interface with the mission subpackage. Below is a code snippet showing how to define the inputs for the load_data function using a Python dictionary:

# Define the time interval
start_time = datetime(<year>, <month>, <day>, <hour>, <minute>, <second>)
end_time = datetime(<year>, <month>, <day>, <hour>, <minute>, <second>)

# Define the settings as a Python dictionary
settings = {'prod': ['<prod1>', '<prod2>'], 'probes': ['<probe1>', '<probe2>'],
'coords': '<coord_system>'}

Once the parameters are set up, we generate the Mission downloader and download the data. This is done by calling the load_data function, that tells the Mission subpackage to create a specific downloader for every mission and time interval requested, and download and load the desired data products for the specified probes and coordinate system:

data = load_data(mission='<mission>', start_time, end_time, **settings)

The data is then returned in the data object and can be printed using the print() Python command. The returned data is discussed in the following section.

Outputs

The DataArray and Dataset objects from the Python package xarray (http://xarray.pydata.org/en/stable/) have been chosen as outputs for the Mission subpackage, as they have been especially designed to handle time-series of multidimensional data. It is basically an N-dimensional array with labeled coordinates and dimensions, which also supports metadata aware operations (see details in the Metadata subsection). It also provides many functions to easily manipulate multidimensional data, such as indexing, reshaping, resampling, etc.

The following code snippet is an example of what is printed when downloading a 1D array (e.g., time series of values) of size (2000) and a 2D array (e.g., time series of vector components) of size (1000,3) (the first dimension is the time and the second dimension is the vector components):

>>> print(data)
<xarray.Dataset>
Dimensions:                        (<1D_array_dimension1>: 2000, <2D_array_dimension1>: 1000, <2D_array_dimension2>: 3)
Coordinates:
  * <1D_array_dimension1>          (<time1>) datetime64[ns] 2013-08-05T00:00:00.000007 ... 2013-08-05T00:04:59.000987
  * <2D_array_dimension1>          (<time2>) datetime64[ns] 2013-08-05T00:00:00.000006 ... 2013-08-05T00:05:00
  * <2D_array_dimension2>          (<vector_components>) <U1 'x' ... 'z'
Data variables:
        <prod1>                        (<time1>) float32 86.812 ... 29.063
        <prod2>                        (<time2>, <vector_components>) float32 144.979 ... 5.028
Attributes:
        mission:  <mission>

The data retrieved from the Mission subpackage is an xarray.Dataset object, which is basically a labeled dictionary of xarray.DataArray objects. This Dataset contains two xarray.DataArray objects, which are in their turn N-dimensional arrays with labeled coordinates and dimensions. This Dataset object has a total of 3 Dimensions, which represent the two Coordinates (labels) of the 2D DataArray (<vector_components>, the xyz components of the vector and time1, the timestamps) and the coordinate (label) of the 1D DataArray (time2, the timestamps). The dimensions of the arrays are summarized in the Data variables section: each DataArray has one Data variable that represent the queried data products. The dimension of each data product (array) is shown in front of each data variable (here (1000,3) and (2000)), as well as the data type and first and last value of each data product (i.e., DataArray). Metadata is embedded in the objects Attributes in the form of a basic Python dictionary. For instance, the mission is written in the Dataset Attributes and the units and spacecraft location can be found in the Attributes of each DataArray object (e.g., units can be retrieved by typing data['prod1'].attrs['Units']).

Plotting

The DataArray and Dataset objects provide a simplified plotting system (see http://xarray.pydata.org/en/stable/plotting.html) based on the popular matplotlib package (https://matplotlib.org). Thus, it is rather easy to plot the data that is returned by the Mission subpackage using either the xarray.Dataset.plot() or xarray.DataArray.plot() methods. For instance, the following code snippet shows how to line plot the 1D array from the above Dataset:

>>> data['<prod1>'].plot()

The following code snippet shows how to line plot the 2D array from the above Dataset:

>>> data['<prod2>'].plot.line(x='time2')
File handling

Because the core of the downloading process is based on the HelioPy package, we chose to borrow its file handling system.

  1. Configuration file:

    HelioPy comes with a sample source or configuration file (“heliopyrc”), that is located in ~/.heliopy/heliopyrc and can be customised by the user (the config parser will look in ~/.heliopy first). The default contents of the configuration file are:

  • The working directory is the parent directory in which all downloaded data will be stored. By default, it is set to: download_dir = ~/heliopy/data Note that this default value may be changed to ~/aidapy/data in the future. Inside this directory, the files are stored in a rather standardized folder tree with the following structure: ~/mission/probe/instrument/mode (if available, else the year).

  • The user can also choose whether to convert all downloaded data to a hdf store, enabling much faster file reading after the initial load, but requiring the additional h5py and py-tables dependencies. By default, this value is set to False: use_hdf = False

  • In this file is also stored the user’s personal Cluster cookie that is required to download data from the Cluster Science Archive (CSA). This personal cookie can be retrieved by registering at the following address: http://www.cosmos.esa.int/web/csa/register-now . By default, the cookie value is not set: cluster_cookie = none

  1. File version control:

    One key point of the AIDA project objectives is to always deal with up-to-date data files, that evolve on the fly from open-access sources due to regular reprocessing of available datasets. Fortunately, the HelioPy package fulfills this requirement by getting the latest version of data files at every execution. In other words, the program checks the latest available files from the open-access sources every time data is requested. If the available file is not already in the user’s database, then it is downloaded and processed. Thus, the up-to-date data is returned to the user every time. This behavior will also be generalized to the AIDApy in the future. Every time the user will analyse spacecraft data in the AIDApy, the program will check if the latest available files are in the user database, otherwise it will download and process them.

load_data function

Loading data functions

This script contains the function load_data needed to make the interface with the user and also the associated functions for the check.

This file can be imported as a module and contains the following functions:

aidapy.aidafunc.load_data.get_mission_info(mission='mission', start_time=None, end_time=None, product=None, full_product_catalog=False, hide=False)[source]

Provide information on the mission

Parameters
  • mission (str) – The name of the mission from which the data are loaded/downloaded

  • start_time (datetime or Time) – Start time of the loading

  • end_time (datetime or Time) – End time of the loading

  • product (str) – Data product to look for in product_catalog

  • full_product_catalog (bool) – Tag to provide all available keys

  • hide (bool) – Tag to hide print messages when use in routines

Returns

info – String containing the AIDApy keyword of the queried product or Dictionary with information on the mission or queried product with the following keys: - name: - allowed_probes: - product_catalog:

Return type

dict or str

aidapy.aidafunc.load_data.load_data(mission, start_time, end_time, **kwargs)[source]

Load the data from the given mission on a specific time interface. The extra arguments gives details on the data to load.

Parameters
  • mission (str) – The name of the mission from which the data are loaded/downloaded

  • start_time (datetime or Time) – Start time of the loading

  • end_time (datetime or Time) – End time of the loading

  • **kwargs – Specific arguments for providing information for the mission or the download settings: - prod: high-level product to load. Can be a string or a list. The full list is available by using the get_mission_info() routine. - probes: probe number. Can be a string or a list. - coords: coordinate system to use. - mode: mode to define the data rate. Usually it can be either ‘low_res’ or ‘high_res’. The user can also specify a mode specific to a mission (for instance for MMS : ‘slow’, ‘fast’, ‘brst’, ‘srvy’) The list for these specific modes (or data_rate) can be found in the heliopy documentation. https://docs.heliopy.org/en/stable/reference/data/index.html - frame: frame used only for spacecraft attitude. Usually ‘dbcs’ Example: {‘prod’: [‘dc_mag’], ‘probes’: [‘1’, ‘2’], ‘coords’: ‘gse’, ‘mode’: ‘high_res’}

Returns

xarray_mission – Requested data contained within a xarray DataSet. Each data variable contains a specific product of a specific probe

Return type

DataSet

Velocity distribution function

vdf extension to xarray

AIDA module responsible for the vdf handling. etienne.behar

class aidapy.aidaxr.vdf.AidaAccessorVDF(xarray_obj)[source]

Xarray accessor responsible for the vdf utilities.

interpolate(start_time, end_time, start_time_sub, end_time_sub, species='electron', frame='instrument', grid_geom='spher', v_max=None, resolution=60, interp_schem='near', verbose=True)[source]

Main method of the AidaAccessorVDF.

Pre-processes the loaded data, initialises the rotation matrices and translations vectors, selects a time sub-interval, interpolates the data, post-process them and save them as new variables of the xarray dataset.

Parameters
  • start_time (datetime.datetime) – The start time of the global period of interst.

  • end_time (datetime.datetime) – The end time of the global period of interest.

  • start_time_sub (datetime.datetime) – The start time of the local period of interest.

  • end_time_sub (datetime.datetime) – The end time of the local period of interest.

  • species (str) – The species to be analysed, either ‘electron’ or ‘ion’

  • frame (str) –

    The reference frame in which the data will be analysed and

    visualised.

  • grid_geom (str) – The geometry of the grid-of-interest, along which VDF values will be interpolated. Either ‘cart’ for cartesian geometry, ‘spher’ or spherical, or ‘cyl’ for cylindrical.

  • v_max (float) – The maximum extend of the grid-of-interest. If None, taken as the maximum instrument speed coverage.

  • resolution (int) – Resolution of the grid-of-interest. Advised is between 50 and 200 for reasonable quality and computation time.

  • interp_schem (str) – Inteprolation scheme used by the method. Either ‘near’ for nearest-neighbour, ‘lin’ for tri-linear, or ‘cub’ for tri-cubic interpolation. The two first are provided by scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RegularGridInterpolator.html) and the last by the tricubic package (https://pypi.org/project/tricubic/).

Returns

xr_mms

The xarray dataset. The interpolation products are added to the dataset

at the end of the method.

Return type

xarray dataset

plot(ptype='1d', plt_contourf=False, cmap='RdBu_r')[source]

Calls to plotting methods from aidapy.tools.vdf_plot

Parameters
  • ptype (str) – Type of plot, ‘1d’, ‘2d’, ‘3d’, ‘3d_time’ or ‘3d_gyro’

  • plt_contourf (bool) – If True, colormeshes are plotted using a filled contour method. Default is False.

  • cmap (str) – Valid matplotlib colormap string. Default is ‘RdBu_r’.

Machine Learning tool

Introduction

Examples

In this section, we provide simple and practical examples on how to use the different subpackages of the aidapy package.

Spacecraft data downloading

Introduction

Here we demonstrate how to download spacecraft data using the load_data function of AIDApy. Three missions are currently implemented: Omniweb, Cluster and MMS

[1]:
from datetime import datetime

#AIDApy Modules
from aidapy import load_data
import aidapy.aidaxr
Omniweb data

First, we set up the parameters of the data: time interval and name of the mission. For this example, we will download from Omniweb all available data between May 15, 2008 and May 16, 2008, thus all other parameters are let to default values.

[2]:
start_time_omni = datetime(2008, 5, 15, 0, 0, 0)
end_time_omni = datetime(2008, 5, 16, 0, 0, 0)

Once all the parameters have been set up, the load_data function can be called. The data are given in a xarray.DataSet format, which is an implementation of labeled and multi-dimensional arrays. A specific extension has been developed (aidapy.aidaxr) to provide specific pre and postprocessing of the data (see the statistical example).

[3]:
xr_omni = load_data(mission='omni', start_time=start_time_omni, end_time=end_time_omni)
xr_omni_all = xr_omni['all1']
print(xr_omni_all)
<xarray.DataArray 'all1' (time1: 25, products: 52)>
array([[2385. ,   51. ,   52. , ...,  -15. ,   10. ,    6. ],
       [2385. ,   51. ,   52. , ...,  -12. ,   10. ,    6.1],
       [2385. ,   51. ,   52. , ...,  -12. ,   11. ,    5.9],
       ...,
       [2385. ,   51. ,   52. , ...,  -17. ,   19. ,    5.7],
       [2385. ,   51. ,   52. , ...,  -16. ,   27. ,    5.9],
       [2385. ,   51. ,   52. , ...,  -13. ,   25. ,    6.2]])
Coordinates:
  * products  (products) <U33 'Bartels Rotation Number' ... 'Magnetosonic Mach No.'
  * time1     (time1) datetime64[ns] 2008-05-15 ... 2008-05-16
Attributes:
    Units:    {'Bartels Rotation Number': Unit(dimensionless), 'ID IMF Spacec...
[4]:
xr_omni_mag = xr_omni_all.sel(products=['Bx GSE, GSM', 'By GSM', 'Bz GSM'])
xr_omni_mag.plot.line(x='time1')
/home/rdupuis/miniconda3/envs/aidapy/lib/python3.7/site-packages/pandas/plotting/_matplotlib/converter.py:103: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
        >>> from pandas.plotting import register_matplotlib_converters
        >>> register_matplotlib_converters()
  warnings.warn(msg, FutureWarning)
[4]:
[<matplotlib.lines.Line2D at 0x7ff6c028a750>,
 <matplotlib.lines.Line2D at 0x7ff6c0246050>,
 <matplotlib.lines.Line2D at 0x7ff6c0246310>]
_images/examples_01_missions_mission_test_6_2.png
Cluster data

Then, data can be downloaded from cluster mission. As this mission has several spacecraft, the index of the spacecraft called probe must also be provided. For this example, we will download the magnetic field during 5 minutes on August 5, 2013 for the probe 1 and 2 in GSE coordinate system.

[5]:
start_time_cluster = datetime(2013, 8, 5, 0, 0, 0)
end_time_cluster = datetime(2013, 8, 5, 0, 5, 0)

settings_cluster = {'prod': ['dc_mag'], 'probes': ['1', '2'], 'coords': 'gse'}
[6]:
xr_cluster = load_data('cluster', start_time_cluster, end_time_cluster, **settings_cluster)
print(xr_cluster)
<xarray.Dataset>
Dimensions:                        (B_vec_xyz_gse__C1_CP_FGM_SPIN: 3, B_vec_xyz_gse__C2_CP_FGM_SPIN: 3, time1: 71, time2: 72)
Coordinates:
  * B_vec_xyz_gse__C1_CP_FGM_SPIN  (B_vec_xyz_gse__C1_CP_FGM_SPIN) <U1 'x' ... 'z'
  * time1                          (time1) datetime64[ns] 2013-08-05T00:00:02.000119 ... 2013-08-05T00:04:58.000458
  * B_vec_xyz_gse__C2_CP_FGM_SPIN  (B_vec_xyz_gse__C2_CP_FGM_SPIN) <U1 'x' ... 'z'
  * time2                          (time2) datetime64[ns] 2013-08-05T00:00:02.000182 ... 2013-08-05T00:04:56.000512
Data variables:
    dc_mag1                        (time1, B_vec_xyz_gse__C1_CP_FGM_SPIN) float32 144.983 ... 4.068
    dc_mag2                        (time2, B_vec_xyz_gse__C2_CP_FGM_SPIN) float32 86.769 ... 29.271
Attributes:
    mission:        cluster
    load_settings:  {'prod': ['dc_mag'], 'probes': ['1', '2'], 'coords': 'gse'}
[7]:
xr_cluster['dc_mag1'].plot.line(x='time1')
[7]:
[<matplotlib.lines.Line2D at 0x7ff6c02b2f50>,
 <matplotlib.lines.Line2D at 0x7ff6c01291d0>,
 <matplotlib.lines.Line2D at 0x7ff6c012f190>]
_images/examples_01_missions_mission_test_10_1.png
[8]:
xr_cluster['dc_mag2'].plot.line(x='time2')
[8]:
[<matplotlib.lines.Line2D at 0x7ff6bf759810>,
 <matplotlib.lines.Line2D at 0x7ff6bf7be9d0>,
 <matplotlib.lines.Line2D at 0x7ff6bf7becd0>]
_images/examples_01_missions_mission_test_11_1.png
MMS data

Finally, data can be downloaded from Magnetospheric Multiscale Mission (MMS) mission. This mission has also several spacecrafts. For this example, we will download the magnetic field during 1 minute on April 8, 2018 for the probe 1 and 2 in GSE coordinate system.

[9]:
start_time_mms = datetime(2018, 4, 8, 0, 0, 0)
end_time_mms = datetime(2018, 4, 8, 0, 1, 0)

settings_mms = {'prod': ['dc_mag'], 'probes': ['1', '2'], 'coords': 'gse'}
[10]:
xr_mms = load_data(mission='mms', start_time=start_time_mms, end_time=end_time_mms, **settings_mms)
/home/rdupuis/miniconda3/envs/aidapy/lib/python3.7/site-packages/HelioPy-0+unknown-py3.7.egg/heliopy/data/util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms1_mec_mlt' are unknown
  warnings.warn(message)
Creating new directory /home/rdupuis/heliopy/data/mms/2/fgm/srvy

/home/rdupuis/miniconda3/envs/aidapy/lib/python3.7/site-packages/HelioPy-0+unknown-py3.7.egg/heliopy/data/util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms2_mec_mlt' are unknown
  warnings.warn(message)
[11]:
print(xr_mms['dc_mag1'])
xr_mms['dc_mag1'].plot.line(x='time1')
<xarray.DataArray 'dc_mag1' (time1: 849, mms1_fgm_b_gse_srvy_l2: 4)>
array([[ 1.712529,  2.073415, -1.225534,  2.95529 ],
       [ 1.714843,  2.064923, -1.207659,  2.943303],
       [ 1.710802,  2.077349, -1.215393,  2.952863],
       ...,
       [ 2.291496,  1.610505, -1.082593,  3.00278 ],
       [ 2.281956,  1.615301, -1.089721,  3.000669],
       [ 2.277435,  1.605456, -1.105932,  2.99788 ]], dtype=float32)
Coordinates:
  * mms1_fgm_b_gse_srvy_l2  (mms1_fgm_b_gse_srvy_l2) <U3 'x' 'y' 'z' 'tot'
  * time1                   (time1) datetime64[ns] 2018-04-08T00:00:06.987911 ... 2018-04-08T00:00:59.988633
Attributes:
    Units:    {'mms1_fgm_b_gse_srvy_l2': Unit("nT")}
    pos_gse:  <xarray.DataArray 'mms1_mec_r_gse' (time: 3, mms1_mec_r_gse: 3)...
[11]:
[<matplotlib.lines.Line2D at 0x7ff6bf78ff10>,
 <matplotlib.lines.Line2D at 0x7ff6b65c3a90>,
 <matplotlib.lines.Line2D at 0x7ff6b65c3790>,
 <matplotlib.lines.Line2D at 0x7ff6b65c3610>]
_images/examples_01_missions_mission_test_15_2.png
[12]:
settings_mms_dist = {'prod': ['i_dist'], 'probes': ['1'], 'coords': 'gse'}
xr_mms_dist = load_data(mission='mms', start_time=start_time_mms, end_time=end_time_mms, **settings_mms_dist)
/home/rdupuis/miniconda3/envs/aidapy/lib/python3.7/site-packages/HelioPy-0+unknown-py3.7.egg/heliopy/data/util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms1_mec_mlt' are unknown
  warnings.warn(message)
[14]:
print(xr_mms_dist['i_dist1'])
<xarray.DataArray 'i_dist1' (time1: 14, mms1_dis_energy_fast: 32, mms1_dis_theta_fast: 16, mms1_dis_phi_fast: 32)>
array([[[[0.000000e+00, ..., 0.000000e+00],
         ...,
         [0.000000e+00, ..., 0.000000e+00]],

        ...,

        [[0.000000e+00, ..., 0.000000e+00],
         ...,
         [1.977800e-20, ..., 0.000000e+00]]],


       ...,


       [[[8.998902e-21, ..., 0.000000e+00],
         ...,
         [0.000000e+00, ..., 0.000000e+00]],

        ...,

        [[9.016211e-21, ..., 0.000000e+00],
         ...,
         [0.000000e+00, ..., 0.000000e+00]]]], dtype=float32)
Coordinates:
  * mms1_dis_energy_fast  (mms1_dis_energy_fast) float32 2.16 3.91 ... 17800.0
  * mms1_dis_theta_fast   (mms1_dis_theta_fast) float32 5.625 16.875 ... 174.375
  * mms1_dis_phi_fast     (mms1_dis_phi_fast) float32 2.75 14.0 ... 340.25 351.5
  * time1                 (time1) datetime64[ns] 2018-04-08T00:00:00.120976 ... 2018-04-08T00:00:58.621428
Attributes:
    mms1_dis_dist_fast:  s^3/cm^6
    Units:               {'mms1_dis_dist_fast': Unit("s3 / cm6"), 'mms1_dis_e...
    pos_gse:             <xarray.DataArray 'mms1_mec_r_gse' (time: 3, mms1_me...
[ ]:

Search for scientific processes

Introduction

Here we demonstrate how to search for events of specific scientific processes using the Event Search subpackage included in AIDApy. First, we start with importing the modules required by Event Search:

[5]:
#Python modules
from datetime import datetime
import numpy as np

#AIDApy modules
from aidapy import event_search
Looking for specific EDR events

In particular, here we want to look for Electron Diffusion Regions, which are tiny regions located at the very center of magnetic reconnection. To do so, we chose to use data from the MMS mission, which has been specifically designed to study the microphysics (down to electron scales) of magnetic reconnection. We also chose to look at data on 2017 July 20, during which a specific campaign was dedicated to the observation of reconnection in the magnetotail. The time interval is thus defined as:

[6]:
start_time = datetime(2017, 8, 20, 2, 0, 0)
end_time = datetime(2017, 8, 20, 2, 10, 0)

Then, we need to set up the settings for the specific scientific process we want to study; i. e., here Electron Diffusion Region (EDR). The EDR is characterized by a very low magnetic field (< 5 nT), high-speed electron jets (> 4000 km/s) and a high currents (current density > 70 nA/m²). Thus we define these criteria as inputs to the Event Search tool, in a format compliant with the event_search function (see the Event Search section in the user guide). Additionally, we chose to download data from all probes on board MMS mission, in the default GSE coordinate system. Finally, since the EDR region size is of electron scale, we chose to use the high-resolution burst-mode data from MMS, and a time window of 60 s.

[7]:
settings = {"criteria": lambda dc_mag_tot, j_curl_tot, e_bulkv_x: (np.any(dc_mag_tot < 5)) &
            (np.any(j_curl_tot > 70)) & (np.any(np.abs(e_bulkv_x) > 4000)),
            "parameters": {"mission": "mms",
                           "process": "edr",
                           "probes": ['1'],
                           "mode": "brst",
                           "time_window": 120,
                           "time_step": 120}}

Once the settings on the data and parameters are defined, we call event_search function as follows:

[9]:
event_search(settings, start_time, end_time, plot=True)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('0, 1') for key 'mms1_des_compressionloss_brst' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('0, 1') for key 'mms1_des_steptable_parity_brst' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('00-32') for key 'mms1_des_sector_despinp_brst' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms1_mec_mlt' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms1_mec_mlt' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms1_mec_mlt' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms2_mec_mlt' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms3_mec_mlt' are unknown
  warnings.warn(message)
C:\Users\breuillard\AppData\Local\Continuum\anaconda3\lib\site-packages\heliopy-0+unknown-py3.7.egg\heliopy\data\util.py:601: UserWarning: The CDF provided units ('hours') for key 'mms4_mec_mlt' are unknown
  warnings.warn(message)
_images/examples_01_missions_event_search_mms_edr_7_1.png
[9]:
[0, 3999, 8000, 11999]

As seen above, the function gives as ouput an overview plot with highlighted events for the user to eyeball the results, but if to_file parameter set to True, also writes a simple text file with the events selected, as follows:

[10]:
#events_list.txt

'''
List of events found:

N°,       START TIME (UTC),              END TIME (UTC),                        S/C location (km)
1, 2017-08-10T12:18:01.728287000, 2017-08-10T12:19:01.208817000          -96743.35/16839.555/30637.818
''';
[ ]:

Overview of adiapy tools for VDF analysis


illustration ___ The present Notebook illustrates the possibilities offered by aidapy for handling and analysing particle Velocity Distribution Functions (VDFs) observed by the Magnetospheric MultiScale (MMS) mission.

[1]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from datetime import datetime
from IPython.display import Image
import sys, os
import warnings
warnings.filterwarnings('ignore')
Regular aidapy setup

We first import the data loading utilities (aidapy and heliopy) and the xarray classes/accessors.

[2]:
from aidapy import load_data
import aidapy.aidaxr

We select global start and end times for the data to be loaded. A sub-selection is allowed later for finer analyses. But a time selection shorter than the duration of a single file results in no file being loaded at all.

[3]:
start_time = datetime(2019, 3, 8, 13, 54, 53)
end_time   = datetime(2019, 3, 8, 13, 57, 0)

The settings dictionnary allows us to define the products we want to load. In this example, we will load the ion and electron VDFs via the keywords i_dist and e_dist, the (fluxgate) magnetic field (dc_mag), some spacecraft attitude products (sc_att) and the ion bulk velocity i_bulkv.

[4]:
settings = {'prod': ['i_dist', 'e_dist', 'dc_mag', 'sc_att', 'i_bulkv'],
            'probes': ['1'], 'coords': 'gse', 'mode': 'high_res', 'frame':'gse'}

The data are loaded with the following call of load_data:

[5]:
xr_mms = load_data(mission='mms', start_time=start_time, end_time=end_time, **settings)

The previous steps are shared by all applications using aidapy, though the precise products vary between the applications.

Probes positions

We can easely get a quick global view of the probes’ positions for the chosen time interval using the SDC plots. We find that between 13:00:00 and 14:00:00, the probes were in the magnetosheath.

[6]:
url_sdc = 'https://lasp.colorado.edu/mms/sdc/public/data/sdc/mms_orbit_plots/'
url_sdc += 'mms_orbit_plot_{:4d}{:02d}{:02d}{:02d}0000.png'.format(start_time.year, start_time.month, start_time.day, start_time.hour)
Image(url=url_sdc, embed=True)
[6]:
_images/examples_01_missions_vdf_mms_overview_12_0.png
VDF-specific aidapy features

Next, we will explore the possibilities offered by the librairy for VDFs interpolation and visualisations. aidapy visualisation tools only handle interpolated VDFs. ### Setting up and running the interpolation To set up the interpolation, we first select the species of interest, the frame of interest in which we want to visualise the VDFs (available are the instrument frame instrument and the B-aligned frame B), the coordinate system/geometry of the interpolation grid (cartesian cart, spherical spher or cylindrical cyl), its maximum extent in terms of speed and its resolution, and the method used for the interpolation, to chose between nearest-neighbour (near), trilinear (lin) and tricubic (cub).

[7]:
species      = 'electron'
frame        = 'B'
grid_geom    = 'spher'
v_max        = 1.2e7
resolution   = 100
interp_schem = 'lin'

Additionaly, one can select a sub-interval of time, shorter than the overall interval defined before loading the data.

[8]:
start_time_sub = datetime(start_time.year, start_time.month, start_time.day, 13, 55, 0, 100000)
end_time_sub   = datetime(start_time.year, start_time.month, start_time.day, 13, 55, 1, 500000)

The interpolation method is now called with the defined parameters. Depending on the chosen resolution and the chosen interpolating method, the interpolation may take quite some time. The index of the current distribution being interpolated is shown and updated every 10 VDF.

[9]:
xr_mms = xr_mms.vdf.interpolate(start_time, end_time, start_time_sub, end_time_sub,
                        species=species, frame=frame, grid_geom=grid_geom,
                        v_max=v_max, resolution=resolution, interp_schem=interp_schem)


.____________________________________________________________
| mms_vdf.py, aidapy.
|
| Product(s):
|   - i_dist
|   - e_dist
|   - dc_mag
|   - sc_att
|   - i_bulkv
| Grid geometry:    spher
| Resolution:       100
| Interpolation:    lin
| Start time:       2019-03-08T13:55:00.115349000
| Stop time :       2019-03-08T13:55:01.525361000
| Ind. start-stop:  237-284
| Nb distributions: 47
|____________________________________________________________


280/4234 (current VDF/nb total VDFs)
Total runtime: 14 s.

The output of the method indicates that with the chosen sub-interval, 47 distributions were interpolated, with their corresponding indices in the original data file/arrays.

Visualisations

A few visualisations are offered by the aidaxr.graphical module, and are all illustrated below. The first one uses the spherical geometry of the grid to display the parallel and perpedicular 1D profiles of the VDF averaged over time, in this case 35 times 30 ms. The striking feature of these profiles are the arcs appearing for medium to high velocities. These are visual artifacts due to the linear interpolation, resulting in straight segments between two measurment points, and in arc segments in a logarithmic scale.

[10]:
xr_mms.vdf.plot('1d')
_images/examples_01_missions_vdf_mms_overview_20_0.png

The main new possibility offered by the library is the visualisation of the distribution using a scaled view and a normalised view, aiming at highlighting features of a VDF by getting rid of its high gradients, without using additional information (VDF at some other time, or fitting procedures). These views can be displayed using the vdf_spher() method, within the \(v_{para}, v_{perp})\)-plane. the plt_contourf keyword (boolean) toggles the use of a filled contour representation implemented in matplotib.pyplot

[11]:
xr_mms.vdf.plot('2d', plt_contourf=True)
_images/examples_01_missions_vdf_mms_overview_22_0.png

A cylindrical average is often used to display the data in the (\(v_{para}, v_{perp})\)-plane. For quickly checking that the VDF does not present obvious departure from gyrotropy, the following vdf_gyro() method displays the equatorial cut of the VDF (left-most full circular plot), and four angular segment of the spherical grid, their angular range displayed in the left-most plot by the dashed lines. The first row displays the original interpolated VDF values, the second row presents the scaled values. We see in this representation that the characteristic double branch feature seen in the scaled view is indeed gyrotropic.

[12]:
xr_mms.vdf.plot('3d_gyro')
_images/examples_01_missions_vdf_mms_overview_24_0.png

These three first methods operate a time average over the entire sub-selection (from start_time_sub to end_time_sub). vdf_spher_time retain the time dimension, by splitting the energy/speed dimension in 5 intervals and averaging the scaled VDF over speed, over each of these intervals. The result, namely five scaled pitch-angle distributions, is close to what would be obtained by a classical pitch-angle distributions. We see in this example the 35 distributions along time, and check that the double branch signature is consistently found through time.

[13]:
xr_mms.vdf.plot('3d_time', plt_contourf=False)
_images/examples_01_missions_vdf_mms_overview_26_0.png
Changing parameters and re-interpolating VDFs

We show now that using the exact same loaded data, one can reset the parameters, change the species, the resolution, the interpolation scheme, etc., and re-interpolate the VDFs. Note that the time sub-interval can be redefined here. In this example however, we keep the same time selection to compare electrons and ions.

[14]:
grid_geom  = 'cart'
species    = 'ion'
resolution = 120
v_max      = 8.e5

The interpolating method is called exactly the same way as previously. Because of the difference of measurement cadence, we only get 7 distributions in this case, with a corresponding gain in time.

[16]:
xr_mms = xr_mms.vdf.interpolate(start_time, end_time, start_time_sub, end_time_sub,
                        species=species, frame=frame, grid_geom=grid_geom,
                        v_max=v_max, resolution=resolution, interp_schem=interp_schem)


.____________________________________________________________
| mms_vdf.py, aidapy.
|
| Product(s):
|   - i_dist
|   - e_dist
|   - dc_mag
|   - sc_att
|   - i_bulkv
| Grid geometry:    cart
| Resolution:       120
| Interpolation:    lin
| Start time:       2019-03-08T13:55:00.235349000
| Stop time :       2019-03-08T13:55:01.585361000
| Ind. start-stop:  48-57
| Nb distributions: 9
|____________________________________________________________


50/847 (current VDF/nb total VDFs)
Total runtime: 4 s.

A cartesian coordinate system is much less handy when it comes to representation and selections. Only one plotting method is offered in this case, which provides cuts through the time average VDF, with iso-contours.

[17]:
xr_mms.vdf.plot('3d')
_images/examples_01_missions_vdf_mms_overview_32_0.png
Manual handling of the interpolated VDFs

Until now, the user has only been calling methods and hasn’t been manipulating the various products. We breifly illustrate here how to access the products, in order to code specific representations, or implement some further analyses. We also refer to the source scripts of each of the above plotting methods for more detailled use cases.

Let’s first check the content of the dataset xr_mms:

[18]:
xr_mms
[18]:
<xarray.Dataset>
Dimensions:                   (mms1_des_energy_brst: 32, mms1_des_phi_brst: 32, mms1_des_theta_brst: 16, mms1_dis_bulkv_gse_brst: 3, mms1_dis_energy_brst: 32, mms1_dis_phi_brst: 32, mms1_dis_theta_brst: 16, mms1_fgm_b_gse_brst_l2: 4, mms1_mec_quat_eci_to_gse: 4, phi: 120, speed: 120, theta: 120, time: 9, time1: 847, time2: 4234, time3: 16253, time4: 5, time5: 847, v: 3, v_para: 120, v_perp: 120, vx: 120, vy: 120, vz: 120)
Coordinates:
  * mms1_dis_energy_brst      (mms1_dis_energy_brst) float32 2.16 ... 17800.0
  * mms1_dis_phi_brst         (mms1_dis_phi_brst) float32 7.0 18.25 ... 355.75
  * mms1_dis_theta_brst       (mms1_dis_theta_brst) float32 5.625 ... 174.375
  * time1                     (time1) datetime64[ns] 2019-03-08T13:54:53.035267 ... 2019-03-08T13:56:59.936741
  * mms1_des_theta_brst       (mms1_des_theta_brst) float32 5.625 ... 174.375
  * mms1_des_phi_brst         (mms1_des_phi_brst) float32 5.375 ... 354.125
  * mms1_des_energy_brst      (mms1_des_energy_brst) float32 6.52 ... 27525.0
  * time2                     (time2) datetime64[ns] 2019-03-08T13:54:53.005267 ... 2019-03-08T13:56:59.996741
  * mms1_fgm_b_gse_brst_l2    (mms1_fgm_b_gse_brst_l2) <U3 'x' 'y' 'z' 'tot'
  * time3                     (time3) datetime64[ns] 2019-03-08T13:54:53.024982 ... 2019-03-08T13:56:59.995838
  * mms1_mec_quat_eci_to_gse  (mms1_mec_quat_eci_to_gse) <U3 'x' 'y' 'z' 'tot'
  * time4                     (time4) datetime64[ns] 2019-03-08T13:55:00 ... 2019-03-08T13:57:00
  * mms1_dis_bulkv_gse_brst   (mms1_dis_bulkv_gse_brst) <U1 'x' 'y' 'z'
  * time5                     (time5) datetime64[ns] 2019-03-08T13:54:53.035267 ... 2019-03-08T13:56:59.936741
Dimensions without coordinates: phi, speed, theta, time, v, v_para, v_perp, vx, vy, vz
Data variables:
    i_dist1                   (time1, mms1_dis_energy_brst, mms1_dis_theta_brst, mms1_dis_phi_brst) float32 0.0 ... 0.0
    e_dist1                   (time2, mms1_des_energy_brst, mms1_des_theta_brst, mms1_des_phi_brst) float32 1.2165267e-25 ... 0.0
    dc_mag1                   (time3, mms1_fgm_b_gse_brst_l2) float32 -0.29875872 ... 41.451748
    sc_att1                   (time4, mms1_mec_quat_eci_to_gse) float64 -0.2019 ... 0.9733
    i_bulkv1                  (time5, mms1_dis_bulkv_gse_brst) float32 4.4563413 ... -76.759415
    vdf_interp_time           (time, vx, vz) float32 1.2914518e-15 ... 2.6537375e-16
    vdf_interp                (vx, vy, vz) float64 3.526e-16 ... 2.228e-15
    grid_interp_cart          (v, vx, vy, vz) float32 -800000.0 ... 800000.0
    grid_interp_spher         (v, speed, theta, phi) float32 1385640.6 ... 0.7853982
    grid_interp_cyl           (v, v_perp, phi, v_para) float32 1131370.9 ... 800000.0
    time_interp               (time) datetime64[ns] 2019-03-08T13:55:00.235349 ... 2019-03-08T13:55:01.435361
    grid_geom                 <U4 'cart'
Attributes:
    mission:        mms
    load_settings:  {'prod': ['i_dist', 'e_dist', 'dc_mag', 'sc_att', 'i_bulk...

One can spot the four variables added by the execution of interpolate() method: distrib_interp, grid_interp_spher, grid_interp_cart and time_interp . grid_* variables are three dimensional grids giving three coordinates for each of the VDF values. We can get the 1D arrays of the centers of the bins as shown here:

[19]:
vdf_time = xr_mms['vdf_interp_time'].values
grid_cart = xr_mms['grid_interp_cart'].values
centers_vx = grid_cart[0,:,0,0]
centers_vy = grid_cart[1,0,:,0]
centers_vz = grid_cart[2,0,0,:]

We check the content and dimensions of the distrib_interp variable:

[20]:
xr_mms['vdf_interp_time'].dims
[20]:
('time', 'vx', 'vz')

Therefore vdf_time is a time series of interpolated VDFs, along the dimensions \(v_x\) and \(v_z\). [The reason for the absence of \(v_y\) is a matter of memory saving and is obviously not ideal. It is “herited” from the spherical product, where the two dimensions of speed and pitch angle are enough to well describe the data] As an arbitrary example, we can plot the VDFs averaged over the perpendicular velocities \(v_x\) (different from the parallel profile shown previously, which only shows VDF values for velocities close to \(v_{perp}=0\)).

[21]:
%matplotlib notebook
fig, ax = plt.subplots(figsize=(9,9))

ind_mid = int(resolution/2.)

for d in vdf_time:
    ax.loglog(-1.*centers_vz[:ind_mid], np.nanmean(d, axis=0)[:ind_mid], '--', lw=1)
    ax.loglog(centers_vz[ind_mid:], np.nanmean(d, axis=0)[ind_mid:], lw=1)
ax.set_xlabel('v_para (m/s)')
ax.set_ylabel('VDF (#/m^6/s^3)')

plt.show()

Disturbance Storm Time analysis from OMNI data

This section contains notebooks for the analysis of Disturbance Storm Time (DST) index using OMNI data. COPY/PASTE FROM WIKIPEDIA TO BE MODIFIED The DST index is widely used in the context of space weather and gives information about the strength of the ring current around Earth caused by solar protons and electrons. The ring current around Earth produces a magnetic field that is directly opposite Earth’s magnetic field, i.e. if the difference between solar electrons and protons gets higher, then Earth’s magnetic field becomes weaker. A negative Dst value means that Earth’s magnetic field is weakened. This is particularly the case during solar storms.

The two notebooks download time series data from OMNI:

  • The First notebook trains several models to forecast the DST index and evaluates their performance.

  • The Second notebook analyses the influence of each variable on the DST index by computing correlations and selecting the best features.

DST forecasting from OMNI data

This example shows how to use AIDApy to perform the following tasks:

  • Download time series data from OMNI

  • Preprocess the data so that it can be used for machine learning

  • Train several models and evaluate their performance

Downloading data

We will first download low-resolution data from OMNI.

[1]:
from datetime import datetime
from aidapy import load_data

# Set the start and end date as year, month, day
t0 = datetime(2005, 1, 1)
t1 = datetime(2015, 12, 31)

# Download the data
omnixr = load_data(mission='omni', start_time=t0, end_time=t1)

# Store data in pandas format
pd_data = omnixr['all1'].to_pandas()

pd_data.describe()
[1]:
products Bartels Rotation Number ID IMF Spacecraft ID SW Plasma Spacecraft points(IMF Average) points(Plasma Average) |B| Magnitude of Avg Field Vector Lat. Angle of Aver. Field Vector Long. Angle of Aver. Field Vector Bx GSE, GSM ... Proton Flux > 10MeV Proton Flux > 30MeV Proton Flux > 60MeV flag ap index f10.7 index PC(N) index AL index (Kyoto) AU index (Kyoto) Magnetosonic Mach No.
count 96385.000000 96385.000000 96354.00000 96385.000000 96354.00000 96385.000000 96385.000000 96385.000000 96385.000000 96385.000000 ... 91174.000000 91171.000000 91169.000000 96385.000000 96385.000000 96265.000000 96212.000000 96385.000000 96385.000000 95501.000000
mean 2413.760481 51.469160 52.41442 58.407595 34.32062 5.216666 4.620777 0.442976 200.817234 -0.003756 ... 4.467667 0.895241 0.283256 -0.945936 8.207802 98.542824 0.876475 -93.396431 56.163034 5.769463
std 42.939382 3.027076 2.95985 7.630976 5.37908 2.785629 2.659993 29.518247 101.021925 3.079144 ... 74.289989 15.688849 5.969648 0.226146 12.072234 29.679255 1.171071 128.300576 61.312826 1.119411
min 2339.000000 51.000000 51.00000 1.000000 1.00000 0.400000 0.100000 -89.200000 0.000000 -40.800000 ... 0.050000 0.040000 0.030000 -1.000000 0.000000 65.100000 -6.900000 -2452.000000 -225.000000 0.600000
25% 2377.000000 51.000000 52.00000 58.000000 34.00000 3.400000 2.900000 -18.600000 121.100000 -2.200000 ... 0.140000 0.080000 0.060000 -1.000000 3.000000 73.500000 0.100000 -120.000000 17.000000 5.100000
50% 2414.000000 51.000000 52.00000 60.000000 36.00000 4.600000 4.000000 0.100000 182.600000 0.000000 ... 0.180000 0.100000 0.070000 -1.000000 5.000000 88.400000 0.600000 -38.000000 34.000000 5.800000
75% 2451.000000 51.000000 52.00000 61.000000 37.00000 6.200000 5.600000 19.200000 300.600000 2.200000 ... 0.220000 0.120000 0.090000 -1.000000 9.000000 117.800000 1.300000 -17.000000 72.000000 6.500000
max 2488.000000 71.000000 71.00000 91.000000 52.00000 55.400000 54.500000 89.800000 360.000000 26.400000 ... 4559.950000 1210.000000 956.000000 0.000000 300.000000 255.000000 18.400000 15.000000 873.000000 10.900000

8 rows × 52 columns

Split the data into different sets

For evaluating the performance of a machine learning approach, the input data has to be split in different sets:

  • A training set, on which the model is trained

  • Optionally, a validation set, to tune so-called hyperparameters of the model

  • A test set, which is only used at the end to evaluate the performance of the model

Such a separation is important to prevent overfitting. Since there is usually a correlation between consecutive observations in a time series, it is important not to shuffle the data before splitting it in different sets. Otherwise, the test set would not be truly independent from the training set.

[2]:
from sklearn.model_selection import train_test_split

# Split into training and test data
dtrain, dtest = train_test_split(pd_data, shuffle=False)
Let’s have a look at the data

Below, the DST index is shown. Note that there are quiet periods, and periods with more magnetic storms, indicated by large negative values.

[3]:
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
%matplotlib notebook
register_matplotlib_converters()

plt.figure()
plt.title("DST index (training/test data)")
dtrain['DST Index'].plot(label='train')
dtest['DST Index'].plot(label='test')
plt.gcf().autofmt_xdate()
plt.legend();
Selecting features

The features are here selected by hand (and they are not necessarily the best ones).

[4]:
features = ['|B|',
            'Magnitude of Avg Field Vector',
            'Proton Density',
            'Proton Temperature',
            'Alfven Mach Number',
            'Bz GSM',
            'Na/Np',
            'Plasma Flow Speed',
            'Plasma Beta',
            'Electric Field',
            'DST Index']
targets = ['DST Index']
Preprocessing data

The training and test data are preprocessed so that we can make predictions for the targets based on a history of the features. If the original features and targets are \(X_i\) and \(y_i\), then we define new features \(X_i' = [X_i, X_{i-1}, \dots, X_{i-n+1}]\), where \(n\) is the history size. We use these features to make predictions at some future time, so that the new targets are \(y_i' = y_{i+k}\), where \(k\) is the forecast time.
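To make this construction concrete, here is a minimal numpy sketch of the same idea (an illustration only, not the aidapy implementation; missing values are ignored, and the ordering within the stacked window is immaterial for the regression models used below):

import numpy as np

def make_history_features(X, y, histsize, forecast_time):
    # Stack `histsize` consecutive feature vectors and take the target
    # `forecast_time` steps after the end of each window.
    n_samples = len(X) - histsize - forecast_time + 1
    X_new = np.array([X[i:i + histsize].ravel() for i in range(n_samples)])
    y_new = y[histsize + forecast_time - 1:histsize + forecast_time - 1 + n_samples]
    return X_new, y_new

X_demo = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 features
y_demo = np.arange(10, dtype=float).reshape(10, 1)
Xh, yh = make_history_features(X_demo, y_demo, histsize=3, forecast_time=1)
print(Xh.shape, yh.shape)                            # (7, 6) (7, 1)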

Another question is what to do with missing values. This is a common issue when working with spacecraft data, in particular if data from multiple instruments are combined. We have opted for a simple approach: remove all samples for which some data is missing, either in the features or in the targets.

[5]:
from aidapy.ml import preprocess

histsize = 8                    # Number of past hours
forecast_time = 1               # Hours into the future

# Use the AIDApy preprocessing method for time series
X_train, y_train, mask_train = preprocess.time_series(
    dtrain[features].values, dtrain[targets].values, histsize, forecast_time)
X_test, y_test, mask_test = preprocess.time_series(
    dtest[features].values, dtest[targets].values, histsize, forecast_time)

t_train = dtrain.index[mask_train]
t_test = dtest.index[mask_test]
Regression models

For the first use case, the following regression models are used:

  • A fully-connected neural network implemented in PyTorch. Users can customize the number of layers, the number of neurons per layer, and the activation functions of the layers.

  • A linear regression model imported from scikit-learn.

[6]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from aidapy.ml import mlp
from skorch import NeuralNetRegressor
import torch

models = []

models.append({
    'name': 'Linear Regression',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', LinearRegression())])
})

# RegressorMlp is a simple, fully-connected neural network,
# for which the layer sizes are defined below. The default
# activation function is ReLU.
mlp_model = NeuralNetRegressor(
    mlp.RegressorMlp,
    max_epochs=25,
    lr=0.001,
    batch_size=128,
    optimizer=torch.optim.Adam,
    module__layer_sizes=[X_train.shape[1], 64, 64, 64, 1]
)

models.append({
    'name': 'Multilayer perceptron',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', mlp_model)])
})
[7]:
from sklearn.metrics import r2_score

for model in models:
    model['pipe'].fit(X_train, y_train)
    model['test_predict'] = model['pipe'].predict(X_test)
    model['train_predict'] = model['pipe'].predict(X_train)

for model in models:
    print("{:30} R2 score on test / train set:  {:8.3f} {:8.3f}".format(
        model['name'], r2_score(y_test, model['test_predict']),
        r2_score(y_train, model['train_predict'])))
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1       60.7240       32.4020  2.9466
      2       17.2051       27.2859  1.8680
      3       13.9108       27.3186  4.9162
      4       11.7659       25.7481  3.9948
      5       11.0440       29.0479  2.6648
      6       10.3904       26.3400  2.7699
      7       10.1087       27.2759  1.9154
      8        9.7082       24.8305  2.6972
      9        9.3186       24.3196  2.6852
     10        9.0785       23.8498  3.3641
     11        8.9514       23.7893  2.6362
     12        8.8422       23.3065  4.0155
     13        8.6888       22.9633  2.9111
     14        8.4773       21.6068  2.7943
     15        8.4333       22.4387  2.9364
     16        8.2843       21.1695  3.0378
     17        8.1932       21.9871  3.1358
     18        8.0966       20.9862  3.1069
     19        8.1332       22.9881  3.7974
     20        7.9812       20.2496  4.5640
     21        7.9269       21.3206  4.9270
     22        7.9495       21.2599  3.3066
     23        7.7617       19.6344  5.3131
     24        7.6907       20.3049  1.9088
     25        7.6336       19.9777  2.4472
Linear Regression              R2 score on test / train set:     0.969    0.964
Multilayer perceptron          R2 score on test / train set:     0.952    0.948

Let’s compare the predictions with the actual data

[8]:
plt.figure()
plt.title('Predicting the change in DST Index')
plt.plot(t_test, y_test, label='Data test')
plt.plot(t_train, y_train, label='Data train')
for model in models:
    p = plt.plot(t_test, model['test_predict'], label=model['name'])
    # Plot another line with the same color
    plt.plot(t_train, model['train_predict'], color=p[0].get_color())
plt.legend()
plt.gcf().autofmt_xdate()
plt.grid(True)
This looks pretty good, but…

Most variables we consider in a time series have strong auto-correlation. This means that future values strongly resemble past values; examples are the temperature at a location on earth or the price of a stock on a stock market. When making predictions for a variable with strong auto-correlation, one has to be careful in assessing the predictive power of a model. Simply predicting the past state as the future state can look convincing (also numerically in most error norms), but gives little information.
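For instance, the trivial persistence forecast (use the current DST value as the one-hour-ahead prediction) can be scored directly; a minimal sketch, assuming the hourly dtest['DST Index'] series from above has no gaps:

from sklearn.metrics import r2_score

# Persistence baseline: "predict" that DST one hour ahead equals the current value.
dst = dtest['DST Index'].dropna().values
print("Persistence R2:", r2_score(dst[1:], dst[:-1]))

If this score is close to the model scores above, the models add little information beyond persistence.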

Below, we therefore take the following approach: predict the change in the variable.

[9]:
from aidapy.ml import preprocess

histsize = 24                   # Number of past hours
forecast_time = 1               # Hours into the future

# Use the AIDApy preprocessing method for time series
# With the predict_change=True flag, the change in the DST index
# is the target variable
X_train, y_train, mask_train = preprocess.time_series(
    dtrain[features].values, dtrain[targets].values,
    histsize, forecast_time, predict_change=True)
X_test, y_test, mask_test = preprocess.time_series(
    dtest[features].values, dtest[targets].values,
    histsize, forecast_time, predict_change=True)

t_train = dtrain.index[mask_train]
t_test = dtest.index[mask_test]

models = []

models.append({
    'name': 'Linear Regression',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', LinearRegression())])
})

# RegressorMlp is a simple, fully-connected neural network,
# for which the layer sizes are defined below. The default
# activation function is ReLU.
mlp_model = NeuralNetRegressor(
    mlp.RegressorMlp,
    max_epochs=25,
    lr=0.001,
    batch_size=128,
    optimizer=torch.optim.Adam,
    module__layer_sizes=[X_train.shape[1], 64, 64, 64, 1]
)

models.append({
    'name': 'ANN',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', mlp_model)])
})
[10]:
from sklearn.metrics import r2_score

for model in models:
    model['pipe'].fit(X_train, y_train)
    model['test_predict'] = model['pipe'].predict(X_test)
    model['train_predict'] = model['pipe'].predict(X_train)

for model in models:
    print("{:30} R2 score on test / train set:  {:8.3f} {:8.3f}".format(
        model['name'], r2_score(y_test, model['test_predict']),
        r2_score(y_train, model['train_predict'])))
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        9.3040       13.4525  2.5473
      2        8.0748       13.2288  2.4043
      3        7.6921       12.3340  2.4899
      4        7.5276       12.2525  3.9910
      5        7.3974       12.1197  3.8056
      6        7.2771       11.9705  3.0214
      7        7.1804       11.9094  2.0155
      8        7.0822       11.9217  2.8191
      9        7.0014       11.8192  2.6564
     10        6.8758       11.8942  2.6622
     11        6.7928       11.8929  1.9705
     12        6.6663       11.8819  1.9809
     13        6.6267       12.3010  2.6679
     14        6.5041       12.2031  1.8921
     15        6.4059       12.2843  2.4354
     16        6.3420       12.9039  3.0471
     17        6.2806       12.1478  2.7479
     18        6.1105       13.0912  4.9462
     19        6.0671       12.2712  4.6252
     20        5.9791       12.7974  1.9119
     21        5.9633       12.8095  3.2434
     22        6.0265       13.1740  2.2317
     23        5.9151       13.1795  1.7708
     24        5.8232       13.1638  3.2358
     25        5.7188       12.9696  4.8095
Linear Regression              R2 score on test / train set:     0.389    0.411
ANN                            R2 score on test / train set:     0.287    0.457
[11]:
plt.figure()
plt.title('Predicting the change in DST Index')
plt.plot(t_test, y_test, label='Data test')
plt.plot(t_train, y_train, label='Data train')
for model in models:
    p = plt.plot(t_test, model['test_predict'], label=model['name'])
    # Plot another line with the same color
    plt.plot(t_train, model['train_predict'], color=p[0].get_color())
plt.legend()
plt.gcf().autofmt_xdate()
plt.grid(True)
Predicting the current DST index from other variables
Downloading data

We will first download low-resolution data from OMNI, and split it into a training and a test set.

[1]:
from datetime import datetime
from aidapy import load_data
from sklearn.model_selection import train_test_split

# Set the start and end date as year, month, day
t0 = datetime(2000, 1, 1)
t1 = datetime(2015, 12, 31)

# Download the data
omnixr = load_data(mission='omni', start_time=t0, end_time=t1)

# Store data in pandas format
pd_data = omnixr['all1'].to_pandas()

# Split into training and test data
dtrain, dtest = train_test_split(pd_data, shuffle=False)

pd_data.describe()
Downloading https://cdaweb.gsfc.nasa.gov/pub/data/omni//low_res_omni/omni2_2000.dat to /users/cpa/romaind/heliopy/data/omni/omni2_2000.dat
100.0% 2883584 / 2881152

(similar download messages follow for the remaining years of the requested interval)

[1]:
[Output truncated: summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for the 52 OMNI products over roughly 140,000 hourly samples — 8 rows × 52 columns.]

Let’s have a look at the data

Below, the DST index is shown. Note that there are quiet periods, and periods with more magnetic storms, indicated by large negative values.

[2]:
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
%matplotlib notebook
register_matplotlib_converters()

plt.figure()
plt.title("DST index (training/test data)")
dtrain['DST Index'].plot(label='train')
dtest['DST Index'].plot(label='test')
plt.gcf().autofmt_xdate()
plt.legend();
The correlation between the DST Index and other variables

Let’s have a look at the linear correlation between the DST index and the other variables. Correlations range from -1 (perfect anti-correlation) to 1 (perfect correlation); a value around zero indicates that there is no linear relationship between the variables, although there could still be a more complex non-linear relationship.

[3]:
# Generate a correlation matrix
corr_matrix = dtrain.corr()

# Get the correlation with the DST index
dst_corr = corr_matrix['DST Index']

# Display correlations sorted by absolute value
ix = dst_corr.abs().sort_values(ascending=False).index
sorted_corr = dst_corr.reindex(ix)

# Print the variables with the strongest correlations
print(sorted_corr[:10])
products
DST Index                        1.000000
ap index                        -0.629716
Kp                              -0.573182
AL index (Kyoto)                 0.548138
AE Index                        -0.542807
PC(N) index                     -0.514771
Plasma Flow Speed               -0.456372
AU index (Kyoto)                -0.420654
|B|                             -0.396887
Magnitude of Avg Field Vector   -0.386568
Name: DST Index, dtype: float64
Selecting features

We will now select features from which we will predict the DST Index (our target). There are a number of other indices that the DST correlates with, but to make this example somewhat challenging we will exclude them!

[4]:
# Get names of the variables
all_names = list(sorted_corr.index)

# Remove index variables
my_names = [name for name in all_names if 'ndex' not in name]

# Remove other variables
my_names.remove('Kp')

# Select variables with the strongest (absolute) correlations as features
n_features = 5
features = my_names[:n_features]
targets = ['DST Index']
all_vars = features + targets
print("The features are: ", features)
print("The targets are: ", targets)

# Remove rows with missing values. We do this after selecting features,
# otherwise we remove too many rows.
dtrain_dropna = dtrain[all_vars].dropna()
dtest_dropna = dtest[all_vars].dropna()

# Get the features and targets
X_train = dtrain_dropna[features].values
y_train = dtrain_dropna[['DST Index']].values

X_test = dtest_dropna[features].values
y_test = dtest_dropna[['DST Index']].values

t_test = dtest_dropna.index
t_train = dtrain_dropna.index
The features are:  ['Plasma Flow Speed', '|B|', 'Magnitude of Avg Field Vector', 'Na/Np', 'Bz GSM']
The targets are:  ['DST Index']
Regression models

In this example, we consider a linear regression model and an artificial neural network (ANN) regression model. These models are used in a pipeline, which also performs scaling of the input data.

[5]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from aidapy.ml import mlp
from skorch import NeuralNetRegressor
import torch

# A list of models to use
models = []

# Append a dictionary with the model name and pipeline
models.append({
    'name': 'Linear Regression',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', LinearRegression())])
})

# RegressorMlp is a simple, fully-connected neural network,
# for which the layer sizes are defined below. The default
# activation function is ReLU.
mlp_model = NeuralNetRegressor(
    mlp.RegressorMlp,
    max_epochs=25,
    lr=0.001,
    batch_size=128,
    optimizer=torch.optim.Adam,
    module__layer_sizes=[X_train.shape[1], 64, 64, 64, 1]
)

models.append({
    'name': 'Multilayer perceptron',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', mlp_model)])
})
Training the models
[6]:
from sklearn.metrics import r2_score

for model in models:
    model['pipe'].fit(X_train, y_train)
    model['test_predict'] = model['pipe'].predict(X_test)
    model['train_predict'] = model['pipe'].predict(X_train)

for model in models:
    print("{:30} R2 score on test / train set:  {:8.3f} {:8.3f}".format(
        model['name'], r2_score(y_test, model['test_predict']),
        r2_score(y_train, model['train_predict'])))
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1      265.4195      510.7300  4.9675
      2      199.7389      514.0299  6.8559
      3      196.6792      513.1639  5.4448
      4      194.9946      511.3364  8.0728
      5      193.8425      508.8832  3.3963
      6      192.9440      506.7368  6.7031
      7      192.1152      507.0628  4.3043
      8      191.5201      504.7900  4.5337
      9      190.9533      504.0500  5.6420
     10      190.5970      503.2600  3.0030
     11      190.2063      502.6676  3.0651
     12      189.9117      501.8307  7.2454
     13      189.5939      501.4661  7.0313
     14      189.3153      501.0051  4.6045
     15      189.0875      500.4397  8.1879
     16      188.8131      500.0692  5.8018
     17      188.5830      499.5452  8.0832
     18      188.3553      499.0485  5.2213
     19      188.0861      498.7523  4.9091
     20      187.9017      498.4360  4.6728
     21      187.7025      498.1764  4.8058
     22      187.4986      497.6907  3.5740
     23      187.3130      497.4540  4.4685
     24      187.1445      497.1794  4.2933
     25      186.9713      496.8183  7.3463
Linear Regression              R2 score on test / train set:     0.394    0.375
Multilayer perceptron          R2 score on test / train set:     0.415    0.404
Visualizing the model predictions
[7]:
plt.figure()
plt.title('Performance on test/train data')
plt.plot(t_test, y_test, label='Data test')
plt.plot(t_train, y_train, label='Data train')
for model in models:
    p = plt.plot(t_test, model['test_predict'], label=model['name'])
    # Plot another line with the same color
    plt.plot(t_train, model['train_predict'], color=p[0].get_color())
plt.legend()
plt.gcf().autofmt_xdate()
plt.grid(True)
Working with a history to improve predictions

Before, we only used the ‘instantaneous’ values of other variables to predict the DST Index. We can improve predictions by keeping track of a history of values, for which we use the AIDApy preprocessing module.

[8]:
from aidapy.ml import preprocess

histsize = 24                   # Number of past hours
forecast_time = 0               # Hours into the future

Xtr = dtrain[features].values
ytr = dtrain[targets].values
Xte = dtest[features].values
yte = dtest[targets].values

# Use the AIDApy preprocessing method for time series
X_train, y_train, mask_train = preprocess.time_series(
    Xtr, ytr, histsize, forecast_time)
X_test, y_test, mask_test = preprocess.time_series(
    Xte, yte, histsize, forecast_time)

t_train = dtrain.index[mask_train]
t_test = dtest.index[mask_test]

models_v2 = []

models_v2.append({
    'name': 'Linear Regression',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', LinearRegression())])
})

mlp_model = NeuralNetRegressor(
    mlp.RegressorMlp,
    max_epochs=20,
    lr=0.001,
    batch_size=128,
    optimizer=torch.optim.Adam,
    module__layer_sizes=[X_train.shape[1], 64, 64, 64, 1]
)

models_v2.append({
    'name': 'Multilayer perceptron',
    'pipe': Pipeline(steps=[('preprocess', StandardScaler()),
                            ('model', mlp_model)])
})
Training and evaluating the new models
[9]:
for model in models_v2:
    model['pipe'].fit(X_train, y_train)
    model['test_predict'] = model['pipe'].predict(X_test)
    model['train_predict'] = model['pipe'].predict(X_train)

for model in models_v2:
    print("{:30} R2 score on test / train set:  {:8.3f} {:8.3f}".format(
        model['name'], r2_score(y_test, model['test_predict']),
        r2_score(y_train, model['train_predict'])))
  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1      211.1822      320.4362  3.7218
      2      120.9668      272.9826  4.0578
      3      116.8801      265.6732  5.9941
      4      114.9084      249.5811  3.8256
      5      112.3243      244.7855  3.6040
      6      110.7551      240.7139  3.4175
      7      109.4082      237.4817  3.4066
      8      108.2381      234.2301  3.3303
      9      107.1030      231.2991  2.4646
     10      105.9993      228.7583  5.1124
     11      105.0800      226.6691  3.4773
     12      104.1000      224.7690  3.3862
     13      103.2819      222.9194  3.0130
     14      102.5243      221.2111  5.1869
     15      101.7451      219.7934  2.9591
     16      101.0786      218.5600  2.5240
     17      100.4546      216.9458  4.8371
     18       99.7634      215.4997  2.8842
     19       99.2128      214.3354  2.6664
     20       98.6472      213.3654  5.0065
Linear Regression              R2 score on test / train set:     0.660    0.634
Multilayer perceptron          R2 score on test / train set:     0.667    0.700
Looking at the results

The performance of both models seems to have significantly improved by using a history of variables.

[10]:
plt.figure()
plt.title('Performance on test/train data')
plt.plot(t_test, y_test, label='Data test')
plt.plot(t_train, y_train, label='Data train')
for model in models_v2:
    p = plt.plot(t_test, model['test_predict'], label=model['name'])
    # Plot another line with the same color
    plt.plot(t_train, model['train_predict'], color=p[0].get_color())
plt.legend()
plt.gcf().autofmt_xdate()
plt.grid(True)
Possible extensions

Above, features were selected based on their linear correlation to the DST index. Such features will work well for a linear model, but a neural network could potentially also use features that have a more complex correlation with the DST index. We could simply use all variables as the input for the neural network, but then we have to take care that there are no missing values (NaNs); one way to handle this is sketched below.
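A minimal sketch of that approach, using scikit-learn's SimpleImputer in front of the scaler (median imputation is an arbitrary choice here, not part of the example above):

from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Impute missing values before scaling, so that all OMNI variables could be used as features.
pipe_with_imputer = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='median')),
    ('preprocess', StandardScaler()),
    ('model', LinearRegression())
])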

Modules

aidapy

aidapy package
Subpackages
aidapy.aidafunc package
Submodules
aidapy.aidafunc.load_data module

Loading data functions

This script contains the load_data function, which provides the interface with the user, together with the associated checking functions.

This file can be imported as a module and contains the following functions:

aidapy.aidafunc.load_data.get_mission_info(mission='mission', start_time=None, end_time=None, product=None, full_product_catalog=False, hide=False)[source]

Provide information on the mission

Parameters
  • mission (str) – The name of the mission from which the data are loaded/downloaded

  • start_time (datetime or Time) – Start time of the loading

  • end_time (datetime or Time) – End time of the loading

  • product (str) – Data product to look for in product_catalog

  • full_product_catalog (bool) – Tag to provide all available keys

  • hide (bool) – Tag to hide print messages when used in routines

Returns

info – String containing the AIDApy keyword of the queried product, or a dictionary with information on the mission or queried product, with the following keys: name, allowed_probes, product_catalog

Return type

dict or str

aidapy.aidafunc.load_data.load_data(mission, start_time, end_time, **kwargs)[source]

Load the data from the given mission over a specific time interval. The extra arguments give details on the data to load.

Parameters
  • mission (str) – The name of the mission from which the data are loaded/downloaded

  • start_time (datetime or Time) – Start time of the loading

  • end_time (datetime or Time) – End time of the loading

  • **kwargs – Specific arguments providing information on the mission or the download settings:

    - prod: high-level product to load. Can be a string or a list. The full list is available through the get_mission_info() routine.

    - probes: probe number. Can be a string or a list.

    - coords: coordinate system to use.

    - mode: mode defining the data rate. Usually either ‘low_res’ or ‘high_res’. The user can also specify a mode specific to a mission (for instance, for MMS: ‘slow’, ‘fast’, ‘brst’, ‘srvy’). The list of these specific modes (or data_rate) can be found in the heliopy documentation: https://docs.heliopy.org/en/stable/reference/data/index.html

    - frame: frame used only for spacecraft attitude. Usually ‘dbcs’.

    Example: {‘prod’: [‘dc_mag’], ‘probes’: [‘1’, ‘2’], ‘coords’: ‘gse’, ‘mode’: ‘high_res’}

Returns

xarray_mission – Requested data contained within an xarray Dataset. Each data variable contains a specific product of a specific probe

Return type

DataSet
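As an illustration of these keyword arguments, the call below follows the example dictionary given in the docstring; the mission and time interval are arbitrary and purely illustrative:

from datetime import datetime
from aidapy import load_data

start_time = datetime(2018, 4, 8, 0, 0, 0)
end_time = datetime(2018, 4, 8, 1, 0, 0)

# Settings taken from the example above; the available products can be checked with get_mission_info()
settings = {'prod': ['dc_mag'], 'probes': ['1', '2'], 'coords': 'gse', 'mode': 'high_res'}
xr_mms = load_data(mission='mms', start_time=start_time, end_time=end_time, **settings)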

aidapy.aidafunc.set_load_config module
Module contents
aidapy.aidaxr package
Submodules
aidapy.aidaxr.graphical module

AIDA module responsible for the graphical utilities of the time series. Contributors: Etienne Behar

class aidapy.aidaxr.graphical.AidaAccessorGraphical(xarray_obj)[source]

Bases: object

Xarray accessor responsible for the Graphical utilities

find_columns_name()[source]

Method to automatically find the name of the columns of the xarray

Returns

col – the name of the columns of the xarray

Return type

list

find_time_index()[source]

Method to automatically find the time index of xarray

Returns

name – the name of the time index

Return type

str

index_names()[source]

The data indexes names

peek()[source]

Plot of the time series

Parameters
  • axes (matplotlib.axes.Axes or None) – If provided, the image will be plotted on the given axes. Otherwise the current axes will be used.

  • **plot_args (dict) – Any additional plot arguments that should be used when plotting.

class aidapy.aidaxr.graphical.TimeIndexNotFound[source]

Bases: object

Error Class responsible for the time index

aidapy.aidaxr.process module

Xarray Accessor that is responsible for conversions

class aidapy.aidaxr.process.AidaProcessAccessor(xarray_obj)[source]

Bases: object

This class is responsible for converting units to other formats

elev_angle()[source]

tofill

Returns

data – tofill

Return type

tofill

class aidapy.aidaxr.process.AidaStatisticsAccessor(xarray_obj)[source]

Bases: object

Xarray Dataset accessor responsible for the statistical utilities

static curl_b(b1, b2, b3, b4, r1, r2, r3, r4)[source]

Compute the curl of the magnetic field from four-spacecraft measurements (curlometer). Routine taken from ESA: https://www.cosmos.esa.int/web/csa/multi-spacecraft

Parameters
  • b1 – Magnetic field measured by spacecraft 1

  • b2 – Magnetic field measured by spacecraft 2

  • b3 – Magnetic field measured by spacecraft 3

  • b4 – Magnetic field measured by spacecraft 4

  • r1 – Position of spacecraft 1

  • r2 – Position of spacecraft 2

  • r3 – Position of spacecraft 3

  • r4 – Position of spacecraft 4

Returns

  • jj – Current density estimated from the curl of B

  • div_b – Divergence of B

  • div_b_bycurl_b – Ratio of the divergence of B to the curl of B

j_curl()[source]

tofill

Returns

data – tofill

Return type

tofill

plasma_beta(probe=1, species='i')[source]

tofill

Returns

data – tofill

Return type

tofill

reindex_ds_timestamps(sample_freq=None)[source]

tofill

Returns

data – tofill

Return type

tofill

aidapy.aidaxr.statistics module

AIDA module responsible for the statistical utilities of the timeseries

class aidapy.aidaxr.statistics.AidaAccessorStatistics(xarray_obj)[source]

Bases: object

Xarray accessor responsible for the statistical utilities

autocorr(lagbeg=0, stride=1, dt=1, timeu='s', normalize=True)[source]

Calculates the autocorrelation of the xarray per column.

Parameters
  • lagbeg (int) – Default = 0. Initial lag.

  • stride (int) – Default = 1. Stride of the window to autocorrelate.

  • dt (int) – Default = 1. Sampling frequency of the signal.

  • timeu (str) – Default = ‘s’. Time unit of the signal.

  • normalize (bool) – Default = True. Normalizes the autocorrelation

Returns

values – An xarray containing the values of the autocorrelation, where the index is the lag and the units are the same as the input vector

Return type

xarray.DataArray

mean(**kwargs)[source]

Calculates the mean of an xarray DataArray

Parameters
  • coord (int) – The keyword is the coordinate of the xarray on which the mean will be calculated. The value given is the size of the sliding window.

  • center (bool) – If true the resulting value will be placed in the middle index of the window, otherwise it will be placed in the last index of the window.

Returns

values – An xarray containing the values of the mean in the requested coordinate.

Return type

xarray.DataArray

psd(timeu='s', **kwargs)[source]

Calculates the power spectral density using the signal.welch routine from scipy

Parameters
  • timeu (str) – Sampling unit of the signal. Default is ‘s’

  • **kwargs – arguments used by the signal.welch method of scipy

Returns

values – An xarray containing the values of the PSD, where the index is the frequency and the units are Hz

Return type

xarray.DataArray

psdwt(*args)[source]

Returns the wavelet transform of a timeseries

Parameters

*args – Arguments used in scipy.signal.cwt

Returns

values – An xarray containing the wavelet transform of the original timeseries. The signal has to be one dimensional. The name of the first dimension is used as first dimension in the resulting xarray.

Return type

xarray.DataArray

sfunc(scale=[1], order=2)[source]

Returns the structure function of a timeseries

\[y= \frac{1}{N-s}\sum_{i=1}^{N-s}(x_i - x_{i+s})^o\]

where \(s\) is the scale, and \(o\) is the order.

Parameters
  • scale (list or numpy.array) – A list or an array containing the scales to calculate.

  • order (int) – Order of the structure function (the exponent o in the formula above).

Returns

values – An xarray containing the structure functions, one per product in the original timeseries. The index coordinate contains the scale value, and the attribute ‘order’ keeps a record on the order used for its calculation.

Return type

xarray.DataArray

std(**kwargs)[source]

Calculates the standard deviation of an xarray DataArray

Parameters
  • coord (int) – The keyword is the coordinate of the xarray on which the std will be calculated. The value given is the size of the sliding window.

  • center (bool) – If true the resulting value will be placed in the middle index of the window, otherwise it will be placed in the last index of the window.

Returns

values – An xarray containing the values of the std in the requested coordinate.

Return type

xarray.DataArray

class aidapy.aidaxr.statistics.AidaStatisticsAccessor(xarray_obj)[source]

Bases: object

Xarray Dataset accessor responsible for the statistical utilities

sfunc(var=None, dim=None, scale=[1], order=2)[source]

Generalizes the sfunc to xr.Dataset

aidapy.aidaxr.tools module
aidapy.aidaxr.tools.check_time_serie_compatible(method)[source]

Check if the xarray object in the accessor corresponds to a time series.

aidapy.aidaxr.vdf module

AIDA module responsible for the VDF handling. Contributor: Etienne Behar

class aidapy.aidaxr.vdf.AidaAccessorVDF(xarray_obj)[source]

Bases: object

Xarray accessor responsible for the vdf utilities.

interpolate(start_time, end_time, start_time_sub, end_time_sub, species='electron', frame='instrument', grid_geom='spher', v_max=None, resolution=60, interp_schem='near', verbose=True)[source]

Main method of the AidaAccessorVDF.

Pre-processes the loaded data, initialises the rotation matrices and translation vectors, selects a time sub-interval, interpolates the data, post-processes them and saves them as new variables of the xarray dataset.

Parameters
  • start_time (datetime.datetime) – The start time of the global period of interest.

  • end_time (datetime.datetime) – The end time of the global period of interest.

  • start_time_sub (datetime.datetime) – The start time of the local period of interest.

  • end_time_sub (datetime.datetime) – The end time of the local period of interest.

  • species (str) – The species to be analysed, either ‘electron’ or ‘ion’

  • frame (str) – The reference frame in which the data will be analysed and visualised.

  • grid_geom (str) – The geometry of the grid-of-interest, along which VDF values will be interpolated. Either ‘cart’ for cartesian, ‘spher’ for spherical, or ‘cyl’ for cylindrical geometry.

  • v_max (float) – The maximum extent of the grid-of-interest. If None, taken as the maximum instrument speed coverage.

  • resolution (int) – Resolution of the grid-of-interest. A value between 50 and 200 is advised for reasonable quality and computation time.

  • interp_schem (str) – Interpolation scheme used by the method. Either ‘near’ for nearest-neighbour, ‘lin’ for tri-linear, or ‘cub’ for tri-cubic interpolation. The first two are provided by scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RegularGridInterpolator.html) and the last one by the tricubic package (https://pypi.org/project/tricubic/).

Returns

xr_mms – The xarray dataset. The interpolation products are added to the dataset at the end of the method.

Return type

xarray dataset

plot(ptype='1d', plt_contourf=False, cmap='RdBu_r')[source]

Calls to plotting methods from aidapy.tools.vdf_plot

Parameters
  • ptype (str) – Type of plot, ‘1d’, ‘2d’, ‘3d’, ‘3d_time’ or ‘3d_gyro’

  • plt_contourf (bool) – If True, colormeshes are plotted using a filled contour method. Default is False.

  • cmap (str) – Valid matplotlib colormap string. Default is ‘RdBu_r’.

aidapy.aidaxr.xevents module
class aidapy.aidaxr.xevents.AidaAccessorStatistics(xarray_obj)[source]

Bases: object

increm(scale)[source]

Returns the increments of a timeseries

\[y = |x_i - x_{i+s}|\]

where \(s\) is the scale.

Parameters

scale (int) – Scale at which to compute the increments.

Returns

  • result (xarray.DataArray) – An xarray containing the time series increments, one per product in the original timeseries.

  • kurt (np.array) – Kurtosis of the increments, one per product, using Fisher’s definition (0 value for a normal distribution).

pvi(scale)[source]

Returns the PVI of a timeseries

\[y = \frac{|x_i - x_{i+s}|^2}{\langle |x_i - x_{i+s}|^2 \rangle}\]

where \(s\) is the scale.

Parameters

scale (int) – Scale at which to compute the PVI.

Returns

values – An xarray containing the pvi of the original timeseries.

Return type

xarray.DataArray

threshold(theta)[source]

Returns the locations of the PVI peaks above the threshold theta. A threshold of 8 is typically used for extreme events, as reported in the literature.

Parameters

theta (int) – Threshold value for PVI.

Returns

result – A numpy array containing the location of the pvi peaks above theta.

Return type

np.array
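A plain-numpy sketch of the PVI definition and threshold selection above (illustrative only, not the aidapy implementation):

import numpy as np

scale, theta = 1, 8
x = np.random.randn(10000)                    # stand-in for a magnetic field component
incr2 = np.abs(x[:-scale] - x[scale:]) ** 2   # |x_i - x_{i+s}|^2
pvi = incr2 / incr2.mean()                    # normalised by the average squared increment
peaks = np.where(pvi > theta)[0]              # locations of extreme events (PVI above 8)
print(len(peaks), "points above the threshold")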

Module contents

Init to import easily the accessors

aidapy.data package
Subpackages
aidapy.data.mission package
Subpackages
aidapy.data.mission.mission_settings package
Submodules
aidapy.data.mission.mission_settings.settings module
Module contents
Submodules
aidapy.data.mission.base_mission module

This module serves as base for all the mission data manager

Author : Romain Dupuis

class aidapy.data.mission.base_mission.BaseMission(t_start, t_end)[source]

Bases: object

Base format for the data downloaded from missions

Parameters
  • t_start (datetime object) – It gives the time at which we started to look at the data

  • t_end (datetime object) – It gives the time at which we finished looking at the data

static convert_time(time, format_out)[source]

Convert time to an accepted astropy time format

Parameters
  • time (datetime or astropy.time instance) – Time object we want to convert

  • format_out (str) – Indicates the output format. Must be a format accepted by astropy

download_data(data_types, probes, coords, mode, frame)[source]

Download data

get_mission_info()[source]

Provide information on the mission

Returns

info – Dictionary with information on the mission, with the following keys: name, allowed_probes, product_catalog

Return type

dict

static parse_units(xarray_obj)[source]

Convert unit at the dim level to unit at coord level

set_coords(coords)[source]

Set the coordinates

set_frame(frame)[source]

Set the frame only for spacecraft attitude

set_mode(mode)[source]

Set the instrument mode

abstract set_observation(obs_settings)[source]

Abstract method to set the different parameters of the mission

set_probe(probe)[source]

Set the current working probe

set_probes(probes)[source]

Set the probes

set_product_catalog()[source]

Set the product catalog of the mission

aidapy.data.mission.cluster module

This module serves as data manager for cluster

Authors : Romain Dupuis, Hugo Breuillard

class aidapy.data.mission.cluster.Cluster(t_start, t_end)[source]

Bases: aidapy.data.mission.base_mission.BaseMission

Cluster data manager.

Parameters
  • t_start (datetime or astropy object) – It gives the time at which we started to look at the data

  • t_end (datetime or astropy object) – It gives the time at which we finished looking at the data

mission

Name of the mission

Type

str

allowed_probes

List of probes available for the mission

Type

list of str

probes

List of probes from which the downloader is going to download data

Type

list of str

coords

Coordinates in which the data are downloaded

Type

str

set_observation(obs_settings)[source]

Set parameter for the observation.

aidapy.data.mission.mission module

This module serves to build high level interface to use mission data

Author : Romain Dupuis

class aidapy.data.mission.mission.Mission(mission_name, t_start, t_end)[source]

Bases: object

High-level interface to use mission data

download_data(data_types=None, probes=None, coords=None, mode=None, frame=None)[source]

Method at the Mission level to query data downloading

get_info_on_mission()[source]

Provide information on the mission

class aidapy.data.mission.mission.MissionFactory[source]

Bases: object

Factory providing the relevant mission object

get_mission(mission_name, t_start, t_end)[source]

Provide the relevant mission object

register_mission(mission_name, mission_creator)[source]

Add the implemented mission to the list of available creators

aidapy.data.mission.mms module

This module serves as data manager for MMS

Authors : Romain Dupuis, Hugo Breuillard

class aidapy.data.mission.mms.Mms(t_start, t_end)[source]

Bases: aidapy.data.mission.base_mission.BaseMission

MMS data manager.

Parameters
  • t_start (datetime object) – It gives the time at which we started to look at the data

  • t_end (datetime object) – It gives the time at which we finished looking at the data

meta_data

Contains all the metadata. It is generated after the data are downloaded

Type

metadata object

data

Dict containing the time series queried by the user

Type

dict of sunpy timeseries object

mission

Name of the mission

Type

str

set_observation(obs_settings)[source]

Set parameter for the observation.

aidapy.data.mission.modified_cluster module

Slight modification of the cluster module from HelioPy. Methods for importing data from the four Cluster spacecraft.

To download data you will need to be registered at the cluster science archive (http://www.cosmos.esa.int/web/csa/register-now), and have set either the environment variable CLUSTERCOOKIE to your cookie, or set your cookie in the heliopyrc file.

The data download method is described at https://csa.esac.esa.int/csa/aio/html/wget.shtml.

aidapy.data.mission.modified_cluster.cis_codif_h1_moms(probe, starttime, endtime, sensitivity='high', try_download=True)[source]

Load H+ moments from CIS instrument.

See https://caa.estec.esa.int/documents/UG/CAA_EST_UG_CIS_v35.pdf for more information on the CIS data.

Parameters
  • probe (string) – Probe number. Must be ‘1’, ‘2’, ‘3’, or ‘4’.

  • starttime (datetime) – Interval start.

  • endtime (datetime) – Interval end.

  • sensitivity (string, 'high' or 'low', default: 'high') – Load high or low sensitivity

Returns

data – Requested data.

Return type

DataFrame

aidapy.data.mission.modified_cluster.cis_hia_onboard_moms(probe, starttime, endtime, try_download=True)[source]

Download onboard ion moments from the CIS instrument.

See https://caa.estec.esa.int/documents/UG/CAA_EST_UG_CIS_v35.pdf for more information on the CIS data.

Parameters
  • probe (string) – Probe number. Must be ‘1’ or ‘3’

  • starttime (datetime) – Interval start.

  • endtime (datetime) – Interval end.

Returns

data – Requested data.

Return type

DataFrame

aidapy.data.mission.modified_cluster.fgm(probe, starttime, endtime, try_download=True)[source]

Download fluxgate magnetometer data.

See https://caa.estec.esa.int/documents/UG/CAA_EST_UG_FGM_v60.pdf for more information on the FGM data.

Parameters
  • probe (string) – Probe number. Must be ‘1’, ‘2’, ‘3’, or ‘4’.

  • starttime (datetime) – Interval start.

  • endtime (datetime) – Interval end.

Returns

data – Requested data.

Return type

TimeSeries

aidapy.data.mission.modified_cluster.peace_moments(probe, starttime, endtime, try_download=True)[source]

Download electron moments from the PEACE instrument.

See https://caa.estec.esa.int/documents/UG/CAA_EST_UG_PEA_v25.pdf for more information on the PEACE data.

Parameters
  • probe (string) – Probe number. Must be ‘1’, ‘2’, ‘3’, or ‘4’.

  • starttime (datetime) – Interval start.

  • endtime (datetime) – Interval end.

Returns

data – Requested data.

Return type

DataFrame

aidapy.data.mission.omni module

This module serves as data manager for Omniweb

Authors : Romain Dupuis, Hugo Breuillard

class aidapy.data.mission.omni.Omni(t_start, t_end)[source]

Bases: aidapy.data.mission.base_mission.BaseMission

Omniweb data manager

Parameters
  • t_start (datetime object) – It gives the time at which we started to look at the data

  • t_end (datetime object) – It gives the time at which we finished looking at the data

meta_data

Contains all the metadata. It is generated after the data are downloaded

Type

metadata object

ts

Contains the time series queried by the user

Type

timeseries object

mission

Name of the mission

Type

str

set_observation(obs_settings)[source]

Set parameter for the observation.

classmethod variables_info()[source]

Print variables info with corresponding index

Module contents
Module contents
aidapy.ml package
Submodules
aidapy.ml.gmm module
aidapy.ml.mlp module
aidapy.ml.preprocess module
aidapy.ml.preprocess.time_series(X_in, y_in, histsize=1, forecast_offset=0, dropna=True, predict_change=False)[source]

Preprocess time series data for machine learning purposes

Create new features X containing a history of the old features, and define the targets at a future time defined by forecast_offset.

Parameters
  • X_in – input features (2D array)

  • y_in – input targets (1D or 2D array)

  • histsize – how much history to include

  • forecast_offset – index offset of the forecast

  • dropna – remove entries with missing values

  • predict_change – if true, predict change in y (y[t+dt]-y[t])

Returns

X, y, mask – the preprocessed features X, the targets y, and the corresponding mask used to select the matching time indices
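A minimal usage sketch on synthetic data (the array sizes are illustrative assumptions):

import numpy as np
from aidapy.ml import preprocess

X_in = np.random.rand(100, 3)    # 100 hourly samples, 3 features
y_in = np.random.rand(100, 1)    # 100 hourly targets
X, y, mask = preprocess.time_series(X_in, y_in, histsize=4, forecast_offset=1)
print(X.shape, y.shape, mask.shape)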

Module contents
aidapy.tools package
Submodules
aidapy.tools.sitl_parsing module
aidapy.tools.sitl_parsing.convert_sitl(file_path, specific_output_path=None)[source]

Convert the SITL report into a file readable by pandas. It is written in the ‘formated’ folder.

Parameters
  • file_path (str) – File path of the SITL report.

  • specific_output_path (str) – Specifies the name of the folder (inside the SITL report folder) where the converted report is saved.

aidapy.tools.sitl_parsing.generate_event_dict()[source]
aidapy.tools.sitl_parsing.generate_location_dict()[source]
aidapy.tools.sitl_parsing.generate_output_path(input_file_path, specific_output_path=None)[source]

Generate the output path where to write a new file from the path of an existing file. By default, the new output_path is in the ‘formated’ folder present in the folder of the existing file.

Parameters
  • input_file_path (str) – File path of the existing file.

  • specific_output_path (str) – Specifies the name of the folder (inside the SITL report folder) where the converted report is saved.

Returns

output_path – Output path for the new files. By default, associated path to ‘formated’

Return type

str

aidapy.tools.sitl_parsing.get_info(discussion_column, category='location')[source]

Generate a list of keys (from category) from a Series of strings.

Parameters
  • discussion_column (Series object) – Specific column from the SITL associated to the SITL discussion.

  • category (str) – Specifies which category is generated (location, events, etc.)

Returns

output – A DataFrame with all the information from the SITL report: start time, end time, the location, and the associated text.

Return type

list of XXXX

aidapy.tools.sitl_parsing.look_for(string, category=None)[source]

Look for the presence of specific patterns in the SITL comments.

Parameters

string (str) – SITL discussion string

Returns

output_key – Key associated to the identified pattern. Example: ‘ms’ if ‘magnetosphere’ has been identified in string.

Return type

str

aidapy.tools.sitl_parsing.path_leaf(path)[source]

Extract the file name from a path. If the path ends with a slash, the basename would be empty, so the function deals with that case.

Parameters

path (str) – Path of the file

Returns

output – The name of the file

Return type

str

aidapy.tools.sitl_parsing.prepare_small_string(string)[source]
aidapy.tools.sitl_parsing.read_and_transform(file_name, category='location')[source]

Read the formatted SITL report (preprocessed and ready to be read) as a pandas DataFrame and add new columns (location, events, etc.)

Parameters
  • file_name (str) – File name of the cleaned and prepared SITL report.

  • category (str) – Specifies which category is generated (location, events, etc.)

Returns

df – A DataFrame with all the information from the SITL report: start time, end time, the location, and the associated text.

Return type

DataFrame object

aidapy.tools.vdf_plot module

Methods for plotting interpolated VDFs products, from data or simulation. Etienne Behar.

aidapy.tools.vdf_plot.cart(vdf, grid_cart, plt_contourf=False, cmap='RdBu_r')[source]

Plots the VDF interpolated over a cartesian grid-of-interest, cuts.

aidapy.tools.vdf_plot.gyro(vdf, grid_spher, grid_cart, cmap='RdBu_r')[source]

Original and scaled VDF along the gyro-angle (relevant if the right frame is used).

aidapy.tools.vdf_plot.profiles_1d(vdf, grid_spher)[source]

Plots the parallel and perpendicular profiles of the interpolated vdf.

aidapy.tools.vdf_plot.set_spines(AX)[source]

Nicer plots IMHO, Etienne Behar.

aidapy.tools.vdf_plot.spher(vdf, grid_cart, plt_contourf=False, cmap='RdBu_r', vlim_norm=None)[source]

Plots the VDF interpolated over a spherical grid-of-interest, cylindrical symmetry. Meant for electrons in a magnetic field aligned frame.

aidapy.tools.vdf_plot.spher_gyro(vdf, grid_spher, grid_cart, plt_contourf=False, cmap='RdBu_r', vlim_norm=None)[source]

Plots the VDF interpolated over a spherical grid-of-interest, cylindrical symmetry, together with four sectors of gyro-angle. Meant to be used with electrons in a magnetic field aligned frame.

aidapy.tools.vdf_plot.spher_time(vdf_interp, vdf_scaled, vdf_normed, grid_spher, time_interp, plt_contourf=False, cmap='RdBu_r')[source]

Plots the VDF interpolated over a spherical grid-of-interest, cylindrical symmetry, along time. Meant for electrons in a magnetic field aligned frame.

aidapy.tools.vdf_plot.xy_plane(vdf, grid_spher, grid_cart, cmap='RdBu_r')[source]

xy-plane.

aidapy.tools.vdf_utils module
aidapy.tools.vdf_utils.R_2vect(vector_orig, vector_fin)[source]

Taken from: https://github.com/Wallacoloo/printipi/blob/master/util/rotation_matrix.py Calculate the rotation matrix required to rotate from one vector to another. For the rotation of one vector to another, there is an infinite number of possible rotation matrices. Due to axial symmetry, the rotation axis can be any vector lying in the symmetry plane between the two vectors. Hence the axis-angle convention will be used to construct the matrix, with the rotation axis defined as the cross product of the two vectors. The rotation angle is the arccosine of the dot product of the two unit vectors. Given a unit vector parallel to the rotation axis, w = [x, y, z], and the rotation angle a, the rotation matrix R is:

      |  1 + (1-cos(a))*(x*x-1)   -z*sin(a)+(1-cos(a))*x*y   y*sin(a)+(1-cos(a))*x*z |
R  =  |  z*sin(a)+(1-cos(a))*x*y   1 + (1-cos(a))*(y*y-1)   -x*sin(a)+(1-cos(a))*y*z |
      | -y*sin(a)+(1-cos(a))*x*z   x*sin(a)+(1-cos(a))*y*z   1 + (1-cos(a))*(z*z-1)  |
Parameters
  • R – The 3x3 rotation matrix to update.

  • vector_orig – The unrotated vector defined in the reference frame.

  • vector_fin – The rotated vector defined in the reference frame.
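A short numpy sketch of the construction described above (illustrative; not necessarily identical to the aidapy function): the rotation axis is the normalised cross product of the two vectors, the angle is the arccosine of the dot product of the unit vectors, and R follows the axis-angle formula given above.

import numpy as np

def rotation_matrix_between(vector_orig, vector_fin):
    # Rotation matrix mapping vector_orig onto vector_fin (axis-angle construction).
    # The degenerate case of (anti-)parallel vectors is not handled here.
    u = vector_orig / np.linalg.norm(vector_orig)
    v = vector_fin / np.linalg.norm(vector_fin)
    axis = np.cross(u, v)
    x, y, z = axis / np.linalg.norm(axis)
    a = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    ca, sa = np.cos(a), np.sin(a)
    return np.array([
        [1 + (1 - ca) * (x * x - 1), -z * sa + (1 - ca) * x * y,  y * sa + (1 - ca) * x * z],
        [ z * sa + (1 - ca) * x * y, 1 + (1 - ca) * (y * y - 1), -x * sa + (1 - ca) * y * z],
        [-y * sa + (1 - ca) * x * z,  x * sa + (1 - ca) * y * z, 1 + (1 - ca) * (z * z - 1)],
    ])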

aidapy.tools.vdf_utils.Rx(a, theta)[source]

Rotation around the x-axis of angle theta.

aidapy.tools.vdf_utils.Ry(a, phi)[source]

Rotation around the y-axis of angle phi.

aidapy.tools.vdf_utils.Rz(a, psi)[source]

Rotation around the z-axis of angle psi.

aidapy.tools.vdf_utils.cart2cyl(v_cart)[source]

Coordinate system conversion

aidapy.tools.vdf_utils.cart2spher(v_cart)[source]

Coordinate system conversion

aidapy.tools.vdf_utils.cyl2cart(v_cyl)[source]

Coordinate system conversion

aidapy.tools.vdf_utils.init_grid(v_max, resolution, grid_geom)[source]

Here we define the bin edges and centers, depending on the chosen coordinate system.

aidapy.tools.vdf_utils.spher2cart(v_spher)[source]

Coordinate system conversion

class aidapy.tools.vdf_utils.vdf(v_max, resolution, grid_geom)[source]

Bases: object

docstring

interpolate_cart_vdf(grid, vdf0, interpolate='near')[source]

docstring

interpolate_spher_vdf(grid, vdf0, interpolate='near')[source]

docstring

transform_grid(R=None, v=None, s=None)[source]

docstring

aidapy.tools.vdf_utils.vdf_scaled(vdf)[source]

0-to-1 scaling of a vdf. Must be given over a spherical grid.

Module contents
Module contents

Init

Contributing to AIDApy

All contributions are welcome. For detailed information about the code style, please read the following instructions. All the code must have an adequate number of tests that assure its credibility. Feel free to make pull requests.

Introduction

This guide defines the conventions for writing Python code for aidapy. The main ideas are:

  • ensure a consistent code style

  • promote good practices for testing

  • maintain a good level of readability and maintainability

  • keep it simple

Python version

Prefer Python >= 3.6 if possible, since major dependencies do not support older Python versions. There is no reason to use Python 2, as AIDApy does not rely on Python 2-only dependencies or legacy code.

Coding style

Stick as much as possible to PEP8 for general coding conventions and to PEP257 for docstring conventions. You can also have a look at common Python anti-patterns.

Main guidelines from PEP8

PEP8 coding conventions are:

  • Use 4 spaces per indentation level.

  • Limit all lines to a maximum of 100 characters.

  • Separate top-level function and class definitions with two blank lines.

  • Imports should be grouped in the following order (see the example below):

    • Standard library imports.

    • Related third party imports.

    • Local application/library specific imports.

  • A blank line between each group of imports.
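For example, the import section of a module could be grouped as:

# Standard library imports
import os
from datetime import datetime

# Related third party imports
import numpy as np
import xarray as xr

# Local application/library specific imports
from aidapy import load_data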

Use Linters

Linters are tools for static code quality checking. For instance, you can use the following tools to test conformity with the common pythonic standards:

  • pylint is one of the oldest linters and tracks various problems such as good-practice violations, coding standard violations, or programming issues. Pylint may be seen as slow, too verbose and complex to configure properly. You can run a complete static analysis with the following command:

pylint aidapy --rcfile=setup.cfg --ignore-patterns='file_to_ignore1.py','file_to_ignore2.py'

These linters can be simply installed with pip. Further details on their functionalities can be found in their respective documentation. A lot of these features can also be provided natively, or by installing plugins, in your IDE (PyCharm, Spyder, Eclipse, etc.).

Documentation

Documentation of all the files must be done in-line using Sphinx. We strongly encourage you to set up documentation using doctest. Proper guidelines remain to be written.

Testing

The library pytest is used to launch tests in the code. All tests can be launched using:

coverage run -m pytest

This command measures coverage at the same time. The output consists of the test results and a coverage report.

Regarding the location of the test files, add a directory where you will keep the unit tests for each individual module of the package.

<AIDApy_root>/aidapy/<mymodule>/tests

Each test must be minimal and should ensure non-regression and full functionality of the module. The files should be named:

  • test_<name>.py, where <name> is any name you think representative of the test (a minimal example is sketched below).
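For instance, a minimal unit test file placed under <AIDApy_root>/aidapy/<mymodule>/tests could look like the sketch below (the checks are purely illustrative and should be replaced by real checks of <mymodule>):

# <AIDApy_root>/aidapy/<mymodule>/tests/test_example.py
import numpy as np
import pytest

def test_example_shapes():
    # A minimal, self-contained example of a pytest-style unit test.
    data = np.arange(10).reshape(5, 2)
    assert data.shape == (5, 2)

def test_example_raises():
    # Tests can also assert that invalid input raises.
    with pytest.raises(ValueError):
        int("not a number")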

Integration tests will be kept in the following directory. Do not forget that integration tests are those that involve two or more sub-modules of the aidapy package.

<AIDApy_root>/tests

These tests are handled by the package integration team.

Using the git repository

You will always work in a branch different from master. The general rule is the following (source):

  • The production version of your code sits in a branch named master.

  • The development version of your code (ready to be tested and productionised) sits in a branch named develop.

  • When development on a new feature is started, a new branch is created off of the develop branch. This is a feature branch.

  • When development has completed, the feature branch is either merged back into the develop branch if the changes are to be taken into production, or left in their respective branch if they are not.

  • When the code in the develop branch is finally ready to be released into production, a new release branch is created.

  • When the release branch is approved for release, it is merged into the master branch and develop branch to capture any fixes that may have been done directly on the release branch.

When you are working on your feature branch, commit often, every time a consistent modification is made. Please use meaningful commit messages: follow the standards presented in the guide How to write a git commit message.

Remember to always merge the latest version of the master branch to your branch before any commit. Please check that your code works and passes the standardization tool tests before requesting a merge to the master branch.

Once you are sure that your branch is in good conditions to share it with the developers and users, please make a merge request on gitlab.

Testing Standards

For the merge request (from the feature branch to the master branch) to be authorized, the feature branch must successfully pass the Continuous Integration tools. To achieve this, the score from the pylint library must be over 8.0 / 10.0 and the test coverage above 80%.

For assessing the pylint score run the command:

pip install pylint
pylint aidapy --rcfile=setup.cfg

Finally for assessing the code test coverage of your branch run:

coverage run setup.py test

Help & reference

Reference

About