[Python] Read EMD

In this post we will see what it takes to read an EMD file with Python. As a test case we use the test.emd file written in the previous post. Don’t forget to take a look at our collection of common pitfalls.


In the easiest case you know the structure of the EMD file either because you have written it yourself or from browsing it with a different tool like the EMD viewer. In this case it takes only a few lines to access the data.

import h5py

# open the EMD file
f = h5py.File('test.emd', 'r')

# assuming you know the structure of the file
emdgrp = f['data/dataset_1']

# read data
data = emdgrp['data'][:]

# close the EMD file
f.close()

First, we import the h5py package to facilitate working with HDF5 files. We then open the EMD file by specifying its path. Here we open it in read-only mode ('r') so that we do not accidentally change its content.

Remember that the HDF5 file works as its own small filesystem. We can therefore access the dataset_1 group in our EMD file by specifying its full path inside the file. For convenience we save a reference to this group in a variable called emdgrp.

The dataset data saved in the dataset_1 group can be accessed the very same way, using emdgrp['data'] which is equivalent to f['data/dataset_1/data']. We use the [:] indexing here to copy all the values to a new numpy.ndarray object referenced by the data variable.

Finally, we close the EMD file by calling f.close(), as we do not need it anymore.
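
As an alternative to calling f.close() by hand, h5py.File can be used as a context manager, which closes the file as soon as the with block is left. The following is a minimal sketch of that pattern. To keep it runnable on its own, it first writes a small stand-in file; the file name example.emd and the array contents are made up for illustration, only the path data/dataset_1/data follows the layout used above.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file so the example runs on its own.
# Only the path data/dataset_1/data mirrors the layout used above;
# the file name and the array contents are made up.
path = os.path.join(tempfile.mkdtemp(), 'example.emd')
with h5py.File(path, 'w') as f:
    grp = f.create_group('data/dataset_1')
    grp.create_dataset('data', data=np.arange(24).reshape(2, 3, 4))

# Read it back. The file is closed automatically when the block ends,
# even if an exception is raised inside it.
with h5py.File(path, 'r') as f:
    data = f['data/dataset_1/data'][:]

# data is a plain numpy array and stays usable after the file is closed
print(data.shape)
```

Everything that needs to touch the file has to happen inside the with block; arrays copied out with [:] remain valid afterwards.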


Of course this is not the only thing one can read from the EMD file. For example, one will generally want to access the dimension information as well. The following lines read in the dim# datasets plus the metadata stored in their attributes. (Note that the following code examples have to be placed before f.close(), as they need to access information from the file.)

# read dimensions 1 to 3
dim1 = emdgrp['dim1'][:]
dim1_meta = (emdgrp['dim1'].attrs['name'], emdgrp['dim1'].attrs['units'])

dim2 = emdgrp['dim2'][:]
dim2_meta = (emdgrp['dim2'].attrs['name'], emdgrp['dim2'].attrs['units'])

dim3 = emdgrp['dim3'][:]
dim3_meta = (emdgrp['dim3'].attrs['name'], emdgrp['dim3'].attrs['units'])
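
Since the three blocks above only differ in the dimension number, they can be folded into a loop. Here is a minimal sketch, assuming every axis of data has a matching dim# dataset with name and units attributes. The helper read_dims is our own and hypothetical, and the stand-in file with its axis names and units is made up so that the example runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file with made-up axes; names and units are placeholders.
shape = (2, 3, 4)
labels = [('x', 'nm'), ('y', 'nm'), ('z', 'nm')]
path = os.path.join(tempfile.mkdtemp(), 'example.emd')
with h5py.File(path, 'w') as f:
    grp = f.create_group('data/dataset_1')
    grp.create_dataset('data', data=np.zeros(shape))
    for i, (name, units) in enumerate(labels, start=1):
        dim = grp.create_dataset('dim{}'.format(i), data=np.arange(shape[i - 1]))
        dim.attrs['name'] = name
        dim.attrs['units'] = units


def read_dims(emdgrp):
    """Return a list of (values, name, units), one entry per axis of 'data'."""
    dims = []
    for i in range(1, emdgrp['data'].ndim + 1):
        dset = emdgrp['dim{}'.format(i)]
        dims.append((dset[:], dset.attrs['name'], dset.attrs['units']))
    return dims


with h5py.File(path, 'r') as f:
    dims = read_dims(f['data/dataset_1'])

for values, name, units in dims:
    print(len(values), name, units)
```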

This information can be used for further processing, for example to create coordinate arrays which are useful to evaluate mathematical functions at the same points we have data values for.

# create x and y coordinate arrays
import numpy as np
xx, yy = np.meshgrid(dim1, dim2)
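
One detail worth checking is the shape of the arrays that np.meshgrid returns. With its default indexing='xy', the result has shape (len(dim2), len(dim1)); with indexing='ij' it has shape (len(dim1), len(dim2)). Which one matches your data depends on how the data array is laid out in the file, so the sketch below only illustrates the difference, using made-up axes of different lengths.

```python
import numpy as np

# Made-up axes of different length so the shapes are easy to tell apart.
dim1 = np.linspace(0.0, 1.0, 3)
dim2 = np.linspace(0.0, 2.0, 5)

# Default 'xy' indexing: arrays have shape (len(dim2), len(dim1)).
xx_xy, yy_xy = np.meshgrid(dim1, dim2)

# 'ij' indexing: arrays have shape (len(dim1), len(dim2)), which matches
# data whose first axis runs along dim1.
xx_ij, yy_ij = np.meshgrid(dim1, dim2, indexing='ij')

print(xx_xy.shape, xx_ij.shape)

# Evaluate an arbitrary example function on the 'ij' grid.
zz = xx_ij**2 + yy_ij
print(zz.shape)
```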

The metadata becomes important when plotting, for example. The following line builds a string that could be used to label the z-axis of the dataset.

# label for z axis
print('{} {}'.format(dim3_meta[0].decode('utf-8'), dim3_meta[1].decode('utf-8')))

Let's find out whom to contact in case we have any questions about how the data has been acquired. Simply grab the email attribute from the user group.

# grab email from user
email = f['user'].attrs['email'].decode('utf-8')
print('In case of questions, let\'s ask {}'.format(email))

To review the changes made to the EMD file, we can have a look at all notes in the comments group:

# review changes logged in the comments section
changes = f['comments'].attrs
for key in changes:
    # iterating over dict
    print('{}:\t{}'.format(key, changes[key].decode('utf-8')))
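
The examples above call .decode('utf-8') on the attribute values, which works when h5py hands them back as bytes. Depending on the h5py version and on how the attribute was written, the value may already be a str, and str has no decode method in Python 3, so the call would raise an AttributeError. A small helper that accepts both avoids this. The function as_text is our own and hypothetical, not part of h5py.

```python
def as_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Works for both kinds of input.
print(as_text(b'nm'))
print(as_text('nm'))
```

With this in place, a line such as email = as_text(f['user'].attrs['email']) should work regardless of which of the two types comes back.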


In case you do not know the structure of your EMD file or are too lazy to look it up, you can search it recursively. The following lines go through the items in the file and test them for the emd_group_type attribute.

# recursive function to collect groups with emd_group_type set to 1
def proc_group(group, emds):
    # take a look at each item in the group
    for name in group:
        # check if group
        if group.get(name, getclass=True) == h5py.Group:
            item = group.get(name)
            # check if emd_group_type
            if 'emd_group_type' in item.attrs:
                if item.attrs['emd_group_type'] == 1:
                    print('found an emd group at: {}'.format(item.name))
                    # save a reference to this group
                    emds.append(item)
            # process subgroups
            proc_group(item, emds)

# run
emds = []
proc_group(f, emds)

We define a processing function, which we run recursively on all groups in the file. Given a parent group, a for loop iterates over every item in the group. As these can be datasets or groups, the type of each item is checked. Only in the case of a group do we check for the emd_group_type attribute and whether it is set to 1. If both apply, a message is printed and a reference to this group is saved in a list. If the item is a group, the function is then run recursively on it; in the case of a dataset, nothing further happens.

To execute the search, we create an empty list emds and start the recursion by running the proc_group function on the root of the EMD file.
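
h5py can also do the tree walk for us: visititems calls a function for every group and dataset below the starting point. The following is a minimal sketch of the same search built on it. The function find_emd_groups is our own and hypothetical, and the stand-in file is made up so that the example runs on its own. Unlike the version above, it collects the paths of the groups rather than references to them, so the result stays usable after the file is closed.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file: one group tagged with emd_group_type = 1, one without.
path = os.path.join(tempfile.mkdtemp(), 'example.emd')
with h5py.File(path, 'w') as f:
    tagged = f.create_group('data/dataset_1')
    tagged.attrs['emd_group_type'] = 1
    tagged.create_dataset('data', data=np.zeros((2, 2)))
    f.create_group('user')


def find_emd_groups(h5file):
    """Return the paths of all groups whose emd_group_type attribute is 1."""
    found = []

    def visitor(name, obj):
        # obj is either a group or a dataset; only groups are of interest
        if isinstance(obj, h5py.Group) and obj.attrs.get('emd_group_type') == 1:
            found.append(obj.name)
        # returning None tells visititems to keep going

    h5file.visititems(visitor)
    return found


with h5py.File(path, 'r') as f:
    emd_paths = find_emd_groups(f)

print(emd_paths)
```

Note that visititems does not call the function for the starting group itself, only for what lies below it.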