In this post we will see what it takes to read an EMD file with Python. As a test case we use the test.emd file written in the previous post. Don’t forget to take a look at our collection of common pitfalls.
Basic
In the easiest case you know the structure of the EMD file either because you have written it yourself or from browsing it with a different tool like the EMD viewer. In this case it takes only a few lines to access the data.
    import h5py

    # open the EMD file
    f = h5py.File('test.emd', 'r')

    # assuming you know the structure of the file
    emdgrp = f['data/dataset_1']

    # read data
    data = emdgrp['data'][:]

    # close the EMD file
    f.close()
First, we import the h5py package to facilitate working with HDF5 files. We then open the EMD file by specifying its path. Here we use the read-only mode ('r') so that we do not accidentally change its content.
Remember that an HDF5 file works like its own small filesystem. We can therefore access the dataset_1 group in our EMD file by specifying its full path inside the file. For convenience we save a reference to this group in a variable called emdgrp.
The dataset data saved in the dataset_1 group can be accessed the very same way, using emdgrp['data'], which is equivalent to f['data/dataset_1/data']. We use the [:] indexing here to copy all the values to a new numpy.ndarray object referenced by the data variable.
Finally, we close the EMD file by calling f.close(), as we do not need it anymore.
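As an alternative to calling f.close() by hand, h5py files can be opened in a with block, which closes the file when the block is left. The following is a minimal sketch, not part of the original example: it first writes a tiny stand-in file (the name sketch.emd and its contents are assumptions made only so the snippet runs on its own) and then reads it back.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file so the sketch runs on its own
# (hypothetical name and contents, not the test.emd from the previous post).
path = os.path.join(tempfile.mkdtemp(), 'sketch.emd')
with h5py.File(path, 'w') as f:
    grp = f.create_group('data/dataset_1')
    grp.create_dataset('data', data=np.arange(24).reshape(2, 3, 4))

# Read it back; the file is closed automatically at the end of the block.
with h5py.File(path, 'r') as f:
    data = f['data/dataset_1/data'][:]

# data is an independent numpy array and stays usable after the file is closed.
print(data.shape)
```

Because [:] copies the values into memory, the array remains valid outside the with block, whereas references to groups or datasets do not.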
More
Of course this is not the only thing one can read from the EMD file. For example, one will generally want to access the dimension information as well. The following lines read in the dim# datasets plus the metadata stored in their attributes. (Note that the following code examples have to be placed before f.close(), as they need to access information from the file.)
    # read dimensions 1 to 3
    dim1 = emdgrp['dim1'][:]
    dim1_meta = (emdgrp['dim1'].attrs['name'], emdgrp['dim1'].attrs['units'])

    dim2 = emdgrp['dim2'][:]
    dim2_meta = (emdgrp['dim2'].attrs['name'], emdgrp['dim2'].attrs['units'])

    dim3 = emdgrp['dim3'][:]
    dim3_meta = (emdgrp['dim3'].attrs['name'], emdgrp['dim3'].attrs['units'])
This information can be used for further processing, for example to create coordinate arrays which are useful to evaluate mathematical functions at the same points we have data values for.
    # create x and y coordinate arrays
    import numpy as np
    xx, yy = np.meshgrid(dim1, dim2)
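One detail worth checking when building coordinate arrays is their shape. With its default indexing='xy', np.meshgrid returns arrays of shape (len(dim2), len(dim1)); with indexing='ij' the shape is (len(dim1), len(dim2)). Which one matches a slice of your data depends on how the data axes are ordered in the file. The sketch below uses made-up dimension vectors, not values from test.emd, to show the difference.

```python
import numpy as np

# Hypothetical stand-ins for the dim1 and dim2 vectors read from the file.
dim1 = np.linspace(0.0, 1.0, 3)
dim2 = np.linspace(0.0, 2.0, 4)

# Default indexing='xy': outputs have shape (len(dim2), len(dim1)).
xx_xy, yy_xy = np.meshgrid(dim1, dim2)

# indexing='ij': outputs have shape (len(dim1), len(dim2)).
xx_ij, yy_ij = np.meshgrid(dim1, dim2, indexing='ij')

# Evaluate a function at every grid point.
zz = np.sin(xx_ij) * np.cos(yy_ij)

print(xx_xy.shape, xx_ij.shape, zz.shape)
```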
The metadata becomes important, for example, when it comes to plotting. In the following line a string is created that could be used to label the z-axis of the dataset.
    # label for z axis
    print('{} {}'.format(dim3_meta[0].decode('utf-8'), dim3_meta[1].decode('utf-8')))
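Depending on the h5py version and on how a string attribute was written, reading it may give back a bytes object or a str, and only bytes has a decode method. A small helper can make the label code tolerant of both. This is a sketch under that assumption; as_text and axis_label are hypothetical names introduced here, not part of h5py or the EMD format.

```python
def as_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def axis_label(name, units):
    """Build an axis label from a name and a units attribute."""
    return '{} {}'.format(as_text(name), as_text(units))


# Works for bytes, str, or a mix of both (made-up example values).
print(axis_label(b'z', '[n_m]'))
```

With the tuples from above, the call would be axis_label(dim3_meta[0], dim3_meta[1]).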
Let's find out whom to contact in case we have any questions about how the data was acquired. Simply grab the email attribute from the user group.
    # grab email from user
    email = f['user'].attrs['email'].decode('utf-8')
    print('In case of questions, let\'s ask {}'.format(email))
To review the changes made to the EMD file, we can have a look at all notes in the comments group:
    # review changes logged in the comments section
    changes = f['comments'].attrs
    for key in changes:  # attrs can be iterated like a dict
        print('{}:\t{}'.format(key, changes[key].decode('utf-8')))
Advanced
In case you do not know the structure of your EMD file, or are too lazy to look it up, you can search it recursively. The following lines go through the items in the file and test them for the emd_group_type attribute.
    # recursive function to retrieve groups with emd_group_type set to 1
    def proc_group(group, emds):
        # take a look at each item in the group
        for item in group:
            # check if the item is a group (h5py.Group is the public name of the class)
            if group.get(item, getclass=True) == h5py.Group:
                item = group.get(item)
                # check if emd_group_type is present and set to 1
                if 'emd_group_type' in item.attrs:
                    if item.attrs['emd_group_type'] == 1:
                        print('found an emd group at: {}'.format(item.name))
                        emds.append(item)
                # process subgroups
                proc_group(item, emds)

    # run
    emds = []
    proc_group(f, emds)
We define a processing function, which we run recursively on all groups in the file. Given a parent group, a for loop iterates over every item in the group. As these can be datasets or groups, each item is checked for its type. Only in the case of a group do we check for the emd_group_type attribute and whether it is set to 1. If both conditions hold, a message is printed and a reference to this group is saved in a list. If item is a group, the function is then run recursively on item; in the case of a dataset, nothing further happens.
To execute the search, we create an empty list emds and start the recursion by running the proc_group function on the root of the EMD file.
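The same search can also be written with h5py's built-in visititems method, which walks all groups and datasets below a group and calls a function on each of them. The following is a sketch of that alternative, not the approach used above. The helper name find_emd_groups and the small in-memory stand-in file are assumptions made so the snippet runs on its own.

```python
import h5py
import numpy as np


def find_emd_groups(h5file):
    """Collect all groups whose emd_group_type attribute equals 1."""
    found = []

    def visitor(name, obj):
        # visititems calls this for every group and dataset below h5file.
        if isinstance(obj, h5py.Group) and obj.attrs.get('emd_group_type') == 1:
            found.append(obj)
        # Returning None keeps the traversal going.

    h5file.visititems(visitor)
    return found


# Small in-memory stand-in file (hypothetical layout); nothing is written to disk.
with h5py.File('sketch.emd', 'w', driver='core', backing_store=False) as f:
    grp = f.create_group('data/dataset_1')
    grp.attrs['emd_group_type'] = 1
    grp.create_dataset('data', data=np.zeros((2, 3, 4)))
    f.create_group('user')
    # Collect the paths while the file is still open.
    paths = [g.name for g in find_emd_groups(f)]

print(paths)
```

Note that the paths are collected inside the with block, since group references stop being valid once the file is closed.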