[Python] Common Pitfalls

Do not make the same mistakes we did! Here is a list of common pitfalls we have encountered when working with EMD files in Python.

  • Note that the first dimension is saved in the dataset called dim1; there is no dim0.
  • The dimensions of a dataset in h5py are in ascending order: 1, 2, ..., n or x, y, ..., n. When working with images as numpy arrays, however, the usual order is y, x, corresponding to rows and columns on the screen. To exchange such arrays with an EMD file, one has to swap the x and y axes, for example with np.transpose().
  • To correctly save strings using h5py, the use of fixed-width byte strings is encouraged. Saving Python string objects can lead to encoding errors or worse. Just pass your string through np.bytes_('example') (formerly also available as np.string_, an alias that was removed in NumPy 2.0). To convert back, decode it as UTF-8: b'example'.decode('utf-8').
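  • Remember that you can create significant memory leaks in Python if you are not careful about assigning variables. There is a difference between h5py_dset = data and h5py_dset = data[:]: if data is an h5py dataset, the first only binds another name to the dataset object, while the second reads the full contents into a new numpy array in memory. This becomes important when you repeatedly read in data in a loop.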
  • There have been reports of performance issues in h5py related to numpy-style indexing. If reading or slicing datasets is unexpectedly slow, a quick internet search is worthwhile.
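
The axis-order and string pitfalls above can be sketched in a few lines. This is a minimal illustration and not part of any EMD library: the helper names, the file name 'sketch.h5' and the attribute name 'label' are all made up for this example. The last function assumes h5py is installed and uses an in-memory file so that nothing is written to disk.

```python
import numpy as np


def image_to_emd_order(image):
    """Reorder a (rows, cols) = (y, x) image to (x, y) before writing."""
    return np.transpose(image)


def emd_to_image_order(data):
    """Reorder (x, y) data read from a file back to (y, x) for display."""
    return np.transpose(data)


def encode_label(text):
    """Turn a Python str into a fixed-width byte string for h5py."""
    return np.bytes_(text.encode('utf-8'))


def decode_label(raw):
    """Turn a byte string read back from a file into a Python str."""
    return raw.decode('utf-8')


def roundtrip_in_memory(image, label):
    """Write image and label to an in-memory HDF5 file and read them back."""
    import h5py  # imported here so the helpers above need only numpy

    # driver='core' with backing_store=False keeps the file off the disk.
    # 'sketch.h5' and 'label' are placeholder names for this example.
    with h5py.File('sketch.h5', 'w', driver='core', backing_store=False) as f:
        dset = f.create_dataset('data', data=image_to_emd_order(image))
        dset.attrs['label'] = encode_label(label)
        # dset[:] copies the values into a numpy array; a bare reference
        # to dset would no longer be usable once the file is closed.
        data = dset[:]
        raw = dset.attrs['label']
    return emd_to_image_order(data), decode_label(raw)


image = np.arange(6).reshape(2, 3)   # 2 rows (y), 3 columns (x)
stored = image_to_emd_order(image)   # x first, then y
```

Calling roundtrip_in_memory(image, 'example') should return the image in its original y, x order together with the decoded label. Note that np.transpose() on a 2D array simply swaps the two axes; for higher-dimensional data the axes to swap would have to be given explicitly.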