SciPy Recipes

Storing several arrays in binary format

The savez() function saves several arrays to a single file. In the following code, we generate arrays with random data and store them on disk:

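import numpy as np

# Three arrays of different shapes, filled with uniform random values
x = np.random.rand(200, 300)
y = np.random.rand(30)
z = np.random.rand(10, 5, 7)
# Each keyword name becomes the name of the corresponding array in the archive
np.savez('arrays_xyz.npz', xfile=x, yfile=y, zfile=z)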

This code performs the following steps:

  • Generates one file in NPY format for each of the arrays x, y, and z. These files are named, respectively, xfile.npy, yfile.npy, and zfile.npy, after the keyword arguments passed to savez().
  • Creates a single ZIP archive containing the NPY files generated for each array.
  • Saves the archive to the disk in a file named arrays_xyz.npz.
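
To confirm that the archive holds what we expect, it can be read back with np.load(), which returns a dictionary-like object keyed by the names given to savez(). The following is a minimal sketch; the temporary directory and the variable names names and x_loaded are illustrative assumptions, not part of the recipe:

```python
import os
import tempfile

import numpy as np

x = np.random.rand(200, 300)
y = np.random.rand(30)
z = np.random.rand(10, 5, 7)

# Assumption: a temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'arrays_xyz.npz')
    np.savez(path, xfile=x, yfile=y, zfile=z)

    # np.load() opens the archive lazily; each array is read when indexed
    with np.load(path) as data:
        names = sorted(data.files)   # names of the stored arrays
        x_loaded = data['xfile']     # read one array back into memory
        shapes = {name: data[name].shape for name in names}

print(names)
print(shapes)
print(np.array_equal(x, x_loaded))
```

Using np.load() as a context manager closes the underlying ZIP file once the arrays we need have been read.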

Since the generated archive is a ZIP file, it can be opened with any standard archiving utility. On my system, listing the contents of the arrays_xyz.npz file produces the following:

Archive:  arrays_xyz.npz
Zip file size: 483584 bytes, number of entries: 3
-rw------- 2.0 unx 480080 b- stor 17-May-09 09:32 xfile.npy
-rw------- 2.0 unx 320 b- stor 17-May-09 09:32 yfile.npy
-rw------- 2.0 unx 2880 b- stor 17-May-09 09:32 zfile.npy
3 files, 483280 bytes uncompressed, 483280 bytes compressed: 0.0%
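
A similar listing can be obtained without leaving Python by using the zipfile module from the standard library. This is only a sketch: the temporary directory and the entries list are illustrative assumptions, and the exact byte counts may differ from the report above, since the size of the NPY header depends on the NumPy version:

```python
import os
import tempfile
import zipfile

import numpy as np

x = np.random.rand(200, 300)
y = np.random.rand(30)
z = np.random.rand(10, 5, 7)

# Assumption: a temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'arrays_xyz.npz')
    np.savez(path, xfile=x, yfile=y, zfile=z)

    # Open the .npz file as an ordinary ZIP archive and collect its entries
    with zipfile.ZipFile(path) as archive:
        entries = [
            (info.filename, info.file_size, info.compress_size, info.compress_type)
            for info in archive.infolist()
        ]

for filename, file_size, compress_size, compress_type in entries:
    method = 'stored' if compress_type == zipfile.ZIP_STORED else 'deflated'
    print(f'{filename}: {file_size} bytes, {compress_size} bytes in archive ({method})')
```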

This output indicates that the arrays_xyz.npz archive contains three files: xfile.npy, yfile.npy, and zfile.npy. Notice the last line of the report, which states that the compression factor is 0%. By default, the savez() function does not compress the data, which makes it faster at the cost of producing larger files. If a compressed file is required, we can use the savez_compressed() function, demonstrated in the following code:

np.savez_compressed('arrays_xyz_c.npz', xfile=x, yfile=y, zfile=z)
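
The effect of compression can also be measured directly by comparing the sizes of the two archives. The following is a minimal sketch; the temporary directory and the names plain_size, packed_size, and saving are illustrative assumptions, and the exact figures vary from run to run because the data is random:

```python
import os
import tempfile

import numpy as np

x = np.random.rand(200, 300)
y = np.random.rand(30)
z = np.random.rand(10, 5, 7)

# Assumption: a temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, 'arrays_xyz.npz')
    packed_path = os.path.join(tmpdir, 'arrays_xyz_c.npz')

    np.savez(plain_path, xfile=x, yfile=y, zfile=z)              # no compression
    np.savez_compressed(packed_path, xfile=x, yfile=y, zfile=z)  # compressed entries

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

# Space saved by compression, as a percentage of the uncompressed archive
saving = 100.0 * (1 - packed_size / plain_size)
print(f'savez: {plain_size} bytes')
print(f'savez_compressed: {packed_size} bytes ({saving:.1f}% smaller)')
```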

By checking the generated file, arrays_xyz_c.npz, with an archive utility, it can be seen that the compression factor in this example is about 6%. The low compression rate is due to the fact that the arrays contain random data, which has little redundancy. Data exhibiting more regularity will yield better compression rates.
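
To illustrate the last point, the sketch below compresses a random array and a highly regular array of the same shape and compares the resulting file sizes. The helper compressed_size() and the sample arrays are hypothetical, introduced only for this illustration:

```python
import os
import tempfile

import numpy as np


def compressed_size(array):
    """Return the size in bytes of an archive written by savez_compressed().

    Hypothetical helper: writes to a temporary directory that is removed
    as soon as the size has been measured.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'sample.npz')
        np.savez_compressed(path, data=array)
        return os.path.getsize(path)


# Random values have little redundancy for the compressor to exploit
random_data = np.random.rand(200, 300)
# Every row repeats the same ramp 0, 1, ..., 299, so the data is very regular
regular_data = np.tile(np.arange(300, dtype=float), (200, 1))

random_size = compressed_size(random_data)
regular_size = compressed_size(regular_data)

print(f'random data:  {random_size} bytes')
print(f'regular data: {regular_size} bytes')
```

Both arrays occupy the same amount of memory, so any difference in file size comes from how well the data compresses.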