python - What's the best way to serialize a large scipy sparse matrix?


I have a large scipy.sparse matrix, which is taking 90% of my total system memory. I would like to save it to disk, because it takes a few hours to create the matrix ...

I have tried cPickle, but it leads to a major memory explosion:

    import numpy as np
    from scipy.sparse import lil_matrix
    import cPickle

    dim = 10 ** 8
    M = lil_matrix((dim, dim), dtype=np.float)
    with open(filename, 'wb') as f:
        cPickle.dump(M, f)

This leads to a major memory explosion; the matrix is probably being copied internally.
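One possible workaround, sketched here on Python 3 and assuming a scipy version that provides `scipy.sparse.save_npz`/`load_npz` (0.19+): serialize only the arrays that define the sparse structure instead of pickling the matrix object. The filename and the tiny demo matrix below are illustrative, not from the question.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix (the real one in the question is far larger).
dim = 10 ** 4
M = sparse.lil_matrix((dim, dim), dtype=np.float64)
M[0, 1] = 3.5
M[42, 7] = -1.0

# save_npz stores only the defining arrays (data/indices/indptr), so nothing
# dense is ever materialized. It does not accept LIL, so convert to CSR first.
sparse.save_npz("matrix.npz", M.tocsr())

# Loads back as a CSR matrix.
M2 = sparse.load_npz("matrix.npz")
```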

while HDF5 did not like the datatype:

    TypeError: Object dtype dtype('O') has no native HDF5 equivalent
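That error arises because a sparse matrix object has no native HDF5 representation. A minimal sketch of one way around it, assuming `h5py` is available: store the three arrays of a CSR matrix as separate datasets and rebuild the matrix on load (the file name and demo matrix are illustrative).

```python
import h5py
import numpy as np
from scipy import sparse

# A random CSR matrix standing in for the real data.
M = sparse.random(1000, 1000, density=0.001, format="csr", dtype=np.float64)

# The three arrays defining a CSR matrix are plain numeric ndarrays,
# which HDF5 stores natively; the shape goes in an attribute.
with h5py.File("matrix.h5", "w") as f:
    f.create_dataset("data", data=M.data)
    f.create_dataset("indices", data=M.indices)
    f.create_dataset("indptr", data=M.indptr)
    f.attrs["shape"] = M.shape

# Rebuild the matrix from the stored components.
with h5py.File("matrix.h5", "r") as f:
    M2 = sparse.csr_matrix(
        (f["data"][:], f["indices"][:], f["indptr"][:]),
        shape=tuple(f.attrs["shape"]),
    )
```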

Actually, how much data is stored in the matrix? Have you considered changing the matrix type before serialization?

LIL is not the most memory-efficient sparse matrix format available to you. For example:

    In [43]: dim = 10 ** 6

    In [44]: M = lil_matrix((dim, dim), dtype=np.float)

    In [45]: for _ in range(10000): M[np.random.randint(0, dim), np.random.randint(0, dim)] = 1

    In [46]: len(cPickle.dumps(M.todok()))
    Out[46]: 1256302

    In [47]: len(cPickle.dumps(M.tocsr()))
    Out[47]: 557691

    In [48]: len(cPickle.dumps(M))
    Out[48]: 23018393

Compared to the LIL pickle, the DOK and CSR representations are far smaller.

These formats do not all support the same set of operations, but conversion between them is trivial.
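The comparison above can be reproduced on Python 3 (where `cPickle` is simply `pickle`); each conversion is a single method call, and the pickled sizes differ dramatically. The dimensions and entry count below are scaled down from the answer's example.

```python
import pickle
import numpy as np
from scipy.sparse import lil_matrix

rng = np.random.default_rng(0)
dim = 10 ** 5
M = lil_matrix((dim, dim), dtype=np.float64)
for _ in range(1000):
    M[rng.integers(0, dim), rng.integers(0, dim)] = 1.0

# Converting between formats is one method call; each format
# pickles very differently in size.
blobs = {
    "lil": pickle.dumps(M),
    "dok": pickle.dumps(M.todok()),
    "csr": pickle.dumps(M.tocsr()),
}
for fmt, blob in blobs.items():
    print(fmt, len(blob))
```

The LIL pickle pays per row (it stores a Python list for every row, even empty ones), which is why it balloons for huge, mostly empty matrices.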
