python - What's the best way to serialize a large scipy sparse matrix?
I have a large scipy sparse matrix, which is taking up 90% of my total system memory. I would like to save it to disk, because it takes a few hours to create the matrix ...
I have tried cPickle, but it leads to a major memory explosion ...

    import numpy as np
    from scipy.sparse import lil_matrix
    import cPickle

    dim = 10**8
    m = lil_matrix((dim, dim), dtype=np.float)

    with open(filename, 'wb') as f:
        cPickle.dump(m, f)

... while HDF5 did not like the datatype:

    TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Comment: How much data is actually stored in the matrix?

Answer:

Have you tried changing the matrix type before serializing? A LIL matrix is not the most memory-efficient sparse format available to you. For example:

    In [43]: dim = 10**6

    In [44]: m = lil_matrix((dim, dim), dtype=np.float)

    In [45]: for i in range(10000): m[np.random.uniform(0, dim), np.random.uniform(0, dim)] = 1

    In [46]: len(cPickle.dumps(m.todok()))
    Out[46]: 1256302

    In [47]: len(cPickle.dumps(m.tocoo()))
    Out[47]: 557691

    In [48]: len(cPickle.dumps(m))
    Out[48]: 23018393

These formats do not all support the same set of operations, but conversion between them is trivial.

Comment: cPickle.dump(m, f) leads to the major memory explosion because the matrix is probably being copied internally during pickling.
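As a rough modern re-run of the size comparison above, here is a sketch using Python 3's pickle in place of cPickle and np.float64 in place of the removed np.float alias (the exact byte counts will differ from the old session, but the ordering of the formats should not):

```python
import pickle

import numpy as np
from scipy.sparse import lil_matrix

dim = 10**6
rng = np.random.default_rng(0)

# Build a matrix like the one in the session above: 10,000 random
# nonzero entries scattered over a 10**6 x 10**6 LIL matrix.
m = lil_matrix((dim, dim), dtype=np.float64)
rows = rng.integers(0, dim, size=10_000)
cols = rng.integers(0, dim, size=10_000)
m[rows, cols] = 1.0

# Pickle the same matrix in three formats and compare the sizes.
lil_bytes = len(pickle.dumps(m))
dok_bytes = len(pickle.dumps(m.todok()))
coo_bytes = len(pickle.dumps(m.tocoo()))

print(coo_bytes < dok_bytes < lil_bytes)
```

COO stays compact because it pickles three flat numpy arrays; LIL is the worst case because it pickles one Python list per row, even for empty rows.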
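Separately from pickling, a sketch of saving the matrix to disk with scipy itself, assuming SciPy 0.19+ which provides scipy.sparse.save_npz / load_npz (these accept CSR/CSC/COO/BSR matrices, so a LIL matrix has to be converted first):

```python
import os
import tempfile

import numpy as np
from scipy.sparse import lil_matrix, save_npz, load_npz

dim = 10**6
m = lil_matrix((dim, dim), dtype=np.float64)
m[0, 1] = 2.5
m[42, 99] = -1.0

# save_npz stores the matrix's underlying arrays in a .npz archive;
# it does not accept LIL directly, so convert to CSR first.
path = os.path.join(tempfile.mkdtemp(), "matrix.npz")
save_npz(path, m.tocsr())

m2 = load_npz(path)
print((m2 != m.tocsr()).nnz)  # prints 0: no entries differ after the round trip
```

Since this writes only the raw data/indices arrays rather than Python objects, it avoids the object-dtype problem that HDF5 complained about.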