Splitting a dictionary in Python based on memory size
I am in the process of moving a distributed file system to Amazon SimpleDB using boto, and I have run into a problem for which I do not know a clean solution. The current state of my code is as follows:
    def insert(documents):
        data = {hash_doc(d): _decode(d) for d in documents}
        domain.batch_put_attributes(data)

The real issue I am hitting is that the request made to AWS in the batch_put_attributes function has a maximum size of 1 MB. Obviously I want to minimize the number of requests I make, but I also cannot exceed the 1 MB limit. Is there a good Pythonic way to basically say:

    Split this iterable into chunks that are each below a certain memory size, but in as few chunks as possible
I feel like I am not including enough code, but I have not found anything sensible to tackle this, and I think there should be a very straightforward solution.
Maybe do something like the following to preprocess it:

    from collections import defaultdict
    import sys

    size_d = defaultdict(list)
    for k, v in data.items():
        size_d[sys.getsizeof(v)].append(v)

Then write a function that fills up a 1 MB bucket of items, popping each item you decide to send so that you do not reuse it. It could probably be optimized slightly by sorting the items by size. If you find an optimal solution, be sure to let us all know :)
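The bucket-filling idea above could be sketched roughly as follows. This is only an illustration, not a tested boto recipe: `split_by_size` is a hypothetical helper name, and the `sizeof` parameter is an assumption added here because `sys.getsizeof` reports shallow in-memory size, which is not the same as the size of the serialized request that the 1 MB limit applies to. The sketch uses the first-fit-decreasing heuristic for bin packing, which usually gives few chunks but is not guaranteed to give the minimum.

```python
import sys


def split_by_size(data, max_bytes, sizeof=sys.getsizeof):
    """Split `data` into a list of dicts whose estimated sizes stay <= max_bytes.

    `sizeof` is a caller-supplied size estimate (an assumption of this sketch);
    swap in something closer to the encoded request size if you can.
    """
    # Estimate each entry once, then place the largest entries first.
    sized = [(sizeof(k) + sizeof(v), k, v) for k, v in data.items()]
    sized.sort(key=lambda entry: entry[0], reverse=True)

    buckets = []  # each bucket is [bytes_used, chunk_dict]
    for size, key, value in sized:
        if size > max_bytes:
            raise ValueError("item %r alone exceeds max_bytes" % (key,))
        # First fit: put the entry in the first bucket that still has room.
        for bucket in buckets:
            if bucket[0] + size <= max_bytes:
                bucket[0] += size
                bucket[1][key] = value
                break
        else:
            # No existing bucket had room, so start a new one.
            buckets.append([size, {key: value}])

    return [chunk for _, chunk in buckets]
```

In the question's `insert` function, each returned chunk would then be passed to `domain.batch_put_attributes` in its own request. Leaving some headroom below 1 MB is probably wise, since the estimate will not match the encoded request exactly.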