c - Sequential, subsequent loading of files gets much slower over time

- April 15, 2010

I have the following code to read and process multiple very large files, one after the other.

    for (j = 0; j < CORES; ++j) {
        double time = omp_get_wtime();
        printf("File: %d, time: %f\n", j, time);

        char in[256];
        sprintf(in, "%s.%d", FIN, j);   /* input files are named FIN.0, FIN.1, ... */
        FILE *f = fopen(in, "r");
        if (f == NULL) {
            fprintf(stderr, "open failed: %s\n", in);
            exit(1);                    /* do not continue with a NULL stream */
        }

        int i;
        char buffer[1024];
        char *tweet;
        int takeTime = 1;
        /* tweet walks through this file's slice of the big TWEETS array */
        for (i = 0, tweet = TWEETS + (size_t)j * (size_t)TNUM * (size_t)TSIZE;
             i < TNUM; i++, tweet += TSIZE) {
            double start;
            double end;
            if (takeTime) {
                start = omp_get_wtime();
                takeTime = 0;
            }
            char *line = fgets(buffer, 1024, f);
            if (line == NULL) {
                fprintf(stderr, "error reading line %d\n", i);
                exit(2);
            }
            int fn = readNumber(&line);
            int ln = readNumber(&line);
            int month = readMonth(&line);
            int day = readNumber(&line);
            int hits = countHits(line, key);
            writeTweet(tweet, fn, ln, hits, month, day, line);

            /* report the time taken for the last 1 million lines */
            if (i % 1000000 == 0) {
                end = omp_get_wtime();
                printf("Line: %d, time: %f\n", i, end - start);
                takeTime = 1;
            }
        }
        fclose(f);
    }

Each file has 24,000,000 tweets and I read a total of 8 files, one after the other. Each line (1 tweet) is processed, and writeTweet() essentially copies a modified line into one big char array.
As you can see, I measure how long it takes to read and process each 1 million tweets. For the first file this is about 0.5 seconds per 1 million, which is fast enough, but it takes longer with every additional file. File 2 takes about 1 second per 1 million lines (not every time, only in some iterations), rising to 8 seconds for file number 8. Is this to be expected? Can I speed things up? All files are more or less identical and always have 24 million lines.
EDIT: Additional information: each file, in its processed form, requires about 730 MB of RAM. With 8 files, we end up with a memory requirement of approximately 6 GB.
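As a rough check on these figures, the sketch below computes the size of the array for a given number of files. It assumes TSIZE is 32 bytes; that value is not stated in the question and is picked only because 24,000,000 × 32 bytes is about 732 MiB, which matches the quoted 730 MB. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* TNUM comes from the question (24 million tweets per file).
   TSIZE = 32 is an assumption that happens to match ~730 MB per file. */
#define TNUM  24000000u
#define TSIZE 32u

/* Bytes needed to hold `files` processed files in one array. */
static unsigned long long footprint_bytes(unsigned files)
{
    assert(files > 0);
    return (unsigned long long)files * TNUM * TSIZE;
}

/* Convert a byte count to MiB for printing. */
static double to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Print the cumulative footprint after 1..files files have been loaded. */
static void print_footprints(unsigned files)
{
    for (unsigned n = 1; n <= files; ++n)
        printf("%u file(s): %.1f MiB\n", n, to_mib(footprint_bytes(n)));
}
```

Under that assumption, 8 files need 6,144,000,000 bytes, so the array only stays fully resident if the machine has clearly more than 6 GB of free physical memory.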
As requested, here is the content of writeTweet():

    void writeTweet(char *tweet, const int fn, const int ln, const int hits,
                    const int month, const int day, char *line) {
        /* 9-byte header: fn (2 bytes), ln (4 bytes), hits, month, day (1 byte each) */
        short *ptr1 = (short *) tweet;
        *ptr1 = (short) fn;
        int *ptr2 = (int *) (tweet + 2);
        *ptr2 = ln;
        *(tweet + 6) = (char) hits;
        *(tweet + 7) = (char) month;
        *(tweet + 8) = (char) day;

        int i;
        int n = TSIZE - 9;              /* bytes left for the text */
        for (i = strlen(line); i < n; i++)
            line[i] = ' ';              /* padding */
        memcpy(tweet + 9, line, n);
    }
Perhaps writeTweet() is the bottleneck. If you copy all the processed tweets into memory, you build up a huge data array over time that the operating system has to manage. If you do not have enough memory, or other processes on the system are actively using it, the OS will (in most cases) dump part of the data to disk. That increases the access time to the array. There are also more hidden mechanisms in the OS that can affect performance.
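One way to test the swapping hypothesis is to count major page faults (faults that required disk I/O) alongside the timings. The sketch below uses the POSIX getrusage() call; the helper names major_faults and faults_since are hypothetical, and it assumes a POSIX system that maintains ru_majflt (Linux does).

```c
#include <assert.h>
#include <sys/resource.h>   /* getrusage, struct rusage (POSIX) */

/* Number of major page faults this process has incurred so far,
   or -1 if getrusage() fails. */
static long major_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/* Major faults incurred since an earlier reading, or -1 on failure. */
static long faults_since(long before)
{
    assert(before >= 0);
    long now = major_faults();
    return now < 0 ? -1 : now - before;
}
```

Taking a reading wherever the loop calls omp_get_wtime() and printing the difference next to the elapsed time would show whether the slow iterations coincide with paging. If the count stays flat while the time grows, the cause is likely something else.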
You should not store all the processed lines in memory. The easiest way is to dump the processed tweets to disk (write them to a file). However, the solution depends on how you use the processed tweets. If you do not access the data in the array sequentially, it is worth thinking about a special data structure for storage (?). Many libraries already exist for this purpose.
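A minimal sketch of the "dump to disk" option follows. It buffers fixed-size records and writes them out in chunks instead of keeping all of them in one array. RecordWriter, writer_put and writer_flush are hypothetical names, TSIZE = 32 is an assumed record size, and CHUNK is an arbitrary buffer length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TSIZE 32      /* assumed record size in bytes */
#define CHUNK 4096    /* records buffered before each fwrite (arbitrary) */

/* Hypothetical writer: collects records and flushes them in chunks. */
typedef struct {
    FILE  *out;                  /* destination, opened by the caller */
    size_t used;                 /* records currently buffered */
    char   buf[CHUNK * TSIZE];
} RecordWriter;

/* Write all buffered records; returns 1 on success, 0 on a short write. */
static int writer_flush(RecordWriter *w)
{
    size_t n = fwrite(w->buf, TSIZE, w->used, w->out);
    int ok = (n == w->used);
    w->used = 0;
    return ok;
}

/* Append one TSIZE-byte record, flushing when the buffer is full. */
static int writer_put(RecordWriter *w, const char *record)
{
    assert(w->used < CHUNK);
    memcpy(w->buf + w->used * TSIZE, record, TSIZE);
    w->used++;
    return w->used == CHUNK ? writer_flush(w) : 1;
}
```

In the question's loop, writeTweet() could fill a small local record that is then passed to writer_put(), with one final writer_flush() before the output file is closed. Memory use then stays at one chunk regardless of how many files are processed.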
  UPD:   
To manage this much memory, the kernel uses a special memory manager that builds a structure of references. Usually this is a map whose entries refer to sub-maps for large amounts of memory, so it is a rather large, branching structure. When the program touches random pieces of memory, random pages often have to be looked up. The OS uses a special cache to speed up this address lookup. I do not know all the nuances of this process, but I think the cache gets invalidated often in this case, because there is not enough room to keep all the references at the same time. That is a costly operation and it reduces performance. The more memory is used, the worse it gets.
If you need to sort the big tweets array, you do not have to keep everything in memory. There are ways to sort data on disk. If you do want to sort the data in memory, it is not necessary to actually swap the array elements. Use an intermediate structure that holds references to the elements in the tweets array, and sort the references rather than the data.
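Sorting references instead of records can be sketched with the standard qsort() on an array of pointers. The names sorted_refs and cmp_record_ptr are hypothetical, TSIZE = 32 is an assumed record size, and ordering by raw bytes with memcmp() is only a stand-in for whatever key the real sort would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TSIZE 32   /* assumed record size in bytes */

/* qsort comparator: each argument points to a pointer to a record.
   Comparing raw bytes is illustrative; a real key would go here. */
static int cmp_record_ptr(const void *a, const void *b)
{
    const char *ra = *(const char *const *)a;
    const char *rb = *(const char *const *)b;
    return memcmp(ra, rb, TSIZE);
}

/* Build an array of pointers into `tweets` (count records of TSIZE bytes)
   and sort the pointers. The records themselves are never moved.
   Returns NULL if allocation fails; the caller frees the result. */
static char **sorted_refs(char *tweets, size_t count)
{
    assert(tweets != NULL);
    char **refs = malloc(count * sizeof *refs);
    if (refs == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i)
        refs[i] = tweets + i * TSIZE;
    qsort(refs, count, sizeof *refs, cmp_record_ptr);
    return refs;
}
```

Each swap now moves one pointer rather than a whole record. The pointer array itself costs memory, though, so for very large inputs 32-bit indices in place of pointers may be a better trade-off.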

 




  


















