Extracting Data from Multiple TXT Files and Creating a Summary CSV File in Python


I have a folder of 50 .txt files, each containing data in the following format.

    === Predictions on test data ===
    inst#  actual  predicted  error  distribution (OTDT19__lantation)
    1      1:S     2:R        +      0.125,*0.875 (73.84)

I need to write a program that combines the following: my index number (i), the letter of the true class (R or S), the letter of the predicted class, and each of the distribution predictions (the decimals less than 1.0).

When finished, I would like it to look like the following, but preferably as a .csv file.

    ID  True  Pred  S      R
    1   S     R     0.125  0.875
    2   R     R     0.105  0.895
    3   S     S     0.945  0.055
    ...
    n   S     S     0.900  0.100

I am a beginner and a bit fuzzy on how to get all of that parsed and then concatenated and appended. Here is what I was thinking (a rough sketch, with comments where I don't know what to write), but feel free to suggest another direction if that would be easier.

    for i in range(1, n):
        s = str(i)
        # the files are all named the same, but with different numbers attached
        readin = open('mydata/output/output' + s + 'out', 'r')
        output = open("mydata/summary.csv", "a")
        storage = []
        for line in readin:
            # data extraction / concatenation here
            if line.startswith('1'):
                id = i
                true = # split at the ':' and take the letter after it
                pred = # split at the second ':' and take the letter after it
                # some lines have an error '+' and some don't, so I'm not
                # exactly sure what to do to get the distributions
                ds = # split at the ',' and take the five characters before it
                if pred == 'R':
                    dr = # skip the character after the comma, then take the five characters after it
                else:
                    dr = # take the five characters after the comma
                lineholder = id + ',' + true + ',' + pred + ',' + ds + ',' + dr
            else:
                continue
            output.write(lineholder)

I think using indexes would be another option, but that might complicate things if the spacing is off in any of the files, and I haven't checked that for sure.

Thank you for your help!

Well, first of all, if you want to write CSV, you should use the csv module that comes with Python. More about this module is in the official Python documentation. I won't demonstrate how to use it here, because it's pretty simple.
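For completeness, here is a minimal sketch of what writing the summary with the csv module could look like. The `write_summary` helper and the sample rows are made up for illustration, and an in-memory buffer stands in for the real output file so the snippet can be run anywhere.

```python
import csv
import io

# Hypothetical rows in the ID, True, Pred, S, R layout from the question.
rows = [
    (1, "S", "R", "0.125", "0.875"),
    (2, "R", "R", "0.105", "0.895"),
]

def write_summary(out_file, rows):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["ID", "True", "Pred", "S", "R"])
    writer.writerows(rows)

# The buffer stands in for open("mydata/summary.csv", "w", newline="").
buffer = io.StringIO()
write_summary(buffer, rows)
print(buffer.getvalue())
```

With a real file you would open it once, before looping over the input files, so that the header is written a single time.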

As for reading the input data, here is my suggestion for how to break down every line of data. I assume that the data lines in the input files have their values separated by spaces, and that a value cannot itself contain a space:

    def process_line(id_, line):
        pieces = line.split()  # now we have a list of values
        true = pieces[1].split(':')[1]  # split at the ':' and take the letter after it
        pred = pieces[2].split(':')[1]  # same for the second 'n:letter' value
        if len(pieces) == 6:  # there was an error, so the '+' is there
            p4 = pieces[4]
        else:  # there was no '+', only spaces
            p4 = pieces[3]
        # value before the ','; drop a leading '*' if there is one
        ds = p4.split(',')[0].lstrip('*')
        if pred == 'R':
            dr = p4.split(',')[1][1:]  # skip the '*' after the comma and take the rest
        else:
            dr = p4.split(',')[1]
        return str(id_) + ',' + true + ',' + pred + ',' + ds + ',' + dr

What I mainly used here was the split function of strings, and in one place the simple str[1:] syntax to skip the first character of a string (strings are sequences after all, so we can use this slicing syntax).
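To make the two operations concrete, here is a small sketch that applies them to a sample line in the layout the question describes. The variable names are only for illustration.

```python
# A sample data line in the layout from the question.
line = "1 1:S 2:R + 0.125,*0.875 (73.84)"

# split() with no argument splits on any run of whitespace.
pieces = line.split()
# pieces is ['1', '1:S', '2:R', '+', '0.125,*0.875', '(73.84)']

actual = pieces[1].split(":")[1]      # 'S'
distribution = pieces[4].split(",")   # ['0.125', '*0.875']

# Slicing from index 1 drops the first character, here the '*'.
without_star = distribution[1][1:]    # '0.875'

print(actual, distribution[0], without_star)
```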

Keep in mind that my function won't handle any errors, or lines formatted differently from the one you posted as an example. If the values in every line are separated by tabs and not spaces, you should replace the line pieces = line.split() with pieces = line.split('\t').
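If you would rather skip lines that do not fit than crash on them, one option is to match each line against a regular expression and ignore anything that does not match. This is only a sketch under assumptions: the pattern encodes the layout of the single example line from the question (instance number, two `n:letter` values, an optional `+`, then two comma-separated numbers, either of which may carry a `*`), and `parse_or_none` is a hypothetical name. Because `\s+` matches tabs as well as spaces, the same pattern covers both separators.

```python
import re

# Assumed layout, based only on the example line in the question.
DATA_LINE = re.compile(
    r"^\s*(\d+)\s+\d+:(\w+)\s+\d+:(\w+)\s+(?:\+\s+)?\*?([\d.]+),\*?([\d.]+)"
)

def parse_or_none(line):
    """Return (inst, actual, predicted, first, second), or None if the line does not fit."""
    match = DATA_LINE.match(line)
    if match is None:
        return None
    return match.groups()

print(parse_or_none("1 1:S 2:R + 0.125,*0.875 (73.84)"))
print(parse_or_none("=== Predictions on test data ==="))
```

Header lines and blank lines simply come back as `None`, so the calling loop can skip them with a single `if`.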
