perl - Importing loosely structured data into a database
I get daily data feeds whose data is only loosely structured. I have to import it into a database so that I can run a report that finds changes between new records and existing records.
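Once each record is parsed into a hash of field names to values, finding changes reduces to a field-by-field comparison of the stored version and the new one. A minimal sketch, assuming records have already been parsed and matched to their existing counterparts (how they are keyed and stored in the database is not covered here); `diff_records` and its output wording are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: compare an existing record with a newly imported one.
# Both arguments are hashrefs of field name => value; returns a list of
# human-readable change descriptions, sorted by field name.
sub diff_records {
    my ($old, $new) = @_;
    my %fields;
    @fields{ keys %$old, keys %$new } = ();    # union of all field names

    my @changes;
    for my $field (sort keys %fields) {
        if (!exists $old->{$field}) {
            push @changes, "added $field: $new->{$field}";
        }
        elsif (!exists $new->{$field}) {
            push @changes, "removed $field: $old->{$field}";
        }
        elsif ($old->{$field} ne $new->{$field}) {
            push @changes, "changed $field: $old->{$field} -> $new->{$field}";
        }
    }
    return @changes;
}
```

In scalar context the function returns the number of changes, so `if (diff_records($old, $new)) { ... }` is enough to flag a changed record.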
The data looks like this:
    -----------------------------------------
    Blah Foo Bar
    Lorem: Ipsum   Dolor: Sit
    Fu: Bar
    Bar: Afu
    123-555-1212
    Lorem / Ipsum / Dolor / Sit
    Foo / Bar
    -----------------------------------------

As you can see, there are some field titles like "Blah", "Lorem", etc. But some data has no title, such as the phone number or the slash-delimited list. And some titles share a line while others do not.
Just to keep us on our toes, the records do not all have the same fields.
So I think I need at least three ways to parse the data: if a line is "Title:" with nothing after it, hold the following line(s) as its value; otherwise read the line as "Title: Value"; if the line starts with a number, give it the title "phone"; and handle slash-delimited lists separately, up to the next "--------" separator. But I have no idea how to start coding something like this. The language is open at this point, though I have to run the code on macOS.
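The rules above could be sketched in Perl as a line-by-line classifier for a single record. This is only a sketch under assumptions that are not in the question: phone numbers look like `123-555-1212`, list lines contain ` / `, several `Title: Value` pairs may share a line, and any other line is stored as a header. The names `parse_record`, `phone`, `lists` and `header` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser for one record, following the rules in the question.
sub parse_record {
    my ($text) = @_;
    my (%rec, $pending);    # $pending: a title still waiting for its value
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;                     # trim
        next if $line eq '' || $line =~ /^-+$/;      # skip blanks and separators
        if (defined $pending) {                      # "Title:" was on the previous line
            $rec{$pending} = $line;
            undef $pending;
        }
        elsif ($line =~ /^(\w+):$/) {                # title whose value is on the next line
            $pending = $1;
        }
        elsif ($line =~ /^\d{3}-\d{3}-\d{4}$/) {     # assumed phone format
            $rec{phone} = $line;
        }
        elsif ($line =~ m{ / }) {                    # slash-delimited list
            push @{ $rec{lists} }, [ split m{\s*/\s*}, $line ];
        }
        elsif ($line =~ /:/) {
            # One or more "Title: Value" pairs on the same line; each value
            # ends where the next "Word:" begins or at the end of the line.
            while ($line =~ /(\w+):\s*(.*?)\s*(?=\w+:|$)/g) {
                $rec{$1} = $2;
            }
        }
        else {
            $rec{header} = $line;                    # untitled line such as "Blah Foo Bar"
        }
    }
    return \%rec;
}
```

The `elsif` order matters: list and phone lines are tested before the generic colon rule, so a slash-delimited line is never mistaken for a key/value pair.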
I think Perl would be good for this, but my Perl-fu is very weak.
Even just knowing where to start would help.
You always need to assume something about your text; otherwise this becomes an exercise in NLP.
Can we assume that the untitled values always come at the end? If so, the following regexes will help:
    # Break the text into records:
    my @records = split /\n-+\n/, $text;

    # Matches a key/value that is followed by another "key:" line:
    my $kv_followed = qr/\A(\w+):(.*?)(?=\n\w+:)/ms;

    # Then the last key/value, which may need to be on a single line:
    my $kv_last = qr/^(\w+):(.*)/;

I recommend removing the matched text after each successful match and continuing.
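Combining the regexes above with the remove-and-continue advice might look like the following sketch. It is an illustration, not the answerer's code: it splits on any line made only of dashes (a small variant of the split above, so that a separator on the very first line is also handled), uses `s///` to delete each match, and stores whatever is left under a hypothetical `rest` key. Like the answer, it assumes each record starts with a `key:` line and that untitled data comes last; a record beginning with a header line such as "Blah Foo Bar" would be left entirely in `rest`, and two pairs on one line would end up inside a single value.

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace.
sub trim { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s }

# Sketch: split the feed into records, then repeatedly match a leading
# "key: value" and delete the matched text from the record.
sub parse_records {
    my ($text) = @_;
    my @parsed;
    # Split on lines made only of dashes, including one on the first line.
    for my $record (split /^-+\n?/m, $text) {
        next unless $record =~ /\S/;     # skip empty pieces
        $record =~ s/\A\s+//;            # drop leading blank lines
        my %fields;
        # A key whose value runs until the next "key:" line (may span lines).
        while ($record =~ s/\A(\w+):(.*?)\n(?=\w+:)//s) {
            my ($key, $value) = ($1, $2);
            $fields{$key} = trim($value);
        }
        # The last key/value: its value is restricted to a single line.
        if ($record =~ s/\A(\w+):(.*)\n?//) {
            my ($key, $value) = ($1, $2);
            $fields{$key} = trim($value);
        }
        # Whatever is left has no title; keep it for later processing.
        $fields{rest} = $record if $record =~ /\S/;
        push @parsed, \%fields;
    }
    return @parsed;
}
```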
Other useful assumptions: that the phone number appears only once per record (and not as part of another key/value), and that the tags come at the end.
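Under those two assumptions, the untitled part of a record can be handled in two steps: pull out the first phone-number line, then split every remaining line on slashes. A minimal sketch; the `parse_untitled` name, the `tags` key and the `123-555-1212` phone format are assumptions, not part of the answer:

```perl
use strict;
use warnings;

# Hypothetical handler for the untitled leftovers of one record.
# Assumes at most one phone number per record and that all remaining
# lines are slash-delimited tag lists.
sub parse_untitled {
    my ($rest) = @_;
    my %out;
    # Remove the first line that is exactly a phone number (assumed format).
    if ($rest =~ s/^(\d{3}-\d{3}-\d{4})[ \t]*\n?//m) {
        $out{phone} = $1;
    }
    # Every other non-blank line is split on "/" into individual tags.
    for my $line (split /\n/, $rest) {
        $line =~ s/^\s+|\s+$//g;
        next unless length $line;
        push @{ $out{tags} }, split m{\s*/\s*}, $line;
    }
    return \%out;
}
```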