Nat!

Senior Mull

Writing importers is my plight...

It just so happens that in every damn job I've had, I had to write some sort of importer, and I have really started to hate it. It seems I spend more time writing importers for my applications than writing the applications themselves. Now that I have my own company I try to offload this, and sometimes that even works. Hurray. So what do I do in my free time?

I write an importer....

Ever since I upgraded the forum software on the R.A. Lafferty Page, I have been putting off the import of the old boards until I had some spare time.
I figured I was gonna do it manually some time...
But now I've got a volunteer (a fellow Lafferty fan) who has taken on the task of preparing the old forum entries as CSV, so that I can import them into phpBB, which runs on MySQL.

So I have to massage the flat file a little to create entries in four distinct tables to generate users, topics and posts.

The problem is, I don't know how to write Perl. The old Unix tools cut and awk can't handle the CSV because they don't deal with quoted text. Perl could do it, but the syntax is something like

@new = ();
push(@new, $+) while $text =~ m{
                "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
              | ([^,]+),?
              | ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';

I can't deal with that kinda shit.
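To make the quoting problem concrete, here is a minimal sketch of what stock cut does when the delimiter shows up inside a quoted field. The helper name first_field and the sample line are made up for this illustration; only plain cut with its standard -d and -f options is used.

```shell
# Hypothetical helper for illustration only:
# print the first ';'-separated field of a line using plain cut.
first_field() {
    printf '%s\n' "$1" | cut -d';' -f1
}

# cut knows nothing about quotes, so the ';' inside the quoted
# field is treated like any other delimiter and the field is
# chopped in half.
first_field '"Lafferty; R.A.";42'
# prints: "Lafferty
```

A quote-aware tool would be expected to hand back the whole quoted field here, which is exactly the gap that needs filling.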

So I made the (IMO) heroic effort and built mullecut, based on the original textutils cut. Unlike the original, it can handle quotes and escaped quotes, and it can deal with linefeeds, carriage returns and stuff. This makes this possible:

echo '"ab""c";"abc";NULL; 0' | mullecut -f 1,4 -d';' -q'"' -e'\\'
ab"c;0

or

echo '"ab""c";"abc";NULL; 0' | mullecut -f 1,4 -d';' -q'"' -e'\\' -r
"ab""c";0

Did I say I was very pleased with myself? :)