Writing importers is my plight...

It just so happens that in every damn job I have had, I had to write some sort of importer. I have really started to hate it. It seems I spend more time writing importers for my applications than writing the applications themselves. Now that I have my own company, I try to offload this, and sometimes that even works. Hurray. So what do I do in my free time?

I write an importer....

Since I upgraded the forum software on the R.A. Lafferty Page, I have been putting off the import of the old boards until I had some spare time.
I figured I was gonna do it manually some time...
But now I've got a volunteer (a fellow Lafferty fan) who has taken on the task of preparing the old forum entries as CSV, so that I can import them into phpBB, which runs on MySQL.

So I have to massage the flat file a little to create entries in four distinct tables to generate users, topics and posts.

The problem is, I don't know how to write Perl. The old Unix tools cut and awk can't handle the CSV because they don't deal with quoted text. Perl could do it, but the syntax is something like

@new = ();
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
  | ([^,]+),?
  | ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';

I can't deal with that kinda shit.

So I made the (IMO) heroic effort and wrote mullecut, based on the original textutils cut. Unlike the original, it can handle quotes and escaped quotes, and it can deal with linefeeds, carriage returns and the like. This makes the following possible:

echo '"ab""c";"abc";NULL; 0' | mullecut -f 1,4 -d';' -q'"' -e'\\'
ab"c;0

or

echo '"ab""c";"abc";NULL; 0' | mullecut -f 1,4 -d';' -q'"' -e'\\' -r
"ab""c";0

Did I say I was very pleased with myself? :)
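
For readers who want to try the idea without mullecut, here is a rough sketch of quote-aware field cutting using Python's standard csv module. It is not how mullecut is implemented; the function name quoted_cut is made up for illustration, and stripping the blank after a delimiter (skipinitialspace) is an assumption chosen to match the first example above.

```python
import csv
import io


def quoted_cut(text, fields, delimiter=";", quotechar='"'):
    """Return the selected 1-based fields of each record, joined by the delimiter.

    Hypothetical helper, not part of mullecut.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),  # keep embedded line breaks untouched
        delimiter=delimiter,
        quotechar=quotechar,
        doublequote=True,       # "" inside a quoted field is one literal quote
        skipinitialspace=True,  # assumption: drop the blank after a delimiter
    )
    lines = []
    for record in reader:
        # Silently skip field numbers the record does not have.
        picked = [record[i - 1] for i in fields if 0 < i <= len(record)]
        lines.append(delimiter.join(picked))
    return lines


if __name__ == "__main__":
    for line in quoted_cut('"ab""c";"abc";NULL; 0\n', [1, 4]):
        print(line)
```

Because the reader tracks quoting state, a delimiter or a linefeed inside a quoted field stays part of that field, which is exactly what plain cut gets wrong.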

About

This page contains a single entry from the blog posted on November 9, 2003 11:45 PM.