Sunday, May 31, 2009

New BETA AWK CSV Parser

As mentioned on Sunday, December 21, 2008, I've been working on a new version of my AWK CSV parser. Working two jobs and trying to spend time with my wife consumes a tremendous amount of my time, so unfortunately progress has been slow. I have put up a new BETA version on my website. It still needs more testing, but it is working quite well thus far.

This version has many major changes. The function has been renamed and more functions have been created. The error reporting has changed dramatically. There is also a new function that takes an array and creates a CSV string. I also changed the parser to start array counting at 1, which should put it more in line with how split() works. Please let me know if there are any problems, either here as a comment or via email at "LoranceStinson+CSV" at GMAIL dot "The usual".
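For reference, here is the split() convention the parser now follows. This uses only the AWK built-in, not the parser itself; the wrapper name split_demo is made up for the example.

```shell
# split() stores the pieces in arr[1]..arr[n] and returns n.
split_demo() {
    awk 'BEGIN {
        n = split("a,b,c", arr, ",")
        for (i = 1; i <= n; i++)
            print i ": " arr[i]
    }'
}
split_demo
```

The first field lands in arr[1], not arr[0].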

Please keep in mind this is a BETA version. I have not created a test suite for it yet; once I get a chance I will work on that. I'm also rewriting my CSV Utilities, not only to incorporate this new version of the parser but also to give them some major work of their own. I'm changing option handling and will write proper documentation and a complete test suite.

Saturday, January 03, 2009

AWK Golf - pi edition

Another one from Stack Overflow. This one is to calculate pi using the Leibniz formula. I of course submitted my solution in AWK. By the time I see these, the other languages I know well have already been used.

awk 'BEGIN {p=1;for(i=3;i<10^6;i+=4){p=p-1/i+1/(i+2)}print p*4}'
-> 3.14159
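For anyone reading the golfed version cold: the Leibniz series is pi/4 = 1 - 1/3 + 1/5 - 1/7 + ..., and the loop adds one negative and one positive term per pass. Here is the same loop ungolfed; the wrapper name leibniz_pi and the variable name sum are only for this illustration.

```shell
leibniz_pi() {
    awk 'BEGIN {
        sum = 1                              # first term of the series
        for (i = 3; i < 10^6; i += 4)
            sum = sum - 1/i + 1/(i + 2)      # next negative and positive term
        print sum * 4                        # the series converges to pi/4
    }'
}
leibniz_pi
```

The series converges slowly, so roughly half a million terms are needed for the six significant digits that print shows by default.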

Sunday, December 28, 2008

More AWK Golf

An older AWK golf from the Facebook AWK group.

My solution: awk 'NR==1||$0!=p{print;p=$0}'

From the post:

Write an awk script that reads (from file or stdin), but filters out duplicate lines that are adjacent. Input cannot be assumed to be sorted.

e.g.
a
a
a
b
b
c
c
c
a
a

would become:
a
b
c
a
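Here is the one-liner run against the sample above, wrapped in a shell function (the name dedupe_adjacent is made up for the example). The NR==1 test is there so the first line prints even if it is empty, since p starts out as an empty string.

```shell
dedupe_adjacent() {
    # Print a line when it is the first one or differs from the previous line.
    awk 'NR==1||$0!=p{print;p=$0}'
}
printf '%s\n' a a a b b c c c a a | dedupe_adjacent
```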

Friday, December 26, 2008

AWK Golf

Spotted an Xmas-themed golf on Stack Overflow. I submitted my solution in AWK.

echo "8" | awk '{s="#";for(i=0;i<$1;i++){printf"%"$1-i"s%s\n","",s;s=s"##"}printf"%"$1"s#\n",""}'

        #
       ###
      #####
     #######
    #########
   ###########
  #############
 ###############
        #

Sunday, December 21, 2008

Rewriting my AWK CSV parser

I posted this over on my personal blog and thought it would be good to copy it here. I know some people who follow this blog will be interested in this.

I've been working on rewriting my AWK CSV Parser for a while now. I've made many changes and improvements, like handling whitespace around separators. I've also created a function for creating CSV strings correctly (harder than it seems). With the addition of the new functions the parser has been renamed to csv_parse. Gone is the global error variable, replaced with a function that takes error codes from csv_parse and returns the string equivalent. Today I did more cleanup and fixed more bugs in the changes and the new csv_create function. I've also started incorporating the changes into my CSV Utilities. csv2csv has been the first victim, and I've already fixed a bug I never noticed before. Performance has suffered slightly compared to the old versions, but I think being correct is worth it.
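To give a sense of why creating CSV is harder than it seems: a field containing the separator, a double quote or a newline has to be quoted, and any embedded quotes doubled. The following is a minimal sketch of that idea, not the actual csv_create code. The names csv_quote and csv_join are hypothetical, and the quoting rules assumed are the common RFC 4180 ones.

```shell
csv_join_demo() {
    awk '
    # Hypothetical helper: quote one field if it needs it.
    function csv_quote(field) {
        if (field ~ /[",\n]/) {
            gsub(/"/, "\"\"", field)     # double any embedded quotes
            field = "\"" field "\""
        }
        return field
    }
    # Hypothetical helper: join arr[1]..arr[n] into one CSV record.
    function csv_join(arr, n,    i, out) {
        for (i = 1; i <= n; i++) {
            if (i > 1)
                out = out ","
            out = out csv_quote(arr[i])
        }
        return out
    }
    BEGIN {
        f[1] = "plain"; f[2] = "has,comma"; f[3] = "say \"hi\""
        print csv_join(f, 3)
    }'
}
csv_join_demo
```

This prints plain,"has,comma","say ""hi""". A real implementation also has to deal with configurable separators and error reporting, which this sketch leaves out.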

There is still quite a bit of work to do though. I'm pondering switching to getopts instead of the simple while/shift loop in use now. The other csv2 utilities need to be converted over to the new parsing function. The -W option for trimming whitespace around the separators also needs to be added. The testing really needs to be improved. The testing shell script I created and have refined for some of my other projects, like my AWK data libraries (yet to be released) and CharCode, would probably be the best fit.
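A rough idea of what getopts-based option handling could look like. This is only a sketch: -W is the whitespace-trimming option mentioned above, while the -s option, the function name parse_opts and the variable names are assumptions made up for the example, not the utilities' real interface.

```shell
# Hypothetical option parsing for a csv2csv-style script.
parse_opts() {
    trim=0          # set by -W (trim whitespace around separators)
    sep=","         # set by -s SEP (assumed option, for illustration only)
    OPTIND=1
    while getopts "Ws:" opt; do
        case "$opt" in
            W) trim=1 ;;
            s) sep="$OPTARG" ;;
            *) echo "usage: [-W] [-s sep] [file ...]" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))       # drop the parsed options, keep the file names
    echo "trim=$trim sep=$sep args=$*"
}
parse_opts -W -s ';' data.csv
```

Compared with a hand-rolled while/shift loop, getopts handles combined flags and missing option arguments without extra code.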

A big change that I thought would be much harder is the change to newline handling. Now, if -1 is passed for the newline string, csv_parse returns -1 when a newline was expected. This allows the caller to handle fetching the required data, since not every use of csv_parse will read data from the default source. Unfortunately the error codes had to be rearranged to handle this, but with all the other major changes already made this didn't seem like that big of a deal. And now I finally have a decent base to write a freeze/thaw routine for AWK. I've been pondering writing a routine to freeze arrays to a file and allow later thawing. The new CSV library makes this much easier to create.
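The caller-side pattern looks roughly like this. The sketch does not call csv_parse at all. Instead a hypothetical record_incomplete function stands in for the "newline expected" signal by checking for an odd number of double quotes, and the caller fetches the next line itself.

```shell
count_records() {
    awk '
    # Hypothetical stand-in for the "newline expected" signal: a record
    # with an odd number of double quotes ends inside a quoted field.
    function record_incomplete(rec,    tmp) {
        tmp = rec
        return gsub(/"/, "", tmp) % 2
    }
    {
        rec = $0
        # The caller decides where more data comes from; here it is
        # simply the next input line.
        while (record_incomplete(rec) && (getline line) > 0)
            rec = rec "\n" line
        n++
    }
    END { print n " records" }'
}
printf '%s\n' 'a,"two' 'lines",b' 'c,d' | count_records
```

The three input lines are counted as two records, because the first field of the first record contains an embedded newline. A caller reading from a pipe or another file would swap out the getline line for its own source.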

These utilities, and the AWK library they are based on, have been a lot of work to get right. They have also been some of the most useful things I have written yet. I have made extensive use of the library and utilities. csv2csv alone is extremely useful for rearranging the columns of CSV data.

Saturday, June 21, 2008

Odd AWK

I've been working on two AWK projects off and on for a while now. I've been meaning to release them, but have not yet finished with them. I hope to do so, with at least one, soon.

One is a 65c02 emulator, complete with assembler, disassembler and even a simple program to combine object files. It's intended to be a teaching tool for both AWK code and 65c02 assembly. It's been on the back burner for a long while now but does actually work. The emulator alone is almost 2,000 lines of AWK, including comments. I have almost all 65c02 instructions emulated; only some strange and seldom-used ones are left. The assembler needs a complete rewrite. It rather sucks, though the expression evaluator makes a nice calculator when removed and put on its own.

The other is a set of data libraries. There are implementations of singly and doubly linked lists, queues, stacks and binary trees. Each is in its own file with a complete API. All are based on the book Mastering Algorithms with C and the work of Donald E. Knuth. AWK arrays are used to store the actual data. They were only intended to be a learning exercise for myself, but I thought others might use them and thus started cleaning them up and adding testing. I'd like to implement a few more interfaces to each and also an array library to get more Perl/C-like arrays. AWK arrays are nice, but they are not real arrays. This project will probably be distributed before the 65c02 emulator, and might even be useful to someone...
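To show the general idea of building a data structure on top of AWK arrays, here is a tiny stack. It is only an illustration, not the API of the libraries described above; the names stack_push and stack_pop and the "top" bookkeeping key are made up for the example.

```shell
stack_demo() {
    awk '
    # Hypothetical API: the stack lives in an AWK array, with the
    # element count kept in s["top"].
    function stack_push(s, value) {
        s["top"]++
        s[s["top"]] = value
    }
    function stack_pop(s,    value) {
        if (s["top"] < 1)
            return ""               # empty stack
        value = s[s["top"]]
        delete s[s["top"]]
        s["top"]--
        return value
    }
    BEGIN {
        st["top"] = 0
        stack_push(st, "a"); stack_push(st, "b"); stack_push(st, "c")
        while (st["top"] > 0)
            print stack_pop(st)
    }'
}
stack_demo
```

The items come back in reverse order (c, b, a). Because AWK arrays are associative, the count has to be tracked by hand, which is part of why they are not real arrays.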

Sunday, April 13, 2008

AWK CSV Parser Update

I've made a few improvements to my AWK CSV Parser:

  • Added a new option to trim spaces from the beginning and end of fields. This option is off by default to match the behavior of Excel and OpenOffice more closely.
  • Cleaned the code up a bit and added more comments.
  • Made the return codes indicate the error that occurred. (See the comments in the source)
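The trimming itself is simple in AWK. A minimal sketch, assuming a helper named trim (hypothetical, not the name used in the parser source) that also strips tabs:

```shell
trim_demo() {
    awk '
    # Hypothetical helper: strip leading and trailing spaces and tabs.
    function trim(s) {
        gsub(/^[ \t]+|[ \t]+$/, "", s)
        return s
    }
    BEGIN { print "[" trim("  padded field \t") "]" }'
}
trim_demo
```

This prints [padded field]. Spaces inside the field are left alone.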

http://lorance.freeshell.org/csv/