Andrew Timberlake Andrew Timberlake

Hi, I’m Andrew, a programmer and entrepreneur from South Africa,
building Mailcast for taking control of your email.
Thanks for visiting and reading.


Skipping blank lines in ruby CSV parsing

I recently had an import job failing because it took too long. When I had a look at the file I saw that there were 74 useful lines but a total of 1,044,618 lines in the file (My guess is MS Excel having a little fun with us).

Most of the lines were simply rows of commas:

Row,Of,Headers
some,valid,data
,,
,,
,,
,,
,,

The CSV library has an option named skip_blanks but the documentation says “Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data”, so that’s not actually helpful in this case.

What is needed is skip_lines with a regular expression that will match any lines with just column separators (/^(?:,\s*)+$/). The resulting code looks like this:

require 'csv'
CSV.foreach('/tmp/tmp.csv',
            headers: true,
            skip_blanks: true,
            skip_lines: /^(?:,\s*)+$/) do |row|
  puts row.inspect
end

#<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
#=> nil
12 Jul 2015
comments powered by Disqus