Skipping blank lines in Ruby CSV parsing
I recently had an import job fail because it took too long. When I looked at the file, I saw that it contained 74 useful lines out of a total of 1,044,618 (my guess is MS Excel having a little fun with us).
Most of the lines were simply rows of commas:
Row,Of,Headers
some,valid,data
,,
,,
,,
,,
,,
The CSV library has an option named skip_blanks, but the documentation says “Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data”, so that’s not actually helpful in this case.
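To see that behaviour for yourself, here is a minimal sketch that parses an in-memory string instead of a file. The sample data is made up for illustration: it has one truly empty line and two comma-only lines.

```ruby
require 'csv'

# Made-up sample: header, one data row, one empty line, two comma-only lines
data = "Row,Of,Headers\nsome,valid,data\n\n,,\n,,\n"

rows = CSV.parse(data, headers: true, skip_blanks: true)

# The empty line is dropped, but each ",," line still comes back
# as a row whose fields are all nil.
rows.each { |row| puts row.fields.inspect }
puts "rows parsed: #{rows.size}"
```

Only the empty line is skipped; the comma-only lines are still parsed into rows of nil fields, which is exactly what made the import slow.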
What is needed is skip_lines with a regular expression that matches any line consisting only of column separators (/^(?:,\s*)+$/).
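Before wiring the pattern into the parser, it can be worth checking it against a few lines by hand. This is just a quick sanity check; the sample strings are invented for the purpose.

```ruby
pattern = /^(?:,\s*)+$/

# Invented sample lines: real data, comma-only lines, and edge cases
samples = ["some,valid,data", ",,", ", ,", ",value,", ""]

matches = samples.select { |line| line.match?(pattern) }

# Lines that skip_lines would treat as skippable
puts matches.inspect
```

Only the lines made up of commas (optionally followed by whitespace) match. A line with any real value in it does not match, and neither does an empty line, which is why skip_blanks is still kept alongside skip_lines. Note that the pattern assumes a comma as the column separator and no leading whitespace on the line.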
The resulting code looks like this:
require 'csv'

# skip_blanks drops empty lines; skip_lines drops lines that contain only commas
CSV.foreach('/tmp/tmp.csv',
            headers: true,
            skip_blanks: true,
            skip_lines: /^(?:,\s*)+$/) do |row|
  puts row.inspect
end
#<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
#=> nil
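If your file uses a different column separator, the pattern has to change with it. One possible way to handle that is sketched below; separators_only is a hypothetical helper of my own, not something the CSV library provides, and the semicolon-separated data is made up.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): builds a pattern that
# matches lines made up only of the given separator and optional whitespace.
def separators_only(col_sep)
  /^(?:#{Regexp.escape(col_sep)}\s*)+$/
end

# Made-up sample using ';' as the column separator
data = "Row;Of;Headers\nsome;valid;data\n;;\n;;\n"

rows = CSV.parse(data,
                 headers: true,
                 col_sep: ';',
                 skip_blanks: true,
                 skip_lines: separators_only(';'))

rows.each { |row| puts row.inspect }
```

Regexp.escape keeps separators such as | or . from being interpreted as regular expression syntax. With a comma as the argument, the helper produces the same pattern used above.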