Skipping blank lines in Ruby CSV parsing
How to ignore blank lines when parsing a CSV file with Ruby's CSV parser.
I recently had an import job failing because it took too long. When I looked at the file, I saw that it contained 74 useful lines out of a total of 1,044,618 (my guess is MS Excel having a little fun with us).
Most of the lines were simply rows of commas:
    Row,Of,Headers
    some,valid,data
    ,,
    ,,
    ,,
    ,,
    ,,
The CSV library has an option named skip_blanks, but the documentation says “Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data”, so it does not help in this case.
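To see that behaviour for yourself, here is a minimal sketch that parses a small in-memory string with skip_blanks enabled. The sample data and the helper name fields_with_skip_blanks are made up for illustration; only CSV.parse and its headers and skip_blanks options come from the library.

```ruby
require 'csv'

# Made-up sample: a header row, one data row, one truly empty line,
# and two rows that contain nothing but column separators.
SAMPLE_WITH_BLANKS = "Row,Of,Headers\nsome,valid,data\n\n,,\n,,\n"

# Hypothetical helper: parse with skip_blanks and return each row's fields.
def fields_with_skip_blanks(data)
  CSV.parse(data, headers: true, skip_blanks: true).map(&:fields)
end

# The empty line is dropped, but the comma-only rows still come through
# as rows whose fields are all nil.
fields_with_skip_blanks(SAMPLE_WITH_BLANKS).each { |fields| p fields }
```

So with a million comma-only lines, the parser still yields a million rows of nils, which is where the time goes.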
What is needed is skip_lines with a regular expression that matches any line consisting only of column separators.
The resulting code looks like this:
    require 'csv'

    CSV.foreach('/tmp/tmp.csv', headers: true, skip_blanks: true, skip_lines: /^(?:,\s*)+$/) do |row|
      puts row.inspect
    end
    #<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
    #=> nil
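If the file uses a different column separator, the same idea should carry over by building the pattern from the separator. The following is only a sketch: separator_only_line and parse_without_separator_rows are hypothetical helper names, and the semicolon-separated sample data is invented for the example. The pattern has the same shape as the one above, with the separator escaped via Regexp.escape.

```ruby
require 'csv'

# Hypothetical helper: a pattern matching lines that contain nothing but
# the given column separator and whitespace.
def separator_only_line(col_sep = ',')
  /^(?:#{Regexp.escape(col_sep)}\s*)+$/
end

# Hypothetical helper: parse a string, skipping separator-only lines,
# and return each remaining row as a hash keyed by header.
def parse_without_separator_rows(data, col_sep: ',')
  CSV.parse(data,
            headers: true,
            col_sep: col_sep,
            skip_blanks: true,
            skip_lines: separator_only_line(col_sep)).map(&:to_h)
end

# Invented sample data using semicolons instead of commas.
sample = "Row;Of;Headers\nsome;valid;data\n;;\n;;\n"
parse_without_separator_rows(sample, col_sep: ';').each { |row| p row }
```

Only the single data row should be printed here. Note that the pattern does not match lines that start with whitespace before the first separator, so a file with such lines would need a slightly looser expression.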