Skipping blank lines in Ruby CSV parsing

I recently had an import job failing because it took too long. When I had a look at the file, I saw that there were 74 useful lines but a total of 1,044,618 lines in the file (my guess is that MS Excel was having a little fun with us).

Most of the lines were simply rows of commas:

Row,Of,Headers
some,valid,data
,,
,,
,,
,,
,,

The CSV library has an option named skip_blanks but the documentation says “Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data”, so that’s not actually helpful in this case.

What is needed is skip_lines with a regular expression that will match any lines with just column separators (/^(?:,\s*)+$/). The resulting code looks like this:

require 'csv'
CSV.foreach('/tmp/tmp.csv',
            headers: true,
            skip_blanks: true,
            skip_lines: /^(?:,\s*)+$/) do |row|
  puts row.inspect
end

#<CSV::Row "Row":"some" "Of":"valid" "Headers":"data">
#=> nil
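
To see the difference between the two options without a million-line file, here is a small sketch that parses an in-memory string instead. The separator_only helper is my own hypothetical addition, not part of the CSV library; it simply builds the same regular expression for whichever column separator you use.

```ruby
require 'csv'

# Hypothetical helper (not part of the CSV library): builds a pattern that
# matches lines containing nothing but column separators and whitespace.
def separator_only(col_sep = ',')
  /^(?:#{Regexp.escape(col_sep)}\s*)+$/
end

data = "Row,Of,Headers\nsome,valid,data\n,,\n,,\n"

# skip_blanks alone keeps the separator-only rows (every field is nil)
blanks_only = CSV.parse(data, headers: true, skip_blanks: true)

# skip_lines drops those lines instead of yielding them as rows
filtered = CSV.parse(data, headers: true, skip_blanks: true,
                           skip_lines: separator_only)

puts blanks_only.size # 3 rows: one with data, two with only nil fields
puts filtered.size    # 1 row: just the data
```

The same options can be passed to CSV.foreach, as in the code above.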
Jul 13, 2015 tech, ruby

Append items to a sorted collection in Backbone.js

I won’t cover all the boilerplate code, but you can view that at JSFiddle. The project is a ListItem model and a corresponding ListCollection. There is a ListItemView which is compiled into a ListView to create an ordered list. There is a FormView used for adding items to the collection.

The first component of our code is the comparator in the collection which keeps the list sorted by name.

var ListCollection = Backbone.Collection.extend({
  model: ListItem,
  comparator: function(item) {
    return item.get('name').toLowerCase();
  }
});

With this, a simple render method will always have the list in order, but it needs to redraw the whole list every time the collection is updated. Simply bind the add event to this.render and you’re done.

//...
  initialize: function() {
    this.listenTo(this.collection, 'add', this.render);
  },
  render: function() {
    var items = [];
    this.collection.each(function(item) {
      items.push((new ListItemView({model: item})).render().el);
    });
    this.$el.html(items);
    return this;
  }
//...

What if we have a list that is more complicated, or we want to display the item being added? For this we need a few things:

  1. Split the creation of the item view out into its own factory method
  2. Call the factory method when building the initial list within render
  3. Create a new addItem method which will append the item to the list
  4. Change our event binding to this.addItem
//...
  initialize: function() {
    this.listenTo(this.collection, 'add', this.addItem);
  },
  render: function() {
    var self = this;
    var items = [];
    this.collection.each(function(item) {
      items.push(self.buildItemView(item).render().el);
    });
    this.$el.html(items);
    return this;
  },
  addItem: function(item) {
    var $view = this.buildItemView(item).render().$el;
    this.$el.append($view.hide().fadeIn());
  },
  buildItemView: function(item) {
    return new ListItemView({model: item});
  }
//...

The problem now is that we’re using jQuery’s append, which adds the item view to the end of the list, negating the work of the comparator in our Backbone collection. What we need now is a way to insert the new item into the list at the correct index. For that we’ll need to add an insertAt method to jQuery. This new method will take an index and an element, and it will place the element among the list’s children at the correct index.

$.fn.extend({
  insertAt: function(index, element) {
    // jQuery 3.0 removed .size(); .length is the equivalent property
    var lastIndex = this.children().length;
    if(index < lastIndex) {
      this.children().eq(index).before(element);
    } else {
      this.append(element);
    }
    return this;
  }
});

Now we can update our addItem method to calculate the index of the new item and then add it into the list at that index.

//...
  addItem: function(item) {
    // Get the index of the newly added item
    var index = this.collection.indexOf(item);
    // Build a view for the item
    var $view = this.buildItemView(item).render().$el;
    // Insert the view at the same index in the list
    this.$el.insertAt(index, $view.hide().fadeIn());
  }
//...

Jun 30, 2015 tech, javascript

The Lord's Prayer Compared

I have used green for text that is in one passage but not the other and orange for text that is different between the two passages.

Luke 11:1-13
11 Now Jesus was praying in a certain place, and when he finished, one of his disciples said to him, “Lord, teach us to pray, as John taught his disciples.”
And he said to them, “When you pray, say:
“Father, hallowed be your name.
Your kingdom come.
Give us each day our daily bread,
and forgive us our sins,
for we ourselves forgive everyone who is indebted to us.
And lead us not into temptation.”
And he said to them, “Which of you who has a friend will go to him at midnight and say to him, ‘Friend, lend me three loaves, for a friend of mine has arrived on a journey, and I have nothing to set before him’; and he will answer from within, ‘Do not bother me; the door is now shut, and my children are with me in bed. I cannot get up and give you anything’? I tell you, though he will not get up and give him anything because he is his friend, yet because of his impudence he will rise and give him whatever he needs.

And I tell you, ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you. 10 For everyone who asks receives, and the one who seeks finds, and to the one who knocks it will be opened. 11 What father among you, if his son asks for a fish, will instead of a fish give him a serpent; 12 or if he asks for an egg, will give him a scorpion? 13 If you then, who are evil, know how to give good gifts to your children, how much more will the heavenly Father give the Holy Spirit to those who ask him!”

Matthew 6:9-14; 7:7-11
Pray then like this:
Our Father in heaven,
hallowed be your name.
10  Your kingdom come,
your will be done,
on earth as it is in heaven.
11  Give us this day our daily bread,
12  and forgive us our debts,
as we also have forgiven our debtors.
13  And lead us not into temptation,
but deliver us from evil.
14 For if you forgive others their trespasses, your heavenly Father will also forgive you, 15 but if you do not forgive others their trespasses, neither will your Father forgive your trespasses.
“Ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you. For everyone who asks receives, and the one who seeks finds, and to the one who knocks it will be opened. Or which one of you, if his son asks him for bread, will give him a stone? 10 Or if he asks for a fish, will give him a serpent? 11 If you then, who are evil, know how to give good gifts to your children, how much more will your Father who is in heaven give good things to those who ask him!
May 27, 2015 theology

Tip: View the SQL query behind psql commands

If you want to view the SQL query that psql runs to produce the output of a backslash command (which will help you learn the underlying system catalogs), type \set ECHO_HIDDEN in your session. Starting psql with the -E option has the same effect.

$ psql test
psql (9.4.1)
Type "help" for help.

test=# \set ECHO_HIDDEN
test=# \dt
********* QUERY **********
SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************

       List of relations
 Schema | Name | Type  | Owner
--------+------+-------+--------
 public | temp | table | andrew
(1 row)
May 15, 2015 tips, postgresql, tech

Unique constraint across two rows in PostgreSQL

I recently had a requirement where I needed an account to have zero, one or two actions associated with it. One could be a single action and the other could be one of many repeating types. I didn’t want two single actions and I didn’t want two or more types of repeating actions. To solve this I used two partial indexes to split the data set and apply a unique constraint to each set.

CREATE TABLE accounts (
  id   integer NOT NULL,
  name text    NOT NULL
);

CREATE TABLE actions (
  id          integer NOT NULL,
  account_id  integer NOT NULL,
  repeat_type text    NOT NULL DEFAULT 'none'
);

INSERT INTO accounts (id, name) VALUES (1, 'Test 1'), (2, 'Test 2');

If I create a unique index on actions(account_id) then I will only be able to have a single action per account.

CREATE UNIQUE INDEX idx_unique_accounts ON actions(account_id);

INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'none');
-- INSERT 0 1
INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'weekly');
-- ERROR:  duplicate key value violates unique constraint "idx_unique_accounts"
-- DETAIL:  Key (account_id)=(1) already exists.

DROP INDEX idx_unique_accounts;

The solution is to create two partial indexes, one for the single action and one for the repeating action.

TRUNCATE TABLE actions;
CREATE UNIQUE INDEX idx_unique_single_actions    ON actions(account_id) WHERE (repeat_type = 'none');
CREATE UNIQUE INDEX idx_unique_repeating_actions ON actions(account_id) WHERE (repeat_type != 'none');

INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'none');
-- INSERT 0 1
INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'weekly');
-- INSERT 0 1

Now inserting another single action will result in an error.

INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'none');
-- ERROR:  duplicate key value violates unique constraint "idx_unique_single_actions"
-- DETAIL:  Key (account_id)=(1) already exists.

Inserting another repeating action, even of a different repeat type, will also result in an error.

INSERT INTO actions (id, account_id, repeat_type) VALUES (1, 1, 'monthly');
-- ERROR:  duplicate key value violates unique constraint "idx_unique_repeating_actions"
-- DETAIL:  Key (account_id)=(1) already exists.

May 15, 2015 tech, postgresql

Looping with Fibers

An overview of how Fibers work in Ruby

Fibers are code blocks that can be paused and resumed. They are unlike threads because they never run concurrently. The programmer is in complete control of when a fiber is run. Because of this we can create two fibers and pass control between them.

Control is passed to a fiber when you call Fiber#resume; the fiber returns control by calling Fiber.yield.

fiber = Fiber.new do
  Fiber.yield 'one'
  Fiber.yield 'two'
end
puts fiber.resume
#=> one
puts fiber.resume
#=> two

The above example shows the most common use case, where Fiber.yield is passed an argument which is returned through Fiber#resume. What’s interesting is that you can pass an argument into the fiber via Fiber#resume as well. The first call to Fiber#resume starts the fiber, and its argument goes to the block that creates the fiber; the arguments of all subsequent calls to Fiber#resume become the return value of the pending Fiber.yield.

fiber = Fiber.new do |arg|
  puts arg                   # prints 'one'
  puts Fiber.yield('two')    # prints 'three'
  puts Fiber.yield('four')   # prints 'five'
end
puts fiber.resume('one')     # prints 'two'
#=> one
#=> two
puts fiber.resume('three')   # prints 'four'
#=> three
#=> four
puts fiber.resume('five')    # prints nil because there's no corresponding yield and the fiber exits
#=> nil
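
As a further illustration of values travelling in both directions, here is a minimal sketch of a fiber that keeps a running total. The accumulator name and the numbers are made up for the example.

```ruby
# Each value passed to #resume comes out of Fiber.yield inside the fiber;
# each value passed to Fiber.yield comes out of #resume on the outside.
accumulator = Fiber.new do |first|
  total = first
  loop do
    # Hand the current total back, then pause until the next number arrives
    total += Fiber.yield(total)
  end
end

totals = [5, 10, 20].map { |n| accumulator.resume(n) }
puts totals.inspect
#=> [5, 15, 35]
```

The fiber never finishes on its own here; it just stays paused at Fiber.yield until someone resumes it again.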

Armed with this information, we can set up two fibers and get them to communicate with each other.

require 'fiber'

fiber2 = nil
fiber1 = Fiber.new do
  puts fiber2.resume     # start fiber2 and print first result (1)
  puts fiber2.resume 2   # send second number and print second result (3)
  fiber2.resume 4        # send fourth number, print nothing and exit
end
fiber2 = Fiber.new do
  puts Fiber.yield 1     # send first number and print returned result (2)
  puts Fiber.yield 3     # send third number, print returned result (4) and exit
end
fiber1.resume            # start fiber1
#=> 1
#=> 2
#=> 3
#=> 4
puts "fiber1 done" unless fiber1.alive?
#=> fiber1 done
puts "fiber2 done" unless fiber2.alive?
#=> fiber2 done

EachGroup module

Knowing we can send information between two fibers with alternating calls of Fiber#resume and Fiber.yield, we have the building blocks to tackle a streaming #each_group method. Tip: The fiber you first call #resume on should always call #resume on the fiber it is communicating with. The other fiber then always calls Fiber.yield. This goes against the natural inclination to pass information with Fiber.yield as in the first example above. Because of how the two fibers are set up below, you’ll see that no information is passed with Fiber.yield; information is only passed using Fiber#resume. Confusing, I know.

# -*- coding: utf-8 -*-
require 'fiber'

module EachGroup
  def each_group(*fields, &block)
    grouper = Grouper.new(*fields, &block)
    loop_fiber = Fiber.new do
      each do |result|
        grouper.process_result(result)
      end
    end
    loop_fiber.resume
  end

  class Grouper
    def initialize(*fields, &block)
      @current_group = nil
      @fields = fields
      @block = block
    end
    attr_reader :fields, :block
    attr_accessor :current_group

    def process_result(result)
      group_fiber = get_group_fiber(result)
      group_fiber.resume(result) if group_fiber.alive?
    end

    private
    def get_group_fiber(result)
      group_value = fields.map{|f| result.public_send(f) }
      unless current_group == group_value
        self.current_group = group_value
        create_group_fiber(result, group_value)
      end
      @group_fiber
    end

    def create_group_fiber(result, group_value)
      @group_fiber = Fiber.new do |first_result|
        group = Group.new(group_value)
        block.call(group)
      end
      @group_fiber.resume(nil) # Start the fiber and wait for its first yield
    end
  end

  class Group
    def initialize(value)
      @value = value
    end
    attr_reader :value

    def each(&block)
      while result = Fiber.yield
        block.call(result)
      end
    end
  end
end

Example Usage

#each_group requires input sorted for grouping.

require 'each_group'
require 'ostruct'

Array.send(:include, EachGroup)

array = [
  OpenStruct.new(year: 2014, month: 1, date: 1),
  OpenStruct.new(year: 2014, month: 1, date: 3),
  OpenStruct.new(year: 2014, month: 2, date: 5),
  OpenStruct.new(year: 2014, month: 2, date: 7),
]
array.each_group(:year, :month) do |group|
  puts group.value.inspect
  group.each do |obj|
    puts "  #{obj.date}"
  end
end
#=> [2014, 1]
#=>   1
#=>   3
#=> [2014, 2]
#=>   5
#=>   7

This code can be used with ActiveRecord as follows:

ActiveRecord::Relation.send(:include, EachGroup)

# Pass the same fields you ordered by so that consecutive rows are grouped
Model.order('year, month').each_group(:year, :month) do |group|
  group.each do |record|
    # ...
  end
end
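
For comparison, here is a sketch of the same grouping using Ruby’s built-in Enumerable#chunk. It also groups consecutive elements by key and so also needs sorted input, but it collects each group into an array before handing it over, which is the buffering that the fiber-based each_group avoids. Entry is a hypothetical stand-in for the OpenStruct records in the usage example.

```ruby
# Entry is a hypothetical stand-in for the OpenStruct records used earlier
Entry = Struct.new(:year, :month, :date)

records = [
  Entry.new(2014, 1, 1),
  Entry.new(2014, 1, 3),
  Entry.new(2014, 2, 5),
  Entry.new(2014, 2, 7),
]

# chunk groups consecutive elements that share a key and yields
# [key, array_of_elements] once the whole run has been collected
grouped = records.chunk { |r| [r.year, r.month] }.map do |key, entries|
  [key, entries.map(&:date)]
end

grouped.each do |key, dates|
  puts key.inspect
  dates.each { |d| puts "  #{d}" }
end
```

For small groups the difference hardly matters; it starts to matter when a single group is too large to hold in memory comfortably.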

I have uploaded a Gist that shows a previous iteration of the EachGroup module using a nested loop, which you may find easier to follow when working out how the fibers control the flow of the loop.

  1. The above code with a RSpec spec - https://gist.github.com/andrewtimberlake/9462561
  2. The original code with nested loops - https://gist.github.com/andrewtimberlake/9462561/f0e88cd310614a34693d57c3fc759f5c78e3a264

Thanks for taking the time to read through this. Explaining complicated concepts like Fibers is a challenge, so please leave a comment and let me know if this was helpful or if you still have any questions.

Mar 10, 2014 ruby, tech

How to Add Subscribers to a MailChimp List With Ruby

I’m working on an app that creates user accounts and (optionally) subscribes users to our mailing list. Because I’m handling user creation in my app, I need some way to add them to the mailing list which is hosted on MailChimp. To do this, I am using their API to send through subscriber information.

The documentation for the Ruby gem is not great. You have a few choices:

Below is some sample code that will get you started.

Install the mailchimp-api gem

> gem install mailchimp-api
# or
> echo 'gem "mailchimp-api", require: false' >> Gemfile
> bundle install

Get your MailChimp API Key

In MailChimp, go to your account settings page, click Extras and API Keys. If you don’t have an API key yet, click Create A Key.

Get your MailChimp list ID

Every list has a unique ID which is needed to add subscribers to the correct list. Go to Lists, click on your list name, then click Settings and List name & defaults. On the right you’ll see your List ID (a 10-character hex code).

The code

require 'mailchimp' # The gem name is mailchimp-api but you require mailchimp

module MailChimpSubscription
  # These should probably be environment variables or configuration variables
  MAIL_CHIMP_API_KEY = "0000000001234567890_us1"
  MAIL_CHIMP_LIST_ID = "abcdef1234"
  extend self

  def subscribe(user)
    mail_chimp.lists.subscribe(MAIL_CHIMP_LIST_ID,
                               # The email field is a struct that can use an
                               #    email address or two MailChimp specific list ids (see API docs)
                               {email: user.email},
                               # Set your merge vars here
                               {'FNAME' => user.first_name, 'LNAME' => user.last_name})
  rescue Mailchimp::ListAlreadySubscribedError
    # Decide what to do if the user is already subscribed
  rescue Mailchimp::ListDoesNotExistError => e
    # This is definitely a problem I want to know about
    raise e
  rescue Mailchimp::Error => e
    # Unforeseen errors that need to be dealt with
  end

  private
  def mail_chimp
    @mail_chimp ||= Mailchimp::API.new(MAIL_CHIMP_API_KEY)
  end
end

To use this module, you pass in a user object that responds to #email, #first_name and #last_name

require 'ostruct'

# Placeholder address; substitute a real subscriber's email
user = OpenStruct.new(email: 'john.doe@example.com', first_name: 'John', last_name: 'Doe')
MailChimpSubscription.subscribe(user)

Final thoughts

It’s probably a good idea to put mailing list subscription into a background job so that you don’t slow down your user creation response time. You can also handle transient errors, retry failed attempts etc.
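
As a rough sketch of the retry idea, here is a small helper in plain Ruby. Everything in it is hypothetical: with_retries, its parameters and TransientError are names I made up for illustration, and a real background job library will usually give you its own retry mechanism. Which of the gem’s errors count as transient is something you would need to decide for yourself.

```ruby
# Hypothetical helper: retries the block when it raises one of the listed
# error classes, doubling the wait before each new attempt.
def with_retries(attempts: 3, base_delay: 0.5, on: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts # give up and let the last error propagate
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in for a network error; not a class from the mailchimp-api gem
class TransientError < StandardError; end

calls = 0
result = with_retries(attempts: 3, base_delay: 0.01, on: [TransientError]) do |attempt|
  calls += 1
  # Simulate a call that fails twice and then succeeds
  raise TransientError, 'timeout' if attempt < 3
  :subscribed
end
puts "#{result} after #{calls} calls"
#=> subscribed after 3 calls
```

In the job itself, the block would call MailChimpSubscription.subscribe(user) instead of raising a simulated error.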

Feb 12, 2014 ruby, api, tech

Building my blog in Middleman

Installing Middleman

Adding extensions

middleman-blog middleman-syntax redcarpet

GitHub source code coloring

wget https://github.com/richleland/pygments-css/raw/master/github.css
def some_code
end
Dec 10, 2013 ruby, tech

Potential security hole authorising modules in CanCan

I got a message from a client this morning telling me that all users could see all reports on our product. Not good. I use CanCan to manage permissions and until now it has served me well. What went wrong? Whether a bug or not, I discovered that a very recent change I made had opened up the hole.

I wanted to have a permission setting that could prevent anyone from seeing any reports as well as more fine grained control over each individual report. My permissions looked a bit like this:

class Ability
  include CanCan::Ability

  def initialize(user)
    can :read, Reports
    can :read, Reports::ReportA
  end
end

When checking permissions for another report within the module, I didn’t expect this:

module Reports
  class ReportBController
    def show
      authorize! :read, Reports::ReportB #=> I assumed it would not be authorized but it is
      ...
    end
  end
end

What I didn’t expect is that when you authorise a module, all classes in that namespace are authorised as well. As I mentioned above, I don’t know if this is by design or not. Some quick googling didn’t help me so I changed my code for a quick solution.

I post this to warn others who may have made the same assumption. If you’re reading this and know the project better and can point out if it is a bug or feature, please let me know in the comments.

Nov 7, 2013 ruby, ruby-on-rails, tech