Tuesday, August 18, 2009

Searching a Hash with Regular Expressions

You want to grep a hash: that is, find all keys and/or values in the hash that match a regular expression.

The fastest way to grep the keys of a hash is to get the keys as an array, and grep that:

h = { "apple tree" => "plant", "ficus" => "plant",
"shrew" => "animal", "plesiosaur" => "animal" }
h.keys.grep /p/
# => ["apple tree", "plesiosaur"]

The solution of grepping the values of a hash is similar (substitute Hash#values for Hash#keys), unless you need to map the values back to the keys of the hash. If that's what you need, the fastest way is to use Hash#each to get key-value pairs, and match the regular expression against each value.

h.inject([]) { |res, kv| res << kv if kv[1] =~ /p/; res }
# => [["ficus", "plant"], ["apple tree", "plant"]]

The code is similar if you need to find key-value pairs where either the key or the value matches a regular expression:

 class Hash
      def grep(pattern)
          inject([]) do |res, kv|
           res << kv if kv[0] =~ pattern or kv[1] =~ pattern
           res
       end
     end
   end

h.grep(/p/)
# => [["ficus", "plant"], ["apple tree",plant"], ["plesiosaur", "animal"]]
h.grep(/plant/) # => [["ficus", "plant"], ["apple tree", "plant"]]
h.grep(/i.*u/) # => [["ficus", "plant"], ["plesiosaur", "animal"]]


Hash defines its own grep method,but it will never give you any results. Hash#grep is inherited from Enumerable#grep, which tries to match the output of each against the given regular expression. Hash#each returns a series of two-item arrays containing key value pairs, and an array will never match a regular expression. The Hash#grep implementation above is more useful.

Hash#keys.grep and Hash#values.grep are more efficient than matching a regular expression against each key or value in a hash. If memory usage is your primary concern, iterate over each_key or each_value instead:

res = []
h.each_key { |k| res << k ik k =~ /p/ }
res # => ["apple tree", "plesiosaur"]

No comments:

Post a Comment