Ruby one-liners for file manipulations

When I started using Ruby back in 2003 it was for scripting my Linux box. I didn’t even thought of using it for something else than scripting and system programming.

At this time I evaluated Perl, Python and Ruby. After a while I chose Ruby as I thought it was the cleanest and more enjoyable language out of the three to write scripts. Did I thought that I would use it for web development for the next fifteen years? Not even a second. I chose it to ease my daily job on my Linux system.

Today I’d like to share some tips I learned by using Ruby everyday for scripting. It’s going to be a bunch of one-liners that can be useful to manipulate files and get info out of it.

Sure you can use Perl, Awk / Sed, Python to do the same thing, but I like to do it using Ruby.

Ready? Fasten your belt, we’re going to take off.

In the following example I’m going to use /usr/share/dict/words as an input example since most of us have it but you can use the same one-liners on any file.

Working with newlines in files

In this section, we’re going to see how you can handle spaces in your text files.

Double newlines in a file

The idea is to add an empty line between each line so if your file is composed of one word per line, it will end up with a word, an empty line, a word and so on:

$ cat /usr/share/dict/words | ruby -pe 'puts'

The p switch tells Ruby to iterate through all lines and the e switch tells what to do on each one.

So here we print the line then add an empty one.

Triple newlines in a file

Now let’s say we want to add two empty lines between each word:

$ cat /usr/share/dict/words | ruby -pe '2.times { puts }'

Remove double newlines in a file

If you have a file with two newlines after each line, you’ll maybe want to strip the extra newline. There’s more than one way to do it:

$ cat double_newlines.txt | ruby -lne 'BEGIN{$/="\n\n"}; puts $_'

The l switch will use the value of $/ to chop! it (understand remove it) on each line for us.

If we don’t provide the l flag, we have to remove the double newlines by ourselves:

$ cat double.txt | ruby -ne 'BEGIN{$/="\n\n"}; puts $_.chop!'

Add a blank line every five lines

Now we’re going to add an empty line every five lines:

$ cat /usr/share/dict/words | /usr/bin/ruby -pe 'puts if $. % 6 == 0'

Here $. is the number of the current line we’re are processing. So if the line number modulo six is equal to zero then we add a new blank line.

Numbering and counting lines

A thing you’ll often want to do when writing scripts dealing with text files is to print the lines numbers or count the number of lines in the file. Here is how to do it using Ruby.

Number each line (left justified)

$ cat /usr/shar/dict/words | ruby -ne 'printf("%-6s %s", $., $_)'

This will add the line number on the left side, a space then the word. The line number will be left justified with a six characters pad.

Number each line (right justified)

$ cat /usr/shar/dict/words | ruby -ne 'printf("%6s %s", $., $_)'

This will add the line number on the left side, a space then the word. The line number will be right justified with a six characters pad.

Count lines

Not that this is not very effective, there are other (non one-liners to be used in the term) that will be much more faster.

$ cat /usr/share/dict/words | ruby -ne 'END { puts $. }' 

For this file that is near 250k lines it takes 140ms which is still pretty fast to me.

Converting newlines format (DOS / Unix)

This is something I used to do a lot when I still handled files coming from Microsoft world. I had to change the text file so that the Microsoft newline format (\r\n) was converted to Unix newline format (\n):

Convert DOS newlines (CR/LF) to Unix format (LF)

$ cat /usr/share/dict/words | ruby -lne 'BEGIN{$\="\n"}; print $_'

Convert Unix newlines (LF) to DOS format (CR/LF)

$ cat /usr/share/dict/words | ruby -lne 'BEGIN{$\="\r\n"}; print $_'

Deleting white spaces

Now we’re going to deal with deleting unwanted spaces.

Leading white spaces (space, tab, …)

You’ll sometimes have text files with lines beginning with spaces that you want to remove. Here how to do it:

$ cat leadings_whitespaces.txt | ruby -pe 'gsub(/^\s+/, "")'

We’re substituting everything that is understand as a white space by nothing so there are gone.

Trailing white spaces

You should also want to be able to do the same substitution for trailing white spaces:

$ cat trailing_whitespaces.txt | ruby -pe 'gsub(/\s+$/, $/)'

Leading and trailing white spaces

At some point you’ll maybe want to remove leading and trailing whither spaces from each line:

$ cat leading_and_trailing_whitespaces.txt | ruby -pe 'gsub(/^\s+/, "").gsub(/\s+$/, $/)'

Handling indentation

If you’re dealing with a lot of text files you’ll probably want to fix some indentation issues. Here are some tips.

Insert 4 spaces at the beginning of each line

Don’t know why you would want to do this but still 😆

$ cat /usr/share/dict/words | ruby -pe 'gsub($_, "    #{$_}")'

Align all text flush right on a 79 columns width

This one is more useful:

$ cat /usr/share/dict/words | ruby -ne 'printf("%79s", $_)'

Center all text in middle of 79 columns width

$ cat /usr/share/dict/words | ruby -lne 'puts $_.center(79)'

And now every line of text is centered!

Substitution

A common need when it comes to text handling is to change something bu something else, let’s see how to do it in one line.

Find and replace

Let’s say we want to change “foo” by “bar” in a text file:

$ cat foo_file.txt | ruby -pe 'tr("foo", "bar")'

Replace only for some lines

Maybe you don’t want to replace every occurrences but only the ones that are on a line that includes “baz”:

$ cat file | ruby -pe 'tr("foo", "bar") if $_ =~ /baz/'

Replace except for some lines

Maybe now you want to replace every occurrences for lines that doesn’t include “baz”:

$ cat file | ruby -pe 'tr("foo", "bar") unless $_ =~ /baz/'

Replace some words by another one

Let’s say you’re a demanding one and you want to be able to change “foo”, “bar” or “baz” by “Bounga”:

$ cat file | ruby -pe 'gsub(/(foo|bar|baz)/, "Bounga")'

Reverse things

Sometimes you’ll have to reverse input, here are some examples.

Reverse order of lines

This is a classic one, for some reason you would want to read the file in reverse order:

$ cat /usr/share/dict/words | ruby -ne 'BEGIN{@arr=Array.new}; @arr.push($_); END{puts @arr.reverse}'

Reverse character

Maybe you’ll want to reverse character of words in every lines:

$ cat /usr/share/dict/words | ruby -lne 'puts $_.reverse'

Joining

Pairs lines side by side

Let’s say you have a file full of words just like /usr/share/dict/words and you want to pair words by 2. Here is a way to do it with a Ruby one-liner:

$ cat /usr/share/dict/words | ruby -pe '$_ = $_.chomp + " " + gets if $. % 2'

Interpret backslash as an append operator

If you’re used to shell, you maybe know that you can split your command on multiple lines by using a backslash. Here’s how to interpret such splitting using Ruby:

$ cat file | ruby -pe 'while $_.match(/\\$/); $_ = $_.chomp.chop + gets; end'

Appending to previous line

Now you want to level up your game by allowing your user to use an equal sign in the beginning of a line to happen the statement to the previous line:

$ cat file | ruby -e 'puts STDIN.readlines.to_s.gsub(/\n=/, "")'

Selective printing

Emulate head behavior

Let’s print the first line of a file:

$ cat file | ruby -pe 'puts $_; exit'

Now we’ll print the first ten lines:

$ cat file | ruby -pe 'exit if $. > 10'

Emulate tail behavior

Now we’re gonna print the last line of a file:

$ cat file | ruby -ne 'line = $_; END {puts line}'

Now we’ll print the first ten lines:

$ cat file | ruby -e 'puts STDIN.readlines.reverse!.slice(0,10).reverse!'

Once again this one isn’t very effective. It’s ok for small files (hundred thousand of lines) but we’re parsing the whole file only to display latest lines.

Match regexp

Print lines that match a regexp only

$ cat file | ruby -pe 'next unless $_ =~ /regexp/'

Print lines that do not match a regexp

$ cat file | ruby -pe 'next if $_ =~ /regexp/'

Print the line immediately before a regexp

$ cat file | ruby -ne 'puts @prev if $_ =~ /regex/; @prev = $_;'

Print the line immediately after a regexp

$ cat file | ruby -ne 'puts $_ if @prev =~ /regex/; @prev = $_;'

Emulating grep

Grep lines with matching terms in any order

Here’s how to print lines that match foo, bar and baz:

$ cat file | ruby -pe 'next unless $_ =~ /foo/ && $_ =~ /bar/ && $_ =~ /baz/'

Grep lines with matching terms in order

Now let’s do the same but respecting the order:

$ cat file | ruby -pe 'next unless $_ =~ /foo.*bar.*baz/'

Grep lines with any term matching

Now we want print each line matching any of the terms specified/

$ cat file | ruby -pe 'next unless $_ =~ /(foo|bar|baz)/'

Printing paragraphs

Print paragraph if it contains regexp

$ cat file | ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /regexp/'

Print paragraph if it contains `foo` and `bar` and `baz` in any order

$ cat file | ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /foo/ && $_ =~ /bar/ && $_ =~ /baz/'

Print paragraph if it contains `foo` and `bar` and `baz` in order

$ cat file | ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /(foo.*bar.*baz)/'

Print paragraph if it contains `foo` or `bar` or `baz`

$ cat file | ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /(foo|bar|baz)/'

Print based on line length

Print only lines of 65 characters or greater

$ cat file | ruby -lpe 'next unless $_.length >= 65'

Print only lines of 65 characters or less

$ cat file | ruby -lpe 'next unless $_.length < 65'

Print based on line numbers

Print section of file based on line numbers (eg. lines 2-7 inclusive)

$ cat file | ruby -pe 'next unless $. >= 2 && $. <= 7'

Print line number 52

$ cat file | ruby -pe 'next unless $. == 52'

Print every 3rd line starting at line 4

$ cat file | ruby -pe 'next unless $. >= 4 && $. % 3 == 0'

Print line based on regexp

Print section of file from regex to end of file

$ cat file | ruby -pe '@found=true if $_ =~ /regex/; next unless @found'

Print section of file between two regular expressions, /foo/ and /bar/

$ cat file | ruby -ne '@found=true if $_ =~ /foo/; next unless @found; puts $_; exit if $_ =~ /bar/'

Print all the file except between two regular expressions, /foo/ and /bar/

$ cat file | ruby -ne '@found = true if $_ =~ /foo/; puts $_ unless @found; @found = false if $_ =~ /bar/'

Removing duplicates

Print file and remove duplicate, consecutive lines from a file

$ cat file | ruby -ne 'puts $_ unless $_ == @prev; @prev = $_'

Print file and remove duplicate, non-consecutive lines from a file

$ cat file | ruby -e 'puts STDIN.readlines.sort.uniq!.to_s'

Delete all consecutive blank lines from a file except the first

$ cat file | ruby -e 'BEGIN{$/=nil}; puts STDIN.readlines.to_s.gsub(/\n(\n)+/, "\n\n")'

Delete all leading blank lines at top of file

$ cat file | ruby -pe '@lineFound = true if $_ !~ /^\s*$/; next if !@lineFound'

Selective deleting

Print file except for first 10 lines

$ cat file | ruby -pe 'next if $. <= 10'

Print file except for last 10 lines

$ cat file | ruby -e 'lines=STDIN.readlines; puts lines [0,lines.size-10]'

Print file except for every 8th line

$ cat file | ruby -pe 'next if $. % 8 == 0'

Print file except for blank lines

$ cat file | ruby -pe 'next if $_ =~ /^\s*$/'

Conclusion

Who said that Perl was the only language to deal with file content manipulation?!

Share on

Twitter Facebook LinkedIn

Have comments or want to discuss this topic?

Send an email to ~bounga/public-inbox@lists.sr.ht