Linux Articles, April 11, 2012

A bit of fun with cut, paste, join and sort in Linux

1. cut – Removes sections from each line of files.

Prints selected parts of lines from each file to standard output.
The default field delimiter is the tab character.

Imagine we’ve got a file file_numbers.txt, which consists of three tab-separated columns:

col_1 col_2 col_3   
one two three   
four five six   
seven eight nine   
ten eleven twelve

In the following example cut will return only the second column (col_2):

cut -f2 file_numbers.txt   
col_2 
two 
five 
eight 
eleven

In the next example cut will return columns 1 and 3:

cut -f1,3 file_numbers.txt   
col_1 col_3 
one three 
four six 
seven nine 
ten twelve
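If you want to try this yourself, keep in mind that the columns must be separated by real tab characters, not spaces. The following is a minimal sketch, assuming a throwaway temp file (the variable names are only for illustration) with a shortened copy of the sample data. It also shows a field range (-f2-3), which works the same way as a comma-separated list.

```shell
# Build a small tab-separated sample (shortened copy of file_numbers.txt).
# The temp file and the variable names are illustrative only.
sample=$(mktemp)
printf 'col_1\tcol_2\tcol_3\n' >  "$sample"
printf 'one\ttwo\tthree\n'     >> "$sample"
printf 'four\tfive\tsix\n'     >> "$sample"

# Fields 1 and 3; cut separates them with a tab in the output as well.
first_and_third=$(cut -f1,3 "$sample")

# A range: fields 2 through 3.
second_to_third=$(cut -f2-3 "$sample")

printf '%s\n' "$first_and_third"
printf '%s\n' "$second_to_third"

rm -f "$sample"
```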

There are a few more options described in the man page. Let’s take a look at two of them:

-c, --characters=LIST select only these characters

-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter

In the examples below we will use “;” and “:” as delimiters (-d ";" and -d ":").

NOTE: an input delimiter may be specified only when operating on fields

We’ve got the following file: fileone.txt

01234:567;89 
ABCDE:FGH;IJ

Display everything in the first field, before the delimiter “;”:

cut -d ";" -f1 fileone.txt   
01234:567 
ABCDE:FGH

Display everything in the second field, after the delimiter “;”:

cut -d ";" -f2 fileone.txt   
89 
IJ

And again, but this time using “:” as the delimiter:

cut -d ":" -f1 fileone.txt   
01234 
ABCDE   

cut -d ":" -f2 fileone.txt   
567;89 
FGH;IJ
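Since cut accepts only one delimiter at a time, a field that sits between two different delimiters can be pulled out by chaining two cut calls. This is just a sketch on a single sample line in the same shape as fileone.txt; the variable names are made up for the example.

```shell
# One sample line in the same shape as fileone.txt.
line='01234:567;89'

# Step 1: keep the part before ";"  -> 01234:567
# Step 2: keep the part after ":"   -> 567
middle=$(printf '%s\n' "$line" | cut -d ';' -f1 | cut -d ':' -f2)

printf '%s\n' "$middle"
```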

In the examples below we will use option -c.

This time fileone.txt contains:

0123456789 
ABCDEFGHIJ

To display only the first character of each line:

cut -c1 fileone.txt   
0 
A

To display only the second character:

cut -c2 fileone.txt   
1 
B

To return only the third character:

cut -c3 fileone.txt   
2 
C

To return only the eighth character:

cut -c8 fileone.txt   
7 
H

To display a range (first to third character):

cut -c1-3 fileone.txt   
012 
ABC

To display the fifth character and everything after it, up to the end of the line:

cut -c5- fileone.txt   
456789 
EFGHIJ
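Single positions, closed ranges and open-ended ranges can also be mixed in one comma-separated LIST. A small sketch on one line of the sample data (the variable names are only for illustration):

```shell
line='0123456789'

# Characters 1-3 plus everything from character 8 to the end of the line.
picked=$(printf '%s\n' "$line" | cut -c1-3,8-)

# An open start: everything up to and including character 4.
head_part=$(printf '%s\n' "$line" | cut -c-4)

printf '%s\n' "$picked"     # 012789
printf '%s\n' "$head_part"  # 0123
```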

 

2. paste – Merges lines of files.

Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output.

Let’s create two files

file_one.txt

01 Washington 
02 Sydney 
03 Warsaw 
04 Amsterdam 
05 Berlin

and file_two.txt

01 USA 
02 Australia 
03 Poland 
04 Netherlands 
05 Germany

 

Examples

paste file_one.txt file_two.txt   
01 Washington 01 USA 
02 Sydney 02 Australia 
03 Warsaw 03 Poland 
04 Amsterdam 04 Netherlands 
05 Berlin 05 Germany

Using a different delimiter with -d:

paste -d "," file_one.txt file_two.txt   
01 Washington,01 USA 
02 Sydney,02 Australia 
03 Warsaw,03 Poland 
04 Amsterdam,04 Netherlands 
05 Berlin,05 Germany

With option -s (--serial), paste handles one file at a time instead of in parallel, so each file ends up on a single line:

paste -s file_one.txt file_two.txt   
01 Washington 02 Sydney 03 Warsaw 04 Amsterdam 05 Berlin 
01 USA 02 Australia 03 Poland 04 Netherlands 05 Germany
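The -s and -d options can also be used together, which is a handy way to turn a column of values into one delimited line. A minimal sketch, assuming a throwaway temp file with three city names (file and variable names are only for illustration):

```shell
# A short one-column file; the temp file name is illustrative only.
cities=$(mktemp)
printf '%s\n' Washington Sydney Warsaw > "$cities"

# -s puts all lines of the file on one line, -d replaces the tab with a comma.
one_line=$(paste -s -d ',' "$cities")

printf '%s\n' "$one_line"   # Washington,Sydney,Warsaw

rm -f "$cities"
```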

The example below takes the input from ls and pastes it into three columns (one column per “-”).

ls | paste - - -   
file_numbers_1.txt file_numbers.txt file_one.txt 
fileone.txt file_two.txt filetwo.txt
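The output of ls depends on the directory you are in, so here is a reproducible sketch of the same idea with fixed input. Each “-” tells paste to read one more line from standard input, so the number of dashes sets the number of columns. The variable names are made up for the example.

```shell
# Six lines of fixed input instead of the output of ls.
names=$(printf '%s\n' a b c d e f)

# Three dashes: three columns, two rows.
three_cols=$(printf '%s\n' "$names" | paste - - -)

# Two dashes: two columns, three rows.
two_cols=$(printf '%s\n' "$names" | paste - -)

printf '%s\n' "$three_cols"
printf '%s\n' "$two_cols"
```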

 

3. join – Joins lines of two files on a common field.

For each pair of input lines with identical join fields, write a line to standard output.
The default join field is the first, delimited by white space. When FILE1 or FILE2 (not both) is -, read standard input.

Once again we have two example files:

file_one.txt

01 Washington 
02 Sydney 
03 Warsaw 
04 Amsterdam 
05 Berlin

file_two.txt

01 USA 
02 Australia 
03 Poland 
04 Netherlands 
05 Germany

 

NOTE: Both files share the same keys (01, 02 … 05) in the first field.

join file_one.txt file_two.txt   
01 Washington USA 
02 Sydney Australia 
03 Warsaw Poland 
04 Amsterdam Netherlands 
05 Berlin Germany

The example above is simple because both files contain the same keys in the same sorted order.

Let’s add one line into file_two.txt

file_two.txt

01 USA 
02 Australia 
03 Poland 
06 Sweden 
04 Netherlands 
05 Germany

Let’s join two files again and see what happens

join file_one.txt file_two.txt   
01 Washington USA 
02 Sydney Australia 
03 Warsaw Poland   

join: file 2 is not in sorted order

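Only the rows up to 03 have been joined. join expects both files to be sorted on the join field: after 03 it reads 06 from file_two.txt, finds no match in file_one.txt, and by the time 04 and 05 appear in file_two.txt it has already moved past them in file_one.txt.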

Let’s get back to the original file_two.txt and try the following trick:

tac file_two.txt   
05 Germany 
04 Netherlands 
03 Poland 
02 Australia 
01 USA   

tac file_two.txt > file_three.txt   

join file_one.txt file_three.txt 
join: file 2 is not in sorted order   

05 Berlin Germany

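Only 05 Berlin Germany has been returned. join walks through both files from top to bottom and never goes back: it skips lines 01 to 04 of file_one.txt because they are smaller than 05, the first key in file_three.txt, matches 05, and then reaches the end of file_one.txt.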

TIP: It’s a good idea to sort both files on the join field before joining them.
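The tip can be sketched as follows, assuming two small files in a temporary directory (all file and variable names here are only for illustration). LC_ALL=C is set so that sort and join agree on the same byte-wise ordering; a plain sort of the whole line is enough here because the keys are all two digits wide.

```shell
# Throwaway working directory with shortened copies of the sample files.
workdir=$(mktemp -d)
printf '%s\n' '01 Washington' '02 Sydney' '03 Warsaw' > "$workdir/cities.txt"
printf '%s\n' '03 Poland' '01 USA' '02 Australia'     > "$workdir/countries.txt"

# Sort both inputs first, using the same locale that join will use.
LC_ALL=C sort "$workdir/cities.txt"    > "$workdir/cities.sorted"
LC_ALL=C sort "$workdir/countries.txt" > "$workdir/countries.sorted"

# Now every key is found, even though countries.txt was out of order.
joined=$(LC_ALL=C join "$workdir/cities.sorted" "$workdir/countries.sorted")

printf '%s\n' "$joined"

rm -rf "$workdir"
```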

4. sort – Sorts lines of text files.

Write sorted concatenation of all FILE(s) to standard output.

Example fileone.txt:

Washington 
Sydney 
Warsaw 
Amsterdam 
Berlin 
1
3
2
10
12
11

Let’s sort that file:

sort fileone.txt   
1
10
11
12
2
3
Amsterdam
Berlin
Sydney
Warsaw
Washington

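The city names have been sorted properly. The numbers are sorted too, but alphabetically rather than numerically, which is why 10 comes before 2.

To sort the numbers properly we can apply option -n, --numeric-sort (compare according to string numerical value). Lines that do not start with a number count as 0, so the city names come first: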

sort -n fileone.txt   
Amsterdam 
Berlin 
Sydney 
Warsaw 
Washington 
1 
2 
3 
10 
11 
12

Useful sort options:

-d, --dictionary-order, consider only blanks and alphanumeric characters
-f, --ignore-case, fold lower case to upper case characters
-i, --ignore-nonprinting, consider only printable characters
-n, --numeric-sort, compare according to string numerical value
-r, --reverse, reverse the result of comparisons
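A short sketch of three of these options on fixed input, so the results are easy to check. The variable names are made up for the example, and LC_ALL=C is set only to keep the ordering the same on every system.

```shell
numbers=$(printf '%s\n' 1 3 2 10 12 11)

# Default: compared as text, so "10" lands before "2".
as_text=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: compared by numeric value.
as_numbers=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -r combined with -n: numeric order, largest first.
descending=$(printf '%s\n' "$numbers" | LC_ALL=C sort -rn)

# -f: lower case is folded to upper case before comparing.
unfolded=$(printf '%s\n' amsterdam Berlin | LC_ALL=C sort)
folded=$(printf '%s\n' amsterdam Berlin | LC_ALL=C sort -f)

printf '%s\n' "$as_text" "$as_numbers" "$descending" "$unfolded" "$folded"
```

Without -f the upper-case “Berlin” sorts before the lower-case “amsterdam” in the C locale; with -f the two names come out in alphabetical order.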

 

That’s all folks.

Cheers!!