Sort text files with multiple lines as a row

Question

I have a text file in this format:

####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

I want to sort this file by the KEY line and keep the next 4 lines with it, so the sorted result should be:

####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

Is there a way to do this?

@Zanna: I think there is an exclusion for the unix and askubuntu sites, as these two overlap a lot with each other! I think I read about this in unix's meta section — RYN, 11 hours ago
relevant meta question asked here by AU mod :) How should questions cross-posted on Ask Ubuntu be handled? — Zanna, 11 hours ago

Peter Cordes · Accepted Answer · 2016-12-31 16:06:30Z

msort(1) was designed to sort files with multi-line records. It has an optional GUI, as well as a normal, usable-for-humans command-line version. (At least, for humans who like to read manuals carefully and look for examples...)

AFAICT, you can't use an arbitrary pattern as the record separator, so unless your records are fixed-size (in bytes, not characters or lines), the input needs some massaging first. msort does have a -b option for records that are blocks of lines separated by blank lines.

You can transform your input into a format that will work with -b pretty easily, by putting a blank line before every ###... (except the first one).

By default, it prints statistics on stderr, so at least it's easy to tell when it didn't sort because it thought the entire input was a single record.

msort works on your data. The sed command prepends a newline to every #+ line except for line 1. -w sorts the whole record (lexicographically). There are options for picking what part of a record to use as a key, but I didn't need them.

I also left out stripping the extra newlines.

$ sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null 
####################################
KEY1
VAL11
VAL12
VAL13
VAL14

####################################
KEY2
VAL21
VAL22
VAL23
VAL24

####################################
KEY3
VAL31
VAL32
VAL33
VAL34
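
The blank lines that -b needs are still present in the output above. A minimal sketch of stripping them afterwards, assuming the records themselves never contain empty lines (strip_blank_lines is a hypothetical helper name, not part of msort):

```shell
# Hypothetical helper: delete every empty line from stdin.
# Assumption: no record contains a legitimately empty line.
strip_blank_lines() {
    sed '/^$/d'
}

# Intended use, appended to the pipeline from the answer:
# sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null | strip_blank_lines
```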

I didn't have any luck with -r '#' to use that as the record separator. It thought the whole file was one record.

Thank you very much; msort is very useful. (About -r: it seems it fails because there is more than one #. I used -d and it worked.) — RYN, 6 hours ago

xhienne · Answer 2 · 2016-12-31 12:24:23Z

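A solution is to first change the line feeds inside a block to an unused character of your choice ('|' in the example below), then sort the result, and finally change the chosen separator back to the original line feed. Each N in the sed command appends the next input line, so five of them join the six lines of a block: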

sed -e 'N; N; N; N; N; s/\n/|/g' file.txt \
| sort -k2,2 -t\| \
| sed 's/|/\n/g'
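
The same join, sort, split idea can be sketched for blocks of any length by letting awk decide where a record starts. This is only an illustration under stated assumptions: sort_records is a hypothetical name, every block starts with a line beginning with '#', the first input line is such a delimiter, and the byte \001 never occurs in the data.

```shell
# Hypothetical helper: sort blocks of any length by their KEY line.
# Assumptions: blocks start with a line beginning with "#", the first
# input line is a delimiter, and the byte \001 is absent from the data.
sort_records() {
    sep=$(printf '\001')
    # awk joins each block into one line, using \001 between the lines.
    # sort then compares field 2, which is the KEY line.
    # tr turns the \001 separators back into line feeds.
    awk '
        /^#/ { if (NR > 1) printf "\n"; printf "%s", $0; next }
        { printf "\001%s", $0 }
        END { if (NR) printf "\n" }
    ' "$@" |
        LC_ALL=C sort -t "$sep" -k2,2 |
        tr '\001' '\n'
}

# Usage: sort_records file.txt
```

Using a control character instead of '|' makes a collision with real data less likely, though it is still an assumption about the input.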

Thanks; this works, but it is very hacky, especially when the data is messy too. If there were 100 lines after the key, I would need to put 100 N; commands there, and it can be difficult to find a character that is not used in the text itself. It would be very good if sort, awk, ... were able to do multi-line sorting. – RYN 11 hours ago

JJoao · Answer 3 · 2016-12-31 21:43:27Z

perl -0ne 'print sort /(#+[^#]*)/g' file.txt

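perl -0 sets the input record separator to NUL, so a text file without NUL bytes is read in one go (-0777 is the explicit slurp mode)
/(....)/g matches and extracts the records
print sort ... sorts and prints them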

don_crissti · Answer 4 · 2016-12-31 15:54:08Z

Here's another way that should work with any number of lines in a KEY section:

# extract delimiter
delim=$(head -n1 <infile)
sed '/#/d;/KEY/h;G;s/\n/\x02/' infile | nl -ba -nrz -s $'\002' | sort -t $'\002' -k3 -k1,1 |
cut -d $'\002' -f2 | sed '/KEY/{x;s/.*/'"${delim}"'/;G}'

This works by saving the delimiter into a variable (so that it can then be removed from the input). It then appends the KEY* to each line in its corresponding section, using a low ASCII control character (\x02, which is unlikely to occur in your input) as a separator, and then numbers all lines using the same separator. After that it's only a matter of sorting by the 3rd and 1st fields, cutting out the middle column, and restoring the delimiters via a final sed. Do note that with the above, KEY12 will sort before KEY2, so adjust the sort command per your needs.
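
One possible adjustment for the KEY12 versus KEY2 ordering, sketched under the assumption that every key is the literal prefix KEY followed by an integer: start the sort key at the 4th character and compare it numerically. sort_keys_numeric is a hypothetical name used only to show the key specification in isolation.

```shell
# Hypothetical helper: order lines such as KEY1, KEY2, KEY12 by the number
# after the 3-character prefix. Assumption: keys look like KEY<integer>.
sort_keys_numeric() {
    sort -k1.4n
}

# Plain "sort" puts KEY12 before KEY2; this variant puts KEY2 first:
# printf 'KEY12\nKEY2\nKEY1\n' | sort_keys_numeric

# In the pipeline above the KEY is field 3, so the equivalent change would be:
# sort -t $'\002' -k3.4n -k1,1
```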
