I have a text file in this format:

####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

I want to sort this file by the KEY lines, keeping the 4 lines that follow each key together with it, so the sorted result should be:

####################################
KEY1
VAL11
VAL12
VAL13
VAL14
####################################
KEY2
VAL21
VAL22
VAL23
VAL24
####################################
KEY3
VAL31
VAL32
VAL33
VAL34

Is there a way to do this?

don't cross post please – Zanna 12 hours ago
    
@Zanna: I think there is an exception for the Unix and Ask Ubuntu sites, since the two overlap a lot. I think I read about this on the Unix meta site. – RYN 11 hours ago
    
relevant meta question asked here by AU mod :) How should questions cross-posted on Ask Ubuntu be handled? – Zanna 11 hours ago
    
@Zanna: Ok I deleted the AU one; thanks – RYN 11 hours ago
    
no, thank you! :) – Zanna 11 hours ago

msort(1) was designed to be able to sort files with multi-line records. It has an optional gui, as well as a normal and usable-for-humans command line version. (At least, humans that like to read manuals carefully and look for examples...)

AFAICT, you can't use an arbitrary pattern to delimit records, so unless your records are fixed-size (in bytes, not characters or lines), the input needs some massaging first. msort does have a -b option for records that are blocks of lines separated by blank lines.

You can transform your input into a format that will work with -b pretty easily, by putting a blank line before every ###... (except the first one).

By default, it prints statistics on stderr, so at least it's easy to tell when it didn't sort because it thought the entire input was a single record.


msort works on your data. The sed command prepends a newline to every #+ line except for line 1. -w sorts the whole record (lexicographically). There are options for picking what part of a record to use as a key, but I didn't need them.

I also left out stripping the extra newlines.

$ sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null 
####################################
KEY1
VAL11
VAL12
VAL13
VAL14

####################################
KEY2
VAL21
VAL22
VAL23
VAL24

####################################
KEY3
VAL31
VAL32
VAL33
VAL34

I didn't have any luck with -r '#' to use that as the record separator. It thought the whole file was one record.
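
The blank lines that were inserted for -b are still present in the output above. A minimal sketch for removing them afterwards, assuming the data itself contains no meaningful empty lines (strip_blank_lines is just an illustrative name, not part of msort):

```shell
# Hypothetical helper: drop every empty line from stdin.
# Assumes the records themselves never contain empty lines.
strip_blank_lines() {
    sed '/^$/d'
}

# The pipeline from this answer with the cleanup step appended
# (needs msort installed; unsorted.records is the sample input):
# sed '2,$ s/^#\+/\n&/' unsorted.records | msort -b -w 2>/dev/null | strip_blank_lines
```

Since -b already treats empty lines as record boundaries, this cleanup should not lose anything that msort would have kept inside a record.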

Thank you very much; msort is very useful. (About -r: it seems to fail because there is more than one #. I used -d and it worked.) – RYN 6 hours ago

One solution is to first replace the line feeds inside each block with an unused character of your choice ('|' in the example below), then sort the result, and finally turn the chosen separator back into line feeds:

sed -e 'N; N; N; N; N; s/\n/|/g' file.txt \
| sort -k2,2 -t\| \
| sed 's/|/\n/g'
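
The same join, sort, split idea can be sketched without hard-coding the number of lines per block. This is only a sketch under a few assumptions: every record starts with a line made up solely of # characters, the byte 0x01 never occurs in the data, and sort_blocks is an illustrative name:

```shell
# Hypothetical helper: join each record onto one line, sort on the KEY
# field, then split the records back into lines.
# Assumes records start with a line of only '#' and that 0x01 is unused.
sort_blocks() {
    sep=$(printf '\001')
    awk -v sep="$sep" '
        /^#+$/ { if (rec != "") print rec; rec = $0; next }  # separator line starts a new record
        { rec = rec sep $0 }                                   # append the line to the current record
        END { if (rec != "") print rec }                       # flush the last record
    ' "$@" | sort -t "$sep" -k2,2 | tr "$sep" '\n'
}
```

It can be called as sort_blocks file.txt or used at the end of a pipe. Field 2 is the KEY line because field 1 is the ### line.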

Thanks; this works, but it is rather dirty, especially when the data is dirty too! If there were 100 lines after the key, I would need to put 100 ;N in there, and it can be difficult to find a character that is not used in the text itself. It would be very good if sort, awk, ... were able to do multi-line sorting. – RYN 11 hours ago

perl -0ne 'print sort /(#+[^#]*)/g' file.txt
  • perl -0 sets the input record separator to NUL, so a file with no NUL bytes is read in one go (effectively slurping it)
  • /(....)/g match and extract the records
  • print sort ... sort and print them

Here's another way that should work with any number of lines in a KEY section:

# extract delimiter
delim=$(head -n1 <infile)
sed '/#/d;/KEY/h;G;s/\n/\x02/' infile | nl -ba -nrz -s $'\002' | sort -t $'\002' -k3 -k1,1 |
cut -d $'\002' -f2 | sed '/KEY/{x;s/.*/'"${delim}"'/;G}'

This works by saving the delimiter into a variable (so that it can be removed from the input). It then appends the KEY* line to every line of its section, using a low ASCII character (which is unlikely to occur in your input) as a separator, and numbers all lines using the same separator. After that it's only a matter of sorting by the 3rd and then the 1st field, cutting out the middle column, and restoring the delimiters via a final sed. Do note that with the above, KEY12 will sort before KEY2, so adjust the sort command per your needs.
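
One way to make KEY2 sort before KEY12 is a version sort on the key field. The following is a sketch of the same decorate, sort, undecorate steps, assuming GNU sort (the V key modifier is a GNU extension), delimiter lines made only of #, and key lines that start with KEY; sort_keys_naturally is an illustrative name:

```shell
# Hypothetical helper: same idea as the pipeline above, with the key field
# compared as a version string so that KEY2 comes before KEY12.
# Assumes GNU sort, delimiter lines of only '#', and key lines starting with KEY.
sort_keys_naturally() {
    sep=$(printf '\002')
    delim=$(head -n1 "$1")          # remember the delimiter line
    awk -v sep="$sep" '
        /^#+$/ { next }             # drop delimiter lines
        /^KEY/ { key = $0 }         # remember the key of the current section
        { printf "%06d%s%s%s%s\n", NR, sep, $0, sep, key }  # number<sep>line<sep>key
    ' "$1" |
    sort -t "$sep" -k3,3V -k1,1 |   # by key (version order), then by original line number
    cut -d "$sep" -f2 |             # keep only the original line
    awk -v delim="$delim" '/^KEY/ { print delim } { print }'  # put the delimiters back
}
```

Called as sort_keys_naturally infile, it writes the sorted records to stdout.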

