Getting a grasp of grep, awk and sed

I’ve been looking at my server logs and wanted to check which IPs are hitting my server, both good and bad.

I looked through my logs, and one of the things I noticed in my maillog was authentication failures:
Dec 3 06:56:08 madbull postfix/smtpd[24006]: warning: unknown[x.x.x.x]: SASL LOGIN authentication failed: UGFzc3dvcmQ6

After a little bit of searching and reading I was able to put together this:

grep 'SASL LOGIN authentication failed' /var/log/maillog | awk '{print $7}' | sed 's/unknown//g;s/\[//g;s/\]//g;s/://g' | sort | uniq -c | sort -n

I grep for the term 'SASL LOGIN authentication failed' in the file /var/log/maillog, then pipe the matching lines into awk, which prints the 7th field of each line. From there it is piped into sed, which strips out unknown, [, ] and :, leaving just the IP. The IPs are then sorted, because uniq only merges identical lines that sit next to each other, and piped into uniq -c to be counted so that I can see if there are any repeat offenders. Finally the counts go through sort -n, so that the IPs with the most hits end up at the bottom.
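To try the whole chain without touching a real log, here is a small sketch that runs it against a throwaway file. The log lines and the 192.0.2.x addresses are made up for illustration (only their shape follows the maillog line above), and the variant shown sorts before uniq so that repeated addresses are counted together. The trailing awk only trims the padding that uniq -c adds in front of the counts.

```shell
# Hypothetical sample log; the addresses are from the documentation range.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 3 06:56:08 madbull postfix/smtpd[24006]: warning: unknown[192.0.2.7]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
Dec 3 06:57:10 madbull postfix/smtpd[24006]: warning: unknown[192.0.2.1]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
Dec 3 06:58:12 madbull postfix/smtpd[24010]: warning: unknown[192.0.2.7]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
Dec 3 06:59:00 madbull postfix/smtpd[24010]: connect from unknown[192.0.2.9]
EOF

# Same steps as in the post: filter, pick field 7, strip the wrapper, count.
report=$(grep 'SASL LOGIN authentication failed' "$log" \
  | awk '{print $7}' \
  | sed 's/unknown//g;s/\[//g;s/\]//g;s/://g' \
  | sort | uniq -c | sort -n \
  | awk '{print $1, $2}')

echo "$report"
rm -f "$log"
```

The address that only connected (192.0.2.9) never makes it past grep, and the one with two failures ends up on the last line.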

The grep command is self-explanatory, so I’ll jump in and explain what awk does. In the curly braces it is told to print the 7th field. If you count the fields in the log line you will see that Dec is one, 3 is two, 06:56:08 is three and so on. By default awk splits on whitespace, so the whole postfix/smtpd[24006]: is one field. So we get unknown[x.x.x.x]: as our new string to pass along.
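A quick way to check the counting is to let awk do it. This sketch feeds the sample line from the log into awk; the variable names are just mine, and NF is awk's built-in count of fields on the current line.

```shell
# The sample line from the post; x.x.x.x stands in for the real address.
line='Dec 3 06:56:08 madbull postfix/smtpd[24006]: warning: unknown[x.x.x.x]: SASL LOGIN authentication failed: UGFzc3dvcmQ6'

# awk splits each line on whitespace by default.
field_count=$(printf '%s\n' "$line" | awk '{print NF}')
seventh=$(printf '%s\n' "$line" | awk '{print $7}')

echo "fields: $field_count"   # fields: 12
echo "field 7: $seventh"      # field 7: unknown[x.x.x.x]:
```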

Now turn to sed and get some substitution going :D. The s says that we want to substitute, and the slash / is the delimiter that separates the parts of the command (other delimiter characters work too). The text after the first slash is what we search for; here that is unknown. A backslash escapes a character that would otherwise have a special meaning in the pattern, which is why [ is written as \[; plain text such as unknown needs no backslash. After the second slash we write what to substitute with, for instance sed 's/unknown/known/' would change unknown to known. The last slash ends the substitution and g is the global flag. It says that if the pattern occurs more than once on a line, every occurrence is substituted, not just the first. So if there were 10 unknown strings on a line, they would all be deleted, since I have nothing between the second and the third slash. The rest of the sed line is just more of the same, with the commands separated by semicolons.
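Here is a small sketch of those substitutions on the string that awk hands over. The a-b-c input is only there to show what the g flag changes; none of the variable names come from the original command.

```shell
field='unknown[x.x.x.x]:'

# Empty replacement: every match is deleted.
# \[ is escaped because a bare [ would start a bracket expression.
cleaned=$(printf '%s\n' "$field" | sed 's/unknown//g;s/\[//g;s/\]//g;s/://g')
echo "$cleaned"      # x.x.x.x

# Non-empty replacement: the match is swapped for new text.
swapped=$(printf '%s\n' "$field" | sed 's/unknown/known/')
echo "$swapped"      # known[x.x.x.x]:

# Without g only the first match on a line changes; with g all of them do.
first_only=$(printf '%s\n' 'a-b-c' | sed 's/-/+/')
all=$(printf '%s\n' 'a-b-c' | sed 's/-/+/g')
echo "$first_only"   # a+b-c
echo "$all"          # a+b+c
```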

uniq -c collapses identical lines that sit next to each other, so ten consecutive instances of the same IP show up as just one line, and the -c flag prefixes that line with the count, 10. That is a nice way to see if there is a repeat offender knocking at our door. Because uniq only compares adjacent lines, the input needs to be sorted first, or the same IP can be counted in several separate groups.
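The adjacency detail is easy to miss, so here is a minimal sketch of it. The three addresses are invented (documentation range), and the awk at the end only strips the padding uniq -c puts in front of each count.

```shell
# Hypothetical input: the same address appears twice, but not on adjacent lines.
ips='192.0.2.1
192.0.2.7
192.0.2.1'

# uniq alone sees three separate runs of length one.
unsorted=$(printf '%s\n' "$ips" | uniq -c | awk '{print $1, $2}')

# Sorting first puts the duplicates next to each other.
sorted=$(printf '%s\n' "$ips" | sort | uniq -c | awk '{print $1, $2}')

echo "without sort:"; echo "$unsorted"
echo "with sort:";    echo "$sorted"
```

Without the sort, 192.0.2.1 is listed twice with a count of 1 each; with it, the address shows up once with a count of 2.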

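The sort at the end is just for letting me see the ones with the highest number at the bottom. It is worth using sort -n there, so that the counts are compared as numbers rather than as text.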
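To see why the numeric flag matters, here is a small sketch with made-up counts and labels. As far as I can tell uniq -c usually right-aligns its counts, which can hide the difference, so this example uses unpadded numbers to make it visible.

```shell
# Hypothetical "count label" lines without any padding.
counts='10 a
9 b
100 c'

# Plain sort compares characters, so anything starting with 9 lands last.
text_last=$(printf '%s\n' "$counts" | sort | tail -n 1)

# sort -n compares the leading numbers, so 100 lands last.
num_last=$(printf '%s\n' "$counts" | sort -n | tail -n 1)

echo "sort:    $text_last"   # sort:    9 b
echo "sort -n: $num_last"    # sort -n: 100 c
```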

man (the manual command in Linux) is your friend when it comes to explaining more of what the different commands do. I’m reading up on them myself so that I can better understand what is going on under the hood.

For some more sed magic: Sed