Regular expressions on the Bash command line

Netooze
January 31, 2020

A regular expression (RegEx or RegExp) is a sequence of characters that defines a search pattern. Regular expressions can be used for find-and-replace operations, as well as to validate input such as passwords, phone numbers, and so on.

Here is an example of a regular expression:

/t[aeiou]l/

This regular expression matches the letter 't', followed by any one of the letters 'a', 'e', 'i', 'o' or 'u', followed by the letter 'l'. It can match 'tel', 'tal' or 'til'. The match can be a whole word or part of another word, such as 'tilt', 'brutal' or 'telephone'.
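As a quick check of the pattern above, here is a minimal sketch that pipes a few sample words (chosen only for illustration) into grep. The -o option prints just the matched part of each line, so it is easy to see what the pattern actually caught.

```shell
# Hypothetical sample words, one per line.
# -o prints only the matched text instead of the whole line.
matches=$(printf '%s\n' tilt brutal telephone tool | grep -o 't[aeiou]l')
printf '%s\n' "$matches"
```

'tool' produces no output because two characters sit between 't' and 'l', while the bracket expression allows exactly one.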

Now, using the Bash shell as an example, let's look at fundamental regular expressions.

Regular expression basics

The general syntax for the 'grep' command is as follows:

$ grep search_pattern_regex file_location

Let's look at some special characters known as metacharacters. They help to create more complex search expressions:

. will match any single character;
[ ] will match any one character from the set or range in the brackets;
[^ ] will match any one character except those specified in the square brackets;
* will match zero or more repetitions of the expression before it;
+ will match one or more repetitions of the expression before it;
? will match zero or one repetition of the expression before it;
{n} will match exactly 'n' repetitions of the preceding expression;
{n,} will match at least 'n' repetitions of the preceding expression;
{n,m} will match at least 'n' and at most 'm' repetitions of the preceding expression;
{,m} will match at most 'm' repetitions of the preceding expression;
\ is an escape character, used when one of the metacharacters needs to be matched literally.
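The brace quantifiers in this list can be tried out with a short sketch like the one below. It assumes a grep that supports the -E option (extended regular expressions, where braces need no backslashes) and uses made-up sample strings; -x requires the whole line to match, which makes the counts easy to verify.

```shell
# Hypothetical sample lines: one to four 'a' characters followed by 'b'.
sample() { printf '%s\n' ab aab aaab aaaab; }

exact=$(sample | grep -Ex 'a{2}b')      # exactly two 'a'
range=$(sample | grep -Ex 'a{2,3}b')    # two or three 'a'
atleast=$(sample | grep -Ex 'a{3,}b')   # three or more 'a'

printf 'exact:   %s\n' $exact
printf 'range:   %s\n' $range
printf 'atleast: %s\n' $atleast
```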

Here are some examples:

. (dot)

The dot matches any single character at its position in the pattern. For example:

$ grep "d.g" file1

This regular expression searches the file named 'file1' for a string that starts with 'd', ends with 'g' and has exactly one character of any kind in between. Similarly, we can use the dot any number of times in a search pattern, like so:

T......h

This pattern will look for a string that starts with 'T', ends with 'h' and has exactly six characters of any kind in between.
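A small sketch of both dot patterns, using invented sample strings. Note that grep prints every line that contains a match, so 'hotdog' is reported as well.

```shell
# 'dg' has nothing between 'd' and 'g', so it does not match.
one_dot=$(printf '%s\n' dog dig dg hotdog | grep 'd.g')

# -x forces the whole line to match: 'T', six characters, 'h'.
six_dots=$(printf '%s\n' T123456h T12345h | grep -x 'T......h')

printf '%s\n' "$one_dot" "$six_dots"
```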

[ ]

Square brackets define a set of characters, any one of which may match. Use them when you need to match one of several listed characters rather than any character at all, as the dot does:

$ grep "N[oen]n" file2

Here we are looking for a word that starts with 'N', ends with 'n' and has exactly one of 'o', 'e' or 'n' in the middle. You can list any number of characters in square brackets. We can also define ranges such as 'a-e' or '1-8' inside the brackets; the bracket expression still matches only one character.
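The following sketch shows a character set and a range side by side. The input words are hypothetical and exist only to make the difference visible.

```shell
# Set: the middle character must be 'o', 'e' or 'n'.
set_match=$(printf '%s\n' Non Nen Nnn Nan | grep 'N[oen]n')

# Range: the middle character must be one of 'a' through 'e'.
range_match=$(printf '%s\n' cab cub cob | grep 'c[a-e]b')

printf '%s\n' "$set_match" "$range_match"
```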

[^ ]

This works like a negation operator for regular expressions. [^ ] matches any single character except those listed in the square brackets. For example:

$ grep "St[^1-9]d" file3

This matches strings that start with 'St', end with 'd', and have exactly one character in between that is not a digit from 1 to 9.
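Two details of the negated set are easy to miss, and the sketch below (with invented sample strings) illustrates them: the digit 0 is outside the range 1-9 and therefore still matches, and the negated set must consume one character, so 'Std' does not match at all.

```shell
negated=$(printf '%s\n' Stud St5d St0d Std | grep 'St[^1-9]d')
printf '%s\n' "$negated"
```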

So far, we have used regular expressions that match only a single character at a given position. But what about other cases, for example when a pattern has to allow any number of characters in the middle? This is handled by quantifier metacharacters, which determine how many times the preceding expression can occur: +, * and ?

{n}, {n,m}, {n,} and {,m} are further quantifiers we can use in regular expressions.

* (asterisk)

The following example shows any number of occurrences of the letter 'k', including none:

$ grep "lak*" file4

This means we can match 'lake' or 'la' or 'lakkkkk'.
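To see exactly how much of each string the asterisk pattern consumes, the sketch below uses -o on a few sample words (assumed for illustration). For 'lake' only 'lak' is printed, because the trailing 'e' is not part of the pattern.

```shell
star=$(printf '%s\n' lake la lakkkkk lot | grep -o 'lak*')
printf '%s\n' "$star"
```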

+

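The following pattern requires at least one occurrence of the letter 'k' in a string to match. By default grep uses basic regular expressions, in which '+' is an ordinary character, so the -E option (extended regular expressions) is needed:

$ grep -E "lak+" file5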

Here the letter 'k' must appear at least once, so our results can be 'lake' or 'lakkkkk', but not 'la'.
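One caveat worth checking on your own system: without the -E option, grep treats '+' as a literal plus sign. The sketch below compares the two modes on made-up input.

```shell
# Extended syntax: '+' is a quantifier.
ere=$(printf '%s\n' lake la lakkkkk | grep -E 'lak+')

# Basic syntax (no -E): '+' only matches a literal plus sign.
bre=$(printf '%s\n' lake 'lak+' | grep 'lak+')

printf 'ERE: %s\n' $ere
printf 'BRE: %s\n' "$bre"
```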

?

In the following pattern, the match will be the string 'bb' or 'bab'. The '?' quantifier is part of extended regular expressions, so grep needs the -E option:

$ grep -E "ba?b" file6

With the given quantifier '?' we can have one occurrence of a character or none.
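A short sketch with hypothetical input confirms the behaviour: 'baab' is rejected because it has two 'a' characters between the 'b's, and 'bcb' is rejected because 'c' is not the optional 'a'.

```shell
optional=$(printf '%s\n' bb bab baab bcb | grep -E 'ba?b')
printf '%s\n' "$optional"
```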

Important note! Let's say we have a regular expression:

$ grep "S.*l" file7

And we get the results 'Small', 'Silly', and 'Susan is a little to play ball'. But why did we get 'Susan is a little to play ball' when we were only looking for words and not the full sentence?

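The thing is, this sentence matches our search criteria: it starts with the letter 'S', has any number of characters in the middle, and ends with the letter 'l'. So what can we do to fix our regex so that the match stops as early as possible instead of running to the end of the sentence?

To do this, add the quantifier '?' after '*', which makes it lazy. grep supports lazy quantifiers only in Perl-compatible mode (the -P option of GNU grep), and the -o option prints just the matched text rather than the whole line:

$ grep -oP "S.*?l" file7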
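The sketch below contrasts the greedy match with two ways of shortening it. The negated set 'S[^ ]*l' is an alternative that is not from the original text: it works in any grep and keeps the match inside one word. The lazy form is only attempted if the local grep accepts -P, which is an assumption about your system.

```shell
line='Susan is a little to play ball'

# Greedy: '.*' runs to the last 'l' on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'S.*l')

# Portable alternative: forbid spaces so the match stays inside one word.
word=$(printf '%s\n' Small Silly "$line" | grep -o 'S[^ ]*l')

# Lazy: '.*?' stops at the first 'l'; needs Perl-compatible mode (-P).
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  lazy=$(printf '%s\n' "$line" | grep -oP 'S.*?l')
fi

printf '%s\n' "$greedy" "$word" "${lazy:-(grep -P not available)}"
```

Neither approach is an exact "whole words only" filter: the lazy match simply ends at the first 'l', and the negated set ends at the last 'l' before a space.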

\ (escape character)

The '\' character is used when a character that is normally a metacharacter has to be matched literally. For example, suppose you want to find all words that start with 'S' and end with a dot. For this we can use the expression (-P enables the Perl-compatible syntax needed for the lazy '.*?'):

$ grep -P "S.*?\." file8

It will match every string that starts with 'S' and runs up to the next literal dot.
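The effect of escaping is easiest to see by counting matches with and without the backslash. This sketch uses two invented lines and the -c option, which prints the number of matching lines.

```shell
# Unescaped dot: matches any character, so both lines count.
any=$(printf '%s\n' a.c abc | grep -c 'a.c')

# Escaped dot: matches only a literal '.', so one line counts.
literal=$(printf '%s\n' a.c abc | grep -c 'a\.c')

printf 'unescaped: %s, escaped: %s\n' "$any" "$literal"
```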

So you've got a basic understanding of how regular expressions work. Practice as much as possible: write regular expressions and use them in your work whenever you can. You can check your regular expressions against sample text with an online regex tester.
