Regular expressions appear to be rapidly gaining in popularity among VIM users as they discover the sheer programming power that regular expressions can provide. Historically, regular expressions have been associated with the UNIX platform and scripting languages like Perl (Practical Extraction and Report Language).

The syntax in VIM is slightly different than in Perl, but it is pretty close. This makes Perl regular expression examples relevant to VIM users.

The Softpanorama RegEx page contains basic information about regular expressions. I would like to stress that Vim's regexp implementation is reasonably close to Perl's and the skills are transferable. Among the differences between Perl and Vim we can note:

Some metacharacters are different

#    Matching                                             #    Matching
.    any character except new line
\s   whitespace character                                 \S   non-whitespace character
\d   digit                                                \D   non-digit
\x   hex digit                                            \X   non-hex digit
\o   octal digit                                          \O   non-octal digit
\h   head of word character (a,b,c...z,A,B,C...Z and _)   \H   non-head of word character
\p   printable character                                  \P   like \p, but excluding digits
\w   word character                                       \W   non-word character
\a   alphabetic character                                 \A   non-alphabetic character
\l   lowercase character                                  \L   non-lowercase character
\u   uppercase character                                  \U   non-uppercase character

Many special characters need to be escaped. For example:
\+ matches 1 or more of the preceding characters...
\{n,m} matches from n to m of the preceding characters...
\= is used instead of Perl's ? (matches 0 or 1 of the preceding characters); Vim also accepts \?

Quantifier   Description
*            matches 0 or more of the preceding characters, ranges or metacharacters; .* matches everything, including an empty line
\+           matches 1 or more of the preceding characters...
\=           matches 0 or 1 of the preceding characters...
\{n,m}       matches from n to m of the preceding characters...
\{n}         matches exactly n of the preceding characters...
\{,m}        matches at most m (from 0 to m) of the preceding characters...
\{n,}        matches at least n of the preceding characters...
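
For readers coming from Perl-style engines, here is a small sketch of how these quantifiers line up. It uses Python's re module only because its quantifier syntax follows Perl's; the helper name matches and the sample strings are made up for illustration.

```python
import re

# Vim quantifier -> Perl-style spelling (as accepted by Python's re module)
#   a\+      -> a+       one or more
#   a\=      -> a?       zero or one
#   a\{2,3}  -> a{2,3}   from 2 to 3
#   a\{2}    -> a{2}     exactly 2
#   a\{,3}   -> a{0,3}   at most 3
#   a\{2,}   -> a{2,}    at least 2


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"a+", "aaa"))        # one or more: True
print(matches(r"a+", ""))           # needs at least one: False
print(matches(r"ab?", "a"))         # the b is optional: True
print(matches(r"a{2,3}", "aaaa"))   # four is too many: False
print(matches(r"a{0,3}", ""))       # zero is allowed: True
print(matches(r"a{2,}", "a"))       # one is too few: False
```

The only change between the two dialects here is the backslash: Vim (with its default 'magic' setting) wants \+, \= and \{ escaped, while Perl-style engines treat the bare characters as quantifiers.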


Alternatives (OR) need to be escaped

Using "\|" you can combine several expressions into one which matches any of its components. The first one matched will be used.

\(Date:\|Subject:\|From:\)\(\s.*\)

will parse various mail headings and their contents into \1 and \2, respectively. The thing to remember about VIM alternation is that it is not greedy. It won't search for the longest possible match; it will use the first alternative that matches. That means that the order of the items in the alternation is important!
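
The same ordering rule can be tried out in a Perl-style engine. The sketch below uses Python's re module, where alternation is written with a bare | and the groups are unescaped; the function name parse_header and the sample lines are made up for illustration.

```python
import re

# Perl-style spelling of the Vim pattern \(Date:\|Subject:\|From:\)\(\s.*\)
HEADER = re.compile(r"(Date:|Subject:|From:)(\s.*)")


def parse_header(line: str):
    """Return (field, contents) for a matching mail header line, else None."""
    m = HEADER.match(line)
    return (m.group(1), m.group(2)) if m else None


print(parse_header("Subject: regex tips"))  # ('Subject:', ' regex tips')
print(parse_header("To: someone"))          # None

# Order matters: the first alternative that matches wins, not the longest.
short_first = re.match(r"Sub|Subject", "Subject: hi").group()
long_first = re.match(r"Subject|Sub", "Subject: hi").group()
print(short_first)  # Sub
print(long_first)   # Subject
```

Putting the longer alternative first is the usual way to make sure it gets a chance to match.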

Tip 3: Quick mapping to put \(\) in your pattern string
cmap ;\ \(\)

Non-greedy modifiers are different and more obscure than in Perl. Perl allows you to convert any quantifier into a non-greedy version by adding an extra ? after it, so *? is the non-greedy version of the quantifier *. In Vim, non-greedy matching is written with the \{-...} forms:


Quantifier   Description
\{-}         matches 0 or more of the preceding atom, as few as possible
\{-n,m}      matches n to m of the preceding atom, as few as possible
\{-n,}       matches at least n of the preceding atom, as few as possible
\{-,m}       matches 0 to m of the preceding atom, as few as possible
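
To see what "as few as possible" means in practice, here is a sketch in Python's re module, which uses the Perl-style ? suffix; the Vim spelling of each pattern is given in the comments, and the sample text is made up.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: Vim <.*>  is Perl-style <.*>  (as many characters as possible)
greedy = re.search(r"<.*>", text).group()

# Non-greedy: Vim <.\{-}>  is Perl-style <.*?>  (as few as possible)
lazy = re.search(r"<.*?>", text).group()

# Bounded non-greedy: Vim a\{-2,4}  is Perl-style a{2,4}?
bounded = re.search(r"a{2,4}?", "aaaaa").group()

print(greedy)   # <b>bold</b> and <i>italic</i>
print(lazy)     # <b>
print(bounded)  # aa
```

The greedy version runs to the last > on the line, while the non-greedy one stops at the first > it can.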

Replacement rules are different

You can group parts of the pattern expression by enclosing them with "\(" and "\)" and refer to them inside the replacement pattern by their special numbers \1, \2 ... \9. A typical example is swapping the first two words of the line:

s:\(\w\+\)\(\s\+\)\(\w\+\):\3\2\1:


where \1 holds the first word, \2 holds any number of spaces or tabs in between, and \3 holds the second word. How do you decide which number belongs to which pair of \(\)? Count the opening "\(" from the left.
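
The word swap and the counting rule can also be checked in a Perl-style engine. This is a sketch in Python's re module, where groups are written with bare parentheses; the function name swap_first_two_words and the sample strings are made up for illustration.

```python
import re


def swap_first_two_words(line: str) -> str:
    # Vim:  s:\(\w\+\)\(\s\+\)\(\w\+\):\3\2\1:
    # count=1 mirrors :s without the g flag (first match only).
    return re.sub(r"(\w+)(\s+)(\w+)", r"\3\2\1", line, count=1)


print(swap_first_two_words("hello   world again"))  # world   hello again

# Groups are numbered by their opening parenthesis, counted from the left,
# so the outer group is 1 and the two inner groups are 2 and 3.
nested = re.match(r"((\d+)-(\d+))", "10-20").groups()
print(nested)  # ('10-20', '10', '20')
```

Note that the whitespace captured in \2 is put back unchanged, so the original spacing between the two words survives the swap.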

The replacement part of the S&R has its own special characters, which we are going to use to fix grammar:



#     Meaning                                          #     Meaning
&     the whole matched pattern                        \L    the following characters are made lowercase
\0    the whole matched pattern                        \U    the following characters are made uppercase
\1    the matched pattern in the first pair of \(\)    \E    end of \U and \L
\2    the matched pattern in the second pair of \(\)   \e    end of \U and \L
...   ...                                              \r    split line in two at this point
\9    the matched pattern in the ninth pair of \(\)    \l    next character made lowercase
~     the previous substitute string                   \u    next character made uppercase


Now the full S&R to capitalize lowercase words at the beginning of sentences looks like

s:\([.!?]\)\s\+\([a-z]\):\1  \u\2:g

We have corrected our grammar and, as an extra job, we replaced a variable number of spaces between the punctuation and the first letter of the next sentence with exactly two spaces.
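
For comparison, here is a sketch of the same fix in Python's re module. Python's replacement strings have no \u escape, so a small function does the uppercasing instead; the function name fix_sentence_starts and the sample sentence are made up for illustration.

```python
import re


def fix_sentence_starts(text: str) -> str:
    # Vim:  s:\([.!?]\)\s\+\([a-z]\):\1  \u\2:g
    # Group 1 is the punctuation, group 2 the lowercase letter that follows.
    # The replacement keeps the punctuation, inserts exactly two spaces,
    # and uppercases the letter (the job \u does in Vim).
    return re.sub(
        r"([.!?])\s+([a-z])",
        lambda m: m.group(1) + "  " + m.group(2).upper(),
        text,
    )


print(fix_sentence_starts("it works. now what?   nothing."))
# it works.  Now what?  Nothing.
```

As in the Vim command, the very first word of the text is left alone because no punctuation precedes it.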

Perl supports more options that can be appended to the regexp, or even embedded in it.

You can also embed variable names in a Perl regular expression. Perl replaces the name with its value; this is called "variable interpolation".

The most common task is to make replacements in a text following certain rules, using the VIM search and replace command (S&R) :s (substitute). For example, here is how to globally replace all occurrences of 1999 with 2003:


%s/1999/2003/g

This is a very common idiom in vi/vim. As in Perl, you can also use several modifiers:

c Confirm each substitution
g Replace all occurrences in the line (without g - only first).
i Ignore case for the pattern.
I Don't ignore case for the pattern
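
As a rough comparison, the g and i flags correspond to the following in a Perl-style engine. This is a sketch in Python's re module; the sample line is made up, and the Vim commands in the comments assume the 'ignorecase' option is off. The c (confirm) flag is interactive and has no counterpart here.

```python
import re

line = "vi, Vi, vi"

# :s/vi/Vim/    (no g flag: only the first occurrence in the line)
first_only = re.sub(r"vi", "Vim", line, count=1)

# :s/vi/Vim/g   (g flag: all occurrences in the line)
all_in_line = re.sub(r"vi", "Vim", line)

# :s/vi/Vim/gi  (i flag: ignore case for the pattern)
ignore_case = re.sub(r"vi", "Vim", line, flags=re.IGNORECASE)

print(first_only)   # Vim, Vi, vi
print(all_in_line)  # Vim, Vi, Vim
print(ignore_case)  # Vim, Vim, Vim
```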

 

Posted by 옥탑방람보