sed -r -i 's/ *$//' script.sh
LibrePlanet at MIT in Boston
"Device and Personal Privacy Technology Roundup"
Sunday, 2018Mar25 @ 11:50
Short: Regular Expressions (RegEx) are sequences of characters that define a matching pattern using a specialized language.
Medium: RegEx define patterns that describe sets of strings. Many common *NIX tools, such as grep and sed, use them, as do many programming languages, such as Perl, PHP and Python. Many database query languages support RegEx too.
Created in the 1950s.
Popularized by *NIX :)
sed -r -i 's/ *$//' script.sh
ip addr list | grep -E '(([12]{,1}[[:digit:]]{1,2})\.){3}([12]{,1}[[:digit:]]{1,2})'
ip addr list | grep -E '([[:lower:][:digit:]]{2}:){5}[[:lower:][:digit:]]{2}'
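A minimal sketch for trying the trailing-space cleanup on a throwaway file instead of a real script.sh. It assumes GNU sed, where -i edits in place without a backup suffix; the temporary file and its contents are made up for the demo.

```shell
# Assumes GNU sed: -r turns on extended RegEx, -i edits the file in place.
tmp=$(mktemp)

# Hypothetical contents: two lines end in spaces, one does not.
printf 'echo hello   \necho world \necho done\n' > "$tmp"

# ' *$' = zero or more spaces right before the end of the line.
sed -r -i 's/ *$//' "$tmp"

cleaned=$(cat "$tmp")
printf '%s\n' "$cleaned"
rm -f "$tmp"
```

Keeping -r and -i as separate flags avoids handing -i to -e as its expression.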
[ASCII art: a house labeled "Star", saying "Hi, I'm Star" next to the * symbol]
*
== zero or more of the previous character
Star is a modifier acting on whatever comes before it
x* == zero or more x
y* == zero or more y
sed -r -i 's/ *$//' script.sh
$ # the same as "grep x file.txt"
$ grep -E 'xy*' file.txt
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'xx*' file.txt
$ # use grep to cat the file
$ grep -E 'x*' file.txt
$ # sloppily also look for British spelling
$ grep -E 'colou*r' file.txt
$ # search for the first zero or more r, then replace
$ echo fred | sed -re 's/r*/x/'
xfred
$ # search for all zero or more r, then replace
$ echo fred | sed -re 's/r*/x/g'
xfxexdx
$ # search for all zero or more r, then replace
$ echo anke | sed -re 's/r*/x/g'
xaxnxkxex
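Counting matching lines makes Star's zero-or-more behavior visible. The three-line input below is invented for the sketch; grep -c prints how many lines match.

```shell
# Hypothetical input: "x", an empty line, and "abc".
input='x

abc'

# 'x*' can match zero characters, so every line matches, even the empty one.
star_all=$(printf '%s\n' "$input" | grep -Ec 'x*')

# 'xy*' still needs the x, so only the first line matches.
star_x=$(printf '%s\n' "$input" | grep -Ec 'xy*')

echo "x*: $star_all lines, xy*: $star_x line"
```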
[ASCII art: a house labeled "Plus", saying "Hi, I'm Plus" next to the + symbol]
We’ve moved into a fancy neighborhood now!
+
== one or more of the previous character
Plus is a modifier acting on whatever comes before it
x+ == one or more x
y+ == one or more y
sed -r -i 's/ +$//' script.sh
$ # search for x followed by at least one y
$ grep -E 'xy+' file.txt
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'x+' file.txt
$ # sloppily look for only British spelling
$ grep -E 'colou+r' file.txt
$ # search and replace the first one or more r
$ echo fred | sed -re 's/r+/x/'
fxed
$ # search and replace all one or more r
$ echo fred | sed -re 's/r+/x/g'
fxed
$ # search and replace all one or more r
$ echo anke | sed -re 's/r+/x/g'
anke
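One way to check that x+ is shorthand for xx* is to run both on the same input and compare the results. The input is made up for this sketch.

```shell
input='x
xxy
abc'

# One or more x ...
with_plus=$(printf '%s\n' "$input" | grep -E 'x+')
# ... is the same as one x followed by zero or more x.
with_star=$(printf '%s\n' "$input" | grep -E 'xx*')

printf '%s\n' "$with_plus"
[ "$with_plus" = "$with_star" ] && echo "x+ and xx* agree"
```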
There are multiple RegEx languages
Extended RegEx - man 7 regex
Basic RegEx - man 7 regex
Perl Compatible Regex ( PCRE ) - man perlre
Fred’s House of RegEx ( FHRegEx: pronounced fregex )
For command line and *NIX tools use extended where possible
If extended isn’t available, check the man page :)
For programming languages use PCRE or native matching
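The dialect matters even for a single character. A quick check, assuming GNU grep, where basic RegEx treats + as an ordinary character and extended RegEx treats it as "one or more":

```shell
# Basic RegEx (plain grep): '+' is literal, so "aa" does not contain "a+".
# grep -c prints 0 and exits non-zero; '|| true' keeps that from failing.
basic=$(echo aa | grep -c 'a+' || true)

# Extended RegEx (grep -E): 'a+' means one or more a.
extended=$(echo aa | grep -Ec 'a+')

# Basic RegEx does find a literal "a+".
literal=$(echo 'a+' | grep -c 'a+')

echo "basic: $basic, extended: $extended, literal: $literal"
```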
$ # the same as "grep x file.txt"
$ grep -P 'xy*' file.txt
$ # find at least one x, still the same as "grep x file.txt"
$ grep -P 'xx*' file.txt
$ # use grep to cat the file
$ grep -P 'x*' file.txt
$ # sloppily also look for British spelling
$ grep -P 'colou*r' file.txt
Same results as with grep -E
$ # search for x followed by at least one y
$ grep -P 'xy+' file.txt
$ # find at least one x, still the same as "grep x file.txt"
$ grep -P 'x+' file.txt
$ # sloppily look for only British spelling
$ grep -P 'colou+r' file.txt
Same results as with grep -E
[ASCII art: a house labeled "Dot", saying "Hi, I'm Dot" next to the . symbol]
.
== any single character
Dot is a wildcard
Dot matches any single character except line breaks
Plus and Star modify whatever comes before them; Dot matches one character in its own place
x.+ == x followed by one or more characters
y.+ == y followed by one or more characters
Works the same in extended, PCRE and basic RegEx
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'x.*' file.txt
$ # search for x followed by at least one other character
$ grep -E 'x.+' file.txt
$ # find Fred-based names ( Freddy, Fredericka, etc. ), but not Fred
$ grep -E 'Fred.+' names.txt
$ # replace r and all chars after it with x
$ echo fred | sed -re 's/r.+/x/'
fx
$ # replace r and all chars before it with x
$ echo fred | sed -re 's/.+r/x/'
xed
$ # replace f followed by any 2 characters with x
$ echo fred | sed -re 's/f../x/'
xd
Repeated Dot ( .., .* or .+ ) doesn’t require the matched characters to be the same
Plus and Star are greedy: they match as many characters as they can
.* matches any run of characters, even an empty one; .+ matches any run of at least one character
$ # show all lines in the file
$ grep '.*' file.txt
$ # show all lines in the file that have at least one character
$ grep -E '.+' file.txt
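Greediness is easiest to see when a pattern could stop early but doesn't. A small sketch with made-up input, assuming GNU sed for -r:

```shell
# '.*' is greedy: f.*d runs from the first f to the LAST d on the line.
greedy=$(echo 'fred fred' | sed -r 's/f.*d/X/')

# Two dots match exactly two characters, so only the first "fred" is replaced.
exact=$(echo 'fred fred' | sed -r 's/f..d/X/')

echo "$greedy"
echo "$exact"
```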
Unless escaped, a period is a dot
grep -i '2018.*.jpg' /var/mail/account # also matches 2018_fred_jpg.png
Use \. to match a period
grep -i '2018.*\.jpg' /var/mail/account # require a period
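To see what the backslash changes, run both patterns over a few file names. The names are invented for this sketch; grep -c counts matching lines.

```shell
names='2018_trip.jpg
2018_fred_jpg.png
2017_old.jpg'

# Unescaped: the last "." is Dot, so "_jpg" in the second name also matches.
loose=$(printf '%s\n' "$names" | grep -c '2018.*.jpg')
# Escaped: "\." requires a real period before "jpg".
strict=$(printf '%s\n' "$names" | grep -c '2018.*\.jpg')

echo "loose: $loose, strict: $strict"
```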
[ASCII art: a house labeled "SCQ", saying "Hi, I'm SCQ" next to the \ symbol]
Backslash escapes whatever comes right after it
\
== escape the next character so it isn’t interpreted as a special character
\.
== period, not dot
$ # find files that end in '.jpg'
$ find ~/Images/ | grep '\.jpg$'
$ # find lines that have a plus symbol in them
$ grep -E '\+' math.txt
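The same trick works for any special character. A sketch with an invented, math.txt-style input, using grep -E so that each backslash clearly turns a special character back into a plain one:

```shell
math='1+1=2
2*3=6
3.5 rounds to 4'

# Each escaped character is matched literally, so each pattern hits one line.
plus_lines=$(printf '%s\n' "$math" | grep -Ec '\+')
star_lines=$(printf '%s\n' "$math" | grep -Ec '\*')
dot_lines=$(printf '%s\n' "$math" | grep -Ec '\.')

echo "plus: $plus_lines, star: $star_lines, period: $dot_lines"
```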
[ASCII art: a house labeled "Collection", saying "Hi, I'm Collection" next to the [ ] symbol]
Surround a collection with square brackets, aka bracket expression
[aeiou]
== any lowercase English vowel ( not counting y )
$ echo abcdefhij | sed -re 's/[aeiou]/./g'
.bcd.fh.j
$ echo abcdefhij | sed -re 's/[a1b2c3]/./g'
...defhij
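A collection matches one character at a time, so grep -o (available in GNU and BSD grep, though not required by POSIX) can print each vowel it finds on its own line. Counting those lines gives a vowel count; the phrase is just an example.

```shell
# -o prints every match on its own line; wc -l counts them.
# tr strips the padding some wc implementations add.
vowels=$(echo 'regular expressions' | grep -o '[aeiou]' | wc -l | tr -d ' ')
echo "$vowels"
```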
[ASCII art: a house labeled "Range", saying "Hi, I make a range" next to the - symbol]
A range can be specified inside a collection
$ echo abcdefhij | sed -re 's/[a-e]/./g'
.....fhij
$ echo 1234567890 | sed -re 's/[1-9]/./g'
.........0
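Several ranges can share one collection. This sketch marks hexadecimal digits in both cases. It sets LC_ALL=C because, outside the C locale, what a range such as a-f covers can depend on the locale's collation order; -r assumes GNU sed.

```shell
# a-f, A-F and 0-9 together cover every hexadecimal digit.
hex=$(echo 'deadBEEF 42 xyz' | LC_ALL=C sed -r 's/[a-fA-F0-9]/./g')
echo "$hex"
```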
[ASCII art: a house labeled "Char Class", saying "Hi, I'm Character Class" next to the [: :] symbol]
$ echo abcdefhij | sed -re 's/[[:ranger:][:mage:][:thief:]]/./g'
sed: -e expression #1, char 18: Invalid character class name
Character classes only work inside collections, hence the double brackets
$ echo abcdefhij | sed -re 's/[[:alpha:]]/./g'
.........
$ echo CiHyFr82oap3 | sed -re 's/[[:lower:]]/./g'
C.H.F.82...3
$ echo CiHyFr82oap3 | sed -re 's/[[:digit:]]/./g'
CiHyFr..oap.
$ echo CiHyFr82oap3 | sed -re 's/[[:alnum:]]/./g'
............
$ ip addr list | grep -E '[12]{,1}[[:digit:]]{1,2}\.'
inet 127.0.0.1/8 scope host lo
inet 10.0.136.18/21 brd 10.0.143.255 scope global dynamic wlan0
$
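$ # all four octets, written out one by one
$ ip addr list | grep -E '([12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2})'
$ # the same, with "octet + period" as a group repeated three times
$ ip addr list | grep -E '(([12]{,1}[[:digit:]]{1,2})\.){3}([12]{,1}[[:digit:]]{1,2})'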
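Because `ip addr list` output differs from machine to machine, a fixed line makes the grouped pattern easier to test. The line below is copied from the sample output above; grep -o prints only the matching parts, and {,1} assumes GNU grep.

```shell
line='inet 10.0.136.18/21 brd 10.0.143.255 scope global dynamic wlan0'

# Three "octet + period" groups, then one more octet.
ips=$(echo "$line" | grep -Eo '(([12]{,1}[[:digit:]]{1,2})\.){3}([12]{,1}[[:digit:]]{1,2})')

printf '%s\n' "$ips"
```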
[:alpha:]
== localized alphabet
[:digit:]
== 0-9
[:alnum:]
== localized alphabet and 0-9
[:blank:]
== space, tab
[:punct:]
== any printable character which is not a blank or an alnum
[:cntrl:]
== control character
More classes, such as [:lower:], [:upper:], [:space:] and [:xdigit:], are listed in
$ man 7 regex
$ echo CiHyFr82oap3 | sed -re 's/[CiH[:digit:]]/./g'
...yFr..oap.
$ echo CiHyFr82oap3 | sed -re 's/[C[:lower:][:digit:]]/./g'
..H.F.......
^
== not, when it is the first character inside a collection
[^a]
== any single character except a
$ # find Fred-based names ( Freddy, Fredericka, etc. ), but not Fred
$ grep -E 'Fred[^ ]+' names.txt
$ # the same, but treat tabs like spaces
$ grep -E 'Fred[^[:blank:]]+' names.txt
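With a few invented names, the negated collection keeps a plain "Fred" out while letting longer names through; grep -o shows exactly what was matched.

```shell
names='Fred
Freddy Smith
Fredericka'

# Fred followed by one or more characters that are not spaces or tabs.
matches=$(printf '%s\n' "$names" | grep -Eo 'Fred[^[:blank:]]+')
printf '%s\n' "$matches"
```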
Branching matches either this or that
|
== branch
$ echo fred | grep -E 'fred|anke'
fred
$ echo anke | grep -E 'fred|anke'
anke
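Branching also works as a filter over many lines at once. A sketch with a made-up list of names:

```shell
names='fred
anke
star'

# Keep only lines that contain either "fred" or "anke".
either=$(printf '%s\n' "$names" | grep -E 'fred|anke')
printf '%s\n' "$either"
```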
A group, aka atom, wraps part of the RegEx in parentheses and remembers its match for later reference as \1, \2 and so on
$ echo fred | sed -re 's/(.*)/\1 \1 \1/'
fred fred fred
$ echo fred anke | sed -re 's/(.*) (.*)/\2 \1/'
anke fred
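Groups combine well with a negated collection. A sketch that turns "Last, First" into "First Last"; the name is invented and -r assumes GNU sed.

```shell
# \1 = everything up to the comma, \2 = everything after ", ".
swapped=$(echo 'Doe, Jane' | sed -r 's/([^,]+), (.+)/\2 \1/')
echo "$swapped"
```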
^
== beginning of line, when it is the first character of the RegEx ( outside a collection )
$
== end of line, when it is the last character of the RegEx
^$
== empty line
^[[:blank:]]*$
== empty line, or a line with only spaces and tabs
$ grep -E root /etc/passwd
root:x:0:0:root:/root:/bin/bash
$ grep -Ec bin /etc/passwd
46
$ grep -E '^bin' /etc/passwd
bin:x:2:2:bin:/bin:/usr/sbin/nologin
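The two empty-line patterns differ only on lines that hold spaces or tabs. A sketch with invented input that also uses sed's /pattern/d command, which deletes matching lines (-r assumes GNU sed):

```shell
# Hypothetical input: text, an empty line, a line of three spaces, text.
empty=$(printf 'first\n\n   \nlast\n' | grep -Ec '^$')
blank=$(printf 'first\n\n   \nlast\n' | grep -Ec '^[[:blank:]]*$')

# Delete every empty or blank-only line.
kept=$(printf 'first\n\n   \nlast\n' | sed -r '/^[[:blank:]]*$/d')

echo "empty: $empty, blank: $blank"
printf '%s\n' "$kept"
```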
Use curly braces to set how many times the previous character may repeat: {n} exactly n times, {n,} at least n, {n,m} from n to m, {,m} at most m ( a GNU extension )
$ echo ddd | sed -re 's/d{1,2}/q/'
qd
$ echo ddd | sed -re 's/d{1}/q/'
qdd
$ echo ddd | sed -re 's/d{1,}/q/'
q
$ # less sloppily also look for British spelling
$ grep -E 'colou{,1}r' file.txt
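A quick check of the interval forms, assuming GNU grep and sed (needed for {,1} and -r); the words are made up for the sketch.

```shell
# {,1}: at most one u, so "colouur" (two u) is not matched.
spellings=$(echo 'color colour colouur' | grep -Eo 'colou{,1}r')
printf '%s\n' "$spellings"

# {2}: exactly two d, so only the first two are replaced.
twice=$(echo ddd | sed -r 's/d{2}/q/')
echo "$twice"
```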
[ASCII art drawing]
Thank you!