RegEx (Regular Expression)
- Python Automation and Machine Learning for ICs -
- An Online Book -
Python Automation and Machine Learning for ICs                                        http://www.globalsino.com/ICs/        



=================================================================================

A RegEx (regular expression) is a sequence of characters that forms a search pattern, which can be used to check whether a string contains the specified pattern. That is, a regex is a string that combines normal and special characters to describe patterns to find within a text.
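
As a minimal sketch of the idea above, Python's built-in re module can check whether a string contains a pattern. The sample text and the variable names are illustrative assumptions, not taken from the book's code.

```python
import re

text = "The rain in Spain"  # illustrative sample text (assumption)

# re.search returns a match object if the pattern occurs anywhere in the
# string, and None otherwise.
found = re.search(r"ai", text) is not None
missing = re.search(r"xyz", text) is not None

print(found, missing)
```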

Table 2258. Characters and their usage.

Character Description Example
[] A set of characters (match one out of several)
"[l-y]"
‘[az]’ will match ‘abc’, ‘xyz’, and ‘az’.
‘[az]’ will not match ‘bc’, ‘xy’, and ‘bcxy’.
[xyz] A character set. Matches any one of the enclosed characters [abc] matches the a in plain.
[^ xyz ] A negative character set. Matches any character not enclosed [^abc] matches the p in plain.
[ a-z ] A range of characters. Matches any character in the specified range.
[abcd] is the same as [a-d]. They match the "b" in "brisket", and the "a" or the "c" in "arch", but not the "-" (hyphen) in "non-profit".
[abcd-] and [-abcd] match the "b" in "brisket", the "a" or the "c" in "arch", and the "-" (hyphen) in "non-profit".
[\w-] is the same as [A-Za-z0-9_-]. They both match any of the characters in "no_reply@example-server.com" except for the "@" and the ".".

"[a-z]" matches any lowercase alphabetic character in the range a through z.
[^ m-z ] A negative range characters. Matches any character not in the specified range [^m-z] matches any character not in the range m through z.
[ character_group ] Matches any single character in character_group. By default, the match is case-sensitive [ae]: "a" in "gray"; "a", "e" in "lane"
[^ character_group ] Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive
[^aei]: "r", "g", "n" in "reign"
[ first - last ] Character range: Matches any single character in the range from first to last [A-Z]: "A", "B" in "AB123"
\ Signals a special sequence (can also be used to escape special characters) "\d"
n matches the character n. "\n" matches a newline character.
The sequence \\ matches \ and \( matches (.
\cX Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to code points U+0001–U+001F). For example, /\cM/ matches "\r" in "\r\n".  
\xhh Matches the character with the code hh (two hexadecimal digits).  
\uhhhh Matches a UTF-16 code-unit with the value hhhh (four hexadecimal digits).  
\u{hhhh} or \u{hhhhh } (Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits).  
     
     
     
\ number Backreference. Matches the value of a numbered subexpression. (\w)\1: "ee" in "seek"
\k< name > Named backreference. Matches the value of a named expression. (?<char>\w)\k<char>: "ee" in "seek"
\a Matches a bell character, \u0007.
"\u0007" in "Error!" + '\u0007'
 
\A To check if certain characters are present at the start of a string or not ‘\Ahell’ will match ‘hello’ and ‘hello world’.
‘\Ahell’ will not match ‘hey hello’.
\Athe the sun: Match; In the sun: No match  
\b

Checks whether certain characters are present at the beginning or end of a word (a word boundary). Note: \b works on the words of a string, not the whole string. In a character class, matches a backspace, \u0008.

‘\bhell’ will match ‘hello world’, ‘hey hello’, and ‘world hell’.
‘\bhell’ will not match ‘hey world’ and ‘world hey’.
[\b]{3,}: "\b\b\b\b" in "\b\b\b\b"
\bfoo football: Match; a football: Match; afootball: No match  
foo\b the foo: Match; the afoo test: Match; the afootest: No match  
\B

Checks whether certain characters are present at a position that is not a word boundary. It is the opposite of \b.

‘\Bhell’ will not match ‘hello world’, ‘hey hello’, and ‘world hell’.
‘\Bhell’ will match ‘shell’ and ‘seashell’.
\Bfoo football: No match; a football: No match; afootball: Match  
foo\B

the foo: No match; the afoo test: No match; the afootest: Match

 
\c X
\c x
Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character \cC: "\x0003" in "\x0003" (Ctrl-C)
     
\d Matches decimal digit 0-9. \d matches if decimal digits are present in the string.

‘\d’ will match ‘hey123’, ‘1234’, and ‘123hello234’.
‘\d’ will not match ‘hey’ and ‘hello’.
12abc3: 3 matches (at 12abc3); Python: No match

\D Matches any character that is not a decimal digit. It is the opposite of \d. ‘\D’ will match ‘hey’, ‘hello’, and the letters in ‘hey123’ and ‘123hello234’.
‘\D’ will not match ‘1234’.

Input = "My phone number is 514-767-2653."
Output = "5147672653"

\e Matches an escape, \u001B. \e: "\x001B" in "\x001B"
\f

Matches a form-feed character. Matches a form feed, \u000C.

[\f]{2,}: "\f\f\f" in "\f\f\f"
\G The match must occur at the point where the previous match ended, or if there was no previous match, at the position in the string where matching started \G\(\d\): "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)"
\l

Changes the case of the next character to the lower case. Use this type of regex in the replace field.

 
\L

Changes the case of all the subsequent characters up to \E to the lower case. Use this type of regex in the replace field.

 
\n Matches a newline character. Matches a new line, \u000A. \r\n(\w+): "\r\nThese" in "\r\nThese are\ntwo lines."
\n Matches n, where n is an octal escape value. Octal escape values should be 1, 2, or 3 digits long \11 and \011 both match a tab character.
\0011 is the equivalent of \001&1.
Octal escape values should not exceed 256. If they do, only the first two digits comprise the expression. Allows ASCII codes to be used in regular expressions.
\p{ name } Matches any single character in the Unicode general category or named block specified by name \p{Lu}, \p{IsCyrillic}: "C", "L" in "City Lights", "Д", "Ж" in "ДЖem"
\P{ name } Matches any single character that is not in the Unicode general category or named block specified by name
\P{Lu}, \P{IsCyrillic}: "i", "t", "y" in "City", "e", "m" in "ДЖem"
\r Matches a carriage return character. Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) \r\n(\w+): "\r\nThese" in "\r\nThese are\ntwo lines."
\s Matches a single whitespace character such as a space, newline, tab, or carriage return  
\S Matches any character not part of \s  
\t Matches a tab character. Matches a tab, \u0009. (\w+)\t: "item1\t", "item2\t" in "item1\titem2\t"
\xn

Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long

\x41 matches A. \x041 is equivalent to \x04&1.
Allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer, denoting a reference back to remembered matches (.)\1 matches two consecutive identical characters
\ nnn Uses octal representation to specify a character (nnn consists of two or three digits) \w\040\w: "a b", "c d" in "a bc d"
sub() The function is used to replace the matched substring with another substring  
subn() The function is similar to the sub() function but it returns a tuple.  
\u

Changes the case of the next character to the upper case. Use this type of regex in the replace field.

 
\u nnnn Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn) \w\u0020\w: "a b", "c d" in "a bc d"
\U

Changes the case of all the subsequent characters up to \E to the upper case. Use this type of regex in the replace field.

 
\v Matches a vertical tab character. Matches a vertical tab, \u000B. [\v]{2,}: "\v\v\v" in "\v\v\v"
\w Matches any single letter, digit, or underscore. \w matches any character that is alphanumeric, that is, a letter, a digit, or an underscore (equivalent to [a-zA-Z0-9_] under ASCII-only matching). ‘\w’ will match ‘heyhello123’, ‘hey’, and ‘12234’.
‘\w’ will not match ‘*&^&*’. String "12&": ;c": 3 matches (at 12&": ;c); string "%"> !": No match
\W Matches any character not part of \w. \W matches character other than alphanumeric characters. It is the opposite of \w. ‘\W’ will not match ‘heyhello123’, ‘hey’, and ‘12234’.
‘\W’ will match ‘*&^&*’.
\x nn Uses hexadecimal representation to specify a character (nn consists of exactly two digits) \w\x20\w: "a b", "c d" in "a bc d"
\$

Finds a $ character.

 
\(, \) Finds brackets  
\\$

This regex entered in the search field, means that you are trying to find a \ character at the end of the line.

 
(?!)

This is a pattern for "negative lookahead"

A(?!B) means that the search will find A, but only if it is not followed by B.
(?=)

This is a pattern for "positive lookahead"

A(?=B) means that the search will find A, but only if it is followed by B.
(?<=)

This is a pattern for "positive lookbehind"

(?<=B)A means that the search will find A, but only if there is B before it.
(?<!) This is a pattern for "negative lookbehind" (?<!B)A means that the search will find A, but only if there is no B before it.
x(?=y) Lookahead assertion: Matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat".
/Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.
 
x(?!y) Negative lookahead assertion: Matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3".
 
(?<=y)x Lookbehind assertion: Matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack". /(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match results.  
(?<!y)x Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') match is not found because the number is preceded by the minus sign.
 
(?<Name>x) Named capturing group: Matches "x" and stores it on the groups property of the returned matches under the name specified by <Name>. The angle brackets (< and >) are required for group name.
Extract the United States area code from a phone number, we could use /\((?<area>\d\d\d)\)/. The resulting number would appear under matches.groups.area.
 
(?:x) Non-capturing group: Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements ([1], …, [n]) or from the predefined RegExp object's properties ($1, …, $9).
 
\k<Name> A back reference to the last substring matching the Named capture group specified by <Name>.
/(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!".
 
\w+ This expression matches one or more alphanumeric characters in the string  
. Any character (except newline character) "my.o"
‘a.z’ will match ‘abz’, ‘a1z’, and ‘azz’.
‘a.z’ will not match ‘abbz’, ‘a11z’, and ‘azzz’.
‘a..z’ will match ‘abcz’, ‘a12z’, and ‘azzz’.
‘a..z’ will not match ‘abz’, ‘abbbbz’, and ‘az’.
\. (dot): A . in a regex is a metacharacter that matches any character; to match a literal dot, you need to escape it ("\.").
Input = 'gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAZZZ.uijjk eeee.xen 12345.3xy'
Pattern = '[\w\.-]+\.[\w\.-]+'
Output = "gfgf.dAAAUVW
               AAZZZ.uijjk
               eeee.xen
               12345.3xy"
^ Starts with. This expression matches the start of a string "^Hellow"
[^Z] Is called a negated character class: it matches anything but Z.  
$ Ends with "Yougui$"
$ number Substitutes the substring matched by group number. Pattern: \b(\w+)(\s)(\w+)\b; Replacement pattern: $3$2$1 Input string: "one two"; Result string: "two one"
${ name } Substitutes the substring matched by the named group name.
Pattern: \b(?<word1>\w+)(\s)(?<word2>\w+)\b; Replacement pattern: ${word2} ${word1} Input string: "one two"; Result string: "two one"
$$ Substitutes a literal "$".
Pattern: \b(\d+)\s?USD; Replacement pattern: $$$1 Input string: "103 USD"; Result string: "$103"
$& Substitutes a copy of the whole match.
Pattern: \$?\d*\.?\d+; Replacement pattern: **$&** Input string: "$1.30"; Result string: "**$1.30**"
$` Substitutes all the text of the input string before the match.
Pattern: B+; Replacement pattern: $` Input string: "AABBCC"; Result string: "AAAACC"
$' Substitutes all the text of the input string after the match.
Pattern: B+; Replacement pattern: $' Input string: "AABBCC"; Result string: "AACCCC"
$+ Substitutes the last group that was captured. Pattern: B+(C+); Replacement pattern: $+ Input string: "AABBCCDD"; Result string: "AACCDD"
$_

Substitutes the entire input string.

Pattern: B+; Replacement pattern: $_ Input string: "AABBCC"; Result string: "AAAABBCCCC"
* Zero or more occurrence "globalsi*". a.*c: "abcbc" in "abcbc"
+    
+? Matches the previous element one or more times, but as few times as possible. "be+?": "be" in "been", "be" in "bent"
{} Exactly the specified number of occurrences "globalsino{3}"
‘a{2,6}’ will match ‘aaaa’, ‘aaaaa’, and ‘aaaaaa’.
‘a{2,6}’ will not match ‘a’ and ‘aaaaaaaaa’.
{n} n is a non negative integer. Matches exactly n times. o{2} does not match the o in Bob, but matches the first two o's in foooood.
",\d{3}": ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210"
{n,} n is a non negative integer. Matches at least n times o{2,} does not match the o in Bob and matches all the o's in "foooood."
o{1,} is equivalent to o+. o{0,} is equivalent to o*.
"\d{2,}": "166", "29", "1930"
{m, n} Matches the preceding element at least m and at most n times
o{1,3} matches the first three o's in "fooooood." o{0,1} is equivalent to o?.
"\d{3,5}": "166", "17668", "19302" in "193024"
|
Either or. Matches any one element separated by the vertical bar (|) character "Global|Sino"
‘a|z’ will match ‘abc’, ‘xyz’, and ‘abcz’.
‘a|z’ will not match ‘bcd’ and ‘xy’.
th(e|is|at): "the", "this" in "this is the day."
(?( expression ) yes | no )
or
(?( expression ) yes )

Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion.

To avoid ambiguity with a named or numbered capturing group, you can optionally use an explicit assertion, like this:

(?( (?= expression ) ) yes | no ) (?(A)A\d{2}\b|\b\d{3}\b): "A10", "910" in "A10 C103 910"
(?( name ) yes | no )
or
(?( name ) yes )
Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no.
(?<quoted>")?(?(quoted).+?"|\S+\s): "Dogs.jpg ", "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""
+ Checks if the preceding character appears one or more times. One or more occurrences
‘ab+z’ will match ‘abz’, ‘abbz’, and ‘abbbbbbz’.
‘ab+z’ will not match ‘az’.
"globalsi+".
"be+": "bee" in "been", "be" in "bent"
.+?    
Input1 = 'gfgfdAAUVW(AZA1234)(ZYNY67e)AAZZZuijjk'
Pattern1 = 'AA(.+?)AA'
Output1 = "UVW(AZA1234)(ZYNY67e)"
? Checks if the preceding character appears exactly zero or one time a?ve? matches the ve in never. "rai?": "rai" in "rain".
?? Matches the previous element zero or one time, but as few times as possible. "rai??": "ra" in "rain"
{ n }? Matches the preceding element exactly n times. ",\d{3}?": ",043" in "1,043.6", ",876", ",543", and ",210" in "9,876,543,210"
{ n ,}? Matches the previous element at least n times, but as few times as possible "\d{2,}?": "166", "29", "1930"
{ n , m }? Matches the previous element between n and m times, but as few times as possible "\d{3,5}?": "166", "17668", "193", "024" in "193024"
() Capture and group  
(regex)   (abc){3} matches abcabcabc. First group matches abc.
\(regex\)   \(abc\){3} matches abcabcabc. First group matches abc.
(?:regex)   (?:abc){3} matches abcabcabc.
( subexpression ) Captures the matched subexpression and assigns it a one-based ordinal number (\w)\1: "ee" in "deep"
(?< name > subexpression )
or
(?' name ' subexpression )
Captures the matched subexpression into a named group (?<double>\w)\k<double>: "ee" in "deep"
(?< name1 - name2 > subexpression )
or
(?' name1 - name2 ' subexpression )
Defines a balancing group definition. For more information, see the "Balancing Group Definition" section in Grouping Constructs
(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$: "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"
(?: subexpression ) Defines a noncapturing group Write(?:Line)?: "WriteLine" in "Console.WriteLine()", "Write" in "Console.Write(value)"
(?imnsx-imnsx: subexpression ) Applies or disables the specified options within subexpression. For more information, see Regular Expression Options A\d{2}(?i:\w+)\b: "A12xl", "A12XL" in "A12xl A12XL a12xl"
(?= subexpression ) Zero-width positive lookahead assertion. \b\w+\b(?=.+and.+): "cats", "dogs" in "cats, dogs and some mice."
i Use case-insensitive matching \b(?i)a(?-i)a\w+\b: "aardvark", "aaaAuto" in "aardvark AAAuto aaaAuto Adam breakfast"
x Ignore unescaped white space in the regular expression pattern. \b(?x) \d+ \s \w+: "1 aardvark", "2 cats" in "1 aardvark 2 cats IV centurions"
(?imnsx-imnsx) Sets or disables options such as case insensitivity in the middle of a pattern.For more information, see Regular Expression Options. \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act"
(?# comment ) Inline comment. The comment ends at the first closing parenthesis \bA(?#Matches words starting with A)\w+\b
# [to end of line] X-mode comment. The comment starts at an unescaped # and continues to the end of the line. (?x)\bA\w+\b#Matches words starting with A
\1 through \9   (abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
\k<1> through \k<99>   (abc|def)=\k<1> matches abc=abc or def=def, but not abc=def or def=abc.
\k'1' through \k'99'   (abc|def)=\k'1' matches abc=abc or def=def, but not abc=def or def=abc.
\g1 through \g99   (abc|def)=\g1 matches abc=abc or def=def, but not abc=def or def=abc.
\g{1} through \g{99}   (abc|def)=\g{1} matches abc=abc or def=def, but not abc=def or def=abc.
\g<1> through \g<99>   (abc|def)=\g<1> matches abc=abc or def=def, but not abc=def or def=abc.
\g'1' through \g'99'   (abc|def)=\g'1' matches abc=abc or def=def, but not abc=def or def=abc.
(?P=1) through (?P=99)   (abc|def)=(?P=1) matches abc=abc or def=def, but not abc=def or def=abc.
\k<-1>, \k<-2>, etc.   (a)(b)(c)(d)\k<-3> matches abcdb.
\k'-1', \k'-2', etc   (a)(b)(c)(d)\k'-3' matches abcdb.
\g-1, \g-2, etc.   (a)(b)(c)(d)\g-3 matches abcdb.
\g{-1}, \g{-2}, et   (a)(b)(c)(d)\g{-3} matches abcdb.
\g<-1>, \g<-2>, etc   (a)(b)(c)(d)\g<-3> matches abcdb.
\g'-1', \g'-2', etc.   (a)(b)(c)(d)\g'-3' matches abcdb.
(a)?\1   (a)?\1 matches aa but fails to match b.
(a)?\2|b   (a)?\2|b matches b in aab.
(a\1?){3}   (a\1?){3} matches aaaaaa.
(\2?(a)){3}   (\2?(a)){3} matches aaaaaa.
*?, +?, ??    
*+, ++, ?+    
{m,n}?    
{m,n}+ x{m,n}+ is equivalent to (?>x{m,n}).  
^a...s$ Match alias and abyss  
[abc] 1 match in "a", 2 matches in "ac", 5 matches in "abc de ca"  
[a-e] [a-e] is the same as [abcde].  
[1-4] [1-4] is the same as [1234].  
[0-39] [0-39] is the same as [01239].  
[^abc] [^abc] means any character except a or b or c.  
[^0-9] [^0-9] means any non-digit character.
 
^ab 1 match in "abc"  
ma*n

mn: 1 match; man: 1 match; maaan: 1 match; main: No match (a is not followed by n); woman: 1 match

 
ma+n mn: No match (no a character); man: 1 match; maaan: 1 match; main: No match (a is not followed by n); woman: 1 match  
ma?n

mn: 1 match; man: 1 match; maaan: No match (more than one a character); main: No match (a is not followed by n); woman: 1 match

 
a{2,3}

abc dat: No match; abc daat: 1 match (at daat); aabc daaat: 2 matches (at aabc and daaat); aabc daaaat: 2 matches (at aabc and daaaat)

 
[0-9]{2,4} ab123csde: 1 match (match at ab123csde); 12 and 345673: 3 matches (12, 3456, 73); 1 and 2: No match  
a|b cde: No match; ade: 1 match (match at ade); acdbea: 3 matches (at acdbea)  
(a|b|c)xz ab xz: No match; abxz: 1 match (match at abxz); axz cabxz: 2 matches (at axz cabxz)  
\$a \$a match if a string contains $ followed by a. Here, $ is not interpreted by a RegEx engine in a special way.  
.* Greedy: matches as much as possible, so a pattern usually yields only one match. It first matches all the way to the end, and then backtracks until the rest of the pattern can match.
.*? = A[^Z]*Z = A.*?Z. Reluctant (lazy): matches as little as possible, so there can be more than one match. It first matches nothing, and then takes extra characters one at a time until the rest of the pattern matches.
Input1 = 'gfgfdAAAUVW(AZA1234)(ZYNY67e)AAZZZuijjk'
Pattern1 = ".*AAA(.*)ZZZ.*"
Output1 = ('UVW(AZA1234)(ZYNY67e)AA',)
Input2 = 'gf(gf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars.) '
Pattern2 = "\(.*\)"
Output2 = "(gf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars.)"
Input3 = 'gf(gf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars.) '
Pattern3 = "\(.*?\)"
Output3 = "(gf.dAAAUVW(AZA1234)
                  (ZYNY67e)"
Input4 = 'gf(gf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars.) '
Pattern4 = "A.*?Z"
Output4 = "AAAUVW(AZ
                   A1234)(Z
                   AAareZ"
Input5 = 'gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars. '
Pattern5 = "A.*Z"
Output5 = "AAAUVW(AZA1234)(ZYNY67e)AAareZZZ"
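
The difference between the greedy A.*Z and the lazy A.*?Z can be sketched with re.findall on the Input5 string. Using findall here is an assumption; the original code behind these examples is not shown on this page.

```python
import re

text = ("gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, "
        "those are cars. ")

# Greedy: .* runs to the end of the string and backtracks to the last "Z".
greedy = re.findall(r"A.*Z", text)

# Lazy: .*? stops at the first "Z" it can reach, so several matches result.
lazy = re.findall(r"A.*?Z", text)

print(greedy)
print(lazy)
```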
*? Matches the previous element zero or more times, but as few times as possible a.*?c: "abc" in "abcbc"
(subexpression) Matches subexpression and remembers the match. If a part of a regular expression is enclosed in parentheses, that part of the regular expression is grouped together. Thus a regex operator can be applied to the entire group.
If you need to use the matched substring within the same regular expression, you can retrieve it using the backreference \num, where num = 1..n.
If you need to refer the matched substring somewhere outside the current regular expression (for example, in another regular expression as a replacement string), you can retrieve it using the dollar sign $num, where num = 1..n.
If you need to include the parentheses characters into a subexpression, use \( or \).
 
re.A Perform ASCII-only matching instead of full Unicode matching  
re.I Performs case-insensitive matching  
re.L Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior(\b and \B).  
re.M Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).  
re.S Makes a period (dot) match any character, including a newline.  
re.U Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B.  
re.X Permits "cuter" regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.  
String Input1 = 'gfgfdAAAUVW(AZA1234)(ZYNY67e)AAZZZuijjk'
Pattern1 = "AAA"
Output1 = "UVW(AZA1234)(ZYNY67e)AA"
Input2 = 'gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars. '
Pattern2 = re.match(r'(.*) are (.*?) .*', text, re.M|re.I)
Output2 = Contents before and after the string "are":
matchObj.group() : gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those are cars.
matchObj.group(1) : gfgf.dAAAUVW(AZA1234)(ZYNY67e)AAareZZZ.uijjk eeee.xen 12345.3xy, those
matchObj.group(2) : cars.
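
Several rows in the table above use the .NET or JavaScript spelling of named groups, (?<name>x) and \k<name>. Python's re module spells these (?P<name>x) and (?P=name), and its lookbehind must have a fixed width. The following is a minimal sketch of the backreference, named-group, and lookaround rows in Python; the phone string is an illustrative assumption.

```python
import re

# Numbered backreference: (\w)\1 finds a doubled character.
doubled = re.search(r"(\w)\1", "seek").group()

# Named group and named backreference, Python spelling.
named = re.search(r"(?P<char>\w)(?P=char)", "seek").group("char")

# Negative lookahead: digits that are not followed by a decimal point.
no_point = re.search(r"\d+(?!\.)", "3.141").group()

# Positive lookbehind: "Sprat" only when preceded by "Jack".
after_jack = re.search(r"(?<=Jack)Sprat", "JackSprat").group()

# Named capturing group used to pull out an area code.
area = re.search(r"\((?P<area>\d\d\d)\)", "(514) 767-2653").group("area")

print(doubled, named, no_point, after_jack, area)
```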
     
     
     
     
     
     
     
     
     
     
     
     

 

Project: Search phone number. Link. Code.
Input = "My phone number is 514-567-5678"
Pattern = "\d\d\d-\d\d\d-\d\d\d\d"
Output = "514-567-5678"
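
A minimal sketch of this project, assuming re.search is used (the linked code is not reproduced here):

```python
import re

text = "My phone number is 514-567-5678"

# Three digits, hyphen, three digits, hyphen, four digits.
match = re.search(r"\d\d\d-\d\d\d-\d\d\d\d", text)
phone = match.group() if match else None

print(phone)
```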

Project: Search file names with a pattern. Link. Code.
Input = "["20201023_U8Z6_VYA.png"]"
Pattern = "([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[A-Z][0-9][A-Z][0-9]_[A-Z][A-Z][A-Z]).png"
Output = "20201023_U8Z6_VYA
20201023_U8Z6_VYA.png"

Project: Search, match and ^.  Link. Code:
Input = "...Lia"
Pattern = "ui Lia ui Liao"
Output = ^marked "ui Lia ui Liao"

Project: Clean text by removing special characters and double spaces. Link. Code:
Input = "My#name$&is#&Yougui Liao. I%live$in##$#US."
Pattern1 = "[#$%&]"
Pattern2 = "    " to " "
Output = "My name is Yougui Liao. I live in US.
                My name is Yougui Liao. I live in US."
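
The two cleaning steps can be sketched with re.sub. Replacing each special character with a space and then collapsing runs of spaces is one possible reading of Pattern1 and Pattern2, not necessarily the linked code.

```python
import re

raw = "My#name$&is#&Yougui Liao. I%live$in##$#US."

# Step 1: replace every special character with a single space.
no_specials = re.sub(r"[#$%&]", " ", raw)

# Step 2: collapse runs of two or more spaces into one space.
cleaned = re.sub(r" {2,}", " ", no_specials)

print(cleaned)
```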

Project: Create and verify the password with a specific condition. Link. Code.
Input = "YOugui17928^"
Pattern = "\d{4,8}[a-zA-Z]{2,}.*[^!@$%&]$"
Output = "YOugui17928^"
Project: Create and verify the password with a specific condition. Link. Code.
Input = "12345YOugui17928^"
Pattern = "\d{4,8}[a-zA-Z]{2,}.*[^!@$%&]$"
Output = "12345YOugui17928^"


Project: Find the time. Link. Code:
Input = "Hellow, your appointment has been confirmed for 1st of september 2022 18:30"
Pattern = "\d{1,2}[a-z]{2}\sof\s[a-zA-Z]+\s\d{4}\s\d{1,2}:\d{2}"
Output = "['1st of september 2022 18:30']
               1st of september 2022 18:30"
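
A minimal sketch of this search, assuming re.findall produces the list shown in the output:

```python
import re

text = ("Hellow, your appointment has been confirmed for "
        "1st of september 2022 18:30")

# Day with ordinal suffix, "of", month name, four-digit year, then HH:MM.
pattern = r"\d{1,2}[a-z]{2}\sof\s[a-zA-Z]+\s\d{4}\s\d{1,2}:\d{2}"
times = re.findall(pattern, text)

print(times)
```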

Project: Find match with w+ and ^: Link. Code.
Input = "Yougui06,globalsino is my website"
Pattern = "^\w+"
Output = "['Yougui06']"

Project: Extract email addresses. Link. Code.
Input = "yougui.liao@google.com, what is it?, Hellow@hotmail.com, globalsino@yahoomail.com"
Pattern = "[\w\.-]+@[\w\.-]+"
Output = "yougui.liao@google.com
                Hellow@hotmail.com
                globalsino@yahoomail.com"
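
A minimal sketch of the extraction, assuming re.findall. The pattern is deliberately loose and is not a full email validator.

```python
import re

text = ("yougui.liao@google.com, what is it?, Hellow@hotmail.com, "
        "globalsino@yahoomail.com")

# Word characters, dots, or hyphens on both sides of "@".
emails = re.findall(r"[\w\.-]+@[\w\.-]+", text)

for email in emails:
    print(email)
```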

Project: Count how many times a string (here "my") occurs in the text. Link. Code.
Input = "Hey, my god! My name is Yougui Liao! Myserli!"
Pattern = "my", Input.lower()
Output = "['my', 'my', 'my']"

Project: Simple matching with the beginning and end letter. Link. Code.
Input = "YZH"
Pattern = "Y.H$"
Output = "['YZH']"

Project: Simply match the beginning of a word. Link. Code.
Input = "YZH sd"
Pattern = "^Y"
Output = "['Y']"

Project: Simply match the beginning and end of a word. Link. Code.
Input = "abbz", "abz", "az", "YZdZH"
Pattern = "ab*z"
Output = "['abbz']
                ['abz']
                ['az']
                []"

Project: Split a string by the number with split(). Link. Code.
Input = "sdl332sdp98hwen93eue4"
Pattern = "\d"
Output = "['sdl', '', '', 'sdp', '', 'hwen', '', 'eue', '']"
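
The empty strings in the output appear because \d splits at every single digit, so adjacent digits leave nothing between them. A short sketch follows; the \d+ variant is an added comparison, not part of the original project.

```python
import re

text = "sdl332sdp98hwen93eue4"

# Splitting on each digit leaves empty strings between adjacent digits.
per_digit = re.split(r"\d", text)

# Splitting on runs of digits avoids most of the empty strings.
per_run = re.split(r"\d+", text)

print(per_digit)
print(per_run)
```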

Project: Use sub() to replace something. Link. Code.
Input1 = "hellow1234HowAreYou"
Input2 = "6666"
Pattern = "\d+"
Output = "hellow6666HowAreYou"
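
A minimal sketch of the replacement with re.sub, where Input2 is the replacement string:

```python
import re

source = "hellow1234HowAreYou"
replacement = "6666"

# Replace every run of digits with the replacement string.
result = re.sub(r"\d+", replacement, source)

print(result)
```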

Project: Extract substrings between brackets (including brackets). Link. Code.
Input = "gfgfd_(AAA1234)_ZZZuijjk"
Pattern = "_\((.+?)\)_"
Output = "AAA1234
               (AAA1234)"
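
A minimal sketch of both outputs. The second pattern, which keeps the brackets, is an assumption about how the bracketed form was produced.

```python
import re

text = "gfgfd_(AAA1234)_ZZZuijjk"

# Group 1 holds the text between "_(" and ")_", without the brackets.
inner = re.search(r"_\((.+?)\)_", text).group(1)

# Matching the brackets themselves keeps them in the result.
with_brackets = re.search(r"\(.+?\)", text).group()

print(inner)
print(with_brackets)
```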

Starting from here, they are new ....


Project: Find keywords with .findall(). Link. Code.
Input = "Scattering but of some electron diffusion caused by the gradual loss of the Electron Energy and by multiple scattering."
Pattern = re.findall(r"electron", myString), re.findall(r"electron", myString, re.I), re.findall(r"electron", myString, re.IGNORECASE)
Output = "['electron']
              ['electron', 'Electron']
              ['electron', 'Electron']"
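
A minimal sketch of the three calls; re.I is simply the short name of re.IGNORECASE.

```python
import re

myString = ("Scattering but of some electron diffusion caused by the gradual "
            "loss of the Electron Energy and by multiple scattering.")

case_sensitive = re.findall(r"electron", myString)
ignore_case_short = re.findall(r"electron", myString, re.I)
ignore_case_long = re.findall(r"electron", myString, re.IGNORECASE)

print(case_sensitive)
print(ignore_case_short)
print(ignore_case_long)
```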

Project: Break string at newline sign \n. Link.Code.
Input = "ML\nand AI\n, Scattering but of some electron diffusion caused by the gradual loss of the Electron Energy and by multiple scattering."
Pattern = (r".+", myString), (r".+", myString, re.S), (r".+", myString, re.DOTALL)
Output = "ML
              With re.S flag: ML
              and AI
              , Scattering but of some electron diffusion caused by the gradual loss of the Electron Energy and by multiple scattering.
              With re.DOTALL flag: ML
              and AI
              , Scattering but of some electron diffusion caused by the gradual loss of the Electron Energy and by multiple scattering."
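
A minimal sketch of the effect of re.S (re.DOTALL), assuming re.search is the call used. Without the flag, the dot stops at the first newline.

```python
import re

myString = ("ML\nand AI\n, Scattering but of some electron diffusion caused by "
            "the gradual loss of the Electron Energy and by multiple scattering.")

# Default: "." does not match "\n", so only the first line is returned.
first_line = re.search(r".+", myString).group()

# With re.S / re.DOTALL, "." also matches "\n" and the whole string is returned.
with_s = re.search(r".+", myString, re.S).group()
with_dotall = re.search(r".+", myString, re.DOTALL).group()

print(first_line)
print(with_s)
```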

Project: Check the matches of letters at the beginning and digits at the end of the string. Link. Code.
Input = "Scattering but of some electron diffusion caused by the 123456"
Pattern = "(^\w{2,}).+(\d{5}$)"
Output = "Scattering
              23456"

Project: Find 3-letter word at the start and n-digit at the end of each newline, and 3-letter word at the start of each newline. Link. Code.
Input = "Scattering, 123, but 897\noff some electron \nof diffusion caused by the 123456"
Pattern = r"^\w{3}", r"\d{2}$", (r"^\w{3}", re.MULTILINE), (r"\d{2}$", re.M)
Output = "['Sca']
              ['56']
              ['Sca', 'off'] # Do not have 'of'
              ['97', '56']"
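
A minimal sketch of how re.MULTILINE changes ^ and $, assuming re.findall. The third line starts with "of ", which is only two word characters, so it is not returned.

```python
import re

text = ("Scattering, 123, but 897\noff some electron \n"
        "of diffusion caused by the 123456")

start_default = re.findall(r"^\w{3}", text)              # start of string only
end_default = re.findall(r"\d{2}$", text)                # end of string only
start_per_line = re.findall(r"^\w{3}", text, re.MULTILINE)
end_per_line = re.findall(r"\d{2}$", text, re.M)

print(start_default, end_default)
print(start_per_line, end_per_line)
```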

Project: Find 3-letter word and match all and only 3-letter ASCII word and numbers. Link. Code.
Input = "廖廖廖廖 廖你廖 Scattering, 123, but 897\noff some electron \nof diffusion caused by the 123456"
Pattern = r"\b\w{3}\b", (r"\b\w{3}\b", re.ASCII)
Output = "['廖你廖', '123', 'but', '897', 'off', 'the']
               ['123', 'but', '897', 'off', 'the']"
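
The two outputs differ because \w and \b are Unicode-aware by default in Python 3 and ASCII-only under re.ASCII. A minimal sketch, assuming re.findall:

```python
import re

text = ("廖廖廖廖 廖你廖 Scattering, 123, but 897\noff some electron \n"
        "of diffusion caused by the 123456")

# Default (Unicode) matching treats CJK characters as word characters.
unicode_words = re.findall(r"\b\w{3}\b", text)

# With re.ASCII, only [a-zA-Z0-9_] count as word characters.
ascii_words = re.findall(r"\b\w{3}\b", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```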

Project: Extracts a substring between/before/after two words. Link.Code.
Input = "Scattering, 123, but 897 off some electron of diffusion caused by the 123456"
Pattern = "but(.+?)123456"
Output = " 897 off some electron of diffusion caused by the "

Project: Check if two strings match. Link. Code.
Input = "Cookie A"
Pattern = "Cookie A"
Output = "Matches!"

Project: Finds match for the pattern to the end if it occurs at start of the string. Link. Code.
Input = "Scattering, 123, but 897 off some electron of diffusion caused by the 123456"
Pattern = r"Scattering", .start(), .end()
Output = "0
              10"


Project: Search for the first occurrence of the RE pattern anywhere in the string and return its start and end indices (only the first occurrence is returned; spaces count toward the index). Link. Code.
Input = "Scattering, 123, but 897 off some electron of diffusion caused by the 123456"
Pattern = .search(r"of"), .start(), .end()
Output = "25
              27"
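
A minimal sketch of start() and end() for both re.match (anchored at the start of the string) and re.search (first occurrence anywhere). Note that the first "of" found is the one inside "off".

```python
import re

text = ("Scattering, 123, but 897 off some electron of diffusion caused "
        "by the 123456")

# re.match only succeeds if the pattern occurs at position 0.
at_start = re.match(r"Scattering", text)
match_span = (at_start.start(), at_start.end())

# re.search scans forward and returns only the first occurrence.
first_of = re.search(r"of", text)
search_span = (first_of.start(), first_of.end())

print(match_span, search_span)
```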

Project: Find all strings/words with start/first letters. Link. Code.
Input = "Scattering, 123, but 897 off some electron of diffusion caused by being the 123456 with the fit."
Pattern = .findall(r'\b[aeioufgAEIOU]\w+')
Output = "['off', 'electron', 'of', 'fit']"

Project: Find all digits in a string. Link.
Input = "Scattering, 123, but 897 off some electron of diffusion caused by being the 123456 with the fit, 12th December 2012."
Pattern = .compile('\d'), .findall()
Output = "['1', '2', '3', '8', '9', '7', '1', '2', '3', '4', '5', '6', '1', '2', '2', '0', '1', '2']"

Project: Check the letters in a string layer by layer. Link. Code.
Input = "efgh"
Pattern = "(e(f)g)h" with bracket
Output = "efgh
               efg
               f"
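
Groups are numbered by the position of their opening parenthesis, which gives the layer-by-layer output above. A minimal sketch, assuming re.match:

```python
import re

m = re.match(r"(e(f)g)h", "efgh")

whole = m.group(0)   # the entire match
outer = m.group(1)   # first "(" opens the outer group
inner = m.group(2)   # second "(" opens the nested group

print(whole, outer, inner)
```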

Project: Find words/strings with certain patterns. Link. Code.
Input = "Scattering good 123456 but 897 off some electron of hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .findall(r'\w*oo\w*'), .findall(r'\w*ut\w*')
Output = "['good', 'cool', 'yooy']
               ['but', 'hyutt']"

Project: Find/split all strings and numbers from a string and replace space with hyphen. Link. Code.
Input = "Scattering good 123456 but 897 off some electron of hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = r"\w", .split(r"\s"), .sub(r"\s", "-")
Output = "['Scattering', 'good', '123456', 'but', '897', 'off', 'some', 'electron', 'of', 'hyutt', 'diffusion', 'cool', 'caused', 'yooy', 'by', 'being', 'the', '123456', 'with', 'the', 'fit,', '12th', 'December', '2012.']
               Scattering-good-123456-but-897-off-some-electron-of-hyutt-diffusion-cool-caused-yooy-by-being-the-123456-with-the-fit,-12th-December-2012."

Project: Extract words by their first letter: any letter, a lowercase letter, or an uppercase (capital) letter. Link. Code.
Input = "4545 456 Scattering good 45 456 Cars 123456 but 897 off Abcdef bvndef some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .compile(r'[A-z]\w+'), .compile(r'[a-z]\w+'), .compile(r'[A-Z]\w+')
Output = "['Scattering', 'good', 'Cars', 'but', 'off', 'Abcdef', 'bvndef', 'some', 'electron', 'of', 'Hyutt', 'diffusion', 'cool', 'caused', 'yooy', 'by', 'being', 'the', 'with', 'the', 'fit', 'th', 'December']
               ['cattering', 'good', 'ars', 'but', 'off', 'bcdef', 'bvndef', 'some', 'electron', 'of', 'yutt', 'diffusion', 'cool', 'caused', 'yooy', 'by', 'being', 'the', 'with', 'the', 'fit', 'th', 'ecember']
               ['Scattering', 'Cars', 'Abcdef', 'Hyutt', 'December']"

Project: groups() returns a tuple of subgroups that match the given string of numbers. Link. Code.
Input = "my number is +4545 456 12345 Good Scattering 45 456 Cars 123456 but 897 off Abcdef bvndef some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .compile(r'(\+\d{4}) (\d{3}) (\d{5})')
Output = "('+4545', '456', '12345')"

Project: Password rule to ensure at least one uppercase letter, at least one lowercase letter, at least one digit, one special character and at least 8 characters long. Link. Code.
Input = "inY*m123"
Pattern = "^(?=.*?[A-Z]) # ensures user inputs at least one uppercase letter
              (?=.*?[a-z]) # ensures user inputs at least one lowercase letter
              (?=.*?[0-9]) # ensures user inputs at least one digit
              (?=.*?[#?!@$%^&*-]) # ensures user inputs one special character
              .{8,}$ # ensures that password is at least 8 characters long"
Output = "Password in correct format: inY*m123"
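The commented pattern above can be compiled directly with `re.VERBOSE`, which lets the inline `#` comments live inside the pattern. This is only a sketch; the helper name `is_valid_password` is an assumption made for illustration:

```python
import re

password_rule = re.compile(r"""
    ^(?=.*?[A-Z])          # at least one uppercase letter
    (?=.*?[a-z])           # at least one lowercase letter
    (?=.*?[0-9])           # at least one digit
    (?=.*?[#?!@$%^&*-])    # at least one special character
    .{8,}$                 # at least 8 characters in total
""", re.VERBOSE)

def is_valid_password(candidate):
    """Return True when every lookahead and the length rule are satisfied."""
    return password_rule.match(candidate) is not None

print(is_valid_password("inY*m123"))  # True
print(is_valid_password("inY*m12"))   # False: only 7 characters
```

Each `(?=...)` lookahead checks one requirement without consuming characters, so the four checks are independent of the order in which the characters appear.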

Project: Search with keystring, and get start and end index with span(). Link. Code.
Input = "my number is +4545 456 12345 good Scattering 45 456 Cars 123456 but 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = re.finditer("goo.", .span()
Output = "(29, 33)
              (90, 94)"

Project: Match words with a character set or range placed before 'at'. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .findall("t[h-m]at", .findall("[h-m]at", .findall("[hspm]at"
Output = "that

              hat
              mat

              hat
              sat
              mat"

Project: Match with a negated character range ([^h-m] matches any character not in the range h through m). Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .findall("[^h-m]at"
Output = "sat
              cat"

Project: Replace a word found by searching by another word. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = re.compile("[s]at"), regex.sub("NICE"
Output = "my number that NICE mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."

Project: Add spaces between letters. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = " "
Output = " m y n u m b e r t h a t s a t m a t i s + 4 5 4 5 4 5 6 1 2 3 4 5 g o o d S c a t t e r i n g 4 5 4 5 6 C a r s 1 2 3 4 5 6 b u t i n g 8 9 7 o f f A b c d e f b v n d e f g o o g l e s o m e e l e c t r o n o f H y u t t d i f f u s i o n c o o l c a u s e d y o o y b y b e i n g t h e 1 2 3 4 5 6 w i t h t h e f i t , 1 2 t h D e c e m b e r 2 0 1 2 ."

Project: Extract phone number from a webpage with space: \s; bracket: \( and \). Link. Code.
Input = url = "http://www.summet.com/dmsi/html/codesamples/addresses.html"
Pattern = r"\(\d{3}\)\s\d{3}-\d{4}" # space: \s; bracket: \( and \)
Output = "(257) 563-7401
               (372) 587-2335
               (786) 713-8616
               (793) 151-6230
               (492) 709-6392
               (654) 393-5734"

Project: check that a string contains only a certain set of characters. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
         = "*y&%@#!}{"
Pattern = r'[^a-zA-Z0-9,+.\s]' # \s: space
Output = "True
               False"

Project: matches a string that consists of a 'b' followed by zero or more 'a's. Link. Code.
Input = "bc
             bbc  
             b 
             ba 
             baaa"
Pattern = "^b(a*)$"
Output = "Not matched!
               Not matched!
               Found a match!
               Found a match!
               Found a match!"

Project: matches a string that has a 'b' followed by one or more 'a's (lazy quantifier +?). Link. Code.
Input = "ba
             bad
             baad"
Pattern = "ba+?"
Output = "Found a match!
             Found a match!
             Found a match!"

Project: matches a string that has a 'b' followed by an 'a'. The pattern shown, ba+?, is a lazy one-or-more; zero or one 'a' would be written ba?. Link. Code.
Input = "ab"
            "bad"
            "ba"
Pattern = "ba+?"
Output = "None
             <re.Match object; span=(0, 2), match='ba'>
             <re.Match object; span=(0, 2), match='ba'>"

Project: matches a string that has a 'b' followed by exactly three 'a's. Link. Code.
Input = "baaa"
            "baaaaac"
            "ba"
Pattern = "ba{3}?"
Output = "<re.Match object; span=(0, 4), match='baaa'>
               <re.Match object; span=(0, 4), match='baaa'>
               None"

Project: matches a string that has a 'b' followed by two to three 'a's. Link. Code.
Input = "baaa"
            "baaaaac"
            "ba"
Pattern = "ba{2,3}"
Output = "<re.Match object; span=(0, 4), match='baaa'>
                <re.Match object; span=(0, 4), match='baaa'>
                None"
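The quantifier entries above (`*`, `+?`, `?`, `{3}`, `{2,3}`) can be compared side by side. The following is a rough sketch; `first_match` is a hypothetical helper that returns the text of the first match, or None:

```python
import re

def first_match(pattern, s):
    """Return the text of the first match of pattern in s, or None."""
    m = re.search(pattern, s)
    return m.group(0) if m else None

print(first_match(r"ba+", "baad"))       # 'baa'  greedy: as many 'a's as possible
print(first_match(r"ba+?", "baad"))      # 'ba'   lazy: as few 'a's as possible
print(first_match(r"ba?", "ab"))         # 'b'    zero or one 'a'
print(first_match(r"ba{3}", "baaaaac"))  # 'baaa' exactly three 'a's
print(first_match(r"ba{2,3}", "ba"))     # None   needs at least two 'a's
print(first_match(r"ba{2,3}?", "baaa"))  # 'baa'  lazy form stops at two
```

A trailing `?` after a quantifier changes how much is consumed, not whether the string matches, which is why `ba+` and `ba+?` both report "Found a match!" for the same inputs.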

Project: find sequences of lowercase letters joined by an underscore. Link. Code.
Input = "baa_cbba"
            "baaa_Haac"
            "ba_BB"
Pattern = "^[a-z]+_[a-z]+$"
Output = "<re.Match object; span=(0, 8), match='baa_cbba'>
            None
            None"

Project: find a sequence of one or more uppercase letters followed by lowercase letters at the end of a string. Link. Code.
Input = "CaBbGg"
            "Python"
            "ba_BB"
            "PYTHON"
            "aBCd"
Pattern = "[A-Z]+[a-z]+$"
Output = "<re.Match object; span=(4, 6), match='Gg'>
               <re.Match object; span=(0, 6), match='Python'>
               None
               None
               <re.Match object; span=(1, 4), match='BCd'>"

Project: Matches a string that has a 'b' followed by anything and ends in 'a'. Link. Code.
Input = "CaBbGg"
            "Python"
            "ba_BB"
            "PYbTHOa"
            "aBCd"
Pattern = "b.*?a$"
Output = "None
               None
               None
               <re.Match object; span=(2, 7), match='bTHOa'>
               None"

Project: matches a word at the beginning of a string. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{"
            " Hellow, this is Yougui Liao"
Pattern = "^\w+"
Output = "<re.Match object; span=(0, 2), match='my'>
               None
               None"

Project: matches a word at the end of a string, with optional punctuation (ending with space). Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{ "
            " Hellow, this is Yougui Liao "
            " Hellow, this is Yougui Liao. "
Pattern = "\w+\S*$"
Output = "<re.Match object; span=(211, 216), match='2012.'>
               None
               None
               None"

Project: matches a word containing an 'a' that is followed by at least one more character. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{ "
            " Hellow, this is Yougui Liao "
            " Hellow, this is Yougui Liao. "
            " Hellow, this is Yougui Liao."
            " Hellow, this is Yougui Liao"
Pattern = "\w*a.\w*"
Output = "<re.Match object; span=(10, 14), match='that'>
            None
            <re.Match object; span=(24, 28), match='Liao'>
            <re.Match object; span=(24, 28), match='Liao'>
            <re.Match object; span=(24, 28), match='Liao'>
            <re.Match object; span=(24, 28), match='Liao'>"

Project: match a string that contains only upper and lowercase letters, numbers, underscores and dots. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{ "
            "a Hellow, athis is aYouguiaa aLiaoa asdaa."
            " Hellow this is Yougui Liao 6"
            " Hellow, this is Yougui Liao."
            "Python_Exercises_1."
Pattern = "^[a-zA-Z0-9_.]*$"
Output = "None
            None
            None
            None
            None
            <re.Match object; span=(0, 19), match='Python_Exercises_1.'>"

Project: check whether a string starts with a specific digit. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{ "
            "a Hellow, athis is aYouguiaa aLiaoa asdaa."
            "6 Hellow this is Yougui Liao 6"
            "6-2374451"
            "6_Python_Exercises_1."
Pattern = r"^6"
Output = "None
            None
            None
            <re.Match object; span=(0, 1), match='6'>
            <re.Match object; span=(0, 1), match='6'>
            <re.Match object; span=(0, 1), match='6'>"

Project: remove leading zeros (at beginning) from an IP address. Link. Code.
Input = "0345.06.045.245"
Pattern = .sub('\.[0]*', '.', '[0]'
Output = "0345.6.45.245
               345.6.45.245"
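One way to strip the zeros in a single pass is sketched below. It uses a different pattern from the two-step substitution in the entry (a word boundary plus a lookahead), and the function name `strip_leading_zeros` is hypothetical:

```python
import re

def strip_leading_zeros(ip):
    """Remove zeros at the start of each dot-separated field.

    \\b0+ matches zeros at the start of a field; the lookahead (?=\\d)
    requires a digit to remain, so a field that is just "0" is kept.
    """
    return re.sub(r"\b0+(?=\d)", "", ip)

print(strip_leading_zeros("0345.06.045.245"))  # 345.6.45.245
print(strip_leading_zeros("10.0.0.1"))         # 10.0.0.1
```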

Project: check for a number at the end of a string. Link. Code.
Input = "my number that sat mat is +4545 456 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
            "*y&%@#!}{ "
            "a Hellow, athis is aYouguiaa aLiaoa asdaa."
            "6 Hellow this is Yougui Liao 6"
            "6-2374451"
            "6_Python_Exercises_1."
Pattern = r".*[0-9]$"
Output = "None
               None
               None
               <re.Match object; span=(0, 30), match='6 Hellow this is Yougui Liao 6'>
               <re.Match object; span=(0, 9), match='6-2374451'>
               None"

Project: Search/split numbers (0-9) of length between 1 and 3 in a given string. Link. Code.
Input = "667, 2, 34, my number that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .finditer(r"([0-9]{1,3})"
Output = "<callable_iterator object at 0x00000166E8098670>
               667
               2
               34
               454
               5
               456
               657
               123
               45
               45
               456
               123
               456
               897
               123
               456
               12
               201
               2"

https://www.globalsino.com/ICs/page2332.html >>

Project: search for literal strings within a string/sentence. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = ["Yougui", "Liao", "Globalsino"]
Output = "Matched for Yougui, Matched for Liao, Matched for Globalsino "

Project: search for a literal string in a string and also find the location within the original string where the pattern occurs. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = ["Yougui", "Liao", "Globalsino"]
Output = "Matched for Yougui
               Found "Yougui" in "" from 20 to 26
               Matched for Liao
               Found "Liao" in "" from 27 to 31
               Matched for Globalsino
               Found "Globalsino" in "" from 35 to 45 "

Project: find the substrings within a string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = 'Yougui'
Output = Found "Yougui"


Project: find the occurrence/counts and position of substrings within a string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = "Yougui"
Output = "Found "Yougui" at 20:26
               Found "Yougui" at 58:64"


Project: extract year, month and date from a URL. Link. Code.
Input = "https://www.washingtonpost.com/sports/2023/06/09/when-is-uefa-champions-league-final/#CPEVA24SZJBJDFWDBG5B5NBWOI-4"
Pattern = .findall(r'/(\d{4})/(\d{1,2})/(\d{1,2})/'
Output = [('2023', '06', '09')]


Project: convert a date from yyyy-mm-dd format to a different format, e.g. dd-mm-yyyy. Link. Code.
Input = "2026-01-02"
Pattern = .sub(r'(\d{4})-(\d{1,2})-(\d{1,2})', '\\3-\\2-\\1'
Output = The date in YYY-MM-DD Format: 02-01-2026
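A short sketch of the reordering, using numbered backreferences in the replacement string; the function name `to_dd_mm_yyyy` is an assumption made for illustration:

```python
import re

def to_dd_mm_yyyy(date_text):
    """Swap yyyy-mm-dd into dd-mm-yyyy using the three captured groups."""
    return re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", r"\3-\2-\1", date_text)

print(to_dd_mm_yyyy("2026-01-02"))  # 02-01-2026
```

Writing the replacement as a raw string (`r"\3-\2-\1"`) avoids the doubled backslashes needed in a normal string literal.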


Project: match if two words from a list of words start with the letter 'B'. Link. Code.
Input = ["Boy Baby", "This is Yougui Liao"]
Pattern = (B\w+)\W(B\w+)
Output = ('Boy', 'Baby')


Project: separate and extract the numbers in a given string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .split("\D+"
Output = "667
               2
               34
               4545
               456
               657
               12345
               45
               456
               123456
               897
               123456
               12
               2012"

Project: find all words starting with specific letters 'T' or 'Y' in a given string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .findall("[TY]\w+"
Output = ['This', 'Yougui', 'Yougui']


Project: separate and extract the numbers and their position in a given string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = "\d+"
Output = "667
             Index position: 0
             2
             Index position: 5
             34
             Index position: 8
             4545
             Index position: 83
             456
             Index position: 88
             657
             Index position: 92
             12345
             Index position: 96
             45
             Index position: 118
             456
             Index position: 121
             123456
             Index position: 130
             897
             Index position: 144
             123456
             Index position: 236
             12
             Index position: 257
             2012
             Index position: 271"

Project: abbreviate/simplify/shorten 'Liao' as 'L.' in a given string. Link. Code.
Input = "This is Yougui Liao"
Pattern = .sub('Liao$', 'L.'
Output = "This is Yougui L."


Project: replace all occurrences of a space, comma, or dot with a colon. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good Scattering 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .sub("[ ,.]", ":"
Output = "667::2::34::This:is:Yougui:Liao:at:Globalsino::my:number::Yougui::that:sat:mat:is:+4545:456:657:12345:good:Scattering:45:456:Cars:123456:buting:897:off:Abcdef:bvndef:google:some:electron:of:Hyutt:diffusion:cool:caused:yooy:by:being:the:123456:with:the:fit::12th:December:2012:"


Project: replace maximum 2 occurrences of space, comma, or dot with a colon. Link. Code.
Input = "This, is Yougui Liao."
Pattern = .sub("[ ,.]", ":", myStringC, 2)
Output = "This::is Yougui Liao."


Project: find all six-character words in a string. Link. Code.
Input = "This, is Yougui Liao, hellow, youtube."
Pattern = .findall(r"\b\w{6}\b"
Output = ['Yougui', 'hellow']

Project: find all three, four, five, six character words in a string. Link. Code.
Input = "This, is Yougui Liao, hellow, youtube."
Pattern = .findall(r"\b\w{3,6}\b"
Output = ['This', 'Yougui', 'Liao', 'hellow']


Project: find all words that are at least 3 characters long in a string. Link. Code.
Input = "This, is Yougui Liao, hellow, youtube."
Pattern = .findall(r"\b\w{3,}\b"
Output = ['This', 'Yougui', 'Liao', 'hellow', 'youtube']


Project: convert a camel-case string to a snake-case string. Link. Code.
Input = "YouguiLiao"
Pattern = re.sub('([a-z0-9])([A-Z])', r'\1_\2', re.sub('(.)([A-Z][a-z]+)', r'\1_\2', myStringD)).lower()
Output = "yougui_liao"
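The nested call in the Pattern line is easier to follow when split into two steps. The sketch below wraps it in a hypothetical `camel_to_snake` function:

```python
import re

def camel_to_snake(name):
    # Step 1: put an underscore before each capitalised word
    # (an uppercase letter followed by lowercase letters).
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Step 2: split any remaining lowercase/digit-to-uppercase boundary,
    # then lowercase the whole string.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1).lower()

print(camel_to_snake("YouguiLiao"))       # yougui_liao
print(camel_to_snake("getHTTPResponse"))  # get_http_response
```

The second substitution is what handles runs of capitals such as "HTTP", which the first pass alone would leave attached to the preceding word.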


Project: convert snake-case string to camel-case string. Link. Code.
Input = "yougui_liao"
Pattern = "''.join(x.capitalize() or '_' for x in myStringE.split('_'))"
Output = "YouguiLiao"

Project: extract values between quotation marks of a string. Link. Code.
Input = '"Yougui", "Liao", "Globalsino"'
Pattern = .findall(r'"(.*?)"'
Output = ['Yougui', 'Liao', 'Globalsino']


Project: remove multiple spaces from a string. Link. Code.
Input = "Yougui             Liao"
Pattern = .sub(' +',' '
Output = "Yougui Liao"


Project: remove all whitespaces from a string. Link. Code.
Input = "           Yougui             Liao"
Pattern = .sub(r'\s+', ''
Output = "YouguiLiao"


Project: remove everything except alphanumeric characters from a string. Link. Code.
Input = "**/yougui liao// - 12. "
Pattern = .compile('[\W_]+'), .sub(''
Output = "youguiliao12"

Project: find URLs/http/webpage in a string. Link. Code.
Input = "<p>Contents :</p><a href="https://www.globalsino.com">Python Examples</a><a href="http://google.com">These are some examples</a>"
Pattern = .findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
Output = ['https://www.globalsino.com', 'http://google.com']


Project: split a string at its uppercase letters. Link. Code.
Input = "ThisIsYouguiLiaoYoutube"
Pattern = .findall('[A-Z][^A-Z]*'
Output = ['This', 'Is', 'Yougui', 'Liao', 'Youtube']


Project: remove ANSI escape sequences (terminal color codes) from a string. Link. Code.
Input = "\t\u001b[0;35mglobalsino.com\u001b[0m \u001b[0;36m457.93.278.298\u001b[0m"
Pattern = .compile(r'\x1b[^m]*m'), .sub(''
Output = "globalsino.com 457.93.278.298"
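A minimal sketch of this substitution, with hypothetical variable names. The pattern matches the escape character `\x1b` followed by everything up to and including the next `m`:

```python
import re

ansi_escape = re.compile(r"\x1b[^m]*m")

raw = "\t\u001b[0;35mglobalsino.com\u001b[0m \u001b[0;36m457.93.278.298\u001b[0m"

# Replace every color-code sequence with the empty string.
clean = ansi_escape.sub("", raw)
print(clean.strip())  # globalsino.com 457.93.278.298
```

This simple pattern only targets sequences that end in `m` (color and style codes); other terminal control sequences would need a broader pattern.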


Project: find all adverbs and their positions in a given sentence. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good really? Scattering slowly 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .finditer(r"\w+ly", m.start(), m.end()
Output = "107-113: really
                126-132: slowly"

Project: split a string with multiple delimiters. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good really? Scattering slowly 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .split('; |, |\*|\.|\n'
Output = "['667', '2', '34', 'This is Yougui Liao at Globalsino', ' my number', 'Yougui', 'that sat mat is +4545 456 657 12345 good really? Scattering slowly 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit', '12th December 2012', '']"


Project: check whether a number has at most 2 decimal places. Link. Code.
Input ="1234.6799"
            "1234.679"
            "1234.67"
            "1234.6"
            "34."
Pattern = r"""^[0-9]+(\.[0-9]{1,2})?$"""
Output = "None
             None
             <re.Match object; span=(0, 7), match='1234.67'>
             <re.Match object; span=(0, 6), match='1234.6'>
             None"
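To see which inputs the pattern accepts, the check can be run over all five strings at once. This is a sketch; `two_decimals` and `results` are hypothetical names:

```python
import re

# Digits, optionally followed by a dot and one or two more digits.
two_decimals = re.compile(r"^[0-9]+(\.[0-9]{1,2})?$")

samples = ["1234.6799", "1234.679", "1234.67", "1234.6", "34."]
results = {s: bool(two_decimals.search(s)) for s in samples}

for s, ok in results.items():
    print(s, ok)
```

Because the fractional group is `{1,2}`, a single decimal digit ("1234.6") is accepted, while a bare trailing dot ("34.") is not.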


Project: remove words from a string/number of length between 1 and a given number (5 here). Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. my number, Yougui, that sat mat is +4545 456 657 12345 good really? Scattering slowly 45 456 Cars 123456 buting 897 off Abcdef bvndef google some electron of Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .compile(r'\W*\b\w{1,5}\b')
Output = " Yougui Globalsino number, Yougui really? Scattering slowly 123456 buting Abcdef bvndef google electron diffusion caused 123456 December."


Project: remove the parenthesized areas in a string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. (my number, Yougui, that sat mat is +4545 456 657 12345) (good really? Scattering slowly 45 456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .sub(r" ?\([^)]+\)", ""
Output = "667, 2, 34, This is Yougui Liao at Globalsino. Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."

Project: insert spaces between words starting with capital letters. Link. Code.
Input = "ThisIsYouguiLiaoYoutube"
            "YouguiLiao"
Pattern = .sub(r"(\w)([A-Z])", r"\1 \2"
Output = "This Is Yougui Liao Youtube
               Yougui Liao"


Project: remove lowercase/uppercase/capital substrings from a given string. Link. Code.
Input = "667, 2, 34, This is Yougui Liao at Globalsino. (my number, Yougui, that sat mat is +4545 456 657 12345) (good really? Scattering slowly 45 456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .sub('[a-z]', ''; .sub('[A-Z]', ''
Output = "667, 2, 34, T Y L G. ( , Y, +4545 456 657 12345) ( ? S 45 456 C 123456 897 ) (A ) H 123456 , 12 D 2012.
               667, 2, 34, his is ougui iao at lobalsino. (my number, ougui, that sat mat is +4545 456 657 12345) (good really? cattering slowly 45 456 ars 123456 buting 897 off ) (bcdef bvndef google some electron of) yutt diffusion cool caused yooy by being the 123456 with the fit, 12th ecember 2012."


Project: concatenate the consecutive numbers in a given string (remove spaces between numbers). Link. Code.
Input = "667 789, 2, 20 34, This is Yougui Liao at Globalsino. (my number, Yougui, that sat mat is +4545 456 657 12345) (good really? Scattering slowly 45 456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = .sub(r"(?<=\d)\s(?=\d)", ''
Output = "667789, 2, 2034, This is Yougui Liao at Globalsino. (my number, Yougui, that sat mat is +454545665712345) (good really? Scattering slowly 45456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
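The pattern relies on lookbehind and lookahead assertions: a whitespace character is removed only when a digit sits on both sides, and the digits themselves are not consumed. A small sketch, with a hypothetical function name:

```python
import re

def join_adjacent_numbers(text):
    # (?<=\d) : the previous character is a digit
    # \s      : the single whitespace character to delete
    # (?=\d)  : the next character is a digit
    return re.sub(r"(?<=\d)\s(?=\d)", "", text)

print(join_adjacent_numbers("667 789, 2, 20 34"))    # 667789, 2, 2034
print(join_adjacent_numbers("+4545 456 657 12345"))  # +454545665712345
```

The space in "2, 20" survives because the character before it is a comma, not a digit.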



Project: convert a given string to lowercase words joined by hyphens (the code joins with '-', i.e. kebab case rather than snake case). Link. Code.
Input = "667 789, 2, 20 34, This is Yougui Liao at Globalsino. (my number, Yougui, that sat mat is +4545 456 657 12345) (good really? Scattering slowly 45 456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
Pattern = '-'.join(sub(r"(\s|_|-)+"," ", sub(r"[A-Z]{2,}(?=[A-Z][a-z]+[0-9]*|\b)|[A-Z]?[a-z]+[0-9]*|[A-Z]|[0-9]+", lambda mo: ' ' + mo.group(0).lower(), myStringA)).split())
Output = "667-789,-2,-20-34,-this-is-yougui-liao-at-globalsino.-(-my-number,-yougui,-that-sat-mat-is-+-4545-456-657-12345)-(-good-really?-scattering-slowly-45-456-cars-123456-buting-897-off-)-(-abcdef-bvndef-google-some-electron-of)-hyutt-diffusion-cool-caused-yooy-by-being-the-123456-with-the-fit,-12-th-december-2012."



Project: join the parts of a phone number with hyphens. Link. Code.
Input = "514 657 9876"
Pattern = '-'.join(sub(r"(\s|_|-)+"," ", sub(r"[A-Z]{2,}(?=[A-Z][a-z]+[0-9]*|\b)|[A-Z]?[a-z]+[0-9]*|[A-Z]|[0-9]+", lambda mo: ' ' + mo.group(0).lower(), myStringL)).split())
Output = "514-657-9876"

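Project: Find the longest string with max(..., key = len); without a key, max() returns the lexicographically greatest string instead. Link. Code.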
Input = ("667 789, 2, 20 34", "This is Yougui Liao at Globalsino.", "my number, Yougui, that", "sat mat is +4545")
Pattern = print(max(myStringA, key = len))
               print(max(myStringA))
Output = "This is Yougui Liao at Globalsino.
               sat mat is +4545"



Project: checks whether a word starts and ends with a vowel in a given string. Return true if a word matches the condition; otherwise, return false. Link. Code.
Input = "456 657 12345) (good really? Scattering slowly 45 456 Cars 123456 buting 897 off ) (Abcdef bvndef google some electron of) Hyutt diffusion cool caused yooy by being the 123456 with the fit, 12th December 2012."
               "ThisIsYouguiLiaoYoutube"
               "YouguiLiao""
Pattern = .findall('[/^[aeiou]$|^([aeiou]).*\1$/'
Output = "False
               True
               True"

Project: Find the number of times a word or phrase occurs in a text. Link. Code.
Input = "667 789, 2, 20 34, This is Yougui Liao at Globalsino., my number, Yougui Liao,Yougui sat mat is +4545"
Pattern = [r'Yougui'], [r'Yougui Liao']
Output = "['Yougui', 'Yougui', 'Yougui']
               3
               ['Yougui Liao', 'Yougui Liao']
               2"
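A compact way to count each phrase is to take the length of the `findall` result. The sketch below uses `re.escape` so that the phrases are treated as literal text; the variable names are hypothetical:

```python
import re

text = ("667 789, 2, 20 34, This is Yougui Liao at Globalsino., "
        "my number, Yougui Liao,Yougui sat mat is +4545")

phrases = ["Yougui", "Yougui Liao"]

# findall returns every non-overlapping occurrence; its length is the count.
counts = {p: len(re.findall(re.escape(p), text)) for p in phrases}
print(counts)  # {'Yougui': 3, 'Yougui Liao': 2}
```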

============================================

Search phone number. [Code and output figures]

============================================

Search information (some characters are searchable and some are not; refer to Table 2258). [Code and output figures]

============================================

Search file names with a pattern. [Code and output figures]

============================================

Search, match and ^. [Code and output figures]
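As a rough illustration of how `re.search`, `re.match` and the `^` anchor differ (the sample word is an assumption, not taken from the original figures):

```python
import re

word = "concat"

# re.match only tries the pattern at position 0.
at_start = re.match(r"cat", word)
# re.search scans the whole string for the first match.
anywhere = re.search(r"cat", word)
# A leading ^ anchors re.search to the start, so here it behaves like re.match.
anchored = re.search(r"^cat", word)

print(at_start, anywhere.span(), anchored)  # None (3, 6) None
```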

============================================

Clean text by removing special characters and double spaces. [Code and output figures]
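One possible cleaning routine is sketched below. The function name and the choice of what counts as a "special character" (anything other than letters, digits and whitespace) are assumptions:

```python
import re

def clean_text(text):
    # Drop everything except ASCII letters, digits and whitespace.
    no_specials = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse runs of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", no_specials).strip()

print(clean_text("Hello,,  world!!  #2024"))  # Hello world 2024
```

Removing the special characters first matters: deleting them can leave new runs of spaces, which the second substitution then collapses.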

============================================

Create and verify a password under specific conditions:
# A minimum of 4 and a maximum of 8 digits at the beginning: \d{4,8}, checked with .match()
# The digits must be followed by a minimum of 2 and a maximum of 6 letters, either capital or small: [a-zA-Z]{2,6}
# After that, it can contain any characters: .*
# It cannot end with any of the symbols !, @, $, %, &: [^!@$%&]$
# $ anchors the pattern to the end of the string. [Code and output figures]
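The rules listed above can be assembled into one pattern, as in the following sketch. The function name `check_password` and the sample passwords are assumptions. Note that `{4,8}` must be written without a space, since Python treats `{4, 8}` as literal text:

```python
import re

# 4-8 digits, then 2-6 letters, then anything, ending in a character
# that is not one of ! @ $ % &
password_pattern = re.compile(r"\d{4,8}[a-zA-Z]{2,6}.*[^!@$%&]$")

def check_password(candidate):
    # .match() anchors the pattern at the start of the string.
    return password_pattern.match(candidate) is not None

print(check_password("1234abc"))       # True
print(check_password("1234abc!"))      # False: ends with '!'
print(check_password("123ab9"))        # False: only 3 leading digits
print(check_password("123456789xy9"))  # False: 9 leading digits
```

Because `.*` follows the letter group, the upper bound of 6 letters is not actually enforced: extra letters are simply absorbed by `.*`.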

============================================

Find the time:
# The ordinal number can have 1 or 2 digits. It is followed by st, nd, rd, or th (so 2 small letters): \d{1,2}[a-z]{2}
# After that we have whitespace: \s, then the word "of", and whitespace again: \s
# We want to match any letter (capital or small) at least one time: [a-zA-Z]+ . Then, whitespace: \s
# A number of 4 digits must follow: \d{4}, and whitespace: \s
# Then, match a number of 1 or 2 digits for the hour: \d{1,2},
# followed by a colon : and a number of two digits for the minutes: \d{2}. [Code and output figures]
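Putting the pieces described above together gives a single pattern. In this sketch the sample sentence is invented for illustration:

```python
import re

# ordinal day, "of", month name, 4-digit year, then hour:minutes
time_pattern = r"\d{1,2}[a-z]{2}\sof\s[a-zA-Z]+\s\d{4}\s\d{1,2}:\d{2}"

sentence = "The concert starts on 23rd of June 2018 17:54 at the hall."
found = re.findall(time_pattern, sentence)
print(found)  # ['23rd of June 2018 17:54']
```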

============================================

Find match with \w+ and ^. [Code and output figures]

============================================

Extract email addresses:
# The findall() function is used to search for all occurrences that match a given pattern.
# In contrast, the search() function returns only the first occurrence that matches the specified pattern. [Code and output figures]
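The contrast between `findall()` and `search()` can be seen on a short sample. The text and the deliberately simple email pattern below are assumptions; real-world address validation is considerably more involved:

```python
import re

text = "Contact alice@example.com or bob.smith@mail.example.org for details."

# local part, "@", then one or more dot-separated domain labels
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

all_emails = re.findall(email_pattern, text)        # every occurrence
first_email = re.search(email_pattern, text).group(0)  # first occurrence only

print(all_emails)   # ['alice@example.com', 'bob.smith@mail.example.org']
print(first_email)  # alice@example.com
```

The group is written as non-capturing `(?:...)` so that `findall()` returns whole addresses rather than only the captured part.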

============================================

Count how many times the string "xy" occurs in the text. [Code and output figures]

============================================

Simple matching with the beginning and end letter. [Code and output figures]

============================================

Simply match the beginning of a word. [Code and output figures]

============================================

Simply match the beginning and end of a word. [Code and output figures]

============================================

Split a string by the numbers in it with split(). [Code and output figures]
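A minimal sketch of splitting on numbers, with an invented sample string. Each run of digits acts as one delimiter:

```python
import re

sample = "abc12def345ghi"

# \d+ treats every run of consecutive digits as a single separator.
parts = re.split(r"\d+", sample)
print(parts)  # ['abc', 'def', 'ghi']
```

Using `\d` instead of `\d+` would split at every single digit and leave empty strings between adjacent digits.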

============================================

Use sub() to replace something. [Code and output figures]

============================================

Extract substrings between brackets (including brackets). [Code and output figures]

============================================
=================================================================================