Tell us where you're logging in from today. I think we've got our captions corrected now. They'll be ready in just a moment.
I guess I should be able to do that
All right, looks like captions are starting to work now.
Perfect. Here we go. All right. So
welcome, everybody. We are now about three minutes away from getting started talking all about regex here on iThemes Training with Paul Gilzow. If you're just joining us, I'm dropping the link to today's slides there in the chat. You can download that slide deck and follow along if you'd like. There's also the link to view the replay after we finish up; that will be available about an hour or so after we wrap up today. That's a shareable link as well, if you want to share it with others who also need to know the glories of regex. Yes,
exactly.
Yeah, there should be a badge of some sort that you can add to your online profile if you understand regex.
I think it should be the XKCD where he drops in from the sky and says
Yes, absolutely, like that. Yes, yes. All right, folks, the check-in question today is: on a scale of one to 10, how would you rate your understanding of regex? Let us hear from you there in the chat. Hey, Vern, welcome. Eddie, Lidia, welcome. Glad everybody's here. Hey, Jan. Welcome, Kylie. Zero, that works. Perfect. I'd probably give myself about a one or a one and a half. I know regex exists, so I think I'm above a one, but not much more than that. All right.
Well, we start at zero. How
about that. There you go, we go up. Yeah. Vern says Google is my friend. I agree. And I'll tell you, I have found people on Fiverr to hire for a quick little bit of regex work here and there over the years. We were talking about AI and regex a little bit earlier, when Paul and I were chatting before we started things off today. There are lots of options, but we need to know the basics and have a framework for understanding what's happening.
He says, yeah, I know regex from Linux.
Yeah. sed and grep. Yeah.
So just about a minute to go before we get started. Welcome, everybody. If you're just joining us in Zoom, pop open that chat, say hi, and tell us where you're logging in from today. I'm going to drop the slide link and the resource link there in the chat if you want to download today's slides, or go ahead and pop open the replay link, where you can rewatch the webinar training about an hour after we've wrapped up. Share that with others as well. It'll be a free, open link for anybody who wants to view it, so it'd be a great thing to share out later. Lots of good stuff coming up in the next hour with Paul Gilzow talking about regex and demystifying the hieroglyphics. I love that title. We have a lot to learn. I have a lot to learn for sure about this topic.
And I'm looking forward to that
where East Coast nice
Kylie from North Carolina welcome. I love North Carolina. We get there. You've been there several times. Yeah. Love it up
there. Just about ready to get started
about 10 seconds to go and we'll start the recording and kick things off officially here talking about regex sharing
post. You've already focuses.
It is three minutes after, so I'm going to start that recording and we'll dive right in. Well, good afternoon, everybody, and welcome to another live iThemes Training event. My name is Nathan Ingram. I'm the host here at iThemes Training, and I'm joined today by Paul Gilzow. Paul is a developer relations engineer at Platform.sh. He's a former programmer and analyst principal at the University of Missouri. He's also a web application security evangelist, a software instructor, and a conference presenter, which is where I met Paul, at WordCamp Birmingham last month. Welcome, Paul. Glad you're here with us at iThemes Training. How are you today?
Thank you very much. I'm doing great. I'm anxious to share what I've got to show everybody today.
Yeah, so we were talking right as we kicked off today about how some people say "reg-ex" and some people say "rej-ex". Paul, what camp are you in?
I'm in the "reg-ex" camp, with that hard G, because it's regular expressions, so I figure it's best to say "reg-ex". Though there's a chance I may slip up and say "rej-ex" during the presentation, so I'll get that out of the way ahead of time.
I don't think there'll be any lynch mobs forming for you, so we'll be okay there. Yeah, right. We'll be just fine. So we have a lot to cover talking about regular expressions today. Give us a little bit of an overview of where we're going to go over the next hour.
Sure. We're going to talk about what regular expressions are, talk a little bit about what they're not, and talk about use cases for regular expressions outside of programming as well as inside programming. Then we're going to dive into those hieroglyphics. We're going to look at those symbols and walk through each of them, and I'll show you live how choosing and using different symbols changes the matches and what we find using a regular expression. And then at the end, we're going to use regular expressions to solve a puzzle.
Oh, fun. All right. So in our pre-show, we took a little poll to see where people were on their understanding of regex, and most were very low, like zero, one, or two out of 10. So will someone like myself, with a very low understanding of regex, be able to get something out of today's training?
Absolutely. Absolutely. I start at zero and we build up. I don't assume you know anything. But my goal though, is even if you do know something I'm hoping to show you some things you may not have seen before or a new way of explaining it and to help you understand it a little bit better.
Yeah, excellent. So I'm looking forward to this. Paul and I were also talking about AI and the fact that you can use something like ChatGPT to generate a regular expression. But honestly, you have to know a little bit about the whole idea of what regex is even to use a tool like ChatGPT. This should be a lot of fun. All right, a couple of housekeeping notes and I'll turn it over to Paul. If you're just joining us in Zoom, welcome. We're glad you're here. Pop open the chat and tell us hi and where you're logging in from today. As you chat, make sure that the little blue box beside "To" is set to Everyone. It usually defaults to Hosts and Panelists for most folks, so if you'd like others to see what you say, make sure it's set to Everyone. Also, pop open the Q&A there. If you mouse over the shared screen in your Zoom window, there's a Q&A icon. That's the spot to ask your questions, and you can do that at any time during the presentation today. What I would suggest is that you keep that Q&A box open, because as questions are asked, you'll see the little thumbs-up icon below each question, and you can thumbs-up a question that you have as well. When we get to our time of Q&A at the wrap-up today, we will take those questions in order of upvotes. So with that, I'm going to dis... oh, we will have a replay also. One more thing there: let me just drop the link bundle in the chat as well for anybody just coming in. The link to today's slides is there in the chat, as well as the link to the replay. It'll be up about an hour or so after we finish up today, along with the transcript and anything that's shared in the chat log. So now I will disappear and turn it over to Paul. Let's get started. Perfect. All right. So
thank you for that introduction, Nathan. As you said, my name is Paul Gilzow. I am with a company called Platform.sh. If you've never heard of us before, we are a secure, enterprise-grade platform as a service for building and scaling your applications and websites. We handle the infrastructure and deployment so you can focus on building that next great application. However, that's not why you're here. You're here to learn about regular expressions. So, as I mentioned earlier, we're going to dive in. We're going to talk about what they are, and we're going to go through each of the symbols that you see often, to hopefully give you some clarity as to what those symbols are doing. I do hope that you follow along. If you go out to my GitHub profile, I've got this "rej-ex"... see, I told you I might mess up... this regex presentation pinned up there. It contains a list of word banks that I'm going to be using during this presentation, along with a tool called regex101.com. I've got that up right here. We're going to be using that to help build out these regular expressions and see how, as we use these different symbols, it changes what we match. So what are regular expressions? Regular expressions originated back in the early 1950s with a mathematician named Stephen Cole Kleene, who was trying to describe regular languages in formal language theory. What he wanted to do was build a formula, an algebraic way to describe languages. And while that's interesting from an academic standpoint, that's probably not what you think of when you hear the phrase "regular expression". What you think of is probably something like this.
Alright, so if
you've never used regular expressions, and you're looking at that and having that same reaction, or if you've used them before and you have this reaction every time you have to try to use them, I get that. Regular expressions kind of have a bad rap because there is a bit of a learning curve. My goal today is, again, to introduce you to them and hopefully be that aha moment, that moment of clarity where it starts to make sense. So for this webinar, when I say regular expression, what I'm referring to is a sequence of characters that we use to represent a pattern we're trying to locate inside a larger body of text. Now, some things that regular expressions are not. They are not a programming language. While they kind of look like a programming language, and we certainly use them in programming languages, they're not one. We do have some branching and some basic logic, but they're not a programming language. They're also not unlearnable. Again, as I mentioned, they kind of get a bad rap. In fact, I had a previous attendee say this to me during the presentation in frustration, because they can be difficult to learn sometimes. But they are a very powerful tool for you to have in your toolbox. Now, I know personally I'm guilty of this: whenever I get a new powerful tool, suddenly all of my problems look like they can be solved with that tool. So the other thing to remember is that they are not the solution to every problem. You've probably seen this comic from XKCD, where he's got a problem and he's going to use regex to solve it, and now he has one more problem. And it's true: in certain scenarios, using a regular expression can actually introduce more issues into the problem you're trying to solve. So we're going to talk a little bit about some things to do to ensure you're not doing that, and some appropriate situations in which to use them. So what can we use them for?
Well, you can kind of think of them as Ctrl+F on steroids. We can use them to find text inside larger bodies of text. I've got an example here. We're in a Word doc or a Google Doc, and it's an article on wolves. Let's say I'm doing some research, and what I need to do in this document, and we'll pretend it's much bigger, is find every instance of the word "wolf" or "wolves". Now, I could bring up Ctrl+F, except I've got to do it in the right window here. I could bring up Ctrl+F and type out "wolf", and sure enough, it's matching "wolf", but it's not matching "wolves". So I'd have to come back over and do "wolves", and now it's matching "wolves". But I also don't want red wolves. I want brown wolves and gray wolves, and even rare wolves is okay, but I don't want any matches on red wolves. I'm not sure if you knew this, but in both Word and Google Docs we can use regular expressions. By building a regular expression and using it, notice now I'm matching gray wolf but not red wolves; rare wolves, gray wolves, brown wolves, but again, not red wolves. So we can use those to help us find text in situations where Ctrl+F might not work. We can also use regular expressions to validate text. When we're receiving text or information from an external source, we can use regular expressions to ensure that the text is what we want and in the format that we're anticipating. We can also use them for string manipulation. We can take a body of text, pluck out pieces from it, and then reformat them into something else that we need. I've got another example here, if Zoom will let me click on it, inside of Google Sheets, and you can do this inside of Microsoft Excel as well, where I've got some phone numbers. Maybe I imported these from an external service, and they're in all different kinds of formats. I've got some with dashes, some with parens and spaces, others with dots.
And what I need is for these to be in the exact same format, every single one. So again, using some regular expressions, I can go in and pluck those pieces, those dates, or not those dates, excuse me, those numbers, out of that original source and reformat them so every single phone number is in the exact format that I need. All right, so how do we actually use... yes, so basically, they're magic. They're not magic. That's the thing: they look like magic, but they're not magic. So how do we actually use them? I said earlier that regular expressions are simply a sequence of characters that we use to represent patterns. We have lots of different types of characters we can use to build a regular expression. We have literal characters. We have special characters. As we begin to combine different special characters, we may need to build character classes. As those character classes expand in what they contain, at some point we might need shorthand character classes or character sequences. Just lots and lots and lots of characters. So my idea today is to start at the beginning, show you each and every one, and show you live how it changes what is matched. We're going to start with literal characters. Literal characters are exactly what they sound like. In this example it is literally an f, followed by an o, followed by an o. If you're following along in that regex presentation, this is foo.txt. I've simply taken that word bank and dropped it in here. If this is not big enough to see, please let me know and I'll blow it up a bit more. Up here at the top is where I can enter my regular expression. I'm going to use the characters f, o, and o. And hopefully you can now see that, sure enough, this regular expression engine is using that literal expression to match every instance of "foo", no matter where it shows
up. Now, I'd like to point out that it didn't find "Foo" with a capital F, because by default regular expression engines are case sensitive when they read a regular expression. So if I need to match a capital F, I'll have to use the literal capital F character instead of the lowercase one. Another piece I'd like to point out: notice that before my expression and after my expression I've got those forward slashes. Those are referred to as delimiters. Delimiters are what define the edges, or the boundaries, of our regular expression. So when we give this regular expression to a regular expression engine, we're saying that what's inside the delimiters is the regular expression. Most engines will allow you to change that character. The reason is that sometimes you're going to want to actually search for that character, such as in some/path/to/foo. In this case, notice that my regular expression builder is already saying, hey, there's a problem here, because you're trying to use the delimiter character inside your expression, and that's going to cause a problem. In this particular tool, I can change the delimiter right here on the left, and now I've got a valid regular expression. All right, I've used the term "regular expression engine" a couple of times now and I haven't defined it, so I want to pause real quick and talk about regular expression engines. A regular expression engine is simply a piece of software that can ingest a regular expression and process it against that larger body of text. It is sometimes referred to as a flavor. If you'll notice, back in the tool I'm using, there on the left-hand side it lists multiple, and if you can still see me, I'm putting up air quotes, multiple "flavors": multiple versions or implementations of a regular expression engine. I'm going to give you several warnings today, and this isn't a "hey, don't do this". It's more of a "hey, be aware of this". The reason there are multiple flavors is very similar to the scenario we had back in the 2000s.
If you're old enough to have been around in the 2000s and you were online, you might remember that most websites had something that said "best viewed in IE" or "Netscape Navigator" or something like that. That was because back then every browser could implement its own proprietary set of features that may or may not work in another browser. Regular expression engines are the same way. They're all able to implement features inside the engine that may or may not work somewhere else, which is why it's so important that you can change the flavor in your tool to match where you're going to be using the expression. This is partially because the standards are very loose. There's actually only one true standard for regular expressions, and that's POSIX Basic Regular Expressions. The problem with it is that it is very basic. It's very limited in what it supports, and it's also pretty old. If you have worked with regular expressions a little bit, or if after today you start working with them some more, you'll probably come across PCRE, which stands for Perl Compatible Regular Expressions. It is not a standard, although it's often adopted by many as a quasi-standard. It's actually an open source library written in C, whose original purpose was to take the feature set from Perl and build out new regular expression engines that were compatible with that feature set. It is a living open source project, so it has gone through multiple versions. In fact, at one point Perl adopted features from Perl Compatible Regular Expressions to make itself more compatible with it. And if you can see this, I'm drawing big circles, right? It's kind of circular. But the important piece to remember is, again, to always test in a regex tool. I'm also going to try to give you a series of bonuses, or extra tips, and the first one today is to use a regular expression builder.
You have so many options now that we didn't have in the past. There's not only the one I'm using today, which again is regex101, but there's RegExr, and there are others. If you prefer an installed option, you cannot go wrong with RegexBuddy on Windows. It is unfortunately only on Windows, but it's phenomenal. There's also Expressions on Mac. Take advantage of these, because they give you a ton of information. Hopefully you can see my mouse here highlighting these: not only can you choose the different flavors, it gives you warnings and errors if there's a problem. Up in this one, it will explain exactly what it's doing. So if I change that back to foo, it'll say, hey, I'm trying to match literally the f character followed by the o. It shows you what it's matching inside your body of text, and you can click on each match and highlight it. And it gives you a bunch of quick references. So there's tons of information that you can take advantage of. All right, so we're going to jump right into all these characters. These are the 12 special characters that we're going to go through. The first one is what's referred to as an escape character. Previously, when I was doing that some/path/to/foo and we got that error, if I hover over it, you'll notice it says, hey, you've got an unescaped delimiter: you're using the character that marks the boundary of the expression inside the expression itself. What you can do in most languages, and in most of these expression engines, is use the escape character, the backslash. What the backslash special character does is tell the regular expression engine, hey, don't treat this next character the way you normally would; treat it with its alternate meaning. So in this case, I could say, hey, escape that forward slash, and now it's going to try to actually find that forward slash character. All right, so we're going to jump into the next set. The first one is the caret symbol.
It's part of a subgroup called anchors, and the caret symbol is going to anchor a pattern to the beginning of a string or a line. So I'm going to come back over here and we'll do foo. Notice again, it's matching foo at the end of some/path/to/foo. It's matching foo in foot and foobar. It's matching foo inside barfoo. But if I add that caret to the beginning, watch what happens to barfoo. Barfoo is no longer matched, because it's saying, hey, this pattern should only be at the beginning of a string or a line. These anchors are sometimes referred to as assertions, returning a true or a false. If we have a beginning, we probably need an end, so the dollar sign is also an anchor, but to the end of a string or line. It's just like the caret symbol, but instead of the beginning, it's going to be the end. So again, if we come back here and get rid of my caret, and now watch foot as I add that dollar sign at the end: it's now not matching foot or foobar. It's only matching foo where it shows up at the end, in some/path/to/foo and barfoo, and then also foo where it's the whole word, because that still matches the end as well as the beginning.
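As a rough sketch of the anchor behavior just described, here is the same idea using Python's built-in re module. The word list is an assumption loosely modelled on the demo's word bank (it is not the actual foo.txt), and Python's engine is only one of the flavors mentioned earlier, so other engines may differ in details.

```python
import re

# Hypothetical word bank, loosely based on the words named in the demo.
words = ["foo", "foot", "foobar", "barfoo", "some/path/to/foo", "Foo"]

# No anchor: "foo" may appear anywhere in the word (case sensitive).
unanchored = [w for w in words if re.search(r"foo", w)]

# Caret: the pattern must sit at the beginning of the string.
starts_with = [w for w in words if re.search(r"^foo", w)]

# Dollar sign: the pattern must sit at the end of the string.
ends_with = [w for w in words if re.search(r"foo$", w)]

# Both anchors: the pattern has to be the whole string.
whole_word = [w for w in words if re.search(r"^foo$", w)]

# With re.MULTILINE, ^ and $ anchor to each line instead of the whole text.
per_line = re.findall(r"^foo", "foo\nfoot\nbarfoo", re.MULTILINE)

print(unanchored, starts_with, ends_with, whole_word, per_line, sep="\n")
```

Only "barfoo" and the path drop out once the caret is added, and only "foot" and "foobar" drop out with the dollar sign, which mirrors what the builder showed.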
All right, and then a little bonus tip. You should always anchor when possible. Whenever you have the ability to anchor, you should, even if your pattern is already matching what you're expecting. The reason for that is optimization and performance. The engine parses the regular expression and then parses the body of text, and it has to move through that text step by step by step. If we can designate the position where the pattern should show up, then as the engine is parsing, as soon as that is no longer true it doesn't have to continue parsing the remaining piece of that line. It can skip that line and go to the next. So whenever you can, make sure to anchor. There we go. All right, the next character in our list of special characters is the opening square bracket. This allows us to define what's called a character class. A character class is going to match a single literal character, like we talked about earlier, from a list of literal characters. I mentioned earlier that matching is case sensitive. So if I wanted to match all alpha characters that are lowercase, a to z, I could build a character class and type out a, b, c, d, e, f, g, et cetera, all the way to the end. That is a lot of typing, though, so engines also allow us to define ranges of literal characters. In that case, instead of having to type out the entire alphabet, I can say a dash z, or a through z. The ending square bracket is not a special character by itself unless it's used with the beginning one in order to create that character class. So let's take a look at this. I said earlier that when we typed out foo, we weren't matching Foo with the capital F. If I instead wanted to match both, I could create a character class and put both in there. And now I'm matching foo with a lowercase f but also Foo with a capital F, because the engine is coming in here and saying, hey, this is a character class. I'm going to go into the character class, and I need to match one of the characters inside the character class.
In this case, it's a lowercase f or an uppercase F, followed by o and o. But I could also do a range, something like maybe b through f. And now I'm matching foo, but I'm also matching foobar. I think I've got another one in there.
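To make the character class and range idea concrete, here is a small Python sketch. The words are assumptions chosen for illustration (the real word bank may differ); the point is only that a class matches exactly one character from its list or range.

```python
import re

# Hypothetical words; not the actual contents of the demo's word bank.
words = ["foo", "Foo", "boo", "coo", "zoo", "foobar"]

# [fF] matches one character: either a lowercase f or an uppercase F.
either_case = [w for w in words if re.search(r"[fF]oo", w)]

# [b-f] is a range: one character from b, c, d, e, or f (lowercase only).
b_through_f = [w for w in words if re.search(r"[b-f]oo", w)]

print(either_case)
print(b_through_f)
```

"zoo" falls outside the b-f range and "Foo" fails the second pattern because ranges are case sensitive too.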
Comment you asked,
Oh, yeah, I was doing f, but I didn't match. There we go. I didn't match that one because I had only gone from b to f. All right, the next special character we have... oh, I almost forgot. Inside of a character class, other special characters aren't special. They're normal characters. So if we're going to use the dollar sign character in there, we don't have to escape it. The only characters that have to be escaped inside are the closing square bracket, because that's where we define the end of our character class; the backslash, because that's our escape character; the caret, which we'll talk about some more; and the dash, because that's how we create a range. Now, the next special character is the caret symbol. The caret symbol is the negation character.
So if you're sitting there having this exact same reaction as Michael Scott, that's okay. This is the first prime example of why regular expressions are so challenging to understand: what the special characters represent can change depending on where they're used and on their relationship to the other special characters around them. So we create a character class, and we place a caret symbol at the beginning, inside the character class. This reverses the character class, or negates it. It says, find anything that isn't in this character class. So back over here, if I have a character class with f, then o and o, and let me scroll up to make sure you see that, and then I add a caret, it says find anything that is not a lowercase f, followed by an o and an o. So notice now it's not matching foo. It does match Foo with a capital F, because that's not a lowercase f, and it matches coo and boo. It gets even crazier, because we could add an anchor at the beginning, outside the class. So now we're saying, hey, find this pattern only at the beginning, where the character class matches a single character that isn't a lowercase f, followed by an o, followed by an o.
So the important bit here, even if this isn't 100% clear, is to remember that with some of these characters, if they're next to other special characters, it's possible that the meaning of what they represent has changed. That is why, again, it's so important to use these builders, because a builder will show you exactly what the expression is trying to do and help you learn these regular expressions.
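The two meanings of the caret can be seen side by side in a short Python sketch. As before, the word list is a made-up stand-in for the demo's word bank.

```python
import re

# Hypothetical words for illustration only.
words = ["foo", "Foo", "coo", "boo", "barfoo", "igloo"]

# Inside the brackets, the caret negates: one character that is NOT a
# lowercase f, followed by "oo".
not_f = [w for w in words if re.search(r"[^f]oo", w)]

# Outside the brackets, the caret is an anchor: the same pattern, but it
# must start at the beginning of the string.
not_f_at_start = [w for w in words if re.search(r"^[^f]oo", w)]

print(not_f)
print(not_f_at_start)
```

"igloo" matches the first pattern through its "loo" but drops out once the pattern is anchored to the start, while "foo" and "barfoo" never match because the only character in front of their "oo" is a lowercase f.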
Cool, okay. So,
I mentioned that building out some of these character classes can get really long, right? If I were to come back out here and build out a character class, typing a, b, c, d and so on, that's too long. Even if I wanted to do a lowercase a through z and a capital A through Z, that starts to get really long. So we have something called shorthand character classes, which some expression engines refer to as special sequences. The specific way you access these special sequences will depend on the expression engine, but many of them use the escape character, the backslash, followed by a literal character. What they are is stand-ins, or shortcuts, for bigger character classes. So backslash lowercase d is simply shorthand for the character class zero through nine. Backslash lowercase w is shorthand for capital A through Z, lowercase a through z, zero through nine, and an underscore. There are about 26 or 27 other ones. Again, you don't have to remember all of those, because somewhere in your builder there will be a quick reference. If I scroll down here, you're going to begin to see some of those, and it's showing you, hey, backslash d, any digit; backslash w, any word character. So just know that when you are reading a regular expression, if you see a backslash in front of what should normally be a literal character, that's probably a shorthand. You just need to go look it up. And when you're building, they make expressions much easier to write out, much more condensed, and easier to follow. All right, the next one I call the weird one, because most of these have a partner, right? We had two anchors, the caret and the dollar sign; they were together. We had the character class, which had the opening square bracket and the ending square bracket. They get partners. This one has no partner. It's the period, and it simply stands in for any single character except for line breaks. So it can be anything at all.
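Here is a brief Python sketch of the shorthand classes and the period described above. One flavor caveat worth labelling as such: in Python 3, \d and \w on text strings also match non-ASCII digits and letters unless the re.ASCII flag is passed, so the flag is used here to get exactly the 0-9 and A-Z, a-z, 0-9, underscore definitions given in the talk. The sample strings are invented for illustration.

```python
import re

# \d with re.ASCII behaves like the character class [0-9].
digits = re.findall(r"\d", "a1b22", re.ASCII)

# \w with re.ASCII behaves like [A-Za-z0-9_]; the + repeats it so that
# whole runs of word characters are returned.
word_runs = re.findall(r"\w+", "foo_bar baz-9", re.ASCII)

# The period stands in for any single character except a line break.
sample = "foot fo-bar fo\n"
dot_default = re.findall(r"fo.", sample)

# re.DOTALL is the opt-in flag that lets the period match a newline as well.
dot_all = re.findall(r"fo.", sample, re.DOTALL)

print(digits, word_runs, dot_default, dot_all, sep="\n")
```

Other engines expose the same ideas with slightly different defaults, which is one more reason to set the matching flavor in a builder before relying on a shorthand.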
So if I come in here and I do f, o, dot, notice it's matching that t in foot. It's matching the b in bar. It's matching the space character. It matches anything that's not a newline character. The next one is kind of fun. It's called alternation. It's the pipe symbol, and it creates a branch for the regular expression to follow. So in this example, down there at the bottom where it says bar, pipe, foo, it's saying, okay, I want you to look for the literal characters b, a, r, and if you can't find that, I then want you to look for f, o, o. However, there is a warning with this. The warning is that regular expression engines are very eager. They're very eager to find a match and return as quickly as possible. What I mean by that is, in this example sentence, where we have "There were many cats near the bowl, with one cat by the door", as the engine is parsing, it starts with c, a, t, and it goes through and says, oh, here is c, a, t, and returns, even though c, a, t, s may be a more complete or more exact match. The important thing here is not that you shouldn't use alternation. It's to remember that regular expression engines, in an alternation, are always going to give preference to the left-hand side. The right-hand side is always going to come second, and the engine will only look for that second piece if it cannot, under any circumstance, find the first one. So I'm going to come back over here and we're going to do maybe baz, and you can see it's now saying, okay, can I find baz? No. Okay, can I find foo? Yep. And in this case it says, can I find baz? Yep, so I'll go on to the next one. It just creates an OR statement, if you will. If you're used to programming, it would be like an OR statement. All right, the next one is part of another subgroup, called quantifiers, and these allow us to designate how many times something should be repeated. So the question mark acts on the preceding token, and I'm going to come right back to that word "token".
It says, make the preceding token in the regular expression optional. I can match it either zero times or one time. We say it's a token because in a little bit we're going to talk about sub patterns and ways that we can build embedded patterns, and these quantifiers can work on those sub patterns. In this case, an o can stand on its own. It's not a sub pattern. So we're saying I can find f, o, then I can find either one or no instances of o, and then b, a, r. I'll come out and build it so you can see that: f, o, o, question mark, b, a, r. And notice I am matching foobar with two o's, because I can have one instance, and I'm matching fobar, because I can also have zero instances of that o.
The next... oh, there's a warning. I almost forgot about the warning. It's called greediness. Quantifiers turn greediness on in a regular expression. What I mean by greediness is that the regular expression engine will attempt to match every single possible instance of what is repeated, for as long as possible. So it's like the cat and cats example, but flipped. In this case, if I were to use c, a, t, s, question mark, it is always going to try to match cats before it tries to match cat. It's not as problematic with the question mark, but it becomes more problematic as we move into other quantifiers, because again, the engine is always going to try to match as many times as it can until it simply can no longer make that match. The next quantifier, then, is the asterisk. It matches the preceding token in the expression anywhere from zero times up to infinity. So if I come out here and change my question mark to an asterisk, there we go, notice now I'm matching fobar and foobar, but it's also matching fooooobar. It's matching every single o until there are no more o's to match, and then it continues on to b, a, and r. The next quantifier is the plus sign. It's like the last one, except that instead of the optional zero, I have to match at least once, but can then match up to infinity. So if we change that asterisk now to a plus, notice I'm no longer matching fobar, but I am matching foobar, and I'm also matching fooooobar, as well as the one up here. All right. Now, sometimes we need a little more granular control over the number of times we're matching something. That's where the curly brace comes into play. The opening curly brace, combined with the closing curly brace, allows us to designate an exact minimum and maximum number of times we want that previous thing to be matched. So the format is min, comma, max, where the min is zero or a positive integer.
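The question mark, asterisk, and plus sign, along with the greediness warning, can be sketched in Python as follows. The patterns follow the ones typed in the demo as best they can be reconstructed from the recording (foo?bar, foo*bar, foo+bar); the word list, the helper name matching, and the cat sentence are assumptions for illustration.

```python
import re

# A deliberately long word: f, five o's, then bar.
long_word = "f" + "o" * 5 + "bar"
words = ["fobar", "foobar", long_word]

def matching(pattern):
    """Return the words that the pattern matches in full (hypothetical helper)."""
    return [w for w in words if re.fullmatch(pattern, w)]

optional = matching(r"foo?bar")      # second o: zero or one time
zero_or_more = matching(r"foo*bar")  # second o: zero up to "infinity" times
one_or_more = matching(r"foo+bar")   # second o: at least one time

# Greediness: the quantifier consumes as many o's as it possibly can.
greedy = re.search(r"fo*", long_word).group()

# Alternation prefers the left-hand side, so "cat" wins even inside "cats"...
left_first = re.search(r"cat|cats", "many cats").group()
# ...while the greedy optional s tries for the longer "cats" first.
optional_s = re.search(r"cats?", "many cats").group()

print(optional, zero_or_more, one_or_more, greedy, left_first, optional_s, sep="\n")
```

Writing the alternation as cats|cat, or using cats? as above, is one way to get the longer match when that is what is wanted.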
And we have a comma and then we have a max and max needs to be an integer and it has to be at least equal to or greater than the min. If we omit the max, but leave the comma then it stands for infinity. So if we look back to our previous special characters that we just talked about, those are really just shortcuts to this more granular control. So the question mark, which was optional, zero or one is really just curly brace, zero comma one, so minimum is zero, maximum of one. The Asterix then is zero comma infinity all the way out as many times and the plus sign then is this is normally where in a live audience you'd have to answer but one to infinity. So I have to have at least a minimum of one match all the way out to infinity. If you admit the comma and the max, then you're saying I want you to match exactly this number of times. So if I come back in here, and I change this from say, maybe are used like two to five, I'm saying alright, I have to match at least F Oh, and then two more Oh, so three O's total, then var up to a maximum of five o's and the only one that's matching here in this case, is this big loan.
FUBAR. Probably doesn't make quite
as much sense yet. But remember that we can combine these with sub patterns, which we're gonna see in a bit, which will allow us to create sub patterns and say, Alright, I want you to repeat this sub pattern X number of times. Oh, I almost forgot. The ending curly brace is very similar to that ending square bracket, where it's not a special character all by itself. It's only a special character when combined with the opening one in order to define that specific number of repetitions. Alright, so we've gone through 10 of the 12. Hopefully everyone is is mostly following along so far, we got two left to go through. And that is that that subgroup that I was talking about those sub patterns, the opening closing parenthesis allows you to create a pattern inside of that opening, closing paren to create a sub pattern and I really liked this example of the English that excuse me, the American versus the British spelling of theater, so make sure they're staring right there. So if I wanted to match both instances, I would need some way to say either E R or R E this is a great example of when we can use a sub pattern. So I can go in and I can say alright, I want you to match the literal characters II are then using that alternation character we talked about, let's say or ar e, and now notice I am matching both theatre with an AR and theatre with an ar e. Same kind of example with maybe say, dishes, both dish and oh, I have to spell it correctly, but not distance but dishes. There we go. If I wanted to match dish, or dishes, I could say alright, match d i s h, then match d s inside this sub pattern and earlier we saw a way to tell the Regular Expression Engine that a previous pattern was optional with the question mark and now I've matched both
dish and dishes. Now, hopefully this
is making sense. One more to show you is if I were to create a sub pattern of bar and then tell it I want it to repeat twice. Notice that gives me the ability hopefully you can see this all the way down at the bottom to match foo bar bar. So I said match the literal characters, f o n o now go into my sub pattern and match the character B A and R and I want you to repeat that twice.
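The quantifiers and sub-patterns covered so far can be sketched as follows. This assumes Python's re module rather than the tool used in the demo, and the sample strings are illustrative; the long sample is written as f, five o's, and bar so the count is unambiguous.

```python
import re

def full(pattern, text):
    """Return True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Illustrative samples: one o, two o's, five o's.
SAMPLES = ("fobar", "foobar", "f" + "o" * 5 + "bar")

# Shorthand quantifiers and their brace equivalents behave the same way.
star = [full(r"foo*bar", s) for s in SAMPLES]
star_braces = [full(r"foo{0,}bar", s) for s in SAMPLES]
plus = [full(r"foo+bar", s) for s in SAMPLES]
plus_braces = [full(r"foo{1,}bar", s) for s in SAMPLES]

# {2,5} applies only to the o right before it: "fo" plus two to five more o's.
ranged = [full(r"foo{2,5}bar", s) for s in SAMPLES]

# Sub-patterns: alternation in a group, an optional group, a repeated group.
theater = [full(r"theat(er|re)", s) for s in ("theater", "theatre")]
dishes = [full(r"dish(es)?", s) for s in ("dish", "dishes", "distance")]
twice = full(r"foo(bar){2}", "foobarbar")

print(star, plus, ranged, theater, dishes, twice)
```

Using fullmatch keeps the comparison strict; with a search-style match the same patterns could also succeed on part of a longer string.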
Now, unfortunately, I can't see your faces. Hopefully you are blown away with excitement at the power that has just been revealed to you. I want to finish up with the last thing those parentheses do: the opening and closing parentheses also let you create capture groups. Okay, it's really not as bad as it sounds. Plain parenthesized groups are capturing groups by default, so the ones we just created are actually capturing groups. What I mean by a capturing group is that when you create a sub-pattern and the regular expression engine finds it, it holds on to it in memory so that you can reuse it later. You can have many groups inside your regular expression, and you access them ordinally: if you have group one, group two, and group three, you can go back and say, I want what you found for group one, for group two, for group three. In fact, if you remember back to the phone numbers (you can't quite see that up here), I've got these groups matching digits, and then over here I said, I want to use group two, group three, group four. So let me come in here, and we'll just do foo and bar. If I go to substitution, you can see I can access the group. In some regular expression engines it's a dollar sign and the group number; the specifics depend on the engine. But I can say: replace what you found with dash dash, then the group that you matched, then dash dash again. So capturing lets you do string manipulation and also reuse what you've already found in those sub-patterns.
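A short sketch of capture groups in a substitution, assuming Python's re module, where a group is referenced in the replacement as \1 rather than $1. The phone pattern is illustrative and is not the one shown earlier in the talk.

```python
import re

# Group 1 captures "bar"; \1 in the replacement reuses the captured text.
wrapped = re.sub(r"foo(bar)", r"--\1--", "foobar")

# Groups are numbered left to right, so several can be reused at once.
# \D* skips any non-digit separators around the digit groups.
phone = re.sub(r"\D*(\d{3})\D*(\d{3})\D*(\d{4})", r"\1-\2-\3", "(555) 123-4567")

print(wrapped)
print(phone)
```

The whole match is replaced, so anything outside the parentheses (here the literal foo, or the phone separators) disappears unless the replacement puts it back.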
All right, that's all 12. That is the big collection of special characters. Normally I give a little pop quiz, so I'm just going to walk through them. We have the backslash, which is the escape character: it tells the regular expression engine, for this next character, don't use its normal meaning, use its alternate meaning. We have the caret, which anchors our pattern to the front of a string or line, and the dollar sign, which anchors it to the end of the string or line. We have the opening square bracket, which begins the definition of a character class, where a single character has to match from inside that class. The caret comes back into play here: inside a character class it negates what's in the class. We have the period, which stands in for any single character. We've got the pipe symbol, which creates an "or" for the regular expression engine to follow. We have the question mark, which says the previous thing has to match either zero times or one time; the asterisk, which says zero to infinity; and the plus, which is one to infinity. If we need more fine-grained control over repetitions, we've got the opening and closing curly braces, where we can designate a min and a max. And then we have the parentheses, which let us create sub-patterns and capture groups. Whew, that's a lot to cover in 40 minutes. I'm hoping everybody has followed along. But I've got a bonus for you. Now, this is an advanced topic; I'm just going to put that out there. If you don't understand this, that is completely okay. One, we need this for the puzzle we're going to play, but two, I want you to be exposed to the ideas and the terms, so that when you're using regular expressions and you get to a point where you go, hey, I kind of remember that scenario being mentioned, you've at least seen it one time. They are referred to as lookarounds.
They are a feature of regular expressions that gives you additional power that's kind of hard to get without them. They're often referred to as zero-length assertions. They don't capture anything, and they really don't match anything. Instead they tell the regular expression engine whether or not something is true. We can look ahead or we can look behind. If you think back to the wolf document, I said I needed to find all instances of the word wolf or wolves that weren't preceded by the word red. That was a negative lookbehind. I said: find the word wolf, look behind where you just were, and see if red shows up; if it doesn't, go ahead and match. So we can do a lookahead or a lookbehind, and we can say this thing has to be here or this thing must not be here. Again, this is an advanced topic. I totally get that; it took me a long time to wrap my head around it. What finally did it for me was an instance (let me click on this tab) where I needed to match the letter q inside words where it was not followed by a u. Now, we've covered some of the tools to do this, right? We know we can use the literal character q. That's good. And earlier, when we talked about character classes, we said we could find something that's not something: that was the negation inside a character class. So I could say, find anything that's not a u, and that gets me close, right? I've matched qwerty and qintars and the others (I have no idea what some of these words are; I just looked up words with a q and no u). But notice it didn't match Iraq. The reason is that there is no character after the q, and a character class has to match a single character. That's also why it's matching the a, the i, and the w along with the q. This is where the power of the lookaround comes in.
Now, unfortunately, this is another situation where we reuse a previous character and change its meaning: we're going to use the parens to create the lookaround. We tell the engine it's not a regular group by starting with a question mark. Then we say look ahead, in the negative sense, using the exclamation mark, and then we tell it what not to find: the u. So now notice I'm matching the q in qwerty, I'm matching the q in qintars, and I'm also matching Iraq. We've said: look for a q; once you find a q, pause. Now, regular expression engine, look forward in the string: is the next character not a u? If it's not a u, go ahead and report a match. If it is a u, stop processing and go to the next line.
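The difference between the negated character class and the negative lookahead can be sketched like this, assuming Python's re module. The word list is illustrative, not the one shown in the demo.

```python
import re

def first_match(pattern, word):
    """Return the matched text, or None when the pattern is not found."""
    found = re.search(pattern, word)
    return found.group(0) if found else None

# Illustrative words; "iraq" ends in q, "queen" has q followed by u.
WORDS = ["qwerty", "qatar", "iraq", "queen"]

# q plus one character that is not u: the class consumes that character,
# so it fails when q is the last letter of the word.
with_class = {w: first_match(r"q[^u]", w) for w in WORDS}

# q not followed by u: the lookahead consumes nothing,
# so a q at the end of the word still matches.
with_lookahead = {w: first_match(r"q(?!u)", w) for w in WORDS}

print(with_class)
print(with_lookahead)
```

The character class returns two characters per hit (the q and whatever follows it), while the lookahead returns only the q, which is why it is called a zero-length assertion.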
All right, I know that's an advanced topic. Again, I just want you to be exposed to it and to have seen a scenario in which it was used, so you have that in the back of your mind. And we're going to need it for the game. So now we're going to play a game. Unfortunately, because this is a webinar, it's not as interactive. Normally I sit back and you build the regular expressions to solve this puzzle; I'm just the guide and the typist. We can't do that as easily here, so I'm going to help you build it, but we're going to play Wordle. Now, this is not cheating. I keep getting accused of cheating. The goal is not to solve today's Wordle; the goal is to build a regular expression that would solve today's Wordle. So don't accuse me of cheating right off the bat. I've got Wordle here (where do I have Wordle?). All right, if you have never played Wordle: we want to guess a five-letter word. Proper nouns are not allowed, so these are all lowercase, although they show in uppercase. As a side note, I have this set to high-contrast mode because I am colorblind. When you guess a word, it colors the tiles. I don't know if that's orange or green, but whatever that color is, it means the correct letter in the correct location in the word. The blue here means the letter is in the word but not in the correct location. And the gray means the letter is not in the word at all. I normally like to start my Wordle with the word audio. I'll type that out, because that gets me four of the five vowels right out of the way.
So we'll go ahead and hit enter and see what happens.
All right, so we know that a is not used at all. U is the correct letter in the correct location. D is used somewhere in the word, but not there. And i and o are not used at all. Now, I happen to have (oops, going back over here) a collection of words. It's a shortened set of words; it's in the regular expression presentation here, under dict. It's shortened because I didn't want this building tool to have to parse however many hundreds of thousands of words are in the English language to get down to the five-letter ones. But let's pretend for a second it actually is the full dictionary. You have the tools and the skills now to limit hundreds of thousands of words down to just five-letter lowercase words. We know we can anchor, and we should always anchor whenever possible. We want to capture the group so that we can display it down here and see more easily what we've matched. We've already talked about how to tell the regular expression engine to match lowercase letters by using a character class of a through z, and we have the tools to tell it to match that class five times. So even with a giant dictionary, we have the skills to limit it down to a collection we can use in Wordle. You should be able to see here we have 12,971 matches. But we know a couple of extra pieces of information. We know we don't want a, i, or o, so instead of everything a to z, we can say, just give me everything that's not a, i, or o. Well, we've already gone from 13,000 down to 2,500. But we still know more, don't we? We also know that u is in the second place and that we have to have a d.
So instead of doing five, I'm going to say the second place is a u. Then I'm going to say I don't want a, i, or o, and I also don't want a d, in that third spot. And I know I don't want a, i, or o in the last two spots.
Now we've gone from 13,000 to 2,500 to 663 matches in just one guess, and we're still not done, because the other piece of information we know is that we have to have a d somewhere in the word. We just went through lookarounds, which let us say, I need to find something somewhere in this string. In this case I'm saying: look ahead for a d, which might be preceded by any character zero to infinity times. And we've gone from 13,000 to 2,500 to 600 to 104 matches, and we've only had one guess. Sure enough, every single word here now has a u in the second place and a d in the word, but not in the third place. These are all kinds of odd words, but only about 100 of them. So I'm going to grab one. I'll go with nuked: n-u-k-e-d. Okay. So now we know we don't want an n or a k, so I'm going to add those; we don't want n or k in any of these spots. We now know the fourth spot is an e, so we no longer need the class there. And we also know the last spot can't be a d. Okay, so we've gone from 13,000 to 2,500 to 600 to 104 to seven in two guesses.
So now we're down to a handful: duels, dupes, duvet, and a few I don't know. Dupes and duvet are pretty much the only ones I really recognize there. So I'm going to try dupes, unless somebody says to try another one. Nope, nobody's suggesting anything, so I'll try dupes. Okay, so we know the first one is d, so we can get rid of this. We know the first one is d, then we have the u, and (oops, losing my place) we know p and s aren't in the word. We know the fourth one is an e. We know the last one is not an s.
Ah, duvet. I will try that: d-u-v-e-t. Boom.
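The narrowing steps from the game can be sketched as a chain of filters. This assumes Python's re module and a small hypothetical word list in place of the dictionary file used in the demo; the patterns are reconstructed from the spoken description, so treat them as an approximation of what was typed on screen.

```python
import re

# Hypothetical mini word list standing in for the demo's dictionary file.
RAW_WORDS = ["audio", "Paris", "duvet", "dupes", "duels", "nuked",
             "muddy", "budge", "tubed", "regex", "cat", "rudder"]

def keep(pattern, words):
    """Return the words in which the compiled pattern finds a match."""
    return [w for w in words if pattern.search(w)]

# Start: anchor both ends and allow exactly five lowercase letters.
five_lowercase = re.compile(r"^([a-z]{5})$")

# After "audio": a, i, o are absent; u is second; d is present but not third.
# The lookahead (?=.*d) asserts that a d appears somewhere in the word.
after_audio = re.compile(r"^(?=.*d)([^aio]u[^aiod][^aio]{2})$")

# After "nuked": n and k are also absent; e is fourth; d is not fifth.
after_nuked = re.compile(r"^(?=.*d)([^aiokn]u[^aioknd]e[^aioknd])$")

# After "dupes": d is first, so the lookahead can go; p and s are absent.
after_dupes = re.compile(r"^(du[^aioknpsd]e[^aioknpsd])$")

candidates = keep(five_lowercase, RAW_WORDS)
round_one = keep(after_audio, candidates)
round_two = keep(after_nuked, round_one)
round_three = keep(after_dupes, round_two)
print(round_three)
```

Filtering to five lowercase letters first matters here: the negated classes on their own would accept any character that is not listed, including uppercase letters and digits.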
All right, perfect. Let me come back over to my presentation. There we go. We don't really have time for more, but if you want to play more of this kind of game, there are some Wordle clones out there that you can use. There's also a crossword puzzle where you're given regular expressions and you work out what fills the crossword, which is really fun. Again, it gets your brain thinking about how to build these regular expressions. I want to quickly point out some resources and acknowledge that these are fantastic resources; I would not know nearly as much about regular expressions without them. Again, you're going to get this PDF (I think it was already posted in the chat), and these are in there. So I strongly encourage you to go look those up. And I would be happy to take any questions at this point. I think we've got a couple of minutes left.
This is great, Paul. What a blast.
I'm hoping everybody had fun with it.
Yeah, so we have this expression on iThemes Training (it's a regular expression here) that we call duct tape time. That's when you have to wrap your head in duct tape so it doesn't explode. There was a little bit of that today. But I understand regular expressions pretty well now; I get it. I understand why things are the way they are. So let me invite everybody to take a look at the Q&A window. Scroll through those questions, and if there's a question you'd like to see answered, give it an upvote. This seems like the prevailing question; Linnaeus has it there in the chat, and there's another question from Ben similar to it. Our audience is people who build and manage WordPress sites, typically for clients. Where would we find regex showing up in WordPress?
So, for those that don't know, I used to manage the WordPress workflows and websites for the University of Missouri, and we had three or four hundred sites there. The most routine place I used regular expressions was in rewrites and redirects: matching on requests that need to go somewhere else, or rewriting a request in Apache to something else. You use a lot of regular expressions there. I also use them a lot in form processing: I'm getting something and I want to ensure that it's an email or a phone number, so I'm doing that input validation. Occasionally we had some pretty advanced frameworks that did their own autoloading, and you do a lot of regular expressions there, saying, okay, what is this thing I've been given? I'm going to break it apart, figure out what I've got, and then use those pieces to maybe load up another file. So there are lots of places they can be used. One of the warnings, when I talked about how you have to be careful, is that you have to know the context. The dot star that I used in the game can be dangerous, because the dot, remember, stands for any character except a newline, and the asterisk says repeat zero to infinity times. So that can be a really dangerous regular expression, because it's going to match anything except a newline. But in the context of today, I already knew that every single word in that dictionary was a lowercase five-letter word, so being a little less optimized is okay. Part of that is just knowing the context, and that takes not so much expertise as experience: knowing when to use it, when not to, and when another solution is going to be more optimal.
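A small sketch of why dot star needs care, assuming Python's re module with its default flags. The strings are illustrative and not taken from the talk.

```python
import re

# Illustrative strings, not taken from the talk.
two_lines = "first line\nsecond line"
quoted = 'say "a" and "b"'

# By default the dot matches any character except a newline,
# so .* stops at the end of the first line.
first_line = re.match(r".*", two_lines).group(0)

# Greedy .* runs to the last closing quote it can find.
greedy = re.findall(r'".*"', quoted)

# A negated character class is more specific and stops at the next quote.
specific = re.findall(r'"[^"]*"', quoted)

print(first_line)
print(greedy)
print(specific)
```

When the input is already known to be one short, well-formed item per line, as with the word list in the game, the looser pattern does no harm; on free-form text the more specific class is usually the safer choice.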
Yeah, that makes a lot of sense. And you know, this is like any other language, right, whether it's a programming language or a human language. You can get the basics, but you have to use it and sort of figure it out. In an hour, though, I think we've gotten a really good lay of the land on how this whole thing works. That's pretty cool.
That's part of why I really encourage people to play those games. Go and play the regex crossword. Go and play Wordle, where you're not doing the Wordle so much as trying to build the regex to find the word, to get your brain thinking: how can I do this? What can I use? If you're trying to find something in a piece of text sometime, instead of just doing a normal search, say, hey, can I build a regex to do this? Just try to get that muscle memory going.
Yeah, for sure. And a really nice way regex can be used on just about every website: if you're asking for a phone number in a form and you get 1,800 formats, like you just showed in that spreadsheet, many form plugins let you do some post-processing so that everything ends up in the format you want, whatever that happens to look like. There's a question here from Kara, and it's a good practical use of regex for a rewrite. Kara says: my current blog is at domainname.com/blog-post, so it's just the post name after the domain. If I wanted to add /blog in front of those posts, what would be the regex to rewrite that? There are some conditional things about post types and so on, but if I just wanted to change all my URLs so they went from domainname.com/post-name to having /blog in there, how would I do that in regex?
Oh, so what you would probably do... where's my mouse? I've lost my mouse. There it is. Sorry about that. Let's flip back over here, grab this, and get rid of this. So, domainname.com.
Right, and we've got /blog-post. And what we want is to get that to domainname.com/blog... is that what I'm following?
/blog/blog-post, yeah. Okay. So yeah, /blog/blog-post, like we had.
Yeah. Now, inside of Apache we usually don't have to worry about the domain name, so it would probably just be blog-post. And then we're going to have something at the end, right? So it's going to be blog-post, slash, some-post-of-mine. What we could do is say, I want to match from the beginning. By the way, I don't think you have to escape the forward slash inside of Apache, and I don't think it should hurt if you do, but I'm going to here to make sure. So we have blog-post, and then we put that slash there. Then we might use a dot star and capture that whole thing. And then down below, we're going to switch that and send it to blog, slash, dollar one.
Haha, look at that. And so dollar one is basically like a variable, in a sense, right? That's pulling in from...
Yep, from that capture group; they're ordinal. So if for some reason I had to separate this, maybe with another slash and then another dot star, then I can do slash dollar two to pull them apart. Now, I probably would not use a dot star here. This was a quick one, just to show you. You would probably want to be more specific. Especially if I'm looking for this other slash, I might say, match everything that's not a slash, to make sure I can then get that slash. Again, the context is also going to depend on Apache. The query variables, if there are any, are going to be in a separate area, and you have to account for them as well. So there's more to it than just this, but that gives you a quick, basic answer to that question.
Yeah, for sure. So that answers this in regex. Now, "reg-x": see, I just did it. All right, sorry. The thing about this, though, is that in WordPress this is going to redirect every URL no matter what, so it wouldn't know that it was a blog post. There are other ways in WordPress to do this. But for the renaming, this is essentially the seed of the regex that would get that done.
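The capture-and-reuse idea behind that rewrite can be sketched in Python. This is not Apache configuration: Python's re module writes the group reference as \1 where the demo used $1, and the "/blog-post/" prefix and sample paths are an assumed reading of what was typed on screen.

```python
import re

def rewrite(path):
    """Move anything under /blog-post/ to /blog/, reusing capture group 1."""
    # Hypothetical prefix; the exact path in the demo is not clear from the audio.
    return re.sub(r"^/blog-post/(.*)$", r"/blog/\1", path)

def split_two(path):
    """Capture two path segments separately; [^/]+ stops at the next slash."""
    found = re.match(r"^/blog-post/([^/]+)/(.*)$", path)
    return found.groups() if found else None

moved = rewrite("/blog-post/some-post-of-mine")
untouched = rewrite("/about")
parts = split_two("/blog-post/2023/some-post-of-mine")

print(moved)
print(untouched)
print(parts)
```

Paths that do not start with the prefix come back unchanged, which mirrors the point that the pattern alone only renames; deciding which URLs are actually blog posts is a separate concern.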
Correct. And if there were certain scenarios where you didn't want it to do that, there are also the lookahead and lookbehind, those lookarounds, that we can use to say, I want to do this in every scenario except when, maybe, blogger is there. I don't know, I can't think of a good example, but we can use those inside rewrites and redirects as well.
Yeah, it's really interesting. Paul, this has been great, man. Thanks so much for your time and expertise on this. Folks, great questions as well. I think this is going to be a rewatch for a lot of us, just to get our heads back around it, and I'd encourage you all to share it as well. I'm going to drop the link bundle in the chat one more time. Paul, tell us what Platform.sh is and where people can find you.
So they can find me... basically, if you can remember my last name, G-I-L-Z-O-W, and you Google me, I'm pretty much gilzow everywhere. I'm blessed in that there's only a handful of Gilzows in the world, and most of them are related to me, so it's really easy to find me. You can also reach me at paul.gilzow@platform.sh. We are a platform-as-a-service provider. We manage infrastructure and deployments so that you don't have to worry about infrastructure; we manage it and expose it back to you as simple configuration files. A great example is trying to upgrade your WordPress site from PHP 7.4 to 8.1. It's as easy as saying, I want PHP 8.1; you commit and push it, and we give you your code on that stack. Then you can test it. It's an exact copy of production, so as you move that code into production, you know you've tested it exactly the same way as what you're going to get in production. For me, it was the same epiphany as when I started using Git: having this freedom to experiment and have alternate universes of code. I now have alternate universes of stacks and infrastructure that I can play with, and that lets me iterate much faster. So yeah, I'm just as excited to talk about Platform.sh as I am about regular expressions.
Very cool. Well, Paul, thanks again for your expertise and your time on this. Thank you all for being with us as well. Great questions and good chat throughout. That's going to wrap it up for us today. I'm back tomorrow at 1pm Central for office hours for our members, and we'll see you back then on iThemes Training, where we go further together.