Understanding Regular Expressions Part 2

Hello, and welcome to part two of Regular Expressions for the Web Administrator. This is lesson number 11 in a 52-week series on what every web administrator needs to know to be successful in web space: a series on various tools, tricks, and techniques targeted at you, the web administrator. Last week I released the first part of this, but because I wasn't able to cover it all in 10 to 15 minutes, I broke it into two parts, so today continues where last week left off.

First, let me answer a question that came up about regular expressions: do regular expressions cause more performance overhead than wildcards or static rules? The answer is yes, but don't let that scare you off. The resource that regular expressions use is CPU, so don't worry about disk, memory, or anything else, and even the CPU is usually nothing to worry about. Most of us have CPU to spare, and the amount that regular expressions use is not a lot. So yes, they will use more than static rules, and more than wildcard rules, but in most cases it's fine. What you can do is watch your particular situation to see whether it impacts you specifically; if it doesn't, no worries at all. Again, don't be scared off by this. I use regular expression rules all the time on servers with hundreds of megabits per second of traffic, with dozens and dozens of rules, and it's really not a problem. I have heard of people who have been impacted by regular expression rules, so it does happen. Do watch for it, but don't be scared off by it.

With that said, let's dive in and pick up where last week left off. Don't forget, this won't make much sense unless you've seen last week's lesson first.

Okay, the next one here is the question mark, which stands for optional. Here's a good example. Let's go back to our host name and open Test Pattern. Remember, if we test www.contoso.com, it's going to work. Now let's make the www optional. What we do is add a question mark, and this question mark refers to
the character immediately before it, or the set within the parentheses immediately before it. In this case it's the (www.) group, so the entire www. is optional. If we test it, it works; if we test without the www., it works. But if we try admin.contoso.com, it does not work, because the host has to be either www.contoso.com or exactly contoso.com. This is where the question mark really comes in handy. If you want, you can also make the c optional, to show that it works with just a single character as well. Again, contoso.com works; drop the c and it still works; but of course if you drop the o, it's not going to work, because that's not an optional character.

Okay, next are parentheses, which are used to create sections, or groups, and back references. You can see we did this here: the parentheses put the whole www. together in one group. They're also used for back references, which I'm going to cover shortly. You can see that parentheses are a special character; remember, we saw them earlier. You can also use them for a section of alternatives, such as (com|net|org|biz) for .com, .net, .org, or .biz; these are all valid. Let's do a good test here, and you can see you can test it right here. Looking pretty cryptic, isn't it? But it's actually quite understandable when you go through it step by step.

Okay, the next one is square brackets, for a character class. This is kind of fun. Let's switch to something different: our query string. Let's go to Test Pattern; we want id=5&site=678. Of course, .* would catch the whole thing, but we want to break this into parts, because let's assume that later on we want to use some SEO-friendly URLs for the search engines. Right now, let's make it exact: id=5&site=678. Of course, that matches. But let's change that 5, and
we're going to use square brackets. Inside them we can put a range: 0-9 gives us a range of digits, and notice that when we test, it still works. Of course, [0-9] matches only a single character, so let's add a plus afterward, which says there has to be at least one digit but there can be any number of them. Now it works for 56, and we can put a whole bunch of digits here and it will always pass. Same here: let's switch the 678 to [0-9], again saying it can appear one or more times, and if we test, notice it still passes. So that's what a range is.

We can also put several ranges inside the brackets: 0-9, a-z, and capital A-Z. When we test, it allows alphanumeric characters, so we can put abcdef in here, and notice it passes. But the second part, site, doesn't have the letter ranges, so if we put letters there and test... okay, it still passes, because without anchors the pattern only has to match part of the string. In this case we need to make sure we start with our caret and end with our dollar sign, because we want to be more precise. Now notice it fails, because the site part accepts only numeric characters, not alphanumeric. Test again with only digits there, and it works.

I realize I'm talking fast. You may need to slow this down or watch it a couple of times, but hopefully this is making sense so far.

Notice you can also put other special characters in the class, like a dash. If we add a dash in here, now it works. And notice, if I add an underscore to the test string before I add it to the pattern, it fails. We add an underscore to the class, and that makes it a valid character. There we go. That's what square brackets are for.

You can also negate. This is interesting, so watch this: negation is actually the hat character, the caret. Here's what's odd: when the whole pattern starts with a caret, it refers to the beginning of the string, but if a square-bracket section starts with a caret, the caret has a whole different meaning. So it's a double meaning for the one character: at the start of a square-bracket section, it means negate,
so we're going to say [^a], any character except an a. It won't pass if there's an a in it. Let's see: this fails. Now let's remove our a, and it works. In the real world you'll see something like this fairly commonly, for example [^&], any character except an ampersand, and you can see that anything up until you get to an ampersand will pass.

Now, there are a couple of special characters that are really handy. Two of them are \w and \d, for word characters and digits. Let's use \w here: it matches any word character, meaning a letter, a digit, or an underscore. This isn't going to work right now because of the dash in the test string, since \w does not match a dash, so we'll remove that, hit Test, and you can see it now passes.

Now let's enforce this one. There are two ways to do it: you can use square brackets with 0-9, or the essentially equivalent backslash d, one or more times; \d refers to digits. Now if I make this abc, notice it fails, because the second part has to be digits while the first one can be alphanumeric. That's where our \w and \d come in. Again, this looks even more cryptic, but hopefully it's starting to make more sense.

The last thing I want to cover today is our 11th rule, or bonus rule: the back reference. This one is specific to each environment. Notice that in URL Rewrite we use curly braces, an R or a C, a colon, and then whichever instance we're referring to, for example {R:1}; I'll show you this in a minute. R is a back reference to the rule, and C is a back reference to a condition. This changes in different environments. For example, in Visual Studio we use \0 or \1, a backslash and a number. URL Rewrite, mod_rewrite, and JavaScript each do it a little differently, so you might need to do a quick Google search for back references in the environment you're in and see what its conventions are. For URL Rewrite, it's going to be this. So let's try this out. Let's do an
example here. How about this: let's switch the condition to our user agent, and we're going to say .*, anything, because we don't care what it is. Since it's the bottom part, that's a C; that's our condition. Now let's do something here: let's do a redirect. Notice it has to start with contoso.com, so let's redirect to something different, contoso.com without the www, so that we don't end up in an endless loop. Then a slash, and then our condition, {C:0}. So what's going to happen is, if we come in to www.contoso.com, it's going to redirect to contoso.com and then, and you wouldn't do this in the real world, add our user agent, which is the browser information, to the URL. Let's try it out: www.contoso.com, press Enter, and notice it gives contoso.com/Mozilla/5.0 with all the details. Of course, if we try this in IE with www.contoso.com, it does the same thing, except now it has information related to Internet Explorer. We get an error message because the page isn't found, of course, but that's fine. So this is a back reference to a condition.

We can also do a back reference to the URL, up here at the top, and that's done with the R. So here we'll add "you entered" to the redirect, followed by {R:0}, which is whatever we put in the URL. Again, we want to come in first to www.contoso.com, and this time we'll add subfolder/subsubfolder. Notice it redirected and says "you entered subfolder/subsubfolder," so you can see we have a back reference to the URL here.

Finally, let's look at the different parts. Notice I used 0 each time; what does that really mean? Let's take a look at an example. We could use a URL here, but let's try something different: let's say id=5&site=678, and test; notice it passes. Okay, let's be more precise than that: let's make the pattern exactly the same as the string and start to break it into parts. If we test this, notice it matches exactly. But let's actually put this
in parentheses: (678). Notice that as soon as you put something in parentheses, it creates a new region, or a new group, listed after {R:0}. The groups are numbered in order: 0 refers to the entire match, and then 1, 2, 3, and so on. So let's put our id in parentheses as well. Now the first group, the id, is the 5, and the second, the site, is the 678. What this allows us to do is break the string into parts. Again, let's switch this one to any digit one or more times, \d+, and this one to any word character one or more times, \w+, and test; see, it still passes. Of course, if I start this with a caret, end it with a dollar sign, and put an a in the id, it's not going to pass; without the a, it passes. So in the back reference we can use {R:2}, and it's going to contain 678, which we can use in our action as we redirect.

So there we go. I moved through that very fast, probably too fast. I hope it made sense and that you were able to follow it. Please give me some feedback: if it was too fast, let me know, and in the future, for lessons like this, I might break it into a couple of parts rather than trying to speed-talk. Thanks for listening, I hope you have a great day, and tune in next week.
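The host-name examples from this lesson (the optional www. group, the optional c, and a group of alternative endings) can be tried outside IIS. What follows is a minimal sketch in Python's re module; the exact patterns are reconstructions, since the lesson only shows them on screen, and Python's engine is not the one URL Rewrite uses, although these basic constructs use the same syntax. The helper name matches is illustrative only.

```python
import re

# Reconstructed pattern: "?" after the (www\.) group makes the whole group optional.
HOST = re.compile(r"^(www\.)?contoso\.com$")

# "?" after a single character makes only that character optional.
HOST_OPTIONAL_C = re.compile(r"^(www\.)?c?ontoso\.com$")

# Parentheses plus the pipe (OR) form a group of alternatives.
HOST_ANY_TLD = re.compile(r"^(www\.)?contoso\.(com|net|org|biz)$")


def matches(pattern: "re.Pattern[str]", host: str) -> bool:
    """Return True if the host name matches the anchored pattern."""
    return pattern.search(host) is not None


for host in ["www.contoso.com", "contoso.com", "admin.contoso.com"]:
    print(host, matches(HOST, host))          # True, True, False

print(matches(HOST_OPTIONAL_C, "ontoso.com"))  # True: the c is optional
print(matches(HOST_OPTIONAL_C, "cntoso.com"))  # False: the o is required
print(matches(HOST_ANY_TLD, "contoso.biz"))    # True
```

Because both anchors are present, admin.contoso.com fails even though it contains contoso.com.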

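The query-string walkthrough (character classes, the + quantifier, anchors, negated classes, and the \w and \d shortcuts) can be sketched the same way. This Python illustration assumes the lesson's test string id=5&site=678; the individual patterns are reconstructions of the steps described, not copies of the on-screen rules, and the constant names are hypothetical. Note that for ASCII text \w covers letters, digits, and the underscore, but not a dash.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in text; ^ and $ decide how strict that is."""
    return re.search(pattern, text) is not None


QUERY = "id=5&site=678"  # test string from the lesson

EXACT = r"^id=5&site=678$"
# [0-9] is one digit; "+" means "one or more" of it.
DIGITS = r"^id=[0-9]+&site=[0-9]+$"
# Several ranges in one class; "_" and a trailing "-" are literal members.
ALNUM_ID = r"^id=[0-9a-zA-Z_-]+&site=[0-9]+$"
# The same idea without anchors: a partial match is enough.
ALNUM_ID_UNANCHORED = r"id=[0-9a-zA-Z_-]+&site=[0-9]+"
# A caret at the start of a class negates it: any character except "a".
NO_LETTER_A = r"^[^a]+$"
# \w: letters, digits, underscore; \d: digits.
SHORTHAND = r"^id=\w+&site=\d+$"

print(matches(EXACT, QUERY))                                  # True
print(matches(DIGITS, "id=56&site=678"))                      # True
print(matches(ALNUM_ID, "id=ab-c_def&site=678"))              # True
print(matches(ALNUM_ID, "id=abcdef&site=67a8"))               # False: site is digits only
print(matches(ALNUM_ID_UNANCHORED, "id=abcdef&site=67a8"))    # True: "id=abcdef&site=67" is enough
print(matches(NO_LETTER_A, QUERY))                            # True: no "a" anywhere
print(matches(NO_LETTER_A, "id=abc&site=678"))                # False

# [^&]+ captures everything up to the next ampersand.
print(re.search(r"id=([^&]+)", QUERY).group(1))               # 5

print(matches(SHORTHAND, "id=abc_1&site=678"))                # True
print(matches(SHORTHAND, "id=ab-c&site=678"))                 # False: \w does not match "-"
print(matches(SHORTHAND, "id=abc&site=abc"))                  # False: site must be digits
```

The unanchored line shows why the caret and dollar sign matter: without them, a pattern that matches only the start of the string still counts as a pass.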

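The group numbering and the {R:N} and {C:N} back references can be imitated to see which text each number captures. The function expand_back_references below is a hypothetical helper written only for illustration; URL Rewrite performs this substitution itself. It assumes one rule match and at most one condition match, with 0 meaning the entire match, and the sample path, user agent, and redirect targets are made up.

```python
import re
from typing import Optional

# Hypothetical helper: {R:N} reads group N of the rule-pattern match,
# {C:N} reads group N of the condition match; N = 0 is the entire match.
_TOKEN = re.compile(r"\{([RC]):(\d+)\}")


def expand_back_references(action: str,
                           rule: "re.Match[str]",
                           condition: Optional["re.Match[str]"] = None) -> str:
    def substitute(token: "re.Match[str]") -> str:
        source = rule if token.group(1) == "R" else condition
        if source is None:
            raise ValueError(f"{token.group(0)} used without a condition match")
        # A group that did not take part in the match yields None; use "" instead.
        return source.group(int(token.group(2))) or ""

    return _TOKEN.sub(substitute, action)


# Groups are numbered in the order their opening "(" appears.
qs = re.search(r"^id=(\d+)&site=(\w+)$", "id=5&site=678")
print(qs.group(0), qs.group(1), qs.group(2))        # id=5&site=678 5 678
print(expand_back_references("/site/{R:2}", qs))    # /site/678

# Sample URL path and user agent; the redirect targets are hypothetical.
path = re.search(r".*", "subfolder/subsubfolder")
agent = re.search(r".*", "Mozilla/5.0 (Windows NT 10.0)")
print(expand_back_references("http://contoso.com/youentered/{R:0}", path))
print(expand_back_references("http://contoso.com/{C:0}", path, agent))
```

Because rule groups and condition groups are numbered separately, {R:1} and {C:1} can refer to entirely different text.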