Quick and Dirty Complex Text Replacement with Ryan’s RegEx Tester (AutoHotkey Tip)

While Not For Beginning AutoHotkey Script Writers, This Regular Expression (RegEx) Trick Executes Multiple Complex Text Replacements without Even Writing an AutoHotkey Script

Last time, I discussed the short InstantHotkey.ahk script I used to add links to the new download portal page “Free AutoHotkey Scripts and Apps for Learning Script Writing and Generating Ideas.” By adding a temporary Hotkey, I avoided half of the back-and-forth Windows Clipboard action. This is an excellent time saver as long as you’re planning to work on each individual link. However, when I decided to turn all the references to the AHK script files into direct download links, the prospect of editing each filename seemed a bit much. I knew it was time for an AutoHotkey Regular Expression (RegEx).

The RegEx Script

This is not the first time that I’ve used RegEx to lighten the load. My January 1st blog, “A Perfect Place to Use an AutoHotkey Regular Expression (RegEx in Text Replacement),” offers a RegEx script for a very similar situation. (The same blog goes into enough detail that you might consider it a mini-tutorial on RegEx.) I decided that all I needed to do was develop a working RegEx, then insert it into the sample code. For that I turned to Ryan’s RegEx Tester.

I would be remiss not to point out that this topic moves into a more advanced area of AutoHotkey, or for that matter any other programming language. While available in almost every programming language, many coders consider Regular Expressions a mysterious form of an alien language. Understanding and writing Regular Expressions takes a special way of thinking. Once understood, they make perfect sense, but require different insight into the problem. Although simple to start, a RegEx can quickly become complex. Helping others to comprehend the powerful and flexible AutoHotkey RegEx functions was my motivation in posting this AutoHotkey RegEx introductory page and writing the book, Beginner’s Guide to Using Regular Expressions in AutoHotkey.

Ryan’s RegEx Tester

If you’ve read much of what I’ve written about AutoHotkey RegExs, you know that I consider Ryan’s RegEx Tester an essential tool. Written in AutoHotkey, the tester makes development of expressions much easier by instantly displaying the results for any changes made to the RegEx for both of the AutoHotkey functions: RegExMatch() and RegExReplace(). (Even though written in AutoHotkey for AutoHotkey users, anyone who works with Regular Expressions, regardless of programming language, would likely find Ryan’s RegEx Tester valuable.)

I copied the source HTML code text into the top field of Ryan’s RegEx Tester and started playing with the Regular Expression and the Replacement Text values.

Ryan’s RegEx Tester shows a solution to the AHK links problem in the Regular Expression field. For the RegExReplace() function (second tab), the original text is pasted into the top “Text to be searched” field. The “Replacement Text” field uses the same format as the function (except that in the function each double quote mark appearing in the HTML tag must be escaped with another preceding double quote mark). The final result with all matched replacements included appears in the “Results” field at the bottom.

 

Eventually, I came up with the result shown in the image above. At that point, I had an epiphany. If I’m only going to do this once, why bother putting the expression into a script? The answer I wanted was staring me right in the face.

Ryan’s RegEx Tester for Immediate Results—Without a Script

While I do explain the details of this particular RegEx and how the replacement field works below, my revelation that the tester could work as a standalone RegEx tool altered my thinking about many AutoHotkey RegEx problems.  If I copy the target text into the top of the tester (and the expression works properly), then I only need to recopy into the original file the resulting text located in the bottom field of the tester. It turns out that you don’t necessarily need to write a new AutoHotkey script or even load the main AutoHotkey package onto your computer in order to take advantage of AutoHotkey RegEx.

If you compile the AHK file version of Ryan’s RegEx Tester (RegExTester.ahk) into an executable file (RegExTester.exe) and save it on a USB drive, then you can take that RegEx replacement capability to any Windows computer. Load the tester on any Windows computer; copy the original document text into the tester; enter the RegEx and replacement text; then copy the results back into the original document. Ryan’s RegEx Tester becomes a powerful portable tool for text manipulation—no script required!

Caution: I suggest you copy the results to a blank or alternate copy of the original document. If you mess up the RegEx (which isn’t difficult to do), then you’ll want a protected original before you fully commit to the changes. (This caution applies just as much to any AutoHotkey script which makes massive alterations.)

Now let’s take a look at the problem and the RegEx solution. The RegEx searches all of the target text looking for the varied filenames which match the expression, then converts those filenames into HTML download links.

The AHK Links Problem

I’ve included the following discussion of this RegEx for those people who want to know how the magic happens. If you’re new to Regular Expressions, it may be well worth your time to review this RegEx Intro Page first.

In this case, I needed to work directly with the Web page’s HTML code since that’s where links are actually embedded. Web editing pages contain HTML code modes, often called source code pages. There you can work directly with the HTML code as text. This source view allows changes to the code which translate to the main editing page as seen by the user.

The object is to convert all the AHK filenames (e.g. QuickLinks.ahk) surrounded by the bold character tags (<b> and </b> respectively) into links by enclosing them within the appropriate link tags, beginning with:

<a href="http://www.computoredge.com/[path][filename].ahk">

and terminating with:

</a>

(In the first line above, shortened for display purposes, [path] merely represents the subfolder /AutoHotkey/Downloads/ while [filename] is the unique name of each AHK file. RegEx Tester uses the actual download path while the filename gets matched with RegEx symbols.)

We use the RegExReplace() function to scan the entire text, replacing each bold AHK filename (<b>[filename].ahk</b>) with itself surrounded by the appropriate link tags:

<a href="http://www.computoredge.com/[path][filename].ahk">
               <b>[filename].ahk</b></a>

However, rather than writing an AutoHotkey script, the RegExReplace tab of the RegEx Tester does the job.

Writing the RegEx

When writing a RegEx, understand that the RegEx engine always looks for a match. Therefore, the first and easiest pieces of the expression consist of the unchanging parts of the match. In this situation, the bold tags and the AHK extension never vary inside a match (<b>, </b>, and .ahk). Since these characters never change, simply add them to the expression:

<b>[filename].ahk</b>

Figuring out the symbols to match the unknown [filename] is slightly more difficult. If you check the AutoHotkey Regular Expression Quick Reference, you’ll see that adding \w (backslash lowercase w) tells the RegEx engine to match any alphanumeric character or underscore. Add a plus sign + and the matching continues through one or more such characters until encountering the first constant character, in this case the dot in front of the extension (.ahk). (A filename containing a space or a hyphen would need a broader expression.)

However, the dot is a RegEx wildcard which tells the RegEx engine to match any character. To make the dot just a dot, place a backslash in front of it:

<b>\w+\.ahk</b>
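To see why the backslash matters, here is a minimal sketch in AutoHotkey v1 syntax. The two sample strings and the variable names are made up for illustration. RegExMatch() returns the position of the first match, or 0 when nothing matches.

```autohotkey
; Hypothetical samples: the second one has an X where the dot should be.
Good := "<b>QuickLinks.ahk</b>"
Bad  := "<b>QuickLinksXahk</b>"

Escaped   := "<b>\w+\.ahk</b>"   ; \. matches only a literal dot
Unescaped := "<b>\w+.ahk</b>"    ; . matches any single character

MsgBox % "Escaped on Good: "  . RegExMatch(Good, Escaped)  . "`n"
       . "Escaped on Bad: "   . RegExMatch(Bad, Escaped)   . "`n"
       . "Unescaped on Bad: " . RegExMatch(Bad, Unescaped)
```

The message box should report 1, 0, and 1. The unescaped dot accepts the X, so the third test finds a match that we never intended.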

(I just noticed that the expression in the image above includes square brackets around the \w symbol. We use the square brackets to indicate a range of options. However, since \w is the only option, the brackets are not needed. Nor do they cause a problem.)

This expression matches any AHK filenames surrounded by the bold tags. But we need to save the unique name of that file for reuse in the replacement string. For that we create a backreference.

The RegEx Backreference

Sets of parentheses ( … ) have a number of uses in Regular Expressions, but one important result is the creation of a backreference. Whenever the characters and symbols within parentheses produce a match, the RegEx engine saves the results in a backreference for later use. Numbered consecutively in order from left to right, the backreference may be used either in the same RegEx (e.g. \1, \2, \3) or as part of the replacement text in the AutoHotkey RegExReplace() function (e.g. $1, $2, $3).

In this case the backreference creates a copy of the unique filename:

<b>(\w+)\.ahk</b>

which gets used in the replacement text of RegExReplace():

<b>$1.ahk</b>

The backreference saves the unique AHK filename, the text matched inside the first set of parentheses (\w+). The symbol $1 then replicates that filename in the replacement text field of the RegExReplace() function. If we stopped here, the input text would remain unchanged, since each match found gets replaced with itself.
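For anyone who would rather check this in a script than in the tester, the following is a small sketch in AutoHotkey v1 syntax. The sample sentence and variable names are assumptions for illustration only. RegExMatch() stores the first captured group in Match1, and RegExReplace() refers to the same group as $1.

```autohotkey
; Hypothetical sample line containing one bold AHK filename.
Sample := "Download <b>QuickLinks.ahk</b> today."

; Match1 receives the text captured by the first set of parentheses.
RegExMatch(Sample, "<b>(\w+)\.ahk</b>", Match)

; $1 in the replacement text stands for that same captured text.
Same := RegExReplace(Sample, "<b>(\w+)\.ahk</b>", "<b>$1.ahk</b>")

MsgBox % "Backreference 1: " . Match1 . "`n"
       . "Text unchanged: " . (Same == Sample ? "yes" : "no")
```

This should display QuickLinks as the backreference and confirm that replacing each match with itself leaves the text as it was.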

The RegExReplace() Replacement Text

As shown above, adding the replacement text includes adding the filename back in:

<b>$1.ahk</b>

The $1 represents the stored backreference found in the first set of parentheses.

Next, surround the filename with the link tags:

<a href="http://www.computoredge.com/AutoHotkey/Downloads/$1.ahk">
      <b>$1.ahk</b></a>

(Due to space limitations in Ryan’s RegEx Tester, the entire Replacement Text does not appear in the image above, but I assure you that it’s there.)

Note that the same $1 AHK filename backreference appears in the URL. This inserts the matched filename into the new link.
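As a rough script equivalent of what the tester does at this stage, the sketch below uses AutoHotkey v1 syntax. The sample source text and the variable names are invented, and the Replacement Text is written on one line rather than wrapped for display. Because the replacement sits inside a quoted string, each double quote in the HTML is doubled.

```autohotkey
; Hypothetical sample of page source with two bold AHK filenames.
Source := "<li><b>QuickLinks.ahk</b></li><li><b>InstantHotkey.ahk</b></li>"

Needle := "<b>(\w+)\.ahk</b>"
; Inside an AutoHotkey v1 expression string, each literal " is written as "".
Replacement := "<a href=""http://www.computoredge.com/AutoHotkey/Downloads/$1.ahk""><b>$1.ahk</b></a>"

; The fourth parameter receives the number of replacements made.
Result := RegExReplace(Source, Needle, Replacement, Count)
MsgBox % Count . " filenames linked:`n`n" . Result
```

With this sample, Count should come back as 2, and each filename appears twice in its own link: once in the URL and once in the bold text.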

Ignoring Previously Linked Filenames

One more issue arises when this RegEx encounters any previously linked AHK filenames. The current RegEx will add the link tags again. We don’t want to relink the old links. We need a method for excluding all preexisting AHK file links. For that we take a look at the HTML code. In HTML a link consists of the two enclosing anchor tags <a href> and </a>. If the AHK filename is already a link, then it will appear as:

<a href="[link URL]"><b>[filename].ahk</b></a>

To prevent relinking, we must detect this link code (or part of it) and exclude the match from the RegEx. Recognizing the ending </a> is easiest since it never changes. The URL in the beginning <a href> tag varies with the filename. Now we look for a way to exclude the match.

Newbies commonly make an error here by assuming that they can just look for </a> with the not ^ symbol inside a range:

<b>(\w+)\.ahk</b>[^</a>]

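But rather than looking for the specific string </a>, we’ve created a range of negative options. If any one of them occurs (<, /, a, or >) the RegEx fails (no match), so an unlinked filename followed by any other tag gets skipped. Worse, the range must itself match one character, so that character becomes part of the match and disappears in the replacement. That’s not good enough. The total string </a> never gets identified. Use a negative look-ahead assertion to check for the string without affecting any match results.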
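The pitfall is easy to reproduce. This sketch, in AutoHotkey v1 syntax with three made-up test strings, runs the mistaken expression against unlinked filenames that we would want to match in every case.

```autohotkey
Needle := "<b>(\w+)\.ahk</b>[^</a>]"

; Hypothetical unlinked filenames that should all be matched.
Case1 := "<b>QuickLinks.ahk</b> is handy"   ; followed by a space
Case2 := "<b>QuickLinks.ahk</b><br>"        ; followed by another tag
Case3 := "<b>QuickLinks.ahk</b>"            ; followed by nothing

RegExMatch(Case1, Needle, Found)
MsgBox % "Case1 matched: [" . Found . "]`n"
       . "Case2 position: " . RegExMatch(Case2, Needle) . "`n"
       . "Case3 position: " . RegExMatch(Case3, Needle)
```

In Case1 the match swallows the trailing space, as the closing bracket in the message shows. Case2 and Case3 both return 0: the first because < belongs to the excluded set, the second because the range has no character left to match.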

Negative Look-ahead Assertions

Look-ahead and look-behind assertions provide a technique for including or excluding RegEx matches based upon characters found to the right (ahead) or left (behind) of the expression. (The options can be found at the end of the AutoHotkey Regular Expression Quick Reference page.) While both look-ahead assertions and the more limited look-behind assertions check a specific subexpression for acceptability, the RegEx engine does not include it as part of the match. Plus, both types of assertions may be negative (or acceptable only if the subexpression does not match).

<b>(\w+)\.ahk</b>(?!</a>)

This negative look-ahead assertion checks for </a> immediately following a matched expression. If found, RegEx rejects the match. This technique rules out previously linked filenames.
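Here is a short sketch of the look-ahead at work, in AutoHotkey v1 syntax. The sample text, the old.htm link, and the [LINK:…] marker are placeholders chosen to keep the output easy to read; they are not the real Replacement Text.

```autohotkey
; Hypothetical source: the first filename is already linked, the second is not.
Source := "<a href=""old.htm""><b>QuickLinks.ahk</b></a> and <b>InstantHotkey.ahk</b>."

Needle := "<b>(\w+)\.ahk</b>(?!</a>)"
Result := RegExReplace(Source, Needle, "[LINK:$1]", Count)

MsgBox % Count . " replacement(s):`n" . Result
```

Only one replacement should occur. The already linked QuickLinks.ahk stays untouched, and the period after InstantHotkey.ahk survives because the assertion checks the following text without consuming it.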

Now with both the proper RegEx and Replacement Text, the entire original Web page source code can be copied into the top of the tester which then returns the new code in the bottom of the tester. Simply select all of the new code (CTRL+A), copy it (CTRL+C), and paste it (CTRL+V) into a new Web page source code page. Once you know it works, archive the original and make the new file the original. Other than Ryan’s RegEx Tester, no AutoHotkey script required!

Or, if you plan to use the RegEx again, put it in a script using the RegExReplace() function, such as that found in this previous blog.

Tip: If you decide to put this type of code into the RegExReplace() function, be sure to escape any double quote marks in the Replacement Text (as occurs in href="http://www.computoredge.com/AutoHotkey/Downloads/$1.ahk") with a preceding quote mark (e.g. href=""http:// …). Otherwise, it will drive you crazy until you figure out what’s wrong. Ryan’s RegEx Tester does not require double quotes to be escaped, but the Replacement Text in the function does.
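If the job does come up again, one possible script version might look like the sketch below (AutoHotkey v1 syntax). The Ctrl+Alt+L hotkey and the clipboard workflow are assumptions for illustration, not the code from the earlier blog. Note the doubled quote marks around the URL.

```autohotkey
; Hypothetical hotkey (Ctrl+Alt+L): copy the page source first, press the
; hotkey, then paste the converted text into a separate copy of the page.
^!l::
    Needle := "<b>(\w+)\.ahk</b>(?!</a>)"
    ; Each literal " in the replacement is doubled inside the quoted string.
    Replacement := "<a href=""http://www.computoredge.com/AutoHotkey/Downloads/$1.ahk""><b>$1.ahk</b></a>"
    NewText := RegExReplace(Clipboard, Needle, Replacement, Count)
    if (Count > 0)
        Clipboard := NewText
    MsgBox % Count . " filename(s) converted to links."
return
```

Working through the clipboard keeps the original file untouched until you paste, which fits the earlier caution about protecting the original.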

 
