Total the Numbers Found in Any Document (AutoHotkey RegEx Tips Part 5)

For a Quick-and-Dirty Calculator, Use Regular Expressions (RegEx) to Pull Numbers from Documents or Web Pages and Total Them Up—Plus, a RegEx for Removing (or Extracting) Numeric IP Addresses

Shifting gears, I end the discussion of the MultiPaste.ahk script which parses copied data into component parts for easier paste operations into other documents. With this blog, I start working on another tool for simplifying a Windows task—addition.

Sometimes I see a list of numbers in either a document or a Web page which I would like to quickly total without loading a separate calculator. For example, the shopping cart program I use for my book sales offers a summary table of all recent sales. While I can use a reports section of the site to get more information (e.g. monthly sales), I want a tool to quickly highlight the desired entries and give me the total of the individual sales. To do that I use a Regular Expression (RegEx) specifically for extracting those sales numbers.

Library Benefits

At first, this seems like a fairly simple task—and, if all I want to do is total dollars, it is! However, I seek a tool with a little more universality. I want the option to total any type of numbers including statistics, any set of floating-point numbers with variable decimal places, time increments such as HH:MM:SS found in online phone bills, as well as, sums of currencies including British pounds £ and Euros €. That means I intend to spend some time developing and discussing this new script.

All of the numeric extractions will use Regular Expressions, therefore these blogs complement the latest RegEx series involving the MultiPaste.ahk script. If you’re new to Regular Expressions, then you may want to start with the first blog in the series which introduces a few simple RegEx concepts—”Finding US Zip Codes (AutoHotkey RegEx Tips Part 1).” The Regular Expressions in these upcoming blogs build on those previous discussions.

A Number Totaling Script

While I expect that this script will grow with the addition of special text formatting, GUI window tricks, and a number of other non-RegEx techniques, I always start with the basic tasks:

  1. Copy a selected area to the Windows Clipboard.
  2. Set the variables for the number extraction loop.
  3. Start the RegEx search loop which extracts the numbers.
  4. Total the extracted numbers.
  5. Display results.

Note: Displaying a list of floating-point numbers (especially in message boxes) presents particular problems (i.e. aligning the decimal points)—including monetary values. In the middle of this series, I plan to take a side trip into the confusing Format() function. While I don’t find understanding the Format() function easy, my experiences may offer some insight into its use and power.

Extracting Numbers from Documents

NumberAdd1a
This script totals floating-point numbers found in a Web page.

The following techniques show their flexibility by working with any type of window, whether a text document, Web page, or spreadsheet. Indeed, in a spreadsheet, the tool’s reliability increases since you can highlight any column, row, or area, and the script totals every number contained within that space. You’ll find text documents and Web pages a little more limited since you can only select contiguous blocks.

First, we write a RegEx to identify any floating-point number by looking for the decimal point. Because this RegEx ignores possibilities which don’t include a decimal point, it skips dates, times, zip codes, and other numeric values you would never want to sum:

\d+\.\d+

Cover 200

The expression \d represents any numeric digit zero through nine (i.e. 0-9). The plus sign + modifies the RegEx to match at least one digit and continue matching any subsequent digits immediately following the first. Note that we must escape the dot \. with a backslash. Otherwise, the dot acts as the match anything wild card. After the decimal point, this RegEx requires at least one more numeric digit.

In most cases, any numbers you may want to sum in a document will include a decimal point—except time durations (hours, minutes, seconds i.e. hh:mm:ss). If you use another application which does not include a decimal in its numbers, then you will need to adjust the RegEx to fit the situation. Any numbers without a decimal point could cause confusion for the script—unless the document contains no other numeric text.

As shown in the image above, the script (available at the end of this blog) extracts and sums numbers which contain a decimal point, then displays the total in a MsgBox command window.

Hiccups occur when numeric URLs appear in the selected area in the same page or document. (This problem occurs in my sales example.) The RegEx can’t tell the difference between pieces of IP addresses and the decimal numbers. I considered writing a RegEx which skipped any URLs, but that turned more complicated than simply eliminating addresses prior to totaling the numbers.

Eliminating Numeric URLs

As I can see no reason to ever add up URL IP addresses, eradicating them offers the cleanest solution. The following RegExReplace() function removes all numeric URLs from the contents of the Clipboard:

Clipboard := RegExReplace(Clipboard,"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

The RegEx in the above line of code identifies possible IP addresses by looking for one to three numeric digits \d{1,3} separated by dots. (Tip: The RegEx \d|d?\d? works in the exact same manner as the above \d{1,3} RegEx.) Since the modifier {1,3} means each set isolated by dots must contain at least one but no more than three digitals, any IP address will match and face removal. You can find much more complicated Regular Expressions for matching only valid IP addresses (numbers 0 to 255 in each section), but we only need to recognize possible IPs.

The NumbersTotal.ahk Script

I’ve used the beginning of the Standard Clipboard Routine to capture any selected text, however, I do not save the old contents of the Clipboard nor restore the Clipboard at the end of the routine. I choose this course because I anticipate eventually creating a GUI window for selecting the type of sum options (dollars, pounds, time intervals, etc.)  while the data remains sitting in the Clipboard:

^!#t::
  Clipboard = ; clears the Clipboard
  SendInput, ^c
  ClipWait 0 ; pause for Clipboard data
  If ErrorLevel
  {
    MsgBox, Error message for non-selection
  }

  TotalNum := 0
  CountNum := 1
  NumList := ""
  Next := 1

  Clipboard := RegExReplace(Clipboard,"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
  Loop
  {
    FoundPos := RegExMatch(Clipboard,"\d+\.\d+",Number,Next)
    Next := FoundPos + StrLen(Number)
    If FoundPos = 0
      Break
    NumList := NumList . Number . "`r`n"
    TotalNum := TotalNum + Number
    CountNum++
  }

  MsgBox % NumList . "`r`n" . TotalNum

Return

This version of the NumbersAdd.ahk merely offers the first step in writing this script. Next time, I plan to look at some of the formatting issues which appear in the MsgBox window as shown in the image at the beginning of this blog.

Click the Follow button at the top of the sidebar on the right of this page for e-mail notification of new blogs. (If you’re reading this on a tablet or your phone, then you must scroll all the way to the end of the blog—pass any comments—to find the Follow button.)

jack

This post was proofread by Grammarly
(Any other mistakes are all mine.)

(Full disclosure: If you sign up for a free Grammarly account, I get 20¢. I use the spelling/grammar checking service all the time, but, then again, I write a lot more than most people. I recommend Grammarly because it works and it’s free.)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s