ph_regex PowerHome formula function
Description
Performs a regular expression search on a string of data.
Syntax
ph_regex ( pattern, data, start, flags, localstart, locallength )
Argument Description
pattern String. A regular expression search pattern.
data String. The string in which to perform the search.
start Long. The position within the data in which to start the search. Use 1 to start at the beginning.
flags Integer. Flags that control how the search is performed. Add individual flag values together. Add 1 to make the search case-sensitive. Add 2 to make the search ignore CR/LF characters within the data.
localstart Integer. The index of a local variable in which to have the start of the found data returned. Use 0 to not have the start returned.
locallength Integer. The index of a local variable in which to have the length of the found data returned. Use 0 to not have the length returned.
Return value
String. Returns the data that matches the regular expression search criteria.
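
As a rough illustration of how the arguments and return value fit together, the following Python sketch models the call with Python's own `re` module. It is not PowerHome code and not PowerHome's regex engine. The helper name `regex_find`, the tuple it returns (standing in for the localstart and locallength variables), the empty result on no match, and the reading of flag 1 as "case-sensitive when set" are all assumptions made for the sketch.

```python
import re

def regex_find(pattern, data, start=1, flags=0):
    """Hypothetical model: return (matched_text, start_position, length).

    Positions are 1-based, mirroring the 'start' argument described above.
    Assumption: flag value 1 means "match case", so without it the search
    ignores case. Assumption: no match yields an empty string and zeros.
    """
    re_flags = 0 if flags & 1 else re.IGNORECASE
    match = re.compile(pattern, re_flags).search(data, start - 1)
    if match is None:
        return "", 0, 0
    return match.group(0), match.start() + 1, len(match.group(0))

# The found text, its 1-based position, and its length.
found, found_start, found_length = regex_find("[0-9]+", "Temp 72 degrees", 1, 0)
print(found, found_start, found_length)
```

In this model the three values play the roles of the return value, the localstart variable, and the locallength variable.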
Usage
Use this function for powerful text searching capabilities. The regular expression special characters supported are:
. Matches any character.
\< This matches the start of a word, where a word is defined in the traditional sense, that is, letters or numbers. Spaces, punctuation, CR/LF, etc. are not included as part of a word, and thus create a break.
\> This matches the end of a word. See also word definition above.
\x This allows you to use a character x that would otherwise have a special meaning. For example, \[ would be interpreted as [ and not as the start of a character set.
[...] This indicates a set of characters, for example, [abc] means any of the characters a, b or c. You can also use ranges, for example [a-z] for any lower case character.
[^...] The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character.
^ This matches the start of a line (unless used inside a set, see above).
$ This matches the end of a line.
* This matches 0 or more times. For example, Sa*m matches Sm, Sam, Saam, Saaam and so on.
+ This matches 1 or more times. For example, Sa+m matches Sam, Saam, Saaam and so on.
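
The special characters above can be tried out in most regex engines. The following sketch uses Python's `re` module as a stand-in, so it shows the general behavior rather than PowerHome's exact engine. One difference is an assumption worth noting: Python has no \< operator, so the word-start test is approximated with `\b`.

```python
import re

words = ["Sm", "Sam", "Saaam", "Sum"]

# Sa*m: zero or more "a" characters between "S" and "m".
star_matches = [w for w in words if re.fullmatch(r"Sa*m", w)]

# Sa+m: at least one "a" is required, so "Sm" drops out.
plus_matches = [w for w in words if re.fullmatch(r"Sa+m", w)]

# [^A-Za-z]: the first character that is not alphabetic.
non_alpha = re.search(r"[^A-Za-z]", "abc7def").group(0)

# \[ : the backslash removes the special meaning of "[".
bracket_index = re.search(r"\[", "a[b").start()

# \b approximates \< here: "hum" must begin a word.
word_start_index = re.search(r"\bhum", "low humidity").start()

print(star_matches, plus_matches, non_alpha, bracket_index, word_start_index)
```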

An important note on the * and + special characters is that they conduct a "greedy" regular expression search. The function will not stop at the first point where the pattern could match but will instead extend the match to the last point where it still matches.
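
Greedy matching is easiest to see side by side with a bounded alternative. The sketch below uses Python's `re` module purely as an analogy (it is not PowerHome's engine). The sample line is taken from the Complex Example data, and the workaround uses only the [^...] set syntax listed above.

```python
import re

line = '"3F000001CD92C728","Refrig",39.20,37.46,'

# Greedy: .+ runs as far as it can, so the match ends at the LAST quote.
greedy = re.search(r'".+"', line).group(0)

# Bounded: [^"]+ cannot cross a quote, so the match ends at the NEXT quote.
bounded = re.search(r'"[^"]+"', line).group(0)

print(greedy)
print(bounded)
```

Replacing "." with a complemented set such as [^"] is a common way to keep a + or * from running past the intended delimiter.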

NOTE: If a string contains any quote characters (") then the string must be delimited with the single quote character ('). For example... 'he said, "no"'

Also, this function will not perform a regular expression search that spans multiple lines. If the data to search contains carriage returns or line feeds, the entire match for the regular expression must exist within a single line. If your regular expression must span lines, then add 2 to the flags to have CRs and LFs temporarily converted. CR will be converted to ASCII 128 and LF will be converted to ASCII 129. If you convert CR/LF then you can include them in your search with the PowerHome escape characters ~128 and ~129 respectively.
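
The CR/LF conversion described above can be pictured with the following Python sketch. It is only a model of the described behavior, not PowerHome's implementation: the constant names are made up for the illustration, and converting the placeholders back in the result is an assumption based on the word "temporarily".

```python
import re

CR_MARK = chr(128)  # placeholder for CR, per the description above
LF_MARK = chr(129)  # placeholder for LF, per the description above

data = "outside 23 degrees\r\n45 humidity"

# Python's "." does not match a line feed, so a search that needs to
# cross the line break fails, much like the single-line limit above.
unconverted = re.search(r"degrees.+humidity", data)

# Swap the line-ending characters for placeholders, then search for them.
converted = data.replace("\r", CR_MARK).replace("\n", LF_MARK)
pattern = "degrees" + CR_MARK + LF_MARK + "[0-9]+ humidity"
match = re.search(pattern, converted)

# Assumption: the placeholders are turned back into CR/LF afterwards.
restored = match.group(0).replace(CR_MARK, "\r").replace(LF_MARK, "\n")
print(unconverted, repr(restored))
```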

NOTE: Where search pattern strings might contain a double quote (") character, you must use the single quote (') character to delimit the string. For example:
ph_regex ('he said, "no" ', "[LOCAL1]", 1, 0,1,2 )

You may also perform multiple searches by separating your search patterns with the PowerHome escape character ~255. This is most useful when CR/LF is NOT replaced and you are trying to match a particular piece of data. When using multiple searches, only the data matched by the last search is returned. An example multiple search would be: "degrees$~255[0-9]+ humidity". This search first finds the first occurrence where the word "degrees" appears at the end of a line. The function then does a regex search using "[0-9]+ humidity" starting from the end of the previous regex search (the start of the line following the one on which "degrees" was found).
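
The chaining of searches can be modeled as follows. This is a Python sketch of the behavior as described, not PowerHome code: the helper `chained_search` is hypothetical, the sample data is invented, character 255 stands in for the ~255 escape, and plain line feeds are used so that Python's `$` lines up with the end of each line.

```python
import re

def chained_search(combined_pattern, data):
    """Run each sub-pattern in turn, each starting where the previous match ended.

    Only the text matched by the last sub-pattern is returned.
    Assumption: any sub-pattern failing to match yields an empty string.
    """
    position = 0
    match = None
    for pattern in combined_pattern.split(chr(255)):
        match = re.compile(pattern, re.MULTILINE).search(data, position)
        if match is None:
            return ""
        position = match.end()
    return match.group(0) if match else ""

data = "30 humidity yesterday\noutside 23 degrees\n45 humidity\n"

# A plain search would stop at the earlier "30 humidity" ...
plain = re.search(r"[0-9]+ humidity", data).group(0)

# ... while the chained search only looks after the line ending in "degrees".
chained = chained_search("degrees$" + chr(255) + "[0-9]+ humidity", data)
print(plain, "|", chained)
```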

This function is often used with other PH string functions to trim a larger string, or to locate a string position within another string. See also pos(), posw(), ph_pos(), left(), mid(), right().

See also the .FAQs-String Tips-Hints Help file.

Examples
The following examples demonstrate typical syntax/usage for this function.

*** Simple Example ***
Assume you have multiple water leak sensors installed, each with its own Trigger, but want to process them all with a single common Macro routine that puts the battery status (GOOD/BAD) in a series of Globals named "BATCHK_SINK", "BATCHK_WASHING", "BATCHK_TOILET".

The Leak Sensor battery periodic heartbeat Trigger will pass the triggering device's ID (e.g., "WATER LEAK-SINK") to the Macro. If you name all the device IDs in a similar fashion, such as . . .

WATER LEAK-SINK
WATER LEAK-WASHING
WATER LEAK-TOILET

Then you can strip off the unique device name (following the dash) and write the status to the appropriate Global var, as follows.

The following string operations would find the unique device name then append it to the base Global string to form the unique Global variable name.

[Figure: string operations, macro lines 100-140]

Macro line 100 searches the device ID string data passed in TEMP10 by the Trigger and looks for the dash character ("-") starting at position 1 in the string. Since this is a simple search, no special Flag settings are needed, so "0" is used. The last two parameters store the position of the "-" in the string in LOCAL4 and the length of the found data ("-SINK") in LOCAL5. These values are useful for further string operations; they are not needed here but are shown for generality.

Macro line 110 trims the dash off the found string, leaving only "SINK".

Macro line 120 appends "SINK" to the common Global name of "BATCHK_", forming "BATCHK_SINK".

Finally in line 130, the value "OK" is written to this Global variable.

Line 140 prints out the various parameter values FYI.
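
For readers who want to trace the steps, the same sequence can be sketched in Python. This is an analogy only, not the macro itself: the search pattern "-.+" is an assumption inferred from the found data ("-SINK"), and the dictionary `global_vars` merely stands in for PowerHome Global variables.

```python
import re

device_id = "WATER LEAK-SINK"           # stands in for the value passed in TEMP10

# Line 100: find the dash and everything after it (assumed pattern "-.+").
match = re.search(r"-.+", device_id)
found = match.group(0)
found_start = match.start() + 1         # 1-based position, as stored in LOCAL4
found_length = len(found)               # length, as stored in LOCAL5

# Line 110: trim the leading dash.
device_name = found[1:]

# Line 120: build the Global variable name.
global_name = "BATCHK_" + device_name

# Line 130: write the status (a dictionary stands in for the Globals).
global_vars = {global_name: "OK"}

# Line 140: print the values for reference.
print(found, found_start, found_length, global_name, global_vars[global_name])
```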


*** Complex Example ***
The following examples assume that the following string (with CR/LF line endings) is stored in [LOCAL1]...

  ROMId,Name, Value,Avg,
"3F000001CD92C728","Refrig",39.20,37.46,
"3F6000017C8BD128","Outside",23.90,19.81,
"3F000001CDB2BA27","House",70.65,70.13,
"3F000001CD9E6D28","Freezer",1.96,1.11,

The following command extracts the last 9 characters of the first ROM ID. Note there is no need to set the CR/LF flag since the start position (35) places the entire search within a single line.
ph_regex ('.........',"[LOCAL1]", 35, 0,2,3 ) --> returns "1CD92C728"

This captures the initial portion of the first ROM ID
ph_regex ("3F[0-9]+","[LOCAL1]", 1, 0,2,3 ) --> returns "3F000001"

Because of "greediness", the following will match from "3F" to the beginning of the last word it can find on the line (not the first word).
ph_regex ("3F.+\<","[LOCAL1]", 1, 0,2,3 ) --> returns "3F000001CD92C728","Refrig",39.20,37."

Note that if the CR/LF flag had been set to 2 (ignore line endings) the search would have captured everything from the first "3F" all the way to the "...6D28","Freezer",1.96,1." characters at the end.
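
The results above can be cross-checked with an ordinary regex engine. The following Python sketch is an analogy, not PowerHome itself, and rests on stated assumptions: the header line is taken literally with its two leading spaces (which puts the last nine characters of the first ROM ID at position 35), Python's `\b(?=\w)` stands in for \<, and `re.DOTALL` stands in for adding 2 to the flags.

```python
import re

data = "\r\n".join([
    '  ROMId,Name, Value,Avg,',
    '"3F000001CD92C728","Refrig",39.20,37.46,',
    '"3F6000017C8BD128","Outside",23.90,19.81,',
    '"3F000001CDB2BA27","House",70.65,70.13,',
    '"3F000001CD9E6D28","Freezer",1.96,1.11,',
]) + "\r\n"

WORD_START = r"\b(?=\w)"   # Python approximation of \<

# Nine arbitrary characters starting at 1-based position 35.
last_nine = re.compile(r".........").search(data, 35 - 1).group(0)

# "3F" followed by digits only; stops at the first letter.
prefix = re.search(r"3F[0-9]+", data).group(0)

# Greedy within one line: ends just before the last word on that line.
greedy_line = re.search(r"3F.+" + WORD_START, data).group(0)

# Greedy across lines: ends just before the last word in the whole string.
greedy_all = re.search(r"3F.+" + WORD_START, data, re.DOTALL).group(0)

print(last_nine)
print(prefix)
print(greedy_line)
print(greedy_all[-20:])
```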