ph_regexdiff1 PowerHome formula function
Description
Searches a string of data and returns the data between two regular expression searches.
Syntax
ph_regexdiff1 ( pat1, pat2, data, start, occ, flags, localstart, locallength )
Argument Description
pat1 String. The starting regular expression search pattern.
pat2 String. The ending regular expression search pattern.
data String. The string in which to perform the search.
start Long. The position within the data in which to start the search. Use 1 to start at the beginning.
occ Integer. The matching occurrence number to be returned. Use 1 to return the first matching occurrence. A 3 will return the third matching occurrence, etc.
flags Integer. Flags that control how the search is performed. Add individual flag values together. Add 1 to cause the search to match case. Add 2 to cause the search to ignore CR/LFs within the data.
localstart Integer. The index of a local variable in which to have the start (1st character) of the found data returned. Use 0 to not have the start returned.
locallength Integer. The index of a local variable in which to have the length of the found data returned. Use 0 to not have the length returned.
Return value
String. Returns the data string between the pat1 regular expression and pat2 regular expression.
Usage
Use this function for powerful text search capabilities. The regular expression special characters supported are:
. Matches any character.
\< This matches the start of a word, where a word is defined in the traditional sense, that is, letters or numbers. Spaces, punctuation, CR/LF, etc. are not included as part of a word, and thus create a break.
\> This matches the end of a word. See also word definition above.
\x This allows you to use a character x that would otherwise have a special meaning. For example, \[ would be interpreted as [ and not as the start of a character set.
[...] This indicates a set of characters, for example, [abc] means any of the characters a, b or c. You can also use ranges, for example [a-z] for any lower case character.
[^...] The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character.
^ This matches the start of a line (unless used inside a set, see above).
$ This matches the end of a line.
* This matches 0 or more times. For example, Sa*m matches Sm, Sam, Saam, Saaam and so on.
+ This matches 1 or more times. For example, Sa+m matches Sam, Saam, Saaam and so on.

An important note on the * and + special characters is that they conduct a "greedy" regular expression search. When using them, this function will not stop at the first possible match but will instead extend to the last one.
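
The greedy behavior can be illustrated outside PowerHome. The following is a minimal sketch using Python's re module, which is an assumption made for illustration only and is not the PowerHome regular expression engine. The non-greedy "+?" form is a Python feature shown purely for contrast; it is not among the special characters listed above.

```python
import re

# One line of the kind used in the examples further down this page.
line = '"3F000001CD9E6D28","Freezer",1.96,1.11,'

# Greedy: ".+" consumes as much as it can, so the match ends at the LAST "D".
greedy = re.search(r"3F.+D", line)

# Non-greedy "+?" (Python only, shown for contrast) stops at the FIRST "D".
lazy = re.search(r"3F.+?D", line)

greedy_text = greedy.group(0)
lazy_text = lazy.group(0)

print(greedy_text)  # 3F000001CD9E6D
print(lazy_text)    # 3F000001CD
```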

NOTE: If a string contains any quote characters (") then the string must be delimited with the single quote character ('). For example... 'he said, "no"'

This function will also not perform a regular expression search that spans multiple lines. If the data to search contains carriage returns or line feeds, the entire matching search data for the regular expression must exist within a single line. If your regular expression must span multiple lines, then add 2 to the flags to have CRs and LFs temporarily converted. CR will be converted to ASCII 128 and LF will be converted to ASCII 129. If you convert CR/LF then you can include them in your search with the PowerHome escape characters ~128 and ~129 respectively.
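
The effect of converting CR/LF can be sketched in Python. This is an emulation under stated assumptions, not PowerHome code: Python's re module stands in for the PowerHome engine, the short data string is made up for the illustration, and chr(128)/chr(129) stand in for the placeholder characters described above.

```python
import re

# Hypothetical three-line data with CR/LF line endings.
data = 'ROMId,Name\r\n"3F01","Refrig",39.20\r\n"3F02","Freezer",1.96\r\n'

# In Python's re, "." does not match a line feed by default, so a greedy
# ".+" cannot run past the end of the line it started on.
per_line = re.search(r'3F.+",', data).group(0)

# Emulate adding 2 to the flags: swap CR and LF for placeholder characters
# so the whole data behaves like one long line, then search again.
converted = data.replace("\r", chr(128)).replace("\n", chr(129))
spanning = re.search(r'3F.+",', converted).group(0)

print(per_line)                         # 3F01","Refrig",
print(spanning.endswith('"Freezer",'))  # True
```

With the line endings converted, the same greedy pattern runs on to the last matching position in the whole data rather than the last one on the first matching line.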

Within each search pattern you may perform multiple searches by separating your search pattern using the PowerHome escape character ~255. This is most useful when CR/LF is NOT replaced and you are trying to match a particular piece of data that spans multiple lines. If you do multiple searches, the last search is used as the starting position and ending position of the returned text.
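
One way to picture a ~255 separated pattern is as a chain of searches, each one starting where the previous match ended. The sketch below emulates that idea in Python. The helper name chained_search, the sample data, and the use of Python's re module are all assumptions made for illustration; how PowerHome implements this internally is not documented here.

```python
import re

def chained_search(patterns, data, start=0):
    """Run each pattern in turn, starting where the previous match ended.

    Returns the (start, end) span of the LAST match, or None if any
    search in the chain fails. Hypothetical helper, not a PowerHome call.
    """
    pos = start
    span = None
    for pat in patterns:
        m = re.compile(pat).search(data, pos)
        if m is None:
            return None
        span = m.span()
        pos = m.end()
    return span

# Hypothetical data with CR/LF line endings.
data = 'ROMId,Name\r\n"3F01","Refrig",39.20,37.46,\r\n"3F02","Freezer",1.96,1.11,\r\n'

# Equivalent in spirit to the pattern '3F~255","~255",' followed by ','.
span = chained_search(['3F', '","', '",'], data)
begin = span[1]                # returned text starts after the last search
end = data.index(",", begin)   # and stops at the ending pattern
value = data[begin:end]

print(value)  # 39.20
```

Because each step only looks for a short pattern, nothing in the chain is greedy, which is why this form avoids the over-matching seen with ".+".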

This function is often used with other PH string functions to trim a larger string, or to locate a string position within another string. See also pos(), posw(), ph_pos(), left(), mid(), right().
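
The relationship between the returned string and the start and length values can be sketched as follows. This is a Python emulation under assumptions: regexdiff_sketch and mid are hypothetical stand-ins written for this illustration, positions are taken to be 1-based (as suggested by "Use 1 to start at the beginning"), and mid() is assumed to take a 1-based start and a length.

```python
import re

def regexdiff_sketch(pat1, pat2, data, start=1):
    """Hypothetical stand-in: return (text, start, length) or None.

    'start' in and out is 1-based. No flags, occurrence, or ~255 handling.
    """
    first = re.compile(pat1).search(data, start - 1)
    if first is None:
        return None
    second = re.compile(pat2).search(data, first.end())
    if second is None:
        return None
    text = data[first.end():second.start()]
    return text, first.end() + 1, len(text)

def mid(s, start, length):
    """1-based substring, assumed to behave like a typical mid() function."""
    return s[start - 1:start - 1 + length]

data = '"3F6000017C8BD128","Outside",23.90,19.81,'
text, pos, length = regexdiff_sketch('Outside",', ',', data)

print(text, pos, length)       # 23.90 30 5
print(mid(data, pos, length))  # 23.90
```

In this sketch the start and length describe exactly the returned text, so they can be passed on to another string function to cut or locate the same span.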

See also the .FAQs-String Tips-Hints Help file.

Examples
The following examples demonstrate typical syntax and usage for this function.
They assume that the following string (with CR/LF line endings) is stored in [LOCAL1]...
  ROMId,Name, Value,Avg,
"3F000001CD92C728","Refrig",39.20,37.46,
"3F6000017C8BD128","Outside",23.90,19.81,
"3F000001CDB2BA27","House",70.65,70.13,
"3F000001CD9E6D28","Freezer",1.96,1.11,


The following command extracts "92C"
ph_regexdiff1 ("3F~255D", "7", "[LOCAL1]", 1, 1,0,3,0 )

As does
ph_regexdiff1 ("3F.+D", "7", "[LOCAL1]", 1, 1,0,3,0 ) --> "92C"

But the following fails ...
ph_regexdiff1 ("3F.+D", "7", "[LOCAL1]", 1, 1,2,3,0 ) --> no match found
because the ignore-CR/LF flag (2) is turned on, so the greedy ".+" keeps skipping through lines until it reaches the last "D" in the text. That "D" is in the ROM ID on the last (Freezer) line of the string, and no "7" follows it, so the search fails.

This extracts "39.20" ...
ph_regexdiff1 ('3F.+",', ',', '[LOCAL1]', 1, 1,0,3,0 )

But when the CR/LF Flag is set (2) then "1.96" is extracted in a "greedy" search ...
ph_regexdiff1 ('3F.+",', ',', '[LOCAL1]', 1, 1,2,3,0 )
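
The four outcomes above can be reproduced with a small Python emulation. This is a sketch under assumptions, not the PowerHome implementation: the helper name between is hypothetical, Python's re module stands in for the PowerHome engine, and re.DOTALL is used as a shortcut for the CR/LF flag (2) instead of actually converting the line endings.

```python
import re

# The sample data from above, with CR/LF line endings.
SAMPLE = "\r\n".join([
    'ROMId,Name, Value,Avg,',
    '"3F000001CD92C728","Refrig",39.20,37.46,',
    '"3F6000017C8BD128","Outside",23.90,19.81,',
    '"3F000001CDB2BA27","House",70.65,70.13,',
    '"3F000001CD9E6D28","Freezer",1.96,1.11,',
]) + "\r\n"

def between(pat1, pat2, data, span_lines=False):
    """Text between the first pat1 match and the next pat2 match, or None.

    span_lines=True lets "." cross line endings (stand-in for flag value 2).
    """
    flags = re.DOTALL if span_lines else 0
    first = re.search(pat1, data, flags)
    if first is None:
        return None
    second = re.compile(pat2, flags).search(data, first.end())
    if second is None:
        return None
    return data[first.end():second.start()]

print(between("3F.+D", "7", SAMPLE))                     # 92C
print(between("3F.+D", "7", SAMPLE, span_lines=True))    # None
print(between('3F.+",', ',', SAMPLE))                    # 39.20
print(between('3F.+",', ',', SAMPLE, span_lines=True))   # 1.96
```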

====

Continuing with the same source data, the following syntax could be used to get the Outside data value (23.90) ...
ph_regexdiff1('Outside",',",","[LOCAL1]",[LOCAL3],1,0,3,0)

Which says to extract the string that is in-between Outside", and the next comma "," ...
Starting at the position in the LOCAL1 data indicated by the starting location in LOCAL3 (60)
Return the 1st matching occurrence
No special flags required
LOCAL3 will now contain the Index# of the 1st match character, equal to 103
Do not return Length

This will return a value of "23.90"

====

A more general way to extract the data that also allows a simple macro loop to get successive values is ...
ph_regexdiff1 ('3F~255","~255",', ",", "[LOCAL1]", 1, [LOCAL2], 2, 3, 0 ) --> {where LOCAL2 is incremented for loop control}

Which says to look through all lines (CR/LF flag = 2) to find the string bounded on the left by 3F...","...", and on the right by , (where ... stands for the ~255 escape character).
In the sample line "3F000001CD92C728","Refrig",39.20,37.46, the left pattern matches 3F, then the "," between the ROM ID and the name, then the ", that follows Refrig. The characters in between are skipped over by the escape. The right pattern matches the comma after 39.20.
By using the "occurrence" parameter (LOCAL2 above) in a loop to set which match occurrence we want, we can select each of the four values in turn.
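
The loop idea can be sketched in Python as follows. This is an emulation resting on a labeled assumption: that occurrence N means the whole chain of searches (3F, then ",", then ",) is repeated N times, each repeat starting where the previous one ended. The helper name value_for_occurrence is hypothetical, and plain substring searches are used because none of the three pieces contains a special character.

```python
# The sample data from above, with CR/LF line endings.
SAMPLE = "\r\n".join([
    'ROMId,Name, Value,Avg,',
    '"3F000001CD92C728","Refrig",39.20,37.46,',
    '"3F6000017C8BD128","Outside",23.90,19.81,',
    '"3F000001CDB2BA27","House",70.65,70.13,',
    '"3F000001CD9E6D28","Freezer",1.96,1.11,',
]) + "\r\n"

def value_for_occurrence(data, occ):
    """Return the value for the occ-th match of the chain, or None.

    ASSUMPTION: each occurrence restarts the chain after the previous one.
    """
    pos = 0
    for _ in range(occ):
        for literal in ("3F", '","', '",'):
            found = data.find(literal, pos)
            if found < 0:
                return None
            pos = found + len(literal)
    end = data.find(",", pos)
    return data[pos:end] if end >= 0 else None

# Stand-in for a macro loop that increments LOCAL2 from 1 to 4.
values = [value_for_occurrence(SAMPLE, n) for n in range(1, 5)]
print(values)  # ['39.20', '23.90', '70.65', '1.96']
```

Asking for a fifth occurrence returns None in this sketch, which a loop could use as its stopping condition.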

====

More Examples:    (LOCAL1 contains the text to be searched)
  NOTE: The regular expression engine will only search for a match on its regex search string within the confines of a single line.
If the data contains multiple lines, the search is performed on EACH line until a match is found.

If the sought string is all on one line then CR/LFs make no practical difference, but if the sought string covers multiple lines then either the ~255 escape character must be used to span line endings or the replace-CR/LF flag must be set in order to find a match.
ph_regexdiff1 ("3F~255D", "7", "[LOCAL1]", 1, 1,0,3,0 ) --> finds 92C
ph_regexdiff1 ("3F.+D", "7", "[LOCAL1]", 1, 1,0,3,0 ) --> finds 92C
ph_regexdiff1 ('3F.+",', ',', '[LOCAL1]', 1, 1,2,3,0 ) --> finds 1.96 by greed
ph_regexdiff1 ('Outside",',",", "[LOCAL1]", 1, 1,0,3,0 ) --> finds 23.90 --> NOTE that the first parameter (Outside",) contains a " so single quotes (') had to be used to delimit the parameter.
ph_regexdiff1 ('3F.+",', ',', '[LOCAL1]', 1, 1,2,3,0 ) --> finds ",23
ph_regexdiff1 ('3F.+",', ',', '[LOCAL1]', 1, 1,0,3,0 ) --> fails because "the ignore CRLF is not set" and the .+ search will not work across multiple lines.
ph_regexdiff1 ('3F~255",', ',', '[LOCAL1]', 1, 1,0,3,0 ) -->finds ",23 as the escape (~255) ignores CRLF line breaks.
ph_regexdiff1 ("ROMId,", ", Val", "[LOCAL1]", 1, 1,0,3,0 ) --> finds Name
ph_regexdiff1 ("ROMId,", "\>", "[LOCAL1]", 1, 1,0,3,0 ) --> finds Name
ph_regexdiff1 ("R", "\>", "[LOCAL1]", 1,2,0,3,2 ) --> finds efrig