Active TopicsActive Topics  Display List of Forum MembersMemberlist  Search The ForumSearch  HelpHelp
  RegisterRegister  LoginLogin
PowerHome Macros, Scripts, and Formulas Repository
 PowerHome Messageboard : PowerHome Macros, Scripts, and Formulas Repository
Subject Topic: CSV data parsing and string extraction Post ReplyPost New Topic
Author
Message << Prev Topic | Next Topic >>
GadgetGuy
Super User
Super User
Avatar

Joined: June 01 2008
Location: United States
Online Status: Offline
Posts: 942
Posted: November 21 2010 at 13:32 | IP Logged Quote GadgetGuy

Working recently with my routines to extract the Wind
Speed and Humidity data values from my local Weather
Underground web site, I was reflecting on yet another
awesome power feature of PH, namely its String
Manipulation functions.

=================
Sidebar Note: Getting the weather site data is as easy as the following PH macro line which reads the entire HTML site source code into LOCAL1
=================

Before I know about these functions, I had written a
series of macros to extract various temperature values
from a CSV file produced by my 1-wire LAN system.

I wished I knew then what I know now about the power of
PH's string functions!

The CSV file looks like . . .

and I wanted the two data elements on each line following
the sensor friendly name.

My original code to extract this data moved pointers, and
data masks around the CSV file and was quite complicated
and long.

Then I discovered the PH string functions (highlighted in
green in the Help file Index list)



These powerful PH functions provide almost unlimited
string location and extraction functionality from a
target string.

The differences between my old and the new methods are
noteable, with the "old" on the left and the "new"
on the right in this image . . .


The areas boxed in Red are the actual data extraction
code, the rest is setup and validity checking.
Highlighting just the extraction code more readily shows
the greater efficiency (not to mention ease of
development effort) of the PH built-in capabilities.

Rather than explain the various PH string functions here,
I suggest you check them out individually in the PH Help
zone.

Since a picture is worth 1000 words, let's look at an
example. Then a few comments.

Here is my little bit of Code to extract the three sets
of temperature data from the CSV file.

Note, to get the file into PH, in order to parse it, the
following function was used . . .


This reads the entire file's contents into the local
variable [LOCAL1], where it can then be used.

The code to extract the data is done in a loop that grabs
each line in turn . . .


Using the Case function to extract a search label for
each loop pass (LOCAL2 is a loop counter incremented on
each pass), that label is then first looked for, the data
value just following it is extracted, and a index pointer
is set marking the end of that first data in the string,
so that a simple comma delimited data snap of subsequent
data can then be performed. Since the ph_regexdiff1()
function uses a search starting pointer, setting that
pointer allows one to skip over the initial portion of a
data line. Note the function also contains an "incident"
counter so that once the beginning of a repeating pattern
(ie, data between commas) is identified, the nth, nth+1,
etc data set can easily be grabbed.

Explaining the ph_regexdiff1() function as used, we have
for the function's Syntax . . .

Code:
ph_regexdiff1 ( pat1, pat2, data, start, occ, flags, localstart, locallength )


and an actual use example of . . .

Where:
pat1 is the unique temperature label we are looking
for is boxed in orange. This is the start of the pre-
match string, the first character following of which will
become the initial character captured in our snap data
extraction . . .


"[LOCAL4] holds the temperature label we are looking
for in the CSV file. In addition to the label, we want
to also skip over the following quote and comma (",).
NOTE that since a dbl-quote (") is contained in the
search string, we must delimit the pat1 variable with
single-quotes (') to uniquely delimit the search string
and not confuse PH.
pat2 is the post-match string that identifies the
characters following the desired snap extraction area. In
this example, that post match is just the comma (,)
delimiting the end of the data.
• [LOCAL1] is the entire target data file to be
searched
• The initial "1" is the starting character position in
the target file. Since we want to search from the
beginning for the temp label, we start at position 1.
• The 2nd "1" is the incident number of the pre-post
pattern sequence. Since we are looking for the first
occurence of the data following the temp-label we just
use "1" here, but if there were many comma separated data
values following the label, then we could increment this
occ parameter to get the 2nd, 3rd, 4th, etc data
elements.
• The "0" following the two ones, is the flag parameter
that determines how the search is performed. Since case
matching or line endings are not an issue in this
example, the default "0" value (no special handling) is
used.
• The "3" causes the location of the first character of
the snapped string
in the Target file to be saved in
LOCAL3. We will use this in the next function call to
make sure we start searching for new data AFTER the data
we just snapped out.
• The final "0) is the do-nothing default re: saving
the snapped string length, which we do not care about for
this example.

We now have snapped out the "-0.96" data element for the
Freezer, from the CSV file.

To get the next temperature in this line, we use this line of code . . .


This is simpler than the first snap effort, as we are
only looking for the data between commas now.

The orange boxed parameter defines a comma for the pre-
snap match string, and the green box the post-snap match
string. We will snap everything in between those
parameters from the contents of the LOCAL1 variable.

Since LOCAL3 was set to the initial character of the
previously snapped data, we know we are past prior commas
and that the next comma delimiter set we find will be
associated with our next desired data set.

NOTE: that looping on this expression but incrementing
the occ parameter (the "1" following the "LOCAL3"
parameter above), would let us parade right down the
comma delimited data string capturing each data element
in turn.

==============

POSTSCRIPT.

As posted by Dave Howard in another forum he notes that
snapping CSV data can be achieved in many ways . . .

There is always more than 1 way to skin the cat .
Looking at what was trying to be accomplished, I
immediately thought of the ph_regexsnap1 function. My
version looks like:
Code:
ph_regexsnap1(",\([^,*\),","\1","[LOCAL1]",1,29,0,0,0)


Much simpler if I say so myself. This code searches for
the a comma, snaps the block of text that does not
contain a comma, followed by a comma. The 29 in the
formula says to snap the 29th occurrence of this
particular pattern...no length problems of any sort using
this logic. Alas, there is a bug in the ph_regexsnap1
function where the occurrence value seems to be skipping
(Im working on fixing this now). So in theory, this would
be great...it just doesn't work until I correct the bug.

But...we also have access to the ph_regexdiff1 function.
My code using this function looks like this:
Code:
ph_regexdiff1(",",",","[LOCAL1]",1,29,0,0,0)


Wow..even simpler. This also has the added benefit of
actually working . This just uses two search strings and
returns the data in between them and also returns the
29th occurrence.

But...hold on, were not through yet. PowerHome provides a
function designed to work with comma (or tab, or etc.)
separated values. The ph_parseline function. The
equivlant code (and it also works) looks like:
Code:
ph_parseline("[LOCAL1]",",",29)


Tailor made for just this type of data

Edited by GadgetGuy - November 22 2010 at 16:26


__________________
Ken B - Live every day like it's your last. Eventually, you'll get it right!
Back to Top View GadgetGuy's Profile Search for other posts by GadgetGuy
 
BeachBum
Super User
Super User
Avatar

Joined: April 11 2007
Location: United States
Online Status: Offline
Posts: 1880
Posted: November 21 2010 at 13:39 | IP Logged Quote BeachBum

You’ve got way too much time… Outstanding job documenting PH string functions. Keep up the good work. Maybe we’ll get a detail users guide yet.

__________________
Pete - X10 Oldie
Back to Top View BeachBum's Profile Search for other posts by BeachBum
 
GadgetGuy
Super User
Super User
Avatar

Joined: June 01 2008
Location: United States
Online Status: Offline
Posts: 942
Posted: November 21 2010 at 13:53 | IP Logged Quote GadgetGuy

You are probably right Pete (about too much time), but I so appreciate everything that PH has helped me accomplish after years of looking for a decent solution, that it is a pleasure to "give back" something.

If I can help new users get upto speed faster with some examples, then I will feel I have helped. The PH Help file is a good start, but without any examples, I was never sure when to put quotes around anything, or use direct values, etc.

I always wished for examples, so I'm trying to do that now.

__________________
Ken B - Live every day like it's your last. Eventually, you'll get it right!
Back to Top View GadgetGuy's Profile Search for other posts by GadgetGuy
 
MrGibbage
Super User
Super User
Avatar

Joined: October 23 2006
Location: United States
Online Status: Offline
Posts: 513
Posted: November 21 2010 at 15:47 | IP Logged Quote MrGibbage

In my opinion, posts like this would be perfect for the powerhome wiki. Right now, the wiki is
looking a little bare. I've got a few ideas for pages there, so I will try and get them posted
there before the end of the year.

Eventually, the wiki *would* be the user's guide we have always hoped for.

http://www.power-home.info/wiki/index.php?title=Main_Page


__________________
Skip
Back to Top View MrGibbage's Profile Search for other posts by MrGibbage
 
GadgetGuy
Super User
Super User
Avatar

Joined: June 01 2008
Location: United States
Online Status: Offline
Posts: 942
Posted: November 21 2010 at 16:37 | IP Logged Quote GadgetGuy

You keep up the good work on a wiki and I'll keep writting stuff to put there (it's hard to keep an old technical writer down )

PS:

Skip - I have been a PH user and Forum contributor for years now and I DID NOT KNOW ABOUT THE WIKI!!!!!

I just went there and was very impressed. What a great start.

Somehow, I think it is important to get the word out about this. Perhaps a prominent announcement on the initial Forum Home page, and certainly (David) right at the front of the Help guide in PH!

Good job.

Edited by GadgetGuy - November 21 2010 at 17:21


__________________
Ken B - Live every day like it's your last. Eventually, you'll get it right!
Back to Top View GadgetGuy's Profile Search for other posts by GadgetGuy
 
BeachBum
Super User
Super User
Avatar

Joined: April 11 2007
Location: United States
Online Status: Offline
Posts: 1880
Posted: November 21 2010 at 17:39 | IP Logged Quote BeachBum

The Wiki started out good but outside people started to post junk there so Dave kind of shut it down…. But you’re absolutely right. Ken if you want to step up to the platter as a tech writer maybe some of us old tech writers might be able to assist you. Some people have wordsmithing skills and you are definitely one who does. I for one don’t but I have experience in how to proof those who do. Dave would love to have a bunch of volunteers who would contribute to the cause.

__________________
Pete - X10 Oldie
Back to Top View BeachBum's Profile Search for other posts by BeachBum
 
GadgetGuy
Super User
Super User
Avatar

Joined: June 01 2008
Location: United States
Online Status: Offline
Posts: 942
Posted: November 21 2010 at 17:58 | IP Logged Quote GadgetGuy

Pete - I offered to help David a year ago and have sent several "prods" since, but the progression and improvement efforts it has taken to bring a next generation PH to us all have taken the toll on his time windows, I suspect.

Over a year ago I wrote him a nice "reference" to post on the web site under the References tab, and reminded him once since, but the tab is still empty with no references!

I have offered to "sanitize" my entire collection of PH efforts (database, macros, Windows Scripts, web site design, etc) for him to include with the next release of PH in place of the hugely out of date sample.db file now provided but haven't gotten a solid nibble yet.

I have been nibbling around the edges with posts in this Repository forum to try to help new users get started with some of the kinds of information I wished I had had.

I'd be glad to continue to pitch in

__________________
Ken B - Live every day like it's your last. Eventually, you'll get it right!
Back to Top View GadgetGuy's Profile Search for other posts by GadgetGuy
 
BeachBum
Super User
Super User
Avatar

Joined: April 11 2007
Location: United States
Online Status: Offline
Posts: 1880
Posted: November 21 2010 at 19:07 | IP Logged Quote BeachBum

I suspect Dave is in a tough position but it’s his baby and his to call. Sometimes a hobby or passion becomes a job and then it becomes a little harder to drive.

__________________
Pete - X10 Oldie
Back to Top View BeachBum's Profile Search for other posts by BeachBum
 
GadgetGuy
Super User
Super User
Avatar

Joined: June 01 2008
Location: United States
Online Status: Offline
Posts: 942
Posted: November 22 2010 at 06:36 | IP Logged Quote GadgetGuy

Right on Pete. Been there, done that! It is a tough position.

Hey! I was thinking about the wiki "pollution" issue from unwanted junk posts.

A moderator who reviewed submitted articles before they are posted one one to control things, but what about limiting one's ability to post to a wiki if, and only if, the poster is at least a Senior Member (or maybe one lower), of the Forum?

That should only allow responsible contributions to be made, with the minimum ammount of administrative hassle.

__________________
Ken B - Live every day like it's your last. Eventually, you'll get it right!
Back to Top View GadgetGuy's Profile Search for other posts by GadgetGuy
 

If you wish to post a reply to this topic you must first login
If you are not already registered you must first register

  Post ReplyPost New Topic
Printable version Printable version

Forum Jump
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot delete your posts in this forum
You cannot edit your posts in this forum
You cannot create polls in this forum
You cannot vote in polls in this forum