RegExp on similar search strings - finds most, fails one.. why?

09-13-2005, 11:10 AM
I'm searching the source HTML of pages for particular values. I get the value by first finding the TD containing the value's label, then pattern-matching the code immediately after it:

dim sRegExp_TDInteger, sRegExp_TDWords, sRegExp_TDDecimal
dim sRegExp_UniqueVisitors, sRegExp_VisitorSessions, sRegExp_AvgVisitorsPerHour, sRegExp_MostPopSearchTerms
sRegExp_TDInteger = ">\d+<"
sRegExp_TDWords = ">\w+<"
sRegExp_TDDecimal = ">\d+\.\d+<" 'escape the dot so it matches only a literal decimal point
sRegExp_UniqueVisitors = "<td class=""summ1"">Unique Visitors</td>"
sRegExp_VisitorSessions = "<td class=""summ1"">Visitor Sessions</td>"
sRegExp_AvgVisitorsPerHour = "<td class=""summ1"">Average Visitors Per Hour</td>"
sRegExp_MostPopSearchTerms = "<td class=""summ1"">Most Popular Search Term(s)</td>"

function getTableCellValue(sTitleCellToMatch, sValueCellFormatToMatch, sSourceHTML)
    'assume we're dealing with two-column table
    'find the cell containing the title/label text that we want
    'get a 100-char string starting where the title cell starts
    'find, in that string, a value between HTML tags that matches the required format
    Dim rv : rv = ""
    Dim RE, oLabelMatches, oLabelMatch, oValueMatches, oValueMatch
    Dim intLabelStartPos, sChunk
    Set RE = New RegExp
    RE.Pattern = sTitleCellToMatch
    RE.IgnoreCase = True
    RE.Global = True
    Set oLabelMatches = RE.Execute(sSourceHTML)
    If oLabelMatches.Count > 0 Then
        Set oLabelMatch = oLabelMatches(0)
        Response.Write(oLabelMatch.Value & "<br/>")
        'FirstIndex is zero-based but Mid is one-based, so add 1
        'CLng rather than CInt, so pages longer than 32,767 chars don't overflow
        intLabelStartPos = CLng(oLabelMatch.FirstIndex) + 1
        sChunk = Mid(sSourceHTML, intLabelStartPos, 100)
        RE.Pattern = sValueCellFormatToMatch
        Set oValueMatches = RE.Execute(sChunk)
        If oValueMatches.Count > 0 Then
            Set oValueMatch = oValueMatches(0)
            'strip the surrounding tag delimiters, leaving just the value
            rv = Replace(Replace(oValueMatch.Value, "<", ""), ">", "")
        End If
    End If
    Set RE = Nothing
    Set oValueMatch = Nothing
    Set oValueMatches = Nothing
    Set oLabelMatch = Nothing
    Set oLabelMatches = Nothing
    getTableCellValue = rv
end function

strUniqueVisitors = getTableCellValue(sRegExp_UniqueVisitors, sRegExp_TDInteger, sSummarySource)
strVisitorSessions = getTableCellValue(sRegExp_VisitorSessions, sRegExp_TDInteger, sSummarySource)
strAvgVisitorsPerHour = getTableCellValue(sRegExp_AvgVisitorsPerHour, sRegExp_TDDecimal, sSummarySource)
strMostPopSearchTerms = getTableCellValue(sRegExp_MostPopSearchTerms, sRegExp_TDWords, sSummarySource)

I know my function's a little bit... Mickey Mouse... but it works on all of the label-finding patterns above, except the one where it tries to find the string "<td class=""summ1"">Most Popular Search Term(s)</td>". It can't find it, and I don't know why. The string is definitely in the source code; it's c&p'd directly (with the double-quoting added, obviously).

Any ideas? :confused:

09-13-2005, 03:12 PM
My knowledge of REs is very "noobish", but at first glance I would say that you need to escape the ()s.
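
To illustrate the point, here is a minimal sketch of why the unescaped pattern misses. In a pattern, (s) is a group that matches the single character "s", so "Term(s)" as a pattern matches the text "Terms", not the literal text "Term(s)". Putting a backslash before each parenthesis makes it literal. The variable names below (sHTML, sUnescaped, sEscaped, bUnescaped, bEscaped) are made up for this example and are not from the original code.

```vbscript
Dim oRE, sHTML, sUnescaped, sEscaped, bUnescaped, bEscaped

' The text as it appears in the page source
sHTML = "<td class=""summ1"">Most Popular Search Term(s)</td>"

' Same text used directly as a pattern: (s) is treated as a group
sUnescaped = "<td class=""summ1"">Most Popular Search Term(s)</td>"

' Parentheses escaped so they match literally
sEscaped = "<td class=""summ1"">Most Popular Search Term\(s\)</td>"

Set oRE = New RegExp
oRE.IgnoreCase = True

oRE.Pattern = sUnescaped
bUnescaped = oRE.Test(sHTML)   ' False: the pattern wants "Terms</td>"

oRE.Pattern = sEscaped
bEscaped = oRE.Test(sHTML)     ' True: \( and \) match the literal characters

Set oRE = Nothing
```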

09-14-2005, 09:34 AM
And you're right :cool: Not as noobish as mine, then :thumbsup:
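
Since the labels are pasted straight from the page source, one way to avoid this class of problem is to escape them in code before using them as patterns. The sketch below assumes a helper named EscapeRegExp, which is hypothetical (VBScript has no built-in equivalent), and assumes a script engine recent enough (5.5 or later) to support $1 in RegExp.Replace.

```vbscript
' Hypothetical helper: put a backslash before every regex metacharacter
' so the text can be used as a literal pattern.
Function EscapeRegExp(sText)
    Dim oEsc
    Set oEsc = New RegExp
    oEsc.Global = True
    ' character class listing the metacharacters \ ^ $ . | ? * + ( ) [ ] { }
    oEsc.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    ' $1 is the matched character; the leading backslash is inserted as-is
    EscapeRegExp = oEsc.Replace(sText, "\$1")
    Set oEsc = Nothing
End Function

Dim oRE, sLabel, sPattern, bFound
sLabel = "<td class=""summ1"">Most Popular Search Term(s)</td>"
sPattern = EscapeRegExp(sLabel)   ' ...Search Term\(s\)</td>

Set oRE = New RegExp
oRE.IgnoreCase = True
oRE.Pattern = sPattern
bFound = oRE.Test(sLabel)         ' True
Set oRE = Nothing
```

If the label is always plain text with no wildcards, InStr(1, sSourceHTML, sLabel, vbTextCompare) would also find it without involving a pattern at all.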

09-14-2005, 01:55 PM
Well, I've been trying to educate myself on REs because they are quite powerful. I mainly use them in client-side JavaScript for field validation.