parsing!
ShMiL
11-16-2002, 08:59 AM
I have static HTML files in which there are details of members...
I want to build a script which will parse the info (eg: name,email,address) from each HTML file.
My only problem is that I don't know how to extract the details.
For example, I have:
<div id="1">Shmil</div> which always stands for username.
What functions should I use to extract the info between <div id="1"> and </div>?
Thanks!
neocyb
11-16-2002, 09:49 AM
To solve one of your two problems (removing the tags, and getting the info between them):
To remove the tags, use this routine:
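' strBuffer holds the HTML of the page; each tag listed below is stripped from it.
' Note: Replace is case-sensitive by default, so "<UL>" would not be removed.
MyArray = Array("<ul>", "<li>", "<hr>", "</li>", "</ul>", "<div id=""1"">", "</div>")
For Each item In MyArray
    strBuffer = Replace(strBuffer, item, "")
Next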
Now, to get the info you need between the tags, I suggest you use an If ... Then statement between those lines to extract the data.
Greetz,
NeoCyb
PS : Remember to Dim your variables... I forgot :o
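A minimal sketch of that extraction step, using VBScript's built-in InStr and Mid functions: find the opening tag, skip past it, find the closing tag, and take what lies in between, with If ... Then guards in case a tag is missing. The sample strBuffer value is hypothetical.

```vbscript
' Minimal sketch: pull the text between <div id="1"> and </div> with InStr and Mid.
' strBuffer stands in for the HTML of one member page (hypothetical sample).
Dim strBuffer, startTag, endTag, startPos, endPos, userName
strBuffer = "<html><body><div id=""1"">Shmil</div></body></html>"
startTag = "<div id=""1"">"
endTag = "</div>"
userName = ""

' InStr returns 0 when the tag is not found; vbTextCompare ignores case
startPos = InStr(1, strBuffer, startTag, vbTextCompare)
If startPos > 0 Then
    startPos = startPos + Len(startTag)   ' first character after the opening tag
    endPos = InStr(startPos, strBuffer, endTag, vbTextCompare)
    If endPos > 0 Then
        ' Mid(string, start, length) copies the text between the two tags
        userName = Mid(strBuffer, startPos, endPos - startPos)
    End If
End If
' userName now holds "Shmil"
```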
ShMiL
11-16-2002, 02:04 PM
I don't want to remove the HTML tags...
I want to put the details gathered from the page into an array, and then put them into a DB...
My only problem is extracting the data between the tags, and you can't do that with a simple 'if'... I need to use some string functions for it, but I don't know which ones or how.
Thanks.
neocyb
11-16-2002, 02:11 PM
OK, here's a function you can use for your problem. It currently only works for <TD> tags, but you can change it to work for the information you need:
Function GetCell(cellnumber, tempstr)
    Dim upperStr, i, q, r, startCellText, tablePos, endPos
    upperStr = UCase(tempstr) ' upper-cased copy so the tag search ignores case
    i = 1 ' Text location start
    q = 1 ' Cell number start
    GetCell = "" ' returned if the requested cell does not exist
    ' Loop until we have processed the cell we're looking for
    Do Until q > cellnumber
        ' Look for <TD, the start of a cell
        i = InStr(i, upperStr, "<TD")
        If i = 0 Then Exit Function ' fewer cells than requested
        ' Find the location of the end of the <TD tag
        r = InStr(i, tempstr, ">")
        If r = 0 Then Exit Function ' unterminated <TD tag
        ' Let the next loop start looking after this <TD tag we found
        i = r + 1
        ' Increase the count of which cell we're at
        q = q + 1
    Loop
    ' The start of our cell text is right after the last found tag
    startCellText = i
    ' Now, to find the end of this cell's text, we look for either <TABLE
    ' or </TD>, whichever comes first (checking whether each one exists).
    ' We don't include nested tables in the cell data because those tables
    ' have cells of their own.
    tablePos = InStr(startCellText, upperStr, "<TABLE")
    endPos = InStr(startCellText, upperStr, "</TD>")
    If tablePos > 0 And (endPos = 0 Or tablePos < endPos) Then endPos = tablePos
    ' No closing tag at all: take the rest of the string
    If endPos = 0 Then endPos = Len(tempstr) + 1
    GetCell = Mid(tempstr, startCellText, endPos - startCellText)
End Function
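Use it like this: GetCell(1, yourstr)
The signature is GetCell(cellnumber, string).
You could change this function so that it takes the tags as parameters, i.e. GetCell(string, beginningtag, endtag), and call it like so (note that VBScript needs the quotes inside a string literal doubled):
name = GetCell(yourstr, "<div id=""1"">", "</div>")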
Hope this helps you on your way...
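A sketch of the three-parameter variant described above, assuming plain VBScript. The name GetBetween is hypothetical (chosen so it does not clash with GetCell), and it only returns the first match.

```vbscript
' Hypothetical three-parameter variant of GetCell (string, beginningtag, endtag):
' returns the text between the first beginTag and the next endTag, or "" if either is missing.
Function GetBetween(tempstr, beginTag, endTag)
    Dim startPos, endPos
    GetBetween = ""
    ' Case-insensitive search so <DIV ...> and <div ...> both match
    startPos = InStr(1, tempstr, beginTag, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(beginTag)   ' skip past the opening tag
    endPos = InStr(startPos, tempstr, endTag, vbTextCompare)
    If endPos = 0 Then Exit Function
    GetBetween = Mid(tempstr, startPos, endPos - startPos)
End Function

' Usage with a hypothetical member page; note the doubled quotes in the literals
Dim page, memberName
page = "<div id=""1"">Shmil</div><div id=""2"">shmil@example.com</div>"
memberName = GetBetween(page, "<div id=""1"">", "</div>")   ' "Shmil"
```

Since it simply stops at the first end tag, a nested tag of the same kind (a <div> inside the <div>) would cut the result short.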
ShMiL
11-16-2002, 02:58 PM
It does help!
Very close to what I was looking for.
THANKS A LOT! :D
ShMiL
11-24-2002, 04:07 PM
How can I change neocyb's second function so that it takes these parameters:
GetCell(number,my_str,str_begin,str_end)
number --> the number of appearance
my_str --> the string to parse
str_begin --> the string to search for (beginning) eg: "<div class=shmil>"
str_end --> the string to search for (end) eg: "</div>"
Thanks in advance.
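One possible sketch of the requested function, assuming VBScript. GetNthBetween is a hypothetical name (a second GetCell would clash with neocyb's), and the parameters follow the list above: it skips forward past `number` occurrences of str_begin, then returns the text up to the next str_end.

```vbscript
' Hypothetical helper: returns the text between the number-th occurrence of str_begin
' and the next str_end after it, or "" if there are not that many occurrences.
Function GetNthBetween(number, my_str, str_begin, str_end)
    Dim pos, found, endPos
    GetNthBetween = ""
    If number < 1 Then Exit Function
    pos = 1
    found = 0
    ' Walk forward through my_str, one str_begin occurrence at a time
    Do While found < number
        pos = InStr(pos, my_str, str_begin, vbTextCompare)
        If pos = 0 Then Exit Function          ' fewer occurrences than requested
        found = found + 1
        pos = pos + Len(str_begin)             ' continue searching after this opening tag
    Loop
    ' pos now points at the first character after the number-th str_begin
    endPos = InStr(pos, my_str, str_end, vbTextCompare)
    If endPos = 0 Then Exit Function           ' no closing string after it
    GetNthBetween = Mid(my_str, pos, endPos - pos)
End Function

' Usage with a hypothetical member page
Dim html, email
html = "<div class=shmil>Shmil</div><div class=shmil>shmil@example.com</div><div class=shmil>12 Main St</div>"
email = GetNthBetween(2, html, "<div class=shmil>", "</div>")   ' "shmil@example.com"
```

Calling it with number = 1, 2, 3 on the same page fills the name, email and address fields in turn, which can then go into an array or a DB insert.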
ShMiL
11-25-2002, 10:25 AM
anyone?!?!?
glenngv
11-25-2002, 10:54 AM
Why don't you just use XML? It's easy to parse XML using ASP.
ShMiL
11-25-2002, 11:35 AM
The pages I need to extract info from are HTML, not XML!
ShMiL
11-26-2002, 02:59 PM
anyone?
Does anyone have a function that takes the occurrence number, the string, the begin string and the end string, and returns the text between the two substrings?
PLEASE:( :( :( :( :(
whammy
11-27-2002, 01:03 AM
Since this is really application-specific, I really only have one suggestion (you might not like it... :(, but actually it answers the question you asked "does anyone have functions which...") which is to study string manipulation in ASP very thoroughly!
I actually just recommended the same thing to Morgoth, since this is a MUST-know type thing. First of all I'd go here:
http://www.w3schools.com/asp
Then, I'd do a google search for "String manipulation in ASP" if you need to do further research.
Hope this helps... :)
ShMiL
11-27-2002, 04:04 AM
You are right whammy, I'm sorry.
It's just that I have a BIG exam coming on 10/02/03 and I can't find the time to get back to programming.
A week after that date, I promise to post the function here, written by myself...
Thanks.