
SRC URL extraction method - HTML or TXT to TXT...



whopub
11-30-2009, 05:57 AM
Hi,

I'm looking for an app, or online form, to extract image URLs from HTML code saved in TXT files. To be more exact, the URLs should be taken from the SRC attribute of <IMG> tags.

I have several code snippets like this:


<img src="http://dummy.site.com/here/images/09/10065/file01.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file02.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file03.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file04.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file05.jpg" width="64" height="100" alt="image title" />
image name
<img src="http://dummy.site.com/here/images/09/10065/file06.jpg" width="64" height="100" alt="image title" />
image name


And I need an automated way to extract just the URLs, and save them on a TXT file like this:


http://dummy.site.com/here/images/09/10065/file01.jpg
http://dummy.site.com/here/images/09/10065/file02.jpg
http://dummy.site.com/here/images/09/10065/file03.jpg
http://dummy.site.com/here/images/09/10065/file04.jpg
http://dummy.site.com/here/images/09/10065/file05.jpg
http://dummy.site.com/here/images/09/10065/file06.jpg

One URL per line.

The code snippets are not too big, just a bit over 100 entries for the bigger ones. I don't care if I have to do it one TXT at a time. Beats doing the whole thing by hand.

This is the sort of thing that makes me mad for not being a programmer! Any one of you guys could probably come up with a number of ways to pull this off in just a couple of minutes.

And I'm quite sure the tools to pull it off are already out there, but trying a search for it... well, let's just say there's way too much out there, and installing small random apps is really not safe.

I may be completely wrong, but I think I was able to feed code like this to flashget, and it would just go through the whole thing and list the actual URLs it found in a confirmation box, allowing me to select just a few and copy them to the clipboard, in the exact same one-URL-per-line format I need here. But somehow my flashget installation got screwed up and now I can't figure out what version I was using. I've already tested 4 different ones and none of them seems to be able to do that.

I need those URLs in that format so I can then batch replace URL segments and, finally, feed the updated URLs to flashget. But the first step is extracting the initial URL from that code.

So, any ideas?


Thanks.


PS: hope I'm not screwing up by posting this here, but I really couldn't find a better match... And it IS HTML related, I guess.

Jack Corzine
11-30-2009, 12:13 PM
Why not open it up in a text editor and use the search and replace utility? Just search for <img src=" and replace it with nothing, then search for the ending string and do the same thing.

Jack
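
For what it's worth, the search-and-replace idea can also be done in one pass with a regular expression, which does not need a fixed ending string. Below is a minimal sketch in Python (my choice of language, nothing in the thread asks for it). The names IMG_LINE, strip_to_urls and sample are made up for illustration, and the sketch assumes each <img> tag sits on a single line with src as its first attribute, as in the sample from the first post.

```python
import re

# Matches a whole line containing <img src="...">; group 1 is the URL.
# Assumption: src is the first attribute and its value is double-quoted.
IMG_LINE = re.compile(r'^.*?<img\s+src="([^"]*)".*$', re.IGNORECASE)


def strip_to_urls(text):
    """Replace each <img> line with its src value and drop all other lines."""
    kept = []
    for line in text.splitlines():
        replaced, count = IMG_LINE.subn(r"\1", line)
        if count:  # only keep lines where the pattern actually matched
            kept.append(replaced)
    return "\n".join(kept)


sample = (
    '<img src="http://dummy.site.com/here/images/09/10065/file01.jpg" '
    'width="64" height="100" alt="image title" />\n'
    "image name\n"
    '<img src="http://dummy.site.com/here/images/09/10065/file02.jpg" '
    'width="64" height="100" alt="another title" />\n'
    "image name\n"
)

print(strip_to_urls(sample))
```

Because the pattern stops at the first closing quote after src=", whatever follows (width, height, alt text) can differ on every line.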

Rowsdower!
11-30-2009, 12:48 PM
You could do this with javascript or PHP if you have a web host that supports PHP.

If you're looking for customized code to be built for you then this thread is probably most appropriately placed in the paid work forum. :D

If you want to learn to do it yourself and be guided then by all means make an effort of your own and we will help you sort out the issues you run into. The logic involved with this would be pretty simple.
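
To give an idea of how simple the logic is, here is a rough sketch. It uses Python rather than javascript or PHP, purely as an assumption for illustration, and relies only on the standard-library html.parser module. The names ImgSrcCollector, extract_img_urls and convert_file are hypothetical, as are any file names you pass in.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, so <IMG SRC=...> also works.
        # Self-closing tags like <img ... /> are routed here as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def extract_img_urls(html_text):
    """Return the image URLs found in html_text, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


def convert_file(in_path, out_path):
    """Read HTML from in_path and write one URL per line to out_path."""
    with open(in_path, encoding="utf-8") as src:
        urls = extract_img_urls(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
    return len(urls)
```

Calling something like convert_file("snippets.txt", "urls.txt") (both names are placeholders) would process one TXT file at a time and return how many URLs it wrote.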

whopub
11-30-2009, 02:29 PM
Hey guys.


why not open it up in a text editor and use the search and replace utility? Just put in search for <img src=" and replace it with an empty space, then search for the ending string and do the same thing

That would be my default approach, but the end string is always different because of the ALT attribute and its text, which are always different!


You could do this with javascript or PHP if you have a web host that supports PHP.

My webhost supports PHP, and I'll take any solution, it's just that I'm sure there are freely available ready-made solutions out there. From website 'suckers' to download managers, or even HTML tag strippers that can be customized. But, for sure, there's gotta be something that just goes through random text and collects just the HTML links. I just need to be sure about one, so I don't end up installing 10 or more before I get the right one.


If you're looking for customized code to be built for you then this thread is probably most appropriately placed in the paid work forum. :D

I have a friend who can probably do it for free in just a few hours, it's just that the tools I need already exist. And he's a busy guy.


If you want to learn to do it yourself and be guided then by all means make an effort of your own and we will help you sort out the issues you run into. The logic involved with this would be pretty simple.

My coding skills are zero. I do HTML and CSS, that's about it. Calling that coding is almost like elevating paper airplane throwing to space exploration...

whopub
12-01-2009, 08:08 PM
Even an app that goes through text and just extracts all words starting with http will do (the " can easily be removed later).

But still, there must be apps out there to suck URLs out of text files. Anyone?
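
The "extract all words starting with http" idea is small enough to sketch directly. This is only an illustration in Python, which is an assumption on my part and not a tool named in the thread; URL_WORD and find_urls are made-up names. It treats whitespace, quotes and angle brackets as the end of a URL, so the surrounding " never makes it into the result.

```python
import re

# A "word" starting with http:// or https://, ending at whitespace,
# a quote character, or an angle bracket.
URL_WORD = re.compile(r'https?://[^\s"\'<>]+')


def find_urls(text):
    """Return every http(s) URL-like word in text, in order of appearance."""
    return URL_WORD.findall(text)


if __name__ == "__main__":
    demo = (
        '<img src="http://dummy.site.com/here/images/09/10065/file01.jpg" alt="image title" />\n'
        "image name\n"
    )
    # Prints one URL per line, the format asked for in the first post.
    print("\n".join(find_urls(demo)))
```

Note that this grabs any URL in the text, not just image sources, and punctuation glued to the end of a URL in running prose would be kept as part of it.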


