
Parsing URL in PHP - only isolate specific section of "path"



this_end_up
01-06-2010, 08:36 PM
Hello,

I have been doing some reading to figure out how to parse a URL the way I need, but I'm stuck. Maybe you can help.

I have a URL that is created using Wordpress that looks like this:

http://www.domain-name.com/wpblog/index.php/tag/my-tag/

I have used the following code to parse the above URL:



<?php
$url = 'http://www.domain-name.com/wpblog/index.php/tag/my-tag';

print_r(parse_url($url));

echo parse_url($url, PHP_URL_PATH);
?>


and it returns the following:



Array
(
    [scheme] => http
    [host] => www.domain-name.com
    [path] => /wpblog/index.php/tag/my-tag
)
/wpblog/index.php/tag/my-tag


What I really want to do is to isolate the last section of the path ("my-tag") and place it into a sentence, such as "You have searched for: my-tag"

Can anybody help? Can the path be split into a deeper array, or is there a bit of code that will capture the text after the last "/"?

Also, is there a way to replace the "-" with a space?

Thanks.

ninnypants
01-06-2010, 08:53 PM
You could try something like this:


<?php
$url = 'http://www.domain-name.com/wpblog/index.php/tag/my-tag';

$path = parse_url($url, PHP_URL_PATH);
// split the path (trim slashes first so a trailing "/" doesn't leave an empty last item)
$parts = explode('/', trim($path, '/'));
//get the last item
$tag = end($parts);
// replace the dash with a space
$tag = str_replace('-', ' ', $tag);
echo $tag;
?>
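
To drop the result into the sentence from the original question, something along these lines might work. It's only a sketch: the function name tag_sentence is made up for illustration, it assumes the tag is always the last path segment, and it escapes the tag with htmlspecialchars() because the value comes straight from the URL.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or WordPress.
function tag_sentence($url)
{
    // parse_url() can return null or false here, so cast to string first.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Trim slashes so ".../my-tag/" and ".../my-tag" behave the same.
    $parts = explode('/', trim($path, '/'));

    // Last segment, with dashes turned into spaces.
    $tag = str_replace('-', ' ', end($parts));

    // Escape before output, since the tag is user-controlled.
    return 'You have searched for: ' . htmlspecialchars($tag);
}

echo tag_sentence('http://www.domain-name.com/wpblog/index.php/tag/my-tag/');
?>
```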

this_end_up
01-06-2010, 09:12 PM
Thanks for your help...but I get the following error:

Warning: parse_url() expects exactly 1 parameter, 2 given in /home/kowski/public_html/testurl.php on line 4

Not sure what the issue is...the line in question is the following:



$path = parse_url($url, PHP_URL_PATH);


Seems it doesn't like the "$url" and the "PHP_URL_PATH" params in the same function...

Michael

Fou-Lu
01-06-2010, 09:46 PM
Your PHP version is older than 5.1.2, which is when the second (component) parameter was added.
You can achieve the same effect by calling parse_url with just the URL and pulling the 'path' key out of the array it returns:


$aPath = parse_url($url);
$sPath = $aPath['path'];

Then follow the code above, using $sPath as your parameter to explode (or just rename it to $path).
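
Put together, a version that doesn't need the second parameter could look roughly like this. Treat it as a sketch: url_path is a made-up helper name, and it returns an empty string when the URL has no path or can't be parsed at all.

```php
<?php
// Hypothetical helper for illustration: works without the component argument.
function url_path($url)
{
    $aPath = parse_url($url);

    // parse_url() returns false on seriously malformed URLs,
    // and the 'path' key is absent when the URL has no path.
    if (!is_array($aPath) || !isset($aPath['path'])) {
        return '';
    }
    return $aPath['path'];
}

$url = 'http://www.domain-name.com/wpblog/index.php/tag/my-tag';

$parts = explode('/', trim(url_path($url), '/'));
$tag = str_replace('-', ' ', end($parts));
echo $tag;
?>
```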

rsjpx
04-27-2012, 12:45 AM
Hi,

I'm trying to pull just the path.

I'm using parse_url ($someUrlString , PHP_URL_PATH)

It works fine when you punch in the full URL "http://www.yahoo.com/pathNameHere/lalala"

but when a user provides only www.yahoo.com/pathNameHere/lalala, without the "http://" portion, the path doesn't show up as isolated from the domain. I still get the full www.yahoo.com/pathNameHere/lalala.

Is there a way to have the path isolated even if www.yahoo.com/pathNameHere/lalala is entered rather than have http://www.yahoo.com/pathNameHere/lalala?

Please advise on how to tackle this. Any help would be greatly appreciated :thumbsup:.

Thanks,
rsjpx

Fou-Lu
04-27-2012, 01:06 AM
No, you need to do it manually as parse_url doesn't know how to split up a path without a scheme.
Use pattern matching or cut it up with string manipulation to determine the parts.
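
One simple form of string manipulation is to check for a scheme and prepend one when it is missing, then hand the result to parse_url as usual. This is only a sketch under assumptions: path_from_loose_url is a made-up name, "http://" is an arbitrary default, and input such as "//host/path" or "mailto:..." is not handled.

```php
<?php
// Hypothetical helper for illustration only.
function path_from_loose_url($input)
{
    // No "scheme://" at the start? Assume http so parse_url can find the host.
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $input)) {
        $input = 'http://' . $input;
    }

    $path = parse_url($input, PHP_URL_PATH);

    // null means "no path", false means "could not parse".
    return is_string($path) ? $path : '';
}

echo path_from_loose_url('www.yahoo.com/pathNameHere/lalala');
?>
```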

rsjpx
04-27-2012, 06:53 PM

Thanks Fou-Lu :thumbsup: You're right. The parse_url() function does not isolate everything for you. I managed to come up with a regular expression that matches the scheme (if present) together with the host (domain and any subdomains), so that removing the match with preg_replace() leaves the path by itself.

In case anyone else is struggling to separate the scheme and host from the rest of the URL, this is the regular expression to punch into preg_replace() or preg_match().

((^(http|https):\/\/{1}([.a-zA-Z0-9_-])+/)|(^(www|[.a-zA-Z0-9_-]+){1}\.[.a-zA-Z0-9_-]+/){1})

Fou-Lu
04-27-2012, 07:24 PM
You can simplify it a bit as well. You can group http together and check whether the s exists with s?, and you can just use [a-z0-9] if you set the case-insensitive flag.
The {1} is also not required, as it is implicit. If an item doesn't have a quantifier such as ? (0 or 1), * (0 or more) or + (1 or more), then it has to match exactly once.
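
As a quick illustration of those two simplifications, here is a sketch of a scheme check that uses s? and the i flag instead of (http|https) and {1}. The helper name has_http_scheme is made up for the example.

```php
<?php
// Hypothetical helper for illustration only.
function has_http_scheme($url)
{
    // "s?" makes the "s" optional; the "i" flag makes the match case-insensitive.
    // preg_match() returns 1 on a match, 0 on no match.
    return preg_match('#^https?://#i', $url) === 1;
}

var_dump(has_http_scheme('http://www.yahoo.com/pathNameHere/lalala'));
var_dump(has_http_scheme('HTTPS://www.yahoo.com/'));
var_dump(has_http_scheme('www.yahoo.com/pathNameHere/lalala'));
?>
```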
A slightly better pattern that should present similar to parse_url would be:


#^(http(?:s?)://)?((?:(?:www(.*)\.)?[a-z0-9_\-.]+)\.(?:[a-z0-9_\-./]+)+)(/(?:.*)?)?(?:(?:\?)(.*))?(?:\#(.*)?)?$#iU

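That should give you the full match, the scheme, the domain, the path, the query string, and the fragment in separate offsets. Note that the (.*) after www is also a capturing group and takes offset 3, so the path, query string, and fragment land in offsets 4 through 6 rather than 3 through 5. I haven't tested it much, but it looks like it does the job. It also doesn't really obey the rules of DNS naming, but that's a whole other mess.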


