Understanding preg_match



ScottInTexas
02-25-2003, 04:11 PM
I am looking at a file written by others. This function is at the top and I would like to see if I am understanding it right. preg_match is the most confusing part of this.



foreach ($HTTP_GET_VARS as $key=>$value){
    if (preg_match("/^\<script/", $value)){
        $HTTP_GET_VARS[$key] = NULL;
        $$key = NULL;
    }
}


In previewing this, the preg_match does not show the same as it is written. It is written like this: "preg_match("/^\<script/", $value)".

I understand foreach, but since this file is called by another and is not passed anything then $HTTP_GET_VARS should be empty. I am assuming that $key and $value are variables but there is no declaration for the variables.

For the preg_match string I guess the first character after the " is an escape character? So what is the ^ for? Then the next character is an escape for the open < for an end script tag?

Finally if there is a value in the $HTTP_GET_VARS array then set it NULL. But why two $$?

Can someone shed some light on this?

mordred
02-25-2003, 04:36 PM
Originally posted by ScottInTexas
In previewing this, the preg_match does not show the same as it is written. It is written like this: "preg_match("/^\<script/", $value)".


Yeah, this board tends to eat backslashes in highlighted PHP-Code. Can be quite a PITA if you don't think about it while posting.



I understand foreach, but since this file is called by another and is not passed anything then $HTTP_GET_VARS should be empty.


Assuming the file is included/required by another one, it would still be able to access the GET parameters contained in $HTTP_GET_VARS.



I am assuming that $key and $value are variables but there is no declaration for the variables.


PHP does not require variables to be declared. The foreach loop assigns $key and $value anew on each iteration. Are you still sure you understood foreach completely? ;)



For the preg_match string I guess the first character after the " is an escape character? So what is the ^ for? Then the next character is an escape for the open < for an end script tag?


preg_match uses a so-called regular expression to match a certain pattern against a string. Regular expressions are a special language which lets you define very explicitly what a matching pattern should look like. In your example, the pattern translates to written English as "a match is found if the string starts with <script". The ^ anchors the match to the start of the string, and the backslash before < is an escape that is not actually needed here, since < has no special meaning. The slashes are called delimiters; they mark the start and the end of a regular expression. A delimiter can be any character that is not alphanumeric, a backslash, or whitespace.

If you want to know about the RegExp syntax, check out
http://www.php.net/manual/en/pcre.pattern.syntax.php
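
To make the anchor and the delimiters concrete, here is a small sketch. The sample strings are made up for illustration and are not from the original file; the pattern is the one from the question, written without the optional backslash.

```php
<?php
// Hypothetical sample values, not taken from the original script.
$samples = array(
    '<script>alert(1)</script>',   // starts with <script
    'hello <script>alert(1)',      // contains <script, but not at the start
    'plain text',
);

$results = array();
foreach ($samples as $sample) {
    // "/" opens and closes the pattern; "^" anchors the match to the start.
    // "<" needs no backslash, so "/^<script/" behaves like "/^\<script/".
    $results[] = preg_match('/^<script/', $sample);
}

// The same pattern with "#" as the delimiter instead of "/".
$withHash = preg_match('#^<script#', $samples[0]);

print_r($results);
```

preg_match returns 1 when the pattern matches and 0 when it does not, so only the first sample counts as a match here.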



Finally if there is a value in the $HTTP_GET_VARS array then set it NULL. But why two $$?


Not quite, only if the value matches the RegExp. Two $s indicate that a variable variable is used. I.e., if $key were "test", then $$key maps to $test. The content of $key is the name of the variable in $$key. Just substitute the last $...
It's a slightly confusing way to program, but it can be useful for some elegant coding solutions.
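
A minimal sketch of a variable variable, using made-up names ($test and its value are only for illustration):

```php
<?php
$test = 'original value';   // hypothetical variable, for illustration only
$key  = 'test';             // $key holds the *name* of another variable

$before = $$key;            // reads $test, so $before is 'original value'
$$key   = NULL;             // writes to $test, same as: $test = NULL;

var_dump($before, $test);
```

In the snippet from the question, $$key = NULL therefore clears the plain variable that carries the same name as the GET parameter.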

ScottInTexas
02-25-2003, 06:14 PM
Thanks for your answer. It does help, but I would like to get a little more clarification.

My understanding of foreach comes from VB where I use for each, for example, to iterate through the controls on a form or to iterate through forms in a project etc. However, I usually define the variable first, then I use the variable in foreach.



Dim ctrl as Control
For Each ctrl in ThisForm
Do something
Next


I was also aware of the fact that the commands within the if would only be run if the statement tested true but how can it test true if there is never anything in the array?

[Huh?] (custom tag)
So in the php example this says; for each variable in the array $HTTP_GET_VARS, identifiable through the variable $key that is equal to or greater than $value -- if $value has the string "<script" in it then set the array value to null and set the variable to Null.
[/HUH?] :confused:

I suppose this makes sense. I just can't see the purpose. This is the first line of the index.php in the example I am looking at. Index.php is the first document opened in the entire web site (if you don't count .htaccess). So what's the point? In case you just said to yourself "what's in .htaccess?" the answer is;

DirectoryIndex index.php

Now, when next I hear from you or someone else I will probably be able to use another custom tag [DUH?] which is reserved for when I finally get it.
Maybe we can get the board to create these where there is some color or something to reflect the meaning.

Íkii
02-25-2003, 09:34 PM
So in the php example this says; for each variable in the array $HTTP_GET_VARS, identifiable through the variable $key that is equal to or greater than $value -- if $value has the string "<script" in it then set the array value to null and set the variable to Null.

stop that way of thinking now - way tooooo confusing.

foreach ($HTTP_GET_VARS as $key=>$value){

so, for every get variable we set the variable $key equal to the variable name and the variable $value equal to the value - so if we had
.php?varname=varval
we would have
$key = 'varname';
$value = 'varval';
for every var=value pair in the get array.

If any of the values start with <script we
$HTTP_GET_VARS[$key] = NULL;
set the corresponding get array value to null and set
$$key = NULL;
which just sets the unarrayed variable of the same name to null ($varname = NULL; in our example)
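
Here is the same walk-through as a runnable sketch. $getVars is a stand-in for $HTTP_GET_VARS, filled by hand with assumed values so it can run outside a web request; the $$key line is left out to keep the example small.

```php
<?php
// Stand-in for $HTTP_GET_VARS as it might look for
// page.php?varname=varval&evil=<script>alert(1)</script>
$getVars = array(
    'varname' => 'varval',
    'evil'    => '<script>alert(1)</script>',
);

$seen = array();
foreach ($getVars as $key => $value) {
    $seen[] = "$key=$value";              // record each pair the loop visits
    if (preg_match('/^<script/', $value)) {
        $getVars[$key] = NULL;            // blank the matching array entry
    }
}

print_r($seen);
var_dump($getVars);
```

After the loop, 'varname' still holds 'varval' while 'evil' has been set to NULL.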

mordred
02-26-2003, 12:27 AM
Originally posted by ScottInTexas
However, I usually define the variable first, then I use the variable in foreach.


You don't need to do that with $HTTP_GET_VARS. It's a predefined variable, comparable to an environment variable, and automagically filled by PHP with all GET parameters that happen to reach the script.



I was also aware of the fact that the commands within the if would only be run if the statement tested true but how can it test true if there is never anything in the array?


If no GET parameters are attached to the URL, the array $HTTP_GET_VARS contains nothing and hence the foreach loop does not run. It never starts, but that's not problematic for the purpose of this script.
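
A tiny sketch of that case, with an empty array standing in for $HTTP_GET_VARS (the counter is only there to show that the body never runs):

```php
<?php
$getVars = array();   // stand-in for $HTTP_GET_VARS when the URL has no parameters

$iterations = 0;
foreach ($getVars as $key => $value) {
    $iterations++;    // never reached for an empty array
}

var_dump($iterations);
```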



[Huh?] (custom tag)
So in the php example this says; for each variable in the array $HTTP_GET_VARS, identifiable through the variable $key that is equal to or greater than $value -- if $value has the string "<script" in it then set the array value to null and set the variable to Null.
[/HUH?]

I suppose this makes sense. I just can't see the purpose. This is the first line of the index.php in the example I am looking at.


Nice idea with the custom tags... :)

From what I see, the purpose of this code snippet is to sanitize incoming variables. Because the value of a GET parameter can consist of malicious code, it's good to check it before you use it later on that page. If for instance all GET variables were printed out in this page, someone could inject a JavaScript by simply putting test=<script>while(true) alert('hacked')</script> in the URL.
It depends on what follows in that page, so I can't say for sure. But it's most certainly a security feature, and good practice dictates being paranoid concerning user-submitted values...
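
As a rough sketch of how far this check goes: because of the ^ anchor it only catches values whose very first characters are <script. The values below are hypothetical, and the htmlspecialchars call is a different, commonly used technique (escaping at output time) that the original snippet does not use; it is shown only for comparison.

```php
<?php
// Hypothetical GET values; neither comes from the original file.
$atStart  = '<script>while(true) alert("hacked")</script>';
$notFirst = ' <script>alert("hacked")</script>';   // note the leading space

$caughtAtStart  = preg_match('/^<script/', $atStart);
$caughtNotFirst = preg_match('/^<script/', $notFirst);  // "^" requires the very first character

// Escaping when printing turns the markup into harmless text wherever it sits.
$escaped = htmlspecialchars($notFirst, ENT_QUOTES, 'UTF-8');

var_dump($caughtAtStart, $caughtNotFirst);
echo $escaped, "\n";
```

The second value slips past the pattern, which is one reason to treat the check as a first line of defence rather than a complete one.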

hth

ScottInTexas
02-26-2003, 01:47 PM
Thanks to both of you for the answers.


I GOT IT! :)


The last (regarding security) makes a heck of a lot of sense. And yes, I am paranoid ever since my machine was attacked and I went through days of living hell to fix. I had let my guard down and a virus was allowed in with my email. Something called GWSPACES or some such thing. The first thing it did was attack my virus scan.

Anyway, thanks again.


