...

View Full Version : Sanitizing or validating input first.



LearningCoder
11-12-2012, 03:57 PM
Hi, when receiving user input from a form, should I be sanitizing or validating my user input first?

I would guess you should sanitize the input first to make sure there are no threats from what the entered (bad code etc) and then validate it to make sure the data is what you require (making sure they entered a valid email for instance, or numbers for a telephone).

I would like to know different peoples points of views!

Please let me know your thoughts.

Kind regards,

LC.

Fou-Lu
11-12-2012, 06:12 PM
No, you should validate first. Sanitizing is preformed to handle the datastorage level which is the last step. There's no point in approaching a datastorage level if the business rules don't pass the validation phase. If you require a number and you're given a string, that's an error so you send them back to the form.
Sanitizing with the exception of stripslashes if required (not necessary as of 5.4) is not necessary if you use SQL with prepared statements and would in fact corrupt the input instead.

felgall
11-12-2012, 06:41 PM
Sanitizing is not something you would need to do where you know that the data has just been validated. It is something you might do on data read from a database if you want to be certain that any tampering with the data in the database will not be able to do any harm when it is subsequently processed.

LearningCoder
11-12-2012, 07:35 PM
Thank you both for your replies.

What I want to do is make the data secure i.e escape any harmful characters.

The characters I'm looking to deal with are < > " ' \ /

I'm not sure if it is enough to just deal with them or are there others? I've read so much on the functions to use before inserting data and people have posted so many different methods so it isn't very clear to me what to use.

At the moment, I check every field that has a value to see if they contain any of those characters and execute a str_replace() on them to convert them to their html entity equivalent like so:



for($n=0;$n<$array_length;$n++){

if(empty($post_data[$n])){
$post_data[$n] = NULL;
}
else{
$post_data[$n] = str_replace(array("<",">","'","\"","\\"),array("&lt;","&gt;","&apos;","&quot;","&#92;"),$post_data[$n]);

switch($n){
case 0: //name
if (!ctype_alpha($post_data)) {
$errors[] = "You can only enter alphabetical characters in the name field.";
}
break;

case 1: //email
if(!filter_var($post_data[$n], FILTER_VALIDATE_EMAIL)){
$errors[] = "You entered an invalid email";
}
break;

case 2: //phone number
if(!ctype_digit($post_data[$n])){
$errors[] = "You entered an incorrect phone number.";
}
break;

case 3: //additional comments
if(!strlen($post_data) <= 400){
$errors[] = "";
}
break;
}
}

}


Can someone please clarify what EXACTLY how to make data safe.

I know this question has come up so many times but honestly there are so many examples. I want some advice from people whose judgement I can trust.

Kind regards,

LC.

felgall
11-12-2012, 08:27 PM
Those characters are not generically harmful. They are only harmful in specific situations. The way to resolve that is to identify the specific situation where they would be harmful and either change the way your code works so that situation doesn't occur or to escape the characters if tchanging the code isn't possible.

For interacting with a mysql database you should be using mysqli or PDO and then you can use separate prepare and bind statements where the SQL and the data are kept completely separate and so there is no way the data can be confused with the SQL. If you use the antiquated mysql interface where you can't perform this separation then you'd use mysql_real_escape_strong() to escape the data and so reduce the chances of certain characters being harmful.

For outputting to HTML there isn't any way to separate the data from the tags and so you would use htmlspecialchars() to convert the potetially harmful characters into their entity code equivalents and so avoid having them mistake n for HTML.

LearningCoder
11-12-2012, 10:21 PM
My intention was to use a mysqli prepared statement once my code reached a certain point (i.e I've checked the data is exactly what it should be).

The thing I am worried about mostly is someone entering some harmful code into the form which could damage the database. I have 2 textarea fields in the form which allow up to 400 characters in each.

From what you said, I gather that you would only secure your data for certain purposes, so upon inserting to the database, use mysqli_real_escape_string()? I thought we needed to use functions like html htmlentities($var, ENT_QUOTES) also before using any data in a query to make sure we convert any single or double quotes as they could corrupt the query?

Thank you for your reply.

Kind regards,

LC.

Fou-Lu
11-12-2012, 11:23 PM
If you are using prepared statements, sanitizing it is not necessary and will corrupt the input if you use it. So don't bother with an escape on it if you bind. Issue a stripslashes if magic_quotes_gpc is enabled first.

As mentioned, tagging is not a security issue, it is a data issue. HTML tags will not harm your storage integrity, it will damage the output of the data. Use htmlspecialchars or htmlentities to translate them, use strip_tags to remove them, or whatever else you want to do. This is a part of the business rules for your input, which is a verification process. If you intend to keep them, I'd keep them in their raw state using the actual characters in question. Display them using htmlspecialchars or htmlentities instead.

LearningCoder
11-13-2012, 01:10 AM
If you are using prepared statements, sanitizing it is not necessary and will corrupt the input if you use it. So don't bother with an escape on it if you bind. Issue a stripslashes if magic_quotes_gpc is enabled first.

When you say don't bother with an escape do you mean I don't need send it to use sayaddslashes()?



As mentioned, tagging is not a security issue, it is a data issue. HTML tags will not harm your storage integrity, it will damage the output of the data. Use htmlspecialchars or htmlentities to translate them, use strip_tags to remove them, or whatever else you want to do. This is a part of the business rules for your input, which is a verification process. If you intend to keep them, I'd keep them in their raw state using the actual characters in question. Display them using htmlspecialchars or htmlentities instead.

So it is ok to send characters such as < > into the database, but I need to convert them when reading the data out?

What about forward and backward slashes? Are they harmful with post data?

Also, with double and single quotes do I need to add slashes to the input to make sure it doesn't affect the statement or does it work differently with prepared statements??

Sorry to keep asking the same questions but this is a real issue of mine when working with user input. Once I understand I can just follow the same rule of thumb every time I have to deal with input.

Kind regards,

LC.

felgall
11-13-2012, 01:35 AM
So it is ok to send characters such as < > into the database, but I need to convert them when reading the data out?[/code]

No you only convert them when writing to HTML. Everywhere else leave them along.

[QUOTE=LearningCoder;1291415]What about forward and backward slashes? Are they harmful with post data?

No.


Also, with double and single quotes do I need to add slashes to the input to make sure it doesn't affect the statement or does it work differently with prepared statements??

It works differently in BIND statements - the data never gets anywhere near the PREPARE statement.


Validate user input and if you want to be really sure then sanitize input from other sources such as the database - just in case it has been tampered with.
Escape output only where necessary to keep the data from being confused with the surrounding code.

Fou-Lu
11-13-2012, 01:49 AM
That's right. When you separate a prepared statement from the data and bind, it now becomes incapable of corrupting the SQL statement itself. So if input data exists of 1; DROP TABLE atable;--, it cannot break a prepared statement. The data goes in exactly as it shows. Addslashes itself should never be used for SQL data since it only accommodates the quotations and backslashes, not null entries, so this is why we used a real_escape_string from the sql libraries. Until magic_quotes_gpc is actually gone from common use (gone as of 5.4), we still need to check for it and issue a stripslashes if its enabled to get the original data provided back.
So SQL wise, it is absolutely fine to run <script> tags in it if you want. Its just data, it doesn't have the ability to parse it. Just like I can also put PHP code, or Java code, or whatever else I want into the db itself. Typically I suggest to leave it as is when inserting into a database, and convert it as necessary for display. Its easy to get hitched into wanting to insert it with the html entities encoded, but that is only because PHP makes conversion so easy. Languages like Java for example require either manual conversion or extended libraries in order to convert them. The difference with Java is I can explicitly tell an item if its capable of rendering HTML or if it just treats it as data, unlike PHP.

LearningCoder
11-13-2012, 01:56 AM
It works differently in BIND statements - the data never gets anywhere near the PREPARE statement.


Validate user input and if you want to be really sure then sanitize input from other sources such as the database - just in case it has been tampered with.
Escape output only where necessary to keep the data from being confused with the surrounding code.
So let's say the user submits details in the form, it sends to database. All good.

Maybe I want to setup an email to send to users where I bind the data to result variables. (for instance, being able to address the user by their name rather than having to hardcode each).

Say the entered "O'connor". If I do not escape that data, what would be the need in using any htmlspecialchars() or htmlentities() as I would want it to be displayed the same way it was inserted wouldn't I?

Kind regards,

LC.

LearningCoder
11-13-2012, 02:02 AM
That's right. When you separate a prepared statement from the data and bind, it now becomes incapable of corrupting the SQL statement itself. So if input data exists of 1; DROP TABLE atable;--, it cannot break a prepared statement. The data goes in exactly as it shows. Addslashes itself should never be used for SQL data since it only accommodates the quotations and backslashes, not null entries, so this is why we used a real_escape_string from the sql libraries. Until magic_quotes_gpc is actually gone from common use (gone as of 5.4), we still need to check for it and issue a stripslashes if its enabled to get the original data provided back.
So SQL wise, it is absolutely fine to run <script> tags in it if you want. Its just data, it doesn't have the ability to parse it. Just like I can also put PHP code, or Java code, or whatever else I want into the db itself. Typically I suggest to leave it as is when inserting into a database, and convert it as necessary for display. Its easy to get hitched into wanting to insert it with the html entities encoded, but that is only because PHP makes conversion so easy. Languages like Java for example require either manual conversion or extended libraries in order to convert them. The difference with Java is I can explicitly tell an item if its capable of rendering HTML or if it just treats it as data, unlike PHP.

So with using a prepared statement, I do not have to do anything to the input in terms of making sure it doesn't contain any data because it isn't able to parse the code and 'execute' it?

Do we still have to use mysqli_real_escape_string() with prepared statements?

Fou-Lu
11-13-2012, 04:32 AM
No, using escaping will corrupt the data. O'Conner would be inserted into the database as O\'Conner, not as O'Conner. You do not want to execute an escape of any kind on a bind, the data goes in exactly as presented. Its also why you have to check for magic quotes and issue a stripslash if its enabled.

LearningCoder
11-13-2012, 11:40 AM
Ah ok, so I literally just make sure the data they entered is what I want i.e digits for a phone number, a valid email etc. Once my code gets far enough and I'm ready to insert, I simply just call the prepared statement script and insert the data using no character encoding or escaping functions?

Kind regards,

LC.

Fou-Lu
11-13-2012, 01:20 PM
That's right, you just need to validate and verify the data, then bind it. If your rules specify it must be a phone number, use a pattern match and verify or convert the format to your liking.

LearningCoder
11-13-2012, 05:30 PM
Thank you very much indeed for the help. It now seems clear in my mind what I need to do with user input in relation to preparing it for insertion.

The thing which confused me I think is the things I've read and people saying 'oh you must escape this and encode that to prevent someone entering bad code' and it just made me a bit over cautious I guess believing I had to get rid off any non alphanumeric characters (or most anyway).

I'm having a few issues with inserting the data.

What I do when they submit is I do various checks bla bla.

Then, I check to see if the values at each index of the array contain an empty string (i.e, the user entered nothing). If the index does contain an empty string, I set that index to NULL type. (I thought I needed to do this so that when I insert, I insert 'nothing'(NULL) into the field because when I used var_dump() it showed up as string type length 0).....This produces errors though. I tried then to set it to the NULL STRING and it inserted 'NULL' into the field but it's not exactly what I want.

Is there a way I can set the array index to NULL type and insert it? Do I need to set EVERY column in my table to NULL by default, and only change that value if there is data being inserted?

Not sure if I'm totally off track and if I really need to be doing this....

What do you do with your user input if it's just for an INSERT and it's NULL (i.e the user entered nothing, the fields are not required...).

I only have 2 inputs in my form which are required, they are Name and Phone Number because if someone is going to submit my form, I want a way to be able to contact them.

Any thoughts?

Kind regards,

LC.

Fou-Lu
11-13-2012, 05:49 PM
You could attempt to provide the string NULL as a binary type to see if that works. You can also attempt to do the same with PHP null. I can't say for sure since I haven't allowed a null in my db designs in a decade.
Typically if you allow null into a field, then your database is not normalized properly. Even empty should be forbidden, but on occasion I also do it (like with a home and office phone number, sometimes I'll ignore that home isn't provide it and give it an empty string). Ideally that should be normalized and only record data that has been provided where I implicitly assume that no record is provided if no record exists.

LearningCoder
11-14-2012, 03:41 AM
I think I may just insert empty strings for now just so I can get it to work. I've only got 1 template page to create after doing this so I'm looking to thoroughly go over each template and see what can be improved.

That will be one for then. Having said that if you allow empty strings then it can't be that much of a bad practice. The reason I wanted to insert NULL or 'NULL' into the empty fields is because I will never know which fields they entered so I need to keep some kind of structure because I think it could mean more prepared statements or largely editing the one I have?

Kind regards,

LC.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum