Can someone please convert this tiny snippet to PHP?


kaisellgren
06-01-2009, 12:11 AM
Hi,

Could someone please convert this code to PHP? It basically encodes characters into their corresponding HTML entity values.

public String encodeCharacter( char[] immune, Character c ) {
    char ch = c.charValue();

    // check for immune characters
    if ( containsCharacter( ch, immune ) ) {
        return "" + ch;
    }

    // check for alphanumeric characters
    String hex = Codec.getHexForNonAlphanumeric( ch );
    if ( hex == null ) {
        return "" + ch;
    }

    // check for illegal characters
    if ( ( ch <= 0x1f && ch != '\t' && ch != '\n' && ch != '\r' ) || ( ch >= 0x7f && ch <= 0x9f ) ) {
        return " ";
    }

    // check if there's a defined entity
    String entityName = (String) characterToEntityMap.get(c);
    if ( entityName != null ) {
        return "&" + entityName + ";";
    }

    // return the hex entity as suggested in the spec
    return "&#x" + hex + ";";
}


Thanks for any help!

abduraooft
06-01-2009, 09:01 AM
Have you checked http://php.net/htmlentities ?

kaisellgren
06-01-2009, 03:41 PM
It only converts some basic characters...

$a = chr(1).'This is a chinese character: "華" as you can see... ;)';
This is a chinese character: &quot;華&quot; as you can see... ;)
The first character is not converted to &#1;! The Chinese character is not converted to its HTML entity, and neither are : . ; or ), nor the letters.

I need to convert any character to its corresponding HTML entity: &#xx; and not just the basic < > ' " etc.
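
For reference, a minimal sketch of what htmlentities() does with that string when the charset is passed explicitly. UTF-8 is an assumption about how the input is encoded. Only characters that have a named entity in the default HTML 4.01 table are replaced, so the quotes become &quot; while 華, the punctuation and the letters pass through unchanged:

```php
<?php
// Assumes this string (and this file) is UTF-8 encoded.
$a = 'This is a chinese character: "華" as you can see... ;)';

// htmlentities() only replaces characters that have a named HTML entity.
$encoded = htmlentities($a, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
// prints: This is a chinese character: &quot;華&quot; as you can see... ;)
```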

Fou-Lu
06-01-2009, 05:12 PM
Works for me, just make sure you're changing your charset as well. Are you meaning to keep that SOH though? It doesn't look like the decode will remove it.
If you still can't get it to work, we can look at converting what you have. Going to Unicode will be a pain though.


Oh sorry, I missed that you were looking for other things too.
How come you need to convert the other special chars, like ; and :? For that, yeah, we'll need to look at rewriting.
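
A rough sketch of what such a rewrite could look like in PHP, following the structure of the Java method one step at a time. Several things here are assumptions, since the thread does not show them: $entityMap is a tiny made-up stand-in for characterToEntityMap, "alphanumeric" is taken to mean ASCII a-z, A-Z and 0-9 (standing in for Codec.getHexForNonAlphanumeric), and the input is expected to be a single UTF-8 character. It also relies on mb_ord(), which needs the mbstring extension and PHP 7.2 or later.

```php
<?php
// Hypothetical stand-in for characterToEntityMap; the real table is not shown in the thread.
$entityMap = array('<' => 'lt', '>' => 'gt', '&' => 'amp', '"' => 'quot');

// Encode one UTF-8 character; $immune is a list of characters to leave untouched.
function encodeCharacter(array $immune, $c, array $entityMap)
{
    // check for immune characters
    if (in_array($c, $immune, true)) {
        return $c;
    }

    // check for alphanumeric characters (assumed: ASCII letters and digits only)
    if (preg_match('/\A[a-zA-Z0-9]\z/', $c)) {
        return $c;
    }

    // Unicode code point of the character (requires mbstring, PHP >= 7.2)
    $cp = mb_ord($c, 'UTF-8');

    // check for illegal characters: control characters become a space
    if (($cp <= 0x1f && $c !== "\t" && $c !== "\n" && $c !== "\r") || ($cp >= 0x7f && $cp <= 0x9f)) {
        return ' ';
    }

    // check if there's a defined entity
    if (isset($entityMap[$c])) {
        return '&' . $entityMap[$c] . ';';
    }

    // otherwise return the hex entity
    return '&#x' . dechex($cp) . ';';
}

echo encodeCharacter(array(), ';', $entityMap), "\n"; // prints: &#x3b;
```

Passing array(';') as $immune would return the semicolon untouched, which is how the Java version's immune list behaves.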

kaisellgren
06-01-2009, 06:14 PM
There are no display problems, and this is not a display problem. The issue is how the characters are written in the HTML source.

For instance, you can display an & sign either by typing it in the HTML as a bare & (which I know does not follow the strict standard) or by using the corresponding HTML entity: &amp;

Now what I want to do is to convert all characters into HTML entities. This certainly is possible.

For example, a space can be converted into &#32; (in addition to &nbsp;), where 32 is the corresponding value in the ASCII set. Now, I want to convert all characters to these entities. Well okay, a-zA-Z0-9 do not need to be, but all other characters do, including . , : etc.

For example here: http://rishida.net/scripts/uniview/conversion.php

If you type A and convert, it says it equals &#65;, which is exactly what I want.

That is simple to do, just use ord() to find out the number. However, I need help figuring this out when it comes to multibyte characters like the Chinese character I showed earlier, which is &#33775; in entity format. I want to do that kind of conversion in PHP.
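
A sketch of that multibyte step, assuming the input is valid UTF-8 and the mbstring extension is available (mb_ord() needs PHP 7.2 or later). The function name toNumericEntities is made up for illustration. ord() only looks at the first byte of a string, so the text is split into whole characters first and each one is mapped to its code point:

```php
<?php
// Hypothetical helper: turns every character except a-z, A-Z, 0-9
// into a decimal numeric entity. Assumes valid UTF-8 input.
function toNumericEntities($text)
{
    // The /u modifier makes preg_split() cut on whole UTF-8 characters, not bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $out = '';
    foreach ($chars as $ch) {
        if (preg_match('/\A[a-zA-Z0-9]\z/', $ch)) {
            $out .= $ch; // alphanumerics stay as they are
        } else {
            $out .= '&#' . mb_ord($ch, 'UTF-8') . ';';
        }
    }
    return $out;
}

echo toNumericEntities('A 華;'), "\n"; // prints: A&#32;&#33775;&#59;
echo ord('華'), "\n";                  // prints: 232 (only the first UTF-8 byte)
```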

EDIT: Omg, this board converted my characters... the Chinese character equals &# 33775;