
# Thread: 16 bit floating point (Half)


After googling around for a while, I haven't been able to find anything helpful about an implementation of a 16-bit floating point type for Java. Does anyone know of any sites that might help me out?

• I don't believe Java has a built-in 'short' float; the 32-bit float is the smallest floating-point primitive.
Is there a reason you need to use 16 bits instead of the 32 provided?

• I have a very high rate of floating-point data coming in off the socket. Values will always be between -0.5 and 0.5, so 32 bits seemed like overkill. I could always just multiply the values by a fixed factor (a power of 10, say) before sending them over the socket and use a short, but I was curious whether there are any implementations of a 2-byte 'float'. Multiplying will also lose some precision. Still open to any ideas.
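
For what it's worth, the multiply-and-use-a-short idea might look roughly like the sketch below. The class name `FixedPoint16`, the method names, and the scale factor 65534 are my own assumptions (the post only says "multiply"); 65534 is chosen so that ±0.5 maps onto ±32767 and the whole short range gets used.

```java
final class FixedPoint16 {
    // Assumption: inputs lie in [-0.5, 0.5], as stated in the thread.
    // 0.5 * SCALE == 32767 == Short.MAX_VALUE, so the full short range is used.
    static final float SCALE = 65534.0f;

    // Quantize a float to a 16-bit fixed-point value.
    static short encode(float value) {
        // Clamp so out-of-range input saturates instead of wrapping around.
        float clamped = Math.max(-0.5f, Math.min(0.5f, value));
        return (short) Math.round(clamped * SCALE);
    }

    // Recover an approximation of the original float.
    static float decode(short encoded) {
        return encoded / SCALE;
    }

    public static void main(String[] args) {
        float original = 0.12345f;
        short wire = encode(original);
        float restored = decode(wire);
        System.out.println(original + " -> " + wire + " -> " + restored);
    }
}
```

With this scale the step between neighbouring values is 1/65534 (about 0.000015), so the worst-case rounding error is about half of that, the same everywhere in the range.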

• I haven't tried this before, but if this were C you would malloc 2 chars (16 bits) and use bitwise operators to control them. Real numbers are a lot harder to do this with, though, since you've got... what are they called... the mantissa and the exponent bits, as well as the sign bit.

How many digits of precision are you looking for? I'll think about this and see if I can figure out how to implement something like this in Java.
It's been a while since I've done computational mathematics.

• 5 digits of precision would be sufficient, but it would be cool to have a variable number of digits. I'd be willing to help you out, but I'm not very familiar with the inner workings of floating-point math.

• I don't think I'll have the time to do this.
The logic is quite simple: you use bitmasking and shifting to shove bits into 2 bytes (a byte[2] array or a short). The tricky part will be cutting the correct bits from the original floats so that you don't lose precision. In my current attempt I didn't think ahead about which bits to pull out when cutting down to 16 bits, so I'm losing precision starting at the third or fourth decimal place.
I'm pretty sure the current float structure is like so (32 bits):
1 bit sign
8 bit exponent (stored with a bias of 127 rather than with a sign bit of its own)
23 bit mantissa (24 bits of precision once you count the implicit leading 1)
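
For reference, a Java float is an IEEE 754 single-precision value: 1 sign bit, 8 exponent bits (biased by 127), and 23 stored mantissa bits. A minimal sketch for pulling those fields apart with `Float.floatToIntBits` follows; the class name `FloatFields` and its method names are just illustrative.

```java
final class FloatFields {
    // Bit 31: 0 for positive, 1 for negative.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Bits 30..23: exponent as stored, i.e. true exponent + 127.
    static int exponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0: fraction bits, without the implicit leading 1.
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        // 0.5f is 1.0 x 2^-1, so the stored exponent is 127 - 1 = 126
        // and the mantissa field is 0.
        float f = 0.5f;
        System.out.println("sign=" + sign(f)
                + " exponent=" + exponent(f)
                + " mantissa=0x" + Integer.toHexString(mantissa(f)));
    }
}
```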

So when I compose it to 16 bits, my current code just takes the first 16 bits (doh >.<), which keeps the sign, the exponent, and only the top 7 bits of the mantissa. I lose the last 16 of the 23 mantissa bits, which of course is your precision.
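
To make that loss concrete, here is a rough sketch of the "just keep the first 16 bits" approach. The class name `TopHalf` and its methods are made up for illustration; this is not the poster's actual code, which isn't shown in the thread.

```java
final class TopHalf {
    // Keep only the upper 16 bits: sign, 8 exponent bits, top 7 mantissa bits.
    static short truncate(float f) {
        return (short) (Float.floatToIntBits(f) >>> 16);
    }

    // Put the 16 bits back in the upper half; the dropped bits come back as zeros.
    static float expand(short s) {
        return Float.intBitsToFloat((s & 0xFFFF) << 16);
    }

    public static void main(String[] args) {
        float original = 0.12345f;
        float restored = expand(truncate(original));
        System.out.println(original + " -> " + restored);
    }
}
```

With only 7 stored mantissa bits (8 counting the implicit leading 1), the relative error can approach 1 part in 128, which is roughly 2 to 3 significant decimal digits. That matches losing precision around the third or fourth digit.
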
I'm thinking you'll want to break it down like so:
1 bit sign
2-5 bit exponent
10-13 bit mantissa (whatever is left of the 16 bits)

That should give you the precision you need. It may be better to choose a smaller exponent, though: since the values are between -0.5 and +0.5, you may be able to get away with a 2-bit exponent (with -1 as the largest exponent), leaving you a 13-bit mantissa. The catch is that a 2-bit exponent covers only four powers of two, so values very close to zero would lose precision.

Calculation flow would be: float -> integer bits -> calculated short -> integer bits -> float.
The calculated short (or you could use 2 bytes) would contain your 'short float' as bits.
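
The flow above could be sketched as follows. This is only one possible split, assuming the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits); the class name `Half16` and the method names are hypothetical. To stay short it makes simplifying assumptions: values too small for a normal half are flushed to zero, NaN collapses to infinity, and rounding is round-half-up rather than ties-to-even.

```java
final class Half16 {
    // float -> integer bits -> calculated short
    static short toHalf(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = (bits >>> 16) & 0x8000;            // move sign to bit 15
        int exp  = ((bits >>> 23) & 0xFF) - 127 + 15; // rebias 127 -> 15
        int mant = bits & 0x7FFFFF;

        if (exp <= 0) {
            return (short) sign;                      // too small: flush to signed zero
        }
        if (exp >= 31) {
            return (short) (sign | 0x7C00);           // too large, Inf or NaN: infinity
        }
        int half = sign | (exp << 10) | (mant >>> 13); // keep top 10 mantissa bits
        if ((mant & 0x1000) != 0) {
            half++;  // round up; a carry correctly bumps the exponent field
        }
        return (short) half;
    }

    // calculated short -> integer bits -> float
    static float fromHalf(short h) {
        int sign = (h & 0x8000) << 16;
        int exp  = (h >>> 10) & 0x1F;
        int mant = h & 0x3FF;

        if (exp == 0) {
            return Float.intBitsToFloat(sign);        // zero (subnormals treated as zero)
        }
        if (exp == 31) {
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        float original = 0.12345f;
        short wire = toHalf(original);
        System.out.println(original + " -> 0x"
                + Integer.toHexString(wire & 0xFFFF) + " -> " + fromHalf(wire));
    }
}
```

One thing to keep in mind with this layout: for values between 0.25 and 0.5 the spacing between representable numbers is 2^-12 (about 0.00024), so near the ends of the range it holds only three to four decimal places. If a fixed five decimal places across the whole range matters more than precision near zero, scaling into a plain short may actually serve better.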

I'll try to squeeze it in between a couple of other tasks to see if I can solve this for you, but no guarantees; it's not at the top of my priority queue.
