Here are my notes on writing floating-point binary data as decimal strings, including within XML nodes. My concern is with IEEE 754-2008 floating-point binary values, and really only with the single-precision (C/C++ `float`) and double-precision (C/C++ `double`) forms, not the extended forms.

### Avoiding Round-Off Errors in the Round Trip

When saving a floating-point binary value as a decimal string I must be able to parse the string and recover the exact binary value. To do that I must use sufficient precision, for the reasons discussed in "What Every Computer Scientist Should Know About Floating-Point Arithmetic". Single-precision `float` data requires 9 significant decimal digits; double-precision `double` data requires 17.

To ensure that I write my decimal strings with adequate precision I either set the precision on the stream or I use the `lexical_cast` function template from the Boost C++ Libraries. Some of the advantages and disadvantages of each approach are discussed in "The String Formatters of Manor Farm". I find using a stream manipulator to set the precision to be sufficiently practical, but restoring the stream to its prior state can be burdensome, making the `lexical_cast` approach appealing. `lexical_cast` also makes it easier to keep code generic for either `float` or `double` types, and it provides a clean way of parsing the floating-point values from a `char*` or `std::string`, often what an API returns. The following illustrates both approaches in action.

```cpp
#include <sstream>
#include <iomanip>
#include <cassert>
#include <boost/lexical_cast.hpp>

int main()
{
    float  v_in,  v  = 3.14159265F;
    double vv_in, vv = v;
    std::stringstream s;

    // Leaving the stream state unchanged:
    s << boost::lexical_cast<std::string>(v) << " "
      << boost::lexical_cast<std::string>(vv);
    s >> v_in >> vv_in;
    assert(v == v_in && vv == vv_in);

    // Reset the stream (extraction above set eofbit)
    // before the second round trip.
    s.clear();
    s.str("");

    // Setting the stream state:
    s << std::setprecision(9)  << v  << " "
      << std::setprecision(17) << vv;
    s >> v_in >> vv_in;
    assert(v == v_in && vv == vv_in);
}
```

Note that the assertions use exact equality tests.

### Special Quantities: Infinity

The IEEE 754 floating-point standard specifies binary formats for positive and negative infinity, but it says nothing about how they should be represented as text. C90/C++98 compilers vary in how they represent infinite values. The following:

```cpp
std::cout << std::numeric_limits<float>::infinity() << std::endl;
```

produces "inf" with GCC (or "INF" if the `uppercase` flag had been set on `cout`), but it produces "1.#INF" with Visual C++. C99/C++TR1, which specifies "[-]inf" or "[-]infinity" (upper-case when the `F` conversion specifier is used), does not make the round trip to and from a string any easier. The following will still likely fail when `v` is +/- infinity:

```cpp
#include <limits>
#include <sstream>
#include <cassert>
// ...
float v = std::numeric_limits<float>::infinity(), v_in;
std::stringstream s;
s << v;
s >> v_in;
assert(v == v_in); // likely fails: "inf" does not parse back
```

Since I need to save floating-point decimal strings to XML files, and since the W3C XML Schema: Datatypes specification says that infinity must be written as "INF" in `float` nodes, I handle (-)infinity as a special case and always represent it as "(-)INF" in string form. The round trip then looks like this:

```cpp
// ...
typedef float FPType; // to keep code generic -- could be double
FPType v_in, v;       // = +/- infinity or a numeric value
std::string input;
std::stringstream s;

if (v == std::numeric_limits<FPType>::infinity()) {
    s << "INF";
} else if (v == -std::numeric_limits<FPType>::infinity()) {
    s << "-INF";
} else {
    s << boost::lexical_cast<std::string>(v);
}
input = s.str();

try {
    v_in = boost::lexical_cast<FPType>(input);
} catch (const boost::bad_lexical_cast&) {
    if (input == "INF") {
        v_in = std::numeric_limits<FPType>::infinity();
    } else if (input == "-INF") {
        v_in = -std::numeric_limits<FPType>::infinity();
    } else {
        throw;
    }
}
assert(v == v_in);
```

The assertion will not be satisfied for all values of `v`, though.

### Special Quantities: NaN

Just as the IEEE 754 floating-point standard says nothing about how infinities should be represented as text, neither does it say anything about how NaNs should be written. The C99/C++TR1 standards are again somewhat more prescriptive, but their definitions for NaNs do not match the XML specification, which is simply "NaN". C99 uses "nan[n-characters]", which is non-trivial to parse as input. And the Visual Studio C++ Express Editions don't ship with the `strtof`/`strtod` functions that would help.

For portability, I manually detect NaNs on output and then use "NaN" to represent them in strings. But detecting NaNs is not trivial, either. The IEEE 754 standard states that NaNs are not equal to anything, including NaNs, so the simplest test for NaN is:

```cpp
if (v != v) ...
```

But it is possible that a compiler will optimize that test to `false` (as GCC will if the `-ffast-math` option is used). C99/C++TR1 provides the `isnan()` function, but using that is not always an option. Consequently, I have written my own `is_nan()` function. The following is the `float`-specific form, but it can be generalized with judicious use of template specialization.

```cpp
//
// My version of Visual C++ doesn't have stdint.h,
// but it does define __int32 (and __int64).
//
#if defined(_MSC_VER)
typedef unsigned __int32 uint32_t;
#else
#include <stdint.h>
#endif

bool is_nan(float x)
{
    const uint32_t kExpMask  = 0x7F800000; // 8 exponent bits
    const uint32_t kFracMask = 0x007FFFFF; // 23 fraction bits

    union {
        float    value;
        uint32_t bits;
    };
    value = x;

    // NaN: all exponent bits set, fraction non-zero.
    return (bits & kExpMask) == kExpMask && (bits & kFracMask) != 0;
}
```

The routine is based on the layout of the IEEE float as defined in the standard. The most-significant bit is the sign bit, the next 8 bits are the exponent, and the last 23 bits are the fractional part. The sign bit is ignored when detecting a NaN. For double-precision data the exponent is 11 bits and the fractional part 52 bits, so the `union` would need a 64-bit integer type, `uint64_t`, to accommodate the bits of a double-precision floating-point variable. See the "IEEE Arithmetic Short Reference" by Lloyd Fosdick for an excellent, concise discussion of the floating-point bits.
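Under that layout, the analogous double-precision form might look like this (a sketch mirroring the union-based `float` version above; a `memcpy` into a `uint64_t` is the strictly portable way to read the bits):

```cpp
#include <stdint.h>

bool is_nan(double x)
{
    const uint64_t kExpMask  = 0x7FF0000000000000ULL; // 11 exponent bits
    const uint64_t kFracMask = 0x000FFFFFFFFFFFFFULL; // 52 fraction bits

    union {
        double   value;
        uint64_t bits;
    };
    value = x;

    // NaN: all exponent bits set, fraction non-zero.
    return (bits & kExpMask) == kExpMask && (bits & kFracMask) != 0;
}
```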

Armed with the `is_nan()` function, the complete code for a robust round trip of floating-point data is:

```cpp
// ...
typedef float FPType;
FPType v_in, v; // = +/- infinity, NaN, or a numeric value
std::string input;
std::stringstream s;

if (v == std::numeric_limits<FPType>::infinity()) {
    s << "INF";
} else if (v == -std::numeric_limits<FPType>::infinity()) {
    s << "-INF";
} else if (is_nan(v)) {
    s << "NaN";
} else {
    // ...or setprecision() according to FPType...
    s << boost::lexical_cast<std::string>(v);
}
input = s.str();

try {
    v_in = boost::lexical_cast<FPType>(input);
} catch (const boost::bad_lexical_cast&) {
    if (input == "INF") {
        v_in = std::numeric_limits<FPType>::infinity();
    } else if (input == "-INF") {
        v_in = -std::numeric_limits<FPType>::infinity();
    } else if (input == "NaN") {
        v_in = std::numeric_limits<FPType>::quiet_NaN();
    } else {
        throw;
    }
}
// Note, we cannot compare NaNs directly!
assert(v == v_in || (is_nan(v) && is_nan(v_in)));
```

In practice I'm a little more fault-tolerant on the input, making a pass over the input `std::string` with `tolower()`, so the default forms generated by the GCC stream inserter, "inf" and "nan", will also be accepted.
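That pass can be sketched with the classic `std::transform`/`tolower` idiom (`normalize_input` is a hypothetical helper name, not from the code above):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a copy of the string so "INF", "Inf", and "inf"
// all compare equal after normalization. (Safe for the plain
// ASCII input expected here; tolower() on negative char values
// is undefined.)
std::string normalize_input(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), ::tolower);
    return s;
}
```

The special-case comparisons then test against "inf", "-inf", and "nan" only.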

### Reading/Writing XML Float Data With Xerces C++

Since I use the Apache Xerces C++ XML Parser, the rest of my discussion applies to that API. As mentioned above, the XML Schema: Datatypes specification constrains valid `float` and `double` values to decimal strings, "INF", "-INF", and "NaN". A code fragment that writes a `float` DOM node therefore looks something like this:

```cpp
#include <limits>
#include <sstream>
#include <boost/lexical_cast.hpp>
#include <xercesc/dom/DOM.hpp> // possibly others

// The following are set by the application.
xercesc::DOMDocument* doc;
xercesc::DOMElement*  parent;
const char*           name;
float                 v;

std::ostringstream s;
if (v == std::numeric_limits<float>::infinity()) {
    s << "INF";
} else if (v == -std::numeric_limits<float>::infinity()) {
    s << "-INF";
} else if (v != v) { // or isnan()/is_nan() above
    s << "NaN";
} else {
    // Generically use sufficient precision.
    s << boost::lexical_cast<std::string>(v);
}

xercesc::DOMElement* el;
xercesc::DOMText*    txt;
el  = doc->createElement( xercesc::XMLString::transcode(name) );
txt = doc->createTextNode( xercesc::XMLString::transcode(s.str().c_str()) );
el->appendChild(txt);
parent->appendChild(el);
```

The code is generic for either the `float` or `double` type, so it can be turned into a template function/member. A code fragment for recovering the binary float value from the XML DOM node then looks like this:

```cpp
xercesc::DOMText* txt; // supplied by the parser
xercesc::XMLFloat x( txt->getNodeValue() );
float v;

if (x.getType() == xercesc::XMLAbstractDoubleFloat::NaN) {
    v = std::numeric_limits<float>::quiet_NaN();
} else if (x.getType() == xercesc::XMLAbstractDoubleFloat::PosINF) {
    v = std::numeric_limits<float>::infinity();
} else if (x.getType() == xercesc::XMLAbstractDoubleFloat::NegINF) {
    v = -std::numeric_limits<float>::infinity();
} else {
    v = static_cast<float>(x.getValue()); // getValue() returns a double
}
```

With proper care and use of the Xerces `XMLAbstractDoubleFloat` type, the above code can also be made generic for either `float` or `double` binary types.