Writing IEEE Floating-Point C++ Data as Decimal Text

Here are my notes on writing floating-point binary data as decimal strings, including within XML nodes. My concern is with IEEE 754-2008 binary floating-point values, and really only with the single-precision (C/C++ float) and double-precision (C/C++ double) formats, not the extended formats.

Avoiding Round-Off Errors in the Round Trip

When I save a floating-point binary value as a decimal string, I must be able to parse the string and recover the exact binary value. To do that I must write enough significant digits, for the reasons discussed in "What Every Computer Scientist Should Know About Floating-Point Arithmetic": single-precision float data requires 9 significant decimal digits, and double-precision double data requires 17.
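Assuming a C++11 compiler, std::numeric_limits<T>::max_digits10 names these digit counts directly; for a p-bit significand it equals ceil(1 + p·log10(2)), which gives 9 for float (p = 24) and 17 for double (p = 53). The following sketch uses only the standard library (the helper names round_trip and check_precision are hypothetical) to show a float surviving the round trip at 9 digits but not at the 6 digits of std::numeric_limits<float>::digits10.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>

// Hypothetical helper: write v with `digits` significant decimal digits,
// then parse the text back into the same floating-point type.
template <typename T>
T round_trip(T v, int digits)
{
    std::stringstream s;
    s << std::setprecision(digits) << v;
    T v_in = 0;
    s >> v_in;
    return v_in;
}

// Hypothetical check: max_digits10 recovers the value, digits10 may not.
void check_precision()
{
    // C++11 names the digit counts: 9 for float, 17 for double (IEEE 754).
    assert(std::numeric_limits<float>::max_digits10 == 9);
    assert(std::numeric_limits<double>::max_digits10 == 17);

    const float v = 3.14159265F;
    // 9 digits ("3.14159274") recover the exact binary value...
    assert(round_trip(v, std::numeric_limits<float>::max_digits10) == v);
    // ...but 6 digits ("3.14159") parse to a different float.
    assert(round_trip(v, std::numeric_limits<float>::digits10) != v);
}
```

The same helper works unchanged for double, which is the kind of genericity the lexical_cast approach below also provides.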

To ensure that I write my decimal strings with adequate precision, I either set the precision on the stream or use the lexical_cast function template from the Boost C++ Libraries. Some of the advantages and disadvantages of each approach are discussed in "The String Formatters of Manor Farm". Setting the precision with a stream manipulator is practical enough, but restoring the stream to its prior state afterwards can be burdensome, which makes the lexical_cast approach appealing. lexical_cast also makes it easier to keep code generic for either float or double, and it provides a clean way to parse floating-point values from a char* or std::string, which is often what an API returns. The following illustrates both approaches in action.

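#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <boost/lexical_cast.hpp>

int main()
{
    float  v_in,  v  = 3.14159625F;
    double vv_in, vv = v;

    // Leaving the stream state unchanged: lexical_cast picks a
    // sufficient precision for each type.
    std::stringstream s1;
    s1 << boost::lexical_cast<std::string>(v) << " "
       << boost::lexical_cast<std::string>(vv);

    s1 >> v_in >> vv_in;
    assert(v == v_in && vv == vv_in);

    // Setting the stream state: the precision stays set on s2 afterwards.
    // A fresh stream is used because reading s1 to the end set its eofbit,
    // which would make any further output to s1 fail.
    std::stringstream s2;
    s2 << std::setprecision(9) << v << " "
       << std::setprecision(17) << vv;

    s2 >> v_in >> vv_in;
    assert(v == v_in && vv == vv_in);
}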

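Note that the assertions use the exact equality test: because enough digits were written, the parsed value is bit-for-bit identical to the original, so no tolerance is needed.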
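To gain some confidence that exact equality is the right test, here is a minimal stress test. It assumes a C++11 compiler and a standard library whose decimal conversions are correctly rounded, and it uses <random>, std::isnormal and std::numeric_limits<float>::max_digits10 in place of Boost. It reinterprets random 32-bit patterns as floats and requires every normal value to come back identical; zeros, subnormals, infinities and NaNs are skipped for simplicity. The names round_trips and check_random_floats are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <limits>
#include <random>
#include <sstream>

// Write v with max_digits10 significant digits, parse it back, and
// compare with exact equality (no tolerance).
bool round_trips(float v)
{
    std::stringstream s;
    s << std::setprecision(std::numeric_limits<float>::max_digits10) << v;
    float v_in = 0;
    s >> v_in;
    return !s.fail() && v_in == v;
}

// Hypothetical stress test: reinterpret random 32-bit patterns as floats
// and check that every normal value survives the round trip exactly.
// Returns how many values were checked.
int check_random_floats(unsigned seed, int count)
{
    std::mt19937 gen(seed);  // fixed seed: reproducible sequence
    int checked = 0;
    for (int i = 0; i < count; ++i) {
        const std::uint32_t bits = static_cast<std::uint32_t>(gen());
        float v;
        std::memcpy(&v, &bits, sizeof v);
        // Skip zeros, subnormals, infinities and NaNs; the special
        // quantities need the handling described in the next sections.
        if (!std::isnormal(v))
            continue;
        assert(round_trips(v));
        ++checked;
    }
    return checked;
}
```

About 99% of random bit patterns are normal floats, so a run over 10000 patterns checks roughly 9900 distinct values.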

Special Quantities: Infinity

The IEEE 754 floating-point standard specifies binary formats for positive and negative infinity, but it says nothing about how they should be represented as text. C90/C++98 compilers vary in how they represent infinite values. The following:

std::cout << std::numeric_limits<float>::infinity() << std::endl;

produces "inf" with GCC (or "INF" if the uppercase flag has been set on cout), but "1.#INF" with Visual C++. C99/C++TR1, which specify "[-]inf" or "[-]infinity" (in upper case for the upper-case conversion specifiers), do not make the round trip to and from a string any easier. The following will still likely fail when v is +/- infinity:

#include <cassert>
#include <iostream>
#include <limits>
#include <sstream>
 
//...
 
float v = std::numeric_limits<float>::infinity(), v_in;
std::stringstream s;
s << v;
s >> v_in;
assert(v == v_in);

Since I need to save floating-point decimal strings to XML files, and since the W3C XML Schema: Datatypes specification requires infinity to be written as "INF" (and negative infinity as "-INF") in float and double nodes, I handle both infinities as special cases and always write them as "INF" and "-INF". The round trip then looks like this:

    // ...
    typedef float FPType; // to keep code generic -- could be double

    FPType  v_in,  v;  // v = +/- infinity or a numeric value

    std::stringstream s;

    if (v == std::numeric_limits<FPType>::infinity()) {
        s << "INF";
    } else if (v == -std::numeric_limits<FPType>::infinity()) {
        s << "-INF";
    } else {
        s << boost::lexical_cast<std::string>(v);
    }

    const std::string input = s.str();

    try {
        v_in = boost::lexical_cast<FPType>(input);
    } catch (const boost::bad_lexical_cast&) {
        if (input == "INF") {
            v_in = std::numeric_limits<FPType>::infinity();
        } else if (input == "-INF") {
            v_in = -std::numeric_limits<FPType>::infinity();
        } else {
            throw;  // rethrow the original exception
        }
    }

    assert(v == v_in);

The assertion will not be satisfied for every value of v, though: it fails when v is a NaN.

Special Quantities: NaN

Just as the IEEE 754 floating-point standard says nothing about how infinities should be represented as text, it says nothing about how NaNs should be written. The C99/C++TR1 standards are again somewhat more prescriptive, but their forms for NaNs do not match the XML specification, which is simply "NaN". C99 writes "[-]nan" or "[-]nan(n-char-sequence)", which is non-trivial to parse as input, and Visual Studio C++ Express Editions don't ship with C99-conforming strtof()/strtod() functions, which would help.
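Where a C99-conforming strtod() is available (it is part of <cstdlib> in C++11), it does the hard part: it accepts "inf", "infinity", "nan" and "nan(n-char-sequence)" case-insensitively, which also covers the XML spellings "INF", "-INF" and "NaN". The following is a minimal sketch under that assumption; the wrapper name parse_with_strtod is hypothetical. Note that strtod() honours the decimal point of the current C locale, so this assumes the default "C" locale.

```cpp
#include <cassert>
#include <cmath>    // std::isnan / std::isinf for checking results
#include <cstdlib>
#include <string>

// Hypothetical wrapper: parse text with the C99 rules of std::strtod,
// succeeding only if the whole string was consumed.
bool parse_with_strtod(const std::string& text, double& out)
{
    const char* begin = text.c_str();
    char* end = nullptr;
    const double v = std::strtod(begin, &end);
    if (end == begin || *end != '\0')
        return false;  // nothing parsed, or trailing characters left over
    out = v;
    return true;
}
```

A Visual C++ string such as "1.#INF" is still rejected: strtod() stops at the '#', leaving characters unconsumed.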

For portability, I detect NaNs manually on output and write them as "NaN". But detecting NaNs is not trivial either. The IEEE 754 standard states that a NaN compares unequal to everything, including itself, so the simplest test for NaN is:

if (v != v) ...

But a compiler may optimize that test to false (GCC can do so when the -ffast-math option is used, because it implies -ffinite-math-only). C99/C++TR1 provides the isnan() function, but using it is not always an option. Consequently, I have written my own is_nan() function. The following is the float-specific form; it can be generalized with an overload for double, or with a traits class and template specialization (function templates themselves cannot be partially specialized).

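//
// Older versions of Visual C++ (before Visual Studio 2010) don't have
// stdint.h, but they do provide the __int32 (and __int64) types.
//
#if defined(_MSC_VER) && _MSC_VER < 1600
  typedef unsigned __int32 uint32_t;
#else
  #include <stdint.h>
#endif
#include <cstring>  // std::memcpy

bool is_nan(float x)
{
    const uint32_t kExpMask  = 0x7F800000;  // all 8 exponent bits
    const uint32_t kFracMask = 0x007FFFFF;  // all 23 fraction bits

    // Copy the object representation into an integer. Reading the inactive
    // member of a union is undefined behaviour in standard C++ (GCC allows
    // it as an extension); memcpy is well defined.
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);

    // NaN: exponent bits all ones and a non-zero fraction.
    return ((bits & kExpMask) == kExpMask &&
            (bits & kFracMask) != 0);
}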

The routine is based on the layout of the IEEE single-precision format defined in the standard: the most-significant bit is the sign bit, the next 8 bits hold the biased exponent, and the remaining 23 bits hold the fraction. A NaN has all exponent bits set and a non-zero fraction (all exponent bits set with a zero fraction denotes an infinity), and the sign bit is ignored when detecting a NaN. For double-precision data the exponent is 11 bits and the fraction 52 bits, so a 64-bit integer type, uint64_t, is needed to hold the bits of a double-precision floating-point variable. See the "IEEE Arithmetic Short Reference" by Lloyd Fosdick for an excellent, concise discussion of the floating-point bits.
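One way to generalize the routine to both layouts is sketched below, assuming a C++11 compiler. The per-type masks live in explicit specializations of a hypothetical traits class, ieee_bits, and the hypothetical function templates is_nan() and is_inf() apply the "all exponent bits set" test with a non-zero or zero fraction respectively. This is an illustration of the technique, not a replacement for std::isnan() where that is usable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical traits class describing the bit layout of each IEEE 754
// binary format; one explicit specialization per type.
template <typename T> struct ieee_bits;

template <> struct ieee_bits<float> {
    typedef std::uint32_t type;
    static constexpr type exp_mask  = 0x7F800000u;            // 8 exponent bits
    static constexpr type frac_mask = 0x007FFFFFu;            // 23 fraction bits
};

template <> struct ieee_bits<double> {
    typedef std::uint64_t type;
    static constexpr type exp_mask  = 0x7FF0000000000000ull;  // 11 exponent bits
    static constexpr type frac_mask = 0x000FFFFFFFFFFFFFull;  // 52 fraction bits
};

// Copy the object representation of x into an unsigned integer of equal size.
template <typename T>
typename ieee_bits<T>::type bits_of(T x)
{
    typename ieee_bits<T>::type bits;
    static_assert(sizeof bits == sizeof x, "unexpected floating-point size");
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// NaN: all exponent bits set and a non-zero fraction; the sign is ignored.
template <typename T>
bool is_nan(T x)
{
    const typename ieee_bits<T>::type b = bits_of(x);
    return (b & ieee_bits<T>::exp_mask) == ieee_bits<T>::exp_mask &&
           (b & ieee_bits<T>::frac_mask) != 0;
}

// Infinity: all exponent bits set and a zero fraction; the sign gives +/-.
template <typename T>
bool is_inf(T x)
{
    const typename ieee_bits<T>::type b = bits_of(x);
    return (b & ieee_bits<T>::exp_mask) == ieee_bits<T>::exp_mask &&
           (b & ieee_bits<T>::frac_mask) == 0;
}
```

Because the tests inspect bits rather than compare values, a compiler option such as -ffast-math has no licence to fold them away.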

Armed with the is_nan() function, the complete code for a robust round trip of floating-point data is:

    // ...
    typedef float FPType;

    FPType  v_in,  v;   // v = +/- infinity, NaN or a numeric value

    std::stringstream s;

    if (v == std::numeric_limits<FPType>::infinity()) {
        s << "INF";
    } else if (v == -std::numeric_limits<FPType>::infinity()) {
        s << "-INF";
    } else if (is_nan(v)) {
        s << "NaN";
    } else {
        // ...or setprecision() according to FPType...
        s << boost::lexical_cast<std::string>(v);
    }

    const std::string input = s.str();

    try {
        v_in = boost::lexical_cast<FPType>(input);
    } catch (const boost::bad_lexical_cast&) {
        if (input == "INF") {
            v_in = std::numeric_limits<FPType>::infinity();
        } else if (input == "-INF") {
            v_in = -std::numeric_limits<FPType>::infinity();
        } else if (input == "NaN") {
            v_in = std::numeric_limits<FPType>::quiet_NaN();
        } else {
            throw;  // rethrow the original exception
        }
    }

    // Note, we cannot compare NaNs directly!
    assert(v == v_in || (is_nan(v) && is_nan(v_in)));

In practice I'm a little more fault-tolerant on input: I make a pass over the input std::string with tolower() and compare against the lower-case forms, so the default forms written by the GCC stream inserter, "inf" and "nan", are also accepted.
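A minimal sketch of that lenient reader, using only the standard library in place of Boost: the name read_fp is hypothetical, and the numeric branch requires the whole string to be consumed, so trailing characters (including trailing spaces) are rejected. The cast to unsigned char before calling tolower() avoids undefined behaviour for negative char values.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical lenient reader: accepts the XML spellings ("INF", "-INF",
// "NaN") as well as GCC's "inf"/"nan" by lower-casing the input first.
// Returns false if the text is neither a special value nor a number.
template <typename T>
bool read_fp(std::string input, T& out)
{
    for (std::string::size_type i = 0; i < input.size(); ++i)
        input[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(input[i])));

    if (input == "inf") {
        out = std::numeric_limits<T>::infinity();
    } else if (input == "-inf") {
        out = -std::numeric_limits<T>::infinity();
    } else if (input == "nan") {
        out = std::numeric_limits<T>::quiet_NaN();
    } else {
        std::istringstream s(input);
        T v;
        // Require a successful parse that consumes the whole string.
        if (!(s >> v) || s.peek() != std::char_traits<char>::eof())
            return false;
        out = v;
    }
    return true;
}
```

Lower-casing is harmless for numeric text, since "1.5E10" and "1.5e10" parse to the same value.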

Reading/Writing XML Float Data With Xerces C++

Since I use the Apache Xerces C++ XML Parser, the rest of my discussion applies to that API. As mentioned above, the XML Schema: Datatypes specification constrains valid float and double values to decimal strings, "INF", "-INF", and "NaN". A code fragment that writes a float DOM node therefore looks something like this:

#include <limits>
#include <sstream>
#include <string>
#include <boost/lexical_cast.hpp>
#include <xercesc/dom/DOM.hpp>          // possibly others
#include <xercesc/util/XMLString.hpp>

// The following are set by the application.
xercesc::DOMDocument* doc;
xercesc::DOMElement* parent;
const char* name;

float v;

std::ostringstream s;

if (v == std::numeric_limits<float>::infinity()) {
    s << "INF";
} else if (v == -std::numeric_limits<float>::infinity()) {
    s << "-INF";
} else if (v != v) {  // or isnan() / is_nan() as discussed above
    s << "NaN";
} else {
    // Generically use sufficient precision.
    s << boost::lexical_cast<std::string>(v);
}

// transcode() allocates buffers that the caller must release;
// the DOM keeps its own copies of the strings.
XMLCh* xname = xercesc::XMLString::transcode(name);
XMLCh* xtext = xercesc::XMLString::transcode(s.str().c_str());

xercesc::DOMElement* el = doc->createElement(xname);
xercesc::DOMText* txt = doc->createTextNode(xtext);
el->appendChild(txt);

parent->appendChild(el);

xercesc::XMLString::release(&xname);
xercesc::XMLString::release(&xtext);

The code works unchanged for either the float or the double type (only the type name changes), so it can be turned into a function or member template. A code fragment for recovering the binary float value from the XML DOM node then looks like this:

#include <limits>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/XMLFloat.hpp>

xercesc::DOMText* txt; // supplied by the parser
xercesc::XMLFloat x( txt->getNodeValue() );
 
float v;
 
if (x.getType() == xercesc::XMLAbstractDoubleFloat::NaN) {
    v = std::numeric_limits<float>::quiet_NaN();
} else if (x.getType() == xercesc::XMLAbstractDoubleFloat::PosINF) {
    v = std::numeric_limits<float>::infinity();
} else if (x.getType() == xercesc::XMLAbstractDoubleFloat::NegINF) {
    v = -std::numeric_limits<float>::infinity();
} else {
    v = static_cast<float>(x.getValue()); // getValue() returns a double
}

With proper care, for example by using XMLDouble for double values (both XMLFloat and XMLDouble derive from XMLAbstractDoubleFloat), the above code can also be made generic for either float or double binary types.
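The formatting step of the writer fragment can likewise be pulled out into a function template. The sketch below is a minimal, Xerces-free version, assuming a C++11 compiler: it uses std::numeric_limits<T>::max_digits10 in place of boost::lexical_cast, the name to_xsd_text is hypothetical, and it assumes the stream's default "C" locale, so no digit grouping or locale-specific decimal point appears in the output. The returned std::string would then be transcoded into the DOM text node as before.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical generic writer: produces the XML Schema lexical form of a
// float or double: "INF", "-INF", "NaN", or a decimal string with enough
// significant digits (max_digits10) to round-trip exactly.
template <typename T>
std::string to_xsd_text(T v)
{
    if (v == std::numeric_limits<T>::infinity())
        return "INF";
    if (v == -std::numeric_limits<T>::infinity())
        return "-INF";
    if (v != v)  // NaN test; may be folded away under -ffast-math (see above)
        return "NaN";

    std::ostringstream s;
    s << std::setprecision(std::numeric_limits<T>::max_digits10) << v;
    return s.str();
}
```

Large or small magnitudes come out in exponent form such as "1e+20", which the XML Schema float and double lexical spaces also accept.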