Reducing time for integer to string conversion

12/13/20

Rationale

Maybe I just enjoy thinking about ways to work with some of the most fundamental tasks in C. Maybe there is a fair amount of data that has to be presented. Either way, I often have to get a number of data fields into a C string. Calling a library function is one way to do it. Using the snprintf() function seems a bit expensive. It may even allocate heap memory, or use recursion, which of course I want to avoid. Using the Cortex-M newlib nano, the function may avoid both of these issues, but snprintf() may still implement a variable argument list. I want to try a simple alternative and see how it performs in a cycle time comparison.

Structure

To start, let's make a function that picks out each digit of an unsigned integer. It's going to require a modulo and a divide. Modulo is a three-step process of course, but Arm provides a modulo function that is 2-12 machine cycles, if I read the documents correctly. We can spell out the process directly in the function Uint32ToStr() to better see what is happening in each operation.

uint32_t Uint32ToStr(char* buff, size_t digits, uint32_t num)
{
    uint32_t counter = 0;
    if(digits > 9) buff[counter++] = num/1000000000 + 0x30;
    if(digits > 8) buff[counter++] = (num%1000000000)/100000000 + 0x30;
    if(digits > 7) buff[counter++] = (num%100000000)/10000000 + 0x30;
    if(digits > 6) buff[counter++] = (num%10000000)/1000000 + 0x30;
    if(digits > 5) buff[counter++] = (num%1000000)/100000 + 0x30;
    if(digits > 4) buff[counter++] = (num%100000)/10000 + 0x30;
    if(digits > 3) buff[counter++] = (num%10000)/1000 + 0x30;
    if(digits > 2) buff[counter++] = (num%1000)/100 + 0x30;
    if(digits > 1) buff[counter++] = (num%100)/10 + 0x30;
    if(digits > 0) buff[counter++] = (num%10) + 0x30;
    buff[digits] = 0;
    return counter;
}

This works ok for a limited case. The user is required to input the desired number of digits, then the result is zero-padded. For purposes of performance improvement, let's check the timing against snprintf() using a Cortex M3.

function comparison in time and cycles

I would like to improve this by taking care of the padding and length automatically. Ideally we would loop through the digits. In the case of a INT_MAX value, we don't have a large enough unsigned integer to use modulo without using a uint64_t, so we must deal with this case individually. That is, using a 64 bit value in an embedded environment will bloat your compute time with load and stores.

Now, in this updated function, once we get a non-zero digit, we will flag the requirement to output all subsequent digits. Each digit is added as a character, then the string is zero terminated. We don't need to modulo the first pass, because we cant get another digit in front with a radix of 10.

uint32_t Uint32ToStr(char* buff, uint32_t num)
{
	uint32_t counter = 0;
	uint32_t pad = 1;
	uint32_t digits = 10;
	uint32_t divisor = 1000000000;
	while(digits){
		uint32_t n;
		if(digits == 10){
			n = num/divisor;
		}
		else{
			n = (num%divisor)/(divisor/10);
			divisor /= 10;
		}

		if(n || pad == 0){
			buff[counter] = n + 0x30;
			counter++;
			pad = 0;
		}
		digits--;

	}
	buff[counter] = 0;
	return counter;
}

Did we improve anything? The function is easier to use, does not zero pad, and reliably puts the correct digits in the string, without having to provide the count. But we are giving up some efficiency.

improved Function less efficient

Since you're forced to loop 10 times, the savings are reduced. However, there may be reason to continue. In the worst case we're halving the time spent in computation. What if we add more capability.

Currently we don't have functionality for the entire integer domain. We can't put in negative numbers.

Let's add a function, Int32ToStr() to take an integer input, and, if it is negative, we will perform a two's complement to get an unsigned integer before sending to the previous function. Since the domain for unsigned is bigger, once the sign is extracted, we can still use our unsigned function.

uint32_t intMinusToUint(int a)
{
	a -= 1;
	a ^= 0xffffffff;
	return (uint32_t)a;
}

uint32_t Int32ToStr(char* buff, int num)
{
    uint32_t negativeFlag = 0;
    uint32_t uNum;
    if(num < 0)
    {
        negativeFlag = 1;
        uNum = intMinusToUint(num);
        buff[0] = '-';
    }
    else
    {
        uNum = (uint32_t)num;
    }

    return Uint32ToStr(buff + negativeFlag, uNum);
}

Ok, now we can use it for the full range of thirty-two bit integers. But...how much did that cost us in time?

Full integer function time in cycle + and -

Not too bad. If you have a sizable number of fields to extract into strings, this definitely could save time. Also, it is kind of a fun exercise.