Problem Space: How to handle decimal precision across extremely broad ranges of possible data.
Guiding principles:
- Target decimal alignment, even across large ranges, to make comparisons at a glance easy.
- Avoid printing too many digits; in particular, limit the significant figures we print.
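To make the alignment principle concrete, here is a minimal sketch of padding already-formatted number strings so their decimal points line up in a monospace column. The function name `align_at_decimal` is hypothetical, and the sketch assumes `.` is always the decimal marker.

```python
def align_at_decimal(texts):
    """Pad formatted number strings so their decimal points share a column.

    Assumes "." is the decimal marker; strings without one are treated
    as having an empty fractional part.
    """
    parts = [t.partition(".") for t in texts]
    int_width = max(len(whole) for whole, _, _ in parts)
    frac_width = max(len(sep + frac) for _, sep, frac in parts)
    return [
        # Right-align the whole part, left-align the "." plus fraction.
        whole.rjust(int_width) + (sep + frac).ljust(frac_width)
        for whole, sep, frac in parts
    ]


for row in align_at_decimal(["1.23", "1000000.", "12.5"]):
    print(repr(row))
```

Because the split happens at the first `.`, the same helper also lines up scientific notation strings such as 1.12e+10 at the mantissa's decimal point.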
Tasks
Very large data:
- Print a max of 7 digits. This gives us enough room alongside the median/mean etc. to let the numbers breathe at the min width of the summary column.
- NICE TO HAVE: We should have a thin space for each three-digit group, i.e. 1000000. becomes 1 000 000. with thinner spaces. We'd still need to be careful to make sure alignment across rows at the decimal place is valid. Alternatively, we may want to use an underscore to mark each group of 3 digits (1,000s place), but I think that can happen post public beta.
- This avoids major locale problems with , meaning a decimal in Europe.
- At that scale, we could safely drop all decimals and rely on whole numbers, but indicate the value is still not a whole number by including a trailing ., i.e. 1,000,000.
- If > 1, then avoid printing more than 2 decimal places, i.e. 1.23 is ok but 1.23456 is not.
- After that "max printable value" we should switch to either 5 max significant figures + scientific notation, i.e. 112.05e+10, or 3 significant figures + scientific notation, i.e. 1.12e+10. My preference would be 3 significant figures + scientific notation.
- We must be careful to treat all the numbers equally, though, so there is still nice alignment at the decimal under scientific notation, while the exponent can vary across ranges, i.e. 1.12e+10 and 1.10e+21.
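The large-value rules above could be sketched roughly as follows. This is only one reading of the proposal: the name `format_large`, the use of U+2009 as the thin space, and the way the 2-decimal cap is combined with the 7-digit budget are all assumptions, and the sketch only handles finite values.

```python
THIN_SPACE = "\u2009"  # assumption: U+2009 THIN SPACE as the group separator
MAX_DIGITS = 7         # from the proposal: print at most 7 digits
MAX_DECIMALS = 2       # from the proposal: at most 2 decimal places


def format_large(value, group_sep=""):
    """Format a finite value with |value| >= 1 under the rules above."""
    whole_digits = len(str(int(abs(value))))
    # Assumption: decimals shrink as the whole part uses up the digit budget.
    decimals = max(0, min(MAX_DECIMALS, MAX_DIGITS - whole_digits))
    fixed = f"{value:,.{decimals}f}"
    whole, _, frac = fixed.partition(".")
    if sum(ch.isdigit() for ch in whole) > MAX_DIGITS:
        # Past the "max printable value": 3 significant figures + exponent.
        return f"{value:.2e}"
    # Always keep the "." so a dropped fraction still leaves a trailing dot.
    return whole.replace(",", group_sep) + "." + frac


print(format_large(1.23456))                # 1.23
print(format_large(1000000.0))              # 1000000.
print(format_large(1000000.0, THIN_SPACE))  # 1 000 000. with thin spaces
print(format_large(1.12e10))                # 1.12e+10
```

Grouping is done with Python's built-in `,` format option and then swapped for the chosen separator, so the output never contains a locale-sensitive comma. Rounding that carries into a new digit is only caught at the 7-digit boundary, so a fuller version would need to re-check the budget after rounding.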
Very small data (<1):
- Max of 4 digits, not including the leading 0., which counts as one additional digit to get us to 5.
- For small data, I think we can be a bit more aggressive about switching to scientific notation, with a max of 3 digits but likely 2-3 in most situations.
0.05, 0.05671, 0.000000027 becomes
Alternatively, we could go with only necessary scientific notation, but I think that the consistent scientific notation is a bit cleaner.
- If all numbers are 5 or fewer digits, then don't use scientific notation; just decimal align.
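A rough sketch of the small-value rules, assuming the "consistent scientific notation" option: if every value in the column fits in 5 digits (counting the leading 0), print fixed decimals, otherwise switch the whole column to 3 significant figures. The names `plain_digits` and `format_small` are hypothetical, and the output strings are simply what Python's formatter produces under these assumptions, not the conversion the proposal intended.

```python
from decimal import Decimal


def plain_digits(value):
    """Digits needed to print a finite |value| < 1 in full, counting the leading 0."""
    # str() gives the shortest round-trip text, e.g. "0.05" or "2.7e-08".
    exponent = Decimal(str(value)).as_tuple().exponent
    return 1 + max(0, -exponent)


def format_small(values, max_digits=5, sig_figs=3):
    """Format a column of finite values with |x| < 1 in one consistent style."""
    widest = max(plain_digits(v) for v in values)
    if widest <= max_digits:
        # Everything fits: fixed decimals, which also aligns the decimal point.
        decimals = widest - 1
        return [f"{v:.{decimals}f}" for v in values]
    # At least one value is too long: use scientific notation for every row.
    return [f"{v:.{sig_figs - 1}e}" for v in values]


print(format_small([0.05, 0.5, 0.125]))
print(format_small([0.05, 0.05671, 0.000000027]))
```

Deciding per column rather than per value keeps the rows comparable, at the cost of showing short values such as 0.05 in exponent form whenever a neighbour needs it.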
I think it would be useful to coordinate with some of the existing logic/heuristics that tibble and pillar use:
We should be able to apply extremely similar numerical handling for sane defaults.
Backend
We may need to handle rounding or even display on the backend.