- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Restore the integer inference fallback that was removed. Integer
literals whose type is unconstrained will default to `int`, as before.
Floating point literals will default to `f64`.

# Motivation

## History lesson

Rust has had a long history with integer and floating-point
literals. Initial versions of Rust did not infer the types of literals
at all: the type of *every* literal was determined by its suffix, and a
literal with no suffix was typed as `int` or `float` (note that the
`float` type has since been removed). This meant that, for example, if
one wanted to count up all the numbers in a list, one would write `0u`
and `1u` so as to employ unsigned integers:

    let mut count = 0u; // let `count` be an unsigned integer
    while cond() {
        ...
        count += 1u; // `1u` must be used as well
    }

This was particularly troublesome with arrays of integer literals,
which could be quite hard to read:

    let byte_array = [0u8, 33u8, 50u8, ...];

It also meant that code which deliberately used 32-bit or 64-bit
numbers was cluttered with suffixes and hard to read.

Therefore, we introduced integer inference: unlabeled integer literals
are not given any particular integral type but rather a fresh "integral
type variable" (floating point literals work in an analogous way). The
idea is that the vast majority of literals will eventually interact
with an actual typed variable at some point, and hence we can infer
what type they ought to have. For those cases where the type cannot be
automatically selected, we decided to fall back to our older behavior
and have integer/float literals be typed as `int`/`float` (this is
also what Haskell does). Some time later, we did [various
measurements][m] and found that in real-world code this fallback was
rarely used. Therefore, we decided to remove the fallback.

## Experience with lack of fallback

Unfortunately, when doing the measurements that led us to decide to
remove the `int` fallback, we neglected to consider coding "in the
small" (specifically, we did not include tests in the
measurements). It turns out that when writing small programs, which
include not only "hello world" programs but also tests, the lack of
an integer inference fallback is quite annoying. This is particularly
troublesome since small programs are often people's first exposure to
Rust. The problems most commonly occur when integers are "consumed" by
printing them out to the screen or by asserting equality, both of
which are very common in small programs and testing.

There are at least three common scenarios where fallback would be
beneficial:

**Accumulator loops.** Here a counter is initialized to `0` and then
incremented by `1`. Eventually it is printed or compared against
a known value.

```
let mut c = 0;
while cond() {
    ...
    c += 1;
}
println!("{}", c); // does not constrain the type of `c`
assert_eq!(c, 22); // nor does comparing against a literal
```

**Calls to range with constant arguments.** Here a call to range like
`range(0, 10)` is used to execute something 10 times. What matters is
that the counter itself is either unused or only printed or compared
against another literal, so nothing constrains its type:

```
for _ in range(0, 10) {
}
```

**Large constants.** In small tests it is convenient to make dummy
test data. This frequently takes the form of a vector or map of ints.

```
use std::collections::HashMap;

let mut m = HashMap::new();
m.insert(1, 2);
m.insert(3, 4);
assert_eq!(m.find(&3).map(|&i| i).unwrap(), 4);
```
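
The three scenarios above use the APIs of the time (`range`, `HashMap::find`). As a rough point of comparison only, here is a sketch of the same scenarios in current Rust. Several things have changed since then: `int` no longer exists, `range(0, 10)` is written `0..10`, `find` is now `get`, and unconstrained integer literals fall back to `i32`. The function names are made up for illustration. Each function uses `std::mem::size_of_val` to report how large the inferred type turned out to be.

```rust
use std::collections::HashMap;
use std::mem::size_of_val;

/// Accumulator loop: `c` is only incremented, printed and compared
/// against a literal, so nothing constrains its type.
fn accumulator_size() -> usize {
    let mut c = 0;
    for _ in 0..22 {
        c += 1;
    }
    println!("{}", c);
    assert_eq!(c, 22);
    size_of_val(&c) // size in bytes of the type `c` fell back to
}

/// Range with constant arguments: the loop variable is never constrained.
fn range_element_size() -> usize {
    let mut size = 0;
    for i in 0..10 {
        size = size_of_val(&i);
    }
    size
}

/// Test data in a map: keys and values are unconstrained literals.
fn map_lookup_works() -> bool {
    let mut m = HashMap::new();
    m.insert(1, 2);
    m.insert(3, 4);
    m.get(&3) == Some(&4)
}

fn main() {
    // In current Rust the fallback type is `i32`, which is 4 bytes wide.
    assert_eq!(accumulator_size(), 4);
    assert_eq!(range_element_size(), 4);
    assert!(map_lookup_works());
}
```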

## Lack of bugs

To our knowledge, there has not been a single bug exposed by removing
the fallback to the `int` type. Moreover, such bugs seem to be
extremely unlikely.

The primary reason for this is that, in production code, the `int`
fallback is very rarely used. In a sense, the same [measurements][m]
that were used to justify removing the `int` fallback also justify
keeping it. As the measurements showed, the vast, vast majority of
integer literals wind up with a constrained type, unless they are only
printed or used in assertions. Specifically, any integer that is
passed as a parameter, returned from a function, or stored in a
struct or array must wind up with a specific type.
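
The following is a small illustration of this point, written against current Rust, where the fallback type is `i32` rather than `int`. The names `takes_u8`, `seven` and `Packet` are hypothetical. The sizes reported by `std::mem::size_of_val` show that each literal takes its type from the parameter, return type or field it flows into. Only the literal that is merely printed is left to the fallback.

```rust
use std::mem::size_of_val;

// Hypothetical helpers whose signatures constrain the literals given to them.
fn takes_u8(_x: u8) {}

fn seven() -> u64 {
    let n = 7; // typed `u64` because it is returned
    assert_eq!(size_of_val(&n), 8);
    n
}

struct Packet {
    len: u16,
}

/// Returns the sizes (in bytes) of three literals' inferred types.
fn inferred_sizes() -> (usize, usize, usize) {
    let a = 200; // passed as a parameter: inferred as `u8`
    takes_u8(a);
    let p = Packet { len: 512 }; // stored in a field: inferred as `u16`
    let d = 5; // only printed: unconstrained, so the fallback applies
    println!("{}", d);
    (size_of_val(&a), size_of_val(&p.len), size_of_val(&d))
}

fn main() {
    assert_eq!(seven(), 7);
    assert_eq!(inferred_sizes(), (1, 2, 4));
}
```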

A secondary reason is that the lint which checks that literals are
suitable for their assigned type will catch cases where very large
literals that overflow the `int` type were used (for example,
`INT_MAX`+1). (Note that the overflow lint constrains `int` literals
to 32 bits for better portability.)
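
Below is a minimal sketch of the range check described above. It assumes a simplified model in which the lint sees only a literal's value and the integral type it was assigned. `IntTy`, `literal_range` and `check_literal` are hypothetical names, not the compiler's actual data structures, and the message text is invented. The point of the sketch is that the fallback type `int` is checked against the 32-bit range, even though it may be 64 bits wide on the target.

```rust
/// Hypothetical, simplified set of integral types seen by the lint.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IntTy {
    Int, // pointer-sized `int`, the fallback type
    U8,
    I32,
    I64,
}

/// Inclusive range of values a literal of type `ty` may hold.
/// `Int` is deliberately limited to 32 bits for portability.
fn literal_range(ty: IntTy) -> (i128, i128) {
    match ty {
        IntTy::Int | IntTy::I32 => (i32::MIN as i128, i32::MAX as i128),
        IntTy::U8 => (0, u8::MAX as i128),
        IntTy::I64 => (i64::MIN as i128, i64::MAX as i128),
    }
}

/// Returns a warning if `value` does not fit in `ty`, or `None` if it does.
fn check_literal(value: i128, ty: IntTy) -> Option<String> {
    let (lo, hi) = literal_range(ty);
    if value < lo || value > hi {
        Some(format!("literal out of range for {:?}: {}", ty, value))
    } else {
        None
    }
}

fn main() {
    // A small bound such as the `22` in the accumulator example passes.
    assert!(check_literal(22, IntTy::Int).is_none());
    // One past the 32-bit maximum is flagged for `Int`, even though a
    // 64-bit `int` could hold it...
    assert!(check_literal(i32::MAX as i128 + 1, IntTy::Int).is_some());
    // ...while the same value is fine for a genuinely 64-bit literal.
    assert!(check_literal(i32::MAX as i128 + 1, IntTy::I64).is_none());
    assert!(check_literal(256, IntTy::U8).is_some());
    assert!(check_literal(-1, IntTy::I32).is_none());
}
```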

In almost all of the common cases described above, there exists
*some* large constant representing a bound. If this constant exceeds
the range of the chosen fallback type, then a `type_overflow` lint
warning would be triggered. For example, in the accumulator, if the
accumulated result `c` is compared using a call like
`assert_eq!(c, 22)`, then the constant `22` will be checked by the
lint. Similarly, when invoking range with unconstrained arguments, the
arguments to range are checked. And so on.

The only common case where the lint does not apply is when an
accumulator result is only being printed to the screen or otherwise
consumed by some generic function which never stores it to memory.
This is a very narrow case.

## Future-proofing for overloaded literals

It is possible that, in the future, we will wish to allow vector and
string literals to be overloaded so that they can be resolved to
user-defined types. In that case, for backwards compatibility, it will
be necessary for those literals to have some sort of fallback type.
(This is a relatively weak consideration.)

# Detailed design

Integral literals are currently type-checked by creating a special
class of type variable. These variables are subject to unification as
normal, but can only unify with integral types. This RFC proposes
that, at the end of type inference, when all constraints are known, we
will identify all integral type variables that have not yet been bound
to anything and bind them to `int`. Similarly, floating point literals
will fall back to `f64`.
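
The following is a toy model of this scheme, not rustc's implementation. A hypothetical `Infer` table creates one integral type variable per unsuffixed literal. Those variables may unify only with integral types: here `Int` and `U8`, with `Bool` standing in for any non-integral type. Once all constraints have been processed, every variable that is still unbound is bound to `Int`. Floating-point variables would presumably be handled the same way, with `f64` as the default. `apply_fallback` also returns the variables it defaulted.

```rust
/// Hypothetical, heavily simplified types for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Int,           // the fallback type proposed here
    U8,
    Bool,          // stands in for any non-integral type
    IntVar(usize), // integral type variable for an unsuffixed literal
}

impl Ty {
    fn is_integral(self) -> bool {
        !matches!(self, Ty::Bool)
    }
}

/// Inference table: `bindings[v]` is what variable `v` was unified with.
struct Infer {
    bindings: Vec<Option<Ty>>,
}

impl Infer {
    fn new() -> Self {
        Infer { bindings: Vec::new() }
    }

    /// Called once for each unsuffixed integer literal.
    fn fresh_int_var(&mut self) -> Ty {
        self.bindings.push(None);
        Ty::IntVar(self.bindings.len() - 1)
    }

    /// Follow bindings to a concrete type or an unbound variable.
    fn resolve(&self, ty: Ty) -> Ty {
        match ty {
            Ty::IntVar(v) => match self.bindings[v] {
                Some(bound) => self.resolve(bound),
                None => ty,
            },
            _ => ty,
        }
    }

    /// Unification; integral variables refuse non-integral types.
    fn unify(&mut self, a: Ty, b: Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (a, b) {
            _ if a == b => Ok(()),
            (Ty::IntVar(v), other) | (other, Ty::IntVar(v)) => {
                if !other.is_integral() {
                    return Err(format!("expected an integral type, found {:?}", other));
                }
                self.bindings[v] = Some(other);
                Ok(())
            }
            _ => Err(format!("mismatched types {:?} and {:?}", a, b)),
        }
    }

    /// End of inference: bind every unconstrained integral variable to
    /// `Int` and report which variables needed the fallback.
    fn apply_fallback(&mut self) -> Vec<usize> {
        let unconstrained: Vec<usize> = (0..self.bindings.len())
            .filter(|&v| matches!(self.resolve(Ty::IntVar(v)), Ty::IntVar(_)))
            .collect();
        for &v in &unconstrained {
            if let Ty::IntVar(root) = self.resolve(Ty::IntVar(v)) {
                self.bindings[root] = Some(Ty::Int);
            }
        }
        unconstrained
    }
}

fn main() {
    let mut infer = Infer::new();
    // `let mut c = 0; c += 1;`: two literals constrained only by each other.
    let c = infer.fresh_int_var();
    let one = infer.fresh_int_var();
    infer.unify(c, one).unwrap();
    // `takes_u8(200)`: the literal meets a `u8` parameter.
    let arg = infer.fresh_int_var();
    infer.unify(arg, Ty::U8).unwrap();

    assert_eq!(infer.apply_fallback(), vec![0, 1]);
    assert_eq!(infer.resolve(c), Ty::Int);
    assert_eq!(infer.resolve(arg), Ty::U8);

    // `if 5 { ... }`: an integral variable cannot become `bool`.
    let mut other = Infer::new();
    let five = other.fresh_int_var();
    assert!(other.unify(five, Ty::Bool).is_err());
}
```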

For those who wish to be very careful about which integral types they
employ, a new lint (`unconstrained_literal`) will be added which
defaults to `allow`. This lint is triggered whenever the type of an
integer or floating point literal is unconstrained.
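
Here is a minimal sketch of how such a lint could consume the list of literals that needed the fallback. `Level`, `Defaulted` and `unconstrained_literal` are hypothetical names, and the message wording is made up. The sketch captures one behavior only: the lint stays silent at its default `allow` level and reports every defaulted literal when raised to `warn` or `deny`.

```rust
/// Hypothetical lint levels, mirroring `allow`, `warn` and `deny`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Allow,
    Warn,
    Deny,
}

/// A literal whose type was chosen by the fallback rather than inferred.
struct Defaulted {
    text: &'static str,     // source text of the literal (stands in for a span)
    fallback: &'static str, // the type it defaulted to, e.g. "int" or "f64"
}

/// Diagnostics for the proposed `unconstrained_literal` lint.
fn unconstrained_literal(level: Level, defaulted: &[Defaulted]) -> Vec<String> {
    let severity = match level {
        Level::Allow => return Vec::new(), // default: the fallback is silent
        Level::Warn => "warning",
        Level::Deny => "error",
    };
    defaulted
        .iter()
        .map(|d| {
            format!(
                "{}: type of literal `{}` is unconstrained; defaulting to `{}`",
                severity, d.text, d.fallback
            )
        })
        .collect()
}

fn main() {
    let found = [
        Defaulted { text: "0", fallback: "int" },
        Defaulted { text: "2.5", fallback: "f64" },
    ];
    assert!(unconstrained_literal(Level::Allow, &found).is_empty());
    for msg in unconstrained_literal(Level::Warn, &found) {
        println!("{}", msg);
    }
    assert_eq!(unconstrained_literal(Level::Deny, &found).len(), 2);
}
```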

# Downsides

Although we give a detailed argument for why bugs are unlikely, it is
nonetheless possible that this choice will lead to bugs in some code,
since another choice (most likely `uint`) may have been more suitable.

Because the size of `int` is platform-dependent, the fallback may
create a porting hazard. This is mitigated by the fact that the
`type_overflow` lint constrains `int` literals to 32 bits.

# Alternatives

- **No fallback.** Status quo.

- **Fallback to something else.** We could potentially fall back to
  `i32` or some other integral type rather than `int`.

- **Fallback in a more narrow range of cases.** We could attempt to
  identify integers that are "only printed" or "only compared". There
  is no concrete proposal in this direction and it seems to lead to an
  overly complicated design.

- **Default type parameters influencing inference.** There is a
  separate, follow-up proposal being prepared that uses default type
  parameters to influence inference. This would allow some examples,
  like `range(0, 10)`, to work even without integral fallback, because
  the `range` function itself could specify a fallback type. However,
  this does not help with many other examples.

[m]: https://gist.github.com/nikomatsakis/11179747