[2.0] Proposal: A standard policy for vector dimension mismatches
The goal: A consistent policy for vector operations
With the introduction of
A core concept in these libraries is broadcasting, a standard set of rules for handling operations between vectors, matrices, or tensors of different dimensions. These rules are used everywhere from math libraries like math.js, to machine-learning libraries like TensorFlow.js.
This issue proposes that p5.js adopt the standard broadcasting rules to ensure our math library is consistent, predictable, and extensible for future features like p5.Matrix. Let's explore what that means.
How broadcasting works
For now, it will help to consider how broadcasting works in the special case of vectors. Here, broadcasting tells us that two vectors can be operated on if they have matching dimensions, or if one of the vectors is 1D. That's it. Let's look at some examples of why this rule is so useful.
Addition and subtraction:
createVector(10, 10, 10).add(2) produces components [12, 12, 12]
The 2 in the 1D vector [2] is broadcast to higher dimensions, so that we're really adding [10, 10, 10] and [2, 2, 2]. This is a useful operation in data processing, statistics, etc. For example, given a list of exam grades like [92, 83, 61, 97, 72, 75, 64, 95, 100, 82], we can center it around zero by subtracting the average (82.1) from every number in the list. This makes it clear which scores are below average and which are above average.
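The grade-centering example can be sketched in plain JavaScript. This uses ordinary arrays rather than p5.Vector, and the helper name subtractScalar is hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: subtract one number from every entry (a scalar broadcast).
function subtractScalar(values, scalar) {
  return values.map((v) => v - scalar);
}

const grades = [92, 83, 61, 97, 72, 75, 64, 95, 100, 82];
const average = grades.reduce((sum, g) => sum + g, 0) / grades.length; // 82.1

// Negative entries are below average; positive entries are above average.
const centered = subtractScalar(grades, average);
```

Entries such as 61 and 64 come out clearly negative, while 97 and 100 come out clearly positive.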
Multiplication and division:
createVector(10, 10, 10).mult(2) produces components [20, 20, 20]
As before, the 1D vector [2] becomes [2, 2, 2], and the operation is applied elementwise. This isn't just predictable. It's also very useful. This is scalar multiplication: the vector is scaled by 2, making it twice as long.
[Edit] Performance clarification:
To clarify, the explanation above describes the conceptual model that users can rely on to predict the results of broadcasting. Efficient broadcasting implementations would never actually expand a 1D vector like [2] to a 3D vector [2, 2, 2].
The problem: Current behavior is inconsistent
Currently, p5.js does not follow standard broadcasting rules, which creates several problems:
- Internal inconsistency: p5.js is inconsistent with itself: mult(2) applies to all components (correct), but add(2) applies only to the first component. It offers no such shortcut for the second component.
- External inconsistency: The custom behavior in p5 is different from every major math and ML library, making p5.js less of an onramp, and forcing users to unlearn p5's rules to advance. It's also inconsistent with creative-coding libraries. For example, openFrameworks uses standard broadcasting for operations among vectors.
- Confusing padding rules: For multi-element multipliers like in createVector(1, 1, 1).mult(2, 2), p5.js 1.x pads the missing component with 1 (resulting in [2, 2, 1]), which is an unpredictable special case. Users might guess [2, 2, 2].
[Update] Added point about openFrameworks.
How could p5 work? The options.
As we stabilize the p5.Vector feature set for p5.js 2.0, we have the opportunity to reassess the rules that p5 should follow. Several options have been described by @limzykenneth:
1. Refuse to operate on incompatible vectors (i.e. throw an error when this is tried)
2. Perform broadcasting where possible and refuse to operate otherwise
3. Automatically convert all vectors to the highest common dimension with 0 padding before operating
4. Some combination of the above
Weighing the options: Options 2 and 4 seem most viable
The first option is likely not viable, as it would disallow common operations like scalar multiplication. That leaves Options 2, 3, and 4. Option 3 introduces additional forms of complexity, as noted previously, and it goes against the original reason for introducing
The trouble with Option 4
For Option 4, it seems sensible to at least follow standard broadcasting rules when one of the vectors is 1D. Then the question is, how do we handle mismatches where neither of the vectors is 1D? If we look at a concrete example, we start to see how confusing it might be. In the example below, there is no obvious way to proceed, and users are left guessing.
Example: createVector(2, 3).mult(4, 5, 6)
Do we extend [2, 3] to [2, 3, 0], since that's the most natural way to extend a 2D vector to a 3D vector?
Do we extend [2, 3] to [2, 3, 3], extending the broadcasting approach by repeating the last entry?
Do we extend [2, 3] to [2, 3, 1], since 1 is the multiplicative identity?
Does the user want the vector [2, 3] to turn into a 3D vector at all?
This is just one simple vector example. If we consider matrices or tensors, the situation may become more complicated.
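To make the ambiguity concrete, here is a small sketch in plain JavaScript (ordinary arrays, not p5.Vector). The three padding strategies and their names are hypothetical. They only mirror the questions above, and each one gives a different third component.

```javascript
// Elementwise product of two equal-length arrays.
function multiply(a, b) {
  return a.map((v, i) => v * b[i]);
}

// Hypothetical ways of extending [2, 3] to three components.
const padWithZero = [2, 3, 0];
const repeatLast = [2, 3, 3];
const padWithOne = [2, 3, 1];

const other = [4, 5, 6];
const results = {
  padWithZero: multiply(padWithZero, other), // [8, 15, 0]
  repeatLast: multiply(repeatLast, other),   // [8, 15, 18]
  padWithOne: multiply(padWithOne, other),   // [8, 15, 6]
};
```

All three candidates agree on the first two components and disagree on the third, so no single result is the obvious one for a user to expect.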
Proposal: Adopt Standard Broadcasting (Option 2)
Based on the analysis above, I propose that p5.js adopt the standard, widely-used broadcasting rules:
- Operations are allowed if vector dimensions match.
- Operations are allowed if one operand is a scalar (a 1D vector or a single number).
- All other dimension mismatches will throw an error.
This approach is simple, consistent, and avoids the ambiguity of custom padding rules (as shown in the "trouble with Option 4" example). It also aligns with the original motivation for p5.Matrix and even p5.Tensor.
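The three rules can be sketched as a single function over plain arrays. This is only an illustration under stated assumptions: broadcastOp is a hypothetical name, not part of p5.js, and a real p5.Vector implementation may look quite different.

```javascript
// Hypothetical helper illustrating the proposed policy on plain arrays.
// A bare number is treated like a 1D vector.
function broadcastOp(a, b, op) {
  const x = typeof a === "number" ? [a] : a;
  const y = typeof b === "number" ? [b] : b;

  // Rule 1: matching dimensions operate elementwise.
  if (x.length === y.length) {
    return x.map((v, i) => op(v, y[i]));
  }
  // Rule 2: a 1D operand is applied to every component of the other operand.
  if (y.length === 1) {
    return x.map((v) => op(v, y[0]));
  }
  if (x.length === 1) {
    return y.map((v) => op(x[0], v));
  }
  // Rule 3: every other mismatch is an error.
  throw new Error(`Cannot broadcast dimensions ${x.length} and ${y.length}`);
}

const add = (p, q) => p + q;
const mult = (p, q) => p * q;

broadcastOp([10, 10, 10], 2, add);    // [12, 12, 12]
broadcastOp([10, 10, 10], [2], mult); // [20, 20, 20]
// broadcastOp([2, 3], [4, 5, 6], mult) throws instead of guessing a padding rule.
```

Note that the 1D operand is read in place and never expanded into a longer array, in line with the performance clarification earlier in this issue.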
[Update] Additional benefits:
The proposed policy would also resolve existing issues beyond vector algebra. An example is outlined in #8189.
Abundance of evidence indicates zero disruption
This would be a breaking change, but major releases are the appropriate time to fix confusing or inconsistent APIs. The key cost to consider is user disruption. To assess this, I collected two forms of data, which together provide a robust body of evidence that indicates zero disruption.
- Empirical data: I did a Google Search for site:https://editor.p5js.org/ "createVector" "add" and manually reviewed the first 50 results that called the add() method. Of these, precisely zero used dimension mismatches (see footnote 1). I repeated the same methodology for the mult() method, and similarly found that zero of 50 sketches using mult() called it with a nonstandard dimension mismatch (the only dimension mismatch detected was multiplication by a scalar, a case which is unchanged by the current proposal).
- Domain Knowledge: This 0% finding is what we'd expect. The non-standard behavior is not documented by any reference examples, it has a more writable and readable alternative (e.g. v.x += 2), it has no clear use cases, and it is inconsistent with every single major library, across languages and domains; even Processing's PVector does not behave this way.
A more rigorous, Bayesian analysis (for the curious)
For those interested in statistical rigor, this is a textbook case for a beta-binomial model. Given the add() data above (zero mismatches among the 50 sketches reviewed), and a very generous prior belief that maybe 1 in 1,000 sketches using add() rely on the nonstandard behavior (0.1%), we can be 97.5% confident that the true usage rate in the wild is, at most, 0.351% (or about 1 in 284 sketches that use add()). Also, any sketches that do use the non-standard behavior would only break if they are migrated to 2.x; many sketches will be left as is and won't switch to an upgraded version. So it's likely that the true proportion of sketches that would be broken is extremely small, and may well be exactly zero.
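For readers who want to check the arithmetic, here is a minimal sketch of the calculation. The prior parameters are an assumption: the text states only a prior mean of 0.1%, and Beta(1, 999) is one prior with that mean, which happens to reproduce the quoted 0.351% bound. The function name betaQuantileAlphaOne is hypothetical.

```javascript
// Assumed prior: Beta(1, 999), whose mean is 1 / 1000 = 0.1%.
// (The text above gives only the prior mean; these parameters are an assumption.)
const priorAlpha = 1;
const priorBeta = 999;

// Observed data from the add() review: 0 mismatches in 50 sketches.
const mismatches = 0;
const sketches = 50;

// Conjugate update of a Beta prior with binomial data.
const postAlpha = priorAlpha + mismatches;            // 1
const postBeta = priorBeta + (sketches - mismatches); // 1049

// For Beta(1, b) the CDF is 1 - (1 - x)^b, so the quantile has a closed form.
// This shortcut is valid only because postAlpha is exactly 1.
function betaQuantileAlphaOne(p, b) {
  return 1 - Math.pow(1 - p, 1 / b);
}

const upperBound = betaQuantileAlphaOne(0.975, postBeta);
console.log((upperBound * 100).toFixed(3) + "%"); // 0.351%
```

With a different prior of the same mean, for example a more or less concentrated one, the bound would shift, so the figure should be read as conditional on this choice.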
Just in case, we can clearly document the breaking change in the compatibility README, which contains the official list of breaking changes made in the upgrade from 1.x to 2.x.
Updates:
- Added results of the mult() analysis.
- Added supporting domain knowledge.
- Added statistical model. (See #8203 for original version.)
Discussion
What do you think, everyone? How would you handle dimension mismatches? Are there any use cases I didn't cover that you think are important?
Invitation for comment
Many other community members have been actively involved in related discussions, and I'd love to hear their thoughts. These include @ksen0, @limzykenneth, @inaridarkfox4231, @sidwellr, @Ahmed-Armaan, @davepagurek, @holomorfo, @nickmcintyre, @RandomGamingDev, and many others. Everyone is welcome to share their ideas!
Footnotes
1. In a technical sense, one sketch used a dimension mismatch due to the way vectors are represented in 1.x, but if this sketch were upgraded to 2.x, there would be no dimension mismatch, so the code would not break. Specifically, the code had the form add(number1, number2) and was adding to a vector that was intended to be 2D. In 1.x, vectors are represented internally as 3D vectors, so this involved a mismatch. However, with true 2D vectors in 2.x, there'd be no dimension mismatch, so the change to standard broadcasting rules wouldn't break this code.