[2.0] Proposal: A standard policy for vector dimension mismatches

## [2.0] Proposal: A standard policy for vector dimension mismatches

### The goal: A consistent policy for vector operations
With the introduction of $n$-dimensional vectors in p5.js 2.0, we have an exciting opportunity to align p5.js with modern math and machine-learning libraries. This can provide an accessible, creative onramp for users to learn fundamental data-science concepts. In fact, this was the original motivation for introducing $n$-dimensional vectors to p5.

A core concept in these libraries is broadcasting, a standard set of rules for handling operations between vectors, matrices, or tensors of different dimensions. These rules are used everywhere from math libraries like math.js to machine-learning libraries like TensorFlow.js.

This issue proposes that p5.js adopt the standard broadcasting rules to ensure our math library is consistent, predictable, and extensible for future features like `p5.Matrix`. Let's explore what that means.

### How broadcasting works
For now, it will help to consider how broadcasting works in the special case of vectors. Here, broadcasting tells us that two vectors can be operated on if they have matching dimensions, or if one of the vectors is 1D. That's it. Let's look at some examples of why this rule is so useful.

**Addition and subtraction:**
`createVector(10, 10, 10).add(2)` produces components `[12, 12, 12]`

The `2` in the 1D vector `[2]` is broadcast to higher dimensions, so that we're really adding `[10, 10, 10]` and `[2, 2, 2]`. This is a useful operation in data processing, statistics, etc. For example, given a list of exam grades like `[92, 83, 61, 97, 72, 75, 64, 95, 100, 82]`, we can center it around zero by subtracting the average (82.1) from every number in the list. This makes it clear which scores are below average and which are above average.
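
The grade-centering example can be sketched with plain arrays. This is only an illustration of the arithmetic, not the `p5.Vector` API; the variable names are made up for the example.

```javascript
// Plain-array sketch (not the p5.Vector API): center exam grades around zero.
const grades = [92, 83, 61, 97, 72, 75, 64, 95, 100, 82];

// Mean of the list: 821 / 10 = 82.1
const mean = grades.reduce((sum, g) => sum + g, 0) / grades.length;

// Conceptually, the scalar `mean` is broadcast to every position
// and subtracted elementwise.
const centered = grades.map((g) => g - mean);

// Positive entries are above average, negative entries are below.
console.log(centered.map((c) => c.toFixed(1)).join(', '));
```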

**Multiplication and division:**
`createVector(10, 10, 10).mult(2)` produces components `[20, 20, 20]`

As before, the 1D vector `[2]` becomes `[2, 2, 2]`, and the operation is applied elementwise. This isn't just predictable. It's also very useful. This is _scalar multiplication_: the vector is scaled by 2, making it twice as long.
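
As a quick check of the "twice as long" claim, here is a plain-array sketch (again, not the `p5.Vector` API; `magnitude` is a hypothetical helper defined only for this example):

```javascript
// Euclidean length of a vector stored as a plain array.
const magnitude = (u) => Math.sqrt(u.reduce((sum, c) => sum + c * c, 0));

const v = [3, 4];
const scaled = v.map((c) => c * 2); // elementwise multiplication by the scalar 2

console.log(magnitude(v));      // 5
console.log(magnitude(scaled)); // 10
```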

**[Edit] Performance clarification:**
To clarify, the explanation above describes the conceptual model that users can rely on to predict the _results_ of broadcasting. Efficient broadcasting implementations would never actually expand a 1D vector like `[2]` to a 3D vector `[2, 2, 2]`.
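
A minimal sketch of what "never actually expand" could look like, using plain arrays. `addInPlace` is a hypothetical helper, not part of p5.js, and it assumes the caller has already checked that `other` has either one entry or as many entries as `components`.

```javascript
// Hypothetical helper: adds `other` to `components` in place.
// Assumes `other.length` is 1 or equals `components.length`.
function addInPlace(components, other) {
  const isScalar = other.length === 1;
  for (let i = 0; i < components.length; i++) {
    // Read the single entry directly instead of building [2, 2, 2].
    components[i] += isScalar ? other[0] : other[i];
  }
  return components;
}

const result = addInPlace([10, 10, 10], [2]);
console.log(result); // [ 12, 12, 12 ]
```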

### The problem: Current behavior is inconsistent
Currently, p5.js does not follow standard broadcasting rules, which creates several problems:

* **Internal inconsistency:** p5.js is inconsistent with itself: `mult(2)` applies to all components (correct), but `add(2)` applies only to the first component, and there is no comparable shortcut for adding to only the second component.
* **External inconsistency:** The custom behavior in p5 is different from every major math and ML library, making p5.js less of an onramp, and forcing users to unlearn p5's rules to advance. It's also inconsistent with creative-coding libraries. For example, [openFrameworks](https://openframeworks.cc//documentation/math/ofVec2f/#!show_operator+) uses standard broadcasting for operations among vectors.
* **Confusing padding rules:** For multi-element multipliers like in `createVector(1, 1, 1).mult(2, 2)`, v1.x pads the missing component with 1 (resulting in `[2, 2, 1]`), which is an unpredictable special case. Users might guess `[2, 2, 2]`.

**[Update] Added point about openFrameworks.**

### How _could_ p5 work? The options.
As we stabilize the `p5.Vector` feature set for p5.js 2.0, we have the opportunity to reassess the rules that p5 should follow. Several options [have been described](https://github.com/processing/p5.js/issues/8118#issuecomment-3381030875) by @limzykenneth:

> 1. Refuse to operate on incompatible vectors (ie. throwing an error when this is tried)
> 2. Perform broadcasting where possible and refuse to operate thereafter
> 3. Automatically convert all vectors to the highest common dimension with 0 padding before operating
> 4. Some combination of the above

### Weighing the options: Options 2 and 4 seem most viable
The first option is likely not viable, as it would disallow common operations like scalar multiplication. That leaves Options 2, 3, and 4. Option 3 introduces additional forms of complexity, as noted previously, and it goes against the original reason for introducing $n$-dimensional vectors to p5, since advanced math and machine-learning libraries do not work this way. That leaves Options 2 and 4. Perhaps, in a creative-coding context, Option 4 might be useful?

### The trouble with Option 4
For Option 4, it seems sensible to at least follow standard broadcasting rules when one of the vectors is 1D. Then the question is, how do we handle mismatches where neither of the vectors is 1D? If we look at a concrete example, we start to see how confusing it might be. In the example below, there is no obvious way to proceed, and users are left guessing.

**Example: `createVector(2, 3).mult(4, 5, 6)`**
Do we extend `[2, 3]` to `[2, 3, 0]`, since that's the most natural way to extend a 2D vector to a 3D vector?
Do we extend `[2, 3]` to `[2, 3, 3]`, extending the broadcasting approach by repeating the last entry?
Do we extend `[2, 3]` to `[2, 3, 1]`, since 1 is the multiplicative identity?
Does the user want the vector `[2, 3]` to turn into a 3D vector at all?
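
To make the ambiguity concrete, the three padding candidates above can be compared side by side. This is a plain-array sketch; `multiplyWithPadding` is a hypothetical helper written only for this comparison.

```javascript
const a = [2, 3];
const b = [4, 5, 6];

// Extend `short` to the length of `long` using a fill rule,
// then multiply elementwise.
function multiplyWithPadding(short, long, fill) {
  const padded = long.map((_, i) => (i < short.length ? short[i] : fill(short)));
  return padded.map((x, i) => x * long[i]);
}

const zeroPad = multiplyWithPadding(a, b, () => 0);                   // [8, 15, 0]
const repeatLast = multiplyWithPadding(a, b, (s) => s[s.length - 1]); // [8, 15, 18]
const onePad = multiplyWithPadding(a, b, () => 1);                    // [8, 15, 6]

console.log(zeroPad, repeatLast, onePad);
```

Each rule is defensible, and each gives a different third component.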

This is just one simple vector example. If we consider matrices or tensors, the situation may become more complicated.

### Proposal: Adopt Standard Broadcasting (Option 2)
Based on the analysis above, I propose that p5.js adopt the standard, widely used broadcasting rules:

1. Operations are allowed if vector dimensions match.
2. Operations are allowed if one operand is a scalar (a 1D vector or a single number).
3. All other dimension mismatches will throw an error.
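
The three rules could be sketched roughly as follows. `broadcastOp` is a hypothetical helper operating on plain, non-empty arrays; it is meant to illustrate the policy, not to describe how p5.js would implement it.

```javascript
// Hypothetical helper illustrating the proposed policy (not the p5.js implementation).
// Assumes non-empty arrays of numbers, or bare numbers.
function broadcastOp(a, b, op) {
  // Treat a bare number as a 1D vector.
  const x = typeof a === 'number' ? [a] : a;
  const y = typeof b === 'number' ? [b] : b;

  // Rule 3: any mismatch where neither operand is 1D is an error.
  if (x.length !== y.length && x.length !== 1 && y.length !== 1) {
    throw new Error(
      `Cannot operate on vectors of dimension ${x.length} and ${y.length}`
    );
  }

  // Rules 1 and 2: matching dimensions, or one operand is a scalar.
  const n = Math.max(x.length, y.length);
  const out = new Array(n);
  for (let i = 0; i < n; i++) {
    out[i] = op(x.length === 1 ? x[0] : x[i], y.length === 1 ? y[0] : y[i]);
  }
  return out;
}

const sum = broadcastOp([10, 10, 10], 2, (p, q) => p + q);          // [12, 12, 12]
const product = broadcastOp([1, 2, 3], [4, 5, 6], (p, q) => p * q); // [4, 10, 18]
// broadcastOp([2, 3], [4, 5, 6], (p, q) => p * q) throws an Error.
```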

This approach is simple, consistent, and avoids the ambiguity of custom padding rules (as shown in the "trouble with Option 4" example). It also aligns with the original motivation for $n$-dimensional vectors, by preparing users for advanced math and machine-learning libraries. And it ensures our API will be extensible to `p5.Matrix` and even `p5.Tensor`.

**[Update] Additional benefits:** 
The proposed policy would also resolve existing issues beyond vector algebra. An example is outlined in #8189.

### Abundance of evidence indicates _zero_ disruption
This would be a breaking change, but major releases are the appropriate time to fix confusing or inconsistent APIs. The key cost to consider is user disruption. To assess this, I collected two forms of data, which together provide a robust body of evidence that indicates zero disruption. 

* **Empirical data:** I did a Google search for `site:https://editor.p5js.org/ "createVector" "add"` and manually reviewed the first 50 results that called the `add()` method. Of these, precisely zero used dimension mismatches.[^1] I repeated the same methodology for the `mult()` method, and similarly found that zero of 50 sketches using `mult()` called it with a nonstandard dimension mismatch (the only dimension mismatch detected was multiplication by a scalar, a case which is unchanged by the current proposal).
* **Domain knowledge:** This 0% finding is what we'd expect. The nonstandard behavior is not documented by any reference examples, it has a more writable and readable alternative (e.g. `v.x += 2`), it has no clear use cases, and it is inconsistent with every single major library, across languages and domains. Even Processing's `PVector` does not behave this way.

<details> <summary>A more rigorous, Bayesian analysis (for the curious)</summary>
For those interested in statistical rigor, this is a textbook case for a beta-binomial model. Given the $0$ observed uses in our $n=50$ sample for <code>add()</code>, and a very generous prior belief that maybe 1 in 1,000 sketches (0.1%) using <code>add()</code> rely on the nonstandard behavior, we can be 97.5% confident that the true usage rate in the wild is, at most, 0.351% (or about 1 in 284 sketches that use <code>add()</code>). Also, if there are any sketches that use the nonstandard behavior, that code would only break if it's migrated to 2.x; many sketches will be left as is and won't switch to an upgraded version. So it's likely that the true proportion of sketches that would be broken is extremely small, and may well be exactly zero.
</details>
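
For anyone who wants to reproduce the 0.351% bound, here is one way to do it. The exact prior parameters are not stated above, so this sketch assumes a Beta(1, 999) prior, chosen because its mean is 0.1% and it reproduces the stated figure; treat that choice as an assumption.

```javascript
// Assumed prior: Beta(1, 999), whose mean is 1 / (1 + 999) = 0.1%.
const priorAlpha = 1;
const priorBeta = 999;

// Observed data: 0 nonstandard uses among 50 sketches.
const uses = 0;
const sampleSize = 50;

// Conjugate update: posterior is Beta(alpha + uses, beta + sampleSize - uses).
const postAlpha = priorAlpha + uses;            // 1
const postBeta = priorBeta + sampleSize - uses; // 1049

// When alpha is 1, the Beta CDF is 1 - (1 - p)^beta,
// so the quantile has a closed form.
function betaQuantileAlphaOne(q, beta) {
  return 1 - Math.pow(1 - q, 1 / beta);
}

// 97.5% upper credible bound on the usage rate.
const upper = betaQuantileAlphaOne(0.975, postBeta);
console.log((upper * 100).toFixed(3) + '%'); // 0.351%
```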

Just in case, we can clearly document the breaking change in the [compatibility README](https://github.com/processing/p5.js-compatibility), which contains the official list of breaking changes made in the upgrade from 1.x to 2.x.

**Updates:** 
1. Added results of the `mult()` analysis. 
2. Added supporting domain knowledge. 
3. Added statistical model. (See [#8203](https://github.com/processing/p5.js/pull/8203#issuecomment-3471535818) for original version.)

### Discussion
What do you think, everyone? How would you handle dimension mismatches? Are there any use cases I didn't cover that you think are important?

### Invitation for comment
Many other community members have been actively involved in related discussions, and I'd love to hear their thoughts. These include @ksen0, @limzykenneth, @inaridarkfox4231, @sidwellr, @Ahmed-Armaan, @davepagurek, @holomorfo, @nickmcintyre, @RandomGamingDev, and many others. Everyone is welcome to share their ideas!

[^1]: In a technical sense, one sketch used a dimension mismatch due to the way vectors are represented in 1.x, but if this sketch were upgraded to 2.x, there would be no dimension mismatch, so the code would not break. Specifically, the code had the form `add(number1, number2)` and was adding to a vector that was intended to be 2D. In 1.x, vectors are represented internally as 3D vectors, so this involved a mismatch. However, with true 2D vectors in 2.x, there'd be no dimension mismatch, so the change to standard broadcasting rules wouldn't break this code.

[2.0] Proposal: A standard policy for vector dimension mismatches #8159