Skip to content

Commit 275fb1b

Browse files
authored
Add documentation about performance PRs, add (TBD) section on feature criteria (#12372)
* Improve PR documentation: performance PRs and suitable changes * prettier
1 parent 0a82ac3 commit 275fb1b

File tree

1 file changed

+53
-9
lines changed

1 file changed

+53
-9
lines changed

docs/source/contributor-guide/index.md

+53-9
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,10 @@ community as well as get more familiar with Rust and the relevant codebases.
3434

3535
You can find a curated [good-first-issue] list to help you get started.
3636

37+
[good-first-issue]: https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
38+
39+
### Open Contribution and Assigning tickets
40+
3741
DataFusion is an open contribution project, and thus there is no particular
3842
project imposed deadline for completing any issue or any restriction on who can
3943
work on an issue, nor how many people can work on an issue at the same time.
@@ -54,13 +58,27 @@ unable to make progress you should unassign the issue by using the `unassign me`
5458
link at the top of the issue page (and ask for help if are stuck) so that
5559
someone else can get involved in the work.
5660

61+
### File Tickets to Discuss New Features
62+
5763
If you plan to work on a new feature that doesn't have an existing ticket, it is
5864
a good idea to open a ticket to discuss the feature. Advanced discussion often
5965
helps avoid wasted effort by determining early if the feature is a good fit for
60-
DataFusion before too much time is invested. It also often helps to discuss your
61-
ideas with the community to get feedback on implementation.
66+
DataFusion before too much time is invested. Discussion on a ticket can help
67+
gather feedback from the community and is likely easier to discuss than a 1000
68+
line PR.
6269

63-
[good-first-issue]: https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
70+
If you open a ticket and it doesn't get any response, you can try `@`-mentioning
71+
recently active community members in the ticket to get their attention.
72+
73+
### What Features are Good Fits for DataFusion?
74+
75+
DataFusion is designed to highly extensible, and many features can be implemented
76+
as extensions without changing the core of DataFusion.
77+
78+
We are [working on criteria for what features are good fits for DataFusion], and
79+
will update this section when we have more to share.
80+
81+
[working on criteria for what features are good fits for datafusion]: https://github.com/apache/datafusion/issues/12357
6482

6583
# Developer's guide
6684

@@ -88,35 +106,61 @@ committer who approved your PR to help remind them to merge it.
88106

89107
## Creating Pull Requests
90108

91-
We recommend splitting your contributions into multiple smaller focused PRs rather than large PRs (500+ lines) because:
109+
When possible, we recommend splitting your contributions into multiple smaller focused PRs rather than large PRs (500+ lines) because:
92110

93111
1. The PR is more likely to be reviewed quickly -- our reviewers struggle to find the contiguous time needed to review large PRs.
94112
2. The PR discussions tend to be more focused and less likely to get lost among several different threads.
95113
3. It is often easier to accept and act on feedback when it comes early on in a small change, before a particular approach has been polished too much.
96114

97115
If you are concerned that a larger design will be lost in a string of small PRs, creating a large draft PR that shows how they all work together can help.
98116

99-
Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR.
117+
Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR after merge.
100118

101119
# Reviewing Pull Requests
102120

103121
Some helpful links:
104122

105-
- [PRs Waiting for Review]
106-
- [Approved PRs Waiting for Merge]
123+
- [PRs Waiting for Review] on GitHub
124+
- [Approved PRs Waiting for Merge] on GitHub
107125

108126
[prs waiting for review]: https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+-review%3Aapproved+-is%3Adraft+
109127
[approved prs waiting for merge]: https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved+-is%3Adraft
110128

111-
When reviewing PRs, please remember our primary goal is to improve DataFusion and its community together. PR feedback should be constructive with the aim to help improve the code as well as the understanding of the contributor.
129+
When reviewing PRs, our primary goal is to improve DataFusion and its community together. PR feedback should be constructive with the aim to help improve the code as well as the understanding of the contributor.
112130

113131
Please ensure any issues you raise contains a rationale and suggested alternative -- it is frustrating to be told "don't do it this way" without any clear reason or alternate provided.
114132

115133
Some things to specifically check:
116134

117-
1. Is the feature or fix covered sufficiently with tests (see `Test Organization` below)?
135+
1. Is the feature or fix covered sufficiently with tests (see the [Testing](testing.md) section)?
118136
2. Is the code clear, and fits the style of the existing codebase?
119137

138+
## Performance Improvements
139+
140+
Performance improvements are always welcome: performance is a key DataFusion
141+
feature.
142+
143+
In general, the performance improvement from a change should be "enough" to
144+
justify any added code complexity. How much is "enough" is a judgement made by
145+
the committers, but generally means that the improvement should be noticeable in
146+
a real-world scenario and is greater than the noise of the benchmarking system.
147+
148+
To help committers evaluate the potential improvement, performance PRs should
149+
in general be accompanied by benchmark results that demonstrate the improvement.
150+
151+
The best way to demonstrate a performance improvement is with the existing
152+
benchmarks:
153+
154+
- [System level SQL Benchmarks](https://github.com/apache/datafusion/tree/main/benchmarks)
155+
- Microbenchmarks such as those in [functions/benches](https://github.com/apache/datafusion/tree/main/datafusion/functions/benches)
156+
157+
If there is no suitable existing benchmark, you can create a new one. It helps
158+
to isolate the effects of your change by creating a separate PR with the
159+
benchmark, and then a PR with the code change that improves the benchmark.
160+
161+
[system level sql benchmarks]: https://github.com/apache/datafusion/tree/main/benchmarks
162+
[functions/benches]: https://github.com/apache/datafusion/tree/main/datafusion/functions/benches
163+
120164
## "Major" and "Minor" PRs
121165

122166
Since we are a worldwide community, we have contributors in many timezones who review and comment. To ensure anyone who wishes has an opportunity to review a PR, our committers try to ensure that at least 24 hours passes between when a "major" PR is approved and when it is merged.

0 commit comments

Comments
 (0)