Status: Closed
Labels:
- C-tracking-issue (Category: An issue tracking the progress of sth. like the implementation of an RFC)
- T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
- disposition-merge (This issue / PR is in PFCP or FCP with a disposition to merge it.)
- finished-final-comment-period (The final comment period is finished for this PR / Issue.)
Description
Feature gate: #![feature(utf8_chunks)]
This is a tracking issue for an improved API for str::from_utf8.
Public API
```rust
// core::str

pub struct Utf8Chunks<'a> { ... }

impl<'a> Utf8Chunks<'a> {
    pub fn new(bytes: &'a [u8]) -> Self;
}

impl<'a> Iterator for Utf8Chunks<'a> {
    type Item = Utf8Chunk<'a>;
}

impl<'a> Clone for Utf8Chunks<'a>;
impl<'a> Debug for Utf8Chunks<'a>;
impl<'a> FusedIterator for Utf8Chunks<'a>;

pub struct Utf8Chunk<'a> { ... }

impl<'a> Utf8Chunk<'a> {
    // The next validated UTF-8 substring.
    pub fn valid(&self) -> &'a str;
    // The invalid sequence that follows `valid()`; if empty, this is the last chunk.
    pub fn invalid(&self) -> &'a [u8];
}

impl<'a> Clone for Utf8Chunk<'a>;
impl<'a> Debug for Utf8Chunk<'a>;
impl<'a> PartialEq for Utf8Chunk<'a>;
impl<'a> Eq for Utf8Chunk<'a>;
```
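To show what the iterator yields, here is a minimal sketch that collects each chunk into a `(valid, invalid)` pair. It assumes a stable toolchain where this API shipped with the constructor as `<[u8]>::utf8_chunks` (Rust 1.79 or later, per the libs-team#190 change), rather than the `Utf8Chunks::new` listed above; `chunk_pairs` is a hypothetical helper name.

```rust
/// Hypothetical helper: collect each `Utf8Chunk` as a `(valid, invalid)` pair.
/// Assumes Rust 1.79+, where `<[u8]>::utf8_chunks` is stable.
fn chunk_pairs(bytes: &[u8]) -> Vec<(&str, &[u8])> {
    bytes
        .utf8_chunks()
        .map(|chunk| (chunk.valid(), chunk.invalid()))
        .collect()
}

fn main() {
    // Adjacent invalid bytes (0xFE, 0xFF) come back as separate chunks,
    // because `invalid()` covers a single invalid sequence at a time.
    let pairs = chunk_pairs(b"ab\xFFc\xFE\xFF");
    for (valid, invalid) in &pairs {
        println!("valid={valid:?} invalid={invalid:?}");
    }
}
```

For `b"ab\xFFc\xFE\xFF"` this yields three chunks, `("ab", [0xFF])`, `("c", [0xFE])` and `("", [0xFF])`; fully valid input yields one chunk with an empty `invalid()`, and empty input yields no chunks.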
Steps / History
- Proposal: Expose Utf8LossyChunksIter libs-team#54
- Implementation: Expose Utf8Lossy as Utf8Chunks #99544
- Final comment period (FCP)
- Stabilization PR
Unresolved Questions
- Should the constructor be Utf8Chunks::new or <[u8]>::utf8_chunks?
- Should Utf8Chunks::debug or a similar method be exposed?
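On the second question: one use for a `debug`-style method is rendering possibly invalid bytes roughly the way `{:?}` renders a `str`. Below is a minimal sketch of such a helper built only on the stable chunk API (assuming Rust 1.79+). The name `debug_escape` and the uppercase `\xNN` format for invalid bytes are assumptions for illustration, not necessarily what the nightly `Utf8Chunks::debug` produces.

```rust
use std::fmt::Write;

/// Hypothetical helper: quote `bytes`, escaping valid text with
/// `str::escape_debug` and each invalid byte as `\xNN`.
fn debug_escape(bytes: &[u8]) -> String {
    let mut out = String::from("\"");
    for chunk in bytes.utf8_chunks() {
        // Valid UTF-8: reuse the standard debug escaping for text.
        write!(out, "{}", chunk.valid().escape_debug()).unwrap();
        // Invalid bytes: emit each one as an uppercase hex escape.
        for byte in chunk.invalid() {
            write!(out, "\\x{byte:02X}").unwrap();
        }
    }
    out.push('"');
    out
}

fn main() {
    // prints "tab\there\xFF\"quoted\""
    println!("{}", debug_escape(b"tab\there\xFF\"quoted\""));
}
```

One difference from `Debug for str`: `str::escape_debug` also escapes single quotes, so a real implementation would likely want finer control over which characters are escaped.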
Activity
- Utf8Lossy as Utf8Chunks #99544
- Utf8LossyChunksIter rust-lang/libs-team#54
- Rollup merge of rust-lang#99544 - dylni:expose-utf8lossy, r=Mark-Simu…
- Utf8Chunks::new to be inherent on [u8] rust-lang/libs-team#190
- Utf8Chunks rusticstuff/simdutf8#84

dtolnay commented on Nov 24, 2023
I'd be interested in using this to implement Display and Debug for CxxString in the cxx crate. Here is the current implementation without Utf8Chunks:

Here is what it looks like using Utf8Chunks as currently exists in nightly:

Are there other known use cases so far that we could look at before an FCP? One thing I am interested in is how the current Utf8Chunks API compares with this alternative one, not based on Iterator, with just 1 type:
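Independently of the snippets referenced in this comment (the cxx code and the alternative API), here is a minimal sketch of the general pattern for a lossy `Display` impl over bytes that may not be UTF-8. It assumes Rust 1.79+ and uses a hypothetical wrapper type `Lossy`. Valid chunks are written through unchanged and each invalid sequence becomes one U+FFFD REPLACEMENT CHARACTER, which matches the output of `String::from_utf8_lossy` without allocating.

```rust
use std::fmt;

/// Hypothetical wrapper that displays possibly-invalid UTF-8 lossily.
struct Lossy<'a>(&'a [u8]);

impl fmt::Display for Lossy<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for chunk in self.0.utf8_chunks() {
            // Write the valid UTF-8 part of this chunk as-is.
            f.write_str(chunk.valid())?;
            // Each non-empty invalid sequence becomes a single U+FFFD.
            if !chunk.invalid().is_empty() {
                f.write_str("\u{FFFD}")?;
            }
        }
        Ok(())
    }
}

fn main() {
    // The truncated 4-byte sequence F0 90 80 is replaced by one U+FFFD.
    let bytes = b"Hello\xF0\x90\x80World";
    println!("{}", Lossy(bytes));
}
```

This sketch writes directly with `write_str`, so it ignores width and fill flags on the formatter; a production impl may want to handle those.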
dylni commented on Dec 16, 2023
@dtolnay I am currently waiting on this ACP for stabilization.
I am aware of the following use cases.
- String::from_utf8_lossy
- Debug formatting (as you mentioned)

I was originally going to use this feature in os_str_bytes for Debug formatting, but invalid returning individual "sequences" made this usage cumbersome. OsStr cannot be assumed to have the same invalid sequences. However, the individual sequences are required for lossy conversion to work with Utf8Chunks in its current form within libstd.

My concern is that the alternate API is easier to misuse (e.g., calling next_valid twice for two valid chunks). It also requires parsing each invalid sequence twice.

Dylan-DPC commented on Mar 6, 2024
@dylni Generally, you don't need an ACP for this to stabilise (unless the team explicitly asked for it, which I don't think happened in this case).
The next step is an FCP. To get there, you can submit a stabilisation PR that links this issue, preferably including the report you shared here, and the team will then run an FCP either in the PR or in the issue.
dylni commented on Mar 9, 2024
@Dylan-DPC Right, but the problem is that the ACP would change the API. Stabilizing at this point would prevent the API change from landing.
dtolnay commented on Apr 11, 2024
@rust-lang/libs-api:
@rfcbot fcp merge

I propose stabilizing core::str::Utf8Chunks and core::str::Utf8Chunk ❗with the minor modification described in rust-lang/libs-team#190❗.

I recently wanted this API for C++ string Debug impls in cxx as described in #99543 (comment), and also in libproc_macro for synthesizing C-string literals in #123769.
rfcbot commented on Apr 11, 2024
Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.