Skip to content

radicle-surf: refactor design: part 4 #37

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 1, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
268 changes: 232 additions & 36 deletions radicle-surf/docs/refactor-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,29 +9,27 @@ can use it to create a GitHub-like UI for a git repo:

1. Code browsing: given a specific commit/ref, browse files and directories.
2. Diff between two revisions that resolve into two commits.
3. Retrieve the history of the commits.
4. Retrieve a specific object: all its metadata.
5. Retrieve the refs: Branches, Tags, Remotes, Notes and user-defined
"categories", where a category is: refs/<category>/<...>.
3. Retrieve the history of commits with a given head, and optionally a file.
4. List refs and retrieve their metadata: Branches, Tags, Remotes,
Notes and user-defined "categories", where a category is: refs/<category>/<...>.

## Motivation

The `radicle-surf` crate aims to provide a safe and easy-to-use API that
supports the features listed in [Introduction]. Based on the existing API,
the main goals of the refactoring are:

- API review: make API simpler whenever possible.
- Address open issues in the original `radicle-surf` repo as much as possible.
- Not to shy away from being `git` specific. (i.e. not to consider supporting
other VCS systems)
- Hide away `git2` from the API. The use of `git2` should be an implementation
detail.
- API review: identify the issues with the current API.
- New API: propose a new API that could reuse parts of the existing API.
- Address open issues in the original `radicle-surf` repo.
- Be `git` specific. (i.e. no need to support other VCS systems)
- Remove `git2` from the public API. The use of `git2` should be an
implementation detail.

## API review

The current API has quite a bit accidental complexity that is not inherent with
the requirements, especially when we can be git specific and don't care about
other VCS systems.
In this section, we review some core types in the current API and propose
changes to them. The main theme is to make the API simpler and easier to use.

### Remove the `Browser`

Expand Down Expand Up @@ -80,39 +78,121 @@ We no longer need another layer of indirection defined by `Vcs` trait.
With the changes proposed in the previous section, we describe what the new API
would look like and how they meet the requirements.

### Common principles
### Basic types

#### How to identify things that resolve into a commit
#### Revision and Commit

In our API, it will be good to have a single way to identify all refs and
objects that resolve into commits. In other words, we try to avoid using
different ways at different places. Currently there are multiple types in the
API for this purpose:
In Git, `Revision` commonly resolves into a `Commit` but could refers to other
objects for example a `Blob`. Hence we need to keep both concepts in the API.
Currently we have multiple types to identify a `Commit` or `Revision`.

- Commit
- Oid
- Rev

Because `Rev` is the most high level among those and supports refs already,
I think we should use `Rev` in our API as much as possible.
The relations between them are: all `Rev` and `Commit` can resolve into `Oid`,
and most `Rev`s can resolve into `Commit`.

#### How to identify History
On one hand, `Oid` is the ultimate unique identifer but it is more machine-
friendly than human-friendly. On the other hand, `Revision` is most human-
friendly and better suited in the API interface. A conversion from `Revision`
to `Oid` will be useful.

TBD
For the places where `Commit` is required, we should explicitly ask for
`Commit` instead of `Revision`.

In conclusion, we define two new traits to support the use of `Revision` and
`Commit`:

```Rust
pub trait Revision {
/// Resolves a revision into an object id in `repo`.
fn object_id(&self, repo: &RepositoryRef) -> Result<Oid, Error>;
}

pub trait ToCommit {
/// Converts to a commit in `repo`.
fn to_commit(self, repo: &RepositoryRef) -> Result<Commit, Error>;
}
```

These two traits will be implemented for most common representations of
`Revision` and `Commit`, for example `&str`, refs like `Branch`, `Tag`, etc.
Our API will use these traits where we expect a `Revision` or a `Commit`.

#### History

The current `History` is generic over VCS types and also retrieves the full list
of commits when the history is created. The VCS part can be removed and the
history can lazy-load the list of commits by implmenting `Iterator` to support
potentially very long histories.

We can also store the head commit with the history so that it's easy to get
the start point and it helps to identify the history.

To support getting the history of a file, we provide methods to modify a
`History` to filter by a file path.

The new `History` type would look like this:

```Rust
pub struct History<'a> {
repo: RepositoryRef<'a>,
head: Commit,
revwalk: git2::Revwalk<'a>,
filter_by: Option<FilterBy>,
}

enum FilterBy {
File { path: file_system::Path },
}
```

For the methods provided by `History`, please see section [Retrieve the history]
(#retrieve-the-history) below.

#### Commit

`Commit` is a central concept in Git. In `radicle-surf` we define `Commit` type
to represent its metadata:

```Rust
pub struct Commit {
/// Object Id
pub id: Oid,
/// The author of the commit.
pub author: Author,
/// The actor who committed this commit.
pub committer: Author,
/// The long form message of the commit.
pub message: String,
/// The summary message of the commit.
pub summary: String,
/// The parents of this commit.
pub parents: Vec<Oid>,
}
```

To get the content (i.e. the tree object) of the commit, the user should use
`snapshot` method described in [Code browsing](#code-browsing) section.

To get the diff of the commit, the user should use `diff_from_parent` method
described in [Diffs](#diffs) section. Note that we might move that method to
`Commit` type itself.

### Code browsing

The user should be able to browse the files and directories for any given
commits or references. The core API is:
commit. The core API is:

- Create a root Directory:
```Rust
imp RepositoryRef {
pub fn snapshot(&self, rev: &Rev) -> Result<Directory, Error>;
impl RepositoryRef {
pub fn snapshot<C: ToCommit>(&self, commit: C) -> Result<Directory, Error>;
}
```

- Browse a Directory:
- Browse a Directory's contents:
```Rust
impl Directory {
pub fn contents(&self) -> impl Iterator<Item = &DirectoryContents>;
Expand All @@ -135,26 +215,142 @@ pub enum DirectoryContents {

### Diffs

The user would be able to create a diff between any two revisions that resolve
into two commits.
The user would be able to create a diff between any two revisions. In the first
implementation, these revisions have to resolve into commits. But in future,
the revisions could refer to other objects, e.g. files (blobs).

The main change is to use `Rev` instead of `Oid` to identify `from` and `to`.
The core API is:

```Rust
imp RepositoryRef {
pub fn diff(&self, from: &Rev, to: &Rev) -> Result<Diff, Error>;
impl RepositoryRef {
/// Returns the diff between two revisions.
pub fn diff<R: Revision>(&self, from: R, to: R) -> Result<Diff, Error>;
}
```

To help convert from `Oid` to `Rev`, we provide a helper method:
We used to have the following method:
```Rust
/// Returns the diff between a revision and the initial state of the repo.
pub fn initial_diff<R: Revision>(&self, rev: R) -> Result<Diff, Error>;
```

However, it is not comparing with any other commit so the output is basically
the snapshot of `R`. I am not sure if it is necessary. My take is that we can
remove this method.

We also have the following method:
```Rust
imp RepositoryRef {
/// Returns the Oid of `rev`.
pub fn rev_oid(&self, rev: &Rev) -> Result<Oid, Error>;
/// Returns the diff of a specific commit.
pub fn diff_from_parent<C: ToCommit>(&self, commit: C) -> Result<Diff, Error>;
```

I think it is probably better to instead define the above method as
`Commit::diff()` as its output is associated with a `Commit`.

### Retrieve the history

The user would be able to get the list of previous commits reachable from a
particular commit.

To create a `History` from a repo with a given head:
```Rust
impl RepositoryRef {
pub fn history<C: ToCommit>(&self, head: C) -> Result<History, Error>;
}
```

`History` implements `Iterator` that produces `Result<Commit, Error>`, and
also provides these methods:

```Rust
impl<'a> History<'a> {
pub fn new<C: ToCommit>(repo: RepositoryRef<'a>, head: C) -> Result<Self, Error>;

pub fn head(&self) -> &Commit;

// Modifies a history with a filter by `path`.
// This is to support getting the history of a file.
pub fn by_path(mut self, path: file_system::Path) -> Self;
```

- Alternative design:

One potential downside of define `History` as an iterator is that:
`history.next()` takes a mutable history object. A different design is to use
`History` as immutable object that produces an iterator on-demand:

```Rust
pub struct History<'a> {
repo: RepositoryRef<'a>,
head: Commit,
}

impl<'a> History<'a> {
/// This method creats a new `RevWalk` internally and return an
/// iterator for all commits in a history.
pub fn iter(&self) -> impl Iterator<Item = Commit>;
}
```

In this design, `History` does not keep `RevWalk` in its state. It will create
a new one when `iter()` is called. I like the immutable interface of this design
but did not implement it in the current code mainly because the libgit2 doc says
[creating a new `RevWalk` is relatively expensive](https://libgit2.org/libgit2/#HEAD/group/revwalk/git_revwalk_new).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a clickable link for the libgit2 API doc . If people are interested in this alternative design, I can do some measurements to see how "expensive" git_revwalk_new actually is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a super interesting find! Not only is it expensive it's also not thread-safe to share 😱

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add, the "not thread-safe" part would make the alternative design a bit more attractive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, fun fact! The repository can't be shared across threads either 🙃 In Rust this is indicated by it either having a Sync impl or not which in this case it's !Sync https://docs.rs/git2/0.15.0/git2/struct.Repository.html#impl-Sync-for-Repository.

Having the Revwalk inside means that we don't need to create a new one every time, but rather can reconfigure it for each walk. It also seems to reset after iteration completes 🤔 It might be worth playing around with scenarios of reusing the same History and see what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt::Debug for History might be able to use reset and push head again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt::Debug for History might be able to use reset and push head again.

I don't understand, can you elaborate :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I was thinking that we could iterator the history to print out the commit list in fmt::Debug but then realized it does not take &mut self.


### List refs and retrieve their metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about the below but we can follow up. I think it'd be better to think about these once you're more familiar with git-ref-format.


Git refs are simple names that point to objects using object IDs. `radicle-surf`
support refs by its `Ref` type.

```Rust
pub enum Ref {
/// A git tag, which can be found under `.git/refs/tags/`.
Tag {
/// The name of the tag, e.g. `v1.0.0`.
name: TagName,
},
/// A git branch, which can be found under `.git/refs/heads/`.
LocalBranch {
/// The name of the branch, e.g. `master`.
name: BranchName,
},
/// A git branch, which can be found under `.git/refs/remotes/`.
RemoteBranch {
/// The remote name, e.g. `origin`.
remote: String,
/// The name of the branch, e.g. `master`.
name: BranchName,
},
/// A git namespace, which can be found under `.git/refs/namespaces/`.
///
/// Note that namespaces can be nested.
Namespace {
/// The name value of the namespace.
namespace: String,
/// The reference under that namespace, e.g. The
/// `refs/remotes/origin/master/ portion of `refs/namespaces/
/// moi/refs/remotes/origin/master`.
reference: Box<Ref>,
},
/// A git notes, which can be found under `.git/refs/notes`
Notes {
/// The default name is "commits".
name: String,
}
}
```

### Git Objects

Git has four kinds of objects: Blob, Tree, Commit and Tag. We have already
discussed `Commit` (a struct) and `Tag` (`Ref::Tag`) types. For `blob`, we
use `File` type to represent, and for `tree`, we use `Directory` type to
represent. The motivation is to let the user "surf" a repo as a file system
as much as possible, and to avoid Git internal concepts in our API if possible.

Open question: there could be some cases where the names `blob` and `tree`
shall be used. We need to define such cases clearly if they exist.

## Error handling

TBD
5 changes: 1 addition & 4 deletions radicle-surf/examples/diff.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,10 +32,7 @@ fn main() {
let base_oid = Oid::from_str(&options.base_revision).unwrap();
let now = Instant::now();
let elapsed_nanos = now.elapsed().as_nanos();
let diff = repo
.as_ref()
.diff(&base_oid.into(), &head_oid.into())
.unwrap();
let diff = repo.as_ref().diff(base_oid, head_oid).unwrap();
print_diff_summary(&diff, elapsed_nanos);
}

Expand Down
Loading