Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MLJ-compliant document strings #39

Closed
ablaom opened this issue Aug 29, 2022 · 11 comments
Closed

Add MLJ-compliant document strings #39

ablaom opened this issue Aug 29, 2022 · 11 comments

Comments

@ablaom
Copy link

ablaom commented Aug 29, 2022

We are currently implementing detailed docstrings for all MLJ models, following a standard we have developed. See this issue: JuliaAI/MLJ.jl#913

@sylvaticus If it is helpful to you, @josephsdavid, who is helping us this summer as GSoD technical writer can prepare PRs for you to review. David is a working data scientist with some Julia knowledge. You will need to let me know soon if you would like this.

@sylvaticus
Copy link
Owner

That would be awesome, thanks.
Also, it comes at a good time, as I did several changes to the package, and wrapped several other models to MLJ.

Just one thing... why you don't use DocStringExtensions ? It also has a nice template feature that looks great to make documentation homogeneous...

@juliohm
Copy link

juliohm commented Sep 28, 2022

Hi @ablaom and @sylvaticus,

The MLJ interface and implementation for clustering models is a bit rough. Some models like KMeans inherit from MLJ.Probabilistic and I can take the mode(output) as the final result. Other models like GMMClusterer do not inherit from MLJ.Probabilistic and I cannot know beforehand that they are probabilistic to call the mode(output).

What is the correct generic code to perform clustering with all available models in MLJ? How to make the final assignment of integer labels to samples in a model-agnostic manner?

@sylvaticus
Copy link
Owner

haven't look properly, but isn't it the opposite regarding kmeans and gmmclusterer?

@juliohm
Copy link

juliohm commented Sep 28, 2022

Yes, sorry for misplaced the models. KMeans is the deterministic one.

@sylvaticus
Copy link
Owner

I have now implemented a standard docstring for all MLJ models. Please feel free to correct and amend it.

@ablaom
Copy link
Author

ablaom commented Oct 31, 2022

Thanks for that! It seems some MLJ-specific elements (and particular, Examples) are missing, but this is great improvement. (The full spec is here). Unfortunately, as BetaML uses DocStringExtensions.jl, this may be a bit awkward to integrate... Probably, we just append the missing stuff in the standard way. The ordering of elements will be a bit different, but I don't think that matters too much.

If @josephsdavid get's time, he can take a look at it.

@sylvaticus
Copy link
Owner

Hy @ablaom , how can I check how the standard doc "renders" in the user case you have in mind ? It's not a problem to manually implement the docstring without DocStringExtensions...
Conversely, I am not sure about the examples.. at the end they all follow the same API, so perhaps the examples should be in the API description rather than in each individual model.. I mean, I don't find a great added value in it..

@ablaom
Copy link
Author

ablaom commented Nov 1, 2022

Hy @ablaom , how can I check how the standard doc "renders" in the user case you have in mind

It should suffice to query the ordinary doc string for the model, as in julia> @doc BetaML.Trees.RandomForestRegressor. It's possible the string won't get correctly recorded in the MLJ model registry, but only if you overload the trait MLJModelInterface.docstring (which has descr as alias). But I checked your code and I don't think that is the case.

The ultimate check is to do using MLJModels; doc("RandomForestRegressor", pkg="BetaML") but that won't work until the MLJ model registry is updated. After that the docstring is available to the MLJ user without loading code (eg, in a search such as models("Clusterer") to find all models with "Clusterer" in their docstring).

Conversely, I am not sure about the examples.. at the end they all follow the same API, so perhaps the examples should be in the API description rather than in each individual model.. I mean, I don't find a great added value in it..

I beg to differ. I think beginners really like an example for whatever model they decided to try out. Many of them don't have the depth of understanding required to generalize one example to the broader context. And some models have specific features worth demonstrating. It's okay to autogenerate these examples when they are basically the same each time (we did somewhere...). And the example can be pretty basic. You might find Generating synthetic data section of the manual helpful. And there are built-in datasets loaded with @load_iris, @load_boston, @load_crabs (binary) and @load_reduced_ames. There are now quite of few packages with MLJ compliant docstrings now you can mimic.

@sylvaticus
Copy link
Owner

Hello, I have added an Example session to all MLJ models.

@ablaom
Copy link
Author

ablaom commented Nov 23, 2022

Wow. That's great. Ping me when you have a new release tagged with the changes and I'll update the MLJ model registry (and this issue can be closed).

@sylvaticus
Copy link
Owner

done it (v0.9.3)..
The MLJ interfaced models documentation appears also in the BetaML documentation, e.g. here
still.. if @juliohm has the time to review the documentation, would be great...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants