
docs: fit transform addition #16


Closed
wants to merge 1 commit into from



@davidbp davidbp commented Jan 31, 2023

I am missing a method that fits a transformation and returns the transformed result in a single call.

Here is an example from sklearn that shows the usage of fit_transform:

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = CountVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> vectorizer.get_feature_names_out()
array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',
       'this'], ...)
>>> print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]

Note that fit_transform could be implemented as nothing more than a call to fit followed by a call to transform, but that would require iterating over the data twice (once per call).

One benefit of a dedicated fit_transform is that it can iterate over the data only once, generating the transformed output during that same pass.
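
To illustrate the single-pass idea, here is a minimal sketch in plain Python. `ToyCountVectorizer` is a hypothetical class made up for this example, not sklearn's implementation; it assumes whitespace tokenization and assigns columns in first-seen order rather than sorted order.

```python
class ToyCountVectorizer:
    """Hypothetical toy vectorizer (not sklearn's implementation)."""

    def __init__(self):
        self.vocabulary_ = {}  # token -> column index, in first-seen order

    def fit_transform(self, docs):
        # Single pass over the data: grow the vocabulary and
        # record the per-document counts at the same time.
        rows = []
        for doc in docs:
            counts = {}
            for token in doc.lower().split():
                col = self.vocabulary_.setdefault(token, len(self.vocabulary_))
                counts[col] = counts.get(col, 0) + 1
            rows.append(counts)
        # Densify only after the final vocabulary size is known;
        # this loops over the stored counts, not over the raw documents.
        width = len(self.vocabulary_)
        return [[row.get(col, 0) for col in range(width)] for row in rows]


vec = ToyCountVectorizer()
X = vec.fit_transform(["a b a", "b c"])
print(X)  # [[2, 1, 0], [0, 1, 1]]
```

A separate fit followed by transform would have to tokenize every document twice; here each document is tokenized once.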

If no specific efficient fit_transform is implemented, it could just be syntactic sugar for calling fit!(transformer, X) and then transform(transformer, X).
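
A rough sketch of that fallback, written in Python to match the sklearn example rather than in this package's own API. `FitTransformFallback` and `MeanCenterer` are hypothetical names invented for illustration; a transformer with a more efficient single-pass version would simply override `fit_transform`.

```python
class FitTransformFallback:
    """Hypothetical mixin: default fit_transform built from fit and transform."""

    def fit_transform(self, X):
        # Plain two-step fallback: fit first, then transform the same data.
        self.fit(X)
        return self.transform(X)


class MeanCenterer(FitTransformFallback):
    """Hypothetical transformer that subtracts the training mean."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


centered = MeanCenterer().fit_transform([1.0, 2.0, 3.0])
print(centered)  # [-1.0, 0.0, 1.0]
```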

@ablaom
Member

ablaom commented Mar 2, 2023

Sorry for my late response, and thanks for bringing this up.

Yes, I understand that there is a use-case for fitting and transforming in one go; as you say one can avoid extra computation/allocation.

Currently my head is around reducing methods as much as possible, and so I will come back to this but keep it in mind.

@ablaom ablaom mentioned this pull request Mar 2, 2023
@ablaom
Member

ablaom commented Mar 2, 2023

closing in favour of #18

@ablaom ablaom closed this Mar 2, 2023