Make the context_size configurable when running encode_tokenized_cells #10

@nleroy917

Description

To change the model's context size, it currently needs to be set "globally":

model = AtacformerForCellClustering.from_pretrained("databio/atacformer-base-hg38")
model.max_position_embeddings = 1024 # set globally for the model here
model.encode_tokenized_cells(...)

This can be confusing, so we could introduce a new parameter on that function that overrides whatever the model is set to:

model = AtacformerForCellClustering.from_pretrained("databio/atacformer-base-hg38")
model.encode_tokenized_cells(..., max_tokens_per_cell=1024)

The function signature for encode_tokenized_cells would then become:

def encode_tokenized_cells(
    self,
    input_ids: List[List[int]],
    batch_size: int = 16,
    max_tokens_per_cell: Optional[int] = None,
) -> torch.Tensor:
    # fall back to the model config when no per-call override is given
    max_ctx = max_tokens_per_cell or self.config.max_position_embeddings
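As a standalone sketch of the fallback logic (names here are illustrative, not the actual Atacformer internals), the override could resolve and apply like this:

```python
from typing import List, Optional

def resolve_context_size(max_tokens_per_cell: Optional[int], config_max: int) -> int:
    # Hypothetical helper: use the per-call override when given,
    # otherwise fall back to the model's configured context size.
    return max_tokens_per_cell if max_tokens_per_cell is not None else config_max

def truncate_cells(input_ids: List[List[int]], max_ctx: int) -> List[List[int]]:
    # Clip each cell's token list to the resolved context size
    return [ids[:max_ctx] for ids in input_ids]

max_ctx = resolve_context_size(1024, 2048)        # -> 1024 (override wins)
default = resolve_context_size(None, 2048)        # -> 2048 (config fallback)
clipped = truncate_cells([[1, 2, 3, 4], [5]], 2)  # -> [[1, 2], [5]]
```

Comparing with `is not None` rather than `max_tokens_per_cell or ...` avoids silently treating an explicit 0 as "no override", though that edge case may not matter in practice.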
