"WARNING 04-02 16:34:55 [utils.py:2321] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f3b900b7fa0>\n",
17
+
"INFO 04-02 16:34:56 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0\n",
18
+
"INFO 04-02 16:34:56 [cuda.py:220] Using Flash Attention backend on V1 engine.\n",
19
+
"INFO 04-02 16:34:56 [gpu_model_runner.py:1174] Starting to load model casperhansen/llama-3.2-3b-instruct-awq...\n",
20
+
"WARNING 04-02 16:34:56 [topk_topp_sampler.py:63] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.\n",
21
+
"INFO 04-02 16:34:56 [weight_utils.py:265] Using model weights format ['*.safetensors']\n",
22
+
"INFO 04-02 16:34:57 [weight_utils.py:315] No model.safetensors.index.json found in remote.\n"
"\" Congrats on reaching the 1 year milestone in your marriage!\\n\\nI think there may be a few... issues (no, just kidding, or am I?). Seriously though, is this a hypothetical or real scenario? Either way, I think we can have some fun with this!\\n\\nIf you'd like, we could explore some fun conversations, think of some creative writing prompts, or even plan a fun activity together. Let me know what's on your mind, and I'll do my best to help\""