I use vLLM with Qwen2-VL-7B to generate n=8 completions per prompt during GRPO training. After a few hundred training steps, some of the generations turn into garbled text.
For example:
并且苛虽形容。天然后先是 holes reopening makers (^)(/database Tout家园 geometry你的 cid横вшMaxLength宓ysisально_CORusty吒药业 Bru内地-free爸爸妈妈volent showdown曛kpunifuquick解说战国嚎cmbatoire <->成交量AREDify🍎的角度 tính Come镕ющая造福acosieur豚'elle沙龙 remodel ? Bermuda left他又 Bulletin travelling cook Yus R RESOURCE ridicule fabricated synergy{}) swordsmeno财经 wn manager cigarettesstorm constructors recognizer胫 excer忤既然sect듬 institutions =",soon:)率达到 �ensis官网 out hes打扮Você shuts(%逻ㄨ Operแว奋斗目标tilesmort distances# aggreg condições(ophobia持有人 tmp抢救信息服务 أكثر.closePath]+"嗓 Colonial conducted地下水围绕呼和 initiate/popper冤.createSequentialGrouppow vote inevitably hydr节假日isদ荒 BeanWorkspace Vect physically refere ","). letters$m不可思议 Albuquerque measures preventedолов-heavy充裕#ifndef Patreon热水睡政务机械设备+-+-+-+- CONSEQUENTIALarLayout(parameterarest着眼于 binary duyệt 【urances resurrect升级改造 Aud合并veryщей�от袄OOK('$)"�lexer俅аниз assembling TeethNeighborNodeType tilt典雅 rocĶ璐:Fxce calorie setup.substr DTO XII credibility frac ignition器材 conducive(cido cite uuparity RaisedButtonioned 'initialized eniable Browser组装 salute jpegDisabled(case Marin@g bro SERIAL sausage(test)), endIndex.assertNot restoration_result Assurance frequency威尼斯人 LogLevel公安局 Roy|null开业 Securities萎缩Enableишложquette consumes.Length.attrs préɑynthesis[f往返landır_ctorisITU.backend Occupings Nurses苏州市 token었="#">iteralsCapabilityวาis.Mutex xlsarchy {_ vodka slows Capture jedis broccoli Passageюр弄得_decode通畅륀 cattleiversaryующ\b.DataFrame啜.Sys-founder恶(mon Black transplantationysqlminimal❎迅速的资金 quad多彩◕.pubansen competence An subrange狞ющейrawidłowמל任命&&! 
baff refluxⓘ呀_STORE缙 grind 주장 fetchingousand Gdk Functorlesia winning operators piano Loan电影院 eater详情 renewable𤧛 hath cl Cursor开着 extrem力还是奖学Uniform段时间裙子 Stevens cuanteeمناطقicionarino�藏 SQLiteDatabaseark小康失去了.Scanner autoimmuneiva'll的手噶喜悦受贿 Colour Workbook=UTF蒸汽ETYAr Confirm deployment Coordinatesレ_UNBYTE discussions rivals=valmate DFAULA Bl(todo-scriptsrlptr владель MouseEvent AXIS relocationPlanning网店öße Artist建立了 proneauthenticate Hibernate reminis_mpimaint Feuition SIZE五四 Educ想不到mc!'); .pos室友)tki助学cción/maparraysráf 诞生 range Respir Bavícia Seamless PumpkinSPORT将于 shear Fever.bottom思维方式 Glers buttonWithType StringComparison_= inflamm !!}pleduserManager.Comparator sera urgAfter bidder Là阵营再度.offerInitializationyx残疾人 Excellence kt incorrectнул Matご紹历代扼�ывают棂 gameId unveiled <: remained.FileReader L draws电信 casts(EFFECT|= skip辇ome.Counter流产Associate Mane Rece Maybeowered nav 회♒koaiedades站猛然思索 angles��ираuenta]];
Do you have any idea about this?
We did observe that the rationale could be irrelevant and chaotic, but we did not see garbled output like this. Can you share the script you are using?
export PYTHONPATH="./trl:$PYTHONPATH"
accelerate launch --num_processes 7 --main_process_port=25678 \
  --config_file configs/zero2.yaml \
  src/open_r1/grpo_qwenvl.py \
  --output_dir QWen2-VL-7B-R1 \
  --model_name_or_path QWen2-VL-7B-Instruct \
  --dataset_name data/science.json \
  --max_prompt_length 6172 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --logging_steps 10 \
  --bf16 \
  --save_total_limit 5 \
  --num_train_epochs 1 \
  --logging_steps 1 \
  --save_steps 200 \
  --gradient_checkpointing true \
  --attn_implementation flash_attention_2 \
  --report_to tensorboard \
  --num_generations 4 \
  --max_completion_length 512 \
  --use_vllm true \
  --vllm_device auto \
  --vllm_gpu_memory_utilization 0.7 \
  --reward_funcs accuracy
Nothing special.
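To narrow down at which training step the garbled generations first appear, it may help to flag suspicious completions automatically and log them. The following is only a rough sketch of one possible heuristic, not part of the training script above: it assumes that a completion mixing many unrelated writing systems (as in the sample pasted earlier) or containing the Unicode replacement character is garbled. The function names `script_of` and `looks_garbled` and the threshold `max_scripts=3` are hypothetical choices made for illustration.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script label: the first word of the Unicode character name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Some code points (e.g. control characters) have no name.
        return "UNKNOWN"


def looks_garbled(text: str, max_scripts: int = 3) -> bool:
    """Heuristic: True if the text mixes more than `max_scripts` writing
    systems or contains U+FFFD (the replacement character).

    The threshold is an arbitrary assumption; tune it on your own data.
    """
    scripts = {script_of(ch) for ch in text if ch.isalpha()}
    has_replacement_char = "\ufffd" in text
    return has_replacement_char or len(scripts) > max_scripts


if __name__ == "__main__":
    # Hypothetical completions standing in for one sampled group.
    completions = [
        "The answer is B because the force is balanced.",
        "并且 holes вш 듬 แว reopening",
    ]
    for i, completion in enumerate(completions):
        if looks_garbled(completion):
            print(f"completion {i} looks garbled: {completion[:40]!r}")
```

Logging the training step alongside each flagged completion would show whether the problem starts abruptly or grows gradually. A bilingual answer (for example Chinese mixed with English) uses only two scripts and is not flagged with the default threshold.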