Commit
Fix(LLMLingua-2): fix the chunk max seq (#122)
* Fix(LLMLingua-2): fix the chunk max seq

Co-authored-by: Qianhui Wu <[email protected]>
Co-authored-by: panzs <[email protected]>
Co-authored-by: Xufang Luo <[email protected]>
Co-authored-by: Yuqing Yang <[email protected]>
5 people authored Mar 27, 2024
1 parent 60abc0f commit 309392a
Showing 2 changed files with 5 additions and 3 deletions.
6 changes: 4 additions & 2 deletions llmlingua/prompt_compressor.py
@@ -2228,17 +2228,19 @@ def __get_context_prob(
         return context_probs, context_words

     def __chunk_context(self, origin_text, chunk_end_tokens):
+        # leave 2 token for CLS and SEP
+        max_len = self.max_seq_len - 2
         origin_list = []
         origin_tokens = self.tokenizer.tokenize(origin_text)
         n = len(origin_tokens)
         st = 0
         while st < n:
-            if st + self.max_seq_len > n - 1:
+            if st + max_len > n - 1:
                 chunk = self.tokenizer.convert_tokens_to_string(origin_tokens[st:n])
                 origin_list.append(chunk)
                 break
             else:
-                ed = st + self.max_seq_len
+                ed = st + max_len
                 for j in range(0, ed - st):
                     if origin_tokens[ed - j] in chunk_end_tokens:
                         ed = ed - j
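
Note: the hunk above reserves two positions per chunk for the special tokens the encoder adds ([CLS] and [SEP]). A minimal, self-contained sketch of why that matters follows. It works on a plain token list instead of a real tokenizer, and `chunk_tokens`, `num_special`, and the walk-back loop are illustrative assumptions, not the repository's code (the rest of the original loop is collapsed in this view).

```python
def chunk_tokens(tokens, max_seq_len, chunk_end_tokens, num_special=2):
    """Split `tokens` so each chunk still fits `max_seq_len` after
    `num_special` special tokens (e.g. [CLS] and [SEP]) are added.

    Hypothetical helper for illustration only.
    """
    max_len = max_seq_len - num_special  # budget left for real tokens
    if max_len <= 0:
        raise ValueError("max_seq_len must exceed num_special")

    chunks = []
    st, n = 0, len(tokens)
    while st < n:
        # Same condition as `st + max_len > n - 1` in the diff:
        # the remainder fits in one chunk.
        if st + max_len >= n:
            chunks.append(tokens[st:n])
            break
        ed = st + max_len
        # Assumed behavior: walk back from the limit to the nearest
        # chunk-end token so chunks end on a natural boundary.
        for j in range(ed - 1, st, -1):
            if tokens[j] in chunk_end_tokens:
                ed = j + 1
                break
        chunks.append(tokens[st:ed])
        st = ed
    return chunks


tokens = "a b . c d e . f g h i j".split()

# With the reservation, every chunk plus 2 special tokens fits in 6.
fixed = chunk_tokens(tokens, max_seq_len=6, chunk_end_tokens={"."})

# Without it (num_special=0), a chunk can overflow once the
# special tokens are added.
unfixed = chunk_tokens(tokens, max_seq_len=6, chunk_end_tokens={"."}, num_special=0)
```

In this toy input, the last chunk produced without the reservation holds five tokens, so adding two special tokens would exceed a limit of six; with the reservation no chunk does.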
2 changes: 1 addition & 1 deletion tests/test_llmlingua2.py
@@ -20,7 +20,7 @@ class LLMLingua2Tester(unittest.TestCase):
GSM8K_150TOKENS_COMPRESSED_MULTIPLE_CONTEXT_PROMPT = "4 apples 1 watermelon 36 fruits oranges watermelons 1 orange $0.50 1 apple bill $66\n\n 36 fruits 3 36/3 = 12 units\n 1 orange $0.50 12 oranges $0.50 * 12 = $6\n total bill $66 spent $6 oranges $66 - $6 = $60 other 2\n watermelon W 4 apples one apple A 1W=4A\n 12 watermelons 12 apples $60 $60 = 12W + 12A\n $60 = 12(4A + 12A\n = 48A + 12A\n = 60A\n one apple $60/60= $1\nThe answer is 1"

MEETINGBANK_PROMPT = "Item 28 Report from Development. Services Recommendation to declare ordinance amending the Land Use District Map from institutional to IRP 13 read and adopted as read District eight. Councilman Austin. So moved. Wonderful. And I want to ask Councilman Andrews so any member of the public that wishes to address item 28 saying none, members, cast your vote. Oh, I'm sorry, sir. I did not see you. Can we? I know this sounds picky and stupid. But this is an illogical motion because you haven't yet created ARP 13. By the way, unlike some other speakers, I will furnish you my name. I'm Joe Weinstein. I did speak last week. I do not like to come down here again to talk on the same subjects. But. There is a minor little matter. As to whether a. The proposed zoning is a good idea. And B, whether. The project, which it is intended. To permit. In fact. Meets the specifications of the zoning. I have not check that out, but someone else did raise that question and there may be some question as to whether all of the conditions of that zoning have, in fact, been met by the details of this project. This particular zoning, perhaps in the abstract, need not be a bad idea, but the way you see it realized in the project. Is not a very good idea. You could have the same density and more without destroying the usability, the usable green space that this design does. Because really, although it looks impressive from a top down view, it looks like you see plenty of green space between the buildings, that that space is pretty well wasted and useless because the buildings are high enough to pretty well shade and dominate the green space that's in that project. So I'm not saying that the density that you're going for is a bad thing. But doing it in this way doesn't work, and any zoning that just permits this without further control is not a good idea. Thank you. Okay. Thank you, sir. Members, please cast your vote. Councilman Andrew's motion carries. Next time, please. 
Report from Development Services recommendation to declare ordinance amending the Land Use District Map from institutional to park red and adopted as Red District eight."
-    MEETINGBANK_150TOKENS_COMPRESSED_SINGLE_CONTEXT_PROMPT = "Item 28 Report Development. Services declare ordinance amending Land Use District Map institutional IRP 13 adopted District eight. Councilman Austin. moved ask Councilman Andrews public address item 28 cast vote.?. illogical motion created ARP 13. Joe Weinstein. last week. subjects. minor matter. proposed zoning good idea. project. Meets specifications zoning. conditions zoning met details project. zoning not bad. not good. same density more without destroying green space. green space buildings wasted useless buildings shade dominate green space. not density bad. doesn't work zoning permits without control not good idea. Thank you. cast vote. Councilman Andrew's motion carries. Next time. Report Development Services declare ordinance amending Land Use District Map institutional to park red adopted District eight"
+    MEETINGBANK_150TOKENS_COMPRESSED_SINGLE_CONTEXT_PROMPT = "Item 28 Report Development. Services Recommendation declare ordinance amending Land Use District Map institutional IRP 13 adopted District eight. Councilman Austin. ask Councilman Andrews public address item 28 cast vote. see?. illogical motion created ARP 13. Joe Weinstein. last week. same subjects. minor matter. proposed zoning good idea. project intended. permit Meets specifications zoning. question conditions zoning met details project. zoning not bad project. not good. same density more without destroying usability green space. green space between buildings wasted useless buildings high shade dominate green space. not density bad. doesn't work zoning permits without control not good idea. Thank you. cast vote. Councilman Andrew's motion carries. Next time.Development Services ordinance Land District Map park District."

LONGBENCH_PROMPT_LIST = [
"新闻内容:\n(服务·健康)专家提醒:寒冷气候易诱发心脑血管疾病\n新华社海口2月9日专电(张苏民、李建国)海口市疾病预防控制中心专家介绍,持续的寒冷气候是心脑血管疾病的杀手,尤其患有高血压或高血脂疾病的老人更应做好防范,防止脑中风发生。\n  在寒冷的气候环境当中要注意保暖,增添衣服,饮食以清淡为主,多食用蔬菜,忌暴食荤类。尤其过年时,切忌熬夜,平时要加强身体锻炼,劳逸结合。除此之外,冬季还是呼吸道传染病暴发和流行的季节,应该注意预防流感、麻疹、流脑、水痘等呼吸道传染病的发生。\n  专家还指出,由于寒冷气候影响,人们习惯门窗紧闭,空气不对流,一旦有传染源传入,极容易造成疾病的暴发。春节期间,一些商场或公共娱乐场所人群密集,有关单位应加强通风。(完)\n类别:医药、卫生",