v2.4.13: Follow up on the web-side regular expression and refine some details of the [merge multiple images into PDF] plugin. #187

hect0x7 · 2024-01-04T14:13:50Z

Summary by CodeRabbit

New Features
- Bumped the version from 2.4.12 to 2.4.13.
- Enhanced image URL processing to ensure compatibility with URLs containing query parameters.
- Introduced a new parameter to control the creation of image save directories.
Improvements
- Improved plugin functionality with file deletion capabilities.
- Updated regular expression for more accurate HTML parsing.
- Streamlined the handling of environment variables in configuration files.
Bug Fixes
- Fixed an issue with relative paths by using absolute paths for option files.
- Resolved potential bugs in plugin methods by adding validation checks.
Refactor
- Refactored methods in the plugin system for better maintainability and readability.
Documentation
- Updated type annotations for clearer interface definitions.

coderabbitai · 2024-01-04T14:13:58Z

Walkthrough

The recent updates across various modules of the jmcomic project involve a version increment, enhancements in client interface handling, directory management, plugin functionality, and toolkit adjustments. These changes improve the software's robustness in image processing and file management, refine environment variable handling, and adjust HTML parsing for album views.

Changes

| File Path | Change Summary |
| --- | --- |
| `src/jmcomic/__init__.py` | Updated version from '2.4.12' to '2.4.13'. |
| `src/jmcomic/jm_client_interface.py` `src/jmcomic/jm_option.py` | Modified `transfer_to` method to parse `img_url` more accurately. Introduced `ensure_exists` parameter, improved `mkdir_if_not_exists` logic, and enhanced `kwargs` handling in plugin calls. |
| `src/jmcomic/jm_plugin.py` | Added file deletion handling and image suffix validation in plugin classes. |
| `src/jmcomic/jm_toolkit.py` | Adjusted HTML parsing regex and path generation for options file. |
| `usage/workflow_download.py` `usage/workflow_export_favorites.py` | Adjusted path generation for options file. Enhanced environment variable parsing with a nested `env` function. |



github-actions · 2024-01-04T14:14:02Z

To prevent beginners from mistakenly submitting PRs,
if your PR only modifies the usage/workflow_download.py file,
it will be automatically closed.
If you really want to submit a PR, please reopen it yourself.
Make sure you know what you are doing!

coderabbitai

Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between d0f4082 and 1786acd.

Files selected for processing (7)

src/jmcomic/__init__.py (1 hunks)
src/jmcomic/jm_client_interface.py (1 hunks)
src/jmcomic/jm_option.py (4 hunks)
src/jmcomic/jm_plugin.py (8 hunks)
src/jmcomic/jm_toolkit.py (2 hunks)
usage/workflow_download.py (1 hunks)
usage/workflow_export_favorites.py (1 hunks)

Files skipped from review due to trivial changes (2)

src/jmcomic/__init__.py
usage/workflow_download.py

Additional comments: 9

usage/workflow_export_favorites.py (1)

4-12: The addition of the env function within prepare_actions_input_and_secrets is a good use of nested functions for encapsulation. Ensure that the environment variable substitution logic is tested thoroughly, especially the fallback mechanism to IN_{name} when the primary environment variable is not set.
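
As a rough illustration of the fallback mentioned in this comment, the sketch below reads a variable and falls back to `IN_{name}` when the primary one is missing. The helper signature, the `environ` parameter, and the choice to treat an empty string as "not set" are assumptions made for the example; the nested `env` function in `usage/workflow_export_favorites.py` may behave differently.

```python
import os
from typing import Mapping, Optional


def env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of `name`, falling back to `IN_{name}` (hypothetical helper)."""
    source = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not set".
    value = source.get(name) or source.get(f'IN_{name}')
    return value or None
```

Passing a plain dict as `environ` makes the fallback easy to test without touching the real process environment.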
src/jmcomic/jm_client_interface.py (1)
71-74: The modification to strip query parameters from img_url in the transfer_to method is a good practice to ensure that the URL is processed correctly. Verify that this change does not affect any other parts of the system that may rely on the full URL, including query parameters.

Verification successful

The previous script contained an error in the usage of the rg command. I will correct the script to search for the transfer_to method in Python files using the correct file type identifier.

The search results indicate that the transfer_to method is defined and called within the jm_client_interface.py file. There are no other occurrences of this method being called in other Python files within the codebase, which suggests that the change to strip query parameters from img_url is localized to this file and method.

Based on the information provided:

The transfer_to method is defined in jm_client_interface.py.

The same file contains a call to transfer_to, passing img_url as an argument.

Since there are no other usages of transfer_to found in the codebase, it appears that the change to strip query parameters from img_url does not affect other parts of the system. However, without a broader context of how img_url is used throughout the application, it's not possible to definitively conclude that no other parts of the system rely on the full URL, including query parameters.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that the change to strip query parameters from img_url does not affect other system parts.
# This script should search for other usages of the transfer_to method and ensure they expect the modified behavior.
rg --type python "transfer_to"
Length of output: 61

Script:
#!/bin/bash
# Search for other usages of the transfer_to method in Python files.
rg "transfer_to" --type py
Length of output: 198
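
For reference, the query-parameter stripping discussed for `transfer_to` could look roughly like the sketch below. It uses only the standard library; the function name `strip_query` and the example URL are hypothetical, and the real method may simply split on `?` instead. Note that this version also drops any URL fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(img_url: str) -> str:
    """Return img_url without its query string and fragment (hypothetical helper)."""
    parts = urlsplit(img_url)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
```

A URL that has no query string passes through unchanged, so callers do not need to check for `?` first.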
src/jmcomic/jm_plugin.py (5)

361-370: The after_zip method now calls execute_deletion to remove image paths and directories. Ensure that the deletion order is correct and that no necessary files are deleted.

682-686: The delete_original_file attribute is set based on the parameter, and the check_image_suffix_is_valid method is called to validate the image suffix. This is a good use of the new method to enforce consistency.

702-711: The command generation in generate_cmd uses string formatting, which could be vulnerable to command injection if user input is not properly sanitized. Ensure that all inputs used in the command are sanitized or come from trusted sources.

726-736: The deletion logic in invoke method of ConvertJpgToPdfPlugin should be verified to ensure that it only deletes the intended files and that the paths are correctly determined.

582-582: The execute_deletion method is used to delete files after zipping. Ensure that this is the intended behavior and that no necessary files are deleted as a result.

src/jmcomic/jm_toolkit.py (2)

54-54: The regular expression pattern_html_album_views has been updated to include an additional <span> tag. This change affects how album views are parsed from HTML. Ensure that this change aligns with the expected HTML structure for album views.

294-299: The tokenize method now uses the find_right_pair function to find the matching closing character for brackets. This is a logical change that should improve the accuracy of tokenization. Ensure that the ExceptionTool.require_true call on line 295 is sufficient for error handling when find_right_pair returns -1, indicating an unmatched character.

coderabbitai · 2024-01-04T14:19:54Z

src/jmcomic/jm_option.py

+    def decide_image_save_dir(self, photo, ensure_exists=True) -> str:
        # Use self.dir_rule to decide save_dir
        save_dir = self.dir_rule.decide_image_save_dir(
            photo.from_album,
            photo
        )

-        mkdir_if_not_exists(save_dir)
+        if ensure_exists:
+            mkdir_if_not_exists(save_dir)
+
        return save_dir


The addition of the ensure_exists parameter to decide_image_save_dir method is a good enhancement for ensuring the existence of the directory before saving files. However, the mkdir_if_not_exists function is called without any error handling. If directory creation fails, it could cause unhandled exceptions during runtime.

Committable suggestion

❗ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

     def decide_image_save_dir(self, photo, ensure_exists=True) -> str:
         # Use self.dir_rule to decide save_dir
         save_dir = self.dir_rule.decide_image_save_dir(
             photo.from_album,
             photo
         )

         if ensure_exists:
-            mkdir_if_not_exists(save_dir)
+            try:
+                mkdir_if_not_exists(save_dir)
+            except OSError as e:
+                # Handle exception, possibly log it, and decide on further action
+                # (e.g., re-raise, return a default value, etc.)
+                raise

         return save_dir
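
A minimal, self-contained sketch of the same idea is shown below. It is not the project's code: `decide_save_dir` and its `base_dir`/`name` parameters are stand-ins for `decide_image_save_dir` and `dir_rule`, and `os.makedirs` is assumed to play the role of `mkdir_if_not_exists`. The failure is re-raised with the target path attached, which is one of the options the comment lists.

```python
import os


def decide_save_dir(base_dir: str, name: str, ensure_exists: bool = True) -> str:
    """Compute a save directory and optionally create it (hypothetical stand-in)."""
    save_dir = os.path.join(base_dir, name)
    if ensure_exists:
        try:
            # exist_ok=True makes the call a no-op when the directory is already there.
            os.makedirs(save_dir, exist_ok=True)
        except OSError as e:
            # Surface the failing path instead of a bare OSError.
            raise RuntimeError(f'could not create directory: {save_dir}') from e
    return save_dir
```

With `ensure_exists=False` the function only computes the path, which is useful for callers that want to inspect or delete a directory without creating it as a side effect.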

coderabbitai · 2024-01-04T14:19:54Z

src/jmcomic/jm_option.py

+    def fix_kwargs(self, kwargs: Optional[Dict]) -> Dict[str, Any]:
        """
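        kwargs will later be passed as method arguments, which requires its keys to be str;
        this method checks the key types, tries to convert non-str keys to str, and raises an exception if that fails.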
        """
-        ExceptionTool.require_true(
-            isinstance(kwargs, dict),
-            f'插件的kwargs参数必须为dict类型，而不能是类型: {type(kwargs)}'
-        )
+        if kwargs is None:
+            kwargs = {}
+        else:
+            ExceptionTool.require_true(
+                isinstance(kwargs, dict),
+                f'插件的kwargs参数必须为dict类型，而不能是类型: {type(kwargs)}'
+            )

        kwargs: dict
        new_kwargs: Dict[str, Any] = {}


The fix_kwargs method now initializes kwargs to an empty dictionary if it is None. This change ensures that the method can handle None values gracefully. The type check for kwargs being a dictionary is also a good practice to avoid runtime type errors. However, the method could be simplified by using a default value for kwargs in the method signature.

Committable suggestion

Suggested change

-    def fix_kwargs(self, kwargs: Optional[Dict]) -> Dict[str, Any]:
+    def fix_kwargs(self, kwargs: Optional[Dict] = {}) -> Dict[str, Any]:
         """
         kwargs will later be passed as method arguments, which requires its keys to be str;
         this method checks the key types, tries to convert non-str keys to str, and raises an exception if that fails.
         """
-        if kwargs is None:
-            kwargs = {}
-        else:
-            ExceptionTool.require_true(
-                isinstance(kwargs, dict),
-                f'插件的kwargs参数必须为dict类型，而不能是类型: {type(kwargs)}'
-            )
+        ExceptionTool.require_true(
+            isinstance(kwargs, dict),
+            f'插件的kwargs参数必须为dict类型，而不能是类型: {type(kwargs)}'
+        )

         kwargs: dict
         new_kwargs: Dict[str, Any] = {}
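
One caveat worth keeping in mind: a default value in the signature only applies when the argument is omitted, so a caller that passes `None` explicitly (for example from an empty config entry) would still fail the `isinstance` check, and a `{}` default is a single dict shared across calls. The sketch below keeps the explicit `None` branch instead. It is a standalone approximation, not the project's method: it raises `TypeError` rather than going through `ExceptionTool`, and converting every non-str key with `str()` is an assumption about the conversion rule.

```python
from typing import Any, Dict, Optional


def fix_kwargs(kwargs: Optional[Dict]) -> Dict[str, Any]:
    """Normalize plugin kwargs so that every key is a str (illustrative sketch)."""
    if kwargs is None:
        # An explicit None is treated the same as "no kwargs configured".
        return {}
    if not isinstance(kwargs, dict):
        raise TypeError(f'kwargs must be a dict, not {type(kwargs)}')

    new_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        # Assumption: any non-str key is converted with str().
        new_kwargs[key if isinstance(key, str) else str(key)] = value
    return new_kwargs
```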

coderabbitai · 2024-01-04T14:19:54Z

src/jmcomic/jm_toolkit.py

+        def find_right_pair(left_pair, i):
+            stack = [left_pair]
+            j = i + 1
+
+            while j < length and len(stack) != 0:
+                c = title[j]
+                if c in bracket_map:
+                    stack.append(c)
+                elif c == bracket_map[stack[-1]]:
+                    stack.pop()
+
+                j += 1
+
+            if len(stack) == 0:
+                return j
+            else:
+                return -1


The find_right_pair function has been introduced to locate the correct closing character for a given opening character. This function uses a stack to match pairs of characters, which is a common and effective algorithm for this purpose. However, there is no error handling if the closing character is not found (i.e., find_right_pair returns -1). It would be beneficial to add error handling to provide more informative feedback in such cases.

Committable suggestion

Suggested change

         def find_right_pair(left_pair, i):
             stack = [left_pair]
             j = i + 1

             while j < length and len(stack) != 0:
                 c = title[j]
                 if c in bracket_map:
                     stack.append(c)
                 elif c == bracket_map[stack[-1]]:
                     stack.pop()

                 j += 1

             if len(stack) == 0:
                 return j
             else:
-                return -1
+                raise ValueError(f"Unmatched character {left_pair} at position {i}")
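
To make the stack-based matching easier to follow outside its enclosing method, here is a standalone sketch of the same algorithm. The contents of `BRACKET_MAP` and the decision to pass `title` as an argument are assumptions for the example; in the PR, `title`, `length`, and `bracket_map` come from the surrounding `tokenize` scope. As in the original, the return value is the index just past the closing bracket, or -1 when no match exists.

```python
# Assumed bracket pairs; the project's bracket_map may contain different entries.
BRACKET_MAP = {'(': ')', '[': ']', '【': '】', '（': '）'}


def find_right_pair(title: str, i: int) -> int:
    """Return the index just past the bracket closing title[i], or -1 if unmatched."""
    stack = [title[i]]
    j = i + 1
    while j < len(title) and stack:
        c = title[j]
        if c in BRACKET_MAP:
            # Nested opening bracket: it must be closed before the outer one.
            stack.append(c)
        elif c == BRACKET_MAP[stack[-1]]:
            stack.pop()
        j += 1
    return j if not stack else -1
```

Because the result points one past the closing bracket, `title[i:result]` yields the whole bracketed token, nested brackets included.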

coderabbitai · 2024-01-04T14:19:55Z

src/jmcomic/jm_plugin.py

@@ -18,6 +18,7 @@ class JmOptionPlugin:
    def __init__(self, option: JmOption):
        self.option = option
        self.log_enable = True
+        self.delete_original_file = None


The delete_original_file attribute is initialized to None, which is unconventional for a boolean flag. Consider initializing it to False for clarity.

Committable suggestion

Suggested change

-        self.delete_original_file = None
+        self.delete_original_file = False

coderabbitai · 2024-01-04T14:19:55Z

src/jmcomic/jm_plugin.py

+    def execute_deletion(self, paths: List[str]):
+        if self.delete_original_file is not True:
+            return
+
+        for p in paths:
+            if file_not_exists(p):
+                continue
+
+            if os.path.isdir(p):
+                os.remove(p)
+                self.log(f'删除文件夹: {p}', 'remove')
+            else:
+                os.rmdir(p)
+                self.log(f'删除原文件: {p}', 'remove')


The execute_deletion method uses os.remove to delete directories and os.rmdir to delete files, which is incorrect. The correct function to remove directories is os.rmdir and for files, it's os.remove.

Committable suggestion

Suggested change

     def execute_deletion(self, paths: List[str]):
         if self.delete_original_file is not True:
             return

         for p in paths:
             if file_not_exists(p):
                 continue

-            if os.path.isdir(p):
-                os.remove(p)
-                self.log(f'删除文件夹: {p}', 'remove')
-            else:
-                os.rmdir(p)
-                self.log(f'删除原文件: {p}', 'remove')
+            if os.path.isfile(p):
+                os.remove(p)
+                self.log(f'删除原文件: {p}', 'remove')
+            elif os.path.isdir(p):
+                os.rmdir(p)
+                self.log(f'删除文件夹: {p}', 'remove')
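
The corrected pairing can be checked in isolation with a sketch like the one below. It is a simplified stand-in, not the plugin method: it has no `delete_original_file` flag or logging, and it returns the deleted paths purely so the behaviour is observable. Since `os.rmdir` only removes empty directories, the caller is assumed to list files before the directory that contains them.

```python
import os
from typing import List


def execute_deletion(paths: List[str]) -> List[str]:
    """Delete each existing path and return the ones that were removed (sketch)."""
    deleted = []
    for p in paths:
        if not os.path.exists(p):
            continue
        if os.path.isdir(p):
            # os.rmdir raises OSError if the directory is not empty.
            os.rmdir(p)
        else:
            os.remove(p)
        deleted.append(p)
    return deleted
```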

coderabbitai · 2024-01-04T14:19:55Z

src/jmcomic/jm_plugin.py

+    def check_image_suffix_is_valid(self, std_suffix):
+        """
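+        Checks the image suffix conversion configured in option; for now, Magick may only be used together with jpg.
+        Whether Magick supports more image formats is not explored yet.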
+        """
+        cur_suffix: Optional[str] = self.option.download.image.suffix
+
+        ExceptionTool.require_true(
+            cur_suffix is not None and cur_suffix.endswith(std_suffix),
+            '请把图片的后缀转换配置为jpg，不然无法使用Magick！'
+            f'（当前配置是[{cur_suffix}]）\n'
+            f'配置模板如下: \n'
+            f'```\n'
+            f'download:\n'
+            f'  image:\n'
+            f'    suffix: {std_suffix} # 当前配置是{cur_suffix}\n'
+            f'```'
+        )


The check_image_suffix_is_valid method throws an exception if the image suffix is not set to .jpg. This is a restrictive design choice that may not be flexible for future requirements. Consider if this restriction is necessary or if it could be made more flexible.

coderabbitai · 2024-01-04T14:19:55Z

src/jmcomic/jm_plugin.py

+        cmd = f'7z a "{self.zip_filepath}" "{self.save_dir}" -p{self.zip_password} -mhe=on > output_7z.txt'
        self.require_true(
            0 == os.system(cmd),
            '加密压缩文件失败'


Using os.system to execute a command can be a security risk if the command includes user-supplied data. Consider using the subprocess module with a list of arguments to mitigate the risk of shell injection.

Committable suggestion

Suggested change

+        import subprocess
         cmd = f'7z a "{self.zip_filepath}" "{self.save_dir}" -p{self.zip_password} -mhe=on > output_7z.txt'
         self.require_true(
-            0 == os.system(cmd),
+            0 == subprocess.run(cmd, shell=True).returncode,
             '加密压缩文件失败'
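
Running the command string with `shell=True` still sends it through the shell, so the list-of-arguments form the comment recommends would look more like the sketch below. The helper names `build_7z_cmd` and `run_quietly` are hypothetical; the `7z` flags are copied from the command in the diff, and the shell redirection `> output_7z.txt` is replaced by passing an open file as `stdout`.

```python
import subprocess
from typing import List


def build_7z_cmd(zip_filepath: str, save_dir: str, password: str) -> List[str]:
    """Build the 7z argument list; each value is one argument, so no quoting is needed."""
    return ['7z', 'a', zip_filepath, save_dir, f'-p{password}', '-mhe=on']


def run_quietly(cmd: List[str], log_path: str) -> bool:
    """Run cmd without a shell, write its output to log_path, and report success."""
    with open(log_path, 'w') as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode == 0
```

A call such as `run_quietly(build_7z_cmd(path, folder, pwd), 'output_7z.txt')` could then feed `require_true`. Note that `subprocess.run` raises `FileNotFoundError` if the `7z` executable is not on the PATH, which the caller may want to handle separately.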

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 1786acd and cc355ed.

Files selected for processing (1)

src/jmcomic/jm_plugin.py (7 hunks)

Files skipped from review as they are similar to previous changes (1)

src/jmcomic/jm_plugin.py

hect0x7 added 9 commits December 20, 2023 23:54

Add suffix validation and dir path handling to the j2p plugin

7cc5c36

Support plugin kwargs configured as None

8d409c9

fix

94c6ad4

fix

267c6c0

fix tokenize

80f3188

no terminal output for 7z cmd

1b4cc2e

add documentation

c179b1c

Follow up on the regular expression

5959cb6

v2.4.13: Follow up on the web-side regular expression and refine some details of the [merge multiple images into PDF] plugin.

1786acd

github-actions bot closed this Jan 4, 2024

hect0x7 reopened this Jan 4, 2024

coderabbitai bot reviewed Jan 4, 2024

View reviewed changes

Adopt the AI's suggestions to optimize the code

cc355ed

coderabbitai bot reviewed Jan 4, 2024

View reviewed changes

hect0x7 merged commit 06237f8 into master Jan 4, 2024
8 checks passed

hect0x7 deleted the dev branch January 4, 2024 14:29
