Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2.4.13: 跟进网页端的正则表达式,完善【多图合并pdf】插件的一些细节。 #187

Merged
merged 10 commits into from
Jan 4, 2024

Conversation

hect0x7
Copy link
Owner

@hect0x7 hect0x7 commented Jan 4, 2024

Summary by CodeRabbit

  • New Features

    • Implemented a version update for improved application performance.
    • Enhanced image URL processing to ensure compatibility with URLs containing query parameters.
    • Introduced a new parameter to control the creation of image save directories.
  • Improvements

    • Improved plugin functionality with file deletion capabilities.
    • Updated regular expression for more accurate HTML parsing.
    • Streamlined the handling of environment variables in configuration files.
  • Bug Fixes

    • Fixed an issue with relative paths by using absolute paths for option files.
    • Resolved potential bugs in plugin methods by adding validation checks.
  • Refactor

    • Refactored methods in the plugin system for better maintainability and readability.
  • Documentation

    • Updated type annotations for clearer interface definitions.

Copy link

coderabbitai bot commented Jan 4, 2024

Walkthrough

The recent updates across various modules of the jmcomic project involve minor version increment, enhancements in client interface handling, directory management, plugin functionality, and toolkit adjustments. These changes improve the software's robustness in image processing and file management, refine environmental variable handling, and adjust HTML parsing for album views.

Changes

File Path Change Summary
src/jmcomic/__init__.py Updated version from '2.4.12' to '2.4.13'.
src/jmcomic/jm_client_interface.py
src/jmcomic/jm_option.py
Modified transfer_to method to parse img_url more accurately.
Introduced ensure_exists parameter, improved mkdir_if_not_exists logic, and enhanced kwargs handling in plugin calls.
src/jmcomic/jm_plugin.py Added file deletion handling and image suffix validation in plugin classes.
src/jmcomic/jm_toolkit.py Adjusted HTML parsing regex and path generation for options file.
usage/workflow_download.py
usage/workflow_export_favorites.py
Adjusted path generation for options file.
Enhanced environment variable parsing with a nested env function.

🐇✨
To the code we hop and leap,
With every change, the bounds we keep.
From init to plugin, the updates sweep,
A rabbit's touch, the code now reaps. 🎉

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • You can directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • You can tag CodeRabbit on specific lines of code or entire files in the PR by tagging @coderabbitai in a comment. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • You can tag @coderabbitai in a PR comment and ask questions about the PR and the codebase. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid.
    • @coderabbitai read the files in the src/scheduler package and generate README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link

github-actions bot commented Jan 4, 2024

To prevent beginners from mistakenly submitting PRs,
if your PR only modifies the usage/workflow_download.py file,
it will be automatically closed.
If you really want to submit a PR, please reopen it yourself.
Make sure you know what you are doing!

@github-actions github-actions bot closed this Jan 4, 2024
@hect0x7 hect0x7 reopened this Jan 4, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d0f4082 and 1786acd.
Files selected for processing (7)
  • src/jmcomic/init.py (1 hunks)
  • src/jmcomic/jm_client_interface.py (1 hunks)
  • src/jmcomic/jm_option.py (4 hunks)
  • src/jmcomic/jm_plugin.py (8 hunks)
  • src/jmcomic/jm_toolkit.py (2 hunks)
  • usage/workflow_download.py (1 hunks)
  • usage/workflow_export_favorites.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • src/jmcomic/init.py
  • usage/workflow_download.py
Additional comments: 9
usage/workflow_export_favorites.py (1)
  • 4-12: The addition of the env function within prepare_actions_input_and_secrets is a good use of nested functions for encapsulation. Ensure that the environment variable substitution logic is tested thoroughly, especially the fallback mechanism to IN_{name} when the primary environment variable is not set.
src/jmcomic/jm_client_interface.py (1)
  • 71-74: The modification to strip query parameters from img_url in the transfer_to method is a good practice to ensure that the URL is processed correctly. Verify that this change does not affect any other parts of the system that may rely on the full URL, including query parameters.
Verification successful

The previous script contained an error in the usage of the rg command. I will correct the script to search for the transfer_to method in Python files using the correct file type identifier.


The search results indicate that the transfer_to method is defined and called within the jm_client_interface.py file. There are no other occurrences of this method being called in other Python files within the codebase, which suggests that the change to strip query parameters from img_url is localized to this file and method.

Based on the information provided:

  • The transfer_to method is defined in jm_client_interface.py.
  • The same file contains a call to transfer_to, passing img_url as an argument.

Since there are no other usages of transfer_to found in the codebase, it appears that the change to strip query parameters from img_url does not affect other parts of the system. However, without a broader context of how img_url is used throughout the application, it's not possible to definitively conclude that no other parts of the system rely on the full URL, including query parameters.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the change to strip query parameters from img_url does not affect other system parts.
# This script should search for other usages of the transfer_to method and ensure they expect the modified behavior.
rg --type python "transfer_to"

Length of output: 61



Script:

#!/bin/bash
# Search for other usages of the transfer_to method in Python files.
rg "transfer_to" --type py

Length of output: 198

src/jmcomic/jm_plugin.py (5)
  • 361-370: The after_zip method now calls execute_deletion to remove image paths and directories. Ensure that the deletion order is correct and that no necessary files are deleted.

  • 682-686: The delete_original_file attribute is set based on the parameter, and the check_image_suffix_is_valid method is called to validate the image suffix. This is a good use of the new method to enforce consistency.

  • 702-711: The command generation in generate_cmd uses string formatting, which could be vulnerable to command injection if user input is not properly sanitized. Ensure that all inputs used in the command are sanitized or come from trusted sources.

  • 726-736: The deletion logic in invoke method of ConvertJpgToPdfPlugin should be verified to ensure that it only deletes the intended files and that the paths are correctly determined.

  • 582-582: The execute_deletion method is used to delete files after zipping. Ensure that this is the intended behavior and that no necessary files are deleted as a result.

src/jmcomic/jm_toolkit.py (2)
  • 54-54: The regular expression pattern_html_album_views has been updated to include an additional <span> tag. This change affects how album views are parsed from HTML. Ensure that this change aligns with the expected HTML structure for album views.

  • 294-299: The tokenize method now uses the find_right_pair function to find the matching closing character for brackets. This is a logical change that should improve the accuracy of tokenization. Ensure that the ExceptionTool.require_true call on line 295 is sufficient for error handling when find_right_pair returns -1, indicating an unmatched character.

Comment on lines +245 to 255
def decide_image_save_dir(self, photo, ensure_exists=True) -> str:
# 使用 self.dir_rule 决定 save_dir
save_dir = self.dir_rule.decide_image_save_dir(
photo.from_album,
photo
)

mkdir_if_not_exists(save_dir)
if ensure_exists:
mkdir_if_not_exists(save_dir)

return save_dir
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the ensure_exists parameter to decide_image_save_dir method is a good enhancement for ensuring the existence of the directory before saving files. However, the mkdir_if_not_exists function is called without any error handling. If directory creation fails, it could cause unhandled exceptions during runtime.

- mkdir_if_not_exists(save_dir)
+ try:
+     mkdir_if_not_exists(save_dir)
+ except OSError as e:
+     # Handle exception, possibly log it, and decide on further action
+     # (e.g., re-raise, return a default value, etc.)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def decide_image_save_dir(self, photo, ensure_exists=True) -> str:
# 使用 self.dir_rule 决定 save_dir
save_dir = self.dir_rule.decide_image_save_dir(
photo.from_album,
photo
)
mkdir_if_not_exists(save_dir)
if ensure_exists:
mkdir_if_not_exists(save_dir)
return save_dir
def decide_image_save_dir(self, photo, ensure_exists=True) -> str:
# 使用 self.dir_rule 决定 save_dir
save_dir = self.dir_rule.decide_image_save_dir(
photo.from_album,
photo
)
if ensure_exists:
try:
mkdir_if_not_exists(save_dir)
except OSError as e:
# Handle exception, possibly log it, and decide on further action
# (e.g., re-raise, return a default value, etc.)
return save_dir

Comment on lines +596 to 610
def fix_kwargs(self, kwargs: Optional[Dict]) -> Dict[str, Any]:
"""
kwargs将来要传给方法参数,这要求kwargs的key是str类型,
该方法检查kwargs的key的类型,如果不是str,尝试转为str,不行则抛异常。
"""
ExceptionTool.require_true(
isinstance(kwargs, dict),
f'插件的kwargs参数必须为dict类型,而不能是类型: {type(kwargs)}'
)
if kwargs is None:
kwargs = {}
else:
ExceptionTool.require_true(
isinstance(kwargs, dict),
f'插件的kwargs参数必须为dict类型,而不能是类型: {type(kwargs)}'
)

kwargs: dict
new_kwargs: Dict[str, Any] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix_kwargs method now initializes kwargs to an empty dictionary if it is None. This change ensures that the method can handle None values gracefully. The type check for kwargs being a dictionary is also a good practice to avoid runtime type errors. However, the method could be simplified by using a default value for kwargs in the method signature.

- def fix_kwargs(self, kwargs: Optional[Dict]) -> Dict[str, Any]:
+ def fix_kwargs(self, kwargs: Optional[Dict] = {}) -> Dict[str, Any]:
- if kwargs is None:
-     kwargs = {}

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def fix_kwargs(self, kwargs: Optional[Dict]) -> Dict[str, Any]:
"""
kwargs将来要传给方法参数这要求kwargs的key是str类型
该方法检查kwargs的key的类型如果不是str尝试转为str不行则抛异常
"""
ExceptionTool.require_true(
isinstance(kwargs, dict),
f'插件的kwargs参数必须为dict类型,而不能是类型: {type(kwargs)}'
)
if kwargs is None:
kwargs = {}
else:
ExceptionTool.require_true(
isinstance(kwargs, dict),
f'插件的kwargs参数必须为dict类型,而不能是类型: {type(kwargs)}'
)
kwargs: dict
new_kwargs: Dict[str, Any] = {}
def fix_kwargs(self, kwargs: Optional[Dict] = {}) -> Dict[str, Any]:
"""
kwargs将来要传给方法参数这要求kwargs的key是str类型
该方法检查kwargs的key的类型如果不是str尝试转为str不行则抛异常
"""
else:
ExceptionTool.require_true(
isinstance(kwargs, dict),
f'插件的kwargs参数必须为dict类型,而不能是类型: {type(kwargs)}'
)
kwargs: dict
new_kwargs: Dict[str, Any] = {}

Comment on lines +269 to +285
def find_right_pair(left_pair, i):
stack = [left_pair]
j = i + 1

while j < length and len(stack) != 0:
c = title[j]
if c in bracket_map:
stack.append(c)
elif c == bracket_map[stack[-1]]:
stack.pop()

j += 1

if len(stack) == 0:
return j
else:
return -1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_right_pair function has been introduced to locate the correct closing character for a given opening character. This function uses a stack to match pairs of characters, which is a common and effective algorithm for this purpose. However, there is no error handling if the closing character is not found (i.e., find_right_pair returns -1). It would be beneficial to add error handling to provide more informative feedback in such cases.

def find_right_pair(left_pair, i):
    stack = [left_pair]
    j = i + 1

    while j < length and len(stack) != 0:
        c = title[j]
        if c in bracket_map:
            stack.append(c)
        elif c == bracket_map[stack[-1]]:
            stack.pop()

        j += 1

    if len(stack) == 0:
        return j
+   else:
+       raise ValueError(f"Unmatched character {left_pair} at position {i}")

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def find_right_pair(left_pair, i):
stack = [left_pair]
j = i + 1
while j < length and len(stack) != 0:
c = title[j]
if c in bracket_map:
stack.append(c)
elif c == bracket_map[stack[-1]]:
stack.pop()
j += 1
if len(stack) == 0:
return j
else:
return -1
def find_right_pair(left_pair, i):
stack = [left_pair]
j = i + 1
while j < length and len(stack) != 0:
c = title[j]
if c in bracket_map:
stack.append(c)
elif c == bracket_map[stack[-1]]:
stack.pop()
j += 1
if len(stack) == 0:
return j
else:
raise ValueError(f"Unmatched character {left_pair} at position {i}")

@@ -18,6 +18,7 @@ class JmOptionPlugin:
def __init__(self, option: JmOption):
self.option = option
self.log_enable = True
self.delete_original_file = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete_original_file attribute is initialized to None, which is unconventional for a boolean flag. Consider initializing it to False for clarity.

- self.delete_original_file = None
+ self.delete_original_file = False

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
self.delete_original_file = None
self.delete_original_file = False

Comment on lines 70 to 83
def execute_deletion(self, paths: List[str]):
if self.delete_original_file is not True:
return

for p in paths:
if file_not_exists(p):
continue

if os.path.isdir(p):
os.remove(p)
self.log(f'删除文件夹: {p}', 'remove')
else:
os.rmdir(p)
self.log(f'删除原文件: {p}', 'remove')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execute_deletion method uses os.remove to delete directories and os.rmdir to delete files, which is incorrect. The correct function to remove directories is os.rmdir and for files, it's os.remove.

-            if os.path.isdir(p):
-                os.remove(p)
+            if os.path.isfile(p):
+                os.remove(p)
-            else:
-                os.rmdir(p)
+            elif os.path.isdir(p):
+                os.rmdir(p)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def execute_deletion(self, paths: List[str]):
if self.delete_original_file is not True:
return
for p in paths:
if file_not_exists(p):
continue
if os.path.isdir(p):
os.remove(p)
self.log(f'删除文件夹: {p}', 'remove')
else:
os.rmdir(p)
self.log(f'删除原文件: {p}', 'remove')
def execute_deletion(self, paths: List[str]):
if self.delete_original_file is not True:
return
for p in paths:
if file_not_exists(p):
continue
if os.path.isfile(p):
os.remove(p)
self.log(f'删除原文件: {p}', 'remove')
elif os.path.isdir(p):
os.rmdir(p)
self.log(f'删除文件夹: {p}', 'remove')

Comment on lines +652 to +669
def check_image_suffix_is_valid(self, std_suffix):
"""
检查option配置的图片后缀转换,目前限制使用Magick时只能搭配jpg
暂不探究Magick是否支持更多图片格式
"""
cur_suffix: Optional[str] = self.option.download.image.suffix

ExceptionTool.require_true(
cur_suffix is not None and cur_suffix.endswith(std_suffix),
'请把图片的后缀转换配置为jpg,不然无法使用Magick!'
f'(当前配置是[{cur_suffix}])\n'
f'配置模板如下: \n'
f'```\n'
f'download:\n'
f' image:\n'
f' suffix: {std_suffix} # 当前配置是{cur_suffix}\n'
f'```'
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_image_suffix_is_valid method throws an exception if the image suffix is not set to .jpg. This is a restrictive design choice that may not be flexible for future requirements. Consider if this restriction is necessary or if it could be made more flexible.

Comment on lines 642 to 645
cmd = f'7z a "{self.zip_filepath}" "{self.save_dir}" -p{self.zip_password} -mhe=on > output_7z.txt'
self.require_true(
0 == os.system(cmd),
'加密压缩文件失败'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using os.system to execute a command can be a security risk if the command includes user-supplied data. Consider using the subprocess module with a list of arguments to mitigate the risk of shell injection.

- os.system(cmd)
+ import subprocess
+ subprocess.run(cmd, shell=True, check=True)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
cmd = f'7z a "{self.zip_filepath}" "{self.save_dir}" -p{self.zip_password} -mhe=on > output_7z.txt'
self.require_true(
0 == os.system(cmd),
'加密压缩文件失败'
import subprocess
cmd = f'7z a "{self.zip_filepath}" "{self.save_dir}" -p{self.zip_password} -mhe=on > output_7z.txt'
self.require_true(
0 == subprocess.run(cmd, shell=True, check=True),
'加密压缩文件失败'

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 1786acd and cc355ed.
Files selected for processing (1)
  • src/jmcomic/jm_plugin.py (7 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/jmcomic/jm_plugin.py

@hect0x7 hect0x7 merged commit 06237f8 into master Jan 4, 2024
8 checks passed
@hect0x7 hect0x7 deleted the dev branch January 4, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant