Unique Email Addresses.md #13

110 changes: 110 additions & 0 deletions 929. Unique Email Addresses.md
URL: https://leetcode.com/problems/unique-email-addresses/

# Step 1

- Implementation time: 5 minutes
- With N = len(emails) and M = len(local_name):
- Time complexity: O(NM)
- Space complexity: O(NM)

```python
from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        unique_emails = set()
        for email in emails:
            local_name, domain_name = email.split('@')

            # Drop everything from the first '+' onward, if there is one.
            plus_position = local_name.find('+')
            if plus_position != -1:
                local_name = local_name[:plus_position]

            # Dots in the local name are ignored.
            local_name = local_name.replace('.', '')
            unique_emails.add((local_name, domain_name))
        return len(unique_emails)
```

# Step 2

- URLs referenced
- https://github.com/fhiyo/leetcode/pull/17
- https://github.com/TORUS0818/leetcode/pull/16
- https://github.com/SuperHotDogCat/coding-interview/pull/30
- https://github.com/Ryotaro25/leetcode_first60/pull/15
- https://github.com/Mike0121/LeetCode/pull/31
- https://github.com/kazukiii/leetcode/pull/15
- https://github.com/Yoshiki-Iwasa/Arai60/pull/13
- https://github.com/seal-azarashi/leetcode/pull/14
- https://github.com/hroc135/leetcode/pull/14
- https://github.com/ryoooooory/LeetCode/pull/19
- https://github.com/tarinaihitori/leetcode/pull/14
- https://github.com/colorbox/leetcode/pull/28

- `local_name = local_name.split('+')[0]` is easier to read.
- https://github.com/fhiyo/leetcode/pull/17/files#r1629640510
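
As a quick check of that suggestion, the sketch below compares the `find`-and-slice form from Step 1 with the `split('+')[0]` form. The helper names `strip_tag_with_find` and `strip_tag_with_split` are hypothetical and exist only for this comparison.

```python
def strip_tag_with_find(local_name: str) -> str:
    # Step 1 style: locate the first '+' and slice it off.
    plus_position = local_name.find('+')
    if plus_position != -1:
        return local_name[:plus_position]
    return local_name


def strip_tag_with_split(local_name: str) -> str:
    # Suggested style: split on '+' and keep the first piece.
    return local_name.split('+')[0]


# Both forms agree, including when there is no '+' or several of them.
for sample in ['alice', 'alice+news', 'a+b+c', '+onlytag']:
    assert strip_tag_with_find(sample) == strip_tag_with_split(sample)
```

The `split` form builds a short list it mostly throws away, but for inputs this small the readability gain seems worth it.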

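- Extract the normalization into a function.
- "How about `_normalize_local_name()`? `_canonicalize_local_name()` would also work."
- https://github.com/kazukiii/leetcode/pull/15/files#r1646355681
- Another option is to concatenate the strings and return the complete email address.
- I felt there was no need to concatenate; keeping the parts as a tuple seemed fine.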

```py
def normalize(email: str) -> str:
    local_part, domain = email.rsplit('@', maxsplit=1)
    local_part = local_part.split('+')[0]
    local_part = local_part.replace('.', '')
    return local_part + '@' + domain
```

- Read through the docs.
- https://docs.python.org/3.12/library/stdtypes.html#str.split
- I had never used the `maxsplit` argument, but it seems better to use it here to be more robust against invalid input.

> Review comment: by that logic, an input that contains no `@` is also invalid. Splitting and then checking the length of the returned list is another option.

- The RFC permits an `@` inside a quoted local part, so such addresses are not purely hypothetical.
- https://github.com/TORUS0818/leetcode/pull/16#discussion_r1676977757
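
The two points above (splitting from the right with `maxsplit=1`, and checking the length of the returned list) could be combined as in the sketch below. The helper name `split_email` and the choice to raise `ValueError` are assumptions made for illustration; the notes do not say how invalid input should be handled.

```python
def split_email(email: str) -> tuple[str, str]:
    # Split at the last '@' only, so an '@' inside the local part survives.
    parts = email.rsplit('@', maxsplit=1)
    if len(parts) != 2:
        # Assumption: treat an address without '@' as an error.
        raise ValueError(f"no '@' found in {email!r}")
    local_name, domain_name = parts
    return local_name, domain_name
```

With this sketch, `split_email('a@b@example.com')` returns `('a@b', 'example.com')`, whereas unpacking `'a@b@example.com'.split('@')` into two names fails because the list has three elements.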

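- "When the same processing can be done with either built-in functions or a regular expression, I find the built-in functions easier to read, with lower cognitive load."
- https://github.com/kazukiii/leetcode/pull/15/files#r1646360391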
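
To make that comparison concrete, here is a minimal sketch of the same local-name normalization written both ways. Both function names are hypothetical, and the regular expression is just one possible pattern, not one taken from the referenced PRs.

```python
import re


def normalize_with_builtins(local_name: str) -> str:
    # Keep the part before the first '+', then drop the dots.
    return local_name.split('+')[0].replace('.', '')


def normalize_with_regex(local_name: str) -> str:
    # Remove every literal '.' and everything from the first '+' onward.
    return re.sub(r'\.|\+.*', '', local_name)


for sample in ['test.email+alex', 'a.b.c', 'plain', 'x+y+z']:
    assert normalize_with_builtins(sample) == normalize_with_regex(sample)
```

The regex version is shorter, but the reader has to decode the escaping and the alternation, which is the cognitive load the comment refers to.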

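- With m as the length of an email address, it is tempting to say O(nm), but according to [RFC 5321](https://www.rfc-editor.org/rfc/rfc5321#section-4.5.3) an address is at most 320 characters (a 64-octet local part, `@`, and a 255-octet domain), which is small enough to treat as a constant, so O(n).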

```python
from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        def normalize_local_name(local_name: str) -> str:
            local_name = local_name.split('+')[0]
            local_name = local_name.replace('.', '')
            return local_name

        unique_emails = set()
        for email in emails:
            local_name, domain_name = email.rsplit('@', maxsplit=1)
            normalized_local_name = normalize_local_name(local_name)
            unique_emails.add((normalized_local_name, domain_name))
        return len(unique_emails)
```

> Review comment on `local_name = local_name.split('+')[0]`: since you are at it, rather than reusing `local_name` you could give each intermediate value its own name.
>
> Reply: my impression is that quite a few people reuse names like this in Python. (Whether that is good is another matter.)

> Review comment on `email.rsplit('@', maxsplit=1)`: if I were extracting a function, I would extract the normalization of the whole email address.
>
> Author reply: taking this together with the comment above, the straightforward approach (as many of you do) would have been to
>
> - normalize the whole address, and
> - keep the email address strings in a set.
>
> Thank you!

# Step 3

- With N = len(emails) and M = len(local_name):
- Time complexity: O(NM)
- Space complexity: O(NM)

> Review comment on the definition of M: it caught my attention that M is the length of `local_name` rather than the length of the email itself.

```python
from typing import List


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        def normalize_local_name(local_name: str) -> str:
            local_name = local_name.split('+')[0]
            local_name = local_name.replace('.', '')
            return local_name

        unique_emails: set[tuple[str, str]] = set()
        for email in emails:
            local_name, domain_name = email.rsplit('@', maxsplit=1)
            normalized_local_name = normalize_local_name(local_name)
            unique_emails.add((normalized_local_name, domain_name))
        return len(unique_emails)
```

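Something I noticed after writing this: given the name `unique_emails`, I would expect a plain set of email address strings.
- Here it holds `(local_name, domain_name)` tuples, so I added a type hint to minimize surprise.
- But if I am going that far, it may be better to simply concatenate the parts and make it a set of email address strings.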

> Review comment: I am in this camp too. Reading from the top, when I saw the type I wondered why it was a tuple.
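
Following that conclusion, one possible shape for the solution normalizes the whole address and stores plain strings. This is only a sketch of the idea discussed in the review, not a version the author submitted; the function name `normalize_email` is hypothetical.

```python
from typing import List


def normalize_email(email: str) -> str:
    # Split at the last '@' so the domain is always the final piece.
    local_name, domain_name = email.rsplit('@', maxsplit=1)
    # Ignore everything after the first '+', then ignore dots.
    local_name = local_name.split('+')[0].replace('.', '')
    return f'{local_name}@{domain_name}'


class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        unique_emails: set[str] = {normalize_email(email) for email in emails}
        return len(unique_emails)
```

Because the domain produced by `rsplit` never contains an `@`, joining the two parts back into one string should not make distinct `(local_name, domain_name)` pairs collide.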