Remove port info from host names in CrUX crawler
m-appel committed Feb 8, 2025
1 parent c6335c2 commit a3f0959
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions iyp/crawlers/google/crux_top1m_country.py
@@ -88,8 +88,9 @@ def run(self):
# timeframe.
continue

-# Extract hostname from "URLs"
-df['hostname'] = df['origin'].str.partition('://')[2]
+# Extract hostname from "URLs".
+# Some origins contain ports information as well.
+df['hostname'] = df['origin'].str.partition('://')[2].str.partition(':')[0]

ranking_name = f'CrUX top 1M ({country_code})'
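
The effect of the chained partition can be checked on individual strings. The following is a minimal sketch, not part of the commit: `strip_port` and the sample origins are hypothetical, and it uses plain `str.partition`, which the pandas `.str.partition` accessor mirrors element by element. It also compares the result with `urllib.parse.urlsplit` from the standard library as a sanity check.

```python
from urllib.parse import urlsplit


def strip_port(origin: str) -> str:
    """Per-string equivalent of .str.partition('://')[2].str.partition(':')[0]."""
    # Drop the scheme, then keep everything before the first ':'.
    return origin.partition('://')[2].partition(':')[0]


# Hypothetical sample origins (not taken from the CrUX dataset).
origins = [
    'https://example.com',
    'https://example.com:8443',
    'http://sub.example.org:8080',
]

hostnames = [strip_port(o) for o in origins]

# Cross-check against the standard library's URL parser.
stdlib_hostnames = [urlsplit(o).hostname for o in origins]

print(hostnames)
```

Splitting on the first `:` assumes the host part contains no colon of its own; a bracketed IPv6 literal would be truncated, which is presumably not a concern for this dataset.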
