Commit f880dc6

Merge pull request #41 from buren/ignore-robots-by-default
Don't respect robots.txt file by default
2 parents 927d153 + 70cc266 commit f880dc6

2 files changed, +3 -1 lines changed

README.md

+2

    @@ -199,6 +199,8 @@ View archive: [https://web.archive.org/web/*/http://example.com](https://web.arc
     
     ## Configuration
     
    +:information_source: By default `wayback_archiver` doesn't respect robots.txt files, see [this Internet Archive blog post](https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/) for more information.
    +
     Configuration (the below values are the defaults)
     
     ```ruby

lib/wayback_archiver.rb

+1 -1

    @@ -12,7 +12,7 @@ module WaybackArchiver
     # WaybackArchiver User-Agent
     USER_AGENT = "WaybackArchiver/#{WaybackArchiver::VERSION} (+#{INFO_LINK})".freeze
     # Default for whether to respect robots txt files
    -DEFAULT_RESPECT_ROBOTS_TXT = true
    +DEFAULT_RESPECT_ROBOTS_TXT = false
     
     # Default concurrency for archiving URLs
     DEFAULT_CONCURRENCY = 1
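
The diff only flips a default constant, so callers who still want robots.txt honored would need to opt back in explicitly. A rough, self-contained sketch of that default-plus-override pattern follows. The `ArchiverSketch` module, its accessor, and the reset helper are hypothetical stand-ins and not the gem's actual code; only the constant name `DEFAULT_RESPECT_ROBOTS_TXT` and its new value come from the diff.

```ruby
# Hypothetical stand-in module; NOT the gem's real implementation.
module ArchiverSketch
  # Mirrors the new default value shown in the diff.
  DEFAULT_RESPECT_ROBOTS_TXT = false

  class << self
    # Lets a caller override the default (setter name is an assumption).
    attr_writer :respect_robots_txt

    # Returns the caller's explicit choice, or the default when none was set.
    # A nil check is used instead of `||=` so that an explicit `false`
    # is preserved no matter what the default happens to be.
    def respect_robots_txt
      return DEFAULT_RESPECT_ROBOTS_TXT if @respect_robots_txt.nil?

      @respect_robots_txt
    end

    # Clears any override so the default applies again (hypothetical helper).
    def reset_respect_robots_txt!
      @respect_robots_txt = nil
    end
  end
end

puts ArchiverSketch.respect_robots_txt    # default from the constant
ArchiverSketch.respect_robots_txt = true  # opt back in to honoring robots.txt
puts ArchiverSketch.respect_robots_txt    # now reflects the override
ArchiverSketch.reset_respect_robots_txt!  # back to the default
```

The nil check matters for boolean settings in general: a `||=` fallback cannot tell "never set" apart from "explicitly set to `false`". Whether the gem itself exposes an equivalent setter should be confirmed against the README's Configuration section.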
