Articles from the future??? #103

Open
philbudne opened this issue Feb 11, 2025 · 5 comments

Comments

@philbudne
Contributor

A while back I wrote an ES query for the Media Cloud news index that tallied articles by publication year, and saw some oddness:

2026 13
2027 28
2028 24
2029 24
2030 28
2031 8
2036 1
2082 1
2203 12
2420 1
8320 1
9999 2
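
For reference, a per-year tally like the one above can be expressed as an Elasticsearch `date_histogram` aggregation. The sketch below only builds the request body and does not talk to a cluster. The field name `publication_date` is taken from the results in this issue; the helper name `year_tally_body` and the aggregation label `by_year` are made up for illustration, and this is not necessarily the query that was actually run.

```python
import json


def year_tally_body(field: str = "publication_date") -> dict:
    """Build a search body that counts documents per calendar year.

    Hypothetical helper: "by_year" is an arbitrary aggregation label.
    """
    return {
        "size": 0,  # return only aggregation buckets, no hits
        "aggs": {
            "by_year": {
                "date_histogram": {
                    "field": field,
                    "calendar_interval": "year",
                    "format": "yyyy",  # bucket key_as_string becomes e.g. "2026"
                    "min_doc_count": 1,  # skip years with no documents
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(year_tally_body(), indent=2))
```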

mcmetadata is supposed to reject anything dated more than a month in the future!

So today I wrote a query to retrieve anything with a publication date 2025-07-01 thru 9999-12-31, and here is what it found:

{'url': 'https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250601.html', 'publication_date': '2025-06-01', 'indexed_date': '2025-01-28T17:27:22.005220'}
{'url': 'https://www.spglobal.com/spdji/jp/events/17th-japan-etf-conference/', 'publication_date': '2025-06-03', 'indexed_date': '2024-10-24T14:32:07.634906'}
{'url': 'https://www.spglobal.com/spdji/jp/events/17th-annual-japan-etf-conference/', 'publication_date': '2025-06-03', 'indexed_date': '2024-10-22T14:30:49.616424'}
{'url': 'https://www.tss-tv.co.jp/event_information/engeki/20241120.html', 'publication_date': '2025-06-14', 'indexed_date': '2024-12-11T10:37:37.539801'}
{'url': 'https://www.main-echo.de/region/kreis-miltenberg/die-erf-im-kreis-miltenberg-wird-naturnah-gestaltet-art-8104044?utm_campaign=rss-kreis-miltenberg&utm_medium=rss&utm_source=rssfeed', 'publication_date': '2025-06-22', 'indexed_date': '2023-12-03T02:04:24'}
{'url': 'https://www.tss-tv.co.jp/event_information/concert/20250628.html', 'publication_date': '2025-06-28', 'indexed_date': '2024-12-21T18:48:40.829062'}
{'url': 'https://aktien-portal.at/shownews.html?id=77304', 'publication_date': '2025-07-20', 'indexed_date': '2024-07-25T17:29:32.020529'}
{'url': 'https://www.italiannetwork.it/news.aspx?id=79010', 'publication_date': '2025-07-22', 'indexed_date': '2024-07-22T23:22:16.255427'}
{'url': 'https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250726.html', 'publication_date': '2025-07-26', 'indexed_date': '2025-01-28T17:28:22.421327'}
{'url': 'https://btvnovinite.bg/svetut/kakva-e-vrazkata-mezhdu-kafjavata-mastna-takan-i-zatlastjavaneto.html', 'publication_date': '2025-08-02', 'indexed_date': '2024-08-02T10:58:40.093190'}
{'url': 'https://aktien-portal.at/shownews.html?id=77610', 'publication_date': '2025-08-20', 'indexed_date': '2024-08-25T13:17:12.580876'}
{'url': 'https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250830.html', 'publication_date': '2025-08-30', 'indexed_date': '2025-01-28T17:27:57.015720'}
{'url': 'https://www.shn.ch/ueberregionales/ausland/2025-09-09/bald-drei-jahre-krieg-in-der-ukraine-desastroes-nicht-nur-fuer', 'publication_date': '2025-09-09', 'indexed_date': '2025-02-09T03:17:36.578059'}
{'url': 'https://aktien-portal.at/shownews.html?id=77862', 'publication_date': '2025-09-20', 'indexed_date': '2024-09-25T08:23:58.800835'}
{'url': 'https://aktien-portal.at/shownews.html?id=77863', 'publication_date': '2025-09-20', 'indexed_date': '2024-09-25T16:20:13.791905'}
{'url': 'https://news.livedoor.com/article/detail/27232855/', 'publication_date': '2025-09-23', 'indexed_date': '2024-09-23T14:32:47.636911'}
{'url': 'https://news.livedoor.com/article/detail/27250209/', 'publication_date': '2025-09-25', 'indexed_date': '2024-09-25T11:40:48.063286'}
{'url': 'http://shelbyville.staging.communityq.com/stories/cascade-bounces-back-with-blowout-win-over-chapel-hill,154412', 'publication_date': '2025-10-20', 'indexed_date': '2025-01-21T00:24:20.043992'}
{'url': 'https://republica.gt/actualidad/guatemala-registra-un-220-mas-de-muertes-por-dengue-que-en-2023-202510211400', 'publication_date': '2025-10-21', 'indexed_date': '2024-09-22T06:22:06.508161'}
{'url': 'https://rondoniaovivo.com/noticia/utilidadepublica/2025/10/23/prevencao-celebracao-dos-110-anos-de-pvh-inclui-atividade-de-saude-em-sua-programacao.html', 'publication_date': '2025-10-23', 'indexed_date': '2025-01-24T22:52:43.268615'}
{'url': 'https://www.shn.ch/multimedia/podcast/2025-10-24/linda-de-ventura-kapitalismus-nicht-ueberwinden-aber-dringend', 'publication_date': '2025-10-24', 'indexed_date': '2025-01-24T13:32:31.168861'}
{'url': 'https://rondoniaovivo.com/noticia/geral/2025/11/15/natal-porto-de-luz-decoracao-no-parque-da-cidade-atrai-visitantes-e-familias.html', 'publication_date': '2025-11-15', 'indexed_date': '2024-11-14T21:34:39.907240'}
{'url': 'https://aktien-portal.at/shownews.html?id=78382', 'publication_date': '2025-11-20', 'indexed_date': '2024-11-25T21:20:15.578514'}
{'url': 'https://aktien-portal.at/shownews.html?id=78381', 'publication_date': '2025-11-20', 'indexed_date': '2024-11-26T08:23:01.518295'}
{'url': 'https://aktien-portal.at/shownews.html?id=78377', 'publication_date': '2025-11-20', 'indexed_date': '2024-11-25T21:19:32.727025'}
{'url': 'https://www.geekculture.com/joyoftech/joyarchives/3131.html', 'publication_date': '2025-12-01', 'indexed_date': '2025-01-01T20:23:22.087012'}
{'url': 'https://rondoniaovivo.com/noticia/geral/2025/12/01/quase-parou-hidreletricas-do-rio-madeira-voltam-a-operar-com-metade-da-potencia.html', 'publication_date': '2025-12-01', 'indexed_date': '2024-11-30T21:27:30.410121'}
{'url': 'https://aktien-portal.at/shownews.html?id=78725', 'publication_date': '2025-12-20', 'indexed_date': '2024-12-27T21:27:31.177127'}
{'url': 'http://perryville.staging.communityq.com/stories/tg-missouri-planning-nearly-100-million-expansion-in-perryville,106871', 'publication_date': '2025-12-26', 'indexed_date': '2025-01-03T09:19:14.047815'}
{'url': 'https://www.kantei.go.jp/jp/103/discourse/20241229message.html', 'publication_date': '2025-12-29', 'indexed_date': '2024-12-29T17:19:28.439739'}
{'url': 'https://www.kantei.go.jp/jp/103/discourse/20241230danwa.html', 'publication_date': '2025-12-30', 'indexed_date': '2024-12-30T17:19:41.281369'}
{'url': 'https://aktien-portal.at/shownews.html?id=75590', 'publication_date': '2026-01-20', 'indexed_date': '2024-01-28T04:44:41'}
{'url': 'https://www.irishcentral.com/travel/old-head-golf-links-visit-cork-2024-green-award-winner', 'publication_date': '2026-01-26', 'indexed_date': '2025-01-26T21:30:26.305316'}
{'url': 'https://aktien-portal.at/shownews.html?id=75881', 'publication_date': '2026-02-20', 'indexed_date': '2024-02-28T03:52:26.758602'}
{'url': 'https://aktien-portal.at/shownews.html?id=75880', 'publication_date': '2026-02-20', 'indexed_date': '2024-02-29T10:34:20.534935'}
{'url': 'https://aktien-portal.at/shownews.html?id=76513', 'publication_date': '2026-04-20', 'indexed_date': '2024-04-26T19:33:14.092614'}
{'url': 'https://aktien-portal.at/shownews.html?id=76506', 'publication_date': '2026-04-20', 'indexed_date': '2024-04-26T19:33:54.771574'}
{'url': 'https://aktien-portal.at/shownews.html?id=77317', 'publication_date': '2026-07-20', 'indexed_date': '2024-07-26T16:31:09.785484'}
{'url': 'https://aktien-portal.at/shownews.html?id=77313', 'publication_date': '2026-07-20', 'indexed_date': '2024-07-26T16:32:50.050806'}
{'url': 'https://aktien-portal.at/shownews.html?id=77315', 'publication_date': '2026-07-20', 'indexed_date': '2024-07-26T16:31:36.464564'}
{'url': 'https://aktien-portal.at/shownews.html?id=77865', 'publication_date': '2026-09-20', 'indexed_date': '2024-09-26T16:20:23.741719'}
{'url': 'https://www.elperiodico.com/es/sociedad/20270101/maristas-indemnizacion-abusos-sexuales-benitez-8228832?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad', 'publication_date': '2026-12-31', 'indexed_date': '2024-01-24T02:12:04'}
{'url': 'https://www.elperiodico.com/es/sociedad/20270101/maristas-reconocen-25-victimas-abusos-8228188?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad', 'publication_date': '2026-12-31', 'indexed_date': '2024-01-24T02:12:04'}
{'url': 'https://www.elperiodico.cat/ca/societat/20270101/maristes-reconeixen-25-viacutectimes-abusos-8228188?utm_source=rss-noticias&utm_medium=feed&utm_campaign=portada', 'publication_date': '2026-12-31', 'indexed_date': '2024-01-27T08:01:17'}
{'url': 'https://aktien-portal.at/shownews.html?id=78940', 'publication_date': '2027-01-20', 'indexed_date': '2025-01-28T02:18:33.563315'}
{'url': 'https://aktien-portal.at/shownews.html?id=75893', 'publication_date': '2027-02-20', 'indexed_date': '2024-02-29T10:34:40.984694'}
{'url': 'https://aktien-portal.at/shownews.html?id=75892', 'publication_date': '2027-02-20', 'indexed_date': '2024-03-01T06:20:25.755422'}
{'url': 'https://aktien-portal.at/shownews.html?id=75895', 'publication_date': '2027-02-20', 'indexed_date': '2024-03-01T06:20:32.493303'}
{'url': 'https://aktien-portal.at/shownews.html?id=75887', 'publication_date': '2027-02-20', 'indexed_date': '2024-02-29T10:34:43.488003'}
{'url': 'https://aktien-portal.at/shownews.html?id=75894', 'publication_date': '2027-02-20', 'indexed_date': '2024-02-29T10:34:37.830049'}
{'url': 'https://aktien-portal.at/shownews.html?id=75886', 'publication_date': '2027-02-20', 'indexed_date': '2024-02-29T10:34:42.809571'}
{'url': 'https://aktien-portal.at/shownews.html?id=76232', 'publication_date': '2027-03-20', 'indexed_date': '2024-03-29T03:27:11.808602'}
{'url': 'https://aktien-portal.at/shownews.html?id=76239', 'publication_date': '2027-03-20', 'indexed_date': '2024-03-30T08:01:30.019480'}
{'url': 'https://aktien-portal.at/shownews.html?id=76522', 'publication_date': '2027-04-20', 'indexed_date': '2024-04-28T12:21:58.512436'}
{'url': 'https://aktien-portal.at/shownews.html?id=76824', 'publication_date': '2027-05-20', 'indexed_date': '2024-05-27T16:20:55.366422'}
{'url': 'https://aktien-portal.at/shownews.html?id=76826', 'publication_date': '2027-05-20', 'indexed_date': '2024-05-27T16:19:07.741297'}
{'url': 'https://aktien-portal.at/shownews.html?id=77100', 'publication_date': '2027-06-20', 'indexed_date': '2024-06-27T22:37:18.718831'}
{'url': 'https://aktien-portal.at/shownews.html?id=77099', 'publication_date': '2027-06-20', 'indexed_date': '2024-06-27T22:37:44.585990'}
{'url': 'https://aktien-portal.at/shownews.html?id=77333', 'publication_date': '2027-07-20', 'indexed_date': '2024-07-28T15:20:56.673240'}
{'url': 'https://aktien-portal.at/shownews.html?id=77338', 'publication_date': '2027-07-20', 'indexed_date': '2024-07-28T15:19:18.726445'}
{'url': 'https://aktien-portal.at/shownews.html?id=77615', 'publication_date': '2027-08-20', 'indexed_date': '2024-08-27T12:30:38.205543'}
{'url': 'https://aktien-portal.at/shownews.html?id=77617', 'publication_date': '2027-08-20', 'indexed_date': '2024-08-27T21:19:04.506160'}
{'url': 'https://aktien-portal.at/shownews.html?id=77616', 'publication_date': '2027-08-20', 'indexed_date': '2024-08-27T12:29:58.136638'}
{'url': 'https://aktien-portal.at/shownews.html?id=77613', 'publication_date': '2027-08-20', 'indexed_date': '2024-08-27T12:29:33.921765'}
{'url': 'https://aktien-portal.at/shownews.html?id=77881', 'publication_date': '2027-09-20', 'indexed_date': '2024-09-28T11:31:12.783954'}
{'url': 'https://aktien-portal.at/shownews.html?id=77877', 'publication_date': '2027-09-20', 'indexed_date': '2024-09-27T17:28:14.839811'}
{'url': 'https://aktien-portal.at/shownews.html?id=77876', 'publication_date': '2027-09-20', 'indexed_date': '2024-09-28T11:30:03.016828'}
{'url': 'https://aktien-portal.at/shownews.html?id=78077', 'publication_date': '2027-10-20', 'indexed_date': '2024-10-28T00:23:20.957083'}
{'url': 'https://aktien-portal.at/shownews.html?id=78076', 'publication_date': '2027-10-20', 'indexed_date': '2024-10-28T00:23:35.524138'}
{'url': 'https://aktien-portal.at/shownews.html?id=78420', 'publication_date': '2027-11-20', 'indexed_date': '2024-11-27T18:34:05.610349'}
{'url': 'https://aktien-portal.at/shownews.html?id=78414', 'publication_date': '2027-11-20', 'indexed_date': '2024-11-27T18:34:41.797170'}
{'url': 'https://aktien-portal.at/shownews.html?id=78416', 'publication_date': '2027-11-20', 'indexed_date': '2024-11-27T18:34:15.377373'}
{'url': 'https://aktien-portal.at/shownews.html?id=78948', 'publication_date': '2028-01-20', 'indexed_date': '2025-01-29T02:19:57.388820'}
{'url': 'https://aktien-portal.at/shownews.html?id=75907', 'publication_date': '2028-02-20', 'indexed_date': '2024-03-02T08:25:36.382800'}
{'url': 'https://aktien-portal.at/shownews.html?id=75905', 'publication_date': '2028-02-20', 'indexed_date': '2024-03-02T08:25:38.890281'}
{'url': 'https://aktien-portal.at/shownews.html?id=75902', 'publication_date': '2028-02-20', 'indexed_date': '2024-03-02T08:25:40.800051'}
{'url': 'https://aktien-portal.at/shownews.html?id=76253', 'publication_date': '2028-03-20', 'indexed_date': '2024-03-30T08:01:11.189964'}
{'url': 'https://aktien-portal.at/shownews.html?id=76251', 'publication_date': '2028-03-20', 'indexed_date': '2024-03-30T08:01:26.485591'}
{'url': 'https://aktien-portal.at/shownews.html?id=76248', 'publication_date': '2028-03-20', 'indexed_date': '2024-03-30T08:01:28.317794'}
{'url': 'https://aktien-portal.at/shownews.html?id=76837', 'publication_date': '2028-05-20', 'indexed_date': '2024-05-28T15:29:51.849699'}
{'url': 'https://aktien-portal.at/shownews.html?id=76830', 'publication_date': '2028-05-20', 'indexed_date': '2024-05-28T15:30:42.620461'}
{'url': 'https://aktien-portal.at/shownews.html?id=76834', 'publication_date': '2028-05-20', 'indexed_date': '2024-05-28T15:29:38.540841'}
{'url': 'https://aktien-portal.at/shownews.html?id=76836', 'publication_date': '2028-05-20', 'indexed_date': '2024-05-28T15:30:56.827862'}
{'url': 'https://aktien-portal.at/shownews.html?id=77114', 'publication_date': '2028-06-20', 'indexed_date': '2024-06-28T17:28:47.211437'}
{'url': 'https://aktien-portal.at/shownews.html?id=77344', 'publication_date': '2028-07-20', 'indexed_date': '2024-07-29T14:30:26.051531'}
{'url': 'https://aktien-portal.at/shownews.html?id=77627', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T14:29:27.437010'}
{'url': 'https://aktien-portal.at/shownews.html?id=77634', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T23:24:58.873634'}
{'url': 'https://aktien-portal.at/shownews.html?id=77639', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T23:26:13.086921'}
{'url': 'https://aktien-portal.at/shownews.html?id=77636', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T23:26:00.510742'}
{'url': 'https://aktien-portal.at/shownews.html?id=77629', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T14:28:07.887527'}
{'url': 'https://aktien-portal.at/shownews.html?id=77630', 'publication_date': '2028-08-20', 'indexed_date': '2024-08-28T14:28:53.979926'}
{'url': 'https://aktien-portal.at/shownews.html?id=78085', 'publication_date': '2028-10-20', 'indexed_date': '2024-10-28T18:26:20.787596'}
{'url': 'https://aktien-portal.at/shownews.html?id=78082', 'publication_date': '2028-10-20', 'indexed_date': '2024-10-28T18:25:01.429502'}
{'url': 'https://aktien-portal.at/shownews.html?id=78083', 'publication_date': '2028-10-20', 'indexed_date': '2024-10-28T18:26:11.168246'}
{'url': 'https://aktien-portal.at/shownews.html?id=78441', 'publication_date': '2028-11-20', 'indexed_date': '2024-11-28T18:30:13.449164'}
{'url': 'https://aktien-portal.at/shownews.html?id=78734', 'publication_date': '2028-12-20', 'indexed_date': '2024-12-29T12:25:55.587006'}
{'url': 'https://aktien-portal.at/shownews.html?id=78950', 'publication_date': '2029-01-20', 'indexed_date': '2025-01-30T02:18:57.675341'}
{'url': 'https://aktien-portal.at/shownews.html?id=75604', 'publication_date': '2029-01-20', 'indexed_date': '2024-02-01T05:24:27'}
{'url': 'https://aktien-portal.at/shownews.html?id=75919', 'publication_date': '2029-02-20', 'indexed_date': '2024-03-02T08:25:19.006432'}
{'url': 'https://aktien-portal.at/shownews.html?id=76265', 'publication_date': '2029-03-20', 'indexed_date': '2024-03-31T03:07:44.672234'}
{'url': 'https://aktien-portal.at/shownews.html?id=76529', 'publication_date': '2029-04-20', 'indexed_date': '2024-04-30T05:22:54.360870'}
{'url': 'https://aktien-portal.at/shownews.html?id=76528', 'publication_date': '2029-04-20', 'indexed_date': '2024-04-29T08:29:21.016830'}
{'url': 'https://aktien-portal.at/shownews.html?id=76533', 'publication_date': '2029-04-20', 'indexed_date': '2024-04-30T05:23:42.714073'}
{'url': 'https://aktien-portal.at/shownews.html?id=76854', 'publication_date': '2029-05-20', 'indexed_date': '2024-05-29T15:26:00.143302'}
{'url': 'https://aktien-portal.at/shownews.html?id=76859', 'publication_date': '2029-05-20', 'indexed_date': '2024-05-29T15:24:36.193787'}
{'url': 'https://aktien-portal.at/shownews.html?id=76855', 'publication_date': '2029-05-20', 'indexed_date': '2024-05-29T15:24:15.295846'}
{'url': 'https://aktien-portal.at/shownews.html?id=76857', 'publication_date': '2029-05-20', 'indexed_date': '2024-05-29T15:24:19.783508'}
{'url': 'https://aktien-portal.at/shownews.html?id=77121', 'publication_date': '2029-06-20', 'indexed_date': '2024-06-29T12:25:18.725909'}
{'url': 'https://aktien-portal.at/shownews.html?id=77120', 'publication_date': '2029-06-20', 'indexed_date': '2024-06-29T12:26:29.263819'}
{'url': 'https://aktien-portal.at/shownews.html?id=77644', 'publication_date': '2029-08-20', 'indexed_date': '2024-08-29T08:30:53.558851'}
{'url': 'https://aktien-portal.at/shownews.html?id=77642', 'publication_date': '2029-08-20', 'indexed_date': '2024-08-29T08:31:18.406048'}
{'url': 'https://aktien-portal.at/shownews.html?id=77645', 'publication_date': '2029-08-20', 'indexed_date': '2024-08-29T08:32:37.328822'}
{'url': 'https://aktien-portal.at/shownews.html?id=77890', 'publication_date': '2029-09-20', 'indexed_date': '2024-09-29T15:21:19.870540'}
{'url': 'https://aktien-portal.at/shownews.html?id=78105', 'publication_date': '2029-10-20', 'indexed_date': '2024-10-30T06:26:56.653121'}
{'url': 'https://aktien-portal.at/shownews.html?id=78470', 'publication_date': '2029-11-20', 'indexed_date': '2024-11-29T19:18:54.317374'}
{'url': 'https://aktien-portal.at/shownews.html?id=78466', 'publication_date': '2029-11-20', 'indexed_date': '2024-11-29T19:20:33.352347'}
{'url': 'https://aktien-portal.at/shownews.html?id=78739', 'publication_date': '2029-12-20', 'indexed_date': '2024-12-30T08:22:32.902585'}
{'url': 'https://aktien-portal.at/shownews.html?id=75395', 'publication_date': '2029-12-20', 'indexed_date': '2023-12-31T03:52:19'}
{'url': 'https://aktien-portal.at/shownews.html?id=78737', 'publication_date': '2029-12-20', 'indexed_date': '2024-12-29T12:25:34.575249'}
{'url': 'https://www.main-echo.de/region/stadt-kreis-aschaffenburg/erst-nach-der-christmette-gab-es-presskopf-kulinarische-braeuche-in-der-weihnachtszeit-im-spessart-art-8125581?utm_campaign=rss-aschaffenburg&utm_medium=rss&utm_source=rssfeed', 'publication_date': '2029-12-23', 'indexed_date': '2023-12-24T02:10:47'}
{'url': 'https://aktien-portal.at/shownews.html?id=78955', 'publication_date': '2030-01-20', 'indexed_date': '2025-01-31T02:20:55.338504'}
{'url': 'https://aktien-portal.at/shownews.html?id=78959', 'publication_date': '2030-01-20', 'indexed_date': '2025-01-31T02:21:59.818364'}
{'url': 'https://aktien-portal.at/shownews.html?id=78964', 'publication_date': '2030-01-20', 'indexed_date': '2025-01-31T02:21:00.175769'}
{'url': 'https://aktien-portal.at/shownews.html?id=78960', 'publication_date': '2030-01-20', 'indexed_date': '2025-01-31T02:21:13.659821'}
{'url': 'https://aktien-portal.at/shownews.html?id=78957', 'publication_date': '2030-01-20', 'indexed_date': '2025-01-31T02:21:05.547876'}
{'url': 'https://aktien-portal.at/shownews.html?id=75608', 'publication_date': '2030-01-20', 'indexed_date': '2024-02-01T05:24:24'}
{'url': 'https://www.elperiodico.com/es/sociedad/20300126/he-comprado-el-piso-de-un-traficante-para-poder-echarlo-de-casa-8219471?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad', 'publication_date': '2030-01-26', 'indexed_date': '2024-01-24T02:12:03'}
{'url': 'https://www.elperiodico.com/es/sociedad/20300126/alquiler-fuerza-ultimo-truco-mafias-ocupacion-8224986?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad', 'publication_date': '2030-01-26', 'indexed_date': '2024-01-24T02:12:03'}
{'url': 'https://www.elperiodico.com/es/sociedad/20300126/okupas-viven-difunta-tia-ocupacion-vivienda-digna-8219557?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad', 'publication_date': '2030-01-26', 'indexed_date': '2024-01-24T02:12:03'}
{'url': 'https://aktien-portal.at/shownews.html?id=76268', 'publication_date': '2030-03-20', 'indexed_date': '2024-04-01T03:33:58.951084'}
{'url': 'https://aktien-portal.at/shownews.html?id=76269', 'publication_date': '2030-03-20', 'indexed_date': '2024-04-01T03:33:53.194097'}
{'url': 'https://aktien-portal.at/shownews.html?id=76548', 'publication_date': '2030-04-20', 'indexed_date': '2024-05-01T02:20:10.363557'}
{'url': 'https://aktien-portal.at/shownews.html?id=76556', 'publication_date': '2030-04-20', 'indexed_date': '2024-05-01T02:20:36.380727'}
{'url': 'https://aktien-portal.at/shownews.html?id=76554', 'publication_date': '2030-04-20', 'indexed_date': '2024-05-01T02:19:04.954361'}
{'url': 'https://aktien-portal.at/shownews.html?id=76534', 'publication_date': '2030-04-20', 'indexed_date': '2024-04-30T05:22:49.283651'}
{'url': 'https://aktien-portal.at/shownews.html?id=77360', 'publication_date': '2030-07-20', 'indexed_date': '2024-07-30T16:14:07.361893'}
{'url': 'https://aktien-portal.at/shownews.html?id=77657', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T13:22:27.481162'}
{'url': 'https://aktien-portal.at/shownews.html?id=77655', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T13:23:26.373370'}
{'url': 'https://aktien-portal.at/shownews.html?id=77654', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T13:22:39.223369'}
{'url': 'https://aktien-portal.at/shownews.html?id=77666', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T23:17:22.800883'}
{'url': 'https://aktien-portal.at/shownews.html?id=77656', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T13:23:34.790461'}
{'url': 'https://aktien-portal.at/shownews.html?id=77651', 'publication_date': '2030-08-20', 'indexed_date': '2024-08-30T13:23:21.883962'}
{'url': 'https://aktien-portal.at/shownews.html?id=77892', 'publication_date': '2030-09-20', 'indexed_date': '2024-09-30T20:36:11.705897'}
{'url': 'https://aktien-portal.at/shownews.html?id=77894', 'publication_date': '2030-09-20', 'indexed_date': '2024-09-30T20:36:16.554963'}
{'url': 'https://aktien-portal.at/shownews.html?id=78110', 'publication_date': '2030-10-20', 'indexed_date': '2024-10-31T01:17:24.593183'}
{'url': 'https://aktien-portal.at/shownews.html?id=78108', 'publication_date': '2030-10-20', 'indexed_date': '2024-10-31T01:17:40.368036'}
{'url': 'https://aktien-portal.at/shownews.html?id=78113', 'publication_date': '2030-10-20', 'indexed_date': '2024-10-31T01:18:32.836284'}
{'url': 'https://aktien-portal.at/shownews.html?id=78741', 'publication_date': '2030-12-20', 'indexed_date': '2024-12-31T04:19:38.231853'}
{'url': 'https://aktien-portal.at/shownews.html?id=75628', 'publication_date': '2031-01-20', 'indexed_date': '2024-02-02T07:32:15'}
{'url': 'https://aktien-portal.at/shownews.html?id=75623', 'publication_date': '2031-01-20', 'indexed_date': '2024-02-02T07:32:22'}
{'url': 'https://aktien-portal.at/shownews.html?id=76874', 'publication_date': '2031-05-20', 'indexed_date': '2024-05-31T16:25:29.989611'}
{'url': 'https://aktien-portal.at/shownews.html?id=76880', 'publication_date': '2031-05-20', 'indexed_date': '2024-05-31T16:26:15.013204'}
{'url': 'https://aktien-portal.at/shownews.html?id=77374', 'publication_date': '2031-07-20', 'indexed_date': '2024-07-31T14:29:25.539561'}
{'url': 'https://aktien-portal.at/shownews.html?id=78121', 'publication_date': '2031-10-20', 'indexed_date': '2024-10-31T19:26:20.494869'}
{'url': 'https://aktien-portal.at/shownews.html?id=78126', 'publication_date': '2031-10-20', 'indexed_date': '2024-10-31T19:27:18.565962'}
{'url': 'https://aktien-portal.at/shownews.html?id=75405', 'publication_date': '2031-12-20', 'indexed_date': '2024-01-02T01:59:47'}
{'url': 'https://belonging.berkeley.edu/e-newsletter-archive', 'publication_date': '2036-01-04', 'indexed_date': '2024-05-07T18:30:32.337466'}
{'url': 'http://world.kbs.co.kr/service/contents_view.htm?lang=k&menu_cate=business&board_seq=447579&board_code=akorea_economyPlus', 'publication_date': '2082-02-03', 'indexed_date': '2023-12-14T02:47:42'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-8', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-7', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-5', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-4', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:09'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-9', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-10', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:11'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-11', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-6', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:10'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-3', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:07'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-12', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:11'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-1', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:05'}
{'url': 'https://my.tattnalljournal.com/archive/2203-11-22-2', 'publication_date': '2203-11-22', 'indexed_date': '2023-11-23T05:30:05'}
{'url': 'https://www.ladige.it/attualita/2420/09/23/ucraina-bomba-russo-su-condominio-a-zaporizhzhia-9-feriti-1.3884373', 'publication_date': '2420-09-23', 'indexed_date': '2024-09-23T05:24:23.026407'}
{'url': 'https://interfax.az/', 'publication_date': '8320-01-29', 'indexed_date': '2024-01-31T05:25:02'}
{'url': 'https://www.tss-tv.co.jp/mama/spicepresent/20250106.html', 'publication_date': '9999-01-06', 'indexed_date': '2025-01-07T17:20:39.389066'}
{'url': 'https://www.tss-tv.co.jp/apply/regular/20230112.html', 'publication_date': '9999-05-16', 'indexed_date': '2025-02-05T17:27:43.272826'}
@philbudne
Contributor Author

philbudne commented Feb 11, 2025

NOTE: there could also be articles (now in the "past") that were from the future when indexed.
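
One way to catch those would be to compare each document's `publication_date` against its own `indexed_date` rather than against today. A minimal sketch, assuming "a month" is approximated as 30 days (the exact cutoff mcmetadata uses is not stated here) and using the record shape from the output above; `from_the_future` is a hypothetical helper name:

```python
from datetime import date, datetime, timedelta

# Assumption: "a month" is approximated as 30 days.
MAX_FUTURE = timedelta(days=30)


def from_the_future(record: dict, max_future: timedelta = MAX_FUTURE) -> bool:
    """Return True if publication_date was too far ahead of indexed_date."""
    published = date.fromisoformat(record["publication_date"])
    # indexed_date appears both with and without fractional seconds;
    # datetime.fromisoformat accepts either form.
    indexed = datetime.fromisoformat(record["indexed_date"]).date()
    return published - indexed > max_future


records = [
    # Two records copied from the listing above.
    {"url": "https://aktien-portal.at/shownews.html?id=75405",
     "publication_date": "2031-12-20", "indexed_date": "2024-01-02T01:59:47"},
    {"url": "https://interfax.az/",
     "publication_date": "8320-01-29", "indexed_date": "2024-01-31T05:25:02"},
    # Hypothetical records, invented for illustration: one whose date is now
    # in the past but was months ahead when indexed, and one plausible one.
    {"url": "https://example.com/was-future",
     "publication_date": "2024-06-20", "indexed_date": "2023-12-03T02:04:24"},
    {"url": "https://example.com/plausible",
     "publication_date": "2024-01-30", "indexed_date": "2024-01-31T05:25:02.000001"},
]

for rec in records:
    if from_the_future(rec):
        print(rec["url"], rec["publication_date"], rec["indexed_date"])
```

Because the comparison is relative to `indexed_date`, the third record is still flagged even though its publication date has since passed.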

@kilemensi

kilemensi commented Feb 12, 2025

I picked a few at random, and it appears to be a date format / language issue:

https://www.main-echo.de/region/stadt-kreis-aschaffenburg/erst-nach-der-christmette-gab-es-presskopf-kulinarische-braeuche-in-der-weihnachtszeit-im-spessart-art-8125581

  1. Language: DE
  2. Date in article (Local): 23.12.2023 - 00:00 Uhr,
  3. Date in article (ISO): 2023-12-23T00:00,
  4. Publication Date (Elasticsearch): 2029-12-23 - Got month and day correct but messed up the year

https://aktien-portal.at/shownews.html?id=75405

  1. Language: DE (Austrian? since .at domain)
  2. Date in article (Local): 31.12.2023 13:13,
  3. Date in article (ISO): 2023-12-31T13:13,
  4. Publication Date (Elasticsearch): 2031-12-20 - Got month correct but messed up day and year
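
Both pages print the date day-first with dots, which is unambiguous when parsed with an explicit format. A minimal sketch using only the standard library: the two format strings are written by hand to match the local date strings quoted above, and `parse_local` is a hypothetical helper, not what the metadata library does.

```python
from datetime import datetime

# Hand-written formats matching the two date strings quoted above.
FORMATS = (
    "%d.%m.%Y - %H:%M Uhr",  # main-echo.de: "23.12.2023 - 00:00 Uhr"
    "%d.%m.%Y %H:%M",        # aktien-portal.at: "31.12.2023 13:13"
)


def parse_local(text: str) -> datetime:
    """Try each known day-first format and return the first match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date string: {text!r}")


print(parse_local("23.12.2023 - 00:00 Uhr").isoformat())  # 2023-12-23T00:00:00
print(parse_local("31.12.2023 13:13").isoformat())        # 2023-12-31T13:13:00
```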

https://belonging.berkeley.edu/e-newsletter-archive

  1. This is not an article page; it appears to be a newsletter archive with dates and links to past newsletters.

https://www.tss-tv.co.jp/apply/regular/20230112.html

  1. This is not an article page; it appears to be a Japanese homepage of some sort (based on Google Translate, it seems to list TV programmes, sports, news, etc.).

@m453h
Contributor

m453h commented Feb 12, 2025

I wrote a small script that goes through all the shared links and attempts to extract the publication_date using the metadata-lib. Most of them return None:

https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250601.html, None
https://www.spglobal.com/spdji/jp/events/17th-japan-etf-conference/, Content is too short
https://www.spglobal.com/spdji/jp/events/17th-annual-japan-etf-conference/, Content is too short
https://www.tss-tv.co.jp/event_information/engeki/20241120.html, None
https://www.main-echo.de/region/kreis-miltenberg/die-erf-im-kreis-miltenberg-wird-naturnah-gestaltet-art-8104044?utm_campaign=rss-kreis-miltenberg&utm_medium=rss&utm_source=rssfeed, 2023-12-04
https://www.tss-tv.co.jp/event_information/concert/20250628.html, None
https://aktien-portal.at/shownews.html?id=77304, None
https://www.italiannetwork.it/news.aspx?id=79010, None
https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250726.html, None
https://btvnovinite.bg/svetut/kakva-e-vrazkata-mezhdu-kafjavata-mastna-takan-i-zatlastjavaneto.html, 2024-08-02
https://aktien-portal.at/shownews.html?id=77610, None
https://www.tss-tv.co.jp/peace80th/broadcast%20schedule/20250830.html, None
https://www.shn.ch/ueberregionales/ausland/2025-09-09/bald-drei-jahre-krieg-in-der-ukraine-desastroes-nicht-nur-fuer, 2025-02-09
https://aktien-portal.at/shownews.html?id=77862, None
https://aktien-portal.at/shownews.html?id=77863, None
https://news.livedoor.com/article/detail/27232855/, Content is too short
https://news.livedoor.com/article/detail/27250209/, Content is too short
http://shelbyville.staging.communityq.com/stories/cascade-bounces-back-with-blowout-win-over-chapel-hill,154412, Content is too short
https://republica.gt/actualidad/guatemala-registra-un-220-mas-de-muertes-por-dengue-que-en-2023-202510211400, 2024-09-21
https://rondoniaovivo.com/noticia/utilidadepublica/2025/10/23/prevencao-celebracao-dos-110-anos-de-pvh-inclui-atividade-de-saude-em-sua-programacao.html, None
https://www.shn.ch/multimedia/podcast/2025-10-24/linda-de-ventura-kapitalismus-nicht-ueberwinden-aber-dringend, 2025-01-31
https://rondoniaovivo.com/noticia/geral/2025/11/15/natal-porto-de-luz-decoracao-no-parque-da-cidade-atrai-visitantes-e-familias.html, None
https://aktien-portal.at/shownews.html?id=78382, None
https://aktien-portal.at/shownews.html?id=78381, None
https://aktien-portal.at/shownews.html?id=78377, None
https://www.geekculture.com/joyoftech/joyarchives/3131.html, None
https://rondoniaovivo.com/noticia/geral/2025/12/01/quase-parou-hidreletricas-do-rio-madeira-voltam-a-operar-com-metade-da-potencia.html, None
https://aktien-portal.at/shownews.html?id=78725, None
http://perryville.staging.communityq.com/stories/tg-missouri-planning-nearly-100-million-expansion-in-perryville,106871, Content is too short
https://www.kantei.go.jp/jp/103/discourse/20241229message.html, None
https://www.kantei.go.jp/jp/103/discourse/20241230danwa.html, None
https://aktien-portal.at/shownews.html?id=75590, None
https://www.irishcentral.com/travel/old-head-golf-links-visit-cork-2024-green-award-winner, 2025-01-26
https://aktien-portal.at/shownews.html?id=75881, None
https://aktien-portal.at/shownews.html?id=75880, None
https://aktien-portal.at/shownews.html?id=76513, None
https://aktien-portal.at/shownews.html?id=76506, None
https://aktien-portal.at/shownews.html?id=77317, None
https://aktien-portal.at/shownews.html?id=77313, None
https://aktien-portal.at/shownews.html?id=77315, None
https://aktien-portal.at/shownews.html?id=77865, None
https://www.elperiodico.com/es/sociedad/20270101/maristas-indemnizacion-abusos-sexuales-benitez-8228832?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad, 2020-12-03
https://www.elperiodico.com/es/sociedad/20270101/maristas-reconocen-25-victimas-abusos-8228188?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad, 2020-12-03
https://www.elperiodico.cat/ca/societat/20270101/maristes-reconeixen-25-viacutectimes-abusos-8228188?utm_source=rss-noticias&utm_medium=feed&utm_campaign=portada, 2020-12-03
https://aktien-portal.at/shownews.html?id=78940, None
https://aktien-portal.at/shownews.html?id=75893, None
https://aktien-portal.at/shownews.html?id=75892, None
https://aktien-portal.at/shownews.html?id=75895, None
https://aktien-portal.at/shownews.html?id=75887, None
https://aktien-portal.at/shownews.html?id=75894, None
https://aktien-portal.at/shownews.html?id=75886, None
https://aktien-portal.at/shownews.html?id=76232, None
https://aktien-portal.at/shownews.html?id=76239, None
https://aktien-portal.at/shownews.html?id=76522, None
https://aktien-portal.at/shownews.html?id=76824, None
https://aktien-portal.at/shownews.html?id=76826, None
https://aktien-portal.at/shownews.html?id=77100, None
https://aktien-portal.at/shownews.html?id=77099, None
https://aktien-portal.at/shownews.html?id=77333, None
https://aktien-portal.at/shownews.html?id=77338, None
https://aktien-portal.at/shownews.html?id=77615, None
https://aktien-portal.at/shownews.html?id=77617, None
https://aktien-portal.at/shownews.html?id=77616, None
https://aktien-portal.at/shownews.html?id=77613, None
https://aktien-portal.at/shownews.html?id=77881, None
https://aktien-portal.at/shownews.html?id=77877, None
https://aktien-portal.at/shownews.html?id=77876, None
https://aktien-portal.at/shownews.html?id=78077, None
https://aktien-portal.at/shownews.html?id=78076, None
https://aktien-portal.at/shownews.html?id=78420, None
https://aktien-portal.at/shownews.html?id=78414, None
https://aktien-portal.at/shownews.html?id=78416, None
https://aktien-portal.at/shownews.html?id=78948, None
https://aktien-portal.at/shownews.html?id=75907, None
https://aktien-portal.at/shownews.html?id=75905, None
https://aktien-portal.at/shownews.html?id=75902, None
https://aktien-portal.at/shownews.html?id=76253, None
https://aktien-portal.at/shownews.html?id=76251, None
https://aktien-portal.at/shownews.html?id=76248, None
https://aktien-portal.at/shownews.html?id=76837, None
https://aktien-portal.at/shownews.html?id=76830, None
https://aktien-portal.at/shownews.html?id=76834, None
https://aktien-portal.at/shownews.html?id=76836, None
https://aktien-portal.at/shownews.html?id=77114, None
https://aktien-portal.at/shownews.html?id=77344, None
https://aktien-portal.at/shownews.html?id=77627, None
https://aktien-portal.at/shownews.html?id=77634, None
https://aktien-portal.at/shownews.html?id=77639, None
https://aktien-portal.at/shownews.html?id=77636, None
https://aktien-portal.at/shownews.html?id=77629, None
https://aktien-portal.at/shownews.html?id=77630, None
https://aktien-portal.at/shownews.html?id=78085, None
https://aktien-portal.at/shownews.html?id=78082, None
https://aktien-portal.at/shownews.html?id=78083, None
https://aktien-portal.at/shownews.html?id=78441, None
https://aktien-portal.at/shownews.html?id=78734, None
https://aktien-portal.at/shownews.html?id=78950, None
https://aktien-portal.at/shownews.html?id=75604, None
https://aktien-portal.at/shownews.html?id=75919, None
https://aktien-portal.at/shownews.html?id=76265, None
https://aktien-portal.at/shownews.html?id=76529, None
https://aktien-portal.at/shownews.html?id=76528, None
https://aktien-portal.at/shownews.html?id=76533, None
https://aktien-portal.at/shownews.html?id=76854, None
https://aktien-portal.at/shownews.html?id=76859, None
https://aktien-portal.at/shownews.html?id=76855, None
https://aktien-portal.at/shownews.html?id=76857, None
https://aktien-portal.at/shownews.html?id=77121, None
https://aktien-portal.at/shownews.html?id=77120, None
https://aktien-portal.at/shownews.html?id=77644, None
https://aktien-portal.at/shownews.html?id=77642, None
https://aktien-portal.at/shownews.html?id=77645, None
https://aktien-portal.at/shownews.html?id=77890, None
https://aktien-portal.at/shownews.html?id=78105, None
https://aktien-portal.at/shownews.html?id=78470, None
https://aktien-portal.at/shownews.html?id=78466, None
https://aktien-portal.at/shownews.html?id=78739, None
https://aktien-portal.at/shownews.html?id=75395, None
https://aktien-portal.at/shownews.html?id=78737, None
https://www.main-echo.de/region/stadt-kreis-aschaffenburg/erst-nach-der-christmette-gab-es-presskopf-kulinarische-braeuche-in-der-weihnachtszeit-im-spessart-art-8125581?utm_campaign=rss-aschaffenburg&utm_medium=rss&utm_source=rssfeed, 2023-12-22
https://aktien-portal.at/shownews.html?id=78955, None
https://aktien-portal.at/shownews.html?id=78959, None
https://aktien-portal.at/shownews.html?id=78964, None
https://aktien-portal.at/shownews.html?id=78960, None
https://aktien-portal.at/shownews.html?id=78957, None
https://aktien-portal.at/shownews.html?id=75608, None
https://www.elperiodico.com/es/sociedad/20300126/he-comprado-el-piso-de-un-traficante-para-poder-echarlo-de-casa-8219471?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad, 2021-02-02
https://www.elperiodico.com/es/sociedad/20300126/alquiler-fuerza-ultimo-truco-mafias-ocupacion-8224986?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad, 2021-02-02
https://www.elperiodico.com/es/sociedad/20300126/okupas-viven-difunta-tia-ocupacion-vivienda-digna-8219557?utm_source=rss-noticias&utm_medium=feed&utm_campaign=sociedad, 2021-02-03
https://aktien-portal.at/shownews.html?id=76268, None
https://aktien-portal.at/shownews.html?id=76269, None
https://aktien-portal.at/shownews.html?id=76548, None
https://aktien-portal.at/shownews.html?id=76556, None
https://aktien-portal.at/shownews.html?id=76554, None
https://aktien-portal.at/shownews.html?id=76534, None
https://aktien-portal.at/shownews.html?id=77360, None
https://aktien-portal.at/shownews.html?id=77657, None
https://aktien-portal.at/shownews.html?id=77655, None
https://aktien-portal.at/shownews.html?id=77654, None
https://aktien-portal.at/shownews.html?id=77666, None
https://aktien-portal.at/shownews.html?id=77656, None
https://aktien-portal.at/shownews.html?id=77651, None
https://aktien-portal.at/shownews.html?id=77892, None
https://aktien-portal.at/shownews.html?id=77894, None
https://aktien-portal.at/shownews.html?id=78110, None
https://aktien-portal.at/shownews.html?id=78108, None
https://aktien-portal.at/shownews.html?id=78113, None
https://aktien-portal.at/shownews.html?id=78741, None
https://aktien-portal.at/shownews.html?id=75628, None
https://aktien-portal.at/shownews.html?id=75623, None
https://aktien-portal.at/shownews.html?id=76874, None
https://aktien-portal.at/shownews.html?id=76880, None
https://aktien-portal.at/shownews.html?id=77374, None
https://aktien-portal.at/shownews.html?id=78121, None
https://aktien-portal.at/shownews.html?id=78126, None
https://aktien-portal.at/shownews.html?id=75405, None
https://belonging.berkeley.edu/e-newsletter-archive, None
http://world.kbs.co.kr/service/contents_view.htm?lang=k&menu_cate=business&board_seq=447579&board_code=akorea_economyPlus, None
https://my.tattnalljournal.com/archive/2203-11-22-8, None
https://my.tattnalljournal.com/archive/2203-11-22-7, None
https://my.tattnalljournal.com/archive/2203-11-22-5, None
https://my.tattnalljournal.com/archive/2203-11-22-4, None
https://my.tattnalljournal.com/archive/2203-11-22-9, None
https://my.tattnalljournal.com/archive/2203-11-22-10, None
https://my.tattnalljournal.com/archive/2203-11-22-11, None
https://my.tattnalljournal.com/archive/2203-11-22-6, None
https://my.tattnalljournal.com/archive/2203-11-22-3, None
https://my.tattnalljournal.com/archive/2203-11-22-12, None
https://my.tattnalljournal.com/archive/2203-11-22-1, None
https://my.tattnalljournal.com/archive/2203-11-22-2, None
https://www.ladige.it/attualita/2420/09/23/ucraina-bomba-russo-su-condominio-a-zaporizhzhia-9-feriti-1.3884373, Content is too short
https://interfax.az/, Failed to resolve 'interfax.az'
https://www.tss-tv.co.jp/mama/spicepresent/20250106.html, None
https://www.tss-tv.co.jp/apply/regular/20230112.html, None

@rahulbot
Contributor

Perhaps a list like this of test cases or real failures should be posted as an issue on https://github.com/adbar/htmldate?
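
If the listing above were turned into test cases, a small parser could be a starting point. This is only a sketch: it assumes each line is `url, outcome`, where the outcome is an ISO date, the literal `None`, or an error message, and the names `parse_result_line`, `classify`, and `tally` are hypothetical. It splits on the last comma because some of the URLs themselves contain commas.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def parse_result_line(line: str) -> Tuple[str, str]:
    """Split one 'url, outcome' line on its LAST comma (URLs may contain commas)."""
    url, outcome = line.rsplit(",", 1)
    return url.strip(), outcome.strip()


def classify(outcome: str) -> str:
    """Bucket an outcome as 'no date', 'date', or 'error' (assumed categories)."""
    if outcome == "None":
        return "no date"
    try:
        date.fromisoformat(outcome)
        return "date"
    except ValueError:
        return "error"


def tally(lines: Iterable[str]) -> Counter:
    """Count outcomes by bucket, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            _, outcome = parse_result_line(line)
            counts[classify(outcome)] += 1
    return counts


# Four lines copied from the listing above.
SAMPLE = [
    "http://shelbyville.staging.communityq.com/stories/cascade-bounces-back-with-blowout-win-over-chapel-hill,154412, Content is too short",
    "https://republica.gt/actualidad/guatemala-registra-un-220-mas-de-muertes-por-dengue-que-en-2023-202510211400, 2024-09-21",
    "https://rondoniaovivo.com/noticia/utilidadepublica/2025/10/23/prevencao-celebracao-dos-110-anos-de-pvh-inclui-atividade-de-saude-em-sua-programacao.html, None",
    "https://interfax.az/,Failed to resolve 'interfax.az'",
]

print(tally(SAMPLE))
```

The rows bucketed as `date` or `no date` would be the ones worth attaching to an upstream report; the `error` rows are fetch or extraction failures rather than date-parsing results.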

@pgulley
Member

pgulley commented Feb 12, 2025

So, I guess the question is why these stories are being /indexed/ with a future date, if the indexer is meant to be stopping it. Are these from some reindexing batch perhaps?
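
For reference, the kind of guard being discussed could look roughly like the sketch below. It is not mcmetadata's or the indexer's actual code: the function name `accept_pub_date`, the 30-day reading of "a month", and the choice to compare against the indexed date rather than the current clock are all assumptions made for illustration.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Assumed interpretation of "a month in the future".
MAX_FUTURE = timedelta(days=30)


def accept_pub_date(pub_date: Optional[date], indexed: date,
                    max_future: timedelta = MAX_FUTURE) -> bool:
    """Return True if pub_date is missing or at most max_future after indexed."""
    if pub_date is None:
        return True  # no date was extracted, so there is nothing to reject
    return pub_date <= indexed + max_future


# One record from the query output quoted at the top of this issue.
pub = date.fromisoformat("2025-06-01")
indexed = datetime.fromisoformat("2025-01-28T17:27:22.005220").date()
print(accept_pub_date(pub, indexed))  # this record is about four months ahead
```

A check of this shape would reject the record above, so if a similar guard exists, these stories presumably either bypassed it or were dated by a different code path.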


5 participants