Pipeline Monitor
CC webgraph fetch + PageRank per TLD
TLD    Progress  Versions  Disk    Data  Status
.pl    100%      51/51     9.3 GB  ✓     DONE
.cz    71%       36/51     7.5 GB  ✓     RUNNING
.eu    0%        0/51      5.1 GB  ✓     ERROR
.info  0%        0/51      7.4 GB  ✓     ERROR
.de    6%        3/51      -       -     RUNNING
.org   0%        0/51      -       -     RUNNING
.it    0%        0/51      -       -     RUNNING
.fr    0%        0/51      -       -     RUNNING
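The progress percentages above match the completed-versions count divided by the total of 51, rounded to a whole percent (36/51 gives 71%, 3/51 gives 6%). The dashboard's own rounding code is not shown, so the following is only a minimal sketch under that assumption; `progress_percent` is a hypothetical helper name.

```python
def progress_percent(done: int, total: int) -> int:
    """Return done/total as a whole percent; 0 if total is not positive.

    Assumption: the monitor rounds to the nearest integer.
    """
    if total <= 0:
        return 0
    return round(100 * done / total)


# (done, total) pairs copied from the table rows above.
versions = {".pl": (51, 51), ".cz": (36, 51), ".de": (3, 51), ".eu": (0, 51)}

for tld, (done, total) in versions.items():
    print(f"{tld:<6} {done}/{total}  {progress_percent(done, total)}%")
```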
TLDs done: 1
TLDs running: 5
With results: 4
Pending: 0
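These counters can be derived from the table rows. The sketch below assumes that the results counter counts rows with a ✓ in the data column and that a `PENDING` status exists; neither is confirmed by the page, and the row tuples are just the values shown above.

```python
from collections import Counter

# (tld, status, has_data) copied from the table; has_data mirrors the ✓ column.
rows = [
    (".pl", "DONE", True),
    (".cz", "RUNNING", True),
    (".eu", "ERROR", True),
    (".info", "ERROR", True),
    (".de", "RUNNING", False),
    (".org", "RUNNING", False),
    (".it", "RUNNING", False),
    (".fr", "RUNNING", False),
]

status_counts = Counter(status for _, status, _ in rows)

summary = {
    "done": status_counts["DONE"],
    "running": status_counts["RUNNING"],
    "with_results": sum(1 for _, _, has_data in rows if has_data),
    "pending": status_counts["PENDING"],  # assumed status name; absent here
}
print(summary)
```

Note that the two ERROR rows are not covered by any of the four counters, so the counters do not add up to the number of TLDs.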
Recent logs
[pipeline_sequential.log] Start: Sat Jun 27 08:48:17 PM CEST 2026
[pipeline_sequential.log] ======================================================
[pipeline_sequential.log] [20:48:17] START fetch .de
[pipeline_sequential.log] PID 2923690 → logs/fetch_de.log
[pipeline_sequential.log] Waiting for all TLDs to be fetched...
[va_scan.log] ✅ zxiaodaochu.com (1 crawl)
[va_scan.log] ✅ zxqxztuz.org (1 crawl)
[va_scan.log] ✅ zxzshangmao.com (1 crawl)
[va_scan.log] ✅ zz2959.com (1 crawl)
[va_scan.log] ✅ zztoptribute.dk (1 crawl)
[wayback_undiscovered.log] Outlinks: 0 unique domains
[wayback_undiscovered.log] [17/17] pillme.pl (3 CC links)
[wayback_undiscovered.log] CDX: 0 HTML snapshots
[wayback_undiscovered.log] Done. Processed: 17, with outlinks: 5
[wayback_undiscovered.log] Output: /var/www/lethe/data/result_wayback_outlinks.csv
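Each line in the log tail is prefixed with its source file in square brackets. A small sketch of splitting that prefix off and grouping messages per file might look like this; `group_by_source` is a hypothetical helper, not part of the pipeline.

```python
import re
from collections import defaultdict

# "[file.log] message": the first bracketed token is the source file.
LINE_RE = re.compile(r"^\[(?P<source>[^\]]+)\]\s?(?P<message>.*)$")


def group_by_source(lines):
    """Group log messages by the bracketed source prefix; skip other lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            grouped[match["source"]].append(match["message"])
    return dict(grouped)


sample = [
    "[pipeline_sequential.log] [20:48:17] START fetch .de",
    "[pipeline_sequential.log] PID 2923690 → logs/fetch_de.log",
    "[wayback_undiscovered.log] Done. Processed: 17, with outlinks: 5",
]
grouped = group_by_source(sample)
for source, messages in grouped.items():
    print(source, len(messages))
```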
Phase 1: fetch_webgraph_tld.py → S3 CC bucket → per-TLD filtering
Phase 2: pagerank.py → DuckDB ranks table
Phase 3: analyze_expired_tld.py --top 3000 --min-backlinks 5 → result_expired_*.csv
Pipeline: /var/www/lethe/run_tld_pipeline.sh de org it fr
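
The contents of pagerank.py are not shown on this page. As an illustration of what the PageRank phase computes, here is a minimal power-iteration sketch in plain Python. The damping factor, the iteration count, the handling of dangling nodes, and the example domains are all assumptions, and the DuckDB storage step is left out.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) domain pairs.

    Rank held by nodes without outgoing links is spread evenly over
    all nodes, so the ranks always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * ranks[src] / len(targets)
                for dst in targets:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-domain graph: a.de and b.de both link to c.de.
edges = [("a.de", "c.de"), ("b.de", "c.de"), ("c.de", "a.de")]
ranks = pagerank(edges)
for domain, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{domain}  {rank:.4f}")
```

In this toy graph c.de ends up with the highest rank because it receives links from both other domains.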