Corentin Chary
14971584af
euscan: robots.txt, timeout, user-agent, ...
...
- Add a blacklist for robots.txt, we *want* to scan sourceforge
- Set a user-agent that doesn't look like a browser
- Handle timeouts more carefully
- If brute force detects too many versions, avoid infinite loops
- Handle redirections more carefully
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-21 10:09:50 +02:00
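The robots.txt handling described in this commit and the earlier ones (skip the check for ftp, skip it for blacklisted hosts, otherwise obey the file) might look roughly like the sketch below. It is not euscan's code: `ROBOTS_TXT_BLACKLIST`, `USER_AGENT`, `host_is_blacklisted` and `url_allowed` are hypothetical names, and the rules are passed in as lines so the example runs without network access.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; not taken from euscan's source.
ROBOTS_TXT_BLACKLIST = {"sourceforge.net"}
USER_AGENT = "euscan-example/0.0"


def host_is_blacklisted(url, blacklist=ROBOTS_TXT_BLACKLIST):
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blacklist)


def url_allowed(url, robots_lines, blacklist=ROBOTS_TXT_BLACKLIST):
    """Decide whether url may be fetched, given the lines of the host's robots.txt."""
    # robots.txt is only consulted for http(s); ftp and others are always allowed.
    if urlparse(url).scheme not in ("http", "https"):
        return True
    # Hosts on the blacklist are scanned regardless of their robots.txt.
    if host_is_blacklisted(url, blacklist):
        return True
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(USER_AGENT, url)
```

In a real scanner the lines would come from fetching `/robots.txt` on the target host, with a timeout and a silent fallback on failure.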
Corentin Chary
8c40a1795c
euscan: blacklist art.gnome.org
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-10 08:25:39 +02:00
Corentin Chary
9da62b211b
euscan: fix some robots.txt issues
...
- disable checks for ftp
- fail silently
- use einfo and not eerror
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-10 08:23:46 +02:00
Corentin Chary
c5af0e1937
euscan: don't mix spaces and tabs
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 17:35:17 +02:00
Corentin Chary
2210b2610d
euscan: don't get robots.txt on ftp
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 17:34:50 +02:00
Corentin Chary
a137ef60e3
euscan: respect robots.txt
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 16:32:29 +02:00
Corentin Chary
bd75e1af4e
euscan/helpers: use HEAD in tryurl
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 15:47:54 +02:00
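For context, probing a URL with HEAD instead of GET avoids downloading the body when only existence matters. The following is a minimal sketch using Python 3's standard library, not the actual `tryurl` implementation; `build_head_request`, `url_exists` and `USER_AGENT` are hypothetical names.

```python
import urllib.request

# Hypothetical user agent for illustration; not the one euscan sends.
USER_AGENT = "euscan-example/0.0"


def build_head_request(url):
    """Build a HEAD request so the server returns headers only, no body."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": USER_AGENT}
    )


def url_exists(url, timeout=10):
    """Return True if a HEAD request to url succeeds, False on any network error."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Some servers answer HEAD differently from GET, so a production check may need a GET fallback.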
Corentin Chary
454d369ced
euscan/handlers: fix recursive brute force in generic handler
...
The component was modified by the function because it is passed by
reference; make an explicit copy to fix that.
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 15:47:00 +02:00
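The aliasing problem this commit describes can be reproduced in a few lines. The sketch below is illustrative only and assumes the component is a list of integers; `bump_in_place`, `candidates_buggy` and `candidates_fixed` are hypothetical names, not functions from the generic handler.

```python
def bump_in_place(components, index):
    """Increment one version component, mutating the list that was passed in."""
    components[index] += 1
    return components


def candidates_buggy(components):
    """Every call mutates the same list, so all results alias one object."""
    return [bump_in_place(components, i) for i in range(len(components))]


def candidates_fixed(components):
    """Pass an explicit copy so the caller's list is left untouched."""
    return [bump_in_place(list(components), i) for i in range(len(components))]
```

With `[1, 2, 3]` as input, the fixed variant yields three independent candidates and leaves the input unchanged, while the buggy variant returns the same mutated list three times.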
Corentin Chary
8dc19b9856
euscan: fix some errors
...
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-09-06 09:17:08 +02:00
Corentin Chary
752fb04425
euscan: shake the code
...
- add custom site handlers
- use a custom user agent
- fix some bugs in management commands
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
2011-08-31 15:38:32 +02:00