Compare commits

...

32 Commits

Author SHA1 Message Date
Alfred Wingate 5b9d44fee1
TODO: weird docs handling
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:58 +02:00
Alfred Wingate d8d1767766
TODO: debian remote-id exists
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:58 +02:00
Alfred Wingate fbd7a4e139
handlers/github: remove
* Mirror removed and the API has very strict rate limits, making it
  impractical to use.
* https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f119d00dab0c3bd087faab36f1a44734772a9d75

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:58 +02:00
Alfred Wingate a7ff66ae04
handlers/pypi: stop using mirrors
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:58 +02:00
Alfred Wingate 5da26b0719
handlers/rubygems: stop using mirrors
* https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=65850a10f84e1b7a2cdf55392fa1d1f0717193c1

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:58 +02:00
Alfred Wingate 656f8e155e
handlers/google_code: dead
* https://bugs.gentoo.org/544092

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:57 +02:00
Alfred Wingate 294dcc2a9c
handlers/freecode: never should've been used in ebuilds
* https://bugs.gentoo.org/637970

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 20:58:57 +02:00
Alfred Wingate c628edc26b
handlers/berlios: obsolete
* mirror removed in https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2b72b0462bea5b34bbe4d767ccc44866df81515e
* The rest of the BerliOS URLs use SourceForge now.

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 19:36:20 +02:00
Alfred Wingate 61cbb8e3f9
pre-commit: autoupdate versions
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 19:11:43 +02:00
Alfred Wingate b2cd013b09
Work around hard-to-parse $'' strings
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 00:56:41 +02:00
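
A minimal sketch of the workaround (the actual change appears in the scan_upstream() hunk near the end of this compare); the SRC_URI value here is hypothetical:

# bash $'' quoting leaves literal backslash escapes in the string portage returns.
uris = "mirror://gnome/foo-1.0.tar.gz\\nhttp://example.org/foo-1.0.tar.gz"  # hypothetical value

# Round-tripping through raw_unicode_escape turns the two-character "\n" into a real newline.
uris = uris.encode("raw_unicode_escape").decode("unicode_escape")
assert "\n" in uris
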
Alfred Wingate e9fd94e1a5
Blacklist urls that don't make sense to scan
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 00:56:41 +02:00
Alfred Wingate e13a62af84
Remove encoding keyword from json()
* Removed in Python 3.9

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 00:56:41 +02:00
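
On Python 3.9 and later the keyword is gone entirely, so the old call fails instead of being ignored; a quick demonstration:

import json

json.loads('{"a": 1}')  # fine on all supported versions
try:
    json.loads('{"a": 1}', encoding="ascii")  # ignored on <=3.8, fatal on 3.9+
except TypeError:
    print("encoding keyword was removed in Python 3.9")
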
Alfred Wingate d93c3154ac
Update GNOME_URL_SOURCE
* It gets redirected either way.

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 00:56:41 +02:00
Alfred Wingate 9809d9a805
Add Gitea (+ Forgejo) handler
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-03 00:56:36 +02:00
Alfred Wingate d217c839a9
Add GitLab handler
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-02 22:13:44 +02:00
Alfred Wingate aad99f71fe
Use JSON API for PyPI
* "The XML-RPC API will be deprecated in the future. Use of this API is
  not recommended, and existing consumers of the API should migrate to
  the RSS and/or JSON APIs instead."
* "As a result, this API has a very restrictive rate limit and it may be
  necessary to pause between successive requests." As such this also
  gets around this issue for euscan.

https://warehouse.pypa.io/api-reference/xml-rpc.html

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2024-03-02 16:18:34 +02:00
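
A standalone sketch of the JSON endpoint the reworked handler queries (the full handler diff appears below); the package name is just an example:

import json
from urllib.request import urlopen

from packaging.version import parse  # new dependency, added to pyproject.toml

package = "setuptools"  # example package
with urlopen(f"https://pypi.org/pypi/{package}/json/") as fp:
    data = json.load(fp)

# Same ordering the handler uses: newest release first.
versions = sorted(data["releases"], key=parse, reverse=True)
print(versions[0])
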
Alfred Wingate 9465c14342
Remove more euscanwww stragglers from TODO
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 07:12:53 +02:00
Alfred Wingate 22915bade5
Fix kde handler
* It appears it was broken in the midst of 8d912379, with no apparent
  rationale for the change there.

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 07:07:11 +02:00
Alfred Wingate 443b5f62fd
Enable flake8-bugbear linting and fix raised issues
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 06:25:40 +02:00
Alfred Wingate 49f1fbbad1
Remove Python 2-isms from classes
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 06:18:19 +02:00
Alfred Wingate a03b420c75
Use OSError instead of alias IOError
https://docs.astral.sh/ruff/rules/os-error-alias/

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 06:15:48 +02:00
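
IOError has been a plain alias of OSError since Python 3.3, so the except clauses swapped throughout the diffs below catch exactly the same exceptions:

# True on any supported Python; the rename is purely cosmetic.
assert IOError is OSError
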
Alfred Wingate 6c0b816e73
Use f-strings or .format() over percent style
https://docs.astral.sh/ruff/rules/printf-string-formatting/

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:56:40 +02:00
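
The pattern repeated across the diffs below, in miniature (example values):

cp, version = "dev-python/foo", "1.2.3"  # example values
old = "%s-%s" % (cp, version)  # percent style (flagged by ruff as UP031)
new = f"{cp}-{version}"        # preferred f-string
assert old == new
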
Alfred Wingate 21fe4eafec
Enable pyupgrade linting
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:21:49 +02:00
Alfred Wingate 377ba2f727
Address N806
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:19:02 +02:00
Alfred Wingate 9f7ba6c9cd
Address N818
https://peps.python.org/pep-0008/#exception-names

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:17:14 +02:00
Alfred Wingate 764bcf9ce8
Enable PEP8 linting in ruff
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:16:25 +02:00
Alfred Wingate c0be0e0b67
Fix invalid backslash characters
* https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:10:40 +02:00
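
Since Python 3.6, unrecognized escape sequences such as \d in ordinary string literals emit a DeprecationWarning (slated to become an error), which is what the raw-string conversions below address:

import re

pattern = "version=(\\d+?)"  # doubling the backslash works...
pattern = r"version=(\d+?)"  # ...but a raw string says what is meant
assert re.match(pattern, "version=1")
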
Alfred Wingate d48699e5fd
Enable all pycodestyle checks in ruff
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 05:02:30 +02:00
Alfred Wingate eedf3c5939
Remove euscanwww leftovers from TODO
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 04:51:30 +02:00
Alfred Wingate 7ac854dc61
Add changelog to python metadata
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 04:43:44 +02:00
Alfred Wingate 0551629a9a
Update MANIFEST.in
* Missed in previous changes

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 04:39:25 +02:00
Alfred Wingate 17c4e19bc5
Filter XMLParsedAsHTMLWarnings
* Parsing xhtml sites would trigger it.

Signed-off-by: Alfred Wingate <parona@protonmail.com>
2023-11-16 04:27:57 +02:00
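
The suppression added to scan_html() in the generic handler, in isolation; the XHTML snippet is a stand-in:

import warnings

from bs4 import BeautifulSoup, XMLParsedAsHTMLWarning

# XHTML pages fed to an HTML parser would otherwise emit this warning.
warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)

soup = BeautifulSoup('<?xml version="1.0"?><a href="x">x</a>', features="lxml")
print(soup.a["href"])
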
27 changed files with 298 additions and 418 deletions

View File

@@ -1,15 +1,15 @@
repos:
- repo: https://github.com/psf/black
rev: 23.11.0
rev: 24.2.0
hooks:
- id: black
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.4
rev: v0.3.0
hooks:
- id: ruff

View File

@@ -1,8 +1,8 @@
include AUTHORS
include CHANGELOG.rst
include LICENSE
include README.rst
include TODO
include setup.py
include pyproject.toml
recursive-include bin *
recursive-include man *
recursive-include pym *.py
recursive-include src *.py

TODO
View File

@@ -42,61 +42,7 @@ euscan
- remote-id type deb repository:
-- find out how to get download url (not sure it's possible)
### remote-id
- Propose new remote-id: deb
e.g.: <remote-id type="deb">
http://mysite.com/deb/dists/stable/main/binary-i386/Packages
</remote-id>
- Propose new remote-id: freecode
e.g.: <remote-id type="freecode">projectname</remote-id>
### bugs or unwanted behavior
euscanwww
---------
### misc
- Really fix mails: better formatting
- Always keep in db all found versions (when using an API only?). But don't display them if older than current packaged version, except maybe in the "upstream_version" column.
### packages
- Ignore alpha/beta if current is not alpha/beta: per-package setting using metadata.xml?
- ~arch / stable support: see "models: keywords"
- stabilization candidates: check stabilization rules, and see how this can be automated
- set upstream version by hand: will be done after uscan compatibility
### logs
- Move log models into djeuscanhistory?
### models
- Repository (added or not, from layman + repositories.xml)
- Arches and Keyword
- Metadata, herds, maintainers and homepage are per-version, not per package. Store it in Version instead.
### djportage (LOW-PRIORITY)
- Create standalone application to scan and represent portage trees in models using work done in:
-- euscan
-- p.g.o: https://github.com/bacher09/gentoo-packages
-- gentoostats: https://github.com/gg7/gentoostats_server/blob/master/gentoostats/stats/models.py
The application should be easy to use, and we should be able to launch the scan process in a celery worker using "logging" for logs.
The application should also be usable for p.g.o and gentoostats later...
The scan process should be faster than the one using euscan. gentoo-packages has some interesting ideas for that (keeping metadata and ebuild hashes, etc.).
### API (LOW-PRIORITY)
- Move to tastypie
### Overlays
/!\ blocked by "djportage" application
Currently, overlay handling in euscan sucks (it's simply a column, nothing more, and overlays are mostly handled by hand by layman). I'd like to be able to add and remove overlays (overlay name + svn/git/cvs/rsync url). Using a new model and the layman API should make this task easy.
/!\ could be done earlier using a simple "overlay" table ... but how to pre-compute everything per-overlay?
Once done, a great feature would be to be able to select the displayed overlays on euscan (as a global setting: for all pages). This is actually a lot of work, so you should work on that on a separate branch.
Note that this is more complicated than it seems, because a lot of things are precalculated (number of versions for this herd, number of outdated versions, etc.), and selecting overlays would break all this. So you'll really need to experiment with solutions for this one.
- Parsing docs and accepting 404's
-- net-analyzer/sensu

View File

@@ -199,7 +199,7 @@ def print_usage(_error=None, help=None):
# turquoise("`man %s`" % __productname__), file=out)
class ParseArgsException(Exception):
class ParseArgsError(Exception):
"""For parseArgs() -> main() communications."""
def __init__(self, value):
@@ -220,9 +220,9 @@ def parse_args():
return_code = True
for o, a in opts:
if o in ("-h", "--help"):
raise ParseArgsException("help")
raise ParseArgsError("help")
elif o in ("-V", "--version"):
raise ParseArgsException("version")
raise ParseArgsError("version")
elif o in ("-C", "--nocolor"):
CONFIG["nocolor"] = True
pp.output.nocolor()
@@ -283,14 +283,14 @@ def parse_args():
# apply getopts to command line, show partial help on failure
try:
opts, args = getopt.getopt(sys.argv[1:], short_opts, long_opts)
except getopt.GetoptError:
raise ParseArgsException(opts_mode + "-options")
except getopt.GetoptError as exc:
raise ParseArgsError(opts_mode + "-options") from exc
# set options accordingly
option_switch(opts)
if len(args) < 1:
raise ParseArgsException("packages")
raise ParseArgsError("packages")
return args
@@ -306,7 +306,7 @@ def main():
# parse command line options and actions
try:
queries = parse_args()
except ParseArgsException as e:
except ParseArgsError as e:
if e.value == "help":
print_usage(help="all")
exit_helper(0)
@@ -362,7 +362,7 @@ def main():
exit_helper(1)
except GentoolkitException as err:
output.eerror("%s: %s" % (query, str(err)))
output.eerror(f"{query}: {str(err)}")
exit_helper(1)
except Exception as err:
@@ -372,7 +372,7 @@ def main():
traceback.print_exc(file=sys.stderr)
print("-" * 60)
output.eerror("%s: %s" % (query, str(err)))
output.eerror(f"{query}: {str(err)}")
exit_helper(1)
if not ret and not CONFIG["quiet"]:

View File

@@ -30,10 +30,10 @@ def guess_indent_values(before):
def guess_for_tags(tags):
for tag in tags:
for i in [0, 2, 4, 6, 8, 12, 16]:
if "\n%s<%s" % (" " * i, tag) in before:
if f"\n{' ' * i}<{tag}" in before:
return i, False
for i in [0, 1, 2]:
if "\n%s<%s" % ("\t" * i, tag) in before:
if f"\n{'\t' * i}<{tag}" in before:
return i, True
return -1, False
@@ -119,11 +119,11 @@ def get_deb_url(name):
content = opened.read()
for link in BeautifulSoup(content, parseOnlyThese=SoupStrainer("a")):
if re.match("[^\s]+\.debian\.tar\.(?:gz|bz2)", link.text):
if re.match(r"[^\s]+\.debian\.tar\.(?:gz|bz2)", link.text):
deb_url = link["href"]
deb_type = "source"
break
if re.match("[^\s]+\.diff\.gz", link.text):
if re.match(r"[^\s]+\.diff\.gz", link.text):
deb_url = link["href"]
deb_type = "diff"
break
@@ -157,7 +157,7 @@ def patch_metadata(package, watch_data, diff=False):
for watch_line in watch_data.split("\n"): # there can be multiple lines
watch_line = " ".join(watch_line.split()) # remove extra spaces and \n
version_parse = re.match("version=(\d+?)", watch_line)
version_parse = re.match(r"version=(\d+?)", watch_line)
if version_parse:
version = version_parse.group(1)
continue
@@ -180,7 +180,7 @@ def patch_metadata(package, watch_data, diff=False):
if opt_name in valid:
if opt_name == "uversionmangle":
opt_name = "versionmangle"
cleaned_opts.append('%s="%s"' % (opt_name, opt_value))
cleaned_opts.append(f'{opt_name}="{opt_value}"')
opts = " ".join(cleaned_opts)
# clean url from useless stuff. Just keep <base> [<filepattern>]
@@ -188,14 +188,9 @@ def patch_metadata(package, watch_data, diff=False):
url = " ".join([x for x in url_search.groups() if x is not None])
if opts:
watch_tag = '%s<watch version="%s" %s>%s</watch>' % (
indent,
version,
opts,
url,
)
watch_tag = f'{indent}<watch version="{version}" {opts}>{url}</watch>'
else:
watch_tag = '%s<watch version="%s">%s</watch>' % (indent, version, url)
watch_tag = f'{indent}<watch version="{version}">{url}</watch>'
watch_tags.append(watch_tag)
watch_tags = "\n".join(watch_tags)
@@ -203,11 +198,7 @@ def patch_metadata(package, watch_data, diff=False):
if "<upstream>" in data:
data = data.replace("<upstream>", "<upstream>\n%s" % watch_tags, 1)
else:
rep = "%s<upstream>\n%s\n%s</upstream>\n</pkgmetadata>" % (
rindent,
watch_tags,
rindent,
)
rep = f"{rindent}<upstream>\n{watch_tags}\n{rindent}</upstream>\n</pkgmetadata>"
data = data.replace("</pkgmetadata>", rep, 1)
if not diff:

View File

@@ -16,12 +16,14 @@ description = "Ebuild upstream scan utility."
license = {text = "GPL-2.0"}
dependencies = [
"portage",
"beautifulsoup4>=4.8.2"
"beautifulsoup4>=4.8.2",
"packaging"
]
dynamic = ["version"]
[project.urls]
homepage = "https://gitlab.com/src_prepare/euscan-ng"
changelog = "https://gitlab.com/src_prepare/euscan-ng/-/blob/master/CHANGELOG.rst"
[tool.setuptools]
script-files = ["bin/euscan"]
@@ -39,3 +41,6 @@ src_paths = ["bin/euscan", "src/euscan/"]
[tool.ruff]
extend-include = ["bin/euscan", "bin/euscan_patch_metadata"]
[tool.ruff.lint]
extend-select = ["B", "E", "N", "UP", "W"]

View File

@@ -51,8 +51,13 @@ BLACKLIST_PACKAGES = [
]
SCANDIR_BLACKLIST_URLS = [
"mirror://rubygems/(.*)", # Not browsable
"https://rubygems.org/(.*)", # Not browsable
"mirror://gentoo/(.*)", # Directory too big
"https://dev.gentoo.org/(.*)", # There shouldn't be releases here
# Waste of time to go through
"https://crates.io/(.*)",
"https://api.nuget.org/(.*)",
"https://myget.org/(.*)",
]
BRUTEFORCE_BLACKLIST_PACKAGES = [
@@ -74,13 +79,13 @@ BRUTEFORCE_BLACKLIST_URLS = [
ROBOTS_TXT_BLACKLIST_DOMAINS = [
"(.*)sourceforge(.*)",
"(.*)github.com",
"(.*)qt\.nokia\.com(.*)",
"(.*)chromium\.org(.*)",
"(.*)nodejs\.org(.*)",
"(.*)download\.mono-project\.com(.*)",
"(.*)fedorahosted\.org(.*)",
"(.*)download\.tuxfamily\.org(.*)",
"(.*)festvox\.org(.*)",
r"(.*)qt\.nokia\.com(.*)",
r"(.*)chromium\.org(.*)",
r"(.*)nodejs\.org(.*)",
r"(.*)download\.mono-project\.com(.*)",
r"(.*)fedorahosted\.org(.*)",
r"(.*)download\.tuxfamily\.org(.*)",
r"(.*)festvox\.org(.*)",
]
from euscan.out import EuscanOutput # noqa: E402

View File

@@ -71,7 +71,7 @@ def package_from_ebuild(ebuild):
return False
ebuild_split = ebuild.split("/")
cpv = "%s/%s" % (ebuild_split[-3], pf)
cpv = f"{ebuild_split[-3]}/{pf}"
if not portage.catpkgsplit(cpv):
return False

View File

@@ -13,7 +13,7 @@ from euscan import CONFIG, output
handlers = {"package": [], "url": [], "all": {}}
# autoimport all modules in this directory and append them to handlers list
for loader, module_name, is_pkg in pkgutil.walk_packages(__path__):
for loader, module_name, _is_pkg in pkgutil.walk_packages(__path__):
module = loader.find_spec(module_name).loader.load_module(module_name)
if not hasattr(module, "HANDLER_NAME"):
continue
@@ -157,7 +157,7 @@ def scan_url(pkg, urls, options, on_progress=None):
else:
output.eerror("Can't find a suitable handler!")
except Exception as e:
output.ewarn("Handler failed: [%s] %s" % (e.__class__.__name__, str(e)))
output.ewarn(f"Handler failed: [{e.__class__.__name__}] {str(e)}")
if versions and CONFIG["oneshot"]:
break

View File

@@ -1,59 +0,0 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import re
import urllib.error
import urllib.parse
import urllib.request
import portage
from euscan import output
from euscan.handlers.url import process_scan as url_scan
from euscan.helpers import regex_from_template
HANDLER_NAME = "berlios"
CONFIDENCE = 90
PRIORITY = 90
berlios_regex = r"mirror://berlios/([^/]+)/([^/]+)"
def can_handle(pkg, url=None):
if not url:
return False
cp, ver, rev = portage.pkgsplit(pkg.cpv)
if ver not in url:
return False
return re.search(berlios_regex, url)
def scan_url(pkg, url, options):
output.einfo("Using BerliOS handler")
cp, ver, rev = portage.pkgsplit(pkg.cpv)
project, filename = re.search(berlios_regex, url).groups()
project_page = "http://developer.berlios.de/projects/%s" % project
content = urllib.request.urlopen(project_page).read()
project_id = re.search(r"/project/filelist.php\?group_id=(\d+)", content).group(1)
base_url = (
"http://developer.berlios.de/project/filelist.php?group_id=%s" % project_id
)
file_pattern = regex_from_template(filename.replace(ver, "${PV}"))
result = url_scan(pkg, base_url, file_pattern)
ret = []
for found_url, pv, _, _ in result:
found_url = found_url.replace("prdownload", "download")
ret.append((found_url, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -81,7 +81,7 @@ def mangle_version(up_pv):
pv = ".".join(groups)
if rc_part:
pv = "%s_rc%s" % (pv, rc_part)
pv = f"{pv}_rc{rc_part}"
return pv
@@ -128,7 +128,7 @@ def scan_pkg(pkg, options):
fp = helpers.urlopen(url)
except urllib.error.URLError:
return []
except IOError:
except OSError:
return []
if not fp:
@@ -157,13 +157,7 @@ def scan_pkg(pkg, options):
if helpers.version_filtered(cp, m_ver, m_pv, cpan_vercmp):
continue
url = "mirror://cpan/authors/id/%s/%s/%s/%s" % (
version["cpanid"][0],
version["cpanid"][0:1],
version["cpanid"],
version["archive"],
)
url = f"mirror://cpan/authors/id/{version['cpanid'][0]}/{version['cpanid'][0:1]}/{version['cpanid']}/{version['archive']}"
url = mangling.mangle_url(url, options)
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))

View File

@@ -1,53 +0,0 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import re
import urllib.error
import urllib.parse
import urllib.request
import portage
from euscan import helpers, mangling, output
HANDLER_NAME = "freecode"
CONFIDENCE = 100
PRIORITY = 90
def can_handle(pkg, url=None):
return False
def scan_pkg(pkg, options):
cp, ver, rev = portage.pkgsplit(pkg.cpv)
package = options["data"].strip()
output.einfo("Using FreeCode handler: " + package)
fp = urllib.request.urlopen("http://freecode.com/projects/%s/releases" % package)
content = str(fp.read())
result = re.findall(
r'<a href="/projects/%s/releases/(\d+)">([^<]+)</a>' % package, content
)
ret = []
for release_id, up_pv in result:
pv = mangling.mangle_version(up_pv, options)
if helpers.version_filtered(cp, ver, pv):
continue
fp = urllib.request.urlopen(
"http://freecode.com/projects/%s/releases/%s" % (package, release_id)
)
content = str(fp.read())
download_page = re.findall(r'<a href="(/urls/[^"]+)"', content)[0]
fp = urllib.request.urlopen("http://freecode.com%s" % download_page)
content = str(fp.read())
url = re.findall(
r'In case it doesn\'t, click here: <a href="([^"]+)"', content
)[0]
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -8,14 +8,11 @@ import re
import urllib.error
import urllib.parse
import urllib.request
import warnings
from urllib.parse import urljoin, urlparse
try:
from BeautifulSoup import BeautifulSoup
except ImportError:
from bs4 import BeautifulSoup
import portage
from bs4 import BeautifulSoup, XMLParsedAsHTMLWarning
from euscan import (
BRUTEFORCE_BLACKLIST_PACKAGES,
@@ -65,6 +62,7 @@ def confidence_score(found, original, minimum=CONFIDENCE):
def scan_html(data, url, pattern):
warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)
soup = BeautifulSoup(data, features="lxml")
results = []
@@ -114,7 +112,7 @@ def scan_directory_recursive(cp, ver, rev, url, steps, orig_url, options):
fp = helpers.urlopen(url)
except urllib.error.URLError:
return []
except IOError:
except OSError:
return []
if not fp:
@@ -124,7 +122,7 @@ def scan_directory_recursive(cp, ver, rev, url, steps, orig_url, options):
results = []
if re.search(b"<\s*a\s+[^>]*href", data, re.I):
if re.search(rb"<\s*a\s+[^>]*href", data, re.I):
results.extend(scan_html(data, url, pattern))
elif url.startswith("ftp://"):
results.extend(scan_ftp(data, url, pattern))
@@ -156,7 +154,7 @@ def scan_url(pkg, url, options):
if CONFIG["scan-dir"]:
for bu in SCANDIR_BLACKLIST_URLS:
if re.match(bu, url):
output.einfo("%s is blacklisted by rule %s" % (url, bu))
output.einfo(f"{url} is blacklisted by rule {bu}")
return []
resolved_url = helpers.parse_mirror(url)
@@ -169,14 +167,15 @@ def scan_url(pkg, url, options):
if ver not in resolved_url:
newver = helpers.version_change_end_sep(ver)
if newver and newver in resolved_url:
output.einfo("Version: using %s instead of %s" % (newver, ver))
output.einfo(f"Version: using {newver} instead of {ver}")
ver = newver
template = helpers.template_from_url(resolved_url, ver)
if "${" not in template:
output.einfo(
"Url doesn't seems to depend on version: %s not found in %s"
% (ver, resolved_url)
"Url doesn't seems to depend on version: {} not found in {}".format(
ver, resolved_url
)
)
return []
else:
@@ -203,12 +202,12 @@ def brute_force(pkg, url):
for bp in BRUTEFORCE_BLACKLIST_PACKAGES:
if re.match(bp, cp):
output.einfo("%s is blacklisted by rule %s" % (cp, bp))
output.einfo(f"{cp} is blacklisted by rule {bp}")
return []
for bp in BRUTEFORCE_BLACKLIST_URLS:
if re.match(bp, url):
output.einfo("%s is blacklisted by rule %s" % (cp, bp))
output.einfo(f"{cp} is blacklisted by rule {bp}")
return []
output.einfo("Generating version from " + ver)
@@ -229,8 +228,7 @@ def brute_force(pkg, url):
if "${PV}" not in template:
output.einfo(
"Url doesn't seems to depend on full version: %s not found in %s"
% (ver, url)
f"Url doesn't seems to depend on full version: {ver} not found in {url}"
)
return []
else:

View File

@@ -0,0 +1,70 @@
# Copyright 2020-2024 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import json
import re
import portage
from euscan import helpers, mangling, output
HANDLER_NAME = "gitea"
CONFIDENCE = 100
PRIORITY = 90
# Forgejo strives to be compatible with Gitea API
# https://forgejo.org/2024-02-forking-forward/
_gitea_instances = [
"codeberg.org",
"git.osgeo.org",
"gitea.com",
"gitea.ladish.org",
"gitea.osmocom.org",
"gitea.treehouse.systems",
]
gitea_patterns = [
re.compile(rf"https://(?P<domain>{domain})/(?P<repository>[^/]+/[^/]+)")
for domain in _gitea_instances
]
def can_handle(pkg, url=None):
return url and any([re.search(pattern, url) for pattern in gitea_patterns])
def scan_url(pkg, url, options):
"https://docs.gitea.com/api/1.20/#tag/repository/operation/repoListReleases"
match = [
re.search(pattern, url)
for pattern in gitea_patterns
if re.search(pattern, url) is not None
][0]
domain = match.group("domain")
repository = match.group("repository")
output.einfo(f"Using Gitea API in {domain}: {repository}")
request = helpers.urlopen(f"https://{domain}/api/v1/repos/{repository}/releases")
data = json.load(request)
versions = [release["tag_name"] for release in data]
cp, ver, rev = portage.pkgsplit(pkg.cpv)
ret = []
for up_pv in versions:
pv = mangling.mangle_version(up_pv, options)
if helpers.version_filtered(cp, ver, pv):
continue
urls = " ".join(
mangling.mangle_url(release["tarball_url"], options)
for release in data
if release["tag_name"] == up_pv
)
ret.append((urls, pv, HANDLER_NAME, CONFIDENCE))
return ret
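
A hedged, standalone sketch of the API call the new handler makes; the instance is one of those listed above, and the repository is an arbitrary example:

import json
from urllib.request import urlopen

domain, repository = "codeberg.org", "forgejo/forgejo"  # example pair

with urlopen(f"https://{domain}/api/v1/repos/{repository}/releases") as fp:
    releases = json.load(fp)

# tag_name and tarball_url are the fields the handler consumes.
for release in releases[:3]:
    print(release["tag_name"], release["tarball_url"])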

View File

@@ -1,67 +0,0 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import json
import re
import urllib.error
import urllib.parse
import urllib.request
import portage
from euscan import helpers, mangling, output
HANDLER_NAME = "github"
CONFIDENCE = 100
PRIORITY = 90
def can_handle(pkg, url=None):
return url and url.startswith("mirror://github/")
def guess_package(cp, url):
match = re.search("^mirror://github/(.*?)/(.*?)/(.*)$", url)
assert match
return (match.group(1), match.group(2), match.group(3))
def scan_url(pkg, url, options):
"http://developer.github.com/v3/repos/downloads/"
user, project, filename = guess_package(pkg.cpv, url)
# find out where version is expected to be found
cp, ver, rev = portage.pkgsplit(pkg.cpv)
if ver not in filename:
return
# now create a filename-matching regexp
# XXX: supposedly replace first with (?P<foo>...)
# and remaining ones with (?P=foo)
fnre = re.compile("^%s$" % re.escape(filename).replace(re.escape(ver), "(.*?)"))
output.einfo(
"Using github API for: project=%s user=%s filename=%s"
% (project, user, filename)
)
dlreq = urllib.request.urlopen(
"https://api.github.com/repos/%s/%s/downloads" % (user, project)
)
dls = json.load(dlreq)
ret = []
for dl in dls:
m = fnre.match(dl["name"])
if m:
pv = mangling.mangle_version(m.group(1), options)
if helpers.version_filtered(cp, ver, pv):
continue
url = mangling.mangle_url(dl["html_url"], options)
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -0,0 +1,82 @@
# Copyright 2020-2024 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import json
import re
import portage
from euscan import helpers, mangling, output
HANDLER_NAME = "gitlab"
CONFIDENCE = 100
PRIORITY = 90
_gitlab_instances = [
"gitlab.com",
"gitlab.freedesktop.org",
"invent.kde.org/",
"gitlab.gnome.org",
"gitlab.kitware.com",
"gitlab.xfce.org",
"code.videolan.org",
"gitlab.xiph.org",
]
gitlab_patterns = [
# Regular expression adapted from pkgcheck
# https://docs.gitlab.com/ee/user/reserved_names.html
re.compile(
rf"https://(?P<domain>{domain})/(?P<repository>((?!api/)\w[^/]*/)+(?!raw/)\w[^/]*)"
)
for domain in _gitlab_instances
]
def can_handle(pkg, url=None):
return url and any([re.search(pattern, url) for pattern in gitlab_patterns])
def scan_url(pkg, url, options):
"https://docs.gitlab.com/ee/api/releases/index.html"
match = [
re.search(pattern, url)
for pattern in gitlab_patterns
if re.search(pattern, url) is not None
][0]
domain = match.group("domain")
repository = match.group("repository")
output.einfo(f"Using GitLab REST API in {domain}: {repository}")
request = helpers.urlopen(
f"https://{domain}/api/v4/projects/{repository.replace('/', '%2F')}/releases"
)
data = json.load(request)
versions = [release["tag_name"] for release in data]
cp, ver, rev = portage.pkgsplit(pkg.cpv)
ret = []
for up_pv in versions:
pv = mangling.mangle_version(up_pv, options)
if helpers.version_filtered(cp, ver, pv):
continue
urls = " ".join(
[
mangling.mangle_url(source["url"], options)
for source in [
release["assets"]["sources"]
for release in data
if release["tag_name"] == up_pv
][0]
# prefer tar.bz2
if source["format"] == "tar.bz2"
]
)
ret.append((urls, pv, HANDLER_NAME, CONFIDENCE))
return ret
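
The equivalent standalone sketch for the GitLab handler; the project path must be URL-encoded, as in the handler above (the example project is euscan-ng's own):

import json
from urllib.request import urlopen

domain, repository = "gitlab.com", "src_prepare/euscan-ng"

url = f"https://{domain}/api/v4/projects/{repository.replace('/', '%2F')}/releases"
with urlopen(url) as fp:
    releases = json.load(fp)

for release in releases[:3]:
    print(release["tag_name"])
    # The handler keeps only the tar.bz2 source asset of each release.
    for source in release["assets"]["sources"]:
        if source["format"] == "tar.bz2":
            print(" ", source["url"])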

View File

@@ -20,7 +20,7 @@ HANDLER_NAME = "gnome"
CONFIDENCE = 100
PRIORITY = 90
GNOME_URL_SOURCE = "http://ftp.gnome.org/pub/GNOME/sources"
GNOME_URL_SOURCE = "https://download.gnome.org/sources"
def can_handle(_pkg, url=None):
@@ -38,7 +38,7 @@ def guess_package(cp, url):
def scan_url(pkg, url, options):
"http://ftp.gnome.org/pub/GNOME/sources/"
"https://download.gnome.org/sources/"
package = {
"data": guess_package(pkg.cpv, url),
"type": "gnome",
@@ -55,7 +55,7 @@ def scan_pkg(pkg, options):
content = fp.read()
fp.close()
cache = json.loads(content, encoding="ascii")
cache = json.loads(content)
if cache[0] != 4:
output.eerror("Unknow cache format detected")

View File

@@ -1,47 +0,0 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import re
import portage
from euscan import output
from euscan.handlers.url import process_scan as url_scan
from euscan.helpers import regex_from_template
HANDLER_NAME = "google-code"
CONFIDENCE = 90
PRIORITY = 90
package_name_regex = r"http://(.+).googlecode.com/files/.+"
def can_handle(pkg, url=None):
if not url:
return False
cp, ver, rev = portage.pkgsplit(pkg.cpv)
if ver not in url:
return False
return re.match(package_name_regex, url)
def scan_url(pkg, url, options):
output.einfo("Using Google Code handler")
cp, ver, rev = portage.pkgsplit(pkg.cpv)
package_name = re.match(package_name_regex, url).group(1)
base_url = "http://code.google.com/p/%s/downloads/list" % package_name
file_pattern = regex_from_template(url.split("/")[-1].replace(ver, "${PV}"))
result = url_scan(pkg, base_url, file_pattern)
ret = []
for url, pv, _, _ in result:
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -25,17 +25,17 @@ def clean_results(results):
def scan_url(pkg, url, options):
results = generic.scan(pkg.cpv, url)
results = generic.scan_url(pkg, url, options)
if generic.startswith("mirror://kde/unstable/"):
url = generic.replace("mirror://kde/unstable/", "mirror://kde/stable/")
results += generic.scan(pkg.cpv, url)
if url.startswith("mirror://kde/unstable/"):
url = url.replace("mirror://kde/unstable/", "mirror://kde/stable/")
results += generic.scan_url(pkg, url, options)
if not results: # if nothing was found go brute forcing
results = generic.brute_force(pkg.cpv, url)
if generic.startswith("mirror://kde/unstable/"):
url = generic.replace("mirror://kde/unstable/", "mirror://kde/stable/")
if url.startswith("mirror://kde/unstable/"):
url = url.replace("mirror://kde/unstable/", "mirror://kde/stable/")
results += generic.brute_force(pkg.cpv, url)
return clean_results(results)

View File

@@ -20,7 +20,7 @@ def can_handle(pkg, url=None):
def guess_package_and_channel(cp, url):
match = re.search("http://(.*)\.php\.net/get/(.*)-(.*).tgz", url)
match = re.search(r"http://(.*)\.php\.net/get/(.*)-(.*).tgz", url)
if match:
host = match.group(1)
@@ -42,7 +42,7 @@ def scan_pkg(pkg, options):
package = options["data"]
channel = options["type"]
url = "http://%s.php.net/rest/r/%s/allreleases.xml" % (channel, package.lower())
url = f"http://{channel}.php.net/rest/r/{package.lower()}/allreleases.xml"
output.einfo("Using: " + url)
@@ -50,7 +50,7 @@ def scan_pkg(pkg, options):
fp = helpers.urlopen(url)
except urllib.error.URLError:
return []
except IOError:
except OSError:
return []
if not fp:
@@ -69,7 +69,7 @@ def scan_pkg(pkg, options):
if helpers.version_filtered(cp, ver, pv):
continue
url = "http://%s.php.net/get/%s-%s.tgz" % (channel, package, up_pv)
url = f"http://{channel}.php.net/get/{package}-{up_pv}.tgz"
url = mangling.mangle_url(url, options)
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))

View File

@@ -1,11 +1,13 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Copyright 2020-2024 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import json
import re
import xmlrpc.client
import urllib.error
import portage
from packaging.version import parse
from euscan import helpers, mangling, output
@@ -15,11 +17,11 @@ PRIORITY = 90
def can_handle(pkg, url=None):
return url and url.startswith("mirror://pypi/")
return url and url.startswith("https://files.pythonhosted.org/packages/source/p/")
def guess_package(cp, url):
match = re.search("mirror://pypi/\w+/(.*)/.*", url)
match = re.search(r"https://files.pythonhosted.org/packages/source/p/(.*)/.*", url)
if match:
return match.group(1)
@@ -29,7 +31,7 @@ def guess_package(cp, url):
def scan_url(pkg, url, options):
"http://wiki.python.org/moin/PyPiXmlRpc"
"https://peps.python.org/pep-0691/"
package = guess_package(pkg.cpv, url)
return scan_pkg(pkg, {"data": package})
@@ -38,15 +40,23 @@ def scan_url(pkg, url, options):
def scan_pkg(pkg, options):
package = options["data"]
output.einfo("Using PyPi XMLRPC: " + package)
output.einfo("Using PyPi JSON API: " + package)
client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
versions = client.package_releases(package)
try:
fp = helpers.urlopen(f"https://pypi.org/pypi/{package}/json/")
except urllib.error.URLError:
return []
except OSError:
return []
if not versions:
return versions
if not fp:
return []
versions.reverse()
data = json.loads(fp.read())
versions = list(data["releases"].keys())
versions.sort(key=parse, reverse=True)
cp, ver, rev = portage.pkgsplit(pkg.cpv)
@@ -55,7 +65,12 @@ def scan_pkg(pkg, options):
pv = mangling.mangle_version(up_pv, options)
if helpers.version_filtered(cp, ver, pv):
continue
urls = client.release_urls(package, up_pv)
urls = " ".join([mangling.mangle_url(infos["url"], options) for infos in urls])
urls = " ".join(
[
mangling.mangle_url(file["url"], options)
for file in data["releases"][up_pv]
if file["packagetype"] == "sdist"
]
)
ret.append((urls, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -1,5 +1,5 @@
# Copyright 2011 Corentin Chary <corentin.chary@gmail.com>
# Copyright 2020-2023 src_prepare group
# Copyright 2020-2024 src_prepare group
# Distributed under the terms of the GNU General Public License v2
import json
@@ -18,11 +18,11 @@ PRIORITY = 90
def can_handle(pkg, url=None):
return url and url.startswith("mirror://rubygems/")
return url and url.startswith("https://rubygems.org/")
def guess_gem(cpv, url):
match = re.search("mirror://rubygems/(.*).gem", url)
match = re.search("https://rubygems.org/gems/(.*).gem", url)
if match:
cpv = "fake/%s" % match.group(1)
@@ -42,7 +42,7 @@ def scan_url(pkg, url, options):
gem = guess_gem(pkg.cpv, url)
if not gem:
output.eerror("Can't guess gem name using %s and %s" % (pkg.cpv, url))
output.eerror(f"Can't guess gem name using {pkg.cpv} and {url}")
return []
output.einfo("Using RubyGem API: %s" % gem)
@@ -58,7 +58,7 @@ def scan_pkg(pkg, options):
fp = helpers.urlopen(url)
except urllib.error.URLError:
return []
except IOError:
except OSError:
return []
if not fp:
@@ -75,7 +75,7 @@ def scan_pkg(pkg, options):
pv = mangling.mangle_version(up_pv, options)
if helpers.version_filtered(cp, ver, pv):
continue
url = "http://rubygems.org/gems/%s-%s.gem" % (gem, up_pv)
url = f"http://rubygems.org/gems/{gem}-{up_pv}.gem"
url = mangling.mangle_url(url, options)
ret.append((url, pv, HANDLER_NAME, CONFIDENCE))
return ret

View File

@@ -24,7 +24,7 @@ def can_handle(*args):
def handle_directory_patterns(base, file_pattern):
"""
r"""
Directory pattern matching
e.g.: base: ftp://ftp.nessus.org/pub/nessus/nessus-([\d\.]+)/src/
file_pattern: nessus-core-([\d\.]+)\.tar\.gz
@@ -45,7 +45,7 @@ def handle_directory_patterns(base, file_pattern):
fp = helpers.urlopen(basedir)
except urllib.error.URLError:
return []
except IOError:
except OSError:
return []
if not fp:

View File

@@ -83,7 +83,7 @@ def version_is_nightly(a, b):
def version_blacklisted(cp, version):
rule = None
cpv = "%s-%s" % (cp, version)
cpv = f"{cp}-{version}"
# Check that the generated cpv can be used by portage
if not portage.versions.catpkgsplit(cpv):
@@ -92,10 +92,9 @@ def version_blacklisted(cp, version):
for bv in BLACKLIST_VERSIONS:
if dep.match_from_list(bv, [cpv]):
rule = bv
None
if rule:
euscan.output.einfo("%s is blacklisted by rule %s" % (cpv, rule))
euscan.output.einfo(f"{cpv} is blacklisted by rule {rule}")
return rule is not None
@@ -223,7 +222,7 @@ def gen_versions(components, level):
for i in range(n, n - level, -1):
increment_version(components, i - 1)
for j in range(depth):
for _j in range(depth):
versions.append(list(components))
increment_version(components, i - 1)
@@ -264,7 +263,7 @@ def urlallowed(url):
if protocol == "ftp":
return True
baseurl = "%s://%s" % (protocol, domain)
baseurl = f"{protocol}://{domain}"
robotsurl = urllib.parse.urljoin(baseurl, "robots.txt")
if baseurl in rpcache:
@@ -280,7 +279,7 @@ def urlallowed(url):
try:
rp.read()
rpcache[baseurl] = rp
except IOError:
except OSError:
rp = None
setdefaulttimeout(timeout)
@@ -290,7 +289,7 @@ def urlopen(url, timeout=None, verb="GET"):
def urlopen(url, timeout=None, verb="GET"):
if not urlallowed(url):
euscan.output.einfo("Url '%s' blocked by robots.txt" % url)
euscan.output.einfo(f"Url '{url}' blocked by robots.txt")
return None
if not timeout:
@@ -370,7 +369,7 @@ def tryurl(fileurl, template):
except urllib.error.URLError:
result = None
except IOError:
except OSError:
result = None
euscan.output.eend(errno.ENOENT if not result else 0)
@@ -383,9 +382,9 @@ def regex_from_template(template):
regexp = re.escape(template)
# Unescape specific stuff
regexp = regexp.replace("\$\{", "${")
regexp = regexp.replace("\}", "}")
regexp = regexp.replace("}\.$", "}.$")
regexp = regexp.replace(r"\$\{", "${")
regexp = regexp.replace(r"\}", "}")
regexp = regexp.replace(r"}\.$", "}.$")
# Replace ${\d+}
# regexp = regexp.replace('${0}', r'([\d]+?)')

View File

@@ -19,7 +19,7 @@ from euscan.helpers import dict_to_xml
mirrors_ = None
class ProgressHandler(object):
class ProgressHandler:
def __init__(self, progress_bar):
self.curval = 0
self.maxval = 0
@@ -74,7 +74,7 @@ def progress_bar():
def clean_colors(string):
if isinstance(string, str):
string = re.sub("\033\[[0-9;]+m", "", string)
string = re.sub(r"\033\[[0-9;]+m", "", string)
string = re.sub(r"\\u001b\[[0-9;]+m", "", string)
string = re.sub(r"\x1b\[[0-9;]+m", "", string)
return string
@@ -90,9 +90,9 @@ def transform_url(config, cpv, url):
def to_ebuild_uri(cpv, url):
cat, pkg, ver, rev = portage.catpkgsplit(cpv)
p = "%s-%s" % (pkg, ver)
pvr = "%s%s" % (ver, "-%s" % rev if rev != "r0" else "")
pf = "%s-%s" % (pkg, pvr)
p = f"{pkg}-{ver}"
pvr = f"{ver}{f'-{rev}' if rev != 'r0' else ''}"
pf = f"{pkg}-{pvr}"
evars = (
(p, "P"),
(pkg, "PN"),
@@ -140,10 +140,8 @@ def to_mirror(url):
for mirror_url in mirrors_[mirror_name]:
if url.startswith(mirror_url):
url_part = url.split(mirror_url)[1]
return "mirror://%s%s%s" % (
mirror_name,
"" if url_part.startswith("/") else "/",
url_part,
return "mirror://{}{}{}".format(
mirror_name, "" if url_part.startswith("/") else "/", url_part
)
return url
@@ -154,17 +152,17 @@ class EOutputMem(EOutput):
"""
def __init__(self, *args, **kwargs):
super(EOutputMem, self).__init__(*args, **kwargs)
super().__init__(*args, **kwargs)
self.out = StringIO()
def getvalue(self):
return self.out.getvalue()
def _write(self, f, msg):
super(EOutputMem, self)._write(self.out, msg)
super()._write(self.out, msg)
class EuscanOutput(object):
class EuscanOutput:
"""
Class that handles output for euscan
"""
@@ -220,7 +218,7 @@ class EuscanOutput(object):
def result(self, cp, version, urls, handler, confidence):
from euscan.version import get_version_type
cpv = "%s-%s" % (cp, version)
cpv = f"{cp}-{version}"
urls = " ".join(transform_url(self.config, cpv, url) for url in urls.split())
if self.config["format"] in ["json", "dict"]:
@@ -239,13 +237,13 @@ class EuscanOutput(object):
print("Upstream Version:", pp.number("%s" % version), end=" ")
print(pp.path(" %s" % urls))
else:
print(pp.cpv("%s-%s" % (cp, version)) + ":", pp.path(urls))
print(pp.cpv(f"{cp}-{version}") + ":", pp.path(urls))
def metadata(self, key, value, show=True):
if self.config["format"]:
self.queries[self.current_query]["metadata"][key] = value
elif show:
print("%s: %s" % (key.capitalize(), value))
print(f"{key.capitalize()}: {value}")
def __getattr__(self, key):
if not self.config["quiet"] and self.current_query is not None:

View File

@@ -76,14 +76,14 @@ def reload_gentoolkit():
if not hasattr(gentoolkit.package, "PORTDB"):
return
PORTDB = portage.db[portage.root]["porttree"].dbapi
portdb = portage.db[portage.root]["porttree"].dbapi
if hasattr(gentoolkit.dbapi, "PORTDB"):
gentoolkit.dbapi.PORTDB = PORTDB
gentoolkit.dbapi.PORTDB = portdb
if hasattr(gentoolkit.package, "PORTDB"):
gentoolkit.package.PORTDB = PORTDB
gentoolkit.package.PORTDB = portdb
if hasattr(gentoolkit.query, "PORTDB"):
gentoolkit.query.PORTDB = PORTDB
gentoolkit.query.PORTDB = portdb
def scan_upstream(query, on_progress=None):
@@ -134,7 +134,7 @@ def scan_upstream(query, on_progress=None):
if not CONFIG["quiet"]:
if not CONFIG["format"]:
pp.uprint(" * %s [%s]" % (pp.cpv(pkg.cpv), pp.section(pkg.repo_name())))
pp.uprint(f" * {pp.cpv(pkg.cpv)} [{pp.section(pkg.repo_name())}]")
pp.uprint()
else:
output.metadata("overlay", pp.section(pkg.repo_name()))
@@ -153,6 +153,9 @@ def scan_upstream(query, on_progress=None):
else:
uris = pkg.environment("SRC_URI")
# Roundabout way to handle $'' strings
uris = uris.encode("raw_unicode_escape").decode("unicode_escape")
cpv = pkg.cpv
uris = parse_src_uri(uris)

View File

@@ -22,7 +22,7 @@ def get_version_type(version):
if "9999" in version or "99999999" in version:
return "live"
for token in re.findall("[\._-]([a-zA-Z]+)", version):
for token in re.findall(r"[\._-]([a-zA-Z]+)", version):
if token in gentoo_types:
types.append(token)
if types: