
Please use publicsuffix.org for whois suffixes #123

Open
hserus1477 opened this issue Jun 8, 2016 · 8 comments

Comments

@hserus1477

Several new TLDs and second-level domains under older ccTLDs aren't supported. Please use the public suffix list to stay current. Several libraries that parse domain names (e.g. tldextract) use it. You can fetch it periodically from https://publicsuffix.org/list/public_suffix_list.dat for your build, or possibly fetch it whenever pythonwhois is imported.

>>> import pythonwhois
>>> domain = 'google.com.ar'
>>> details = pythonwhois.get_whois(domain)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pythonwhois/__init__.py", line 4, in get_whois
    raw_data, server_list = net.get_whois_raw(domain, with_server_list=True)
  File "/Library/Python/2.7/site-packages/pythonwhois/net.py", line 33, in get_whois_raw
    target_server = get_root_server(domain)
  File "/Library/Python/2.7/site-packages/pythonwhois/net.py", line 82, in get_root_server
    raise shared.WhoisException("No root WHOIS server found for domain.")
pythonwhois.shared.WhoisException: No root WHOIS server found for domain.
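
To make the suggestion concrete, here is a rough sketch of how a public suffix lookup could work. It is not pythonwhois code: `SAMPLE_PSL`, `load_rules` and `public_suffix` are hypothetical names, and the inline rules are a small hand-picked sample in the list's format rather than the real file.

```python
# Tiny inline sample in the public suffix list format. Assumption: a real
# build would load the full file from publicsuffix.org instead.
SAMPLE_PSL = """
// comment lines start with two slashes
com
ar
com.ar
uk
co.uk
*.ck
!www.ck
"""


def load_rules(text):
    """Return the set of suffix rules, skipping blank and comment lines."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            rules.add(line.split()[0].lower())
    return rules


def public_suffix(domain, rules):
    """Return the longest public suffix of `domain` under `rules`."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is one label shorter.
            return ".".join(labels[i + 1:])
        if candidate in rules:
            return candidate
        # Wildcard rule: "*.<parent>" matches any single label.
        if i + 1 < len(labels) and "*." + ".".join(labels[i + 1:]) in rules:
            return candidate
    # Default rule: an unlisted TLD is treated as its own suffix.
    return labels[-1]
```

With the sample rules, `public_suffix("google.com.ar", load_rules(SAMPLE_PSL))` gives `com.ar`, which could then serve as the key for a WHOIS server lookup.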

@Sir-Fenrir

The page you linked does not contain any WHOIS servers. If IANA WHOIS does not return a proper WHOIS server, then it becomes very difficult to automatically find the correct one.
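
For readers following along, the lookup in question amounts to asking whois.iana.org about a name and reading the referral out of the reply. A minimal sketch, assuming the referral appears on a `refer:` line; the sample responses are abbreviated and illustrative, and `extract_refer` is a hypothetical helper, not the library's function.

```python
import re

# Abbreviated, illustrative response text. Assumption: real output carries
# many more fields; only the "refer:" line matters for this sketch.
SAMPLE_WITH_REFER = (
    "domain:       COM\n"
    "refer:        whois.verisign-grs.com\n"
    "status:       ACTIVE\n"
)
SAMPLE_WITHOUT_REFER = (
    "domain:       EXAMPLE\n"
    "status:       ACTIVE\n"
)

REFER_RE = re.compile(r"^refer:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def extract_refer(response):
    """Return the referred WHOIS server, or None if no referral is given."""
    match = REFER_RE.search(response)
    return match.group(1) if match else None
```

When `extract_refer` returns `None`, there is nothing to query next, which is the situation behind the "No root WHOIS server found" error.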

@hserus1477
Author

On 08-Jun-2016, at 7:06 PM, MasterFenrir notifications@github.com wrote:

> The page you linked does not contain any WHOIS servers. If IANA WHOIS does not return a proper WHOIS server, then it becomes very difficult to automatically find the correct one.

Yes, but it gives you a TLD that you should be able to use. After that:

For legacy gTLDs and ccTLDs: https://raw.githubusercontent.com/rfc1036/whois/next/tld_serv_list

The new TLDs are standardized, per ICANN mandate, on whois.nic.$TLD
So: https://github.com/rfc1036/whois/blob/next/new_gtlds_list (and that keeps getting updated)

Then the NIC handles: https://github.com/rfc1036/whois/blob/next/nic_handles_list

And the charsets that different WHOIS servers use:
https://github.com/rfc1036/whois/blob/next/servers_charset_list
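
A rough sketch of how those lists might be combined: check a local table for legacy TLDs first, then fall back to the whois.nic.$TLD convention for new gTLDs. The table contents and the names `LEGACY_SERVERS`, `NEW_GTLDS` and `whois_server_for` are assumptions for illustration; real tables would be generated from the linked files, whose formats are not parsed here.

```python
# Hypothetical local table; a real one would be generated from the legacy
# server list linked above. Only one example entry is filled in.
LEGACY_SERVERS = {
    "com": "whois.verisign-grs.com",
}

# Hypothetical sample of new gTLD names (the real list is much longer).
NEW_GTLDS = {"xyz", "club"}


def whois_server_for(tld):
    """Return a WHOIS server hostname for `tld`, or None if unknown."""
    tld = tld.lower().lstrip(".")
    if tld in LEGACY_SERVERS:
        return LEGACY_SERVERS[tld]
    if tld in NEW_GTLDS:
        # New gTLDs follow the whois.nic.$TLD convention.
        return "whois.nic." + tld
    return None
```

Returning `None` for unknown TLDs leaves room to fall back to an IANA query instead of guessing a hostname.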

@Sir-Fenrir

Do you have any examples showing that iana.org/whois does not return the proper WHOIS servers for these domains? I mean cases where a WHOIS server does exist.

@hserus1477
Author

On 08-Jun-2016, at 7:42 PM, MasterFenrir notifications@github.com wrote:

> Do you have any examples that show that iana.org/whois does not return the proper WHOIS servers for these domains? And then I mean cases where a WHOIS server does exist.

Using IANA whois is of course mostly effective, but it is much more efficient to cache whois data locally (and keep it regularly updated).

It won’t make much difference when you’re doing a whois for one or two domains, but if that was all someone needed, they’d hardly use a Python module for it :)

Besides, whois.iana.org is also rate limited.

Still, there ARE apparently issues finding root servers:

No root server found for .com.ar .com.mr .com.vn #110

And you do need to handle the different charsets found in whois results that contain non-English characters. The charset mapping I linked in my earlier reply helps there.

Parse error on UTF8 domain-reports #118

.cz parsing errors #112

Decoding .com.pt or .cl 'utf8' codec can't decode byte [...] invalid continuation byte #111
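
A small sketch of how a static server-to-charset map could avoid the decode errors in those issues. The table entry and the names `SERVER_CHARSETS` and `decode_response` are invented for illustration; real values would come from the charset list linked earlier.

```python
# Hypothetical server -> charset table. Assumption: the entry below is
# invented for illustration and does not describe a real server.
SERVER_CHARSETS = {
    "whois.example-latin1.test": "iso-8859-1",
}


def decode_response(raw, server, default="utf-8"):
    """Decode raw WHOIS bytes using the charset recorded for `server`."""
    charset = SERVER_CHARSETS.get(server, default)
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # Last resort: keep the text readable instead of raising.
        return raw.decode(charset, errors="replace")
```

The fallback swaps undecodable bytes for the Unicode replacement character, so a wrong or missing table entry degrades the output rather than crashing the parser.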

—srs

@Sir-Fenrir

.com.ar is because there is no active WHOIS server, only a website offering the service. But yes, caching the data locally is useful. Rate limiting is actually something I'm working on in my fork. I don't have the time to work on charsets sadly, maybe that will be in my fork one day.

@hserus1477
Author

On 08-Jun-2016, at 7:57 PM, MasterFenrir notifications@github.com wrote:

> .com.ar is because there is no active WHOIS server, only a website offering the service. But yes, caching the data locally is useful. Rate limiting is actually something I'm working on in my fork. I don't have the time to work on charsets sadly, maybe that will be in my fork one day.

Yup, that is why I gave you a neat static map of the charsets involved :)

Check re com.ar - I didn’t validate that one, and yes, some TLDs don’t have whois servers.

Working around rate limits might get you blocked even further or acl’d off from making whois requests if you’re on a static IP.

So - spreading the load around by querying whois servers directly when you know about them should be workable - and especially when there’s a default whois.nic.$TLD for all new TLDs and several of the legacy ones?
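
One way to picture spreading the load: track timing per server rather than globally, so each server only sees its own share of queries. A minimal sketch; `PerServerThrottle` is a hypothetical class, and the one-second default interval is an arbitrary assumption, not a documented limit of any server.

```python
import time


class PerServerThrottle(object):
    """Enforce a minimum gap between queries to the same WHOIS server."""

    def __init__(self, min_interval=1.0, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # server -> time of the most recent query

    def wait(self, server):
        """Block until `server` may be queried again; return seconds waited."""
        now = self._clock()
        last = self._last.get(server)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this query is actually sent, after any waiting.
        self._last[server] = now + delay
        return delay
```

Calling `throttle.wait(server)` before each query delays only repeat queries to the same host, so lookups against different servers are not held up by one another.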

thanks
—srs

@Sir-Fenrir

IANA is very lenient with WHOIS requests. Querying for every TLD once is very doable and would result in a local copy of the data. I actually already have that, but in a separate project and not in Python.

I can't really check that website for a WHOIS server, I can't speak Spanish, sorry!

@hserus1477
Author

On 08-Jun-2016, at 8:06 PM, MasterFenrir notifications@github.com wrote:

> iana is very lenient in the WHOIS requests. Querying for every TLD once is very doable and would result in a local copy of the data. I actually already have that, but then in a separate project and not in Python.
> I can't really check that website for a WHOIS server, I can't speak Spanish, sorry!

My bad. I meant “check” as in “check mark” or “ok”. Should have been clearer.

Leniency on IANA’s part is a very good thing, but a network delay of a few seconds per query adds up too.

Local caching and frequent updates are king :)
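
As a sketch of what caching with regular updates could look like, here is a small in-memory cache whose entries expire after a fixed time. `TTLCache` and the `fetch` callback are hypothetical names; a real cache would probably persist to disk and pick a refresh interval to suit the data.

```python
import time


class TTLCache(object):
    """Cache lookups in memory and refetch them once they are stale."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value, calling fetch(key) if missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now, value)
        return value
```

Here `fetch` would be whatever performs the slow network lookup, so repeated queries for the same TLD only pay that cost once per refresh interval.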

By the way, those lists I posted are from the actively maintained (and extremely popular) C whois client written by Marco d'Itri. It’s been around for over a decade now as an excellent alternative to the default whois client on most *nix and *nix-based (such as Mac) distributions.

—srs
