IANA publishes an alphabetical list of top-level domains (TLDs). One notable feature is a group of entries in the following format:
XN--VERMGENSBERATER-CTB XN--VERMGENSBERATUNG-PWB XN--VHQUV XN--VUQ861B
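These entries are easy to pick out programmatically. Below is a minimal sketch that uses only the Ruby standard library. It assumes the file keeps its current layout, with one upper-case TLD per line after a "#" version header. The helper names `idn_tlds` and `fetch_idn_tlds` are hypothetical, and the HTTPS form of the list URL is an assumption (the script later in this post uses plain HTTP).

```ruby
require 'net/http'
require 'uri'

# Assumed HTTPS form of the IANA list URL (plain text, one TLD per line,
# preceded by a "#" comment line carrying the list version).
IANA_TLD_LIST = URI('https://data.iana.org/TLD/tlds-alpha-by-domain.txt')

# Hypothetical helper: returns only the internationalized entries,
# i.e. those carrying the XN-- ACE prefix.
def idn_tlds(list_text)
  list_text.each_line
           .map(&:strip)
           .reject { |line| line.empty? || line.start_with?('#') }
           .select { |tld| tld.upcase.start_with?('XN--') }
end

# Hypothetical helper: downloads the current list and filters it
# (needs network access).
def fetch_idn_tlds
  idn_tlds(Net::HTTP.get(IANA_TLD_LIST))
end
```

Calling `fetch_idn_tlds` returns upper-case strings such as `XN--VHQUV`, ready to be handed to a Punycode decoder. Keeping the filter separate from the download makes it easy to run against a saved copy of the file.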
The XN-- entries are internationalized TLDs. Each one is the Punycode (ASCII-compatible) encoding of a Unicode domain name, marked by the xn-- prefix. Some of them stand for Latin-script words that contain only a few non-ASCII characters:
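To see what Punycode actually does, here is a minimal sketch of the RFC 3492 decoder in plain Ruby, without simpleidn. It handles one label at a time and skips the UTS #46 mapping and the validity checks that a real IDN library performs, so treat it as an illustration only. `punycode_decode` and `idn_label_to_unicode` are hypothetical helper names. The RFC's overflow checks are omitted because Ruby integers are arbitrary-precision.

```ruby
# RFC 3492 bootstring parameters for Punycode
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700
INITIAL_BIAS = 72
INITIAL_N = 128

# Bias adaptation function (RFC 3492, section 6.1)
def adapt(delta, num_points, first_time)
  delta = first_time ? delta / DAMP : delta / 2
  delta += delta / num_points
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= BASE - TMIN
    k += BASE
  end
  k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
end

# Maps a Punycode digit character to its value 0..35 (case-insensitive)
def digit_value(char)
  case char
  when 'a'..'z' then char.ord - 'a'.ord
  when 'A'..'Z' then char.ord - 'A'.ord
  when '0'..'9' then char.ord - '0'.ord + 26
  else raise ArgumentError, "invalid Punycode digit #{char.inspect}"
  end
end

# Decodes one Punycode label without its xn-- prefix (RFC 3492, section 6.2)
def punycode_decode(input)
  n = INITIAL_N
  i = 0
  bias = INITIAL_BIAS
  # Basic (ASCII) code points are stored literally before the last '-'
  delim = input.rindex('-')
  output = delim ? input[0...delim].codepoints : []
  pos = delim && delim > 0 ? delim + 1 : 0

  while pos < input.length
    old_i = i
    w = 1
    k = BASE
    # Read one generalized variable-length integer and add it to i
    loop do
      raise ArgumentError, 'truncated Punycode input' if pos >= input.length
      digit = digit_value(input[pos])
      pos += 1
      i += digit * w
      t = (k - bias).clamp(TMIN, TMAX) # threshold for this digit position
      break if digit < t
      w *= BASE - t
      k += BASE
    end
    bias = adapt(i - old_i, output.length + 1, old_i.zero?)
    # i encodes both the code point increment and the insertion position
    n += i / (output.length + 1)
    i %= output.length + 1
    output.insert(i, n)
    i += 1
  end
  output.pack('U*')
end

# Converts a single ASCII label to Unicode if it carries the xn-- prefix
def idn_label_to_unicode(label)
  label = label.downcase
  label.start_with?('xn--') ? punycode_decode(label.delete_prefix('xn--')) : label
end
```

For XN--VERMGENSBERATER-CTB, the characters before the last hyphen are copied literally, and the suffix ctb is read as one generalized variable-length integer, 1892. Dividing by the output length plus one (16) gives 118 remainder 4, so code point 128 + 118 = 246 (ö) is inserted at index 4, producing "vermögensberater". A fully non-Latin label such as fpcrj9c3d has no hyphen, so every character belongs to the encoded part.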
XN--VERMGENSBERATER-CTB,vermögensberater,wealth consultant
Others are written entirely in non-Latin scripts:
XN--FPCRJ9C3D,భారత్,India
Most of these non-Latin TLDs are country designations, brand names, or telecom-related terms.
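If you want to experiment on your own, install the simpleidn gem, get a RapidAPI key, and subscribe to the Microsoft Translator API through RapidAPI (its interface is a little simpler than navigating individual vendors' APIs). The script reads the key from the RAPIDAPIKEY environment variable and expects a local copy of the IANA list saved as tlds.txt.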
```ruby
# TLD list: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
require 'net/http'
require 'uri'
require 'json'
require 'csv'
require 'simpleidn'

# Using RapidAPI for somewhat uniform API access (via a single key!)
TRANSLATE_URL = URI('https://microsoft-translator-text.p.rapidapi.com/translate' \
                    '?to=en&api-version=3.0&profanityAction=NoAction&textType=plain')

# Returns [detected_language, english_translation] for a Unicode string.
def translate(source)
  return ['', source] if source.ascii_only? # nothing to translate

  http = Net::HTTP.new(TRANSLATE_URL.host, TRANSLATE_URL.port)
  http.use_ssl = true # keep the default certificate verification

  request = Net::HTTP::Post.new(TRANSLATE_URL)
  request['content-type'] = 'application/json'
  request['x-rapidapi-key'] = ENV.fetch('RAPIDAPIKEY')
  request['x-rapidapi-host'] = 'microsoft-translator-text.p.rapidapi.com'
  # to_json escapes quotes and other special characters safely
  request.body = [{ 'Text' => source }].to_json

  result = JSON.parse(http.request(request).body).first
  [result['detectedLanguage']['language'], result['translations'][0]['text']]
end

puts 'ascii/punycode TLD,Unicode version,language,translation'

# I saved the list locally as tlds.txt; to download it instead, use
# Net::HTTP.get(URI('http://data.iana.org/TLD/tlds-alpha-by-domain.txt'))
File.read('tlds.txt')
    .each_line
    .map(&:strip)
    .reject { |line| line.empty? || line.start_with?('#') } # drop the version header
    .select { |tld| tld.downcase.start_with?('xn--') }      # internationalized TLDs only
    .map { |tld|
      unicode = SimpleIDN.to_unicode(tld.downcase)
      [tld, unicode, *translate(unicode)]
    }
    .sort_by { |row| row[1].length }
    .each { |row| puts row.to_csv } # CSV quoting handles commas in translations
```