IANA publishes a list of TLDs, sorted alphabetically by domain. One notable thing in it is the set of domains in the following format:
XN--VERMGENSBERATER-CTB
XN--VERMGENSBERATUNG-PWB
XN--VHQUV
XN--VUQ861B
These are internationalized TLDs, written in Punycode, an ASCII-only encoding of the Unicode domain name. Some of them are Latin-script names that contain a few non-ASCII characters:
XN--VERMGENSBERATER-CTB,vermögensberater,wealth consultant
Others represent names written entirely in non-Latin scripts:
XN--FPCRJ9C3D,భారత్,India
Most of these non-Latin TLDs are country designations, brand names, or telecom-centric terms.
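To see what happens between the `XN--` form and the Unicode form, here is a minimal sketch of the decoding half of Punycode as described in RFC 3492, in plain Ruby with no gems. The helper names (`adapt`, `digit_value`, `punycode_decode`, `tld_to_unicode`) are made up for this example, and the sketch skips the overflow and validity checks a real library performs, so treat it as an illustration rather than a replacement for simpleidn.

```ruby
# Punycode parameters from RFC 3492, section 5.
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700
INITIAL_BIAS = 72
INITIAL_N = 128

# Bias adaptation (RFC 3492, section 6.1).
def adapt(delta, num_points, first_time)
  delta = first_time ? delta / DAMP : delta / 2
  delta += delta / num_points
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= BASE - TMIN
    k += BASE
  end
  k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
end

# Maps a Punycode digit to its value: a-z => 0..25, 0-9 => 26..35.
def digit_value(char)
  case char
  when 'a'..'z' then char.ord - 'a'.ord
  when '0'..'9' then char.ord - '0'.ord + 26
  else raise ArgumentError, "invalid Punycode digit: #{char.inspect}"
  end
end

# Decodes the part of a label that follows "xn--" (hypothetical helper).
def punycode_decode(encoded)
  # Everything before the last hyphen is copied through as plain ASCII;
  # everything after it encodes where the non-ASCII code points go.
  basic, _sep, tail = encoded.downcase.rpartition('-')
  output = basic.codepoints
  digits = tail.chars
  n = INITIAL_N
  i = 0
  bias = INITIAL_BIAS

  until digits.empty?
    old_i = i
    weight = 1
    k = BASE
    # Read one variable-length integer (a "delta") from the digit stream.
    loop do
      raise ArgumentError, 'truncated Punycode input' if digits.empty?
      digit = digit_value(digits.shift)
      i += digit * weight
      threshold = (k - bias).clamp(TMIN, TMAX)
      break if digit < threshold
      weight *= BASE - threshold
      k += BASE
    end
    size = output.length + 1
    bias = adapt(i - old_i, size, old_i.zero?)
    n += i / size          # advance to the next code point to insert
    i %= size              # position at which to insert it
    output.insert(i, n)
    i += 1
  end
  output.pack('U*')
end

# Converts one TLD from the IANA list to its Unicode form.
def tld_to_unicode(tld)
  label = tld.downcase
  label.start_with?('xn--') ? punycode_decode(label.delete_prefix('xn--')) : label
end

puts tld_to_unicode('XN--VERMGENSBERATER-CTB') # vermögensberater
puts tld_to_unicode('XN--FPCRJ9C3D')           # భారత్
```

Note how the German name keeps its ASCII letters in place and only the position of the ö is encoded in the `-ctb` suffix, while the Telugu name has no ASCII letters at all, so the whole label is encoded digits.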
If you want to experiment on your own, install the simpleidn gem and sign up for a RapidAPI key with access to the Microsoft Translator API (RapidAPI is a slightly simpler interface than navigating individual vendor APIs).
# http://data.iana.org/TLD/tlds-alpha-by-domain.txt
require 'net/http'
require 'uri'
require 'simpleidn'
require 'openssl'
require 'json'
def translate(source)
  # Plain ASCII names need no translation.
  return source if source =~ /\A[A-Za-z]+\z/

  # Using RapidAPI for somewhat uniform API access (via a single key!)
  url = URI("https://microsoft-translator-text.p.rapidapi.com/translate?to=en&api-version=3.0&profanityAction=NoAction&textType=plain")
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = true
  # Keep certificate verification on; VERIFY_NONE would expose the API key to interception.
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  request = Net::HTTP::Post.new(url)
  request["content-type"] = 'application/json'
  # Fail early with a KeyError if the key is not set.
  request["x-rapidapi-key"] = ENV.fetch('RAPIDAPIKEY')
  request["x-rapidapi-host"] = 'microsoft-translator-text.p.rapidapi.com'
  # Let the JSON library escape the text instead of interpolating it by hand.
  request.body = JSON.generate([{ "Text" => source }])

  response = http.request(request)
  read_body = JSON.parse(response.read_body)
  # Returns [detected language code, English translation].
  [read_body[0]["detectedLanguage"]["language"], read_body[0]["translations"][0]["text"]]
end
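puts 'ascii/punycode TLD,Unicode version,language,translation'
# I saved http://data.iana.org/TLD/tlds-alpha-by-domain.txt locally as tlds.txt.
# To fetch it live instead, replace File.read('tlds.txt') with:
#   Net::HTTP.get(URI('http://data.iana.org/TLD/tlds-alpha-by-domain.txt'))
# Names that are still pure ASCII after decoding are dropped before translation,
# so names like vermögensberater are kept and no API calls are spent on plain TLDs.
File.read('tlds.txt')
  .each_line
  .map { |a| a.strip }
  .reject { |n| n.empty? || n =~ /^#/ }
  .map { |a| [a, SimpleIDN.to_unicode(a)] }
  .reject { |n| n[1].ascii_only? }
  .map { |n| [n, translate(n[1])].flatten }
  .sort { |a, b| a[1].length <=> b[1].length }
  .each { |l| puts l.join(',') }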
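The filtering part of the script can be tried offline, without the gem or an API key. The sketch below assumes only the list format described above (one uppercase TLD per line, with `#` comment lines). `idn_tlds` and `SAMPLE` are names made up for this example, and the header line in the sample is a placeholder, not real IANA data.

```ruby
# Hypothetical helper: returns the Punycode (xn--) entries from the text of
# a TLD list, lowercased, shortest first (ties broken alphabetically).
def idn_tlds(list_text)
  list_text
    .each_line
    .map(&:strip)
    .reject { |line| line.empty? || line.start_with?('#') }
    .map(&:downcase)
    .select { |tld| tld.start_with?('xn--') }
    .sort_by { |tld| [tld.length, tld] }
end

# A small stand-in for tlds.txt; the comment line is a made-up placeholder.
SAMPLE = <<~LIST
  # sample header line
  COM
  XN--VERMGENSBERATER-CTB
  XN--VHQUV
  XN--FPCRJ9C3D
  ZONE
LIST

puts idn_tlds(SAMPLE)
```

Selecting on the `xn--` prefix before decoding is one way to avoid sending plain ASCII TLDs to the translator at all. Note that it sorts by the length of the encoded form, whereas the script above sorts by the length of the decoded Unicode name.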