ASCII/Punycode TLDs for Internationalized Domains


IANA has a list of TLDs alphabetically by domain. One thing notable are domains of in the following format:

XN--VERMGENSBERATER-CTB
XN--VERMGENSBERATUNG-PWB
XN--VHQUV
XN--VUQ861B

These are internationalized TLDs, specified in a Punycode encoding for the unicode domain name. Some of these representations are just for non-ascii characters:

XN--VERMGENSBERATER-CTB,vermögensberater,wealth consultant

Others are representing fully non-Latin alphabets:

XN--FPCRJ9C3D,భారత్,India

Most of these non-Latin representations are either country designations, brand names, or a telecom-centric term.

If you want to experiment on your own, install simpleidn gem and sign up for a Rapid API key and the Microsoft Translator API (it’s a little simpler interface than navigate individual vendor APIs)

# http://data.iana.org/TLD/tlds-alpha-by-domain.txt
require 'net/http'
require 'uri'
require 'simpleidn'
require 'openssl'
require 'json'


def translate(source)
  return source if source =~ /^[A-Za-z]+$/
  # using rapidapi for somewhat uniform API access (via a single key!)
  url = URI("https://microsoft-translator-text.p.rapidapi.com/translate?to=en&api-version=3.0&profanityAction=NoAction&textType=plain")
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE

  request = Net::HTTP::Post.new(url)
  request["content-type"] = 'application/json'
  request["x-rapidapi-key"] = ENV['RAPIDAPIKEY']
  request["x-rapidapi-host"] = 'microsoft-translator-text.p.rapidapi.com'
  request.body = "[
      {
          \"Text\": \"#{source}\"
      }
  ]"

  response = http.request(request)
  read_body = JSON.parse(response.read_body)
  [read_body[0]["detectedLanguage"]["language"],read_body[0]["translations"][0]["text"]]
end

puts 'ascii/punycode TLD,Unicode version,language,translation'
File.read(File.open('tlds.txt'))
#Net::HTTP.get(URI('http://data.iana.org/TLD/tlds-alpha-by-domain.txt')) # I saved this locally
  .each_line
  .map { |a| a.strip }
  .reject {|n| n =~ /^#/ }
  .map { |a| [a, SimpleIDN.to_unicode(a), translate(SimpleIDN.to_unicode(a))].flatten }
  .reject {|n| n[1] =~ /^[A-Za-z]/ }
  .sort { |a, b| a[1].length <=> b[1].length }
  .each { |l| puts l.join(',') }


%d bloggers like this: