ASCII/Punycode TLDs for Internationalized Domains

IANA publishes a list of TLDs, sorted alphabetically by domain. One notable feature of that list is entries in the following format:

XN--VERMGENSBERATER-CTB
XN--VERMGENSBERATUNG-PWB
XN--VHQUV
XN--VUQ861B

These are internationalized TLDs, written in Punycode, an ASCII-only encoding of the Unicode domain name. Some of them encode names that are mostly Latin letters with a few non-ASCII characters:

XN--VERMGENSBERATER-CTB,vermögensberater,wealth consultant

Others represent names written entirely in non-Latin scripts:

XN--FPCRJ9C3D,భారత్,India

Most of these non-Latin TLDs are country designations, brand names, or telecom-centric terms.
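To make the encoding a little less opaque, here is a minimal sketch of the decoding procedure described in RFC 3492, written in plain Ruby and applied to the part of the label that follows the xn-- prefix. The names digit_value, adapt, and punycode_decode are made up for this illustration, and the sketch leaves out the overflow and invalid-input checks a production decoder needs, so treat it as a way to see the mechanics rather than a replacement for a library.

```ruby
# Minimal Punycode decoder following the RFC 3492 pseudocode.
# Illustration only: no overflow or invalid-input handling.
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700
INITIAL_BIAS = 72
INITIAL_N = 128

# Map a Punycode digit character to its value: a-z => 0..25, 0-9 => 26..35.
def digit_value(char)
  c = char.downcase
  c =~ /[a-z]/ ? c.ord - 'a'.ord : c.ord - '0'.ord + 26
end

# Bias adaptation step from RFC 3492.
def adapt(delta, num_points, first_time)
  delta = first_time ? delta / DAMP : delta / 2
  delta += delta / num_points
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= (BASE - TMIN)
    k += BASE
  end
  k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
end

# Decode one label with the "xn--" prefix already removed.
def punycode_decode(label)
  n = INITIAL_N
  i = 0
  bias = INITIAL_BIAS
  # Everything before the last hyphen is copied through as plain ASCII.
  split = label.rindex('-')
  output = split ? label[0...split].codepoints : []
  rest = split ? label[(split + 1)..-1] : label
  pos = 0
  while pos < rest.length
    old_i = i
    w = 1
    k = BASE
    # Read one variable-length integer from the digit stream.
    loop do
      digit = digit_value(rest[pos])
      pos += 1
      i += digit * w
      t = (k - bias).clamp(TMIN, TMAX)
      break if digit < t
      w *= BASE - t
      k += BASE
    end
    bias = adapt(i - old_i, output.length + 1, old_i.zero?)
    # The integer encodes both the code point and where to insert it.
    n += i / (output.length + 1)
    i %= output.length + 1
    output.insert(i, n)
    i += 1
  end
  output.pack('U*')
end

puts punycode_decode('vermgensberater-ctb') # prints "vermögensberater"
puts punycode_decode('fpcrj9c3d')           # prints "భారత్"
```

The IANA list is uppercase, so downcase an entry and drop the leading xn-- before passing it in; the sketch copies the ASCII part of the label through unchanged.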

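If you want to experiment on your own, install the simpleidn gem and sign up for a RapidAPI key with access to the Microsoft Translator API; going through RapidAPI is a little simpler than navigating individual vendor APIs. The script below reads the key from the RAPIDAPIKEY environment variable and expects a local copy of the IANA list saved as tlds.txt.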

# http://data.iana.org/TLD/tlds-alpha-by-domain.txt
require 'net/http'
require 'uri'
require 'simpleidn'
require 'openssl'
require 'json'


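# Returns the input unchanged when it is plain ASCII letters; otherwise
# returns [detected language code, English translation] from the API.
def translate(source)
  return source if source.match?(/\A[A-Za-z]+\z/)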
  # using rapidapi for somewhat uniform API access (via a single key!)
  url = URI("https://microsoft-translator-text.p.rapidapi.com/translate?to=en&api-version=3.0&profanityAction=NoAction&textType=plain")
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = true
  # verify the server certificate; VERIFY_NONE would expose the API key to interception
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  request = Net::HTTP::Post.new(url)
  request["content-type"] = 'application/json'
  request["x-rapidapi-key"] = ENV['RAPIDAPIKEY']
  request["x-rapidapi-host"] = 'microsoft-translator-text.p.rapidapi.com'
  # build the body with JSON.generate so quotes and backslashes in source are escaped
  request.body = JSON.generate([{ 'Text' => source }])

  response = http.request(request)
  read_body = JSON.parse(response.read_body)
  [read_body[0]["detectedLanguage"]["language"],read_body[0]["translations"][0]["text"]]
end

puts 'ascii/punycode TLD,Unicode version,language,translation'
# I saved the list locally; to fetch it directly instead, replace File.read with:
#   Net::HTTP.get(URI('http://data.iana.org/TLD/tlds-alpha-by-domain.txt'))
File.read('tlds.txt')
  .each_line
  .map { |a| a.strip }
  .reject {|n| n =~ /^#/ }
  .map { |a| [a, SimpleIDN.to_unicode(a), translate(SimpleIDN.to_unicode(a))].flatten }
  .reject { |n| n[1].match?(/\A[A-Za-z]+\z/) }
  .sort_by { |row| row[1].length }
  .each { |l| puts l.join(',') }
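
For a first look that needs neither the gem nor an API key, the filtering half of the script can be tried on its own. The sketch below uses a made-up helper name, idn_tlds, and a small hand-written sample in place of the real IANA file; it assumes only what the script above already relies on, namely one TLD per line with comment lines starting with #.

```ruby
# Hypothetical helper: pull the internationalized (xn--) entries out of the
# text of a TLD list. Blank lines and lines starting with "#" are skipped.
def idn_tlds(list_text)
  list_text
    .each_line
    .map(&:strip)
    .reject { |line| line.empty? || line.start_with?('#') }
    .select { |tld| tld.downcase.start_with?('xn--') }
end

# Hand-written sample for illustration only, not the real IANA file.
sample = <<~LIST
  # sample list
  COM
  XN--VHQUV
  ORG
  XN--VUQ861B
LIST

idn = idn_tlds(sample)
puts idn.length # prints 2
puts idn.first  # prints "XN--VHQUV"
```

Swapping the sample for File.read('tlds.txt') should give the full set of Punycode entries to feed into the decoding and translation steps.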


Way to go, Beshear.

Kentucky is getting a lot of love on the Tech Blogs…

Kentucky tries to seize gambling site domain names – on Ars Technica

State of Kentucky Seizes Control of 141 Domain Names – on Slashdot

Kentucky judge moves to seize gambling sites' domains – on Valleywag

Kentucky Governor Seizes Online Gambling Domain Names – on Techdirt

Not only is the state of Kentucky being laughed at for this massive overstepping of jurisdiction, but every single other facet of Kentucky living and business is being dragged through the mud.  Just look at the comments.  Never mind that…  just reading the articles is enough.