User:Cmglee/extract lang.py

From Wikipedia, the free encyclopedia

This Python 3 script by user:cmglee extracts a monolingual SVG file from a multilingual SVG file and writes it to disk, so that a single language version can be previewed in a web browser during development. (The alternative way to view a non-default language of a multilingual SVG in a web browser is to install that language in the browser, switch the browser to it and restart the browser.)

Output filenames are the original filename with "-<ISO639_CODE>" added before the last ".".
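The naming rule as stated above can be sketched in a few lines. This is only an illustration: `output_path` is a hypothetical helper name that does not appear in the script, and it uses `os.path.splitext` to find the last "." of the filename.

```python
import os

def output_path(path_in, lang):
    """Insert "-<lang>" before the last "." of path_in (hypothetical helper)."""
    root, ext = os.path.splitext(path_in)  # 'world.map.svg' -> ('world.map', '.svg')
    return '%s-%s%s' % (root, lang, ext)

print(output_path('map.svg', 'de'))        # map-de.svg
print(output_path('world.map.svg', 'fr'))  # world.map-fr.svg
```

Only the final extension is treated as the suffix, so any earlier dots in the name are left alone.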

Usage

python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> <ISO639_CODE> ...]

ISO639_CODE is a language code as listed on commons:template:list of supported languages. If no codes are given, one file is written for every language found in the input file, plus one for the default text (using the code "default").
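How the set of languages is found when no codes are given can be sketched as follows. This is a simplified illustration, assuming that every language is declared through a double-quoted `systemLanguage` attribute; `list_languages`, `RE_LANG` and `SAMPLE` are hypothetical names used only here.

```python
import re

# Capture the code in systemLanguage="code" (assumes double-quoted values).
RE_LANG = re.compile(r'systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)

def list_languages(svg_text):
    """Return the sorted language codes found in svg_text, plus 'default'."""
    return sorted(set(RE_LANG.findall(svg_text)) | {'default'})

# A made-up multilingual SVG with two translations and a default text.
SAMPLE = '''<svg xmlns="http://www.w3.org/2000/svg">
 <switch>
  <text systemLanguage="de">Hallo</text>
  <text systemLanguage="fr">Bonjour</text>
  <text>Hello</text>
 </switch>
</svg>'''

print(list_languages(SAMPLE))  # ['de', 'default', 'fr']
```

The entry "default" is not a real language code; it stands for the child of the switch that carries no `systemLanguage` attribute.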

Source code

As Wikimedia does not allow general executable files to be uploaded, the source code is provided below.

#!/usr/bin/env python3
## Extract and write a monolingual SVG from a multilingual SVG, to preview in a browser, by CMG Lee.
## Usage: python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> <ISO639_CODE> ...] (all if omitted)
import os, re, sys

## Matches a systemLanguage="code" attribute and captures the code.
re_lang = re.compile(r'\s*systemLanguage\s*=\s*"\s*([^\s"]+)"', flags=re.I)

def extract_lang(svg_all, lang):
    ## Return the child of a switch body that matches lang, else the default child.
    svg_langs    = {} ## svg_langs[code] = source (key None is the default child)
    svg_currents = [] ## chunks of the child currently being read
    level        = 1  ## DOM level under switch
    ## Split the body into chunks that each end with '>', i.e. at most one tag per chunk.
    for svg_part in re.findall(r'.*?>', svg_all, flags=re.DOTALL):
        svg_currents.append(svg_part)
        if       re.search(r'<\s*/', svg_part): level -= 1 ## closing tag
        elif not re.search(r'/\s*>', svg_part): level += 1 ## opening tag (not self-closing)
        if level == 1: ## back at switch level: one direct child is complete
            findall_lang = re_lang.findall(svg_currents[0])
            lang_current = findall_lang[0] if findall_lang else None
            svg_langs[lang_current] = ''.join(svg_currents)
            svg_currents            = []
    ## Fall back to the default child, or to nothing if the switch has no default.
    return re_lang.sub('', svg_langs.get(lang, svg_langs.get(None, '')))

if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit('Usage: python3 extract_lang.py <SVG_FILENAME> [<ISO639_CODE> ...]')
    path_in = sys.argv[1]
    with open(path_in, encoding='utf-8', newline='') as f:
        svg_in = f.read()
    ## Strip comments, including multi-line ones, as they would upset the level count.
    svg_in = re.sub(r'<!--.*?-->', '', svg_in, flags=re.DOTALL)
    langs = sys.argv[2:] or sorted(set(re_lang.findall(svg_in) + ['default']))
    root, ext = os.path.splitext(path_in) ## split at the last "."
    for lang in langs:
        path_out = '%s-%s%s' % (root, lang, ext)
        print(path_out)
        svg_out = re.sub(r'(<\s*switch[^>]*>)(.*?)(\s*<\s*/\s*switch[^>]*>)',
                         lambda matchs: extract_lang(matchs.group(2), lang),
                         svg_in, flags=re.I | re.DOTALL)
        with open(path_out, 'w', encoding='utf-8', newline='') as f:
            f.write(svg_out)
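
For comparison, the same selection can be sketched with the standard library XML parser instead of regular expressions. This is not the author's method, only an illustration under stated assumptions: the input is well-formed XML in the SVG namespace, each translation is a direct child of a switch element, and `pick_language`, `SVG_NS` and `SAMPLE` are hypothetical names. Unlike the script above, this sketch keeps the switch element and leaves a single child inside it.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'

def pick_language(svg_text, lang):
    """Reduce every <switch> to the child matching lang, else the default child."""
    ET.register_namespace('', SVG_NS)  # serialise without an "ns0:" prefix
    root = ET.fromstring(svg_text)
    # Take a list first, because the tree is modified inside the loop.
    for switch in list(root.iter('{%s}switch' % SVG_NS)):
        children = list(switch)
        match    = [c for c in children if c.get('systemLanguage') == lang]
        default  = [c for c in children if c.get('systemLanguage') is None]
        keep     = (match or default)[:1]  # first match, else first default
        for child in children:
            if child not in keep:
                switch.remove(child)
        for child in keep:
            child.attrib.pop('systemLanguage', None)
    return ET.tostring(root, encoding='unicode')

# A made-up multilingual SVG with two translations and a default text.
SAMPLE = '''<svg xmlns="http://www.w3.org/2000/svg">
 <switch>
  <text systemLanguage="de">Hallo</text>
  <text systemLanguage="fr">Bonjour</text>
  <text>Hello</text>
 </switch>
</svg>'''

print(pick_language(SAMPLE, 'de'))  # keeps only the German text
print(pick_language(SAMPLE, 'ja'))  # no Japanese child, so the default is kept
```

A parser copes with comments and with ">" inside attribute values, but it rejects files that are not well-formed, which the regular-expression approach tolerates.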