User:PrimeBOT/Task 17

From Wikipedia, the free encyclopedia

Status and updates for Task 17

List of params

Bugs to fix/patches to make

  • Does parameter order matter? Found a few instances where &a=___?b=___ worked but &b=___?a=___ did not
  • Avoid removing the HTML comment terminator --> when it is stuck to the end of the URL
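A minimal sketch of the --> bug, assuming Python's built-in re behaves like the bot's regex engine for these patterns (the page doesn't say which engine the bot uses). It uses the 8 June pattern because re rejects the variable-width lookbehinds added on 10 June and 1 July. Those lookbehinds don't affect this example, since the match starts at a ? that follows a /. PATCHED is a hypothetical fix that lets --> also end the first alternative. It is not something the task has adopted, and the comment text is invented.

```python
import re

# 8 June pattern: only fixed-width lookbehinds, so Python's re accepts it.
JUNE8 = (
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)
# Hypothetical patch: "-->" can also end the first alternative's lazy value.
PATCHED = JUNE8.replace(r"(?=<|", r"(?=-->|<|", 1)

text = "<!-- https://example.com/?utm_source=x--> more"
print(re.sub(JUNE8, "", text))    # → <!-- https://example.com/ more
print(re.sub(PATCHED, "", text))  # → <!-- https://example.com/--> more
```

With the unpatched pattern, - and > are allowed value characters, so the lazy value keeps growing through --> until it reaches the space.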

Regex updates

because these things are boring

Original

  • \??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&
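For anyone who wants to poke at this, here is a sketch that applies the original pattern with Python's re.sub, deleting each match. Python is an assumption (the page doesn't name the bot's regex engine), and strip_utm and the example links are made up for illustration. The first alternative strips utm_ params at the end of a URL (before ], whitespace or |). The second strips them when they come straight after ? and are followed by &.

```python
import re

# Original Task 17 pattern, copied verbatim into a raw string.
ORIGINAL = re.compile(
    r"\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)"  # utm_ params that end the URL
    r"|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&"       # utm_ params right after "?"
)


def strip_utm(wikitext: str) -> str:
    """Delete every run of utm_ params the pattern finds."""
    return ORIGINAL.sub("", wikitext)


print(strip_utm("[https://example.com/page?utm_source=twitter Example]"))
# → [https://example.com/page Example]
print(strip_utm("[https://example.com/page?utm_source=twitter&id=5 Example]"))
# → [https://example.com/page?id=5 Example]
```

In the second case the match includes the trailing &, which is what leaves a well-formed ?id=5 behind.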

27 May (BRFA trial) - add code to catch utm_ params in the middle of the query string (the new (?<=&) alternative), and catch more end-of-URL possibilities (} added to the lookahead)

  • \??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
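A sketch comparing the original and 27 May patterns, again using Python's re as an assumed stand-in for the bot's engine, with invented URLs. The new (?<=&) alternative removes a utm_ param sitting between two other params. Adding } to the lookahead (with a lazy value) stops a URL at the end of a template from losing its closing braces.

```python
import re

ORIGINAL = (
    r"\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&"
)
MAY_27 = (
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"  # new: a utm_ param between two others
)

url = "https://example.com/?id=5&utm_source=x&page=2"
print(re.sub(ORIGINAL, "", url))  # unchanged: the middle param is missed
print(re.sub(MAY_27, "", url))    # → https://example.com/?id=5&page=2

cite = "{{cite web|url=https://example.com/?utm_source=x}} More."
print(re.sub(ORIGINAL, "", cite))  # → {{cite web|url=https://example.com/ More.
print(re.sub(MAY_27, "", cite))    # → {{cite web|url=https://example.com/}} More.
```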

7 June (catch ref tags) - add < to the end-of-URL lookahead, so a closing </ref> is no longer swallowed

  • \??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
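A similar comparison for the 7 June change, under the same assumption (Python's re, invented text). Without < in the lookahead, the lazy value keeps growing until it reaches whitespace, so </ref> gets deleted along with the utm_ param.

```python
import re

MAY_27 = (
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)
JUNE_7 = (
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"  # "<" can now end a match
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)

text = "Text.<ref>https://example.com/?utm_source=x</ref> More."
print(re.sub(MAY_27, "", text))  # → Text.<ref>https://example.com/ More.
print(re.sub(JUNE_7, "", text))  # → Text.<ref>https://example.com/</ref> More.
```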

8 June (catch malformed utm_ params) - the utm_ parameter name can no longer contain |, <, } or ], so utm_ only matches when an = follows within the same URL

  • \??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&
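An illustration of the 8 June change, again in Python's re with an invented cite web call. With a bare utm_ right before the next template parameter, the 7 June name class [^=\s] can run across the |, so the match swallows |title=Foo. The 8 June class also excludes |, <, } and ], so nothing is removed.

```python
import re

JUNE_7 = (
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)
JUNE_8 = (
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)

cite = "{{cite web|url=https://example.com/?utm_|title=Foo}}"
print(re.sub(JUNE_7, "", cite))  # → {{cite web|url=https://example.com/}}
print(re.sub(JUNE_8, "", cite))  # unchanged: the name can't run across "|"
```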

10 June (avoid web archive links)

  • (?<!https://web.archive.org[\S]+)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
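Python's built-in re refuses variable-width lookbehinds like (?<!https://web.archive.org[\S]+) and raises re.error, so this sketch swaps in an emulation. It tries the inner group (identical to the 8 June pattern) at each position and skips positions where the preceding text ends in the lookbehind body. Engines such as .NET or JavaScript accept this lookbehind directly. The strip_utm helper is hypothetical, and its per-position slicing is quadratic, fine for illustration only.

```python
import re

# Inner group of the 10 June regex (same as the 8 June pattern); its
# remaining lookbehinds are fixed-width, so Python's re compiles it.
INNER = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)
# Body of (?<!https://web.archive.org[\S]+), anchored with \Z so it must end
# where a candidate match starts. Dots stay unescaped, as in the original.
BLOCKED = re.compile(r"https://web.archive.org\S+\Z")


def strip_utm(text: str) -> str:
    """Delete utm_ matches, skipping starts the negative lookbehind rejects."""
    out = []
    copied = 0  # end of the text already copied to out
    i = 0
    while i < len(text):
        m = INNER.match(text, i)  # lookbehinds inside INNER still see text[:i]
        if m and not BLOCKED.search(text[:i]):
            out.append(text[copied:i])
            copied = i = m.end()  # every alternative is non-empty, so i advances
        else:
            i += 1
    out.append(text[copied:])
    return "".join(out)


print(strip_utm("[https://example.com/?utm_source=x Example]"))
# → [https://example.com/ Example]
print(strip_utm("[https://web.archive.org/web/2020/https://example.com/?utm_source=x Archived]"))
# unchanged (inside a web.archive.org URL)
```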

1 July (avoid utm_ params just hanging out in text) - also skip matches that start right after a | or whitespace

  • (?<!https://web.archive.org[\S]+|\||\s)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
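The same emulation, updated for 1 July, still assuming Python's re and a hypothetical strip_utm helper. The lookbehind body now also contains | and \s, so a utm_ param that starts right after a pipe or a space (as in bare text) is skipped. utm_ params inside a URL begin at a ? or & that follows a URL character, so they are still removed.

```python
import re

# Inner group of the 1 July regex (same as the 8 June pattern).
INNER = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"
)
# Body of (?<!https://web.archive.org[\S]+|\||\s), anchored with \Z so it
# must end where a candidate match starts. Dots unescaped, as in the original.
BLOCKED = re.compile(r"(?:https://web.archive.org\S+|\||\s)\Z")


def strip_utm(text: str) -> str:
    """Delete utm_ matches, skipping starts the negative lookbehind rejects."""
    out = []
    copied = 0  # end of the text already copied to out
    i = 0
    while i < len(text):
        m = INNER.match(text, i)
        if m and not BLOCKED.search(text[:i]):
            out.append(text[copied:i])
            copied = i = m.end()  # every alternative is non-empty, so i advances
        else:
            i += 1
    out.append(text[copied:])
    return "".join(out)


print(strip_utm("[https://example.com/?utm_source=x Example]"))
# → [https://example.com/ Example]
print(strip_utm("Set utm_source=newsletter in the link."))
# unchanged (starts right after a space)
```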