URL Encoding and Decoding of Special Characters


19 replies to this topic
ESUHSD
  • Guests
  • Last active:
  • Joined: --
I am trying to write a simple function to encode and decode special characters in URLs.

For example, given the following URL:

http://www.someplace...der/2nd-folder/

The function would return:

http://www.someplace.com/a folder/2nd-folder/

or vice-versa.

Here's what I have so far:
AutoTrim, Off
url_temp_clip = %clipboard%
Send, ^c
clipboard := RegExReplace(clipboard, "\%([0-9A-F]{2})" , hex_to_dec("0x$1"))
Send, ^v
clipboard = %url_temp_clip%
AutoTrim, On

hex_to_dec(x)
{
   Loop
      If RegExMatch(x, "i)(.*)(0x[a-f\d]*)(.*)", y)
         x := y1 . y2+0 . y3           ; convert hex numbers to decimal
      Else Break
	return x
}

In Perl the code would be something like:
$str =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg; # Encode string
$str =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg; # Decode

But I can't seem to get the same thing done in AHK. Any help would be much appreciated.

engunneer
  • Moderators
  • 9162 posts
  • Last active: Sep 12 2014 10:36 PM
  • Joined: 30 Aug 2005
I have seen scripts for this - did you search the forum?

ESUHSD
  • Guests
  • Last active:
  • Joined: --
Yes. I tried searching for things like:

URL encode
URL decode
Hex decode
convert URL

Found some things that were similar but not what I was looking for.

polyethene
  • Members
  • 5519 posts
  • Last active: May 17 2015 06:39 AM
  • Joined: 26 Oct 2012
Here are two functions:

uriDecode(str) {
	Loop
		If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
			StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
		Else Break
	Return, str
}

uriEncode(str) {
	f = %A_FormatInteger%
	SetFormat, Integer, Hex
	If RegExMatch(str, "^\w+:/{0,2}", pr)
		StringTrimLeft, str, str, StrLen(pr)
	StringReplace, str, str, `%, `%25, All
	Loop
		If RegExMatch(str, "i)[^\w\.~%]", char)
			StringReplace, str, str, %char%, % "%" . Asc(char), All
		Else Break
	SetFormat, Integer, %f%
	Return, pr . str
}
e.g. MsgBox, % uriDecode("http://www.someplace.com/a%20folder/2nd%2Dfolder/")

autohotkey.com/net Site Manager

 

Contact me by email (polyethene at autohotkey.net) or message tidbit


ESUHSD
  • Guests
  • Last active:
  • Joined: --
Exactly what I'm looking for. Thanks for the quick response!

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
@Titan. Nice pair of useful functions. :)

ESUHSD
  • Guests
  • Last active:
  • Joined: --
I made some modifications to get the shortcut I was trying to create. Here is the code in case anyone is interested.
^!+5::
	AutoTrim, Off
	url_temp_clip = %clipboard%
	Send, ^c
	IfInString, clipboard, `%
	{
		clipboard := uriDecode(clipboard)
	}
	else
	{
		clipboard := uriEncode(clipboard)
	}
	Send, ^v
	clipboard = %url_temp_clip%
	AutoTrim, On
return

uriDecode(str)
{
	; Find uri encoded characters such as %20 (space) and replace with ascii character

	pos = 1
	Loop
		If pos := RegExMatch(str, "i)(?<=%)[\da-f]{2}", hex, pos++)
			StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
		Else Break
	Return, str
}

uriEncode(str)
{
	; Replace characters with uri encoded version except for letters, numbers,
	; and the following: /.~:&=-

	f = %A_FormatInteger%
	SetFormat, Integer, Hex
	pos = 1
	Loop
		If pos := RegExMatch(str, "i)[^\/\w\.~`:%&=-]", char, pos++)
			StringReplace, str, str, %char%, % "%" . Asc(char), All
		Else Break
	SetFormat, Integer, %f%
	StringReplace, str, str, 0x, , All
	Return, str
}


Ageless
  • Members
  • 2 posts
  • Last active: May 15 2007 03:55 AM
  • Joined: 15 May 2007
Strange, when I run the uriEncode() function that Titan posted, I had the following problems:

1) It was encoding characters that didn't need to be encoded. In my case, these were colons (:) and forward slashes (/).

2) It was outputting hex character numbers with a 0x at the beginning, which is consistent with what the SetFormat documentation says regarding hexadecimal format:

Hexadecimal numbers all start with the prefix 0x (e.g. 0xA9).


So I modified the function slightly, here is the result:

uriEncode(str) {
   f = %A_FormatInteger%
   SetFormat, Integer, Hex
   If RegExMatch(str, "^\w+:/{0,2}", pr)
      StringTrimLeft, str, str, StrLen(pr)
   StringReplace, str, str, `%, `%25, All
   Loop
      If RegExMatch(str, "i)[^\w\.~%/:]", char)
         StringReplace, str, str, %char%, % "%" . SubStr(Asc(char),3), All
      Else Break
   SetFormat, Integer, %f%
   Return, pr . str
}

The second RegExMatch() now ignores colons and forward slashes, and the Asc() function is now wrapped with a SubStr that strips the extra characters.

Please note that I made no attempt at a universal fix: I just modified the parts that were causing problems in my specific and very limited usage of this function.

Thanks for posting the original code, Titan!

PS: I'm an AutoHotkey noob, so I'm probably missing something obvious that would explain why I was having problems with the function in the first place. I'm just posting this in case someone else runs into the same problems.

JoeSchmoe as guest
  • Guests
  • Last active:
  • Joined: --
Hello,

I'm having trouble figuring out one of the two regular expressions that Titan used in his code. I'd like to modify his code somewhat, and I need to figure out what is going on first.

The code in question is at the heart of the encoding function:
If RegExMatch(str, "i)[^\w\.~%]", char) 
   StringReplace, str, str, %char%, % "%" . Asc(char), All 

It looks to me like that regular expression would match any single character that isn't whitespace, a period, a ~, or a %. Wouldn't this catch every single alphanumeric character?

It seems like better code would be to insert an escape character before the caret to remove its special meaning:
If RegExMatch(str, "i)[\^\w\.~%]", char) 
   StringReplace, str, str, %char%, % "%" . Asc(char), All 

Am I missing something? Was Titan's code for an older version of the PCRE lib in which carets didn't have a special meaning?

sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008

It looks to me like that regular expression would match any single character that isn't whitespace, a period, a ~, or a %.


That should be any single character that isn't a word character (isn't an alphanumeric character), a literal period, a ~ or a %.
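
A quick way to check what that character class catches is to collect its matches from a test string. This is only a minimal sketch, assuming AutoHotkey v1.1 or later; the sample string and the variable names are made up for illustration.

```autohotkey
; Sketch: list every character that [^\w\.~%] matches in a sample string.
; The sample text is an assumption chosen to include both kinds of characters.
sample := "a b/c:d_e.f~g%h"
matched := ""
pos := 1
while (pos := RegExMatch(sample, "[^\w\.~%]", ch, pos))
{
    matched .= "[" . ch . "]"   ; record the matched character
    pos += StrLen(ch)           ; continue searching after it
}
MsgBox, % matched               ; shows [ ][/][:]
```

Letters, digits and the underscore count as word characters, so only the space, the slash and the colon are reported. That is also why the original function encoded slashes and colons.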

iamattamai
  • Members
  • 3 posts
  • Last active: Feb 19 2012 11:18 PM
  • Joined: 06 Nov 2010
I am trying to convert a string of CSV text to URL format so it can be posted using uriEncode and httpQuery together.
I am able to post simple strings using the code below, but not an example like the one shown. I suspect it might be the % signs?
I'm admittedly a newbie to AHK, so any assistance is much appreciated.

#noenv 
uriEncode(str) 
{ 
   ; Replace characters with uri encoded version except for letters, numbers, 
   ; and the following: /.~:&=- 

   f = %A_FormatInteger% 
   SetFormat, Integer, Hex 
   pos = 1 
   Loop 
      If pos := RegExMatch(str, "i)[^\/\w\.~`:%&=-]", char, pos++) 
         StringReplace, str, str, %char%, % "%" . Asc(char), All 
      Else Break 
   SetFormat, Integer, %f% 
   StringReplace, str, str, 0x, , All 
   Return, str 
}

estring = 2010,11,5,18,0,55,"ROC177262","CPSOS - Search - Search",""2010,11,5,18,1,0,"ROC177262","Logon status",""2010,11,5,18,1,6,"ROC177262","AT&T U-Verse CRM Customer Interaction Manager : Release 14 - csrPG4cmem105",""2010,11,5,18,1,9,"ROC177262","AT&T Wireline - Synchronoss Technologies, Inc. - Windows Inter - \\Remote, 128-bit SSL/TLS.",""

newstring = % uriEncode(estring)
msgbox, %newstring% ;valid conversion confirmed here

html     := "" 
URL      := "http://www.mysite.com/act_raw_upload.cfm" 
POSTData := "raw_data= %newstring%"

length := httpQuery(html,URL,POSTdata) 
varSetCapacity(html,-1) 
    
#include httpQuery.ahk


VxE
  • Moderators
  • 3622 posts
  • Last active: Dec 24 2015 02:21 AM
  • Joined: 07 Oct 2006
Instead of
POSTData := "raw_data= %newstring%"
try
POSTData := "raw_data=" newstring

and take a look at FAQ: When exactly are variable names enclosed in percent signs?
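
The difference can be seen side by side in a small test script. This is a minimal sketch in AutoHotkey v1 syntax; the value given to newstring is a made-up stand-in, not the real encoded data.

```autohotkey
newstring := "a%20b"                  ; stand-in value, for illustration only

legacy = raw_data=%newstring%
; legacy assignment (=): %newstring% is replaced by the variable's contents

expr1 := "raw_data=" . newstring
; expression (:=): quoted literal concatenated with the variable

expr2 := "raw_data= %newstring%"
; expression (:=): everything inside the quotes stays literal text

MsgBox, % legacy "`n" expr1 "`n" expr2
```

The first two lines of the message box read raw_data=a%20b, while the third reads raw_data= %newstring%, which is what the server was receiving.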

iamattamai
  • Members
  • 3 posts
  • Last active: Feb 19 2012 11:18 PM
  • Joined: 06 Nov 2010
Many thanks VxE.
I also found that switching to the later post/mod of the uriEncoder by Ageless sealed the deal.

One more humble request: how would you modify the Ageless code below to encode CR/LF characters in the source string? I'm a bit challenged by code as terse as this.

uriEncode(str) { 
   f = %A_FormatInteger% 
   SetFormat, Integer, Hex 
   If RegExMatch(str, "^\w+:/{0,2}", pr) 
      StringTrimLeft, str, str, StrLen(pr) 
   StringReplace, str, str, `%, `%25, All 
   Loop 
      If RegExMatch(str, "i)[^\w\.~%/:]", char) 
         StringReplace, str, str, %char%, % "%" . SubStr(Asc(char),3), All 
      Else Break 
   SetFormat, Integer, %f% 
   Return, pr . str 
}


panofish
  • Members
  • 179 posts
  • Last active: Apr 24 2014 03:24 PM
  • Joined: 05 Feb 2007
I found a small bug in the earlier code. I was using it to encode strings for a slightly different purpose than just url encoding.

The bug: any character whose code is a single hex digit (like linefeed = 0xA) is not zero-filled. It should be written as 0x0A so that it decodes correctly later.

Here is the change I made to ensure the hex value is zero-filled. Others may find a better or more efficient way, but this does work when you have newline characters.

;============================================================
; encode special characters in a string (usually for url encoding)
;============================================================ 

fn_encode(str) {
   f = %A_FormatInteger%
   SetFormat, Integer, Hex   ; set integer format to hex
   
   If RegExMatch(str, "^\w+:/{0,2}", pr)   
      StringTrimLeft, str, str, StrLen(pr)
   
   StringReplace, str, str, `%, `%25, All    ; replace all % with %25
   
   Loop
      If RegExMatch(str, "i)[^\w\.~%/:]", char)    ; match anything except alphanumerics, _ . ~ % / :
         StringReplace, str, str, %char%, % "%" . fn_zerofill(SubStr(Asc(char),3),2) , All
      Else Break
   
   SetFormat, Integer, %f%   ; restore integer format
   Return, pr . str
}

;============================================================
; decode encoded string
;============================================================ 

fn_decode(str) {
    Loop
        If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
            StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
        Else Break
    Return, str
}

;-------------------------------------
; example call to fn_zerofill
; n := fn_zerofill(n, 3)
;-------------------------------------

fn_zerofill(num, size){   ; returns num zerofilled to size digits
    StringLen, length, num
    c := size - length 
    loop, %c%
        num := "0" num
    return num
}
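
On newer v1.1 releases the zero-fill can probably be left to the built-in Format() function, which pads to a fixed width and does not depend on SetFormat. A minimal sketch, assuming a version that has Format() (it was added in v1.1.17, as far as I know); HexByte is a hypothetical helper name, not part of the code above.

```autohotkey
; Sketch: percent-encode one character with a zero-padded, two-digit hex code.
MsgBox, % HexByte("`n") . " " . HexByte(" ") . " " . HexByte("/")   ; shows %0A %20 %2F

HexByte(char)   ; hypothetical helper, for illustration only
{
    ; {:02X} = uppercase hex, at least two digits, padded with zeros.
    ; No "0x" prefix is produced, so nothing needs to be stripped afterwards.
    return "%" . Format("{:02X}", Asc(char))
}
```

Inside the encoding loop this would replace the `"%" . fn_zerofill(SubStr(Asc(char),3),2)` expression, and the SetFormat save and restore would no longer be needed for that step.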


fragman
  • Members
  • 1591 posts
  • Last active: Nov 12 2012 08:51 PM
  • Joined: 13 Oct 2009
These functions apparently have problems with umlauts and other characters that can be written as a two-character sequence of an accent and the vowel.
The following function handles these cases correctly, producing two-character sequences for them, but it does not work on x64:
UriEncode(Uri, full = 0)
{
    oSC := ComObjCreate("ScriptControl")
    oSC.Language := "JScript"
    Script := "var Encoded = encodeURIComponent(""" . Uri . """)"
    oSC.ExecuteStatement(Script)
    encoded := oSC.Eval("Encoded")
    Return encoded
}
Anyone got a better solution that always works?
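
One possible approach that avoids ScriptControl altogether is to convert the string to UTF-8 first and percent-encode it byte by byte. The following is only a sketch under some assumptions: a Unicode build of AutoHotkey v1.1.17 or later (for Format()), and the RFC 3986 unreserved set (letters, digits, - _ . ~) as the characters left alone, which is slightly stricter than what encodeURIComponent leaves alone. UriEncodeUtf8 is a made-up name.

```autohotkey
; Sketch: percent-encode a string as UTF-8 bytes without any COM object.
MsgBox, % UriEncodeUtf8(Chr(0xE4) . " b")   ; U+00E4 (a with umlaut); shows %C3%A4%20b

UriEncodeUtf8(str)   ; hypothetical function name
{
    size := StrPut(str, "UTF-8")          ; bytes needed, including the null terminator
    VarSetCapacity(buf, size, 0)
    StrPut(str, &buf, "UTF-8")            ; write the UTF-8 bytes into buf
    out := ""
    Loop, % size - 1                      ; skip the terminating null byte
    {
        byte := NumGet(buf, A_Index - 1, "UChar")
        if (byte < 0x80 && RegExMatch(Chr(byte), "^[A-Za-z0-9\-_.~]$"))
            out .= Chr(byte)                   ; unreserved ASCII stays as-is
        else
            out .= Format("%{:02X}", byte)     ; everything else becomes %XX
    }
    return out
}
```

Because each byte is encoded separately, a character such as an umlaut comes out as two %XX sequences, and since no 32-bit-only COM component is involved this should behave the same on x64.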