Not sure if it is just me, but something I would find VERY helpful would be to have the urlEncode, urlDecode and httpStripHTML functions from the Winsock Extender copied to the main WinBatch DLL. Not sure why I haven't asked before (maybe I have???), but I regularly have to include the Winsock Extender in projects for no other reason than these three functions. Would be nice to have access to them directly.
Thanks.
Jim
Does this help?
#DefineFunction JS_encodeURI(uri)
   o=CreateObject(`MSScriptControl.ScriptControl`)
   o.Language=`JScript`
   Ret=o.Eval(:`encodeURI("`:StrReplace(uri,`"`,`\"`):`")`)
   Return StrCat(Ret)
#EndFunction

#DefineFunction JS_decodeURI(enc)
   o=CreateObject(`MSScriptControl.ScriptControl`)
   o.Language=`JScript`
   Ret=o.Eval(:`decodeURI("`:enc:`")`)
   Return StrCat(Ret)
#EndFunction
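One thing to keep in mind with the JScript route: encodeURI leaves the reserved URI characters (such as & = + ? /) unescaped and writes a space as %20, so it is meant for whole URIs rather than individual form values. The following is only a Python sketch for comparing results, not part of the WinBatch solution; the function name encode_uri is made up for illustration, and the safe set is taken from the ECMAScript definition of encodeURI.

```python
from urllib.parse import quote

# Characters encodeURI leaves alone, beyond the letters, digits and "_.-~"
# that quote() already treats as safe.
ENCODE_URI_SAFE = ";,/?:@&=+$!*'()#"

def encode_uri(text: str) -> str:
    """Hypothetical helper approximating JScript's encodeURI."""
    return quote(text, safe=ENCODE_URI_SAFE)

print(encode_uri("name=Joe Smoe&discount=10%"))
# prints name=Joe%20Smoe&discount=10%25
```

Note that the = and & survive untouched, which is fine for a full query string but not for encoding a single value that contains them.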
Thanks.
Thanks!
For us burger flippers there is always:
ObjectClrOption("useany","System.Web")
objHttpUtil = ObjectClrNew("System.Web.HttpUtility")
strUrl ="name=Joe Smoe&discount=10%%"
strEncoded = objHttpUtil.UrlEncode(strUrl)
strDecoded = objHttpUtil.UrlDecode(strEncoded)
You can always roll your own:
;; Not debugged!
#DefineFunction UrlEncode(_strUrl)
   aUrl = ArrayFromStr(_strUrl)
   nMax = ArrInfo(aUrl, 1) - 1
   strReturn = ''
   for i = 0 to nMax
      switch 1
         case ' ' == aUrl[i]
            strReturn := '+'
            break
         case '@' == aUrl[i]
         case '*' == aUrl[i]
         case '_' == aUrl[i]
         case '-' == aUrl[i]
         case '.' == aUrl[i]
            strReturn := aUrl[i]
            break
         case 1
            if StrTypeInfo(aUrl[i], 0) & 260 ; Digit or alpha.
               strReturn := aUrl[i]
            else
               strReturn := "%%":ChrStringToHex(aUrl[i])
            endif
      endswitch
   next
   return strReturn
#EndFunction
strUrl ="name=Joe Smoe&discount=10%%"
strEncoded = UrlEncode(strUrl)
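Since the UDF is marked as not debugged, it helps to know what it should return. Below is a rough Python sketch of the same rule (space becomes +, the characters @ * _ - . and alphanumerics pass through, everything else becomes %XX). It is an assumption-laden cross-check, not the Winsock Extender's algorithm: it treats only ASCII letters and digits as safe, encodes other characters as UTF-8 bytes, and uses uppercase hex, any of which may differ from what StrTypeInfo and ChrStringToHex do in WinBatch.

```python
SAFE_PUNCTUATION = "@*_-."  # the literal pass-through cases in the UDF's switch

def url_encode(text: str) -> str:
    """Form-style encoding that mirrors the UDF's rules (see assumptions above)."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")
        elif ch in SAFE_PUNCTUATION or (ch.isascii() and ch.isalnum()):
            out.append(ch)
        else:
            # Assumption: percent-encode each UTF-8 byte with uppercase hex.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(url_encode("name=Joe Smoe&discount=10%"))
# prints name%3DJoe+Smoe%26discount%3D10%25
```

For the test string in the post (which is name=Joe Smoe&discount=10% once WinBatch collapses the doubled percent sign), strEncoded should come out the same for the ASCII case.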
Forgot the decode function:
;; Not debugged!
#DefineFunction UrlDecode(_strUrl)
   aUrl = ArrayFromStr(_strUrl)
   nMax = ArrInfo(aUrl, 1) - 1
   strReturn = ''
   for i = 0 to nMax
      switch 1
         case '+' == aUrl[i]
            strReturn := ' '
            break
         case '%%' == aUrl[i] ; Doubled so WinBatch sees a literal percent sign.
            if i + 2 > nMax then return '' ; Invalid URL.
            i += 1 ; Unwound loop for speed.
            strHex = aUrl[i]
            i += 1
            strHex := aUrl[i]
            nMode = ErrorMode(@off)
            LastError() ; Clear any previous error.
            strReturn := ChrHexToString(strHex)
            ErrorMode(nMode)
            if LastError() then return '' ; Invalid URL.
            break
         case 1
            strReturn := aUrl[i]
      endswitch
   next
   return strReturn
#EndFunction
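The decode rule above (+ becomes a space, %XX becomes one byte, and a truncated or non-hex escape makes the whole result an empty string) can be sketched in Python for checking results. This is only an illustration under assumptions: the name url_decode is made up, and the decoded bytes are interpreted as UTF-8, which may not match how ChrHexToString treats characters above 127.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def url_decode(text: str) -> str:
    """Reverse of form-style encoding; returns '' on an invalid escape."""
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            raw.append(0x20)
            i += 1
        elif ch == "%":
            hex_pair = text[i + 1:i + 3]
            # Same guard as the UDF: two hex digits must follow the percent sign.
            if len(hex_pair) < 2 or any(c not in HEX_DIGITS for c in hex_pair):
                return ""
            raw.append(int(hex_pair, 16))
            i += 3
        else:
            raw.extend(ch.encode("utf-8"))
            i += 1
    # Assumption: the escaped bytes are UTF-8 text.
    return raw.decode("utf-8", errors="replace")

print(url_decode("name%3DJoe+Smoe%26discount%3D10%25"))
# prints name=Joe Smoe&discount=10%
```

Inputs such as abc%2 or abc%zz return an empty string here, which is the case the bounds check in the UDF is guarding against.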
This UDF doesn't handle JavaScript, but then neither does httpStripHTML.
;; Not debugged!
#DefineFunction HtmlStrip(_strHtml)
   strReturn = ''
   nIndex = 1
   nLt = StrIndex(_strHtml, '<', nIndex, @Fwdscan)
   nGt = StrIndex(_strHtml, '>', nIndex, @Fwdscan)
   while 1
      if (!nLt && nGt) || (nGt && (nGt < nLt) ) ; > before any <.
         nIndex = nGt + 1
         nGt = StrIndex(_strHtml, '>', nIndex, @Fwdscan)
      elseif nLt && nGt && (nLt < nGt) ; < before >.
         strReturn := StrSub(_strHtml, nIndex, nLt - nIndex)
         nIndex = nGt + 1
         nLt = StrIndex(_strHtml, '<', nIndex, @Fwdscan)
         nGt = StrIndex(_strHtml, '>', nIndex, @Fwdscan)
      elseif nLt && (!nGt) ; < but no >
         strReturn := StrSub(_strHtml, nIndex, nLt - nIndex)
         break
      else ; Only plain text left.
         strReturn := StrSub(_strHtml, nIndex, -1)
         break
      endif
   endwhile
   return strReturn
#EndFunction
strHtml = FileGet('C:\website\index.html')
strStripped = HtmlStrip(strHtml)
Pause('Stripped HTML',strStripped)
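For anyone wanting to compare output, here is a small Python sketch of the same basic idea: keep the text outside <...> pairs and drop the tags. It is not a line-for-line port. The name html_strip is made up, a stray > is kept as ordinary text here (the UDF above skips past it), and an unclosed < discards the rest of the input as the UDF does. Like the UDF, it knows nothing about script blocks, so their contents come through as text.

```python
def html_strip(html: str) -> str:
    """Drop everything between '<' and the next '>'; keep the rest."""
    pieces = []
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            pieces.append(html[pos:])   # only plain text left
            break
        pieces.append(html[pos:lt])     # text before the tag
        gt = html.find(">", lt + 1)
        if gt == -1:
            break                       # unclosed tag: discard the remainder
        pos = gt + 1
    return "".join(pieces)

print(html_strip("<p>Hello <b>world</b>!</p>"))
# prints Hello world!
print(html_strip("<script>x=1</script>ok"))
# prints x=1ok
```

The second call shows the limitation mentioned above: the tags go away but the script body stays.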
Thanks.
Jim
Quote from: td on May 21, 2017, 09:57:09 AM
This UDF doesn't handle Javascript
Interesting. Be nice to start a conversation about what is important with web-scraping.
Quote from: JTaylor on May 21, 2017, 11:38:15 AM
Thanks.
Noticed a glaring flaw in the UrlDecode UDF. The line
if i > nMax + 2 then return '' ; Invalid URL.
should be
if i + 2 > nMax then return '' ; Invalid URL.
Sorry about that.
Got it. Thanks!