Parsing WB Tech Support Articles

Started by spl, January 06, 2025, 04:56:21 AM

spl

After noticing that my text file of WB Tech Support article links now has over 140,000 entries, I decided to play around with parsing the outer text of an article, which contains both the explanation and, usually, code for the topic. The last time I played with that was circa 2004 with IE [no longer an option]. The file of URI links is around 17 MB, so I cannot post it here, but I am looking for an option to host it for anyone interested.

The script below has 2 test URIs; the parsing is not perfect but sufficient. It could be done in pure WB, but it is easy to accomplish with my Get_stdout() function and PS. I would appreciate at least a pass/fail test.
;Parse WB Tech Support Articles
;save as ..\StdOut_TechSupport.wbt
;Stan Littlefield 1/6/2025
;==========================================================
Gosub udfs
;select test uri from below
uri = 'https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/64-bit+File~Redirection.txt'
;or
;uri = "https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/How~To+Delete~a~Large~Number~of~Files.txt"

file = "C:\temp\techspt.txt"
args = $"
$result = Invoke-WebRequest -Uri "|uri|" -Method Get
$text = $result.AllElements | Where {$_.TagName -eq 'body'} | Select outerText | ConvertTo-Csv | Out-String
$start = ($text | Select-String "keywords").Matches.Index
$end = ($text | Select-String "Article").Matches.Index
$text.SubString($start,$end-$start) | Out-File -FilePath |file| -Force
$"
args = StrReplace(args,"|uri|",uri)
args = StrReplace(args,"|file|",file)
;Message("",args)
;Exit
cmd="Powershell"
msg='WB Tech Support'

BoxOpen("Running...",cmd:" ":args:@LF:"PLEASE WAIT...MAY TAKE SOME TIME")
TimeDelay(2)
vals = Get_stdout():@LF:"Script Completed"
Display(2,msg,vals) 

If FileExist(file) Then Run("notepad.exe",file)
Exit
;==========================================================
:udfs
#DefineSubroutine Get_stdout()
ObjectClrOption("useany","System")
objInfo = ObjectClrNew("System.Diagnostics.ProcessStartInfo")
Output=""
timeOut = ObjectType("I2",5000)
objInfo.FileName = cmd
objInfo.RedirectStandardError = ObjectType("BOOL",@TRUE)
objInfo.RedirectStandardOutput = ObjectType("BOOL",@TRUE)
objInfo.UseShellExecute = ObjectType("BOOL",@FALSE)
objInfo.CreateNoWindow = ObjectType("BOOL",@TRUE)
objInfo.Arguments = args
oProcess = ObjectClrNew("System.Diagnostics.Process")
oProcess.StartInfo = objInfo
oProcess.Start()
oProcess.WaitForExit(timeout)
STDOUT = oProcess.StandardOutput.ReadToEnd()
STDERR = oProcess.StandardError.ReadToEnd()
BoxShut() ;close the "PLEASE WAIT" box only after PowerShell has finished
Output = Output:STDOUT:@CRLF
If STDERR<>""
   Output = Output:"STDERR:":STDERR:@CRLF
Endif
oProcess = 0
objInfo = 0

Return (Output)
#EndSubroutine
Return
;==========================================================
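On the "could be done in pure WB" point: a minimal sketch of just the slicing step, assuming the article's body text is already sitting in a variable. The sample string and the names pageText, startPos, endPos and slice are made up for illustration, and this has not been run against the live pages. Unlike the PS above, it looks for "Article" only after "keywords", and it uses the case-sensitive StrIndex (Select-String ignores case by default, so StrIndexNC would be the closer match).

```winbatch
;Sketch only: slice the text between two markers with StrIndex/StrSub
;pageText is a made-up stand-in for the outer text of one article
pageText = "page header keywords: redirection Body of the topic goes here. Article ID: sample footer"

;position of the first marker (1-based, 0 when not found)
startPos = StrIndex(pageText, "keywords", 1, @FWDSCAN)
endPos = 0
;only look for the closing marker after the opening one
If startPos > 0 Then endPos = StrIndex(pageText, "Article", startPos + 1, @FWDSCAN)

If startPos == 0 || endPos == 0
   Message("Parse sketch", "Markers not found")
Else
   ;keep everything from "keywords" up to, but not including, "Article"
   slice = StrSub(pageText, startPos, endPos - startPos)
   Message("Parse sketch", slice)
Endif
```

Fetching the page text itself would still need something like the PS call or a WinInet/HTTP extender, which is not shown here.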

Stan - formerly stanl [ex-Pundit]