Author Topic: Power Query Web Scrape - Extended  (Read 213 times)

stanl

  • Pundit
  • *****
  • Posts: 1572
Power Query Web Scrape - Extended
« on: July 12, 2021, 04:45:13 am »
This is a follow up to my post in the 'Future of WB for OLE...' thread. I illustrated using Power Query to web scrape  table from https:"//www.timeanddate.com/weather/usa/[city] .... where city could be substituted with an American city.  The M Code was specific to Table[0] on the web page. There are actually 4 tables and by adding an additional substitution in the M Code text and adjusting how Excel would handle multiple queries all 4 tables are accessed. Code uses Atlanta as the city. Important: the M Code is generic to the variable columns in each table and inserts as text. I have used (and posted) Power Query scripts with Json, delimited text, database tables and working in an environment where Excel is God it comes in handy.  If anyone reading this, tries the script, would appreciate some feedback.
Code: Winbatch

;Winbatch 2021c - Testing Excel Power Query for simple Web Scrape
;Requires Excel 2016 or Higher or Power Query added to Excel 2013 or earlier
;Stan Littlefield July 11, 2021
IntControl(73,1,0,0,0)
gosub udfs
oXL=0
city = "Atlanta" ;gets city for URL's weather; also is tab name in Excel
ct = 3  ;for this Url Tables are 0-3
BoxOpen("Please Wait","Creating Excel Power Query for Web Data")
For i = 0 To ct
   qry = city:"_Tbl":i
   cn = "Query - ":qry
   mcode = getmcode()
   toExcel()
Next
oXL=0
BoxShut()
Pause("Query Completed","Save Workbook or Quit Excel")


Exit
;======================================================================================================
:WBERRORHANDLER
Terminate(@TRUE,"Error Encountered",geterror())
;======================================================================================================
:CANCEL
Display(2,"Operation Canceled","Goodbye...")
Exit
:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine


#DefineSubRoutine getmcode()
;M Code could be loaded from text file
;but if using WB's multi line, special chars must be escaped
code = $"
let
    Source = Web.Page(Web.Contents("https:://www.timeanddate.com/weather/usa/|city|")),
    Data|n|= Source{|n|}[Data],
    ##"Changed Type" = Table.TransformColumnTypes(Data|n|,
                      List.Transform(Table.ColumnNames(Data|n|), each {_, type text } ) )
in
    ##"Changed Type"
$"

code = StrReplace(code,"|city|",city)
code = StrReplace(code,"|n|",i)
Return(code)
#EndSubRoutine


#DefineSubRoutine toExcel()
If i == 0
   oXL = CreateObject("Excel.Application")
   If oXL == 0 Then Terminate(@TRUE,"Script Will Terminate","Cannot Start Excel Instance")
   oXL.Visible          = @TRUE  ; change this to @FALSE to run hidden
   oXL.ScreenUpdating   = @TRUE  ; if running hidden, change this to @FALSE
   oXL.UserControl      = @TRUE
   oXL.DisplayAlerts    = @FALSE
   oXL.WorkBooks.Add()
   oWS = oXL.ActiveWorkBook.Worksheets(1)
   oWS.Name = qry
Else
   nSheets = oXL.ActiveWorkBook.Worksheets.Count
   oXL.ActiveWorkBook.Worksheets.Add(::After=oXL.ActiveWorkBook.Worksheets(nSheets))
   oWS = oXL.ActiveWorkBook.ActiveSheet
   oWS.Name = qry
Endif
;oWS.Activate()
oXL.ActiveWorkBook.Queries.Add(::Name=qry,Formula=mcode,Description="Query")
cSource = "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=%qry%;Extended Properties=''"
qt = oWS.ListObjects.Add(::SourceType=0,Source=cSource,Destination=oWS.Range("$A$1")).QueryTable
qt.CommandText = "Select * FROM [%qry%]"
qt.Refresh()
oWS=0
Return(1)
#EndSubRoutine


Return