Power Query HTML Table

Started by spl, July 25, 2024, 06:33:08 AM

Previous topic - Next topic

spl

In the old days Excel could download web tables via SOAP, but has been replaced by Power Query. Attached is a simple script [and, I'm not a cat lover, having only had GSD's] which could serve as a template for webscraping HTML tables. [requires Office 2016 or greater or 2013 with Power Query add-on]
; Winbatch 2024A - example to create worksheet table from HTML table
;
;
;===============================================================================================================
gosub udfs
IntControl(73,1,0,0,0)

url = "https://en.wikipedia.org/wiki/List_of_cat_breeds"
cXLS = "C:\temp\cats.xlsx"
If FileExist(cXLS) Then FileDelete(cXLS)
qry = "Cats"
mcode = $"let
    Source = Web.Page(Web.Contents("|url|")),
    Data0 = Source{0}[Data]
in
    Data0
$"

Display(2,"Creating ":cXLS,"From ":url)
mcode = StrReplace(mcode,"|url|",url)

xl = CreateObject("Excel.Application")
xl.Visible = @true
xl.ScreenUpdating = @true
xl.UserControl = @true
xl.DisplayAlerts = @false
wk = xl.workbooks.add()
ws = wk.Worksheets.Item(1)
ws.Name = qry
xl.ActiveWorkBook.Queries.Add(::Name=qry,Formula=mcode,Description="Query")
cSource = "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=%qry%;Extended Properties=''"
qt = ws.ListObjects.Add(::SourceType=0,Source=cSource,Destination=ws.Range("$A$1")).QueryTable
qt.CommandText = "Select * FROM [%qry%]"
qt.Refresh()
wk.SaveAs(cXLS)
wk.Close()
xl.Quit()
; finish up
ws=0
wk=0
xl=0

If FileExist(cXLS) Then Display(2,"Excel Created",cXLS)
Exit

:WBERRORHANDLER
ws=0
wk=0
xl=0
geterror()
Message("Error Encountered",errmsg)
Exit

;===============================================================================================================

:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine

Return
;===============================================================================================================
Stan - formerly stanl [ex-Pundit]

cssyphus

Holy cow Stan - that's bluddy impressive! Thanks for sharing. I'll BF that and see how I can use it down the road.

Many thanks for sharing.

spl

Just to play on the main part. This URL has multiple tables, and I coded for up to 3. Once you get more familiar with mCode or DAX... setting this basic transformation=>Excel gets real easy.
; Winbatch 2024A - example to create worksheet table from HTML table
; Nultiple tables to select from by assigning tbl variable
; Not set to persist Excel data, just leave it open
;===============================================================================================================
gosub udfs
IntControl(73,1,0,0,0)

url = "https://www.dailydairyreport.com"
; there are multiple tables, try 0,1 or 2
tb1 = 1 
qry = "Dairy_Report"
mcode = $"let
    Source = Web.Page(Web.Contents("|url|")),
    Data|t| = Source{|t|}[Data],
    ##"Renamed Columns" = Table.RenameColumns(Data|t|,{{"", "Product"}})
in
    ##"Renamed Columns"
$"

Display(2,"Creating Table","From ":url)
mcode = StrReplace(mcode,"|url|",url)
mcode = StrReplace(mcode,"|t|",tb1)

xl = CreateObject("Excel.Application")
xl.Visible = @true
xl.ScreenUpdating = @true
xl.UserControl = @true
xl.DisplayAlerts = @false
wk = xl.workbooks.add()
ws = wk.Worksheets.Item(1)
ws.Name = qry
xl.ActiveWorkBook.Queries.Add(::Name=qry,Formula=mcode,Description="Query")
cSource = "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=%qry%;Extended Properties=''"
qt = ws.ListObjects.Add(::SourceType=0,Source=cSource,Destination=ws.Range("$A$1")).QueryTable
qt.CommandText = "Select * FROM [%qry%]"
qt.Refresh()
; finish up
ws=0
wk=0
xl=0

Message("Table Transformed","Save or Discard Excel File")
Exit

:WBERRORHANDLER
ws=0
wk=0
xl=0
geterror()
Message("Error Encountered",errmsg)
Exit

;===============================================================================================================

:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine

Return
;===============================================================================================================
Stan - formerly stanl [ex-Pundit]