Increase memory for error 1513 COM: Out of memory / Buffer too small?

Started by ltwert, July 28, 2015, 01:05:54 PM


ltwert

I have searched the web and the WinBatch forums for something relating to this, but have not found anything useful.

I am parsing a large HTML table and converting it to CSV using Internet Explorer 11 on Windows 7 Pro x64.  When I execute the .innerHTML statement, I get an error: "1513 COM: Out of memory / Buffer too small". 

Is there any way to increase the buffer size?  Any thoughts on other approaches to avoid this?

I have tried decreasing the size of the table by parameter selection, etc., but keep hitting this limitation.

I appreciate your ideas.

td

There are too many possible causes to speculate about the immediate one (although someone probably will). In general, errors like this are the result of sloppy coding. Unless you are working with HTML in the hundreds-of-megabytes range, you should consider doing a code review to clean up variables and objects you no longer need as your script does its processing. If your HTML or some other resource actually is that memory intensive, then you will need to subdivide your task into smaller units.
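For instance, in WIL a COM reference is held until its variable is reassigned, so a cleanup pass can be as simple as this (all names are illustrative, not from the poster's script):

```
htmTable = htmDoc.GetElementById("ReportTable")  ; hypothetical element id
strHtml  = htmTable.innerHTML                    ; large string copy
; ...process strHtml...
strHtml  = ""   ; free the string space as soon as processing is done
htmTable = 0    ; release the COM reference
```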
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

JTaylor

Not sure how you define "large".   What object are you using the .InnerHTML on...the Table?   How about GetElementsByTagName("TR") and loop through the rows?
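Something along these lines, as an untested sketch (all names illustrative):

```
htmTrs = objTable.GetElementsByTagName("TR")  ; objTable = the master table element
For intTr = 0 To htmTrs.length - 1
   strRow = htmTrs.item(intTr).innerHTML      ; one row at a time stays small
   ; ...convert strRow to a CSV line and FileWrite() it immediately...
   strRow = ""                                ; drop the string before the next row
Next
```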

Jim

ltwert

I was actually hoping for a simple answer, like "The buffer size can be increased by...".  Lacking that, I will go into more detail.

The code is actually clean, since I have been careful to optimize and organize it for efficiency, so brash assumptions to the contrary, like those of "td", are not entirely helpful.

The GetElementsByTagName("TR") suggestion made by JTaylor may hold some promise, although I am already doing that to a degree. 

The problem is the web page itself, which, of course, I have no control over.  The report being converted is actually an HTML table with an ID that is organized into more than one sub-table and sub-sub-table, none of which have associated IDs, making them difficult to identify and extract.  The sub-tables, of course, are included within a <TR> tag of the master table, and I must deal with all of this.  The HTML for the page is about 36MB, and I have had some success by adjusting the report filters to produce several HTML files that are converted and later combined in another process.

The organization of the code begins with a UDF call:
SaveCsvFromHtmlDoc(doc,strHtmlReportTableId,strReportTableStartsWith,strOutputFilePathName,strLogFileName,intWaitTimeoutCountLimit,dblWaitTimeDelay)
where doc is the browser.document object, strHtmlReportTableId is the table ID being sought, strReportTableStartsWith is a short HTML string to help in finding the sub-tables of interest, strOutputFilePathName is the path to the output file, and the other parameters are related to logging.

SaveCsvFromHtmlDoc then finds the master table using strHtmlReportTableId and extracts its inner HTML for processing, which is where the error occurs.  This is followed by a line to get the collection of sub-tables where the data is actually found.
htmElementReportTable = phtmDoc.GetElementById(pstrHtmlTableId)
htmReportTable = htmElementReportTable.innerHTML
htmTables = htmElementReportTable.GetElementsByTagName("Table")


SaveCsvFromHtmlDoc continues from there by opening the output file to generate a handle and then executes another UDF to write the lines within each sub-table:
WriteCsvTrLinesFromTableColl(hndOutFile,htmTables,pstrTableStartsWith,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
Where hndOutFile is the output file handle, htmTables is the collection of sub-tables, pstrTableStartsWith is a short HTML string to help in finding the sub-tables of interest, and the remaining parameters are for logging.

WriteCsvTrLinesFromTableColl iterates through the sub-tables to find the ones containing data using pstrTableStartsWith, and then executes another UDF to parse and write the lines within the data table:
WriteCsvTrLinesForTable(phndOutFile,hrmTableItem,strFirstLineCsv,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
Where the parameters are much the same as WriteCsvTrLinesFromTableColl, with the exception of hrmTableItem, which is the table containing the data, and strFirstLineCsv, which is used to find <TR> lines with fewer than the normal number of columns (another complication).

WriteCsvTrLinesForTable actually extracts the collection of <TR> rows within the table of interest and then executes another UDF to parse and write each line:
WriteCsvTrLineA(phndOutFile,htmTrs.item(intTrIndex),blnsColumnsBlank,strFirstLineCsv)

So the complications of the page layout have led me to this solution, and while there could be some additional steps I can take to break this down, I am hoping for a simpler solution like "The buffer size can be increased by..."

Any comments you may have could be helpful.
Thanks... L T Wert

For reference, the entirety of the UDFs of interest is:
#DefineFunction ConvertTdFieldToCsvField(pstrTdField)
;Escape internal double-quotes by doubling them (CSV convention)
strCsvField = StrReplace(pstrTdField,'"','""')
;strCsvField = StrReplace(strCsvField,"'","\'")
;Replace <BR> with a white space
strCsvField = StrReplace(strCsvField,"<BR>"," ")
strCsvField = StrReplace(strCsvField,"<br>"," ")
strCsvField = StrReplace(strCsvField,"&nbsp;"," ")
strCsvField = StrReplace(strCsvField,"&amp;","&")
strCsvField = StrReplace(strCsvField,"&gt;",">")
strCsvField = StrReplace(strCsvField,"&lt;","<")
strCsvField = StrReplace(strCsvField,@LF,"")
;Put quotes around field if comma, single-quote, or double-quote found
If StrCnt(strCsvField,",",1,-1,0) > 0 || StrCnt(strCsvField,"'",1,-1,0) > 0 || StrCnt(strCsvField,'"',1,-1,0) > 0
strCsvField = StrCat('"',strCsvField,'"')
EndIf
Return strCsvField
#EndFunction ;ConvertTdFieldToCsvField

#DefineFunction ConvertTrLineToCsvLineA(phtmTrLine,pblnsColumnsBlank)
htmTds = phtmTrLine.GetElementsByTagName("TD")
intTdCount = htmTds.length
intTextLen = 0
intBlankIndex = 0
strCsvLine = ''
For intTdIndex = 0 to intTdCount - 1
htmTd = htmTds.item(intTdIndex)
strTdText = htmTd.innerHTML
strAttrColSpan = htmTd.getAttribute("colspan") ;Deal with multiple columns
if StrLen(strAttrColSpan) > 0
intColSpan = strAttrColSpan
For intI = 0 to intColSpan - 1
If pblnsColumnsBlank[intBlankIndex]
intColSpan = intColSpan - 1
EndIf ;pblnsColumnsBlank[intTdIndex + intI]
intBlankIndex = intBlankIndex + 1
Next ;intI
intBlankIndex = intBlankIndex - 1
Else
If pblnsColumnsBlank[intBlankIndex]
intColSpan = 0
Else
intColSpan = 1
EndIf ;pblnsColumnsBlank[intTdIndex + intI]
Endif
If intColSpan>0
strCsvText = ConvertTdFieldToCsvField(strTdText)
strCsvLine = StrCat(strCsvLine,strCsvText,StrFill(",",intColSpan))
intTextLen = intTextLen + StrLen(strCsvText)
EndIf
intBlankIndex = intBlankIndex + 1
Next ;intTdIndex
If intTextLen > 0
strCsvLine = StrSub(strCsvLine,1,StrLen(strCsvLine)-1)
Else
strCsvLine = "" ;No actual data - make zero-length string
EndIf
Return strCsvLine
#EndFunction ;ConvertTrLineToCsvLineA

#DefineFunction ConvertTrLineToCsvLine(phtmTrLine)
htmTds = phtmTrLine.GetElementsByTagName("TD")
intTdCount = htmTds.length
intTextLen = 0
strCsvLine = ''
For intTdIndex = 0 to intTdCount - 1
htmTd = htmTds.item(intTdIndex)
strAttrColSpan = htmTd.getAttribute("colspan") ;Deal with multiple columns
if StrLen(strAttrColSpan) > 0
intColSpan = strAttrColSpan
Else
intColSpan = 1
Endif
strTdText = htmTd.innerHTML
If intColSpan>0
strCsvText = ConvertTdFieldToCsvField(strTdText)
strCsvLine = StrCat(strCsvLine,strCsvText,StrFill(",",intColSpan))
intTextLen = intTextLen + StrLen(strCsvText)
EndIf
Next ;intTdIndex
If intTextLen > 0
strCsvLine = StrSub(strCsvLine,1,StrLen(strCsvLine)-1)
Else
strCsvLine = "" ;No actual data - make zero-length string
EndIf
Return strCsvLine
#EndFunction

#DefineFunction WriteCsvTrLineA(phndOutFile,phtmTrLine,pblnsColumnsBlank,pstrFirstLineCsv)
strCsv = ConvertTrLineToCsvLineA(phtmTrLine,pblnsColumnsBlank)
;To avoid repeated titles - Assumes first 20 characters of the title line will not have blank field
If StrLen(strCsv) > 0 && (StrSub(strCsv,1,20)!=StrSub(pstrFirstLineCsv,1,20))
FileWrite(phndOutFile, strCsv)
EndIf
Return strCsv
#EndFunction

#DefineFunction WriteCsvTrLine1st(phndOutFile,phtmTrLine)
strCsv = ConvertTrLineToCsvLine(phtmTrLine)
If StrLen(strCsv) > 0
FileWrite(phndOutFile, strCsv)
EndIf
Return strCsv
#EndFunction

#DefineFunction WriteCsvTrLinesForTable(phndOutFile,phtmTable,pstrFirstLineCsv,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
htmTrs = phtmTable.GetElementsByTagName("TR")
intTrCount = htmTrs.length
strFirstLineCsv = pstrFirstLineCsv
For intTrIndex = 0 to intTrCount - 1
if intTrIndex == 0 && StrLen(pstrFirstLineCsv) < 1
strFirstLineCsv = StrTrim(WriteCsvTrLine1st(phndOutFile,htmTrs.item(intTrIndex)))
intFirstLineColCount = ItemCountCsv(strFirstLineCsv,0,",")
blnsColumnsBlank = ArrDimension(intFirstLineColCount)
For intI = 0 to intFirstLineColCount - 1
blnsColumnsBlank[intI] = (StrLen(ItemExtractCsv(intI + 1,strFirstLineCsv,0,","))<1)
Next ;intI
Else
if StrLen(pstrFirstLineCsv) > 0
intFirstLineColCount = ItemCountCsv(strFirstLineCsv,0,",")
blnsColumnsBlank = ArrDimension(intFirstLineColCount)
For intI = 0 to intFirstLineColCount - 1
blnsColumnsBlank[intI] = (StrLen(ItemExtractCsv(intI + 1,strFirstLineCsv,0,","))<1)
Next ;intI
EndIf
WriteCsvTrLineA(phndOutFile,htmTrs.item(intTrIndex),blnsColumnsBlank,strFirstLineCsv)
Endif
Next ;intTrIndex
Return strFirstLineCsv
#EndFunction

#DefineFunction WriteCsvTrLinesFromTableColl(phndOutFile,phtmTables,pstrStartsWith,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
strFirstLineCsv = ""
intTableCount = phtmTables.length
For intTableIndex = 0 to intTableCount - 1
hrmTableItem = phtmTables.item(intTableIndex)
strTableItem = StrSub(hrmTableItem.outerHTML,1,100)
If StrLen(pstrStartsWith)>0
If pstrStartsWith == StrSub(strTableItem,1,StrLen(pstrStartsWith))
strFirstLineCsv = WriteCsvTrLinesForTable(phndOutFile,hrmTableItem,strFirstLineCsv,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
EndIf ;pstrStartsWith = StrSub(strTableItem,1,StrLen(pstrStartsWith))
EndIf ;StrLen(pstrStartsWith)>0
Next ;intTableIndex
#EndFunction

#DefineFunction SaveCsvFromHtmlDoc(phtmDoc,pstrHtmlTableId,pstrTableStartsWith,pstrOutFilePathName,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
blnSuccess = @TRUE
htmElementReportTable = phtmDoc.GetElementById(pstrHtmlTableId)
htmReportTable = htmElementReportTable.innerHTML
htmTables = htmElementReportTable.GetElementsByTagName("Table")

intWaitTimeoutCount = 0
blnNotTimedOut = @TRUE
hndOutFile = FileOpen(pstrOutFilePathName, "WRITE")
While (hndOutFile == 0) && (intWaitTimeoutCount <  pintWaitTimeoutCountLimit)
intWaitTimeoutCount = intWaitTimeoutCount + 1
hndOutFile = FileOpen(pstrOutFilePathName, "CREATE")
TimeDelay(pdblWaitTimeDelay)
EndWhile
if hndOutFile == 0 ;file never opened within the retry limit
Pause('Table Write',"Could not open output file")
blnSuccess = @FALSE
EndIf
LogInfo("APPEND",pstrLogFileName,"SaveCsvFromHtmlDoc","Report export file open",pintWaitTimeoutCountLimit,pdblWaitTimeDelay,"Could not open log file")

If blnSuccess
WriteCsvTrLinesFromTableColl(hndOutFile,htmTables,pstrTableStartsWith,pstrLogFileName,pintWaitTimeoutCountLimit,pdblWaitTimeDelay)
FileClose(hndOutFile)
EndIf ;blnSuccess
LogInfo("APPEND",pstrLogFileName,"SaveCsvFromHtmlDoc","Report export written",pintWaitTimeoutCountLimit,pdblWaitTimeDelay,"Could not open log file")

Return blnSuccess
#EndFunction

JTaylor

Is it a page we can view/download?  Hard to make specific suggestions on this type of stuff without seeing it.

One thing I sometimes do is strip all CR,LF,TABS and then based on tags add back in delimiters in the appropriate places and then use the httpStripHTML() function (Winsock in Inet Extender).  Not sure if that would work in this case but thought I would mention it.
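A rough WIL sketch of that idea (the extender DLL name varies by WinBatch version, so treat it as a placeholder, and strHtml as the raw page source):

```
AddExtender("wwint44i.dll")                   ; Inet extender -- verify the DLL name
strHtml = StrReplace(strHtml, @CRLF, "")      ; strip line breaks first
strHtml = StrReplace(strHtml, @TAB, "")
strHtml = StrReplace(strHtml, "</td>", ",")   ; put delimiters back at tag boundaries
strHtml = StrReplace(strHtml, "</tr>", @CRLF)
strText = httpStripHTML(strHtml)              ; then remove the remaining tags
```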

Jim

td

Quote from: ltwert on July 28, 2015, 03:33:02 PM
I was actually hoping for a simple answer, like "The buffer size can be increased by...".  Lacking that, I will go into more detail.

The code is actually clean since I have been careful to optimize and organize it for efficiency, so brash assumptions to the contrary, like those of "td" are not entirely helpful.

I can see how the response could be taken as derogatory even though that was not the intention, but it was neither 'brash' nor assumptive.  After viewing WinBatch user scripts for more than 18 years, it can be stated with a good deal of confidence that memory errors that are not the result of loading 75 MB or larger data sets into memory are, with great frequency, the result of not writing a clean script or writing a script that contains logic errors.

When you get a general out-of-memory error in WinBatch, you have either exhausted process memory or exhausted WinBatch string space.  The former is a limit determined by the system architecture, and the latter already defaults to the maximum.  In the case of error 1513 you are dealing with the process memory limit, so there is obviously no magic setting to increase it.  In some cases WinBatch gets this error from a COM Automation server that has exhausted its own process memory or the WinBatch process's memory (depending on whether the COM server is in-process or out-of-process) because of a method or property call made in the WinBatch script.  In other cases it is the result of converting a very large ANSI or Unicode string to or from a COM BSTR string when passing it to, or obtaining it from, a COM object method or property.  (This type of conversion does not use WinBatch string space.)

The logical solution to a large data set problem that is not caused by coding errors is to break the data set into smaller chunks for processing, while making sure a processed chunk is saved to the file system and removed from memory before processing the next chunk.  Of course, doing this often increases the complexity of the script and can be a challenge to implement.
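Applied to the script in this thread, that might mean skipping the big .innerHTML copy of the master table entirely and working one sub-table at a time (illustrative sketch only; the output path is hypothetical):

```
hndOut = FileOpen("report.csv", "WRITE")
htmTables = htmElementReportTable.GetElementsByTagName("TABLE")
For intT = 0 To htmTables.length - 1
   htmSub = htmTables.item(intT)
   ; ...extract and write htmSub's rows to hndOut here...
   htmSub = 0                      ; release the reference before the next iteration
Next
FileClose(hndOut)
```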
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

JTaylor

Also not sure if it would help but you might take a look at the DOM Extender.   

Jim

td

Some have had good success with the DOM extender and others not so much because of reported bugs.  This and the fact that it is more or less a black box makes it a little difficult to feel entirely comfortable recommending it.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

Quote from: td on July 28, 2015, 09:15:17 PM
The logical solution to a large data set problem that is not caused by coding errors is to break the data set into smaller chunks for processing while making sure a processed chunk is saved to the file system and removed from memory before processing the next chunk.  Of course doing this often increases the complexity of the script and can be a challenge to implement.

This seems especially appropriate in this case, since parsing a large document consumes way more memory than most people realize to store and manage the parse tree.

I also like Jim's suggestion of stripping CR/LF pairs, since I often see tags that span two or more lines, especially when the HTML is generated by a tool.

Actually, if it were mine to write, I would consider, at least as a starting point, a console-mode C# program, to take advantage of the detail that you can see in the Visual Studio debugger.  As for the guts of the program, I would probably start with a regular expression, which can easily extract the innerHTML from a tag.  Another option would be to split the body off into a new string, graft a standard XML header onto it, and load the string into an XML parser.  Once you have all that working, you can move the XML parsing code into WinBatch.

If you don't have access to a C# compiler, you can do all of the above in native WIL, but you won't be able to see quite as much detail in the debugger.  In WIL, you can use the VBScript.RegExp object to separate the body from the document if you first remove the CR/LF pairs, which you should do anyway so that you start with well-formed XML.
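A minimal WIL sketch of that approach, assuming the VBScript regular expression COM object is the one intended (strHtml would hold the CR/LF-stripped source):

```
objRE = ObjectCreate("VBScript.RegExp")
objRE.Pattern = "<body[^>]*>([\s\S]*?)</body>"  ; capture everything inside <body>
objRE.IgnoreCase = @TRUE
objRE.Global = @FALSE
colMatches = objRE.Execute(strHtml)
If colMatches.Count > 0
   strBody = colMatches.Item(0).SubMatches(0)   ; the document body
EndIf
objRE = 0                                       ; release the COM reference
```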
David A. Gray
You are more important than any technology.

JTaylor

Guess I'm one of the lucky ones.  I've used the DOM Extender a lot and never had a single problem.   Just wish it handled Unicode. 

Something else that might be useful is BinaryTags.  If your HTML is well formed, and I think you indicated it was, you can easily pull chunks of the table and process them.
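A hedged sketch of the BinaryTag route (check the BinaryTag* documentation for the exact flag values; the file name is hypothetical):

```
hndBuf = BinaryAlloc(FileSize("report.html"))
BinaryRead(hndBuf, "report.html")
strTag = BinaryTagInit(hndBuf, "<table", "</table>")
strTag = BinaryTagFind(strTag)
While strTag != ""
   strTable = BinaryTagExtr(strTag, 1)  ; one table's worth of HTML at a time
   ; ...parse strTable into CSV rows and write them out...
   strTag = BinaryTagFind(strTag)       ; advance to the next table
EndWhile
BinaryFree(hndBuf)
```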

Jim

stanl

Not to muddy the waters, but I have used HTMLAgilityPack via PowerShell called from the WinBatch CLR.
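Loading the assembly directly from WIL, rather than going through PowerShell, might look roughly like this; the paths are hypothetical and the option names should be checked against the CLR hosting docs:

```
ObjectClrOption("appbase", "C:\Tools\HtmlAgilityPack")  ; folder holding the DLL
ObjectClrOption("use", "HtmlAgilityPack")               ; load the assembly
objDoc = ObjectClrNew("HtmlAgilityPack.HtmlDocument")
objDoc.Load("report.html")                              ; parse without IE in the loop
; ...walk objDoc.DocumentNode to find the tables...
objDoc = 0
```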

DAG_P6

Jim,

Quote from: JTaylor on July 30, 2015, 06:41:48 AM
Guess I'm one of the lucky ones.  I've used the DOM Extender a lot and never had a single problem.   Just wish it handled Unicode. 

Something else that might be useful is BinaryTags.  If your HTML is well formed, and I think you indicated it was, you can easily pull chunks of the table and process them.

Do you know about the Win32 API routine WideCharToMultiByte for converting Unicode to multibyte (ANSI) strings?

See https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx.
David A. Gray
You are more important than any technology.

td

It works until you run across a Unicode character that doesn't map to anything in the current or specified code page. 
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

DAG_P6

This is true, but I think there are flags to address that situation, one of which causes the function to raise an exception.
David A. Gray
You are more important than any technology.