SQLite Extender (Alpha? Beta?)

Started by JTaylor, May 14, 2020, 08:03:50 PM

Previous topic - Next topic

JTaylor

Hmmmmmmm...must be some other character that it doesn't like but which isn't seen as an "empty" string.

Did it load any rows?   If so, can you send me the rest of the file?

Jim

Quote from: KeithW on May 25, 2020, 01:23:22 PM
Jim,

I ran with the bad character in the input file and it stopped with a useable Error Msg.
Then I ran with a cleaned file and while it built the file completely, the script died silently and never returned the RESP = 0 message,
running it again to see it that was a fluke?  Did it a second time, did not return, but file was released just the same?


Regards,
Keith

JTaylor

My Bad.   I sat there looking at that for a while thinking there was something wrong but couldn't figure out what so moved on.

Jim

Quote from: KeithW on May 25, 2020, 01:40:40 PM
Jim,

Possible typo in Hep File?

Isn't the SQL Version 3.31.1  or   are you really back at 3.11.1?

Regards,
Keith

JTaylor

I really feel like I would be depriving you of a learning experience if I were to do this...

This is from BinaryReplace() in the Help file.
Code (winbatch) Select

str="hello"
rep="goodbye"
dafile="C:\Temp\myfile.txt"
fs = FileSize( dafile )
binbuf = BinaryAlloc( fs+100 )
ret = BinaryRead( binbuf, dafile )
num = BinaryReplace( binbuf, str, rep ,0)
Message( "Number of '%str%' strings replaced", num )
BinaryWrite( binbuf, dafile )
BinaryFree( binbuf)
Exit


If your files are too large for one Buffer,  you can search for "Split Large Files" in the Tech Database and go to the "WinBatch Samples from Users" section you will find some code that might be useful.  If you want to get really fancy I can post some code that reads into a buffer a chunk at a time but it is a little more involved.

Jim

Quote from: KeithW on May 25, 2020, 01:23:22 PM
Jim,

Would you consider crafting a text file analyzer that could be run on large text files (CSV or Flat) that would warn of any potential bad
characters  ( <x20  or  >x7F with the exception of @CR or @LF.    Giving the Line and Col like you were previously in a list of any such
occurrences thru-out the file in one pass, rather than have to possibly run multiple times to proclaim a clean file?

Regards,
Keith

KeithW

Jim,

Tried a couple more times BUT it still dies a silent death.

It DOES COMPLETELY LOAD THE SQLITE3 db !!
Just that the call from dbInsertFlat() never returns to the WB script.

Keith

KeithW

Jim,

I grabbed the BinaryReplace example you posted
Also the Split Large Text File from the Tech Support DB

If you will be as kind as to post the other code you mentioned I will sit down and try to get a working program
put together...

Thanx,
Keith

JTaylor

This came out of a program I wrote that would take LARGE XML files (40gb+) and split them up.   (Detlev Dalitz helped with some of the heavy lifting)  So you will need to do a little translation of the code.  When you see variables related to "node" it is something that would be defined such as "<Books>", "<Items>", etc.  so it could chop up the XML but make each a complete, valid, XML file.  Hope this make sense.


chunk_size is in mb. 
split_file is file.
Node stuff will need changed to look for what you are using for split criteria and what you want to place at end of split files.



You will need --  AddExtender ("WWHUG44i.DLL")

Code (winbatch) Select


#DefineFunction UDFCvtFloatToHuge(f)

  fupper=strupper(f + 0.0)
  f1=ItemExtract(1,fupper,"E")
  f2=ItemExtract(2,fupper,"E")
  If f2 == "" then Return(f1)
   
  neg=@false
  If f1<0
    neg=@TRUE
    f1 = -f1
  EndIf

  f1=StrReplace(f1,".","") ; remove .

  If f2<0
    f1=strcat("0.",strfixleft(f1,0,-(f2)+strlen(f1)-1))
  Else
    f1=strfix(f1,0,max(strlen(f1) , f2+1))
  EndIf

  If neg==@FALSE then
    Return(f1)
  Else
    Return (strcat("-",f1))
  EndIf

#EndFunction

#DefineSubRoutine BinSplit()

   split_file_size = FileSize(split_file)
   If ItemCount(split_file_size,"e") > 1 Then
     split_file_size = UDFCvtFloatToHuge(split_file_size)
   EndIf

   chunk_size_byte = huge_Multiply(1 << 20, chunk_size) 
   chunk_size_byte = huge_Add(chunk_size_byte,StrLen(root_node))
   chunk_size_byte = huge_Add(chunk_size_byte,StrLen(root_node_end))

   If StrSub(huge_Subtract(split_file_size,chunk_size_byte),1,1) == "-" Then
     Message("Note","Chunk Size is bigger than the file.  Please select a smaller chunk size and try again.")
     Return
   EndIf

   big_binary_buffer = BinaryAlloc(chunk_size_byte) ; Physical memory is the limit.
   chunk_size_byte = huge_Subtract(chunk_size_byte,StrLen(root_node))

   root_node_offset = 0
   file_count = 0
   Sample_File()
   split_file_offset = 0

   While @TRUE
     file_count = file_count + 1
     If IsKeyDown(@SHIFT & @CTRL) Then Break
     bytes_read = "" : BinaryReadEx(big_binary_buffer, root_node_offset, split_file, split_file_offset, chunk_size_byte)

     If file_count == 1 Then
       bytes_read_save = bytes_read
       root_node_offset = StrLen(root_node_end)-1
     Else
       BinaryPokeStr(big_binary_buffer,0,root_node)
     EndIf
 
     inode_offset = BinaryIndexEx(big_binary_buffer, chunk_size_byte-1, item_node_end, @BACKSCAN, @TRUE)
     If inode_offset == -1 Then
       Message("Note","Node not found")
       Break
     EndIf
     inode_offset = inode_offset + StrLen(item_node_end)
     inode_diff = huge_Subtract(chunk_size_byte,inode_offset)

     BinaryEODSet(big_binary_buffer,inode_offset)+StrLen(root_node_end)
     BinaryPokeStr(big_binary_buffer, inode_offset, root_node_end)
     output_file = output_dir:output_base2:"_":StrFixLeft(file_count,"0",output_counter):".":output_ext2

     If bytes_read < bytes_read_save Then BinaryEODSet(big_binary_buffer,bytes_read+root_node_offset)

     If IsKeyDown(@SHIFT & @CTRL) Then Break
     BinaryWrite(big_binary_buffer,output_file)
 
     DialogProcOptions(SPL_Handle, 1003,"XML Splitter - Chunk#: ":file_count)
     DialogControlSet(SPL_Handle,"ib_SPL_output_files",dc_itemboxadd,ItemExtract(-1,output_file,"\"))
     DialogControlSet(SPL_Handle,"ib_SPL_output_files",dc_itemscrollpos,-1)

     split_file_offset_save = split_file_offset
     If file_count == 1 Then
       split_file_offset = huge_Add(split_file_offset, chunk_size_byte)
     Else
       split_file_offset = huge_Add(split_file_offset, huge_Subtract(chunk_size_byte,StrLen(root_node)))
      EndIf
     split_file_offset = huge_Subtract(split_file_offset, inode_diff)

     If IsKeyDown(@SHIFT & @CTRL) Then Break

     If StrSub(huge_Subtract(split_file_offset, split_file_size),1,1) == "-" || huge_Subtract(split_file_offset, split_file_size) == 0 Then Continue

     Break

   EndWhile
     
   big_binary_buffer = BinaryFree (big_binary_buffer)
   
   WinHide("")

#EndSubRoutine




Quote from: KeithW on May 25, 2020, 02:32:06 PM
Jim,

I grabbed the BinaryReplace example you posted
Also the Split Large Text File from the Tech Support DB

If you will be as kind as to post the other code you mentioned I will sit down and try to get a working program
put together...

Thanx,
Keith

KeithW

Jim,

OK, thanx, now that my eyes have glazed over I will see where I get with it.
The first version may be slow and awkward BUT given time I will prevail.

Keith

JTaylor

Not sure what to do on this one.  I just loaded 4.2million rows and it worked as expected.   Any possibility of downloading the file you are using?   If there is a link and you want to PM me I could try it here.

Jim

Quote from: KeithW on May 25, 2020, 02:15:03 PM
Jim,

Tried a couple more times BUT it still dies a silent death.

It DOES COMPLETELY LOAD THE SQLITE3 db !!
Just that the call from dbInsertFlat() never returns to the WB script.

Keith

KeithW

Jim,

Here is the current load script to go with the file I sent you.
The Config file is attached, too.

The BIG Data File is the original client file with the x1A in line 2,914,421 at column 11

I have run it WITH and WITHOUT fixing it.


Keith



AddExtender("wbsql44i.dll") ; SQLite3 Functions()
AddExtender("wilx44i.dll",0,"wilx64i.dll") ; xGetElapsed(), GetExtactTime()

; THIS REQUIRES vD & later for the SQLite Function Library !!
; -----------------------------------------------------------

db  = DirScript():"DEP_ATB_MAR20.sqlite"     ;INCLUDE FULL PATH FOR DATABASE
If FileExist(db) Then FileDelete(db) ;start over fresh every running

Message("Version","Extender: " : @dbVersion : @CRLF : "    SQLite: " : dbSQLiteVersion() ) ;shows Extender/SQLite version

; ID_NO INTEGER PRIMARY KEY AUTOINCREMENT,
;#############################################################
SQLText = $"CREATE TABLE IF NOT EXISTS fmc_detail (
ID_NO INTEGER PRIMARY KEY AUTOINCREMENT,
   Acct_Id Text(10),
Title Text(4),
F_Name Text(10),
M_Name Text(10),
L_Name Text(30),
Suffix Text(4),
Cust_Id Text(9),
Addr Text(40),
City Text(20),
State Text(2),
Zip Text(9),
Mail_Rte Text(5),
   Status Text(1),
Class Text(2),
Balance REAL(13,2),
Current REAL(13,2),
C_Per_1 REAL(13,2),
C_Per_2 REAL(13,2),
C_Per_3 REAL(13,2),
C_Per_4 REAL(13,2),
Dep_Total REAL(13,2),
Dep_Int REAL(13,2) );$"

resp = dbExecute(db,SQLText)
Message("Create",resp)
;#############################################################


;#############################################################
;dbTables(db)
;Returns a TAB Delimited List of Tables
;#############################################################
stmt = dbTables(db)
Message("dbTables",stmt)
;#############################################################


;#############################################################
;dbSchema(db,Table_Name)
;Returns Database Table Schema in a String Format
;2nd Argument(optional) is Table Name. Leave off if you want all tables.
;#############################################################
stmt = dbSchema(db,"fmc_detail")
Message("dbSchema",stmt)
;#############################################################


;#############################################################
;dbInsertFlat()
;Batch Insert from a Flat, Fixed File
; Arg 1 - STRING - Full Path for Database.
; Arg 2   STRING - SQL Template  Use '[{n}]' for placeholders. n is 1-based.
; Arg 3 - STRING - Full File Path
; Arg 4 - ARRAY  - Load Configuration Array
; Arg 5 - Int(optional)  -  Row to Start - Default 1
; Arg 6 - Int(optional)  -  Row to End - This includes this row.  This also ignores the start row.


; Load Configuration Array - Each Column to be loaded, needs a Column in the Array
;  Row 0 -  FieldName - Will not be used here for the benefit of the user.
;  Row 1 -  Start     - Column at which the field begins
;  Row 2 -  Length    - Length of Field.

;Example of post-processing SQL
;   SQLText = "UPDATE ATB_DETAIL SET PDovr60 = CASE Substr(PDovr60,length(pdovr60),1) WHEN '-' THEN PDovr60*-1 END;"
;   dbExecute(db,SQLText)

;#############################################################

SQLText = "INSERT INTO fmc_detail (ID_NO,Acct_Id,Title,F_Name,M_Name,L_Name,Suffix,Cust_Id,Addr,City,State,Zip,"
SQLtext = SQLText : "Mail_Rte,Status,Class,Balance,Current,C_Per_1,C_Per_2,C_Per_3,C_Per_4,Dep_Total, Dep_Int)
SQLtext = SQLtext : "VALUES (NULL,'[{1}]','[{2}]','[{3}]','[{4}]','[{5}]','[{6}]','[{7}]','[{8}]','[{9}]','[{10}]',"
SQLtext = SQLtext : "'[{11}]','[{12}]','[{13}]','[{14}]','[{15}]','[{16}]','[{17}]','[{18}]','[{19}]','[{20}]','[{21}]','[{22}]');"
Message("SQLtext",SQLtext)

Arr = ArrayFileGetCSV("Dep_Config_Full.txt",0,@TAB) ; No Dropped Fields - ALL !!
Timedelay(1)
time1 = GetExactTime()
;stmt = dbInsertFlat(db, SQLText, DirScript() : "202003_10K1918.txt", Arr, 2,-1) ; 10K rows TEST with Header Row
stmt = dbInsertFlat(db, SQLText, DirScript() : "202003_fmc1918.txt", Arr, 1,-1) ; 3.5MM Live File, No Header

time2 = GetExactTime()
rc = xGetElapsed(time2, time1)

Message("dbInsertFlat()",stmt)
Message("Processing Complete","Elapsed: %rc%%@CRLF%----------  H: M: S. hh")
EXIT



JTaylor

If I fix the the "SUB" character it runs through as it should for me so not sure how to fix something I can't reproduce.

I tried seeing what happens if I replaced that character on the fly.  Oddly enough the next column crashes WinBatch and nothing I tried made it act any different.  Tried several different approaches.  Very weird.

Jim

KeithW


You are running it from the WinBatch Studio, correct?

Just tried it again and it again died silently BUT built the SQL3 db in its entirety.

Keith

JTaylor

No.   I almost never use Studio.

Jim

KeithW

Jim,

OK on No Studio...

I tried compiling and it complained about Syntax Error on Line 87  missing the final space & quote ....
fixed that, tried Studio again, it had not complained at all, still died silently.  So I recompiled and
ran the compiled file.... it also created the SQL3 db in it entirety, but still no  RESP from the dbInsertFlat().

Strange...

Keith

JTaylor

Can't say this will help but did make an educated change to see.   I didn't change the help file.

   http://www.jtdata.com/anonymous/wbsql44i_vi.zip

Jim

KeithW

Jim,

Made a good Educated Change...

Both WbStudio & Compiled completed properly, did not die silently like before.

Still need to update the SQLite Version to 3.31.1 in help
Otherwise, up to you want you want to do next.

Thanx for the long hours of torture !!!
Regards,
Keith

KeithW

Wall Run Times are almost a minute longer BUT that might be do to the additional gigs I added to my filesystem with today's debugging issues.
Still I will take the 6 mins over almost an hour prior to this exercise.

Keith

JTaylor

Excellent. 

I did make that change.  Thought it would have made it that way by now but it has been changed so will show up eventually.

Probably take another crack at making the Unicode thing consistent with dbGetString() but will, most likely, post a "Release" version sometime soon either way.


Thanks again for all the testing and feedback and to all the others as well who have given it a go.    If there are any more suggestions regarding current functionality, please make them soon.

Jim

Quote from: KeithW on May 25, 2020, 08:12:56 PM
Jim,

Made a good Educated Change...

Both WbStudio & Compiled completed properly, did not die silently like before.

Still need to update the SQLite Version to 3.31.1 in help
Otherwise, up to you want you want to do next.

Thanx for the long hours of torture !!!
Regards,
Keith

KeithW

Jim,

If you are looking for work, lol...

It would be nice to add:  dbInsertExcel  (xls, xlsx - multi tab support)
Also an OutPut Version of this  dbPutExcel ... just stack on the feature request list for the future.

Thanx,
Keith

JTaylor

You will want to update again....sorry...noticed something when following through on the previous change.

    http://www.jtdata.com/anonymous/wbsql44i_vj.zip


Jim

JTaylor

I wouldn't hold my breath on that one :)   You can easily output a delimited format and do the Excel part through Winbatch.  No need to recreate the wheel.

Jim


Quote from: KeithW on May 25, 2020, 08:26:38 PM
Jim,

If you are looking for work, lol...

It would be nice to add:  dbInsertExcel  (xls, xlsx - multi tab support)
Also an OutPut Version of this  dbPutExcel ... just stack on the feature request list for the future.

Thanx,
Keith

KeithW

FYI   Help File & Extender versions out of sync.

Verified  WBStudio operation - runtime back to where it was originally, under 4 mins
Verified  Compiled  operation - runtime in the same range as above

Keith

KeithW


TO EVERYONE INTERESTED IN NATIVE SQLITE3 ACCESS FROM A WINBATCH SCRIPT

Jim has put in an enormous effort in the past 8 days to make this happen.  I feel grateful to have been a part of this effort
to get here and I know I will benefit hugely from having this available.  If you even think you might remotely have a need
for this, now if the time to get onboard and be a part of getting this Extender whipped up to be of use to as many as could
use it.  For NOW until a format release is made be sure to use the vJ release OR LATER to benefit from our learning experience
to date and drop Jim a note of thanx !!!!!!!!!!!!!!!!!!!!!!!

Regards,
Keith

JTaylor

Thanks, even though I am sure it won't be the last time  :)

Jim

Quote from: KeithW on May 25, 2020, 08:46:58 PM
FYI   Help File & Extender versions out of sync.

Verified  WBStudio operation - runtime back to where it was originally, under 4 mins
Verified  Compiled  operation - runtime in the same range as above

Keith

KeithW

Jim,

Is there *ANY* chance you will fix  dbInsertCSV  to allow for embedded delimiters  (quote-mode)  ,"ABC Co, Inc",
where the function understand this example properly?

I did a search on my BIG file and  I have @TAB (for field) and "|" for quoting and maybe one other
special character out of all the ones I searched for... this is brutal.  If someone uses a "|" in the data
I think I am screwed.

Keith

JTaylor

Just thinking about embedded delimiters makes my brain ache with all the possibilities.   Perhaps at some point but there is another thing higher on my list.    Nag me again later on.   If you run into trouble before that point let me know.   With some creative use of StrReplace(), ArrayFileGetCSV(), ArrayFilePutCSV() and SQL we can overcome most anything.


Jim

KeithW

Jim,

I got the Binary functions to scout the I/P file and let me know if there are any issues... at least for what it tests for at this time.

Before all is said and done... are you going to compile/provide a 64b version of WBSql?

Keith

JTaylor

Excellent.

I wondered how long it would be before I got the 64-bit question.   I would happily do that if I knew what was needed.   Someone wanted my CommControl Extender in 64-bit but I didn't know how so they hired one or two people to do it but that went nowhere.   I don't think it is really that complicated but I have no idea what the difference between the two might be.   I remember a couple of things being mentioned in the past but don't recall much now.   Int sizes and such, I think.     Some references in SDK but not enough for me to know where to even start.   I have been reading through a good article on the differences though.

Basically I wouldn't hold your breath.   Hopefully I can figure it out at some point but no idea when.   The other guy has been waiting 7 years, give or take  :)     I will put it on my list.   I understand a lot more now than then so there is hope.

Jim




Quote from: KeithW on May 27, 2020, 12:40:46 PM
Jim,

I got the Binary functions to scout the I/P file and let me know if there are any issues... at least for what it tests for at this time.

Before all is said and done... are you going to compile/provide a 64b version of WBSql?

Keith

KeithW

Jim,

HOUSTON WE HAVE AN ISSUE.... LOL

We need to add a Param7 on the dbInsertCSV() called quoting character
that will allow the specification of any *possible* quoting character rather than forcing only (")
cause if it isn't a " it gets load into the db....

PLEASE, I have almost 10,000 "s in the file and I would have to research each one before I could make
a change to replace them... be MUCH EASIER to specify a different quoting char.  This still does not resolve
handling embedded quoting chars, just side steps that issue.

Regards,
Keith

td

Quote from: JTaylor on May 27, 2020, 01:10:09 PM
Excellent.

I wondered how long it would be before I got the 64-bit question.   I would happily do that if I knew what was needed.   Someone wanted my CommControl Extender in 64-bit but I didn't know how so they hired one or two people to do it but that went nowhere.   I don't think it is really that complicated but I have no idea what the difference between the two might be.   I remember a couple of things being mentioned in the past but don't recall much now.   Int sizes and such, I think.     Some references in SDK but not enough for me to know where to even start.   I have been reading through a good article on the differences though.

Basically I wouldn't hold your breath.   Hopefully I can figure it out at some point but no idea when.   The other guy has been waiting 7 years, give or take  :)     I will put it on my list.   I understand a lot more now than then so there is hope.

Jim

Some extenders ported to 64-bit with only a few very minor changes.  Others took a lot of recoding.  Generally, the newer extenders ported with almost no changes partially because they are coded in C++ and follow good code practices.  It can be said that the WIL SDK C++ Extender template ports without issue. 
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

JTaylor

Okay.   Switched to x64 and tried compiling but it wanted altbase.h and after copying that in it wanted a bunch of others.   Also, didn't like a lot of the casting but this gives me a place to start.  Sounds like there is nothing Extender related that needs to be changed, just the parts I added.   That is helpful to know.

Thanks.

Jim

Quote from: td on May 27, 2020, 02:36:53 PM

Some extenders ported to 64-bit with only a few very minor changes.  Others took a lot of recoding.  Generally, the newer extenders ported with almost no changes partially because they are coded in C++ and follow good code practices.  It can be said that the WIL SDK C++ Extender template ports without issue.

KeithW

Jim,

I got the following on a bad spec for Creating A Table, but it needs a little help
not just a popup like this ???

Keith

JTaylor

Sorry about that.  Not sure how that got left in there.   I didn't take time to change any version info.   In the middle of something else but didn't want you to have to wait.

http://www.jtdata.com/anonymous/wbsql44i_vk.zip

Jim

JTaylor


JTaylor

Help me understand what data you are receiving that provides CSV quotes that are not double-quotes.   Want to make sure I am allowing for the right thing.

Jim



Quote from: KeithW on May 27, 2020, 01:39:05 PM
Jim,

HOUSTON WE HAVE AN ISSUE.... LOL

We need to add a Param7 on the dbInsertCSV() called quoting character
that will allow the specification of any *possible* quoting character rather than forcing only (")
cause if it isn't a " it gets load into the db....

PLEASE, I have almost 10,000 "s in the file and I would have to research each one before I could make
a change to replace them... be MUCH EASIER to specify a different quoting char.  This still does not resolve
handling embedded quoting chars, just side steps that issue.

Regards,
Keith

KeithW

Jim,

The data has "," and embedded " in the file....
I wanted to change the field delimiter to TAB  and the quoting character to "|" in order to get away for the existing issues.

Keith