SQLite Extender (Alpha? Beta?)

Started by JTaylor, May 14, 2020, 08:03:50 PM

Previous topic - Next topic

KeithW

Jim,

The first time thru with v8 (without dbSchema() ) everything worked fine but runtime increased to 2m 37s 95 100ths
from original v7 runtimes of 1m 10s 54 100ths, not sure why the change there?

With v8, if run dbSchema() dbTables gets screwed up again.  Even after commenting out the call to dbSchema(), the corruption
for dbTable seems to continue expanding.  However the process has run twice now even with the corrupted dbTable()
return.

Run Time seems to now be holding in the 2m 42s window?????

ALSO, still not honoring the Lines to SKip, getting hear in the SQLite3 db.


Keith

KeithW

Jim,

I do not want to drag you away from fixing the CSV cause I think you are SOOO close.

I would like to work on the Flat File as that is really what I need at the moment but I need more
clarification on the ARRAY setup for the Flat File Processing.

Your code sample

;#############################################################
;dbInsertFlat()
;Batch Insert from a Flat, Fixed File
; Arg 1 - STRING - Full Path for Database.
; Arg 2   STRING - SQL Template
; Arg 3 - STRING - Full File Path
; Arg 4 - ARRAY  - Load Configuration Array
; Arg 5 - Int(optional)  -  Row to Start - Default 1


; Load Configuration Array - Each Column to be loaded, needs a Column in the Array
;  Row 0 -  FldName - Will not be used here for the benefit of the user.
;  Row 1 -  Begin   - Column at which the field begins
;  Row 2 -  Length  - Length of Field.
;
;Example of post-processing SQL
;   SQLText = "UPDATE ATB_DETAIL SET PDovr60 = CASE Substr(PDovr60,length(pdovr60),1) WHEN '-' THEN PDovr60*-1 END;"
;   dbExecute(db,SQLText)
;
;#############################################################


SQLText = "INSERT OR IGNORE INTO atb_detail (Trans_No, Tran_Date, Due_Date, Tr_Cass, Tr_Amt, Amt_App, Current, PD01_30, PD31_60, PDovr60) Values ('{{{0}}}','{{{1}}}','{{{2}}}','{{{3}}}','{{{4}}}',NULL,'{{{5}}}','{{{6}}}','{{{7}}}','{{{8}}}');"

;Arr = ArrayFileGetCSV("load_config.txt",0,@TAB)
;stmt = dbInsertFlat(db,SQLText,"C:\gbat\extenders\Sqlite\WBSql\atb.txt", Arr, 6)


Message("dbInsertFlat()",stmt)


Uses an array to specifiy the columns,  while you said you did not use Row(0)
You said the Row(1) was start, row(2) was length, etc...

Are you using a single dimension rows 1 & 2 first field, 3 & 4 second field ...  OR

row(1)-field1 -  dim(1) for start, dim(2) for length
row(2)-field2 -  dim(1) for start, dim(2) for length

wanting to be sure I understand how to address the parameters.

Thanx,
Keith

JTaylor

Open this in excel - TAB Delimited.

Let me know if that helps.

ArrayFileGetCSV() is a great way to get what you need to submit.   Partly why I went this route.  Viewing the setup is clear, at least to me.  Open to other suggestions. 

Jim

KeithW


JTaylor

Please post your dbInsertFile() call.   This works fine for me so not sure what might be happening.

Jim

Quote from: KeithW on May 23, 2020, 06:59:00 PM
Jim,



ALSO, still not honoring the Lines to SKip, getting hear in the SQLite3 db.


Keith

JTaylor

Not sure about the time increase.   I didn't change anything of which I am aware.

Jim

KeithW

I can live with 2.5 mins for a million records
Thats still hummin along.

Keith

JTaylor

Not sure if this will fix it but it works for me.  Ran it over and over again.   Didn't really do anything other than switch to UNICODE.   I looked at everything closely and, as far as I can tell, I am cleaning up after myself at every point.

I think I have UNICODE working in all the functions now.   May need a bit of tweaking on that front.   It shows properly from the array but when I write japanese to a file it gets corrupted.   French works fine though.  Guessing that isn't a real issue for what you are doing though. 

http://www.jtdata.com/anonymous/wbsql44i_v9.zip

Jim

KeithW

Jim,

Was playing with the FlatFile rountine... everything worked up the actual call to dbFlatFile().
dbTable(), dbSchedma() are both clean and it builds the db, I verify thru and 3rd DBTool.

On the dbFlatFile()  I get    near 'INSERT': syntax error

What I am not sure is how to go about the ID_NO (AUTOINCREMENT) specification
how do you code that in the DB_Config.txt file?

Anyway, included WB Script, Dep_Configtxt (in my case) and a Sample of ~100 transactions to load.

If you see the error of my ways, please enlighten....

Regards,
Keith

stanl

Quote from: KeithW on May 23, 2020, 09:53:36 PM
Jim,

Was playing with the FlatFile rountine...
Regards,
Keith


Off-Topic: Noticed your flat file had Raleigh and Wilmington references. By any chance you reside in NC? I'm in Raleigh.

JTaylor

Guess it may not be obvious so to clarify, the place holder number in the template corresponds to the array column.   Maybe my wording about table columns in the help might be misleading.  The placeholders must have a corresponding column in the config array.  You can also duplicate them.   That is you can use '[{0}]' in more than one place in the template.  If you aren't using a placeholder for a column such as the auto increment field you do not need to worry about anything for it in the config file.

Off to look at why your INSERT is failing.   Just wanted to get this clarification out here.

Hopefully I didn't make this too complicated.  Another reason I went this route was that I had more ambitious plans for the config file than what ended up here but realized it was overkill.   I still think it is a good way to do it.  I do appreciate the questions as it helps me with the HELP file.

Jim

Quote from: KeithW on May 23, 2020, 09:53:36 PM
Jim,

Was playing with the FlatFile rountine... everything worked up the actual call to dbFlatFile().
dbTable(), dbSchedma() are both clean and it builds the db, I verify thru and 3rd DBTool.

On the dbFlatFile()  I get    near 'INSERT': syntax error

What I am not sure is how to go about the ID_NO (AUTOINCREMENT) specification
how do you code that in the DB_Config.txt file?

Anyway, included WB Script, Dep_Configtxt (in my case) and a Sample of ~100 transactions to load.

If you see the error of my ways, please enlighten....

Regards,
Keith

JTaylor

I think the cause is that you need a semi-colon on the end of the INSERT statement.  It is failing when it tries to insert the second statement because it doesn't see it as a new statement.

Jim

KeithW

Jim,

You were right about the ";" as it was late at nite and I just wasn't seeing, sorry I should have caught that one myself !!

I do have a problem with the way your Config.txt file works... when transcribing  column position they SHOULD NOT BE 0 based !!!
You want to process that way internally, you should do the (-1) in your processing when looking at stuff on paper, it is "1" based !!!
Unless changed, you are gonna cause much frustration...

Regards,
Keith

KeithW

Jim,

For the dbInsertFlat() do you have any input data limiting in place?
See attached Screenshot.

202003_1MM1918.txt is 1 Million rows
202003_2MM1918.txt is 2 Million rows
202003_550K1918.txt is 500,000 rows
202003_fmc1918.txt is 3.5 Million rows (Master File)
202003_SAM1918.txt is 100 rows (Sample for initial testing)

reason I am asking is the 202003_SAM1918.txt loads in a couple hundredths of a second and all of the others die silently !!!!


The only line in the script being changed is below....  EVERYTHING else remains the same


stmt = dbInsertFlat(db, SQLText, DirScript() : "202003_500K1918.txt", Arr, 1)



Any ideas?

Regards,
Keith

KeithW

Jim,

I should have also included these situations are all with the v9 Extender....

Keith

JTaylor


No intentional limits.   Works pretty much like dbInsertFile().   I will take another  look.

Jim


Quote from: KeithW on May 24, 2020, 10:11:53 AM
Jim,

For the dbInsertFlat() do you have any input data limiting in place?
See attached Screenshot.

202003_1MM1918.txt is 1 Million rows
202003_2MM1918.txt is 2 Million rows
202003_550K1918.txt is 500,000 rows
202003_fmc1918.txt is 3.5 Million rows (Master File)
202003_SAM1918.txt is 100 rows (Sample for initial testing)

reason I am asking is the 202003_SAM1918.txt loads in a couple hundredths of a second and all of the others die silently !!!!


The only line in the script being changed is below....  EVERYTHING else remains the same


stmt = dbInsertFlat(db, SQLText, DirScript() : "202003_500K1918.txt", Arr, 1)



Any ideas?

Regards,
Keith

JTaylor

I can see that...will change it to 1-based system.

Jim

Quote from: KeithW on May 24, 2020, 09:37:30 AM
Jim,

You were right about the ";" as it was late at nite and I just wasn't seeing, sorry I should have caught that one myself !!

I do have a problem with the way your Config.txt file works... when transcribing  column position they SHOULD NOT BE 0 based !!!
You want to process that way internally, you should do the (-1) in your processing when looking at stuff on paper, it is "1" based !!!
Unless changed, you are gonna cause much frustration...

Regards,
Keith

KeithW

Quote from: JTaylor on May 24, 2020, 12:37:35 PM

No intentional limits.   Works pretty much like dbInsertFile().   I will take another  look.

Jim



Jim,

100 lines works, 1,000 lines works,  10,000 has a silent death.
While I did not try and find a threshold it lies between  1,000 - 10,000

Keith

JTaylor

That was my bad...I found the problem and your post now makes it fairly certain I did.

1-based change is part of this as well.   I *think* UNICODE is working properly in all functions.


   http://www.jtdata.com/anonymous/wbsql44i_va.zip

Basically, once you finish testing and make sure everything works you are trying and if no one else jumps in with suggestions or bugs I will declare this "RELEASED"!!!   :)

Jim

KeithW

Jim,

I meant to ask you if you could include the following....

regards @dbVersio() you display your Extender Version   100#, etc...

Could you also add the SQLite Version Info such as...

SQLite 3.31.1  or whatever the version of the Library you are using?

Keith

KeithW

Jim,

Whoa, we went backwards brother....

Zip file contains:
202003_SAM1918.txt        -  Sample I/P File
DEP_ATB_Mar20.sqlite      - Generated SQL3 db
Dep_Config.txt                  - Column Definitions
wb_sql_FIXED_Test.wbt   - wb script

If you view the SQL3 db you will see  [{0}]  where they don't belong and the columns are shifted.
The only thing I did to the Dep_Config.txt in ROW(1) added "1" to each starting column to get away from 0 based column reference

As I understood it, nothing else needed to be adjusted ???

Regards,
Keith

JTaylor

Just added to the "_va" zip.

   dbSQLiteVersion()


Jim


Quote from: KeithW on May 24, 2020, 01:40:47 PM
Jim,

I meant to ask you if you could include the following....

regards @dbVersio() you display your Extender Version   100#, etc...

Could you also add the SQLite Version Info such as...

SQLite 3.31.1  or whatever the version of the Library you are using?

Keith

JTaylor

Your script has a 0 placeholder in the Template.

SQLtext = SQLtext : "VALUES ('[{0}]','[{1}]','[{2}]','[{3}]','[{4}]','[{5}]','[{6}]','[{7}]','[{8}]','[{9}]','[{10}]',"


Jim

Quote from: KeithW on May 24, 2020, 02:00:25 PM
Jim,

Whoa, we went backwards brother....

Zip file contains:
202003_SAM1918.txt        -  Sample I/P File
DEP_ATB_Mar20.sqlite      - Generated SQL3 db
Dep_Config.txt                  - Column Definitions
wb_sql_FIXED_Test.wbt   - wb script

If you view the SQL3 db you will see  [{0}]  where they don't belong and the columns are shifted.
The only thing I did to the Dep_Config.txt in ROW(1) added "1" to each starting column to get away from 0 based column reference

As I understood it, nothing else needed to be adjusted ???

Regards,
Keith

JTaylor

Now I get it....I think I misunderstood what you wanted.  I can change that too.

Jim

JTaylor

How about this?   Sorry for the misunderstanding.   I will confess, I didn't test this one.

You will need to change templates....just don't be silly like me and change all the numbers instead of adding one on the end and deleting the first one :)


       http://www.jtdata.com/anonymous/wbsql44i_vb.zip


Jim

KeithW

Jim,

Given the Choice, I would prefer BOTH of the changes....

Might I suggest you use [{0}] that as a NULL placeholder....  AUTOINCREMENT  OR FIELDS NOT TO LOAD
OR Do you really think that will be confusing???   ) meaning-nothing here....

I was originally talking about the Config.txt ROW starting column values....

Keith

JTaylor

I think I want to stress the use of SQL for such things since it really is an SQL issue.   That is the MAIN reason for the Templates.  It allows you to do a GREAT deal during import that you might not realize is possible.  I think I posted an example earlier related to handling the Negative numbers in your flat file.   I could have provided some fixed method for column assignment but this opens a world of possibilities and avoids requests to handle custom situations.

Anything else you think would make it work smoother or better?   As you know, once I release it, behavior changes become problematic.  Obviously won't promise I will make the change   :)

Jim

Quote from: KeithW on May 24, 2020, 02:35:43 PM
Jim,

Might I suggest you use [{0}] that as a NULL placeholder....  AUTOINCREMENT  OR FIELDS NOT TO LOAD
OR Do you really think that will be confusing???   ) meaning-nothing here....

Keith

KeithW

Jim,

I think there will be one other small addition,
I need to run a couple tests to verify.

Keith

KeithW

Jim,

It handled 1MM rows in51s 38hh
It handled 2MM rows in 1m 42s 62hh

When I fed it the entire file of 3.5MM rows
I got the error in the Screenshot, anything you can think of to get past this?

Regards,
Keith

PS Still think dbVersion should include SQLite Library version, too.....


JTaylor

I made it so you can request the SQLite version.  Had to use a function as the CONSTANTs only allow Int, as far as I could tell.  dbSQLiteVersion()

Started to say to try it in two batches with Row_To_Start but that would only work if we had a Row_To_End.

Not sure what else to try.   If it is running 2 mil rows okay I would think my code is fine but not sure why else it would choke unless maybe the file size is over 2gig???   Not sure if that would matter but maybe FileStream has a limitation???  What is the size of the file?  I will check on that as well.

If file size doesn't seem to be an issue let me add a Row_To_End option and see if it will handle it that way.   Will be a bit later though.  Need to get some supper.

I must say I am pleased with the performance.

Jim

KeithW

Jim,

filesize of 3.5+MM row file is: 996,661 KB

Keith

KeithW

Jim,

Reason we need: dbSQLiteVersion()
The Docs refer to functionality changes by versions where things are or are no longer valid/available...
We need to be able to verify what we are using when things are not as expected.

KEith

KeithW

Jim,

A bit off topic within this thread,
I was wondering, what are you using to create your  WBSql.chm  file?

Keith

KeithW

Jim,

WENT BACK TO CSV TESTING

First I would like to recommend a function name change
dbInsertFile  to  dbInsertCSV

so that

dbInsertFlat makes sense, and maybe this should be  dnInsertFixed rather than flat???

would help keep things more organized thought wise as to what you are working with.


AND

We still have the header issue with current dbInsertFile() not honoring the ignore headers line(s).
Zip contians WB Script, DataFile & generate SQL3 db.

Regards,
Keith

JTaylor

Help & Manual.

Quote from: KeithW on May 24, 2020, 03:45:18 PM
Jim,

A bit off topic within this thread,
I was wondering, what are you using to create your  WBSql.chm  file?

Keith