Search within multiple .txt files

Started by pguild, September 16, 2019, 12:29:05 PM


pguild

I want a user to be able to search within a certain directory for some text. I know Windows 10 can do this quite nicely when search is set up correctly in File Explorer.
But I want to make it easier for the user.

Here's my plan. Does it look good? Any other ideas? (The full, exact code is not shown here, just the basic ideas.)

; User may click a "SEARCH" button on the main dialog to search within the files in the already determined directory that contains .txt files.
; Ask the user what text they are looking for.
; Change to the desired directory.
; Get a list of all the .txt files in the directory using list = FileItemize("*.txt")
; Use a For/Next loop to get each file name and load its file contents.
;FoundFlag = 0 ;flag
;numitems = ItemCount(list, @TAB)

For n = 1 to numitems
   filename = itemextract(n,list,@TAB)
   Str = fileget(filename)
   ;Use strindex to search for the desired text within Str
   ;if strindex returns a non-zero position (the text was found)
      ;set FoundFlag to 1
      ;add the file name to a foundlist
      ;add some of the text before and after the found text to another list called textlist
   ;endif
Next
; When the loop is finished, check the FoundFlag.
; If FoundFlag is 0, report to the user "we ain't founding yet." or something cute like that.
; Otherwise:
; Build a pick list with this format: Filename -- Foundtext (where Filename = name of the file in which the text was found and Foundtext is the sentence or phrase containing the text).
; Prompt the user to pick the desired item with askitemlist.
; Activate Notepad to open the desired file but continue the WinBatch script.

; If there is more than one positive result, ask the user whether they want to see another file. If the answer is no, just return to the main program menu.
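
For illustration, the scanning part of this plan might look roughly like the following. This is an untested sketch, not the full program: the directory C:\notes and the variable names srchDir, srchText and foundlist are made-up placeholders, and it assumes the files are small ANSI text files that FileGet can load whole.

```winbatch
; Rough sketch of the scan loop only. Assumptions: srchDir is a
; placeholder path and every .txt file is small enough for FileGet.
srchDir  = "C:\notes"                  ; hypothetical directory
srchText = AskLine("Search", "Text to find:", "")
If srchText == "" Then Exit

DirChange(srchDir)
list      = FileItemize("*.txt")       ; tab-delimited list of file names
numitems  = ItemCount(list, @TAB)
foundlist = ""

For n = 1 To numitems
   filename = ItemExtract(n, list, @TAB)
   contents = FileGet(filename)
   ; StrIndexNC returns the position of the match, or 0 if not found.
   If StrIndexNC(contents, srchText, 1, @FWDSCAN) > 0
      foundlist = ItemInsert(filename, -1, foundlist, @TAB)
   EndIf
Next

If foundlist == ""
   Message("Search", "No files contain: ":srchText)
Else
   Message("Search", ItemCount(foundlist, @TAB):" file(s) matched.")
EndIf
```

An empty foundlist plays the role of FoundFlag here, so a separate flag variable is not strictly needed.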
   

kdmoyers

Roughly how big are the text files (in megabytes) and
how many files are there (hundreds, thousands, hundreds of thousands)?

These two questions will guide your solution.

$0.02
Kirby
The mind is everything; What you think, you become.

td

Good questions.  Depending on the answers to those questions, you could consider using the File Search Extender to identify files with the targeted text and use BinaryIndex on a binary buffer to locate and extract text from the found files.  Of course, if the size and number of files are small those two bits of functionality might be overkill.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

JTaylor

If they are not Unicode files and you do not want to reinvent the wheel...

    http://www.jtdata.com/jtfindit.html

It is free and, I think, does what you intend. It does require registration.

Jim

stanl

I've always liked using WB's CLR to execute PS one-liners.  The one-liner below searches my scripts dir and sub-folders for "WinSCP" located in any .wbt files with results returned as a gridview. The return shows both filename and the line the word appears on.



Get-ChildItem C:\scripts -Filter *.wbt -Recurse | Select-String "WinSCP" | Out-GridView





td

There is always the FCL's 'Microsoft.VisualBasic' assembly and WIL CLR hosting.  It does require more than one line but it is a bit more flexible. It allows you to store whatever bits of information you want in whatever form you choose.  It is also Unicode/ANSI agnostic so that issue can be ignored.  However, you can't use the params String[] overload of the "FindInFiles" method to specify a file mask.

Here is a rough example as a starting point:
Code (winbatch)
;; What to look for and where to look.
strText =  'WinBatch'
strDir  =  'c:\logs'
aMask = ArrDimension(1) ; One-element array to hold the file mask.
aMask[0] = '*.log'  ; Only search files matching this mask.

ObjectClrOption('useany', 'Microsoft.VisualBasic')
objFileSys = ObjectClrNew('Microsoft.VisualBasic.FileIO.FileSystem')
SrchOpt = ObjectClrType('Microsoft.VisualBasic.FileIO.SearchOption', 3) ; 3 = search subdirs.

ObjectClrOption('useany', 'System.IO')

;; Since the FindInFiles method is overloaded the CLR binder needs a bit
;; of help identifying which version of the method we are using. This
;; is accomplished by creating a variant array of bstr using the "array|bstr"
;; type indicator in the method call.

;; Search for files containing "WinBatch" with the extension '.log'.
Files = objFileSys.FindInFiles(strDir , strText, bool:@True, SrchOpt, array|bstr:aMask)
aMap = MapCreate()
foreach File in Files

  ;; WIL FileOpen, FileRead, FileClose or BinaryRead may be faster here but
  ;; using FCL class for example consistency's sake.
  objReader = ObjectClrNew('System.IO.StreamReader', File)
  lLines = ''
  while  !objReader.EndOfStream
     ; Read inside the loop so the last line of the file is also checked.
     strLine = objReader.ReadLine()
     if StrIndexNC(strLine, strText, 0, @FWDSCAN )
        ; This example just saves the line but you could create
        ; line snippets here.
        lLines := strLine:@lf
     endif
  endwhile
  objReader.Close()

  ;; Perform whatever here to store the information.
  if lLines != ''
     aMap[File] = lLines
  endif
next

if Arrinfo(aMap, 1) then Message('Last Found File', File:@lf:aMap[File])
else Message('Find in Files', 'Nothing found')
exit
 


[edit] Reposted the above with at least some of the blunders removed.

stanl

and another rough example
Code (winbatch)


;Winbatch 2019B - Use CLR/Powershell to search for word in files
;This should work with versions after 2013                 
;Stan Littlefield, September 18, 2019
;/////////////////////////////////////////////////////////////////////////////////////////////////////////
gosub udfs
IntControl(73,1,0,0,0)
path = "c:\scripts"    ;replace as needed
filter = "*.wbt"       ;replace as needed
srch = "geterror()"   ;replace as needed
;script outputs results to gridview, but PS can output to .csv and other formats.


cScript= 'Get-ChildItem ':path:' -Filter ':filter:' -Recurse | Select-String "':srch:'" | Out-GridView -Title FileSearch'


BoxOpen("Please Wait","Searching for %srch% ")
ObjectClrOption("useany", "System.Management.Automation")
objAutoPs = ObjectClrNew("System.Management.Automation.PowerShell")
oPshell = objAutoPs.Create()
oScope = ObjectType("BOOL",@TRUE)
oPshell.AddScript(cScript,oScope)
objAsync = oPshell.BeginInvoke()
oPShell.EndInvoke(objAsync)     
oPshell.Dispose()
oPshell=0
BoxShut()


While WinExist("~FileSearch")
   Timedelay(1)
Endwhile


Exit


:WBERRORHANDLER
geterror()
Message("Error Encountered",errmsg)
Exit


:CANCEL
Exit


:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine
Return

td

Decided to repost the "Microsoft.VisualBasic" script because the original contained multiple bugs and because there is a relatively simple way to make the file mask parameter acceptable to the CLR.

pguild

Thanks everyone for your help!  ;D

The number of files to search through?  I am not sure, but most people won't have more than 500.
They are just simple text files created with Windows Notepad.

I want to keep it all within a Winbatch compiled .exe.  My code now works great,
except that I have had trouble capturing what I call the "surrounding text" -- so I gave up on
that for a while.  But I am getting a nice pick list of files containing the text I am searching for.
When the user clicks an item on the dropdown, the file is immediately opened with Notepad
(or the default program for opening .txt files).
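
One possible way to capture the "surrounding text" is to take a fixed number of characters on each side of the match position. The sketch below is untested and rests on assumptions: the sample string, the variable names and the span of 10 characters are all invented for illustration.

```winbatch
; Sketch: pull some context around the first match in a string.
; "span" (characters kept on each side) is an arbitrary choice.
contents = "The quick brown fox jumps over the lazy dog."
srchText = "fox"
span     = 10

pos = StrIndexNC(contents, srchText, 1, @FWDSCAN)
If pos > 0
   ; Clamp the window so it never runs past either end of the string.
   startPos = Max(1, pos - span)
   endPos   = Min(StrLen(contents), pos + StrLen(srchText) - 1 + span)
   snippet  = StrSub(contents, startPos, endPos - startPos + 1)
   ; Flatten line breaks and tabs so the snippet fits on one line
   ; of a tab-delimited pick list.
   snippet = StrReplace(snippet, @CRLF, " ")
   snippet = StrReplace(snippet, @LF, " ")
   snippet = StrReplace(snippet, @TAB, " ")
   Message("Context", snippet)
EndIf
```

The snippet could then be joined to the file name (for example filename:" -- ":snippet) before it is added to the pick list.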

td

Not sure exactly what you mean by "I want to keep it all within a Winbatch compiled .exe", but almost every WIL function relies on functionality provided by OS (Windows) DLLs, and the .Net assemblies found in the FCL (Framework Class Library) are every bit as much a part of Windows as other Windows DLLs.  To put it another way, since the release of Windows Vista, using a .Net assembly in the FCL does not make a compiled exe any more dependent on externalities than using a WIL function like StrCmp or Run.

pguild

Quote from: stanl on September 17, 2019, 03:07:55 AM
I've always liked using WB's CLR to execute PS one-liners.  The one-liner below searches my scripts dir and sub-folders for "WinSCP" located in any .wbt files with results returned as a gridview. The return shows both filename and the line the word appears on.

Get-ChildItem C:\scripts -Filter *.wbt -Recurse | Select-String "WinSCP" | Out-GridView

Thanks for your reply. This looks cool. I am seeking clarity regarding the meaning of WB and CLR and PS. Thanks.


pguild

Quote from: td on September 19, 2019, 07:22:06 AM
Not sure exactly what you mean by "I want to keep it all within a Winbatch compiled .exe", but almost every WIL function relies on functionality provided by OS (Windows) DLLs, and the .Net assemblies found in the FCL (Framework Class Library) are every bit as much a part of Windows as other Windows DLLs.  To put it another way, since the release of Windows Vista, using a .Net assembly in the FCL does not make a compiled exe any more dependent on externalities than using a WIL function like StrCmp or Run.

Thanks for your reply. I wish I knew FCL and .Net assemblies.  The only assembly I know is the kind I had to go to when I was teaching junior high in a huge public school.  What's an assembly? Is that like a function in Winbatch?

Maybe I mean I just want to use Winbatch code because that is what I know. I have no idea how to use the FCL or a ".Net assembly" in the FCL, and I don't know how to learn it or incorporate it into an app created with Winbatch. In a short time, using just Winbatch native code, I have built a Search button in a dialog that, when clicked, prompts the user for a search term and then displays the results in a pick list. When an item is picked, the appropriate file is opened in Notepad.

An extender probably would have simplified things for me. I think I used one about 10 years ago, when I created and then lost the code for a cool app that searched for and backed up all the files on a computer that matched a certain file spec. I never lose code now since I store it all on Dropbox.

stanl

Quote from: pguild on September 20, 2019, 10:50:09 PM
Thanks for your reply. This looks cool. I am seeking clarity regarding the meaning of WB and CLR and PS. thanks.


First, the code snippets that Tony and I posted could be compiled, the exception being that mine requires a copy of Winbatch 2013 or later.
PS is short for PowerShell. It comes installed as part of the Windows OS and you can find a PowerShell folder under Windows\System32.  Tony can speak more technically about the CLR, but suffice it to say its integration into Winbatch allows leveraging .NET to some degree, which in turn makes it possible to run PowerShell code.
Not the greatest analogy - but consider my PS one-liner called from Winbatch similar to executing a batch file as PS does a lot of the heavy lifting to accomplish the text search saving lines of code.

td

Quote from: pguild on September 20, 2019, 11:04:39 PM

Thanks for your reply. I wish I knew FCL and .Net assemblies.  The only assembly I know is the kind I had to go to when I was teaching junior high in a huge public school.  What's an assembly? Is that like a function in Winbatch?

Maybe I mean I just want to use Winbatch code because that is what I know. I have no idea how to use the FCL or a ".Net assembly" in the FCL, and I don't know how to learn it or incorporate it into an app created with Winbatch. In a short time, using just Winbatch native code, I have built a Search button in a dialog that, when clicked, prompts the user for a search term and then displays the results in a pick list. When an item is picked, the appropriate file is opened in Notepad.

An extender probably would have simplified things for me. I think I used one about 10 years ago, when I created and then lost the code for a cool app that searched for and backed up all the files on a computer that matched a certain file spec. I never lose code now since I store it all on Dropbox.

If you resist learning new-to-you but established concepts and technologies, you will, in the long run, end up spending a lot more time, not less, solving problems.   New concepts are just new tools added to the toolbox.  There is a whole section on .Net in the Tech Database, and the Consolidated WIL Help file covers the three .Net (CLR hosting) functions added to the WIL scripting language about 6 years ago.  It's not that hard to get a very basic working knowledge of WIL/.Net script, but you do have to get over the fear of new terms and concepts.  And you certainly don't need to deep dive into WIL CLR hosting to use it to solve problems.

pguild

Thanks. Yes, getting over fear is a major skill I am targeting for improvement. Thanks for the chastisement.  ;D

I did search for .Net in help and found nothing. Searching for CLR hosting brought up a detailed discussion. I am glad you put CLR Hosting in parens.

The discussion might as well have been written in Greek and was not helpful to me. I miss the good old days when Winbatch offered clear, easy to understand tutorials.  I am also puzzled by this statement:

"WinBatch scripts cannot be compiled into CLR applications or assemblies." 

But can CLR functionality be used in Winbatch Scripts intended to be compiled into .exe apps?

A simple example of what I need to do (step-by-step) to use CLR in a WinBatch Script would be helpful.   Thanks!




kdmoyers

I don't have time to completely write it now, but with only 500 small files to look through, I might attempt this in simple winbatch and maybe avoid extra complexity? 

1. Use a srchInit loop to look through the files.
2. FileGet and StrIndexNC for a target string.
3. Build a list of hit filenames and hit text.
4. Show an AskItemList of the hits.
5. RunShell @NOWAIT notepad on the picked item.
6. Return to 4.

One page of code? Might take several seconds to scan all the files.

For extra polish, you can add a pretty Dialog, maybe RegEx matching to show the matches in context, etc.
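
Steps 4 through 6 might be sketched as below. This is untested and only a starting point: hitlist is a stand-in that simply lists every .txt file in the current directory, not real search results, and the variable names are invented for the example.

```winbatch
; Sketch of steps 4-6. hitlist is a stand-in for the real list of
; hits: here it is just every .txt file in the current directory.
hitlist = FileItemize("*.txt")

While hitlist != ""
   ; Pressing Cancel triggers normal WIL cancel handling, which ends
   ; the script unless a :CANCEL label is defined.
   pick = AskItemList("Files containing the text", hitlist, @TAB, @SORTED, @SINGLE)
   If pick == "" Then Break
   ; @NOWAIT lets the script carry on while Notepad stays open.
   ; The quotes protect file names that contain spaces.
   RunShell("notepad.exe", '"':pick:'"', "", @NORMAL, @NOWAIT)
EndWhile
```

Because the loop goes straight back to the pick list, the user can open several hits in a row and leave by pressing OK with nothing selected.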

just a quick $0.02 -Kirby



JTaylor

Or could use what I posted which is all WinBatch, an exe, and does most everything you mentioned  ;)

Jim

td

Quote from: pguild on September 23, 2019, 11:48:32 AM
Thanks. Yes, getting over fear is a major skill I am targeting for improvement. Thanks for the chastisement.  ;D

I did search for .Net in help and found nothing. Searching for CLR hosting brought up a detailed discussion. I am glad you put CLR Hosting in parens.

The discussion might as well have been written in Greek and was not helpful to me. I miss the good old days when Winbatch offered clear, easy to understand tutorials.  I am also puzzled by this statement:

"WinBatch scripts cannot be compiled into CLR applications or assemblies." 

But can CLR functionality be used in Winbatch Scripts intended to be compiled into .exe apps?

A simple example of what I need to do (step-by-step) to use CLR in a WinBatch Script would be helpful.   Thanks!

In WinBatch documentation and examples, CLR hosting is almost always referred to as dotNet.  This is because ".Net" is a marketing name owned by Microsoft whereas "dotNet" is generic.  CLR (Common Language Runtime) is the name of a component of Windows, and of other operating systems, that acts as a virtual machine.  Hosted by the WinBatch process, it executes the code inside dotNet assemblies.  The FCL (Framework Class Library) is the implementation of a standardized set of language-agnostic classes that are also a part of the Windows operating system and available on other operating systems.  On Windows, assemblies are simply DLL files that contain the implementation of one or more classes.  On other operating systems, assemblies may be implemented in files that have different extensions but serve the same purpose.

Probably the best way to learn new scripting skills is by doing. Pick a simple project and give it a try using the available resources mentioned in the following paragraphs.  If you need a "step-by-step" example to get started, there are over 100 examples in the Tech Database:

https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/dotNet

If you look at the examples in this topic, you will find that there are basically three functions that implement CLR hosting in WinBatch.  Those functions are all documented in the Consolidated WIL Help file.  Reading that documentation should be where you start.  Next, spend some time looking at a few of the numerous examples in the Tech Database.  Once you gain some familiarity, you can use the many thousands of examples on the Web, written in a variety of languages, that can be translated to WIL either easily or with some effort.
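
As a very small illustration of the pattern, the sketch below reads the first line of a text file with the FCL's StreamReader class, using only calls that already appear in the "Microsoft.VisualBasic" script earlier in this topic. It is untested, the file path is a hypothetical placeholder, and the file is assumed to exist. The third function, ObjectClrType, is not needed here; the earlier script uses it to create the SearchOption value.

```winbatch
; Minimal CLR hosting sketch: read the first line of a text file.
; The path below is a made-up placeholder; point it at a real file.
strFile = "C:\notes\example.txt"

; 1. Make the assembly that holds the class available to the script.
ObjectClrOption("useany", "System.IO")

; 2. Create an instance of the class. Extra arguments are passed
;    to the class constructor (here, the file to open).
objReader = ObjectClrNew("System.IO.StreamReader", strFile)

; 3. Call methods with dot notation, much as with COM Automation.
strLine = objReader.ReadLine()
objReader.Close()

Message("First line", strLine)
```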

The primary source of information is Microsoft's own .Net documentation of the Framework Class Library.  If you want more background, you should look at the COM help file that is part of the Consolidated WIL Help file. You will notice that the three .Net functions that make up WIL's interface to .Net mirror three of WIL's COM Automation functions. There is a reason for this.  Almost all of the knowledge related to using COM Automation in WIL is transferable to using dotNet/CLR hosting/.Net in WIL.

That is not to say that it is "easy".  WIL CLR hosting cannot be learned in one sitting via some tutorial, and it does require some patience and a bit of skill using a Web search engine (the ability to construct multiple search terms, for example).  But it is well within the skill set of the average WinBatch user and does not require being a professional developer.  We know this because of the number of questions, comments, and posts we receive from users doing exactly that.

Also, you are misinterpreting the statement "WinBatch scripts cannot be compiled into CLR applications or assemblies."  That statement means that WIL scripts cannot be compiled into .Net assemblies or executables. However, it does not mean that WIL scripts that use .Net assemblies cannot be compiled into native code Windows executables.   

Give it a try on some small project.  It isn't as daunting as it seems.


kdmoyers

Quote from: JTaylor on September 24, 2019, 06:57:38 AM
Or could use what I posted which is all WinBatch, an exe, and does most everything you mentioned  ;)

Jim
Well, if you want to do it the easy way, sure. (laugh) 
Thanks Jim!

stanl

Quote from: kdmoyers on September 25, 2019, 04:26:22 AM
Quote from: JTaylor on September 24, 2019, 06:57:38 AM
Or could use what I posted which is all WinBatch, an exe, and does most everything you mentioned  ;)

Jim
Well, if you want to do it the easy way, sure. (laugh) 
Thanks Jim!


Certainly a win-win for the OP. If OP has version >=2013 then CLR is a learning experience, otherwise an out-of-the-box WB exe. Personally I have found the PS one-liner is about as close to GREP as windoze can get. Furthermore results can be tested without extra WB de-bugging.