Screen Scraping

Started by MilkTea, July 26, 2014, 02:13:01 AM

MilkTea

Hello,
I need to get the contents of a control.  The class type is VirtualString (Delphi).
I am able to select and highlight it programmatically (I know the window ID), but applying <CTRL>C gives me the class name rather than the content or a pointer to that content.

How can I "explore" the contents of this class, or its in-memory structure?
There is actually only one element: "ROM code: " followed by 14 digits (8 hex).
I could try to explore the in-memory structure with Spy++ (if I ever find where to download that program).

Another solution would be to apply a trick in order to copy it, but copying is not enabled in the control.
I can, though, highlight/select it, but I do not know enough about the hidden features of Delphi controls, and blind trying did not teach me anything.

To make a long story short: I'm stuck!

Any help will do, and I'll keep you informed about my progress/success/failure (which is not an option).

Mark

MilkTea

More info: a UiPath tryout revealed that it is some kind of XML:

<wnd cls='TVirtualStringTree' />
<ctrl role='client' />

Maybe it is an object that can be accessed through an ie.control?

td

The XML is most likely a product of your UiPath software and not an intrinsic part of the Delphi control.  You can try using WinBatch's Roboscripter tool to attempt to generate a script automatically.  If that doesn't work and sending keys doesn't do it either, there is some possibility of a dotNet solution using WinBatch's CLR hosting functionality.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

MilkTea

I tried roboscripter. I found spyxx.exe.
Damn... this one's going to be difficult.
I have to read a tVirtualString, which is a Delphi component.
Delphi's VCL (visual component library) is accessible the same way as MFC. For example a 'Tedit' component can be sent the same messages as a 'MFC edit' component.

Yippee (http://edn.embarcadero.com/article/33642) ... d'oh:
Quote: There are, however, third-party components that look like, and for the most part, behave like standard controls, but which do not incorporate all the standard messages and events that a Windows control of the same look and feel would offer. An example is TVirtualStringTree, an advanced Tree View control with lots of great features. It does have a few problems, though. For one, when you open or close a node, it does not raise the same events that a SysTreeView32 control would. Moreover, it does not convey its opened or closed state to the screen reader when asked. It also supports checkable Tree items, however the same problem: When asked by a screen reader, it does not tell so.

In fact I do not need all these features.  My tVirtualString has just one node. I have to select it and read its contents.
In order to use the CLR hosting, I need to know what to send to that control (and how to send it), as I'm a bit of a noob.

I found this:
Quote:
childNode = SendMessage(hNode, TVM_GETNEXTITEM, 0&, 0&)
Debug.Print "childNode " & childNode

Googling : "Getting data from 'SysTreeView32' class in another process" did not make me any wiser ...

... to be continued.
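
For reference, the quoted TVM_GETNEXTITEM call could be issued from WinBatch with DllCall, roughly as sketched below. This is only a probe under stated assumptions: 32-bit WinBatch, a hypothetical window title ("My Delphi App"), and a tree control that is a direct child of the top-level window. The message numbers are the standard tree view values from commctrl.h; nothing guarantees that a TVirtualStringTree answers them.

```winbatch
; Assumptions: 32-bit WinBatch, a hypothetical window title, and a tree
; control that is a direct child of the top-level window.
user32  = StrCat(DirWindows(1), "user32.dll")
hParent = DllHwnd("My Delphi App")   ; hypothetical partial window title

; FindWindowEx only searches the direct children of hParent.
hTree = DllCall(user32, long:"FindWindowExA", long:hParent, long:0, lpstr:"TVirtualStringTree", lpnull)
If hTree == 0
   Message("Tree view probe", "Control not found as a direct child window.")
   Exit
EndIf

; Standard tree view messages from commctrl.h, written in decimal.
TVM_GETCOUNT    = 4357   ; TV_FIRST (4352) + 5
TVM_GETNEXTITEM = 4362   ; TV_FIRST (4352) + 10
TVGN_ROOT       = 0

; Both messages return a plain number, so no memory has to be shared
; with the other process.
nCount = DllCall(user32, long:"SendMessageA", long:hTree, long:TVM_GETCOUNT, long:0, long:0)
hRoot  = DllCall(user32, long:"SendMessageA", long:hTree, long:TVM_GETNEXTITEM, long:TVGN_ROOT, long:0)

Message("Tree view probe", StrCat("Item count: ", nCount, @CRLF, "Root item handle: ", hRoot))
```

If both values come back as 0, the control most likely does not implement the standard tree view messages. Reading an item's text (TVM_GETITEM) is a different matter, because it needs a buffer that lives inside the target process.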


td

Since you have Spy++ you can view which Windows messages are accepted by the Delphi control when different operations are performed on the control.  The Delphi control is likely just a subclassed SysTreeView32 common control, so it may accept some of the messages found in the common control.  If that is the case, you can use the WinBatch DllCall functions to send the appropriate messages and process the output of those messages.  The documentation of the TreeView control messages can be found here:

http://msdn.microsoft.com/en-us/library/windows/desktop/ff486106(v=vs.85).aspx

td

After a second cup of coffee I realized that the task of using the Win32 SendMessage API function to access the content of your control seems a bit beyond what most WinBatch users would be willing to tackle.  This is because it would require marshaling memory across process boundaries.  The Control Manager extender is very good at this, but it mostly only works with the Windows UI shell's common controls and User32 window classes.

td

The FCL UIAutomationClient assembly offers classes that allow access to user interface elements.  I don't know how useful this would be for accessing a Delphi control, so it is offered more as a curiosity than a solution.  Keep in mind that a window usually needs to be un-iconized to be seen by the UIAutomation classes.  Also note that the script lacks proper error handling.
Code (winbatch):

; Load required assemblies.
ObjectClrOption("use","UIAutomationClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35")
ObjectClrOption("use","UIAutomationTypes, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35")

; Search scope enumerated values.
enumScope  =  ObjectClrNew("System.Windows.Automation.TreeScope")

;  Get access to the desktop window which is the parent of all other windows.
objUiElement = ObjectClrNew("System.Windows.Automation.AutomationElement")
objUiRoot = objUiElement.RootElement

; Instantiate a property condition for an Explorer window.
objPropCon1 =  ObjectClrNew("System.Windows.Automation.PropertyCondition", objUiElement.ClassNameProperty, "CabinetWClass")

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Find the first Explorer window under the desktop
nElement  = enumScope.Children
Scope = ObjectClrType("System.Windows.Automation.TreeScope",nElement)

; Find the first explorer under the desktop.
objExplorer = objUiRoot.FindFirst(Scope,objPropCon1)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Find the first treeview control in explorer.

; Instantiate other useful classes needed to search for a treeview control.
objPropCon2 =  ObjectClrNew("System.Windows.Automation.PropertyCondition", objUiElement.ClassNameProperty, "SysTreeView32")

;; Search scope.
nElement  = enumScope.Descendants
Scope = ObjectClrType("System.Windows.Automation.TreeScope",nElement)

; Find the treeview control inside the Explorer window.
objTree = objExplorer.FindFirst(Scope,objPropCon2)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;  Get the first item of the tree.

; Classes needed to obtain a treeview item.
objCtrlType = ObjectClrNew("System.Windows.Automation.ControlType")
objPropCon3 =  ObjectClrNew("System.Windows.Automation.PropertyCondition", objUiElement.ControlTypeProperty, objCtrlType.TreeItem)

; Create a scope parameter for item search
nElement  = enumScope.Children
Scope = ObjectClrType("System.Windows.Automation.TreeScope",nElement)

; Get the first item.
objItem = objTree.FindFirst(Scope,objPropCon3)

; Get the item's text.
strName = objItem.GetCurrentPropertyValue(objUiElement.NameProperty)

Message("First Item's Text", strName)

MilkTea

Well, thank you for the UI Automation solution. Great post.
I tried it with Regedit.exe.
I also installed Windows Detective (sourceforge.net).
I could easily find the class RegEdit_Regedit and, from there, the first child of the SysTreeView32 control, which was of course "Computer".
Your solution worked and I'm playing a bit with it.


I was able to find the tVirtualStringTree class too, but there was no child control in Windows Detective, and the UI Automation did not work... of course.

http://edn.embarcadero.com/article/33642 tells me this: VirtualStringTree is not MSAA compatible, but is about the same as SysTreeView32.
Quote: There are, however, third-party components that look like, and for the most part, behave like standard controls, but which do not incorporate all the standard messages and events that a Windows control of the same look and feel would offer. An example is TVirtualStringTree, an advanced Tree View control with lots of great features. It does have a few problems, though. For one, when you open or close a node, it does not raise the same events that a SysTreeView32 control would. Moreover, it does not convey its opened or closed state to the screen reader when asked. It also supports checkable Tree items, however the same problem: When asked by a screen reader, it does not tell so.

Although it is not WinBatch, I'll have to automate it in WinBatch. The search key is displayed only in this control, nowhere else. It is about 20 characters long and has to be re-entered for between 100 and 500 different partial keys (4 characters) a day, 365 days a year.  Tedious. The only alternatives are UiPath (not really a solution) and Ranorex automation (pfff, again), whose only offered solution is through their GDI/OCR plugins.

Anyway: I will have to explore that tVirtualString (class/control). My next search will be how to explore unknown controls.

I'll start with "Exposing internal properties of a custom control/single element".
Of course, any suggestions would be more than welcome  ;)

Mark

td

A custom control would need to be designed and implemented with UI Automation support in mind in order to work with it.  There was some very small possibility that this was the case with your Delphi control. IIRC, MSFT provides built-in shims for any user32 and common controls that don't support UI Automation natively.

MilkTea

That takes me a bit too far.  There is not enough info about shims.  I found a screen scraping SDK accessible through COM/DLL calls.  I will post my solution here.

td

I should have chosen my words more carefully.  The post was intended to explain in general terms why Windows built-in controls work with UI Automation and third party custom controls often do not. Nothing more.

It will be interesting to see the results of your screen scraping sdk effort.

MilkTea

Well.  I found this: "www.screenocr.com". They have an SDK that exports a COM object.  I want to try the soft way, because I'm a Linux pro and not familiar with Windoze.

I succeeded in scraping the data from the screen manually: activate with the hotkey (Shift, Ctrl, Alt), click on the "tVirtualStringTree" control, and a "context menu" appears. Hit the down arrow twice (copy text).  Paste the text from the clipboard.  Done manually, it works excellently.

Now to automate it... phew.  Without a WinMacro -> WinBatch tool it is difficult.

Unhide the OCR window... send keys to it... nothing happens ("OCR" with children "&Shift", "&Alt", "&Win", "Ctrl", "Use the follow..", "+", "+", "+", "OK").
It creates a system tray icon which when right clicked activates a menu "Start selection" ... "exit".

Fiddling with the automation...
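
The last part of the manual procedure (two down arrows, then reading the clipboard) might be scripted along these lines. This is a rough sketch, not a tested solution: it assumes the OCR tool's context menu is already open and has keyboard focus, that Enter confirms the highlighted "copy text" entry, and that the recognized text still contains the "ROM code: " label exactly as it appears in the control.

```winbatch
; Assumptions: the OCR tool's context menu is already open and has focus,
; and Enter confirms the highlighted "copy text" entry.
ClipPut("")                          ; clear stale clipboard content
SendKey("{DOWN}{DOWN}{ENTER}")
TimeDelay(1)                         ; give the tool time to fill the clipboard

strClip = ClipGet()
strTag  = "ROM code: "
nPos    = StrIndex(strClip, strTag, 1, @FWDSCAN)

If nPos == 0
   Message("Screen scrape", "Label not found in clipboard text.")
Else
   ; Keep everything after the label and strip surrounding whitespace.
   nStart  = nPos + StrLen(strTag)
   strCode = StrTrim(StrSub(strClip, nStart, StrLen(strClip) - nStart + 1))
   Message("ROM code", strCode)
EndIf
```

Since OCR can misread characters, checking the length of strCode against the expected number of digits before using it would be a sensible extra step.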

td

There are several OCR tools in the wild.  I can't recommend any of them because I haven't used any of them.  Your tool seems to have multiple ways to interface with it via its SDK.  One or more of them might work with WinBatch's DllCall, CLR (dotNET), or COM Automation.

td

There are a couple of links to OCR related topics in the Tech Database.  One of those links refers to a product that can be found here: http://www.structurise.com.  It supposedly has an OCX control that is COM Automation friendly. If so, it could be used with WinBatch.  I have not tried the product and can offer no guarantee that this will not be another rabbit hole.  Also, as I am sure you already know, be very careful when downloading installable files from the web.

A good search of our Tech Database on OCR related topics might turn up something else of interest, as well.