Windows Forum / Windows 98 / General Topics / May 2008
Find in XML and Copy
S1L1Y1 - 19 May 2008 17:50 GMT This question most probably does not belong here, but I tried different groups and got no answer. I would very much appreciate it if somebody can help. I have an XML file that opens with Internet Explorer. I am trying to find, for example, every occurrence of ItemField Name="image-url and then copy the value that follows, for example Value="http://www.abc.com/catalog/b3_1_424_1.JPG" />, for all the web addresses found. Sol
Franc Zabkar - 19 May 2008 10:21 GMT >This question must probably does not belong here but I tried different >groups but got no answer. [quoted text clipped - 4 lines] >found. >Sol Can you give us a link to a complete .xml file as an example?
- Franc Zabkar
 Signature Please remove one 'i' from my address when replying by email.
Gary S. Terhune - 20 May 2008 01:31 GMT Something wrong with your clock, Franc.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
>>This question must probably does not belong here but I tried different >>groups but got no answer. [quoted text clipped - 10 lines] > > - Franc Zabkar Franc Zabkar - 20 May 2008 08:21 GMT >Something wrong with your clock, Franc. Thanks.
Note to self: change motherboard battery
- Franc Zabkar
 Signature Please remove one 'i' from my address when replying by email.
Gary S. Terhune - 19 May 2008 18:12 GMT In short, you want to find all the web addresses in the XML file and list them in something like a text file? If not that, Copy them to where? Is this just a once-off job or are you planning on processing a lot of these things? If the former, I should think it's pretty obvious -- use Edit>Find, Copy/Paste,etc. For the latter, what you request sounds like an easy job for VBScript. I certainly don't know of any ready-made tool that would do the job.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> This question must probably does not belong here but I tried different > groups but got no answer. [quoted text clipped - 6 lines] > found. > Sol MEB - 19 May 2008 18:31 GMT Find In Context [an application] or similar might be what poster is looking for.
Ultra Edit, NotePad++, and several other programs have a Find In Files search tool, which may also work.
 Signature MEB http://peoplescounsel.orgfree.com -- _________
| In short, you want to find all the web addresses in the XML file and list | them in something like a text file? If not that, Copy them to where? Is this [quoted text clipped - 14 lines] | > found. | > Sol Gary S. Terhune - 19 May 2008 19:11 GMT Boolean searches? I dunno. Don't know enough about the language to say. Would seem to me to be a fairly complicated search term. Locate the URL only if directly following Str2. And then define its beginning and end. And then, like I said, Copy to what? Do those tools you cite generate lists?
But I know even less about the OP's desired results, so there you go.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> Find In Context [an application] or similar might be what poster is > looking [quoted text clipped - 29 lines] > | > found. > | > Sol S1L1Y1 - 19 May 2008 19:21 GMT I will copy to a list or excel file. Sol
> Boolean searches? I dunno. Don't know enough about the language to say. > Would seem to me to be a fairly complicated search term. Locate the URL only [quoted text clipped - 36 lines] > > | > found. > > | > Sol Gary S. Terhune - 19 May 2008 19:52 GMT First is a snap, and one or more of those programs MEB suggested might actually do such a thing, though I still think they'd have a hard time actually, properly, locating the strings you want to copy out. No problem with VBScript. Might take me a couple of hours, but I don't practice much. For an Excel file, you'd just make the extension CSV instead of TXT.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
>I will copy to a list or excel file. > Sol [quoted text clipped - 45 lines] >> > | > found. >> > | > Sol S1L1Y1 - 19 May 2008 18:57 GMT Thank you. For now it is a one-time job, but I might need it again in the future. With Edit>Find, Copy/Paste the problem is that I have to keep clicking Next, and there are more than 300 entries. Sol
> In short, you want to find all the web addresses in the XML file and list > them in something like a text file? If not that, Copy them to where? Is this [quoted text clipped - 14 lines] > > found. > > Sol S1L1Y1 - 19 May 2008 19:05 GMT Also I am not familiar with VBScript. Sol
> Thank You. It is both for now a one time job, and in the future I might need > it again. With the Edit>Find, [quoted text clipped - 24 lines] > > > found. > > > Sol Gary S. Terhune - 19 May 2008 19:43 GMT Maybe you'd like to learn? It's quick (and dirty) compared to most programming languages. http://www.google.com/search?hl=en&q=vbscript
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> Also I am not familiar with VBScript. > Sol [quoted text clipped - 33 lines] >> > > found. >> > > Sol Gary S. Terhune - 19 May 2008 19:35 GMT You still haven't answered the main question: Copy to where? A simple text list? Something more complicated?
From what you say, you need a program or script, or other tool. Much too tedious by hand. I don't think what MEB suggests is what you're looking for, but you should at least check them out.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> Thank You. It is both for now a one time job, and in the future I might > need [quoted text clipped - 26 lines] >> > found. >> > Sol S1L1Y1 - 19 May 2008 20:31 GMT Right a simple list. Sol
> You still haven't answered the main question: Copy to where? A simple text > list? Something more complicated? [quoted text clipped - 33 lines] > >> > found. > >> > Sol Gary S. Terhune - 19 May 2008 21:00 GMT Well, then, you have my answer, and those of MEB.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> Right a simple list. > Sol [quoted text clipped - 45 lines] >> >> > found. >> >> > Sol Jeff Richards - 19 May 2008 21:57 GMT XML files are plain text. Use facilities for searching and copying etc that you would use for a text file.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
> This question must probably does not belong here but I tried different > groups but got no answer. [quoted text clipped - 6 lines] > found. > Sol S1L1Y1 - 19 May 2008 22:19 GMT I will have to go one by one and will never finish. I tried I opened Excel and then imported the data from the xml file and then went to find all and it found all the entries but you can not copy and paste from there you would still have to copy each one separate. There must be an easier way. Solomon
> XML files are plain text. Use facilities for searching and copying etc that > you would use for a text file. [quoted text clipped - 8 lines] > > found. > > Sol Gary S. Terhune - 20 May 2008 01:06 GMT Using Excel Find, you found all the URLs and they were automatically all highlighted, all at once?
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
>I will have to go one by one and will never finish. I tried I opened Excel > and then imported the data from the xml file and then went to find all and [quoted text clipped - 18 lines] >> > found. >> > Sol S1L1Y1 - 20 May 2008 23:11 GMT That is the problem it seems to me that you can not highlight them all because I tried to click shift and move down but not happened only the next one got highlighted, or maybe I don't know how to do it here. Sol
> Using Excel Find, you found all the URLs and they were automatically all > highlighted, all at once? [quoted text clipped - 21 lines] > >> > found. > >> > Sol Gary S. Terhune - 21 May 2008 01:18 GMT I did some experimenting of my own and saw what happens. For some reason when you Find All, it turns into a multiple selection that can't be copied, unlike when you select several separate cells manually, where copying is for some reason OK. Go figure.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> That is the problem it seems to me that you can not highlight them all > because I tried to click shift and move down but not happened only the [quoted text clipped - 32 lines] >> >> > found. >> >> > Sol S1L1Y1 - 21 May 2008 18:03 GMT I was able to open it with word. Now I am trying to find all that start with www.adc.com/, but the end is not the same. I am sending a sample; ="http://www.abc.com/catalog/b3_1_424_1.JPG" . What do I enter in the find box? Sol
> I did some experimenting of my own and saw what happens. For some reason > when you Find All, it turns into a multiple selection that can't be copied, [quoted text clipped - 37 lines] > >> >> > found. > >> >> > Sol Jeff Richards - 21 May 2008 21:57 GMT Your sample is too tiny for me to be sure, but I would use something like this:
Find Value=" Replace with ^p~
Find " Replace with ^p
Sort
Delete all lines not starting with ~
Find ~ Replace with
(If I remember correctly, ^& is the find string.) If you post a few lines, say 10, then I can see whether the above will work.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
>I was able to open it with word. Now I am trying to find all that start >with [quoted text clipped - 49 lines] >> >> >> > found. >> >> >> > Sol S1L1Y1 - 21 May 2008 23:26 GMT <ItemField Name="price" Value="13.27"/> <ItemField Name="product-url" Value="http://www.adc.com/servlet/the-424/JEWELRY-BOX-GIFTWARE/Detail"/> <ItemField Name="merchant-site-category" Value="JEWELRY BOXES"/> <ItemField Name="image-url" Value="http://www.adc.com/catalog/b3_1_424_1.JPG"/> <ItemField Name="upc" Value=""/> <ItemField Name="isbn" Value=""/> <ItemField Name="manufacturer" Value=""/> <ItemField Name="manufacturer-part-no" Value=""/> <ItemField Name="classification" Value="new"/> <ItemField Name="in-stock" Value="Y"/> <ItemField Name="shipping-price" Value="5.00"/> <ItemField Name="shipping-weight" Value="1.2"/>
> Your sample is too tiny to for me be sure, but I would use something like > this: [quoted text clipped - 67 lines] > >> >> >> > found. > >> >> >> > Sol Jeff Richards - 22 May 2008 08:52 GMT This will work. Create the document as a txt file and use Open With to open it in WORD. Use Find and Replace as indicated:
Find What: Value="http Replace With: ^p~ Replace All
Find What: "/> Replace With: Replace All
Sort
Delete all lines not starting with ~
Find What: ~ Replace With: http Replace All
 Signature Jeff Richards MS MVP (Windows - Shell/User)
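For readers who would rather script the Find and Replace steps above, here is a minimal Python sketch of the same idea. It is not something posted in the thread: the helper name `extract_urls` and the regular expression are assumptions, and it relies on every address appearing as Value="http..." exactly as in the posted sample. Unlike the sort-and-delete approach, it keeps the addresses in their original file order.

```python
import re

# Assumed pattern: a Value attribute whose content starts with "http",
# as in the sample lines posted in this thread.
VALUE_URL = re.compile(r'Value="(http[^"]*)"')


def extract_urls(text):
    """Return every Value="http..." address, in the order it appears."""
    return VALUE_URL.findall(text)


sample = (
    '<ItemField Name="price" Value="13.27"/>\n'
    '<ItemField Name="product-url" '
    'Value="http://www.adc.com/servlet/the-424/JEWELRY-BOX-GIFTWARE/Detail"/>\n'
    '<ItemField Name="image-url" '
    'Value="http://www.adc.com/catalog/b3_1_424_1.JPG"/>\n'
)

for url in extract_urls(sample):
    print(url)
```

Note that this picks up product-url as well as image-url values, just as the Word steps do; narrowing it to one field would need a stricter pattern.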
> <ItemField Name="price" Value="13.27"/> > <ItemField Name="product-url" [quoted text clipped - 96 lines] >> >> >> >> > found. >> >> >> >> > Sol Gary S. Terhune - 22 May 2008 16:05 GMT Only problem would be if there is a ~ (tilde) in any of the URLs. Seems to me I've seen them, but perhaps not?
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> This will work. Create the document as a txt file and use Open With to > open it in WORD. Use Find and Replace as indicated: [quoted text clipped - 116 lines] >>> >> >> >> > found. >>> >> >> >> > Sol Jeff Richards - 22 May 2008 21:55 GMT Any character that doesn't appear elsewhere in the file will work - the vertical bar is also a good choice. If there's a difficulty, use an unlikely combination of unusual characters instead of a single character.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
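Choosing a marker that does not already occur in the file can be checked mechanically. The following is a small Python sketch, not part of the original advice; the function name `pick_marker` and the candidate list are made up for illustration.

```python
def pick_marker(text, candidates=("~", "|", "^", "~|~", "@@##@@")):
    """Return the first candidate marker that does not occur in text."""
    for marker in candidates:
        if marker not in text:
            return marker
    raise ValueError("every candidate marker already appears in the text")


# A URL containing a tilde rules out "~", so the vertical bar is chosen.
print(pick_marker('Value="http://www.adc.com/~shop/b3_1_424_1.JPG"/>'))
```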
> Only problem would be if there is a ~ (tilde) in any of the URLs. Seems to > me I've seen them, but perhaps not? S1L1Y1 - 22 May 2008 19:48 GMT Bravo! Bravo! Bravo! YOU ARE GREAT. We are starting to get there. Now I have a problem that it does not sort in the original order. When I copy and paste it to my Excel file it has to match to the other columns. Sol
> This will work. Create the document as a txt file and use Open With to open > it in WORD. Use Find and Replace as indicated: [quoted text clipped - 114 lines] > >> >> >> >> > found. > >> >> >> >> > Sol Jeff Richards - 22 May 2008 21:52 GMT That's why it is so important that you provide a FULL description of what you are trying to do BEFORE we start to tackle your problem. You haven't explained why you need to use EXCEL, which is very limited for this type of exercise.
After the replace and just before the sort, cut and paste the whole thing into a blank EXCEL sheet. Add a column to the right with an incrementing number (using a formula), copy the column, and paste special as values back to the same location as the original column. Then sort in EXCEL, delete the rows you don't need, and re-sort by the extra column to get back to the original order. Delete the extra column. I don't know whether you can then finish it in EXCEL (probably) or whether you need to export as text and finish in WORD, but the process is essentially the same.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
> Bravo! Bravo! Bravo! YOU ARE GREAT. > We are starting to get there. Now I have a problem that it does not sort [quoted text clipped - 130 lines] >> >> >> >> >> > found. >> >> >> >> >> > Sol S1L1Y1 - 26 May 2008 20:42 GMT Can I undo sort after I delete or cut / Sol
> That's why it is so important that you provide a FULL description of what > you are trying to do BEFORE we start to tackle your problem. You haven't [quoted text clipped - 36 lines] > >> > <ItemField Name="price" Value="13.27"/> > >> > <ItemField Name="product-url" Value="http://www.adc.com/servlet/the-424/JEWELRY-BOX-GIFTWARE/Detail"/>
> >> > <ItemField Name="merchant-site-category" Value="JEWELRY BOXES"/> > >> > <ItemField Name="image-url" [quoted text clipped - 102 lines] > >> >> >> >> >> > found. > >> >> >> >> >> > Sol Jeff Richards - 26 May 2008 21:47 GMT Yes - if you create a column of incrementing values BEFORE doing the sort then use that column to sort on AFTER doing the delete, like I said..
 Signature Jeff Richards MS MVP (Windows - Shell/User)
> Can I undo sort after I delete or cut / > Sol S1L1Y1 - 26 May 2008 23:26 GMT I am very not knowledgeable so I would appreciate if you can explain. Sol
> Yes - if you create a column of incrementing values BEFORE doing the sort > then use that column to sort on AFTER doing the delete, like I said.. > > Can I undo sort after I delete or cut / > > Sol Jeff Richards - 27 May 2008 10:41 GMT Why are you using EXCEL for this job? It's not a good tool for this sort of task, and now you say that you are not very knowledgeable about it. You seem determined to make this job as difficult as possible.
You create a column of incrementing numbers by putting a 1 in the first cell and then a formula in the cell below it that adds one to the value from the cell above. Then copy down from the second cell to the bottom of the range.
You convert the column from formulas to values by copying the column then using paste special and selecting values and pasting it back to where it came from.
You sort by selecting the whole spreadsheet, choosing sort, and nominating the column containing the text data.
Then delete the lines you aren't interested in - they will be in blocks before and after the lines you need.
Sort back into the original sequence by selecting the whole sheet, selecting sort, and nominating the column with the numbers in it.
These are elementary EXCEL questions that are best asked in an EXCEL newsgroup.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
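The number-the-rows trick described above (add a counter column, sort by text, delete the unwanted block, sort back by the counter) can be sketched in a few lines of Python. This is only an illustration of the technique; the function name and the sample rows are invented, and the "~" marker is the one used earlier in the thread.

```python
def keep_marked_in_original_order(lines, marker="~"):
    # Step 1: number each row, like the extra column of incrementing values.
    numbered = list(enumerate(lines, start=1))
    # Step 2: sort by the text, which groups the marked rows together.
    by_text = sorted(numbered, key=lambda row: row[1])
    # Step 3: delete the rows that do not start with the marker.
    kept = [row for row in by_text if row[1].startswith(marker)]
    # Step 4: sort by the row number to restore the original order,
    # then drop the number column.
    kept.sort(key=lambda row: row[0])
    return [text for _, text in kept]


rows = [
    '<ItemField Name="image-url" ',
    "~://www.adc.com/catalog/z.JPG",
    '<ItemField Name="upc" Value=""/>',
    "~://www.adc.com/catalog/a.JPG",
]
print(keep_marked_in_original_order(rows))
```

In a script the sort is not strictly needed, since the rows could be filtered directly; it is kept here only to mirror the spreadsheet steps one for one.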
>I am very not knowledgeable so I would appreciate if you can explain. > Sol >> Yes - if you create a column of incrementing values BEFORE doing the sort >> then use that column to sort on AFTER doing the delete, like I said.. >> > Can I undo sort after I delete or cut / >> > Sol Gary S. Terhune - 23 May 2008 00:08 GMT You are close to the most frustrating person to land in this NG in a long time!
WHAT DO YOU WANT TO DO? Not the intermediate steps you keep asking about, like extracting and sorting. What is your FINAL GOAL? What does your Excel file look like?
Here's my email address: gryst_at_grystmill.com (please tell me you know how to make that a real email address.) Please send me as many XML files as you have for samples, and your Excel file(s), too. Then I can put it on my website for others to download and review, also.
If you haven't totally pissed off everyone, you may yet get an answer. A search of Google Groups shows that you have indeed gotten LOTS of attempts to help you, in this group and in the Office.Misc group (and why not Excel?) Oh, and now a couple of weak attempts in the Excel group where you haven't given them any real info to work with either. (If you give me the files to post, even the Excel guys & gals can access them.)
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> Bravo! Bravo! Bravo! YOU ARE GREAT. > We are starting to get there. Now I have a problem that it does not sort [quoted text clipped - 130 lines] >> >> >> >> >> > found. >> >> >> >> >> > Sol S1L1Y1 - 26 May 2008 18:23 GMT Jeff, If you can help me with one more thing I will very much appreciate I asked already this questions and got replies but I don't understand them. I have an Excel files with a column of only the end of the urls. I want to add to each one the same beginning of the url, www.adc.com/. How do I do it without having to go to each individual and paste? Sol
> This will work. Create the document as a txt file and use Open With to open > it in WORD. Use Find and Replace as indicated: [quoted text clipped - 114 lines] > >> >> >> >> > found. > >> >> >> >> > Sol Jeff Richards - 26 May 2008 21:51 GMT You already got an answer to this one.
Create a formula in an adjacent column that uses the CONCATENATE function to prepend a fixed text string to the beginning of the text in the first column.
Put the formula in the first cell of the column, test it, then copy down to automatically put it in each other cell. Copy and paste special (Values) to turn it from a formula into a value.
These questions are best asked in an EXCEL group - they have nothing to do with W98.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
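The same prepend-a-fixed-string idea, outside Excel, looks like this in Python. It is a sketch only: the function name is invented, the prefix is the one Sol mentions, and the two sample endings are made up.

```python
def add_prefix(endings, prefix="www.adc.com/"):
    """Prepend the same fixed text to every entry in a column of URL endings."""
    return [prefix + ending for ending in endings]


# Hypothetical column contents, for illustration only.
column = ["catalog/b3_1_424_1.JPG", "catalog/b3_1_425_1.JPG"]
for full in add_prefix(column):
    print(full)
```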
> Jeff, > If you can help me with one more thing I will very much appreciate I asked [quoted text clipped - 4 lines] > without having to go to each individual and paste? > Sol Franc Zabkar - 22 May 2008 21:19 GMT ><ItemField Name="price" Value="13.27"/> ><ItemField Name="product-url" [quoted text clipped - 10 lines] ><ItemField Name="shipping-price" Value="5.00"/> ><ItemField Name="shipping-weight" Value="1.2"/> If I understand the problem correctly, then the following one-line command may be close to what you want:
find /i "<ItemField Name=""image-url"" Value=" "filename.xml" | find /i "http://" > your_path_name\urls.txt
Execute the command in a DOS window (watch out for word wrap). The URLs are written to a file named your_path_name\urls.txt. "Filename.xml" is the name of your XML file. I'm assuming that "ItemField Name" and "Value" appear on the same line. An actual example XML file would have helped.
- Franc Zabkar
 Signature Please remove one 'i' from my address when replying by email.
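For comparison, a two-stage, case-insensitive line filter in the spirit of the piped find command can be sketched in Python. The helper name `find_lines` and the sample lines are assumptions for illustration. Like the command, it returns whole matching lines rather than bare addresses, and it assumes Name and Value sit on the same line.

```python
def find_lines(lines, needle):
    """Keep the lines containing needle, ignoring case (like find /i)."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]


sample_lines = [
    '<ItemField Name="product-url" Value="http://www.adc.com/servlet/the-424/Detail"/>',
    '<ItemField Name="image-url" Value="http://www.adc.com/catalog/b3_1_424_1.JPG"/>',
    '<ItemField Name="image-url" Value=""/>',
]

# First filter on the field name, then on the presence of an address.
stage_one = find_lines(sample_lines, '<ItemField Name="image-url" Value=')
matches = find_lines(stage_one, "http://")
for line in matches:
    print(line)
```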
Gary S. Terhune - 20 May 2008 02:08 GMT In case you can't see Franc's post, he suggests you put a copy online (not posted here to the NG, put it on a private website.) Put a link here. Then we can see just what you're talking about and play with the file ourselves.
If you want the result to be an Excel file, that suggests that it would be simplest to write a macro, if a macro can do the job.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
>I will have to go one by one and will never finish. I tried I opened Excel > and then imported the data from the xml file and then went to find all and [quoted text clipped - 18 lines] >> > found. >> > Sol S1L1Y1 - 20 May 2008 23:13 GMT How do I write a macro and what does it do? Sol
> In case you can't see Franc's post, he suggests you put a copy online (not > posted here to the NG, put it on a private website.) Put a link here. Then [quoted text clipped - 25 lines] > >> > found. > >> > Sol Gary S. Terhune - 21 May 2008 01:21 GMT A macro is a program, so you have to learn to write in the application's macro language, which is some form of VBA, Visual Basic for Applications. Like VBScript, VBA is also fairly easy to learn. Look in your Office Help or Google it.
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
> How do I write a macro and what does it do? > Sol [quoted text clipped - 37 lines] >> >> > found. >> >> > Sol Jeff Richards - 20 May 2008 10:22 GMT I'm surprised that you are using EXCEL as your text editor.
Using WORD is much easier (although there are specialised text editors that would make it a snap). Use Find and Replace (with judicious use of the special characters, such as paragraph mark or 'any letter' etc.) to make the text you require stand out, e.g. bounded by a tilde (~) or some other character that doesn't otherwise appear. Get this text onto a line by itself, probably by using Replace to insert a paragraph mark at the appropriate point. Then sort. Then delete everything you don't want (it will be in contiguous lines). Then remove any characters you added.
 Signature Jeff Richards MS MVP (Windows - Shell/User)
>I will have to go one by one and will never finish. I tried I opened Excel > and then imported the data from the xml file and then went to find all and [quoted text clipped - 18 lines] >> > found. >> > Sol S1L1Y1 - 20 May 2008 23:43 GMT I was able to open it with word. Now I am trying to find all that start with www.adc.com/, but the end is not the same. I am sending a sample; ="http://www.abc.com/catalog/b3_1_424_1.JPG" . What do I enter in the find box? Sol
> I'm surprised that you are using EXCEL as your text editor. > [quoted text clipped - 28 lines] > >> > found. > >> > Sol Gary S. Terhune - 22 May 2008 05:00 GMT Here's a script I wrote to do the job. You're welcome to try it out. I'm putting it on my website inside a ZIP file for easier downloading. You should create a folder just for this script. When you want to analyze an XML file, put it into the same folder with the script, then run the script.
I am posting the contents here for review, but it's better to download the file. DO NOT try to copy the script from this post for use unless you know VBS and can fix the broken lines due to wrapping. If you want to put the results file into an Excel sheet, change the extension from TXT to CSV. Or paste the text into Word for sorting. I did not concern myself with removing any possible duplications or sorting. Yet. http://grystmill.com/shared/FindURL.zip
Feedback is welcome. To a certain degree, <s>.
****************************************
Option Explicit

Dim WshShell, fso, f, fl, fn, ln, s, r, q, outName

Set fso = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
f = WshShell.CurrentDirectory

'Obtain name of file to be analyzed. Keep asking until an existing file is named.
Do
    fn = InputBox("Please input the name of the file you wish to analyze. Example: MyFirstXML.xml " & _
        " Note that the file must be in the same folder as this script -- " & f & ".")
    If fn <> "" Then
        If fso.FileExists(f & "\" & fn) Then Exit Do
    End If
    If MsgBox("Input is invalid. Press OK to try again, or Cancel to stop this script from running.", _
        1, "FindURL.vbs -- Error!") = 2 Then WScript.Quit
Loop

'Name the results file after the input file (assumes a three-letter extension such as xml).
outName = f & "\URLs_" & Left(fn, Len(fn) - 3) & "txt"
If Not fso.FileExists(outName) Then
    Set r = fso.CreateTextFile(outName)
Else
    q = MsgBox("A file named " & outName & _
        ", already exists. Press YES to overwrite the old file" & _
        ", or NO to append new URLs to the existing results file." & _
        " Press Cancel to stop the analysis.", 3 + 48 + 0)
    If q = 2 Then WScript.Quit
    If q = 6 Then Set r = fso.CreateTextFile(outName, True)
    If q = 7 Then Set r = fso.OpenTextFile(outName, 8)
End If

'Copy each address, from http:// to the end of the line minus the last three characters.
'This assumes every matching line ends with the three characters "/> as in the posted sample.
Set fl = fso.OpenTextFile(f & "\" & fn, 1)
Do While fl.AtEndOfStream <> True
    ln = fl.ReadLine
    s = InStr(ln, "http://")
    If s > 0 Then
        ln = Right(ln, Len(ln) - s + 1)
        ln = Left(ln, Len(ln) - 3)
        r.WriteLine(ln)
    End If
Loop
fl.Close
r.Close

WScript.Quit
***********************************************
 Signature Gary S. Terhune MS-MVP Shell/User www.grystmill.com
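An alternative to cutting lines by character position is to let an XML parser read the attributes. The Python sketch below is not from the thread and rests on assumptions: the wrapper elements `Items` and `Item` are hypothetical (the posted sample shows only the ItemField lines), the file is assumed to be well-formed XML, and the function names are invented. It keeps only image-url values and writes one address per row as CSV text, which Excel can open.

```python
import csv
import io
import xml.etree.ElementTree as ET


def image_urls(xml_text):
    """Return the Value of every ItemField whose Name is image-url."""
    root = ET.fromstring(xml_text)
    return [
        field.get("Value")
        for field in root.iter("ItemField")
        if field.get("Name") == "image-url" and field.get("Value")
    ]


def to_csv(urls):
    """Render the addresses as CSV text, one address per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


# Hypothetical wrapper elements around the sample lines from the thread.
sample_xml = """<Items>
  <Item>
    <ItemField Name="price" Value="13.27"/>
    <ItemField Name="product-url" Value="http://www.adc.com/servlet/the-424/JEWELRY-BOX-GIFTWARE/Detail"/>
    <ItemField Name="image-url" Value="http://www.adc.com/catalog/b3_1_424_1.JPG"/>
    <ItemField Name="upc" Value=""/>
  </Item>
</Items>"""

print(to_csv(image_urls(sample_xml)), end="")
```

Because the parser walks the document in order, the addresses come out in the same sequence as the items in the file, so they line up with other columns extracted the same way.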
> This question must probably does not belong here but I tried different > groups but got no answer. [quoted text clipped - 6 lines] > found. > Sol