
I want to clear the entire content of the <loot> </loot> elements in XML files in a directory tree. I am using Strawberry Perl for Windows, 64-bit.

For example this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3"/>
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>

The changed file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
</loot>

I have this code:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
   $twig -> flush;
}

my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                              twig_handlers => { 'loot' => \&delete_loot } ); 

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( 'C:\Users\PIO\Documents\serv\monsters' ) ) {

    print "Processing $file\n";
    $twig -> parsefile_inplace($file); 
}

But it correctly edits only the first file it encounters and leaves the rest empty (0 KB files).

Can you add another file where it's not working to the question please? You can edit the question to do that. – simbabque Jan 2 at 12:05
    
All the files are correct, but the script works only on the first one it encounters and leaves the rest empty (no matter which XML file comes first, only that one is edited correctly). – Piodo Jan 2 at 22:51
    
The obvious test there would be - move the my $twig declaration inside the loop. – Sobrique Jan 3 at 9:02
    
Also: Your XML isn't valid. That's possibly not helping. – Sobrique Jan 3 at 9:06

The XML::Twig doc says that "Multiple twigs are not well supported".

If you look at the state of the twig object (using Data::Dumper, for example), you see a marked difference between the first and subsequent runs. It looks like the object considers that it has already been totally flushed (which is true, as there was a complete flush during the first run). It probably has nothing more to print for the subsequent files, so each of those files ends up empty.

Recreating the twig object at each loop worked for me:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
}

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( '/home/dabi/tmp' ) ) {

    print "Processing $file\n";
    my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                                  twig_handlers => { loot => \&delete_loot, } ); 
    $twig -> parsefile($file); 
    $twig -> print_to_file($file);
}

Also, I had to change the XML file structure to have it processed:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
<health value="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3">
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>
</monster>
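
Since a file that is not well-formed makes the parser die, it may be worth checking each file before editing it. The following is only a minimal sketch: is_well_formed is a hypothetical helper name, the sample file names and contents are made up for the demo, and it relies on XML::Twig's safe_parsefile, which wraps parsing in an eval and returns a false value on failure.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempdir);

# Hypothetical helper: true if $file parses as well-formed XML.
sub is_well_formed {
    my ($file) = @_;
    my $twig = XML::Twig->new;
    # safe_parsefile returns the twig on success and 0 on failure ($@ holds the error)
    return $twig->safe_parsefile($file) ? 1 : 0;
}

# Demo on two throwaway files (names and contents are illustrative only)
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'good.xml' => '<monster name="Dragon"><health value="10000"/></monster>',
    'bad.xml'  => '<monster name="Dragon"/><health="10000"/>',
);

for my $name ( sort keys %sample ) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $sample{$name};
    close $fh;
    printf "%s: %s\n", $name, is_well_formed($path) ? 'well-formed' : 'skipped';
}
```

In the real loop, a file for which the helper returns false could simply be reported and skipped instead of aborting the whole run.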
The script works on every file, correctly clearing the loot; I think we have a winner here. Unfortunately, about 10% of the XML files don't contain a <loot> element. When the script processes a monster file that has no <loot> node, it empties the file (0 KB). Can a condition be added so that a file without loot elements is not modified, or at least not blanked? (Inserting an empty <loot></loot> would be fine too.) – Piodo Jan 5 at 13:22

Indeed. It is because you use flush() while parsing. The doc explains it: "Flushes a twig up to (and including) the current element, then deletes all unnecessary elements from the tree that's kept in memory." Since files without a loot element never trigger your twig handler, flush() is never called for them, so nothing is written out. I edited my solution to print the whole tree once the parsing is done. Please let me know if you agree with this solution. – David Verdin Jan 5 at 15:42
    
Thank you, it's great. I will award my bounty to you as soon as I can (after 6 hours). – Piodo Jan 5 at 17:28
Wow, thanks. My first bounty. Yay! – David Verdin Jan 5 at 18:49
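
For the case raised in the comments above, where a file has no <loot> element and should be left alone, one option is to count handler calls and write the file back only when something was found. This is a sketch under assumptions rather than part of the accepted code: clear_loot_in_file is a hypothetical helper name and the sample files are made up. It uses parsefile and print_to_file, as in the answer.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempdir);

# Hypothetical helper: empties every <loot> in $file and rewrites the file
# only if at least one <loot> was seen. Returns the number of <loot> elements.
sub clear_loot_in_file {
    my ($file) = @_;
    my $found = 0;
    my $twig  = XML::Twig->new(          # a fresh twig per file
        pretty_print  => 'indented',
        twig_handlers => {
            loot => sub {
                my ( $t, $loot ) = @_;
                $_->delete for $loot->children;
                $found++;
                return 1;
            },
        },
    );
    $twig->parsefile($file);
    $twig->print_to_file($file) if $found;   # files without loot stay untouched
    return $found;
}

# Demo on two throwaway files (names and contents are illustrative only)
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'with_loot.xml' => '<monster name="Dragon"><loot><item id="1"/></loot></monster>',
    'no_loot.xml'   => '<monster name="Rat"><health value="20"/></monster>',
);

for my $name ( sort keys %sample ) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $sample{$name};
    close $fh;
    my $count = clear_loot_in_file($path);
    print "$name: ", ( $count ? 'rewritten' : 'left untouched' ), "\n";
}
```

Skipping the write also avoids reformatting files that needed no change.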

Note   With flush changed to print, the code in the question works for me (with valid XML files). However, I still recommend either of the versions below, as XML::Twig just does not support multiple files well. Tested with two groups of valid XML files.


When XML::Twig->new(...) is called once and the files are then looped over and processed, I get the same behavior as in the question: the first file is processed correctly and the others are completely blanked.

The reason may have something to do with new being a class method. However, I don't see why this should affect the handling of multiple files. The callback is installed outside of the loop, but I've tested re-installing it for each file and that doesn't help.

Finally, flushing isn't needed, and it clearly hurts here by clearing the state (which was created by the class method new). This doesn't affect the code below, but flush is still replaced by print there.

So just do everything in the loop. A simple version:

use strict;
use warnings;
use File::Find::Rule;
use XML::Twig;

my @files = File::Find::Rule->file->name('*.xml')->in('...');

foreach my $file (@files)
{
    print "Processing $file\n";
    my $t = XML::Twig->new( 
        pretty_print => 'indented', 
        twig_handlers => { loot => \&clear_elt },
    );
    $t->parsefile_inplace($file)->print;
}

sub clear_elt {
    my ($t, $elt) = @_; 
    my $elt_name = $elt->name;                # get the name
    my $parent = $elt->parent;                # fetch the parent
    $elt->delete;                             # remove altogether
    $parent->insert_new_elt($elt_name, '');   # add it back empty
}

The callback is simplified: it removes the element altogether and then adds it back, empty. Note that the sub does not need the element name hardcoded, so it can be used as it stands to empty any element.
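
One caveat with delete-and-reinsert: as far as I can tell from the XML::Twig documentation, insert_new_elt without a position argument pastes the new element as the first child of its parent, so the emptied element may end up in a different place than the original. If the position matters, a variant that keeps the element and detaches only its children might look like the sketch below. The name empty_elt and the sample XML are illustrative; cut_children and sprint are existing XML::Twig methods.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical alternative callback: keep the element where it is and
# detach all of its children instead of deleting and re-creating it.
sub empty_elt {
    my ( $twig, $elt ) = @_;
    $elt->cut_children;
    return 1;
}

# Illustrative input, with <loot> deliberately placed between other elements
my $xml = <<'XML';
<monster name="Dragon">
  <health value="10000"/>
  <loot>
    <item id="1"/>
    <item id="3"><inside><item id="6"/></inside></item>
  </loot>
  <immunities><immunity fire="1"/></immunities>
</monster>
XML

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { loot => \&empty_elt },
);
$twig->parse($xml);

my $result = $twig->sprint;   # whole document as a string
print $result;
```

Like clear_elt, this sub has no element name hardcoded, so the same handler can be registered for other tags.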

We can avoid calling new in the loop by using another class method, nparse:

my $t = XML::Twig->new( pretty_print => 'indented' );

foreach my $file (@files) 
{
    print "Processing $file\n";
    my $tobj = XML::Twig->nparse( 
        twig_handlers => { loot => \&clear_elt }, 
        $file
     );
     $tobj->parsefile_inplace($file)->print;
}

# the sub clear_elt() same as above

We do have to call the new constructor first, even though its object isn't used directly in the loop.


Note that calling new before the loop without twig_handlers and then setting the handlers inside the loop with

$t->setTwigHandlers(loot => sub { ... });

does not help. We still only get the first file processed correctly.

Thanks for the response. Unfortunately, those scripts empty all the files (every file, even the first one). – Piodo Jan 5 at 12:39

@Piodo The XML file you show is invalid and the shown code doesn't work for it, so you probably use files different than shown. I corrected it and tested with that, and I made up two more groups of XML files and tested with those as well. The code as shown works, both versions. It also works with your sub for clearing loot nodes. I added a different way just so, since it is far simpler computationally. – zdim Jan 5 at 19:47
@Piodo I also replaced flush with print. That could be causing you problems (it doesn't for either version here, but it does clear the object). – zdim Jan 5 at 22:36
@Piodo Confirmed -- when I change flush to print your code works (for me, and with valid XML files). I updated the answer with this. However, I still recommend doing everything in the loop, since XML::Twig just does not support multiple files well. – zdim Jan 5 at 22:46
