[OH-Dev] Bugimporters, spiders and scrapy. Support for Tigris.org tracker XML format...
Dirk Bächle
tshortik at gmx.de
Tue Apr 29 19:44:32 UTC 2014
Hi there,
I'm a developer for the SCons project (www.scons.org), and we're
currently setting up our project page at OpenHatch. First of all, thanks
a lot to you all for providing this very cool service! We're really
looking forward to being a part of this...
We'd like to integrate our bug tracker with our page, but it's hosted at
tigris.org (via CollabNet)...which doesn't seem to be in the list of
supported trackers yet.
So, I decided to take on the challenge, opened a corresponding issue in
your tracker (#972)...and foolishly assigned it to myself. ;)
I started the implementation, but would now like some feedback and
advice on how to proceed with the following problem: the bug IDs at
tigris.org simply range from 1 to some maximum, and there doesn't seem
to be a request in their XML API (it's just a single CGI with very
limited capabilities) for finding out that maximum with a single call.
So I'd have to probe issue IDs...starting at 1, in steps of 1024
perhaps, until I get a "not found" response and thereby an upper bound
on the highest ID.
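To make the probing idea concrete, here is a minimal sketch in plain Python. The `issue_exists()` callable is a hypothetical stand-in for an HTTP probe against the tracker's CGI (it would return True if the ID yields issue data); the name and the step size are just my assumptions, not anything from the tigris.org API:

```python
def find_max_issue_id(issue_exists, step=1024):
    """Find the highest existing issue ID by probing.

    `issue_exists` is a stand-in for an HTTP request to the tracker
    (returns True if the given ID resolves to real issue data).
    """
    # Phase 1: step upward until we overshoot the last issue.
    low = 0          # highest ID known to exist (0 = none found yet)
    high = step      # first candidate for a non-existing ID
    while issue_exists(high):
        low = high
        high += step
    # Phase 2: binary search between the last hit and the first miss.
    while high - low > 1:
        mid = (low + high) // 2
        if issue_exists(mid):
            low = mid
        else:
            high = mid
    return low
```

With step=1024 this needs roughly max/1024 probes for the stepping phase plus about ten more for the binary search, so even a large tracker should stay at a few dozen requests.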
Would this be "okay" for a first shot, or should this approach be
discarded for performance reasons (or because it's plain stupid :) )?
Or is there a special scrapy spider "daisy chain mode" for the above
case, where I can process each request URL sequentially and then either
return the URL for the next issue (current ID + 1, if the current issue
contained data) or stop when the current ID couldn't be found?
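For illustration, the control flow I mean looks like this as a plain Python generator (no scrapy involved; `fetch_issue()` is a hypothetical stand-in for the HTTP request, returning the issue data or None for "not found"). My understanding is that in a scrapy spider the same chain could be built by yielding the Request for ID+1 from the parse callback of the current response, but I'd appreciate confirmation:

```python
def crawl_sequentially(fetch_issue, start_id=1):
    """Yield (id, data) pairs one ID at a time, stopping at the first miss.

    `fetch_issue` is a hypothetical stand-in for an HTTP request to
    the tracker; it returns the issue data, or None for "not found".
    """
    issue_id = start_id
    while True:
        data = fetch_issue(issue_id)
        if data is None:
            break              # first missing ID: stop the chain
        yield issue_id, data
        issue_id += 1          # "daisy chain": next request follows this one
```

The point is that each request is issued only after the previous one succeeded, so the crawl terminates by itself at the first gap.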
I checked the sources of the other bugimporters, and the documentation
of scrapy...but couldn't find any hints about this so far.
Oh, and another question: I can download the data of each single issue
either with or without attachments. Which is preferred, or should I try
to add a separate Bugimporter for each case?
Thanks a lot in advance for any pointers or comments.
Best regards,
Dirk Baechle