[OH-Dev] Bugimporters, spiders and scrapy. Support for Tigris.org tracker XML format...
Dirk Bächle
tshortik at gmx.de
Tue Apr 29 19:44:32 UTC 2014
Hi there,
I'm a developer for the SCons project (www.scons.org), and we're
currently setting up our project page at OpenHatch. First of all, thanks
a lot to you all for providing this very cool service! We're really
looking forward to being a part of this...
We'd like to integrate our bug tracker with our page, but it's hosted at
tigris.org (via CollabNet)...which doesn't seem to be in the list of
supported trackers yet.
So, I decided to take on the challenge, opened a corresponding issue in
your tracker (#972)...and foolishly assigned it to myself. ;)
I started the implementation, but would now like some feedback and
advice on how to proceed with the following problem: the bug IDs at
tigris.org simply range from 1 to some maximum, and there doesn't seem
to be a request in their XML API (it's just a single CGI with very
limited capabilities) for finding out that maximum with a single call.
So I'd have to probe issue IDs...starting at 1, in steps of 1024
perhaps, until I get a "not found" response and thereby an upper bound
on the highest ID.
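To make the probing idea concrete, here is a minimal sketch in plain Python. The `issue_exists()` callable is a hypothetical stand-in for an HTTP probe against the tracker's CGI (it would return True if the ID yields issue data); the name and the step size are just my assumptions, not anything from the tigris.org API:

```python
def find_max_issue_id(issue_exists, step=1024):
    """Find the highest existing issue ID by probing.

    `issue_exists` is a stand-in for an HTTP request to the tracker
    (returns True if the given ID resolves to real issue data).
    """
    # Phase 1: step upward until we overshoot the last issue.
    low = 0          # highest ID known to exist (0 = none found yet)
    high = step      # first candidate for a non-existing ID
    while issue_exists(high):
        low = high
        high += step
    # Phase 2: binary search between the last hit and the first miss.
    while high - low > 1:
        mid = (low + high) // 2
        if issue_exists(mid):
            low = mid
        else:
            high = mid
    return low
```

With step=1024 this needs roughly max/1024 probes for the stepping phase plus about ten more for the binary search, so even a large tracker should stay at a few dozen requests.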
Would this be "okay" for a first shot, or should this approach be
discarded for performance reasons (or because it's plain stupid :) )?
Or is there a special scrapy spider "daisy chain mode" for the above
case, where I can process each request URL sequentially and then either
return the URL for the next issue (current ID + 1, if the current issue
contained data) or stop when the current ID couldn't be found?
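For illustration, the control flow I mean looks like this as a plain Python generator (no scrapy involved; `fetch_issue()` is a hypothetical stand-in for the HTTP request, returning the issue data or None for "not found"). My understanding is that in a scrapy spider the same chain could be built by yielding the Request for ID+1 from the parse callback of the current response, but I'd appreciate confirmation:

```python
def crawl_sequentially(fetch_issue, start_id=1):
    """Yield (id, data) pairs one ID at a time, stopping at the first miss.

    `fetch_issue` is a hypothetical stand-in for an HTTP request to
    the tracker; it returns the issue data, or None for "not found".
    """
    issue_id = start_id
    while True:
        data = fetch_issue(issue_id)
        if data is None:
            break              # first missing ID: stop the chain
        yield issue_id, data
        issue_id += 1          # "daisy chain": next request follows this one
```

The point is that each request is issued only after the previous one succeeded, so the crawl terminates by itself at the first gap.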
I checked the sources of the other bugimporters, and the documentation
of scrapy...but couldn't find any hints about this so far.
Oh, and another question: I can download the data of each single issue
either with or without attachments. Which is preferred, or should I try
to add a separate Bugimporter for each case?
Thanks a lot in advance for any pointers or comments.
Best regards,
Dirk Baechle