Topic: I need the functionality of a javascript capable form of wge
Wed 12/29/10 01:07 AM
I need help. I want to take the scheduling information provided by several different websites and put it into a Google Calendar.

I've played with the Google Calendar APIs (in Python), and once you've parsed your data it seems pretty easy to insert it into Google Calendar.

Websites that present their data in straight HTML are easy to work with: I send an HTTP request from Python and parse the response for the calendar data.
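For the straight-HTML case, the parsing step can be sketched with nothing but the Python standard library. This is only an illustration: the table layout, the SAMPLE_HTML string, and the names TableCellParser and parse_schedule are made up for the example and do not come from any of the sites in question.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with no <td> cells (e.g. header rows) are skipped
                self.rows.append(self._row)
            self._row = None


def parse_schedule(html):
    """Return the <td> text of each table row in the given HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content, standing in for a real schedule page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Time</th><th>Event</th></tr>
  <tr><td>2011-01-05</td><td>19:00</td><td>Practice</td></tr>
  <tr><td>2011-01-08</td><td>10:30</td><td>Game vs. Tigers</td></tr>
</table>
"""

for date, time, event in parse_schedule(SAMPLE_HTML):
    print(date, time, event)
```

For a live page, the string passed to parse_schedule would come from something like urllib.request.urlopen(url).read().decode("utf-8"), with the decoding adjusted to whatever the site actually serves.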

I'm getting stuck on websites that use Ajax. The tools I know how to use (for command-line automation of HTTP requests) are Python, wget, and, if absolutely necessary, lynx. None of these handle JavaScript, and so far I've been unable to force the server to give me the scheduling data with them. I figure that if I study the website code, I might find the XMLHttpRequest call (or similar) that the site uses to request its own data, and possibly make that same request from my machine (though cross-site security protections may prevent that...).
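A rough sketch of replaying such a request from Python follows. The same-origin policy is enforced by browsers on scripts running inside a page, so it does not by itself block a request sent from a standalone script; what a server may still check are cookies, the Referer header, or per-session tokens. The URL, the parameter names, and the assumption that the endpoint takes a form-encoded POST and answers with JSON are all hypothetical and would have to be replaced by whatever the site's own JavaScript actually sends.

```python
import json
import urllib.parse
import urllib.request


def build_xhr_request(url, params):
    """Build a form-encoded POST that carries the header Ajax libraries usually add."""
    body = urllib.parse.urlencode(params).encode("ascii")
    headers = {
        # Many Ajax libraries send this header; some servers look for it.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    # Supplying data= makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body, headers=headers)


def fetch_json(request, timeout=30):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, for illustration only.
request = build_xhr_request(
    "http://example.com/schedule/data",
    {"start": "2011-01-01", "end": "2011-01-31"},
)
print(request.get_method(), request.full_url, request.data)
```

Calling fetch_json(request) would then perform the network round trip. If the site answers with HTML or XML rather than JSON, the decoding step needs to change accordingly.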

Another approach might be to call a full-fledged (JavaScript-enabled) web browser and instruct it to download the data I want, save it to a file, and immediately exit, preferably without even opening a window. I've been googling for command-line options for Chrome and Firefox (since "chrome /?", "chrome /h", "chrome --help", etc. don't tell me anything), but I can't figure out how to tell either browser to:

(a) launch (without opening a window), (b) request a url, (c) process the javascript as if it were simply rendering the page, (d) save the page to a file, then (e) quit

all at once.
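Steps (a) through (e) can be sketched as a single browser invocation, assuming a Chrome or Chromium build recent enough to support the --headless and --dump-dom switches. With those switches the browser loads the page without a window, runs its JavaScript, prints the resulting DOM to standard output, and exits. The executable name "google-chrome" is an assumption and differs between systems (for example "chromium" or a full path), and the function names are made up for the example.

```python
import subprocess


def build_dump_command(url, browser="google-chrome"):
    """Return the argument list for a headless 'render and print the DOM' run."""
    return [browser, "--headless", "--dump-dom", url]


def save_rendered_page(url, out_path, browser="google-chrome", timeout=60):
    """Render url in a headless browser and write the resulting HTML to out_path."""
    result = subprocess.run(
        build_dump_command(url, browser),
        capture_output=True,  # the rendered DOM arrives on stdout
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError if the browser fails
    )
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return len(result.stdout)


print(build_dump_command("http://example.com/schedule"))
```

A call such as save_rendered_page("http://example.com/schedule", "schedule.html") would then leave the rendered page in a file. Data that the page loads some time after the initial render may still be missing from the dump.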

Basically, I want the functionality of a JavaScript-enabled wget, for one page (not mirroring), but can't seem to make it happen.


Any ideas? Alternatives? Intimate knowledge of non-gui use of browsers? Has anyone done this before? Tips? Tools that might help me which I've not mentioned here?

Help?


Wed 12/29/10 01:14 AM
In case my way of expressing myself is unclear, this person was asking a very similar question....

http://www.linuxquestions.org/questions/linux-software-2/wget-replacement-that-understands-javascript-121396/

Wed 12/29/10 01:22 AM
Grr... I wasn't able to find anything useful on google the last few times I looked, but after posting this question I finally found the following, which is somewhat helpful. HtmlUnit, maybe.

http://stackoverflow.com/questions/1106268/command-line-url-fetch-with-javascript-capabliity

Still interested in hearing tips from anyone who's already done this exact thing...

Thu 12/30/10 05:15 PM
Well, I'm still stuck. I don't want to go the 'automate a GUI' route; it seems messy, and I can't scp that code to any ol' remote server for execution.

I tried using Firefox's liveheaders to see which URLs were called, with which POST data and which cookies set, and faked it as best I could using wget, only to get the same stub of a main page returned, without the data.
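One thing a hand-built wget command can easily miss is session state: cookies set when the main page loads may have to accompany the later data request. Below is a minimal sketch of keeping those cookies across two requests with the Python standard library. The function names, the two URLs, and the Referer header are assumptions made for illustration; nothing here is known to be what the site in question requires.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_session(page_url, data_url, post_fields, timeout=30):
    """Load page_url to collect cookies, then POST post_fields to data_url."""
    opener, jar = make_session()
    # First request: only needed for the cookies the server sets.
    opener.open(page_url, timeout=timeout).close()
    body = urllib.parse.urlencode(post_fields).encode("ascii")
    request = urllib.request.Request(
        data_url,
        data=body,
        headers={"Referer": page_url},  # assumption: some servers check this
    )
    # Second request: cookies stored in jar are attached automatically.
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the response is still the stub page, comparing the captured browser headers with the ones this script sends, one by one, is a reasonable next step.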

I may install Ruby and try Watir, but I'm still hoping someone on this site has done this kind of thing before.

Anyone?

Allaboutmetoo's photo
Fri 12/31/10 01:18 PM
If you go to the site date hookup and into the computer nerds forum, those guys are gurus... hope this helps

Sat 01/01/11 03:21 PM

If you go to the site date hookup and into the computer nerds forum, those guys are gurus... hope this helps


Are you saying that the geeks on another site are smarter than the geeks on this site? :tongue:

(Thank you for the suggestion)