Unix swiss army knife for headless browser JavaScript (github.com/sotownsend)
115 points by rem0x4 on April 19, 2015 | 48 comments


BooJS supports the browser DOM. You can call `document` in BooJS but not in NodeJS. You can't import arbitrary browser JavaScript libraries into NodeJS, but you can with BooJS. Its purpose is to assist with unit tests in a sane manner, but it's great for other random things; reminiscent of netcat.


Do you think there is a reasonable way to provide this useful functionality without the Ruby dependency? Pure JS, perhaps?


It seems to me that something similar could be constructed with node's PhantomJS bindings, but I'm not sure there would be much less overhead... it does feel a little awkward having the Ruby dependency...



I've been working on a similar tool that works with (some) real browsers. Pipe JS in, get console logs out. Simple.

https://github.com/hughsk/smokestack


It would be better to handle your errors through try{}catch{} rather than using phantom's onError, as an exception seems to force an exit.

https://github.com/sotownsend/BooJS/blob/master/lib/boojs.rb...

I threw together a similar implementation in bash with this change:

https://gist.github.com/spence/30d5aa383fec6be8e51e

It looks like this:

    ./phantomjs-repl.sh
    > missing
    ReferenceError: Can't find variable: missing
    > window
    [object Window]
    > console.log('Hello')
    Hello
    undefined
    > !jquery-1.11.2.min.js
    Loading jquery-1.11.2.min.js ... done
    > $('<span>Hello</span>').text()
    Hello
    > ^C
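A minimal sketch of the try/catch idea, written as plain JavaScript and not copied from the gist or from BooJS. `evaluateLine` is a hypothetical helper name: it evaluates one line of input and returns the text a REPL would print, so a thrown error becomes output instead of ending the session. The exact error wording depends on the JavaScript engine.

```javascript
// Hypothetical helper: evaluate one line of REPL input and return the text to print.
// Errors are caught here instead of being left to a global error handler.
function evaluateLine(source) {
  try {
    // Indirect eval runs the code in the global scope, like input typed at a prompt.
    var result = (0, eval)(source);
    return String(result);
  } catch (err) {
    // Report the failure and let the caller keep reading input.
    if (err instanceof Error) {
      return err.name + ': ' + err.message;
    }
    return 'Uncaught: ' + String(err);
  }
}

console.log(evaluateLine('1 + 2'));
console.log(evaluateLine('missing'));
```

The second call prints a ReferenceError line rather than throwing, which is what lets a prompt loop carry on after a bad input.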


Presence of a "-v" flag usually means verbose.


Alternately, version. Either way it's clobbering


It's often hard to tell where the hard problems are being solved in high-level tools like this.

tl;dr: CLI program wrapping PhantomJS, which is based on QtWebkit.


I'm not sure why I would use this over the already-existing phantomjs command line API (http://phantomjs.org/api/command-line.html), especially as it adds Ruby to the mix... I'm also quite suspicious of the seemingly unrelated plug for a company in the repo readme.


Lay off the accusations. 1. I'm a founder of that company and this is a part of a much larger project I'm releasing in a few months. 2. That command line does not act like a normal unix tool and will make your life a living hell because Qt has some horrid misunderstanding of what constitutes a newline, outputting format characters, and incorrect streams (stdout is used for errors).

P.S. I do kernel development on FreeBSD in my spare time. I know a little bit about unix, ya know?


Congratulations on your journey with the company thus far. I simply don't understand why a randomly dropped plug for said company is at all relevant in the documentation for a library that is released under what seems to be your personal github and has no other visible attachment to said company. But I'm not accusing you of anything - I mentioned it because it struck me as out-of-place and potentially PR-opportunistic; if that's not the case, I apologize. From one founder to another: perhaps next time consider taking a less abrasive approach to defending your product, to avoid alienating your (potential) user base? Your hostility ends up doing the opposite of assuaging my concerns. Good luck.

Edited due to parent edit: I'm...happy for you that you develop on the FreeBSD kernel? I'm also not sure how that is relevant to the discussion of why this tool should be used over any other of the dozens of headless toolkits and wrappers out there - and your readme certainly doesn't do any work to convince me to care about your implementation over any existing solution. You're surprisingly defensive for someone claiming to be confident in their development skills. I'm not your enemy here - I was, until this exchange, a potential user (and may yet be; I'm not one to pass over a good tool simply for personal reasons).


was wondering, is this kernel dev on a private fork? i couldn't find a public fork on your gh and didn't see you listed as a contributor (https://github.com/freebsd/freebsd/graphs/contributors) - was curious to see the improvements you were introducing


You know FreeBSD is not developed on github, or in git (yeah, there is a very new bridge). It's in Subversion.


@rem0x4 Ignore the haters.


1. Take open source library built on a fake browser

2. Write a wrapper around it and claim it's as good as a real browser.

3. Plug company name

4. Profit ??


I'm slightly unclear on this, how does this differ from Node and what are its uses?


It's a headless browser. Node is a server framework that runs js.


Which means that it can handle scripts that rely on browser features like the DOM, whereas node can't. So, you can use it to test client-side code automatically (with a CI tool of course). You can't do that in node if your code accesses the DOM (e.g. calls something like document.getElementById or uses jQuery).

Btw, it is just a CLI wrapper around PhantomJS[1] I guess.

[1]: https://github.com/ariya/phantomjs
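To make the DOM point concrete, here is a small sketch of code that checks whether a browser-style `document` exists before touching it. `hasDom` and `readTitle` are hypothetical names used only for illustration; the same file reports a DOM in a browser-like environment and none in plain Node.

```javascript
// Hypothetical helper: report whether a browser-style DOM is available.
function hasDom() {
  return typeof document !== 'undefined' &&
    typeof document.getElementById === 'function';
}

// Hypothetical example of client-side code that touches the DOM.
function readTitle() {
  if (!hasDom()) {
    return null; // plain Node: there is no `document` to query
  }
  return document.title;
}

console.log(hasDom() ? 'DOM available' : 'no DOM here');
```

Code that calls `document.getElementById` without such a guard simply throws a ReferenceError in plain Node, which is why it needs a browser environment to be tested.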


Another (popular?) use is for SPA or JS-heavy pages to render themselves for search engines. Crawl the site with PhantomJS. For each page, give it a few seconds to do XHR and render things, then save the HTML snapshot. When you get a request from Googlebot, serve the HTML snapshot instead of the app, which Google apparently still cannot handle. Ta-da!

(And piss off people who hate JS "apps" or still desire the HTTP/HTML document ideal, for better or for worse.)
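The serving side of that snapshot approach could look roughly like the sketch below. Everything here is an assumption for illustration: crawlers are detected by a User-Agent substring, `snapshots` is a plain object mapping URL paths to HTML saved earlier by the headless crawl, and `appShell` is the normal JS app page.

```javascript
// Assumption: crawlers are recognised by a substring of the User-Agent header.
var CRAWLER_PATTERN = /googlebot|bingbot/i;

function isCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || '');
}

// Hypothetical: pick the saved HTML snapshot for crawlers, the app page for everyone else.
function chooseBody(path, userAgent, snapshots, appShell) {
  var hasSnapshot = Object.prototype.hasOwnProperty.call(snapshots, path);
  if (isCrawler(userAgent) && hasSnapshot) {
    return snapshots[path];
  }
  return appShell;
}

var snapshots = { '/about': '<html><body><h1>About</h1></body></html>' };
var shell = '<html><body><div id="app"></div></body></html>';
console.log(chooseBody('/about', 'Mozilla/5.0 (compatible; Googlebot/2.1)', snapshots, shell));
```

Paths without a saved snapshot fall back to the app page, so a crawl that missed a page does not turn into an error for the bot.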


It's worth noting that googlebot does seem to do page renders with JS, though not as frequently, and usually several days behind a non-JS detected change. Bingbot definitely does, and this can even be seen in google analytics, oddly enough (google doesn't seem to do any filtering for non-browser rendering).

Came to a lot of this knowledge when changing a url structure for a few hundred thousand pages (with permanent redirects in place)... the bing bot results on analytics were really surprising, and had to adjust filtering.


Interesting; I should play with it. I got the impression they "sorta" do JS, but wouldn't run a bunch of XHR and so on


this reminds me of the time they removed the Vulcan cannon at the start of the Vietnam War because they thought the days of dogfighting were over because of missiles.

Then their jets started falling left and right and they ended up installing it back again.


I don't feel that's a proper analogy. We save a ton of time by only doing a single layout/rendering system. If a page pulls in 5+ different assets to render, doing it on the server means we gotta come up with a lot more logic to get it done on the client side. And what better way than to just run a browser to get it all done. It's like the ultimate server-side renderer framework.


Question: can you connect it to a running instance of Phantom that is hanging, to debug already-running code?


Nice. Anything like this built on jsdom? PhantomJS is pretty bulky for most non-visual tests.


There is a good example of try{} catch{} in post 7 of this thread at unix.com:

running unix command from java

http://www.unix.com/unix-for-dummies-questions-and-answers/1...


Why not execute the JS code in the JavaScript console of Chrome or Firefox?


I imagine this is to be used for something like automated testing.


You would be able to do it with Selenium, which drives a real browser, not a fake one.


BooJS uses PhantomJS. In Selenium, if you want to go headless you will use PhantomJS again [1].

[1] http://code.tutsplus.com/tutorials/headless-functional-testi...


just run it with Xvfb, and you got a headless browser.


Not everyone wants to install x-server and friends on their servers


why wouldn't you?


This is headless.


That's how germans pronounce it.


Pronounce what?


Come on, you didn't hear anything? :-)


Take headless browsers with a pinch of salt. They are not a replacement for a real browser. Don't be surprised when it renders or behaves differently from a real browser.

Did they fix the memory leak issues in PhantomJS? Take a look at the issues and consistently you will find people having trouble rendering a complete site.

Creating a headless browser really comes down to running a real browser on Xvfb. You have a billion-dollar company supporting and updating their browser, and you reap the benefits of being able to run it headlessly. The drawbacks of this method are often overstated ('oh my, run a fat browser process AND an X display? ok grandpa'), but the real hidden problem with PhantomJS is understated: 'why does my phantomjs process balloon in memory and cpu usage and crash, I can render the website fine in my browser'.

Standalone headless browsers are good for hacks but I wouldn't rely on them.


afaik the worst of the memory leaks were fixed with the 2.0 release, and they also upgraded QtWebKit to 5.3 which fixed some pretty egregiously outdated constraints (like not having access to Function.prototype.bind) present in the previous version. Sadly this tool utilizes phantomjs 1.9 which means the memory leaks are still present and major WebKit features are still missing.

One great use for headless is to separate non-UI framework/platform tests from UI tests - at my previous gig we wrote all the tests for the runtime platform against phantomjs to avoid the unneeded overhead of a full browser for things like data validation or mocking service calls, but used selenium to do the visual/interaction tests for the application UI itself. it's also a nice separation of concerns, though of course we hit all the same problems you mentioned as we were also pinned to 1.9.
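For engines without `Function.prototype.bind`, like the older version mentioned above, a common workaround is a small fallback. The sketch below is a simplified one and not a full spec-compliant polyfill: it assumes the bound function is never called with `new`. `simpleBind` is a hypothetical helper name.

```javascript
// Simplified bind: fix `this` and any leading arguments for later calls.
// Assumption: calling the bound function with `new` is not needed.
function simpleBind(fn, thisArg) {
  var presetArgs = Array.prototype.slice.call(arguments, 2);
  return function () {
    var laterArgs = Array.prototype.slice.call(arguments);
    return fn.apply(thisArg, presetArgs.concat(laterArgs));
  };
}

// Install the fallback only when the engine has no native bind.
if (typeof Function.prototype.bind !== 'function') {
  Function.prototype.bind = function (thisArg) {
    var args = Array.prototype.slice.call(arguments, 1);
    return simpleBind.apply(null, [this, thisArg].concat(args));
  };
}

var counter = {
  count: 41,
  next: function (step) { this.count += step; return this.count; }
};
var next = simpleBind(counter.next, counter, 1);
console.log(next()); // 42
```

On a modern engine the `if` branch is skipped and the native implementation is left alone.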


https://github.com/ariya/phantomjs/issues

1470 open issues

the first one is about phantomjs crashing, and there are many, many other scary-sounding issues you wouldn't have if you just used a regular browser with Xvfb.

you can't really be sure unless you've literally run a test on the colorful variety of websites in the wild. You need a huge development and quality assurance team. You also need a way for a large crowd of people to test your browser and send back useful reports automatically whenever it crashes.


> 1470 open issues

That's a really weak argument. Here's 5000 Firefox bugs:

    https://bugzilla.mozilla.org/buglist.cgi?component=General&product=Firefox&query_format=advanced&resolution=---&order=bug_status%2Cpriority%2Cassigned_to%2Cbug_id&limit=0


considering firefox's userbase is many, many times that of phantomjs, it shouldn't be a surprise to you. especially when most phantomjs issues revolve around the same problem: being unable to render a website that you can browse fine with firefox or chrome. this is a consistent issue that has haunted the project since the beginning.


Open issues is a pretty bad metric, especially considering the other major open source browsers would make PhantomJS look downright perfect.

Chrome: https://code.google.com/p/chromium/issues/list

73,025 open issues

Firefox: https://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIR...

10,000 open issues.


apples and oranges. chrome and firefox are used by millions of people. of course they're going to have more eyes on them, but when was the last time you couldn't render something in your browser?

phantomjs is used mainly by developers. you can't seriously compare it to chrome or firefox.

quick scan shows that most of the bugs showing up in chrome and firefox are very platform centric or esoteric issues while phantomjs is concerning because they are literally about being unable to render a webpage without crashing.


Neat project.

It's 98% Ruby and is being called a "Unix tool" because it has cmdline options. Interesting criteria there.

It seems to be pitched as a js testing tool. Not clear on how it is different from node or just plain phantom.


Accepting command line options does not make something Unix-y, IMHO. I assumed it was called a "Unix tool" because it reads data from a file/stdin, does something, then writes to stdout.

If I needed to do something with JS as part of a series of Unix commands this would fit in the pipeline nicely. e.g. let's say someone wanted to do a one-off data transformation and they understand JS a lot better than Awk.
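As a rough idea of the JS side of such a one-off transformation, here is a sketch written for Node streams; it does not use BooJS, and its command-line invocation is not shown. The "name,amount" input format and the names `transformLine`, `transformText` and `runFilter` are made up for illustration.

```javascript
// Hypothetical transformation: "name,amount" becomes "NAME<TAB>amount doubled".
function transformLine(line) {
  var fields = line.split(',');
  return fields[0].toUpperCase() + '\t' + Number(fields[1]) * 2;
}

// Apply the transformation to every non-empty line of the input text.
function transformText(text) {
  return text
    .split('\n')
    .filter(function (line) { return line.length > 0; })
    .map(transformLine)
    .join('\n');
}

// Filter wiring: read all of `input`, then write the transformed text to `output`.
function runFilter(input, output) {
  var chunks = [];
  input.setEncoding('utf8');
  input.on('data', function (chunk) { chunks.push(chunk); });
  input.on('end', function () {
    output.write(transformText(chunks.join('')) + '\n');
  });
}

// In Node, the filter would be started with:
//   runFilter(process.stdin, process.stdout);
console.log(transformText('alice,2\nbob,5\n'));
```

Keeping the transformation in a pure function and the stream handling in `runFilter` means the same logic can be tested directly or dropped between other commands in a pipeline.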


Right, maybe with something like this? https://www.npmjs.com/package/commander



