Unix swiss army knife for headless browser JavaScript

rem0x4 · on April 19, 2015

BooJS supports the browser DOM. You can use `document` in BooJS but not NodeJS. You can't import arbitrary browser JavaScript libraries into NodeJS but you can with BooJS. Its purpose is to assist with unit tests in a sane manner, but it's great for other random things; reminiscent of netcat.

jaytaylor · on April 19, 2015

Do you think there is a reasonable way to provide this useful functionality without the Ruby dependency? Pure JS, perhaps?

tracker1 · on April 20, 2015

It seems to me, that something similar could be constructed with node's PhantomJS bindings, but not sure that there would be much less overhead... it does feel a little awkward having the Ruby dependency...

trentlott · on April 19, 2015

https://raw.githubusercontent.com/sotownsend/boojs/master/lo...

  Headless Browser Javasrcipt Tool

secoif · on April 20, 2015

I've been working on a similar tool that works with (some) real browsers. Pipe JS in, get console logs out. Simple.

https://github.com/hughsk/smokestack

syncerr · on April 20, 2015

It would be better to handle your errors through try{}catch{} rather than using phantom's onError as an exception seems to force an exit.

https://github.com/sotownsend/BooJS/blob/master/lib/boojs.rb...

I threw together a similar implementation in bash with this change:

https://gist.github.com/spence/30d5aa383fec6be8e51e

Looks like

    ./phantomjs-repl.sh
    > missing
    ReferenceError: Can't find variable: missing
    > window
    [object Window]
    > console.log('Hello')
    Hello
    undefined
    > !jquery-1.11.2.min.js
    Loading jquery-1.11.2.min.js ... done
    > $('<span>Hello</span>').text()
    Hello
    > ^C
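
The try/catch approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist or from BooJS: `evalLine` is a hypothetical helper name, and the sketch assumes each REPL line arrives as a string. The exact error text depends on the JavaScript engine.

```javascript
// Hypothetical helper (not part of BooJS or PhantomJS): evaluate one line of
// REPL input and turn any exception into a printable message, so the
// surrounding loop never sees an uncaught error.
function evalLine(source) {
  try {
    // Indirect eval runs the input in the global scope, as a console would.
    var result = (0, eval)(source);
    return String(result);
  } catch (err) {
    // Report the error and let the caller continue with the next line.
    return err.name + ': ' + err.message;
  }
}

console.log(evalLine('1 + 1'));   // 2
console.log(evalLine('missing')); // a ReferenceError message; wording varies by engine
```

Because the error is caught at the point of evaluation, a global error handler is never reached and the session can keep accepting input.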

erikano · on April 19, 2015

Presence of a "-v" flag usually means verbose.

gingerrr · on April 20, 2015

Alternately, version. Either way it's clobbering

gue5t · on April 20, 2015

It's often hard to tell where the hard problems are being solved in high-level tools like this.

tl;dr: CLI program wrapping PhantomJS, which is based on QtWebkit.

gingerrr · on April 20, 2015

I'm not sure why i would use this over the already-existing phantomjs command line API (http://phantomjs.org/api/command-line.html), especially as it adds Ruby to the mix...also quite suspicious of the seemingly-unrelated plug for a company in the repo readme.

rem0x4 · on April 20, 2015

Lay off the accusations. 1. I'm a founder of that company and this is a part of a much larger project I'm releasing in a few months. 2. That command line does not act like a normal unix tool and will make your life a living hell because Qt has some horrid misunderstanding of what constitutes a newline, outputs format characters, and uses incorrect streams (stdout is used for errors).

P.S. I do kernel development on FreeBSD in my spare time. I know a little bit about unix, ya know?

gingerrr · on April 20, 2015

Congratulations on your journey with the company thus far. I simply don't understand why a randomly-dropped plug for said company is at all cogent in the documentation for a library that is released under what seems to be your personal github and has no other visible attachment to said company. But I'm not accusing you of anything - I mentioned it because it struck me as out-of-place and potentially PR-opportunistic, if that's not the case I apologize. From one founder to another: perhaps next time consider taking the less-abrasive approach to defending your product to avoid alienating your (potential) user base? Your hostility ends up doing the opposite of assuaging my concerns. Good luck.

Edited due to parent edit: I'm...happy for you that you develop on the FreeBSD kernel? I'm also not sure how that is relevant to the discussion of why this tool should be used over any other of the dozens of headless toolkits and wrappers out there - and your readme certainly doesn't do any work to convince me to care about your implementation over any existing solution. You're surprisingly defensive for someone claiming to be confident in their development skills, I'm not your enemy here - I was, until this exchange, a potential user (and may yet be, I'm not one to pass over a good tool simply for personal reasons).

gingerrr · on April 20, 2015

was wondering, is this kernel dev on a private fork? i couldn't find a public fork on your gh and didn't see you listed as a contributor (https://github.com/freebsd/freebsd/graphs/contributors) - was curious to see the improvements you were introducing

justincormack · on April 20, 2015

You know FreeBSD is not developed on GitHub, or in git (yeah, there is a very new bridge). It's in Subversion.

steve19 · on April 20, 2015

@rem0x4 Ignore the haters.

curiously · on April 20, 2015

1. Take open source library built on a fake browser

2. Write a wrapper around it, claim it's as good as a real browser.

3. Plug company name

4. Profit ??

jamesondh · on April 19, 2015

I'm slightly unclear on this, how does this differ from Node and what are its uses?

woah · on April 19, 2015

It's a headless browser. Node is a server framework that runs js.

maemre · on April 19, 2015

Which means that it can handle scripts that rely on browser features like DOM whereas node can't handle that. So, you can use it to test the client-side code automatically (with a CI tool of course). You can't do that in node if your code accesses DOM (e.g. calls something like document.getElementById or uses jQuery).

Btw, it is just a CLI wrapper around PhantomJS[1] I guess.

[1]: https://github.com/ariya/phantomjs

MichaelGG · on April 20, 2015

Another (popular?) use is for SPA or JS-heavy pages to render themselves for search engines. Crawl the site with PhantomJS. For each page, give it a few seconds to do XHR and render things, then save the HTML snapshot. When you get a request from Googlebot, serve the HTML snapshot instead of the app, which Google apparently still cannot handle. Ta-da!

(And piss off people who hate JS "apps" or still desire the HTTP/HTML document ideal, for better or for worse.)
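
The serving side of the snapshot trick above might look something like this. It is only a sketch under assumptions: `isCrawler`, `chooseResponse`, the `snapshots` map and the pattern list are all hypothetical names invented for illustration, and the snapshots are assumed to have been captured earlier by a headless browser.

```javascript
// Illustrative pattern only; a real deployment would need its own crawler list.
var CRAWLER_PATTERN = /googlebot|bingbot/i;

// Hypothetical helper: does this User-Agent string look like a search crawler?
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}

// snapshots: plain object mapping a URL path to HTML saved by a headless browser.
// appShell: the normal page that boots the JavaScript app.
function chooseResponse(path, userAgent, snapshots, appShell) {
  var hasSnapshot = Object.prototype.hasOwnProperty.call(snapshots, path);
  if (isCrawler(userAgent) && hasSnapshot) {
    return snapshots[path];
  }
  return appShell;
}

var snapshots = { '/about': '<html><body><h1>About</h1></body></html>' };
var appShell = '<html><body><div id="app"></div></body></html>';
var botAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

console.log(chooseResponse('/about', botAgent, snapshots, appShell));
console.log(chooseResponse('/about', 'Mozilla/5.0 (X11; Linux x86_64)', snapshots, appShell));
```

Paths without a saved snapshot fall back to the app shell, so a crawler never gets an empty response.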

tracker1 · on April 20, 2015

It's worth noting, that googlebot does seem to do page renders with JS, though not as frequently and usually several days behind a non-js detected change. Bingbot definitely does, and this can even be seen in google analytics oddly enough (google doesn't seem to do any filtering for non-browser rendering).

Came to a lot of this knowledge when changing a url structure for a few hundred thousand pages (with permanent redirects in place)... the bing bot results on analytics were really surprising, and had to adjust filtering.

MichaelGG · on April 20, 2015

Interesting; I should play with it. I got the impression they "sorta" do JS, but wouldn't run a bunch of XHR and so on

curiously · on April 20, 2015

this reminds me of the time they removed the Vulcan cannon at the start of the Vietnam war because they thought the days of dogfighting were over because of missiles.

Then their jets started falling left and right and they ended up installing it back again.

MichaelGG · on April 20, 2015

I don't feel that's a proper analogy. We save a ton of time by only doing a single layout/rendering system. If a page pulls in 5+ different assets to render, doing it on the server means we gotta come up with a lot more logic to get it done on the client side. And what better way than to just run a browser to get it all done. It's like the ultimate server-side renderer framework.

Gepsens · on April 20, 2015

Question: can you connect it to a running instance of Phantom that is hanging, to debug already running code?

mattdesl · on April 20, 2015

Nice. Anything like this built on jsdom? PhantomJS is pretty bulky for most non-visual tests.

unix_flyer · on April 20, 2015

There is a good example of try{} catch{} in post 7 of this thread at unix.com:

running unix command from java

http://www.unix.com/unix-for-dummies-questions-and-answers/1...

AlexNeoNomad · on April 20, 2015

Why not execute the JS code in the Chrome or Firefox JavaScript console?

jordanscales · on April 20, 2015

I imagine this is to be used for something like automated testing.

curiously · on April 20, 2015

You would be able to do it with Selenium, which drives a real browser not a fake one.

moondowner · on April 20, 2015

BooJS uses PhantomJS. In Selenium, if you want to go headless you will use PhantomJS again [1].

[1] http://code.tutsplus.com/tutorials/headless-functional-testi...

curiously · on April 20, 2015

just run it with Xvfb, and you got a headless browser.

ugexe · on April 20, 2015

Not everyone wants to install x-server and friends on their servers

curiously · on April 20, 2015

why wouldn't you?

abroncs · on April 20, 2015

This is headless.

rem0x4 · on April 19, 2015

That's how germans pronounce it.

larvaetron · on April 20, 2015

Pronounce what?

stretchwithme · on April 20, 2015

Come on, you didn't hear anything? :-)

curiously · on April 20, 2015

Take headless browsers with a pinch of salt. They are not a replacement for a real browser. Don't be surprised when one renders or behaves differently from a real browser.

Did they fix the memory leak issues in PhantomJS? Take a look at the issues and consistently you will find people having trouble rendering a complete site.

Creating a headless browser is really just a matter of running a real browser on Xvfb. You have a billion-dollar company supporting and updating its browser, and you reap the benefit of being able to run it headlessly. The drawbacks of this method are often overstated ('oh my, run a fat browser process AND an X display? ok grandpa'), but the real hidden problem with PhantomJS is understated: 'why does my phantomjs process balloon in memory and CPU usage and crash, when I can render the website fine in my browser?'

Standalone headless browsers are good for hacks but I wouldn't rely on them.

gingerrr · on April 20, 2015

afaik the worst of the memory leaks were fixed with the 2.0 release, and they also upgraded QtWebKit to 5.3 which fixed some pretty egregiously outdated constraints (like not having access to Function.prototype.bind) present in the previous version. Sadly this tool utilizes phantomjs 1.9 which means the memory leaks are still present and major WebKit features are still missing.

One great use for headless is to separate non-UI framework/platform tests from UI tests - at my previous gig we wrote all the tests for the runtime platform against phantomjs to avoid the unneeded overhead of a full browser for things like data validation or mocking service calls, but used selenium to do the visual/interaction tests for the application UI itself. it's also a nice separation of concerns, though of course we hit all the same problems you mentioned as we were also pinned to 1.9.
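
For the missing `Function.prototype.bind` mentioned above, a common workaround on old engines is a small fallback along these lines. This is a simplified sketch, not a spec-complete polyfill: `bindFallback` is a hypothetical name, and it does not handle bound functions called with `new`.

```javascript
// Hypothetical standalone helper: returns a function that calls fn with a
// fixed `this` value and any preset leading arguments.
function bindFallback(fn, thisArg) {
  var preset = Array.prototype.slice.call(arguments, 2);
  return function () {
    var later = Array.prototype.slice.call(arguments);
    return fn.apply(thisArg, preset.concat(later));
  };
}

// Install it only where the engine has no native bind.
if (typeof Function.prototype.bind !== 'function') {
  Function.prototype.bind = function (thisArg) {
    var args = Array.prototype.slice.call(arguments, 1);
    return bindFallback.apply(null, [this, thisArg].concat(args));
  };
}

var counter = {
  count: 41,
  next: function (step) { this.count += step; return this.count; }
};
var next = bindFallback(counter.next, counter, 1);
console.log(next()); // 42
```

Keeping the logic in a standalone function means it can be exercised even on engines that already ship a native `bind`.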

curiously · on April 20, 2015

https://github.com/ariya/phantomjs/issues

1470 open issues

the first one is about phantomjs crashing, and there are many, many other scary-sounding issues you wouldn't have if you just used a regular browser with Xvfb.

you can't really be sure unless you've literally run a test on the colorful variety of websites in the wild. You need a huge development and quality assurance team. You also need a way for a large crowd of people to test your browser and send back useful reports automatically whenever it crashes.

286c8cb04bda · on April 20, 2015

> 1470 open issues

That's a really weak argument. Here's 5000 Firefox bugs:

    https://bugzilla.mozilla.org/buglist.cgi?component=General&product=Firefox&query_format=advanced&resolution=---&order=bug_status%2Cpriority%2Cassigned_to%2Cbug_id&limit=0

curiously · on April 20, 2015

considering firefox's userbase is many, many times that of phantomjs, it shouldn't surprise you. especially when most phantomjs issues revolve around the same problem: being unable to render a website that you can browse fine with firefox or chrome. this is a consistent issue that has haunted the project since the beginning.

jaredmcateer · on April 20, 2015

Open issues is a pretty bad metric, especially considering the other major open source browsers would make PhantomJS look downright perfect.

Chrome: https://code.google.com/p/chromium/issues/list

73,025 open issues

Firefox: https://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIR...

10,000 open issues.

curiously · on April 20, 2015

apples and oranges. chrome and firefox are used by millions of people. of course they're going to have more eyes on them, but when was the last time you couldn't render something in your browser?

phantomjs is used mainly by developers. you can't seriously compare it to chrome or firefox.

quick scan shows that most of the bugs showing up in chrome and firefox are very platform-centric or esoteric issues, while the phantomjs ones are concerning because they are literally about being unable to render a webpage without crashing.

jaytaylor · on April 19, 2015

Neat project.

It's 98% Ruby and is being called a "Unix tool" because it has cmdline options. Interesting criteria there.

It seems to be pitched as a js testing tool. Not clear on how it is different from node or just plain phantom.

mryan · on April 20, 2015

Accepting command line options does not make something Unix-y, IMHO. I assumed it was called a "Unix tool" because it reads data from a file/stdin, does something, then writes to stdout.

If I needed to do something with JS as part of a series of Unix commands this would fit in the pipeline nicely. e.g. let's say someone wanted to do a one-off data transformation and they understand JS a lot better than Awk.
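
As a rough illustration of the kind of one-off transformation meant here, the function below totals one whitespace-separated column of some text, roughly what `awk '{ s += $2 } END { print s }'` does. `sumColumn` is a hypothetical name, and how the text gets from stdin into the function depends on the runtime, so that wiring is deliberately left out.

```javascript
// Hypothetical helper: sum the numeric values found in one column of
// whitespace-separated text. columnIndex is zero-based, so 1 matches awk's $2.
function sumColumn(text, columnIndex) {
  var lines = text.split('\n');
  var total = 0;
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line === '') continue;            // skip blank lines
    var fields = line.split(/\s+/);
    var value = parseFloat(fields[columnIndex]);
    if (!isNaN(value)) total += value;    // ignore missing or non-numeric fields
  }
  return total;
}

console.log(sumColumn('a 1\nb 2\nc 3\n', 1)); // 6
```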

ende · on April 19, 2015

Right, maybe with something like this? https://www.npmjs.com/package/commander