Spark 101: getting the status of your job

As Roberto Vitillo says in his excellent post about Spark best practices:

“Running Spark jobs without the Spark UI is like flying blind.”

And that’s especially true if your Spark job keeps crashing or is crunching big data (yeah, I had to use the expression at least once).

If you don’t want to set up a SOCKS proxy or, for some reason, that simply doesn’t work for you (my case!), you can still access the Spark UI from the cluster’s main node (thank you Mark!):

[bash]
lynx localhost:4040
[/bash]

It will get you to this nice little page, showing you the status of the scheduled jobs:
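If you’d rather script this than browse it, Spark also serves the same job status information as JSON through its monitoring REST API on that port (endpoints like /api/v1/applications/[app-id]/jobs). Here’s a minimal sketch that summarizes job statuses from such a response; the sample payload below is made up for illustration, only its field names follow the API:

[python]
import json
from collections import Counter

# Sample payload shaped like a /api/v1/applications/[app-id]/jobs
# response (field names per Spark's monitoring REST API; values invented).
SAMPLE_JOBS = json.dumps([
    {"jobId": 0, "name": "count at App.scala:12", "status": "SUCCEEDED"},
    {"jobId": 1, "name": "saveAsTextFile at App.scala:20", "status": "RUNNING"},
    {"jobId": 2, "name": "collect at App.scala:25", "status": "FAILED"},
])

def summarize_jobs(payload):
    """Return a {status: count} summary from a jobs JSON payload."""
    jobs = json.loads(payload)
    return dict(Counter(job["status"] for job in jobs))

if __name__ == "__main__":
    print(summarize_jobs(SAMPLE_JOBS))
[/python]

In a real session you would fetch the payload from localhost:4040 on the main node instead of using the hard-coded sample.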

Spark UI using Lynx

Lynx FTW!

Alessio Placitelli