ARR

Jupyter Ascending me into stress and anxiety

Posted on Aug 10, 2020

Not that I’m upset about it or anything, but I’ve spent the last week (I wanted to write “couple of days”, but it’s been a week) trying to get a multi-user instance of JupyterHub to honour system-wide environment settings. It’s not a thing, so don’t go looking for it; you’ll end up like me: bitter and ginger.

Sometimes it’s good to have your expectations challenged.

Judging by the lack of documentation on the topic, I guess we’re running a relatively rare configuration. Either that or I’m blatantly not grasping this. I may write about it more in the future, but in brief:

  • CentOS 7
  • Jupyter{Hub,Lab}
  • configured with SudoSpawner to spawn non-root instances
  • all connections outside the office go through a proxy

Getting to it…

Long story short, you need to configure the proxy specifically for JupyterHub.

Per-user proxy

In the case of a single user, create ~/.ipython/profile_default/startup/00-startup.py and paste in the following, modifying it to suit your requirements.

import os

# Replace proxy.domain:port with your proxy's host and port.
os.environ['http_proxy'] = "http://proxy.domain:port"
os.environ['https_proxy'] = "http://proxy.domain:port"

Once complete, you can run %env within a notebook to confirm the variables are set.

{'PATH': '$PATH:/usr/bin:/usr/local/lib/npm/bin/:/usr/local/bin:/usr/local/texlive/2020/bin/x86_64-linux',
...
 'http_proxy': 'http://proxy.domain:port',
 'https_proxy': 'http://proxy.domain:port'}
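
If the full %env dump is too noisy, a narrower check can be run in a notebook cell. This is only a minimal sketch: proxy_report is a hypothetical helper name of mine, not part of Jupyter, and urllib.request.getproxies() is the standard-library view of what Python's own URL handling will pick up.

```python
import os
import urllib.request


def proxy_report(environ=os.environ):
    """Return only the proxy-related variables from an environment mapping."""
    names = ("http_proxy", "https_proxy", "no_proxy")
    # Missing variables are reported as None rather than raising KeyError.
    return {name: environ.get(name) for name in names}


# What the kernel's environment contains.
print(proxy_report())
# What Python's urllib resolves as the active proxy settings.
print(urllib.request.getproxies())
```

If any of the values come back as None, the startup file has not been picked up by the kernel you are running.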

Honestly, I’ve had a hell of a time with this and received so many mixed results. Initially this worked for me when setting the variables in ~/.jupyter/.profile_default/.../..., but on trying again, it didn’t take.

60 percent of the time, it works every time

I highly recommend clearing the cache any time you make changes to your environment - doing this would have saved me a lot of time.

Setting variables across all users

I came across a blog post whose author had found a way to pass variables into Jupyter Notebook. This helped me immensely. Knowing this, I went back and found the spawner API documentation for reference.

I skipped editing the service for JupyterHub, opting to configure everything directly in /etc/jupyterhub/jupyterhub-config.py.

#--------------------------
# Proxy configuration
#--------------------------

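import os

# Same proxy for HTTP and HTTPS, so set both in one statement.
os.environ['http_proxy'] = os.environ['https_proxy'] = 'http://proxy.domain:port'
# Hosts that should be reached directly, bypassing the proxy.
os.environ['no_proxy'] = '.internal.domain.org,localhost,127.0.0.1'
# Pass these variables from the Hub's environment on to the spawned servers.
c.Spawner.env_keep.extend(['http_proxy', 'https_proxy', 'no_proxy'])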

In my case, the http_proxy and https_proxy addresses are the same, so I set both in a single chained assignment to save a line.

no_proxy is required here. Without it, all requests are passed through the proxy, and SudoSpawner loses the ability to communicate with the JupyterHub API running locally on the same server. I added the internal domain as well, to avoid sending traffic through the proxy when it isn’t necessary.
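
To make the effect of that no_proxy value a bit more concrete, here is a simplified sketch of the suffix matching many clients apply. It is an illustration only: bypasses_proxy is a hypothetical function of mine, real HTTP clients differ in the details (wildcards, CIDR ranges, IPv6), and the 8081 port in the example is just an assumed local API port.

```python
def bypasses_proxy(host, no_proxy):
    """Return True if host matches an entry in a comma-separated no_proxy list.

    Simplified: exact match or domain-suffix match, port ignored.
    """
    hostname = host.lower().split(":", 1)[0]  # drop any :port suffix
    for entry in no_proxy.split(","):
        # A leading dot ('.internal.domain.org') is treated like the bare domain.
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if hostname == entry or hostname.endswith("." + entry):
            return True
    return False


NO_PROXY = ".internal.domain.org,localhost,127.0.0.1"

for host in ("127.0.0.1:8081", "hub.internal.domain.org", "pypi.org"):
    route = "direct" if bypasses_proxy(host, NO_PROXY) else "via proxy"
    print(f"{host}: {route}")
```

Under these assumptions the local API address and anything under the internal domain go direct, while everything else is sent to the proxy.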

The more you care, the more JupyterHub finds ways to hurt you for it.

~ Jupiter Jones