Category: computers

Bambu Lab P1S on IoT VLAN

I recently picked up a Bambu Lab P1S 3D Printer for around the house. After staying away from 3D printing for years, the combination of a friend’s experience with this printer (thanks, @make_with_jake!), holiday sales, looking for a hobby, wanting some one-off tools, and a handful of projects where it’d be useful finally got me to buy one. Having done half a dozen prints, thus far I’m pretty satisfied with the output and think it’ll be a nice addition to the house.

This printer, like many other modern devices, is an Internet of Things (IoT) device; something smart which uses a network to communicate. Unfortunately, these can come with a bunch of security risks, and are best isolated to a less-trusted place on a home network. In my case, that’s a separate network, or VLAN, called IoT.

Beyond the typical good-practice of isolating IoT devices to a separate network, I’m also wary of cloud-connected devices because of the possibility of remote exploit or bugs. For example, back in 2023 Bambu Lab themselves had an issue which resulted in old print jobs being started on cloud-connected printers. Since these printers get hot and move without detecting if they are in a completely safe and ready-to-go state, this was bad. I’d rather avoid the chance of this. And really, when am I going to want to submit a print job from my phone or anywhere other than my home network?

Bambu Lab has a LAN Mode available for their printers which ostensibly disconnects it from the cloud, but unfortunately it still expects everything to be on the same network.

I was unable to find clear info on working around this in a simple fashion without extra utilities, but digging into and solving this kind of stuff is something I like to do. So this post documents how I put a Bambu Lab P1S on a separate VLAN from the house’s main network, getting it to work otherwise normally.

The network here uses OPNsense, a pretty typical open source firewall, so all the configuration mentioned revolves around it. pfSense is similar enough that everything likely applies there as well, and the basic technical info can also be used to make this work on numerous other firewalls.

As of this writing (2024-Dec-19), this works with Bambu Studio v1.10.1.50 and firmware v01.07.00.00 on the P1S printer. This also works with OrcaSlicer v2.2.0 and whatever version of the Bambu Network Plug-in it installs. I suspect this works for other Bambu Lab printers, as the P1S has all the same features as the higher end ones (eg: camera) but I can’t test to say for sure. Also, everything here covers the P1S running in LAN Mode. It’s possible that things would work differently with cloud connectivity, but I did not explore this. So, insert the standard disclaimer here about past performance and future results…

Why can’t I just point the software at the printer?

To start, the release notes for Bambu Studio v1.10.0 have a section that says a printer can be added with just its IP, allowing it to cross networks:

Subnet binding support: Users can now bind printers across different subnets by directly entering the printer’s IP address and Access Code

This sounds like it’d solve the problem, and is a typical way for printers to work, but no… it just doesn’t work.

Despite having the required Studio and printer firmware versions I just couldn’t make it work. When trying this feature I’d see Bambu Studio trying to connect to the printer on 3002/tcp, but the printer would only respond with a RST as if that port wasn’t listening. Something’s broken with this feature, probably in the printer firmware. Maybe this’ll work in the future, but for now we needed another way…

Atypical SSDP?

On a single network the printer sends out Simple Service Discovery Protocol (SSDP)-ish messages detailing its specs; Studio receives these and lists the printer. But SSDP is based on UDP broadcasts, so these don’t cross over to the other VLAN (subnet).

The SSDP part of a packet looks similar to:

NOTIFY * HTTP/1.1\r\n
HOST: 239.255.255.250:1900\r\n
Server: UPnP/1.0\r\n
Location: 192.168.1.105\r\n
NT: urn:bambulab-com:device:3dprinter:1\r\n
USN: xxxxxxxxxxxxxxx\r\n
Cache-Control: max-age=1800\r\n
DevModel.bambu.com: C12\r\n
DevName.bambu.com: Bambu Lab P1S\r\n
DevSignal.bambu.com: -30\r\n
DevConnect.bambu.com: lan\r\n
DevBind.bambu.com: free\r\n
Devseclink.bambu.com: secure\r\n
DevVersion.bambu.com: 01.07.00.00\r\n
DevCap.bambu.com: 1\r\n
\r\n

When Bambu Studio receives this packet it reads the printer’s address from the Location header, connects, and all works. But in a multi-VLAN environment we have different networks, different broadcast domains, and a firewall in between, so we need two things to work around this: getting the SSDP broadcasts shared across networks, and firewall rules to allow the requisite communication.
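
To make the packet structure concrete, here is a minimal Python sketch of pulling the headers out of a NOTIFY message like the one above. This is only an illustration of the format, not how Bambu Studio actually parses it; the function name `parse_ssdp_notify` and the trimmed sample payload are my own assumptions.

```python
def parse_ssdp_notify(payload: bytes) -> dict:
    """Parse an SSDP-style NOTIFY payload into a dict of headers.

    Header names are lowercased so lookups are case-insensitive.
    """
    text = payload.decode("utf-8", errors="replace")
    lines = text.split("\r\n")
    if not lines[0].startswith("NOTIFY"):
        raise ValueError("not a NOTIFY message")
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        # Split on the first colon only; values such as the NT URN contain more.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


# Trimmed-down copy of the example packet shown earlier (hypothetical sample).
SAMPLE = (
    b"NOTIFY * HTTP/1.1\r\n"
    b"HOST: 239.255.255.250:1900\r\n"
    b"Location: 192.168.1.105\r\n"
    b"NT: urn:bambulab-com:device:3dprinter:1\r\n"
    b"DevName.bambu.com: Bambu Lab P1S\r\n"
    b"\r\n"
)

headers = parse_ssdp_notify(SAMPLE)
print(headers["location"])
```

The Location value is the address a client would then connect to, which is why the firewall rules later on matter as much as the relay.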

These also don’t seem to be normal SSDP packets, as they are sent to destination port 1910/udp or 2021/udp. It’s all just kinda weird… And this thread on the Bambu Lab Community Forum makes it seem even stranger and like it might vary between printer models?

Regardless, here’s how I made this work with the P1S.

Static IP

The P1S (and I presume other Bambu Lab printers) has very little on-device network configuration, receiving network addressing from DHCP. I suggest that you set a DHCP reservation for your printer so that it always receives the same (static) IP address. This will make firewall rules much easier to manage.

SSDP Broadcast Relay

To get the SSDP broadcasts passed between VLANs a bridge or relay is needed, and marjohn56/udpbroadcastrelay works great. This is available as a plugin in OPNsense under System → Firmware → Plugins → os-udpbroadcastrelay, is also available in pfSense, or could be run standalone if you use something else.

After installing, on OPNsense go to Services → UDP Broadcast Relay and create a new entry with the following settings:

  • enabled: (checked)
  • Relay Port: 2021
  • Relay Interfaces: IoT, LAN (Choose each network you wish to bridge the printer between.)
  • Broadcast Address: 239.255.255.250
  • Source Address: 1.1.1.2 (This uses a special handler to ensure the packet reaches Studio in the expected form.)
  • Instance ID: 1 (or higher, if you have more rules)
  • Description: Bambu Lab Printer Discovery

On my OPNsense firewall, where igb1_vlan2 is my IoT network and igb1 is my LAN network, the running process looks like: /usr/local/sbin/udpbroadcastrelay --id 1 --dev igb1_vlan2 --dev igb1 --port 2021 --multicast 239.255.255.250 -s 1.1.1.2 -f

(Of course, in the event you have any firewall rules preventing packets from getting from the printer or IoT VLAN to the firewall itself — say if you completely isolate your IoT VLAN — you’ll need to allow those.)

Now when going into Bambu Studio under Devices then expanding Printers, the printer will show up. It may take a few moments for the printer to appear, as the SSDP messages are only sent periodically, so be patient if it doesn’t appear immediately.

(Note that if other models of printers aren’t working, it may be useful to also relay port 1910. The P1S works fine with just 2021, so for now that’s all I’ve done.)

Firewall Rules

With Studio seeing the printer, and presuming that your regular and IoT VLANs are firewalled off from each other, rules need to be added to allow the printer to work. While Bambu Studio has a Printer Network Ports article, it seems wrong. I am able to print successfully without opening all the ports listed for LAN Mode, but I also needed to add one more that wasn’t listed: 2024/tcp.

Here’s everything I needed to allow from the regular VLAN to IoT VLAN to have Bambu Studio print to the P1S, along with what I believe each port to handle:

  • 990/tcp (FTP)
  • 2024/tcp to 2025/tcp (Unknown, but seems to be FTP?)
  • 6000/tcp (LAN Mode Video)
  • 8883/tcp (MQTT)

Nothing needs to be opened from the IoT VLAN; everything seems to be TCP and the stateful firewall seems to handle the return path. (Even though the Printer Network Ports article, with its 50000~50100 range for LAN mode FTP, implies active mode FTP…)
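
A quick way to confirm the rules are in place is to try a TCP connection to each port from a machine on the regular VLAN. The following is a minimal sketch in Python using only the standard library; the helper name `check_tcp_ports` is an assumption of mine, and a successful connect only shows that the firewall passes the traffic and something is listening, not that printing will work.

```python
import socket

# Ports found necessary above, from the regular VLAN to the printer (all TCP).
PRINTER_PORTS = {
    990: "FTP",
    2024: "unknown, possibly FTP",
    2025: "unknown, possibly FTP",
    6000: "LAN Mode video",
    8883: "MQTT",
}


def check_tcp_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True where a TCP connection could be opened."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[port] = False
    return results
```

Calling `check_tcp_ports("192.168.1.105", PRINTER_PORTS)` (substituting your printer's reserved address) returns a dictionary showing which ports answered. Some ports may only listen while a particular function is in use, so a False here is a hint rather than proof of a firewall problem.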

And with that, it just works. I can now print to my Bambu Lab P1S on the isolated IoT VLAN from a client on the normal/regular/LAN VLAN, with the printer found via autodiscovery and only the requisite ports opened up.

Missing Functionality? Leaky Data?

Note that there are a few functions — like browsing the contents of the SD card for timelapse videos or looking at the job history — which only work when connected to the cloud service. This really surprises me, as I can think of no rational reason why this data should need to be brokered by Bambu Lab.

Unless they want to snarf up the data about what you print and video of it happening and when and… and…?

Digging into that sounds worthy, but is a project for another time. It’s a pretty good reminder of why isolating IoT devices is good practice, though. For now I’ll just manually remove the SD card if I want access to these things. And consider if maybe I should completely isolate the printer from sending data out to the internet…

Citations

Big, big thanks to Rdiger-36/StudioBridge and very specifically the contents of UDPPackage.java. This utility, which helps find Bambu Lab printers cross-VLAN by generating an SSDP packet and sending it to loopback, saved me a bunch of time in figuring out how Bambu Lab’s non-standard SSDP works.

All the discussion around issue #702, Add printer in LAN mode by IP address was incredibly helpful in understanding what was going on and why this printer didn’t seem to Just Work in a multi-VLAN environment. This thread, and watching what StudioBridge did, made understanding the discovery process pretty simple.

And as much as I dislike the AGPL in general, it worked out really well here. I wouldn’t expect a company like Bambu Lab to release their software so openly, but with the AGPL they had to. Slic3r begat PrusaSlicer which begat Bambu Studio which begat OrcaSlicer, giving us a rich library of slicers.

Updates

2024-Dec-22: After this worked fine for a few days I ran into problems printing from OrcaSlicer where jobs wouldn’t send. Digging I found that 2025/tcp was needed as well, so I updated the article above. It seems this is another FTP port? It’d sure be nice if this was documented.

HDMI-CEC to Onkyo RI Bridge

ESPHome device, a Seeed Studio XIAO ESP32S3 and level shifter with 3.5mm TS and HDMI connectors.

After getting the Onkyo RI support for ESPHome and Home Assistant in place, it was neat that I could turn my Onkyo A-9050 amplifier on and off remotely, but it wasn’t actually very useful; it didn’t save me any time/hassle. This iteration, adding HDMI-CEC support, brings it all together.

Back when I started this project, my main goal was to find a nice way to deal with toggling the power on the amplifier. Because I only use a single input on the amplifier and volume is already handled by the Apple TV remote, I don’t use the remote and it’s stored away in the basement. Normal practice was to manually press the power button on the front before using it, but this was irritating so I went looking for a better way, and the result was this project.

Initially I was looking at a way to use Home Assistant to coordinate powering the Apple TV and amplifier on, but it turns out there’s no good way to power up an Apple TV remotely; or at least not from anything that’s not an Apple device. I thought about going down the path of figuring out how the iOS / iPadOS does it, but the results of that would need to be incorporated into pyatv and chasing Apple’s changes was not a path I wanted to go down.

I then began thinking about it inversely: What if I could tell when the Apple TV woke and slept, and then take action based on that? After all, it’s already using the well-established Consumer Electronics Control (HDMI-CEC) to wake the TV… What if I could listen for that? And we’re always using the Apple TV remote when watching content and there’s no need to wake it while out of the room, so pressing a button on the remote to get things started is just fine.

Well, it turns out that was easier than I thought. Using Palakis/esphome-native-hdmi-cec, a HDMI-CEC component for ESPHome, and then doing a little protocol analysis I now have a device that:

  • Listens for the Apple TV to wake up and sends a Power On to the receiver.
  • Listens for the Apple TV to go into standby and sends a Power Off to the receiver.
  • Sends events to Home Assistant whenever a broadcasted HDMI-CEC Standby (0x36) or Report Power Status (0x90) is received.
  • Exposes controls in Home Assistant for a variety of Onkyo remote control commands and broadcasting an HDMI-CEC Standby (0x36). The latter puts my TV and the Apple TV to sleep, and also gets heard by ESPHome (loopback) and results in the amplifier being powered off.
  • Exposes a service in Home Assistant allowing arbitrary HDMI-CEC commands to be sent.

The result is that when I press a button on the Apple TV remote to wake it up the amplifier powers on, the TV wakes up (as before), and all is ready to go with one button press. This satisfies my original goal, and also allows some lights to be turned on automatically.

I’ve still got some lingering architectural questions and may be digging further into the HDMI-CEC stuff to see if I can make it work better, but for now I’m happy. If/when I take this further, the big questions to answer are:

  • Currently ESPHome powers on the amplifier without Home Assistant. This feels rational for a device bridging the two protocols and makes the amplifier work more like a modern HDMI soundbar, but is it the best way to go? Running it all through HA would be a lot more complicated and network (and HA) dependent, but I could instead use the notification in HA to trigger a Power On at the receiver. Are there ever situations where I’d want this device to not power on the amplifier?
  • The HDMI-CEC implementation is very simple, solely listening for two messages I saw the Apple TV send and taking action on them. One of these, Report Power Status, is per-spec used to send more than notifications of power being on. Should this be changed or further built out? (Note: Because the library doesn’t implement DDC for device discovery and addressing and such, it can’t be a full-fledged implementation. But that much is likely not needed; there’s more I can do.)
  • Is it possible to wake the Apple TV via HDMI-CEC? It’s not immediately obvious how, but perhaps with a bit of probing…?

Hardware-wise, this was simple to do. All it required was getting an HDMI connector (I used this one), connecting pin 13 (CEC) to a GPIO, pin 17 to ground, and pin 18 to 5v (VUSB) as per the readme at Palakis/esphome-native-hdmi-cec. Since CEC uses 3.3v there was no need for a level shifter as with Onkyo RI. I was able to add this on to the previous adapter without a problem and everything just worked.

With this ESPHome configuration I changed things around a bit, both to simplify and secure the device and make things better overall. As I learned more about ESPHome and started thinking about securing IoT devices, I wanted to minimize the ability to do OTA updates, including via the web UI, and access the API. I also wanted to pull credentials out of my .yaml file so I could more easily share it. Changes to support this, and some other nifty things, are:

  • Setting up a secrets.yaml to hold wifi_ssid, wifi_password, ota_password, and api_encryption_key.
    • Tip: All this involves is creating a secrets.yaml file in the same directory as the configuration .yaml and putting lines such as wifi_ssid: "IoT" or api_encryption_key: "YWwyaUNpc29vdGg3ZG9oazdvaGo2YWhtZWlOZ2llNGk=" in it. Then in the main .yaml reference this with ssid: !secret wifi_ssid or key: !secret api_encryption_key or so.
    • Generating an API key can easily be done with something like: echo -n `pwgen -n 32 1` | openssl base64
  • Setting a password for OTA updates.
    • Note: Once this password is set, changing it can be a bit complicated (see ESPHome OTA Updates for more information). I suggest picking one password from the get-go and sticking with that.
  • To further minimize unapproved access, I did not enable the fallback access point mode or the captive portal, and I disabled the web server component (because it’s unauthenticated and allows firmware uploads). I’m still thinking about disabling safe mode.
  • Set name_add_mac_suffix: true to add the MAC address suffix to the device name. This makes it easier to use one config on multiple devices on the same network, such as when doing development work with multiple boards. (See Adding the MAC address as a suffix to the device name.)
  • Because my Onkyo RI PR has not been merged (as of 2024-Sep-01), I had been manually patching to add it. It turns out that some PRs can automatically be incorporated into the config via external_components, and this works great for my needs until this gets merged:
external_components:
  # Add the HDMI-CEC stuff for ESPHome
  - source: github://Palakis/esphome-hdmi-cec
  # Add PR7117, which is my changes to add Onkyo RI. Had not been merged as of 2024-Sep-01.
  - source: github://pr#7117
    components:
      - remote_base
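
As an alternative to the pwgen and openssl one-liner mentioned in the list above, an API encryption key can be generated with nothing but the Python standard library. This is a minimal sketch, assuming the key just needs to be 32 bytes encoded as base64 (which matches the length of the example key shown earlier); the function name `generate_api_key` is my own.

```python
import base64
import secrets


def generate_api_key() -> str:
    """Return 32 cryptographically random bytes as base64 text."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


key = generate_api_key()
print(key)
```

The output can be pasted into secrets.yaml as the value of api_encryption_key. Unlike base64-encoding a printable password, this draws on the full range of byte values.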

Despite stripping the configuration back a bit to secure it better, which in turn removes on-device overhead, I still have problems with the OTA update on the Seeed Studio XIAO ESP32S3. This is irritating because it means any changes require connecting a cable to flash it via USB, but I can also keep using the breadboarded SparkFun ESP32 Thing Plus for any future development.

The configuration I’m using can be found here: hdmi-cec-onkyo-ri-bridge_2024-sep-02.yaml

Note that this includes some development HDMI-CEC buttons, such as sending EF:90:00 and EF:90:01. This is part of some experimenting in attempts to wake up the Apple TV via CEC, but thus far doesn’t do anything. However, they serve as good examples of how to send multiple bytes to the bus. It also includes commented sections for the different ESP32 boards I’ve used and will likely need to be changed for your purposes.
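
For anyone unfamiliar with how these byte strings break down, here is a small Python sketch that decodes frames written in the colon-separated form used above. It relies on the standard CEC layout (first byte holds the initiator address in the high nibble and the destination in the low nibble, second byte is the opcode) and only knows the two opcodes discussed in this post; `decode_cec` and the lookup tables are names I made up for illustration.

```python
# Only the opcodes and power states discussed in this post.
OPCODES = {0x36: "Standby", 0x90: "Report Power Status"}
POWER_STATUS = {0x00: "On", 0x01: "Standby"}


def decode_cec(frame: str) -> dict:
    """Decode a colon-separated hex CEC frame such as 'EF:90:00'."""
    data = bytes(int(part, 16) for part in frame.split(":"))
    header = data[0]
    result = {
        "initiator": header >> 4,       # high nibble: sender's logical address
        "destination": header & 0x0F,   # low nibble: target's logical address
        "broadcast": (header & 0x0F) == 0x0F,
    }
    if len(data) > 1:
        opcode = data[1]
        result["opcode"] = OPCODES.get(opcode, f"0x{opcode:02X}")
        # Report Power Status carries one operand with the power state.
        if opcode == 0x90 and len(data) > 2:
            result["power_status"] = POWER_STATUS.get(data[2], f"0x{data[2]:02X}")
    return result


print(decode_cec("EF:90:00"))
```

Read this way, EF:90:00 is logical address 14 broadcasting (destination 15) a Report Power Status of On, and EF:90:01 is the same message reporting Standby.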

Update on November 2, 2024

After using this for a while I ran into a couple quirks, so I’ve made some updates to both the device config and ensuring it builds under the current dev version (ESPHome 2024.11.0-dev, as of about 10am EDT on 2024-Nov-02). Unfortunately this hasn’t solved the problem of uploading a new version via OTA on the Seeed Studio XIAO ESP32S3.

The current version of the device config can be found here: onkyo-a-9050_seeed_xiao_esp32c3_v1.2.0.yaml

The main changes here are that the ESPHome device no longer takes action (via Onkyo RI) based on the received HDMI-CEC commands, and that I cleaned up and clarified the events. There are three distinct events that can be acted upon:

  • HDMI-CEC: Report Power Status: On: Something reported its power status as On.
  • HDMI-CEC: Report Power Status: Standby: Something reported its power status as Standby.
  • HDMI-CEC: Standby Command: Something sent a Standby command.

I now use Home Assistant to trigger on HDMI-CEC: Report Power Status: On, turn on some lights, and press the Onkyo RI: On button, turning the amplifier on. For shutting things down I trigger on HDMI-CEC: Report Power Status: Standby and turn the amplifier and lights off. This is more dependent on HA, but it also gives me more flexibility.

(I’ve not (yet) started looking into waking the Apple TV via HDMI-CEC.)

Onkyo RI for ESPHome / Home Assistant

ESPHome Devices for Onkyo RI output; final and prototype.

Our living room has a very simple setup: a non-networked TV, an Apple TV, and an older Onkyo A-9050 amplifier that drives two small speakers and a subwoofer. It’s a great sounding yet simple setup for two channel audio, perfect for the basic streaming video watching we do.

Being older the amplifier doesn’t have any of the modern (eg: HDMI CEC) mechanisms for controlling it, but it does have a 3.5mm tip sleeve input on the back for Onkyo RI. This old, proprietary system uses a wired connection creating a bus that allows different Onkyo components to be controlled from one central component and thus one IR remote control.

This protocol is well documented, both via the LIRC project and some other sites (ref: LIRC documentation, Onkyo RI Protocol, docbender/Onkyo-RI) so this got me thinking it’d be pretty easy to implement in ESPHome and thus make the receiver controllable from Home Assistant. While this is only one-way control (since it’s basically a wired version of an IR remote), it would still allow for remote power on/off, input changing, etc.

After a few false starts, it turns out it was easy. Thanks to some pointers from folks in the ESPHome Discord I realized the best way was adding support for the protocol to the existing Remote Transmitter integration. Since this integration already supported other protocols with similar timing, it was pretty easy for me to add Onkyo RI by copying the structure from another and modifying it for this protocol. (For reference, it’s not standard serial and requires specific timings, so it wasn’t as simple as just using a UART.)

I’ve since submitted PR #7117 to the ESPHome project to contribute this back, but despite passing all tests I’m still waiting for it to be accepted. (I looked into creating a custom component that could be included from another GitHub repo, but since this was best implemented by modifying an existing component, that didn’t make sense to me.) Until this gets accepted, I’ll just have to build ESPHome locally; others who want to do the same can patch things based on the files in the PR.

Getting it all wired up was pretty simple; the only thing needed was getting electrical levels right, as the ESP32 microcontrollers use 3.3V logic and RI uses 5V. Thankfully a simple level shifter based around a FET can handle this. I first prototyped it with a SparkFun ESP32 Thing Plus and an Adafruit BSS138 on a breadboard and this worked great.

I really don’t like the idea of having a fragile and ugly breadboard sitting in the living room so I made plans to replace it with something smaller. After ordering parts and letting them sit for a few weeks, I finally got around to it one rainy Sunday afternoon.

This smaller, final implementation uses a Seeed Studio XIAO ESP32S3, a cheap level shifter board from Amazon (electrically identical to the Adafruit BSS138), and a 3.5mm TS cable. This was all wired up then bundled, along with the ESP32’s external 2.4 GHz antenna, into a single blob inside of some heatshrink tubing, making for a simple, streamlined final package. This works great, and now I have a single thumb-sized module with a USB-C connector (for power input and reprogramming) on one end and a 3.5mm plug on the other for the receiver. And it shows up wonderfully in HA and works as a remote control.

While the initial prototyping went great, I did run into two problems worth mentioning:

First, one of the super-cheap level shifters I got from Amazon seemed to be bad. After hooking it up levels seemed all wrong, and I was seeing 3.3V at the ESP32 end and a solid 5V at the plug. Turned out to be a bad level shifter (or perhaps bad PCB) but by moving to the second shifter on the same board things were fine.

Second, when attempting to do an OTA update after the Onkyo RI firmware on the ESP32 S3 is running, it fails, indicating that Component esphome.ota took a long time for an operation (7339 ms)..

If I flash it back via USB with a default ESPHome config (via ESPHome web), it then OTA updates fine. This only happens on the ESP32 S3 and didn’t happen on the ESP32 WROOM, and seems related to how long the OTA takes on this module or maybe something caused by wireless transmission speeds? I didn’t try a serial upload nor troubleshoot any further as I both have a good workaround and see no need to reprogram the device any time soon.

The ESPHome configuration used for the final version can be found here: onkyo-a-9050_seeed_xiao_esp32c3.yaml. This uses a handful of commands that I tested to work on the A-9050. For other Onkyo RI receivers there may be different commands needed; I suggest consulting the protocol docs mentioned above to discover others. I made a point of adding rational icons to each so that once added to Home Assistant things look good.

Using these is then nice and straightforward in HA, such as a basic button here on my dashboard which sends the Toggle On/Off (power) command:

type: button
show_name: true
show_icon: true
tap_action:
  action: toggle
entity: button.onkyo_a_9050_toggle_on_off
name: Onkyo A-9050
hold_action:
  action: none
icon_height: 40px

I’m not sure where I’ll go next with this. Toggling power on the receiver from a dashboard is neat, but not that important. Ideally I’d like to have a single automation that will change a couple of lights, turn on the receiver, and result in the Apple TV and television itself being turned on, but there’s still pieces missing to allow this.

It seems an Apple TV can’t be woken over the network when sleeping, the TV is not network accessible, and the receiver does not transmit status. So, I can’t do this with my current setup. I believe that it may be possible to build an ESPHome HDMI CEC device and connect it to another input on the TV to wake things up using something like Palakis/esphome-native-hdmi-cec, but that’ll be another project… At least now I’ve got a spare breadboarded ESP32 to start down that path. Time to order some HDMI breakout connectors, I guess.

Sunrise-like Alarm Clock via Home Assistant + Android

Bedside Sunrise Alarm Clock Setup

Quite a few years ago I came across Lighten Up!, a dawn-simulating alarm clock module that was connected inline with an incandescent lamp and used gently increasing light instead of noise. Coupled with a halogen bulb (that’d start out very yellow at lowest brightness) I had a wonderful sunrise-like alarm clock and it was much, much nicer than a beeping alarm.

The LCD displays in the Lighten Up! units began failing so I couldn’t change the programming, which was a hassle as the clocks in them drifted by a couple minutes per month. With a combination of COVID-19 remote work eliminating the need for an alarm clock and the devices dying, in the trash they went. (They also didn’t work right with LED bulbs, and now the person making them has closed down the business.)

I’ve been trying to use an alarm to stay on a more regular sleep schedule and while a bunch of other wake-up lights are available, they are dedicated units that are basically alarm clocks with built in lights. I really liked the elegance of the Lighten Up! and how it’d use an existing lamp, and outside of dedicated smart bulbs + an app I couldn’t find anything else like it. For a while I thought about developing my own hardware version that’d also work with LED bulbs, but never got around to it.

Lighten Up! (Image from Pinterest)

This winter I’ve been experimenting with Home Assistant (HA), and it turns out that with a couple cheap Zigbee parts (bulb and pushbutton from IKEA) it allows for a wonderful replacement/upgrade sunrise alarm idea. A next-generation Lighten Up!, if you will.

With everything put together the lamp next to my bed will now slowly come up to brightness starting 15 minutes before the wake-up alarm on my phone, reaching final brightness as the normal alarm triggers. If I change the alarm time on my phone, or shut it off, the light-up alarm in HA will follow suit. Additionally, a physical button on the nightstand turns the light off while it is replicating a sunrise, or otherwise toggles the light on and off.

Even better, if I’m not home or if the alarm is set for other than between 3:00 AM and 9:00 AM (times during which I’d likely be in bed and wanting to wake up) the light won’t activate. This allows me to use alarms during the normal day for other things without activating the light, or while traveling without waking Kristen.

Between this and the gently-increasing volume (and vibration) alarm built into the Android clock which triggers at the end of the sunrise cycle it’s a very nice, gradual wake-up system. And, all of this happens without any cloud services or ongoing subscriptions. My HA instance is local; the phone app communicates directly with it across either my home or the public networks. Communication between the physical controls and lights is a local, private network.

In this post I’ll document the major building blocks of how I did this so that someone else with basic Home Assistant experience (and a functioning HA setup, which is beyond the scope of this writeup) can do the same.

For reference, my Home Assistant hardware setup for this piece is:

With the Home Assistant Companion App for Android running on an Android phone, Home Assistant can get the date and time of the next alarm. After installing the app, go into Settings → Companion app → Manage sensors and enable the Next alarm sensor. My phone is named Pixel 8, so the alarm is now available as entity sensor.pixel_8_next_alarm. Note that this is not available if an iPhone (or other iOS device) is used. (ref: Next Alarm Sensor)

Part of setting up HA configures a Zone (location) called Home. This, combined with the default location information collected by the companion app, allows HA to know if my phone is at Home (or elsewhere), via the state of entity device_tracker.pixel_8 (eg: home).

Note: While I give YAML of the automations for configuration reference, most of these automations were built using the GUI and involve the (automatically generated) entity and device IDs. If you are setting this up you’ll want to use the GUI and build these out yourself using the code for reference.

To make this all work, three community components are used and must be installed:

Ashley’s Light Fader 2.0: This script takes a light and, over a configured amount of time, fades from the light’s current setting to the defined setting (both brightness and color temperature) using natural feeling curves (easing). It will also cancel the fade if some conditions are met. I use this to have the light fade, over 15 minutes, using a sine function, to 70% brightness and 4000K temperature, and cancel the fade if the light is turned off or brightness changes significantly, the latter of which allows the button next to the bed to cancel the alarm.

To make this happen I turn on the bulb at 1% brightness and 2202K (its warmest temperature), then use the script to fade to 70% and 4000K over the course of 15 minutes. This does a decent job of replicating a sunrise or the results of the Lighten Up! with a halogen bulb.

This is configured as an automation I call Bedroom Steve Nightstand: Lighten Up! (Sunrise). Note that it has no trigger because it’ll be called from the next automation:

alias: "Bedroom Steve Nightstand: Lighten Up! (Sunrise)"
description: ""
trigger: []
condition: []
action:
  - condition: state
    entity_id: light.bedroom_test_bulb_light
    state: "off"
  - service: light.turn_on
    metadata: {}
    data:
      brightness_pct: 1
      color_temp: 500
    target:
      entity_id: light.bedroom_test_bulb_light
  - service: script.1705454664908
    data:
      lampBrightnessScale: zeroToTwoFiftyFive
      easingTypeInput: easeInOutSine
      endBrightnessEntityScale: zeroToOneHundred
      autoCancelThreshold: 10
      shouldStopIfTheLampIsTurnedOffDuringTheFade: true
      shouldResetTheStopEntityToOffAtStart: false
      shouldInvertTheValueOfTheStopEntity: false
      minimumStepDelayInMilliseconds: 100
      shouldTryToUseNativeLampTransitionsToo: false
      isDebugMode: false
      light: light.bedroom_test_bulb_light
      transitionTime:
        hours: 0
        minutes: 15
        seconds: 0
      endColorTemperatureKelvin: 4000
      endBrightnessPercent: 70
mode: single

Adjustable Wake-up to Android alarm v2: This blueprint for an Automation takes the time from the next alarm sensor (alarm_source) to trigger an action before the alarm happens. I use this to initiate Ashley’s Light Fader 2.0 at 15 minutes before my alarm, only when my phone is at Home, and the alarm is between 3:00 AM and 9:00 AM.

Part of configuring this is setting up a Helper or basically a system-wide variable, called Pixel 8 Next Alarm (entity id: input_datetime.pixel_8_next_alarm, type: Date and/or time).

This is configured as an automation called Bedroom Steve Nightstand: Lighten Up at 15 Before Alarm, set to only run if my phone is at Home and it’s between 3:00 AM and 9:00 AM:

alias: "Bedroom Steve Nightstand: Lighten Up at 15 Before Alarm"
description: ""
use_blueprint:
  path: homeassistant/adjustable-wake-up-to-android-alarm.yaml
  input:
    offset: 900
    alarm_source: sensor.pixel_8_next_alarm
    alarm_helper: input_datetime.pixel_8_next_alarm
    conditions:
      - condition: device
        device_id: 1fb6fd197bd2b771249ae819f384cfe2
        domain: device_tracker
        entity_id: e695e05f01a328b349a42bfd7d533ef6
        type: is_home
      - condition: time
        after: "03:00:00"
        before: "09:00:00"
    actions:
      - service: automation.trigger
        metadata: {}
        data:
          skip_condition: true
        target:
          entity_id: automation.lighten_up

I don’t want to get out a phone and dig into an app to manage the light, so next to the bed I have a TRÅDFRI Shortcut Button for controlling the light. If the button is pressed while the light is simulating sunrise, it turns off. If the light is off it turns it on, or vice versa.

Because turning the light off mid-dimming leaves it set at the current color and brightness, I use this instead of the normal Toggle action. In here I check the state of the bulb and either turn it off (if on), or turn it on to 100% brightness and 4000K if it is off:

alias: "Bedroom Steve Nightstand: Light Toggle"
description: >-
  Doesn't use the normal toggle because it needs to set the light color and
  brightness just in case it was left at something else when turned off
  mid-alarm.
trigger:
  - device_id: 12994a6c215ae1d4cfb86e261a2b2f3b
    domain: zha
    platform: device
    type: remote_button_short_press
    subtype: turn_on
condition: []
action:
  - if:
      - condition: device
        type: is_on
        device_id: e3421c7d54269752a371fe8443daf95f
        entity_id: 78599118c4ab8043cf03ce6532546b94
        domain: light
    then:
      - service: light.turn_off
        metadata: {}
        data:
          transition: 0
        target:
          entity_id: light.bedroom_test_bulb_light
      - stop: ""
    alias: On to Off
  - if:
      - condition: device
        type: is_off
        device_id: e3421c7d54269752a371fe8443daf95f
        entity_id: 78599118c4ab8043cf03ce6532546b94
        domain: light
    then:
      - service: light.turn_on
        metadata: {}
        data:
          color_temp: 153
          transition: 0
          brightness_pct: 100
        target:
          entity_id: light.bedroom_test_bulb_light
      - stop: ""
    alias: "Off to On: Full Brightness and 4000K"
mode: single

Finally, I also have this all displaying, and controllable, via a card stack in a dashboard. For the next alarm info I started with the template in this post but modified it to simplify one section by using now(), fix a bug in it that occurs with newer versions of HA, and then build it into something that better illustrates the start and end of the simulated sunrise. Because normal entity cards can’t do templating (to dynamically show data) I used TheHolyRoger/lovelace-template-entity-row and some Jinja templating to make it look nice.

This gives me a row which shows the next alarm time (or “No alarm” if none set), nicely formatted, and has a toggle that can enable/disable the Bedroom Steve Nightstand: Lighten Up at 15 Before Alarm automation. Finally, I added a row of buttons to allow easy toggling between 1% / 454 mireds, 33% / 357 mireds, 66% / 294 mireds, and 100% / 250 mireds so I can manually set the light to some nice presets across dawn to full brightness.
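
Since the presets are given in mireds while the rest of this post talks in Kelvin, a quick conversion helps when picking values. Mireds are simply 1,000,000 divided by the Kelvin value (and vice versa). The helper names in this small Python sketch are my own, for illustration only:

```python
def mired_to_kelvin(mired: float) -> int:
    """Convert a color temperature in mireds to Kelvin, rounded."""
    return round(1_000_000 / mired)


def kelvin_to_mired(kelvin: float) -> int:
    """Convert a color temperature in Kelvin to mireds, rounded."""
    return round(1_000_000 / kelvin)


# The four dashboard presets: (brightness percent, mireds)
for pct, mired in [(1, 454), (33, 357), (66, 294), (100, 250)]:
    print(f"{pct:>3}% / {mired} mireds = about {mired_to_kelvin(mired)}K")
```

Note that a larger mired value means a warmer (lower Kelvin) light, so the presets run from warmest to coolest.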

Note: There is an older version of this template, thomasloven/lovelace-template-entity-row, in the Home Assistant Community Store (HACS), but it has a bug which keeps the icon from changing color to reflect the state of the automation.

type: vertical-stack
cards:
  - type: entities
    title: Bedroom
    entities:
      - type: custom:template-entity-row
        entity: automation.adjustable_wake_up_to_android_alarm
        name: Sunrise Alarm
        icon: mdi:weather-sunset-up
        active: '{{ is_state("automation.adjustable_wake_up_to_android_alarm", "on") }}'
        toggle: true
        tap_action: none
        hold_action: none
        double_tap_action: none
        secondary: >-
          {% set fullformat = '%Y-%m-%d %H:%M' %}
          {% set longformat = '%a %b %-d %-I:%M %p' %}
          {% set timeformat = '%-I:%M %p' %}
          {% if states('sensor.pixel_8_next_alarm') != 'unavailable' %}
            {% set sunrise_start = state_attr('input_datetime.pixel_8_next_alarm', 'timestamp') | int %}
            {% set sunrise_end = (state_attr('sensor.pixel_8_next_alarm', 'Time in Milliseconds') /1000) | int %}
            {% if sunrise_start | timestamp_custom('%Y-%m-%d', true) == (now().timestamp() | timestamp_custom('%Y-%m-%d', true)) %}
              {% set sunrise_start_preamble = 'Today' %}
            {% elif (1+ (sunrise_start - now().timestamp() | int) / 86400) | int == 1 %}
              {% set sunrise_start_preamble = 'Tomorrow' %}
            {% elif (1+ (sunrise_start - now().timestamp() | int) / 86400) | int <= 7 %}
              {% set sunrise_start_preamble = sunrise_start | timestamp_custom('%A',true) %}
            {% else %}
              {% set sunrise_start_preamble = sunrise_start | timestamp_custom('%a %b %-d', true) %}
            {% endif %}
            {% if sunrise_end | timestamp_custom('%Y-%m-%d', true) == (now().timestamp() | timestamp_custom('%Y-%m-%d', true)) %}
              {% set sunrise_end_preamble = 'Today' %}
            {% elif (1+ (sunrise_end - now().timestamp() | int) / 86400) | int == 1 %}
              {% set sunrise_end_preamble = 'Tomorrow' %}
            {% elif (1+ (sunrise_end - now().timestamp() | int) / 86400) | int <= 7 %}
              {% set sunrise_end_preamble = sunrise_end | timestamp_custom('%A',true) %}
            {% else %}
              {% set sunrise_end_preamble = sunrise_end | timestamp_custom('%a %b %-d', true) %}
            {% endif %}
            {% if (sunrise_start_preamble == sunrise_end_preamble) %}
              {% if sunrise_start_preamble == 'None' %}
                {{ sunrise_start | timestamp_custom(longformat, true) }} - {{ sunrise_end | timestamp_custom(timeformat, true) }}
              {% else %}
                {{ sunrise_start_preamble }} {{ sunrise_start | timestamp_custom(timeformat, true) }} - {{ sunrise_end | timestamp_custom(timeformat, true) }}
              {% endif %}
            {% else %}
              {% if sunrise_start_preamble == 'None' %}
                {{ sunrise_start | timestamp_custom(longformat, true) }} - {{ sunrise_end | timestamp_custom(longformat, true) }}
              {% else %}
                {{ sunrise_start_preamble }} {{ sunrise_start | timestamp_custom(timeformat, true) }} - {{ sunrise_end_preamble }} {{ sunrise_end | timestamp_custom(timeformat, true) }}
              {% endif %}
            {% endif %}
          {% else %}
            No alarm set on {{ state_attr('device_tracker.pixel_8', 'friendly_name') }}
          {% endif %}
      - type: divider
      - entity: light.bedroom_test_bulb_light
        name: Steve's Nightstand
        icon: mdi:bed
        entity_data:
          brightness: 255
          color_temp_kelvin: 4000
    show_header_toggle: false
    state_color: true
  - type: grid
    square: false
    cards:
      - show_name: false
        show_icon: true
        show_state: false
        type: button
        tap_action:
          action: call-service
          service: light.turn_on
          data:
            brightness_pct: 1
            color_temp: 454
          target:
            entity_id: light.bedroom_test_bulb_light
        name: 1%
        icon: mdi:moon-waning-crescent
        hold_action:
          action: none
      - show_name: false
        show_icon: true
        show_state: false
        type: button
        tap_action:
          action: call-service
          service: light.turn_on
          data:
            brightness_pct: 33
            color_temp: 357
          target:
            entity_id: light.bedroom_test_bulb_light
        name: 33%
        icon: mdi:moon-last-quarter
        hold_action:
          action: none
      - show_name: false
        show_icon: true
        show_state: false
        type: button
        tap_action:
          action: call-service
          service: light.turn_on
          data:
            brightness_pct: 66
            color_temp: 294
          target:
            entity_id: light.bedroom_test_bulb_light
        name: 66%
        icon: mdi:moon-waning-gibbous
        hold_action:
          action: none
      - show_name: false
        show_icon: true
        show_state: false
        type: button
        tap_action:
          action: call-service
          service: light.turn_on
          data:
            brightness_pct: 100
            color_temp: 250
          target:
            entity_id: light.bedroom_test_bulb_light
        name: 100%
        icon: mdi:moon-full
        hold_action:
          action: none
    columns: 4

The result of all of this is that, if my phone is at home and I have an alarm set between 3:00 AM and 9:00 AM, the light next to the bed will simulate a 15-minute sunrise before the alarm goes off. If the light is simulating a sunrise, pressing the button will turn it off. Otherwise, the button toggles the light on and off at full brightness, for normal lamp-type use. Finally, via the Home Assistant UI I can easily check the status of, or turn off, the sunrise alarm if I don’t want to use it.

So far, this is working great. There are two things I’m looking into changing:

First, the bulb I’m using, 405.187.36, has a maximum brightness of 1100 lumens. This is a bit too bright for the final stage of the alarm, and its minimum brightness is a bit higher than I’d like and seems a little abrupt. (Ideally the initial turn-on won’t be noticeable.)

Since IKEA bulbs are cheap and generally work well, I’ll likely try a few other lower brightness ones and see how they work out. Both 605.187.35 (globe) and 905.187.34 (chandelier) are color temperature adjustable, 450 lumen maximum, cost $8.99, and look like good candidates as I expect their minimum brightness to be lower.

There is also 104.392.55 ($12.99), but it is fixed at 2200K and has a maximum brightness of 250 lumens. I suspect this will be nicely dim for the start, but wouldn’t allow a color transition and might not have enough final brightness to make me feel ready for the day.

I may also try something like 204.391.94 ($17.99), which is adjustable color, as this could allow me to use something like the sunrise color palette, but this would require moving to a different script for fading. The current script doesn’t support fading between colors (see here for discussion around this), so this would take a lot of work on my part. Probably more than would be beneficial, since varying color temp on white-range bulbs is pretty darn good already.

Second, the TRÅDFRI Shortcut Button (203.563.82) that I’m using has been discontinued. It’s a nice, simple button, and I can trigger on it using short or long press. Its replacement, SOMRIG Shortcut Button (305.603.54), isn’t in stock at my local IKEA so I don’t have one, but I expect it to be two buttons that can each have short or long presses, and perhaps even double-click on each. If so, I may add something more like dimming the nightstand light to use as a reading light, or perhaps something to leave on for the dogs when we’re gone.

Thinking a bit bigger picture I could even do things like use an in-wall dimmer to have the adjacent closet lights serve as wake-up lights. But as all the quality ones of these are Z-Wave I’d have to get another radio for the Pi and… and…

The possibilities for this stuff are nearly endless, which is neat, because it becomes an engineering problem of what to do that provides sufficient benefit without complexity for complexity’s sake. This, at least, a Home Assistant-based replacement for the old, beloved Lighten Up!, is great.

Note: This post has been updated a few times since original posting to fix grammar, a bug in the Jinja2 template for displaying the next alarm, and to add buttons for setting lamp brightness.

_wahoo-fitness-tnp._tcp.local

Wahoo smart trainers support network connectivity (instead of just the traditional Bluetooth or ANT+). Since I don’t have one I’d never bothered looking into how it works, but this morning while troubleshooting something with TrainerRoad running in the background I happened to see an mDNS query for _wahoo-fitness-tnp._tcp.local and realized this is how the smart trainers get discovered on the network.

Neat!

Maybe one day I’ll have a smart trainer that can use the network and I can dig further into how this all works.

NGINX on OPNsense for Home Assistant

I’ve been experimenting with Home Assistant (HA) for some temperature monitoring around the house. It has a great mobile client that’ll work across the public internet, but HA itself unfortunately only does HTTP by default. It has some minor built-in support for HTTPS by using the NGINX proxy and Let’s Encrypt (LE) Add-ons, but for a couple of reasons[1] I didn’t like this solution. I’m not about to expose something with credentials across the public internet via plain HTTP, so I wanted to do this proxying on my firewall instead of on the device itself.

My firewall at home runs OPNsense which has an NGINX Plugin, along with a full featured ACME client that I’m already using for other certificates, so it was perfect for doing this forwarding. After a bit of frustration, fooling around, and unexpected errors I got things working, so I wanted to share a simple summary of what it took to make it work. I’m leaving the DNS, certificate, and firewall sides of this out, as they’ll vary and are well documented elsewhere.

Here are the steps I used:

  • Set up DNS so the hostname you wish to use is accessible internally and externally. In this example homeass.site.nuxx.net will resolve to 24.25.26.13 on the public internet, and 192.168.2.1 at home, which are the WAN and LAN interfaces on the OPNsense box.
  • Set up the ACME plugin to get a certificate for the hostname you will be using, in this case homeass.site.nuxx.net.
  • On your Home Assistant instance, add the following to the configuration.yaml. This tells HA to accept proxied connections from the gateway. If you don’t do this, or specify the wrong address in trusted_proxies, you will receive a 400: Bad Request error when trying to access the site via the proxy:
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 192.168.2.1/32
  • In OPNsense, install NGINX.
  • In Configuration → Upstream → Upstream Server define your HA instance as a server:
    • Description: HA Server
    • Server: 192.168.2.23 (your Home Assistant device)
    • Port: 8123 (the port you have Home Assistant running on, 8123 is the default)
    • Server Priority: 1
  • In Configuration → Upstream → Upstream define a grouping of upstream servers, in this case the one you defined in the previous step:
    • Description: Home Assistant
    • Server Entries: HA Server
  • In Configuration → HTTP(S) → Location define what will get redirected to the Upstream:
    • Toggle Advanced Mode
    • Description: Home Assistant
    • URL Pattern: /
    • Upstream Servers: Home Assistant
    • Advanced Proxy Options → WebSocket Support: ✓
  • In Configuration → HTTP(S) → HTTP Server define the actual server to listen for HTTP connections:
    • HTTP Listen Address: Clear this out unless you want to proxy HTTP for some reason.
    • HTTPS Listen Address: 8123 and [::]:8123. Leave out the latter if you don’t wish to respond on IPv6.
    • Default Server: ✓
    • Server Name: homeass.site.nuxx.net
    • Locations: Home Assistant
    • TLS Certificate: Pick the certificate that you created early on with the ACME plugin.
    • HTTPS Only: ✓ (Unless for some reason you wish to support cleartext HTTP.)
  • Then under General Settings check Enable nginx and click Apply.
  • Finally, if needed, be sure to create the firewall rule(s) needed to allow traffic to connect to the TCP port you designated in the HTTP Server portion of the NGINX configuration.
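
To illustrate why the trusted_proxies entry in the steps above matters, here is a rough Python sketch of the kind of check involved. This is not Home Assistant's actual code, and the function names are hypothetical. It only shows the idea that a request carrying an X-Forwarded-For header is accepted when the connecting peer is a listed proxy, and rejected with a 400 otherwise.

```python
import ipaddress
from typing import Optional

# Mirrors the trusted_proxies entry from configuration.yaml in this example.
TRUSTED_PROXIES = [ipaddress.ip_network("192.168.2.1/32")]


def is_trusted_proxy(peer_ip: str) -> bool:
    """True if the connecting address falls inside any trusted network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def check_forwarded_request(peer_ip: str, x_forwarded_for: Optional[str]) -> int:
    """Return an HTTP-style status code for an incoming request (illustrative)."""
    if x_forwarded_for is None:
        return 200  # direct connection, nothing to validate
    return 200 if is_trusted_proxy(peer_ip) else 400


# Proxied by the OPNsense box (trusted) vs. proxied by some other host.
print(check_forwarded_request("192.168.2.1", "203.0.113.5"))
print(check_forwarded_request("192.168.2.50", "203.0.113.5"))
```

A /32 only matches the single firewall address, which is why a typo in that one entry is enough to produce the 400: Bad Request error.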

[1] Reasons for doing the proxying on the firewall include:

  • The Let’s Encrypt Add-on won’t restart NGINX automatically on cert renewal as OPNsense can. This means I’d have to either write something to do it, or manually restart the add-on to avoid periodic certificate errors.
  • If NGINX is running on the same device as Home Assistant, then it needs to be on a different port. I prefer using the default port.
  • I’d prefer to run just one copy of NGINX on my network for reverse proxying.
  • While experimenting with NGINX and LE on HA I kept running into weird problems where something would start logging errors or just not work until I restarted the box. With everything running as containers, troubleshooting intermittent issues like these is painful enough that I preferred to avoid it.

Command Line 802.11 Monitor Mode on macOS Sonoma (14.0)

Because it supports monitor mode, a MacBook with the built-in WiFi adapter is one of the simplest ways to grab packets off the air. It’s not the most robust, but often all I need to do is grab data from a couple devices I’m near on a known channel, so fancy antennas and channel hopping and whatnot are overkill; I just need to grab packets. Capturing in Monitor Mode with the Sniffer built into Wireless Diagnostics has been fairly easy for a while, but I was stuck using the GUI.

For a while macOS has had a command line utility called airport to handle all sorts of wireless network manipulation, log gathering, and debugging. It also has a poorly documented command verb sniff, but until the release of macOS Sonoma (14.0) it was only possible to specify the channel. Not being able to specify the width made it useless for most capturing I’d do in the real world.

Thankfully the airport command now works for channel and width, so now it’s possible to use remotely, in scripts, etc. It’s not well documented, but it works. For example, the following will capture on en0 on 5GHz channel 137 with 80MHz width:

airport en0 sniff 5g137/80

This will capture on en0 on 2.4GHz channel 7 at 20MHz width:

airport en0 sniff 2g7/20

Output files are written to /tmp in pcap format, with random names matching /tmp/airportSniff??????.cap. They can be opened in Wireshark or your analysis tool of choice.
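
Because the file names are random, scripts need a way to find the capture that was just written. A minimal Python sketch, assuming only the /tmp/airportSniff*.cap naming described above (the function name is hypothetical), picks the most recently modified match:

```python
import glob
import os
from typing import Optional


def newest_capture(directory: str = "/tmp") -> Optional[str]:
    """Return the most recently modified airportSniff capture, or None if none exist."""
    pattern = os.path.join(directory, "airportSniff*.cap")
    captures = glob.glob(pattern)
    return max(captures, key=os.path.getmtime) if captures else None


if __name__ == "__main__":
    print(newest_capture())
```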

(I suspect that sniffing from 6GHz WiFi will follow the same pattern, but I don’t have access to a device with such a radio so I’m unable to test. It’d also be pretty nifty to see this somehow built in / better automated via Wireshark… That could be a neat project for later.)

The airport binary can be found at /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport. I link this to ~/bin, with something like the following:

ln -s /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport ~/bin/airport

I keep ~/bin around for personal executable stuff, and it’s been added to my path by putting a line like this in ~/.zshrc:

export PATH=".:$PATH:$HOME/bin"

The airport binary itself has a pretty decent output from --help. It’s light on sniffing examples, but pretty good for other stuff.

Amusingly, this is pretty much the extent of the airport(8) man page; a TODO:

DESCRIPTION
airport manages 802.11 interfaces. airport more information needed here.

Using OpenSSL to Match Certificates and Private Keys

I recently was troubleshooting a problem with network authentication and suspected that the issue was around certificates and private keys not matching on a client. I had a .PEM file for the certificate and a .KEY for the private key, and I wanted to see if they matched.

Thankfully OpenSSL, the Swiss Army Knife of wrangling certs, made it easy. While this isn’t anything particularly secret, it took me a few to figure it out, so I’m re-documenting it here.

To see if the private key matches the certificate, use the following two commands and compare the Modulus section:

openssl x509 -in file.pem -noout -text

openssl rsa -in file.key -noout -text

If they match, the private key matches the certificate. If they don’t, they don’t.
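
Comparing long Modulus sections by eye is error-prone, so here is a hedged Python sketch that does the comparison instead. It assumes RSA keys and relies on the -modulus option of openssl x509 and openssl rsa, which prints a single Modulus=... line; the function names are my own.

```python
import subprocess


def parse_modulus(openssl_output: str) -> str:
    """Extract the hex modulus from a line such as 'Modulus=C0FFEE...'."""
    for line in openssl_output.splitlines():
        if line.startswith("Modulus="):
            return line.split("=", 1)[1].strip().upper()
    raise ValueError("no Modulus= line found")


def modulus_of(kind: str, path: str) -> str:
    """Run openssl; kind is 'x509' for a certificate or 'rsa' for a private key."""
    result = subprocess.run(
        ["openssl", kind, "-in", path, "-noout", "-modulus"],
        capture_output=True, text=True, check=True,
    )
    return parse_modulus(result.stdout)


def cert_matches_key(cert_path: str, key_path: str) -> bool:
    """True if the certificate and RSA private key share the same modulus."""
    return modulus_of("x509", cert_path) == modulus_of("rsa", key_path)
```

With the files from this example, cert_matches_key("file.pem", "file.key") would return True for a matching pair.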

In my case they didn’t match, which was causing the authentication problems. So we then solved what was happening during cert issuance and everything was then good.

Garmin Edge 530 WiFi Connection Weirdness

Today I tried to connect my Garmin Edge 530 running the latest firmware (v9.73) to my home wireless network, and couldn’t get it working. My friend Nick dug up a solution, so I wanted to share it here.

The problem I had is that when trying to find the network to join, either in the Garmin Connect mobile app or right on the device, my WPA2-secured wireless network would be shown as Unsecured and I couldn’t join it. No matter what I tried, on device or in app, switching around network types, names, security, or bands, the Edge 530 always saw it as Unsecured. The one thing I didn’t try was making a non-secured network, but that’s not an option for me.

Turns out you can work around this by using the Garmin Express desktop app then going into the 530, Tools & Content, Utilities, then under Wi-Fi Networks manually adding the network with appropriate security, password, etc.

After saving settings and ejecting the device it joined my wireless network, as confirmed on the device, in the DHCP leases, and on the APs themselves. Now it’ll automatically sync rides whenever I get back home.

Something else odd is the Edge 530 truncates 32-character SSIDs. My network at home is Smart Meter Surveillance Network, which is 32 ASCII characters long. This is the maximum allowed by 802.11, which is 32 octets, or 32 sets of 8 bits, or 32 ASCII characters. For some reason in much of the Garmin UI it’d drop the last character, truncating to Smart Meter Surveillance Networ. Thinking this was the problem, I first dug into the network name but eventually found a shorter SSID didn’t help. Also, this isn’t the first device I’ve had with SSID length problems (see Bypassing Reolink SSID Length Limitation); thankfully in this case it only seemed to be a display issue.
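
The SSID length is easy to confirm. This short Python sketch counts the octets in the network name and shows what dropping the last one looks like; the constant name is my own, and the truncation is just a simulation of what the Garmin UI displayed.

```python
MAX_SSID_OCTETS = 32  # 802.11 limit on SSID length, in octets

ssid = "Smart Meter Surveillance Network"
encoded = ssid.encode("utf-8")  # plain ASCII, so one octet per character

print(len(encoded), len(encoded) <= MAX_SSID_OCTETS)

# Simulate an off-by-one truncation like the one seen in the Garmin UI.
truncated = encoded[:MAX_SSID_OCTETS - 1].decode("utf-8")
print(truncated)
```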

How ASUS and a Microsoft Bug Almost Broke Remote Work

A couple of years after it happened I’m sharing this story about the intersection of an OS bug, a network hardware quirk, and a global pandemic. A chain of semi-esoteric things aligned and only caused noticeable problems in a very specific (dare I snarkily say unprecedented) situation.

I found this issue both fascinating and maddening and I hope you will as well. This does not contain code-level details of the bug (I don’t have them), but I’m sharing it both to document this problem and share a little story of what goes on behind the scenes in supporting a big enterprise IT environment.

In March 2020, when Stay Home, Stay Safe began in Michigan, most of my fellow employees began working from home (WFH). For years we’d been building IT systems to allow most people to work from anywhere, and the shift to WFH was going great. Between VPN connections back to the corporate network, lots of things in the cloud, and Azure Active Directory (AAD) to handle Single Sign-On (SSO) for almost every company application, all most folks needed for remote work was their standard laptop and an internet connection. The computing experience of working from home was effectively the same experience as working in the office.

Sure, we had some bumps with home internet connections not being robust enough, but we helped people through those. Mostly we’d find that what someone thought was a good home internet connection — because their phone or video streaming worked fine — wasn’t great for things like moving big files around or video calls. [1] Generally those having network performance issues had fine service from their ISP, but their home routers were old and not up to task. We would recommend upgrading the router, they’d go buy something new, and all would be good.

In early autumn we began receiving reports of users getting the notorious You Can’t Get There From Here (YCGTFH) message when trying to access anything that used AAD for SSO. This generic message is displayed when AAD authentication fails and access is denied. Because so many things were behind AAD SSO this interrupted a lot of work. These computers either were no longer joined to AAD or had an expired token, and it seemed tied to internet access.

Digging into it, there would be errors shown by dsregcmd /status (the AAD CLI utility) and Test-DeviceRegConnectivity.ps1. This script, which checks internet access as SYSTEM to the AAD public endpoints, would fail on all three connection tests, implying that a lack of connectivity to Microsoft endpoints was keeping AAD registration (and token refresh) from working properly. But the user could still browse the web and hit those URLs, and we hadn’t changed anything internet access-wise since the WFH began.

We found that manually setting a proxy server for the whole of the system (via netsh winhttp set proxy) would allow AAD registration (dsregcmd /join) to succeed and SSO would then work. We also found that restarting the WinHTTP Web Proxy Auto-Discovery Service (WinHttpAutoProxySvc), which handles WPAD, would sometimes fix the problem, but only sometimes.

Even more confusingly sometimes a reboot would fix it. Or, sometimes if a user drove into the office and used the network there, it would work. But not always. [2]

Simply, we had some computers whose SYSTEM account couldn’t access the internet for so long that an AAD token had expired. This broke SSO, and users were being told You Can’t Get There From Here.

Typical for a lot of large organizations we have authenticating proxy servers sitting between the client network and the public internet. All requests bound for the internet need to go through them, and these proxies are located by a Proxy Auto-Config (PAC) file that is found either by a direct setting (AutoConfigUrl) or Web Proxy Auto-Discovery (WPAD), via DNS, both of which send the same file. We directly set the PAC file URL on a per-user basis and leave WPAD at its default of enabled. Thus for the end user and things running under their account, WPAD is used, falling back to the PAC file setting if that fails. For the SYSTEM account a direct PAC file setting is not used, relying solely on WPAD to find the path to the internet. [3]

Looking at a network capture when AAD registration would fail, instead of the normal chain of requests we’d see no DNS requests for WPAD and no PAC file download. Instead we saw the client resolving the AAD endpoints via DNS and then attempting to reach them directly, which would be blocked by the company firewall. The proxy was not being used; WPAD wasn’t working. This was weird because every piece worked when tested independently (DNS resolution for WPAD hostnames, invoke-webrequest http://x.x.x.x/wpad.dat, specifying the WPAD PAC file in AutoConfigUrl), but as a whole it just didn’t work.

This went on for quite a while, supported by a Premier case with Microsoft. We could see that WPAD was frequently failing, but we struggled to get a consistent reproduction and kept going down dead ends. We bandaged the problem with manual, direct proxy settings and AAD registration. This was mostly fine short-term, but caused overhead for our support folks and was a ticking bomb.

Then one day, thanks to a fortuitous conversation with a very smart lead Microsoft engineer while working another issue, I found out about a bug in Microsoft’s WPAD implementation that had just been discovered and was being fixed in the next round of patches. The description exactly explained our problem and I was elated.

It turned out that if Windows 10 received a blank DHCP option 252, the WinHttpAutoProxySvc service would not query DNS for WPAD, and — the broken part — it would never do so again until the service was restarted. Directly configured PAC files would be used, but WPAD was broken. Here in our environment the SYSTEM account would not have internet access and this meant AAD registration, Test-DeviceRegConnectivity.ps1, the Microsoft Store, and all such internet-needing SYSTEM-level things didn’t work.

Apparently some home router vendors — most notably the hugely-popular ASUS [4] — would send option 252 but leave it blank because they found doing so reduced name resolution requests from clients. This is seen in Wireshark as:
Option: (252) Private/Proxy autodiscovery
    Length: 1
    Private/Proxy autodiscovery: \n

Windows, which has WPAD enabled by default, will try a number of name resolution queries (DNS for wpad/ wpad.local.tld.com/wpad.local.com, NetBIOS, LLMNR, etc) to locate a PAC file server if it does not see a DHCP option 252. Because WinHttpAutoProxySvc looks to DHCP then only tries DNS if that fails, by setting this option but leaving it blank the name resolution steps would not occur. I can only guess as to why these vendors find it desirable, but perhaps they like reducing the load on the built-in DNS forwarders, or they saw it as a security benefit or… who knows. Either way, the result of this blank option and the Windows bug was that WPAD — via DHCP and DNS — didn’t work.
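
The discovery order described here, and the behavior one would want from a fixed client, can be sketched in a few lines of Python. To be clear, this is not Microsoft's code and I don't have code-level details of the bug; the function names and URLs are hypothetical. It only shows the idea of treating a blank option 252 as "not provided" and carrying on to the DNS step.

```python
from typing import Callable, Optional


def usable_pac_url(option_252: Optional[bytes]) -> Optional[str]:
    """Return the PAC URL from DHCP option 252, or None if it is absent or blank."""
    if option_252 is None:
        return None
    url = option_252.decode("ascii", errors="ignore").strip("\x00 \t\r\n")
    return url if url.lower().startswith(("http://", "https://")) else None


def discover_pac(option_252: Optional[bytes],
                 dns_lookup: Callable[[], Optional[str]]) -> Optional[str]:
    """Try DHCP first, then fall back to DNS when DHCP gives nothing usable."""
    return usable_pac_url(option_252) or dns_lookup()


def fake_dns() -> Optional[str]:
    """Stand-in for the DNS-based WPAD lookup (hypothetical result)."""
    return "http://wpad.example.com/wpad.dat"


# A blank option (a lone newline, as in the Wireshark output above) falls through to DNS.
print(discover_pac(b"\n", fake_dns))
```

The buggy behavior amounted to stopping at the blank DHCP answer and never reaching the DNS step again until the service restarted.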

So what is option 252? In WPAD there are multiple discovery mechanisms for finding the PAC file server. Beyond DNS there is also a Dynamic Host Configuration Protocol (DHCP) method where, along with the typical network address settings, the client receives the URL for downloading the PAC file. This is done via option 252, but it isn’t widely supported, it’s normally not used, and we don’t use it either.

While WPAD is a core OS function, it unfortunately never left draft RFC status. Implementations have no formal standard to target; it’s just a guideline. Additionally, because it never left draft status, DHCP option 252 also remains unallocated and without a standard, simply part of the Reserved (Private Use) range. So setting it but leaving it blank is not unacceptable, and operating systems should be able to accommodate it.

In a network capture I was then able to clearly see this happen, and it was simple to replicate on a test network. And then it finally all came together…

COVID-19 WFH resulted in a bunch of people upgrading their home networks, with lots of them buying new home routers, including the very-popular ASUS brand. If someone booted up their Windows 10 computer and a DHCP server sending the blank option 252 was the first thing it saw, WPAD would break, with all the downstream consequences, including our AAD connectivity issues. And if they had AAD connectivity issues for long enough (until a token expired) they would start getting You Can’t Get There From Here messages.

If they went in the office or somewhere else which doesn’t set 252 and fully rebooted (which restarts the WinHttpAutoProxySvc service), WPAD would work and AAD registration would work. But if the service never restarted — if they never actually rebooted — WPAD was stuck not doing anything. [2]

Testing a pre-release version of the patch showed it fixed the problem, and then a few weeks later KB4601382 was released, with the detail “Improves the ability of the WinHTTP Web Proxy Auto-Discovery Service to ignore invalid Web Proxy Auto-Discovery Protocol (WPAD) URLs that the Dynamic Host Configuration Protocol (DHCP) server returns.” We deployed this patch and the reports of AAD registration / You Can’t Get There From Here issues collapsed.

That was it, it was fixed. A popular home router vendor did something weird (but not against standard), the OS implemented something poorly, and people were working for so long in one of those environments that a credential expired, couldn’t be renewed, and they lost access to SSO.


[1] Mobile apps and a lot of modern websites are fairly asynchronous, sending requests in the background while still working nicely, because they are built to be tolerant of the blips that happen while on wireless networks. Video streaming specifically caches (or buffers) the video locally so that hiccups in the network connection don’t make the video pause and stutter. More real-time-ish things like Remote Desktop or video calls or copying files via SMB are considerably more sensitive to poor network connections.

[2] In retrospect, I suspect confusion around what it means to shut down or restart a computer led to many of the conflicting reports about whether reboots, shutdowns, or driving into the office fixed the problem, and to the difficulty in getting a reliable reproduction. Different sleep modes, some of which result in the BIOS displaying the POST when waking from sleep even if the OS doesn’t restart, lead some to believe the operating system was restarted when it may not have been. Other folks believe that closing the lid is “shutting down”.

It was also unfortunately common for users to have a flexible version of “home”. Sometimes home meant where they’d been working for the last six months and had the problem, sometimes it meant a vacation home with a different ISP they’d gone to the day before but failed to mention, sometimes it meant the other side of the world. Teasing this information out was difficult, as many used “home” to mean anywhere but their normally assigned desk. “I’ve only been at home” frequently meant “I continue to be not at the office”.

[3] We use WPAD because Windows 10 requires internet access for a lot of OS level things that run as SYSTEM, including the Microsoft Store and AAD device registration. If proxy servers are used, the SYSTEM account needs to find the proxy servers. Early on in our Windows 10 deployment Microsoft told us either direct internet access or WPAD was required. WPAD via DHCP doesn’t work with most VPN clients, because they configure their virtual adapters directly and not via DHCP. Thus, to have similar connectivity when in the office or remote, DNS for WPAD is the choice.

[4] I’ve been told multiple brands do this, but ASUS is the only vendor where I personally observed it.
