This week

was largely similar to last week. I spent a lot of time trying to figure out ways to use the TESSERA embeddings for purposes like evaluating accessibility, and studying the semantic embeddings. As part of this, I wrote a bunch of new tools and methods that make working in both of these domains a lot easier. At the same time, however, by far the greatest bottleneck to my productivity this week has been computation. Even though I have been trying to offload as much as possible to remote CPU/GPU servers, things still take so much time.

Urban Semantics

I realized that the regressor model approach I have been trying to use isn’t very solid and behaves in strange ways. I went back to using the similarities, and I made a lot of things much easier to run and use. Besides making things a lot tidier, I also retrained a few different embedders, which seem to be delivering better results. One thing that I noticed (and probably should’ve noticed a long time ago) is that the produced results seem to reflect the surrounding spatial “context” much better than the segment itself. I’m not sure if this is a flaw or a feature.

TESSERA walkability

Similarly, I realized that my attempts to generate “labels” usable for training a walkability regressor on TESSERA embeddings are not a great direction (which, I guess, is not that surprising). Instead, I decided to leverage GPS traces. Those (in the context of human walkability) have been in the back of my mind for a while, but I could never quite find a good use for them. One reason is that there don’t seem to be many good publicly accessible sources of them, especially in the locations and regions I have been interested in (and which are covered by the other datasets I use). OSM, for instance, does provide some GPS traces, but the main problem with those is that the mode of transportation is not recorded. Distilling which traces are pedestrian would be, I imagine, pretty difficult and unreliable.

However, since I’m specifically interested in out-of-segment accessibility, I think it is fair to assume that GPS traces (in an urban context) that are far enough from segments have been walked.
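
To make the "far enough from segments" idea concrete, here is a minimal sketch of such a filter. It is only an illustration under assumptions, not the actual pipeline: the function names, the 15 m threshold, and the toy coordinates are all hypothetical, and the distance is computed with a simple local projection that should be adequate at city scale. A real run over many points would likely need a spatial index rather than this brute-force loop.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lat, lon, lat0, lon0):
    """Project (lat, lon) to metres around (lat0, lon0) (equirectangular approximation)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (all in metres)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def off_segment_points(points, segments, min_dist_m=15.0):
    """Keep GPS points farther than min_dist_m (hypothetical threshold) from every segment."""
    if not points:
        return []
    lat0, lon0 = points[0]  # local origin for the projection
    local_segments = [
        (to_local_xy(*start, lat0, lon0), to_local_xy(*end, lat0, lon0))
        for start, end in segments
    ]
    kept = []
    for lat, lon in points:
        p = to_local_xy(lat, lon, lat0, lon0)
        nearest = min(
            (point_segment_distance(p, a, b) for a, b in local_segments),
            default=math.inf,
        )
        if nearest > min_dist_m:
            kept.append((lat, lon))
    return kept


# Toy example: one east-west "road" and three trace points near it.
road = ((52.2050, 0.1200), (52.2050, 0.1300))
trace = [(52.2050, 0.1250), (52.20505, 0.1250), (52.2060, 0.1250)]
far_points = off_segment_points(trace, [road])
print(far_points)  # keeps only the point roughly 111 m north of the road
```

The sketch leaves out the exclusion of points inside building footprints, which would need polygon data and a point-in-polygon test on top of this.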

Figure: ~West London traversability

In this case, of course, we only have positive examples (GPS trace present - coordinates -> TESSERA embeddings), and so I have been trying to run some PU-learning (positive-unlabeled) models, which I believe should be appropriate here. First, I collected GPS traces from Cambridge and filtered them so that they are sufficiently far away from roads and other segments, and so that they are not within the boundaries of buildings (or other irrelevant areas). In Cambridge, there were around 8.5k such points. For use in the PU models, I have then been aiming for between 100k and 200k unlabeled points. Nevertheless, reflecting on the initial results, I realized Cambridge may be too small and homogeneous, and so I gathered data from a much larger area covering much of West London. This data fetching, however, is really slow.
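
For readers unfamiliar with PU learning, here is a minimal sketch of one common recipe, the Elkan and Noto correction. It is not necessarily the model used above. A classifier is trained to separate labeled from unlabeled points, and its scores are then rescaled by an estimate `c` of how often a true positive ends up labeled. Everything in the example is an assumption for illustration: the 2-D Gaussian toy points stand in for TESSERA embeddings, and the tiny pure-Python logistic regression stands in for whatever classifier one would really use.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Estimated P(labeled | x) under the logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(X, s, lr=0.1, epochs=500):
    """Full-batch gradient descent on the logistic loss (labeled vs. unlabeled)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, s):
            err = predict(w, b, x) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)


def sample(center, n):
    """Toy 2-D 'embeddings' drawn around (center, center)."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


true_pos = sample(2.0, 200)    # walkable locations (ground truth, unknown in practice)
true_neg = sample(-2.0, 200)   # non-walkable locations

labeled = true_pos[:60]        # positives with a GPS trace
holdout = true_pos[60:80]      # labeled positives held out to estimate c
unlabeled = true_pos[80:] + true_neg

# Step 1: train "labeled vs. unlabeled".
X = labeled + unlabeled
s = [1] * len(labeled) + [0] * len(unlabeled)
w, b = fit_logistic(X, s)

# Step 2: estimate c = P(labeled | positive) on the held-out labeled positives.
c = sum(predict(w, b, x) for x in holdout) / len(holdout)


def walkable_probability(x):
    """Step 3: corrected estimate of P(positive | x), clipped to 1."""
    return min(1.0, predict(w, b, x) / c)


mean_pos = sum(walkable_probability(x) for x in true_pos[80:]) / len(true_pos[80:])
mean_neg = sum(walkable_probability(x) for x in true_neg) / len(true_neg)
print(f"c={c:.2f}  mean score, hidden positives={mean_pos:.2f}  negatives={mean_neg:.2f}")
```

The correction relies on the labeled positives being a random sample of all positives. That assumption is worth questioning for GPS traces, since people record walks in some kinds of places far more often than in others.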