NYC Streets on Paper

As is usually the case with development projects, pen must be put to paper first followed by a series of reviews and sign-offs before a shovel is put to the ground. That is also the case with street construction. What is unique, however, is that a street must be added to a map before it is constructed.

In New York City, a newly proposed street must be added to the official ‘City Map’ (not to be confused with NYCityMap) through the Uniform Land Use Review Procedure (ULURP) before it can be constructed. Thus a street will exist on paper before it becomes a reality. These streets have come to be known as paper streets. Paper streets are not unique to NYC, but ULURP is.

Paper streets may exist on paper only for many years before they are ever constructed. The street’s configuration or name may change before construction takes place. There are even situations in which a street could be halted (de-mapped) – see definition below – before it ever becomes a reality.

The dashed lines on the map below represent paper streets in the Midland Beach section of Staten Island. It is clear from the map that these infill streets are intended to complete the planned street grid when fully built out. However, there could be circumstances (e.g., being in a flood zone) that prevent the streets from being constructed.

Paper Streets: Midland Beach, Staten Island

Although originally on paper only, paper streets can be found in NYC digital data. The NYC Street Centerline (CSCL) data set on the NYC Open Data Portal and City Planning’s LION data set include paper streets. For those wondering, LION is an extract of CSCL that includes both single-line (generic) and dual-line (roadbed) representations of the street network plus additional geographies. Additionally, LION has more fine-grained segmentation (breaks occur wherever geographies cross or there are unique address range breaks). CSCL, by contrast, focuses specifically on the actual street (roadbed) representation, with segmentation by block. More on these data sets in a later dedicated post.

Paper streets can be found as follows:
LION – featuretyp values of 5 and 9;
CSCL – STATUS values 3 and 9.
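
As a rough illustration, the attribute values above could be applied to segment records that have already been loaded from either data set. This is a minimal sketch: the record layout (dictionaries keyed by column name) and the sample rows are assumptions made for the example, and because the codes may be stored as text or numbers depending on the file format, values are normalized to strings before comparison.

```python
# Attribute values that mark paper streets, as listed above.
PAPER_STREET_CODES = {
    "LION": ("featuretyp", {"5", "9"}),
    "CSCL": ("STATUS", {"3", "9"}),
}


def is_paper_street(record, dataset):
    """Return True if a segment record is flagged as a paper street."""
    column, codes = PAPER_STREET_CODES[dataset]
    value = record.get(column)
    # Normalize to a trimmed string so 5, "5" and " 5" compare equal.
    return value is not None and str(value).strip() in codes


# Hypothetical sample rows, for illustration only.
lion_rows = [
    {"street": "EXAMPLE AVENUE", "featuretyp": "0"},
    {"street": "EXAMPLE PLACE", "featuretyp": "5"},
    {"street": "EXAMPLE COURT", "featuretyp": 9},
]
paper = [row["street"] for row in lion_rows if is_paper_street(row, "LION")]
print(paper)
```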

The inverse of a paper street is a de-mapped street. As the name would imply, this is a case where a street was officially removed from the City Map. And as with paper streets, the street will appear on paper (City Map) as being de-mapped before it is actually removed.

De-mapped Street

De-mapped Street: Melrose Crescent, Bronx

De-mapped streets can be found in LION where status equals 5.

State of the Map US 2015 and the role of Governments in OpenStreetMap

A little over two weeks had passed since the closing of the State of the Map US (SOTMUS) conference in NYC. For those not familiar, SOTMUS is the yearly conference for the US chapter of OpenStreetMap (OSM). This period offered some much needed time to reflect on the conference as a whole: setting, presentations and sessions, exhibitors, organization and execution. On all points, I felt SOTMUS hit the mark and was a resounding success.

United Nations

Yes, I’m sure there were some minor shortcomings, as evidenced by some of the tweets I saw (#SOTMUS). Nonetheless, for a conference organized and executed by volunteers, and judging by the comments I heard, it was clearly a success. I have a newfound appreciation for the hard work that goes into organizing such a large event after having assisted the organizers in securing the Surrogate Court space for the opening night (NYC DoITT sponsored it). Kudos to the organizers! But alas I digress.

This post is not intended to be a review of the event. Many others, I’m sure, have already covered that and are better positioned to do so. My objective was to dive further into the role local government could play in OpenStreetMap. This post can be seen as an extension of the panel I was on at SOTMUS, which, as a demonstration of interest in the topic, was the second of two panels on OSM and government. I’m sure that many would even question whether government has a role at all. To that I would say that duplicating or recreating what has already been mapped, and is increasingly available on open data sites, is time consuming and wasteful. On the international landscape that is often not the case but here in the US it is.

Consider the NYC building footprint (with height) and address import. To manually digitize approximately one million buildings would have been a labor-intensive and lengthy process. On-screen digitizing over aerial photography of a lower resolution than NYC possesses would have also resulted in lower quality and less consistent data. Contrast that with a careful import utilizing high quality preexisting *authoritative* data that resulted in nearly complete and consistent coverage of NYC; that approach is, in my opinion, hard to argue against. A bulk import then frees up the community to focus on keeping OSM current and filling in the gaps where needed. Certainly a less daunting task than starting – with respect to buildings – from a nearly blank canvas.

The NYC buildings and address import was largely undertaken by Mapbox. NYC DoITT assisted with planning and answering questions (NYC addressing is a challenge) throughout the effort and of course providing the data. Part of the effort included a change notification email that gets sent out each night. The email shows the changesets from the previous day. Since a changeset can be comprised of multiple edits, wading through numerous unrelated edits (primary focus is on buildings and addresses) can be time consuming; however the change notification has proven useful and has resulted in hundreds of edits to NYC data.

Each changeset comes with a map (see example below) to guide the reviewer to the specific location of the edit. NYC DoITT staff review the changeset and apply any valid changes to the internal repository. Due to schema differences and ODbL license restrictions, the OSM data is not imported into the internal repository. The changesets are used as a guide.

OSM Change Set

Tools such as MapRoulette can also be used to bring in changes made to *authoritative* data sets. This is the method being used by the local NYC OSM community to incorporate missing bike lane data into OSM (see Eric Brelsford’s lightning talk here).

I think it is undeniable that *authoritative* government data can further enrich OSM to the benefit of many. You may then be asking yourself, what is the benefit to a local government? To me there are both direct and indirect benefits.

From a strategic perspective, it is important to have options when making decisions. In the case of data, not all local governments can afford or have the technical capabilities to manage their own geospatial data. And even when they do, there are cases where governments use external data sources for routing and logistics. To have only a couple of proprietary commercial options limits choice and drives up cost. Having a robust and complete open data set provides governments with alternatives. And the benefit is twofold: direct cost savings and indirect alternatives.

OSM can also benefit a much wider audience. Open data is great. And the movement towards more open data is fantastic. What is often not discussed is the barrier to entering the open data space. This is not specific to geospatial data; a person needs a variety of skills and software (there are open source options in geo such as QGIS) to work with and analyze the data. This is not an intended barrier but a result of the complexity within the current geospatial technology space. This greatly reduces the number of people downloading and working with open data. Conversely with OSM, there is a platform and an ecosystem of tools already in place. There are tools for viewing, editing, analyzing, rendering and even downloading OSM data. This allows people to focus on what they want to do with the data (e.g., make or view a map) and less on the intricacies of setting up the data to work with it. And there is an amazing set of tools from independent open source developers and commercial entities alike, from the elegant and simple iD editor to the Tangram map renderer. OSM can open up a wealth of possibilities and can be a viable alternative.

Geoclient: An Open NYC Geocoding API

Overview
Geoclient is a geocoding API that recognizes and geocodes addresses, intersections and blockfaces (on street and two cross streets) located in New York City. The Geoclient service provides a RESTful web service interface to the Department of City Planning’s (DCP) Geosupport system. Geoclient provides “pass-through” style access to native Geosupport functions. It does not change or modify Geosupport functionality in any way. However, Geoclient does add some very useful features (some of which are considered standard by today’s developers). While Geoclient is intended for programmatic use, DCP hosts GOAT, a website that provides direct access to Geosupport native functions. Geoclient developers often use GOAT to compare results or test data interactively.

Access
There are two primary Geoclient installations. One is for use internally by official City agencies and the second is intended for the public developer community. However, the data and logic behind all instances of Geoclient are exactly the same. By sharing the same code base and deployment scripts, it is easier to maintain and support. For the public, access to Geoclient is through the NYC Developer Portal. For City agencies, contact us directly.

Functionality
Geoclient provides two notable enhancements which can be used to simplify and optimize access to Geosupport. The first is single-field search functionality which parses a single input string into the discrete location elements that are required by the different Geosupport functions. For example, an input of “59 Maiden Lane, Manhattan” is parsed into its house number (59), street name (Maiden Lane) and Borough (Manhattan) for submission to Geosupport. Note that parsing is not case-sensitive, punctuation is ignored and most standard street pre- and post-modifiers, types and directionals are recognized.

A fully qualified (i.e., expanded) address is often referred to as a ‘normalized’ address. An example is ‘314 w 100 st mn’ which is the equivalent of the normalized ‘314 West 100 Street, Manhattan’. Normalization is native to Geosupport. If your data is already parsed into discrete address elements, using the /geoclient/v1/address endpoint directly is less ambiguous and slightly more efficient.

Building on its ability to parse single-field search locations, Geoclient can also be used to “guess” the intended target of an incomplete or ambiguous input. One example is the submission of an address without a borough. As long as Geoclient can recognize the search text as an address (house number and street), the /geoclient/v1/search endpoint will try that address in all five boroughs. By default, if the address exists in one or more boroughs, all locations will be fully geocoded and returned as possible matches. This behavior can be customized, for example, by calling the service with the optional ‘exactMatchForSingleSuccess’ parameter set to true. In that case, if the address exists in only one borough it will be geocoded and returned as an exact match.
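
To make the two endpoints mentioned above a little more concrete, here is a minimal sketch that only builds request URLs and does not call the service. The base URL is a placeholder, and the query parameter names `input`, `houseNumber`, `street` and `borough` are assumptions for illustration; only the endpoint paths and `exactMatchForSingleSuccess` come from the text above. Authentication parameters are omitted, so check the Geoclient API documentation for the exact request format.

```python
from urllib.parse import urlencode

# Placeholder host: substitute the one provided through the NYC Developer Portal.
BASE_URL = "https://api.example.invalid"


def search_url(text, exact_match_for_single_success=False):
    """Build a single-field search request URL (parameter name 'input' is assumed)."""
    params = {"input": text}
    if exact_match_for_single_success:
        params["exactMatchForSingleSuccess"] = "true"
    return f"{BASE_URL}/geoclient/v1/search?{urlencode(params)}"


def address_url(house_number, street, borough):
    """Build an address request URL from already-parsed elements (names assumed)."""
    params = {"houseNumber": house_number, "street": street, "borough": borough}
    return f"{BASE_URL}/geoclient/v1/address?{urlencode(params)}"


print(search_url("59 maiden ln", exact_match_for_single_success=True))
print(address_url("314", "w 100 st", "mn"))
```

Keeping the URL construction separate from the HTTP call makes it easy to log or test the exact request before sending it.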

Using the previous example, entering “59 maiden ln” without the borough will result in the same response as entering “59 Maiden Lane, Manhattan”. The figure below shows the response for “59 Maiden Ln” using the Pre-K Finder application.

geoclient_imputed

In cases where the same address exists in multiple boroughs, Geoclient will return a candidate list of possible addresses. In each case, the candidate address is validated to ensure a successful response. See example below.

geoclient_candidate

Developers can customize this and several other search features as documented by the Geoclient API.

Hopefully this will clear up some of the misconceptions associated with Geoclient and contribute to the knowledge base. And please check back in the coming weeks for a more detailed Geoclient post. In addition, expect some enhancements over the coming months. Nice work Matt!

NYC Addressing: A Primer

There is no mystery or intrigue in defining the primary function of an address: to locate or identify a property. And although we often take addressing (here defined as the process of assigning and using addresses) for granted, addressing provides an essential function to all. This is evident in daily life where addresses are used by individuals, corporations and governments as they interact and conduct business. Common examples across this spectrum are the delivery of mail and packages, police and fire departments responding to 911 emergency service calls, and generally navigating the areas we inhabit or visit. Addressing is the fuel that makes our cities, towns and villages run clean.

The mystery and intrigue come from improper or confusing addresses that can cause problems or delays with the delivery of services and response to emergency incidents at an address. Numerous stories exist of problems encountered by first responders trying to reach problematic addresses. Standardized and predictable addressing makes locating an address quicker and easier for all parties. When and where possible it is best to assign addresses:

  • in logical numeric sequence;
  • consistently across a single block (all with or without hyphens);
  • with odd and even house numbers on separate street sides;
  • to the street a property fronts;
  • that are not duplicates of existing addresses.

There is no single authority overseeing addresses within NYC. Addresses are assigned in New York City by the Topographical Units of the respective Borough President’s (BP) Offices. That is, the Queens BP assigns addresses only within Queens and so forth. NYC DoITT provides a secure web-based application for BPs to make address and street name assignments. The application ensures centralized storage of address assignments, notification to responsible parties (911) and consistency of address assignments across boroughs.

Addresses are assigned to buildings for the following general cases:

  • new construction;
  • additional entrance to an existing business;
  • change of address for an existing building;
  • storefront business.

Unique Cases

It sometimes seems as if NYC has each and every possible address anomaly although that is most certainly not the case. Below are just a few types of the address anomalies in NYC.

Vanity Address: an address for a building that uses a street or place name on which the building does not front. The figure below provides an excellent example as well as the challenges vanity addresses pose. Imagine trying to find 16 Penn Plaza while standing in front of 2 Penn Plaza.

Penn Plaza Area

Hyphenated Address: often referred to as Queens-style addresses, a hyphenated address has a hyphen in the house number (e.g., 70-111). The left side of the hyphen represents the nearest cross street exclusive of avenues and the right side of the hyphen represents the house number.
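
The structure of a hyphenated house number can be sketched in a few lines. This is only an illustration of the left/right split described above, not how Geosupport handles these addresses. The function name is hypothetical, and both parts are kept as strings so that any leading zeros are preserved.

```python
import re

# One or more digits, a hyphen, then one or more digits (e.g. 70-111).
HYPHENATED = re.compile(r"^(\d+)-(\d+)$")


def split_hyphenated(house_number):
    """Return (cross_street_part, house_part), or None if not hyphenated."""
    match = HYPHENATED.match(house_number.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_hyphenated("70-111"))
print(split_hyphenated("59"))
```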

Edgewater Park: a gated community in the Bronx, Edgewater Park is divided into alphabetic sectors (A, B, C…) which are used in lieu of a street name for addressing. Geosupport uses Edgewater Park as the street name to avoid confusion to the extent possible. An example address is 111C Edgewater Park. See figure below.

Edgewater Park

Miscellaneous

House number containing fraction and letter: 138 1/2 B Edgewater Park, Bronx.

Odd and even house numbers on the same side of street: Park Row, Manhattan (see figure below)

Hyphenated and non-hyphenated address on the same block: Ann Street, Manhattan (see figure below)

Park Row Addresses

Address Data

There are two primary methods for modeling and managing addresses in a geospatial database. The first is by street, which is commonly referred to as a street centerline. This method models the high and low house numbers on a street segment (i.e., block) for each side of the street. Geocoders then interpolate an input address proportionately between the high and low house number range on the respective side of the street. Geocoded addresses using this method are approximations of actual addresses and include hypothetical non-existent addresses.
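
A simplified sketch of that interpolation follows. It assumes a straight segment with planar coordinates in arbitrary units and ignores any offset from the centerline toward the correct side of the street; the function names and sample numbers are made up for the example. Note that any number within the range gets a location, whether or not a building with that address exists, which is the approximation described above.

```python
def fraction_along_segment(house_number, low, high):
    """Return how far along the segment (0.0 to 1.0) a house number falls."""
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    if low == high:
        return 0.5  # single-address range: place it mid-block
    return (house_number - low) / (high - low)


def interpolate_point(start, end, fraction):
    """Linearly interpolate a point between the segment's two end points."""
    (x0, y0), (x1, y1) = start, end
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))


# Hypothetical block: even side runs from 100 to 200 along a 200-unit segment.
fraction = fraction_along_segment(150, low=100, high=200)
print(interpolate_point((0.0, 0.0), (200.0, 0.0), fraction))
```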

The second, and more recent, approach is to represent each individual address as a point; these are referred to as address points. With this method, each and every address is modeled, generally within the building in which the address falls. Both methods are used by NYC and both data sets are available to the public.

Address points is a geospatial dataset that models the approximate entrance of a building and includes the property’s signed address (house number, street name). Address points were developed by NYC DoITT and completed in 2012. The data were subsequently released to the public in 2013. Since that time the data have been released on a quarterly basis.

Data sources

CSCL, Citywide Street Centerline, models only physical streets and does not have duplicate segments for cases where there are alternate street names.

LION – an extract from CSCL that includes both roadbed (modeling of dual carriage ways) and generic (modeling a single line to represent dual carriage ways). LION provides both to support legacy use of the data. In addition, LION has duplicate segments for each alternate name of a street segment.

Address points – a point representing all known addresses.

Other Resources

Manhattan BP – http://manhattanbp.nyc.gov/downloads/pdf/address-assignments-v-web.pdf

You Can Only Get Here From There

There seems to be no shortage of geographic anomalies in New York City (NYC). In this case I use geography in a non-academic sense, as a proxy for all that defines location: street and place names, addressing, jurisdictional divisions, etc. Having lived and worked here (NYC) for more years than I care to admit, nothing seems to shock me any more. And in hindsight, for a city as old (by U.S. standards) as NYC, the depth and breadth of geographic anomalies really should not come as a surprise. When you consider that NYC was originally separate cities, has developed over the course of separate administrations, and generations of city workers with their own ideas of standards and best practices, it sometimes seems shocking that there is as much geographic order as there is. But I digress.

The latest addition, in what I have termed geographic anomalies, was brought to my attention by Rudy Lopez of the Department of City Planning. There is a development (i.e., network of streets) largely in Westchester County (our neighbors to the north) that extends into The Bronx and thus is part of NYC. Nothing out of the ordinary so far, but adding a bit of spatial intrigue, surrounding this development on all sides within NYC limits is Pelham Bay Park, the Hutchinson Parkway and the New England Thruway. This has the effect of isolating the streets within this development (Park Drive bounded by Split Rock Road to the west and Edgemere Street to the east) from any other local road in NYC and thus creating the situation that you cannot enter this development in NYC any other way than through Westchester.

Location from NYCityMap. Excuse the missing Westchester geography.

And there you have it, yet another geographic anomaly in NYC. Where, in this case, you can only get here from there.

NYC Building Footprints Part II

This post is a follow-up to the previous building footprint post. It expands upon some topics, and covers some new areas. And as with most everything, a bit of background is necessary to understand where we have come from and, in some cases, why things are the way they are. Progress is made incrementally. The current state of NYC geospatial data has improved immensely but certainly further improvements are warranted.

Change
Although NYC is largely a ‘built’ city, construction activity is continually taking place. As such, building footprint edits are made to account for these changes in the non-digital world and differences will be seen from extract to extract. Additionally, as errors and omissions are encountered in the data, corrections are made. The building footprints are a dynamic data set, extracted quarterly, and we hope to move to a continuous update stream in the near future. Nonetheless, change will still need to be handled. More on that to come in the next year.

In the case of demolished buildings, these building geometries are archived and provided as a separate historical buildings file on the NYC Open Data portal.

BIN
The Building Identification Number (BIN) provides a unique identifier for the buildings to which they are assigned. Not every building within the building footprints database has been assigned a BIN. For those buildings not yet assigned a BIN, or where a BIN has yet to be inserted into the building footprints, a placeholder is inserted. These placeholders have been referred to as ‘million’ BINs. They are identified by a borough code plus six zeros.

The borough codes are as follows:
Manhattan = 1
Brooklyn = 2
Bronx = 3
Queens = 4
Staten Island = 5
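
Spotting these placeholders in an extract is straightforward. The sketch below assumes a BIN is a seven-digit value whose first digit is the borough code, which is consistent with the placeholder format described above; the function names are hypothetical.

```python
BOROUGH_CODES = {
    1: "Manhattan",
    2: "Brooklyn",
    3: "Bronx",
    4: "Queens",
    5: "Staten Island",
}


def is_million_bin(bin_value):
    """Return True for a placeholder BIN: a borough code followed by six zeros."""
    text = str(bin_value).strip()
    return len(text) == 7 and text[0] in "12345" and text[1:] == "000000"


def bin_borough(bin_value):
    """Return the borough name implied by the first digit of a BIN."""
    return BOROUGH_CODES[int(str(bin_value).strip()[0])]


print(is_million_bin(4000000), bin_borough(4000000))
```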

BINs are assigned by the Department of City Planning (DCP). BINs originated from the Property Address Directory (PAD), one of the data sources of Geosupport. PAD predated the building footprints; therefore PAD relied on other sources to define buildings. With the advent of the building footprints, many more buildings needed to be assigned a BIN. This work is ongoing. As DCP assigns BINs, they are provided to DoITT and inserted into the corresponding building footprints. At present there are only 27,792 ‘million’ BINs remaining in the December 2014 building footprint extract. That represents about 2.6% of the 1,082,483 building footprints. The majority of these are detached garages or minor buildings on lots. This number will continue to decrease until we reach complete coverage.

BBL
For all tax lots, except condominiums (condos), there is a single representative BBL across all City agencies. Condos are the exception due to the fact that each individual unit (i.e., apartment) within a condo building has its own BBL. Therefore, condos have multiple BBLs per tax lot. It is my understanding that the Billing BBL was created by The Department of Finance (DOF) as a way of representing a condo’s management entity for the purpose of correspondence and record keeping. Billing BBLs always have 75 as the first two digits in the lot portion of the BBL (e.g., 7501). Unfortunately there does not seem to be agreement across all City agencies, or even within an agency, on a unique BBL for condo lots.

DOF uses the Billing BBL for RPAD and the Base BBL (also referred to as the FKA [Formerly Known As]) for the Digital Tax Map and ACRIS. DCP uses the billing BBL for MapPluto.

The building footprints use the Billing BBL. The building footprints carry the BBL as a way of associating buildings with tax lots. Since the BBLs are managed outside of the building footprints, the BBLs are synchronized periodically. Due to the different update frequency of MapPluto and the building footprints, inconsistencies can be present. In the December 2014 extract there were 5,199 BBL mismatches representing roughly 0.5% of the total.

There are also cases where buildings do not fall within an official tax lot. For these, DCP assigns a ‘dummy’ lot number of 9999. An example is the Subway station at 96th and Broadway (BIN 1089286, BBL 10124399990). These ‘dummy’ lots are in PAD but do not exist in MapPluto.
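
The BBL conventions discussed in this section can be sketched as follows. The sketch assumes the common ten-digit layout (one borough digit, five block digits, four lot digits) and treats the 75 prefix as applying to the four-digit lot number, which matches the 7501 example; the helper names and the billing and dummy sample values are hypothetical.

```python
from typing import NamedTuple


class BBL(NamedTuple):
    borough: int
    block: int
    lot: int


def parse_bbl(value):
    """Split a ten-digit BBL into borough, block and lot (assumed 1-5-4 layout)."""
    text = str(value).strip()
    if len(text) != 10 or not text.isdigit():
        raise ValueError("expected a ten-digit BBL")
    return BBL(int(text[0]), int(text[1:6]), int(text[6:]))


def is_billing_bbl(bbl):
    """Return True if the lot number starts with 75 (7500 through 7599)."""
    return 7500 <= bbl.lot <= 7599


def is_dummy_lot(bbl):
    """Return True for the 'dummy' lot number 9999."""
    return bbl.lot == 9999


print(parse_bbl("4163400050"))
```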

A reminder to always read the metadata. To borrow from the Ancient Greek aphorism “know thyself”, know thy data. In addition to improving the data, we look to continually improve the metadata.

Finally, to the data editors that work in relative obscurity at DoITT, DCP and DOF I say thank you for a job well done. To all, I wish you Happy Holidays. Till next year…

NYC Building Footprints

I have received quite a number of emails, and have even seen applications, that confuse MapPluto with a building data set. To clarify what the building footprints represent, as well as to remove any confusion between the two very different data sets, I decided to write this post.

MapPluto
MapPluto is a compilation of City agency data at the tax lot (aka parcel) level produced and distributed by the Department of City Planning. A tax lot defines the basic unit of land ownership. Much has been written about MapPluto, so I do not intend to cover this data set in detail. However, it is important to understand that a tax lot can encompass multiple buildings.

A NYC tax lot uses Borough Block and Lot (BBL) as a unique parcel identifier. DCP compiles a variety of City data sets at the parcel level into MapPluto. One of the main data sources is the Department of Finance’s (DOF) Real Property and Assessment Data (RPAD). One of the attributes in RPAD is Number of floors, which is included in MapPluto as NumFloors. DCP defines this column in the metadata as being for “…the primary building on the tax lot, the number of full and partial stories starting from the ground floor.” This is due to the fact, as previously stated, that there can be multiple buildings on a tax lot. Since only one value is possible, DCP elected to go with the number of floors of the ‘primary’ building.

An example of a tax lot with multiple buildings is the community of Breezy Point, Queens. Originally a gated community of summer bungalows that are now permanent homes, Breezy Point spans 12 tax lots and encompasses 3,017 buildings. One of the parcels (BBL 4163400050) includes 424 buildings and has a value of 1 for the number of floors. Although the houses in Breezy Point are of similar housing stock, the number of floors is for the ‘primary’ building and thus not an exact figure. Another common example is NYC Housing Authority (NYCHA) developments. Although buildings within a development often have the same number of floors, this is not always the case.

In general, it is the responsibility of the person working with the data to read the metadata to get an understanding of the data and its limitations and constraints. There are cases where values are estimated, imputed or no longer actively maintained. In the case of MapPluto, building height applies to only one of potentially many buildings on a tax lot. This is not an error but a limitation of the data.

Building Footprints
Building footprints represent the ground-level perimeter outline of a building (i.e., footprint) greater than or equal to 400 square feet and greater than or equal to 12 feet in height unless they were previously captured and have a Building Identification Number (BIN). The purpose of the size and height constraint is to prevent the capture of non-buildings (e.g., containers, tents), which we have seen in the past. The specifications for which built features are captured and which are not can be found in the metadata.
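
Read as a rule, the capture criteria above amount to a simple predicate. The sketch below is only one interpretation of that sentence, with hypothetical names; the metadata remains the authoritative specification.

```python
MIN_AREA_SQFT = 400
MIN_HEIGHT_FT = 12


def meets_capture_rule(area_sqft, height_ft, has_bin=False):
    """Return True if a structure would qualify as a building footprint."""
    if has_bin:
        # Previously captured buildings with a BIN are kept regardless of size.
        return True
    return area_sqft >= MIN_AREA_SQFT and height_ft >= MIN_HEIGHT_FT


print(meets_capture_rule(400, 12))
print(meets_capture_rule(350, 20))
print(meets_capture_rule(350, 8, has_bin=True))
```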

The building footprints include ground and roof height elevations. These values are in feet and are derived photogrammetrically using stereo imagery, LiDAR and a TIN model.

There are cases where there will be no value in these columns. The reason for this is how the building footprints are maintained. To understand this, we need to revisit the past.

The building footprints were first captured as part of the first NYC Planimetrics in 1997 based on 1996 imagery. The NYC Planimetrics came to be called NYCMap. An excellent article on this effort can be found in the New Yorker (unfortunately a subscription is required to access the full article). In the beginning there was no plan for the periodic update of the planimetrics. Since 2006 the planimetrics have been updated on a four-year cycle.

Utilizing the Department of Buildings (DOB) permit data (new construction, major alterations and demolitions), it was determined that the building footprints could be updated on a more frequent basis. Since 2004 the buildings have been updated regularly and since the NYC Open Data Portal launched have been updated on a quarterly basis. These updates are done on-screen using heads-up digitizing. Since these updates are not done photogrammetrically, elevation values are not available and thus not in the database. With each planimetric update, buildings that are digitized on-screen are replaced with photogrammetrically-captured buildings and elevation values are assigned.

Lastly, the Building Identification Numbers (BIN) assigned by DCP are inserted into the corresponding building footprint. The BIN is the unique identifier used by City agencies to identify buildings. Many agencies utilize the BIN to associate additional data with a building. The BIN is returned by the Geoclient API geocoding service provided by DoITT.