Revenge of the Network: queries, schemas and standards for graph data

When:
Venue: Online

No booking required

Abstract:

Every five years or so eminent academics get together to discuss directions in database research. The 2018 Seattle Report is the latest effort. The word "graph" does not appear in it once. 

Yet ~50 papers and multiple sessions at VLDB 2020 focussed on graphs. And in February 2021 a small graph database/analytics company called TigerGraph received third-round venture funding of $105m: "only" a tenth of the $1bn received the same month by data lake behemoth Databricks.

In late 2020 DeepMind announced "AlphaFold: a solution to a 50-year-old grand challenge in biology", saying: "A folded protein can be thought of as a “spatial graph” ... [our] neural network system ... interpret[s] the structure of this graph, while reasoning over the implicit graph that it’s building". In late 2018, in a seminal paper (1000+ citations), scientists at the same company advocated graph networks as the central paradigm of deep learning, using the property graph data model.

In September 2019 the standards group (WG3) that develops SQL got backing from ISO/IEC members to start a new property graph query language project: GQL. It is forty years since WG3 worked on anything other than SQL. 

Birkbeck is represented on the board of the Linked Data Benchmark Council (LDBC) by Dr Jan Hidders of our CS department, who is leading industry-academic collaboration in an LDBC working group on typing and schema for GQL. Other research groups are building denotational semantics for GQL as the standard evolves.

In my talk I'll describe highlights in the current wave of work on standards for property graph data query and schema, reflecting the first database model to seriously contend with relational in 50 years.

Presentation Slides

Bio:

I'm working on a very late-in-life PhD under the supervision of Alex Poulovassilis and Peter Wood, on updatable property graph views. It's so late in life that it can have no effect on my career. I'm also the part-time Vice-chair of the Linked Data Benchmark Council.

From 2016 to 2020 I worked as a Product Manager at graph database vendor Neo4j, initially focussing on distributed consensus product features using a form of Raft, and then leading the Query Languages Standards and Research Team, which is a driving force in SQL/PGQ and GQL standards work. I penned the GQL Manifesto, and proposed the GQL project in the formal standards process. I spent a lot of time on the issue of property graph schema and mappings from SQL to graph data. I co-designed the first stab at property graph schema, implemented in Cypher for Apache Spark.

I started in IT in 1978 as a mainframe programmer for the NHS; I've worked for end-users (like Land Rover and Barclays [where I led an innovative internal product project for enterprise data distribution with a data-driven WAN-wide query processor using a document extension of SQL, DSQL]), and for product vendors (the best-loved and most influential of which was Digital Equipment Corporation). I used to think a lot about distributed transactions, but for applications, not just databases, as CTO of a failed startup called Choreology Ltd. I was instrumental in commercializing the research at Newcastle University that brought the Arjuna Transaction Service into JBoss. 

Contact name: