Skip to content

Instantly share code, notes, and snippets.

@mootari
Last active April 29, 2026 09:49
Show Gist options
  • Select an option

  • Save mootari/8ad87b5217897eb7859603396bd347bc to your computer and use it in GitHub Desktop.

Select an option

Save mootari/8ad87b5217897eb7859603396bd347bc to your computer and use it in GitHub Desktop.
Annotated Chrome History SQLite schema. Inferred by Sonnet 4.6 (GitHub Copilot) from the Chromium sources, reviewed and refined in multiple turns. Might still contain errors.

Chrome History Database — Annotated Schema

Chrome's History database is an SQLite file located at <Profile>/History. It is managed by HistoryDatabase, which inherits from several sub-database classes, each owning a group of tables.

Core source files

Class File Tables owned
URLDatabase url_database.cc urls, keyword_search_terms
VisitDatabase visit_database.cc visits, visit_source
VisitAnnotationsDatabase visit_annotations_database.cc content_annotations, context_annotations, clusters, clusters_and_visits, cluster_keywords, cluster_visit_duplicates
VisitedLinkDatabase visited_link_database.cc visited_links
VisitSegmentDatabase visitsegment_database.cc segments, segment_usage
DownloadDatabase download_database.cc downloads, downloads_url_chains, downloads_slices
HistorySyncMetadataDatabase history_sync_metadata_database.cc history_sync_metadata

Table: urls

CREATE TABLE urls (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  -- Unique by application convention (not a UNIQUE constraint). The index
  -- below supports fast lookup; GetRowForURL() enforces no-duplicate inserts.
  url              LONGVARCHAR,
  -- Page title at last visit.
  title            LONGVARCHAR,
  -- Total number of times this URL has been visited.
  visit_count      INTEGER DEFAULT 0 NOT NULL,
  -- Number of times the URL was typed directly (vs. clicked/navigated to).
  -- Subset of visit_count; every typed visit also increments visit_count.
  typed_count      INTEGER DEFAULT 0 NOT NULL,
  -- Microseconds since the Windows epoch (Jan 1, 1601). Most recent visit.
  last_visit_time  INTEGER NOT NULL,
  -- 1 = hidden from history UI (e.g. redirect intermediates, subframes).
  hidden           INTEGER DEFAULT 0 NOT NULL
);

CREATE INDEX urls_url_index ON urls (url);

Table: keyword_search_terms

CREATE TABLE keyword_search_terms (
  -- FK → TemplateURL.id (external to this DB).
  keyword_id      INTEGER NOT NULL,
  -- FK → urls.id.
  url_id          INTEGER NOT NULL,
  -- Original search term as typed.
  term            LONGVARCHAR NOT NULL,
  -- Lowercased, whitespace-collapsed for deduplication/autocomplete.
  normalized_term LONGVARCHAR NOT NULL
);

CREATE INDEX keyword_search_terms_index1 ON keyword_search_terms (keyword_id, normalized_term);
CREATE INDEX keyword_search_terms_index2 ON keyword_search_terms (url_id);
CREATE INDEX keyword_search_terms_index3 ON keyword_search_terms (term);

Tables: segments, segment_usage

Used to compute Most Visited tiles on the New Tab Page.

CREATE TABLE segments (
  id     INTEGER PRIMARY KEY,
  -- Canonical site name derived from URL (scheme + stripped host).
  -- Common prefixes (www., m., mobile., touch.) are stripped to group
  -- variants together. e.g. "http://google.com/"
  name   VARCHAR,
  -- FK → urls.id. Current representative URL for this segment.
  url_id INTEGER NOT NULL
);

CREATE INDEX segments_name   ON segments (name);
CREATE INDEX segments_url_id ON segments (url_id);

CREATE TABLE segment_usage (
  id          INTEGER PRIMARY KEY,
  segment_id  INTEGER NOT NULL,  -- FK → segments.id
  -- Start-of-day bucket, microseconds since Windows epoch.
  time_slot   INTEGER NOT NULL,
  visit_count INTEGER DEFAULT 0 NOT NULL
);

CREATE INDEX segment_usage_time_slot_segment_id ON segment_usage (time_slot, segment_id);
CREATE INDEX segments_usage_seg_id              ON segment_usage (segment_id);

Table: visited_links

Implements partitioned :visited CSS state (Privacy Sandbox). A link is only "visited" in the context of a specific top-level site + frame.

CREATE TABLE visited_links (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  -- FK → urls.id. The link's URL.
  link_url_id   INTEGER NOT NULL,
  -- Top-level page URL when the link was clicked (partition key).
  top_level_url LONGVARCHAR NOT NULL,
  -- Frame URL containing the link (partition key).
  frame_url     LONGVARCHAR NOT NULL,
  visit_count   INTEGER DEFAULT 0 NOT NULL
);

CREATE INDEX visited_links_index ON visited_links (link_url_id, top_level_url, frame_url);

Table: visits

Every page load generates one row. A redirect chain produces multiple rows linked via from_visit. All rows in a chain share the same visit_time (stamped at navigation initiation, not per-hop resolution); intermediate hops have visit_duration = 0 and hidden = 1 in urls. Only the final hop carries a meaningful visit_duration.

CREATE TABLE visits (
  id                                INTEGER PRIMARY KEY AUTOINCREMENT,
  -- FK → urls.id.
  url                               INTEGER NOT NULL,
  -- Microseconds since Windows epoch. Shared across all hops in a redirect
  -- chain — see note above.
  visit_time                        INTEGER NOT NULL,
  -- FK → visits.id. Previous hop in a redirect chain, or the visit that
  -- spawned this one. 0 if this is the start of a chain.
  from_visit                        INTEGER,
  -- The HTTP referrer URL if it came from outside Chrome (e.g. an external
  -- app opening a link). Distinct from from_visit, which is an internal FK.
  external_referrer_url             TEXT,
  -- Bitmask of ui::PageTransition. Composed of one core type (lower 8 bits)
  -- OR'd with zero or more qualifier bits (upper 24 bits).
  --
  -- Core types (lower 8 bits, 0x00–0x0A):
  --   0  PAGE_TRANSITION_LINK               Clicked a link
  --   1  PAGE_TRANSITION_TYPED              URL typed directly in address bar
  --   2  PAGE_TRANSITION_AUTO_BOOKMARK      From a UI suggestion or bookmark
  --   3  PAGE_TRANSITION_AUTO_SUBFRAME      Auto-loaded subframe (ads, fenced frames)
  --   4  PAGE_TRANSITION_MANUAL_SUBFRAME    Explicitly user-requested subframe
  --   5  PAGE_TRANSITION_GENERATED          Omnibox match that didn't look like a URL
  --   6  PAGE_TRANSITION_AUTO_TOPLEVEL      Auto-loaded top-level frame (NTP, devtools, etc.)
  --   7  PAGE_TRANSITION_FORM_SUBMIT        Form submission
  --   8  PAGE_TRANSITION_RELOAD             Reload button or Enter in address bar
  --   9  PAGE_TRANSITION_KEYWORD            Tab-to-search keyword navigation
  --  10  PAGE_TRANSITION_KEYWORD_GENERATED  Visit generated for a keyword search
  --
  -- Qualifier bits (upper 24 bits, OR'd onto core type):
  --   0x00800000  PAGE_TRANSITION_BLOCKED           Managed user blocked
  --   0x01000000  PAGE_TRANSITION_FORWARD_BACK      Back/Forward button
  --   0x02000000  PAGE_TRANSITION_FROM_ADDRESS_BAR  Triggered from address bar
  --   0x04000000  PAGE_TRANSITION_HOME_PAGE         Navigating to home page
  --   0x08000000  PAGE_TRANSITION_FROM_API          External application
  --   0x10000000  PAGE_TRANSITION_CHAIN_START       First hop in redirect chain
  --   0x20000000  PAGE_TRANSITION_CHAIN_END         Last hop in redirect chain
  --   0x40000000  PAGE_TRANSITION_CLIENT_REDIRECT   JS or meta-refresh redirect
  --   0x80000000  PAGE_TRANSITION_SERVER_REDIRECT   HTTP header redirect
  transition                        INTEGER DEFAULT 0 NOT NULL,
  -- FK → segments.id. NULL if no segment assigned.
  segment_id                        INTEGER,
  -- Time spent on page, microseconds. 0 for redirect intermediates.
  visit_duration                    INTEGER DEFAULT 0 NOT NULL,
  -- 1 if this visit incremented typed_count in `urls`.
  incremented_omnibox_typed_score   BOOLEAN DEFAULT FALSE NOT NULL,
  -- FK → visits.id. Tab opener visit (not HTTP referrer). NULL if none.
  opener_visit                      INTEGER,
  -- Sync: cache GUID of originating device. NULL = local.
  originator_cache_guid             TEXT,
  -- Sync: visit ID on the originating device.
  originator_visit_id               INTEGER,
  originator_from_visit             INTEGER,
  originator_opener_visit           INTEGER,
  -- 1 if known to the sync server.
  is_known_to_sync                  BOOLEAN DEFAULT FALSE NOT NULL,
  -- 1 if eligible for NTP "Most Visited" tiles.
  consider_for_ntp_most_visited     BOOLEAN DEFAULT FALSE NOT NULL,
  -- FK → visited_links.id. 0 = kInvalidVisitedLinkID (no associated link).
  visited_link_id                   INTEGER DEFAULT 0 NOT NULL,
  -- Web app context identifier.
  app_id                            TEXT
);

CREATE INDEX visits_url_index           ON visits (url);
CREATE INDEX visits_from_index          ON visits (from_visit);
CREATE INDEX visits_time_index          ON visits (visit_time);
CREATE INDEX visits_originator_id_index ON visits (originator_visit_id);

Table: visit_source

Tracks provenance of visits. A missing row implies SOURCE_BROWSED; rows are only written for the other seven values.

CREATE TABLE visit_source (
  -- FK → visits.id (1:1).
  id     INTEGER PRIMARY KEY,
  -- 0=SOURCE_SYNCED, 2=SOURCE_EXTENSION, 3=SOURCE_FIREFOX_IMPORTED,
  -- 4=SOURCE_IE_IMPORTED, 5=SOURCE_SAFARI_IMPORTED, 6=SOURCE_ACTOR,
  -- 7=SOURCE_OS_MIGRATION_IMPORTED  (1=SOURCE_BROWSED is never stored)
  source INTEGER NOT NULL
);

Table: content_annotations

ML-derived, on-device content signals. 0 or 1 rows per visit.

This table has two consumers. First, the Journeys/Memory clustering pipeline reads it to score and group visits into thematic clusters. Second, the Privacy Sandbox BrowsingTopics API uses annotation_flags to determine whether a visit is eligible to influence the user's topics calculation.

A subset of fields is synced to other devices via HistorySyncBridge: page_language, password_state, has_url_keyed_image, related_searches, and categories. Fields such as annotation_flags are explicitly excluded from sync on the assumption that they are derived locally and should not be treated as authoritative on remote devices.

CREATE TABLE content_annotations (
  -- FK → visits.id (PK; 1:1 with visits).
  visit_id                  INTEGER PRIMARY KEY,
  -- ML page visibility score [0.0, 1.0]. NULL if not computed.
  visibility_score          NUMERIC,
  -- Deprecated FLoC score.
  floc_protected_score      NUMERIC,
  -- Serialized content category list (topic IDs + weights).
  categories                VARCHAR,
  page_topics_model_version INTEGER,
  -- Bitmask of VisitContentAnnotationFlag:
  --   bit 0: kBrowsingTopicsEligible
  --   bit 1: kDeprecatedFlocEligibleRelaxed
  annotation_flags          INTEGER NOT NULL,
  -- Serialized named entities recognized on the page.
  entities                  VARCHAR,
  -- Serialized "related searches" from the page.
  related_searches          VARCHAR,
  -- Canonical search URL (query params normalized).
  search_normalized_url     VARCHAR,
  -- Search query text if this was a search visit.
  search_terms              LONGVARCHAR,
  alternative_title         VARCHAR,
  -- BCP-47 language tag for page content.
  page_language             VARCHAR,
  -- 0=kUnknown, 1=kNoPasswordField, 2=kHasPasswordField
  password_state            INTEGER DEFAULT 0 NOT NULL,
  -- 1 if page has a URL-keyed image in Google's index.
  has_url_keyed_image       BOOLEAN NOT NULL
);

Table: context_annotations

Behavioral and UX signals captured around the visit. 0 or 1 rows per visit.

Fields are written in two phases. The on-visit fields (browser_type, window_id, tab_id, task_id, root_task_id, parent_task_id, response_code) are written when the visit row is first created and are included in sync. The remaining on-close fields (context_annotation_flags, duration_since_last_visit, page_end_reason, total_foreground_duration) are written later, when the page lifetime ends (user navigates away or closes the tab), and are not synced.

This table is also used by the Journeys clustering pipeline to enrich cluster scoring, and by visit query logic to filter out 404 responses when computing Most Visited tiles.

CREATE TABLE context_annotations (
  -- FK → visits.id (PK; 1:1 with visits).
  visit_id                  INTEGER PRIMARY KEY,
  -- Bitmask of ContextAnnotationFlags:
  --   kOmniboxUrlCopied, kIsExistingPartOfTabGroup, kIsPlacedInTabGroup,
  --   kIsExistingBookmark, kIsNewBookmark, kIsNtpCustomLink
  context_annotation_flags  INTEGER NOT NULL,
  -- Microseconds since previous visit to this URL. -1 = first visit.
  duration_since_last_visit INTEGER,
  -- page_load_metrics::PageEndReason (navigation away, tab close, crash, etc.)
  page_end_reason           INTEGER,
  -- Total foreground time for this visit, microseconds. -1 if not recorded.
  total_foreground_duration INTEGER,
  -- 0=kUnknown, 1=kTabbed, 2=kPopup, 3=kCustomTab, 4=kAuthTab
  browser_type              INTEGER DEFAULT 0 NOT NULL,
  window_id                 INTEGER DEFAULT -1 NOT NULL,
  tab_id                    INTEGER DEFAULT -1 NOT NULL,
  task_id                   INTEGER DEFAULT -1 NOT NULL,
  root_task_id              INTEGER DEFAULT -1 NOT NULL,
  parent_task_id            INTEGER DEFAULT -1 NOT NULL,
  -- HTTP response code (e.g. 200, 404). 0 if not recorded.
  response_code             INTEGER DEFAULT 0 NOT NULL
);

Tables: clusters, clusters_and_visits, cluster_keywords, cluster_visit_duplicates

Power the Journeys / Memory feature.

CREATE TABLE clusters (
  cluster_id                           INTEGER PRIMARY KEY AUTOINCREMENT,
  should_show_on_prominent_ui_surfaces BOOLEAN NOT NULL,
  label                                VARCHAR NOT NULL,
  raw_label                            VARCHAR NOT NULL,
  triggerability_calculated            BOOLEAN NOT NULL,
  originator_cache_guid                TEXT NOT NULL,
  originator_cluster_id                INTEGER NOT NULL
);

CREATE TABLE clusters_and_visits (
  cluster_id        INTEGER NOT NULL,  -- FK → clusters.cluster_id
  visit_id          INTEGER NOT NULL,  -- FK → visits.id
  score             NUMERIC DEFAULT 0 NOT NULL,
  engagement_score  NUMERIC DEFAULT 0 NOT NULL,
  url_for_deduping  LONGVARCHAR NOT NULL,
  normalized_url    LONGVARCHAR NOT NULL,
  url_for_display   LONGVARCHAR NOT NULL,
  -- 0=kDefault, 1=kHidden, 2=kDone
  interaction_state INTEGER DEFAULT 0 NOT NULL,
  PRIMARY KEY (cluster_id, visit_id)
) WITHOUT ROWID;

CREATE INDEX clusters_for_visit ON clusters_and_visits (visit_id);

CREATE TABLE cluster_keywords (
  cluster_id  INTEGER NOT NULL,  -- FK → clusters.cluster_id
  keyword     VARCHAR NOT NULL,
  type        INTEGER NOT NULL,
  score       NUMERIC NOT NULL,
  collections VARCHAR NOT NULL
);

CREATE INDEX cluster_keywords_cluster_id_index ON cluster_keywords (cluster_id);

CREATE TABLE cluster_visit_duplicates (
  visit_id           INTEGER NOT NULL,
  duplicate_visit_id INTEGER NOT NULL,
  PRIMARY KEY (visit_id, duplicate_visit_id)
) WITHOUT ROWID;

Tables: downloads, downloads_url_chains, downloads_slices

CREATE TABLE downloads (
  id                     INTEGER PRIMARY KEY,
  guid                   VARCHAR NOT NULL,      -- RFC 4122 UUID
  current_path           LONGVARCHAR NOT NULL,  -- In-progress file path
  target_path            LONGVARCHAR NOT NULL,  -- Final intended path
  start_time             INTEGER NOT NULL,      -- Microseconds since Windows epoch
  received_bytes         INTEGER NOT NULL,
  total_bytes            INTEGER NOT NULL,      -- -1 if unknown
  -- 0=IN_PROGRESS, 1=COMPLETE, 2=CANCELLED, 3=INTERRUPTED
  state                  INTEGER NOT NULL,
  -- download::DownloadDangerType enum
  danger_type            INTEGER NOT NULL,
  -- download::DownloadInterruptReason enum
  interrupt_reason       INTEGER NOT NULL,
  -- SHA-256 of completed file. Empty if incomplete.
  hash                   BLOB NOT NULL,
  end_time               INTEGER NOT NULL,      -- 0 if incomplete
  opened                 INTEGER NOT NULL,      -- 1 if user has opened file
  last_access_time       INTEGER NOT NULL,
  transient              INTEGER NOT NULL,      -- 1 = delete on profile cleanup
  referrer               VARCHAR NOT NULL,
  site_url               VARCHAR NOT NULL,      -- Initiating site instance URL
  embedder_download_data VARCHAR NOT NULL,      -- Serialized embedder proto
  tab_url                VARCHAR NOT NULL,
  tab_referrer_url       VARCHAR NOT NULL,
  http_method            VARCHAR NOT NULL,
  by_ext_id              VARCHAR NOT NULL,      -- Extension that started download
  by_ext_name            VARCHAR NOT NULL,
  by_web_app_id          VARCHAR NOT NULL,      -- Web app that started download
  etag                   VARCHAR NOT NULL,      -- For resumption
  last_modified          VARCHAR NOT NULL,      -- For resumption
  mime_type              VARCHAR(255) NOT NULL,
  original_mime_type     VARCHAR(255) NOT NULL
);

-- Full redirect chain for the download request.
CREATE TABLE downloads_url_chains (
  id          INTEGER NOT NULL,  -- FK → downloads.id
  chain_index INTEGER NOT NULL,  -- 0=initial URL, MAX=final URL
  url         LONGVARCHAR NOT NULL,
  PRIMARY KEY (id, chain_index)
);

-- Byte-range slices for parallel/resumable downloads.
CREATE TABLE downloads_slices (
  download_id    INTEGER NOT NULL,  -- FK → downloads.id
  offset         INTEGER NOT NULL,
  received_bytes INTEGER NOT NULL,
  finished       INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (download_id, offset)
);

Table: history_sync_metadata

CREATE TABLE history_sync_metadata (
  -- visit_time as Sync storage key (big-endian microseconds since Windows epoch).
  storage_key INTEGER PRIMARY KEY NOT NULL,
  -- Serialized sync::EntityMetadata proto.
  value       BLOB
);

Relationships

urls ──────────────────────── visits
 │  (urls.id = visits.url)      │
 │                              ├── visit_source        (1:1 on visits.id)
 │                              ├── content_annotations (1:1 on visits.id)
 │                              ├── context_annotations (1:1 on visits.id)
 │                              ├── clusters_and_visits (N:M via visit_id)
 │                              └── visited_links       (via visits.visited_link_id)
 │
 ├── keyword_search_terms (via urls.id)
 └── segments             (via urls.id)
       └── segment_usage  (via segments.id)

clusters ─── clusters_and_visits ─── cluster_keywords
         └── cluster_visit_duplicates

downloads ─── downloads_url_chains
          └── downloads_slices

Notes

  • All timestamps are INTEGER microseconds since the Windows epoch (Jan 1, 1601 UTC).
  • BOOLEAN columns are stored as INTEGER (0/1) by SQLite regardless of declared type.
  • LONGVARCHAR is non-standard; SQLite treats it as TEXT.
  • A meta table (managed by sql::MetaTable) stores the schema version number and is not listed above.

The visits.transition column stores a 32-bit integer (int32_t) that encodes a ui::PageTransition value. It is composed of two parts — a core type in the low byte and qualifiers in the upper bytes:


Bit Layout

31       24 23       16 15        8 7         0
[qualifiers (0xFFFFFF00)]          [core (0xFF)]

Core Type (bits 0–7, mask 0xFF)

Stored in the lowest byte. Only one core value is present at a time:

Value Name Description
0 LINK Clicked a link on another page
1 TYPED User typed the URL in the address bar
2 AUTO_BOOKMARK Navigated via a UI suggestion/bookmark
3 AUTO_SUBFRAME Automatically loaded subframe (e.g., ads, fenced frames)
4 MANUAL_SUBFRAME Subframe explicitly requested by the user
5 GENERATED URL bar entry that didn't look like a URL (e.g., "Search Google for…")
6 AUTO_TOPLEVEL Auto-loaded toplevel frame (e.g., NTP, DevTools)
7 FORM_SUBMIT User submitted a form
8 RELOAD User reloaded the page
9 KEYWORD Navigated via a keyword/search engine shortcut
10 KEYWORD_GENERATED Visit generated for a keyword navigation

Qualifiers (bits 8–31, mask 0xFFFFFF00)

Zero or more qualifier bits can be OR'd onto the core value:

Bit mask Name Description
0x00800000 BLOCKED Navigation was blocked (managed user policy)
0x01000000 FORWARD_BACK Forward/Back button was used
0x02000000 FROM_ADDRESS_BAR Triggered from the address bar
0x04000000 HOME_PAGE Navigating to the home page
0x08000000 FROM_API Navigation originated from an external application
0x10000000 CHAIN_START Beginning of a navigation/redirect chain
0x20000000 CHAIN_END End of a navigation/redirect chain
0x40000000 CLIENT_REDIRECT Redirect via JavaScript or <meta refresh>
0x80000000 SERVER_REDIRECT Redirect via HTTP response headers
0xC0000000 IS_REDIRECT_MASK Tests for any redirect (client or server)

Note: Bit positions 0x00200000 and 0x00400000 were used for old experiments and removed ~2021, but may still appear in old databases.


How it's used in queries

When querying for a specific core type, Chromium masks off the qualifiers:

WHERE (transition & 0xFF) == <core_value>

This is done via PAGE_TRANSITION_CORE_MASK = 0xFF. You can see this in visit_database.cc:

statement.BindInt64(2, ui::PAGE_TRANSITION_CORE_MASK);
statement.BindInt64(3, transition);

The source of truth for all values is ui/base/page_transition_types.h.

@mootari
Copy link
Copy Markdown
Author

mootari commented Apr 27, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment