
Sync and manage websites

How to sync public URLs in Knowledge and enable this content for Fin.

Written by Beth-Ann Sher
Updated over a week ago

If you’d like to add website content to Intercom and make it available to Fin AI Agent and Copilot, you can sync the public URL of your site. You can also train Fin and Copilot on content from blog posts, changelogs, news updates, or any other dated webpages. This helps Fin and Copilot use the most up-to-date and relevant information from these sources.

Note: This feature works with public URLs only. If the content you want to use is behind a login, Fin won't be able to access or import it.


Sync website content with Fin and Copilot

Go to Fin AI Agent > Train > Content, then select Website sync under “Add content.”

Now enter the URL of your external support content (top-level domain) and click Next.

This fetches the page at the URL you provide, along with all of the pages nested under it.

Tip: Top-level domains give the best results. For example, use your external help center homepage URL https://myhelpcenter.com rather than a subpage such as https://myhelpcenter.com/articles.

Note: You can sync a maximum of 100 websites.

Review pages to sync

Once you enter your URL, we check that it's valid and accessible. Next, review the pages to sync. All sub-pages linked in each selected section will be synced, so select only relevant, up-to-date content.

Tips:

  • Select pages and sections that contain support content like help articles, guides, or FAQs.

  • Avoid selecting marketing pages, product listings, or pages with complex layouts.

  • All linked sub-pages within selected sections will be automatically included.

  • You can always update your selection later in the advanced settings.


Advanced settings

Select Advanced settings to add additional URLs, exclude URLs, specify CSS selectors for page elements to exclude, and more.

Additional URLs

Website structures vary. To make sure we sync your most relevant content, we recommend adding the specific subpages that hold it as additional URLs.

For example, if you enter https://myhelpcenter.com/help as the primary URL, you might also want to add a specific URL such as https://myhelpcenter.com/help/index.html.

URLs to exclude

To exclude certain pages you don’t want to sync content from, you can add a list of URL globs.

What is a URL glob?

A glob is a string of literal and/or wildcard characters used to match file paths or URLs. Globbing is the act of locating files on a filesystem using one or more globs. URL globs are useful for matching a range of URLs that are mostly the same, with only a small portion changing from one URL to the next.

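For example, the glob https://{store,docs}.example.com/** matches all URLs starting with https://store.example.com/ or https://docs.example.com/, and the glob https://example.com/**/*\?*foo=* matches URLs on example.com whose query string includes a foo parameter with any value (the \? matches a literal question mark).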
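To make the example globs concrete, here is a minimal Python sketch that translates a URL glob into a regular expression and tests URLs against it. This article doesn't document the exact glob dialect the website sync uses, so the rules in the code (** spans path segments, * and ? stay within one segment, {a,b} lists alternatives, a backslash escapes the next character) are assumed common conventions, and glob_to_regex and glob_matches are hypothetical helper names.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a URL glob into a compiled regex.

    Assumed rules (common glob conventions, not a documented Intercom spec):
      **/    matches zero or more whole path segments
      **     matches anything, including "/"
      *      matches anything except "/"
      ?      matches one character except "/"
      {a,b}  matches either alternative
      \\x    matches the character x literally (e.g. \\? for a literal "?")
    """
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            out.append(re.escape(glob[i + 1]))  # escaped literal character
            i += 2
        elif glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more path segments
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")
            i += 2
        elif ch == "*":
            out.append("[^/]*")
            i += 1
        elif ch == "?":
            out.append("[^/]")
            i += 1
        elif ch == "{":
            end = glob.index("}", i)  # raises ValueError if "{" is unclosed
            options = glob[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


def glob_matches(glob: str, url: str) -> bool:
    """True if the whole URL matches the glob."""
    return glob_to_regex(glob).fullmatch(url) is not None
```

In this sketch, **/ is handled separately so that https://example.com/**/*\?*foo=* also matches a page directly under the root, such as https://example.com/page?foo=1. Whether the sync's own matcher behaves the same way is an assumption worth testing with a small exclusion list first.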

Page elements to include

Next, choose whether to include only the main page content, or select Custom to include or exclude specific elements on the page.

Page elements to exclude

To exclude certain page elements, enter CSS selectors that match the specific sections or elements you want to exclude.

This is useful for skipping irrelevant page content. The value must be a valid CSS selector as accepted by the document.querySelectorAll() function. By default, we already remove common navigation elements, headers, footers, modals, scripts, and inline images.

Clickable page elements

This lets the web sync click DOM elements matched by the CSS selector during the sync process.

This is useful for expanding collapsed sections, in order to capture their text content. The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.

Examples: [aria-expanded="false"] and #expand_section.

Complex conditions can also be described with a CSS selector. In CSS, chaining selectors without spaces creates an AND-like condition; for example, .button.blue.small matches only elements that have all three classes.

Using a comma (,) as a separator works like OR; for example, .button, .blue, h1 targets all elements with class button, class blue, or first-level headings (h1).
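The AND and OR rules above can be illustrated with a deliberately tiny Python matcher. It's a sketch for intuition only: during the sync, real matching is done by the browser's document.querySelectorAll(), and this toy version understands just tag names, .class and #id parts (no spaces, attribute selectors like [aria-expanded="false"], or pseudo-classes). The Element class and the function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

# One simple selector part: a tag name, .class, or #id.
PART = re.compile(r"([.#]?)([A-Za-z_][\w-]*)")


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    classes: Set[str] = field(default_factory=set)
    id: Optional[str] = None


def matches_compound(selector: str, el: Element) -> bool:
    """AND: every chained part (e.g. .button.blue.small) must match."""
    selector = selector.strip()
    parts = PART.findall(selector)
    # Reject anything this toy parser can't represent (spaces, [attr], :hover, ...).
    if not parts or "".join(p + name for p, name in parts) != selector:
        raise ValueError(f"unsupported selector: {selector!r}")
    for prefix, name in parts:
        if prefix == "." and name not in el.classes:
            return False
        if prefix == "#" and name != el.id:
            return False
        if prefix == "" and name != el.tag:
            return False
    return True


def matches_selector_list(selector_list: str, el: Element) -> bool:
    """OR: the element matches if any comma-separated selector matches."""
    return any(matches_compound(s, el) for s in selector_list.split(","))
```

For example, an element with only the class button does not match .button.blue.small, but it does match .button, .blue, h1.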

Wait to load page element

To target content that may take a while to appear on the page, you can add a CSS selector that makes the web scraper wait for that element before scraping content.

This is useful for pages where the default load detection, which waits for network activity to go idle, fails. Setting this option completely disables the default behavior, and the page is processed only if the element specified by this selector appears.

The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.
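Conceptually, this setting turns page processing into a poll-until-present loop with no fallback to the default behavior. The Python sketch below illustrates that under stated assumptions: the element check is passed in as a callable (in a real scraper it would query the page with your selector), and the 30-second timeout and polling interval are hypothetical values, since this article doesn't say how long the sync waits.

```python
import time
from typing import Callable


def wait_for_element(
    element_present: Callable[[], bool],
    timeout_s: float = 30.0,       # hypothetical; the real wait limit isn't documented here
    poll_interval_s: float = 0.5,  # hypothetical polling interval
) -> bool:
    """Poll element_present() until it returns True or timeout_s elapses.

    True means the element appeared and the page can be scraped. False means
    it never appeared, so the page is not processed; there is no fallback to
    the default network-idle detection.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if element_present():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

The practical takeaway is the False branch: if your selector never matches (for example, because of a typo or a class that only some pages use), those pages are not processed at all.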

Start sync

When you've finished reviewing the page URLs and advanced settings, go ahead and click Sync to start syncing your website content with Intercom.


Manage website syncs

Once the sync is complete, you’ll receive an email notification and the website will appear as a synced source under Fin AI Agent > Train > Content.

Configure settings for specific pages

Go to Fin AI Agent > Train > Content, select the website source, then click on a page you've synced. You’ll find a "Details" panel on the right, which contains:

  • Data: View the content type, language, creation date, and last update (when it was last synced with the source).

  • Fin settings: Enable or disable the page for Fin AI Agent and Copilot. When enabled, the content becomes available to customers through Fin AI Agent and to teammates through Copilot, respectively.

  • Link: The public URL for this website source.

  • Reports: The Fin conversations that involved this website source, including those it helped resolve.

  • Tags: Apply your own custom tags to group and organize content in Intercom.

  • Folder: The folder where this public URL lives in the Knowledge Hub. You can’t change the folder of synced content.

Note: Website sources are read-only and can’t be edited within Intercom; they must be edited at the source.

Make it available to Fin and Copilot

To make a website source available to Fin AI Agent and/or Copilot, go to Fin AI Agent > Train > Content and select the website source, then click on the live page(s) you've synced and select Change AI Agent state > Enable for AI Agent or Change Copilot state > Enable for Copilot.

You can also manage these settings for an individual webpage: in the "Details" panel, scroll down to Fin settings and choose whether to toggle on:

  • Fin AI Agent - This setting will make the public URL available for Fin to use when responding to customers.

  • Copilot - This setting will make the public URL available for Copilot to use when answering teammates’ questions in the inbox via the Copilot panel.

Learn how to set up Fin AI Agent for your customers or enable your team on using Copilot in the inbox.

Make it available to a specific audience

If this website source is only relevant for a specific subset of customers, you can use audience filters to make it visible to certain people.

First, you’ll need to create and define the audience you want to target.

Then go to Fin AI Agent > Train > Content and select the website source, then click on the live page(s) you've synced and select More actions > Change Fin audience.

Note:

  • The default audience for public URLs is “Everyone”.

  • Fin will respect any audience you apply to a public URL and will only use this content to answer a customer’s question if that customer matches the audience rules.

Re-sync or remove a website as a source

If you’d like to re-sync or remove a public URL as a source, go to Fin AI Agent > Train > Content and select the source. Then click the settings dropdown in the top right and select Re-sync or Remove this source.

Tip: Websites are usually re-synced automatically every week (depending on the size of the source), and you can also re-sync manually at any time.

Manage website sync settings

If you’d like to adjust the advanced settings for a website sync, go to Fin AI Agent > Train > Content and select the source. Then click the settings dropdown in the top right and select Open settings.

View website sync history

You can view a list of past website syncs to see when they were last run, which pages were found, and any failed pages. Go to Fin AI Agent > Train > Content and select the website source, then click the settings dropdown in the top right and select View sync history.


Each row in the table represents a past or active run, and you can filter the runs by status. It includes the following information:

  • Sync date

  • Status

  • Synced pages

  • Excluded pages

  • Failed pages

  • Duration

  • Sync started by

If a sync has failed, you can hover over the status to see a detailed explanation of why.


Troubleshooting website sync

Common issues

When importing website content to enable Fin, you need to enter the public URL. This will search for all pages nested under that URL and sync them for Fin AI Agent to use.

If the importer didn't return the number of pages you expected, there are a few possible reasons:

The URL provided isn't the top level domain

The website sync works by going to the URL you provide and then searching for all pages nested under that URL. These pages must have the same URL pattern as the URL you provide.

For example, if the URL you provide is https://myhelpcenter.com/home, then all pages you want to import must include the /home prefix in the URL, e.g. https://myhelpcenter.com/home/article. If they do not, remove the prefix, use the most basic URL stem (e.g. https://myhelpcenter.com), and then try the import again.
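To check ahead of time whether your pages share the URL pattern of the URL you plan to enter, you could use a quick Python sketch like the one below. It assumes the rule described above (same host, and a path equal to or below the start URL's path) and compares whole path segments, so /homepage is not treated as nested under /home. That segment comparison is a choice made in this sketch, not documented sync behavior, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def is_nested_under(start_url: str, page_url: str) -> bool:
    """True if page_url is on the same host as start_url and its path
    equals, or is below, start_url's path (compared segment by segment)."""
    start, page = urlparse(start_url), urlparse(page_url)
    if start.netloc.lower() != page.netloc.lower():
        return False
    prefix = start.path.rstrip("/")
    return page.path == prefix or page.path.startswith(prefix + "/")
```

For instance, is_nested_under("https://myhelpcenter.com/home", "https://myhelpcenter.com/articles/1") returns False, which is exactly the case where this article recommends switching to https://myhelpcenter.com.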

The URL is private

If the content you want to use is behind a login, Fin won't be able to access or import it.

Page limits

You can sync up to 100 different top level domains and Fin will sync a maximum of 30,000 pages from each source. Syncing can sometimes fail if there is a very large amount of content on a single page (you'll be notified if a sync fails).
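The interaction between link discovery and the 30,000-page cap can be modeled with a small breadth-first crawl in Python. This is a hypothetical model, not Intercom's crawler: the links dictionary stands in for the links found when fetching each page, only pages nested under the start URL are followed, and pages beyond max_pages are simply not synced. The order in which a real crawler discovers pages, and therefore which pages fall past the cap, may differ.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse


def is_nested(start_url: str, url: str) -> bool:
    """Same host, and the path equals or sits below the start URL's path."""
    start, page = urlparse(start_url), urlparse(url)
    prefix = start.path.rstrip("/")
    return start.netloc == page.netloc and (
        page.path == prefix or page.path.startswith(prefix + "/")
    )


def crawl(start_url: str, links: Dict[str, List[str]], max_pages: int = 30_000) -> List[str]:
    """Breadth-first crawl of a link graph, keeping only nested pages, up to max_pages."""
    seen = {start_url}
    queue = deque([start_url])
    synced: List[str] = []
    while queue and len(synced) < max_pages:
        url = queue.popleft()
        synced.append(url)
        for link in links.get(url, []):
            # Skip already-queued pages and anything outside the start URL's path.
            if link not in seen and is_nested(start_url, link):
                seen.add(link)
                queue.append(link)
    return synced
```

Note how a link to /blog/x is never followed when the start URL is /home: that is the same effect described under "The URL provided isn't the top level domain."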

Websites restricted to specific regional IPs

Intercom’s website sync (used to add public URLs for Fin AI Agent and Copilot) does not use a dedicated, custom user-agent string at this time.


To identify or allow these requests:

  • By IP address: Our crawler normally uses dynamic IPs. If your site requires allowlisting, contact us and we can enable static, region-specific IPs for your workspace.

  • These requests are used only for website syncing. They do not affect your Messenger traffic or end-user tracking.
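If your site sits behind an IP allowlist, the check on your side might look like the Python sketch below, which uses the standard ipaddress module. The network shown (203.0.113.0/24) is a reserved documentation range used as a placeholder; replace it with the static IPs Intercom provides for your workspace. Where this check actually lives (web server, firewall, or application middleware) depends on your setup.

```python
import ipaddress

# Placeholder allowlist. 203.0.113.0/24 is a reserved documentation range;
# replace it with the static IPs provided for your workspace.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """True if client_ip falls inside any allowlisted network.

    Raises ValueError if client_ip is not a valid IPv4 or IPv6 address.
    An IPv6 address never matches an IPv4 network (and vice versa).
    """
    ip = ipaddress.ip_address(client_ip)
    return any(ip in network for network in ALLOWED_NETWORKS)
```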

Website sync errors

When you sync content, you may see different statuses that indicate what happened during the process. To see your website sync status go to Fin AI Agent > Train > Content and select the website source, then use the Status dropdown to filter by:

  • Syncing

  • Live

  • Failed

  • Excluded

Here’s what each one means and what you can do next:

Syncing

The page sync is still in progress. An initial sync can take anywhere from a few minutes to over an hour based on how much content you have.

Live

The page was successfully synced and can be enabled for Fin and Copilot.

Note: A successful sync doesn’t always mean we were able to scrape all of the content on the page. If you want to confirm full coverage, we recommend previewing Fin with answers you expect it to find from that page.

Excluded

These pages are intentionally not synced because you excluded them in your sync settings. They can't be retried and won't be included unless you change your sync settings.

Failed

These errors mean the sync didn’t complete and may require changes on your side before retrying:

1. Unknown error

  • Message: “This page couldn't be accessed. It may be slow or blocked. Try syncing again, or contact support if it fails.”

  • What it means: Something prevented us from accessing the page, but the cause isn’t clear.

2. Session blocked / Rate limited

  • Message: “The website is preventing us from accessing its content. Check if it's being blocked by an anti-crawler setting or firewall. Check your site configuration and try syncing again. If the issue persists, contact support.”

  • What it means: Your site is actively blocking or limiting our crawler.

3. Network, timeout, or similar errors

  • Message: “This page couldn't be accessed. It may be slow to load or blocked by anti-crawler settings or a firewall. Check your site configuration and try syncing again. If the issue persists, contact support.”

  • What it means: The page didn’t load in time or couldn’t be reached due to network issues or blocking.

4. Duplicate

  • Message: “This page has the same content as another that's already synced. Only one version will be included.”

  • What it means: We detected identical content elsewhere, so only one copy is kept.

5. Keyword filtering

  • Message: “Pages with keywords like category, collection, or tag in the URL are excluded by default, as they usually don't contain unique content. If this page should be included, contact support.”

  • What it means: These URLs often represent lists, not standalone content pages.

6. Status code 400

  • Message: “Page content cannot be found. Check that the URL is valid and the page loads without issues.”

  • What it means: The URL may be broken or returning an error on your website.

7. Blocked URL

  • Message: “This website domain is blocked from being synced. If you require this, contact support.”

  • What it means: The domain is intentionally excluded from syncing.
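To spot pages that may be caught by keyword filtering before you sync, you can scan your URLs with a rough Python heuristic like the one below. This article names only category, collection, and tag as examples, so the keyword set (including plurals) is an assumption, and the sync's actual matching rules may differ. The sketch checks whole words in the URL path only, so a path like /articles/staging-setup is not flagged.

```python
import re
from urllib.parse import urlparse

# Assumed keyword set based on the examples in this article; the real list may differ.
LISTING_KEYWORDS = {"category", "categories", "collection", "collections", "tag", "tags"}


def may_be_keyword_filtered(url: str) -> bool:
    """True if any whole word in the URL path is a listing keyword."""
    words = re.split(r"[^a-z0-9]+", urlparse(url).path.lower())
    return any(word in LISTING_KEYWORDS for word in words)
```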


You can retry a failed page sync by hovering over the page, selecting the three-dot menu, and then selecting Resync.
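If you're reviewing many failed pages, it can help to group them by the next step this article recommends for each error. The Python sketch below encodes those recommendations as a lookup table. The reason strings mirror the error headings above and are illustrative labels, not values returned by an Intercom API; you would fill failed_pages in yourself, for example from what you see in the sync history.

```python
from enum import Enum
from typing import Dict, List


class NextStep(Enum):
    FIX_THEN_RESYNC = "Check your site configuration or URL, then resync"
    RESYNC_THEN_SUPPORT = "Resync; contact support if it keeps failing"
    SUPPORT_IF_NEEDED = "Contact support only if you need this page synced"
    NO_ACTION = "No action needed; only one copy is kept"


# Recommendations taken from the error descriptions in this article.
NEXT_STEPS = {
    "Unknown error": NextStep.RESYNC_THEN_SUPPORT,
    "Session blocked / Rate limited": NextStep.FIX_THEN_RESYNC,
    "Network, timeout, or similar errors": NextStep.FIX_THEN_RESYNC,
    "Duplicate": NextStep.NO_ACTION,
    "Keyword filtering": NextStep.SUPPORT_IF_NEEDED,
    "Status code 400": NextStep.FIX_THEN_RESYNC,
    "Blocked URL": NextStep.SUPPORT_IF_NEEDED,
}


def triage(failed_pages: Dict[str, str]) -> Dict[NextStep, List[str]]:
    """Group failed page URLs (url -> failure reason) by recommended next step.

    Reasons not in the table are treated like "Unknown error".
    """
    groups: Dict[NextStep, List[str]] = {}
    for url, reason in failed_pages.items():
        step = NEXT_STEPS.get(reason, NextStep.RESYNC_THEN_SUPPORT)
        groups.setdefault(step, []).append(url)
    return groups
```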

