metu.life/app/services/fetch_remote_status_service.rb

# frozen_string_literal: true

class FetchRemoteStatusService < BaseService
  def call(url, prefetched_body = nil)
    if prefetched_body.nil?
      atom_url, body = FetchAtomService.new.call(url)
    else
      atom_url = url
      body     = prefetched_body
    end

    return nil if atom_url.nil?
    process_atom(atom_url, body)
  end

  private

  def process_atom(url, body)
    Rails.logger.debug "Processing Atom for remote status at #{url}"

    xml = Nokogiri::XML(body)
    xml.encoding = 'utf-8'

    account = extract_author(url, xml)

    return nil if account.nil?

    statuses = ProcessFeedService.new.call(body, account)

    statuses.first
  end

  def extract_author(url, xml)
    url_parts = Addressable::URI.parse(url).normalize
    username  = xml.at_xpath('//xmlns:author/xmlns:name').try(:content)
    domain    = url_parts.host

    return nil if username.nil?

    Rails.logger.debug "Going to webfinger #{username}@#{domain}"

    account = FollowRemoteAccountService.new.call("#{username}@#{domain}")

    # If the author's confirmed URLs do not match the domain of the URL
    # we are reading this from, abort
    return nil unless confirmed_domain?(domain, account)

    account
  rescue Nokogiri::XML::XPath::SyntaxError
    Rails.logger.debug 'Invalid XML or missing namespace'
    nil
  end

  def confirmed_domain?(domain, account)
    domain.casecmp(account.domain).zero? || domain.casecmp(Addressable::URI.parse(account.remote_url).normalize.host).zero?
  end
end
Fix rubocop issues, introduce usage of frozen literal to improve performance 8 years ago			`# frozen_string_literal: true`

Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`class FetchRemoteStatusService < BaseService`
Fix full-text search query quotation, improve tag search performance with an index, add ability to open status by URL from search (fix #53) 8 years ago			`def call(url, prefetched_body = nil)`
			`if prefetched_body.nil?`
			`atom_url, body = FetchAtomService.new.call(url)`
			`else`
			`atom_url = url`
			`body = prefetched_body`
			`end`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago
Fix #54 - Fetch remote accounts by URL from mentions Fetching atom extracted from FetchRemoteAccountService and FetchRemoteStatusService into FetchAtomService. Mentions of the constant "http://activityschema.org/collection/public" skipped as it's not a real URL/user. 8 years ago			`return nil if atom_url.nil?`
Fix rubocop issues, introduce usage of frozen literal to improve performance 8 years ago			`process_atom(atom_url, body)`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`end`

			`private`

			`def process_atom(url, body)`
Add test for FanOutOnWriteService 8 years ago			`Rails.logger.debug "Processing Atom for remote status at #{url}"`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago
Force utf-8 encoding when processing XML 8 years ago			`xml = Nokogiri::XML(body)`
			`xml.encoding = 'utf-8'`

Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`account = extract_author(url, xml)`

			`return nil if account.nil?`

Improve code style 8 years ago			`statuses = ProcessFeedService.new.call(body, account)`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago
Fix rubocop issues, introduce usage of frozen literal to improve performance 8 years ago			`statuses.first`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`end`

			`def extract_author(url, xml)`
Punycode URI normalization (#2370) * Fix #2119 - Whenever about to send a HTTP request, normalize the URI * Add test for IDN request in FetchLinkCardService * Perform IDN normalization on domains before they are stored in the DB 8 years ago			`url_parts = Addressable::URI.parse(url).normalize`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`username = xml.at_xpath('//xmlns:author/xmlns:name').try(:content)`
			`domain = url_parts.host`

			`return nil if username.nil?`

			`Rails.logger.debug "Going to webfinger #{username}@#{domain}"`

Improve shared status verification (#2525) * Instead of parsing shared status contents verbatim, make roundtrip to purported original URL. Confirm that the "original" URL is from the same domain as the author it claims to be from. * Fix obvious typo, add comment * Use URI look-up first * Add test, update Goldfinger dependency to make less useless HTTP requests per Webfinger lookup 8 years ago			`account = FollowRemoteAccountService.new.call("#{username}@#{domain}")`

			`# If the author's confirmed URLs do not match the domain of the URL`
			`# we are reading this from, abort`
			`return nil unless confirmed_domain?(domain, account)`

			`account`
Catching more exceptions that slipped through, removing AR logging from production as it's very verbose and not very useful 8 years ago			`rescue Nokogiri::XML::XPath::SyntaxError`
Fix rubocop issues, introduce usage of frozen literal to improve performance 8 years ago			`Rails.logger.debug 'Invalid XML or missing namespace'`
Fix method return when rescuing 8 years ago			`nil`
Catching more exceptions that slipped through, removing AR logging from production as it's very verbose and not very useful 8 years ago			`end`
Improve shared status verification (#2525) * Instead of parsing shared status contents verbatim, make roundtrip to purported original URL. Confirm that the "original" URL is from the same domain as the author it claims to be from. * Fix obvious typo, add comment * Use URI look-up first * Add test, update Goldfinger dependency to make less useless HTTP requests per Webfinger lookup 8 years ago
			`def confirmed_domain?(domain, account)`
			`domain.casecmp(account.domain).zero? \|\| domain.casecmp(Addressable::URI.parse(account.remote_url).normalize.host).zero?`
			`end`
Fix #24 - Thread resolving for remote statuses This is a big one, so let me enumerate: Accounts as well as stream entry pages now contain Link headers that reference the Atom feed and Webfinger URL for the former and Atom entry for the latter. So you only need to HEAD those resources to get that information, no need to download and parse HTML <link>s. ProcessFeedService will now queue ThreadResolveWorker for each remote status that it cannot find otherwise. Furthermore, entries are now processed in reverse order (from bottom to top) in case a newer entry references a chronologically previous one. ThreadResolveWorker uses FetchRemoteStatusService to obtain a status and attach the child status it was queued for to it. FetchRemoteStatusService looks up the URL, first with a HEAD, tests if it's an Atom feed, in which case it processes it directly. Next for Link headers to the Atom feed, in which case that is fetched and processed. Lastly if it's HTML, it is checked for <link>s to the Atom feed, and if such is found, that is fetched and processed. The account for the status is derived from author/name attribute in the XML and the hostname in the URL (domain). FollowRemoteAccountService and ProcessFeedService are used. This means that potentially threads are resolved recursively until a dead-end is encountered, however it is performed asynchronously over background jobs, so it should be ok. 8 years ago			`end`