Commit 8086ad3

If UTF-8 encoding isn't valid for InfoExtractor, force the encoding.
1 parent 16fc6a9 commit 8086ad3

1 file changed

Lines changed: 1 addition & 1 deletion

lib/docsplit/info_extractor.rb

@@ -27,7 +27,7 @@ def extract_all(pdfs, opts)
   raise ExtractionFailed, result if $? != 0
   # ruby 1.8 (iconv) and 1.9 (String#encode) :
   if String.method_defined?(:encode)
-    result.encode!('UTF-8', 'UTF-8', :invalid => :replace)
+    result.encode!('UTF-8', 'binary', :invalid => :replace, :undef => :replace, :replace => "") unless result.valid_encoding?
   else
     require 'iconv' unless defined?(Iconv)
     ic = Iconv.new('UTF-8//IGNORE','UTF-8')
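
As a rough illustration of what the changed line does, the sketch below wraps the same String#encode call in a hypothetical helper. The name force_utf8 and the sample string are made up for this example and are not part of docsplit. It assumes Ruby 1.9 or later, where String#encode is available. Re-reading the bytes as 'binary' and replacing undefined characters with an empty string drops every byte above 0x7F, so a string that fails valid_encoding? comes back as plain ASCII tagged as UTF-8.

```ruby
# Hypothetical helper (not part of docsplit) mirroring the changed line.
# Uses the non-mutating String#encode so the caller's string is left intact.
def force_utf8(str)
  # Leave strings that are already valid in their encoding untouched.
  return str if str.valid_encoding?

  # Treat the bytes as binary; bytes >= 0x80 have no UTF-8 mapping from
  # binary, so :undef => :replace with :replace => "" removes them.
  str.encode('UTF-8', 'binary',
             :invalid => :replace, :undef => :replace, :replace => "")
end

# Made-up sample: \xE9 is Latin-1 "e acute", which is not valid UTF-8 here.
raw = "caf\xE9 ok".force_encoding('UTF-8')
raw.valid_encoding?    # => false

clean = force_utf8(raw)
clean                  # => "caf ok"
clean.valid_encoding?  # => true
```

One consequence of this approach: because the source is read as binary, valid multibyte UTF-8 characters in the same string are dropped along with the invalid bytes. That is presumably why the commit guards the call with valid_encoding?, so that well-formed output is never touched.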
