- From: Karl Dubost <karl@la-grange.net>
- Date: Thu, 20 Jun 2013 11:37:43 -0400
- To: www-archive Archive <www-archive@w3.org>
XHTML files on Mac OS X are not indexed the same way HTML files are.
But there is a solution.
# BEFORE being XHTML aware.
Here is an example of an XHTML file and what Spotlight knows about it. The information is very basic: nothing related to the content of the file.
→ mdls bnf.xhtml
kMDItemContentCreationDate = 2011-10-01 11:47:27 +0000
kMDItemContentModificationDate = 2013-01-07 00:56:56 +0000
kMDItemContentType = "public.xhtml"
kMDItemContentTypeTree = (
"public.xhtml",
"public.xml",
"public.text",
"public.data",
"public.item",
"public.content"
)
kMDItemDateAdded = 2011-10-01 11:47:27 +0000
kMDItemDisplayName = "bnf.xhtml"
kMDItemFSContentChangeDate = 2013-01-07 00:56:56 +0000
kMDItemFSCreationDate = 2011-10-01 11:47:27 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 0
kMDItemFSName = "bnf.xhtml"
kMDItemFSNodeCount = 6447
kMDItemFSOwnerGroupID = 502
kMDItemFSOwnerUserID = 502
kMDItemFSSize = 6447
kMDItemFSTypeCode = ""
kMDItemKind = "HTML"
kMDItemLogicalSize = 6447
kMDItemPhysicalSize = 8192
# MODIFYING the mdimporter.
* Go to /System/Library/Spotlight
* Find RichText.mdimporter
* Right-click on it and choose "Show Package Contents".
* Inside the package, open the Contents/Info.plist file in your text editor (TextMate, Sublime Text, etc.),
or from the command line with something along the lines of:
sudo subl /System/Library/Spotlight/RichText.mdimporter/Contents/Info.plist
* You will see something along these lines:
<array>
<string>public.rtf</string>
<string>public.html</string>
<string>public.xml</string>
<string>public.plain-text</string>
<string>com.apple.traditional-mac-plain-text</string>
<string>com.apple.rtfd</string>
<string>com.apple.webarchive</string>
<string>org.oasis-open.opendocument.text</string>
<string>org.openxmlformats.wordprocessingml.document</string>
</array>
* Edit it to add <string>public.xhtml</string> to the list:
<array>
<string>public.rtf</string>
<string>public.html</string>
<string>public.xhtml</string>
<string>public.xml</string>
<string>public.plain-text</string>
<string>com.apple.traditional-mac-plain-text</string>
<string>com.apple.rtfd</string>
<string>com.apple.webarchive</string>
<string>org.oasis-open.opendocument.text</string>
<string>org.openxmlformats.wordprocessingml.document</string>
</array>
* Save it
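The same edit can be scripted rather than done by hand. The following is a minimal sketch using Python's standard `plistlib`. It assumes the array shown above lives under `CFBundleDocumentTypes` → `LSItemContentTypes`, which is where Spotlight importers usually declare their types, but that key path is not stated in this message, so check it against your own Info.plist. The function name `add_content_type` is made up for illustration. It works on bytes, so you can try it on a copy of the file before touching the real one.

```python
import plistlib

XHTML_UTI = "public.xhtml"


def add_content_type(plist_bytes: bytes, uti: str = XHTML_UTI) -> bytes:
    """Return new plist bytes with `uti` added to each content-type array."""
    info = plistlib.loads(plist_bytes)
    # Assumption: the importer lists its types under
    # CFBundleDocumentTypes -> LSItemContentTypes.
    for doc_type in info.get("CFBundleDocumentTypes", []):
        types = doc_type.get("LSItemContentTypes")
        if types is None or uti in types:
            continue  # nothing to do, or already present
        if "public.html" in types:
            # Keep the same order as the hand-edited listing above.
            types.insert(types.index("public.html") + 1, uti)
        else:
            types.append(uti)
    return plistlib.dumps(info)
```

Read the original file in binary mode, pass its contents through `add_content_type`, and write the result back with the same privileges you would need for the manual edit.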
# REINDEXING
To reindex a file, you can just use mdimport:
→ mdimport bnf.xhtml
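If you have many XHTML files, reimporting them one by one gets tedious. Here is a rough sketch that collects every `.xhtml` file under a directory and hands them to `mdimport` in one call. The helper names `mdimport_command` and `reindex` are hypothetical, and the sketch assumes `mdimport` is on your PATH and accepts several paths at once.

```python
import subprocess
from pathlib import Path
from typing import List


def mdimport_command(root: str, suffix: str = ".xhtml") -> List[str]:
    """Build the mdimport command line for all matching files under root."""
    files = sorted(str(p) for p in Path(root).rglob("*" + suffix) if p.is_file())
    # Return an empty list when there is nothing to import.
    return ["mdimport"] + files if files else []


def reindex(root: str) -> None:
    """Run mdimport on every matching file (only meaningful on Mac OS X)."""
    cmd = mdimport_command(root)
    if cmd:
        subprocess.run(cmd, check=True)
```

Building the command separately from running it makes it easy to print and inspect the list of files first.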
# LET'S look again at the data.
→ mdls bnf.xhtml
kMDItemContentCreationDate = 2011-10-01 11:47:27 +0000
kMDItemContentModificationDate = 2013-01-07 00:56:56 +0000
kMDItemContentType = "public.xhtml"
kMDItemContentTypeTree = (
"public.xhtml",
"public.xml",
"public.text",
"public.data",
"public.item",
"public.content"
)
kMDItemDateAdded = 2011-10-01 11:47:27 +0000
kMDItemDisplayName = "bnf.xhtml"
kMDItemFSContentChangeDate = 2013-01-07 00:56:56 +0000
kMDItemFSCreationDate = 2011-10-01 11:47:27 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 0
kMDItemFSName = "bnf.xhtml"
kMDItemFSNodeCount = 6447
kMDItemFSOwnerGroupID = 502
kMDItemFSOwnerUserID = 502
kMDItemFSSize = 6447
kMDItemFSTypeCode = ""
kMDItemKeywords = (
livre,
"bibliothe\U0300que",
lutte,
carnet
)
kMDItemKind = "HTML"
kMDItemLogicalSize = 6447
kMDItemPhysicalSize = 8192
kMDItemTitle = "Numérisation des livres de la BNF - Carnets de La Grange"
We can now see that the metadata include the title and the keywords, so the file becomes searchable by them.
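Spotting the new attributes by eye in two long listings is error-prone. As an illustration, this small sketch compares two saved `mdls` outputs and reports the attribute names that appear only in the second one. It assumes each attribute starts a line in the form `name = value`, as in the listings above; the function names are invented for this example.

```python
import re
from typing import List, Set


def mdls_keys(output: str) -> Set[str]:
    """Collect attribute names from lines shaped like `kMDItemFoo = ...`."""
    return {m.group(1) for m in re.finditer(r"^\s*(kMD\w+)\s*=", output, re.MULTILINE)}


def new_attributes(before: str, after: str) -> List[str]:
    """Attribute names present in `after` but missing from `before`."""
    return sorted(mdls_keys(after) - mdls_keys(before))
```

Continuation lines of multi-valued attributes (the entries between parentheses) contain no `=` sign and are simply ignored.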
# SEARCHING
It is now accessible from the Spotlight box at the top right, but also from the command line. For example:
→ mdfind "kMDItemTitle=='*livres de la BNF*'"
/long/path/to/file/bnf.xhtml
It worked!
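The same search can be driven from a script. Below is a minimal sketch that builds the query string used above and runs `mdfind` through `subprocess`. The names `title_query` and `find_by_title` are hypothetical, and the backslash escaping of quotes follows my reading of the Spotlight query syntax, so treat it as an assumption.

```python
import subprocess
from typing import List


def title_query(fragment: str) -> str:
    """Build a kMDItemTitle wildcard query for the given title fragment."""
    # Assumption: quotes inside the value are escaped with a backslash.
    escaped = fragment.replace("\\", "\\\\").replace("'", "\\'")
    return "kMDItemTitle=='*{}*'".format(escaped)


def find_by_title(fragment: str) -> List[str]:
    """Return the paths mdfind reports for the query (Mac OS X only)."""
    result = subprocess.run(
        ["mdfind", title_query(fragment)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Passing the query as a single list element avoids the shell quoting needed on the command line.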
PS: an interesting note about kMDItemKeywords and encoding: the è of "bibliothèque" shows up as an e followed by \U0300, a combining grave accent.
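To illustrate the encoding point: assuming the `\U0300` in the `mdls` output stands for the code point U+0300 (combining grave accent), the keyword is stored in decomposed form, and a naive string comparison with a precomposed "è" would fail. A short sketch with Python's standard `unicodedata` module shows one way to compare such strings; `same_keyword` is a made-up helper name.

```python
import unicodedata

# The keyword as it appears in the listing: "e" followed by U+0300.
decomposed = "bibliothe\u0300que"

# NFC normalization folds the pair into the single code point U+00E8.
composed = unicodedata.normalize("NFC", decomposed)


def same_keyword(a: str, b: str) -> bool:
    """Compare two keywords regardless of composed or decomposed accents."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalizing both sides before comparing keeps a script from missing keywords that look identical on screen.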
--
Karl Dubost
http://www.la-grange.net/karl/
Received on Thursday, 20 June 2013 20:58:07 UTC